SERIES FOREWORD


With  the  rapid  introduction  of  highly  sophisticated  computers, 
(tele)communication,  service,  and  manufacturing  systems,  a 
major  shift  has  occurred  in  the  way  people  use  technology  and 
work  with  it.  The  objective  of  this  book  series  on  Human  Fac¬ 
tors  and  Ergonomics  is  to  provide  researchers  and  practition¬ 
ers  a  platform  where  important  issues  related  to  these  changes 
can  be  discussed,  and  methods  and  recommendations  can  be 
presented  for  ensuring  that  emerging  technologies  provide  in¬ 
creased  productivity,  quality,  satisfaction,  safety,  and  health  in 
the  new  workplace  and  the  Information  Society. 

The  present  volume  is  published  at  a  very  opportune  time, 
when  Information  Technologies  are  emerging  as  a  dominant 
force,  both  in  the  workplace,  and  in  everyday  life  activities.  In 
order  for  these  new  technologies  to  be  truly  effective,  they  must 
provide  communication  modes  and  interaction  modalities  that 
address  the  requirements  of  the  user  population  at  large,  in¬ 
cluding  individuals  with  disabilities  and  older  adults,  thus  mak¬ 
ing  the  Information  Society  universally  accessible  to  the  bene¬ 
fit  of  mankind. 

This  second  edition  of  the  Handbook  is  published  at  a  time 
when  the  field  of  HCI  has  experienced  a  major  scientific  and 
technical  evolution  in  the  depths  and  breadths  of  the  subject 
matter. Currently, HCI research and practice are prevalent in over
60 countries and encompass over 80,000 practitioners and
researchers. The subject matter of HCI is published in over
20 journals, at over 30 different conferences, and in over 400
books, and there are over 250 company-provided services in HCI. There
are well over 50,000 individual conference papers, journal articles,
and technical reports relating to HCI. The second edition of
this handbook captures the most important scientific and technical
know-how derived from the above sources. Thus, it effectively
captures the field of HCI. The 67 chapters have been
authored by some of the foremost authorities in the field, including
individuals  from  academia,  industry,  and  research  institutes 
from  around  the  globe.  With  over  5,500  references,  400  figures 
and  100  tables,  the  handbook  provides  a  wealth  of  data  as  well 
as pointers to the literature for more in-depth reading for both
researchers  and  practitioners. 

Gavriel  Salvendy 

Series  Editor 
Purdue  University 
and 

Tsinghua  University 
P.R.  China 




FOREWORD 


This handbook celebrates the success of human-computer interaction
(HCI). Readers from within the discipline can take satisfaction
in their substantial accomplishments: basic research on
menus, pointing devices, and screen design as well as design
innovations in hot HCI topics such as online communities, information
visualization, mobile devices, or interactive games. Practitioners
can take pride in their contributions to better products and
services, and in the growing acceptance of those contributions by
decision-makers.

Newcomers to HCI will find informative section introductions
that highlight the key ideas and guide readers to the details.
They can ground themselves in the underlying cognitive
principles (Part I), appreciate the range of design possibilities
(Parts II and III) as well as the diversity of users one must
consider (Part V), and recognize the growing sophistication of
modern development processes (Part VI).

Outsiders can dip into the topics that concern them now,
and expand their reading as their interests grow. They will likely
begin with chapters on applications for which they must build
or improve user interfaces (Part IV). Visionary readers may find
forward-looking ideas to help shape the future, especially the
reflections on human values, ethics, and trust (Part VII). Every
reader will benefit from the numerous references that connect
to the foundational research.

It  is  satisfying  to  see  the  breadth  and  depth  of  HCI,  and  re¬ 
flect  on  its  future.  I  kept  wondering  whether  this  volume  marks 
this  new  discipline’s  emergence  from  childhood,  its  youthful 
adolescence,  or  young  adult  status. 

The  case  for  childhood  is  that  HCI  is  just  beginning  to  be  ac¬ 
cepted  in  academic  departments  and  corporate  boardrooms. 
Only  a  few  years  have  passed  since  the  topic  became  a  funding 
area within governmental research support agencies (e.g., U.S.
National  Science  Foundation  and  other  national  counterparts) 
and  the  first  doctoral  programs  were  announced.  On  the  pro¬ 
fessional  side,  usability  engineers  fight  for  acceptance,  authors 
propose  competing  design  guidelines,  and  analysts  squabble 
over  appropriate  metrics  of  success. 

The  argument  for  adolescence  is  that  the  uncertainty  of 
childhood  has  passed.  HCI  has  already  made  its  mark,  as  these 
articles  confirm,  but  this  energetic  adolescent  still  has  to  choose 
from  many  possible  directions  for  its  future.  It  could  become  a 
basic  science  like  physics  and  psychology;  it  could  remain  an 
eclectic  interdiscipline  like  bio-physics  and  socio-linguistics;  or 
it  could  mature  into  a  professional  discipline  with  scientific 
foundations  like  architecture  and  medicine. 


The  rationale  for  the  young  adult  status  is  that  human-com¬ 
puter  interaction  has  already  produced  interesting  theories  and 
stimulated  important  innovations  such  as  graphical  user  inter¬ 
faces,  the  World  Wide  Web,  email,  online  communities,  instant 
messaging,  and  multimedia.  It  has  successfully  done  its  part  in 
bringing  information  and  communications  technologies  to  a  di¬ 
verse  set  of  users  and  in  fashioning  devices  that  are  well  on  their 
way  to  universality,  ubiquity,  and  invisibility. 

Different readers may come to different conclusions and all
be correct. As in any sufficiently large intellectual domain, there
is room for many concurrent viewpoints. Those who think HCI
is still in childhood can guide its development by engaging in
new applications and innovative technologies. They can try to
influence HCI’s maturation by refining, extending, and applying
predictive and explanatory theories, while proposing new
theories and integrative frameworks.

Those  who  see  HCI  in  the  adolescent  stages  can  engage  in 
broadening  the  field  by  introducing  new  educational  programs, 
expanding  participation  in  professional  societies,  and  enlarging 
existing  journals.  Energetic  adolescents  should  take  some  risks 
and  explore  affiliations  with  other  disciplines  such  as  sociology, 
semiotics, or nanotechnology. They can expand their experiences
by working in cultures other than their own.

Those  who  see  HCI  as  a  young  adult  can  promote  its  inte¬ 
gration  into  traditional  social  structures.  They  can  help  usabil¬ 
ity  professionals  to  rise  through  corporate  and  government 
ranks, to be more influential in controlling product development
plans, budgets, and schedules. They can find ways to engage
journalists to tell HCI stories and make HCI issues more
visible  in  related  disciplines  such  as  software  engineering,  net¬ 
working,  or  security.  HCI  advocates  still  have  work  to  do  to  gain 
acceptance  by  top  corporate  executives,  key  government  offi¬ 
cials,  and  leading  science  policy  makers. 

As  for  the  future,  I  have  high  expectations  and  great  hopes 
for  HCI  as  a  mature  discipline  and  profession.  We  can  preserve 
our  youthful  infatuation  with  new  and  cool  technologies,  but  we 
must  also  address  adult  responsibilities  for  usable,  universal  and 
useful technologies (CHI’99 Research Agenda Workshop):

1. Usable: Current user interfaces and their underlying
systems are just too frustrating for most users. A major effort for
software developers should be to increase the reliability and
usability of every system. The frequency of system reboots,
application crashes, and incompatible file formats that stop users
in the middle of their tasks must be reduced. Then the
incomprehensible instructions, mysterious menus, and troubling
dialog boxes need to be revised to enable users to complete their
work promptly and confidently. In our study of frustrations
encountered by more than one hundred users, they also made
frequent comments about long download times of web pages,
dropped sessions for networked applications, and the disruption
caused by unsolicited email (spam) and destructive
viruses. As the number of users has grown, all these problems
have become more serious. Novice users want the benefits of
email or web services, but they are poorly equipped and
poorly motivated to overcome the problems that high-tech early
adopters proudly conquered. Improved training can help, but
improved designs are an important component of making the
next generation of interfaces more appealing and the users
more successful.

2. Universal: Once systems are made more reliable, then
we will be ready to address the challenge of making them
universally usable. We must enable people with diverse backgrounds
to become successful, satisfied users: old & young, men &
women, novice & expert, users with varied disabilities, and
users with low motivation, low self-confidence, and poor
reading skills. We will have to make it possible for users with
differing language skills, foreign cultures, and non-traditional social
norms to participate in online communities, e-commerce, and
e-learning. We must also enable people with varying technologies
to benefit from information and communications services:
users with slow modems, small screens, or older processors.
Advanced technical solutions will be needed to convert web pages
to fit small mobile devices and provide web services by cellphone.
Finally, we must learn to bridge the gap between what
people know and what they need to know. Improved online
help, tutorials, email assistance, and customer service are
essential. These contributions will often come first from software
producers in developed nations, but they need to be pressured
to include adequate concern for the needs of users and
opportunities in developing nations.

3.  Useful:  Usable  and  universal  technologies  will  make  it 
possible  to  deliver  vital  services:  life-critical  medical  systems, 
military  and  police  protection,  safe  transportation,  and  effec¬ 
tive  job  training.  Then  content  providers  can  create  truly  useful 
e-business,  e-learning,  e-healthcare,  e-government,  and  many 
other  applications.  HCI  professionals  and  researchers  can  bring 
respect  to  the  discipline  by  engaging  in  vital  national  and  inter¬ 
national  issues  such  as  economic  growth,  community  develop¬ 
ment,  terror  prevention,  and  conflict  resolution.  By  taking  re¬ 
sponsibility  for  difficult  questions  such  as  the  contemporary 
issues  of  free  speech,  intellectual  property,  voting  technologies, 
pornography  control,  and  privacy  we  will  win  respect  within  and 
outside  our  field.  We  need  theories  of  human  needs  to  under¬ 
stand  what  is  missing  and  where  the  greatest  opportunities  lie. 
The  next  major  direction  I  foresee  is  the  enhancement  of  cre¬ 
ativity  support  tools  to  make  more  people  more  creative  more 
of  the  time.  This  ambitious  goal  can  produce  a  sustainable  chain 
reaction  of  further  creativity  and  even  better  tools  that  more 
effectively  harness  each  individual’s  capacity  to  contribute  and 
participate  in  shaping  the  future. 

Further  signs  of  adulthood  will  be  the  willingness  of  HCI  re¬ 
searchers  and  professionals  to  take  on  the  issues  that  transcend 
technology.  We  will  gain  respect  as  we  contribute  to  systems  that 
empower  individuals,  enrich  human  relationships,  encourage  co¬ 
operation,  build  trust,  and  support  empathy.  Our  acceptance  of 
responsibility  for  the  future  will  be  more  than  a  model  for  other 
disciplines;  it  can  bring  hope  for  a  better  life  to  many  people. 

Ben  Shneiderman 
University  of  Maryland 

This second edition of the HCI Handbook provides a fresh
perspective on the field of human-computer interaction (HCI).
Numerous chapters deal with topics that were not addressed in the
first  edition.  Different  authors  were  recruited  to  address  some 
topics  from  the  first  edition,  ensuring  that  this  revision  provides 
new  perspectives  and  insights.  Finally,  other  chapters  were  up¬ 
dated  to  ensure  that  they  present  the  latest  in  research  and 
practice.  While  HCI  may  have  emerged  from  within  computing, 
it  was  quickly  energized  by  substantial  contributions  from  other 
fields,  such  as  engineering,  psychology,  education,  and  graphic 
design.  The  resulting  interdisciplinary  research  has  produced 
important  outcomes  including  an  improved  understanding  of 
the  relationship  between  people  and  technology  as  well  as  more 
effective  processes  for  utilizing  this  knowledge  in  the  design 
and  development  of  solutions  which  can  increase  productivity, 
quality  of  life,  and  competitiveness.  HCI  now  has  a  home  in 
every  application,  environment,  and  device,  and  is  routinely 
used  as  a  tool  for  inclusion.  HCI  is  no  longer  just  an  area  of  spe¬ 
cialization  within  more  traditional  academic  disciplines,  but  has 
developed  such  that  both  undergraduate  and  graduate  degrees 
are  available  that  focus  explicitly  on  the  subject.  This  handbook 
should  be  of  tremendous  value  to  practitioners,  researchers, 
students,  and  academicians  regardless  of  enterprise,  because 
HCI,  like  computing  itself,  is  now  ubiquitous. 

Composed  of  eight  sections  which  include  67  chapters,  the 
nearly  2000  pages  of  this  handbook  were  authored  by  over 
130 eminent researchers and professionals from around the
world. Authors work in academia, industry, and research
institutes. The Advisory Board was instrumental throughout the
process as they helped shape and refine our vision. We thank
the  contributing  authors  and  the  Advisory  Board  as  their  com¬ 
mitment  to  the  success  of  this  volume  was  unwavering.  We  are 
honored  to  have  worked  with  each  one  of  them.  It  was  also  truly 
a  pleasure  working  with  Gavriel  Salvendy,  Lori  Handelman, 
Anthony  Messina,  and  Anne  Duffy  whose  insights  and  facilita¬ 
tion  were  critical  to  the  success  of  this  effort. 

The  handbook  opens  with  a  new  perspective  on  the  evolu¬ 
tion  of  HCI,  which  provides  a  sense  of  perspective  on  the  field, 
recognizes  significant  visionary  developments,  and  directs  the 
reader  to  seminal  work  in  the  field.  Sections  covering  humans  in 
HCI  and  computers  in  HCI  follow.  These  16  chapters  address 
fundamental issues that affect all design decisions. Next, a
collection of 22 chapters spanning two sections addresses the
design of human-computer interactions as well as the design of
solutions  for  specific  applications  and  domains.  An  entire  sec¬ 
tion  titled  “Designing  for  Diversity”  is  dedicated  to  addressing 
the  needs  of  diverse  users.  This  unique  collection  of  8  chapters 
sets  this  volume  apart  from  many  other  HCI  resources.  The  final 
19  chapters  explore  the  state  of  the  art  in  the  development 
process  and  the  management  of  HCI.  The  volume  concludes 
with  a  chapter  that  presents  interviews  with  five  international 
experts in which they share their perspectives on the past,
present, and future of HCI.


Andrew  Sears  and  Julie  A.  Jacko 
September  2006 
PREAMBLE: HISTORY IN A TIME
OF  RAPID  OBSOLESCENCE 


“What  is  a  typewriter?”  my  six-year-old  daughter  asked. 

I  hesitated.  “Well,  it’s  like  a  computer,”  I  began. 

A  paper  widely  read  15  years  ago  concluded  with  the  advice 
to  design  a  word  processor  by  analogy  to  something  familiar  to 
everyone:  a  typewriter.  Even  then,  one  of  my  Danish  students 
questioned  this  reading  assignment,  noting,  “The  typewriter  is  a 
species  on  its  last  legs.” 

Why  study  the  history  of  human-computer  interaction  (HCI)? 
For  most  of  the  computing  era,  interaction  involved  80-column 
punch  cards,  paper  tape,  line  editors,  1920-character  displays, 
1-megabyte diskettes, or other extinct species. Are the
interaction issues of those times relevant today? No.

Of  course,  aspects  of  the  human  side  of  HCI  change  more 
slowly  if  at  all.  Much  of  what  was  learned  about  perceptual,  cog¬ 
nitive,  social,  and  emotional  processes  in  interacting  with  older 
technologies  applies  to  emerging  technologies.  However,  with 
human  behavior,  what  was  learned  is  often  more  important 
than  how  it  came  to  be  learned.  Other  chapters  of  the  Hand¬ 
book  lay  out  this  knowledge. 

Nevertheless,  in  an  apparent  paradox,  the  rapid  pace  of 
change  could  make  knowledge  of  the  field’s  history  particularly 
useful  for  several  reasons: 

1.  Several  disciplines  are  engaged  in  HCI  research  and  applica¬ 
tion,  but  most  people  are  exposed  to  only  one.  By  seeing 
how  others  evolved,  we  can  identify  possible  benefits  and 
challenges  in  expanding  our  focus. 

2.  The  recognition  of  past  visionaries  and  innovators  is  part  of 
building  a  community  and  inspiring  future  contributors,  even 
when specific past achievements are difficult to appreciate today.

3.  Some  visions  and  prototypes  were  quickly  converted  to 
widespread  application,  some  took  decades,  and  some  re¬ 
main  unrealized.  By  understanding  the  reasons  for  different 
outcomes,  we  might  assess  today’s  visions  more  realistically. 

4.  Crystal  balls  are  notoriously  unreliable,  but  anyone  planning 
or  managing  a  career  in  a  rapidly  changing  field  must  con¬ 
sider  the  future,  for  one  thing  is  certain:  It  won’t  resemble 
the  present.  Our  best  chance  is  to  find  trajectories  that  ex¬ 
tend  from  the  past  through  the  present. 

Accordingly,  this  account  emphasizes  disciplinary  issues,  early 
visions,  and  when  technologies  and  practices  became  widely  used. 
The  latter  makes  this  more  of  a  social  history  than  an  engineering 
history  that  emphasizes  “firsts.”  I  point  out  possible  trends  and  tra¬ 
jectories  that  you  might  download  into  your  crystal  balls. 

A historical account is a perspective. It emphasizes some
things,  and  de-emphasizes  or  omits  others.  A  history  can  be 
wrong  in  details,  but  it  can  never  be  right  in  any  final  sense.  Your 
questions  and  your  interests  will  determine  whether  a  perspec¬ 
tive  is  useful  to  you.  More  information  on  some  topics  can  be 
found  in  prior  accounts. 

The  blueprint  for  intellectual  histories  of  HCI  was  estab¬ 
lished  by  Ron  Baecker  in  the  opening  chapters  of  the  1987  and 
1995 editions of Readings in Human-Computer Interaction,
followed by Richard Pew’s wonderfully written chapter in the
2003 version of this Handbook. Brian Shackel’s (1997) account
of  European  contributions  and  specialized  essays  by  Brad  Myers 
(1998)  on  HCI  engineering  history  and  Alan  Blackwell  (2006)  on 
the  history  of  metaphor  in  design  include  additional  references. 
Recent  years  have  seen  a  wave  of  popular  books  covering  the 
history of personal computing (e.g., Hiltzik, 1999; Hertzfeld,
2005;  Markoff,  2005).  This  chapter  expands  on  Grudin  (2005; 
2006).  Additional  information  on  some  topics  can  be  found  in 
the  Timelines  column  in  ACM  Interactions  from  March  2006. 

These  writers  and  I  are  not  trained  historians.  We  lived 
through  much  of  the  computing  era  as  participants  and  wit¬ 
nesses.  Our  objectivity  can  be  called  into  question.  I  have  no 
personal  contributions  to  defend  (self-citations  are  only  to  pa¬ 
pers  that  include  historical  or  demographic  data),  and  I  con¬ 
ducted  many  interviews,  but  we  have  biases. 

Personal  experiences  can  illustrate  points  and  enliven  an  ac¬ 
count  by  conveying  human  consequences  of  changes  that  oth¬ 
erwise  appear  abstract  or  distant.  Some  readers  welcome  them, 
but  others  find  them  irritating.  I  have  attempted  to  satisfy  both 
groups  by  moving  several  personal  examples  to  a  short  Appen¬ 
dix,  akin  to  “deleted  scenes”  on  a  DVD. 

Definitions:  HCI,  CHI,  Human  Factors,  Ergonomics,  IS,  IT 

Exploring  the  HCI  literature  is  complicated  by  differences  in 
the  ways  that  dozens  of  simple  terms  are  used.  Some  of  these 
differences  are  identified  later  in  this  chapter.  Here  I  explain 
how  I  use  several  key  terms.  HCI  is  used  very  broadly  to  cover 
work  in  several  disciplines.  CHI  (computer-human  interac¬ 
tion)  refers  to  one  narrower  focus,  associated  mainly  with 
computer  science,  an  Association  for  Computing  Machinery 
Special  Interest  Group  (ACM  SIGCHI)  and  its  annual  CHI  con¬ 
ference.  I  use  human  factors  (HF),  ergonomics  (E),  and  HF&E 
interchangeably — some  writers  attach  more  specialized 
meanings  to  ergonomics,  which  is  a  term  favored  in  Europe, 
but  there  is  little  agreement  among  writers  and  no  clear  ben¬ 
efit  in  differentiating  them.  The  Human  Factors  Society  (HFS) 
became  the  Human  Factors  and  Ergonomics  Society  (HFES) 
in 1992. I use IS (information systems) to refer to the
management discipline that has also been labeled data processing
(DP) and management information systems (MIS). The
latter  was  more  widely  used  until  recently.  I  follow  common 
parlance  in  referring  to  organizational  information  systems 
specialists  as  “IT  professionals  (IT  pros).”  Information  sys¬ 
tems  is  to  be  differentiated  from  information  science,  an  old 
field  with  a  new  digital  incarnation  arising  in  transformed  li¬ 
brary  schools. 


HUMAN-TOOL  INTERACTION 
AT  THE  DAWN  OF  COMPUTING 


A  century  ago,  Frederick  Taylor  (1911)  employed  new  tech¬ 
nologies  and  methods — moving  pictures  and  statistical  analy¬ 
sis — to  improve  work  practices.  Time-and-motion  studies  were 
most  successful  with  assembly-line  manufacturing  and  other 
manual  tasks.  Despite  some  uneasiness  with  “Taylorism,”  as  re¬ 
flected  in  Charlie  Chaplin’s  popular  satire  “Modern  Times,”  sci¬ 
ence  and  engineering  would  remain  committed  to  the  pursuit 
of  efficiency. 

The  World  Wars  accelerated  efforts,  focused  on  matching 
people  to  jobs,  training  them,  and  then  designing  equipment 
and  jobs  to  be  more  easily  mastered.  Simple  flaws  in  the  designs 
of  World  War  II  aircraft  controls  (Roscoe,  1997)  and  escape 
hatches  (Dyson,  1979)  led  to  aircraft  losses  and  thousands  of  ca¬ 
sualties.  Engineering  psychology  was  born  during  the  war;  af¬ 
terwards,  American  aviation  psychologists  created  the  HFS.  Two 
legacies  of  the  conflict  were  a  greater  awareness  of  the  potential 
of  computers  and  an  enduring  interest  in  behavioral  require¬ 
ments  for  design.  For  more  on  this  period,  see  Roscoe  (1997) 
and  Meister  (1999,  2005). 

Early tool use was not discretionary, whether by an assembly-line
worker or a pilot. If training was necessary, workers were
trained.  One  research  goal  was  to  reduce  training  time,  but 
much  more  important  was  to  increase  the  speed  of  reliable 
skilled  performance. 

Three  Roles  in  Early  Computing 

Early  computer  projects  employed  people  in  three  roles: 
(a)  management,  (b)  programming,  and  (c)  operation.  Man¬ 
agers  oversaw  design,  development,  and  operation,  specifying 
the  programs  to  be  written  and  distributing  the  output.  A  small 
army  of  operators  was  needed.  ENIAC,  arguably  the  first  gen¬ 
eral-purpose  electronic  computer  in  1946,  was  10  feet  tall,  cov¬ 
ered  1,000  square  feet,  and  consumed  as  much  energy  as  a  small 
town.  Once  a  program  was  written,  several  people  loaded  it  by 
setting  switches,  dials,  and  cable  connections.  Despite  a  design 
innovation  that  boosted  vacuum-tube  reliability  by  operating 
them  at  25%  normal  power,  50  spent  tubes  had  to  be  found  and 
replaced on an average day. Large computers failed every several
minutes, and some were kept running only by people who
continuously wheeled shopping carts full of vacuum tubes down
the aisles.

Computer  operation,  management,  and  programming  each 
eventually became a major focus of HCI research. Despite the
evolution of computers and our interaction with them, the
research  spectrum  today  reflects  aspects  of  this  early  division 
of  labor. 


1945-1955:  MANAGING  VACUUM  TUBES 


Reducing  operator  burden  was  a  key  focus  of  early  innovation. 
Major strides included reducing the time spent replacing or
resetting vacuum tubes and inventing stored-program
computers, which could be loaded from tape rather than manually
with  cables  and  switches.  These  endeavors  were  consistent  with 
the  post-war  “knobs  and  dials”  human  factors  or  ergonomics  ap¬ 
proaches.  Before  long,  one  computer  operator  could  do  work 
that  previously  required  a  team. 

Grace  Hopper,  Liberating  Computer  Users 

As computers became more reliable and capable, programming
became a central activity. Improving programmers’
interfaces to computers meant developing languages, compilers,
and constructs such as subroutines. Hopper, a pioneer in
these  areas,  described  her  goal  as  “freeing  mathematicians  to 
do  mathematics”  (Hopper,  1952;  see  also  Sammet,  1992). 
These  words  are  echoed  in  today’s  usability  goal  of  freeing 
users  to  do  their  work. 


1955-1965:  TRANSISTORS,  NEW  VISTAS 


Early  forecasts  that  the  world  would  need  few  computers  re¬ 
flected  the  limitations  of  vacuum  tubes.  More  powerful  and  re¬ 
liable  solid-state  computers,  first  available  commercially  in  1958, 
changed  everything.  As  they  were  acquired  to  support  scientific 
and engineering tasks outside research settings, less
computer-savvy operators needed better interfaces, and people
envisioned uses of computers that had been unimaginable for
barn-sized machines of limited capability.

Supporting  Operators:  The  First  Systematic  HCI  Research 

“In  the  beginning,  the  computer  was  so  costly  that 
it  had  to  be  kept  gainfully  occupied  for  every 
second;  people  were  almost  slaves  to  feed  it.” 

— Brian  Shackel  (1997) 

Almost  all  computer  use  of  this  period  involved  programs 
and  data  that  were  read  in  from  cards  or  tape.  A  program  then 
ran without interruption, produced printed, punched, or
tape output, and terminated. This “batch processing” restricted
HCI  to  basic  operation,  programming,  and  use  of  the  output, 
of  which  only  operation  involved  hands-on  computer  use. 

Low-paid  computer  operators  set  switches,  pushed  buttons, 
read lights, loaded and burst printer paper, loaded and unloaded
cards, magnetic tapes, and paper tapes, and so on. Teletypes
supported direct interaction: commands typed by the operator were
interleaved with computer responses and status messages.
Eventually, the paper that scrolled up one line at a time yielded to
“glass tty’s” (glass teletypes), also called visual display units (VDUs),
video display terminals (VDTs), or cathode-ray tubes (CRTs),
which also scrolled operator commands and computer-generated
messages one line at a time. A monochrome terminal
restricted  to  displaying  alphanumeric  characters  cost  $50,000  in 
today’s  dollars:  expensive,  but  a  small  fraction  of  the  cost  of  a 
computer.  A  large  computer  might  have  one  such  console,  used 
only  by  the  operator. 

Improving  the  design  of  buttons,  switches,  and  displays  was  a 
natural  extension  of  human  factors/ergonomics.  Experts  in  this 
field  authored  the  first  HCI  papers.  In  1959,  British  researcher 
Brian  Shackel  published  the  article,  “Ergonomics  for  a  Computer,” 
followed  in  1962  by  “Ergonomics  in  the  Design  of  a  Large  Digital 
Computer  Console.”  These  described  the  redesign  of  the  consoles 
for  the  EMIac  and  EMIdec  2400  analog  and  digital  computers.  The 
latter  was  the  largest  computer  at  the  time  (Shackel,  1997). 

In  the  United  States,  the  HFS  formed  in  1957  and  focused  on 
improving  the  efficiency  of  skilled  performance,  reducing  errors 
in  skilled  performance,  and  training  people  to  achieve  skilled 
performance.  Sid  Smith’s  (1963)  “Man-Computer  Information 
Transfer”  marked  the  start  of  a  career  in  the  human  factors  of 
computing. 

Visions  and  Demonstrations 

As  transistors  replaced  vacuum  tubes,  a  wave  of  imaginative 
writing,  conceptual  innovation,  and  prototype  building  swept 
through the research community. Although some of the language
now seems dated, notably the use of male generics, many
of the key concepts resonate today.

Vannevar  Bush  and  the  MEMEX.  In  one  way  or  an¬ 
other,  Vannevar  Bush  set  much  of  this  in  motion.  He  advised 
U.S.  Presidents  Franklin  Roosevelt  and  Harry  Truman  and 
served as Director of the Office of Scientific Research and
Development. His influential 1945 essay, “As We May Think,”
outlined a multimedia device that he called the MEMEX. Although
Bush  initially  envisioned  it  as  an  improbable  mechanical  device 
for  information  storage  and  retrieval  on  microfilm,  he  antici¬ 
pated  many  computer  capabilities.  Bush  emphasized  the  linking 
of  information,  which  he  called  associative  indexing, 

the  basic  idea  of  which  is  a  provision  whereby  any  item  may  be  caused 
at  will  to  select  immediately  and  automatically  another.  This  is  the  es¬ 
sential  feature  of  the  MEMEX.  .  .  .  Any  item  can  be  joined  into  numer¬ 
ous  trails.  .  .  .  New  forms  of  encyclopedias  will  appear,  ready  made  with 
a mesh of associative trails, [which a user could extend].

The  lawyer  has  at  his  touch  the  associated  opinions  and  decisions  of 
his  whole  experience  and  of  the  experience  of  friends  and  authorities. 
The  patent  attorney  has  on  call  the  millions  of  issued  patents,  with  fa¬ 
miliar  trails  to  every  point  of  his  client’s  interest.  The  physician,  puz¬ 
zled  by  a  patient’s  reactions,  strikes  the  trail  established  in  studying  an 
earlier  similar  case  and  runs  rapidly  through  analogous  case  histories, 
with  side  references  to  the  classics  for  the  pertinent  anatomy  and  his¬ 
tology.  The  chemist,  struggling  with  the  synthesis  of  an  organic  com¬ 
pound,  has  all  the  chemical  literature  before  him  in  his  laboratory,  with 
trails  following  the  analogies  of  compounds  and  side  trails  to  their  phys¬ 
ical  and  chemical  behavior. 


The  historian,  with  a  vast  chronological  account  of  a  people,  parallels 
it  with  a  skip  trail  which  stops  only  on  the  salient  items,  and  can  follow 
at  any  time  contemporary  trails  which  lead  him  all  over  civilization  at  a 
particular  epoch.  There  is  a  new  profession  of  trailblazers,  those  who 
find  delight  in  the  task  of  establishing  useful  trails  through  the  enor¬ 
mous  mass  of  the  common  record. 

Sixty  years  later,  professionals  make  discretionary  hands-on 
use  of  computers  in  just  these  ways.  However,  20  years  after 
Bush wrote his essay, hands-on computer use was largely
restricted to operators and data-entry personnel who were, in
Shackel’s  words,  “almost  slaves”  tending  costly  machines. 

J.C.R. Licklider at BBN and ARPA. Between 1960 and
1965,  there  was  an  outpouring  of  ideas  and  systems  tied  to  the 
newly  realized  potential  of  computers.  Licklider,  a  psychologist, 
played  a  dual  role  similar  to  that  of  Bush.  He  wrote  influential 
essays  and  backed  some  of  the  most  important  research  in 
computer-science  history  as  a  manager  at  Bolt  Beranek  and 
Newman  (BBN)  from  1957-1962  and  director  of  the  Informa¬ 
tion  Processing  Techniques  Office  (IPTO)  of  the  Department  of 
Defense  Advanced  Research  Projects  Agency  (called  ARPA  and 
DARPA  at  different  times)  from  1962-1964. 

BBN  conducted  extensive  computer-related  work  funded 
by  the  government  and  employed  dozens  of  influential  re¬ 
searchers,  including  John  Seely  Brown,  Richard  Pew,  and 
many  who  also  worked  at  MIT  (e.g.,  John  McCarthy,  Marvin  Min¬ 
sky,  Licklider  himself).  IPTO  funding  was  crucial  in  launching 
computer-science  departments  and  establishing  artificial  intel¬ 
ligence  as  a  field  in  the  1960s,  in  addition  to  its  best-known 
accomplishment,  giving  birth  to  the  Internet. 

In 1960, Licklider described “man-machine symbiosis,” noting,
“there are many man-machine systems.” He continued, “At
present,  however,  there  are  no  man-computer  symbioses — 
answers  are  needed.”  Licklider  saw  the  computer  as  “a  fast  in¬ 
formation-retrieval  and  data-processing  machine,”  but  he 
stressed  its  capability  to  play  a  larger  role:  “One  of  the  main 
aims  of  man-computer  symbiosis  is  to  bring  the  computing  ma¬ 
chine  effectively  into  the  formulative  parts  of  technical  prob¬ 
lems”  (pp.  4-5). 

This  would  require  more  rapid  real-time  interaction  than 
batch  systems  of  the  time  supported.  In  1962,  Licklider  and 
Wes Clark outlined the requirements of a system for “on-line
man-computer  communication.”  They  identified  capabilities 
that  were  ripe  for  development:  timesharing  of  a  computer 
among  many  users;  electronic  input-output  surfaces  for  the  dis¬ 
play  and  communication  of  symbolic  and  pictorial  information; 
interactive,  real-time  support  for  programming  and  informa¬ 
tion  processing;  large-scale  information  storage  and  retrieval 
systems;  and  facilitation  of  human  cooperation.  They  were  also 
perceptive  in  recognizing  that  other  desirable  technologies, 
such  as  speech  recognition  and  natural  language  understand¬ 
ing,  would  be  very  difficult  to  achieve. 

In 1963, Licklider identified his ARPA colleagues as “the members
and affiliates of the Intergalactic Computer Network,” anticipating
the Internet that ARPA would be instrumental in developing
(Pew, 2003). His 1965 book Libraries of the Future summarized
and expanded this vision. Licklider’s role in rapidly advancing
computer science and HCI is detailed in Waldrop (2001).
John McCarthy, Christopher Strachey, Wesley Clark.

McCarthy  and  Strachey  worked  out  details  of  timesharing,  a  cru¬ 
cial  step  in  enabling  interactive  computing  (Fano  &  Corbato, 
1966).  Apart  from  a  small  number  of  researchers  who  could  use 
computers built without concern for cost through military funding,
no one could use a computer interactively as long as that required
exclusive access. Timesharing allowed several (and later dozens of)
simultaneous users at terminals. Languages were developed to
facilitate online control and programming of timesharing
systems (e.g., JOSS in 1964).

Clark  was  instrumental  in  building  the  TX-0  and  TX-2  at  MIT’s 
Lincoln  Labs  to  demonstrate  timesharing  and  other  innovative 
concepts. These machines, which cost on the order of $10 million,
were factors in establishing the Boston area as a center for
computer research (all prices are in 2007 U.S. dollars). A CHI’06
panel  discussion  of  this  period  that  includes  Clark  and  Ivan 
Sutherland  can  be  viewed  online  (ePresence,  2006).  The  TX-2  was 
the most powerful and capable computer in the world at the time, but to put
it  in  perspective,  it  was  less  powerful  than  a  smartphone  is  today. 

Ivan  Sutherland  and  Computer  Graphics.  Suther¬ 
land’s  1963  PhD  thesis,  describing  the  Sketchpad  system  built 
on  the  TX-2  with  the  intention  of  making  computers  “more  ap¬ 
proachable,”  is  arguably  the  most  impressive  and  influential 
document in the history of HCI. A nice version edited by Alan
Blackwell and Kerry Rodden is available at
http://www.cl.cam.ac.uk/TechReports/UCAM-CL-TR-574.pdf. Sketchpad launched
computer  graphics,  setting  into  motion  a  field  of  research  that 
would  have  a  decisive  impact  on  HCI  20  years  later. 

Sutherland  demonstrated  iconic  representations  of  con¬ 
straints,  copying,  moving,  and  deleting  of  hierarchically  orga¬ 
nized  objects,  object-oriented  programming  concepts,  inter¬ 
action  techniques,  and  approaches  to  animation.  He  supported 
picture  construction  using  a  light  pen  and  facilitated  visualiza¬ 
tion  by  separating  the  coordinate  system  used  to  define  a  pic¬ 
ture  from  the  one  used  to  display  it.  Sutherland  noted  the  po¬ 
tential  for  animated  graphics  in  the  movie  industry.  His  frank 
description  of  what  did  not  work — what  he  called  a  “big  flop,” 
when  engineers  found  Sketchpad  too  limited  for  computer- 
assisted  design  (CAD) — enabled  others  to  make  rapid  progress. 

After  completing  his  PhD,  Sutherland  succeeded  Licklider 
as  the  director  of  IPTO  in  1964.  Among  those  whose  work  he 
funded  in  this  capacity  was  Douglas  Engelbart  at  the  Stanford 
Research  Institute. 

Douglas Engelbart, augmenting human intellect. In 1963,
Engelbart published “a conceptual framework for the augmentation
of man’s intellect,” and over the next several years he
built  systems  that  made  great  strides  toward  realizing  this  vision. 
He  also  supported  and  inspired  engineers  and  programmers 
who  subsequently  made  major  independent  contributions. 

Echoing  Bush  and  Licklider,  Engelbart  saw  the  potential  for 
computers  to  become  congenial  tools  that  people  would 
choose  to  use  interactively.  He  wrote: 

By  “augmenting  man’s  intellect”  we  mean  increasing  the  capability  of  a 
man  to  approach  a  complex  problem  situation,  gain  comprehension  to 
suit  his  particular  needs,  and  to  derive  solutions  to  the  problems.  ...  By 
“complex situations” we include the professional problems of diplomats,
executives, social scientists, life scientists, physical scientists, attorneys,
designers.  .  .  .  We  refer  to  a  way  of  life  in  an  integrated  domain  where 
hunches,  cut-and-try  intangibles,  and  the  human  “feel  for  a  situation” 
usefully  coexist  with  powerful  concepts,  streamlined  terminology  and 
notation,  sophisticated  methods,  and  highly  powered  electronic  aids. 

Engelbart used his ARPA funding to develop and integrate an
extraordinary  set  of  prototype  applications  into  his  NLS  system 
in  the  span  of  a  few  years.  In  doing  so,  Engelbart  conceptualized 
and  implemented  the  foundations  of  word  processing,  invented 
or  refined  input  devices  including  the  mouse  and  multikey  con¬ 
trol  box,  and  made  use  of  multidisplay  environments  that  inte¬ 
grated  text,  graphics,  and  video  in  windows.  In  1968,  these 
would  be  demonstrated  in  a  sensational  90-minute  event  at  the 
Fall Joint Computer Conference in San Francisco
(http://sloan.stanford.edu/MouseSite/1968Demo.html). The focal point for
interactive  systems  research  in  the  United  States  appeared  to 
have  moved  from  the  East  Coast  to  the  West  Coast. 

Engelbart,  an  engineer  by  training,  believed  in  careful  human 
factors  testing  of  systems.  He  focused  on  improving  efficiency 
and  reducing  errors  in  skilled  use,  including  concern  with  ef¬ 
fects  of  fatigue  and  stress.  Engelbart  also  considered  training  to 
be  a  key  component,  feeling  that  people  should  be  willing  to 
tackle  a  difficult  interface  if  it  delivered  greater  power  once  mas¬ 
tered.  The  relative  importance  of  optimizing  for  skilled  vs.  initial 
use  later  became  a  source  of  contention  and  still  surfaces  in  dis¬ 
cussions  of  interaction. 

Ted Nelson’s vision of interconnectedness. In 1960,
while a graduate student in sociology, the inventor of the term
“hypertext” founded Project Xanadu to create an easily used
computer network. In 1965, he published a paper titled “A file
structure for the complex, the changing, and the indeterminate.”
Nelson continued to produce works (often self-published) with
stirring calls for systems to democratize computing through a
highly interconnected, extensible network of digital objects
(e.g., Nelson, 1973). Xanadu was never fully realized. Nelson did not
see  the  early  World  Wide  Web  as  an  adequate  realization  of  his 
vision,  but  features  of  lightweight  technologies  such  as  weblogs, 
wikis,  unstructured  collaborative  tagging,  and  search  enable 
many  of  the  activities  Nelson  envisioned. 

Nelson  also  foresaw  the  significance  of  intellectual  property 
issues  that  would  arise  in  digital  domains.  Although  his  solu¬ 
tions  were  again  not  fully  implemented,  they  drew  attention  to 
the  issues. 


Conclusion:  Visions,  Demos,  and  Widespread  Use 

Progress  in  HCI  can  be  understood  in  terms  of  the  inspiring 
visions,  conceptual  advances  needed  to  realize  and  demo  as¬ 
pects  of  the  visions  in  working  prototypes,  and  the  evolution 
of  widespread  practices.  Tying  these  together  and  enabling  the 
visions  to  be  realized  in  practice  are  the  remarkable,  relentless 
hardware  advances  that  led  to  the  production  of  modern  per¬ 
sonal  computers  today,  which  are  many  millions  of  times  more 
powerful  than  the  expensive  systems  used  by  pioneers. 

At  the  conceptual  level,  much  of  the  basic  foundation  for  to¬ 
day’s  graphical  user  interfaces  was  in  place  by  1965,  although  it 
would be a mistake to underestimate the technical invention and
detailed  work  that  followed.  However,  most  computer  use  of  that 
time bore no resemblance to what could be accomplished by
personal use of a $10 million custom-built machine. Pew (2003)
described the breakthrough 1960 Digital Equipment Corporation
(DEC) PDP-1 as “truly a computer with which an individual could
interact.” The PDP-1 came with CRT display, keyboard, light pen,
and paper-tape reader. It cost about $1 million and had the
capacity of a Radio Shack TRS-80 20 years later. It required
considerable technical and programming support. It was a machine
used by a few fortunate computer-savvy researchers.

Licklider’s  “man-computer  symbiosis,”  Engelbart’s  “augment¬ 
ing  human  intellect,”  and  Nelson’s  “conceptual  framework  for 
man-machine  everything”  described  a  world  that  did  not  exist,  in 
which  attorneys,  doctors,  chemists,  designers,  and  other  pro¬ 
fessionals  were  hands-on  users  of  computers  out  of  choice.  The 
reality  was  that  for  some  time  to  come,  most  hands-on  use  would 
continue  to  be  routine,  nondiscretionary  operation.  As  for  the  vi¬ 
sions,  today  some  of  the  capabilities  are  taken  for  granted,  some 
are  just  being  realized,  and  others  remain  elusive. 


1965-1980: HCI BEFORE
PERSONAL  COMPUTING 


Control  Data  Corp.  launched  the  transistor-based  6000  series 
computers  in  1964.  In  1965,  the  first  commercial  computers 
based  on  integrated  circuits  arrived  with  the  IBM  System/360. 
These  powerful  systems,  later  called  mainframes  to  distinguish 
them  from  minicomputers,  brought  computing  into  the  busi¬ 
ness  realm.  Each  of  the  three  roles  in  computing — operation, 
management,  programming — became  a  significant  profession. 

Operators  interacted  directly  with  computers  to  perform 
routine  maintenance,  load  and  run  programs,  handle  printouts, 
and  so  on.  As  timesharing  spread,  this  hands-on  category  ex¬ 
panded  to  include  data  entry  and  other  repetitive  tasks. 

Managers  oversaw  hardware  acquisition,  software  develop¬ 
ment,  operation,  routing,  and  the  use  of  output.  They  were  usu¬ 
ally  not  hands-on  users,  but  people  who  relied  on  printed  out¬ 
put  and  reports  considered  themselves  computer  users. 

Apart  from  those  working  in  research  settings,  programmers 
were  rarely  direct  users  until  late  in  this  period.  Many  flowcharted 
and  wrote  programs  on  paper  forms.  Keypunch  operators  then 
punched  the  program  instructions  onto  cards.  These  were  sent  to 
computer  centers  for  computer  operators  to  run.  Printouts  and 
other  output  were  picked  up  later.  Many  programmers  would  use 
computers  directly  when  they  could,  but  the  cost  of  computer 
use  generally  dictated  an  efficient  division  of  labor. 


Human  Factors  and  Ergonomics  Embraces  Computers 

In  1970,  Brian  Shackel  founded  the  Human  Sciences  and  Ad¬ 
vanced  Technology  (HUSAT)  center  at  Loughborough  Univer¬ 
sity  in  the  UK,  which  was  devoted  to  ergonomics  research  em¬ 
phasizing  HCI.  Sid  Smith  and  other  human  factors  engineers 
examined a range of input and output issues, notably the
representation of information on displays (e.g., S.L. Smith,
Farquhar, & Thomas, 1965) but also computer-generated speech
(e.g.,  S.L.  Smith  &  Goodwin,  1970).  In  1972,  the  Computer  Sys¬ 
tems  Technical  Group  (CSTG)  of  the  HFS  formed;  soon  it  was 
the  largest  technical  group  in  the  society. 

Leading  publications  were  the  general  journal  Human  Fac¬ 
tors  and,  starting  in  1969,  the  computer-focused  International 
Journal  of  Man-Machine  Studies  (IJMMS). 

The  first  widely  read  HCI  book  was  James  Martin’s  1973  De¬ 
sign  of  Man-Computer  Dialogues.  A  comprehensive  survey  of 
interfaces for operation and data entry, it began with an arresting
opening chapter that described a world in transition.
Extrapolating from declining hardware prices, Martin wrote:

The terminal or console operator, instead of being a peripheral
consideration, will become the tail that wags the whole dog. . . . The computer
industry will be forced to become increasingly concerned with the
usage of people, rather than with the computer’s intestines. (pp. 3-4)

In  the  mid-1970s,  U.S.  government  agencies  responsible  for 
agriculture  and  social  security  initiated  some  large-scale  data- 
processing  system  development  efforts,  described  in  Pew  (2003). 
Although  these  efforts  did  not  succeed,  they  led  to  method¬ 
ological  innovation  in  the  use  of  style  guides,  usability  labs,  pro¬ 
totyping,  and  task  analysis. 

In 1980, three significant HF&E books were published: two
on VDT design (Cakir, Hart, & Stewart, 1980; Grandjean &
Vigliani, 1980) and one of general guidelines (Damodaran,
Simpson, & Wilson, 1980). German work on VDT standards, first
published in 1981, provided an economic incentive to design for
human capabilities by threatening to ban noncompliant
products. Later in 1981 a corresponding ANSI standards group
formed for office and text systems.

Information  Systems 

Beginning  in  1967,  the  journal  Management  Science  published 
a  column  titled  “Information  Systems  in  Management  Science.” 
Early  definitions  of  IS  included  “an  integrated  man/machine  sys¬ 
tem  for  providing  information  to  support  the  operation,  man¬ 
agement,  and  decision-making  functions  in  an  organization”  and 
“the  effective  design,  delivery  and  use  of  information  systems  in 
organizations”  (G.  B.  Davis,  1974;  Keen,  1980;  cited  in  Zhang, 
Nah, & Preece, 2004). A historical survey of IS research (Banker &
Kaufmann,  2004)  identified  HCI  as  one  of  five  major  research 
streams.  This  stream  began  with  a  paper  on  challenges  in  deal¬ 
ing  with  computer-generated  information  (Ackoff,  1967). 

Companies  acquired  expensive  business  computers  to  ad¬ 
dress  major  organizational  concerns.  At  times,  the  principal 
concern  was  to  appear  modern  (Greenbaum,  1979),  but  when 
computers  were  used,  managers  could  be  chained  to  them 
almost  as  tightly  as  operator  and  data-entry  “slaves.”  That  said, 
operator  or  end-user  resistance  to  using  a  system  was  a  ma¬ 
jor  management  concern.  The  sociotechnical  approach  to  sys¬ 
tem  design  was  one  response;  it  educated  representative 
workers  in  technology  possibilities  and  involved  them  in  de¬ 
sign  in  part  to  increase  acceptance  of  the  resulting  system 
(Mumford,  1971). 




Cognitive  style,  a  major  topic  of  early  IS  research,  focused  on 
difficulties  that  managers  had  communicating  with  people  knowl¬ 
edgeable  about  computers.  IS  researchers  published  HCI  articles 
in  management  journals  and  also  in  the  human  factors-oriented 
IJMMS.  The  latter,  which  became  International  Journal  of  Hu¬ 
man-Computer  Studies  in  1994,  was  the  23rd  most  influential  IS 
journal  according  to  Mylonopoulos  and  Theoharakis  (2001,  with 
their  calculation  amended  in  Grudin,  2005,  p.  60  note  17). 

Programming:  Subject  of  Study,  Source  of  Change 

In  the  1960s  and  1970s,  more  than  1,000  research  papers  on 
variables  affecting  programming  performance  were  published 
(Baecker  &  Buxton,  1987).  Most  viewed  programming  in  isola¬ 
tion,  independent  of  organizational  context.  Gerald  Weinberg’s 
landmark  The  Psychology  of  Computer  Programming  appeared 
in  1971.  In  1980,  Ben  Shneiderman  published  Software  Psy¬ 
chology,  and  in  1981  Beau  Sheil  reviewed  studies  of  program¬ 
ming  notation  (conditionals,  control  flow,  data  types),  practices 
(flowcharting,  indenting,  variable  naming,  commenting),  and 
tasks  (learning,  coding,  debugging). 

Programmers  changed  their  own  field  through  invention.  In 
1970,  Xerox  Palo  Alto  Research  Center  (PARC)  was  founded  to 
advance  computer  technology  by  developing  new  hardware, 
programming  languages,  and  programming  environments.  It 
drew researchers and system builders from the labs of Engelbart
and Sutherland. In 1971, Allen Newell of Carnegie Mellon
University  proposed  a  project  to  PARC,  launched  three  years 
later:  “Central  to  the  activities  of  computing — programming, 
debugging,  etc. — are  tasks  that  appear  to  be  within  the  scope 
of  this  emerging  theory  (a  psychology  of  cognitive  behavior)” 
(quoted  in  Card  &  Moran,  1986). 

Like  HUSAT,  also  launched  in  1970,  PARC  had  a  broad  re¬ 
search  charter.  HUSAT  focused  on  ergonomics,  anchored  in 
the  tradition  of  nondiscretionary  use,  one  component  of  which 
was  the  human  factors  of  computing.  PARC  focused  on  com¬ 
puting,  anchored  in  visions  of  discretionary  use,  one  compo¬ 
nent  of  which  was  also  the  human  factors  of  computing.  Re¬ 
searchers  at  PARC  and  a  few  other  places  extended  the 
primarily  perceptual-motor  focus  of  human  factors  to  higher- 
level  cognition.  HUSAT,  influenced  by  sociotechnical  design, 
extended  human  factors  by  considering  organizational  factors. 

Computer  Science:  A  New  Discipline 

Computer  science  departments  emerged  in  the  mid-1960s. 
Some  arose  out  of  engineering.  Computer  graphics  was  one 
such  specialization  of  particular  relevance  to  HCI;  software  en¬ 
gineering  came  later.  Other  computer  science  departments 
originated  as  applied  mathematics;  this  background  was  shared 
by  many  early  artificial  intelligence  researchers. 

Early  machines  capable  of  interesting  work  in  these  areas 
were  expensive.  They  were  uniformly  funded  without  regard 
to  cost  by  branches  of  the  military,  for  which  technical  success 
was  the  sole  criterion  (Norberg  &  O’Neill,  1996).  ARPA  funding 
under the direction of Licklider, Sutherland, and their successors
played a major role. Reliance on massive funding meant
that researchers were concentrated at a few centers. These
environments bore little resemblance to the batch and timeshared
environments  of  business  computing.  Hands-on  users  in  re¬ 
search  settings  were  technically  savvy.  They  had  less  need  for 
low-level  interface  enhancements. 

The  computer  graphics  and  AI  perspective  that  developed 
in  these  centers  differed  from  that  of  the  HCI  researchers  of 
the  time,  who  focused  on  less  expensive,  less  capable  systems 
that  could  be  studied  in  many  more  settings.  To  HCI,  hard¬ 
ware  advances  meant  greater  computing  capability  at  a  rela¬ 
tively  fixed  low  price.  Computer  graphics  and  AI  required 
high  levels  of  computation — hardware  advances  meant  de¬ 
clining  cost  for  a  relatively  fixed  level  of  computation.  Only 
later  would  widely  available  machines  be  able  to  support 
graphical  interfaces  and  AI  programming.  Nevertheless,  be¬ 
tween  1965  and  1980  some  computer  science  researchers  fo¬ 
cused  on  interaction,  which  had  been  part  of  Ivan  Suther¬ 
land’s  initial  vision. 

Computer Graphics: realism and interaction. In 1968,
Sutherland joined David Evans to establish a hugely influential
computer graphics lab at the University of Utah. The
computer  science  department  there  was  founded  in  1965,  one 
in  the  first  wave  that  emerged  from  mathematics  and  electrical 
engineering.  The  western  migration  continued  as  students 
from the lab, including Alan Kay and William Newman (and
later  Jim  Blinn  and  Jim  Clark),  went  to  California.  Most  graphics 
systems  were  built  on  the  DEC  PDP-1  and  PDP-7.  These  expen¬ 
sive  machines — the  list  price  of  a  high-resolution  display  alone 
was  over  $100,000  in  today’s  dollars — were  capable  of  multi¬ 
tasking,  but  graphics  programs  generally  required  all  of  the 
processing  for  one  task. 

In  1973,  the  Xerox  Alto  arrived.  It  was  a  powerful  step  to¬ 
ward  realizing  Alan  Kay’s  vision  of  computation  as  a  medium 
for  personal  computing  (Kay  &  Goldberg,  1977).  It  was  too 
expensive  to  be  widely  used — the  Alto  never  became  a  prod¬ 
uct — and  not  powerful  enough  to  support  high-end  graphics 
research.  However,  it  was  a  machine  produced  in  volume  that 
supported  graphical  user  interfaces  of  the  kind  Engelbart  had 
prototyped. 

William Newman expressed the result this way: “Everything
changed — the computer graphics community got interested
in realism, I remained interested in interaction, and I
eventually found myself doing HCI” (personal communication).
Ron Baecker and Jim Foley were other graphics researchers
whose focus shifted to broader interaction issues.
Foley and Wallace (1974) identified requirements for designing
“interactive graphics systems whose aim is good symbiosis
between man and machine,” and 18 papers in the first SIGGRAPH
conference the same year had “interactive” or “interaction”
in their titles.

At Xerox, Larry Tesler and Tim Mott took another step,
recognizing that the Alto could support a graphical interface
accessible to untrained people. By early 1974, they had developed
the GYPSY text editor, which along with Xerox’s Bravo editor
developed by Charles Simonyi preceded and influenced Microsoft
Word (Hiltzik, 1999).

The  distinct  focus  on  interaction  was  given  a  voice  in  1976, 
when  SIGGRAPH  sponsored  a  two-day  workshop  in  Pittsburgh 
titled, “User Oriented Design of Interactive Graphics Systems.”
Participants  who  were  later  active  in  CHI  included  Jim  Foley, 
William  Newman,  Ron  Baecker,  John  Bennett,  Phyllis  Reisner, 
and  Tom  Moran. 

J.C.R.  Licklider  and  Nicholas  Negroponte  presented  vision 
papers.  UODIGS’76  can  be  thought  of  as  ending  the  visionary 
period,  embodying  an  idea  whose  time  had  not  quite  yet  come. 
Licklider  (1976,  p.  89)  saw  it  clearly: 

Interactive  computer  graphics  appears  likely  to  be  one  of  the  main 
forces  that  will  bring  computers  directly  into  the  lives  of  very  large  num¬ 
bers  of  people  during  the  next  two  or  three  decades.  Truly  user-oriented 
graphics  of  sufficient  power  to  be  useful  to  large  numbers  of  people 
has  not  been  widely  affordable,  but  it  will  soon  become  so,  and,  when 
it  does,  the  appropriateness  and  quality  of  the  products  offered  will  to 
a  large  extent  determine  the  future  of  computers  as  intellectual  aids  and 
partners  of  people. 

UODIGS  was  not  repeated  and  the  150-page  proceedings 
were  not  cited.  Not  until  1981  was  another  such  “user-oriented 
design”  conference  held,  after  which  such  conferences  were 
held  every  year.  Application  was  not  quite  at  hand;  most  HCI 
research  remained  focused  on  interaction  driven  by  commands, 
forms,  and  full-page  menus. 

Artificial Intelligence: winter follows summer. In the
late 1960s and early 1970s, AI burst onto the scene, promising to
transform HCI. It did not go as expected.

Logically,  AI  and  HCI  are  closely  related.  What  are  intelligent 
machines  for  if  not  to  interact  with  people?  AI  research  has  in¬ 
fluenced  HCI:  speech  recognition  and  natural  language  are 
perennial  HCI  topics;  expert,  knowledge-based,  adaptive,  and 
mixed-initiative  systems  have  been  tried,  as  have  applications 
of  production  systems,  neural  nets,  and  fuzzy  logic;  and  recently 
human-robot  interaction  has  attracted  attention. 

Although  some  AI  features  make  their  way  into  systems  and 
applications,  frequent  predictions  that  more  powerful  machines 
would  soon  bring  major  AI  technologies  into  widespread  use 
were  not  borne  out.  Thus,  AI  did  not  come  into  focus  in  HCI, 
and  AI  researchers  have  shown  limited  interest  in  HCI. 

To  piece  this  together  requires  a  brief  review  of  the  early  his¬ 
tory.  The  term  artificial  intelligence  first  appeared  in  a  1955 
call  by  John  McCarthy  for  a  meeting  on  machine  intelligence 
that was held at Dartmouth the next year. Also in 1956, Alan
Turing’s prescient essay, “Computing Machinery and Intelligence,”
was reprinted in The World of Mathematics, where it attracted
attention. (It was first published in 1950, as were Claude
Shannon’s  “Programming  a  Computer  for  Playing  Chess”  and 
Isaac  Asimov’s  three  laws  of  robotics.)  Newell  and  Simon’s 
logic-theory  machine  appeared  in  1956,  after  which  they  fo¬ 
cused  on  developing  a  general  problem  solver.  McCarthy  in¬ 
vented  LISP  in  1958. 

AI  pioneers  were  trained  primarily  in  mathematics  and  logic, 
where  much  can  be  built  from  a  few  axioms  and  a  small  set  of 
rules.  Mathematics  is  considered  a  high  form  of  intelligence,  even 
by  nonmathematicians.  It  is  not  surprising  that  AI  researchers  an¬ 
ticipated  that  machines  that  operate  logically  and  tirelessly  would 
make  great  strides.  Nor  is  it  surprising  that  mathematicians  would 
overlook  the  complexity  and  illogic  that  mark  human  users  and 
social  constructs.  Early  work  on  AI  focused  heavily  on  theorem 
proving and on games and problems with a strong logical focus,
such as chess and go. In 1988 McCarthy, who espoused predicate
calculus as a foundation for AI, summed it up as follows:

As  suggested  by  the  term  “artificial  intelligence,”  we  weren’t  considering 
human  behavior  except  as  a  clue  to  possible  effective  ways  of  doing 
tasks.  The  only  participants  who  studied  human  behavior  were  Newell 
and  Simon.  (The  goal)  was  to  get  away  from  studying  human  behavior 
and  consider  the  computer  as  a  tool  for  solving  certain  classes  of  prob¬ 
lems.  Thus,  AI  was  created  as  a  branch  of  computer  science  and  not  as 
a  branch  of  psychology. 

Strong  claims  date  back  to  the  predawn  of  AI,  when  in  the 
summer  of  1949  Alan  Turing  was  quoted  in  the  London  Times:

I  do  not  see  why  [the  computer]  should  not  enter  any  one  of  the  fields 
normally  covered  by  the  human  intellect,  and  eventually  compete  on 
equal  terms.  I  do  not  think  you  can  even  draw  the  line  about  sonnets, 
though  the  comparison  is  perhaps  a  little  bit  unfair  because  a  sonnet 
written  by  a  machine  will  be  better  appreciated  by  another  machine. 

When  he  identified  speech  understanding  as  important,  Licklider
realized  its  difficulty.  He  predicted  that  intelligent  machines
would  appear  in  10  to  500  years  (Pew,  2003).  As  director  of  ARPA’s
Information  Processing  Techniques  Office  from  1962  to  1964,  he
initiated  extensive  support  for  computer  science  in  general  and 
AI  in  particular.  MIT’s  Project  Mac,  founded  in  1963  by  Marvin 
Minsky  and  others,  initially  received  $13  million  per  year,  rising 
to  $24  million  in  1969.  ARPA  also  sponsored  the  Artificial  Intelligence
Laboratory  at  Stanford  Research  Institute,  AI  research  at
SRI  and  CMU,  and  Nicholas  Negroponte’s  Machine  Architecture 
Group  at  MIT.  An  early  dramatic  result,  SRI’s  Shakey  the  Robot, 
was  featured  in  1970  articles  in  Life  (Darrach,  1970)  and  National
Geographic.  Given  a  simple  but  nontrivial  task,  Shakey  went  to 
the  desired  location,  scanned  and  reasoned  about  the  surround¬ 
ings,  and  moved  objects  as  needed  to  accomplish  the  goal  (for 
Shakey  at  work:  http://www.ai.sri.com/shakey/).

In  1970,  Negroponte  outlined  the  case  for  machine 
intelligence: 

People  generally  distrust  the  concept  of  machines  that  approach  (and 
thus  why  not  pass?)  our  own  human  intelligence  .  .  .  Why  ask  a  machine 
to  learn,  to  understand,  to  associate  courses  with  goals,  to  be  self-im¬ 
proving,  to  be  ethical — in  short,  to  be  intelligent?  .  .  .  Because  any  de¬ 
sign  procedure,  set  of  rules,  or  truism  is  tenuous,  if  not  subversive, 
when  used  out  of  context  or  regardless  of  context.  (p.  1)

Negroponte  followed  this  insightful  diagnosis  with  a  false 
inference: 

It  follows  that  a  mechanism  must  recognize  and  understand  the  context 
before  carrying  out  an  operation.  (p.  1)

Not  when  the  mechanism  is  guided  by  a  human  who  is  cog¬ 
nizant  of  the  context.  However,  Negroponte  used  this  to  build 
a  case  for  an  ambitious  research  program: 

Therefore,  a  machine  must  be  able  to  discern  changes  in  meaning 
brought  about  by  changes  in  context,  hence,  be  intelligent.  And  to  do 
this,  it  must  have  a  sophisticated  set  of  sensors,  effectors,  and  proces¬ 
sors  to  view  the  real  world  directly  and  indirectly.  ...  A  paradigm  for 
fruitful  conversations  must  be  machines  that  can  speak  and  respond  to 
a  natural  language.  .  .  .  But,  the  tete-a-tete  (sic)  must  be  even  more  direct
and  fluid;  it  is  gestures,  smiles,  and  frowns  that  turn  a  conversation
into  a  dialogue.  .  .  .  Hand-waving  often  carries  as  much  meaning  as
text.  Manner  carries  cultural  information:  the  Arabs  use  their  noses,  the 
Japanese  nod  their  heads.  .  .  (pp.  12-13) 

Imagine  a  machine  that  can  follow  your  design  methodology,  and  at 
the  same  time  discern  and  assimilate  your  conversational  idiosyncrasies. 
This  same  machine,  after  observing  your  behavior,  could  build  a  pre¬ 
dictive  model  of  your  conversational  performance.  Such  a  machine 
could  then  reinforce  the  dialogue  by  using  the  predictive  model  to  re¬ 
spond  to  you  in  a  manner  that  is  in  rhythm  with  your  personal  behav¬ 
ior  and  conversational  idiosyncrasies  .  .  .  The  dialogue  would  be  so  inti¬ 
mate — even  exclusive — that  only  mutual  persuasion  and  compromise 
would  bring  about  ideas,  ideas  unrealizable  by  either  conversant  alone. 
No  doubt,  in  such  a  symbiosis  it  would  not  be  solely  the  human  de¬ 
signer  who  would  decide  when  the  machine  is  relevant. 

The  same  year,  Negroponte’s  MIT  colleague  Minsky  took  the 
next  step,  as  reported  in  Life:

In  from  three  to  eight  years  we  will  have  a  machine  with  the  general  in¬ 
telligence  of  an  average  human  being.  I  mean  a  machine  that  will  be  able 
to  read  Shakespeare,  grease  a  car,  play  office  politics,  tell  a  joke,  and 
have  a  fight.  At  that  point,  the  machine  will  begin  to  educate  itself  with 
fantastic  speed.  In  a  few  months,  it  will  be  at  genius  level  and  a  few 
months  after  that  its  powers  will  be  incalculable.  (Darrach,  1970,  p.  60) 

Other  AI  researchers  told  Darrach  that  Minsky’s  timetable 
was  somewhat  ambitious: 

‘Give  us  15  years’  was  a  common  remark — but  all  agreed  that  there
would  be  such  a  machine  and  that  it  would  precipitate  the  third  Indus¬ 
trial  Revolution,  wipe  out  war  and  poverty  and  roll  up  centuries  of 
growth  in  science,  education  and  the  arts.  (p.  60) 

Responding  to  such  calls,  ARPA  initiated  major  funding  of 
speech  recognition  and  natural  language  understanding  in 
1971.  Five  years  later,  disappointed  with  the  progress,  ARPA  dis¬ 
continued  support  for  speech  and  language — for  a  while. 

In  Europe,  a  similar  story  unfolded.  Through  the  1960s,  AI 
research  expanded  in  Great  Britain,  a  principal  proponent  being 
Turing’s  former  colleague  Donald  Michie.  Then  in  1973,  the 
Lighthill  Report,  commissioned  by  the  Science  and  Engineer¬ 
ing  Research  Council,  reached  generally  negative  conclusions 
about  the  prospects  for  AI  systems  to  scale  up  to  address  real- 
world  problems,  and  almost  all  government  funding  was  cut  off.

The  next  decade  was  described  as  an  AI  winter. 


1980-1985:  DISCRETIONARY 
USE  COMES  INTO  FOCUS 


In  1980,  most  people  in  HF&E  and  IS  were  focused  on  the
down-to-earth  business  of  making  efficient  use  of  expensive 
mainframes.  Almost  unnoticed  was  the  foreshadowing  of  a 
major  shift.  Less  expensive  and  more  capable  minicomputers 
based  on  LSI  technology  enabled  Digital  Equipment  Corpora¬ 
tion,  Wang  Laboratories,  and  Data  General  to  make  inroads  into 
the  mainframe  market.  At  the  low  end,  home  computers  gained 
capability.  Growing  numbers  of  student  and  hobbyist  program¬ 
mers  were  drawn  to  these  minis  and  micros,  creating  a  popula¬ 
tion  of  hands-on  discretionary  users. 

Then,  between  1981  and  1984,  the  Xerox  Star,  IBM  PC,  Apple 
Lisa,  Symbolics  and  LMI  Lisp  machines,  Sun  Microsystems  and
Silicon  Graphics  workstations,  and  the  Apple  Macintosh  were
released. 

Another  major  event  came  on  January  1,  1984,  when  AT&T 
broke  into  competing  companies.  AT&T  had  had  the  most  em¬ 
ployees  and  the  most  customers  of  any  U.S.  company.  Neither 
customers  nor  employees  had  much  discretion  in  technology 
use,  so  AT&T  and  its  Bell  Laboratories  division  had  focused  on 
improving  training  and  efficiency  through  Human  Factors.  In
addition  to  competing  in  telephony,  AT&T  entered  the  PC  market
in  1985  with  the  ill-fated  Unix  PC.  AT&T  and  the  regional  operating
companies  faced  customers  who  had  choices,  and  their
HCI  focus  broadened  accordingly  (see  Israelski  &  Lund,  2003). 

In  general,  less-expensive  computers  created  markets  for 
shrinkwrap  software,  and  for  the  first  time,  computer  and  soft¬ 
ware  companies  targeted  significant  numbers  of  nontechnical 
hands-on  users  who  would  get  little  or  no  formal  training. 

After  20  years,  the  visions  were  being  realized.  Nonprogram¬ 
mers  were  choosing  to  use  computers  to  do  their  work.  The  psy¬ 
chology  of  discretionary  users  was  of  particular  interest  to  two 
groups:  (a)  psychologists  who  liked  to  use  computers  and  (b) 
technology  companies  planning  to  sell  to  discretionary  users. 
There  was  a  match:  Computer  and  telecommunication  companies 
hired  many  experimental  psychologists.  However,  before  describ¬ 
ing  this,  I  will  discuss  discretionary  use  and  its  contrast:  the  hands- 
on  use  required  to  keep  expensive  systems  running  efficiently. 

Discretion  in  Computer  Use 

Our  lives  are  distributed  along  a  continuum  between  the 
assembly-line  nightmare  of  Modern  Times  and  utopian  visions 
of  completely  empowered  individuals.  To  use  a  technology  or 
not  to  use  it:  Sometimes  we  have  a  choice,  other  times  we  don’t. 
On  the  phone,  I  may  have  to  wrestle  with  speech  recognition 
and  routing  systems.  In  contrast,  my  home  computer  use  is 
largely  discretionary.  The  workplace  often  lies  in  between:  Tech¬ 
nologies  are  recommended  or  prescribed,  but  we  can  ignore 
some  injunctions,  obtain  exceptions,  use  some  features  but  not 
others,  and  join  with  colleagues  to  press  for  changes  in  policy  or 
availability. 

For  early  computer  builders,  work  was  more  a  calling  than  a
job,  but  operation  required  a  staff  to  carry  out  essential  but  less
interesting  tasks.  For  the  first  half  of  the  computing  era,  most
hands-on  use  was  by  people  hired  with  a  mandate.  Hardware  in¬ 
novation,  more  versatile  software,  and  steady  progress  in  under¬ 
standing  the  psychology  of  users  and  tasks — and  transferring 
that  understanding  to  software  developers — led  to  hands-on 
users  with  more  choice  in  what  they  did  and  how  they  did  it. 
Rising  expectations  played  a  role — people  learned  that  software 
is  flexible  and  expected  it  to  be  more  congenial.  Competition 
among  vendors  produced  alternatives.  Today  there  is  more  em¬ 
phasis  on  marketing  to  consumers  and  user  friendliness. 

Discretion  is  not  all  or  none.  No  one  must  use  a  computer, 
but  many  jobs  and  pastimes  require  it.  True,  people  can  resist, 
sabotage,  use  some  features  but  not  others,  or  quit  the  job.  How¬ 
ever,  a  clerk  or  systems  administrator  is  in  a  different  situation 
than  someone  using  technology  for  leisure  activity.  For  an  airline- 
reservation  operator,  computer  use  is  mandatory.  For  someone 
booking  his  or  her  own  flight,  computer  use  is  discretionary.  This 
chapter  explores  implications  of  these  differences. 

Several  observers  noted  the  shift  toward  greater  discretion.
A  quarter  century  ago,  John  Bennett  (1979)  predicted  that  dis¬ 
cretionary  use  would  lead  to  more  concern  for  usability.  A 
decade  later,  Liam  Bannon  (1991)  noted  broader  implications  of 
a  shift  “from  human  factors  to  human  actors.”  However,  the  tra¬ 
jectory  is  not  always  toward  choice.  Discretion  can  be  cur¬ 
tailed — for  example,  a  word  processor  is  now  required  for  many 
jobs  and  is  no  longer  simply  an  alternative  to  a  typewriter.  Even 
in  an  era  of  specialization,  customization,  and  competition,  the 
exercise  of  choice  varies  over  time  and  across  contexts. 

Discretion  is  one  factor  of  many,  but  an  analysis  of  its  role 
casts  light  on  the  relationships  among  diverse  HCI  efforts:  why 
they  differ  and  why  they  have  remained  distinct. 

Minicomputers  and  Office  Automation 

Cabinet-sized  computers  that  could  support  several  people  were 
available  from  the  mid-1960s.  Starting  with  the  VAX  11/780,  su¬ 
perminis  in  the  late  1970s  were  capable  enough  to  affect  main¬ 
frame  sales  and  support  integrated  suites  of  productivity  tools.  In 
1980,  Digital  Equipment  Corporation,  Data  General,  and  Wang 
Laboratories  were  growth  companies  near  Boston.  Digital  be¬ 
came  the  second  largest  computer  company  in  the  world. 

A  minicomputer  could  handle  a  database  of  moderate  size  or 
personal  productivity  tools  used  from  terminals.  For  “dumb  terminals,”
the  central  processor  handled  each  keystroke;  other
minicomputers  came  with  terminals  with  a  processor  that  sup¬ 
ported  a  person  filling  out  a  screen  with  information  that  was 
then  sent  to  the  central  processor.  A  mini  could  support  a  small 
group  (or  “office”)  with  file  sharing,  applications  such  as  word 
processing,  spreadsheets,  and  e-mail,  and  output  devices.  They 
were  marketed  as  “office  systems,”  “office  automation  systems,” 
or  “office  information  systems.” 

In  1980,  the  Stanford  International  Symposium  on  Office 
Automation  launched  a  research  field  that  was  influential  for 
a  decade.  Douglas  Engelbart  contributed  two  papers  to  the 
proceedings  (Landau,  Bair,  &  Siegman,  1982).  The  same  year, 
the  American  Federation  of  Information  Processing  Societies
(AFIPS,  parent  organization  of  ACM  and  IEEE  at  the  time)  held 
the  first  of  seven  annual  Office  Automation  conferences  with 
an  associated  product  exhibition.  Also  in  1980,  ACM  formed  a 
Special  Interest  Group  on  Office  Automation  (SIGOA),  which 
two  years  later  launched  the  biennial  Office  Information  System 
Conference.  In  1983,  ACM  Transactions  on  Office  Information 
Systems  (TOOIS)  emerged,  one  year  after  the  independent  jour¬ 
nal  Office:  Technology  and  People. 

Office  Information  Systems,  which  focused  on  the  uses  of 
minicomputers  at  the  time,  was  positioned  alongside  Manage¬ 
ment  Information  Systems  (IS),  which  focused  on  mainframes. 
The  scope  is  reflected  in  the  charter  of  TOOIS:  database  theory, 
artificial  intelligence,  behavioral  studies,  organizational  theory, 
and  communications.  Minis  were  accessible  database  research 
tools;  Digital’s  PDP  series  was  a  favorite  for  AI  researchers;  minis 
were  familiar  to  behavioral  researchers  who  used  them  to  run 
and  analyze  experiments,  and  they  became  interactive  comput¬ 
ers  of  choice  for  many  organizations.  Computer-mediated  com¬ 
munication  was  an  intriguing  new  capability  that  easily  sup¬ 
ported  users  at  different  terminals  of  the  same  computer. 


The  researchers  were  discretionary  computer  users,  but 
most  office  workers  did  not  choose  their  tools.  The  term  “au¬ 
tomation,”  challenging  and  exciting  to  the  researchers,  conjured 
up  different  images  for  some  office  workers,  some  of  whom  pre¬ 
ferred  Engelbart’s  focus  on  augmentation. 

Papers  in  the  SIGOA  newsletter,  COIS,  and  TOOIS  included 
technical  work  on  database  theory,  a  moderate  number  of  AI  pa¬ 
pers  (the  AI  winter  had  not  yet  ended),  decision-support  and 
computer-mediated  communication  papers  from  the  IS  com¬ 
munity,  and  behavioral  studies  by  researchers  who  later  became 
active  in  CHI.  IS  papers  were  prevalent  in  the  newsletter  and 
technical  papers  in  the  journal.  The  journal  was  also  a  major 
outlet  for  behavioral  studies  before  HCI  started  in  1985. 

Although  OA/OIS  research  was  eventually  absorbed  by  other 
fields,  it  led  the  way  in  addressing  a  range  of  important  emerg¬ 
ing  issues,  including  hypertext,  computer-mediated  communi¬ 
cation,  and  collaboration  support  more  generally. 

The  Formation  of  ACM  SIGCHI

Major  threads  of  HCI  research  are  illustrated  in  Fig  1:  Human  Fac¬ 
tors,  Information  Systems,  and  the  research  focused  on  discre¬ 
tionary  hands-on  use  that  emerged  in  the  1980s.  In  1980,  Human 
Interaction  with  Computers,  by  Harold  Smith  and  Thomas 
Green,  perched  on  a  cusp.  It  briefly  addressed  “the  human  as  a 
systems  component”  (the  nondiscretionary  perspective,  p.  ix).
One  third  covered  research  on  programming.  The  remainder  ad¬ 
dressed  “nonspecialist  people,”  discretionary  users  who  were  not 
computer  specialists.  Smith  and  Green  wrote,  “It’s  not  enough 
just  to  establish  what  people  can  and  cannot  do;  we  need  to 
spend  just  as  much  effort  establishing  what  people  can  and  want 
to  do”  (italics  in  the  original;  p.  viii). 

That  year,  as  IBM  prepared  to  launch  the  PC,  a  groundswell 
of  attention  to  computer  user  behavior  was  building.  IBM  had 
recently  added  software  to  hardware  as  a  product  focus.  Sev¬ 
eral  cognitive  psychologists  joined  an  IBM  research  group  that 
included  John  Gould,  who  had  engaged  in  human  factors  re¬ 
search  since  the  late  1960s.  They  initiated  empirical  studies  of 
programming  and  software  design  and  use.  Other  psychologists 
leading  recently  formed  HCI  groups  included  Phil  Barnard  at 
the  Medical  Research  Council  Applied  Psychology  Unit  (APU) 
in  Cambridge,  England;  Tom  Landauer  at  Bell  Labs;  Donald  Norman
at  the  University  of  California,  San  Diego;  and  John  Whiteside
at  Digital  Equipment  Corp.

Xerox  PARC  and  its  CMU  collaborators  were  particularly  active, 
continuing  work  in  several  areas  that  proved  to  have  singular  in¬ 
fluence.  The  1981  Star,  with  its  carefully  designed  graphical  user 
interface,  was  not  a  commercial  success  (nor  were  a  flurry  of  GUIs 
that  followed,  including  the  Apple  Lisa),  but  it  influenced  re¬ 
searchers  and  developers — and  of  course  the  Macintosh. 

Communications  of  the  ACM  created  a  “Human  Aspects  of 
Computing”  department  in  1980.  The  next  year,  Tom  Moran 
edited  a  special  issue  of  Computing  Surveys  on  “The  Psychol¬ 
ogy  of  the  Computer  User.”  Also  in  1981,  the  ACM  Special  In¬ 
terest  Group  on  Social  and  Behavioral  Science  Computing 
(SIGSOC)  extended  its  workshop  to  cover  interactive  software 
design  and  use.  In  1982,  a  conference  in  Gaithersburg,  Maryland,
on  “Human  Factors  in  Computing  Systems”  was  unexpectedly
well  attended.  Shortly  afterwards,  SIGSOC  shifted  its
focus  to  computer-human  interaction  and  its  name  to  SIGCHI
(Borman,  1996).

FIGURE  1.  HCI  events  and  topics  discussed  in  this  chapter.  Expansion  of  acronyms,  significance  of  people  and  books,  and  reasons
for  their  placement  are  in  the  text.

In  1983,  the  first  CHI  conference  drew  more  than  1,000  peo¬ 
ple.  Half  of  the  58  papers  were  from  the  seven  research  labs 
just  mentioned.  Cognitive  psychologists  in  industry  dominated 
the  program,  although  the  HFS  cosponsored  the  conference 
and  contributed  the  program  chair  Richard  Pew,  committee 
members  Sid  Smith,  H.  Rudy  Ramsay,  and  Paul  Green,  and  sev¬ 
eral  presenters.  Brian  Shackel  and  society  president  Robert 
Williges  gave  tutorials  the  first  day. 

The  first  profession  to  become  discretionary  hands-on  users 
was  computer  programming,  as  paper  coding  sheets  were  dis¬ 
carded  in  favor  of  text  editing  at  interactive  terminals,  PCs,  and 
small  minicomputers.  Therefore,  many  early  CHI  papers,  by  Ruven 
Brooks,  Bill  Curtis,  Thomas  Green,  Ben  Shneiderman,  and  others, 
continued  the  psychology-of-programming  research  thread.  IBM 
Watson  researchers  also  contributed,  as  noted  by  John  Thomas:

One  of  the  main  themes  of  the  early  work  was  basically  that  we  in  IBM 
were  afraid  that  the  market  for  computing  would  be  limited  by  the  num¬ 
ber  of  people  who  could  program  complex  systems,  so  we  wanted  to
find  ways  for  “nonprogrammers”  to  be  able,  essentially,  to  program.
(personal  communication,  October  2003)

Psychologists  and  studies  of  editing  were  so  prevalent  that  in 
1984  Thomas  Green  remarked  that,  “text  editors  are  the  white 
rats  of  HCI.”  As  personal  computing  spread  and  the  same  meth¬ 
ods  were  applied  to  studying  other  discretionary  use,  studies 
of  programming  gradually  disappeared. 

CHI  focused  on  novice  use.  Initial  experience  is  particularly 
important  for  discretionary  users  and  the  vendors  developing 
software  for  them.  Novice  users  are  also  a  natural  focus  when 
studying  new  technologies,  and  a  critical  focus  when  more  peo¬ 
ple  take  up  computing  each  year  than  did  the  year  before. 

Routine,  experienced  use  was  still  widespread.  Computer 
databases  were  extensively  used  by  airlines,  banks,  government 
agencies,  and  other  organizations.  This  hands-on  activity  was 
rarely  discretionary.  Managers  mainly  oversaw  development 
and  analyzed  data,  leaving  data  entry  and  information  retrieval 
to  people  hired  for  those  jobs.  Improving  skilled  data  handling 
was  a  human  factors  undertaking.  CHI  studies  of  database  use 
were  few — I  count  three  over  a  decade,  all  focused  on  novice 
or  casual  use. 

Fewer  European  companies  produced  mass-market  software.
European  research  favored  in-house  development  and
use.  At  Loughborough  University,  HUSAT  focused  on  job  design 
(the  division  of  labor  between  people  and  systems)  and  collab¬ 
orated  with  the  Institute  for  Consumer  Ergonomics,  particularly 
on  product  safety.  In  1984,  Loughborough  initiated  an  HCI  grad¬ 
uate  program  drawing  on  human  factors,  industrial  engineering, 
and  computer  science.  The  International  Conference  on  HCI
(INTERACT),  first  held  in  London  in  1984  and  chaired
by  Shackel,  drew  HF&E  and  CHI  researchers.

In  his  perceptive  essay  just  cited,  Bannon  urged  that  more  at¬ 
tention  be  paid  to  discretionary  use  while  criticizing  CHI’s  heavy 
emphasis  on  initial  experiences;  this  may  have  been  a  reflec¬ 
tion  of  Bannon’s  European  perspective. 

The  visionaries  were  not  familiar  to  many  of  the  CHI  re¬ 
searchers  who  helped  realize  some  of  their  visions.  The  633  ref¬ 
erences  in  the  58  papers  presented  at  CHI  83  included  many  au¬ 
thored  by  well-known  cognitive  scientists,  but  Bush,  Sutherland, 
and  Engelbart  were  not  cited  at  all.  Many  computer  scientists 
familiar  with  the  early  work  entered  CHI  a  few  years  later,  and 
the  CHI  psychologists  eventually  discovered  and  identified  with 
these  pioneers  who  shared  their  concern  for  discretionary  use, 
provided  conceptual  continuity,  and  bestowed  legitimacy  on  a 
young  enterprise  seeking  to  establish  itself  academically  and 
professionally. 

CHI  and  Human  Factors  Diverge 

Between  1980  and  1985,  researchers  at  Xerox  PARC  and  CMU 
introduced  another  influential  research  program.  Card,  Moran, 
and  Newell  (1980a,  1980b)  described  a  “keystroke-level  model 
for  user  performance  time  with  interactive  systems”  with  cog¬ 
nitive  components  in  the  goals,  operators,  methods,  selection 
rules  (GOMS)  model  that  were  the  basis  for  their  landmark  1983
book,  The  Psychology  of  HCI.

This  work  was  highly  respected  within  CHI  although  it  did 
not  address  discretionary,  novice  use.  On  the  contrary,  it  fo¬ 
cused  on  the  repetitive  expert  use  studied  in  human  factors.  In 
fact,  it  was  explicitly  positioned  in  opposition  to  the  stimulus- 
response  bias  of  human-factors  research: 

Human-factors  specialists,  ergonomists,  and  human  engineers  will  find 
that  we  have  synthesized  ideas  from  modern  cognitive  psychology  and 
artificial  intelligence  with  the  old  methods  of  task  analysis.  .  .  .  The  user 
is  not  an  operator.  He  does  not  operate  the  computer,  he  communi¬ 
cates  with  it.  .  .  .  (p.  viii) 

Newell  and  Card  (1985)  noted  that  human  factors  had  a  role 
in  design,  but  continued:

Classical  human  factors  . . .  has  all  the  earmarks  of  second-class  status.  (Our 
approach)  avoids  continuation  of  the  classical  human-factors  role  (by 
transforming)  the  psychology  of  the  interface  into  a  hard  science.  (p.  221)

Card  wrote: 

Human  factors  was  the  discipline  we  were  trying  to  improve.  ...  I  per¬ 
sonally  changed  the  (CHI  conference)  call  in  1986,  so  as  to  emphasize 
computer  science  and  reduce  the  emphasis  on  cognitive  science,  because
I  was  afraid  that  it  would  just  become  human  factors  again.
(E-mail,  June  2004)

“Hard  science,  in  the  form  of  engineering,  drives  out  soft  science,  in 
the  form  of  human  factors,”  wrote  Newell  and  Card  (1985,  p.  212).

Ultimately,  human-performance  modeling  drew  a  modest- 
but-fervent  CHI  following.  Key  goals  of  the  modelers  differed 
from  those  of  practitioners  and  other  researchers.  “The  central 
idea  behind  the  model  is  that  the  time  for  an  expert  to  do  a  task 
on  an  interactive  system  is  determined  by  the  time  it  takes  to  do 
the  keystrokes,”  wrote  Card,  Moran,  and  Newell  (1980b).  Mod¬ 
eling  was  extended  to  a  range  of  cognitive  processes,  but  re¬ 
mained  most  useful  in  helping  to  design  for  nondiscretionary 
users,  such  as  telephone  operators  engaged  in  repetitive  tasks 
(e.g.,  Gray,  John,  Stuart,  Lawrence,  &  Atwood,  1990).  Its  role  in
augmenting  human  intellect  was  unclear. 

CHI  and  human  factors  moved  apart,  although  “Human  Fac¬ 
tors  in  Computing  Systems”  remains  the  CHI  conference  subti¬ 
tle.  They  were  never  closely  integrated.  Most  of  the  cognitive 
psychologists  had  turned  to  HCI  after  earning  their  degrees  and 
were  unfamiliar  with  the  human-factors  research  literature.  The 
HFS  did  not  again  cosponsor  CHI,  and  its  researchers  disappeared
from  the  CHI  program  committee.  Most  CHI  researchers
who  had  published  in  the  annual  human-factors  conference  and 
Human  Factors  journal  shifted  to  CHI,  Communications  of  the 
ACM,  and  the  journal  Human-Computer  Interaction,  established  in  1985  by  Moran  and  published
by  Erlbaum,  a  publisher  of  psychology  books  and  journals.

The  shift  was  reflected  at  IBM  T.J.  Watson  Research  Center.
John  Gould  and  Clayton  Lewis  authored  a  CHI  83  paper  that
nicely  defined  the  CHI  focus  on  user-centered,  iterative  design
based  on  prototyping.  Watson  cognitive  scientists  helped  shape
CHI,  but  Gould’s  principal  focus  remained  on  human  factors;  he
served  as  HFS  president  four  years  later.  Symbolically,  in  1984,
Watson’s  Human  Factors  Group  faded  away  and  a  User  Interface
Institute  emerged. 

CHI  researchers  wanted  to  be  seen  as  engaged  in  “hard”  sci¬ 
ence  or  engineering.  The  terms  cognitive  engineering  and  us¬ 
ability  engineering  were  adopted.  In  the  first  paper  presented 
at  CHI  83,  “Design  Principles  for  Human-Computer  Interfaces,” 
Donald  Norman  applied  engineering  techniques  to  discre¬ 
tionary  use,  creating  “user-satisfaction  functions”  based  on  tech¬ 
nical  parameters.  Only  years  later  did  CHI  loosen  its  identifica¬ 
tion  with  engineering. 


Workstations  and  Another  AI  Summer 

High-end  workstations  from  Apollo,  Sun,  and  Silicon  Graphics 
appeared  between  1981  and  1984.  Graphics  researchers  no 
longer  had  to  congregate  in  heavily  financed  labs  (notably  MIT 
and  Utah  in  the  1960s;  MIT,  NYIT,  and  PARC  in  the  1970s).  Be¬ 
cause  these  workstations  did  not  reach  a  mass  market,  graph¬ 
ics  research  that  focused  on  photorealism  and  animation  did 
not  directly  influence  HCI  more  broadly. 

The  Xerox  Star  (actually  called  an  Office  Workstation),  Apple
Lisa,  and  other  commercial  GUIs  appeared,  but  when  the
first  CHI  conference  was  held  in  December,  1983,  none  was  suc¬ 
ceeding.  They  were  priced  too  high  or  ran  on  processors  that 
were  too  weak  to  exploit  graphics  effectively. 

In  1981,  Symbolics  and  LMI  introduced  workstations  optimized
to  run  the  Lisp  programming  language.  The  timing  could
not  have  been  more  fortuitous.  In  October  of  that  year,  a  con¬ 
ference  on  Next  Generation  Technology  was  held  in  the  National 
Chamber  of  Commerce  auditorium  in  Tokyo,  and,  in  1982,  the 
Japanese  government  announced  the  establishment  of  the  In¬ 
stitute  for  New  Generation  Computer  Technology  (ICOT)  and  its 
10-year  Fifth-Generation  project  focused  on  AI.  AI  researchers 
in  Europe  and  the  United  States  sounded  the  alarm.  Donald 
Michie  of  Edinburgh  saw  it  as  a  threat  to  western  computer  tech¬ 
nology,  and  Ed  Feigenbaum  of  Stanford  wrote:

The  Japanese  are  planning  the  miracle  product. .  .  .  They’re  going  to  give 
the  world  the  next  generation — the  Fifth  Generation — of  computers, 
and  those  machines  are  going  to  be  intelligent.  .  .  .  We  stand,  however, 
before  a  singularity,  an  event  so  unprecedented  that  predictions 
are  almost  silly.  .  .  .  Who  can  say  how  universal  access  to  machine 
intelligence — faster,  deeper,  better  than  human  intelligence — will  change
science,  economics,  and  warfare,  and  the  whole  intellectual  and  socio¬ 
logical  development  of  mankind?  (Feigenbaum  &  McCorduck,  1983) 

At  the  same  time,  parallel  distributed  processing,  or  neural 
net  models,  seized  the  attention  of  researchers  and  media.  Used 
to  model  signal  detection,  motor  control,  semantic  processing, 
and  a  wide  range  of  phenomena,  neural  nets  represented  conceptual
and  technical  advances  over  earlier  AI  work  on  perceptrons,
but  were  of  particular  interest  because  the  new  generation
of  minicomputers  and  workstations  supported  simulation
experiments.  Production  systems,  a  computer-intensive  AI  mod¬ 
eling  approach  with  a  psychological  foundation  developed  at 
CMU,  gained  wider  use  in  research. 

These  developments  triggered  an  artificial  intelligence 
gold  rush.  As  with  actual  gold  rushes,  most  of  the  money  was 
made  by  those  who  outfitted  and  provisioned  the  prospec¬ 
tors,  although  generous  government  funding  again  flowed  to  AI 
researchers.  The  European  ESPRIT  and  UK  Alvey  programs 
invested  over  $200  million  per  year  starting  in  1984  (Oakley, 
1990).  In  the  United  States,  funding  for  the  DARPA  Strategic 
Computing  AI  program  alone,  begun  in  1983,  rose  to  almost 
$400 million in 1988 (Norberg & O’Neill, 1996). Investment in AI
by 150 U.S. companies was estimated at about $2 billion in 1985
(Kao,  1998). 

The  unfulfilled  promises  of  the  past  led  to  changes  this  time 
around. General problem solving was emphasized less; domain-specific
problem solving was emphasized more. Terms such as
intelligent  knowledge-based  systems,  knowledge  engineering, 
expert  systems,  machine  learning,  language  understanding,  im¬ 
age  understanding,  neural  nets,  and  robotics  were  often  favored 
over  AI. 

In 1983, Raj Reddy of CMU and Victor Zue of MIT criticized
the  mid-1970s  abandonment  of  funding  for  speech-processing 
research,  and  soon  funds  were  plentiful  for  that  and  other  AI 
efforts  (Norberg  &  O’Neill,  1996,  p.  238).  Johnson  (1985)  esti¬ 
mated  that  800  corporate  employees  and  400  academics  were 
working  on  natural  language-processing  research  in  1985.  Com¬ 
mercial  NLU  interfaces  to  databases  such  as  AI  Corporation’s 
Intellect  and  Microrim  Clout  appeared. 

AI  optimism  is  illustrated  by  two  meticulously  researched 
Ovum reports on speech-and-language processing (Johnson,
1985; Engelien & McBride, 2001). In 1985, speech-and-language
product  “revenue”  was  $75  million,  comprised  mostly  of  in¬ 
come  from  grants  and  investor  capital.  Ovum  projected  that 
sales  would  reach  $750  million  by  1990  and  $2.75  billion  by 
1995.  In  1991,  sales  were  under  $90  million,  but  projections 
were  optimistically  pegged  at  $490  million  for  1995  and  $3.6  bil¬ 
lion  for  2000. 

U.S.  corporations  banded  together  to  counter  the  Japanese 
Fifth-Generation  project,  jointly  funding  the  Microelectronics 
and  Computer  Technology  Corporation  (MCC).  (U.S.  antitrust 
laws  were  relaxed  to  allow  them  to  cooperate  in  this  way.)  MCC 
embraced  AI,  reportedly  becoming  the  leading  customer  for 
both  Symbolics  and  LMI.  MCC  projects  included  two  parallel 
NLU  efforts,  work  on  intelligent  advising,  and  CYC,  Douglas 
Lenat’s  ambitious  project  to  build  an  encyclopedic  common- 
sense  knowledge  base  for  other  programs  to  consult.  In  1984, 
Lenat  predicted  that  by  1994  CYC  would  be  intelligent  enough 
to  educate  itself  from  online  texts.  Five  years  later,  CYC  was  re¬ 
ported  to  be  on  schedule  to  “spark  a  vastly  greater  renaissance 
in  (machine  learning)”  (Lenat,  1989). 

Knowledge  engineering  was  another  facet  of  AI  that  involved 
human  interaction,  bringing  it  closer  to  HCI,  at  least  in  theory. 
The  difficulty  of  eliciting  knowledge  from  experts  frustrated  re¬ 
searchers  whose  primary  interest  was  in  representing  and  reason¬ 
ing  about  knowledge.  This  created  opportunities  for  researchers 
in  the  HF&E  community.  European  funding  directives  particu¬ 
larly  encouraged  work  spanning  technical  and  behavioral  con¬ 
cerns.  International  Journal  of  Man-Machine  Studies  became  a 
major outlet for both HF&E and AI research in the 1980s.

AI interaction with CHI was limited. CHI’83 and CHI’85 sessions
covered speech and language, cognitive modeling, knowledge-based
help, and knowledge elicitation, but AI technologies
did  not  succeed  in  the  marketplace  and  were  often  directed  at 
nondiscretionary  users.  AI  Corporation  sold  the  database  inter¬ 
face  Intellect  primarily  to  the  government  before  it  disappeared. 
Nor  were  many  AI  researchers  and  developers  interested  in  in¬ 
teraction  details:  They  focused  on  the  power  of  tools  such  as 
EMACS  and  UNIX,  quickly  forgetting  the  painful  weeks  spent 
learning  needlessly  arbitrary  commands. 


1985-1995:  GRAPHICAL  USER 
INTERFACES  SUCCEED 


“There  will  never  be  a  mouse  at  the  Ford  Motor  Company.” 

— High-level  acquisition  manager,  1985 

Graphical  user  interfaces  were  a  disruptive  revolution  in 
interaction  when  they  finally  succeeded  commercially,  as  were 
earlier  shifts  to  stored  programs  and  to  interaction  based  on 
commands,  full-screen  forms  and  full-screen  menus.  Some  sec¬ 
tors  were  affected  well  before  others. 

GUIs  were  particularly  attractive  to  new  users.  Their  success 
immediately affected the CHI field. However, not until Windows
3.0  succeeded  in  1990  did  GUIs  have  much  influence  among 
government  agencies  and  business  organizations  that  were  the 
focus  of  the  other  HCI  researchers.  By  then  the  technology  was 
better  understood  and  less  disruptive.  The  early  1990s  also  saw 
the maturation of local area networks and the Internet. This
foundation  for  computer-mediated  communication  and  infor¬ 
mation  sharing  was  also  transformational. 

CHI  Embraces  Computer  Science 

Apple  launched  the  Macintosh  with  a  1984  Super  Bowl  ad  pitched 
at  office  work,  but  sales  did  not  follow,  and  in  mid-1985  Apple  was 
in  trouble.  Then  Macs  appeared  with  four  times  as  much  RAM, 
sufficient  to  manage  Aldus  PageMaker,  Adobe  Postscript,  and  the 
Apple  LaserWriter  as  they  were  released,  along  with  Microsoft’s 
Excel  and  Word  for  Macintosh.  The  more  powerful  Mac  Plus  ar¬ 
rived  in  January,  1986.  The  Mac  succeeded  where  the  many  com¬ 
mercial  GUIs  before  it  had  not;  it  was  popular  with  consumers 
and  the  platform  for  desktop  publishing. 

Even  within  CHI,  GUIs  were  initially  controversial.  They  had 
disadvantages:  an  extra  level  of  interface  code  increased  devel¬ 
opment  complexity  and  created  reliability  challenges;  they  con¬ 
sumed  processor  cycles  and  distanced  users  from  the  under¬ 
lying  system,  which,  many  believed,  experienced  users  would 
have  to  learn  eventually.  Carroll  and  Mazur  (1986)  showed  that 
GUIs  confused  and  created  problems  for  people  familiar  with 
existing  interfaces.  An  influential  essay  on  direct  manipulation 
interfaces,  Hutchins,  Hollan,  and  Norman  (1986),  concluded,  “it 
is  too  early  to  tell”  how  GUIs  would  fare.  GUIs  could  well  prove 
useful  for  novices,  they  wrote,  but  “we  would  not  be  surprised  if 
experts  are  slower  with  Direct  Manipulation  systems  than  with 
command  language  systems.”  Most  HCI  research  had  focused 
on  expert  use,  so  this  valid  insight  seemed  significant.  However, 
in  a  rapidly  expanding  consumer  market,  first-time  use  is  critical. 
Hardware  and  software  advances  eliminated  other  difficulties, 
and  GUIs  were  around  to  stay. 

The  effects  within  CHI  were  dramatic.  Active  topics  of  re¬ 
search  including  command  naming,  text  editing,  and  the  psy¬ 
chology  of  programming  were  quickly  abandoned;  more  techni¬ 
cal  topics  such  as  “user-interface  management  systems”  became 
significant.  At  a  higher  level,  psychology  gave  way  to  computer 
science  as  the  driving  force  in  interaction  design. 

Researchers  had  been  engaged  in  establishing  a  comprehen¬ 
sive  psychological  theoretical  framework  based  on  formal  ex¬ 
periments  (Newell  &  Card,  1985;  Carroll  &  Campbell,  1986; 
Long,  1989;  Barnard,  1991).  This  was  conceivable  for  constrained 
command-  and  form-based  interaction,  but  could  not  be  scaled 
up  to  design  spaces  that  included  color,  sound,  animation,  and 
an  endless  variety  of  icons,  menu  designs,  window  arrangements, 
and  so  on.  The  urgent  need  was  to  identify  the  most  pressing 
problems  and  find  satisfactory  rather  than  optimal  solutions.  Rig¬ 
orous  experimentation,  the  principal  skill  of  cognitive  psycholo¬ 
gists,  gave  way  to  faster  and  less  precise  assessment  methods. 
Also,  to  explore  the  dynamically  evolving,  unconstrained  design 
space  required  software  engineering  expertise. 

As  a  result,  the  late  1980s  saw  an  influx  of  computer  scientists 
into  the  CHI  community.  The  topic  HCI  became  part  of  the  cur¬ 
riculum  of  many  computer-science  departments.  Computer  sci¬ 
entists  working  on  interactive  graphics  saw  CHI  as  a  natural 
home,  as  did  software  engineers  interested  in  interaction,  and 
some  AI  researchers  working  on  speech  recognition,  language 
understanding, and expert systems. Reflecting this shift, in 1994
ACM launched Transactions on Computer-Human Interaction.
Of  course,  computer-science  researchers  were  more  familiar 
with  early  pioneering  work  than  were  many  of  the  cognitive  sci¬ 
entists  who  preceded  them. 

Early  PCs  and  Macs  were  not  easily  networked,  but  as  local 
area  networks  spread,  CHI’s  focus  expanded  to  include  collab¬ 
oration  support.  This  brought  it  into  contact  with  efforts  in  MIS 
and  Office  Automation  research,  which  are  discussed  below. 


Human  Factors  and  Ergonomics  Maintains 
a  Nondiscretionary  Use  Focus 

HF&E  research  continued  to  respond  to  the  needs  of  govern¬ 
ment  agencies,  the  military,  aviation,  and  telecommunications. 
Census,  tax,  social  security,  health  and  welfare,  power-plant  op¬ 
eration,  air-traffic  control,  ground  control  for  space  missions, 
military logistics, and processing text and voice data for intelligence
contribute  to  the  government  being  the  largest  consumer  of 
computing. 

Most  users  in  these  settings  are  assigned  technology.  The  fo¬ 
cus  is  on  skilled  use.  Small  efficiency  gains  in  individual  trans¬ 
actions  can  yield  large  benefits  over  time.  For  routine  data  en¬ 
try  and  other  tasks,  improvements  that  may  not  influence 
discretionary  users  can  make  a  difference.  After  CHI  formed,  the 
HFS  undertook  a  study  to  see  how  it  would  affect  membership 
in  its  Computer  Systems  Technical  Group  and  found  unexpect¬ 
edly  little  overlap  (Richard  Pew,  personal  communication). 

Research  funding  in  HF&E  responded  to  governmental  con¬ 
cerns  and  initiatives.  Government  also  promoted  the  develop¬ 
ment  of  ergonomic  standards,  in  part  to  help  with  the  problem 
of  defining  system  requirements  for  competitive  bidding  while 
remaining at arm’s length from the potential developers who
better  understand  the  technical  possibilities.  Compliance  with 
standards  was  specified  in  a  contract. 

In  1986,  Sid  Smith  and  Jane  Mosier  published  the  last  in  a 
series  of  government-sponsored  interface  guidelines.  Nine  hun¬ 
dred  and  forty-four  specific  design  guidelines  were  organized 
into sections titled Data Entry, Data Display, Data Transmission,
Data  Protection,  Sequence  Control,  and  User  Guidance.  They 
recognized  that  GUIs  would  expand  the  design  space  beyond 
the  reach  of  such  an  already  cumbersome,  comprehensive  set 
of  guidelines  that  did  not  cover  icons,  pull-down  or  pop-up 
menus, mouse-button assignments, sound, animation, and so on.
Requirements  definition  shifted  to  specify  predefined  interface 
styles  and  design  processes  rather  than  to  identify  specific  fea¬ 
tures  that  would  be  built  from  scratch. 

DARPA’s  heavily  funded  Strategic  Computing  program  set 
out to develop an Autonomous Land Vehicle, a Pilot’s Associate,
and a Battle Management system. All raised human-factors
research  issues.  These  systems  would  require  interactive  tech¬ 
nologies  such  as  speech  recognition,  language  understanding, 
and  heads-up  displays.  These  might  not  be  used  by  people  who 
have  a  choice,  but  pilots,  people  guiding  autonomous  vehicles, 
and  officers  under  stressful  conditions  may  have  no  better  al¬ 
ternative.  Such  technologies  could  also  prove  useful  for  profes¬ 
sional  translators  and  intelligence  analysts,  when  a  phone  sys¬ 
tem  provides  no  alternative,  a  disability  limits  keyboard  use,  or 
hands  are  otherwise  occupied. 


A  Moving  Target:  The  Evolution  of  Human-Computer  Interaction 
IS Extends Its Range

Although  graphical  user  interfaces  were  not  quickly  adopted  by 
organizations,  business  graphics  was  important  in  a  research 
field  focused  on  managerial  use.  Remus  (1984)  contrasted  tab¬ 
ular  and  graphic  presentations,  Benbasat  and  Dexter  (1985) 
added  color  as  another  factor,  and  many  studies  followed.  The 
concept  of  cognitive  fit  between  task  and  tool  was  introduced  in 
this  context  to  explain  apparently  contradictory  results  in  the  lit¬ 
erature  (Vessey  &  Galletta,  1991).  Studies  considered  online  and 
paper  presentation.  In  practice,  color  displays  were  rare  in  the 
1980s;  most  managers  dealt  with  printed  reports. 

Involvement  of  internal  end  users  in  the  development 
process  was  actively  discussed  (Friedman,  1989).  A  Scandinavian 
group  set  out  to  empower  workers  in  this  way  (Bjerknes,  Ehn,  & 
Kyng,  1987),  but  more  often,  the  focus  was  on  increasing  user 
acceptance  of  the  resulting  system. 

Hands-on  managerial  use  was  atypical,  but  it  was  central  to 
group-decision  support-systems  research,  which  emerged  from 
decision  support  systems  and  evolved  into  group  support  sys¬ 
tems.  Computer-supported  meeting  facility  research  was  con¬ 
ducted  in  the  mid-1980s  in  several  laboratories  (e.g.,  Begeman 
et  al.,  1986;  DeSanctis  &  Gallupe,  1987;  Dennis,  George,  Jessup, 
Nunamaker,  &  Vogel,  1988).  Given  the  expense  of  these  systems 
and  the  decision-maker  focus,  key  research  was  in  schools  of  man¬ 
agement  and  not  in  computer-science  departments  or  software 
companies.  Jay  Nunamaker’s  group  at  the  University  of  Arizona 
explored  approaches  to  brainstorming,  idea  organizing,  online 
voting,  and  other  meeting  activities.  This  became  a  significant  IS 
contribution  to  research  in  Computer  Supported  Cooperative 
Work  (CSCW),  discussed  next,  and  also  led  to  the  formation  of  a 
company  to  market  the  technology  (Nunamaker,  Briggs,  Mittle- 
man,  Vogel,  &  Balthazard,  1997). 

An  influential  IS  research  thread  is  based  on  the  Technology 
Acceptance  Model  (TAM)  introduced  in  F.D.  Davis  (1989).  It  fo¬ 
cuses  on  perceived  usefulness  and  perceived  usability  to  im¬ 
prove  “white-collar  performance”  that  is  “often  obstructed 
by  users’  unwillingness  to  accept  and  use  available  systems” 
(p. 319). “An element of uncertainty exists in the minds of decision
makers with respect to the successful adoption,” wrote
Bagozzi,  Davis,  and  Warshaw  (1992,  p.  664).  This  managerial 
view  of  individual  behavior  was  influenced  by  Davis’s  exposure 
to  some  early  CHI  usability  research. 

High  interest  in  TAM  showed  that  the  MIS  focus,  in  which 
hands-on  use  was  primarily  a  nondiscretionary  operation, 
data  entry,  and  data  retrieval,  was  shifting  as  hands-on  use 
spread  to  white-collar  workers  who  could  refuse  to  play.  Con¬ 
trast  IS  with  CHI:  Consumers  choose  technologies  that  they 
perceive  to  be  useful,  so  CHI  assumes  perceived  utility  and 
rarely  considers  utility  at  all.  TAM  researchers  considered  util¬ 
ity  more  important  than  usability.  CHI  focused  on  usability  a 
decade  before  TAM,  albeit  more  on  measures  of  actual  usabil¬ 
ity  than  measures  of  perceived  usability.  Perception  was  a  sec¬ 
ondary  “user  satisfaction”  measure  to  CHI  researchers,  who 
believed  (not  entirely  correctly)  that  measurable  reduction 
in  time,  errors,  questions,  and  training  would,  over  time, 
translate into positive perceptions. “Acceptance” is not in the
CHI  vocabulary.  A  discretionary  user  chooses  or  adopts, 
rather  than  accepts. 


Harvard  Business  Review  published  “Usability:  The  new  di¬ 
mension  of  product  design”  (March,  1994).  In  concluding  that 
“user-centered  design  is  still  in  its  infancy,”  it  made  no  mention 
of  CHI.  The  communities  remained  largely  isolated. 

Collaboration  Support:  OIS  Gives  Way  to  CSCW 

In  the  late  1980s,  three  research  communities  focused  on  small- 
group  communication  and  information  sharing.  The  Office  Au¬ 
tomation/Office  Information  System  field  was  already  there.  De¬ 
clining  costs  of  computing  attracted  MIS  researchers  focused  on 
organizational  decision  making  to  group  decision  making  more 
generally,  as  previously  noted.  The  proliferation  of  LAN  net¬ 
works  encouraged  some  CHI  researchers  to  move  from  indi¬ 
vidual  productivity  to  a  quest  for  “killer  apps”  that  would  ap¬ 
peal  to  groups. 

Although  the  OA/OIS  field  had  led  the  way,  it  declined  and 
largely  disappeared  in  this  period.  The  minicomputer  platform 
for  much  of  the  work  did  not  survive  the  competition  from 
PCs  and  workstations.  The  concept  of  “office”  or  group  was 
problematic:  organizations  and  individuals  are  relatively 
sharply  defined  persistent  entities  with  goals  and  needs,  but 
small  groups  often  have  ambiguous  memberships  and 
processes  that  shift  when  one  member  joins  or  departs.  Peo¬ 
ple  who  work  together  often  fall  under  different  budgets,  mak¬ 
ing  technology  acquisition  complicated  when  it  is  not  organi¬ 
zation  wide.  Conference  series  and  journals  dropped  the 
identification  with  offices. 

First  automation  lost  favor  as  a  term;  ACM  SIGOA  shifted  to 
SIGOIS  (Office  Information  Systems)  in  1986,  the  same  year  the 
annual  AFIPS  OA  conferences  ended.  By  1991,  the  term  “Office” 
itself  began  to  disappear:  Transactions  on  Office  Information 
Systems  became  Transactions  on  Information  Systems;  Office: 
Information  and  People  became  Information  Technology  and 
People; Conference on Office Information Systems became
Conference  on  Organizational  Communication  Systems. 

The  AI  summer  that  was  one  component  of  the  OA/OIS  ef¬ 
fort  ended  as  AI  failed  to  meet  expectations:  massive  funding 
did not deliver a Pilot’s Associate, Autonomous Land Vehicle,
or Battle Management system for the military or automated offices
for enterprises. CHI conference sessions on language processing
diminished early, but sessions on modeling, adaptive
interfaces,  advising  systems,  and  other  uses  of  intelligence  in  in¬ 
terfaces  increased  through  the  late  1980s  before  declining  in  the 
1990s.  AI  research  did  not  disappear,  but  funding  became  scarce, 
employment opportunities dried up, and conference participation
dropped off.

Building on a 1984 workshop (Greif, 1985), the 1986 Computer
Supported Cooperative Work conference brought together
researchers from diverse disciplines interested in issues
of  communication,  information  sharing,  and  coordination.  Par¬ 
ticipants  came  primarily  from  IS,  OIS,  CHI,  distributed  AI,  and 
anthropology.  Four  of  13  CSCW  program  committee  members 
and  many  papers  were  from  schools  of  management,  with  sim¬ 
ilar  numbers  from  OIS. 

A  field  seemed  to  coalesce  in  1988,  with  the  publication  of 
Computer-Supported  Cooperative  Work,  edited  by  Irene  Greif, 
and  SIGCHI’s  management  of  the  biennial  CSCW  conferences 
held in North America. However, representation from IS and
OIS  faded.  Scandinavian  cooperative  or  participatory  design  re¬ 
searchers  remained  active  longer.  Although  they  shared  the  IS 
focus  on  organizational  development  and  use  rather  than  com¬ 
mercial  software  development,  they  also  shared  CHI’s  focus 
on  discretionary  use,  albeit  with  a  different  slant:  empowering 
workers  to  control  the  design  of  their  workplaces.  Ethnogra¬ 
phers,  whose  ethic  is  also  to  act  to  benefit  their  informants,  in 
this  case  technology  users  rather  than  technology  managers, 
rose in influence in CSCW.

Some  IS  researchers  shifted  participation  to  a  series  of  less 
selective  annual  Groupware  conferences  that  started  in  1992. 
They  also  created  a  newsletter:  Groupware  Report  listed  many 
relevant conferences, but not CSCW. Eventually, many in IS settled
on the Collaboration Technology track of the annual HICSS
conference  as  a  prejournal  publication  arena,  with  some  partic¬ 
ipating  in  the  Organizational  Computing  Systems  (1991-1995) 
and  GROUP  (1997-present)  descendants  of  the  Conference  on 
Office  Information  Systems. 

The  cultural  issues  that  pulled  apart  the  groups  initially  ac¬ 
tive  in  CSCW  are  addressed  in  the  Discussion  section.  How¬ 
ever,  CSCW  remains  a  strong  research  area.  It  has  attracted  a 
broad  swath  of  HCI  researchers  long  enough  to  foster  mutual 
understanding,  ranging  from  highly  technical  work  on  group 
undo  to  thick  ethnographies  of  workplace  activity,  from  studies 
of  IM  dyads  to  scientific  collaboratories  involving  hundreds  of 
people  over  several  continents  and  many  years.  For  details  see 
the  Handbook  chapter  “Groupware  and  Computer  Supported 
Cooperative  Work”  by  prominent  CSCW  researchers  Gary  and 
Judy  Olson. 


1995-2005:  THE  INTERNET  ERA  ARRIVES 


How  did  the  spread  of  the  Internet  and  emergence  of  the  Web 
affect  the  different  HCI  research  threads?  CHI  researchers  were 
relatively  Internet  savvy.  Although  excited  by  the  prospects,  they 
took this change in stride. Over time, though, the nature of research,
development, and use was affected. The change was
not  disruptive  to  human  factors  and  ergonomics,  either.  The 
Web  was  initially  a  return  to  a  relatively  familiar  form-driven  in¬ 
terface  style,  and  it  was  not  a  locus  of  routine  work  for  many 
people.  However,  these  developments  had  a  seismic  impact  on 
IS,  so  that  is  where  this  section  begins. 

The  Formation  of  AIS  SIGHCI 

Computer  users  in  organizations  were  no  longer  “almost  slaves” 
devoted  to  maximizing  computer  use — screen  savers  vied  with 
solitaire  to  be  the  main  consumer  of  processor  cycles.  Embrace 
of  the  Internet  created  more  porous  organizational  boundaries. 
Employees  download  free  software  such  as  instant-messaging 
clients,  music  players,  and  weblog  tools  inside  the  firewall  de¬ 
spite  IT  concerns  about  productivity  and  security.  These  are  not 
the  high-overhead  applications  of  the  past.  Another  change  over 
time is that home use of software has reduced employee patience
with poor interactive software at work. In addition, managers
who were hands-off users in the 1980s became late
adopters  in  the  1990s  and  are  now  hands-on  early  adopters  of 
technologies  that  benefit  them. 

These  changes  affect  use,  but  the  Web  had  a  more  dramatic 
effect  on  IS  research.  Corporate  IT  departments  had  focused  on 
internal  operations,  but  suddenly  organizations  were  creating 
Web  interfaces  to  vendors  and  customers.  The  Internet  bubble 
revealed  how  little  was  understood  in  these  areas,  but  neither 
online  presence  and  services  nor  business-to-business  systems 
disappeared  when  the  bubble  burst.  Portals  proliferated:  the 
Web  became  an  essential  business  tool.  IT  professionals  tasked 
with  providing  interfaces  to  highly  discretionary  external  cus¬ 
tomers  found  themselves  in  much  the  same  place  CHI  had  been 
20  years  earlier,  whether  they  realized  it  or  (most  often)  not. 

In  2001,  the  Association  for  Information  Systems  (AIS)  es¬ 
tablished  the  Special  Interest  Group  in  Human-Computer  Inter¬ 
action  (SIGHCI).  The  founders  defined  HCI  by  citing  12  works 
by CHI researchers (Zhang et al., 2004, p. 148) and made it a
priority to bridge to CHI and the Information Science community
(Zhang, 2004, p. 2). SIGHCI’s broad charter includes a
range  of  organizational  issues,  but  published  work  focuses  on 
interface  design  for  e-commerce,  online  shopping,  online  be¬ 
havior  “especially  in  the  Internet  era,”  and  effects  of  Web-based 
interfaces  on  attitudes  and  perceptions.  Eight  of  the  first  10  pa¬ 
pers  in  SIGHCI-sponsored  journal  issues  cover  Internet  and 
Web  behavior. 


Human  Factors  and  Ergonomics 
Embraces  Cognitive  Approaches 

In  1996,  the  Human  Factors  and  Ergonomics  Society  formed  a 
new technical group, Cognitive Engineering and Decision Making.
It became the largest technical group in the society. A
decade  earlier,  this  would  have  seemed  an  unlikely  develop¬ 
ment:  Senior  human  factors  researchers  disliked  cognitive  ap¬ 
proaches,  and  it  was  in  the  CHI  field  that  cognitive  engineering 
was  being  used  in  this  sense  (Norman,  1982;  1986). 

Even  more  astonishing  would  have  been  the  fact  that  in 
2005,  human  performance  modeling  would  be  a  new,  thriving 
technical  group  in  HFES.  The  technical  group  was  started  by 
Wayne  Gray  and  Dick  Pew,  both  of  whom  participated  in  CHI  in 
the  1980s.  Human  performance  modeling  was  the  Card,  Moran, 
and  Newell  (1983)  effort  to  reform  the  discipline  of  human  fac¬ 
tors  from  the  outside.  Work  had  continued,  focused  on  expert 
performance (e.g., a special issue of HCI, Vol. 12, Number 4,
1997).  Today  the  reform  effort  is  appropriately  positioned 
within  human  factors,  focused  largely  on  nondiscretionary  use. 
See  Michael  Byrne’s  chapter  on  cognitive  architectures  for  more 
on  this  topic. 

HF&E  has  to  a  large  degree  shaped  government  funding. 
The  U.S.  National  Science  Foundation  Interactive  Systems  pro¬ 
gram — subsequently  renamed  Human-Computer  Interaction — 
was  described  as  follows: 

The Interactive Systems Program considers scientific and engineering
research oriented toward the enhancement of human-computer communications
and interactions in all modalities. These modalities include
speech/language, sound, images and, in general, any single or multiple,
sequential, or concurrent, human-computer input, output, or action.
(National  Science  Foundation,  1993) 

One  NSF  HCI  program  manager  reported  that  his  proudest 
accomplishment  was  doubling  the  already  ample  funding  for 
natural  language  understanding.  NSF  established  a  separate 
Human  Language  and  Communication  Program  in  2003,  but 
speech-and-language  research  continued  to  draw  heavy  funding 
support  in  the  HCI  and  Accessibility  programs,  and  lighter  sup¬ 
port  in  AI  and  other  NSF  programs. 

Two  subsequent  NSF  HCI  program  managers  chose  to 
emphasize  “direct-brain  interfaces”  or  “brain-computer  inter¬ 
action,”  using  brainwaves,  implants,  or  other  means.  These  pro¬ 
gram  managers  rarely  attended  CHI  conferences,  where  one  finds 
little  work  on  speech,  language,  or  direct-brain  interaction. 
Whether  or  not  these  technologies  prove  useful,  they  will  not  ap¬ 
pear  soon  in  many  homes  or  offices.  A  review  committee  in  2003 
noted  that  a  random  sample  of  NSF  HCI  grants  included  none  by 
prominent  CHI  researchers  (National  Science  Foundation,  2003). 

Human-factors  research  on  computer  use  in  general  has  dis¬ 
persed.  Within  HFES,  the  Computer  Systems  Technical  Group 
has  declined  in  membership,  but  HCI  issues  now  appear  in 
most  branches  of  human  factors,  from  telecommunications  to 
medical  systems. 


CHI  Evolves,  Embraces  Design 

The steady flow of new hardware, software features, applications,
and systems ensures that initial and early use of digital
technology  is  always  present,  is  important  to  technology  pro¬ 
ducers,  and  raises  new  research  issues.  CHI  has  tracked  this 
flow  of  innovation,  generally  picking  up  an  innovation  at  the 
point  it  attracts  a  wide  audience. 

As  an  application  matures,  use  may  become  routine.  Many 
people  now  have  little  choice  about  using  e-mail  and  word  pro¬ 
cessing,  for  example.  Such  technologies  get  less  attention  as 
CHI  directs  its  gaze  at  discretionary  use  of  the  moment:  instant 
messaging,  weblogs,  collaboration  technology,  Web  design, 
ubiquitous computing, mobile computing, social computing,
and so on. New technologies have raised new issues, such
as  privacy,  and  encouraged  new  methods,  such  as  ethnography. 
At  a  more  abstract  level,  there  is  continuity  at  CHI:  there  is  con¬ 
tinued  exploration  of  input  devices,  communication  channels, 
information-visualization techniques, and design methods.

The  growing  participation  in  an  Internet  that  seemed  to  be¬ 
come  more  reliable  and  support  higher  bandwidth  every  month 
through  the  mid-1990s  increased  the  focus  on  the  use  of  real¬ 
time  communication  technologies,  and  quasi-real-time  tech¬ 
nologies  such  as  e-mail.  The  Web  temporarily  stopped  much  of 
this,  however,  by  shifting  attention  to  less  indirect  interaction 
via  static  sites. 

The  Web  is  like  a  new  land  mass,  a  new  continent.  First  came 
the  explorers,  posting  flags  here  and  there.  Then  came  the  first 
attempts  at  settlement,  in  the  form  of  virtual  worlds  research 
and development. Few of these first pioneers survived. There
was little to do in virtual worlds, except for multiplayer games
and  simulations.  However,  in  recent  years,  people,  especially 
young  people,  are  shifting  major  portions  of  their  work  and  play 
online,  including  reliance  on  online  reference,  digital  photos, 
social  software,  preferences  for  digital  documents  and  online 
shopping,  and  multiplayer  games.  This  is  reflected  in  CHI  re¬ 
search  topics,  although  game  design  has  not  yet  attracted  the  at¬ 
tention  it  merits. 

The  Web  has  curtailed  some  research  focused  on  self-con¬ 
tained  personal  productivity  tools.  Despite  high  development 
and  maintenance  costs,  representing  knowledge  in  application 
software  was  appealing  when  external  information  resources 
were  limited.  Now  that  so  much  information  is  available  online, 
including  the  ability  to  locate  and  access  knowledgeable  people, 
static  knowledge  representation  is  less  useful.  In  contrast,  adap¬ 
tive  systems  that  build  and  maintain  local  knowledge  can  play  a 
greater  role.  Steady  progress  in  machine  learning  is  influencing 
productivity  tools — although  hyperbolic  forecasts  have  not 
disappeared. 

The  psychologists  and  computer  scientists  who  formed  CHI 
considered  interface  design  to  be  a  scientific  and  engineering 
undertaking.  Their  focus  on  performance  assumed  that  people 
eventually  choose  the  most  efficient  alternatives.  Because  human 
discretion  involves  aesthetic  preferences  and  invites  marketing 
and  nonrational  persuasion,  as  computing  costs  came  down  this 
view  could  not  be  sustained.  However,  it  held  on  longer  in  CHI 
than  in  SIGGRAPH,  where  aesthetic  appeal  motivated  much  of 
the  research.  Even  now,  the  study  of  enjoyment  in  CHI  is  labeled 
“funology”  (Blythe,  Monk,  Overbeeke,  &  Wright,  2003)  lest  some¬ 
one  think  that  the  researchers  are  too  relaxed  about  it. 

Visual  designers  participated  in  graphical  interface  research 
from  early  on.  Aaron  Marcus  began  working  full-time  on  com¬ 
puter  graphics  in  the  late  1960s.  William  Bowman’s  1968  book 
Graphic  Communication  was  a  strong  influence  on  the  Xerox 
Star,  for  which  the  designer  Norm  Cox’s  icons  were  chosen  in 
an  evaluation  described  in  Bewley  et  al.  (1983).  However,
graphic  designers  were  usually  seen  as  secondary  (Evenson, 
2005).  In  1995,  building  on  working-group  meetings  at  recent 
conferences,  SIGCHI  initiated  Designing  Interactive  Systems 
(DIS),  a  biennial  conference  drawing  some  visual  designers  and 
many  systems  designers.  In  2003,  SIGCHI,  SIGGRAPH,  and  the 
American  Institute  of  Graphic  Arts  (AIGA)  initiated  the  Design¬ 
ing  for  User  Experience  (DUX)  conference  series  that  fully  em¬ 
braces  visual  and  commercial  design. 

The  evolution  of  CHI  is  reflected  in  the  influential  contribu¬ 
tions  of  Donald  Norman.  A  cognitive  scientist  who  introduced 
the  term  cognitive  engineering,  he  presented  the  first  CHI  83 
paper.  It  defined  “User  Satisfaction  Functions”  based  on  speed 
of  use,  ease  of  learning,  required  knowledge,  and  errors.  His 
influential  1988  book  Psychology  of  Everyday  Things  (POET)  focused
on  pragmatic  usability.  Its  1990  reissue  as  Design  of
Everyday  Things  reflected  a  field  refocusing  on  invention.  Four¬ 
teen  years  later,  he  published  Emotional  Design:  Why  We  Love 
(or  Hate)  Everyday  Things,  stressing  the  role  of  aesthetics  in 
our  response  to  objects. 

Design’s  first  cousin,  marketing,  has  been  poorly  regarded  in 
the  CHI  community  (see  Marcus,  2004).  However,  website  de¬ 
sign  forces  the  issue.  Site  owners  often  wish  to  keep  users  on  a 
site,  whereas  users  may  prefer  to  escape  quickly.  Consider  supermarkets,
where  items  that  most  shoppers  want  are  positioned
far  apart,  forcing  people  to  traverse  aisles  so  other  products
can  beckon.  CHI  professionals  usually  align  themselves
with  “end  users,”  but  when  designing  for  a  site  owner,  they  face 
a  stakeholder  conflict.  This  was  not  true  in  the  past:  Designers 
of  individual  productivity  tools  had  negligible  conflict  of  interest 
with  prospective  customers.  Marketing  is  likely  to  find  a  place  in 
CHI,  perhaps  as  “brandology.” 


LOOKING  BACK:  CULTURES  AND  BRIDGES 


Despite  a  significant  common  focus  and  a  dynamic  environ¬ 
ment  with  shifting  alliances,  the  three  major  threads  of  HCI 
research — human  factors  and  ergonomics  (HF&E),  Information 
Systems  (IS),  and  Computer-Human  Interaction  (CHI) — have 
not  merged.  They  have  not  even  interacted  as  much  as  one 
might  expect — and  not  for  lack  of  trying.  The  HFS  was  a  50% 
cosponsor  of  the  first  CHI  conference.  CSCW  set  out  to  bring  to¬ 
gether  CHI  and  IS,  and  AIS  SIGHCI  has  engaged  in  another  such 
effort.  Understanding  the  obstacles  to  interaction  provides  in¬ 
sight  into  the  nature  of  these  disciplines. 

Effects  of  Varying  Discretion 

A  principal  distinction  is  that  HF&E  and  IS  arose  before  discre¬ 
tionary  hands-on  use  was  widespread,  whereas  CHI  made  that 
its  focus.  Researchers  in  the  first  two  examined  organizational
as  well  as  technical  issues;  CHI  came  only  slowly  to  consider
organizational  context.  Researchers  in  HF&E  and  IS
shared  journals;  for  example,  Benbasat  and  Dexter  (1985)  was 
published  in  Management  Science  and  cited  five  human  fac¬ 
tors  articles. 

The  difference  in  user  motivation  affected  methods.  The  psy¬ 
chologists  who  shaped  HF&E  and  CHI  were  trained  to  test  hy¬ 
potheses  about  behavior  in  laboratory  experiments.  Experi¬ 
mental  subjects  agree  to  follow  instructions  for  an  extrinsic 
reward.  This  is  a  good  model  for  nondiscretionary  use,  but  not 
for  discretionary  use.  CHI  researchers  relabeled  them  “partici¬ 
pants,”  which  sounds  volitional,  but  discovered  that  lab  findings 
require  confirmation  in  real-world  settings  more  often  than  is 
true  for  ergonomics  studies. 

Traditional  ergonomic  goals  apply — fewer  errors,  faster  per¬ 
formance,  quicker  learning,  greater  memorability,  and  being  en¬ 
joyable — but  the  emphasis  differs.  For  power-plant  operation, 
error  reduction  is  key  and  performance  enhancement  is  good. 
Other  goals  are  less  critical.  In  other  settings,  a  formal  experi¬ 
ment  is  necessary  to  show  that  a  new  interface  will  shave  a  few 
seconds  from  a  repetitive  operation. 

In  contrast,  consumers  often  respond  to  visceral  appeal  and 
initial  experience  at  the  expense  of  long-term  usability  and  utility. 
In  assessing  designs  for  mass  markets  where  catching  obvious 
problems  is  more  significant  than  striving  for  an  optimal  solution,
less-rigorous  studies  (e.g.,  discount  usability;  Nielsen,  1989)
are  adequate,  and  time-consuming  qualitative  approaches  (e.g.,
personas;  Pruitt  &  Adlin,  2006)  can  provide  a  deeper  understanding
of  users  that  can  be  drawn  upon  as  new  possibilities
arise.  Unlike  HF&E,  CHI  slowly  abandoned  its  roots  in  scientific
theory  and  engineering.  This  did  not  impress  rigorously
experimental  HF&E  or  theory-oriented  IS  researchers.  The
controversial  psychological  method  of  verbal  reports,  devel¬ 
oped  by  Newell  and  Simon  (1972),  was  applied  to  design  as 
the  thinking-aloud  method  by  Clayton  Lewis  (1983;  Lewis  & 
Mack,  1982).  Perhaps  the  most  widely  used  CHI  method,  it  led 
some  researchers  in  the  other  areas  to  characterize  CHI  peo¬ 
ple  as  wanting  to  talk  about  their  experiences  rather  than  do¬ 
ing  research. 

Academic,  Linguistic,  and  Generational  Cultures 

HF&E  and  IS  also  share  the  traditional  academic  culture  of  the 
sciences:  Conferences  are  venues  for  work  in  progress,  and 
journals  are  repositories  for  polished  work.  In  contrast,  for  CHI 
and  other  U.S.  computer-science  disciplines,  conference  pro¬ 
ceedings  are  the  final  destination  for  most  work.  Journals  are 
secondary.  Outside  the  United  States,  computer  science  retains 
more  of  a  journal  focus,  perhaps  due  to  the  absence  of  profes¬ 
sional  societies  that  archive  proceedings.  This  circumstance  im¬ 
pedes  communication  across  disciplines  and  continents.  Re¬ 
searchers  in  journal  cultures  chafe  at  CHI’s  rejection  rates;  CHI 
researchers  are  dismayed  by  the  relatively  unpolished  work  at 
other  conferences. 

Accepting  only  around  20%  of  submissions,  CHI  conferences
are  selective.  With  few  exceptions,  HF&E  and  IS  conferences
have  acceptance  rates  of  50%  or  greater.  On  the  other
hand,  CHI  journals  receive  fewer  submissions  and  have  higher 
acceptance  rates.  (See  Grudin,  2005.)  Many  CHI  researchers  re¬ 
port  that  journals  are  not  relevant,  and  I  estimate  that  as  little 
as  10%  of  work  in  CHI-sponsored  conferences  reaches  journal 
publication.  In  contrast,  an  IS  track  organizer  for  the  Hawaii  International
Conference  on  System  Sciences  2004  estimated  that
80%  of  research  there  progressed  to  a  journal  (Jay  Nunamaker,
HICSS-38  presentation,  January  2004).

A  linguistic  divide  also  set  CHI  apart.  HF&E  and  IS  used  the 
term  operator;  in  IS,  user  could  be  a  manager  who  used
printed  computer  output,  not  a  hands-on  end  user.  Within  CHI, 
operator  was  demeaning,  user  was  always  hands-on,  and  end 
user  seemed  a  superfluous  affectation.  In  HF&E  and  IS,  task
analysis  generally  refers  to  an  organizational  decomposition  of
work  or,  especially  more  recently,  to  a  broader  analysis  that
includes  external  factors;  in  CHI  it  was  a  cognitive  decomposition,
such  as  breaking  a  text  editing  move  operation  into  select,
cut,  select,  paste.  In  IS,  implementation  meant  deployment  of 
a  system  in  an  organization;  in  CHI  it  was  a  synonym  for  development.
System,  application,  and  evaluation  also  had  markedly
different  connotations  or  denotations.  Significant  misunder¬ 
standings  and  rejections  resulted  from  failure  to  recognize 
these  distinctions. 

Different  perspectives  and  priorities  were  reflected  in  atti¬ 
tudes  toward  standards.  Many  HF&E  researchers  contributed  to 
the  development  of  standards,  believing  that  standards  con¬ 
tribute  to  efficiency  and  innovation.  A  widespread  CHI  view  was 
that  standards  inhibit  innovation.  There  are  elements  of  truth  in 
both  views,  and  positions  may  have  converged  as  Internet  and 
Web  standards  were  tackled.  However,  the  attitudes  reflected
the  different  demands  of  government  contracting  and  commer¬ 
cial  software  development.  Specifying  adherence  to  standards  is 
a  useful  tool  for  those  preparing  requests  for  proposals,  but 
compliance  with  standards  can  make  it  more  difficult  for  a  product
to  differentiate  itself.

A  generational  divide  also  existed.  Many  CHI  researchers 
grew  up  in  the  1960s  and  1970s,  and  did  not  appreciate  the 
HF&E  orientation  toward  military  and  government  systems,  or 
the  inability  (still  occasionally  observed)  of  HF&E  and  IS  “man- 
machine  interaction”  researchers  to  adopt  gender-neutral  ter¬ 
minology.  It  added  to  the  difficulty  of  generating  enthusiasm  for 
building  bridges  and  exploring  literatures. 


LOOKING  AHEAD:  TRAJECTORIES


The  future  of  HCI  will  be  tremendously  varied,  dynamic,  and  full 
of  surprises.  The  intent  in  this  chapter  is  to  provide  a  perspec¬ 
tive  or  framework  with  which  to  view  other  chapters  or  topics  of 
interest  and  assess  how  they  might  develop.  In  this  section,  I 
extrapolate  from  some  observations  about  the  past  and  present 
state  of  the  field. 


Discretion — Now  You  See  It,  Now  You  Don’t 

We  exercise  choice  more  at  home  than  at  work,  a  lot  when  buy¬ 
ing  online,  none  when  confronted  by  a  telephone  answering 
system,  considerable  when  young  and  healthy,  less  when  con¬ 
strained  by  injury  or  aging.  Software  that  was  discretionary  yes¬ 
terday  is  indispensable  today.  The  need  to  collaborate  forces  us 
to  adopt  common  systems  and  conventions. 

Consider  a  hypothetical  team.  In  1987,  one  member  still 
used  a  typewriter,  and  others  chose  different  word  processors. 
All  exchanged  printed  documents.  One  emphasized  by  underlining,
another  by  italicizing,  a  third  by  bolding.  In  1997,
group  members  wanted  to  share  documents  digitally,  so  they 
had  to  adopt  the  same  word  processor  and  conventions. 
Choice  was  curtailed;  it  had  to  be  exercised  collectively.  Today, 
it  suffices  to  share  documents  in  PDF  format,  so  in  2007  the 
team  is  using  different  word  processors  again.  Tomorrow  per¬ 
haps  I  could  personalize  my  view  to  see  in  italics  what  you  see 
as  bold. 

Shackel  (1997)  noted  this  progression  under  the  heading 
“From  Systems  Design  to  Interface  Usability  and  Back  Again.” 
Early  designers  focused  at  the  system  level;  operators  had  to 
cope.  When  the  PC  merged  the  roles  of  operator,  output  user, 
and  program  provider,  the  focus  shifted  to  the  human  interface 
and  choice.  Then  individual  users  again  became  components 
in  fully  networked  organizational  systems.  When  a  technology 
becomes  mission-critical,  as  e-mail  did  for  many  in  the  1990s, 
discretion  is  gone. 

The  converse  also  occurs.  Discretion  increases  when  em¬ 
ployees  download  free  software  and  demand  capabilities  they 
have  at  home.  Managers  are  less  likely  to  mandate  the  use  of 
a  technology  that  they  use  and  find  burdensome.  Even  in  the 
military,  where  language-processing  systems  appealed  to  military
officers,  the  situation  changed  when  they  became  hands-on
users:

Our  military  users  .  .  .  generally  flatly  refuse  to  use  any  system  that  re¬ 
quires  speech  recognition.  .  .  .  Over  and  over  and  over  again,  we  were 
told  ‘If  we  have  to  use  speech,  we  will  not  take  it.  I  don’t  even  want  to 
waste  my  time  talking  to  you  if  it  requires  speech.  .  .  .’  I  have  seen  gen¬ 
erals  come  out  of  using,  trying  to  use  one  of  the  speech-enabled  sys¬ 
tems  looking  really  whipped.  One  really  sad  puppy,  he  said  ‘OK,  what’s 
your  system  like,  do  I  have  to  use  speech?’  He  looked  at  me  plaintively. 
And  when  I  said  ‘No,’  his  face  lit  up,  and  he  got  so  happy.  (Forbus,
2003;  see  also  Forbus,  Usher,  &  Chapman,  et  al.,  2003) 

As  familiar  applications  become  essential  and  security  con¬ 
cerns  curtail  openness,  one  might  expect  discretion  to  recede. 
However,  Moore’s  law,  competition,  and  the  phenomenal  ease 
of  sharing  bits  seem  to  guarantee  that  a  steady  flow  of  unproven 
technologies  will  find  their  way  to  us. 


Ubiquitous  Computing,  Invisible  HCI? 

Norman  (1988)  wrote  of  “the  invisible  computer  of  the  future.” 
Like  motors,  he  speculated,  computers  would  be  present  every¬ 
where  and  visible  nowhere.  We  interact  with  clocks,  refrigera¬ 
tors,  and  cars.  Each  has  a  motor,  but  there  is  no  human-motor 
interaction  specialization.  A  decade  later,  at  the  height  of  the 
Y2K  crisis  and  the  Internet  bubble,  computers  were  more  visible 
than  ever.  We  may  always  want  a  multipurpose  display  or  two, 
but  part  of  Norman’s  vision  is  materializing.  However,  with  com¬ 
puters  embedded  everywhere,  concern  with  interaction  is  also 
everywhere. 

Perhaps  HCI,  too,  will  become  invisible  through  omnipres¬ 
ence.  Interaction  with  digital  technology  is  becoming  part  of 
everyone’s  research  and  the  three  major  HCI  fields  are  losing 
participation. 

Human  Factors  and  Ergonomics.  David  Meister,  au¬ 
thor  of  The  History  of  Human  Factors  and  Ergonomics,  stresses 
the  continuity  of  HF&E  in  the  face  of  technology  change: 

Outside  of  a  few  significant  events,  like  the  organization  of  HFS  in  1957 
or  the  publication  of  Proceedings  of  the  annual  meetings  in  1972,  there 
are  no  seminal  occurrences  ...  no  sharp  discontinuities  that  are  mem¬ 
orable.  A  scientific  discipline  like  HF  has  only  an  intellectual  history;  one 
would  hope  to  find  major  paradigm  changes  in  orientation  toward  our 
human  performance  phenomena,  but  there  is  none,  largely  because  the 
emergence  of  HF  did  not  involve  major  changes  from  pre-World  War  II 
applied  psychology.  In  an  intellectual  history,  one  has  to  look  for  ma¬ 
jor  changes  in  thinking,  and  I  have  not  been  able  to  discover  any  in 
HF.  (e-mail  to  author,  7  September  2004) 

Membership  in  the  Computer  Systems  Technical  Group  of 
HFES  has  declined  sharply,  but  technology  use  is  stressed  in 
many  technical  groups,  such  as  Cognitive  Engineering  and  De¬ 
cision  Making,  Communication,  Human  Performance  Modeling, 
Internet,  System  Development,  and  Virtual  Environment.  Nor 
can  Aging,  Medical  Systems,  or  other  technical  groups  avoid  “in¬ 
visible  computers.” 

Information  Systems.  As  IS  thrived  during  the  Y2K  crisis
and  Internet  bubble  years,  other  management-school  disciplines — finance,
marketing,  operations  research,  organizational
behavior — became  more  technically  savvy.  When  the  bubble
burst  and  enrollments  declined,  IS  was  left  with  a  less  well-de¬ 
fined  niche.  IS  research  issues,  including  HCI,  remain  signifi¬ 
cant,  but  this  cuts  two  ways.  With  the  standardization  and  out¬ 
sourcing  of  IT  functions,  Web  portals  and  business-to-business 
ties  get  more  attention.  These  bring  in  economic  and  marketing 
considerations,  making  it  easier  to  outsource  HCI  functions  to 
the  traditional  management  disciplines. 

Computer-Human  Interaction.  This  nomadic  group 
started  in  psychology  and  obtained  a  place  at  the  table  in  com¬ 
puter  science — in  some  departments.  Lacking  a  well-defined  aca¬ 
demic  niche,  CHI’s  identity  is  tied  to  its  conference,  where  par¬ 
ticipation  peaked  in  2001.  Conferences  that  are  more  specialized 
thrive.  As  technologies  appear  and  attract  a  critical  mass  at  an 
ever-increasing  pace,  researchers  may  start  new  conferences  or 
just  blog  their  findings.  Soon  after  the  Web  came  into  view,  annual
WWW  conferences  drew  papers  on  HCI  issues.  Now  we  see  conferences
on  ubiquitous  and  pervasive  computing,  agents,  design,
emerging  technologies,  and  so  on.  HCI  is  invisibly  present  in 
each.  High  computer  science  conference  rejection  rates  and  a 
new  generational  divide  could  accelerate  this  dispersion  of  effort. 


Information  Science 

In  2005,  the  first  annual  “i-Conference”  drew  deans  and  faculty 
from  information  schools  to  State  College,  Pennsylvania.  Nine¬ 
teen  universities  with  information  programs  are  listed  on  the 
Web  site  http://iconference.ist.psu.edu/content/view/23/37/.  These
schools  vary  considerably:  Some  are  transformed  library  schools, 
some  have  closer  ties  to  information  systems,  and  others  formed 
relatively  recently  focusing  on  digital  libraries  and  other  issues. 
Together  they  represent  an  interesting  development,  in  that 
they  have  drawn  HCI  researchers  from  HF&E,  IS,  and  CHI.  In¬ 
formation  science  also  has  conceptual  and  human  links  to  the 
office  information  system  thread,  which  had  information  re¬ 
trieval  and  language-processing  elements  that  have  been  part  of 
the  digital  library  research. 

Within  these  schools,  the  issues  of  academic  cultures  are  be¬ 
ing  addressed  on  almost  a  daily  basis.  It  is  too  early  to  say  how 
this  will  evolve,  but  information  science  will  be  a  significant 
player  in  HCI.  Design  and  information  are  two  foci  active  in 
2007:  Design  is  a  compensation  for  past  neglect,  and  informa¬ 
tion  is  a  new  world  opening  before  us. 


CONCLUSION:  THE  NEXT  GENERATION 


Moore’s  law  ensures  that  landscapes  will  continue  to  shift,  pro¬ 
viding  new  forms  of  interaction  to  explore  and  new  practices 
to  improve.  The  first  generation  of  computer  researchers,  de¬ 
signers,  and  users  grew  up  without  computers.  Then  came  a 
generation  who  began  using  computers  as  students;  they  entered
workplaces  and  changed  the  way  technology  was  used.
Now  a  generation  has  grown  up  with  game  consoles  and  cell 
phones,  absorbing  an  aesthetic  of  technology  design,  commu¬ 
nicating  with  IM  and  text  messaging,  developing  skills  at  search¬ 
ing,  browsing,  tagging,  synthesizing,  and  acquiring  multimedia 
authoring  talent  via  digital  cameras  and  blogs.  They  are  entering 
workplaces,  and  they  will  change  everything  once  again. 

However  it  comes  to  be  defined  and  wherever  it  is  studied, 
HCI  is  still  in  its  early  days. 


APPENDIX:  A  FEW  PERSONAL  OBSERVATIONS 


My  career  from  1973  to  1993  followed  a  common  path:  computer
programmer,  then  cognitive  psychology  student,  then  HCI  R&D,
first  in  industry  and  then  in  academia.  My  experiences
were  not  special;  I  was  one  of  many  in  a  position  to  see  and  feel 
events  and  changes.  I  describe  some  of  them  here  to  add  tex¬ 
ture  and  a  sense  of  human  impact.  My  interest  in  history  arose 
from  the  feeling  of  being  disrupted  and  swept  along  by  unexpected
forces.  My  first  try  at  understanding  was  called  “The
Computer  Reaches  Out”  (Grudin,  1990) — I  saw  computers 
slowly  reaching  into  the  world  and  changing  us  in  ways  we  did 
not  foresee. 


1970:  A  Change  in  Plans

I  am  a  student,  reading  and  believing  the  Life  magazine  article 
describing  super-human  intelligent  computers  arriving  in  sev¬ 
eral  years.  I  conclude  that  if  we  make  it  through  the  next  few 
years,  we  can  count  on  machines  to  do  all  useful  work.  Humans 
should  focus  on  what  they  enjoy.  I  shift  from  physics  to  mathematics,
from  politics  to  literature.


1973:  Three  Computing  Professions,
Only  One  of  Them  Hands-On

Looking  for  my  first  job  in  1973,  I  found  three  computer  job 
categories  in  the  Boston  Globe  classifieds:  operators,  program¬ 
mers,  and  systems  analysts.  Not  qualified  to  be  a  highly  paid  an¬ 
alyst,  I  interviewed  for  some  low-paid,  hands-on  operator  jobs, 
and  landed  a  programming  job  with  Wang  Laboratories,  which
was  then  a  small  electronics  company.  In  two  years  at  that  job,  I
never  saw  the  computer  my  programs  ran  on.  We  flowcharted
on  paper  and  coded  on  coding  sheets,  which  a  secretary  sent 
to  be  punched  and  verified.  A  van  carried  the  stack  of  cards 
20  miles  to  a  computer  center,  and  later  that  day  or  the  next  I 
got  the  printout  (e.g.,  “Error  in  Line  20”). 


1975:  Joining  the  First  Profession  of  Discretionary 
Hands-On  Users

In  1975,  the  company  acquired  a  few  teletype  terminals  with 
access  to  the  WYLBUR  line  editor,  developed  at  the  Stanford
Linear  Accelerator.  Some  programmers  chose  to  abandon  paper
and  became  hands-on  computer  users. 

1983:  A  Chilly  Reception  for  an  Early  Paper 
on  Discretion  in  Use 

My  first  HCI  publication  showed  that  people  may  choose  a 
slower  interface  for  aesthetic  or  other  reasons  even  when  fa¬ 
miliar  with  a  more  efficient  alternative  (Grudin  &  MacLean, 
1984).  A  senior  colleague  asked  us  not  to  publish  it.  He  worked 
on  improving  expert  performance  efficiency  through  cognitive 
human  performance  modeling.  He  felt  that  a  demonstration 
that  greater  efficiency  could  be  undesirable  would  be  a  distrac¬ 
tion,  saying,  “Sometimes  the  larger  enterprise  is  more  impor¬ 
tant  than  a  small  study.” 

1984:  First  Encounters  with  IS,  Human 
Factors,  and  Design 

I  returned  to  Wang  Laboratories,  now  a  large  minicomputer 
company.  I  was  influenced  there  by  another  cognitive  psychol¬ 
ogist,  Susan  Ehrlich.  She  was  in  a  marketing  research  group  and 
later  managed  the  human  factors  group.  She  introduced  me  to 
IS  literature,  which  I  found  difficult  to  understand.  I  attended 
Boston-area  chapter  meetings  of  both  the  HFS  and  SIGCHI.  I 
noted  the  cultural  differences  but  felt  CHI  could  learn  from  hu¬ 
man  factors.  I  decided  to  label  myself  human  factors  engineer, 
a  conscious-if-futile  gesture  to  counter  CHI  antipathy  toward 
human  factors.  I  drove  to  Cambridge  to  see  the  newly  released 
Macintosh.  Few  software  engineers  had  the  visual  design  skills 
that  I  realized  would  become  important,  so  at  work  I  found  in¬ 
dustrial  designers  of  the  “boxes”  who  could  be  attracted  to  soft¬ 
ware  interface  design. 

1985:  The  GUI  Shock 

In  the  early  1980s,  I  was  one  of  many  cognitive  psychologists 
working  on  command  naming.  This  was  an  important  application 
in  the  era  of  command-line  interfaces,  but  our  greater  ambition 
was  to  develop  a  comprehensive  theoretical  foundation  for  HCI. 
The  success  of  the  Mac  in  1985  curtailed  interest  in  command 
names.  For  most  of  us,  it  also  dashed  the  hope  of  developing  a 
comprehensive  psychology  of  HCI.  No  one  would  build  on  our 
past  work — a  depressing  thought.  We  had  to  choose:  Am  I  a  cog¬ 
nitive  psychologist  or  a  computer  professional? 

1986:  Beyond  “The  User”:  Group  and  Organizational  Issues

I  joined  MCC,  an  industry  research  consortium.  Between  jobs  I 
worked  on  two  papers,  each  addressing  a  different  major  chal¬ 
lenge  encountered  in  product  development.  From  1984  to 
1986,  I  had  worked  on  several  products  or  features  intended  to
support  groups  rather  than  individual  users.  These  had  not 
done  well.  Why  was  group  support  so  challenging?  In  addition, 
it  was  painfully  evident  that  organizational  structures  and  development
processes  were  badly  suited  to  interactive  software
development.  What  could  be  done  about  it?  These  two  issues 
formed  the  basis  of  much  of  my  research  for  the  next  decade 
and  made  CSCW  my  natural  home. 

1989:  Discovering  Contexts  of  Development, 
a  Major  CHI-IS  Differentiator 

I  was  invited  to  spend  two  years  at  Aarhus  University  in  Den¬ 
mark.  Within  weeks  of  arriving  in  a  country  with  little  commer¬ 
cial  software  development,  I  recognized  the  significance  of  the 
differences  in  the  conditions  that  govern  product,  in-house,  and 
contract  development  of  interactive  software,  differences  that 
shape  CHI,  IS,  and  software  engineering  perspectives.  Sorting 
this  out  led  to  my  first  library  research  into  historical  informa¬ 
tion  (Grudin,  1991).  Consulting  journals  and  magazines  long 
untouched  in  dusty  library  corridors  felt  like  wandering  through 
an  archaeological  site. 

1990:  Just  Words:  Terminology  Can  Matter

I  felt  a  premonition  in  1987  when  my  IS-oriented  colleague  Su¬ 
san  Ehrlich  titled  a  paper  “Successful  Implementation  of  Office 
Communication  Systems.”  By  implementation,  she  meant  in¬ 
troduction  into  organizations.  To  me,  implementation  was  a 
synonym  for  coding  or  development.  Sure  enough,  the  ACM  ed¬ 
itor  asked  her  to  change  implementation  to  adoption  (Ehrlich, 
1987).  What  she  called  systems,  I  called  applications.  It  left  me 
uneasy.  Would  my  dear  friend  Language  get  in  the  way? 

In  1990,  I  set  out  to  teach  HCI  at  Aarhus,  describing  the  focus
as  “user-interface  evaluation.”  My  new  colleagues  seemed  embarrassed.
Weeks  later,  a  book  written  by  one  of  them  was  published
(Bødker,  1990).  Its  first  sentence  was  a  quotation:  “Design
is  where  the  action  is,  not  evaluation.”  Now  I  was  embarrassed. 
In  their  in-house  development  world,  with  its  dogma  of  getting 
the  design  right  up  front,  development  projects  often  took 
10  years  or  more.  Evaluation  occurred  at  the  end,  when  only 
cosmetic  changes  were  possible — thus  the  negative  stigma.  In 
commercial  product  development,  evaluation  of  the  previous 
version,  competitive  products,  and  (ideally)  prototypes  was  in¬ 
tegral  to  design.  Evaluation  is  central  to  iterative  design.  It  was 
also  the  skill  experimental  psychologists  brought  to  the  table. 
We  considered  it  a  good  thing. 

Later  in  1990,  I  participated  in  a  panel  on  task  analysis  at  a
European  conference.  To  my  dismay,  this  IS-oriented  group  de¬ 
fined  task  analysis  differently  than  I  did.  To  them,  it  meant  an 
organizational  task  analysis:  tasks  as  components  in  a  broad 
work  process.  In  CHI,  it  meant  a  cognitive  task  analysis:  break¬ 
ing  a  simple  task  into  components;  for  example,  is  “move  text” 
better  thought  of  as  select-delete-paste  or  select-move-place? 

Also  in  1990,  en  route  to  give  a  job  talk  at  UC  Irvine,  my  first 
lecture  to  an  IS  audience  at  the  UCLA  Anderson  School  of  Man¬ 
agement  ended  badly  when  the  department  head  asked  a  ques¬ 
tion.  It  seemed  meaningless,  so  I  replied  cautiously.  He  rephrased 
the  question.  I  rephrased  my  response.  He  started  again,  then 
stopped  and  shrugged  as  if  to  say,  “this  fellow  is  hopeless.” 
When  I  saw  him  a  few  months  later,  he  was  visibly  astonished 
to  learn  that  his  Irvine  friends  were  hiring  me.  Later,  I  understood
the  basis  of  our  failure  to  communicate:  We  attached  different
meanings  to  the  word  users.  In  CHI,  it  means  hands-on
computer  users.  IS  users  often  never  used  a  keyboard — they
identified  software  requirements,  managed  development,  read
printed  output  and  reports,  and  so  on.  His  question  had  focused 
on  users  who  were  not  hands  on.  To  me,  all  use  was  hands  on, 
so  the  question  made  no  sense. 

A  book  could  be  written  about  the  word  user.  From  a  CHI 
perspective,  the  IS  user  was  called  “customer.”  Consultants  use 
client.  In  IS,  the  hands-on  user  was  the  end  user.  In  CHI’s  par¬ 
lance,  end  user  and  user  were  one  and  the  same — a  person  who 
both  entered  data  and  used  the  output — so  end  user  seemed  a 
superfluous  or  odd  affectation.  Human  factors  used  operator, 
which  CHI  considered  demeaning.  In  software  engineering, 
user  typically  denoted  a  tool  user,  namely  a  software  engineer. 

I  usually  consider  words  a  necessary  but  uninteresting 
medium  for  conveying  meaning,  but  these  experiences  led  to  an 
essay  on  unintended  consequences  of  language  (Grudin,  1993). 

2007:  Reflecting  on  Bridging  Efforts 

I have been a minor participant in efforts to find synergies between
CHI and human factors, office information systems, information
systems (in both CSCW and AIS SIGHCI), and Design.
A  “sixties  person,”  I  experienced  generational  and  cultural  di¬ 
vides.  Some  of  us  avoided  publishing  in  “man-machine”  con¬ 
ferences  and  journals,  and  many  of  my  MCC  colleagues  joined 
the  consortium  to  avoid  “Star  Wars”  military  projects.  We  lived 
through  disputes  between  cognitive  psychologists  and  radical 
behaviorist  or  strictly  perceptual-motor  psychologists.  Many 
CHI  researchers  shifted  from  journals  to  conferences  as  pre¬ 
ferred  publication  venues,  and  from  hypothesis-driven  research 
to  build-and-assess  research. 

Some  differences  fade  over  time,  but  terminology  continues 
to  impede  communication.  Conference  reviewers  are  often  ir¬ 
ritated  by  acronyms  used  by  authors  from  other  fields.  Writing 
a  chapter  for  an  IS-oriented  book,  my  coauthor  and  I  wrangled 
at  great  length  with  the  editor  over  terminology  (Palen  & 
Grudin,  2002). 

A  final  example:  In  this  research,  I  reviewed  the  literature 
on TAM, the model of white-collar employee perceptions of
technology  that  is  heavily  cited  in  IS  but  never  in  CHI.  I  repeat¬ 
edly  had  difficulty  using  a  search  engine  to  return  to  TAM  refer¬ 
ences.  On  the  third  such  occasion,  I  saw  why:  TAM  stands  for 
“Technology  Acceptance  Model,”  but  I  always  typed  in  “Tech¬ 
nology  Adoption  Model.”  Nondiscretionary  acceptance  vs.  dis¬ 
cretionary  adoption:  different  biases  led  to  different  terminol¬ 
ogy,  and  to  error  and  confusion. 

PERCEPTUAL-MOTOR INTERACTION:
A  BEHAVIORAL  EMPHASIS 


Many  of  us  can  still  remember  purchasing  our  first  computers  to 
be  used  for  research  purposes.  The  primary  attributes  of  these 
new  tools  were  their  utilities  in  solving  relatively  complex  math¬ 
ematical  problems  and  performing  computer-based  experi¬ 
ments.  However,  it  was  not  long  after  that  word  processing 
brought  about  the  demise  of  the  typewriter,  and  our  depart¬ 
ment  secretaries  no  longer  prepared  our  research  manuscripts 
and  reports.  It  is  interesting  to  us  that  computers  are  not  so  sub¬ 
stantively  different  from  other  tools  such  that  we  should  disre¬ 
gard  much  of  what  the  study  of  human  factors  and  experimen¬ 
tal  psychology  has  contributed  to  our  understanding  of  human 
behavior  in  simple  and  complex  systems.  Rather,  it  is  the  com¬ 
puter’s  capacity  for  displaying,  storing,  processing,  and  even 
controlling  information  that  has  led  us  to  the  point  at  which 
the manner in which we interact with such systems has
become a research area in itself.

In  our  studies  of  human-computer  interaction  (HCI),  also 
known  as  human-machine  interaction,  and  perceptual-motor 
interaction  in  general,  we  have  adopted  two  basic  theoretical 
and  analytical  frameworks  as  part  of  an  integrated  approach.  In 
the  first  framework,  we  view  perceptual-motor  interaction  in  the 
context  of  an  information-processing  model.  In  the  second 
framework,  we  have  used  analytical  tools  that  allow  detailed 
investigations  of  both  static  and  dynamic  interactions.  Our  chap¬ 
ter  in  the  previous  edition  of  this  handbook  (Chua,  Weeks,  & 
Goodman,  2003)  reviewed  both  avenues  of  research  and  their 
implications  for  HCI  with  a  particular  emphasis  on  our  work  re¬ 
garding  the  translation  of  perceptual  into  motor  space.  Much 
of  our  more  recent  research,  however,  has  explored  the  broader 
interplay  between  the  processes  of  action  and  attention.  Thus, 
in  the  present  chapter,  we  turn  our  focus  to  aspects  of  this  work 
that  we  believe  to  have  considerable  implications  for  those 
working  in  HCI. 

Human  Information  Processing 
and  Perceptual-Motor  Behavior 

The  information-processing  framework  has  traditionally  pro¬ 
vided  a  major  theoretical  and  empirical  platform  for  many  sci¬ 
entists  interested  in  perceptual-motor  behavior.  The  study  of 
perceptual-motor  behavior  within  this  framework  has  inquired 
into  such  issues  as  the  information  capacity  of  the  motor  system 
(e.g., Fitts, 1954), the attentional demands of movements (e.g.,
Posner  &  Keele,  1969),  motor  memory  (e.g.,  Adams  &  Dijkstra, 
1966),  and  processes  of  motor  learning  (e.g.,  Adams,  1971).  The 
language  of  information  processing  (e.g.,  Broadbent,  1958)  has 
provided  the  vehicle  for  discussions  of  mental  and  computa¬ 
tional  operations  of  the  cognitive  and  perceptual-motor  system 
(Posner,  1982).  Of  interest  in  the  study  of  perceptual-motor  be¬ 
havior  is  the  nature  of  the  cognitive  processes  that  underlie  per¬ 
ception  and  action. 

The  information-processing  approach  describes  the  human 
as  an  active  processor  of  information,  in  terms  that  are  now 
commonly  used  to  describe  complex  computing  mechanisms. 


An  information-processing  analysis  describes  observed  behavior 
in  terms  of  the  encoding  of  perceptual  information,  the  manner 
in  which  internal  psychological  subsystems  utilize  the  encoded 
information,  and  the  functional  organization  of  these  subsys¬ 
tems.  At  the  heart  of  the  human  cognitive  system  are  processes 
of  information  transmission,  translation,  reduction,  collation, 
storage, and retrieval (e.g., Fitts, 1964; Marteniuk, 1976;
Stelmach, 1982; Welford, 1968). Consistent with a general model
of  human  information  processing  (e.g.,  Fitts  &  Posner,  1967), 
three  basic  processes  have  been  distinguished  historically.  For 
our  purposes,  we  refer  to  these  processes  as  stimulus  identifi¬ 
cation,  response  selection,  and  response  programming.  Briefly, 
stimulus  identification  is  associated  with  processes  responsible 
for  the  perception  of  information.  Response  selection  pertains 
to  the  translation  between  stimuli  and  responses  and  the  selec¬ 
tion  of  a  response.  Response  programming  is  associated  with 
the  organization  of  the  final  output  (see  Proctor  &  Vu,  2003,  or 
the  present  volume). 

A  key  feature  of  early  models  of  information  processing  is 
the  emphasis  upon  the  cognitive  activities  that  precede  action 
(Marteniuk,  1976;  Stelmach,  1982).  From  this  perspective, 
action  is  viewed  only  as  the  end  result  of  a  complex  chain  of 
information-processing  activities  (Marteniuk,  1976).  Thus, 
chronometric  measures,  such  as  reaction  time  and  movement 
time,  as  well  as  other  global  outcome  measures,  are  often  the 
predominant  dependent  measures.  However,  even  a  cursory 
examination  of  the  literature  indicates  that  time  to  engage  a 
target  has  been  a  primary  measure  of  interest.  For  example,  a 
classic  assessment  of  perceptual-motor  behavior  in  the  context 
of  HCI  and  input  devices  was  conducted  by  Card,  English,  and 
Burr (1978; see also English, Engelbart, & Berman, 1967). Employing
measures of error and speed, Card et al. (1978) had
subjects  complete  a  cursor  positioning  task  using  four  different 
control  devices  (mouse,  joystick,  step  keys,  text  keys).  The  data 
revealed  the  now  well-known  advantage  for  the  mouse.  Of  in¬ 
terest  is  that  the  speed  measure  was  decomposed  into  “hom¬ 
ing”  time,  the  time  that  it  took  to  engage  the  control  device 
and  initiate  cursor  movement,  and  “positioning”  time,  the  time 
to  complete  the  cursor  movement.  Although  the  mouse  was 
actually  the  poorest  device  in  terms  of  the  homing  time  mea¬ 
sure,  the  advantage  in  positioning  time  produced  the  faster 
overall  time.  That  these  researchers  sought  to  glean  more  in¬ 
formation  from  the  time  measure  acknowledges  the  impor¬ 
tance  of  the  movement  itself  in  perceptual-motor  interactions 
such  as  these. 

The  fact  that  various  pointing  devices  depend  on  hand 
movement to control cursor movement has led to the emphasis
that researchers in HCI have placed on Fitts’ law (Fitts, 1954)
as a predictive model of time to engage a target. The law
predicts pointing (movement) time as a function of the distance
to and width of the target: to maintain a given level of
accuracy, movement time must increase as the distance of the
movement increases and/or the width of the target decreases.
The impact of Fitts’ law is most evident by its inclusion
in  the  battery  of  tests  to  evaluate  computer-pointing  devices  in 
ISO  9241-9.  We  argue  that  there  are  a  number  of  important  lim¬ 
itations  to  an  exclusive  reliance  on  Fitts’  law  in  this  context. 
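
The relation described above can be sketched in a few lines of code. The example below is only an illustration: it uses the formulation from Fitts (1954), in which the index of difficulty is log2(2D/W), and the function names and the constants a and b are hypothetical placeholders. In practice, a and b would be estimated by regression for each device and group of users.

```python
import math


def fitts_index_of_difficulty(distance, width):
    """Index of difficulty in bits, using the log2(2D / W) form."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds: MT = a + b * ID.

    a (intercept) and b (slope) are assumed values for illustration only.
    """
    return a + b * fitts_index_of_difficulty(distance, width)


# A near, large target versus a far, small target (same units for D and W).
near_large = fitts_movement_time(distance=100, width=50)  # ID = 2 bits
far_small = fitts_movement_time(distance=400, width=25)   # ID = 5 bits
print(near_large, far_small)
```

With any positive slope b, the far, small target yields the longer predicted movement time, which is the trade-off between distance, width, and time that the law expresses.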

First,  although  the  law  predicts  movement  time,  it  does  this 
based on distance and target size. Consequently, it does not
allow for determining what other factors may influence movement
time. Specifically, Fitts’ law is often based on a movement
to  a  single  target  at  any  given  time  (although  it  was  originally  de¬ 
veloped  using  reciprocal  movements  between  two  targets). 
However,  in  most  HCI  and  graphical  user  interface  (GUI)  con¬ 
texts,  there  is  an  array  of  potential  targets  that  can  be  engaged 
by  an  operator.  As  we  will  discuss  later  in  this  chapter,  the  in¬ 
fluence  of  these  distracting  nontarget  stimuli  on  both  the  tem¬ 
poral  and  physical  characteristics  of  the  movements  to  the  im¬ 
perative  target  can  be  significant. 

Second,  we  suggest  that  the  emphasis  on  Fitts’  law  has  di¬ 
verted  attention  from  the  fact  that  cognitive  processes  involving 
the  selection  of  a  potential  target  from  an  array  are  an  impor¬ 
tant,  and  time  consuming,  information  processing  activity  that 
must  precede  movement  to  that  target.  For  example,  the  Hick- 
Hyman  law  (Hick,  1952;  Hyman,  1953)  predicts  the  decision 
time  required  to  select  a  target  response  from  a  set  of  potential 
responses — where  the  amount  of  time  required  to  choose  the 
correct response increases with the number of possible alternative
responses. What is important to understand is that the
two  laws  work  independently  to  determine  the  total  time  it 
takes  for  an  operator  to  acquire  the  desired  location.  In  one 
instance,  an  operator  may  choose  to  complete  the  decision¬ 
making  and  movement  components  sequentially.  Under  these 
conditions,  the  total  time  to  complete  the  task  will  be  the  sum 
of  the  times  predicted  by  the  Hick-Hyman  and  Fitts’  laws.  Alter¬ 
natively,  an  operator  may  opt  to  make  a  general  movement  that 
is  an  approximate  average  of  the  possible  responses  and  then 
select  the  final  target  destination  while  the  movement  is  being 
completed.  Under  such  conditions,  Hoffman  and  Lim  (1997)  re¬ 
ported  interference  between  the  decision  and  movement  com¬ 
ponent  that  was  dependent  on  their  respective  difficulties  (see 
also  Meegan  &  Tipper,  1998). 
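
The sequential case described above, in which the decision is completed before the movement begins, can be sketched as a simple sum. The code below is a hedged illustration: it assumes equally likely alternatives and the common log2(N) form of the Hick-Hyman law, and all function names and constants are hypothetical rather than empirical estimates. It does not model the concurrent case, in which the two components interfere.

```python
import math


def hick_hyman_time(n_alternatives, a=0.2, b=0.15):
    """Decision time in seconds for N equally likely alternatives.

    Uses RT = a + b * log2(N); a and b are assumed placeholder constants.
    """
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return a + b * math.log2(n_alternatives)


def fitts_time(distance, width, a=0.1, b=0.15):
    """Movement time in seconds, MT = a + b * log2(2D / W)."""
    return a + b * math.log2(2.0 * distance / width)


def sequential_total_time(n_alternatives, distance, width):
    """Total time when decision and movement are carried out in sequence."""
    return hick_hyman_time(n_alternatives) + fitts_time(distance, width)


# Same movement, but the target is chosen from 2 versus 8 alternatives.
two_targets = sequential_total_time(2, distance=200, width=50)
eight_targets = sequential_total_time(8, distance=200, width=50)
print(two_targets, eight_targets)
```

Because the movement parameters are identical in both calls, the difference between the two totals comes entirely from the decision component.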

Finally,  although  Fitts’  law  predicts  movement  time  given  a 
set  of  movement  parameters,  it  does  not  actually  reveal  much 
about the underlying movement itself. Indeed, considerable research
effort has been directed toward revealing the movement
processes  that  give  rise  to  Fitts’  law.  For  example,  theoretical 
models  of  limb  control  have  been  forwarded  that  propose  that 
Fitts’ law emerges as a result of multiple submovements (e.g.,
Crossman & Goodeve, 1963/1983), or as a function of both initial
movement impulse variability and subsequent corrective
processes late in the movement (Meyer, Abrams, Kornblum,
Wright, & Smith, 1988). These models highlight the importance
of  conducting  detailed  examinations  of  movements  themselves 
as  a  necessary  complement  to  chronometric  explorations. 

For  these  reasons,  HCI  situations  that  involve  dynamic  per¬ 
ceptual-motor  interactions  may  not  be  best  indexed  merely  by 
chronometric methods (cf. Card et al., 1978). Indeed, as HCI
moves beyond the simple key press interfaces that are characteristic
of early systems to include virtual and augmented reality,
teleoperation, gestural, and haptic interfaces, among others,
the dynamic nature of perceptual-motor interactions is
even more evident. Consequently, assessment of the actual
movement  required  to  engage  such  interfaces  would  be  more 
revealing. 

To supplement chronometric explorations of basic perceptual-motor
interactions, motor behavior researchers have also
advocated a movement-process approach (Kelso, 1982). The argument
is that, in order to understand the nature of movement
organization  and  control,  analyses  should  also  encompass  the 
movement  itself,  and  not  just  the  activities  preceding  it  (e.g., 
Kelso,  1982;  1995;  Marteniuk,  MacKenzie,  &  Leavitt,  1988). 
Thus, investigators have examined the kinematics of movements
in attempts to further understand the underlying organization
involved (e.g., Brooks, 1974; Chua & Elliott, 1993;
Elliott, Carson, Goodman, & Chua, 1991; Kelso, Southard, &
Goodman, 1979; MacKenzie, Marteniuk, Dugas, Liske, & Eickmeier,
1987; Marteniuk, MacKenzie, Jeannerod, Athenes, &
Dugas, 1987). The relevance of this approach will become apparent
in later sections.
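
As a rough illustration of what a kinematic analysis adds to a single movement-time value, the sketch below derives a velocity profile from sampled positions and reports peak speed and the time at which it occurs. It is not the procedure of any study cited above: the central-difference estimate, the function names, and the sample data are all assumptions made for the example.

```python
def velocity_profile(positions, sample_interval):
    """Central-difference velocity estimates for the interior samples."""
    return [
        (positions[i + 1] - positions[i - 1]) / (2.0 * sample_interval)
        for i in range(1, len(positions) - 1)
    ]


def peak_velocity(positions, sample_interval):
    """Return (peak speed, time of peak in seconds from the first sample)."""
    speeds = [abs(v) for v in velocity_profile(positions, sample_interval)]
    peak_index = max(range(len(speeds)), key=speeds.__getitem__)
    # +1 because the first velocity estimate belongs to the second sample.
    return speeds[peak_index], (peak_index + 1) * sample_interval


# Hypothetical cursor positions (mm) sampled every 10 ms during one movement.
samples = [0.0, 1.0, 4.0, 11.0, 19.0, 25.0, 28.0, 30.0, 30.0]
peak, time_to_peak = peak_velocity(samples, sample_interval=0.01)
print(peak, time_to_peak)
```

Two movements with the same total duration can differ in when peak velocity occurs and in how much time is spent decelerating, which is the kind of information a chronometric measure alone does not provide.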

Translation,  Coding,  and  Mapping 

As outlined above, the general model of human information
processing (e.g., Fitts & Posner, 1967) distinguishes three basic
processes: stimulus identification, response selection, and response
programming. While stimulus identification and response
programming are functions of stimulus and response
properties, respectively, response selection is associated with
the translation between stimuli and responses (Welford, 1968).

Translation  is  the  seat  of  the  human  “interface”  between  per¬ 
ception  and  action.  Moreover,  the  effectiveness  of  translation 
processes  at  this  interface  is  influenced  to  a  large  extent  by  the 
relation  between  perceptual  inputs  (e.g.,  stimuli)  and  motor 
outputs (e.g., responses). Since the seminal work of Fitts and
colleagues (Fitts & Seeger, 1953; Fitts & Deininger, 1954), it has
been  repeatedly  demonstrated  that  errors  and  choice  reaction 
times  to  stimuli  in  a  spatial  array  decrease  when  the  stimuli  are 
mapped  onto  responses  in  a  spatially  compatible  manner.  Fitts 
and  Seeger  (1953)  referred  to  this  finding  as  stimulus-response 
(S-R)  compatibility  and  ascribed  it  to  cognitive  codes  associated 
with the spatial locations of elements in the stimulus and response
arrays. Presumably, it is the degree of coding and recoding
required to map the locations of stimulus and response elements
that determines the speed and accuracy of translation
and thus response selection (e.g., Wallace, 1971).

The  relevance  of  studies  of  S-R  compatibility  to  the  domain 
of  human  factors  engineering  is  paramount.  It  is  now  well  un¬ 
derstood  that  the  design  of  an  optimal  human-machine  inter¬ 
face  in  which  effective  S-R  translation  facilitates  fast  and  accurate 
responses  is  largely  determined  by  the  manner  in  which  stimu¬ 
lus  and  response  arrays  are  arranged  and  mapped  onto  each 
other (e.g., Bayerl, Millen, & Lewis, 1988; Chapanis & Lindenbaum,
1959; Proctor & Van Zandt, 1994). As users, we experience
the recalibrating of perceptual-motor space when we take
hold  of  the  mouse  and  move  it  in  a  fairly  random  pattern  when 
we  interact  with  a  computer  for  the  first  time.  Presumably,  what 
we  are  doing  here  is  attempting  to  calibrate  our  actual  move¬ 
ments  to  the  resulting  virtual  movements  of  the  cursor  on  the 
screen.  Thus,  for  optimal  efficiency  of  functioning,  it  seems 
imperative  that  the  system  is  designed  to  require  as  little  re¬ 
calibration  as  possible.  Again,  our  contribution  to  the  previ¬ 
ous  edition  of  this  handbook  reviews  our  work  in  the  area  of 
stimulus-response  translation  and  the  implications  of  this  work 
for HCI (Chua et al., 2003). We encourage those who are more
interested  in  these  issues  to  read  that  chapter. 

PERCEPTUAL-MOTOR INTERACTION:
ATTENTION  AND  PERFORMANCE 


The vast literature on selective attention and its role in the filtering
of target from nontarget information (e.g., Cherry, 1953;
Treisman, 1964a, 1964b, 1986; Deutsch & Deutsch, 1963; Treisman
& Gelade, 1980) has no doubt been informative in the resolution
of issues in HCI pertaining to stimulus displays and inputs
(e.g., the use of color and sound). However, attention
should  not  be  thought  of  as  a  unitary  function,  but  rather  as  a 
set  of  information  processing  activities  that  are  important  for 
perceptual,  cognitive,  and  motor  skills.  Indeed,  the  evolution 
of  HCI  into  the  realm  of  augmented  reality,  teleoperation,  ges¬ 
tural  interfaces,  and  other  areas  that  highlight  the  importance  of 
dynamic  perceptual-motor  interactions,  necessitates  a  greater 
consideration  of  the  role  of  attention  in  the  selection  and  exe¬ 
cution  of  action.  Recent  developments  in  the  study  of  how  se¬ 
lective  attention  mediates  perception  and  action  and,  in  turn, 
how  intended  actions  influence  attentional  processes,  are  poised 
to  make  just  such  a  contribution  to  HCI.  We  will  now  turn  to  a 
review  of  these  developments  and  some  thoughts  on  their  po¬ 
tential  relevance  to  HCI. 


Attention 

We  are  all  familiar  with  the  concept  of  attention  on  a  phenome¬ 
nological  basis.  Even  our  parents,  who  likely  never  formally 
studied  cognition,  demonstrated  their  understanding  of  the  es¬ 
sential  characteristics  of  attention  when  they  directed  us  to  pay 
attention  when  we  were  daydreaming  or  otherwise  not  doing 
what  was  asked.  They  knew  that  humans,  like  computers,  have 
a  limited  capacity  to  process  information  in  that  we  can  only  re¬ 
ceive,  interpret,  and  act  upon  a  fixed  amount  of  information  at 
any  given  moment.  As  such,  they  knew  that  any  additional,  non¬ 
task  processing  would  disrupt  the  performance  of  our  goal  task, 
be  it  homework,  cleaning,  or  listening  to  their  lectures.  But 
what  is  attention?  What  does  it  mean  to  pay  attention?  What  in¬ 
fluences  the  direction  of  our  attention?  The  answers  to  these 
questions  are  fundamental  to  understanding  how  we  interact 
with  our  environment.  Thus,  it  is  paramount  for  those  who  are 
involved  in  the  design  of  HCI  to  consider  the  characteristics  of 
attention  and  its  interactive  relationship  with  action  planning. 

Characteristics  of  Attention 

Attention  is  the  collection  of  processes  that  allow  us  to  ded¬ 
icate  our  limited  information  processing  capacity  to  the  pur¬ 
poseful  (cognitive)  manipulation  of  a  subset  of  available  infor¬ 
mation.  Stated  another  way,  attention  is  the  process  through 
which information enters into working memory and achieves the
level of consciousness. There are three important characteristics
of attention: (a) attention is selective and allows only a specific
subset of information to enter the limited processing system;
(b) the focus of attention can be shifted from one source of
information to another; and (c) attention can be divided such
that, within certain limitations, one may selectively attend to
more than one source of information at a time. The well-known
“cocktail  party”  phenomenon  (Cherry,  1953)  effectively  demon¬ 
strates  these  characteristics. 

Picture  yourself  at  the  last  busy  party  or  poster  session  you 
attended  where  there  was  any  number  of  conversations  contin¬ 
uing  simultaneously.  You  know  from  your  own  experience  that 
you  are  able  to  filter  out  other  conversations  and  selectively  at¬ 
tend  to  the  single  conversation  in  which  you  are  primarily  en¬ 
gaged.  You  also  know  that  there  are  times  when  your  attention 
is  drawn  to  a  secondary  conversation  that  is  continuing  nearby. 
These  shifts  of  attention  can  occur  automatically,  especially  if 
you  hear  your  name  dropped  in  the  second  conversation,  or 
voluntarily, especially when your primary conversation is boring.
Finally, you know that you are able to divide your attention and
follow both conversations simultaneously. However, although
you are able to keep track of each discussion simultaneously,
you  will  note  that  your  understanding  and  contributions  to  your 
primary  conversation  diminish  as  you  dedicate  more  and  more 
of  your  attentional  resources  to  the  secondary  conversation. 
The  diminishing  performance  in  your  primary  conversation  is, 
of  course,  an  indication  that  the  desired  amount  of  information 
processing  has  exceeded  your  limited  capacity. 

What  does  the  “cocktail  party”  phenomenon  tell  us  about 
designing  HCI  environments?  The  obvious  implication  is  that,  in 
order  to  facilitate  the  success  of  the  performer,  the  HCI  de¬ 
signer  must  be  concerned  about  limiting  the  stress  on  the  indi¬ 
viduals’  information  processing  systems  by  (a)  creating  inter¬ 
faces  that  assist  in  the  selection  of  the  most  appropriate 
information;  (b)  being  knowledgeable  about  the  types  of  atten¬ 
tion  shifts  and  about  when  (or  when  not)  to  use  them;  and 
(c) understanding that, when attention must be divided amongst
a series of tasks, each of these tasks should be designed to
facilitate  automatic  performance  so  as  to  avoid  conflicts  in  the 
division  of  our  limited  capacity  and  preserve  task  performance. 
While  these  suggestions  seem  like  statements  of  the  obvious, 
the remainder of the chapter will delve deeper into these general
characteristics and highlight situations in which some aspects
of design might not be as intuitive as they seem. Because
vision  is  the  dominant  modality  of  information  transfer  in  HCI, 
we  will  concentrate  our  discussion  on  visual  selective  attention. 
It  should  be  noted,  however,  that  there  is  a  growing  literature 
on  cross-modal  influences  on  attention,  especially  visual-audi¬ 
tory  system  interactions  (e.g.,  Spence,  Lloyd,  McGlone,  Nichols, 
&  Driver,  2000),  that  will  be  relevant  in  the  near  future. 

Shifts  and  Coordinate  Systems  of  Attention 

Structural analyses of the retinal (photosensitive) surface
of the eye have revealed two distinct receiving areas: the fovea
and  the  perifoveal  (peripheral)  areas.  The  fovea  is  a  relatively 
small  area  (about  two  to  three  degrees  of  visual  angle)  near  the 
center  of  the  retina,  which  has  the  highest  concentration  of 
color-sensitive  cone  cells.  It  is  this  high  concentration  of  color- 
sensitive  cells  that  provides  the  rich,  detailed  information  that 
we  typically  use  to  identify  objects.  There  are  several  important 
consequences  of  this  structural  and  functional  arrangement. 
First, because of the fovea’s pivotal role in object identification
and the importance of object identification for the planning of
action and many other cognitive processes, visual attention is
typically  dedicated  to  the  information  received  by  the  fovea. 
Second,  because  the  fovea  is  such  a  small  portion  of  the  eye, 
we  are  unable  to  derive  a  detailed  representation  of  the  envi¬ 
ronment  from  a  single  fixation.  As  a  result,  it  is  necessary  to  con¬ 
stantly  move  information  from  objects  in  the  environment  onto 
the  fovea  by  rotating  the  eye  rapidly  and  accurately.  These  rapid 
eye  movements  are  known  as  saccadic  eye  movements.  Because 
of  the  tight  link  between  the  location  of  visual  attention  and  sac¬ 
cadic  eye  movements,  these  rapid  eye  movements  are  referred 
to  as  overt  shifts  of  attention. 

Although  visual  attention  is  typically  dedicated  to  foveal  in¬ 
formation,  it  must  be  remembered  that  the  perifoveal  retinal  sur¬ 
face  also  contains  color-sensitive  cells  and,  as  such,  is  able  to  pro¬ 
vide  details  about  objects.  A  covert  shift  of  attention  refers  to  any 
situation  in  which  attention  is  being  dedicated  to  a  nonfoveated 
area  of  space.  Covert  shifts  of  attention  are  employed  when  the 
individual  wants  or  needs  to  maintain  the  fovea  on  a  particular 
object  while  continuing  to  scan  the  remaining  environment  for 
other  stimuli.  Covert  shifts  of  attention  also  occur  immediately 
prior  to  the  onset  of  an  overt  shift  of  attention  or  other  type  of 
action (e.g., Shepherd, Findlay, & Hockey, 1986). For this reason,
people are often able to identify stimuli at the location of
covert attention prior to the acquisition of that location by foveal
vision (i.e., overt attention) (Deubel & Schneider, 1996).

Because  attention  is  typically  dedicated  to  the  small  foveal 
subdivision  of  the  retinal  surface,  attention  is  often  considered  to 
work  as  a  spotlight  or  zoom  lens  that  constantly  scans  the  envi¬ 
ronment  (e.g.,  Eriksen  &  Eriksen,  1974).  More  often  than  not, 
however,  the  objects  that  we  attend  to  are  larger  than  the  two 
to  three  degrees  of  visual  angle  covered  by  the  fovea.  Does  this 
mean  that  the  components  of  objects  that  are  outside  of  foveal 
vision  do  not  receive  attentional  processing?  No,  in  fact  it  has 
been  repeatedly  shown  that  attention  can  work  in  an  object- 
based  coordinate  system  where  attention  actually  spreads  along 
the  full  surface  of  an  object  when  attention  is  dedicated  to  a 
small section of the object (Davis & Driver, 1997; Egly, Driver, &
Rafal, 1994; see also Duncan, 1984). These object-centered attentional
biases occur even when other objects block connecting
sections  of  the  continuous  object  (e.g.,  Pratt  &  Sekuler,  2001). 
Finally,  it  should  be  noted  that,  although  entire  objects  receive 
attentional processing, particular sections of the object often
receive preferential attentional treatment based on the
action  potential  of  the  object  (see  Attention  and  Stimulus- 
Response  Compatibility  below).  Thus,  attention  should  be  seen 
as  a  flexible  resource  allocation  instead  of  a  fixed  commodity  with 
a  rigid  structure.  The  spotlight  coding  system  is  typically  em¬ 
ployed  during  detailed  discrimination  tasks,  for  example  when 
reading  the  text  on  this  page,  whereas  object-based  coding  might 
be  more  effective  when  gaining  an  appreciation  for  the  context 
of  an  object  in  the  scene  or  the  most  interactive  surface  of  the 
object.  The  relevance  of  the  flexible,  action-dependent  nature  of 
attentional coding systems will be readdressed in later sections.

Stimulus  Characteristics  and  Shifts  of  Attention 

Both  overt  and  covert  shifts  of  attention  can  be  driven  by 
stimuli in the environment or by the will of the performer. Shifts
of attention that are driven by stimuli are known as exogenous,
or  bottom-up,  shifts  of  attention.  They  are  considered  to  be  au¬ 
tomatic  in  nature  and  thus,  for  the  most  part,  are  outside  of  cog¬ 
nitive  influences.  Exogenous  shifts  of  attention  are  typically 
caused  by  a  dynamic  change  in  the  environment  such  as  the 
sudden,  abrupt  appearance  (onset)  or  disappearance  (offset)  of 
a  stimulus  (e.g.,  Pratt  &  McAuliffe,  2001),  a  change  in  the  lumi¬ 
nance  or  color  of  a  stimulus  (e.g.,  Folk,  Remington,  &  Johnston, 
1992;  Posner,  Nissen,  &  Ogden,  1978;  Posner  &  Cohen,  1984),  or 
the abrupt onset of object motion (e.g., Abrams & Christ, 2003;
Folk,  Remington,  &  Wright,  1994).  The  effects  of  exogenous 
shifts  have  a  relatively  rapid  onset,  but  are  fairly  specific  to  the 
location  of  the  dynamic  change  and  are  transient,  typically 
reaching  their  peak  influence  around  100  milliseconds  after  the 
onset  of  the  stimulus  (Cheal  &  Lyon,  1991;  Muller  &  Rabbitt, 
1989).  From  an  evolutionary  perspective,  it  could  be  suggested 
that  these  automatic  shifts  of  attention  developed  because  such 
dynamic  changes  would  provide  important  survival  information 
such  as  the  sudden,  unexpected  appearance  of  a  predator  or 
prey.  However,  in  modern  times,  these  types  of  stimuli  can  be 
used  to  quickly  draw  one’s  attention  to  the  location  of  impor¬ 
tant  information. 

In  contrast,  performer-driven,  or  endogenous,  shifts  of  at¬ 
tention  are  under  complete  voluntary  control.  The  effects  of  en¬ 
dogenous  shifts  of  attention  take  longer  to  develop,  but  can  be 
sustained  over  a  much  longer  period  of  time  (Cheal  &  Lyon, 
1991; Muller & Rabbitt, 1989). From an HCI perspective, there
are advantages and disadvantages to the fact that shifts of attention
can  be  under  cognitive  control.  The  main  benefit  of  cognitive 
control  is  that  shifts  of  attention  can  result  from  a  wider  variety 
of  stimuli  such  as  symbolic  cues  like  arrows,  numbers,  or  words. 
In  this  way,  performers  can  be  cued  to  locations  or  objects  in  the 
scene  with  more  subtle  or  permanent  information  than  the  dy¬ 
namic  changes  that  are  required  for  exogenous  shifts.  The  main 
problem  with  endogenous  shifts  of  attention  is  that  the  act  of  in¬ 
terpreting  the  cue  requires  a  portion  of  the  limited  information 
processing  capacity  and  thus  can  interfere  with,  or  be  interfered
with  by,  concurrent  cognitive  activity  (Jonides,  1981).

Although  it  was  originally  believed  that  top-down  processes 
could  not  influence  exogenous  shifts  of  attention  (i.e.,  that
dynamic  changes  reflexively  capture  attention  regardless  of  in¬ 
tention),  Folk  et  al.  (1992)  demonstrated  that  this  is  not  always 
the  case.  The  task  in  the  Folk  et  al.  (1992)  study  was  to  identify 
a  stimulus  that  was  presented  in  one  of  four  possible  locations. 
For  some  participants,  the  target  stimulus  was  a  single  abrupt 
onset  stimulus  (the  target  appeared  in  one  location  and  nothing 
appeared  in  the  other  three  locations),  whereas  for  the  remaining
participants  the  target  stimulus  was  a  color  singleton  (a  red
stimulus  that  was  presented  at  the  same  time  as  white  stimuli 
that  appeared  in  the  other  three  possible  locations).  One- 
hundred  fifty  milliseconds  prior  to  the  onset  of  the  target,  par¬ 
ticipants  received  cue  information  at  one  of  the  possible  target 
locations.  The  cue  information  was  either  abrupt  onset  stimuli 
at  a  single  location  or  color  singleton  information.  Across  a  series
of  experiments,  Folk  et  al.  (1992)  found  that  the  cue  tended
to  slow  reaction  times  to  the  target  stimulus  when  the  cue  in¬ 
formation  was  presented  at  a  location  that  was  different  from 
where  the  target  subsequently  appeared,  indicating  that  attention
had  initially  been  exogenously  drawn  to  the  cue.  Importantly,
the  cue  stimuli  only  interfered  with  the  identification  of
the  target  stimulus  when  the  characteristics  of  cue  stimuli 
matched  the  characteristics  of  the  target  stimulus  (e.g.,  onset 
cue-onset  target  and  color  cue-color  target  conditions).  When 
the  characteristics  of  the  cue  did  not  match  the  target  stimulus 
(e.g.,  onset  cue-color  target  and  color  cue-onset  target  condi¬ 
tions),  the  location  of  the  cue  did  not  influence  reaction  times. 
Thus,  these  results  reveal  that  dynamic  changes  only  capture 
attention  when  the  performer  is  searching  for  a  dynamic  change 
stimulus.  Stated  another  way,  it  seems  that  automatic  attentional 
capture  is  dependent  upon  the  expectations  of  the  performer. 
Folk  et  al.  (1992)  suggested  that  people  create  an  attention  set 
in  which  they  establish  their  expectations  for  the  characteris¬ 
tics  of  the  target  stimulus.  Stimuli  meeting  the  established  set 
will  automatically  capture  attention,  whereas  stimuli  that  do  not 
meet  the  established  set  will  not  (see  also  Folk  et  al.,  1994).  The 
obvious  implication  of  these  results  is  that  the  most  efficient 
HCIs  will  be  those  for  which  the  designer  has  considered  per¬ 
ceptual  expectations  of  the  person  controlling  the  system.  As  we 
will  discuss  in  the  Action-Centered  Attention  section,  however, 
consideration  of  the  perceptual  components  alone  is,  at  best,  in¬ 
complete. 

Facilitation  and  inhibition  of  return.  While  it  is  the 
case  that  our  attention  can  be  endogenously  and  exogenously 
shifted  to  any  location  or  object  in  our  environment,  it  seems 
there  are  unconscious  mechanisms  that  work  to  hinder  the 
movement  of  our  attention  to  previously  investigated  locations 
and  objects.  The  existence  of  these  reflexive  mechanisms  was 
first  revealed  through  a  series  of  studies  by  Posner  and  Cohen 
(1984)  and  has  been  reliably  demonstrated  many  times  since. 
The  basic  task  in  these  studies  is  to  respond  as  quickly  as  possi¬ 
ble  following  the  onset  of  a  target  that  randomly  appears  at  one 
of  any  number  of  possible  locations.  The  presentation  of  the  tar¬ 
get  is  preceded  by  a  briefly  presented  cue  that  is  not  predictive 
of  the  target  location  (e.g.,  if  there  are  2  possible  target  loca¬ 
tions,  the  target  will  appear  at  the  location  of  the  cue  on  50% 
of  the  trials  and  at  the  uncued  location  on  the  remaining  50%  of
the  trials).  The  key  findings  of  these  studies  are  that  (a)  when 
there  is  a  short  time  interval  between  the  onset  of  the  cue  and 
the  onset  of  the  target  (less  than  200  ms),  participants  are  faster 
at  responding  to  targets  presented  at  the  cued  location  than  at 
the  uncued  location;  whereas,  (b)  when  there  is  a  longer  time 
interval  between  the  onset  of  the  cue  and  the  onset  of  the  target 
(greater  than  400-500  ms),  participants  are  faster  at  respond¬ 
ing  to  the  target  presented  at  the  uncued  location  than  at  the 
cued  location  (see  Fig.  1.1  for  some  representative  data).  It  is 
important  to  remember  that  the  reaction  times  to  cued  targets 
are  facilitated  at  short  intervals  and  inhibited  at  longer  intervals 
even  though  the  cue  has  no  predictive  relation  to  the  location  of 
the  subsequent  target.  The  earlier  facilitation  effect  is  thought  to 
arise  because  attention  has  been  exogenously  drawn  from  a 
central  fixation  point  to  the  location  of  the  cue  and  is  still  there 
when  the  target  subsequently  appears — with  attention  already 
at  the  target  location,  subsequent  target  processing  is  efficient. 
As  the  time  elapses  after  the  onset  of  the  cue,  however,  the  performer
knows  that  the  target  is  equally  likely  to  appear  at  any  location
and  so  endogenously  returns  attention  back  to  the  central
point.  It  is  suggested  that  in  moving  attention  back  to  a  central
point,  the  performer  places  an  inhibitory  code  on  the  location
of  the  cue.  This  inhibitory  code  subsequently  interferes
with  the  processing  of  the  target  when  it  appears  at  the  location  of
the  cue  and  increases  reaction  times  for  the  cued  target  relative  to
any  uncued  (i.e.,  uninhibited)  target,  leading  to  the  inhibition  of
return  (IOR)  effect.

FIGURE  1.1.  Exemplar  reaction  time  data  as  a  function  of  Target
Location  (cued  [large  dashed  black  line]  vs.  uncued  [solid
black  line])  and  Cue-Target  Onset  Asynchrony  (CTOA)  and  hypothetical
activation  levels  of  the  facilitatory  [dotted  line]  and
inhibitory  [small  dashed  line]  mechanisms  that  cause  Facilitation
and  Inhibition  of  Return.
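
The time course described above can be illustrated with a small sketch. It is a toy illustration only, not a model taken from the studies cited: the exponential shapes and every parameter value are assumptions chosen so that a fast, short-lived facilitatory process and a slower, longer-lasting inhibitory process produce a benefit for cued targets at short cue-target intervals and a cost at longer ones.

```python
import math

# All shapes and parameter values below are hypothetical and chosen only
# to reproduce the qualitative pattern described in the text.

def facilitation(ctoa_ms, peak_ms=30.0, decay_ms=200.0):
    """Fast-onset, short-lived benefit for cued targets (ms of RT saved)."""
    return peak_ms * math.exp(-ctoa_ms / decay_ms)

def inhibition(ctoa_ms, scale_ms=25.0, rise_ms=500.0, decay_ms=3000.0):
    """Slow-rising, long-lasting cost for cued targets (ms of RT added)."""
    rise = 1.0 - math.exp(-ctoa_ms / rise_ms)
    return scale_ms * rise * math.exp(-ctoa_ms / decay_ms)

def cueing_effect(ctoa_ms):
    """Cued RT minus uncued RT: negative = facilitation, positive = IOR."""
    return inhibition(ctoa_ms) - facilitation(ctoa_ms)

if __name__ == "__main__":
    for ctoa in (50, 100, 200, 500, 1000, 3000):
        print(ctoa, round(cueing_effect(ctoa), 1))
```

With these assumed values, the sign of the cueing effect changes from negative to positive somewhere between 200 and 300 ms; real crossover points depend on the task and should be taken from data rather than from this sketch.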

The  twenty  plus  years  of  research  on  these  phenomena  has 
revealed  many  characteristics  of  the  mechanisms  of  the  attention
system  that  are  relevant  for  HCI.  First  is  the  time  course  of  the 
development  of  the  mechanisms  underlying  these  facilitation 
and  IOR  effects.  It  is  thought  that  the  mechanism  of  facilitation
has  a  very  short  onset  and  a  short  life,  whereas  the  mechanism  of
inhibition  has  a  longer  latency  but  has  a  much  longer  lasting  influence
(up  to  three  to  four  seconds;  see  Fig.  1.1).  Thus,  if  a  designer
intends  to  use  a  dynamic  environment  in  which  irrelevant,
but  perhaps  aesthetic,  stimuli  constantly  appear,  disappear,
or  move,  then  it  is  important  to  realize  that  the  users’  perfor¬ 
mance  might  be  negatively  influenced  because  of  the  inadver¬ 
tent  facilitation  of  the  identification  of  nontarget  information  or 
the  inhibition  of  the  identification  of  target  information  depend¬ 
ing  on  the  spatiotemporal  relationship  between  the  relevant  and 
irrelevant  information.  Alternatively,  video  game  designers  could 
exploit  these  effects  to  suit  their  purpose  when  they  want  to  fa¬ 
cilitate  or  hinder  the  performance  of  a  gamer  in  a  certain  situa¬ 
tion  as  it  has  been  shown  that  even  experienced  video  game 
players  demonstrate  facilitation  and  IOR  effects  of  similar  magnitude
to  novice  gamers  (Castel,  Drummond,  &  Pratt,  2005).

The  second  important  feature  is  that  facilitation  and  IOR
only  occur  when  attention  has  been  shifted  to  the  location  of 
the  cue.  Thus,  if  the  designer  is  using  cues  that  typically  only 
produce  endogenous  shifts  of  attention  (e.g.,  by  using  a  sym¬ 
bolic  cue  such  as  an  arrow  or  word  that  indicates  a  particular 
location),  then  reaction  times  will  be  similar  across  cued
and  uncued  targets  (Posner  &  Cohen,  1984).  The  lack  of  facilitation
and  IOR  following  symbolic  cues  is  thought  to  occur  because
the  participant  can  easily  ignore  the  cue  and  prevent  the
shift  of  attention  to  the  cued  location  that  activates  the  facilitatory
or  inhibitory  mechanisms.  It  is  important  to  note,  however,
that  the  mechanisms  of  IOR  are  activated  each  time  attention  has
been  shifted.  Thus,  symbolic  information  can  result  in  IOR  if  the
participant  does  shift  their  attention  to  the  cued  location  (even 
if  they  were  asked  not  to)  or  if  the  participant  is  required  to  re¬ 
spond  to  the  cue  information  (Rafal,  Calabresi,  Brennan,  & 
Sciolto,  1989;  Taylor  &  Klein,  2000).  Although  symbolic  information
presented  at  locations  that  are  outside  of  the  target  locations
does  not  typically  activate  the  possibly  detrimental  mechanisms
of  facilitation  and  IOR,  one  must  still  use  caution  when
presenting  such  information  and  be  sensitive  to  context  (e.g., 
how  similar  symbolic  cues  have  been  used  in  previous  interac¬ 
tions)  and  the  response  expectations  of  the  user. 

Finally,  as  briefly  mentioned  in  the  previous  paragraph,  it  is 
important  to  realize  that  the  inhibitory  mechanisms  of  IOR  are
also  activated  by  responding  to  a  location.  That  is,  when  people 
are  required  to  make  a  series  of  responses  to  targets  that  are  ran¬ 
domly  presented  at  one  of  a  number  of  possible  locations,  they 
are  slower  at  initiating  response  n  when  it  is  the  same  as  re¬ 
sponse  n-1  than  when  it  is  different  from  response  n-1  (Maylor  & 
Hockey,  1985;  Tremblay,  Welsh,  &  Elliott,  2005).  This  target-target 
IOR  effect  has  important  implications  for  two  reasons.  First,  if  the
designer  intends  to  require  the  user  to  complete  a  series  of
choice  responses  to  targets  that  appear  at  the  same  locations 
each  time  and  the  user  is  uncertain  as  to  where  each  target  will 
appear  (as  in  typical  interactions  with  automated  banking  ma¬ 
chines),  then  the  user  will  be  slower  at  responding  to  targets  that 
appear  successively  at  the  same  location  than  to  targets  that  ap¬ 
pear  at  new  locations.  When  it  is  also  considered  that  it  has  been 
shown  that,  in  cue-target  IOR  experiments,  inhibitory  codes
can  be  maintained  at  up  to  four  locations  simultaneously  (Tipper,
Weaver,  &  Watson,  1996;  Wright  &  Richard,  1996;  cf.  Abrams
&  Pratt,  1996),  the  designer  must  be  cautious  about  designing 
displays  that  use  similar  locations  on  successive  interactions.  The 
second  reason  that  an  understanding  of  target-target  IOR  is  important
for  HCI  is  that  we  have  recently  shown  that  target-target
IOR  effects  transfer  across  people  (Welsh,  Elliott,  Anson,
Dhillon,  Weeks,  Lyons,  et  al.,  2005;  Welsh,  Lyons,  Weeks,  Anson, 
Chua,  Mendoza,  et  al.,  in  press).  That  is,  if  two  people  are  per¬ 
forming  a  task  in  which  they  must  respond  to  a  series  of  succes¬ 
sive  targets,  then  an  individual  will  be  slower  at  returning  to  the 
previously  responded-to  location  regardless  of  whether  that  person  or
their  partner  completed  the  initial  response.  Although  the  re¬ 
search  on  social  attention  is  in  its  infancy,  the  results  of  our  work 
indicate  that  those  involved  in  the  emerging  fields  of  virtual  re¬ 
ality  or  immersive  collaborative  and  multiuser  environments 
must  be  aware  of  and  consider  how  attention  is  influenced  by  a 
variety  of  stimuli  including  the  actions  of  other  participants. 
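
The target-target comparison described above (response n at the same location as response n-1 versus at a different location) can be computed from an ordered trial log. The sketch below is only an illustration: the function name, the (location, reaction time) tuple format, and the numbers in the usage example are hypothetical placeholders, not data from any of the cited studies.

```python
from statistics import mean

def repetition_cost(trials):
    """Mean RT on location-repeat trials minus mean RT on location-switch trials.

    `trials` is a list of (location, rt_ms) tuples in presentation order.
    A positive result is the pattern expected under target-target IOR.
    """
    repeat_rts, switch_rts = [], []
    # Pair each trial with the one before it; the first trial has no n-1.
    for (prev_loc, _), (loc, rt) in zip(trials, trials[1:]):
        (repeat_rts if loc == prev_loc else switch_rts).append(rt)
    if not repeat_rts or not switch_rts:
        raise ValueError("need at least one repeat and one switch trial")
    return mean(repeat_rts) - mean(switch_rts)

if __name__ == "__main__":
    # Made-up placeholder values, used only to show the call.
    log = [("left", 400), ("left", 430), ("right", 395),
           ("right", 425), ("left", 400)]
    print(repetition_cost(log))
```

The same function could be applied to a shared log from two users taking turns, in which case the "previous" response may belong to the partner; that use is likewise an assumption about how such a log would be organized.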


Action-Centered  Attention 

The  majority  of  the  literature  reviewed  thus  far  has  involved 
experiments  that  investigated  attentional  processes  through 
tasks  that  employed  simple  or  choice  key  press  actions.  Cog¬ 
nitive  scientists  typically  use  these  arbitrary  responses  (a)  be¬ 
cause  key  press  responses  are  relatively  uncomplicated  and 
provide  simple  measures  of  performance,  namely  reaction  time 
and  error;  and  (b)  because,  by  using  a  simple  response,  the  re¬ 
searcher  assumes  that  they  have  isolated  the  perceptual  and  at¬ 
tentional  processes  of  interest  from  additional,  complex  mo¬ 
tor  programming  and  control  processes.  While  there  are 
certainly  numerous  examples  of  HCIs  in  which  the  desired  re¬ 
sponse  is  an  individual  key  press  or  series  of  key  presses,  there 
are  perhaps  as  many  situations  in  which  movements  that  are 
more  complicated  are  required.  As  HCIs  move  increasingly  into 
virtual  reality,  touch  screen,  and  other  more  complex  environ¬ 
ments,  it  will  become  more  and  more  important  to  consider 
the  ways  in  which  attention  and  motor  processes  interact. 
Thus,  it  will  become  more  and  more  critical  to  determine  if  the 
same  principles  of  attention  apply  when  more  involved  motor 
responses  are  required.  In  addition,  some  cognitive  scientists 
have  suggested  that,  because  human  attention  systems  have 
developed  through  evolution  to  acquire  the  information  re¬ 
quired  to  plan  and  control  complex  actions,  studying  attention 
under  such  constrained  response  conditions  may  actually  pro¬ 
vide  an  incomplete  or  biased  view  of  attention  (Allport,  1987; 
1993).  The  tight  link  between  attention  and  action  is  apparent 
when  one  recognizes  that  covert  shifts  of  attention  occur  prior 
to  saccadic  eye  movements  (Deubel  &  Schneider,  1996)  and 
that  overt  shifts  of  attention  are  tightly  coupled  to  manual  aim¬ 
ing  movements  (Helsen,  Elliott,  Starkes,  &  Ricker,  1998;  2000). 
Such  considerations,  in  combination  with  neuroanatomical 
studies  revealing  tight  links  between  the  attention  and  motor 
centers  (Rizzolatti,  Riggio,  &  Sheliga,  1994),  have  led  to  the  development
of  action-centered  models  of  attention  (Rizzolatti,
Riggio,  Dascola,  &  Umilta,  1987;  Tipper,  Howard,  &  Houghton, 
1999;  Welsh  &  Elliott,  2004a). 

Arguably  the  most  influential  work  in  the  development  of  the 
action-centered  models  was  the  paper  by  Tipper,  Lortie,  and 
Baylis  (1992).  Participants  in  these  studies  were  presented  with 
nine  possible  target  locations,  arranged  in  a  three-by-three  matrix,
and  were  asked  to  identify  the  location  of  a  target  stimulus
appearing  at  one  of  these  locations  while  ignoring  any  non¬ 
target  stimuli  presented  at  one  of  the  remaining  eight  locations. 
The  key  innovation  of  this  work  was  that  Tipper  and  colleagues 
(1992)  asked  participants  to  complete  a  rapid  aiming  movement 
to  the  target  location  instead  of  identifying  it  with  a  key  press. 
Consistent  with  traditional  key  press  studies,  the  presence  of  a 
distractor  was  found  to  increase  response  times  to  the  target.  Al¬ 
though  the  finding  of  distractor  interference  in  this  selective 
reaching  task  was  an  important  contribution  to  the  field  in  and 
of  itself,  the  key  discovery  was  that  the  magnitude  of  the  inter¬ 
ference  effects  caused  by  a  particular  distractor  location  was 
dependent  on  the  aiming  movement  being  completed.  Specifi¬ 
cally,  it  was  found  that  distractors  (a)  along  the  path  of  the 
movement  caused  more  interference  than  those  that  were  outside
the  path  of  the  movement  (the  proximity-to-hand  effect);
and,  (b)  ipsilateral  to  the  moving  hand  caused  more  interference
than  those  in  the  contralateral  side  of  space  (the  ipsilateral
effect).  Based  on  this  pattern  of  interference,  Tipper  et  al. 
(1992)  concluded  that  attention  and  action  are  tightly  linked 
such  that  the  distribution  of  attention  is  dependent  on  the  action
that  was  being  performed  (i.e.,  attention  was  distributed  in
an  action-centered  coordinate  system)  and  that  the  dedication 
of  attention  to  a  stimulus  evokes  a  response  to  that  stimulus. 

While  the  Tipper  et  al.  (1992)  paper  was  critical  in  initiating 
investigations  into  action-centered  attention,  more  recent  re¬ 
search  has  demonstrated  that  the  behavioral  consequences  of 
selecting  and  executing  target-directed  actions  in  the  presence 
of  action-relevant  nontarget  stimuli  extend  beyond  the  time 
taken  to  prepare  and  execute  the  movement  (e.g.,  Meegan  & 
Tipper,  1998;  Pratt  &  Abrams,  1994).  Investigations  in  our  labs 
and  others  have  revealed  that  the  actual  execution  of  the  move¬ 
ment  changes  in  the  presence  of  distractors.  For  example,  there 
are  reports  that  movements  will  deviate  towards  (Welsh,  Elliott, 
&  Weeks,  1999;  Welsh  &  Elliott,  2004a;  2005)  or  away  from 
(Howard  &  Tipper,  1997;  Tipper,  Howard,  &  Jackson,  1997; 
Welsh  &  Elliott,  2004a)  the  nontarget  stimulus.  Although  these 
effects  seem  paradoxical,  Welsh  and  Elliott  (2004a)  have  formu¬ 
lated  and  tested  (e.g.,  Welsh  &  Elliott,  2004a;  2004b;  2005)  a 
conceptual  model  that  can  account  for  these  movement  execu¬ 
tion  effects. 

The  model  of  response  activation.  Consistent  with  the 
conclusions  of  Tipper  et  al.  (1992),  Welsh  and  Elliott  (2004a) 
based  the  model  of  response  activation  on  the  premise  that  at¬ 
tention  and  action  processes  are  so  tightly  linked  that  the  dedi¬ 
cation  of  attention  to  a  particular  stimulus  automatically  initiates 
response-producing  processes  that  are  designed  to  interact  with 
that  stimulus.  Responses  are  activated  to  attended  stimuli  re¬ 
gardless  of  the  nature  of  attentional  dedication  (e.g.,  reflexive 
or  voluntary).  It  is  proposed  that  each  time  a  performer  ap¬ 
proaches  a  known  scene,  a  response  set  is  established  in  work¬ 
ing  memory  in  which  the  performer  identifies  and  maintains  the 
characteristics  of  the  expected  target  stimulus  and  the  charac¬ 
teristics  of  the  expected  response  to  that  stimulus.  Thus,  the 
response  set  in  the  model  of  response  activation  is  an  extension 
of  the  attentional  set  of  Folk  et  al.  (1992)  in  that  the  response  set
includes  the  performer’s  expectations  of  the  target  stimulus  as 
well  as  preexcited  (preprogrammed)  and/or  preinhibited  re¬ 
sponse  codes.  Each  stimulus  that  matches  the  physical  charac¬ 
teristics  established  in  the  response  set  captures  attention  and, 
as  a  result,  activates  an  independent  response  process.  Stimuli 
that  do  not  possess  at  least  some  of  the  expected  characteristics 
do  not  capture  attention  and  thus  do  not  activate  responses. 
Thus,  if  only  one  stimulus  in  the  environment  matches  the  re¬ 
sponse  set,  then  that  response  process  is  completed  unop¬ 
posed  and  the  movement  emerges  rapidly  and  in  an  unconta¬ 
minated  form.  On  the  other  hand,  under  conditions  in  which 
more  than  one  stimulus  matches  the  response  set,  multiple  re¬ 
sponse  representations  are  triggered  and  subsequently  race  one 
another  to  surpass  the  threshold  level  of  neural  activation  required
to  initiate  a  response.  It  is  important  to  note  that  this  is
not  a  winner-take-all  race  in  which  the  characteristics  of  the
winning  response  alone  determine  the  characteristics  of  the  actual
movement.  Instead,  the  characteristics  of  the  observed  movement
are  determined  by  the  activation  level  of  each  of  the  competing
responses  at  the  moment  of  movement  initiation.  In  this
way,  if  more  than  one  neural  representation  is  active  (or  if  one  is
active  and  one  is  inhibited)  at  response  initiation,  then  the
emerging  response  will  have  characteristics  of  both  responses
(or  characteristics  that  are  opposite  to  the  inhibited  response).

The  final  relevant  element  of  the  model  is  that  the  activation 
level  of  each  response  is  determined  by  at  least  three  interactive 
factors:  the  salience  of  the  stimulus  and  associated  response, 
an  independent  inhibitory  process,  and  the  time  course  of  each 
independent  process.  The  first  factor,  the  salience  or  action-rel¬ 
evancy  of  the  stimulus,  is  in  fact  the  summation  of  a  number  of 
separate  components  including  the  degree  of  attentional  capture
(based  on  the  similarity  between  the  actual  and  anticipated 
stimulus  within  the  response  set),  the  complexity  of  the  re¬ 
sponse  afforded  by  the  stimulus,  and  the  S-R  compatibility. 
When  attentional  capture  and  stimulus-response  compatibil¬ 
ity  are  maximized  and  response  complexity  minimized,  the 
salience  of  an  individual  response  is  maximized  and  the  re¬ 
sponse  to  that  stimulus  will  be  activated  rapidly.  The  second  fac¬ 
tor  that  influences  the  activation  level  of  a  response  is  an  inde¬ 
pendent  inhibitory  process  that  works  to  eliminate  nontarget 
responses.  When  this  inhibitory  mechanism  has  completed  its 
task,  the  neuronal  activity  associated  with  the  inhibited  re¬ 
sponse  will  have  been  reduced  to  below  baseline.  Thus,  in  ef¬ 
fect,  an  inhibited  response  will  add  characteristics  that  are  op¬ 
posite  to  it  to  the  formation  of  the  target  response.  The  final 
factor  that  contributes  to  the  activation  level  of  each  indepen¬ 
dent  response  process  is  the  time  course  of  the  development 
of  the  representation.  It  is  assumed  that  the  neural  representa¬ 
tions  of  each  response  do  not  develop  instantaneously  and  that 
the  inhibitory  mechanism  that  eliminates  nontarget  responses 
does  not  instantly  remove  the  undesired  neural  activity.  Instead, 
each  of  these  processes  requires  time  to  reach  an  effective  level 
and  the  time  course  of  each  response’s  development  will  be  independent
of  the  others.  For  example,  a  response  to  a  very
salient  stimulus  will  achieve  a  higher  level  of  activation  sooner 
than  a  response  to  a  stimulus  with  a  lower  saliency.  If  this  stim¬ 
ulus  of  high  salience  evokes  the  desired  response,  then  re¬ 
sponses  to  other,  less  salient  stimuli  will  cause  little  interference 
because  they  simply  do  not  reach  as  high  an  activation  level 
as  the  target  response.  In  contrast,  when  nontarget  stimuli
are  more  salient,  then  the  interference  is  much  more  severe
(Welsh  &  Elliott,  2005).  In  sum,  to  extend  the  analogy  of  the  race
model,  responses  that  run  faster  (the  responses  were  evoked  by
more  salient  stimuli),  have  a  head  start  relative  to  another  (one
stimulus  was  presented  prior  to  another),  or  have  a  shorter  distance
to  go  to  the  finish  line  (the  response  was  partially  preprogrammed
in  the  response  set)  will  achieve  a  higher  level  of
activation  and  will,  as  a  result,  contribute  more  to  the  charac¬ 
teristics  of  the  final  observable  movement  than  ones  that  are 
further  behind. 
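
The race analogy can be made concrete with a toy simulation. This is a sketch under strong simplifying assumptions and is not the formal model of Welsh and Elliott (2004a): activation is assumed to grow linearly, salience is reduced to a single growth rate, and the names `Response`, `run_race`, `threshold`, and `gain` are hypothetical. It shows only how the activation levels of competing responses at the moment of initiation could blend into the observed movement, with an active nontarget response pulling the movement toward it and an inhibited (below-baseline) one pushing it away.

```python
from dataclasses import dataclass

@dataclass
class Response:
    direction: float         # lateral endpoint of the response (arbitrary units)
    rate: float              # activation gained per ms; stands in for salience
    onset_ms: float = 0.0    # onset of the evoking stimulus (earlier = head start)
    preset: float = 0.0      # preactivation carried in the response set
    inhibition: float = 0.0  # activation removed per ms by the inhibitory process

    def level(self, t_ms: float) -> float:
        """Activation at time t_ms; negative values mean below baseline."""
        elapsed = max(0.0, t_ms - self.onset_ms)
        return self.preset + (self.rate - self.inhibition) * elapsed

def run_race(responses, threshold=100.0, gain=0.5, max_ms=2000):
    """Return (initiation time in ms, movement endpoint), or None if no response starts."""
    for t in range(max_ms + 1):
        levels = [r.level(float(t)) for r in responses]
        top = max(levels)
        if top >= threshold:
            winner = responses[levels.index(top)]
            # Each competitor shifts the endpoint in proportion to its
            # activation relative to the winner's; the winner's own term is 0.
            deviation = sum(
                gain * (lvl / top) * (r.direction - winner.direction)
                for lvl, r in zip(levels, responses)
            )
            return t, winner.direction + deviation
    return None

if __name__ == "__main__":
    target = Response(direction=0.0, rate=0.5)
    print(run_race([target]))                                          # unopposed
    print(run_race([target, Response(10.0, 0.25)]))                    # active distractor
    print(run_race([target, Response(10.0, 0.25, inhibition=0.5)]))    # inhibited distractor
```

In this simplified form the distractor changes only the movement endpoint, not the initiation time; capturing the reaction time costs reported in the literature would require additional assumptions about how competing responses slow one another.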

So,  what  implications  does  the  model  of  response  activation 
have  for  the  design  of  HCI?  In  short,  because  the  model  of  re¬ 
sponse  activation  provides  a  fairly  comprehensive  account  of 
movement  organization  in  complex  environments,  it  could  be 
used  as  the  basis  for  the  design  of  interfaces  that  consider  the 
cognitive  system  as  an  interactive  whole  as  opposed  to  separate 
units  of  attention  and  movement  organization.  One  of  the
more  obvious  implications  is  that  a  designer  should  consider
the  time  intervals  between  the  presentation  of  each  stimulus  in 
a  multiple  stimuli  set  as  this  can  have  dramatic  effects  on  the 
performer’s  ability  to  quickly  respond  to  each  stimulus  (e.g.,
psychological  refractory  period;  Telford,  1931;  Pashler,  1994) 
and  the  physical  characteristics  of  each  response  (Welsh  & 
Elliott,  2004a). 

Perhaps  more  importantly,  the  model  highlights  the  impor¬ 
tance  for  the  designer  to  consider  the  interactions  between  at¬ 
tention  and  motor  processing  because  there  are  some  situa¬ 
tions  in  which  the  transfer  from  simple  to  complex  movements 
is  straightforward,  whereas  there  are  others  that  do  not  trans¬ 
fer  at  all.  The  study  by  Bekkering  and  Pratt  (2004)  provided  a 
good  demonstration  of  a  situation  in  which  the  interaction  be¬ 
tween  attention  and  action  provides  the  same  results  in  key 
press  and  aiming  responses.  Specifically,  they  showed  that  a  dy¬ 
namic  change  on  one  location  of  an  object  facilitates  reaction 
times  for  aiming  movements  to  any  other  portion  of  the  object 
(i.e.,  attention  can  move  in  object-centered  coordinate  systems
in  aiming  responses  as  it  does  for  key  press  responses).  However,
Welsh  and  Pratt  (2005)  recently  found  that  the  degree  of  attentional
capture  by  some  dynamic  changes  is  different  when  key
press  and  spatially-directed  responses  are  required.  In  this 
study,  participants  were  asked  to  identify  the  location  of  an  on¬ 
set  or  offset  target  stimulus  while  ignoring  a  distractor  stimulus 
of  the  opposite  characteristics  (e.g.,  onset  targets  were  paired 
with  offset  distractors  and  vice  versa).  In  separate  experiments, 
participants  responded  to  the  target  stimulus  with  a  choice  key 
press  response  or  an  aiming  movement  to  the  target  location. 
The  results  indicated  that  an  onset  distractor  slowed  responding
to  an  offset  target  in  both  tasks.  Offset  distractors,  on  the
other  hand,  only  interfered  with  task  performance  when  a  key 
press  was  required.  When  participants  were  asked  to  perform 
an  aiming  movement,  an  offset  distractor  actually  caused  a  non¬ 
significant  facilitation  effect.  Similar  action-specific  interference 
effects  have  been  shown  across  pointing  and  grasping  actions 
(Bekkering  &  Neggers,  2002),  pointing  and  verbal  responses 
(Meegan  &  Tipper,  1999),  and  different  types  of  pointing  re¬ 
sponses  (Meegan  &  Tipper,  1999;  Tipper,  Meegan,  &  Howard, 
2002).  In  sum,  now  that  HCI  is  moving  into  virtual  reality  and 
other  types  of  assisted  response  devices  (voice  activated,  head 
mounted,  roller  ball,  and  eye-gaze  mouse  systems),  it  will  be¬ 
come  increasingly  important  to  consider  the  required  and/or 
anticipated  action  when  designing  HCI  environments.  Given 
our  emphasis  on  the  importance  of  considering  the  interaction 
of  attention  and  motor  processes,  we  will  explore  this  issue  in 
greater  detail  in  the  following  sections. 

Attention  and  action  requirements.  Initial  investiga¬ 
tions  into  action-centered  attention  were  focused  primarily  on 
the  influence  that  the  spatial  location  of  distractors  with  re¬ 
spect  to  the  target  had  on  the  planning  and  execution  of  ac¬ 
tion  (e.g.,  Meegan  &  Tipper,  1998;  Pratt  &  Abrams,  1994;  Tipper 
et  al.,  1992).  In  that  context,  an  action-centered  framework 
could  offer  a  useful  perspective  for  the  spatial  organization  of 
perceptual  information  presented  in  an  HCI  context.  However, 
often  the  reason  for  engaging  a  target  in  an  HCI  task  is  that  the 
target  symbolically  represents  an  outcome  or  operation  to 
be  achieved.  Indeed,  this  is  what  defines  an  icon  as  a  target:
target  features  symbolically  carry  a  meaning  that  defines  it  as
the  appropriate  target.  An  interest  in  the  application  of  the 
action-centered  model  to  human  factors  and  HCI  led  Weir, 
Weeks,  Welsh,  Elliott,  Chua,  Roy,  and  Lyons  (2003)  to  consider 
whether  or  not  distractor  effects  could  be  elicited  based  upon 
the  specific  actions  required  to  engage  a  target  and  distractor 
object.  The  question  was  whether  the  engagement  properties 
of  target  and  distractor  objects  (e.g.,  turn  or  pull)  in  a  control 
array  would  mediate  the  influence  of  the  distractor  on  the  con¬ 
trol  of  movement.  In  that  study,  participants  executed  their 
movements  on  a  control  panel  that  was  located  directly  in  front 
of  them.  On  some  trials,  the  control  panel  consisted  of  a  sin¬ 
gle  pull-knob  or  right-turn  dial  located  at  the  midline  either 
near  or  far  from  a  starting  position  located  proximal  to  the  par¬ 
ticipant.  On  other  trials,  a  second  control  device  (pull  knob  or 
dial)  was  placed  into  the  other  position  on  the  display.  If  this 
second  device  was  present,  it  served  as  a  distractor  object  and 
was  to  be  ignored.  Weir  et  al.  (2003)  found  that  the  distractor 
object  only  interfered  with  the  programming  of  the  response  to 
the  target  stimulus  when  the  distractor  afforded  a  different  re¬ 
sponse  from  the  target  response  (e.g.,  when  the  pull  knob  was 
presented  with  the  turn  dial).  These  results  suggest  that  when
moving  in  an  environment  with  distracting  stimuli  or  objects, 
competing  responses  may  be  programmed  in  parallel  and  that 
these  parallel  processes  will  only  interfere  with  one  another 
when  they  are  incompatible.  The  implication  is  that  the  termi¬ 
nal  action  required  to  engage  the  objects  in  the  environment 
is  also  important  to  the  distribution  of  attention  and  movement 
planning  and  execution. 

In  addition  to  considering  the  terminal  action  requirements, 
other  work  in  our  labs  suggests  that  the  manner  in  which  the  ac¬ 
tions  are  completed  is  an  equally  important  concern.  For  ex¬ 
ample,  the  “cluttered”  environment  of  response  buttons  em¬ 
ployed  by  researchers  interested  in  selective  reaching  struck  us 
as  being  analogous  to  the  array  of  icons  present  in  a  typical  GUI. 
In  one  study,  Lyons,  Elliott,  Ricker,  Weeks,  and  Chua  (1999)  sought
to  determine  whether  these  paradigms  could  be  imported  into 
a  “virtual”  environment  and  ultimately  serve  as  a  test  bed  for 
investigations  of  perceptual-motor  interactions  in  an  HCI  con¬ 
text.  The  task  space  in  the  Lyons  et  al.  (1999)  study  utilized  a 
three-by-three  matrix  similar  to  that  used  by  Tipper  et  al.  (1992). 
The  matrix,  made  up  of  nine  blue  circles,  was  displayed  on  a 
monitor  placed  vertically  in  front  of  the  participant.  Participants 
were  required  to  move  the  mouse  on  the  graphics  tablet,  which 
would  in  turn  move  a  cursor  on  the  monitor  in  the  desired  di¬ 
rection  toward  the  target  (red)  circle  while  ignoring  any  distrac¬ 
tor  (yellow)  circles.  The  participants  were  unable  to  view  their 
hand;  the  only  visual  feedback  of  their  progress  was  from  the 
cursor  moving  on  the  monitor.  The  graphics  tablet  allowed  the 
researchers  to  record  displacement  and  time  data  of  the  mouse 
throughout  the  trial.  In  contrast  to  previous  experiments  (e.g., 
Meegan  &  Tipper,  1998;  Tipper  et  al.,  1992),  the  presence  of  a 
distractor  had  relatively  little  influence  on  performance.  Lyons 
et  al.  (1999)  postulated  that,  in  a  task  environment  in  which 
perceptual-motor  interaction  is  less  direct  (e.g.,  using  a  mouse 
to  move  a  cursor  on  a  remote  display)  perceptual  and  motor 
workspaces  are  misaligned,  and  the  increased  translation  pro¬ 
cessing  owing  to  the  misalignment  serves  to  limit  the  impact  of 
distractor  items. 
To test this idea, Lyons et al. (1999) modified the task environment
so as to align the perceptual and motor workspaces.
The  monitor  was  turned  and  held  screen  down  inside  a  sup¬ 
port  frame.  The  same  three-by-three  matrix  was  displayed  on 
the  monitor  and  reflected  into  a  half-silvered  mirror  positioned 
above  the  graphics  tablet  allowing  for  sufficient  space  for  the 
participant  to  manipulate  the  mouse  and  move  the  cursor  to 
the  target  without  vision  of  the  hand.  With  this  configuration, 
the  stimulus  display  was  presented  and  superimposed  on  the 
same plane as the motor workspace (i.e., the graphics tablet).
Under  this  setup,  distractor  effects  emerged  and  were  consis¬ 
tent  with  an  action-centered  framework  of  attention.  Taken  to¬ 
gether,  these  findings  underscore  the  influence  of  translation 
requirements  demanded  by  relative  alignment  of  perceptual 
and  motor  workspaces.  More  importantly,  these  findings  sug¬ 
gest  that  even  relatively  innocuous  changes  to  the  layout  of 
the  task  environment  may  have  significant  impact  on  processes 
associated  with  selective  attention  in  the  mediation  of  action  in 
an  HCI  context. 

Attention  and  stimulus-response  compatibility.  Thus 
far  we  have  only  lightly  touched  on  the  issue  of  attention  and 
S-R  compatibility.  However,  the  action-centered  model  of  selec¬ 
tive  attention  clearly  advocates  the  view  that  attention  and  ac¬ 
tion  are  intimately  linked.  The  fundamental  premise  is  that  at¬ 
tention  mediates  perceptual-motor  interactions,  and  these,  in 
turn,  influence  attention.  Consistent  with  this  perspective,  the 
role  of  attention  in  the  translation  between  perceptual  inputs 
and  motor  outputs  has  also  received  considerable  interest  over 
the  past  decade.  As  discussed  above,  a  key  element  in  the  se¬ 
lection  of  an  action  is  the  translation  between  stimuli  and  re¬ 
sponses,  the  effectiveness  of  which  is  influenced  to  a  large  ex¬ 
tent  by  the  spatial  relation  between  the  stimuli  and  responses. 
The  degree  of  coding  and  recoding  required  to  map  the  loca¬ 
tions  of  stimulus  and  response  elements  has  been  proposed  to 
be a primary determinant of the speed and accuracy of translation
(e.g., Wallace, 1971). Attentional processes have been implicated
in the issue of how relative spatial stimulus information
is  coded.  Specifically,  the  orienting  of  attention  to  the  location  of 
a  stimulus  has  been  proposed  to  result  in  the  generation  of  the 
spatial  stimulus  code. 

Initial interest in the link between attention orienting and
spatial translation emerged as a result of attempts to explain
the Simon effect. The Simon effect (Simon, 1968; Simon &
Rudell,  1969),  often  considered  a  variant  of  spatial  S-R  compati¬ 
bility,  occurs  in  a  situation  in  which  a  nonspatial  stimulus  at¬ 
tribute  indicates  the  correct  response  and  the  spatial  attribute  is 
irrelevant  to  the  task.  Thus,  the  spatial  dimension  of  the  stimu¬ 
lus  is  an  irrelevant  attribute,  and  a  symbolic  stimulus  feature 
constitutes  the  relevant  attribute.  Although  the  spatial  stimulus 
attribute  is  irrelevant  to  the  task,  faster  responding  is  found 
when  the  position  of  the  stimulus  and  the  position  of  the  re¬ 
sponse  happen  to  correspond.  A  number  of  researchers  (e.g., 
Umilta  &  Nicoletti,  1992)  have  suggested  that  attentional  pro¬ 
cesses  may  be  a  unifying  link  between  the  Simon  effect  and  the 
spatial  compatibility  effect  proper.  Specifically,  the  link  between 
attention  and  action  in  these  cases  is  that  a  shift  in  attention  is 
postulated to be the mechanism that underlies the generation
of the spatial stimulus code (e.g., Nicoletti & Umilta, 1989,
1994; Proctor & Lu, 1994; Rubichi, Nicoletti, Iani, & Umilta,
1997;  Stoffer,  1991;  Stoffer  &  Umilta,  1997;  Umilta  &  Nicoletti, 
1992).  According  to  an  attention-shift  account,  when  a  stimulus 
is  presented  to  the  left  or  right  of  the  current  focus  of  atten¬ 
tion,  a  reorienting  of  attention  occurs  toward  the  location  of 
the  stimulus.  This  attention  shift  is  associated  with  the  genera¬ 
tion  of  a  spatial  code  that  specifies  the  position  of  the  stimu¬ 
lus  with  respect  to  the  last  attended  location.  If  this  spatial  stim¬ 
ulus  code  is  congruent  with  the  spatial  code  of  the  response, 
then  S-R  translation,  and  therefore  response  selection,  is  facili¬ 
tated.  If  the  two  codes  are  incongruent,  response  selection  is 
hindered. 
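
The congruence logic of the attention-shift account can be made concrete with a short sketch. The Python fragment below assumes a simple two-choice task in which each trial records the side of the stimulus, the side of the response, and the reaction time. The function name `simon_effect` and the trial values are hypothetical and invented for illustration only; they are not data from any of the studies cited above.

```python
from statistics import mean

# Hypothetical trials: (stimulus_side, response_side, reaction_time_ms).
# These values are invented for illustration and come from no study.
trials = [
    ("left", "left", 410),
    ("right", "right", 420),
    ("left", "right", 455),
    ("right", "left", 445),
]


def simon_effect(trials):
    """Mean RT on noncorresponding trials minus mean RT on corresponding trials.

    A positive value indicates faster responding when the (task-irrelevant)
    stimulus position and the response position happen to correspond.
    """
    corresponding = [rt for stim, resp, rt in trials if stim == resp]
    noncorresponding = [rt for stim, resp, rt in trials if stim != resp]
    return mean(noncorresponding) - mean(corresponding)


effect_ms = simon_effect(trials)
print(f"Simon effect: {effect_ms} ms")
```

In this sketch the spatial attribute plays no role in selecting the correct response; it is used only after the fact to sort trials into corresponding and noncorresponding sets, which mirrors the definition of the effect given above.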

Recent  work  in  our  lab  has  also  implicated  a  role  for  atten¬ 
tion  shifts  in  compatibility  effects  and  object  recognition.  In 
these  studies,  Lyons,  Weeks,  and  Chua  (2000a;  2000b)  sought 
to  examine  the  influence  of  spatial  orientation  on  the  speed  of 
object  identification.  Participants  were  presented  with  video 
images  of  common  objects  that  possessed  a  graspable  surface 
(e.g.,  a  tea  cup,  frying  pan),  and  were  instructed  to  make  a 
left  or  right  key  press  under  two  distinct  mapping  rules  de¬ 
pending  on  whether  the  object  was  in  an  upright  or  inverted 
vertical  orientation.  The  first  mapping  rule  required  partici¬ 
pants  to  respond  with  a  left  key  press  when  the  object  was  in¬ 
verted  and  a  right  key  press  when  the  object  was  upright.  The 
opposite  was  true  for  the  second  mapping  rule.  The  orienta¬ 
tion  of  the  object’s  graspable  surface  was  irrelevant  to  the  task. 
The  results  showed  that  identification  of  object  orientation 
was  facilitated  when  the  graspable  surface  of  the  object  was 
also  oriented  to  the  same  side  of  space  as  the  response  (see 
also  Tucker  &  Ellis,  1998).  In  contrast,  when  participants  were 
presented  with  objects  that  possessed  symmetrical  graspable 
surfaces  on  both  sides  (e.g.,  a  sugar  bowl  with  two  handles), 
identification  of  object  orientation  was  not  facilitated.  Lyons 
et al. (2000b) also showed that response facilitation was evident
when the stimuli consisted simply of objects that, though they
may not inherently be graspable, possessed a left-right asymmetry.
Taken together, these results were interpreted in terms
of  an  attentional  mechanism.  Specifically,  Lyons  et  al.  (2000a; 
2000b)  proposed  that  a  left-right  object  asymmetry  (e.g.,  a 
protruding  handle)  might  serve  to  capture  spatial  attention 
(cf. Tucker & Ellis, 1998). If attention is thus oriented toward
the  same  side  of  space  as  the  ensuing  action,  the  spatial  code 
associated  with  the  attention  shift  (e.g.,  see  discussion  above) 
would  lead  to  facilitation  of  the  response.  In  situations  in 
which  no  such  object  asymmetry  exists,  attentional  capture 
and  orienting  may  be  hindered,  and  as  a  result,  there  is  no  fa¬ 
cilitation  of  the  response. 

Taken  into  the  realm  of  HCI,  it  is  our  position  that  the  inter¬ 
play  between  shifts  of  attention,  spatial  compatibility,  and  object 
recognition  will  be  a  central  human  performance  factor  as  tech¬ 
nological  developments  continue  to  enhance  the  directness  of 
direct-manipulation systems (cf. Shneiderman, 1983; 1992).
Specifically,  as  interactive  environments  become  better  abstrac¬ 
tions  of  reality  with  greater  transparency  (Rutkowski,  1982),  the 
potential  influence  of  these  features  of  human  information¬ 
processing  will  likely  increase.  Thus,  it  is  somewhat  ironic  that 
the view toward virtual reality, as the solution to the problem
of creating the optimal display representation, may bring with
it  an  “unintended  consequence”  (Tenner,  1996).  Indeed,  the  op¬ 
erator  in  such  an  HCI  environment  will  be  subject  to  the  same 
constraints  that  are  present  in  everyday  life. 

The  primary  goal  of  human  factors  research  is  to  guide  tech¬ 
nological  design  in  order  to  optimize  perceptual-motor  inter¬ 
actions  between  human  operators  and  the  systems  they  use 
within  the  constraints  of  maximizing  efficiency  and  minimiz¬ 
ing  errors.  Thus,  the  design  of  machines,  tools,  interfaces,  and 
other  sorts  of  devices  utilizes  knowledge  about  the  character¬ 
istics,  capabilities,  as  well  as  limitations,  of  the  human  percep¬ 
tual-motor  system.  In  computing,  the  development  of  input 
devices  such  as  the  mouse  and  graphical  user  interfaces  was  in¬ 
tended  to  improve  human-computer  interaction.  As  technol¬ 
ogy  has  continued  to  advance,  the  relatively  simple  mouse  and 
graphical  displays  have  begun  to  give  way  to  exploration  of 
complex gestural interfaces and virtual environments. This
development may reflect, at least in part, a desire to move beyond
the artificial nature of devices such as the mouse to ones that
more closely mimic reality. Why move an arrow on a monitor
using a hand-held device to point to a displayed object when,
instead, you can reach out and interact with the object? Such an
interface would perhaps more closely reflect real-world
interactions and the seeming ease with which we interact with
our environments, but it would also be subject to the constraints
of the human system.


PERCEPTUAL-MOTOR  INTERACTION  IN 
APPLIED  TASKS:  A  FEW  EXAMPLES 


As  we  mentioned  at  the  outset  of  this  chapter,  the  evolution  of 
computers  and  computer-related  technology  has  brought  us  to 
the point at which the manner in which we interact with such
systems has become a research area in itself. Current research in
motor  behavior  and  experimental  psychology  pertaining  to  at¬ 
tention,  perception,  action,  and  spatial  cognition  is  poised  to 
make  significant  contributions  to  the  area  of  HCI.  In  addition 
to  the  continued  development  of  a  knowledge  base  of  funda¬ 
mental  information  pertaining  to  the  perceptual-motor  capa¬ 
bilities  of  the  human  user,  these  contributions  will  include  new 
theoretical  and  analytical  frameworks  that  can  guide  the  study 
of  HCI  in  various  settings.  In  this  final  section,  we  highlight  just 
a  few  specific  examples  of  HCI  situations  that  offer  a  potential 
arena  for  the  application  of  the  basic  research  that  we  have  out¬ 
lined  in  this  chapter. 

Remote  Operation  and  Endoscopic  Surgery 

Work  by  Hanna,  Shimi,  and  Cuschieri  (1998)  examined  task 
performance  of  surgeons  as  a  function  of  the  location  of  the 
image  display  used  during  endoscopic  surgical  procedures.  In 
their  study,  the  display  was  located  in  front,  to  the  left,  or  to  the 
right  of  the  surgeon.  In  addition,  the  display  was  placed  either 
at  eye  level  or  at  the  level  of  the  surgeon’s  hands.  The  sur¬ 
geons’  task  performance  was  observed  with  the  image  display 
positioned at each of these locations. Hanna et al. (1998)
showed  that  the  surgeons’  performance  was  affected  by  the 
location  of  the  display.  Performance  was  facilitated  when  the 
surgeons  were  allowed  to  view  their  actions  with  the  monitor 
positioned  in  front  and  at  the  level  of  the  immediate  workspace 
(the hands). Similar findings have also been demonstrated by
Mandryk  and  MacKenzie  (1999).  In  addition  to  the  frontal  im¬ 
age  display  location  employed  by  Hanna  et  al.  (1998),  Mandryk 
and  MacKenzie  (1999)  also  investigated  the  benefits  of  project¬ 
ing  and  superimposing  the  image  from  the  endoscopic  camera 
directly  over  the  workspace.  Their  results  showed  that  perfor¬ 
mance  was  superior  when  participants  were  initially  exposed  to 
the  superimposed  viewing  conditions.  This  finding  was  attributed 
to  the  superimposed  view  allowing  the  participants  to  better  cal¬ 
ibrate  the  display  space  with  the  workspace.  These  findings  are 
consistent with our own investigations of action-centered attention
in virtual environments (Lyons et al., 1999). We would
suggest  that  the  alignment  of  perceptual  and  motor  work¬ 
spaces  in  the  superimposed  viewing  condition  facilitated  per¬ 
formance  due  to  the  decreased  translation  requirements  de¬ 
manded  by  such  a  situation.  However,  the  findings  of  Lyons 
et  al.  (1999)  would  also  lead  us  to  suspect  that  this  alignment 
may  have  additional  implications  with  respect  to  processes  as¬ 
sociated  with  selective  attention  in  the  mediation  of  action.  Al¬ 
though  the  demands  on  perceptual-motor  translation  may  be 
reduced,  the  potential  intrusion  of  processes  related  to  selec¬ 
tive  attention  and  action  selection  may  now  surface.  Thus,  we 
would  cautiously  suggest  that  the  optimal  display  location  in 
this task environment placed fewer translation demands on the
surgeon  during  task  performance. 

In  addition  to  considering  the  orientation  of  the  workspace  of 
the  surgeon,  it  must  be  recognized  that  the  surgeons  work  with 
a  team  of  support  personnel  performing  a  variety  of  actions  all 
designed  to  achieve  a  common  goal.  Thus,  the  group  of  people 
can  be  considered  to  be  a  synergistic  unit  much  like  an  individual 
consists  of  a  synergistic  group  of  independent  processes.  This 
similarity  in  functional  structure  led  us  (Welsh  et  al.,  2005,  in 
press)  and  others  to  contemplate  whether  the  behavior  of  a 
group of people in perceptual-motor and attention tasks follows
the same principles as those that govern the behavior of an individual.
While these initial experiments provide converging evidence
that the same processes are involved in group and individual
behavior, this line of research is still in its infancy and
additional  research  is  required  to  determine  possible  conditions 
in  which  different  rules  apply.  For  example,  an  interesting  quali¬ 
fication  is  that  the  interactions  between  attention  and  action  that 
occur  when  two  humans  interact  do  not  emerge  when  a  human 
interacts  with  a  robotic  arm  (Castiello,  2003).  Thus,  given  the  dif¬ 
ferent  vantage  points  and  goals  of  each  member  of  the  surgical 
team, an important empirical and practical question will be the
manner  in  which  perceptual-motor  workspace  can  be  effectively 
optimized  to  maximize  the  success  of  the  team. 

Personal  Digital  Assistants 

Hand-held  computer  devices  (PDAs  and  other  similar  commu¬ 
nication  devices  like  the  Blackberry)  are  becoming  increasingly 
sophisticated as they become more and more popular as mobile
information  processing  and  communication  systems.  One  of 
the  very  relevant  aspects  of  the  evolving  design  of  PDAs  is  the 
incorporation  of  a  stylus  touch  screen  system.  This  design  al¬ 
lows  for  a  tremendous  flexibility  in  possible  functions  while  at 
the same time maintaining a consistent coding between real and
virtual  space.  This  latter  advantage  should  allow  for  the  straight¬ 
forward  application  of  the  principles  of  action  and  attention  dis¬ 
cussed  in  this  chapter  to  this  virtual  environment  (e.g.,  Tipper 
et  al.,  1992  vs.  Lyons  et  al.,  1999).  However,  caution  should  once 
more  be  urged  when  implementing  design  change  without  due 
consideration  of  issues  such  as  S-R  compatibility  as  it  has  been 
shown  that  novices  are  fairly  poor  judges  of  configurations  that 
facilitate  the  most  efficient  performance  (Payne,  1995;  Vu  & 
Proctor, 2003). In addition to the principles of action-centered
attention  for  lower-level  interactions  such  as  program  naviga¬ 
tion,  the  design  of  higher-order  language-based  functions  must 
also  take  into  account  the  experience  and  expectations  of  the 
user.  For  example,  Patterson  and  Lee  (2005)  found  that  partici¬ 
pants  had  greater  difficulty  in  learning  to  use  characters  from 
the  new  written  language  developed  for  some  PDAs  when  those 
characters did not resemble the characters of traditional English.
Thus, as has been highlighted many times in this chapter,
consideration of perceptual and motor expectations of the user is
necessary  for  the  efficient  use  of  these  devices. 

Eye-Gaze  vs.  Manual  Mouse  Interactions 

Based  on  our  review  of  attention  and  action,  it  is  obvious  that 
the  spatiotemporal  arrangement  of  relevant  and  nonrelevant 
stimulus  information  will  affect  the  manner  in  which  actions 
are  completed.  Although  the  work  conducted  in  our  labs  (e.g., 
Lyons  et  al.,  1999;  Welsh  et  al.,  1999;  Welsh  &  Elliott,  2004a; 
2004b)  revealed  that  manual  responses,  such  as  those  control¬ 
ling  conventional  mouse  or  touch  screen  systems,  are  affected 
by  nontarget  distracting  information,  research  from  other 
labs  has  revealed  that  eye  movements  are  similarly  affected 
by the presentation of nontarget information (e.g., Godijn &
Theeuwes, 2004). Given the intimate link between eye movements
and attention, this similarity between the action-centered
effects  on  eye  and  hand  movements  should  not  be  surprising. 
Although  the  similarities  in  the  interactions  between  attention 
and  the  manual  and  eye  movement  systems  would  initially  lead 
one  to  believe  that  environments  designed  for  manual  re¬ 
sponses  can  be  immediately  imported  into  eye-gaze  controlled 
systems,  there  are  important  differences  between  the  systems 
that  must  be  considered.  For  example,  it  has  been  reported  that 
one  of  the  fundamental  determinants  of  manual  reaction  times, 
the  number  of  possible  target  locations  (Hick-Hyman  law;  Hick, 
1952;  Hyman,  1953)  does  not  influence  eye  movement  reaction 
times (Kveraga, Boucher, & Hughes, 2002). Thus, while the
programming of eye-gaze controlled HCI systems may be much
more complicated than that of manual systems, users of eye-gaze
systems may be more efficient than users of traditional manual
(mouse) systems (Sibert & Jacob, 2000). Although eye movements
still seem to be susceptible to the goals of the manual system
(Bekkering & Neggers, 2002), additional investigation into the
viability of eye-gaze and/or interactive eye-gaze and manual
response systems is certainly warranted.
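
The Hick-Hyman law mentioned above is commonly written as RT = a + b log2(N), where N is the number of equally likely alternatives. A minimal Python sketch of that relation follows. The function name `hick_hyman_rt` and the intercept and slope values (`a_ms`, `b_ms`) are placeholders chosen for illustration, not estimates drawn from Hick (1952), Hyman (1953), or any other study.

```python
import math


def hick_hyman_rt(n_alternatives, a_ms=200.0, b_ms=150.0):
    """Predicted manual choice reaction time (ms) for n equally likely targets.

    a_ms is the intercept and b_ms the added time per bit of stimulus
    uncertainty. Both defaults are placeholder values for illustration.
    """
    if n_alternatives < 1:
        raise ValueError("n_alternatives must be at least 1")
    return a_ms + b_ms * math.log2(n_alternatives)


# Each doubling of the number of possible targets adds one bit,
# and therefore a constant b_ms, to the predicted reaction time.
for n in (1, 2, 4, 8):
    print(n, hick_hyman_rt(n))
```

Under this relation, doubling the number of icons that could be the target adds a fixed increment to manual reaction time. The report that eye movement reaction times do not show this dependence amounts to a slope near zero for the oculomotor system.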


SUMMARY 


The  field  of  HCI  offers  a  rich  environment  for  the  study  of  per¬ 
ceptual-motor  interactions.  The  design  of  effective  human-com¬ 
puter  interfaces  has  been,  and  continues  to  be,  a  significant  chal¬ 
lenge  that  demands  an  appreciation  of  the  entire  human 
perceptual-motor  system.  The  information-processing  approach 
has  provided  a  dominant  theoretical  and  empirical  framework 
for  the  study  of  perceptual-motor  behavior  in  general,  and  for 
consideration  of  issues  in  HCI  and  human  factors  in  particular. 
Texts  in  the  area  of  human  factors  and  HCI  (including  the  pre¬ 
sent  volume)  are  united  in  their  inclusion  of  chapters  or  sections 
that  pertain  to  the  topic  of  human  information  processing. 
Moreover,  the  design  of  effective  interfaces  reflects  our  knowl¬ 
edge  of  the  perceptual  (e.g.,  visual  displays,  use  of  sound,  graph¬ 
ics),  cognitive  (e.g.,  conceptual  models,  desktop  metaphors), 
and  motoric  constraints  (e.g.,  physical  design  of  input  devices, 
ergonomic  keyboards)  of  the  human  perceptual-motor  system. 

Technological  advances  have  undoubtedly  served  to  improve 
the  HCI  experience.  For  example,  we  have  progressed  beyond 
the  use  of  computer  punch  cards  and  command-line  interfaces 
to  more  complex  tools  such  as  graphical  user  interfaces,  speech 
recognition,  and  eye-gaze  control  systems.  As  HCI  has  become 
not  only  more  effective,  but  by  the  same  token  more  elaborate, 
the  importance  of  the  interaction  between  the  various  percep¬ 
tual,  cognitive,  and  motor  constraints  of  the  human  system  has 
come  to  the  forefront.  In  our  previous  chapter,  we  presented  an 
overview  of  some  topics  of  research  in  stimulus-response  com¬ 
patibility  in  perceptual-motor  interactions  that  we  believed  were 
relevant  to  HCI.  In  this  chapter,  we  have  shifted  the  focus  to  cur¬ 
rent  issues  in  the  interaction  between  attention  and  action¬ 
planning.  While  action-centered  models  will  require  additional 
development  to  describe  full-body  behavior  in  truly  complex  en¬ 
vironments  (e.g.,  negotiating  a  busy  sidewalk  or  skating  rink), 
the  settings  in  which  these  models  have  been  tested  are,  in  fact, 
very  similar  to  modern  HCI  environments.  Thus,  we  believe  that 
the relevance of this work for HCI should not be underestimated.
Clearly,  considerable  research  will  be  necessary  to  evaluate  the 
applicability  of  both  of  these  potentially  relevant  lines  of  inves¬ 
tigation  to  specific  HCI  design  problems.  Nevertheless,  the  ex¬ 
perimental  work  to  date  leads  us  to  conclude  that  the  allocation 
of  attention  carries  an  action-centered  component.  For  this  rea¬ 
son,  an  effective  interface  must  be  sensitive  to  the  perceptual 
and  action  expectations  of  the  user,  the  specific  action  associ¬ 
ated  with  a  particular  response  location,  the  action  relationship 
between  that  response  and  those  around  it,  and  the  degree  of 
translation  required  to  map  the  perceptual-motor  workspaces. 
It is natural for an applied psychology of human-computer
interaction to be based theoretically
on  information-processing  psychology. 

— Card,  Moran,  &  Newell,  1983 

Human-computer  interaction  (HCI)  is  fundamentally  an 
information-processing  task.  In  interacting  with  a  computer,  a 
user  has  specific  goals  and  subgoals  in  mind.  The  user  initiates 
the  interaction  by  giving  the  computer  commands  that  are  di¬ 
rected  toward  accomplishing  those  goals.  The  commands  may 
activate  software  programs  designed  to  allow  specific  types  of 
tasks, such as word processing or statistical analysis, to be
performed. The resulting computer output, typically displayed on a
screen,  must  provide  adequate  information  for  the  user  to  com¬ 
plete  the  next  step,  or  the  user  must  enter  another  command  to 
obtain  the  desired  output  from  the  computer.  The  sequence  of 
interactions  to  accomplish  the  goals  may  be  long  and  complex, 
and  several  alternative  sequences,  differing  in  efficiency,  may  be 
used  to  achieve  these  goals.  During  the  interaction,  the  user  is  re¬ 
quired  to  identify  displayed  information,  select  responses  based 
on  the  displayed  information,  and  execute  those  responses.  The 
user  must  search  the  displayed  information  and  attend  to  the  ap¬ 
propriate  aspects  of  it.  She  or  he  must  also  recall  the  commands 
and  resulting  consequences  of  those  commands  for  different 
programs,  remember  information  specific  to  the  task  that  is  be¬ 
ing  performed,  and  make  decisions  and  solve  problems  during 
the  process.  For  the  interaction  between  the  computer  and  user 
to  be  efficient,  the  interface  must  be  designed  in  accordance  with 
the  user’s  information  processing  capabilities. 


HUMAN  INFORMATION 
PROCESSING  APPROACH 


The  rise  of  the  human  information-processing  approach  in  psy¬ 
chology  is  closely  coupled  with  the  growth  of  the  fields  of  cog¬ 
nitive  psychology,  human  factors,  and  human  engineering.  Al¬ 
though  research  that  can  be  classified  as  falling  within  these 
fields has been conducted since the last half of the 19th century,
their formalization dates back to World War II (see Hoffman &
Deffenbacher,  1992).  As  part  of  the  war  efforts,  experimental 
psychologists  worked  along  with  engineers  on  applications  as¬ 
sociated  with  using  the  sophisticated  equipment  being  devel¬ 
oped.  As  a  consequence,  the  psychologists  were  exposed  not 
only  to  applied  problems  but  also  to  the  techniques  and  views 
being  developed  in  areas  such  as  communications  engineering 
(see  Roscoe,  2005).  Many  of  the  concepts  from  engineering  (for 
instance,  the  notion  of  transmission  of  information  through  a 
limited  capacity  communications  channel)  were  seen  as  applic¬ 
able  to  analyses  of  human  performance. 

The  human  information-processing  approach  is  based  on  the 
idea  that  human  performance,  from  displayed  information  to 
response,  is  a  function  of  several  processing  stages.  The  nature  of 
these  stages,  how  they  are  arranged,  and  the  factors  that  influ¬ 
ence  how  quickly  and  accurately  a  particular  stage  operates  can 
be  discovered  through  appropriate  research  methods.  It  is  of¬ 
ten  said  that  the  central  metaphor  of  the  information-processing 
approach  is  that  a  human  is  like  a  computer  (e.g.,  Lachman, 
Lachman, & Butterfield, 1979). However, even more fundamental
than the computer metaphor is the assumption that the human
is a complex system that can be analyzed in terms of subsystems
and their interrelation. This point is evident in the work
of  researchers  on  attention  and  performance,  such  as  Paul  Fitts 
(1951)  and  Donald  Broadbent  (1958),  who  were  among  the  first 
to  adopt  the  information-processing  approach  in  the  1950s. 

The  systems  perspective  underlies  not  only  human  infor¬ 
mation-processing  but  also  human  factors  and  HCI,  providing 
a  direct  link  between  the  basic  and  applied  fields  (Proctor  &  Van 
Zandt,  1994).  Human  factors  in  general,  and  HCI  in  particular, 
begin  with  the  fundamental  assumption  that  a  human-machine 
system  can  be  decomposed  into  machine  and  human  subsystems, 
each  of  which  can  be  analyzed  further.  The  human  information¬ 
processing  approach  provides  the  concepts,  methods,  and  the¬ 
ories  for  analyzing  the  processes  involved  in  the  human  sub¬ 
system.  Posner  (1986)  stated,  “Indeed,  much  of  the  impetus  for 
the  development  of  this  kind  of  empirical  study  stemmed  from 
the  desire  to  integrate  description  of  the  human  within  overall 
systems”  (p.  V-6).  Young,  Clegg,  and  Smith  (2004)  emphasized 
that  the  most  basic  distinction  between  three  processing  stages 
(perception,  cognition,  and  action),  as  captured  in  a  block  dia¬ 
gram  model  of  human  information  processing,  is  important 
even  for  understanding  the  dynamic  interactions  of  an  opera¬ 
tor  with  a  vehicle  for  purposes  of  computer-aided  augmented 
cognition.  They  noted: 

This  block  diagram  model  of  the  human  is  important  because  it  not  only 
models  the  flow  of  information  and  commands  between  the  vehicle  and 
the  human,  it  also  enables  access  to  the  internal  state  of  the  human  at 
various  parts  of  the  process.  This  allows  the  modeling  of  what  a  cogni¬ 
tive  measurement  system  might  have  access  to  (internal  to  the  human), 
and  how  that  measurement  might  then  be  used  as  part  of  a  closed-loop 
human-machine interface system. (pp. 261-262)

In  the  first  half  of  the  20th  century,  the  behaviorist  approach 
predominated  in  psychology,  particularly  in  the  United  States. 
Within  this  approach,  many  sophisticated  theories  of  learning 
and  behavior  were  developed  that  differed  in  various  details 
(Bower  &  Hilgard,  1981).  However,  the  research  and  theories  of 
the  behaviorist  approach  tended  to  minimize  the  role  of  cogni¬ 
tive  processes  and  were  of  limited  value  to  the  applied  problems 
encountered  in  World  War  II.  The  information-processing  ap¬ 
proach  was  adopted  because  it  provided  a  way  to  examine  top¬ 
ics  of  basic  and  applied  concern,  such  as  attention,  that  were  rel¬ 
atively  neglected  during  the  behaviorist  period.  It  continues  to 
be  the  main  approach  in  psychology,  although  contributions 
have  been  made  from  other  approaches,  some  of  which  we  will 
consider  in  the  last  section  of  the  chapter. 

Within  HCI,  human  information-processing  analyses  are  used 
in  two  ways.  First,  empirical  studies  evaluate  the  information¬ 
processing  requirements  of  various  tasks  in  which  a  human  uses 
a  computer.  Second,  computational  models  are  developed 
which  are  intended  to  characterize  human  information  pro¬ 
cessing  when  interacting  with  computers,  and  predict  human 
performance  with  alternative  interfaces.  In  this  chapter,  we  sur¬ 
vey  methods  used  to  study  human  information  processing  and 
summarize  the  major  findings  and  the  theoretical  frameworks 
developed  to  explain  them.  We  also  tie  the  methods,  findings, 
and  theories  to  HCI  issues  to  illustrate  their  use. 

INFORMATION-PROCESSING  METHODS


Any  theoretical  approach  makes  certain  presuppositions  and 
tends  to  favor  some  methods  and  techniques  over  others. 
Information-processing  researchers  have  used  behavioral  and, 
to  an  increasing  extent,  psychophysiological  measures,  with  an 
emphasis  on  chronometric  (time-based)  methods.  There  also 
has  been  a  reliance  on  flow  models  that  are  often  quantified 
through  computer  simulation  or  mathematical  modeling. 

Signal  Detection  Methods  and  Theory 

One  of  the  most  useful  methods  for  studying  human  informa¬ 
tion  processing  is  that  of  signal  detection  (Macmillan  &  Creel- 
man,  2005).  In  a  signal-detection  task,  some  event  is  classified  as 
a  signal  and  the  subject’s  task  is  to  detect  whether  the  signal  is 
present.  Trials  in  which  it  is  not  present  are  called  “noise  trials.” 
The  proportion  of  trials  in  which  the  signal  is  correctly  identi¬ 
fied  as  present  is  called  the  “hit  rate,”  and  the  proportion  of 
trials  in  which  the  signal  is  incorrectly  identified  as  present  is 
called  the  “false  alarm  rate.”  By  using  the  hit  and  false  alarm 
rates,  whether  the  effect  of  a  variable  is  on  detectability  or  re¬ 
sponse  bias  can  be  evaluated. 

Signal-detection  theory  is  often  used  as  the  basis  for  analyz¬ 
ing  data  from  such  tasks.  This  theory  assumes  that  the  response 
on  each  trial  is  a  function  of  two  discrete  operations:  encoding 
and  decision.  In  a  trial,  the  subject  samples  the  information  pre¬ 
sented  and  decides  whether  this  information  is  sufficient  to 
warrant  a  signal  present  response.  The  sample  of  information 
is  assumed  to  provide  a  value  along  a  continuum  of  evidence 
states  regarding  the  likelihood  that  the  signal  was  present.  The 
noise  trials  form  a  probability  distribution  of  states,  as  do  the  sig¬ 
nal  trials.  The  decision  that  must  be  made  on  each  trial  can  be 
characterized  as  to  whether  the  event  is  from  the  signal  or  noise 
distribution.  The  subject  is  presumed  to  adopt  a  criterion  value 
of  evidence  above  which  he  or  she  responds  “signal  present” 
and  below  which  he  or  she  responds  “signal  absent.” 

In  the  simplest  form,  the  distributions  are  assumed  to  be 
normal  and  equal  variance.  In  this  case,  a  measure  of  detectabil¬ 
ity  (d')  can  be  derived.  This  measure  represents  the  difference 
in  the  means  for  the  signal  and  noise  distributions  in  standard 
deviation units. A measure of response bias (β), which represents
the relative heights of the signal and noise distributions
at  the  criterion,  can  also  be  calculated.  This  measure  reflects  the 
subject’s  overall  willingness  to  say  “signal  present,”  regardless  of 
whether  it  actually  is  present.  There  are  numerous  alternative 
measures  of  detectability  and  bias  based  on  different  assump¬ 
tions  and  theories,  and  many  task  variations  to  which  they  can 
be  applied  (see  Macmillan  &  Creelman,  2005). 
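
As a minimal sketch of the equal-variance normal case described above, the following Python function computes d' and β from a hit rate and a false alarm rate. The function name and the example rates are illustrative assumptions, not taken from the chapter; β is computed as the ratio of the signal density to the noise density at the criterion, which is one common formulation.

```python
from math import exp
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, beta) under the equal-variance normal model.

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError for 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    # Distance between the signal and noise means in SD units.
    d_prime = z_hit - z_fa
    # Ratio of signal to noise density heights at the criterion.
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2.0)
    return d_prime, beta


# Hypothetical observer: 84% hits, 16% false alarms.
d_prime, beta = sdt_measures(0.84, 0.16)
print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")
```

A β above 1 indicates a conservative criterion (reluctance to say "signal present"), and a β below 1 a liberal one. Hit or false alarm rates of exactly 0 or 1 need a correction before this calculation can be applied.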

Signal-detection  analyses  have  been  particularly  useful  be¬ 
cause  they  can  be  applied  to  any  task  that  can  be  depicted  in 
terms  of  binary  discriminations.  For  example,  the  proportion 
of  words  in  a  memory  task  correctly  classified  as  old  can  be 
treated  as  a  hit  rate  and  the  proportion  of  new  lures  classified  as 
old  can  be  treated  as  a  false  alarm  rate  (Lockhart  &  Murdock, 
1970).  In  cases  such  as  these,  the  resulting  analysis  helps  re¬ 
searchers  determine  whether  variables  are  affecting  detectabil¬ 
ity  of  an  item  as  old  or  response  bias. 


An  area  of  research  in  which  signal-detection  methods  have 
been  widely  used  is  that  of  vigilance  (Parasuraman  &  Davies, 
1977).  In  a  typical  vigilance  task,  a  display  is  monitored  for  cer¬ 
tain  changes  in  it  (e.g.,  the  occurrence  of  an  infrequent  stimu¬ 
lus).  Vigilance  tasks  are  common  in  the  military,  but  many  aspects 
also  can  be  found  in  computer-related  tasks  such  as  monitor¬ 
ing  computer  network  operations  (Percival  &  Noonan,  1987).  A 
customary  finding  for  vigilance  tasks  is  the  vigilance  decrement, 
in  which  the  hit  rate  decreases  as  time  on  the  task  increases. 
The  classic  example  of  this  vigilance  decrement  is  that,  during 
World War II, British radar observers detected fewer of the enemy’s
radar signals after 30 minutes in a radar observation shift
(Mackworth,  1948).  Parasuraman  and  Davies  concluded  that,  for 
many  situations,  the  primary  cause  of  the  vigilance  decrement  is 
an  increasingly  strict  response  criterion.  That  is,  the  false  alarm 
rate  as  well  as  the  hit  rate  decreases  as  a  function  of  time  on 
task.  They  provided  evidence  that  perceptual  sensitivity  also 
seems  to  decrease  when  the  task  requires  comparison  of  each 
event  to  a  standard  held  in  memory  and  the  event  rate  is  high. 
Subsequently, See, Howe, Warm, and Dember (1995) concluded
that  a  decrease  in  perceptual  sensitivity  occurs  for  a  broad  range 
of  tasks.  Although  signal-detection  theory  can  be  used  to  help 
determine  whether  a  variable  affects  encoding  quality  or  deci¬ 
sion,  as  in  the  vigilance  example,  it  is  important  to  keep  in  mind 
that  the  measures  of  detectability  and  bias  are  based  on  certain 
theoretical assumptions. Balakrishnan (1998) argued, on the basis
of an analysis that does not require the assumptions of
signal-detection theory, that the vigilance decrement is not a result
of a biased placement of the response criterion, even when the
signal occurs infrequently and time on task increases.

Chronometric  Methods 

Chronometric  methods,  for  which  time  is  a  factor,  have  been 
the  most  widely  used  for  studying  human  information  process¬ 
ing.  Indeed,  Lachman  et  al.  (1979)  portrayed  reaction  time  (RT) 
as  the  main  dependent  measure  of  the  information-processing 
approach.  Although  many  other  measures  are  used,  RT  still  pre¬ 
dominates  in  part  because  of  its  sensitivity  and  in  part  because 
of  the  sophisticated  techniques  that  have  been  developed  for 
analyzing  RT  data. 

A  technique  called  the  “subtractive  method,”  introduced  by 
F.  C.  Donders  (1868/1969)  in  the  1860s,  was  revived  in  the  1950s 
and 1960s. This method provides a way to estimate the duration of
a  particular  processing  stage.  The  assumption  of  the  subtrac¬ 
tive  method  is  that  a  series  of  discrete  processing  stages  inter¬ 
vene  between  stimulus  presentation  and  response  execution. 
Through  careful  selection  of  tasks  that  differ  by  a  single  stage, 
the  RT  for  the  easier  task  can  be  subtracted  from  that  for  the 
more  difficult  task  to  yield  the  time  for  the  additional  process. 
The  subtractive  method  has  been  used  to  estimate  the  dura¬ 
tions  of  a  variety  of  processes,  including  rates  of  mental  rotation 
(approximately  12  to  20  ms  per  degree  of  rotation;  Shepard  & 
Metzler,  1971)  and  memory  search  (approximately  40  ms  per 
item;  Sternberg,  1969).  An  application  of  the  subtractive  method 
to  HCI  would  be,  for  example,  to  compare  the  time  to  find 
a  target  link  on  two  web  pages  that  are  identical  except  for 
the  number  of  links  displayed,  and  to  attribute  the  extra  time 
to the additional visual search required for the more complex
web  page. 
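
The subtraction logic in the web-page example can be sketched in a few lines of Python. The RT values, the link counts (10 and 20), and the function name are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import mean


def subtractive_estimate(rt_simple_ms, rt_complex_ms):
    """Estimated duration (ms) of the stage present only in the complex task."""
    return mean(rt_complex_ms) - mean(rt_simple_ms)


# Hypothetical RTs (ms) for finding a target link on two pages that
# differ only in the number of links displayed.
rt_10_links = [610, 640, 655, 620, 675]
rt_20_links = [820, 870, 845, 905, 860]

extra_search_ms = subtractive_estimate(rt_10_links, rt_20_links)
per_link_ms = extra_search_ms / (20 - 10)
print(f"extra search time: {extra_search_ms:.0f} ms "
      f"({per_link_ms:.1f} ms per added link)")
```

The estimate is meaningful only if the two tasks really differ by a single inserted process.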

The  subtractive  method  has  several  limitations  (Pachella, 
1974).  First,  it  is  only  applicable  when  discrete,  serial  processing 
stages  can  be  assumed.  Second,  the  processing  for  the  two  tasks 
being  compared  must  be  equivalent  except  for  the  additional 
process  that  is  being  evaluated.  This  requires  an  assumption  of 
pure  insertion,  which  is  that  the  additional  process  for  the  more 
complex  of  two  tasks  can  be  inserted  without  affecting  the 
processes  held  in  common  by  the  two  tasks.  However,  this  as¬ 
sumption  often  is  not  justified. 

Sternberg  (1969)  developed  the  additive  factors  method  to 
allow  determination  of  the  processes  involved  in  performing  a 
task.  The  additive  factors  method  avoids  the  problem  of  pure  in¬ 
sertion  because  the  crucial  data  are  whether  two  variables  affect 
RT  for  the  same  task  in  an  additive  or  interactive  manner.  Stern¬ 
berg  assumed,  as  did  Donders,  that  information  processing  oc¬ 
curs  in  a  sequence  of  discrete  stages,  each  of  which  produces  a 
constant  output  that  serves  as  input  to  the  next  stage  in  the  se¬ 
quence.  With  these  assumptions,  he  showed  that  two  variables 
that  affect  different  stages  should  have  additive  effects  on  RT. 
In  contrast,  two  variables  that  affect  the  same  stage  should  have 
interactive  effects  on  RT.  Sternberg  performed  detailed  analyses 
of  memory  search  tasks  in  which  a  person  holds  a  set  of  letters 
or  digits  in  memory  and  responds  to  a  target  stimulus  by  indi¬ 
cating  whether  it  is  in  the  memory  set.  Based  on  the  patterns 
of  additive  and  interactive  effects  that  he  observed,  Sternberg 
concluded  that  the  processing  in  such  tasks  involves  four  stages: 
target  identification,  memory  search,  response  selection,  and 
response  execution.  Grobelny,  Karwowski,  and  Drury  (2005) 
provided  an  application  of  additive-factors  logic  to  usability  of 
graphical  icons  in  the  design  of  HCI  interfaces.  Mode  of  icon 
array  (menu  or  dialog  box),  number  of  icons,  and  difficulty  of 
movement  had  additive  effects  on  response  times,  implying  that 
these  variables  affect  different  processing  stages. 
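
For a 2 x 2 design, additive-factors logic reduces to checking whether the effect of one variable is the same at both levels of the other. The sketch below computes that interaction contrast from cell means. The means are hypothetical, and in practice the contrast would be evaluated with a statistical test (such as the interaction term of an analysis of variance) rather than by inspection.

```python
def interaction_contrast(cell_means):
    """Interaction contrast (ms) for a 2 x 2 design.

    cell_means maps (level_a, level_b), each 0 or 1, to a mean RT in ms.
    A value near zero indicates additive effects of the two variables.
    """
    effect_b_at_a0 = cell_means[(0, 1)] - cell_means[(0, 0)]
    effect_b_at_a1 = cell_means[(1, 1)] - cell_means[(1, 0)]
    return effect_b_at_a1 - effect_b_at_a0


# Hypothetical cell means (ms).
additive = {(0, 0): 400, (0, 1): 450, (1, 0): 480, (1, 1): 530}
interactive = {(0, 0): 400, (0, 1): 450, (1, 0): 480, (1, 1): 600}

print(interaction_contrast(additive))     # 0: consistent with different stages
print(interaction_contrast(interactive))  # 70: consistent with a shared stage
```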

Both  the  subtractive  and  additive  factors  methods  have  been 
challenged  on  several  grounds  (Pachella,  1974).  First,  the  as¬ 
sumption  of  discrete  serial  stages  with  constant  output  is  diffi¬ 
cult  to  justify  in  many  situations.  Second,  both  methods  rely  on 
analyses  of  RT,  without  consideration  of  error  rates.  This  can 
be  problematic  because  performance  is  typically  not  error  free, 
and,  as  described  below,  speed  can  be  traded  for  accuracy.  De¬ 
spite  these  limitations,  the  methods  have  proved  to  be  robust 
and  useful  (Sanders,  1998).  For  example,  Salthouse  (2005)  noted 
that  the  process  analysis  approach  employed  in  contemporary 
research  into  aging  effects  on  cognitive  abilities  “has  used  a  va¬ 
riety  of  analytical  methods  such  as  subtraction,  additive  factors 
...  to  partition  the  variance  in  the  target  variable  into  theoreti¬ 
cally  distinct  processes”  (p.  288). 


Speed-Accuracy  Methods 

The function relating response speed to accuracy is called the
“speed-accuracy tradeoff” (Pachella, 1974). The function,
illustrated in Fig. 2.1, shows that very fast responses can be
performed with chance accuracy, and accuracy will increase as
responding slows down. Of importance is the fact that when
accuracy is high, as in most RT studies, a small increase in errors
can result in a large decrease in RT. With respect to text entry on
computing devices, MacKenzie and Soukoreff (2002) stated,
“Clearly, both speed and accuracy must be measured and
analyzed. . . . Participants can enter text more quickly if they are
willing to sacrifice accuracy” (pp. 159-160).

FIGURE 2.1. Speed-accuracy operating characteristic curve.
Faster responding occurs at the cost of lower accuracy.

In speed-accuracy tradeoff studies, the speed-accuracy criterion
is varied between blocks of trials or between subjects by using
different instructions regarding the relative importance of
speed versus accuracy, varying payoffs such that speed or
accuracy is weighted more heavily, or imposing different response
deadlines (Wickelgren, 1977). These studies have the potential
to be more informative than RT studies because they can
provide information about whether variables affect the intercept
(time at which accuracy exceeds chance), asymptote (the
maximal accuracy), and rate of ascension from the intercept to the
asymptote, each of which may reflect different processes. For
example, Boldini, Russo, and Avons (2004) obtained evidence
favoring dual-process models of recognition memory over
single-process models by varying the delay between a visually
presented test word and a signal to respond. Recognition accuracy
benefited from a modality match at study and test (better
performance when the study words were also visual rather than
auditory) at short response-signal delays, but it benefited from
deep processing during study (judging pleasantness) over
shallow processing (repeating aloud each word) at long
response-signal delays. Boldini et al. interpreted these results as
consistent with the view that recognition judgments are based on a
fast familiarity process or a slower recollection process.

Because the speed-accuracy criterion is manipulated in
addition to any other variables of interest, much more data must be
collected in a speed-accuracy study than in an RT study.
Consequently, use of speed-accuracy methods has been restricted to
situations in which the speed-accuracy relation is of major
concern, rather than being widely adopted as the method of choice.


Psychophysiological  and  Neuroimaging  Methods 

In  recent  years,  psychophysiological  methods  have  been  used 
to  evaluate  implications  of  information-processing  models  and 
to relate the models to brain processes. Such methods can
provide details regarding the nature of processing by examining
physiological activity as a task is being performed. The most
widely used method involves measurement of
electroencephalograms (EEGs), which are recordings of changes in brain
activity as a function of time measured by electrodes placed on the
scalp (Rugg & Coles, 1995). Of most concern for
information-processing research are event-related potentials (ERPs), which
are  the  changes  in  brain  activity  that  are  elicited  by  an  event 
such  as  stimulus  presentation  or  response  initiation.  ERPs  are 
obtained  by  averaging  across  many  trials  of  a  task  to  remove 
background  EEG  noise  and  are  thought  to  reflect  postsynaptic 
potentials  in  the  brain. 
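
The averaging step can be illustrated with simulated data. In the sketch below, every trial is an assumed fixed waveform (the "template") plus independent Gaussian noise standing in for background EEG. The template shape, noise level, trial count, and function names are all hypothetical. Averaging across time-locked trials leaves the event-related waveform while the noise shrinks roughly with the square root of the number of trials.

```python
import random


def average_erp(epochs):
    """Average time-locked epochs sample by sample to estimate the ERP."""
    n_trials = len(epochs)
    n_samples = len(epochs[0])
    return [sum(epoch[t] for epoch in epochs) / n_trials
            for t in range(n_samples)]


def simulate_epochs(template, n_trials, noise_sd, seed=0):
    """Hypothetical data: the same waveform plus Gaussian noise on each trial."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_sd) for v in template]
            for _ in range(n_trials)]


def rms_error(waveform, template):
    """Root-mean-square deviation of a waveform from the true template."""
    squared = [(w - t) ** 2 for w, t in zip(waveform, template)]
    return (sum(squared) / len(template)) ** 0.5


# A 5-unit deflection buried in noise with a standard deviation of 5.
template = [0.0] * 20 + [5.0] * 10 + [0.0] * 20
epochs = simulate_epochs(template, n_trials=400, noise_sd=5.0)
erp = average_erp(epochs)

print(f"single-trial RMS error: {rms_error(epochs[0], template):.2f}")
print(f"averaged RMS error:     {rms_error(erp, template):.2f}")
```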

There  are  several  features  of  the  ERP  that  represent  differ¬ 
ent  aspects  of  processing.  These  features  are  labeled  according 
to  their  polarity,  positive  (P)  or  negative  (N),  and  their  sequence 
or latency. The first positive (P1) and first negative (N1)
components are associated with early perceptual processes. They
are  called  “exogenous  components”  because  they  occur  in  close 
temporal  proximity  to  the  stimulus  event  and  have  a  stable  la¬ 
tency  with  respect  to  it.  Later  components  reflect  cognitive 
processes  and  are  called  “endogenous”  because  they  are  a  func¬ 
tion  of  the  task  demands  and  have  a  more  variable  latency  than 
the  exogenous  components.  One  such  component  that  has 
been  studied  extensively  is  the  P3  (or  P300),  which  represents 
post-perceptual  processes.  When  an  occasional  target  stimulus 
is  interspersed  in  a  stream  of  standards,  the  P3  is  observed  in  re¬ 
sponse  to  targets,  but  not  to  standards.  By  comparing  the  ef¬ 
fects  of  task  manipulations  on  various  ERP  components  such  as 
P3,  their  onset  latencies,  and  their  scalp  distributions,  relatively 
detailed  inferences  about  the  cognitive  processes  can  be  made. 

One  example  of  applying  a  P3  analysis  to  HCI  is  a  study  by 
Trimmel  and  Huber  (1998).  In  their  study,  subjects  performed 
three  HCI  tasks  (text  editing,  programming,  and  playing  the 
game  Tetris)  for  seven  minutes  each.  They  also  performed  com¬ 
parable  paper/pencil  tasks  in  three  other  conditions.  The  P3  was 
measured  after  each  experimental  task  by  having  subjects  mon¬ 
itor  a  stream  of  high-  and  low-pitched  tones,  keeping  count  of 
each  separately.  The  P3  varied  as  a  function  of  type  of  task,  as 
well  as  medium  (computer  vs.  paper/pencil).  The  amplitude  of 
the  P3  was  smaller  following  the  HCI  tasks  than  following  the 
paper/pencil  tasks,  suggesting  that  the  HCI  tasks  caused  more  fa¬ 
tigue  or  depletion  of  cognitive  resources  than  the  paper/pencil 
task.  The  P3  latency  was  shorter  after  the  programming  task 
than  after  the  others,  which  the  authors  interpreted  as  an  after¬ 
effect  of  highly  focused  attention. 

Another  measure  that  has  been  used  in  studies  of  human  in¬ 
formation  processing  is  the  lateralized  readiness  potential  (LRP; 
Eimer,  1998).  The  LRP  can  be  recorded  in  choice-reaction  tasks 
that require a response with the left or right hand. It is a
measure of differential activation of the lateral motor areas of the
cortex that occurs shortly before and during execution
of  a  response.  The  asymmetric  activation  favors  the  motor  area 
contralateral  to  the  hand  making  the  response,  because  this  is 
the  area  that  controls  the  hand.  The  LRP  has  been  obtained  in 
situations  in  which  no  overt  response  is  ever  executed,  allowing 
it  to  be  used  as  an  index  of  covert,  partial-response  activation. 
The  LRP  is  thus  a  measure  of  the  difference  in  activity  from  the 
two  sides  of  the  brain  that  can  be  used  as  an  indicator  of  covert 
reaction tendencies, to determine whether a response has been
prepared even when it is not actually executed. It can also be
used  to  determine  whether  the  effects  of  a  variable  are  prior  or 
subsequent  to  response  preparation. 

Electrophysiological  measurements  do  not  have  the  spatial 
resolution  needed  to  provide  precise  information  about  the 
brain  structures  that  produce  the  recorded  activity.  Recently  de¬ 
veloped  neuroimaging  methods,  including  positron-emission 
tomography  (PET)  and  functional  magnetic  resonance  imaging 
(fMRI),  measure  changes  in  blood  flow  associated  with  neu¬ 
ronal  activity  in  different  regions  of  the  brain  (Huettel,  Song,  & 
McCarthy,  2004).  These  methods  have  poor  temporal  resolution 
but  much  higher  spatial  resolution  than  the  electrophysiological 
methods.  In  an  imaging  study,  both  control  and  experimental 
tasks  are  performed,  and  the  functional  neuroanatomy  of  the 
cognitive  processes  is  derived  by  subtracting  the  image  during 
the  control  task  from  that  during  the  experimental  task. 

Application  of  cognitive  neuroscience  to  HCI  has  been  ad¬ 
vocated  under  the  heading  of  neuroergonomics.  According  to 
Parasuraman (2003), “Neuroergonomics focuses on
investigations of the neural bases of mental functions and physical
performance in relation to technology, work, leisure,
transportation, health care and other settings in the real world” (p. 5).
Neuroergonomics  has  the  goal  of  using  knowledge  of  the  rela¬ 
tion  between  brain  function  and  human  performance  to  design 
interfaces  and  computerized  systems  that  are  sensitive  to  brain 
function  with  the  intent  of  increasing  the  efficiency  and  safety  of 
human-machine  systems. 


INFORMATION-PROCESSING  MODELS 


Discrete  and  Continuous  Stage  Models 

It  is  common  to  assume  that  the  processing  between  stimuli 
and  responses  consists  of  a  series  of  discrete  stages  for  which 
the  output  for  one  stage  serves  as  the  input  for  the  next,  as  Don- 
ders  and  Sternberg  assumed.  This  assumption  is  made  for  the 
Model  Human  Processor  (Card  et  al.,  1983)  and  the  Executive- 
Process  Interactive  Control  (EPIC;  Meyer  &  Kieras,  1997)  archi¬ 
tectures,  both  of  which  have  been  applied  to  HCI.  However, 
models  can  be  developed  that  allow  for  successive  processing 
stages  to  operate  concurrently.  A  well-known  model  of  this  type 
is  McClelland’s  (1979)  cascade  model,  in  which  partial  informa¬ 
tion  at  one  subprocess,  or  stage,  is  transferred  to  the  next.  Each 
stage  is  continuously  active,  and  its  output  is  a  continuous  value 
that  is  always  available  to  the  next  stage.  The  final  stage  results 
in  selection  of  which  of  the  possible  alternative  responses  to 
execute.  Many  parallel  distributed  processing,  or  neural  net¬ 
work,  models  are  of  a  continuous  nature. 

According to J. Miller (1988), models of human information
processing  can  be  classified  as  discrete  or  continuous  along 
three  dimensions:  (a)  representation,  (b)  transformation,  and 
(c)  transmission.  “Representation”  refers  to  whether  the  input 
and  output  codes  for  the  processing  stage  are  continuous  or 
discrete.  “Transformation”  refers  to  whether  the  operation  per¬ 
formed  by  the  processing  stage  (e.g.,  spatial  transformation)  is 
continuous  or  discrete.  “Transmission”  is  classified  as  discrete  if 
the  processing  of  successive  stages  does  not  overlap  temporally. 
The discrete stage model proposed by Sternberg (1969) has
discrete  representation  and  transmission,  whereas  the  cascade 
model proposed by McClelland (1979) has continuous
representation, transmission, and transformation. Models can be
intermediate to these two extremes. For example, J. Miller’s (1988)
asynchronous  discrete  coding  model  assumes  that  most  stimuli 
are  composed  of  features,  and  these  features  are  identified  sep¬ 
arately.  Discrete  processing  occurs  for  feature  identification,  but 
once  a  feature  is  identified,  this  information  can  be  passed  to  re¬ 
sponse  selection  while  the  other  features  are  still  being  identified. 

Sequential  Sampling  Models 

Sequential  sampling  models  are  able  to  account  for  both  RT  and 
accuracy,  and  consequently,  the  tradeoff  between  them  (Ratcliff 
&  Smith,  2004;  Van  Zandt,  Colonius,  &  Proctor,  2000).  According 
to  such  models,  information  from  the  stimulus  is  sequentially 
sampled,  resulting  in  a  gradual  accumulation  of  information  on 
which  selection  of  one  of  the  alternative  responses  is  based.  A 
response  is  selected  when  the  accumulated  information  ex¬ 
ceeds  a  threshold  amount  required  for  that  response.  Factors 
that  influence  the  quality  of  information  processing  have  their 
effects  on  the  rate  at  which  the  information  accumulates,  where¬ 
as  factors  that  bias  speed-versus-accuracy  or  specific  responses 
have  their  effects  on  the  response  thresholds. 
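
A simple random walk with two symmetric thresholds gives a minimal illustration of this class of models. It is a sketch under stated assumptions, not any specific published model: the drift, noise, threshold values, and function names are hypothetical, and a positive drift is taken to mean that the upper threshold corresponds to the correct response.

```python
import random


def random_walk_trial(drift, threshold, noise_sd, rng):
    """Simulate one trial; return (correct, n_samples).

    Evidence starts at 0 and accumulates noisy samples until it reaches
    +threshold (correct response) or -threshold (error). The threshold
    must be positive, and drift and noise_sd must not both be zero.
    """
    evidence = 0.0
    n_samples = 0
    while abs(evidence) < threshold:
        evidence += rng.gauss(drift, noise_sd)
        n_samples += 1
    return evidence >= threshold, n_samples


def simulate(drift, threshold, noise_sd=1.0, n_trials=2000, seed=1):
    """Return (accuracy, mean number of samples) over many trials."""
    rng = random.Random(seed)
    trials = [random_walk_trial(drift, threshold, noise_sd, rng)
              for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in trials) / n_trials
    mean_samples = sum(n for _, n in trials) / n_trials
    return accuracy, mean_samples


# Same information quality (drift), two response thresholds.
for threshold in (2.0, 8.0):
    accuracy, mean_samples = simulate(drift=0.1, threshold=threshold)
    print(f"threshold {threshold}: accuracy {accuracy:.2f}, "
          f"mean samples {mean_samples:.1f}")
```

Raising the threshold while holding the drift constant should yield slower but more accurate responding, which is the speed-accuracy tradeoff. Raising the drift at a fixed threshold should improve both speed and accuracy.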

Balakrishnan  (1998)  argued  that  sequential  sampling  may 
be  a  factor  even  when  the  experiment  does  not  stress  speed  of 
responding.  As  mentioned  before,  he  showed  that  an  analysis  of 
vigilance  data  that  does  not  make  the  assumptions  of  signal- 
detection  theory  suggests  that  attribution  of  the  vigilance  decre¬ 
ment  to  a  change  toward  a  more  conservative  response  bias  is 
incorrect.  One  reason  why  signal  detection  theory  may  lead  to 
an  incorrect  conclusion  is  that  the  model  assumes  that  the  de¬ 
cision  is  based  on  a  fixed  sample  of  information,  rather  than  in¬ 
formation  that  is  accumulating  across  time.  Balakrishnan  argued 
that  even  though  there  are  no  incentives  to  respond  quickly  in 
the  typical  vigilance  task,  subjects  may  choose  not  to  wait  until 
all  of  the  stimulus  information  has  been  processed  before  re¬ 
sponding.  He  proposed  that  a  sequential  sampling  model,  in 
which  the  subject  continues  to  process  the  information  until  a 
stopping  rule  condition  is  satisfied,  provides  a  better  depiction. 
In  this  model,  there  are  two  potential  sources  of  bias:  the  stop¬ 
ping  rule  and  decision  rule.  Based  on  this  model,  Balakrishnan 
concluded  that  there  is  a  response  bias  initially  when  the  signal 
rate  is  low,  and  that  the  vigilance  decrement  is  due  to  a  gradual 
reduction  of  this  response  bias  toward  a  more  optimal  decision 
during  the  time-course  of  the  vigil. 


INFORMATION  PROCESSING 
IN  CHOICE  REACTION  TASKS 


In  a  typical  choice  reaction  task  in  which  each  stimulus  is  as¬ 
signed  to  a  unique  response,  it  is  customary  to  distinguish  be¬ 
tween  three  stages  of  processing:  stimulus  identification,  re¬ 
sponse  selection,  and  response  execution  (Proctor  &  Van  Zandt, 
1994). The stimulus identification stage involves processes that
are entirely dependent on stimulus properties. The response-
selection  stage  concerns  those  processes  involved  in  determin¬ 
ing  which  response  to  make  to  each  stimulus.  Response  execu¬ 
tion  refers  to  programming  and  execution  of  motor  responses. 
Based  on  additive  factors  logic,  Sanders  (1998)  decomposed  the 
stimulus  identification  stage  into  three  subcategories  and  the 
response  execution  stage  into  two  subcategories,  resulting  in  six 
stages  (see  Fig.  2.2). 

Stimulus  Identification 

The  preprocessing  stage  of  stimulus  identification  refers  to  pe¬ 
ripheral  sensory  processes  involved  in  the  conduction  of  the 
sensory  signal  along  the  afferent  pathways  to  the  sensory  pro¬ 
jection  areas  of  the  cerebral  cortex.  These  processes  are  affected 
by  variables  such  as  stimulus  contrast  and  retinal  location.  As 
stimulus  contrast,  or  intensity,  increases,  RT  decreases  until 
reaching  asymptote.  For  example,  Bonin-Guillaume,  Possamai, 
Blin,  and  Hasbroucq  (2000)  had  younger  and  older  adults  per¬ 
form  a  two-choice  reaction  task,  in  which  a  left  or  right  key¬ 
press  was  made  to  a  bright  or  dim  light  positioned  to  the  left 
or  right.  Stimulus  intensity  interacted  with  age,  with  RTs  being 
about  25  ms  shorter  to  a  bright  stimulus  than  to  a  dim  stimulus 
for  younger  adults  compared  to  50  ms  for  older  adults.  The  ef¬ 
fect  of  stimulus  intensity  did  not  interact  with  variables  that  af¬ 
fect  response  selection  or  motor  adjustment,  suggesting  that 
the  age-related  deficit  in  sensory  preprocessing  did  not  affect 
the  later  processing  stages. 

Feature  extraction  involves  lower-level  perceptual  processing 
based in area V1 (the primary visual cortex) and other early visual
cortical areas. Stimulus discriminability, word priming, and stimulus
quality  affect  the  feature  extraction  process.  For  example,  ma¬ 
nipulations  of  stimulus  quality  (such  as  superimposing  a  grid) 
slow  RT,  presumably  by  creating  difficulty  for  the  extraction  of 
features.  Identification  itself  is  influenced  by  word  frequency 
and  mental  rotation.  The  latter  refers  to  the  idea  that,  when  a 
stimulus  is  rotated  from  the  upright  position,  the  time  it  takes  to 
identify  the  stimulus  increases  as  an  approximately  linear  func¬ 
tion  of  angular  deviation  from  upright  (Shepard  &  Metzler, 
1971).  This  increase  in  identification  time  is  presumed  to  reflect 
a  normalization  process  by  which  the  image  is  mentally  rotated 
in  a  continuous  manner  to  the  upright  position. 

Response  Selection 

Response  selection  refers  to  those  processes  involved  in  deter¬ 
mining  what  response  to  make  to  a  particular  stimulus.  It  is  af¬ 
fected  by  the  number  of  alternatives,  stimulus-response  com¬ 
patibility,  and  precuing.  RT  increases  as  a  logarithmic  function  of 
the number of stimulus-response alternatives (Hick, 1952;
Hyman, 1953). This relation is known as the Hick-Hyman law, which
for N equally likely alternatives is:

$RT = a + b \log_2 N$

where a is the base processing time and b is the amount that
RT increases with each doubling of N.

FIGURE 2.2. Information-processing stages and variables that affect them, based on Sanders' (1998) taxonomy.

The slope of the Hick-Hyman function is influenced by many
factors. For example, the slope
decreases  as  subjects  become  practiced  at  a  task  (Teichner  & 
Krebs,  1974).  Usher,  Olami,  and  McClelland  (2002)  provided 
evidence  from  fits  of  a  sequential  sampling  model  that  the  Hick- 
Hyman  law  is  due  to  subjects’  adjusting  their  response  criteria 
upward  as  the  number  of  alternatives  increases,  in  an  attempt  to 
maintain  a  constant  high  level  of  accuracy. 

One  variable  that  influences  the  slope  of  the  Hick-Hyman 
function is stimulus-response compatibility, which has
considerable impact on response-selection efficiency (see Proctor &
Vu, 2006, for a review of compatibility principles). Compatibility
effects are differences in speed and accuracy of responding
as  a  function  of  how  natural,  or  compatible,  the  relation  be¬ 
tween  stimuli  and  responses  is.  Two  types  of  compatibility  ef¬ 
fects  can  be  distinguished  (Kornblum,  Hasbroucq,  &  Osman, 
1990).  For  one  type,  certain  sets  of  stimuli  are  more  compati¬ 
ble  with  certain  sets  of  responses  than  with  others.  For  example, 
the  combinations  of  verbal-vocal  and  spatial-manual  sets  yield 
better  performance  than  the  combinations  of  verbal-manual 
and  spatial-vocal  sets  (Wang  &  Proctor,  1996).  For  the  other 
type,  within  a  specific  stimulus-response  set,  some  mappings 
of  individual  stimuli  to  responses  produce  better  performance 
than others. If one stimulus has the meaning “left” and the
other  “right,”  performance  is  better  if  the  left  stimulus  is 
mapped  to  the  left  response  and  the  right  stimulus  to  the  right 
response,  for  all  stimulus  and  response  modes. 

Fitts  and  Seeger  (1953)  and  Fitts  and  Deininger  (1954)  dem¬ 
onstrated  both  types  of  compatibility  effects  for  spatially  ar¬ 
ranged  display  and  response  panels.  However,  compatibility  ef¬ 
fects  occur  for  a  much  wider  variety  of  other  stimulus-response 
sets.  According  to  Kornblum  et  al.  (1990),  dimensional  overlap 
(similarity)  between  the  stimulus  and  response  sets  is  the  criti¬ 
cal  factor.  When  the  sets  have  dimensional  overlap,  a  stimulus 
will activate its corresponding response automatically. If this response
is correct (compatible mapping), responding will be facilitated,
but if it is not correct (incompatible mapping), responding
will be inhibited. A second factor contributing to the
advantage  for  the  compatible  mapping  is  that  intentional  trans¬ 
lation  of  the  stimulus  into  a  response  will  occur  quicker  when 
the  mapping  is  compatible  than  when  it  is  not.  Most  contem¬ 
porary  models  of  stimulus-response  compatibility  incorporate 
both automatic and intentional response-selection routes (Hommel
& Prinz, 1997), although they differ regarding the exact conditions
under which each plays a role and the way in which they
interact. 

One  reason  why  automatic  activation  is  considered  to  con¬ 
tribute  to  compatibility  effects  is  that  such  effects  occur  when 
irrelevant  stimulus  information  overlaps  with  the  response  set 
(Lu & Proctor, 1995). The Stroop color-naming effect, for which
an incongruent color word produces interference in naming a
relevant stimulus color, is the best-known example. An irrelevant
stimulus location also produces interference when it is
incongruent  with  the  location  of  a  key-press  to  a  relevant  stimu¬ 
lus  dimension,  a  phenomenon  known  as  the  Simon  effect  (Si¬ 
mon,  1990).  Psychophysiological  studies  in  which  the  LRP  has 
been  measured  have  provided  evidence  that  the  Simon  effect  is 
due  at  least  in  part  to  activation  of  the  response  corresponding 
to  stimulus  location  (Valle-Inclan,  de  Labra,  &  Redondo,  2000). 

For  completely  unrelated  stimulus  and  response  sets  that  are 
structured,  performance  is  better  when  structural  correspon¬ 
dence  is  maintained  (Reeve  &  Proctor,  1990).  For  instance, 
when  stimuli  and  responses  are  ordered  (e.g.,  a  row  of  four 
stimulus locations and a row of four response locations), RT is
faster  when  the  stimulus-response  mapping  can  be  character¬ 
ized  by  a  rule  (e.g.,  pressing  the  key  at  the  mirror  opposite  lo¬ 
cation)  than  when  the  mapping  is  random  (Duncan,  1977).  Spa¬ 
tial  compatibility  effects  also  occur  when  display  and  response 
elements  refer  to  orthogonal  spatial  dimensions  (Cho  &  Proc¬ 
tor,  2003).  However,  stimulus-response  compatibility  effects 
sometimes  do  not  occur  under  conditions  in  which  one  would 
expect  them  to.  For  example,  when  compatible  and  incompati¬ 
ble  mappings  are  mixed  within  a  single  block,  the  typical  com¬ 
patibility  effect  is  eliminated  (Shaffer,  1965;  Vu  &  Proctor,  2004). 
Moreover,  the  same  display  and  response  elements  can  be 
coded  along  multiple  dimensions  in  certain  situations  (e.g.,  ver¬ 
tical  position  versus  horizontal  position).  The  relative  impor¬ 
tance  of  maintaining  compatibility  on  each  dimension  is  a  func¬ 
tion  of  how  salient  the  dimensions  are  made  by  the  task 
environment  (Rubichi,  Vu,  Nicoletti,  &  Proctor,  2006). 

Because  when  and  where  compatibility  effects  are  going  to 
occur is not always obvious, interface designers are likely to
make poor decisions if they rely only on their intuitions. Payne
(1995),  Vu  and  Proctor  (2003),  and  Tlauka  (2004)  have  shown 
that  naive  subjects  can  predict  simple  compatibility  effects,  such 
as  that  performance  will  be  better  with  a  mapping  that  is  spatially 
compatible  than  with  one  that  is  not.  However,  they  are  not  able 
to  accurately  predict  many  other  compatibility  effects  that  occur, 
such  as  the  benefit  of  maintaining  a  consistent  stimulus-response 
mapping  rule.  One  encouraging  finding  is  that  estimates  of  rela¬ 
tive  compatibility  can  be  improved  by  a  small  amount  of  experi¬ 
ence  performing  with  the  different  stimulus-response  mappings 
(Vu  &  Proctor,  2003).  The  important  point  for  HCI  is  that  de¬ 
signers  need  to  be  aware  of  the  potential  problems  created  by 
various  types  of  incompatibility  between  display  and  response  el¬ 
ements  because  their  effects  are  not  always  obvious.  A  designer 
can  get  a  better  feel  for  the  relative  compatibility  of  alternative 
arrangements  by  performing  tasks  that  use  them.  However,  after 
the  designer  selects  a  few  arrangements  that  would  seem  to  yield 
good  performance,  these  remaining  arrangements  need  to  be 
tested  more  thoroughly  on  groups  of  users. 

Response  Execution 

“Motor  programming”  refers  to  specification  of  the  physical  re¬ 
sponse  that  is  to  be  made.  This  process  is  affected  by  variables 
such  as  relative  stimulus-response  frequency  and  movement 
direction.  One  factor  that  influences  this  stage  is  movement 
complexity.  The  longer  the  sequence  of  movements  that  is  to  be 
made  upon  occurrence  of  a  stimulus  in  a  choice-reaction  task, 
the  longer  the  RT  to  initiate  the  sequence  (Sternberg,  Monsell, 
Knoll,  &  Wright,  1978).  This  effect  is  thought  to  be  due  to  the 
time  required  to  load  the  movement  sequence  into  a  buffer  be¬ 
fore  initiating  the  movements.  Time  to  initiate  the  movement 
sequence  decreases  with  practice,  and  recent  fMRI  evidence 
suggests  that  this  decrease  in  RT  involves  distinct  neural  sys¬ 
tems  that  support  visuomotor  learning  of  finger  sequences  and 
spatial  learning  of  the  locations  of  the  finger  movements  on  a 
keypad  (Parsons,  Harrington,  &  Rao,  2005). 

One  of  the  most  widely  known  relations  attributed  to  re¬ 
sponse  execution  is  Fitts’  Law,  which  describes  the  time  to  make 
aimed  movements  to  a  target  location  (Fitts,  1954).  This  law,  as 
originally  specified  by  Fitts,  is: 

Movement Time = a + b log2 (2D/W)

where  a  and  b  are  constants,  D  is  distance  to  the  target,  and  W 
is  target  width.  However,  there  are  slightly  different  versions  of 
the  law.  According  to  Fitts’  Law,  movement  time  is  a  direct  func¬ 
tion  of  distance  and  an  inverse  function  of  target  width.  Fitts’ 
Law  has  been  found  to  provide  an  accurate  description  of  move¬ 
ment  time  in  many  situations,  although  alternatives  have  been 
proposed  for  certain  situations.  One  of  the  factors  that  con¬ 
tribute  to  the  increase  in  movement  time  as  the  index  of  diffi¬ 
culty  increases  is  the  need  to  make  a  corrective  submovement 
based  on  feedback  in  order  to  hit  the  target  location  (Meyer, 
Abrams,  Kornblum,  Wright,  &  Smith,  1988). 
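A minimal Python sketch of the law in Fitts' original form may help make the roles of D and W concrete. The function names and the constants a and b below are hypothetical; in practice the constants are fitted to data for a particular device and task.

```python
import math


def index_of_difficulty(distance, width):
    """Index of difficulty in bits, using the original form log2(2D / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def movement_time(distance, width, a, b):
    """Predicted movement time: a + b * log2(2D / W)."""
    return a + b * index_of_difficulty(distance, width)


# Hypothetical constants for illustration only (ms and ms per bit).
A_MS, B_MS = 100.0, 100.0

# Same distance, two target widths (arbitrary but consistent units).
narrow = movement_time(160.0, 20.0, A_MS, B_MS)
wide = movement_time(160.0, 40.0, A_MS, B_MS)
print(narrow, wide)
```

Doubling the target width at a fixed distance lowers the index of difficulty by one bit, so the predicted movement time drops by b. Halving the distance at a fixed width has the same effect.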

The  importance  of  Fitts’  Law  for  HCI  is  illustrated  by  the  fact 
that  the  December  2004  issue  of  International  Journal  of  Hu¬ 
man-Computer  Studies  was  devoted  to  the  50th  anniversary  of 
Fitts’ original study. In the preface to the issue, the editors, Guiard
and Beaudouin-Lafon (2004), stated: “What has come to be known
as  Fitts’  law  has  proven  highly  applicable  in  Human-Computer 
Interaction  (HCI),  making  it  possible  to  predict  reliably  the  mini¬ 
mum  time  for  a  person  in  a  pointing  task  to  reach  a  specified  tar¬ 
get”  (p.  747).  Several  illustrations  of  this  point  follow. 

One  implication  of  the  law  for  interface  design  is  that  the 
slope of the function (b) may vary across different control devices,
in which case movement times will be faster for the devices
that yield lower slopes. Card, English, and Burr (1978) conducted
a study that evaluated how efficient text keys, step keys,
a  mouse,  and  a  joystick  are  at  a  text-selection  task  in  which  users 
selected  text  by  positioning  the  cursor  on  the  desired  area  and 
pressing  a  button  or  key.  They  showed  that  the  mouse  was  the 
most  efficient  device  for  this  task;  positioning  time  for  the 
mouse  and  joystick  could  be  accounted  for  by  Fitts’  Law,  with 
the  slope  of  the  function  being  less  steep  for  the  mouse;  posi¬ 
tioning  time  with  the  keys  was  proportional  to  the  number  of 
keystrokes  that  had  to  be  executed. 

Another  implication  of  Fitts’  Law  is  that  any  reduction  in  the 
index  of  difficulty  should  decrease  the  time  for  movements. 
Walker,  Smelcer,  and  Nilsen  (1991)  evaluated  movement  time 
and  accuracy  of  menu  selection  for  the  mouse.  Their  results 
showed  that  reducing  the  distance  to  be  traveled  (which  re¬ 
duces  the  index  of  difficulty)  by  placing  the  initial  cursor  in  the 
middle  of  the  menu,  rather  than  the  top,  improved  movement 
time.  Placing  a  border  around  the  menu  item  in  which  a  click 
would  still  activate  that  item,  and  increasing  the  width  of  the 
border  as  the  travel  distance  increases,  also  improved  perfor¬ 
mance.  The  reduction  in  movement  time  by  use  of  borders  is 
predicted  by  Fitts’  Law  because  borders  increase  the  size  of  the 
target  area. 
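The effect of the initial cursor position can be illustrated with a rough calculation. The sketch below is not Walker et al.'s method: it assumes a simple vertical menu with equally tall items, measures distance between item centers, treats item height as the target width, and averages the index of difficulty over all items other than the starting one. The function name and the menu dimensions are hypothetical.

```python
import math


def mean_index_of_difficulty(start_row, n_items, item_height):
    """Mean Fitts index of difficulty, log2(2D / W), over all menu items
    except the one where the cursor starts.

    D is the center-to-center distance from the starting item and
    W is the item height (the target size along the movement axis).
    """
    if n_items < 2 or not 0 <= start_row < n_items:
        raise ValueError("need at least two items and a valid start row")
    ids = []
    for row in range(n_items):
        if row == start_row:
            continue
        distance = abs(row - start_row) * item_height
        ids.append(math.log2(2.0 * distance / item_height))
    return sum(ids) / len(ids)


# Hypothetical nine-item menu with items 20 units tall.
top_start = mean_index_of_difficulty(0, 9, 20.0)
middle_start = mean_index_of_difficulty(4, 9, 20.0)
print(top_start, middle_start)
```

Under these assumptions, starting in the middle of the menu gives a lower mean index of difficulty than starting at the top, because the average travel distance is shorter. Enlarging the clickable area of an item increases W and lowers the index in the same way.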

Gillan,  Holden,  Adam,  Rudisill,  and  Magee  (1992)  noted  that 
designers  must  be  cautious  when  applying  Fitts’  Law  to  HCI  be¬ 
cause  factors  other  than  distance  and  target  size  play  a  role 
when  using  a  mouse.  Specifically,  they  proposed  that  the  critical 
factors  in  pointing  and  dragging  are  different  than  those  in 
pointing  and  clicking  (which  was  the  main  task  in  Card  et  al.’s 
(1978)  study).  Gillan  et  al.  showed  that,  for  a  text-selection  task, 
both  point-click  and  point-drag  movement  times  can  be  ac¬ 
counted  for  by  Fitts’  Law.  For  point-click  sequences,  the  diago¬ 
nal  distance  across  the  text  object,  rather  than  the  horizontal 
distance,  provided  the  best  fit  for  pointing  time.  For  point-drag, 
the  vertical  distance  of  the  text  provided  the  best  fit.  The  reason 
why  the  horizontal  distance  is  irrelevant  is  that  the  cursor  must 
be  positioned  at  the  beginning  of  the  string  for  the  point-drag 
sequence.  Thus,  task  requirements  should  be  taken  into  ac¬ 
count  before  applying  Fitts’  Law  to  the  interface  design. 

Motor  adjustment  deals  with  the  transition  from  a  central  mo¬ 
tor  program  to  peripheral  motor  activity.  Studies  of  motor  ad¬ 
justment  have  focused  on  the  influence  of  foreperiod  duration  on 
motor  preparation.  In  a  typical  study,  a  neutral  warning  signal  is 
presented  at  various  intervals  prior  to  the  onset  of  the  impera¬ 
tive  stimulus.  Bertelson  (1967)  varied  the  duration  of  the  warning 
foreperiod  and  found  that  RT  reached  a  minimum  at  a  foreperiod 
of  150  ms  and  then  increased  slightly  at  the  200-  and  300-ms 
foreperiods.  However,  error  rate  increased  to  a  maximum  at  the 
150-ms foreperiod and decreased slightly at the longer foreperiods.
This relatively typical pattern suggests that it takes time to attain
a state of high motor preparation, and that this state reflects
an increased readiness to respond quickly at the expense of accuracy.
Courtière, Hardouin, Vidal, Possamai, and Hasbroucq
(2003)  recently  concluded  that  nitrous  oxide  impaired  motor  ad¬ 
justment  in  rats  because  inhalation  of  nitrous  oxide  interacted 
with  foreperiod  duration  but  not  with  stimulus  luminance. 


MEMORY  IN  INFORMATION  PROCESSING 


Memory refers to explicit recollection of information in the absence
of the original stimulus and to persisting effects of that information
on information processing that may be implicit. Memory
may involve recall of an immediately preceding event or one
many  years  in  the  past,  knowledge  derived  from  everyday  life 
experiences  and  education,  or  procedures  learned  to  accom¬ 
plish  complex  perceptual-motor  tasks.  Memory  can  be  classified 
into  several  categories.  Episodic  memory  refers  to  memory  for  a 
specific  event  such  as  going  to  the  movie  last  night,  whereas 
semantic  memory  refers  to  general  knowledge  such  as  what  a 
movie  is.  Declarative  memory  is  verbalizable  knowledge,  and 
procedural  memory  is  knowledge  that  can  be  expressed  non¬ 
verbally.  In  other  words,  declarative  memory  is  knowledge  that 
something  is  the  case,  whereas  procedural  memory  is  knowl¬ 
edge  of  how  to  do  something.  For  example,  telling  your  friend 
your  new  phone  number  involves  declarative  memory,  whereas 
riding  a  bicycle  involves  procedural  knowledge.  A  memory  test 
is  regarded  as  explicit  if  a  person  is  asked  to  judge  whether  a 
specific  item  or  event  has  occurred  before  in  a  particular  con¬ 
text;  the  test  is  implicit  if  the  person  is  to  make  a  judgment,  such 
as  whether  a  string  of  letters  is  a  word  or  nonword,  that  can  be 
made  without  reference  to  earlier  “priming”  events.  In  this  sec¬ 
tion,  we  focus  primarily  on  explicit  episodic  memory. 

Three  types  of  memory  systems  are  customarily  distin¬ 
guished:  sensory  stores,  short-term  memory  (STM;  or  working 
memory),  and  long-term  memory  (LTM).  Sensory  stores,  which 
we  will  not  cover  in  detail,  refer  to  brief  modality-specific  per¬ 
sistence  of  a  sensory  stimulus  from  which  information  can  be  re¬ 
trieved  for  one  or  two  seconds  (see  Nairne,  2003).  STM  and  LTM 
are  the  main  categories  by  which  investigations  of  episodic 
memory  are  classified,  and  as  the  terms  imply,  the  distinction  is 
based  primarily  on  duration.  The  dominant  view  is  that  these 
are  distinct  systems  that  operate  according  to  different  princi¬ 
ples,  but  there  has  been  debate  over  whether  the  processes  in¬ 
volved  in  these  two  types  of  memories  are  the  same  or  different. 
A  recent  fMRI  study  by  Talmi,  Grady,  Goshen-Gottstein,  and 
Moscovitch  (2005)  found  that  recognition  of  early  items  in  the 
list was accompanied by activation of areas in the brain associated
with LTM, whereas recognition of recent items was not,
supporting  a  distinction  between  STM  and  LTM  stores. 

Short-Term  (Working)  Memory 

STM  refers  to  representations  that  are  currently  being  used  or 
have  recently  been  used  and  last  for  a  short  duration.  A  distin¬ 
guishing  characteristic  is  that  STM  is  of  limited  capacity.  This 
point was emphasized in G. A. Miller’s (1956) classic article,
“The Magical Number Seven, Plus or Minus Two,” in which he
indicated  that  capacity  is  not  simply  a  function  of  the  number 
of  items,  but  rather  the  number  of  “chunks.”  For  example,  “i,” 
“b,”  and  “m”  are  three  letters,  but  most  people  can  combine 
them to form one meaningful chunk: “IBM.” Consequently,
memory span, counted in chunks, is similar for strings of unrelated
letters and strings of meaningful acronyms or words. Researchers refer to
the  number  of  items  that  can  be  recalled  correctly,  in  order,  as 
“memory  span.” 
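The difference between counting items and counting chunks can be shown with a small sketch. It is an illustration of the bookkeeping only, not a model of human memory: the function name, the greedy longest-match rule, and the set of familiar acronyms are all assumptions made for the example.

```python
def split_into_chunks(letters, known_chunks):
    """Greedily split a letter string into familiar chunks.

    At each position the longest familiar chunk is taken if one matches;
    otherwise the single letter counts as its own chunk.
    """
    longest = max((len(chunk) for chunk in known_chunks), default=1)
    chunks = []
    i = 0
    while i < len(letters):
        for size in range(min(longest, len(letters) - i), 0, -1):
            piece = letters[i:i + size]
            if size == 1 or piece in known_chunks:
                chunks.append(piece)
                i += size
                break
    return chunks


# Hypothetical set of acronyms that a reader is assumed to know.
familiar = {"IBM", "FBI", "CIA"}

print(split_into_chunks("IBMFBICIA", familiar))  # three familiar chunks
print(split_into_chunks("IBMFBICIA", set()))     # nine single letters
```

The same nine letters impose a load of three chunks on someone who knows the acronyms and nine chunks on someone who does not, which is the sense in which capacity depends on chunks rather than on items.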

As  most  people  are  aware  from  personal  experience,  if  dis¬ 
tracted  by  another  activity,  information  in  STM  can  be  forgot¬ 
ten  quickly.  With  respect  to  HCI,  Oulasvirta  and  Saariluoma 
(2004)  noted  that  diversion  of  attention  from  the  current  task  to 
a  competing  task  is  a  common  occurrence,  for  example,  when 
an  unrequested  pop-up  dialogue  box  requiring  an  action  ap¬ 
pears  on  a  computer  screen.  Laboratory  experiments  have 
shown  that  recall  of  a  string  of  letters  that  is  within  the  mem¬ 
ory  span  decreases  to  close  to  chance  levels  over  a  retention 
interval of 18 s when rehearsal is prevented by an unrelated distractor
task (Brown, 1958; Peterson & Peterson, 1959). This
short-term  forgetting  was  thought  initially  to  be  a  consequence 
of  decay  of  the  memory  trace  due  to  prevention  of  rehearsal. 
However,  Keppel  and  Underwood  (1962)  showed  that  proactive 
interference  from  items  on  previous  lists  is  a  significant  con¬ 
tributor  to  forgetting.  They  found  no  forgetting  at  long  reten¬ 
tion  intervals  when  only  the  first  list  in  a  series  was  examined, 
with  the  amount  of  forgetting  being  much  larger  for  the  sec¬ 
ond  and  third  lists  as  proactive  interference  built  up.  Consis¬ 
tent  with  this  interpretation,  “release”  from  proactive  inhibi¬ 
tion — that  is,  improved  recall — occurs  when  the  category  of  the 
to-be-remembered  items  on  the  current  list  differs  from  that  of 
previous  lists  (D.  D.  Wickens,  1970). 

The  capacity  limitation  of  STM  noted  by  G.  A.  Miller  (1956) 
is  closely  related  to  the  need  to  rehearse  the  items.  Research  has 
shown  that  the  memory  span,  the  number  of  words  that  can  be 
recalled  correctly  in  order,  varies  as  a  function  of  word  length. 
That  is,  the  number  of  items  that  can  be  retained  decreases  as 
word  length  increases.  Evidence  has  indicated  that  the  capacity 
is  the  number  of  syllables  that  can  be  said  in  about  two  seconds 
(Baddeley, Thomson, & Buchanan, 1975; Schweickert & Boruff,
1986).  That  pronunciation  rate  is  critical  suggests  a  time-based 
property  of  STM,  which  is  consistent  with  a  decay  account.  Con¬ 
sequently,  the  most  widely  accepted  view  is  that  both  interfer¬ 
ence  and  decay  contribute  to  short-term  forgetting,  with  decay 
acting  over  the  first  few  seconds  and  interference  accounting  for 
the  largest  part  of  the  forgetting. 
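The word-length relation can be worked through numerically. The sketch below assumes, following the two-second figure above, that span in words is roughly the number of syllables that can be articulated in the time window divided by the syllables per word. The function name and the articulation rate are hypothetical values chosen for illustration.

```python
def predicted_span(syllables_per_second, syllables_per_word, window_s=2.0):
    """Rough memory span in words: syllables that fit in the time
    window divided by the number of syllables in each word."""
    if syllables_per_second <= 0 or syllables_per_word <= 0:
        raise ValueError("rates and word lengths must be positive")
    return syllables_per_second * window_s / syllables_per_word


# Hypothetical articulation rate of 3 syllables per second.
RATE = 3.0

for word_length in (1, 2, 3):
    print(word_length, predicted_span(RATE, word_length))
```

With these assumed values the predicted span falls from six one-syllable words to two three-syllable words, which is the direction of the word-length effect described above.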

As  the  complexity  of  an  HCI  task  increases,  one  consequence 
is  the  overloading  of  STM.  Jacko  and  Ward  (1996)  varied  four  dif¬ 
ferent  determinants  of  task  complexity  (multiple  paths,  multiple 
outcomes,  conflicting  interdependence  among  paths,  or  uncer¬ 
tain  or  probabilistic  linkages)  in  a  task  requiring  use  of  a  hierar¬ 
chical  menu  to  acquire  specified  information.  When  one  deter¬ 
minant  was  present,  performance  was  slowed  by  approximately 
50%,  and  when  two  determinants  were  present  in  combination, 
performance  was  slowed  further.  That  is,  as  the  number  of  com¬ 
plexity  determinants  in  the  interface  increased,  performance 
decreased.  Jacko  and  Ward  attributed  the  decrease  in  perfor¬ 
mance  for  all  four  determinants  to  the  increased  STM  load  they 
imposed. 


The  best-known  model  of  STM  is  Baddeley  and  Hitch’s 
(1974)  working  memory  model,  which  partitions  STM  into 
three main parts: central executive, phonological loop, and visuospatial
sketchpad. The central executive controls and coordinates
the actions of the phonological loop and visuospatial
sketchpad.  The  phonological  loop  is  composed  of  a  phonolog¬ 
ical  store  that  is  responsible  for  storage  of  the  to-be-remem¬ 
bered  items,  and  an  articulatory  control  process  that  is  respon¬ 
sible  for  recoding  verbal  items  into  a  phonological  form  and 
rehearsal  of  those  items.  The  items  stored  in  the  phonological 
store  decay  over  a  short  interval  and  can  be  refreshed  through 
rehearsal  from  the  articulatory  control  process.  The  visuospatial 
sketchpad retains visual and spatial information, and it is
involved in mental imagery.

The  working  memory  model  has  been  successful  in  explain¬ 
ing  several  phenomena  of  STM  (Baddeley,  2003) — for  example, 
that  the  number  of  words  that  can  be  recalled  is  affected  by 
word  length.  However,  the  model  cannot  explain  why  memory 
span  for  visually  presented  material  is  only  slightly  reduced 
when  subjects  engage  in  concurrent  articulatory  suppression 
(such  as  saying  the  word  “the”  aloud  repeatedly).  Articulatory 
suppression  should  monopolize  the  phonological  loop,  pre¬ 
venting  any  visual  items  from  entering  it.  To  account  for  such 
findings,  Baddeley  revised  the  working  memory  model  to  in¬ 
clude  an  episodic  buffer  (see  Fig.  2.3).  The  buffer  is  a  limited- 
capacity  temporary  store  that  can  integrate  information  from 
the  phonological  loop,  visuospatial  sketchpad,  and  long-term 
memory.  By  attending  to  a  given  source  of  information  in  the 
episodic  buffer,  the  central  executive  can  create  new  cognitive 
representations  that  might  be  useful  in  problem  solving. 

Long-Term  Memory 

LTM  refers  to  representations  that  can  be  remembered  for  du¬ 
rations  longer  than  can  be  attributed  to  STM.  LTM  can  involve 
information presented from minutes to years ago. Initially, it was
thought that the probability of an item being encoded into LTM
was a direct function of the amount of time that it was in STM,
or how much it was rehearsed. However, Craik and Watkins
(1973) showed that rehearsal in itself is not sufficient, but rather
that deep-level processing of the meaning of the material is the
important factor in transferring items to LTM. They presented
subjects with a list of words and instructed them that when the
experimenter stopped the presentation, they were to recall the
last word starting with the letter “a.” The number of other words
between instances of “a” words was varied, with the idea that
the amount of time a word was rehearsed would depend on the
number of words before the next “a” word. At the end of the
session, subjects were given a surprise test in which they were
to recall all “a” words. There was no effect of number of intervening
words on recall, suggesting that although subjects rehearsed
the words longer, their recall did not improve because
the words were not processed deeply.

FIGURE 2.3. Baddeley's (2000) revised working memory model.
From “The episodic buffer: A new component of working memory?”
by A. D. Baddeley, 2000, Trends in Cognitive Sciences, 4, p. 421.
Copyright 2000 by Elsevier Science Ltd. with permission.

Craik  and  Watkins’  (1973)  results  were  consistent  with  the 
levels  of  processing  framework  proposed  by  Craik  and  Lock¬ 
hart  (1972).  According  to  this  view,  encoding  proceeds  in  a  se¬ 
ries  of  analyses,  from  shallow  perceptual  features  to  deeper, 
semantic  levels.  The  deeper  the  level  of  processing,  the  more 
strongly  the  item  is  encoded  in  memory.  A  key  study  support¬ 
ing  the  levels  of  processing  view  is  that  of  Hyde  and  Jenkins 
(1973).  In  their  study,  groups  of  subjects  were  presented  a  list 
of  words  for  which  they  engaged  in  shallow  processing  (e.g., 
deciding  whether  each  word  contained  a  capital  letter)  or 
deep  processing  (e.g.,  identifying  whether  each  word  was  a 
verb  or  a  noun).  Subjects  were  not  told  in  advance  that  they 
would  be  asked  to  recall  the  words,  but  were  given  a  surprise 
recall  test  at  the  end  of  the  session.  The  results  showed  that 
the  deep  processing  group  recalled  more  words  than  the  shal¬ 
low  processing  group.  Of  direct  relevance  to  HCI,  Oulasvirta, 
Kärkkäinen, and Laarni (2005) recently found that participants
who  viewed  the  content  area  of  a  web  page  had  no  better 
memory  for  the  locations  of  content  objects  than  did  a  con¬ 
trol  group  who  guessed  where  those  objects  would  be  placed, 
because  the  participants’  task  was  to  locate  navigational  ob¬ 
jects  on  the  page  and  not  to  process  the  content  information. 

Another  well-known  principle  for  LTM  is  encoding  specificity, 
which  states  that  the  probability  that  a  retrieval  cue  results  in 
recollection  of  an  earlier  event  is  an  increasing  function  of  the 
match  between  the  features  encoded  initially  and  those  pro¬ 
vided  by  the  retrieval  cue  (Tulving  &  Thomson,  1973).  An  impli¬ 
cation  of  this  principle  is  that  memory  will  be  context  dependent. 
Godden  and  Baddeley  (1975)  demonstrated  a  context-depen¬ 
dent  memory  effect  by  having  divers  learn  a  list  of  words  on 
land  or  under  water,  and  recall  the  words  on  land  or  under  wa¬ 
ter.  Recall  was  higher  for  the  group  who  learned  on  land  when 
the  test  took  place  on  land  than  under  water,  and  vice  versa  for 
the  group  who  learned  under  water.  A  related  principle  is  that  of 
transfer-appropriate  processing  (Morris,  Bransford,  &  Franks, 
1977).  Morris  et  al.  showed  that  deep-level  semantic  judgments 
during  study  produced  better  performance  than  shallow  rhyme 
judgments  on  a  standard  recognition  memory  test.  However, 
when  the  memory  test  required  decisions  about  whether  the 
test  words  rhymed  with  studied  words,  the  rhyme  judgments 
led to better performance than the semantic judgments. Healy,
Wohldmann, and Bourne (2005) have proposed that encoding
specificity and transfer-appropriate processing can be incorporated
within the single principle of procedural reinstatement:
Retention  will  be  evident  to  the  extent  that  the  procedures  en¬ 
gaged  in  during  study  or  training  are  reinstated  during  the  re¬ 
tention  test. 

Research  has  confirmed  that  the  levels-of-processing  frame¬ 
work  must  accommodate  the  effects  of  the  retention  context, 
as  captured  by  the  above  principles,  to  explain  the  effects  of 
processing  performed  during  encoding.  Although  levels-of- 
processing  has  a  strong  effect  on  accuracy  of  explicit  recall  and 
recognition,  Jacoby  and  Dallas  (1981)  found  no  effect  on  an  im¬ 
plicit  memory  test.  Later  studies  have  shown  a  robust  effect  of 
levels-of-processing  on  implicit  tests  similar  to  that  obtained  for 
recall  and  recognition  if  the  test  is  based  on  conceptual  cues, 
rather  than  on  perceptual  cues  (see  Challis,  Velichkovsky,  & 
Craik,  1996).  Challis  et  al.  constructed  direct  recognition  tests, 
in which the words were graphemically, phonologically, or semantically
similar to the studied words, that showed no levels of
processing  effect.  They  emphasized  that,  to  account  for  levels- 
of-processing  results,  it  is  necessary  to  specify  the  types  of  in¬ 
formation  produced  by  the  levels-of-processing,  the  types  of 
information  required  for  the  specific  test,  and  how  task  instruc¬ 
tions  modify  encoding  and  retrieval  processes. 

Other  Factors  Affecting  Retrieval  of  Earlier  Events 

Memory  researchers  have  studied  many  factors  that  influence 
long-term  retention.  Not  surprisingly,  episodic  memory  im¬ 
proves  with  repetition  of  items  or  events.  Also,  massed  repeti¬ 
tion  (repeating  the  same  item  in  a  row)  is  less  effective  than 
spaced  repetition  (repeating  the  same  item  with  one  or  more 
intervening  items).  This  benefit  for  spaced  repetition,  called 
the  “spacing  effect”  or  “lag  effect,”  is  often  attributed  to  two 
main  factors.  First,  study  time  for  the  same  items  appearing  in 
succession  is  less  than  study  time  for  the  same  items  appear¬ 
ing  further  apart.  Second,  when  the  items  are  studied  over  a 
longer  period  of  time,  there  is  an  opportunity  for  the  items  to 
be  associated  with  different  cues  that  can  aid  recall  later.  The 
spacing  or  lag  effect  is  widespread  and  occurs  for  both  recall 
and  recognition  (Hintzman,  1974).  Bahrick  and  Hall  (2005) 
noted  that  a  similar  spacing  benefit  is  found  for  learning  lists 
of  items  when  practice  sessions,  each  with  test  and  learning 
phases,  are  separated  by  several  days.  They  presented  evidence 
that  a  large  part  of  the  spacing  benefit  in  this  case  arises  from 
individuals  determining  which  study  strategies  are  more  effec¬ 
tive  at  promoting  long-term  retention  and  then  using  those 
strategies  more. 

Another  widely  studied  phenomenon  is  the  generation  ef¬ 
fect,  in  which  recall  is  better  when  subjects  have  to  generate  the 
to-be-remembered  words  rather  than  just  studying  the  words  as 
they  are  presented  (Slamecka  &  Graf,  1978).  In  a  generation  ef¬ 
fect  experiment,  subjects  are  divided  into  two  groups,  read  and 
generate.  Each  group  receives  a  series  of  words,  with  each  word 
spelled  out  completely  for  the  read  group  and  missing  letters 
for  the  generate  group.  An  example  is  as  follows: 

Read  group:  CAT;  ELEPHANT;  GRAPE;  CAKE 

Generate group: C_T; E_E_H_NT; G_APE; CAK_

The typical results show that subjects in the generate group
can  recall  more  words  than  those  in  the  read  group.  One  appli¬ 
cation  of  the  generation  effect  to  HCI  is  that  when  a  computer 
user  needs  a  password  for  an  account,  the  system  should  allow 
the  user  to  generate  the  password  rather  than  providing  her 
with  one  because  the  user  would  be  more  likely  to  recall  the 
generated  password.  The  common  method  of  proactive  pass¬ 
word  generation,  in  which  users  are  asked  to  generate  a  pass¬ 
word  that  meets  certain  restrictions  (e.g.,  contain  an  uppercase 
letter,  a  lowercase  letter,  a  digit,  etc.),  is  intended  to  result  in 
more  memorable  and  secure  passwords  (e.g.,  Vu,  Bhargav,  & 
Proctor,  2003). 
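A proactive check of this kind can be sketched as follows. The restrictions on uppercase letters, lowercase letters, and digits come from the example above; the minimum length, the function name, and the wording of the messages are assumptions added for illustration and do not describe any particular system.

```python
def unmet_restrictions(password, min_length=8):
    """Return a list of the restrictions that a user-generated
    password fails to meet (an empty list means it is accepted)."""
    problems = []
    if len(password) < min_length:  # minimum length is an assumed rule
        problems.append("too short")
    if not any(ch.isupper() for ch in password):
        problems.append("no uppercase letter")
    if not any(ch.islower() for ch in password):
        problems.append("no lowercase letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("no digit")
    return problems


print(unmet_restrictions("password"))
print(unmet_restrictions("Tr4ctor-beam"))
```

Returning the list of unmet restrictions, rather than a bare yes or no, lets the system tell the user what to change while the password remains one that the user generated.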

Events  that  precede  or  follow  an  event  of  interest  can  inter¬ 
fere  with  recall  of  that  event.  The  former  is  referred  to  as  “proac¬ 
tive  interference,”  and  was  discussed  in  the  section  on  STM; 
the  latter  is  referred  to  as  “retroactive  interference.”  One  area  of 
research  in  which  retroactive  interference  is  of  central  concern 
is  that  of  eyewitness  testimony.  Loftus  and  Palmer  (1974) 
showed  that  subsequent  events  could  distort  a  person’s  mem¬ 
ory  of  an  event  that  the  person  witnessed.  Subjects  were  shown 
a  sequence  of  events  depicting  a  car  accident.  Subsequently, 
they were asked the question, “How fast were the cars going
when they _____ each other?” When the verb “contacted” was
used, subjects estimated the speed to be 32 miles per hour, and
only  one-tenth  of  them  reported  seeing  broken  glass.  However, 
when  the  verb  “smashed”  was  used,  the  estimated  speed  in¬ 
creased  to  41  miles  per  hour,  and  almost  one  third  of  the  sub¬ 
jects  reported  seeing  broken  glass.  Demonstrations  like  these 
indicate  not  only  that  retroactive  interference  can  cause  forget¬ 
ting  of  events,  but  that  it  also  can  cause  the  memory  of  events 
to  be  changed.  More  recent  research  has  shown  that  com¬ 
pletely  false  memories  can  be  implanted  (see  Roediger  &  Mc¬ 
Dermott,  1995). 

Mnemonic  techniques  can  also  be  used  to  improve  recall. 
The  basic  idea  behind  mnemonics  is  to  connect  the  to-be- 
remembered  material  with  an  established  organizational  struc¬ 
ture  that  can  be  easily  accessible  later  on.  Two  widely  used 
mnemonic  techniques  are  the  pegword  method  (Wood  &  Pratt, 
1987)  and  the  method  of  loci  (Verhaeghen  &  Marcoen,  1996). 
In the pegword method, a familiar rhyme provides the organizational
structure. A visual image is formed between each pegword
in the rhyme and the associated target item. At recall, the
rhyme  is  generated,  and  the  associated  items  come  to  mind. 
For  the  method  of  loci,  locations  from  a  well-known  place,  such 
as  your  house,  are  associated  with  the  to-be-remembered 
items.  Although  specific  mnemonic  techniques  are  limited  in 
their  usefulness,  the  basic  ideas  behind  them  (utilizing  imagery, 
forming  meaningful  associations,  and  using  consistent  encod¬ 
ing  and  retrieval  strategies)  are  of  broad  value  for  improving 
memory performance.

Vu, Tai, Bhargav, Schultz, and Proctor (2004) examined the effectiveness
of a “first-letter” mnemonic technique to help users
relate individual characters of a password to a structured sentence
in order to aid recall at a later time. In one condition, Vu
et  al.  had  users  generate  a  sentence  and  take  the  first  letter  of 
each  word  in  the  sentence  to  form  a  password;  in  another  con¬ 
dition,  users  generated  a  sentence  that  also  included  numbers 
and special characters embedded into the sentence and resulting
password. Passwords generated using the first-letter technique
were more memorable when users did not have to embed
a digit and special character into the sentence, but were
more  secure  (i.e.,  more  resistant  to  cracking)  when  the  sen¬ 
tence  and  resulting  password  included  the  digit  and  special 
character.  Thus,  when  it  comes  to  memory  and  security  of  com¬ 
puter  passwords,  there  seems  to  be  a  tradeoff  between  memo¬ 
rability  and  security. 
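
The first-letter technique described above can be sketched in a few lines of Python. This is only an illustration: the function name and both example sentences are hypothetical and are not taken from Vu et al. (2004). The second sentence shows how embedding digits and a special character in the sentence carries them into the password.

```python
def first_letter_password(sentence: str) -> str:
    """Build a password from the first character of each word in a sentence."""
    # split() with no arguments breaks on any run of whitespace
    return "".join(word[0] for word in sentence.split())


# Hypothetical sentences, chosen only for illustration
plain = first_letter_password("My dog Rex chases the red ball daily")
mixed = first_letter_password("My dog Rex chases 2 red balls & 1 stick daily")

print(plain)  # letters only
print(mixed)  # letters plus the embedded digits and special character
```

The second password draws on a larger character set, which is consistent with the reported gain in resistance to cracking, but the user must now also remember where the digits and the symbol fall in the sentence.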


ATTENTION  IN  INFORMATION  PROCESSING 


Attention  is  increased  awareness  directed  at  a  particular  event 
or  action  to  select  it  for  increased  processing.  This  processing 
may  result  in  enhanced  understanding  of  the  event,  improved 
performance  of  an  action,  or  better  memory  for  the  event.  At¬ 
tention  allows  us  to  filter  out  unnecessary  information  so  that 
we  can  focus  on  a  particular  aspect  that  is  relevant  to  our  goals. 
Several  significant  information-processing  models  of  attention 
have  been  proposed. 

Models  of  Attention 

In an influential study, Cherry (1953) presented subjects with
different messages to each ear through headphones. Subjects
were to repeat aloud one of the two messages while ignoring
the other. When subsequently asked questions about the two
messages,  subjects  were  able  to  accurately  describe  the  message 
to  which  they  were  attending  but  could  not  describe  anything 
except  physical  characteristics,  such  as  gender  of  the  speaker, 
about  the  unattended  message. 

To  account  for  such  findings,  Broadbent  (1958)  developed 
the  filter  theory,  which  assumes  that  the  nervous  system  acts  as 
a  single-channel  processor.  According  to  filter  theory,  informa¬ 
tion  is  received  in  a  preattentive  temporary  store  and  then  is 
selectively  filtered,  based  on  physical  features  such  as  spatial 
location, to allow only one input to access the channel. Broadbent’s
filter theory implies that the meaning of unattended messages
is not identified, but later studies showed that the unattended
message could be processed beyond the physical level,
in  at  least  some  cases  (Treisman,  1964). 

To  accommodate  the  finding  that  meaning  of  an  unattended 
message  can  influence  performance,  Treisman  (1964)  reformu¬ 
lated  filter  theory  into  what  is  called  the  “filter-attenuation  the¬ 
ory.”  According  to  attenuation  theory,  early  selection  by  filtering 
still  precedes  stimulus  identification,  but  the  filter  only  attenu¬ 
ates  the  information  on  unattended  channels.  This  attenuated 
signal  may  be  sufficient  to  allow  identification  if  the  stimulus  is 
one  with  a  low  identification  threshold,  such  as  a  person’s  name 
or  an  expected  event.  Deutsch  and  Deutsch  (1963)  proposed 
that  unattended  stimuli  are  always  identified  and  the  bottleneck 
occurs  in  later  processing,  a  view  called  “late-selection  theory.” 
The difference between attenuation theory and late-selection
theory is that the latter assumes that meaning is fully analyzed,
but the former does not.

Lavie,  Hirst,  de  Fockert,  and  Viding  (2004)  have  proposed  a 
load theory of attention, which they claim “resolves the long-standing
early versus late selection debate” (p. 339). Specifically,
the load theory includes two selective attention mechanisms: a
perceptual  selection  mechanism  and  a  cognitive  control  mech¬ 
anism.  When  perceptual  load  is  high  (i.e.,  great  demands  are 
placed  on  the  perceptual  system),  the  perceptual  mechanism 
excludes  irrelevant  stimuli  from  being  processed.  When  mem¬ 
ory  load  is  high,  it  is  not  possible  to  suppress  irrelevant  infor¬ 
mation  at  a  cognitive  level.  In  support  of  load  theory,  Lavie  et  al. 
showed  that  interference  from  distracting  stimuli  is  reduced  un¬ 
der  conditions  of  high  perceptual  load  but  increased  under  con¬ 
ditions  of  high  working  memory  load. 

In  divided  attention  tasks,  a  person  must  attend  to  multiple 
sources  of  information  simultaneously.  Kahneman  (1973)  pro¬ 
posed  a  unitary  resource  model  that  views  attention  as  a  single 
resource  that  can  be  divided  up  among  different  tasks  in  differ¬ 
ent  amounts,  based  on  task  demands  and  voluntary  allocation 
strategies.  Unitary  resource  models  provided  the  impetus  for 
dual-task  methodologies  (such  as  performance  operating  char¬ 
acteristics)  and  mental  workload  analyses  that  are  used  widely 
in  HCI  (Eberts,  1994).  The  expectation  is  that  multiple  tasks 
should  produce  interference  when  their  resource  demands  ex¬ 
ceed  the  supply  that  is  available. 

Many  studies  have  shown  that  it  is  easier  to  perform  two  tasks 
together  when  they  use  different  stimulus  or  response  modali¬ 
ties  than  when  they  use  the  same  modalities.  Performance  is  also 
better  when  one  task  is  verbal  and  the  other  visuospatial  than 
when  they  are  the  same  type.  These  result  patterns  provide  the 
basis  for  multiple  resource  models  of  attention  such  as  that  of 
C.  D.  Wickens  (1984).  According  to  multiple  resource  models, 
different  attentional  resources  exist  for  different  sensory-motor 
modalities  and  coding  domains.  Multiple  resource  theory  cap¬ 
tures  the  fact  that  multiple  task  performance  typically  is  better 
when  the  tasks  use  different  input-output  modes  than  when  they 
use  the  same  modes.  However,  it  is  often  criticized  as  being  too 
flexible  because  new  resources  can  be  proposed  arbitrarily  to  fit 
any  finding  of  specificity  of  interference  (Navon,  1984). 

A  widely  used  metaphor  for  visual  attention  is  that  of  a  spot¬ 
light  that  is  presumed  to  direct  attention  to  everything  in  its 
field  (Posner  &  Cohen,  1984).  Direction  of  attention  is  not  nec¬ 
essarily  the  same  as  the  direction  of  gaze  because  the  atten¬ 
tional  spotlight  can  be  directed  independently  of  fixation.  Stud¬ 
ies  show  that  when  a  location  is  cued  as  likely  to  contain  a  target 
stimulus,  but  then  a  probe  stimulus  is  presented  at  another  lo¬ 
cation,  a  spatial  gradient  surrounds  the  attended  location  such 
that  items  nearer  to  the  focus  of  attention  are  processed  more 
efficiently  than  those  farther  away  from  it  (Yantis,  2000).  The 
movement  of  the  attentional  spotlight  to  a  location  can  be  trig¬ 
gered  by  two  types  of  cues,  exogenous  and  endogenous.  An  ex¬ 
ogenous  cue  is  an  external  event  such  as  the  abrupt  onset  of  a 
stimulus  at  a  peripheral  location  that  involuntarily  draws  the  at¬ 
tentional  spotlight  to  its  location.  Exogenous  cues  produce 
rapid  performance  benefits,  which  dissipate  quickly,  for  stimuli 
presented  at  the  cued  location.  This  is  followed  by  a  period  in 
which  performance  is  worse  for  stimuli  at  the  cued  location 
than  for  ones  presented  at  the  uncued  location,  a  phenome¬ 
non  called  “inhibition  of  return”  (Posner  &  Cohen,  1984).  An  en¬ 
dogenous  cue  is  typically  a  symbol  such  as  a  central  arrowhead 
that  must  be  identified  before  a  voluntary  shift  in  attention  to 
the  designated  location  can  be  made.  The  performance  benefits 
for endogenous cues take longer to develop and are sustained
for a longer period of time when the cues are relevant, indicating
that their benefits are due to conscious control of the attentional
spotlight (Klein & Shore, 2000).

In  a  visual  search  task,  subjects  are  to  detect  whether  a  target 
is  present  among  distractors.  Treisman  and  Gelade  (1980)  de¬ 
veloped  Feature  Integration  Theory  to  explain  the  results  from 
visual  search  studies.  When  the  target  is  distinguished  from  the 
distractors  by  a  basic  feature  such  as  color  (feature  search),  RT 
and  error  rate  often  show  little  increase  as  the  number  of  dis¬ 
tractors  increases.  However,  when  two  or  more  features  must 
be  combined  to  distinguish  the  target  from  distractors  (con¬ 
junctive  search),  RT  and  error  rate  typically  increase  sharply  as 
the  number  of  distractors  increases.  To  account  for  these  re¬ 
sults,  feature  integration  theory  assumes  that  basic  features  of 
stimuli  are  encoded  into  feature  maps  in  parallel  across  the  vi¬ 
sual  field  at  a  preattentive  stage.  Feature  search  can  be  based  on 
this  preattentive  stage  because  a  “target-present”  response  re¬ 
quires  only  detection  of  the  feature.  The  second  stage  involves 
focusing  attention  on  a  specific  location  and  combining  features 
that  occupy  the  location  into  objects.  Attention  is  required  for 
conjunctive  search  because  responses  cannot  be  based  on  de¬ 
tection  of  a  single  feature.  According  to  feature  integration  the¬ 
ory,  performance  in  conjunctive  search  tasks  decreases  as  the 
number  of  distractors  increases  because  attention  must  be 
moved  sequentially  across  the  search  field  until  a  target  is  de¬ 
tected  or  all  items  present  have  been  searched.  Feature  inte¬ 
gration  theory  served  to  generate  a  large  amount  of  research  on 
visual  search  that  showed,  as  is  typically  the  case,  that  the  situa¬ 
tion  is  not  as  simple  as  depicted  by  the  theory.  This  has  resulted 
in  modifications  of  the  theory,  as  well  as  alternative  theories.  For 
example,  Wolfe’s  (1994)  Guided  Search  Theory  maintains  the 
distinction  between  an  initial  stage  of  feature  maps  and  a  sec¬ 
ond  stage  of  attentional  binding,  but  assumes  that  the  second 
stage  is  guided  by  the  initial  feature  analysis. 
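
The contrast between feature search and conjunctive search can be illustrated with a toy calculation. The sketch below assumes a strictly serial, self-terminating scan for conjunctive search, with the target equally likely to be at any position, so that on average (n + 1) / 2 items are inspected when the target is present and all n when it is absent. The base time and per-item time are hypothetical values, not estimates from the literature.

```python
def expected_inspections(display_size: int, target_present: bool) -> float:
    """Mean number of items inspected in a serial, self-terminating search."""
    if display_size < 1:
        raise ValueError("display_size must be at least 1")
    if target_present:
        # Target is equally likely at each position: mean of 1..n
        return (display_size + 1) / 2
    # Target absent: every item must be checked before responding
    return float(display_size)


def predicted_rt(display_size: int, target_present: bool, search_type: str,
                 base_ms: float = 400.0, per_item_ms: float = 50.0) -> float:
    """Toy RT prediction; base_ms and per_item_ms are hypothetical."""
    if search_type == "feature":
        # Preattentive, parallel detection: no cost per added distractor
        return base_ms
    if search_type == "conjunctive":
        return base_ms + per_item_ms * expected_inspections(display_size, target_present)
    raise ValueError("search_type must be 'feature' or 'conjunctive'")


for n in (5, 9, 17):
    print(n, predicted_rt(n, True, "feature"), predicted_rt(n, True, "conjunctive"))
```

Under these assumptions the slope for target-absent trials is twice that for target-present trials, a pattern often taken as a signature of serial self-terminating search.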

In  HCI,  a  common  visual  search  task  involves  locating  menu 
items.  When  users  know  exactly  what  option  to  search  for,  they 
can  use  identity  matching,  in  which  they  search  the  display  for 
the  menu  name  that  they  want  to  find.  Perlman  (1984)  sug¬ 
gested  that  when  identity  search  is  used,  the  menu  options 
should  be  displayed  in  alphabetical  order  to  facilitate  search. 
When  users  do  not  know  where  an  option  is  included  within  a 
main  list  of  menus,  inclusion  matching  is  used.  The  users  must 
decide  in  which  group  the  specific  option  would  be  categorized 
and  then  search  the  list  of  items  within  that  group.  With  inclu¬ 
sion  matching,  search  times  may  be  longer  for  items  that  can 
be  classified  in  more  than  one  of  the  main  groupings  or  when 
the  items  are  less  well-known  examples  of  a  main  grouping 
(Somberg  &  Picardi,  1983).  Equivalence  search  occurs  when  the 
user  knows  what  option  to  select,  but  does  not  know  how  that 
option  is  labeled.  McDonald,  Stone,  and  Liebelt  (1983)  showed 
that  alphabetical  and  categorical  organizations  yield  shorter 
search  times  than  randomized  organization  for  equivalence 
search.  Search  can  also  be  affected  by  the  breadth  versus  depth 
of  the  menu  design.  Lee  and  MacGregor  (1985)  showed  that 
deep  hierarchies  are  preferred  over  broad  ones.  However,  more 
recently,  Tullis,  Catani,  Chadwick-Dias,  and  Cianchette  (2005) 
suggested  that,  for  complex  or  ambiguous  situations,  there  is  a 
benefit for broad menu designs because they facilitate comparison
between categories. The main point is that, when structuring
menus, designers must take into account the type of
search in which the user would most likely be engaged.

The  role  of  attention  in  response  selection  has  been  investi¬ 
gated  extensively  using  the  psychological  refractory  period 
(PRP) paradigm (Pashler, 1998). In the PRP paradigm, a pair of
choice-reaction tasks must be performed, and the second stimulus
is presented at varying stimulus onset asynchronies (SOAs)
relative to the first. RT for Task 2 is slowed at short SOAs, and this
phenomenon  is  called  the  “PRP  effect.”  The  experimental  re¬ 
sults  have  been  interpreted  with  what  is  called  “locus  of  slack 
logic”  (Schweickert,  1978),  which  is  an  extension  of  additive  fac¬ 
tors  logic  to  dual-task  performance.  The  basic  idea  is  that  if  a 
Task  2  variable  has  its  effect  prior  to  a  bottleneck,  that  variable 
will  have  an  underadditive  interaction  with  SOA.  This  under¬ 
additivity  occurs  because,  at  short  SOAs,  the  slack  period  during 
which  post-bottleneck  processing  cannot  begin  can  be  used  for 
continued  processing  for  the  more  difficult  condition.  If  a  Task 
2  variable  has  its  effect  after  the  bottleneck,  the  effect  will  be  ad¬ 
ditive  with  SOA. 
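
Locus of slack logic can be made concrete with a small simulation of a single central bottleneck. The sketch below assumes three strictly serial stages per task (a pre-bottleneck stage, a bottleneck stage, and a post-bottleneck stage) and that the Task 2 bottleneck stage cannot begin until the Task 1 bottleneck stage has finished. The stage names and all durations are hypothetical.

```python
def task2_rt(soa: float, a1: float = 100, b1: float = 150,
             a2: float = 100, b2: float = 150, c2: float = 100) -> float:
    """Task 2 RT (ms, measured from Task 2 stimulus onset) under a central bottleneck.

    a = pre-bottleneck stage, b = bottleneck stage, c = post-bottleneck stage.
    All times are hypothetical; Task 1 stimulus onset is time zero.
    """
    bottleneck_free = a1 + b1       # Task 1 releases the bottleneck
    pre_stage_done = soa + a2       # Task 2 pre-bottleneck processing ends
    b2_start = max(bottleneck_free, pre_stage_done)
    return b2_start + b2 + c2 - soa


for soa in (50, 800):
    baseline = task2_rt(soa)
    pre_effect = task2_rt(soa, a2=150) - baseline    # harder pre-bottleneck stage
    post_effect = task2_rt(soa, b2=200) - baseline   # harder stage at/after the bottleneck
    print(soa, baseline, pre_effect, post_effect)
```

With these values, lengthening the pre-bottleneck stage by 50 ms adds nothing to RT at the short SOA because the extra processing is absorbed into the slack, but adds the full 50 ms at the long SOA (an underadditive interaction). Lengthening the later stage adds 50 ms at both SOAs (an additive effect).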

The  most  widely  accepted  account  of  the  PRP  effect  is  the 
response-selection  bottleneck  model  (Pashler,  1998).  The  pri¬ 
mary  evidence  for  this  model  is  that  perceptual  variables  typi¬ 
cally  have  underadditive  interactions  with  SOA,  implying  that 
their  effects  are  prior  to  the  bottleneck.  In  contrast,  post-per¬ 
ceptual  variables  typically  have  additive  effects  with  SOA,  im¬ 
plying  that  their  effects  are  after  the  bottleneck.  There  has  been 
dispute  as  to  whether  there  is  also  a  bottleneck  at  the  later  stage 
of response initiation (De Jong, 1993); whether the response-selection
bottleneck is better characterized as a parallel processor
of limited capacity that divides resources among to-be-performed
tasks (Tombu & Jolicoeur, 2005); and whether the
apparent  response-selection  bottleneck  is  structural,  or  simply 
a  strategy  adopted  by  subjects  to  comply  with  task  instructions 
(Meyer  &  Kieras,  1997).  This  latter  approach  is  consistent  with 
a  recent  emphasis  on  the  executive  functions  of  attention  in  the 
coordination  and  control  of  cognitive  processes  (Monsell  & 
Driver,  2000). 

Automaticity  and  Practice 

Attention  demands  are  high  when  a  person  first  performs  a  new 
task.  However,  these  demands  decrease  and  performance  im¬ 
proves  as  the  task  is  practiced.  Because  the  quality  of  perfor¬ 
mance  and  attentional  requirements  change  substantially  as  a 
function  of  practice,  it  is  customary  to  describe  performance  as 
progressing  from  an  initial  cognitively  demanding  phase  to  a 
phase  in  which  processing  is  automatic  (Anderson,  1982;  Fitts  & 
Posner,  1967). 

The  time  to  perform  virtually  any  task  from  choice  RT  to 
solving  geometry  problems  decreases  with  practice,  with  the 
largest benefits occurring early in practice. Newell and Rosenbloom
(1981) proposed a power function to describe the
changes  in  RT  with  practice: 

$$RT = B N^{-\alpha}$$

where $N$ is the number of practice trials, $B$ is RT on the first trial,
and $\alpha$ is the learning rate. Although the power function has become
widely accepted as a law that describes the changes in RT,
Heathcote,  Brown,  and  Mewhort  (2000)  indicated  that  it  does 
not  fit  the  functions  for  individual  performers  adequately.  They 
showed  that  exponential  functions  provided  better  fits  than 
power  functions  to  40  individual  datasets,  and  proposed  a  new 
exponential  law  of  practice.  The  defining  characteristic  of  the 
exponential  function  is  that  the  relative  learning  rate  is  a  con¬ 
stant  at  all  levels  of  practice,  whereas,  for  the  power  function, 
the  relative  learning  rate  is  a  hyperbolically  decreasing  function 
of  practice  trials. 
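
The difference in relative learning rate between the two functions can be checked numerically. The sketch below uses simplified forms with no asymptote term, RT = B N^(-a) for the power function and RT = B exp(-a N) for the exponential function; these simplified forms and the parameter values are assumptions made for illustration, not the fitted models of Heathcote et al. (2000). The relative learning rate is taken as the negative of the derivative of RT with respect to N, divided by RT.

```python
import math


def power_rt(n: float, b: float, alpha: float) -> float:
    """Power function of practice: RT = B * N ** (-alpha)."""
    return b * n ** (-alpha)


def exponential_rt(n: float, b: float, alpha: float) -> float:
    """Exponential function of practice (no asymptote): RT = B * exp(-alpha * N)."""
    return b * math.exp(-alpha * n)


def relative_learning_rate(rt_func, n: float, b: float, alpha: float,
                           h: float = 1e-6) -> float:
    """Estimate -(dRT/dN) / RT with a central difference."""
    slope = (rt_func(n + h, b, alpha) - rt_func(n - h, b, alpha)) / (2 * h)
    return -slope / rt_func(n, b, alpha)


# Hypothetical parameter values
B, ALPHA = 1000.0, 0.5
for n in (1, 10, 100):
    print(n,
          round(relative_learning_rate(power_rt, n, B, ALPHA), 4),
          round(relative_learning_rate(exponential_rt, n, B, ALPHA), 4))
```

For the power function the estimate falls off as alpha / N, whereas for the exponential function it stays at alpha for every value of N.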


PROBLEM  SOLVING  AND  DECISION  MAKING 


Beginning  with  the  work  of  Newell  and  Simon  (1972),  it  has 
been  customary  to  analyze  problem  solving  in  terms  of  a  prob¬ 
lem  space.  The  problem  space  consists  of  the  following:  (a)  an 
initial  state,  (b)  a  goal  state  that  is  to  be  achieved,  (c)  operators 
for  transforming  the  problem  from  the  initial  state  to  the  goal 
state  in  a  sequence  of  steps,  and  (d)  constraints  on  application 
of  the  operators  that  must  be  satisfied.  The  problem-solving 
process  itself  is  conceived  of  as  a  search  for  a  path  that  connects 
the  initial  and  goal  states. 

Because  the  size  of  a  problem  space  increases  exponentially 
with  the  complexity  of  the  problem,  most  problem  spaces  are 
well  beyond  the  capacity  of  short-term  memory.  Consequently, 
for  problem  solving  to  be  effective,  search  must  be  constrained 
to  a  limited  number  of  possible  solutions.  A  common  way  to 
constrain  search  is  to  use  heuristics.  For  example,  people  often 
use  a  means-ends  heuristic  for  which,  at  each  step,  an  operator 
is  chosen  that  will  move  the  current  state  closer  to  the  goal  state 
(Atwood & Polson, 1976). Such heuristics are called “weak
methods”  because  they  do  not  require  much  knowledge  about 
the  exact  problem  domain.  Strong  methods,  such  as  those  used 
by  experts,  rely  on  prior  domain-specific  knowledge  and  do  not 
require  much  search  because  they  are  based  on  established 
principles  applicable  only  to  certain  tasks. 
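
A problem space and a simple difference-reduction heuristic of the means-ends kind can be sketched as follows. The toy problem (integer states, three arithmetic operators, and a range constraint) is hypothetical and serves only to show the four components listed above: initial state, goal state, operators, and constraints.

```python
from typing import Callable, List, Optional, Tuple

Operator = Tuple[str, Callable[[int], int]]


def means_ends_search(initial: int, goal: int, operators: List[Operator],
                      is_allowed: Callable[[int], bool],
                      max_steps: int = 100) -> Optional[List[int]]:
    """Greedy search: at each step apply the operator that most reduces the
    distance to the goal. Returns the path of states, or None if stuck."""
    path = [initial]
    state = initial
    for _ in range(max_steps):
        if state == goal:
            return path
        # Apply every operator, then drop states that violate the constraints
        candidates = [apply(state) for _, apply in operators]
        candidates = [s for s in candidates if is_allowed(s)]
        if not candidates:
            return None
        best = min(candidates, key=lambda s: abs(goal - s))
        if abs(goal - best) >= abs(goal - state):
            return None  # no operator moves closer: the heuristic is stuck
        state = best
        path.append(state)
    return path if state == goal else None


operators: List[Operator] = [
    ("add 1", lambda s: s + 1),
    ("add 3", lambda s: s + 3),
    ("double", lambda s: s * 2),
]
print(means_ends_search(3, 10, operators, lambda s: 0 <= s <= 20))
```

Because the heuristic never accepts a step that moves away from the goal, it searches very little of the problem space, but it fails on problems whose solution requires a temporary detour.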

The  problem  space  must  be  an  appropriate  representation 
of  the  problem  if  the  problem  is  to  be  solved.  One  important 
method  for  obtaining  an  appropriate  problem  space  is  to  use 
analogy  or  metaphor.  Analogy  enables  a  shift  from  a  problem 
space  that  is  inadequate  to  one  that  may  allow  the  goal  state  to 
be  reached.  There  are  several  steps  in  using  analogies  (Holland, 
Holyoak,  Nisbett,  &  Thagard,  1986),  including  detecting  simi¬ 
larity  between  source  and  target  problems,  and  mapping  the 
corresponding  elements  of  the  problems.  Humans  are  good  at 
mapping  the  problems,  but  poor  at  detecting  that  one  problem 
is  an  analog  of  another.  An  implication  for  HCI  is  that  potential 
analogs  should  be  provided  to  users  for  situations  in  which  they 
are  confronted  by  novel  problems. 

The  concept  of  mental  model,  which  is  closely  related  to  that 
of  the  problem  space,  has  become  widely  used  in  recent  years 
(see  chapter  3,  this  volume).  The  general  idea  of  mental  models 
with  respect  to  HCI  is  that  as  the  user  interacts  with  the  com¬ 
puter,  he  or  she  receives  feedback  from  the  system  that  allows 
him  or  her  to  develop  a  representation  of  how  the  system  is 
functioning  for  a  given  task.  The  mental  model  incorporates  the 
goals of the user, the actions taken to complete the goals, and
expectations of the system’s output in response to the actions.
A  designer  can  increase  the  usability  of  an  interface  by  using 
metaphors  that  allow  transfer  of  an  appropriate  mental  model 
(e.g.,  the  desktop  metaphor),  designing  the  interface  to  be  con¬ 
sistent  with  other  interfaces  with  which  the  user  is  familiar  (e.g., 
the  standard  web  interface),  and  conveying  the  system’s  func¬ 
tions  to  the  user  in  a  clear  and  accurate  manner.  Feedback  to 
the  user  is  perhaps  the  most  effective  way  to  communicate  in¬ 
formation  to  the  user  and  can  be  used  to  guide  the  user’s  men¬ 
tal  model  about  the  system. 

Humans  often  have  to  make  choices  regarding  situations  in 
which  the  outcome  depends  on  events  that  are  outside  of  those 
humans’  control.  According  to  expected  utility  theory,  a  nor¬ 
mative  theory  of  decision  making  under  uncertainty,  the  deci¬ 
sion  maker  should  determine  the  expected  utility  of  a  choice 
by multiplying the subjective utility of each outcome by the outcome’s
probability and summing the resulting values (see Proctor
& Van Zandt, 1994). The expected utility should be computed
for each choice, and the optimal decision is the choice
with  the  highest  expected  utility.  It  should  be  clear  from  this 
description  that  for  all  but  the  simplest  of  problems,  a  human 
decision  maker  cannot  operate  in  this  manner.  To  do  so  would 
require  attending  to  multiple  cues  that  exceed  attentional  ca¬ 
pacity,  accurate  estimates  of  probabilities  of  various  events,  and 
maintenance  of,  and  operation  on,  large  amounts  of  informa¬ 
tion  that  exceed  short-term  memory  capacity. 
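
The expected utility computation described above can be written out for a small case. The two choices, their probabilities, and their utilities in this sketch are hypothetical.

```python
from typing import Dict, List, Tuple

Outcome = Tuple[float, float]  # (probability, subjective utility)


def expected_utility(outcomes: List[Outcome]) -> float:
    """Sum of probability * utility over the outcomes of one choice."""
    return sum(p * u for p, u in outcomes)


def best_choice(choices: Dict[str, List[Outcome]]) -> str:
    """Return the name of the choice with the highest expected utility."""
    return max(choices, key=lambda name: expected_utility(choices[name]))


# Hypothetical decision problem
choices = {
    "gamble": [(0.5, 100.0), (0.5, 0.0)],
    "sure thing": [(1.0, 40.0)],
}
for name, outcomes in choices.items():
    print(name, expected_utility(outcomes))
print("normative choice:", best_choice(choices))
```

Even in this two-choice case the decision maker needs accurate probabilities and utilities for every outcome; the number of terms grows quickly as choices and outcomes are added.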

Research  of  Kahneman  and  Tversky  (2000)  and  others  has 
shown  that  what  people  do  when  the  outcome  associated  with 
a  choice  is  uncertain  is  to  rely  heavily  on  decision-making 
heuristics.  These  heuristics  include  representativeness,  avail¬ 
ability,  and  anchoring.  The  representativeness  heuristic  is  that 
the  probability  of  an  instance  being  a  member  of  a  particular  cat¬ 
egory  is  judged  on  the  basis  of  how  representative  the  instance 
is  of  the  category.  The  major  limitation  of  the  representativeness 
heuristic  is  that  it  ignores  base  rate  probabilities  for  the  respec¬ 
tive  categories.  The  availability  heuristic  involves  determining 
the  probability  of  an  event  based  on  the  ease  with  which  in¬ 
stances  of  the  event  can  be  retrieved.  The  limitation  is  that  avail¬ 
ability  is  affected  not  only  by  relative  frequency,  but  also  by 
other  factors.  The  anchoring  heuristic  involves  making  a  judg¬ 
ment  regarding  probabilities  of  alternative  states  based  on  initial 
information,  and  then  adjusting  these  probabilities  from  this  ini¬ 
tial  “anchor”  as  additional  information  is  received.  The  limita¬ 
tion  of  anchoring  is  that  the  initial  judgment  can  produce  a  bias 
for  the  probabilities.  Although  heuristics  are  useful,  they  may 
not  always  lead  to  the  most  favorable  decision.  Consequently, 
designers  need  to  make  sure  that  the  choice  desired  for  the 
user  in  a  particular  situation  is  one  that  is  consistent  with  the 
user’s  heuristic  biases. 
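
The base-rate limitation of the representativeness heuristic can be illustrated with Bayes' rule. In this sketch, the probabilities are hypothetical: an instance looks highly representative of a category (it shows the typical features with probability 0.8 if it belongs and 0.2 if it does not), but the category itself is rare.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """P(category | evidence) by Bayes' rule.

    prior            = base rate of the category
    hit_rate         = P(evidence | member of category)
    false_alarm_rate = P(evidence | not a member)
    """
    evidence = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / evidence


# Same evidence, different base rates (all values hypothetical)
print(posterior(prior=0.5, hit_rate=0.8, false_alarm_rate=0.2))
print(posterior(prior=0.1, hit_rate=0.8, false_alarm_rate=0.2))
```

A judgment based on representativeness alone would treat the two cases alike, whereas the normative probability is far lower when the base rate is low.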


NEW  DEVELOPMENTS  AND 
ALTERNATIVE  APPROACHES 


It  should  not  be  surprising  that  since  its  ascendance,  the  hu¬ 
man  information-processing  approach  has  been  challenged  on 
many grounds, and alternative approaches have been championed.
We describe three of those approaches below. In advocating
one, Smith and Henning (2005) recently stated, “The basic
scientific problem with information processing models that
have  come  to  dominate  cognitive  science  in  the  past  four 
decades  is  that  they  are  non-refutable”  (p.  641).  Such  an  argu¬ 
ment  misses  the  point  that  although  the  theoretical  framework 
itself  is  non-refutable,  as  with  any  other  such  framework,  the 
specific  models  and  theories  developed  within  the  framework 
are  refutable.  The  issue  is  the  relative  success  of  the  informa¬ 
tion-processing  approach  at  providing  solutions  to  issues  of 
theoretical  and  applied  importance.  Pashler  (1998),  who  took 
an  information-processing  approach  to  the  study  of  attention, 
noted,  “The  success  or  failure  of  information-processing  psy¬ 
chology  can  be  assessed  only  on  the  basis  of  insights  that  do 
or  do  not  emerge  from  the  research  it  spawns”  (p.  7).  It  is  the 
assessment  of  many  researchers  that  the  information-process¬ 
ing  approach  has  generally  fared  well  on  many  counts  in  com¬ 
parison  to  alternative  approaches. 

Cognitive  Neuroscience 

A  quick  scan  of  journals  and  job  announcements  would  reveal 
that  cognitive  neuroscience  is  a  rapidly  emerging  field.  As  noted 
earlier,  the  major  goal  of  research  in  this  area  is  to  analyze  the 
brain  mechanisms  responsible  for  cognitive  functioning.  The 
knowledge  gained  from  behavioral  studies  of  human  informa¬ 
tion  processing  provides  the  foundation  for  much  of  this  re¬ 
search.  This  point  is  acknowledged  explicitly  in  the  instructions 
for authors for the Journal of Cognitive Neuroscience, which
state,  “The  Journal  publishes  papers  that  bridge  the  gap  be¬ 
tween  descriptions  of  information  processing  and  specifications 
of  brain  activity”  (MIT  Press,  2007).  Note  that  there  would  not 
be  a  gap  to  bridge  were  it  not  for  the  success  of  research  on  hu¬ 
man  information  processing.  If  this  gap  is  bridged  successfully, 
the  advances  in  neuroergonomics  and  augmented  cognition  en¬ 
visioned  by  Parasuraman  (2003)  and  Schmorrow  (2005)  will  be¬ 
come  reality. 

The  Ecological  Approach 

Gibson  (1979)  developed  an  ecological  approach  to  perception 
and  action  that  is  typically  presented  as  an  alternative  to  the 
information-processing  approach  (e.g.,  Schvaneveldt,  2005). 
The  ecological  approach  places  emphasis  on  analyzing  the 
information  that  is  available  in  the  optic  array  and  the  dynam¬ 
ics  of  this  information  as  individuals  and  objects  move  in  the  en¬ 
vironment.  It  also  takes  the  position  that  perception  is  direct 
and  that  accounts  relying  on  mental  representations  and  cogni¬ 
tive  processes  are  unnecessary  and  incorrect  (see,  for  example, 
Michaels  &  Stins,  1997).  Much  of  HCI  involves  performing 
information-intensive  tasks  in  artificial,  human-made  environ¬ 
ments  that  do  not  resemble  carrying  out  actions  in  the  natural 
environment.  For  example,  Aziz  and  Macredie  (2005)  wrote, 
“The use of web-based information system[s] mainly involves
processing  of  information”  (p.  1).  Because  such  environments 
are  similar  to  those  studied  in  basic  human  information-pro¬ 
cessing  experiments,  the  approach  has  much  to  offer  in  analyz¬ 
ing  and  understanding  the  tasks  performed  in  them. 




For  HCI  tasks  of  a  more  ecological  nature,  such  as  navigat¬ 
ing  in  a  virtual  world,  understanding  the  information  available  to 
an  observer/actor  in  the  environment  being  simulated  is  clearly 
essential.  As  an  example,  Schvaneveldt  (2005)  noted  that  he  and 
his  colleagues  have  focused  their  recent  studies  of  aviation  on 
analyses  of  the  information  in  the  flight  environment  that  allows 
pilots  to  perform  effectively  during  different  stages  of  flight. 
From  an  information-processing  perspective,  although  such 
ecological  analyses  are  valuable,  and  indeed  necessary,  they 
must  be  incorporated  within  information-processing  models 
(e.g.,  Ullman,  1980). 

Cybernetic  Approach 

The  cybernetic  view  of  cognition  is  that  cognition  emerges  as  a 
consequence  of  motor  control  over  sensory  feedback.  The  cy¬ 
bernetic  approach  is  closely  related  to  the  ecological  approach, 
but places greater emphasis on self-regulated control of perception
and cognition. Smith and Henning (2005), who advocated
a cybernetic approach, took a strong position against the
information-processing approach, stating:

The  cybernetic  perspective  presented  here  differs  in  a  number  of  fun¬ 
damental  scientific  respects  from  an  information  processing  model. 
Behavioral  cybernetics  .  .  .  emphasizes  active  control  of  information  as 
sensory  feedback  via  motor-sensory  behavior,  with  motor-sensory  be¬ 
havior  mediating  both  perception  and  cognition.  In  contrast,  infor¬ 
mation  processing  models  treat  information  as  a  fixed  commodity  pre¬ 
sented  on  the  input  side  of  the  processing  system,  ignoring  the  need 
for  specific  design  factors  to  promote  human  motor  control  over  this 
information  as  a  source  of  sensory  feedback.  This  failure  to  include 
motor-sensory  control  is  a  direct  consequence  of  using  an  information 
processing  model,  where  overt/motor  behavior  is  viewed  as  a  system 
output  with  no  central  role  in  the  organization  and  control  over  in¬ 
formation  as  feedback,  nor  over  the  control  of  subsequent  behavior 
through  the  reciprocal  effects  of  motor  control  on  psychophysiological
state.  (p.  641)

Smith  and  Henning’s  (2005)  statement  stands  in  contrast  to 
Young  et  al.’s  (2004)  assessment,  described  earlier  in  the  chapter, 
that  an  information-processing  analysis  is  essential  to  under¬ 
standing  the  dynamic  interactions  of  an  operator  with  a  vehicle 
for  purposes  of  computer-aided  augmented  cognition.  Although 
Smith  and  Henning  may  be  correct  that  greater  emphasis  should 
be  placed  on  the  role  of  sensory  feedback  produced  by  actions, 
they  have  mistakenly  attributed  a  simplifying  tactic  used  by  researchers
to  fundamental  assumptions  underlying  the  human  information-processing
approach.  Recollect  that  human  information  processing  has  its  roots  in
control  theory.

Situated  Cognition 

Another  approach  that  has  been  advocated  recently  is  that  of  sit¬ 
uated  cognition,  or  situated  action.  Kiekel  and  Cooke  (2005) 
stated  that,  according  to  situated  cognition  theories,  “[m]uch  of
what  symbolic  [information-processing]  theorists  assign  to  the 
individual’s  head  takes  place  outside  of  the  confines  of  the  in¬ 
dividual  and  is  directed  by  a  world  that  is  a  much  richer  place 
than  symbolic  theories  tend  to  suggest”  (p.  92).  They  go  on  to 
say,  “Information  processing  research  tends  to  isolate  psycho¬ 
logical  principles  from  a  generic  context,  such  as  a  laboratory 
.  .  .,  while  SA  [situated  action]  research  focuses  on  understand¬ 
ing  the  specific  contextual  constraints  of  the  environment”  (pp. 
92-93).  The  extent  to  which  the  principles  and  theories  developed
from  information-processing  research  conducted  in  the
laboratory  generalize  to  other  contexts  is  an  empirical  issue.  Al¬ 
though  the  evidence  is  not  all  in,  the  widespread  application  of 
information-processing  principles  and  theories  to  human  fac¬ 
tors  and  HCI  (e.g.,  Wickens,  Lee,  Liu,  &  Becker,  2004)  suggests 
that  what  has  been  learned  about  human  information  process¬ 
ing  is  applicable  to  a  variety  of  domains. 


SUMMARY  AND  CONCLUSION 


The  methods,  theories,  and  models  in  human  information  pro¬ 
cessing  are  currently  well  developed.  The  knowledge  in  this 
area,  which  we  are  only  able  to  describe  at  a  surface  level  in  this 
chapter,  is  relevant  to  a  wide  range  of  concerns  in  HCI,  from  vi¬ 
sual  display  design  to  representation  and  communication  of 
knowledge.  For  HCI  to  be  effective,  the  interaction  must  be 
made  compatible  with  the  human  information-processing  ca¬ 
pabilities.  Cognitive  architectures  that  incorporate  many  of  the 
facts  about  human  information  processing  have  been  devel¬ 
oped  that  can  be  applied  to  HCI.  The  Model  Human  Processor 
of  Card  et  al.  (1983)  is  the  most  widely  known,  but  applica¬ 
tions  of  other  more  recent  architectures,  including  the  ACT 
model  of  Anderson  and  colleagues  (Anderson,  Matessa,  &  Le- 
biere,  1997),  the  SOAR  Model  of  Newell  and  colleagues  (Howes 
&  Young,  1997),  and  the  EPIC  Model  of  Kieras  and  Meyer  (1997), 
hold  considerable  promise  for  the  field,  as  demonstrated  in 
chapter  5  of  this  volume. 

The  plan  for  this  chapter  is  as  follows.  It  begins  by  reviewing  and
discussing  the  term  “mental  models”  as  it  has  been  used  in  the 
literature  on  human-computer  interaction  (HCI),  and  in  the 
neighboring  disciplines  of  cognitive  psychology  where  it  was 
first  coined.  There  is  little  consensus  on  what  exactly  is  and  is 
not  a  mental  model,  and  yet  it  is  too  widely  used  for  any  post  hoc
attempt  at  a  narrower  definition  to  somehow  cleanse  the  field. 
In  consequence,  I  characterize  several  layers  of  theoretical  com¬ 
mitment  that  the  term  may  embrace,  following  an  earlier  dis¬ 
cussion  (Payne,  2003).  To  illustrate  the  argument,  several  clas¬ 
sic  and  more  recent  studies  from  the  HCI  literature  will  be 
reviewed,  with  pointers  to  others.  This  first  part  of  the  chapter 
is  based  on  material  published  in  Payne  (2003). 

In  cognitive  psychology,  mental  models  have  major  currency 
in  two  sub-disciplines — text  comprehension  and  reasoning,  al¬ 
though  in  the  former  they  more  often  currently  go  by  the  name 
“situation  models.”  Discussion  in  the  latter  focuses  on  quite  re¬ 
fined  theoretical  disputes  that  currently  have  little  relevance 
for  HCI.  The  work  on  text  comprehension,  however,  is  ger¬ 
mane.  With  the  advent  of  the  Web,  the  comprehension  of  text  of 
various  kinds  has  become  a  dominant  mode  of  HCI,  with  im¬ 
portant  design  issues  for  Websites,  digital  libraries,  and  so  on. 
Interaction  with  text  is  in  some  ways  a  paradigm  for  interaction 
with  information.  With  these  points  in  mind,  the  concept  of 
mental  models  in  text  comprehension  will  be  discussed,  with  a 
particular  eye  to  the  issues  that  HCI  accentuates,  such  as  un¬ 
derstanding  multiple  texts. 

Two  of  the  major  practical  questions  raised  by  mental  models
are  (a)  how  are  they  acquired  and  (b)  how  can  their  acquisition  be
supported  by  instruction?  The  third  section  of  this  chapter  will
discuss  two  angles  on  these  questions  in  HCI:  first,  the  use  of 
interactive  computation  and  multimedia  as  an  instructional 
method;  second,  the  important  tension  between  exploration 
and  instruction,  first  systematically  discussed  in  the  HCI  litera¬ 
ture  by  Carroll’s  (1990)  work  on  minimalism. 

Finally,  the  chapter  will  review  some  recent  work  on  the  importance
of  mental  models  for  understanding  aspects  of  collaborative
teamwork.  This  area  suggests  that  a  relatively  expansive
view  of  human  knowledge  representations  may  be  necessary  for 
progress  in  HCI. 

Throughout  the  chapter,  a  particular  approach  is  taken  to  re¬ 
view:  to  choose  one  or  two  key  studies  and  report  them  in  some 
detail.  I  hope  that  this  will  allow  some  of  the  empirical  method¬ 
ologies  and  the  rich  variation  in  these  to  be  conveyed.  The  cho¬ 
sen  studies  will  be  accompanied  by  some  further  references  to 
the  literature,  but  there  are  too  many  subtopics  reviewed  to  aim 
for  completeness. 


WHAT  IS  A  MENTAL  MODEL? 


The  user’s  mental  model  of  the  device  is  one  of  the  more 
widely  discussed  theoretical  constructs  in  HCI.  Alongside  a  wide-ranging
research  literature,  even  commercial  style  guides  have
appealed  to  mental  models  for  guidance  (e.g.,  Mayhew,  1992;
Tognazzini,  1992;  Apple  Human  Interface  Guidelines,  Apple
Computer  Inc.,  1987).

Yet  a  casual  inspection  of  the  HCI  literature  reveals  that  mental
models  are  used  to  label  many  different  aspects  of  users’
knowledge  about  the  systems  they  use.  Nevertheless,  I  propose
that  even  this  simple  core  construct  (what  users  know  and  believe
about  the  systems  they  use)  is  worth  highlighting  and
promoting.  It  is  more  distinctive  than  it  might  first  seem,  espe¬ 
cially  in  comparison  with  other  cognitive-science  approaches. 
Further,  beyond  the  core  idea  there  is  a  progression  of  stronger 
theoretical  commitments  that  have  been  mobilized  by  the  men¬ 
tal  models  label,  each  of  which  speaks  to  important  issues  in 
HCI  research,  if  not  yet  in  practice. 

The  fundamental  idea  is  that  the  contents  of  people’s  knowl¬ 
edge,  including  their  theories  and  beliefs,  can  be  an  important 
explanatory  concept  for  understanding  users’  behavior  in  rela¬ 
tion  to  systems.  This  idea  may  seem  obvious  and  straightfor¬ 
ward,  but  in  fact  it  suggests  research  questions  that  go  against 
the  grain  of  most  contemporary  cognitive  psychology,  which 
has  concerned  itself  much  more  with  the  general  limits  of  the 
human-information-processing  system,  such  as  the  constraints 
on  attention,  retrieval,  and  processing.  Thus,  cognitive  psychol¬ 
ogy  tends  to  focus  on  the  structure  of  the  mind,  rather  than  its 
contents.  (The  major  exception  to  the  rule  that  cognitive  psy¬ 
chology  has  been  obsessed  with  architecture  over  content  is  the 
work  on  expertise,  and  even  here,  recent  work  has  focused  on 
explanations  of  extreme  performance  in  terms  of  general  independent
variables  such  as  “motivated  practice”  (e.g.,  Ericsson,
Krampe,  &  Tesch-Römer,  1993),  rather  than  epistemological
analysis.) 

Refocusing  attention  on  mental  content  about  particular  do¬ 
mains  is  what  made  mental  models  a  popular  idea  in  the  early 
1980s  (see,  e.g.,  the  papers  in  Gentner  and  Stevens,  1983).  For
example,  work  on  naive  physics  (e.g.,  McCloskey,  1983)  attempts
to  explain  people’s  reasoning  about  the  physical  world,  not  in
terms  of  working  memory  limits  or  particular  representations,
but  in  terms  of  their  beliefs  about  the  world,  such  as  the  nature
of  their  theories  of  mechanics  or  electricity.  This  focus
on  people’s  knowledge,  theories,  and  beliefs  about  particular
domains  transfers  naturally  to  questions  in  HCI,  where
practical  interest  may  focus  on  how  users  conceive  the  workings
of  a  particular  device,  how  their  beliefs  shape  their  interactive
behavior,  and  what  lessons  may  be  drawn  for  design.

In  this  mold,  consider  a  very  simple  study  of  my  own  (Payne, 
1991).  Students  were  interviewed  about  ATMs.  Following  Collins 
and  Gentner  (1987)  among  others,  “what  if”  questions  were 
posed  to  uncover  students’  theories  about  the  design  and  function
of  ATMs.  For  example,  students  were  asked  whether  machines
sometimes  took  longer  to  process  their  interactions;
what  information  was  stored  on  the  plastic  card;  and  what 
would  happen  if  they  “typed  ahead”  without  waiting  for  the  next 
machine  prompt. 

The  interviews  uncovered  a  wide  variety  in  students’  beliefs
about  the  design  of  ATMs.  For  example,  some  assumed  that  the 
plastic  card  was  written  to  as  well  as  read  from  during  transac¬ 
tions,  and  thus  could  encode  the  current  balance  of  their  ac¬ 
count.  Others  assumed  that  the  only  information  on  the  card 
was  the  user’s  personal  identification  number,  allowing  the  ma¬ 
chine  to  check  the  identity  of  the  user  (as  it  turns  out,  both 
these  beliefs  are  incorrect).  A  conclusion  from  this  simple  ob¬ 
servation  is  that  users  of  machines  are  eager  to  form  explana¬ 
tory  models  and  will  readily  go  beyond  available  data  to  infer 
models  that  are  consistent  with  their  experiences.  (One  might 
wonder  whether  such  explanations  were  not  merely  ad  hoc,
prompted  during  the  interview:  in  fact  some  were,  but  explicit 
linguistic  cues — such  as  “I’ve  always  thought” — strongly  sug¬ 
gested  that  many  were  not.) 

Another  observation  concerning  students’  “models”  of  ATMs
was  that  they  were  fragmentary,  perhaps  more  fragmentary  than 
the  term  “model”  might  ordinarily  connote:  they  were  collec¬ 
tions  of  beliefs  about  parts  of  the  system,  processes,  or  behav¬ 
iors,  rather  than  unified  models  of  the  whole  design.  Students 
would  happily  recruit  an  analogy  to  explain  one  part  of  the  ma¬ 
chine’s  operation  that  bore  no  relation  to  the  rest  of  the  system. 
This  fragmentary  character  of  mental  models  of  complex  systems
may  be  an  important  aspect  (see,  e.g.,  Norman,  1983),
allowing  partial  understandings  to  be  maintained.  One  impli¬ 
cation  is  that  users’  mental  models  of  single  processes  or  oper¬ 
ations  might  be  a  worthwhile  topic  for  study  and  practical  in¬ 
tervention  (in  design  or  instruction). 

One  widely  held  belief  about  a  particular  process  affected 
the  students’  behavior  as  users.  Almost  all  respondents  believed 
that  it  was  not  possible  to  type  ahead  during  machine  pauses.  At 
the  time  the  study  was  conducted  this  was  true  for  some,  but 
not  all,  designs  in  use.  Consequently,  in  some  cases  transactions 
were  presumably  being  needlessly  slowed  because  of  an  aspect 
of  users’  mental  models. 

A  more  recent  study  of  a  similar  kind  is  an  investigation  of 
users’  models  of  the  navigation  facilities  provided  by  Internet 
browsers  (Cockburn  &  Jones,  1996).  Internet  browsers,  like  In¬ 
ternet  Explorer,  maintain  history  lists  of  recently  visited  pages, 
providing  direct  access  to  these  pages  without  needing  to  enter 
the  URL  or  follow  a  hyperlink.  The  “back”  and  “forward”  buttons 
provide  a  very  frequently  used  mechanism  for  browsing  history 
lists,  but  do  users  have  good  mental  models  for  how  they  work? 
Cockburn  and  Jones  (1996)  showed  that  many  do  not. 

The  history  list  of  visited  pages  can  be  thought  of  as  a  stack: 
a  simple  last-in-first-out  data  structure  to  which  elements  can  be 
added  (pushed)  or  taken  out  (popped)  only  from  the  top  (consider
a  stack  of  trays  in  a  canteen).  When  a  new  web  page  is  visited
by  following  a  hyperlink,  or  by  entering  a  URL,  its  address
is  pushed  onto  the  top  of  the  stack.  This  is  true  even  if  the  page 
is  already  in  the  history  list,  so  that  the  history  list  may  contain 
more  than  one  copy  of  the  same  page.  However,  when  a  page 
is  visited  by  using  the  Back  button  (or,  at  least  typically,  by 
choosing  from  the  history  list),  the  page  is  not  pushed  onto  the 
stack.  So,  what  happens  when  the  currently  displayed  page  is 
not  at  the  top  of  the  stack  (because  it  has  been  visited  via  the 
history  list)  and  a  new  link  is  followed  (or  a  new  URL  entered)? 
The  answer  is  that  all  the  pages  in  the  history  list  that  were 
above  the  current  page  are  popped  from  the  stack,  and  the 
newly  visited  page  is  pushed  onto  the  stack  in  their  place.  For 
this  reason  the  history  list  does  not  represent  a  complete 
record,  or  time-line  of  visited  pages,  and  not  all  pages  in  the  cur¬ 
rent  browsing  episode  can  be  backed-up  to.  In  Cockburn  and 
Jones’  study,  few  users  appreciated  this  aspect  of  the  device. 
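
The history-list behavior described above can be sketched in a few lines of Python. This is a minimal illustration of the stack account given in the text, not the implementation of any real browser; the class name BrowserHistory and its methods are hypothetical, and the model assumes that Back and Forward simply move a pointer within the list.

```python
class BrowserHistory:
    """Illustrative stack-with-pointer model of a browser history list."""

    def __init__(self, start_page):
        self.pages = [start_page]  # index 0 is the bottom of the stack
        self.current = 0           # index of the page currently displayed

    def visit(self, page):
        """Follow a link or enter a URL: pop everything above the
        current page, then push the new page."""
        del self.pages[self.current + 1:]
        self.pages.append(page)
        self.current = len(self.pages) - 1

    def back(self):
        """Move one step down the stack; nothing is pushed."""
        if self.current > 0:
            self.current -= 1
        return self.pages[self.current]

    def forward(self):
        """Move one step up the stack, if there is anything above."""
        if self.current < len(self.pages) - 1:
            self.current += 1
        return self.pages[self.current]


history = BrowserHistory("A")
history.visit("B")
history.visit("C")
history.back()        # now showing B; C is still above it on the stack
history.back()        # now showing A
history.visit("D")    # B and C are popped, D is pushed in their place
print(history.pages)  # ['A', 'D']
```

After the final visit, pages B and C were seen in this browsing episode but can no longer be reached with the Back button, which is the property that few users in the study appreciated.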

This,  then,  has  been  the  major  thrust  of  work  on  mental
models  in  HCI:  what  do  people  know  and  believe  to  be  true 
about  the  way  the  systems  they  interact  with  are  structured? 
How  do  their  beliefs  affect  their  behavior?  In  this  literature  a 
“mental  model”  is  little  more  than  a  pointer  to  the  relevant 
parts  of  the  user’s  knowledge,  yet  this  is  not  to  deny  its  usefulness.
One  approach  that  it  has  engendered  is  a  typology  of
knowledge — making  groupings  and  distinctions  about  types 
of  knowledge  that  are  relevant  in  certain  circumstances.  It  is 
in  exactly  this  way  that  a  literature  on  “shared  mental  models” 
as  an  explanatory  concept  in  teamwork  has  been  developed. 
This  topic  is  perhaps  the  most  rapidly  growing  area  of  mental 
models  research  in  HCI  and  will  be  reviewed  in  the  final  section 
of  this  chapter. 

However,  as  argued  at  length  in  Payne  (2003),  there  are  ap¬ 
proaches  to  mental  models  in  HCI  that  go  beyond  a  concern 
with  user  knowledge  and  beliefs  to  ask  more  nuanced  theoreti¬ 
cal  questions.  The  first  of  these  is  to  investigate  the  form  of 
mental  models  by  inspecting  the  processes  through  which  men¬ 
tal  models  might  have  their  effects  on  behavior. 

A  powerful  idea  here  is  that  mental  models  of  machines  pro¬ 
vide  a  problem  space  that  allows  more  elaborate  encoding  of  re¬ 
membered  methods,  and  in  which  novice  or  expert  problem 
solvers  can  search  for  new  methods  to  achieve  tasks. 

The  classic  example  of  this  approach  is  the  work  of  Halasz 
and  Moran  (1983)  on  Reverse  Polish  Notation  (RPN)  calculators. 
RPN  is  a  post-fix  notation  for  arithmetic,  so  that  to  express  3  +
4,  one  would  write  3  4  +.  RPN  does  away  with  the  need  for
parentheses  to  disambiguate  composed  operations.  For  example,
(1  +  2)  *  3  can  be  expressed  1  2  +  3  *  with  no  ambiguity.
RPN  calculators  need  a  key  to  act  as  a  separator  between 
operands,  which  is  conventionally  labeled  ENTER,  but  they  do 
not  need  an  =  key,  as  the  current  total  can  be  computed  and 
displayed  whenever  an  operator  is  entered. 

Halasz  and  Moran  taught  one  group  of  students  how  to  use 
an  RPN  calculator  using  instructions,  like  a  more  elaborate  ver¬ 
sion  of  the  introduction  above,  which  simply  described  the  ap¬ 
propriate  syntax  for  arithmetic  expressions.  A  second  group  of 
subjects  was  instructed,  using  a  diagram,  about  the  stack  model 
that  underlies  RPN  calculation.  Briefly,  when  a  number  is  keyed 
in,  it  is  “pushed”  on  top  of  a  stack-data  structure  (and  the  top 
slot  is  displayed).  The  ENTER  key  copies  the  contents  of  the  top
slot  down  to  the  next  slot.  Any  binary  arithmetic  operation  is 
always  performed  on  the  contents  of  the  top  two  slots  and  leads 
to  the  result  being  in  the  top  slot,  with  the  contents  of  slots  3 
and  below  moving  up  the  stack. 
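
The core of the stack model can be illustrated with a short postfix evaluator. This is a simplified sketch rather than a simulation of any particular calculator: it assumes an unbounded stack, treats whitespace between tokens as the operand separator (the role played by ENTER), and does not model the ENTER key's copying behavior. The function name evaluate_rpn is introduced here for illustration only.

```python
import operator

# Binary operations supported by this illustrative evaluator.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_rpn(expression):
    """Evaluate a whitespace-separated postfix expression with a stack."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            # A binary operation consumes the top two slots and
            # leaves its result in the top slot.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            # A number is pushed onto the top of the stack.
            stack.append(float(token))
    return stack[-1]


print(evaluate_rpn("3 4 +"))      # 7.0, i.e. 3 + 4
print(evaluate_rpn("1 2 + 3 *"))  # 9.0, i.e. (1 + 2) * 3
```

Because every operation acts on whatever currently occupies the top two slots, no parentheses or = key are needed.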

Halasz  and  Moran  discovered  that  the  stack-model  instruc¬ 
tions  made  no  difference  to  participants’  ability  to  solve  rou¬ 
tine  arithmetic  tasks:  the  syntactic  “method-based”  instructions 
sufficed  to  allow  participants  to  transform  the  tasks  into  RPN  notation.
However,  for  more  creative  problems  (such  as  calculating
(6  +  4)  and  (6  +  3)  and  (6  +  2)  while  keying  the  number
6  only  once),  the  stack  group  was  substantially  better.  Verbal
protocols  showed  that  these  subjects  reasoned  about  such 
problems  by  mentally  stepping  through  the  transformations  to 
the  stack  at  each  keystroke. 

This  kind  of  reasoning,  stepping  through  a  sequence  of 
states  in  some  mental  model  of  a  machine,  is  often  called  “men¬ 
tal  simulation”  in  the  mental  models  literature,  and  the  kind  of 
model  that  allows  simulation  is  often  called  a  “surrogate” 
(Young,  1983;  Carroll  &  Olson,  1988).  From  a  practical  stand¬ 
point,  the  key  property  of  this  kind  of  reasoning  is  that  it  re¬ 
sults  in  behavior  that  is  richer  and  more  flexible  than  the  mere 
rote  following  of  learned  methods.  The  idea  that  the  same 
method  may  be  encoded  more  richly,  so  that  it  is  more  flexible 
and  less  prone  to  forgetting  will  be  returned  to  later  in  the  chapter
when  a  theory  of  mental  models  of  interactive  artifacts  is
considered,  and  when  ideas  about  instruction  for  mental  mod¬ 
els  are  reviewed. 

A  second  example  of  mental  models  providing  a  problem 
space  elaboration  of  rote  methods  comes  in  the  work  of  Kieras 
and  Bovair  (1984).  This  research  was  similar  to  that  of  Halasz 
and  Moran  (1983)  in  that  it  compared  the  learning  performance 
of  two  groups:  (a)  one  instructed  with  rote  procedures,  (b)  the 
other  additionally  with  a  diagrammatic  model  of  the  device  on 
which  the  procedures  were  enacted.  In  this  case,  the  device  was 
a  simple  control  panel,  in  which  each  rote  procedure  specified 
a  sequence  of  button-pushes  and  knob-positions  leading  to  a  se¬ 
quence  of  light-illuminations.  The  model  was  a  circuit  diagram 
showing  the  connections  between  power-source  switches  and 
display-lights. 

Kieras  and  Bovair  (1984)  found  that  the  participants  in¬ 
structed  with  the  model  learned  the  procedures  faster,  retained 
the  procedures  more  accurately,  executed  the  procedures 
faster,  and  could  simplify  inefficient  procedures  that  contained 
redundant  switch  settings.  They  argued  that  this  was  because 
the  model  (circuit  diagram)  explained  the  contingencies  in  the 
rote-action  sequences  (e.g.,  if  a  switch  is  set  to  MA,  so  that  the
main  accumulator  circuit  is  selected,  then  the  FM,  fire  main, 
button  must  be  used). 

A  related  theoretical  idea  is  that  mental  models  are  a  special 
kind  of  representation,  sometimes  called  an  analog  represen¬ 
tation:  one  that  shares  the  structure  of  the  world  it  represents. 
This  was  taken  as  the  definitional  property  of  mental  models 
by  the  modern  originator  of  the  term,  the  British  psychologist 
Kenneth  Craik  (1943).  It  is  this  intuition  that  encourages  the  use 
of  terms  like  “mental  simulation” — the  intuition  that  a  mental 
model  is  like  a  physical  model,  approximating  the  structure  of 
what  it  represents,  just  as  a  model  train  incorporates  (aspects 
of)  the  physical  structure  of  a  train. 

The  idea  that  mental  models  are  analog  in  this  sense  is  a  de¬ 
finitional  property  in  the  work  on  reasoning  and  comprehen¬ 
sion  by  Johnson-Laird  (Johnson-Laird,  1983,  1989;  this  will  be 
further  discussed  in  part  2,  concerning  representations  of  text) 
and  also  in  the  theory  of  Holland,  Holyoak,  Nisbett,  and  Tha- 
gard  (1986)  and  Moray  (1999).  However,  there  are  different  nu¬ 
ances  to  the  claim,  which  must  be  considered.  And,  in  addition, 
there  is  a  vexed  question  to  be  asked;  namely,  what  is  the  ex¬ 
planatory  or  predictive  force  of  a  commitment  to  analog  repre¬ 
sentational  form?  Is  there  any  reason  for  HCI  researchers  to  pay 
attention  to  theoretical  questions  at  this  level? 

Certainly,  this  is  the  view  of  Moray  (1999)  who  is  concerned 
with  mental  models  of  complex  dynamic  systems,  such  as  in¬ 
dustrial  plants.  He  proposes  that  models  of  such  systems  are 
structure-sharing  homomorphisms  rather  than  isomorphisms, 
i.e.,  they  are  many-to-one  rather  than  one-to-one  mappings  of
objects,  properties,  and  relations.  (In  this  he  follows  Holland,
Holyoak,  Nisbett,  &  Thagard,  1986.)

Homomorphic  models  of  dynamic  systems  may  not  share 
structure  with  the  system  at  the  level  of  static  relations,  but  only 
at  the  level  of  state-changes.  Thus,  such  models  have  the  char¬ 
acter  of  state-transition  diagrams,  making  the  empirical  conse¬ 
quences  of  structure  sharing  somewhat  unclear,  because  any 
problem  space  can  be  represented  in  this  way. 
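
A small sketch may help make the idea of a many-to-one mapping that preserves state changes concrete. The heater system, its four states, and the two-state model below are hypothetical examples invented for illustration; they are not taken from Moray (1999) or Holland et al. (1986). The check confirms that taking a step in the detailed system and then mapping to the model gives the same result as mapping first and then taking the step in the model.

```python
# Hypothetical detailed system: (state, action) -> next state.
SYSTEM = {
    ("off", "switch_on"): "warming",
    ("warming", "switch_on"): "hot",
    ("hot", "switch_on"): "hot",
    ("cooling", "switch_on"): "warming",
    ("off", "switch_off"): "off",
    ("warming", "switch_off"): "cooling",
    ("hot", "switch_off"): "cooling",
    ("cooling", "switch_off"): "off",
}

# Hypothetical simplified model with only two states.
MODEL = {
    ("not_heating", "switch_on"): "heating",
    ("heating", "switch_on"): "heating",
    ("not_heating", "switch_off"): "not_heating",
    ("heating", "switch_off"): "not_heating",
}

# Many-to-one mapping from system states to model states.
MAPPING = {
    "off": "not_heating",
    "cooling": "not_heating",
    "warming": "heating",
    "hot": "heating",
}


def is_homomorphism(system, model, mapping):
    """True if the mapping preserves every state transition."""
    return all(
        mapping[next_state] == model[(mapping[state], action)]
        for (state, action), next_state in system.items()
    )


print(is_homomorphism(SYSTEM, MODEL, MAPPING))  # True
```

The model discards the distinction between, say, "warming" and "hot", yet it still predicts correctly how each action changes the state at its own level of description. As the text notes, the model here is itself just a state-transition table, which is why structure sharing of this kind has unclear empirical consequences.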


In  my  opinion,  a  clearer  view  of  the  explanatory  force  of  analog
mental  models  can  be  derived  by  carefully  considering  the  ideas 
of  computational  and  informational  equivalence  first  introduced 
by  Simon  (1978). 

It  is  obviously  possible  to  have  two  or  more  distinct  repre¬ 
sentations  of  the  same  information.  Call  such  representations 
“informationally  equivalent”  if  all  the  information  in  one  is 
inferable  from  the  other,  and  vice  versa.  Two  informationally 
equivalent  representations  may  or  may  not  additionally  be 
“computationally  equivalent,”  meaning  that  the  cost  structure  of 
accessing  and  processing  the  information  is  equivalent  in  both 
cases,  or,  as  Larkin  and  Simon  (1987)  put  it:  “information  given 
explicitly  in  the  one  can  also  be  drawn  easily  and  quickly  from 
the  information  given  explicitly  in  the  other,  and  vice  versa.”  As 
Larkin  and  Simon  point  out,  “easily”  and  “quickly”  are  not  pre¬ 
cise  terms,  and  so  this  definition  of  computational  equivalence 
is  inherently  somewhat  vague;  nevertheless  it  points  to  empi¬ 
rical  consequences  of  a  representation  (together  with  the 
processes  that  operate  upon  it)  that  depend  on  form,  and  there¬ 
fore  go  beyond  mere  informational  content. 

In  Payne  (2003),  I  propose  adopting  task-relative  versions  of 
the  concepts  of  informational  and  computational  equivalence. 
Thus,  representations  are  informationally  equivalent,  with  respect
to  a  set  of  tasks,  if  they  allow  the  same  tasks  to  be  performed
(i.e.,  contain  the  requisite  information  for  those  tasks).
The  representations  are,  additionally,  computationally  equiva¬ 
lent  with  respect  to  the  tasks  they  allow  to  be  performed,  if  the 
relative  difficulty  of  the  tasks  is  the  same,  whichever  represen¬ 
tation  is  being  used.  (Note  that  according  to  these  definitions, 
two  representations  might  be  computationally  equivalent  with 
regard  to  a  subset  of  the  tasks  they  support  but  not  with  regard 
to  the  total  set,  so  that  in  Larkin  and  Simon’s  sense  they  would 
merely  be  informationally  equivalent.  The  task-relative  versions 
of  the  constructs  thus  allow  more  finely  graded  comparisons 
between  representations.) 
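
The distinction can be illustrated with two data structures that hold the same facts but make different tasks cheap. The sketch below is a loose analogy rather than a cognitive model: the switch and light names are hypothetical, and the number of entries inspected is used as a crude stand-in for the "cost" of a task.

```python
def lookup(table, key):
    """Direct access by key: counted as one step."""
    return table[key], 1


def reverse_lookup(table, value):
    """Scan entries until the value is found, counting entries inspected."""
    for steps, (key, candidate) in enumerate(table.items(), start=1):
        if candidate == value:
            return key, steps
    raise KeyError(value)


# Two representations of the same (hypothetical) switch-light relation.
by_switch = {"S1": "light_a", "S2": "light_b", "S3": "light_c"}
by_light = {light: switch for switch, light in by_switch.items()}

# Task 1: which light does switch S3 control?
print(lookup(by_switch, "S3"))               # ('light_c', 1)
print(reverse_lookup(by_light, "S3"))        # ('light_c', 3)

# Task 2: which switch controls light_c?
print(reverse_lookup(by_switch, "light_c"))  # ('S3', 3)
print(lookup(by_light, "light_c"))           # ('S3', 1)
```

Both representations support both tasks, so they are informationally equivalent with respect to this task set. The relative difficulty of the two tasks reverses between them, however, so in the task-relative sense they are not computationally equivalent.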

This  idea  can  express  what  is  behaviorally  important  about 
the  idea  of  analog  models,  or  structure-sharing  mental  repre¬ 
sentations  of  a  state  of  affairs  of  a  dynamic  system.  An  analog 
representation  is  computationally  equivalent  (with  respect  to 
some  tasks)  to  external  perception  and  manipulation  of  the 
state  of  affairs  it  represents. 

Bibby  and  Payne  (1993;  1996)  exploited  this  distinction  be¬ 
tween  computational  and  informational  equivalence  in  the  do¬ 
main  of  HCI,  using  a  computer  simulation  of  a  device  derived 
from  that  studied  by  Kieras  and  Bovair  (1984).  The  device  was 
a  multiroute  circuit,  in  which  setting  switches  into  one  of  several 
configurations  would  make  a  laser  fire;  various  indicator  lights 
showed  which  components  of  the  circuit  were  receiving  power. 
What  concerned  Bibby  and  Payne  (1993)  was  the  idea  of  com¬ 
putational  equivalence  between  a  mental  model  and  a  diagram 
of  the  device,  rather  than  the  device  itself.

Bibby  and  Payne  asked  participants  to  repeatedly  perform 
two  types  of  tasks:  a  switch  task,  in  which  all  but  one  switch  was 
already  in  position  to  make  a  laser  fire  (the  participant  had  to  key 
the  final  switch)  and  a  fault  task,  in  which  the  pattern  of  indica¬ 
tor  lights  was  such  that  one  of  the  components  must  be  broken 
(the  participant  had  to  key  the  name  of  the  broken  component).

Participants  were  instructed  about  the  device  with  either  a 
table,  which  showed  the  conditions  under  which  each  indicator
light  would  be  illuminated,  or  with  procedures,  sequences  of
switch  positions  enabling  the  laser  to  be  fired.  Both  instructions 
were  sufficient  for  both  switch  and  fault  tasks;  they  were  infor¬ 
mationally  equivalent  with  respect  to  those  tasks.  However,  the 
table  made  the  fault  task  easier  than  the  switch  task,  whereas 
the  procedures  made  the  switch  task  easier. 

During  practice,  when  participants  consulted  the  instruc¬ 
tions,  this  pattern  of  relative  difficulty  was  confirmed  by  a 
crossover  interaction  in  response  times.  Furthermore,  when  the 
instructions  were  removed  from  the  participants,  so  that  they 
had  to  rely  on  their  mental  representation  of  the  device,  the 
crossover  interaction  persisted,  demonstrating  that  the  mental
representations  were  computationally  equivalent  to  the  external
instructions.

In  subsequent  experiments,  Bibby  and  Payne  (1996)  demonstrated
that  this  pattern  persisted  even  after  considerable  interaction
with  the  device  that  might  have  been  expected  to  provide
an  opportunity  to  overcome  the  representational  constraints
of  the  initial  instruction.  The  crossover  interaction  eventually
disappeared  only  after  extended  practice  on  the  particular  fault
and  switch  tasks  (80  examples  of  each),  perhaps  because
asymptotic  performance  had  been  reached.  At  this  point,
Bibby  and  Payne  introduced  two  similar  but  new  types  of  tasks 
designed  so  that  once  again,  the  table  favored  one  task  whereas 
procedures  favored  the  other.  (However,  the  device  instruc¬ 
tions  were  not  re-presented.)  At  this  point  the  crossover  re¬ 
appeared,  demonstrating  that  participants  were  consulting  their 
instructionally  derived  mental  model  of  the  device,  and  that  this 
was  still  in  a  form  computationally  equivalent  to  the  original 
external  representation  of  the  instructions. 

Practically,  this  research  shows  that  the  exact  form  of  in¬ 
structions  may  exert  long-lasting  effects  on  the  strategies  that 
are  used  to  perform  tasks,  so  that  designers  of  such  instructions 
must  be  sensitive  not  only  to  their  informational  content  but 
also  to  their  computational  properties.  In  this  light,  it  also
suggests  that  one  instructional  representation  of  a  device  is  very
unlikely  to  be  an  optimal  vehicle  for  supporting  all  user  tasks: 
it  may  well  be  better  to  provide  different  representations  of  the 
same  information,  each  tailored  to  particular  tasks.  In  this  sense, 
perhaps  instructions  should  mirror  and  exploit  the  natural  ten¬ 
dency,  noted  above,  for  users  to  form  fragmentary  mental  mod¬ 
els,  with  different  fragments  for  different  purposes. 

In  terms  of  theory,  Bibby  and  Payne’s  findings  lend  support  to 
the  suggestion  developed  above  that  mental  models  of  a  device 
that  are  formed  from  instructions  may  be  computationally  equiv¬ 
alent  to  the  external  representations  of  the  device.  This  idea 
gives  a  rather  new  reading,  and  one  with  more  ready  empirical 
consequences,  to  the  theoretically  strong  position  that  mental
models  are  essentially  analog,  homomorphic  representations. 


MENTAL  MODELS  OF  TEXT 
AND  OTHER  ARTIFACTS 


The  psychological  literature  on  text  comprehension  has  been 
transformed  by  the  idea  of  a  situation  model,  first  put  forward  as 
part  of  a  general  theory  of  text  comprehension  by  van  Dijk  and 
Kintsch  (1983),  and  developed  over  the  years  by  Kintsch  (e.g.,
1998)  and  followers.  The  central  idea  of  the  general  theory  is
that  readers  construct  mental  representations  of  what  they  read 
at  several  different  levels.  First,  they  encode  the  surface  form 
of  the  text:  the  words  and  syntax.  Second,  they  go  beyond  this 
to  a  representation  of  the  propositional  content  of  the  text.  Fi¬
nally,  they  go  beyond  the  propositional  content  of  the  text  itself
to  represent  what  the  text  is  about,  incorporating  their  world 
knowledge  to  construct  a  situation  model  or  mental  model  of 
the  described  situation. 

(Under  this  view,  it  is  the  content  that  distinguishes  a  situa¬
tion  model  from  a  text  base,  rather  than  a  representational  for¬
mat.  However,  some  researchers,  notably  Johnson-Laird  (1983)
and  followers,  have  pursued  the  idea  of  mental  models  derived
from  text  as  analog  representations  of  the  described  situation. 
Thus,  in  text  comprehension,  there  is  a  version  of  the  issue  dis¬ 
cussed  in  part  one.) 

It  is  instructive  to  consider  some  of  the  evidence  for  situa¬ 
tion  models,  and  what  important  issues  in  text  comprehension 
the  theory  of  situation  models  allows  us  to  address. 

A  classic  early  study  was  conducted  by  Bransford,  Barclay, 
and  Franks  (1972).  They  asked  participants  to  read  simple  sen¬
tences  such  as:

Three  turtles  rested  beside/on  a  floating  log,  and  a  fish  swam  beneath 
them. 

(The  slash  indicates  that  some  subjects  read  the  sentence
with  the  word  “beside,”  and  others  read  the  same  sentence  with
the  word  “on.”)

In  a  later  recognition  test,  interest  centered  on  how  likely 
readers  were  to  falsely  accept  minor  rewordings  of  the  original 
sentences.  In  the  above  case,  the  foil  sentence  was 

Three  turtles  rested  beside/on  a  floating  log,  and  a  fish  swam  beneath  it. 

The  key  finding  was  that  people  who  had  read  the  “on”  ver¬
sions  of  the  sentences  were  much  more  likely  to  accept  the
changed  version  of  the  sentence,  despite  the  fact  that,  at  the
level  of  the  sentences,  the  difference  between  original  and  foil
sentences  in  the  two  conditions  is  identical,  limited  in  each  case
to  the  last  word  of  the  sentence.  False  recognition  arises  in  the
“on”  case  because  there,  but  not  when  “on”  is  replaced  by
“beside,”  the  original  and  foil  sentences  describe  the  same
situation.
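The logic of this result can be sketched in a few lines of Python. This is only a toy illustration, not a model used by Bransford et al.: the triple notation, the function names, and the single inference rule (whatever is beneath objects resting on a support is also beneath the support) are assumptions introduced here.

```python
def situation_model(relation):
    """Build a toy situation model for the turtle sentence.

    `relation` is "on" or "beside"; facts are (figure, relation, ground) triples.
    """
    facts = {("turtles", relation, "log"), ("fish", "beneath", "turtles")}
    # Assumed inference rule (illustrative only): anything beneath objects
    # resting ON a support is also beneath the support itself.
    if relation == "on":
        facts.add(("fish", "beneath", "log"))
    return facts


def foil_describes_same_situation(relation):
    """True if the foil '...a fish swam beneath it' is already in the model."""
    return ("fish", "beneath", "log") in situation_model(relation)


for relation in ("on", "beside"):
    print(relation, foil_describes_same_situation(relation))
```

Under these assumptions the foil is already part of the situation model in the "on" condition but not in the "beside" condition, even though the surface change (the last word) is the same in both.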

A  related  series  of  experiments  was  reported  by  Fletcher  and 
Chrysler  (1990).  In  a  series  of  carefully  controlled  experiments, 
they  varied  the  overlap  between  sentences  in  a  recognition  test 
and  sentences  from  10  different  texts  read  by  the  participants. 
Each  text  described  a  state  of  affairs  (e.g.,  the  relative  cost  of  an¬
tiques)  consistent  with  a  linear  ordering  among  a  set  of  five  ob¬
jects.  They  found  that  participants  were  influenced  by  overlap
between  sentences  at  study  and  test  corresponding  to  the  three 
levels  of  discourse  representation  proposed  by  van  Dijk  and 
Kintsch  (1983):  surface  form,  text  base,  and  situation  model. 
Recognition  performance  was  best  when  distracter  items  were 
inconsistent  with  all  three  levels  of  representation.  Recognition 
was  above  chance  when  distracters  violated  merely  the  surface 
form  of  the  original  sentences  (e.g.,  substituting  rug  for  carpet).
It  improved  further  when  propositional  information  from  the
text  base,  but  not  the  linear  ordering  of  the  situation,  was  vio¬
lated.  Recognition  was  best  of  all  when  the  distracters  were  in¬
consistent  with  the  situation  described  by  the  text.  This  suggests
that  some  aspects  of  the  structure  of  the  situation  (in  this
case  a  set  of  linear  orderings)  were  retained.

Next,  consider  work  by  Radvansky  and  Zacks  (Radvansky  & 
Zacks,  1991;  Radvansky,  Spieler,  &  Zacks,  1993).  In  these  exper¬ 
iments,  participants  read  sentences  such  as,  “The  cola  machine 
is  in  the  hotel,”  each  of  which  specified  the  location  of  an  object. 
In  one  condition,  sentences  shared  a  common  object  (e.g.,  the  cola
machine)  but  different  locations.  In  a  second  condition,  differ¬
ent  objects  shared  a  common  location  (e.g.,  the  city  hall).  Later  in
the  experiment  participants  were  given  a  speeded-recognition 
test.  Radvansky  and  Zacks  found  a  significant  fan  effect  (Ander¬ 
son,  1974)  for  the  common  object  condition;  times  to  verify  sen¬ 
tences  increased  as  the  number  of  different  locations  rose.  For 
the  common  location  sentences  no  significant  fan  effect 
emerged.  This  was  interpreted  as  evidence  that  participants 
formed  mental  models  around  the  common  location  (a  repre¬ 
sentation  of  such  a  location  containing  all  the  specified  objects) 
and  retrieval  from  long-term  memory  was  organized  around 
these  mental  models.  It  is  impossible,  or  much  harder,  to  form 
such  a  representation  of  the  same  object  in  multiple  locations. 
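The interpretation can be made concrete with a small sketch. It assumes, purely for illustration, that each distinct location corresponds to one mental model and that retrieval cost grows with the number of models a probe touches. The function name and the extra objects and locations (library, potted palm, bulletin board) are hypothetical and do not come from the original materials.

```python
from collections import defaultdict


def count_situation_models(sentences):
    """Group (object, location) facts by location and count the groups.

    Assumption: one location = one integrated mental model.
    """
    models = defaultdict(set)
    for obj, location in sentences:
        models[location].add(obj)
    return len(models)


# One object in several locations: cannot be integrated into a single model.
common_object = [
    ("cola machine", "hotel"),
    ("cola machine", "city hall"),
    ("cola machine", "library"),
]

# Several objects in one location: a single model holds them all.
common_location = [
    ("cola machine", "city hall"),
    ("potted palm", "city hall"),
    ("bulletin board", "city hall"),
]

print(count_situation_models(common_object))    # 3 separate models
print(count_situation_models(common_location))  # 1 integrated model
```

On this reading, the fan effect appears only where the number of models grows with the number of studied sentences.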

What  all  these  studies,  and  many  like  them,  reveal  is  that
when  understanding  text,  readers  spontaneously  construct  a
mental  representation  that  goes  beyond  the  text  itself  and  what
it  means,  using  inferences  to  construct  a  richer  model  of  what
the  text  is  about:  a  situation  model.

Beyond  these  refined  and  clever,  but  undeniably  rather  nar¬
row,  experimental  contexts,  the  construct  of  situation  model  has
been  put  to  work  to  illuminate  some  practical  issues  concerning
text  comprehension.  This  issue  will  be  returned  to  later,  where
we  will  see  how  it  can  inform  attempts  to  understand  instruc¬
tional  strategies  for  engendering  useful  mental  models.

There  are  two  principal  ways  in  which  the  literature  on  text
comprehension  is  relevant  to  HCI.  First,  it  provides  support  for 
the  idea  that  a  mental  model  is  a  representation  of  what  a  rep¬ 
resentational  artifact  represents.  The  layered  model  of  text  com¬ 
prehension  previously  outlined  can  be  generalized  to  the  claim 
that  the  user  of  any  representational  artifact  must  construct  a 
representation  of  the  artifact  itself,  and  of  what  the  artifact  rep¬ 
resents,  and  of  the  mapping  between  the  two  (how  the  artifact 
represents).  This  is  the  basis  of  the  Yoked  State  Space  (YSS) 
hypothesis  (Payne,  Squibb,  &  Howes,  1990). 

If  a  reader’s  goal  is  just  to  understand  a  text,  as  it  was  in  the 
experiments  just  reviewed,  then  the  text-representation  can  be 
discarded  once  a  model  has  been  constructed.  However,  there 
are  many  tasks  of  text  use,  in  which  it  is  necessary  to  maintain  a 
representation  of  the  text,  alongside  a  mental  model  of  the 
meaning  of  the  text.  Consider,  for  example,  the  tasks  of  writing 
and  editing,  or  of  searching  for  particular  content  in  a  text.  In 
such  tasks,  it  is  necessary  to  keep  in  mind  the  relation  between 
the  surface  form  of  the  text — wording,  spatial  layout,  etc. — and 
its  meaning.  Text  is  a  representational  artifact,  and  to  use  it  in 
this  sense  one  needs  a  mental  representation  of  the  structure  of
the  text,  of  the  “situation”  described  by  the  text,  and  of  the
mapping  between  the  two.


According  to  the  Yoked  State  Space  hypothesis  (Payne, 
Squibb,  &  Howes,  1990),  this  requirement  is  general  to  all  rep¬ 
resentational  artifacts,  including  computer  systems.  To  use  such 
artifacts  requires  some  representation  of  the  domain  of  appli¬ 
cation  of  the  artifact — the  concepts  the  artifact  allows  you  to 
represent  and  process.  The  user’s  goals  are  states  in  this  do¬ 
main,  which  is  therefore  called  the  goal  space.  However,  states 
in  the  goal  space  cannot  be  manipulated  directly.  Instead,  the 
user  interacts  with  the  artifact,  and  therefore  needs  knowledge 
of  the  artifact,  and  of  the  operations  that  allow  states  of  the  ar¬ 
tifact  to  be  transformed.  Call  this  problem  space  the  device 
space.  In  order  to  solve  problems  in  the  goal  space  by  search¬ 
ing  in  the  device  space,  the  user  must  know  how  the  device 
space  represents  the  goal  space.  In  this  sense  the  two  spaces 
need  to  be  yoked.  The  minimal  device  space  for  a  certain  set  of 
tasks  must  be  capable  of  representing  all  the  states  in  the  cor¬ 
responding  goal  space.  More  elaborate  device  spaces  may 
incorporate  device  states  that  do  not  directly  represent  goal 
states,  but  which  allow  more  efficient  performance  of  tasks,  just 
as  the  stack  model  of  an  RPN  calculator  allows  an  elaboration 
of  methods  for  simple  arithmetic. 

The  work  of  Halasz  and  Moran  (1983)  can  readily  be  assimi¬ 
lated  into  the  YSS  framework.  The  no-model  condition  was  pro¬ 
vided  with  enough  information  to  translate  algebraic  expres¬ 
sions  into  their  Reverse  Polish  equivalent.  However,  in  this 
understanding  of  RP  expressions,  the  ENTER  key  was  given 
merely  an  operational  account,  serving  simply  as  a  separator  of 
operands,  and  did  not  transform  the  device  state.  The  stack 
model,  however,  provides  a  figurative  account  of  the  ENTER  key. 
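A minimal sketch of the stack model may help show what it means for ENTER to transform the device state. This is a simplified illustration under stated assumptions, not the instructional material used by Halasz and Moran and not a faithful emulation of any real calculator: the class name, the unbounded stack, and the rule that an operator key first pushes any pending entry are all choices made here for brevity.

```python
import operator

OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


class StackCalculator:
    """Toy RPN calculator whose device state is an explicit stack."""

    def __init__(self):
        self.stack = []   # device state made visible by the stack model
        self.entry = ""   # digits keyed in but not yet pushed

    def press(self, key):
        """Process one key press and return a copy of the stack."""
        if key == "ENTER":
            # ENTER is not a mere separator: it changes the device state.
            self._push_entry()
        elif key in OPERATIONS:
            if self.entry:
                self._push_entry()
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(OPERATIONS[key](left, right))
        else:
            self.entry += key
        return list(self.stack)

    def _push_entry(self):
        self.stack.append(float(self.entry))
        self.entry = ""


# (1 + 2) * (3 + 4) keyed in Reverse Polish order.
calc = StackCalculator()
for key in ["1", "ENTER", "2", "+", "3", "ENTER", "4", "+", "*"]:
    state = calc.press(key)
print(state)  # [21.0]
```

The intermediate result 3 stays on the stack while 3 + 4 is computed. That is a device state with no counterpart in the algebraic expression, which is the kind of elaboration the stack model makes available to the user.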

This  discussion  illustrates  a  practical  lesson  for  the  design 
of  interfaces  and  instructions.  In  the  case  of  the  copy  buffer  and 
the  calculator  stack,  the  standard  interface  does  not  allow  the 
appropriate  device  space  readily  to  be  induced,  so  that  concep¬ 
tual  instructions  must  fill  the  gap.  The  obvious  alternative, 
which  has  been  developed  to  some  extent  in  both  cases,  is  to 
redesign  the  user  interface  so  as  to  make  the  appropriate  device 
space  visible.  These  examples  suggest  a  simple  heuristic  for  the 
provision  of  conceptual  instructions  that  may  help  overcome 
the  considerable  controversy  over  whether  or  not  such  instruc¬ 
tions  (as  opposed  to  simple  procedural  instructions)  are  useful 
(see,  e.g.,  Wright,  1988).  According  to  this  heuristic,  conceptual
instructions  will  be  useful  if  they  support  construction  of  a  YSS 
that  the  user  would  otherwise  have  difficulty  inducing  (Payne, 
Howes,  &  Hill,  1992). 

A  more  direct  way  in  which  text  comprehension  research  is 
relevant  to  HCI  is  that  so  much  HCI  is  reading  text.  Beyond  the 
standard  issues,  the  widespread  availability  of  electronic  texts 
raises  some  new  concerns  that  have  not  yet  seen  much  work, 
yet  are  perhaps  the  most  directly  relevant  to  HCI  design.  Two 
issues  stand  out:  (a)  the  usability  of  documents  that  incorporate 
multiple  media  alongside  text,  and  (b)  the  exploitation  by  read¬ 
ers  of  multiple  texts  on  the  same  topic. 

How  are  multimedia  “texts”  that  incorporate  graphics  com¬ 
prehended?  There  is  only  a  small  literature  on  this  within  the 
mainstream  field  of  text  comprehension,  but  this  literature  ex¬ 
ploits  the  idea  of  a  mental  model. 

Glenberg  and  Langston  (1992)  argued  that  the  widespread 
idea  that  diagrams  can  assist  the  comprehension  of  technical 
text  had,  at  the  time,  been  little  tested  or  understood  and  that
mental  models  were  an  important  explanatory  construct.  In 
their  analysis,  diagrams  are  useful  in  concert  with  texts  precisely 
because  they  assist  the  construction  of  mental  models.  This  idea 
has  been  pursued  in  a  very  active  program  of  work  on  multimedia
instruction  by  Mayer  and  colleagues,  which  will  be  re¬
viewed  in  the  next  section.

What  about  when  the  multiple  sources  of  information  are 
not  presented  as  part  of  a  single  text,  but  rather  independently, 
covering  overlapping  ground,  so  that  the  reader  has  to  perform 
all  the  integration  and  mapping?  This  is  the  issue  of  multiple 
texts,  and  it  has  become  commonplace  in  the  age  of  the  Inter¬ 
net.  It  is  now  rarely  the  case  that  a  student  struggles  to  find  rel¬ 
evant  source  documents  on  a  topic.  Instead,  students  are  typi¬ 
cally  faced  with  an  overabundance  of  relevant  materials  and 
must  somehow  allocate  their  time  across  them,  and  integrate 
the  knowledge  they  derive  from  different  sources. 

Perfetti  (1997)  has  suggested  that  learning  from  multiple 
texts  is  one  of  the  most  important  new  challenges  for  text  re¬ 
searchers.  Research  has  shown,  for  example,  that  integrating 
information  across  multiple  texts  is  a  skill  that  does  not  come 
readily  but  can  be  acquired  and  taught  (Stahl,  Hind,  Britton, 
McNish,  &  Bosquet,  1996;  Rouet,  Favart,  Britt,  &  Perfetti, 
1997). 

The  YSS  theory  raises  important  issues  here.  As  previously 
noted,  everyday  reading  of  text  can  be  seen  as  engendering  a 
progression  of  mental  representations  moving  from  the  surface 
form  through  the  propositional  content  to  a  situation  model. 
When  reading,  earlier  representations  can  be  discarded  as  later 
ones  are  formed,  but  for  other  tasks  of  text  use,  the  reader 
needs  to  maintain  a  representation  of  the  form  of  the  multitext, 
and  map  this  form  onto  the  content.  Payne  and  Reader  (in 
press)  refer  to  such  a  representation  as  a  structure  map. 

The  usefulness  of  a  structure  map  becomes  even  more  ap¬ 
parent  when  multiple  texts  are  considered.  In  this  case,  struc¬ 
ture  maps  could  play  a  role  in  encoding  source  information, 
which  might  be  important  not  only  for  locating  information,  but 
also  for  integrating  diverse  and  potentially  contradictory  infor¬ 
mation  and  for  making  judgments  of  trust  or  confidence  in  the 
information.  Source  information  might  additionally  encode 
temporal  properties  of  information  sources,  and  thus  be  useful 
for  memory  updating — revising  knowledge  in  the  light  of  new 
information,  making  distinctions  between  current  and  super¬ 
seded  propositions. 
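One way to picture a structure map is as an index from ideas to the sources that state them. The sketch below is a hypothetical data structure, not a claim about how readers actually store such knowledge. The field names, the placeholder documents, and the use of a year to stand in for the temporal properties of a source are all assumptions made for illustration.

```python
def build_structure_map(readings):
    """Index (document, year, idea, claim) records by idea."""
    structure_map = {}
    for document, year, idea, claim in readings:
        entry = {"document": document, "year": year, "claim": claim}
        structure_map.setdefault(idea, []).append(entry)
    return structure_map


def documents_containing(structure_map, idea):
    """Locate information: which documents mention this idea?"""
    return [entry["document"] for entry in structure_map.get(idea, [])]


def most_recent_claim(structure_map, idea):
    """Memory updating: prefer the claim from the latest source."""
    return max(structure_map[idea], key=lambda entry: entry["year"])


# Hypothetical readings; two texts disagree about the same idea.
readings = [
    ("text_a", 1995, "cause", "claim A"),
    ("text_b", 2001, "cause", "claim B"),
    ("text_b", 2001, "remedy", "claim C"),
]

structure_map = build_structure_map(readings)
print(documents_containing(structure_map, "cause"))
print(most_recent_claim(structure_map, "cause")["claim"])
```

Keeping the source alongside each claim is what allows the contradictory statements about the same idea to be told apart and the superseded one to be identified.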

The  widespread  availability  of  the  Web  not  only  means  that
multiple  texts  are  more  widely  encountered,  but  also  encour¬
ages  a  situation  where  multiple  texts  are  read  in  an  interleaved
fashion,  in  a  single  sitting  or  at  least  close  together  in  time.  This
raises  the  importance  of  the  above  challenges  and  means  that
recency  in  autobiographical  memory  is  unlikely  to  accomplish
source  identification,  further  stressing  the  importance  of  a
structure  map.

Payne  and  Reader  (in  press)  studied  readers’  ability  to  search 
for  specific  ideas  in  multiple  texts  that  they  had  just  read.  They 
found  evidence  that  readers  spontaneously  constructed  struc¬ 
ture  maps,  as  just  described,  in  that  they  showed  some  mem¬ 
ory  of  which  documents  contained  which  ideas,  even  when  they 
did  not  expect  to  need  such  knowledge  when  reading  the  texts. 


INSTRUCTIONS  FOR  MENTAL  MODELS 


Multimedia  Instruction 

If  mental  models  are  important  for  operating  devices,  how 
should  they  best  be  taught?  We  have  seen  that  models  are  con¬ 
structed  automatically  by  readers  of  text,  but  can  modern  com¬ 
putational  media,  such  as  animations,  be  used  to  improve  the 
acquisition  of  mental  models  from  instructional  texts,  just  as 
Glenberg  and  Langston  (1992)  suggested  in  the  case  of  simple 
diagrams?  Just  such  a  question  has  been  addressed  in  a  long¬ 
standing  programme  of  work  by  Richard  Mayer  and  colleagues, 
which  will  be  reviewed  in  this  section. 

Mayer  and  Moreno  (2002)  present  a  cognitive  theory  of  multimedia
learning,  which  builds  on  three  main  ideas:

1.  From  dual  coding  theory  the  authors  suppose  that  humans 
have  separate  visual  and  verbal  information  processing  sys¬ 
tems  (Clark  &  Paivio,  1991;  Paivio,  1986); 

2.  From  cognitive  load  theory  the  authors  assume  that  the  pro¬ 
cessing  capacity  of  both  the  visual  and  the  verbal  memory 
system  is  strictly  limited  (Baddeley,  1992;  Chandler  &  Sweller,
1991)  and  that  cognitive  load  during  instruction  can  interfere 
with  learning; 

3.  From  constructivist  learning  theory  the  authors  take  the  idea 
that  meaningful  learning  requires  learners  actively  to  select 
relevant  information,  to  structure  it  into  coherent  represen¬ 
tations,  and  make  connections  with  other  relevant  knowl¬ 
edge  (Mayer,  1996;  Mayer,  1999a). 

This  latter  process,  of  building  coherent  representations  that
connect  information  from  different  modalities  with  pre-existing
knowledge,  bears  a  clear  relation  to  Johnson-Laird’s  construct  of
mental  model,  and  indeed  Mayer  and  colleagues  use  the  term  in 
this  context.  In  the  case  of  the  physical  systems  that  many  of 
their  studies  have  addressed,  mental  models  may  take  the  form 
of  cause-effect  chains.  According  to  Mayer  and  Moreno  (2002) 
a  key  design  principle  for  instructional  materials  is  that  they 
should  maximise  the  opportunity  for  these  model-construction 
processes  to  be  completed. 

Mayer  and  colleagues  have  conducted  a  large  number  of  ex¬ 
periments  comparing  learning  from  multimedia  source  materials 
with  learning  from  components  of  these  materials  (words,  pic¬
tures,  etc.)  successively  or  in  other  kinds  of  combination.  Based
on  this  research,  Mayer  (1999b)  and  Mayer  and  Moreno  (2002) 
have  identified  some  principles  of  instructional  design  that  fos¬ 
ter  multimedia  learning. 

The  multiple  presentation  principle  states  that  explanations 
in  words  and  pictures  will  be  more  effective  than  explanations 
that  use  only  words  (Mayer  &  Moreno,  2002,  p.  107).  When
words  only  are  presented,  learners  may  find  it  difficult  to  con¬ 
struct  an  appropriate  mental  image,  and  this  difficulty  may  block 
effective  learning.  Mayer  and  Anderson  (1991;  Experiment  2b) 
compared  four  treatment  groups:  words  with  pictures,  words 
only,  pictures  only,  and  control,  on  tests  of  creative  problem 
solving  involving  reasoning  how  a  bicycle  pump  works.  Results 
demonstrated  that  participants  in  the  words  with  pictures  group
generated  a  greater  number  of  creative  problem  solutions  than 
did  participants  in  the  other  groups.  Interestingly,  animation 
without  narration  was  equivalent  to  no  instruction  at  all.  Other 
studies  have  offered  support  for  the  general  idea  that  learners 
will  acquire  richer  knowledge  from  narration  and  animation 
than  from  narration  alone  (Mayer  &  Anderson,  1991,  Experi¬ 
ment  2a;  Mayer  &  Anderson,  1992,  Experiments  1  and  2). 

The  contiguity  principle  is  the  claim  that  simultaneous  as 
opposed  to  successive  presentation  of  visual  and  verbal  materi¬ 
als  is  preferred  (Mayer  &  Moreno,  2002),  because  this  will  en¬ 
able  learners  to  build  referential  connections  more  readily 
(Mayer  &  Sims,  1994).  Mayer  and  Anderson  (1991,  Experiments 
1  and  2)  studied  a  computer-based  animation  of  how  a  bicycle 
pump  works.  They  compared  a  version  that  presented  words 
with  pictures  against  the  same  content  presenting  words  be¬ 
fore  pictures,  and  tested  acquisition  with  tests  of  creative  prob¬ 
lem  solving.  Those  in  the  words-with-pictures  group  generated 
about  50%  more  solutions  to  the  test  problems  than  did  sub¬ 
jects  in  the  words-before-pictures  group. 

The  individual  differences  principle  predicts  that  factors 
such  as  prior  knowledge  or  spatial  ability  will  influence  transfer 
of  learning  from  multimedia  materials,  moderating  the  effects  of 
other  principles  (Mayer,  1999).  With  regard  to  domain  specific 
knowledge,  Mayer  proposed  that  experienced  learners  may  suf¬ 
fer  little  decrease  in  problem  solving  transfer  when  receiving 
narration  and  animation  successively  because  their  background 
knowledge  will  allow  a  mental  model  to  be  constructed  from 
the  words  alone,  then  linked  to  the  visual  information.  Low- 
experience  learners  on  the  other  hand  will  have  no  means  to 
over-ride  the  effects  underlying  the  contiguity  principle,  and 
their  problem  solving  transfer  will  suffer  (Mayer  &  Sims,  1994). 
In  support  of  this  suggestion,  experimental  work  by  Mayer  and 
Gallini  (1990)  demonstrated  across  three  studies  that  the  syn¬ 
chronization  of  words  and  pictures  served  to  improve  transfer 
for  low-  but  not  high-experience  learners. 

The  chunking  principle  refers  to  a  situation  in  which  visual 
and  verbal  information  must  be  presented  successively,  or  al¬ 
ternately  (against  the  contiguity  principle).  It  states  that  learners 
will  demonstrate  better  learning  when  such  alternation  takes 
place  in  short  rather  than  long  segments.  The  reasoning  is 
straightforward,  given  the  assumptions  of  the  framework:  work¬ 
ing  memory  may  become  overloaded  by  having  to  hold  large 
chunks  before  connections  can  be  formed  (Mayer,  1999b).  An 
experiment  by  Mayer  and  Moreno  (1998)  investigated  this 
chunking  principle  using  explanations  of  how  lightning  storms 
develop.  The  ability  to  solve  novel,  transfer  problems  about 
lightning  exhibited  by  a  ‘large  chunk’  group  (who  received  all 
the  visual  information  before  or  after  all  the  verbal  information) 
was  compared  with  that  of  a  ‘small  chunk’  group  (alternating 
presentations  of  a  short  portion  of  visual  followed  by  a  short 
portion  of  narration).  The  gain  in  performance  of  the  small 
chunk  group  over  the  large  chunk  group  was  circa  100%  (Mayer 
&  Moreno,  1998). 

The  debt  of  Mayer’s  work  to  Sweller’s  programme  of  re¬ 
search  on  Cognitive  Load  Theory  is  obvious.  Mayer’s  design 
principles  reflect  the  premise  that  students  will  learn  more 
deeply  when  their  visual  and/or  verbal  memories  are  not  over¬
loaded.  Students  are  better  able  to  make  sense  of  information
when  they  receive  both  verbal  and  visual  representations  rather
than  only  verbal;  when  they  can  hold  relevant  visual  and  verbal 
representations  in  working  memory  at  the  same  time;  when 
they  have  domain  specific  knowledge  and/or  high  spatial  ability; 
and  when  they  receive  small  bits  of  information  at  a  time  from 
each  mode  of  presentation. 

Despite  incredibly  positive  research  results,  at  this  stage 
Mayer’s  work  should  be  viewed  with  a  little  caution.  Almost  all 
of  the  experiments  utilise  very  short  instructional  presentations, 
with  some  of  the  animations  lasting  only  30  seconds.  Subjects 
are  then  required  to  answer  problem-solving  questions  that 
seem  ambiguous,  requiring  students  to  be  fairly  creative  in  or¬ 
der  to  generate  solutions.  Mayer’s  work  also  typically  neglects  to 
include  any  tests  of  long-term  retention.  It  may  conceivably  be 
falling  into  the  instructional  trap  of  maximising  performance 
during  learning  at  the  expense  of  longer-term  performance. 
This  issue  is  the  focus  of  the  next  section. 

The  Theory  of  Learning  by  Not  Doing 

Mayer’s  theory  of  multimedia  instruction  adheres  to  the  com¬ 
mon  assumption  that  the  optimal  design  of  instructional  mate¬ 
rial  involves  minimizing  the  cognitive  burden  on  the  learner  due 
to  the  limits  of  the  working  memory. 

Yet  minimizing  the  mental  effort  of  learners  is  not  necessar¬ 
ily  or  always  a  good  instructional  strategy.  According  to  Schmidt 
and  Bjork  (1992),  instructional  conditions  that  achieve  the  train¬ 
ing  goals  of  generalizability  and  long-term  retention  are  not  nec¬ 
essarily  those  that  maximize  performance  during  the  acquisition 
phase. 

They  argue  that  the  goal  of  instruction  and  training  in  real- 
world  settings  should  first  be  to  support  a  level  of  performance 
in  the  long  term,  and  second  to  support  the  capability  to  trans¬
fer  that  training  to  novel  task  environments.  Methodologically,
in  order  to  measure  a  genuine  learning  effect,  some  form  of 
long-term  assessment  of  retention  must  take  place;  skill  acqui¬ 
sition  is  not  a  reliable  indicator  of  learning. 

Schmidt  and  Bjork  (1992)  discussed  three  situations  in 
which  introducing  difficulties  for  the  learner  can  enhance  long¬ 
term  learning.  First,  studies  that  vary  the  scheduling  of  tasks 
during  practice  were  reported.  Random  practice  is  more  diffi¬
cult  than  blocked  schedules  of  practice,  as  a  given  task  is  never
practiced  on  successive  trials.  Using  a  complex  motor  task
involving  picking  up  a  tennis  ball  and  using  it  to  knock  over  a 
particular  set  of  barriers,  Shea  and  Morgan  (1979)  reported  a 
clear  advantage  for  subjects  who  practiced  under  blocked  con¬ 
ditions  (subsets  of  barriers  to  knock),  in  terms  of  performance 
during  practice.  However,  the  amount  of  learning  as  demon¬ 
strated  by  the  retention  phase  favored  the  random  condition. 
Similar  results  have  been  reported  by  Baddeley  and  Longman
(1978),  Lee  and  Magill  (1983),  and  (with  verbal  tasks)  Landauer
and  Bjork  (1978).

Schmidt  and  Bjork  offer  an  explanation  for  this  paradigm,  in 
which  retrieval  practice  may  play  a  key  role.  They  suggest  that 
there  may  be  a  benefit,  in  terms  of  long-term  retention,  for  ac¬ 
tivities  that  actually  cause  forgetting  of  the  information  to  be 
recalled,  forcing  the  learner  to  practice  retrieving  this  informa¬ 
tion  (Bjork  &  Allen,  1970). 

Experiments  that  vary  the  feedback  the  learner  receives  have
demonstrated  a  similar  phenomenon.  A  study  by  Schmidt, 
Young,  Swinnen,  and  Shapiro  (1989)  demonstrated  that  delay¬ 
ing  the  feedback  that  subjects  received  during  motor  tasks  in¬ 
terfered  with  performance.  However,  on  a  delayed-retention 
test,  those  who  had  received  the  feedback  least  often  demon¬ 
strated  the  most  effective  performance.  This  seems  to  contra¬ 
dict  the  established  opinion  that  feedback  is  vital  for  effective 
learning.  Schmidt  and  Bjork  (1992)  suggested  that  frequent 
feedback  may  actually  serve  to  block  information-processing  ac¬ 
tivities  that  are  important  during  the  skill-acquisition  phase. 

A  final  area  reviewed  by  Schmidt  and  Bjork  concerns  the  in¬ 
troduction  of  variability  during  practice,  such  as  when  practicing 
tossing  a  beanbag  at  a  target  at  a  particular  distance.  Practicing 
at  variable  distances  is  more  effective  than  practicing  at  a  fixed 
distance  (Kerr  &  Booth,  1978). 

Does  the  Schmidt  and  Bjork  approach  extend  to  HCI  tasks, 
and  in  particular  to  instruction  for  mental  models? 

One  impressive  example  of  an  instructional  effect  in  the 
Schmidt  and  Bjork  (1992)  paradigm  is  informed  by  the  idea  of
mental  models  or  situation  models  derived  from  text,  as  dis¬ 
cussed  in  part  2  of  this  chapter.  Informed  by  the  distinction  be¬ 
tween  a  text  base  and  a  situation  model,  work  by  McNamara, 
Kintsch,  Songer,  and  Kintsch  (1996)  has  shown  how  expository 
text  can  be  designed  to  introduce  difficulties  for  readers  in  ex¬ 
actly  the  productive  manner  advocated  by  the  Schmidt  and 
Bjork  conception  of  training.  These  authors  created  two  ver¬ 
sions  of  target  texts,  one  more  coherent  than  the  other  (one 
experiment  used  a  text  about  traits  of  mammals,  a  second  used 
a  text  about  heart  disease).  Coherence  cues  were  provided  by 
linking  clauses  with  appropriate  connectives  and  by  inserting 
topic  headings.  The  level  of  readers’  background  knowledge 
on  the  topic  of  the  text  was  also  assessed  with  a  pretest.  After 
reading  a  text,  participants  were  given  tests  of  the  text  base  (free 
recall  of  the  text  propositions  and  specific  factual  questions 
about  the  contents  of  the  text)  and  tests  of  the  situation  model 
(problem-solving-based  questions,  questions  requiring  infer¬ 
ences  from  the  text,  and  a  concept-sorting  task). 

McNamara  et  al.  (1996)  reported  that  for  measures  that 
tested  the  text  base,  the  high  coherence  texts  produced  better 
performance.  However,  for  situation-model  measures,  test  per¬ 
formance  for  high-knowledge  readers  was  better  when  they 
read  the  low-coherence  text.  McNamara  et  al.  argued  that  limit¬ 
ing  the  coherence  of  a  text  forced  readers  to  engage  in  com¬ 
pensatory  processing  to  infer  unstated  relations  in  the  text.  This 
compensatory  processing  supported  a  deeper  understanding  of 
the  text,  in  that  the  information  in  the  text  became  more  inte¬ 
grated  with  background  knowledge.  Thus,  for  high-knowledge 
readers,  the  texts  that  were  more  difficult  to  read  improved  the 
situation  model  by  encouraging  more  transfer-appropriate  pro¬ 
cessing.  Low-knowledge  readers  were,  presumably,  unable  to 
achieve  the  compensatory  inferences,  and  therefore  did  better 
with  more  coherent  texts.  Because  the  text  base  does  not  in¬ 
corporate  background  knowledge,  it  was  not  enhanced  by  any 
compensatory  processing.  (This  finding  is  related  to  the  work  of 
Mayer  and  Sims  (1994)  reviewed  above.) 

One  very  successful  practical  approach  to  the  design  of  in¬
structions  for  interactive  devices,  which  is  well  known  in  the  HCI
community,  is  perhaps  quite  strongly  related  to  this  more  theo¬
retically  oriented  work.  The  concept  of  a  “minimal  manual”  was
outlined  by  Carroll  (1990).  It  sought  to  minimize  the  extent  to 
which  instructional  materials  obstruct  learning.  Crucially,  a  well- 
designed  Minimal  Manual  does  not  necessarily  optimize  the 
speed  at  which  users  can  perform  procedures  as  they  read.  Car¬ 
roll’s  manuals  avoided  explicit  descriptions  that  encouraged 
rapid  but  mindless  rote  performance.  Instead,  the  emphasis  was 
on  active  learning  whereby  learners  were  encouraged  to  gener¬ 
ate  their  own  solutions  to  meaningful  tasks.  This  process  was  fa¬ 
cilitated  in  part  by  reducing  the  amount  of  text  provided  and 
including  information  about  error  recovery. 

O’Hara  and  Payne  (1998,  1999)  argued  that  learning  from  a 
problem-solving  experience  might  be  enhanced  to  the  extent 
that  problem  solvers  planned  their  moves  through  the  prob¬ 
lem  space.  Many  puzzles  with  an  interactive  user  interface,  and 
indeed  many  user  interfaces  to  commercial  systems,  encour¬ 
age  a  one-step-at-a-time  approach  to  problem  solving,  in  which 
a  move  is  chosen  from  the  currently  available  set.  This  may  be 
quick  and  relatively  effortless,  yet  lead  to  little  learning  and  in¬ 
efficient  solutions.  For  example,  in  an  HCI  task,  participants  had 
to  copy  names  and  addresses  from  a  database  to  a  set  of  letters. 
Each  item  category  from  the  database  had  to  be  copied  to  sev¬ 
eral  letters,  so  that  the  most  obvious  and  perhaps  least  effortful 
strategy  of  preparing  letters  one  at  a  time  was  inefficient  in 
terms  of  database  access.  O’Hara  and  Payne’s  manipulation  was 
to  increase  the  cost  of  making  each  move  (in  the  copying  ex¬ 
periment  by  adding  a  system  lock-out  time).  This  resulted  in 
more  planning,  more  think-time  per  move,  meaning  slower  so¬ 
lutions  in  the  first  instance,  but  more  efficient  behavior  in  the 
long  term,  and  the  discovery  of  strategies  that  required  fewer
database  accesses  and  fewer  user  inputs. 

Recent  work  by  Duggan  and  Payne  (2001)  combined  several 
of  the  insights  in  the  work  just  reviewed  to  explore  acquisition 
of  interactive  procedures  during  instruction  following.  Good 
procedural  instructions  for  interactive  devices  must  satisfy  two 
criteria.  First,  they  must  support  performance.  Like  all  proce¬ 
dural  instructions  they  should  effectively  communicate  the  pro¬ 
cedure  they  describe,  so  as  to  allow  users  who  don’t  know  the 
procedure  to  enact  it  successfully  and  efficiently.  Second,  they 
must  support  learning.  In  common  with  instructions  for  all  pro¬ 
cedures  that  will  be  used  repeatedly  they  should  facilitate  sub¬ 
sequent  memory  for  the  procedure,  so  that  it  might  later  be  per¬ 
formed  without  consulting  the  instructions. 

How  might  procedural  instructions  be  designed  so  as  to  fol¬ 
low  the  Schmidt  and  Bjork  paradigm  and  provide  transfer- 
appropriate  practice  opportunities  for  the  learner?  Of  course, 
not  all  manipulations  that  introduce  difficulties  during  learning 
are  beneficial  for  the  learner.  Simply  making  the  instructions  un¬
clear  is  unlikely  to  be  effective,  however  much  this  idea  may
have  informed  the  design  of  some  commercial  user  manuals.
The  criterion  that  quality  instructions  must  communicate  the 
procedure  that  they  describe  cannot  be  ignored. 

The  work  of  Diehl  and  Mills  (1995)  further  illustrated  the  rel¬ 
evance  of  the  theory  of  text  comprehension  to  the  design  of  in¬ 
struction  for  interactive  procedures.  They  argued  that  in  the 
case  of  procedural  instructions  the  distinction  between  situa¬ 
tion  model  and  text  base  maps  directly  onto  a  distinction  be¬ 
tween  memory  for  the  procedure  (as  tested  by  later  task  per¬ 
formance)  and  memory  for  the  instructions  themselves. 
Texts  describing  how  to  complete  a  task  using  a  device  (setting
an  alarm  clock  or  constructing  a  child’s  toy)  were  provided.  While 
reading  a  text,  participants  were  required  to  either  perform  the 
task  (read  and  do),  or  do  nothing  (read  only).  (In  addition,  Diehl 
and  Mills  studied  some  intermediate  conditions,  such  as  read  and 
watch  experimenter.  These  conditions  produced  intermediate  re¬ 
sults  and  are  not  relevant  to  the  current  argument.)  The  effect  of 
these  training  methods  was  then  examined  by  asking  participants 
to  recall  the  text  and  then  complete  the  task. 

Diehl  and  Mills  reported  that  the  increased  exposure  to  the 
device  in  the  read-and-do  condition  resulted  in  improved  task 
performance  times  relative  to  the  read-only  condition.  How¬ 
ever,  text  recall  was  better  in  the  read-only  condition,  support¬ 
ing  the  conceptual  separation  of  text  base  and  situation  model. 

Inspired  by  this  work,  Duggan  and  Payne  (2001)  introduced  a 
particular  technique  to  exploit  the  principle  of  Schmidt  and  Bjork 
(1992)  and  the  methods  of  McNamara  and  colleagues  (1996).  Like 
the  manipulations  of  Diehl  and  Mills  (1995),  their  innovation  cen¬
tered  not  on  the  design  of  the  instructions  per  se,  but  rather  on
the  way  the  instructions  are  read  and  used.  Diehl  and  Mills’  re¬ 
ported  advantage  for  reading  and  doing  over  reading  alone  has 
no  real  practical  implication,  as  it  is  difficult  to  imagine  anyone  ad¬ 
vocating  isolated  reading  as  a  preferred  method.  However,  Dug¬ 
gan  and  Payne  suggested  that  the  way  learners  manage  the  inter¬ 
leaving  of  reading  and  doing  will  affect  their  later  retention,  and 
thus  offers  an  important  lever  for  improving  instruction. 

Many  procedural  instructions  have  a  natural  step-wise  struc¬ 
ture,  and  in  these  cases  it  is  possible  to  execute  the  procedure 
while  reading  with  minimal  load  on  memory.  Learners  can  read 
a  single  step,  and  then  execute  it  before  reading  the  next  step. 
Such  an  approach  is  low  in  effort  (and  therefore  attractive  to  the 
learner),  but  also  low  in  transfer-appropriate  practice  and  there¬ 
fore,  one  would  argue  on  the  basis  of  the  reviewed  work,  poor 
at  encouraging  retention.  If  learners  could  instead  be  prompted 
to  read  several  procedural  steps  before  enacting  them,  perfor¬ 
mance  would  be  made  more  effortful,  but  learning  might  ben¬ 
efit.  Readers  would  be  encouraged  to  integrate  the  information 
across  the  chunk  of  procedural  steps,  and  the  increased  mem¬ 
ory  load  would  provide  transfer-appropriate  practice. 

Duggan  and  Payne  (2001)  developed  this  idea  as  follows. 
First,  by  implementing  an  online  help  system  in  the  context  of 
experimental  tasks  (programming  a  VCR),  they  forced  partici¬
pants  into  either  a  step-wise  or  a  chunk-based  strategy  for  in¬
terleaving  reading  and  acting.  These  experiments  demonstrated
that  reading  by  chunks  did  tax  performance  during  training,  but 
improved  learning,  in  particular  retention  of  the  procedure. 
Next,  they  developed  a  more  subtle,  indirect  manipulation  of 
chunking.  By  adding  a  simple  cost  to  the  access  of  online  in¬
structions  (cf.  O’Hara  &  Payne,  1998),  they  encouraged  readers
to  chunk  steps  so  as  to  minimize  the  number  of  times  the  in¬ 
structions  were  accessed.  Just  as  with  enforced  chunking,  this 
led  to  improved  retention  of  the  procedures. 

In  the  last  10  years  or  so  there  has  been  a  rapid  surge  of  interest
in  the  concept  of  shared  mental  models  in  the  domain  of  team¬
work  and  collaboration.  The  use  of  mental  models  in  this  litera¬
ture,  to  date,  is  somewhat  inexact,  with  little  theoretical  force,  ex¬
cept  to  denote  a  concern  with  what  the  team  members  know,  be¬
lieve,  and  want.  As  the  name  suggests,  shared  mental  models
refers  to  the  overlap  in  individuals’  knowledge  and  beliefs.

The  central  thesis  and  motive  force  of  the  literature  is  that 
team  performance  will  improve  when  team  members  share  rel¬ 
evant  knowledge  and  beliefs  about  their  situation,  task,  equip¬ 
ment,  and  team.  Different  investigations  and  different  authors 
have  stressed  different  aspects  of  knowledge,  and  indeed  pro¬ 
posed  different  partitions  into  knowledge  domains.  (And  re¬ 
cently,  as  we  shall  see,  some  investigators  have  questioned  the 
extent  to  which  overlapping  knowledge  is  a  good  thing.  There 
are  some  situations  in  which  explicit  distribution  or  division  of 
knowledge  may  serve  the  team  goals  better.) 

At  first  glance,  the  idea  that  teams  need  to  agree  about  or 
share  important  knowledge  seems  intuitively  plain.  Models  of 
communication  (e.g.,  Clark,  1992)  stress  the  construction  of  a
common  ground  of  assumptions  about  each  partner’s  back¬ 
ground  and  intentions.  The  idea  of  shared  mental  models  de¬ 
velops  this  idea  in  a  plausible  practical  direction. 

A  recent  study  by  Mathieu,  Heffner,  Goodwin,  Salas,  and 
Cannon-Bowers  (2000)  was  one  of  the  most  compelling  demon¬ 
strations  of  the  basic  phenomenon  under  investigation,  as  well 
as  being  centered  on  an  HCI  paradigm.  For  these  reasons,  this 
study  will  be  described  and  used  as  a  framework  to  introduce 
the  space  of  theoretical  and  empirical  choices  that  characterize 
the  mainstream  of  the  shared  mental  models  literature. 

Mathieu,  Heffner,  Goodwin,  Salas,  and  Cannon-Bowers 
(2000)  considered  team  members’  mental  models  as  comprising 
knowledge  of  four  separate  domains:  (a)  technology  (essentially 
the  mental  models  described  in  part  one  of  this  chapter);  (b)  job 
or  task;  (c)  team  interaction  (such  as  roles,  communication 
channels  and  information  flow)  and  (d)  other  teammates’  knowl¬ 
edge  and  attitudes.  Knowledge  of  the  last  three  types  would 
rarely  be  called  a  mental  model  outside  this  literature,  and  so 
straight  away  we  can  see  a  broader  and  more  practical  orienta¬ 
tion  than  in  individually  oriented  mental  models  literatures. 

For  the  purposes  of  operationalization,  the  authors  suggested 
that  these  four  categories  of  knowledge  may  be  treated  as  two: 
task  related  and  team  related.  This  binary  distinction  mirrors  a 
distinction  that  has  been  made  in  terms  of  team  behaviors  and 
communications,  which  have  been  considered  in  terms  of  a  task 
track  and  a  teamwork  track  (McIntyre  &  Salas,  1995). 

Mathieu  and  colleagues  studied  dyads  using  a  PC-based 
flight  simulator.  One  member  of  each  dyad  was  assigned  to  the 
joystick  to  control  aircraft  position.  The  other  was  assigned  to 
keyboard,  speed,  weapon  systems,  and  information  gathering. 
Both  members  could  fire  weapons.  The  experimental  proce¬ 
dure  incorporated  a  training  phase,  including  the  task  and  ba¬ 
sics  of  teamwork,  and  then  the  flying  of  six  missions,  divided 
into  three  equally  difficult  blocks  of  two,  each  mission  lasting 
around  10  minutes.  Performance  on  a  mission  was  scored  in 
terms  of  survival,  route  following,  and  shooting  enemy  planes. 
Team  processes  were  scored  by  two  independent  raters  viewing 
videotapes  to  assign  scores  for,  for  example,  how  well  the  dyad
communicated  with  each  other. 

Mental  models  were  measured  after  each  pair  of  missions. 
At  each  measurement  point,  each  individual’s  task  or  team 
mental  model  was  elicited  by  the  completion  of  a  relatedness
matrix  (one  for  task,  one  for  team),  in  which  the  team  member 
rated  the  degree  to  which  each  pair  from  a  set  of  dimensions 
was  related.  For  the  task  model  there  were  eight  dimensions, 
including  diving  versus  climbing;  banking  or  turning;  and 
choosing  airspeed.  For  the  team  model  there  were  seven  di¬ 
mensions,  including  amount  of  information  and  roles  and 
team  spirit. 

Thus,  at  each  measurement  point,  participants  had  to  assign 
numbers  between  -4  (negatively  related,  a  high  degree  of  one
requires  a  low  degree  of  the  other)  and  +4  (positively  related,
a  high  degree  of  one  requires  a  high  degree  of  the  other)  to 
each  pair  of  dimensions  in  each  domain.  For  example,  they  had 
to  rate  the  relatedness  of  diving  versus  climbing  to  choosing  air¬ 
speed,  and  the  relatedness  of  roles  to  team  spirit.  For  each  team 
at  each  time  for  each  model-type  a  convergence  index  was  cal¬
culated  by  computing  a  correlation  coefficient  (QAP  correla¬
tion)  between  the  two  members’  matrices.  The  coefficient  could  vary
from  -1  (complete  disagreement)  to  +1  (completely  shared
mental  models).
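
The convergence index described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: it assumes that each member's ratings are stored as a symmetric square matrix, that only the entries above the diagonal are compared, and that significance is assessed by the usual QAP idea of permuting rows and columns together. The function names, the four-dimension example matrices, and the permutation count are all hypothetical.

```python
import numpy as np


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix as a 1-D array."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = np.triu_indices(m.shape[0], k=1)
    return m[rows, cols]


def convergence_index(ratings_a, ratings_b):
    """Pearson correlation between two members' pairwise relatedness ratings."""
    a = upper_triangle(ratings_a)
    b = upper_triangle(ratings_b)
    return float(np.corrcoef(a, b)[0, 1])


def qap_p_value(ratings_a, ratings_b, n_permutations=2000, seed=0):
    """One-sided permutation p-value in the spirit of QAP (illustrative only).

    Rows and columns of the first matrix are shuffled with the same ordering,
    so the matrix structure is preserved while the labelling of dimensions is
    randomized.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(ratings_a, dtype=float)
    b = np.asarray(ratings_b, dtype=float)
    observed = convergence_index(a, b)
    hits = 0
    for _ in range(n_permutations):
        order = rng.permutation(a.shape[0])
        shuffled = a[np.ix_(order, order)]
        if convergence_index(shuffled, b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical ratings on the -4..+4 scale for four dimensions (made-up values).
member_a = np.array([[0, 4, -2, 1],
                     [4, 0, 3, -4],
                     [-2, 3, 0, 2],
                     [1, -4, 2, 0]])
member_b = np.array([[0, 3, -1, 1],
                     [3, 0, 4, -3],
                     [-1, 4, 0, 1],
                     [1, -3, 1, 0]])

print(convergence_index(member_a, member_b))
print(qap_p_value(member_a, member_b))
```

With eight task dimensions, as in the study, each member would supply 28 pairwise ratings, and the same functions would apply unchanged.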

The  main  findings  of  this  investigation  were  as  follows.  Con¬ 
trary  to  hypothesis,  convergence  of  mental  models  did  not  in¬ 
crease  over  time;  rather  it  was  stable  across  missions  1  to  3.  This 
runs  counter  to  a  major  and  plausible  assumption  of  the  shared 
mental  models  program,  which  is  that  agreement  between  team 
members  should  increase  with  extent  of  communication  and
collaboration. 

Nevertheless,  convergence  of  both  task  and  team  models 
predicted  the  quality  of  team  process  and  the  quality  of  perfor¬ 
mance.  Further,  the  relationship  between  convergence  and  per¬ 
formance  was  fully  mediated  by  quality  of  team  process. 
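
The claim of full mediation can be made concrete with the regression logic that is commonly used for such claims. The sketch below is an assumption about the general form of the argument, not a report of the authors' exact procedure; the symbols C (convergence), T (quality of team process), and P (performance) are introduced here for illustration.

```latex
\begin{align}
P &= i_1 + c\,C + e_1 \\
T &= i_2 + a\,C + e_2 \\
P &= i_3 + c'\,C + b\,T + e_3
\end{align}
```

On this reading, full mediation means that c, a, and b are reliably different from zero, whereas the direct effect c' is not once team process is included in the model.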

The  most  natural  interpretation  of  these  findings  is  that  team 
process  is  supported  by  shared  mental  models.  In  turn,  good 
team  processes  lead  to  good  performance.  According  to  its  au¬ 
thors,  this  study  provided  the  first  clear  empirical  support  for 
the  oft-supposed  positive  relationship  between  shared  mental 
models  and  team  effectiveness  (Mathieu  et  al.,  2000,  p.  280). 

As  well  as  being  paradigmatic  in  illustrating  the  key  ideas  in 
the  shared  mental  models  literature,  this  study  has  several  as¬ 
pects  that  highlight  the  range  of  approaches  and  the  contro¬ 
versy  in  the  field. 

First,  it  is  worth  considering  what  particular  properties  of  the 
task  and  teams  may  have  contributed  to  the  positive  relation  be¬ 
tween  shared  mental  models  and  team  process  and  perfor¬ 
mance.  Compared  with  most  situations  in  which  coordination 
and  collaboration  are  of  prime  interest,  including  most  situa¬ 
tions  addressed  by  CSCW  researchers,  the  teams  studied  by 
Mathieu  et  al.  were  minimal  (two  members)  and  the  tasks  were
very  short  term  and  relatively  circumscribed.  Beyond  these  ob¬ 
vious  remarks,  I  would  add  that  the  division  of  labor  in  the  task 
was  very  “close,”  and  the  workers’  performance  was  extremely 
interdependent.  Of  course,  interdependence  is  the  signature 
of  collaborative  tasks;  nevertheless  a  situation  in  which  one  per¬ 
son  controls  airspeed  and  another  controls  altitude  may  make 
this  interdependence  more  immediate  than  is  the  norm. 

It  is  also  possible  that  the  relatively  circumscribed  nature  of 
the  task  and  collaboration  contributed  to  the  failure  of  this 
study  to  find  evidence  for  the  sharing  of  mental  models  in¬ 
creasing  across  the  duration  of  collaboration. 


As  just  mentioned,  although  the  literature  contains  many 
proposals  that  shared  mental  models  will  positively  influence 
process  and  performance,  there  has  been  much  less  empirical 
evidence.  Another  study  of  particular  relevance  to  HCI  is  con¬ 
cerned  with  the  workings  of  software  development  teams. 

Software  development  is  an  ideal  scenario  for  the  study  of 
team  coordination  for  several  reasons.  First,  much  modern 
software  development  is  quintessentially  team  based  (Crow-
ston  &  Kammerer,  1998;  Curtis,  Krasner,  &  Iscoe,  1988;  Kraut  &
Streeter,  1995),  and  relies  heavily  on  the  complex  coordina¬ 
tions  of  team  members.  Secondly,  this  effort  is  often  geograph¬ 
ically  dispersed,  further  stressing  collaboration  and  putting  an 
emphasis  on  communications  technologies.  Finally,  software 
development  takes  place  in  technologically  advanced  settings 
with  technologically  savvy  participants,  so  that  it  provides 
something  of  a  test  bed  for  collaboration  and  communication 
technologies. 

One  study  of  complex  geographically  distributed  software 
teams  has  been  reported  that  partially  supports  the  findings  of 
the  Mathieu  et  al.  (2000)  study  and  provided  complementary  ev¬ 
idence  for  positive  effects  of  shared  mental  models  on  team  per¬ 
formance.  Espinosa,  Kraut,  Slaughter,  Lerch,  Herbsleb,  and 
Mockus  (2002)  reported  a  multimethod  investigation  of  soft¬ 
ware  teams  in  two  divisions  of  a  multinational  telecommunica¬ 
tions  company.  The  most  relevant  aspect  of  their  study  was  a 
survey  of  97  engineers  engaged  in  team  projects  of  various  sizes 
ranging  from  2  to  7.  Team  coordination  and  shared  mental  mod¬
els  (SMM)  were  both  measured  by  simple  survey  items,  fol¬
lowed  by  post  hoc  correlational  analysis  to  uncover  the  relation
between  shared  mental  models  and  team  process.  As  in  the 
Mathieu  et  al.  (2000)  study,  shared  mental  models  were  consid¬ 
ered  in  two  categories:  task  and  team.  A  positive  relation  be¬ 
tween  team  SMM  and  coordination  was  discovered,  but  the 
effect  of  task  SMM  was  not  significant. 

It  is  worth  being  clear  about  the  positive  relation  and  how  it 
was  computed.  Team  SMM  was  computed  for  each  team  by 
assessing  the  correlations  between  each  team  member’s  re¬ 
sponses  to  the  team  SMM  survey  items.  This  index  was  entered 
as  an  independent  variable  in  a  multiple  regression  to  predict 
average  reported  levels  of  team  coordination.  It  is,  of  course, 
hard  to  infer  any  causal  relation  from  such  correlational  analy¬ 
ses,  and  one  might  also  wonder  about  the  validity  of  purely 
questionnaire-based  measures  of  some  of  the  constructs,  yet 
nevertheless  the  study  is  highly  suggestive  that  SMM  can  have 
a  positive  influence  in  group-work  situations  far  removed  from 
pairs  of  students  interacting  with  a  flight  simulator.  Addition¬
ally,  Espinosa,  Kraut,  Slaughter,  Lerch,  Herbsleb,  and  Mockus
(2002)  reported  an  interview  study  in  which  respondents  con¬ 
firmed  their  own  belief  that  SMM  contributed  positively  to  pro¬ 
ject  communications  and  outcomes. 
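
To make the computation of a team-level SMM index concrete, the following sketch averages the pairwise correlations among team members' survey responses. The use of the mean of all pairwise Pearson correlations is an assumption made for illustration; the text says only that correlations between members' responses were assessed. The function name and the data layout (one row per member, one column per survey item) are hypothetical.

```python
from itertools import combinations

import numpy as np


def team_smm_index(responses):
    """Mean pairwise Pearson correlation among members' survey responses.

    responses: 2-D array-like, one row per team member and one column per
    survey item. Returns a value between -1 and +1.
    """
    r = np.asarray(responses, dtype=float)
    if r.ndim != 2 or r.shape[0] < 2:
        raise ValueError("need responses from at least two team members")
    pair_correlations = [
        np.corrcoef(r[i], r[j])[0, 1]
        for i, j in combinations(range(r.shape[0]), 2)
    ]
    return float(np.mean(pair_correlations))


# Made-up responses from a three-person team to five survey items.
example_team = [
    [5, 4, 2, 1, 3],
    [4, 4, 1, 2, 3],
    [5, 3, 2, 2, 4],
]
print(team_smm_index(example_team))
```

An index of this kind, computed once per team, is the sort of quantity that could then be entered as a predictor in a regression on reported coordination.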

Nevertheless,  Espinosa  et  al.  (2002)  failed  to  find  any  relation 
between  task  SMM  and  team  process.  It  seems  to  me  that,  in 
view  of  the  survey  methodology,  this  would  have  been  the  more 
compelling  evidence  in  favor  of  the  SMM  construct.  It  seems 
less  surprising  and  perhaps  less  interesting  that  there  should 
be  a  correlation  between  participants’  survey  responses  con¬ 
cerning  how  well  they  communicated  on  their  team,  and,  for 
example,  their  agreement  about  which  teammates  had  high 
knowledge  about  the  project. 

Levesque,  Wilson,  and  Wholey  (2001)  reported  a  different
study  of  software  development  teams,  using  ad  hoc  student- 
project  groupings  to  study  whether  sharing  of  Team  SMM 
increased  over  time.  They  only  measured  Team  SMM,  using  Lik¬ 
ert  scale  items  on  which  participants  signaled  amount  of  agree¬ 
ment  or  disagreement  with  statements  like,  “Most  of  our  team’s 
communication  is  about  technical  issues,”  “Voicing  disagree¬ 
ment  on  this  team  is  risky,”  or  “Lines  of  authority  on  this  team 
are  clear.”  Team  SMM  was  measured  by  computing  correlations
among  team  members  of  these  responses  after  1,  2,  and  3 
months  of  working  on  a  joint  project. 

Levesque,  Wilson,  and  Wholey  (2001)  found  that,  contrary  to 
their  hypothesis,  team  SMM  decreased  over  time.  They  argue 
that  this  is  because  projects  were  managed  by  a  division  of  labor 
that  required  much  initial  collaboration  but  meant  that  later
activity  was  more  individual. 

There  are  surely  many  teamwork  situations  in  which  role  dif¬ 
ferentiation  is  critical  for  success,  and  this  observation  sug¬ 
gested  that  the  most  straightforward  interpretation  of  shared 
mental  models  is  overly  simple.  Indeed,  even  in  teams  that  con¬
tinue  to  meet,  communicate,  and  collaborate,  it  may  be  that  role
differentiation  means  that  task  mental  models  should  not  so 
much  be  “shared”  as  “distributed”  to  allow  for  effective  team 
performance.  (Studies  of  intimate  couples  have  explored  a  sim¬ 
ilar  process  of  specialization  of  memory  functions,  under  the 
name  “transactive  memory”;  e.g.,  Wegner,  1987,  1995.)

When  roles  are  differentiated,  it  is  no  longer  important  that 
task  knowledge  is  shared,  but  rather  that  individuals’  knowl¬ 
edge  about  who  knows  what  is  accurate.  Thus,  one  would  ex¬ 
pect  team  SMMs  to  support  communication  and  collaboration 
even  in  teams  with  highly  differentiated  roles.  This  may  explain 
the  findings  reviewed  above.  In  the  Mathieu  et  al.  study,  the 
team  members’  technical  roles  remained  tightly  interdepen¬ 
dent,  so  that  both  task  and  team  models  had  to  be  shared  for 
successful  performance.  In  the  Espinosa  et  al.  (2002)  study,  the 
technical  roles  may  have  been  differentiated  but  the  level  of
communication  remained  high,  so  that  team  SMM  affected  per¬
formance  but  task  SMM  did  not.  In  the  Levesque  et  al.  study,  the
teams  divided  their  labor  to  the  extent  that  communication  and 
collaboration  ceased  to  be  necessary  (apart,  perhaps,  from  some
final  pooling  of  results).  In  this  case,  we  would  predict  that  nei¬ 
ther  task  nor  team  SMMs  would  affect  performance  once  the 
division  of  labor  had  been  accomplished.  No  data  on  perfor¬ 
mance  were  reported,  but  team  models  became  less  shared 
over  the  course  of  the  projects. 

Although  there  has  been  quite  a  sudden  flurry  of  interest  in 
shared  mental  models,  this  brief  review  makes  clear  that  much 
empirical  and  conceptual  work  remains  to  be  done.  Of  particu¬ 
lar  relevance  to  this  chapter  is  the  question  of  what  exactly  is 
meant  by  a  mental  model  in  this  context. 

To  date  throughout  the  field,  mental  models  have  been  con¬ 
sidered  as  semantic  knowledge,  using  traditional  associative 
networks  as  a  representation.  Thus,  mental  models  have  typi¬
cally  been  tapped  using  simple  Likert  scales  or  direct  questions
about  the  relations  (e.g.,  similarity)  between  constructs,  analyzed
with  multidimensional  techniques  such  as  Pathfinder  (for  a  re¬
view  of  measurement  techniques  in  this  field,  see  Mohammed,
Klimoski,  &  Rentsch,  2000).  Because  interest  has  focused  on  the
extent  to  which  knowledge  and  beliefs  are  common  among 
team  members,  these  approaches  have  been  useful,  allowing 
quantitative  measures  of  similarity  and  difference.  Nevertheless, 
compared  with  the  literature  on  individual  mental  models,  they 
tend  to  reduce  participants’  understanding  to  mere  associa¬ 
tions,  and  yet  the  thrust  of  the  individual  work  shows  that  this 
may  not  be  appropriate,  because  the  particular  conceptualiza¬ 
tions  of  the  domain,  the  analogies  drawn,  the  computational  as 
well  as  informational  relations  between  internal  and  external 
representations  etc.,  can  have  real  effects  on  performance.  It 
seems  that  an  important  path  of  development  may  be  to  adopt 
this  more  refined  cognitive  orientation  and  investigate  the  im¬ 
pact  of  shared  models — as  opposed  to  shared  networks  of  facts 
and  associations — on  collaboration. 

Emotion  is  a  fundamental  component  of  being  human.  Joy,
hate,  anger,  and  pride,  among  the  plethora  of  other  emotions, 
motivate  action  and  add  meaning  and  richness  to  virtually  all 
human  experience.  Traditionally,  human-computer  interaction 
(HCI)  has  been  viewed  as  the  “ultimate”  exception;  users  must 
discard  their  emotional  selves  to  work  efficiently  and  rationally
with  computers,  the  quintessentially  unemotional  artifact.
Emotion  seemed  at  best  marginally  relevant  to  HCI  and  at  worst 
oxymoronic. 

Recent  research  in  psychology  and  technology  suggests  a 
different  view  of  the  relationship  between  humans,  computers, 
and  emotion.  After  a  long  period  of  dormancy  and  confusion, 
there  has  been  an  explosion  of  research  on  the  psychology  of 
emotion  (Gross,  1999).  Emotion  is  no  longer  seen  as  limited  to 
the  occasional  outburst  of  fury  when  a  computer  crashes  inex¬ 
plicably,  excitement  when  a  video  game  character  leaps  past  an 
obstacle,  or  frustration  at  an  incomprehensible  error  message. 
It  is  now  understood  that  a  wide  range  of  emotions  plays  a  crit¬ 
ical  role  in  every  computer-related,  goal-directed  activity,  from 
developing  a  three-dimensional  (3D)  CAD  model  and  running 
calculations  on  a  spreadsheet,  to  searching  the  Web  and  send¬ 
ing  an  e-mail,  to  making  an  online  purchase  and  playing  soli¬ 
taire.  Indeed,  many  psychologists  now  argue  that  it  is  impossi¬ 
ble  for  a  person  to  have  a  thought  or  perform  an  action  without 
engaging,  at  least  unconsciously,  his  or  her  emotional  systems 
(Picard,  1997b). 

The  literature  on  emotions  and  computers  has  also  grown 
dramatically  in  the  past  few  years,  driven  primarily  by  advances 
in  technology.  Inexpensive  and  effective  technologies  that  en¬ 
able  computers  to  assess  the  physiological  correlates  of  emo¬ 
tion,  combined  with  dramatic  improvements  in  the  speed  and 
quality  of  signal  processing,  now  allow  even  personal  computers 
to  make  judgments  about  the  user’s  emotional  state  in  real  time 
(Picard,  1997a).  Multimodal  interfaces  that  include  voices,  faces, 
and  bodies  can  now  manifest  a  much  wider  and  more  nuanced 
range  of  emotions  than  was  possible  in  purely  textual  interfaces 
(Cassell,  Sullivan,  Prevost,  &  Churchill,  2000).  Indeed,  any  in¬ 
terface  that  ignores  a  user’s  emotional  state  or  fails  to  manifest 
the  appropriate  emotion  can  dramatically  impede  performance 
and  risks  being  perceived  as  cold,  socially  inept,  untrustworthy, 
and  incompetent. 

This  chapter  reviews  the  psychology  and  technology  of  emo¬ 
tion,  with  an  eye  toward  identifying  those  discoveries  and  con¬ 
cepts  that  are  most  relevant  to  the  design  and  assessment  of  in¬ 
teractive  systems.  The  goal  is  to  provide  the  reader  with  a  more 
critical  understanding  of  the  role  and  influence  of  emotion,  as 
well  as  the  basic  tools  needed  to  create  emotion-conscious  and 
consciously  emotional  interface  designs. 

The  “seat”  of  emotion  is  the  brain;  hence,  we  begin  with  a 
description  of  the  psychophysiological  systems  that  lie  at  the 
core  of  how  emotion  emerges  from  interaction  with  the  envi¬ 
ronment.  By  understanding  the  fundamental  basis  of  emotional 
responses,  we  can  identify  those  emotions  that  are  most  read¬ 
ily  manipulable  and  measurable.  We  then  distinguish  emotions 
from  moods  (longer-term  affective  states  that  bias  users’  re¬
sponses  to  any  interface)  and  other  related  constructs.  The  fol¬
lowing  section  discusses  the  cognitive,  behavioral,  and  attitudi¬
nal  effects  of  emotion  and  mood,  focusing  on  attention  and
memory,  performance,  and  user  assessments  of  the  interface.
Designing  interfaces  that  elicit  desired  affective  states  requires
knowledge  of  the  causes  of  emotions  and  mood;  we  turn  to  that
issue  in  the  following  section.  Finally,  we  discuss  methods  for
measuring  affect,  ranging  from  neurological  correlates  to  ques¬
tionnaires,  and  describe  how  these  indicators  can  be  used  both
to  assess  users  and  to  manifest  emotion  in  interfaces.

FIGURE  4.1.  Neurological  structure  of  emotion.


UNDERSTANDING  EMOTION 


What  is  emotion?  Although  the  research  literature  offers  a
plethora  of  definitions  (Kleinginna  &  Kleinginna,  1981),  two 
generally  agreed-upon  aspects  of  emotion  stand  out:  (a)  emo¬ 
tion  is  a  reaction  to  events  deemed  relevant  to  the  needs,  goals, 
or  concerns  of  an  individual;  and  (b)  emotion  encompasses
physiological,  affective,  behavioral,  and  cognitive  components. 
Fear,  for  example,  is  a  reaction  to  a  situation  that  threatens  (or
seems  to  threaten,  as  in  a  frightening  picture)  an  individual’s 
physical  well-being,  resulting  in  a  strong  negative  affective  state, 
as  well  as  physiological  and  cognitive  preparation  for  action.  Joy, 
on  the  other  hand,  is  a  reaction  to  goals  being  fulfilled  and  gives 
rise  to  a  more  positive,  approach-oriented  state. 

A  useful  model  for  understanding  emotion,  based  on  a  sim¬ 
plified  view  of  LeDoux’s  (1996)  work  in  neuropsychology,  is 
shown  in  Fig.  4.1.  There  are  three  key  regions  of  the  brain  in  this 
model:  (a)  the  thalamus,  (b)  the  limbic  system,  and  (c)  the  cor¬ 
tex.  All  sensory  input  from  the  external  environment  is  first  re¬ 
ceived  by  the  thalamus,  which  functions  as  a  basic  signal  proces¬ 
sor.  The  thalamus  then  sends  information  simultaneously  both 
to  the  cortex,  for  “higher-level”  processing,  and  directly  to  the 
limbic  system  (LeDoux,  1995).  The  limbic  system,¹  often  called
the  “seat  of  emotion,”  constantly  evaluates  the  need/goal  rele¬ 
vance  of  its  inputs.  If  relevance  is  determined,  the  limbic  system 
sends  appropriate  signals  both  to  the  body,  coordinating  the 
physiological  response,  and  also  to  the  cortex,  biasing  atten¬ 
tion  and  other  cognitive  processes. 


¹ The  limbic  system  is  often  considered  to  include  the  hypothalamus,  the  hippocampus,  and  the  amygdala.  According  to  LeDoux,  the  amygdala  is
the  only  critical  area  (LeDoux  &  Phelps,  2000). 




The  direct  thalamic-limbic  pathway  is  the  mechanism  that  ac¬ 
counts  for  the  more  primitive  emotions,  such  as  startle-based 
fear,  as  well  as  innate  aversions  and  attractions.  Damasio  (1994) 
called  these  the  “primary”  emotions.  In  an  HCI  context,  on¬ 
screen  objects  and  events  have  the  potential  to  activate  such 
primitive  emotional  responses  (Reeves  &  Nass,  1996).  For  example,
objects  that  appear  or  move  unexpectedly  (e.g.,  pop-up
windows,  sudden  animations)  and  loud  or  sharp  noises  are
likely  to  trigger  startle-based  fear.  Visual  stimuli  that  tend  to  be
particularly  arousing  include  images  that  fill  a  large  fraction  of
the  visual  field  (either  because  the  image  or  screen  is  large  or
because  the  eyes  are  close  to  the  screen;  Detenber  &  Reeves,
1996;  Voelker,  1994),  images  that  seem  to  approach  the  user
(e.g.,  a  rapidly  expanding  image  on  the  screen,  an  image  that
appears  to  be  flying  out  from  the  screen,  or  a  character  that
walks  toward  the  user),  and  images  that  move  in  peripheral  vision
(i.e.,  on  the  side  of  the  screen;  Reeves  &  Nass,  1996).  Finally,
certain  images  and  sounds  may  be  innately  disturbing  or
pleasing  due  to  their  evolutionary  significance  (e.g.,  screeching
or  crying  noises  or  explicit  sexual  or  violent  imagery;  see,  e.g.,
Lang,  1995;  Malamuth,  1996).

Most  of  the  emotions  that  we  are  concerned  with  in  the  de¬ 
sign  of  HCI — and  the  ones  we  will  focus  on  in  the  remainder  of 
this  chapter — require  more  extensive  cognitive  (i.e.,  knowl¬ 
edge-based)  processing.  These  “secondary”  emotions,  such  as 
frustration,  pride,  and  satisfaction,  result  from  activation  of  the 
limbic  system  by  processing  in  the  cortex.  Such  cortical  processing
can  occur  at  various  levels  of  complexity,  from  simple
object  recognition  (e.g.,  seeing  the  Microsoft  Office  Paperclip)  to
intricate  rational  deliberation  (e.g.,  evaluating  the  consequences
of  erasing  a  seldom-used  file),  and  may  or  may  not  be  conscious.
The  cortex  can  even  trigger  emotion  in  reaction  to  internally
generated  stimuli  (e.g.,  thinking  about  how  difficult  it
will  be  to  configure  a  newly  purchased  application).

Finally,  an  emotion  can  result  from  a  combination  of  both 
the  thalamic-limbic  and  the  cortico-limbic  mechanisms.  For  example,
an  event  causing  an  initial  startle/fear  reaction  can  be
later  recognized  as  harmless  by  more  extensive,  rational  evaluation
(e.g.,  when  you  realize  that  the  flash  of  your  screen  suddenly
going  blank  is  just  the  initiation  of  the  screen  saver).  In
other  situations,  higher-level  processing  can  reinforce  an  ini¬ 
tial  evaluation.  Whatever  the  activation  mechanism — thalamic 
or  cortical,  conscious  or  nonconscious — the  cortex  receives  in¬ 
put  from  an  activated  limbic  system,  as  well  as  feedback  from 
the  body,  both  contributing  to  the  conscious  “experience”  of 
emotion. 

The  previous  discussion  provides  a  useful  framework  for 
considering  one  of  the  classic  debates  in  emotion  theory:  are 
emotions  innate  or  learned?  At  one  extreme,  evolutionary  the¬ 
orists  argue  that  all  emotions  (including  complex  emotions 
such  as  regret  and  relief)  are  innate,  each  evolved  to  address 
a  specific  environmental  concern  of  our  ancestors  (Darwin, 
1872/1998;  Nesse,  1990;  Tooby  &  Cosmides,  1990;  see  also  Ekman,
1994;  Izard,  1992).  These  theories  are  consistent  with  a  hypothesis
of  high  differentiation  within  the  limbic  system,  corresponding
to  each  of  the  biologically  determined  emotions.
From  this  perspective,  it  is  also  reasonable  to  speculate  that 
each  emotion  is  associated  with  a  unique  set  of  physiological 
and  cognition-biasing  responses. 


At  the  other  extreme,  many  emotion  theorists  argue  that, 
with  the  exception  of  startle  and  innate  affinity/disgust  (which 
they  would  consider  pre-emotional),  emotions  are  almost
entirely  learned  social  constructions  (Averill,  1980;  Ortony  & 
Turner,  1990;  Shweder,  1994;  Wierzbicka,  1992).  Such  theories 
emphasize  the  role  of  higher  cortical  processes  in  differentiating 
emotions  and  concede  minimal,  if  any,  specificity  within  the  lim¬ 
bic  system  (and  consequently,  within  physiological  responses). 
For  example,  the  limbic  system  may  operate  in  simply  an  on/off 
manner,  or  at  most  be  differentiated  along  the  dimensions  of  va¬ 
lence  (positive/negative  or  approach/avoidance)  and  arousal 
(low/high)  (Barrett  &  Russell,  1999;  Lang,  1995).  From  this  per¬ 
spective,  emotions  are  likely  to  vary  considerably  across  cul¬ 
tures,  with  any  consistency  being  based  in  common  social  struc¬ 
ture,  not  biology. 

Between  these  two  extremes  lie  those  who  believe  that  there 
are  “basic  emotions.”  Citing  both  cross-cultural  universals  and 
primate  studies,  these  theorists  contend  that  there  is  a  small 
set  of  innate,  basic  emotions  shared  by  all  humans  (Ekman, 
1992;  Oatley  &  Johnson-Laird,  1987;  Panksepp,  1992).  Which 
emotions  qualify  as  basic  is  yet  another  debate,  but  the  list  typ¬ 
ically  includes  fear,  anger,  sadness,  joy,  disgust,  and  sometimes 
also  interest  and  surprise.  Other  emotions  are  seen  either  as 
combinations  of  these  basic  emotions  or  as  socially  learned  differentiations
within  the  basic  categories  (e.g.,  agony,  grief,  guilt,
and  loneliness  are  various  constructions  of  sadness;  Bower,
1992).  In  this  view,  the  limbic  system  is  prewired  to  recognize
the  basic  categories  of  emotion,  but  social  learning  and  higher 
cortical  processes  still  play  a  significant  role  in  differentiation. 

If  the  “basic  emotions”  view  is  correct,  a  number  of  implica¬ 
tions  for  interaction  design  and  evaluation  emerge.  First,  the  ba¬ 
sic  categories  would  likely  be  the  most  distinguishable,  and 
therefore  measurable,  emotional  states  (both  in  emotion  recog¬ 
nition  systems  as  well  as  in  postinteraction  evaluations).  Fur¬ 
ther,  the  basic  emotions  would  be  less  likely  to  vary  significantly 
from  culture  to  culture,  facilitating  the  accurate  translation  and
generalizability  of  questionnaires  intended  to  assess  such  emo¬ 
tions.  Lower  variability  also  enables  more  reliable  prediction  of 
emotional  reactions  to  interface  content,  both  across  cultures 
and  across  individuals.  Finally,  for  users  interacting  with  on¬ 
screen  characters,  depictions  of  the  basic  emotions  would  pre¬ 
sumably  be  most  immediately  recognizable.  If  the  social  con¬ 
struction  view  of  emotions  is  valid,  then  emotion  measurement 
and  assessment,  prediction,  and  depictions  are  more  challeng¬ 
ing  and  nuanced. 


DISTINGUISHING  EMOTION 
FROM  RELATED  CONSTRUCTS 


Mood 

It  is  useful  to  distinguish  among  several  terms  often  used  ambiguously:
emotion,  mood,  and  sentiment.  Emotion  can  be  distinguished
from  mood  by  its  object-directedness.  As  Frijda
(1994)  explained,  emotions  are  intentional:  They  “imply  and
involve  relationships  with  a  particular  object.”  We  get  scared  of
something,  angry  with  someone,  and  excited  about  some  event.
Moods,  on  the  other  hand,  though  they  may  be  indirectly
caused  by  a  particular  object,  are  “nonintentional”;  they  are  not
directed  at  any  object  in  particular  and  are  thus  experienced  as
more  diffuse,  global,  and  general.  A  person  can  be  sad  about
something  (an  emotion)  or  generally  depressed  (a  mood).  Unfortunately,
the  English  language  often  allows  the  same  term  to
describe  both  emotion  and  mood  (e.g.,  “happy”).

Another  distinction  between  emotion  and  mood  emerges 
from  a  functional  perspective.  As  a  reaction  to  a  particular  situ¬ 
ation,  emotions  bias  action — they  prepare  the  body  and  the 
mind  for  an  appropriate,  immediate  response.  As  such,  emo¬ 
tions  also  tend  to  be  relatively  short  lived.  Moods,  in  contrast, 
tend  to  bias  cognitive  strategies  and  processing  over  a  longer 
term  (Davidson,  1994).  More  generally,  moods  can  be  seen  to 
serve  as  a  background  affective  filter  through  which  both  inter¬ 
nal  and  external  events  are  appraised.  A  person  in  a  good  mood 
tends  to  view  everything  in  a  positive  light,  while  a  person  in  a 
bad  mood  does  the  opposite.  The  interaction  between  emo¬ 
tions  and  moods  is  also  important.  Moods  tend  to  bias  which 
emotions  are  experienced,  lowering  the  activation  thresholds 
for  mood-related  emotions.  Emotions,  on  the  other  hand,  often 
cause  or  contribute  to  moods. 

When  assessing  user  response  to  an  interface,  it  is  impor¬ 
tant  to  consider  the  biasing  effects  of  user  mood.  Users  entering 
a  usability  or  experimental  study  in  a  good  mood,  for  instance, 
are  more  likely  to  experience  positive  emotion  during  an  inter¬ 
action  than  users  in  a  bad  mood.  Pretesting  for  mood  and  in¬ 
cluding  it  as  a  variable  in  analysis  can,  therefore,  reduce  noise 
and  increase  interpretive  power.  If  pretesting  users  immediately 
prior  to  an  interaction  is  inappropriate,  there  is  a  second  noise- 
reducing  option:  assessment  of  temperament.  Temperament  re¬ 
flects  the  tendency  of  certain  individuals  to  exhibit  particular 
moods  with  great  frequency.  Participants  can  be  pretested  for 
temperament  at  any  point  prior  to  the  study,  enabling  the  ex¬ 
clusion  of  extreme  cases  of  depressive  or  excitable  individuals 
(e.g.,  Bishop,  Jacks,  &  Tandy,  1993).  Finally,  if  user  testing  involves
multiple  stimuli,  order  of  presentation  can  also  influence
the  results.  For  example,  earlier  stimuli  may  establish  a  mood 
that  biases  emotional  reactions  to  subsequent  stimuli.  To  com¬ 
bat  this  problem,  the  order  of  stimuli  should  be  varied  from  par¬ 
ticipant  to  participant,  when  feasible. 
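
One common way to vary stimulus order systematically is a cyclic Latin square, in which every stimulus appears once in every presentation position. The sketch below is only an illustration of that idea, not a procedure prescribed by the studies cited here; the function names, participant IDs, and stimulus labels are hypothetical.

```python
import random


def balanced_orders(stimuli):
    """Return a cyclic Latin square: row r is the stimulus list rotated by r.

    Across the len(stimuli) rows, each stimulus appears exactly once in
    every presentation position.
    """
    n = len(stimuli)
    return [[stimuli[(r + c) % n] for c in range(n)] for r in range(n)]


def assign_orders(participant_ids, stimuli, seed=0):
    """Assign each participant one row of the Latin square.

    Participants are shuffled with a fixed seed (for reproducibility) and
    then cycled through the rows, so every row is used equally often when
    the number of participants is a multiple of len(stimuli).
    """
    rows = balanced_orders(stimuli)
    shuffled = list(participant_ids)
    random.Random(seed).shuffle(shuffled)
    return {pid: rows[i % len(rows)] for i, pid in enumerate(shuffled)}


# Hypothetical study: six participants, three interfaces to evaluate.
plan = assign_orders(["p1", "p2", "p3", "p4", "p5", "p6"],
                     ["site_a", "site_b", "site_c"])
for pid in sorted(plan):
    print(pid, plan[pid])
```

A cyclic square balances position only: each stimulus is always preceded by the same neighbor, so carryover from one specific stimulus to the next is not controlled. Where that matters, a fully randomized order per participant is a simple alternative. A pretest mood score, if collected, can be stored alongside each assignment and entered as a covariate in the analysis.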

Sentiment 

Sentiment  is  also  often  confused  with  emotion.  Unlike  emo¬ 
tions  (and  moods),  sentiments  are  not  states  of  an  individual, 
but  assigned  properties  of  an  object.  When  people  say  that  they 
“like”  an  interface  or  find  an  interface  to  be  “frustrating,”  what 
they  really  mean  is  that  they  associate  the  interface  with  a
positive  or  frustrating  emotional  state;  in  other  words,  they  ex¬ 
pect  interaction  with  the  interface  to  lead  to  positive  or  frus¬ 
trating  emotions.  The  basis  for  this  judgment  often  comes  from 
direct  experience  and  subsequent  generalization,  but  may  also 
arise  from  social  learning  (Frijda,  1994). 

One  reason  for  the  confusion  between  emotions  and  senti¬ 
ment  is  that  many  languages  use  the  same  words  for  both.  For 
example,  the  word  “like”  can  indicate  either  a  prediction
or  opinion  (sentiment)  or  a  current  emotional  state
(e.g.,  “I  like  receiving  e-mail”  vs.  “I  like  the  e-mail  that  just  arrived”).
Clore  (1994,  p.  108)  offers  an  interesting  explanation  for
this  ambiguity,  theorizing  that  sentiments  are  judged  by  bring¬ 
ing  the  object  to  mind  and  observing  the  affective  reaction.  But, 
while  emotions  and  moods  are  fleeting — emotions  last  only  sec¬ 
onds  and  moods  last  for  hours  or  even  days — sentiments  can 
persist  indefinitely  and  are  thus  responsible  for  guiding  our 
propensities  to  seek  out  or  avoid  particular  objects  and  situa¬ 
tions.  In  this  sense,  sentiments  are  of  critical  importance  for  HCI 
because  they  motivate  users  to  return  to  particular  software 
products  or  Websites. 

Although  direct  interaction  with  an  object  is  the  most  accu¬ 
rate  way  for  a  user  to  create  a  sentiment  (consider  the  colloquial 
phrase,  “how  do  you  know  you  don’t  like  it  unless  you  try  it”), 
sentiments  can  also  be  caused  by  assumptions  based  on  the 
communicated  properties  of  an  object.  People  may,  for  example,
base  a  sentiment  on  someone  else’s  description  of  their  interaction
with  the  object,  or  even  immediately  adopt  the  sentiment
of  someone  they  know  or  respect  (e.g.,  consider  the
presumed  influence  of  celebrities  in  software  advertisements).

As  a  predictive  construct,  sentiments  are  often  generaliza¬ 
tions  about  a  class  of  objects  with  a  given  recognizable  prop¬ 
erty,  i.e.,  stereotypes.  Although  some  of  these  generalizations 
may  be  logical  and  accurate,  others  may  not — in  fact,  they  may 
not  even  be  conscious.  Negative  experiences  with  a  particular 
computer  character,  for  example,  may  lead  users  to  conclude 
that  they  dislike  all  character-based  interfaces.  However,  using 
a  character  that  people  know  and  like  already — Mickey  Mouse, 
for  example — may  be  able  to  leverage  sentiment  to  an  inter¬ 
face’s  advantage.  Similarly,  many  people  have  well-established 
sentiments  regarding  certain  types  of  applications  (e.g.,  “I  hate
spreadsheet  applications”).  For  such  users,  interfaces  that  avoid 
triggering  their  negative  stereotypes  have  the  advantage.  Positive 
stereotypes,  on  the  other  hand,  should  be  encouraged  when¬ 
ever  possible,  such  as  when  learning  applications  are  framed  as 
entertainment. 


EFFECTS  OF  AFFECT 


Attention 

One  of  the  most  important  effects  of  emotion  lies  in  its  ability  to 
capture  attention.  Emotions  have  a  way  of  being  completely  absorbing.
Functionally,  they  direct  and  focus  our  attention  on
those  objects  and  situations  that  have  been  appraised  as  impor¬ 
tant  to  our  needs  and  goals  so  that  we  can  deal  with  them  ap¬ 
propriately.  Emotion-relevant  thoughts  then  tend  to  dominate 
conscious  processing — the  more  important  the  situation,  the 
higher  the  arousal,  and  the  more  forceful  the  focus  (Clore  & 
Gasper,  2000).  In  an  HCI  context,  this  attention-getting  function 
can  be  used  advantageously,  as  when  a  sudden  beep  is  used  to 
alert  the  user,  or  can  be  distracting,  as  when  a  struggling  user 
is  frustrated  and  can  only  think  about  his  or  her  inability. 

Emotion  can  further  influence  attention  through  a  secondary
process  of  emotion  regulation  (Gross,  1998).  Once  an
emotion  is  triggered,  higher  cognitive  processes  may  determine
that  the  emotion  is  undesirable.  In  such  cases,  attention  is  often
directed  away  from  the  emotion-eliciting  stimulus  for  the  purpose
of  distraction.  For  example,  becoming  angry  with  an  onscreen
agent  may  be  seen  as  ineffectual  (e.g.,  because  it  doesn’t
recognize  your  anger)  or  simply  unreasonable.  An  angered  user
may  then  actively  try  to  ignore  the  agent,  focusing  instead  on
other  onscreen  or  off-screen  stimuli,  or  even  take  the  next  step
and  completely  remove  the  agent  from  the  interaction  (which
could  mean  leaving  an  application  or  Website  entirely).  Positive
emotions  may  likewise  require  regulation  at  times,  such  as 
when  amusing  stimuli  lead  to  inappropriate  laughter  in  a  work 
environment.  If  the  emotionally  relevant  stimulus  is  too  arous¬ 
ing,  however,  regulation  through  selective  attention  is  bound  to 
fail  (Wegner,  1994),  because  users  will  be  unable  to  ignore  the 
stimulus. 

Mood  can  have  a  less  profound  but  more  enduring  effect 
on  attention.  At  the  most  basic  level,  people  tend  to  pay  more 
attention  to  thoughts  and  stimuli  that  have  some  relevance  to 
their  current  mood  state  (Bower  &  Forgas,  2000).  However, 
people  also  often  consciously  regulate  mood,  selecting  and  at¬ 
tending  to  stimuli  that  sustain  desired  moods  or,  alternatively, 
counteract  undesired  moods.  An  interface  capable  of  detect¬ 
ing — or  at  least  predicting — a  user’s  emotional  or  mood  state 
could  similarly  assume  an  affect-regulation  role,  helping  to 
guide  attention  away  from  negative  and  toward  more  positive 
stimuli.  For  example,  a  frustrated  user  could  be  encouraged  to 
work  on  a  different  task,  focus  on  a  different  aspect  of  the  prob¬ 
lem  at  hand,  or  simply  take  a  break  (perhaps  by  visiting  a  sug¬ 
gested  online  entertainment  site). 

Memory 

Emotion’s  effect  on  attention  also  has  implications  for  memory. 
Because  emotion  focuses  thought  on  the  evoking  stimulus, 
emotional  stimuli  are  generally  remembered  better  than  un¬ 
emotional  events  (Thorson  &  Friestad,  1985).  Negative  events, 
which  tend  to  be  highly  arousing,  are  typically  remembered 
better  than  positive  events  (Newhagen  &  Reeves,  1991,  1992; 
Reeves  &  Nass,  1996,  Chapter  10;  Reeves,  Newhagen,  Maibach, 
Basil,  &  Kurz,  1991).  In  addition,  emotionality  “improves  memory 
for  central  details  while  undermining  memory  for  background 
details”  (see  Heuer  &  Reisberg,  1992;  Parrott  &  Spackman, 
2000). 

Mood  also  comes  into  play  in  both  memory  encoding  and  re¬ 
trieval.  Research  has  shown  that  people  will  remember  “mood- 
congruent”  emotional  stimuli  better  than  incongruent  stimuli. 
Bower,  Gilligan,  and  Monteiro  (1981),  for  example,  hypnotized 
subjects  into  either  a  happy  or  sad  mood  before  having  them 
read  stories  about  various  characters.  The  next  day,  subjects 
were  found  to  remember  more  facts  about  characters  whose 
mood  had  agreed  with  their  own  than  about  other  characters. 
Similarly,  on  the  retrieval  end,  people  tend  to  better  recall  mem¬ 
ories  consistent  with  their  current  mood  (Ellis  &  Moore,  1999). 
However,  the  reverse  effect  has  also  been  shown  to  occur  in 
certain  situations;  people  will  sometimes  better  recall  mood-incongruent
memories  (e.g.,  happy  memories  while  in  a  sad
mood).  Parrott  and  Spackman  (2000)  hypothesized  that  mood
regulation  is  responsible  for  this  inverse  effect:  When  a  given
mood  is  seen  as  inappropriate  or  distracting,  people  will  often
actively  try  to  evoke  memories  or  thoughts  to  modify  that  mood
(see  Forgas’s  (1995)  Affect  Infusion  Model  (AIM)  for  insight  into
these  contradictory  findings;  also  see  Erber  &  Erber,  2001).  Finally,
there  is  some  evidence  for  mood-dependent  recall:  Memories
encoded  while  in  a  particular  mood  are  better  recalled
when  in  that  same  mood.  This  effect  is  independent  of  the  emotional
content  of  the  memory  itself  (Ucros,  1989).  It  should  be
noted,  however,  that  the  effects  of  mood  on  memory  are  often 
unreliable  and  therefore  remain  controversial. 

Performance 

Mood  has  also  been  found  to  affect  cognitive  style  and  perfor¬ 
mance.  The  most  striking  finding  is  that  even  mildly  positive  af¬ 
fective  states  profoundly  affect  the  flexibility  and  efficiency  of 
thinking  and  problem  solving  (Hirt,  Melton,  McDonald,  & 
Harackiewicz,  1996;  Isen,  2000;  Murray,  Sujan,  Hirt,  &  Sujan, 
1990).  In  one  of  the  best-known  experiments,  subjects  were 
induced  into  a  good  or  bad  mood  and  then  asked  to  solve 
Duncker’s  (1945)  candle  task.  Given  only  a  box  of  thumbtacks,
subjects  had  to  attach  a  lighted  candle  to  the
wall  such  that  no  wax  dripped  on  the  floor.  The  solution  required
the  creative  insight  to  thumbtack  the  box  itself  to  the  wall  and 
then  tack  the  candle  to  the  box.  Subjects  who  were  first  put  into 
a  good  mood  were  significantly  more  successful  at  solving  this 
problem  (Isen,  Daubman,  &  Nowicki,  1987).  In  another  study, 
medical  students  were  asked  to  diagnose  patients  based  on 
X-rays  after  first  being  put  into  a  positive,  negative,  or  neutral 
mood.  Subjects  in  the  positive-affect  condition  reached  the  cor¬ 
rect  conclusion  faster  than  did  subjects  in  other  conditions 
(Isen,  Rosenzweig,  &  Young,  1991).  Positive  affect  has  also  been 
shown  to  increase  heuristic  processing,  such  as  reliance  on 
scripts  and  stereotypes.  Though  some  have  argued  that  such  re¬ 
liance  is  at  the  expense  of  systematic  processing  (Schwartz  & 
Bless,  1991),  more  recent  evidence  suggests  that  heuristic  pro¬ 
cessing  and  systematic  processing  are  not  mutually  exclusive 
(Isen,  2000).  Keeping  a  user  happy  may,  therefore,  not  only 
affect  satisfaction,  but  may  also  lead  to  efficiency  and  creativity. 

Assessment 

Mood  has  also  been  shown  to  influence  judgment  and  decision 
making.  As  mentioned  earlier,  mood  tends  to  bias  thoughts  in 
a  mood-consistent  direction,  while  also  lowering  the  thresholds 
of  mood-consistent  emotions.  One  important  consequence  of 
this  is  that  stimuli — even  those  unrelated  to  the  current  affec¬ 
tive  state — are  judged  through  the  filter  of  mood  (Clore  et  al., 
2001;  Erber  &  Erber,  2001;  Niedenthal,  Setterlund,  &  Jones, 
1994).  This  suggests  that  users  in  a  good  mood  will  likely  judge 
both  the  interface  and  their  work  more  positively,  regardless  of 
any  direct  emotional  effects.  It  also  suggests  that  a  happy  user  at 
an  e-commerce  site  would  be  more  likely  to  evaluate  the  products
or  services  positively.

Positive  mood  also  decreases  risk-taking,  likely  in  an  effort
to  preserve  the  positive  mood.  That  is,  although  people  in  a  pos¬ 
itive  mood  are  more  risk-prone  when  making  hypothetical  de¬ 
cisions,  when  presented  with  an  actual  risk  situation,  they  tend 
to  be  more  cautious  (Isen,  2000).  In  an  e-commerce  purchasing 
situation,  then,  one  can  predict  that  a  low-risk  purchase  is  more 
likely  during  a  good  mood,  due  to  a  biased  judgment  in  favor  of 
the  product,  while  a  high-risk  purchase  may  be  more  likely  in  a 
less  cautious,  neutral,  or  negative  mood  (consistent  with  the 
adage  that  desperate  people  resort  to  desperate  measures). 

A  mood’s  effect  on  judgment,  combined  with  its  effect  on 
memory,  can  also  influence  the  formation  of  sentiments.  Sen¬ 
timents  are  not  necessarily  determined  during  interaction  with 
an  object;  they  often  are  grounded  in  reflection.  This  is  impor¬ 
tant  to  consider  when  conducting  user  tests,  as  the  mood  set 
by  the  interaction  immediately  prior  to  a  questionnaire  may 
bias  like/dislike  assessments  of  earlier  interactions.  Thus,  vary¬ 
ing  order  of  presentation  ensures  both  that  later  stimuli  do  not 
influence  the  assessment  of  earlier  stimuli,  and  that  earlier  stim¬ 
uli  do  not  influence  the  experience  of  later  stimuli  (as  dis¬ 
cussed  earlier). 


CAUSES  OF  EMOTION 


What  causes  emotions?  The  answer  to  this  question  is  critical  for 
HCI  because  an  understanding  of  emotions’  antecedents  will 
better  enable  us  to  design  interfaces  that  encourage  desired 
emotional  states  and  understand  interfaces  that  do  not. 

Needs  and  Goals 

As  we  discussed  in  the  first  section,  emotions  are  reactions  to 
situations  deemed  relevant  to  the  needs  and  goals  of  the  indi¬ 
vidual.  Clearly,  a  user  comes  to  a  computer  hoping  to  achieve 
certain  application-specific  goals — composing  a  document, 
sending  an  e-mail,  finding  a  piece  of  information,  etc.  The  de¬ 
gree  to  which  an  interface  facilitates  or  hampers  those  goals 
has  a  direct  effect  on  the  emotional  state  of  the  user.  An  inter¬ 
face  capable  of  detecting  emotion  could,  therefore,  use  such 
information  as  feedback  regarding  whether  the  user’s  goals  are 
being  met,  modifying  its  behavior  as  necessary.  In  an  informa¬ 
tion-seeking  context,  for  example,  emotional  reactions  to  dis¬ 
played  content  could  be  used  to  improve  the  goal-relevance 
of  future  retrievals.  Similarly,  if  an  interface  detects  frustration, 
desperation,  or  anger  in  a  user,  goals  may  be  facilitated  by  trying
a  new  approach  or  offering  assistance  (Klein,  Moon,  &  Picard,
1999;  Picard,  1997a).  (If  the  particular  goals  implicated
by  an  emotion  are  not  clear,  there  can  be  advantages  to  an  interface
that  empathizes  with  the  user;  Klein  et  al.,  1999.)
More  generally,  user  preferences  can  be  automatically  deter¬ 
mined  based  on  a  user’s  emotional  reactions  to  interface  ele¬ 
ments  (Picard,  1997a). 

There  are  also  a  host  of  more  abstract  needs  underlying,  and 
often  adjacent  to,  application-specific  goals.  A  user  may  have  a 
strong  need  to  feel  capable  and  competent,  maintain  control, 
learn,  or  be  entertained.  A  new  user  typically  needs  to  feel  comfortable
and  supported,  while  an  expert  is  more  focused  on  aesthetic
concerns  of  efficiency  and  elegance.  Acknowledging  these
more  abstract  goals  in  interface  design  can  be  as  instrumental  in 
determining  a  user’s  affective  state  as  meeting  or  obstructing 
application-specific  goals.  Maslow’s  hierarchy  (Maslow,  1968) 
presents  a  useful  starting  place  for  considering  the  structure  of 
these  more  abstract  user  needs.  In  his  later  work,  Maslow  (1968) 
grouped  an  individual’s  basic  needs  into  eight  categories: 

•  Physiological:  hunger,  thirst,  bodily  comforts,  etc. 

•  Safety/security:  being  out  of  danger 

•  Social:  affiliate  with  others,  be  accepted 

•  Esteem:  to  achieve,  be  competent,  gain  approval  and 
recognition 

•  Cognitive:  to  know,  to  understand,  and  explore 

•  Aesthetic:  symmetry,  order,  and  beauty 

•  Self-actualization:  to  find  self-fulfillment  and  realize  one’s 
potential 

•  Transcendence:  to  help  others  find  self-fulfillment  and 
realize  their  potential.

When  a  particular  situation  or  event  is  deemed  as  promoting
these  needs,  positive  emotion  results.  When  someone  or  something
hampers  these  needs,  negative  emotion  results.  The  specific
emotion  experienced  is  due  in  part  to  the  category  of  need
implicated  by  the  event.  Fright,  for  example,  is  typically  associ¬ 
ated  with  threatened  safety/security  needs;  love  and  embarrass¬ 
ment  with  social  needs;  pride  with  esteem  needs;  and  curiosity 
with  cognitive  needs. 

Within  Maslow’s  (1968)  framework,  application-specific  goals 
of  a  user  can  be  seen  as  instruments  ultimately  serving  these 
more  basic  needs.  For  example,  a  user  who  successfully  en¬ 
hances  a  digital  family  photograph  may  simultaneously  be  con¬ 
tributing  to  the  fulfillment  of  social,  esteem,  cognitive,  and  aes¬ 
thetic  needs.  However,  interfaces  can  also  directly  address  a 
user’s  basic  needs.  For  example,  a  spell-checker  interface  that 
praises  a  user  on  his  or  her  spelling  ability,  regardless  of  the 
user’s  actual  performance,  is  a  somewhat  humorous,  though  il¬ 
lustrative,  approach  to  acknowledging  a  user’s  esteem  needs. 
Such  interfaces,  by  enhancing  the  user’s  affective  state,  have 
been  shown  also  to  be  viewed  as  more  intelligent  and  likable 
(Reeves  &  Nass,  1996,  Chapter  4).  As  another  example,  an  in¬ 
terface  that  takes  care  to  establish  a  trusting  and  safe  relation¬ 
ship  with  users  may  ultimately  lead  to  more  effective  and  coop¬ 
erative  interactions  (Fogg,  1998).  Educational  software  should 
address  users’  emotional  needs,  not  only  teaching  the  relevant 
content,  but  also  ensuring  users  believe  that  they  are  learning. 
Optimized  learning  further  requires  a  careful  balance  of  esteem 
and  self-actualization  needs,  offering  appropriate  levels  of  en¬ 
couragement  and  challenge,  as  well  as  praise  and  criticism.  Fi¬ 
nally,  one  of  the  key  arguments  for  social  interfaces  is  that  they 
meet  the  social  needs  of  users  (Reeves  &  Nass,  1996). 

Although  the  type  of  need  relevant  in  a  situation  offers  some 
insight  into  emotional  reaction,  need  category  alone  is  not  suf¬ 
ficient  to  differentiate  fully  among  all  emotions.  Distinguishing 
frustration  and  anger,  for  example,  cannot  be  achieved  based 
solely  on  knowing  the  users’  need;  it  also  requires  some  notion 
of  agency.

Appraisal  Theories

“Appraisal”  theories  provide  much  greater  predictive  power
than  category  or  hierarchy-based  schemes  by  specifying  the  critical
properties  of  antecedent  events  that  lead  to  particular  emotions
(Lazarus,  1991;  Ortony,  Clore,  &  Collins,  1988;  Roseman,
Antoniou,  &  Jose,  1996;  Scherer,  1988).  Ellsworth  (1994),  for  example,
described  a  set  of  “abstract  elicitors”  of  emotion.  In  addition
to  novelty  and  valence,  Ellsworth  contended  that  the
level  of  certainty/uncertainty  in  an  event  has  a  significant  impact
on  the  emotion  experienced.  For  instance,  “uncertainty
about  probably  positive  events  leads  to  interest  and  curiosity, 
or  to  hope,”  while,  “uncertainty  about  probably  negative  events 
leads  to  anxiety  and  fear”  (Ellsworth,  1994,  p.  152).  Certainty,  on 
the  other  hand,  can  lead  to  relief  in  the  positive  case  and  despair 
in  the  negative  case. 
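
The certainty and valence pairings just described can be written down as a small lookup table, which is the kind of rule an affect-aware interface might consult when predicting a user's reaction to an event. The sketch below encodes only the four pairings stated in the text; the type and function names are hypothetical, and a fuller appraisal model would add further dimensions such as novelty.

```python
from enum import Enum


class Valence(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Certainty(Enum):
    CERTAIN = "certain"
    UNCERTAIN = "uncertain"


# Pairings as summarized in the text from Ellsworth (1994).
LIKELY_EMOTIONS = {
    (Valence.POSITIVE, Certainty.UNCERTAIN): ("interest", "curiosity", "hope"),
    (Valence.NEGATIVE, Certainty.UNCERTAIN): ("anxiety", "fear"),
    (Valence.POSITIVE, Certainty.CERTAIN): ("relief",),
    (Valence.NEGATIVE, Certainty.CERTAIN): ("despair",),
}


def likely_emotions(valence, certainty):
    """Return the emotions the text associates with this appraisal."""
    return LIKELY_EMOTIONS[(valence, certainty)]


# An application that stalls with no status message leaves the user
# uncertain about a probably negative event.
print(likely_emotions(Valence.NEGATIVE, Certainty.UNCERTAIN))
```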

Because  slow,  unclear,  or  unusual  responses  from  an  inter¬ 
face  generally  reflect  a  problem,  one  of  the  most  common  in¬ 
terface  design  mistakes — from  an  affective  standpoint — is  to 
leave  the  user  in  a  state  of  uncertainty.  Users  tend  to  fear  the 
worst  when,  for  example,  an  application  is  at  a  standstill,  the 
hourglass  remains  up  longer  than  usual,  or  the  hard  drive  sim¬ 
ply  starts  grinding  away  unexpectedly.  Such  uncertainty  leads 
to  a  state  of  anxiety  that  can  be  easily  avoided  with  a  well-placed, 
informative  message  or  state  indicator.  Providing  users  with  im¬ 
mediate  feedback  on  their  actions  reduces  uncertainty,  promot¬ 
ing  a  more  positive  affective  state  (see  Norman,  1990,  on  visi¬ 
bility  and  feedback).  When  an  error  has  actually  occurred,  the 
best  approach  is  to  make  the  user  aware  of  the  problem  and  its 
possible  consequences,  but  frame  the  uncertainty  in  as  positive 
a light as possible (e.g., “this application has experienced a problem,
but the document should be recoverable”).

According  to  Ellsworth  (1994),  obstacles  and  control  also 
play  an  important  role  in  eliciting  emotion.  High  control  can 
lead  to  a  sense  of  challenge  in  positive  situations,  but  stress  in 
negative  situations.  Lack  of  control,  on  the  other  hand,  often  re¬ 
sults  in  frustration,  which  if  sustained  can  lead  to  desperation 
and  resignation.  In  an  HCI  context,  providing  an  appropriate 
level  of  controllability,  given  a  user’s  abilities  and  the  task  at 
hand, is thus critical for avoiding negative affective consequences.
Control must not only be perceived to exist (Skinner,
1995; Wegner, Bargh, Gilbert, Fiske, et al., 1998), but must also be
understandable and visible; otherwise, the interface itself is an
obstacle (Norman, 1990).

Agency  is  yet  another  crucial  factor  determining  emotional 
response  (Ellsworth,  1994;  Friedman  &  Kahn,  1997).  When  one¬ 
self  is  the  cause  of  the  situation,  shame  (negative)  and  pride 
(positive) are likely emotions. When another person or entity is
the  cause,  anger  (negative)  and  love  (positive)  are  more  likely. 
However,  if  fate  is  the  agent,  one  is  more  likely  to  experience  sor¬ 
row  (negative)  and  joy  (positive).  An  interface  often  has  the  op¬ 
portunity  to  direct  a  user’s  perception  of  agency.  In  any  anom¬ 
alous  situation,  for  example — be  it  an  error  in  reading  a  file, 
inability  to  recognize  speech  input,  or  simply  a  crash — if  the  user 
is put in a position encouraging blame of oneself or fate, the negative
emotional repercussions may be more difficult to defuse
than if the computer explicitly assumes blame (and is apologetic).
For example, a voice interface encountering a recognition
error can say, “This system failed to understand your command”
(blaming itself), “The command was not understood” (blaming
no one), or “You did not speak clearly enough for your command
to be understood” (blaming the user).
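The agency appraisal and the three message wordings above can be written down as data. This Python sketch is a minimal illustration; the dictionary and function names are hypothetical, and the choice of the system-blaming wording as the default is an assumption drawn from the paragraph's suggestion, not an established guideline.

```python
# Emotions the text associates with each (agent, valence) pair.
AGENCY_EMOTION = {
    ("self", "negative"): "shame",
    ("self", "positive"): "pride",
    ("other", "negative"): "anger",
    ("other", "positive"): "love",
    ("fate", "negative"): "sorrow",
    ("fate", "positive"): "joy",
}

# The three wordings quoted in the text, keyed by who is blamed.
RECOGNITION_ERROR_MESSAGES = {
    "system": "This system failed to understand your command.",
    "none": "The command was not understood.",
    "user": "You did not speak clearly enough for your command to be understood.",
}


def recognition_error_message(blame="system"):
    """Pick a recognition-error message by attribution of blame.

    Defaults to the wording in which the computer assumes blame
    (an assumption of this sketch).
    """
    return RECOGNITION_ERROR_MESSAGES[blame]
```

Keeping the wordings in one table makes the attribution an explicit design decision rather than a side effect of whoever wrote each error string.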

Appraisal  theories  of  emotion,  such  as  Ellsworth’s  (1994),  are 
useful  not  only  in  understanding  the  potential  affective  impacts 
of  design  decisions,  but  also  in  creating  computer  agents  that 
exhibit  emotion.  Although  in  some  cases  scripted  emotional  re¬ 
sponses  are  sufficient,  in  more  dynamic  or  interactive  contexts, 
an  agent’s  affective  state  must  be  simulated  to  be  believable. 
Ortony,  Clore,  and  Collins’  (1988)  cognitive  theory  of  emotion  is 
currently  the  most  commonly  applied  appraisal  theory  for  such 
purposes  (Bates,  Loyall,  &  Reilly,  1994;  Elliott  &  Brzezinski, 
1998;  for  alternate  approaches,  see  Ball  &  Breese,  2000;  Bozi- 
novski  &  Bozinovska,  2001;  Scheutz,  Sloman,  &  Logan,  2000). 
Appraisal  theories  can  also  be  used  to  help  model  and  predict 
a  user’s  emotional  state  in  real  time  (Elliott  &  Brzezinski,  1998). 

Contagion 

Another  cause  of  emotion  that  does  not  fit  cleanly  into  the 
structure  just  described  is  contagion  (Hatfield,  Cacioppo,  & 
Rapson, 1994). People often “catch” others’ emotions. Sometimes
this social phenomenon seems logical, such as when a
person  becomes  afraid  upon  seeing  another  experience  fear.  At 
other  times,  contagion  seems  illogical,  such  as  when  another 
person’s  laughter  induces  immediate,  “unexplainable”  amuse¬ 
ment.  Anticipatory  excitement  is  another  emotion  that  transfers 
readily  from  person  to  person. 

Emotions  in  interfaces  can  also  be  contagious.  For  example, 
a  character  that  exhibits  excitement  when  an  online  product 
appears  can  make  users  feel  more  excited.  Similarly,  an  attempt 
at  light  humor  in  a  textual  interface,  even  if  unsuccessful,  may 
increase  positive  affect  (Morkes,  Kernal,  &  Nass,  2000). 

Moods  and  Sentiments 

Mood  and  sentiment  can  also  bias  emotion.  One  of  the  funda¬ 
mental  properties  of  mood  is  that  it  lowers  the  activation  thresh¬ 
old  for  mood-consistent  emotions.  Sentiment  can  act  in  a  similar 
way.  For  example,  interaction  with  an  object,  to  which  a  senti¬ 
ment  is  already  attached,  can  evoke  emotion  either  in  memory  of 
past  interaction  or  in  anticipation  of  the  current  interaction. 
Thus,  an  interface  that  proved  frustrating  in  the  past  may  elicit 
frustration  before  the  user  even  begins  working.  In  addition,  sen¬ 
timent  can  bias  perception  of  an  object,  increasing  the  probabil¬ 
ity  of  eliciting  sentiment-consistent  emotions.  For  example,  an 
application  that  users  like  can  do  no  wrong,  while  one  that  users 
dislike  does  everything  to  anger  them,  regardless  of  the  appli¬ 
cation’s  actual  behavior.  Of  critical  importance  here  is  that  sen¬ 
timents  need  not  derive  from  direct  experience;  they  may  also  be 
inferred  from  stereotypes  or  other  generalizations. 

Previous  Emotional  State 

Finally,  a  user’s  previous  emotional  state  can  affect  the  experi¬ 
ence  of  subsequent  emotions.  This  occurs  not  only  through  the 
mechanism of mood — emotions can cause moods and moods
then  bias  the  activation  thresholds  of  emotions — but  also 
through  the  mechanisms  of  excitation  transfer  and  habituation. 
Excitation  transfer  (Zillmann,  1991)  is  based  on  the  fact  that  af¬ 
ter  an  emotion-causing  stimulus  has  come  and  gone,  an  acti¬ 
vated  autonomic  nervous  system  takes  some  time  to  return  to 
its  deactivated  state.  If  another  emotion  is  triggered  before  that 
decay  is  complete,  the  residual  activation  (“excitement”)  will 
be  added  to  the  current  activation  and  be  perceived  as  part  of 
the  current  emotion.  As  Zillmann  (1991)  explained,  “residues 
of  excitation  from  a  previous  affective  reaction  will  combine  with 
excitation  produced  by  subsequent  affective  stimulation  and 
thereby  cause  an  overly  intense  affective  reaction  to  subsequent 
stimuli. . . .  Residual  arousal  from  anger,  then,  may  intensify  fear; 
residues  from  fear  may  intensify  sexual  behaviors;  residual  sex¬ 
ual  arousal  may  intensify  aggressive  responses;  and  so  forth” 
(p.  116).  Thus,  people  who  have  just  hit  the  “purchase”  button 
associated  with  their  web  shopping  cart  can  become  particularly 
angry  when  they  are  presented  with  multiple  pages  before  they 
can  complete  their  transaction:  The  arousal  of  buying  increases 
the  intensity  of  their  frustration  with  the  post-purchase  process. 
Similarly,  Reeves  and  Nass  (1996)  have  argued  that  pictorial  char¬ 
acters  “raise  the  volume  knob”  on  both  positive  and  negative 
feelings  about  an  interaction,  because  explicitly  social  interac¬ 
tions  are  more  arousing  than  their  non-social  counterparts  are. 
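Excitation transfer can be sketched numerically. In the Python example below, the exponential decay, the 60-second default half-life, and the function names are all assumptions made for illustration; the text says only that activation takes some time to return to its deactivated state and that the residue is added to, and perceived as part of, the next emotion.

```python
def residual_arousal(initial, elapsed_s, half_life_s):
    """Arousal left over from an earlier stimulus after elapsed_s seconds.

    Exponential decay is an assumption of this sketch, not a claim
    made by Zillmann (1991).
    """
    return initial * 0.5 ** (elapsed_s / half_life_s)


def perceived_intensity(new_arousal, prior_arousal, elapsed_s, half_life_s=60.0):
    """Residual activation is added to the arousal of the new emotion."""
    return new_arousal + residual_arousal(prior_arousal, elapsed_s, half_life_s)
```

In the shopping-cart example, the shorter the delay between pressing “purchase” and meeting the extra pages, the more of the purchase arousal remains to intensify the frustration.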

Habituation  is,  in  some  sense,  the  converse  of  excitation 
transfer.  It  posits  that  the  intensity  of  an  emotion  decreases  over 
time  if  the  emotion  is  experienced  repeatedly.  One  explanation 
for  this  effect  relates  back  to  appraisal  theory:  “Emotions  are 
elicited  not  so  much  by  the  presence  of  favorable  or  unfavorable 
conditions,  but  by  actual  or  expected  changes  in  favorable  or 
unfavorable  conditions”  (Frijda,  1988,  p.  39).  Repeated  pleasur¬ 
able  affective  states,  therefore,  become  expected  and  thus  grad¬ 
ually  lose  intensity.  The  same  is  true  for  negative  affective  states; 
however,  particularly  extreme  negative  emotional  states  may 
never habituate (Frijda, 1988). This may be why negative experiences
with frequently used interfaces (e.g., operating systems)
are remembered more vividly than positive experiences.
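One way to picture habituation is as a shrinking gap between what happens and what is expected. The Python sketch below is a toy model under stated assumptions: intensity is taken to be the absolute gap between a stimulus and the current expectation, and the expectation moves toward the stimulus at a fixed rate. Neither the linear form nor the `rate` parameter comes from Frijda (1988).

```python
def habituating_intensities(stimulus, repetitions, rate=0.5):
    """Intensity of each successive response to an identical stimulus.

    Intensity is modelled as the gap between the stimulus and what is
    currently expected; the expectation then moves toward the stimulus.
    Both the linear gap and the update rate are assumptions.
    """
    expected = 0.0
    intensities = []
    for _ in range(repetitions):
        intensities.append(abs(stimulus - expected))
        expected += rate * (stimulus - expected)
    return intensities
```

With `rate=0.5`, four identical exposures yield intensities of 1.0, 0.5, 0.25, and 0.125. The model does not capture the exception noted above, that particularly extreme negative states may never habituate.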


CAUSES  OF  MOOD 


Mood  has  a  number  of  potential  causes.  The  most  obvious  is 
emotion itself. Intense or repetitive emotional experiences tend
to  prolong  themselves  into  moods.  A  user  who  is  continually 
frustrated  will  likely  be  put  in  a  frustrated  mood,  while  a  user 
who  is  repeatedly  made  happy  will  likely  be  put  in  a  positive 
mood.  Mood  can  also  be  influenced,  however,  by  anticipated 
emotion,  based  on  sentiment.  For  example,  if  users  know  that 
they  must  interact  with  an  application  that  they  dislike  (i.e.,  they 
associate  with  negative  emotion),  they  may  be  in  a  bad  mood 
from  the  start. 

Contagion 

Similar to emotion, moods also exhibit a contagion effect (Neumann
& Strack, 2000). For example, a depressed person will often
make others feel depressed and a happy person will often
make  others  feel  happy.  Murphy  and  Zajonc  (1993)  have  shown 
that  even  a  mere  smiling  or  frowning  face,  shown  so  quickly  that 
the  subject  is  not  conscious  of  seeing  the  image,  can  affect  a 
person’s  mood  and  subsequently  bias  judgment.  From  an  in¬ 
terface  standpoint,  the  implications  for  character-based  agents 
are  clear:  Moods  exhibited  by  onscreen  characters  may  directly 
transfer  to  the  user’s  mood.  Onscreen  mood  can  also  lead  to 
“perceived  contagion”  effects:  One  smiling  or  frowning  face  on 
the  screen  can  influence  users’  perceptions  of  other  faces  that 
they  subsequently  see  on  the  screen,  perhaps  because  of  prim¬ 
ing  (Reeves,  Biocca,  Pan,  Oshagan,  &  Richards,  1989;  Reeves  & 
Nass,  1996,  Chapter  22). 

Color 

Color can clearly be designed into an interface with its mood-influencing
properties in mind. Warm colors, for example, generally
provoke “active feelings,” while cool colors are “much less
likely  to  cause  extreme  reactions”  (Levy,  1984).  Gerard  (1957; 
1958), for example, found that red light projected onto a diffusing
screen produces increased arousal in subjects, using a
number of physiological measures (including cortical activation,
blood pressure, and respiration), while blue light has essentially
the opposite “calming” effect (see Walters, Apter, & Svebak,
1982).  Subjective  ratings  of  the  correlations  between  specific 
colors  and  moods  can  be  more  complicated.  As  Gardano  (1986) 
summarized,  “yellow  (a  warm  color)  has  been  found  to  be  as¬ 
sociated  with  both  sadness  (Peretti,  1974)  and  with  cheerfulness 
(Wexner,  1954).  Similarly,  red  (another  warm  color)  is  related 
to  anger  and  violence  (Schachtel,  1943)  as  well  as  to  passionate 
love  (Henry  &  Jacobs,  1978;  Pecjak,  1970);  and  blue  (a  cool 
color),  to  tenderness  (Schachtel,  1943)  and  sadness  (Peretti, 
1974).  .  .  .”  Nevertheless,  as  any  artist  will  attest,  carefully  de¬ 
signed  color  schemes  (combined  with  other  design  elements) 
can  produce  reliable  and  specific  influences  on  mood. 

Other  Effects 

A  number  of  other  factors  can  affect  mood.  For  example,  in 
music,  minor  scales  are  typically  associated  with  negative  emo¬ 
tion  and  mood,  while  major  scales  have  more  positive/happy 
connotations  (Gregory,  Worrall,  &  Sarge,  1996).  Other  possible 
influences  on  mood  include  weather,  temperature,  hormonal 
cycles,  genetic  temperament,  sleep,  food,  medication,  and  light¬ 
ing  (Thayer,  1989). 


MEASURING  AFFECT 


Measuring  user  affect  can  be  valuable  both  as  a  component  of 
usability  testing  and  as  an  interface  technique.  When  evaluat¬ 
ing  interfaces,  affective  information  provides  insight  into  what 
a  user  is  feeling — the  fundamental  basis  of  liking  and  other  sen¬ 
timents.  Within  an  interface,  knowledge  of  a  user’s  affect  pro¬ 
vides  useful  feedback  regarding  the  degree  to  which  a  user’s 
goals are being met, enabling dynamic and intelligent adaptation.
In particular, social interfaces (including character-based
interfaces)  must  have  the  ability  to  recognize  and  respond  to 
emotion  in  users  to  execute  effectively  real-world  interpersonal 
interaction  strategies  (Picard,  1997a). 

Neurological  Responses 

The  brain  is  the  most  fundamental  source  of  emotion.  The  most 
common  way  to  measure  neurological  changes  is  the  electro¬ 
encephalogram  (EEG).  In  a  relaxed  state,  the  human  brain  ex¬ 
hibits  an  alpha  rhythm,  which  can  be  detected  by  EEG  record¬ 
ings  taken  through  sensors  attached  to  the  scalp.  Disruption  of 
this  signal  (alpha  blocking)  occurs  in  response  to  novelty,  com¬ 
plexity,  and  unexpectedness,  as  well  as  during  emotional  excite¬ 
ment  and  anxiety  (Frijda,  1986).  EEG  studies  have  further  shown 
that  positive/approach-related  emotions  lead  to  greater  activa¬ 
tion  of  the  left  anterior  region  of  the  brain,  while  negative/ 
avoidance-related  emotions  lead  to  greater  activation  of  the 
right  anterior  region  (Davidson,  1992;  see  also  Heller,  1990). 
Indeed,  when  one  flashes  a  picture  to  either  the  left  or  the  right 
of  where  a  person  is  looking,  the  viewer  can  identify  a  smiling 
face  more  quickly  when  it  is  flashed  to  the  left  hemisphere  and 
a  frowning  face  more  quickly  when  it  is  flashed  to  the  right 
hemisphere  (Reuter-Lorenz  &  Davidson,  1981).  Current  EEG 
devices,  however,  are  fairly  clumsy  and  obstructive,  rendering 
them  impractical  for  most  HCI  applications.  Recent  advances 
in magnetic resonance imaging (MRI) offer great promise for
emotion  monitoring,  but  are  currently  unrealistic  for  HCI  be¬ 
cause  of  their  expense,  complexity,  and  form  factor. 

Autonomic  Activity 

Autonomic  activity  has  received  considerable  attention  in  stud¬ 
ies  of  emotion,  in  part  due  to  the  relative  ease  in  measuring  cer¬ 
tain  components  of  the  autonomic  nervous  system  (ANS), 
including  heart  rate,  blood  pressure,  blood-pulse  volume,  res¬ 
piration,  temperature,  pupil  dilation,  skin  conductivity,  and 
more recently, muscle tension (as measured by electromyography
(EMG)). However, the extent to which emotions can be distinguished
on the basis of autonomic activity alone remains a
hotly debated issue (see Ekman & Davidson, 1994, ch. 6; Levenson,
1988). At one extreme are those, following in the Jamesian
tradition (James, 1884), who believe that each emotion has
a  unique  autonomic  signature — technology  is  simply  not  ad¬ 
vanced  enough  yet  to  fully  detect  these  differentiators.  On  the 
other  extreme,  there  are  those,  following  Cannon  (1927),  who 
contended  that  all  emotions  are  accompanied  by  the  same  state 
of  nonspecific  autonomic  (sympathetic)  arousal,  which  varies 
only  in  magnitude — most  commonly  measured  by  galvanic  skin 
response  (GSR),  a  measure  of  skin  conductivity  (Schachter  & 
Singer,  1962).  This  controversy  has  clear  connections  to  the  na¬ 
ture-nurture  debate  in  emotion,  described  earlier,  because  au¬ 
tonomic  specificity  seems  more  probable  if  each  emotion  has  a 
distinct  biological  basis,  while  nonspecific  autonomic  (sympa¬ 
thetic)  arousal  seems  more  likely  if  differentiation  among  emo¬ 
tions  is  based  mostly  on  cognition  and  social  learning. 


Though  the  debate  is  far  from  resolved,  certain  measures 
have  proven  reliable  at  distinguishing  among  “basic  emotions.” 
Heart  rate,  for  example,  increases  most  during  fear,  followed  by 
anger,  sadness,  happiness,  surprise,  and  finally  disgust,  which 
shows almost no change in heart rate (Cacioppo, Berntson,
Klein,  &  Poehlmann,  1997;  Ekman,  Levenson,  &  Friesen,  1983; 
Levenson,  Ekman,  &  Friesen,  1990).  Heart  rate  also  generally  in¬ 
creases  during  excitement,  mental  concentration,  and  “upon  the 
presentation  of  intense  sensory  stimuli”  (Frijda,  1986).  Decreases 
in  heart  rate  typically  accompany  relaxation,  attentive  visual  and 
audio  observation,  and  the  processing  of  pleasant  stimuli  (Frijda, 
1986).  As  is  now  common  knowledge,  blood  pressure  increases 
during  stress  and  decreases  during  relaxation.  Cacioppo  et  al. 
(2000)  further  observed  that  anger  increases  diastolic  blood 
pressure  to  the  greatest  degree,  followed  by  fear,  sadness,  and 
happiness.  Anger  is  further  distinguished  from  fear  by  larger  in¬ 
creases  in  blood  pulse  volume,  more  nonspecific  skin  conduc¬ 
tance  responses,  smaller  increases  in  cardiac  output,  and  other 
measures  indicating  that  “anger  appears  to  act  more  on  the  vas¬ 
culature  and  less  on  the  heart  than  does  fear”  (Cacioppo  et  al., 
1997).  Results  using  other  autonomic  measures  are  less  reliable. 

Combined  measures  of  multiple  autonomic  signals  show 
promise as components of an emotion recognition system. Picard,
Vyzas, and Healey (in press), for example, achieved 81%
recognition accuracy on eight emotions through combined
measures of respiration, blood pressure volume, and skin
conductance, as well as facial muscle tension (to be discussed in
the next subsection). Many autonomic signals can also be measured
in reasonably nonobstructive ways (e.g., through user contact
with mice and keyboards; Picard, 1997a).
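The features and classifier used by Picard and colleagues are not described here, so the following Python sketch should not be read as their method. It only illustrates the general idea of combining several autonomic signals into one feature vector and assigning the label of the closest class average (a nearest-centroid rule, chosen for brevity). The function names are hypothetical.

```python
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors recorded for each emotion label.

    samples: iterable of (label, [feature, ...]) pairs of equal length.
    """
    sums = {}
    counts = defaultdict(int)
    for label, features in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(centroids, features):
    """Return the label whose centroid is closest to the new reading."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))
```

Given labelled recordings such as `(label, [heart_rate, skin_conductance, respiration_rate])`, `classify(train_centroids(samples), new_reading)` returns the closest label. Signals measured in different units would presumably need to be rescaled to comparable ranges first, since otherwise the largest-valued signal dominates the distance.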

However,  even  assuming  that  we  could  distinguish  among  all 
emotions  through  autonomic  measures,  it  is  not  clear  that  we 
should.  In  real-world  social  interactions,  humans  have  at  least 
partial  control  over  what  others  can  observe  of  their  emotions. 
If  another  person,  or  a  computer,  is  given  direct  access  to  users’ 
internal  states,  users  may  feel  overly  vulnerable,  leading  to  stress 
and  distraction.  Such  personal  access  could  also  be  seen  as  in¬ 
vasive,  compromising  trust.  It  may,  therefore,  be  more  appro¬ 
priate  to  rely  on  measurement  of  the  external  signals  of  emotion 
(discussed next).

Facial  Expression 

Facial  expression  provides  a  fundamental  means  by  which  hu¬ 
mans  detect  emotion.  Table  4.1  describes  characteristic  facial 
features of six basic emotions (Ekman & Friesen, 1975; Rosenfeld,
1997). Endowing computers with the ability to recognize
facial expressions, through pattern recognition of captured images,
has proven to be a fertile area of research (Essa & Pentland,
1997; Lyons, Akamatsu, Kamachi, & Gyoba, 1998; Martinez,
2000; Yacoob & Davis, 1996; for recent reviews, see Cowie
et al., 2001; Lisetti & Schiano, 2000; Tian, Kanade, & Cohn,
2001). Ekman and Friesen’s (1977) Facial Action Coding System
(FACS),  which  identifies  a  highly  specific  set  of  muscular  move¬ 
ments  for  each  emotion,  is  one  of  the  most  widely  accepted 
foundations  for  facial-recognition  systems  (Tian  et  al.,  2001). 
In many systems, recognition accuracy can reach as high as
90%-98% on a small set of basic emotions. However, current
recognition systems are tested almost exclusively on “produced”
expressions (i.e., subjects are asked to make specific facial movements
or emotional expressions), rather than natural expressions
resulting from actual emotions. The degree of accuracy
that can be achieved on more natural expressions of emotion
remains unclear. Further, “not all . . . emotions are accompanied
by visually perceptible facial action” (Cacioppo et al., 1997).

TABLE 4.1. Facial Cues and Emotion

Emotion      Observed Facial Cues

Surprise     Brows raised (curved and high)
             Skin below brow stretched
             Horizontal wrinkles across forehead
             Eyelids opened and more of the white of the eye is visible
             Jaw drops open without tension or stretching of the mouth

Fear         Brows raised and drawn together
             Forehead wrinkles drawn to the center
             Upper eyelid is raised and lower eyelid is drawn up
             Mouth is open
             Lips are slightly tense or stretched and drawn back

Disgust      Upper lip is raised
             Lower lip is raised and pushed up to upper lip or it is lowered
             Nose is wrinkled
             Cheeks are raised
             Lines below the lower lid, lid is pushed up but not tense
             Brows are lowered

Anger        Brows lowered and drawn together
             Vertical lines appear between brows
             Lower lid is tensed and may or may not be raised
             Upper lid is tense and may or may not be lowered due to
               brows’ action
             Eyes have a hard stare and may have a bulging appearance
             Lips are either pressed firmly together with corners
               straight or down or open, tensed in a squarish shape
             Nostrils may be dilated (could occur in sadness too)
               unambiguous only if registered in all three facial areas

Happiness    Corners of lips are drawn back and up
             Mouth may or may not be parted with teeth exposed or not
             A wrinkle runs down from the nose to the outer edge
               beyond lip corners
             Cheeks are raised
             Lower eyelid shows wrinkles below it, and may be raised
               but not tense
             Crow’s-feet wrinkles go outward from the outer corners
               of the eyes

Sadness      Inner corners of eyebrows are drawn up
             Skin below the eyebrow is triangulated, with inner corner up
             Upper lid inner corner is raised
             Corners of the lips are drawn or lip is trembling

An alternate method for facial expression recognition, capable
of picking up both visible and extremely subtle movements
of facial muscles, is facial electromyography (EMG). EMG signals,
recorded through small electrodes attached to the skin,
have proven most successful at detecting positive versus negative
emotions and show promise in distinguishing among basic
emotions (Cacioppo et al., 2000). Though the universality (and
biological basis) of facial expression is also debated, common
experience tells us that, at least within a culture, facial expressions
are reasonably consistent. Nonetheless, individual differences
may also be important, requiring recognition systems to
adapt to a specific user for greatest accuracy. Gestures can also
be recognized with technologies similar to those for facial-expression
recognition, but the connection between gesture and
emotional state is less distinct, in part due to the greater influence
of personality (Cassell & Thorisson, in press; Collier, 1985).

Voice

Voice presents yet another opportunity for emotion recognition
(see Cowie et al., 2001, for an extensive review). Emotional arousal
is the most readily discernible aspect of vocal communication,
but voice can also provide indications of valence and specific
emotions through acoustic properties such as pitch range,
rhythm, and amplitude or duration changes (Ball & Breese, 2000;
Scherer, 1989). A bored or sad user, for example, will typically exhibit
slower, lower-pitched speech, with little high-frequency energy,
while a user experiencing fear, anger, or joy will speak faster
and louder, with strong high-frequency energy and more explicit
enunciation (Picard, 1997a). Murray and Arnott (1993) provided a
detailed account of the vocal effects associated with several basic
emotions (see Table 4.2). Though few systems have been built for
automatic emotion recognition through speech, Banse and
Scherer (1996) have demonstrated the feasibility of such systems.
Cowie and Douglas-Cowie’s ACCESS system (Cowie & Douglas-Cowie,
1996) also shows promise (Cowie et al., 2001).
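Three rows of Murray and Arnott's summary (Table 4.2) can be turned into a coarse rule-based matcher. The Python sketch below is an illustration only: reducing each cell to a direction (+1, 0, -1), treating “faster or slower” as uninformative, and the names `VOCAL_PROFILES` and `candidate_emotions` are all assumptions of this example, not part of any published recognizer.

```python
# Direction of change relative to neutral speech: +1 higher/faster,
# -1 lower/slower, 0 normal, None where the table says "faster or slower".
VOCAL_PROFILES = {
    "fear": {"speech_rate": 1, "pitch_average": 1, "intensity": 0},
    "anger": {"speech_rate": 1, "pitch_average": 1, "intensity": 1},
    "sadness": {"speech_rate": -1, "pitch_average": -1, "intensity": -1},
    "happiness": {"speech_rate": None, "pitch_average": 1, "intensity": 1},
    "disgust": {"speech_rate": -1, "pitch_average": -1, "intensity": -1},
}


def candidate_emotions(observed):
    """Return the emotions whose profile agrees with the most observed cues."""

    def agreement(profile):
        return sum(
            1
            for cue, direction in observed.items()
            if profile.get(cue) is not None and profile[cue] == direction
        )

    scores = {emotion: agreement(p) for emotion, p in VOCAL_PROFILES.items()}
    best = max(scores.values())
    return [emotion for emotion, score in scores.items() if score == best]
```

Faster, higher-pitched speech at normal intensity matches only fear, whereas slower, lower, quieter speech matches both sadness and disgust. At this coarse level those two can be separated only by the magnitudes and the remaining rows of the table (pitch range, voice quality, articulation).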


TABLE 4.2. Voice and Emotion

               Fear               Anger               Sadness               Happiness         Disgust
Speech rate    Much faster        Slightly faster     Slightly slower       Faster or slower  Very much slower
Pitch average  Very much higher   Very much higher    Slightly lower        Much higher       Very much lower
Pitch range    Much wider         Much wider          Slightly narrower     Much wider        Slightly wider
Intensity      Normal             Higher              Lower                 Higher            Lower
Voice quality  Irregular voicing  Breathy chest tone  Resonant              Breathy blaring   Grumbled chest tone
Pitch changes  Normal             Abrupt on stressed  Downward inflections  Smooth upward     Wide downward
                                  syllables                                 inflections       terminal inflections
Articulation   Precise            Tense               Slurring              Normal            Normal

Self-Report Measures

A final method for measuring a user’s affective state is to ask
questions. Post-interaction questionnaires, in fact, currently
serve as the primary method for ascertaining emotion, mood,
and  sentiment  during  an  interaction.  However,  in  addition  to 
the  standard  complexities  associated  with  self-report  measures 
(such  as  the  range  of  social  desirability  effects),  measuring 
affect  in  this  way  presents  added  challenges.  To  begin  with, 
questionnaires  are  capable  of  measuring  only  the  conscious 
experience  of  emotion  and  mood.  Much  of  affective  process¬ 
ing,  however,  resides  in  the  limbic  system  and  in  nonconscious 
processes.  Although  it  is  debatable  whether  an  emotion  can 
exist  without  any  conscious  component  at  all,  a  mood  surely 
can.  Further,  questions  about  emotion,  and  often  those  about 
mood,  refer  to  past  affective  states  and  thus  rely  on  imperfect 
and  potentially  biased  memory.  Alternatively,  asking  a  user  to 
report  on  an  emotion  as  it  occurs  requires  interruption  of  the 
experience.  In  addition,  emotions  and  moods  are  often  diffi¬ 
cult  to  describe  in  words.  Finally,  questions  about  sentiment, 
although  the  most  straightforward  given  their  predictive  na¬ 
ture,  are  potentially  affected  by  when  they  are  asked  (both  be¬ 
cause  of  current  mood  and  memory  degradation).  Neverthe¬ 
less,  self-report  measures  are  the  most  direct  way  to  measure 
sentiment  and  a  reasonable  alternative  to  direct  measures  of 
emotion  and  mood  (which  currently  remain  in  the  early  stages 
of development).

Several  standard  questionnaires  exist  for  measuring  affect 
(Plutchik & Kellerman, 1989, Chapters 1-3). The most common
approach  presents  participants  with  a  list  of  emotional  adjec¬ 
tives  and  asks  how  well  each  describes  their  affective  state. 
Izard’s  (1972)  Differential  Emotion  Scale  (DES),  for  example, 
includes  24  emotional  terms  (such  as  delighted,  scared,  happy, 
and  astonished)  that  participants  rate  on  seven-point  scales 
indicating  the  degree  to  which  they  are  feeling  that  emotion 
(from “not at all” to “extremely”). McNair, Lorr, and Droppleman’s
(1981) Profile of Mood States (POMS) is a popular
adjective-based  measure  of  mood.  Researchers  have  created 
numerous  modifications  of  these  standard  scales  (Desmet, 
Hekkert,  &  Jacobs,  2000,  presented  a  unique  nonverbal  adap¬ 
tation),  and  many  current  usability  questionnaires  include  at 
least some adjective-based affect assessment items (e.g., the
Questionnaire  for  User  Interface  Satisfaction  (QUIS)  (Chin, 
Diehl,  &  Norman,  1988)). 
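Scoring an adjective-based questionnaire of this kind amounts to validating each rating and averaging within groups of related adjectives. The Python sketch below is generic: the function name, the numeric coding of the seven-point scale as 1 to 7, and the idea of passing the adjective groupings in as a parameter are assumptions for illustration, and it does not reproduce the published scoring key of the DES, POMS, or QUIS.

```python
def score_adjective_checklist(ratings, scales, low=1, high=7):
    """Average the ratings of the adjectives that make up each scale.

    ratings: {adjective: rating} on a low..high scale (seven points
             here, coded 1-7 as an assumption of this sketch).
    scales:  {scale_name: [adjective, ...]}; groupings are supplied by
             the caller and are hypothetical in any example usage.
    """
    for adjective, rating in ratings.items():
        if not low <= rating <= high:
            raise ValueError(f"{adjective!r}: rating {rating} outside {low}-{high}")
    return {
        name: sum(ratings[a] for a in adjectives) / len(adjectives)
        for name, adjectives in scales.items()
    }
```

For instance, with a hypothetical grouping that places “delighted” and “happy” on one scale, ratings of 6 and 7 would give that scale a score of 6.5.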

A  second  approach  to  questionnaire  measurement  of  affect 
derives  from  dimensional  theories  of  emotion  and  mood. 
Many  researchers  argue  that  two  dimensions — arousal  (activa¬ 
tion)  and  valence  (pleasant/unpleasant) — are  nearly  sufficient 
to  describe  the  entire  space  of  conscious  emotional  experience 
(Feldman Barrett & Russell, 1999). Lang (1995), for example,
presented  an  interesting  measurement  scheme  where  subjects 
rate  the  arousal  and  valence  of  their  current  affective  state  by 
selecting  among  pictorial  representations  (rather  than  the  stan¬ 
dard  number/word  representation  of  degree).  Watson,  Clark, 
and  Tellegen’s  (1988)  Positive  and  Negative  Affect  Schedule 
(PANAS)  is  a  popular  dimensional  measure  of  mood.  Finally, 
to  measure  emotion  as  it  occurs,  with  minimum  interruption, 
some  researchers  have  asked  subjects  to  push  one  of  a  small 
number  of  buttons  indicating  their  current  emotional  reaction 
during  presentation  of  a  stimulus  (i.e.,  one  button  each  for 
positive,  negative,  and  neutral  response  (Breckler  &  Berman, 
1991)). 


Affect  Recognition  by  Users 

Computers  are  not  the  only  (potential)  affect  recognizers  in 
human-computer  interactions.  When  confronted  with  an 
interface — particularly  a  social  or  character-based  interface — 
users  constantly  monitor  cues  to  the  affective  state  of  their 
interaction partner, the computer (though often nonconsciously;
see Reeves & Nass, 1996). Creating natural and efficient
interfaces requires not only recognizing emotion in users,
but  also  expressing  emotion.  Traditional  media  creators  have 
known  for  a  long  time  that  portrayal  of  emotion  is  a  fundamen¬ 
tal  key  to  creating  the  “illusion  of  life”  (Jones,  1990;  Thomas  & 
Johnson, 1981; for discussions of believable agents and emotion,
see, e.g., Bates, 1994; Maldonado, Picard, & Hayes-Roth,
1998). 

Facial  expression  and  gesture  are  the  two  most  common 
ways  to  manifest  emotion  in  screen-based  characters  (Cassell 
et  al.,  2000;  Kurlander,  Skelly,  &  Salesin,  1996).  Though  ani¬ 
mated  expressions  lack  much  of  the  intricacy  found  in  human 
expressions,  users  are  nonetheless  capable  of  distinguishing 
emotions  in  animated  characters  (Cassell  et  al.,  2000;  Schiano, 
Ehrlich,  Rahardja,  &  Sheridan,  2000).  As  with  emotion  recogni¬ 
tion,  Ekman  and  Friesen’s  (1977)  Facial  Action  Coding  System 
(FACS)  is  a  commonly  used  and  well-developed  method  for 
constructing  affective  expressions.  One  common  strategy  for 
improving  accurate  communication  with  animated  characters 
is  to  exaggerate  expressions,  but  whether  this  leads  to  corre¬ 
sponding  exaggerated  assumptions  about  the  underlying  emo¬ 
tion  has  not  been  studied. 

Characters  that  talk  can  also  use  voice  to  communicate  emo¬ 
tion  (Nass  &  Gong,  2000).  Prerecorded  utterances  are  easily 
infused with affective tone, but are fixed and inflexible. Cahn
(1990) successfully synthesized affect-laden speech using a
text-to-speech  (TTS)  system  coupled  with  content-sensitive 
rules  regarding  appropriate  acoustic  qualities  (including  pitch, 
timing,  and  voice  quality;  see  also  Nass,  Foehr,  &  Somoza,  2000). 
Users  were  able  to  distinguish  among  six  different  emotions 
with  about  50%  accuracy,  which  is  impressive  considering  that 
people  are  generally  only  60%  accurate  in  recognizing  affect  in 
human  speech  (Scherer,  1981). 

Finally,  characters  can  indicate  affective  state  verbally  through 
word  and  topic  choice,  as  well  as  explicit  statements  of  affect 
(e.g., “I’m happy”). Characters whose nonverbal and verbal
expressions are distinctly mismatched, however, may be seen as
awkward or even untrustworthy. In less extreme mismatched
cases,  recent  evidence  suggests  that  users  will  give  precedence 
to  nonverbal  cues  in  judgments  about  affect  (Nass  et  al.,  2000). 
This finding is critical for applications in which characters/agents
mediate interpersonal communication (e.g., in virtual worlds or
when characters read email to a user), because the affective
tone of a message may be inappropriately masked by the
character’s affective state. Ideally, in such computer-mediated
communication contexts, emotion would be encoded into the
message itself, either through explicit tagging of the message with
affect, through natural language processing of the message, or
through direct recognition of the sender’s affective state during
message composition (e.g., using autonomic nervous system
or facial expression measures). Mediator characters could then
display the appropriate nonverbal cues to match the verbal
content of the message.
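
As a rough illustration of the first option (explicit tagging), a message could carry its affective tone as a field, so that a mediating character takes its expression from the message and not from its own state. The sketch below is hypothetical: the `TaggedMessage` type, the affect labels, and the expression names are illustrative assumptions, not part of any system cited above.

```python
from dataclasses import dataclass

# Hypothetical mapping from an affect tag to a character expression.
EXPRESSION_FOR_AFFECT = {
    "happy": "smile",
    "sad": "frown",
    "angry": "scowl",
    "neutral": "neutral_face",
}


@dataclass
class TaggedMessage:
    text: str
    affect: str = "neutral"  # explicit tag supplied by the sender (assumed label set)


def expression_for(message: TaggedMessage) -> str:
    """Pick the mediator character's nonverbal cue from the message tag."""
    # Unknown tags fall back to a neutral expression instead of guessing.
    return EXPRESSION_FOR_AFFECT.get(message.affect, "neutral_face")
```

Tags produced by natural language processing or by physiological measures could feed the same field; only the source of the `affect` value would change.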


OPEN  QUESTIONS 


Beyond the obvious need for advancements in affect recognition
and manifestation technology, it is our opinion that there
are five important and remarkably unexplored areas for
research in emotion and HCI:

1. With which emotion should HCI designers be most
concerned? 

Which  emotion(s)  should  interface  designers  address  first? 
The  basic  emotions,  to  the  extent  that  they  exist  and  can  be 
identified,  have  the  advantage  of  similarity  across  cultures  and 
easy  discriminability.  Thus,  designs  that  attempt  to  act  upon  or 
manipulate  these  dimensions  may  be  the  simplest  to  imple¬ 
ment.  However,  within  these  basic  emotions,  little  is  known 
about  their  relative  manipulability  or  manifestability — particu¬ 
larly  within  the  HCI  context — or  their  relative  impact  on  indi¬ 
viduals’  attitudes  and  behaviors.  Once  one  moves  beyond  the 
basic  emotions,  cultural  and  individual  differences  introduce 
further  problems  and  opportunities. 

2.  When  and  how  should  interfaces  attempt  to  directly  address 
users' emotions and basic needs (vs. application-specific goals)?

If  one  views  a  computer  or  an  interface  merely  as  a  tool,  then 
interface  design  should  solely  focus  on  application-specific 
goals,  assessed  by  such  metrics  as  efficiency,  learnability,  and  ac¬ 
curacy.  However,  if  computers  and  interfaces  are  understood  as 
a  medium,  then  it  becomes  important  to  think  about  both  uses 
and  gratifications  (Katz,  Blumler,  &  Gurevitch,  1974;  Rosen- 
gren,  1974;  Rubin,  1986);  that  is,  the  more  general  emotional 
and  basic  needs  that  users  bring  to  any  interaction.  Notions  of 
“infotainment” or “edutainment” indicate one category of
attempts to balance task and affect. However, there is little
understanding of how aspects of interfaces that directly
manipulate users’ emotions complement, undermine, or are orthogonal
to aspects of interfaces that specifically address users’ task
needs.

3.  How  accurate  must  emotion  recognition  be  to  be  useful 
as  an  interface  technique? 

Although humans are not highly accurate emotion detectors
(the problem of “receiving accuracy”; Picard, 1997a,
p. 120), they nonetheless benefit from deducing others’
emotions and acting on those deductions (Goleman, 1995). Clearly,
however,  a  minimum  threshold  of  accuracy  is  required  before 
behavior  based  on  emotion  induction  is  appropriate.  Very  little 
is  known  about  the  level  of  confidence  necessary  before  an  in¬ 
terface  can  effectively  act  on  a  user’s  emotional  state. 


4.  When  and  how  should  users  be  informed  that  their  affective 
states  are  being  monitored  and  adapted  to? 

When two people interact, there is an implicit assumption
that  each  person  is  monitoring  the  other’s  emotional  state  and 
responding  based  on  that  emotional  state.  However,  an  explicit 
statement  of  this  fact  would  be  highly  disturbing:  “To  facilitate 
our  interaction,  I  will  carefully  and  constantly  monitor  every¬ 
thing  you  say  and  do  to  discern  your  emotional  state  and  re¬ 
spond  based  on  that  emotional  state”  or  “I  have  determined 
that  you  are  sad;  I  will  now  perform  actions  that  will  make  you 
happier.”  However,  when  machines  acquire  and  act  upon  infor¬ 
mation  about  users  without  making  that  acquisition  and  adap¬ 
tation  explicit,  there  is  often  a  feeling  of  “surreptitiousness”  or 
“manipulation.”  Furthermore,  if  emotion  monitoring  and  adapt¬ 
ing  software  are  desired  by  consumers,  there  are  clearly  incen¬ 
tives  for  announcing  and  marketing  these  abilities.  Because  nor¬ 
mal  humans  only  exhibit  implicit  monitoring,  the  psychological 
literature  is  silent  on  the  psychological  and  performance  impli¬ 
cations  for  awareness  of  emotional  monitoring  and  adaptation. 

5.  How  does  emotion  play  out  in  computer-mediated 
communication  (CMC)? 

This  chapter  has  focused  on  the  direct  relationship  between 
the  user  and  the  interface.  However,  computers  also  are  used  to 
mediate  interactions  between  people.  In  face-to-face  encoun¬ 
ters,  affect  not  only  creates  richer  interaction,  but  also  helps  to 
disambiguate  meaning,  allowing  for  more  effective  communi¬ 
cation.  Little  is  known,  however,  about  the  psychological  effects 
of  mediated  affect,  or  the  optimal  strategies  for  encoding  and 
displaying  affective  messages  (see  Maldonado  &  Picard,  1999; 
Rivera,  Cooke,  &  Bauhs,  1996). 


CONCLUSION 


Though  much  progress  has  been  made  in  the  domain  of  “affec¬ 
tive  computing”  (Picard,  1997a),  more  work  is  clearly  necessary 
before  interfaces  that  incorporate  emotion  recognition  and 
manifestation  can  reach  their  full  potential.  Nevertheless,  a  care¬ 
ful  consideration  of  affect  in  interaction  design  and  testing  can 
be  instrumental  in  creating  interfaces  that  are  both  efficient 
and  effective,  as  well  as  enjoyable  and  satisfying.  Designers  and 
theorists,  for  even  the  simplest  interfaces,  are  well  advised  to 
thoughtfully  address  the  intimate  and  far-reaching  linkages  be¬ 
tween  emotion  and  HCI. 
Designing interactive computer systems to be efficient and easy to use
is  important  so  that  people  in  our  society  may  realize  the  potential  ben¬ 
efits  of  computer-based  tools.  .  .  .  Although  modern  cognitive  psychol¬ 
ogy  contains  a  wealth  of  knowledge  of  human  behavior,  it  is  not  a  sim¬ 
ple  matter  to  bring  this  knowledge  to  bear  on  the  practical  problems  of 
design — to  build  an  applied  psychology  that  includes  theory,  data,  and 
knowledge.  (Card,  Moran,  &  Newell,  1983,  p.  vii) 

Integrating  theory,  data,  and  knowledge  about  cognitive  psy¬ 
chology  and  human  performance  in  a  way  that  is  useful  for  guid¬ 
ing  design  in  HCI  is  still  not  a  simple  matter.  However,  there 
have  been  significant  advances  since  Card,  Moran,  and  Newell 
(1983)  wrote  the  previous  passage.  One  of  the  key  advances  is 
the  development  of  cognitive  architectures,  the  subject  of  this 
chapter.  The  chapter  will  first  consider  what  it  is  to  be  a  cogni¬ 
tive  architecture  and  why  cognitive  architectures  are  relevant  for 
HCI.  In  order  to  detail  the  present  state  of  cognitive  architec¬ 
tures  in  HCI,  it  is  important  to  consider  some  of  the  past  uses  of 
cognitive  architectures  in  HCI  research.  Then,  three  architec¬ 
tures  actively  in  use  in  the  research  community  (LICAI/CoLiDeS, 
EPIC,  and  ACT-R)  and  their  application  to  HCI  will  be  examined. 
The  chapter  will  conclude  with  a  discussion  of  the  future  of  cog¬ 
nitive  architectures  in  HCI. 


WHAT  ARE  COGNITIVE  ARCHITECTURES? 


Most  dictionaries  will  list  several  different  definitions  for  the 
word  architecture.  For  example,  dictionary.com  lists  among 
them  “a  style  and  method  of  design  and  construction,”  e.g., 
Byzantine  architecture;  “orderly  arrangement  of  parts;  struc¬ 
ture,”  e.g.,  the  architecture  of  a  novel;  and  one  from  computer 
science:  “the  overall  design  or  structure  of  a  computer  system.” 
What,  then,  would  something  have  to  be  to  qualify  as  a  “cogni¬ 
tive  architecture”?  It  is  something  much  more  in  the  latter 
senses  of  the  word  architecture,  an  attempt  to  describe  the 
overall  structure  and  arrangement  of  a  very  particular  thing, 
the  human  cognitive  system.  A  cognitive  architecture  is  a  broad 
theory  of  human  cognition  based  on  a  wide  selection  of  human 
experimental  data,  and  implemented  as  a  running  computer 
simulation  program.  Young  (Gray,  Young,  &  Kirschenbaum,  1997; 
Ritter  &  Young,  2001)  defined  a  cognitive  architecture  as  an  em¬ 
bodiment  of  “a  scientific  hypothesis  about  those  aspects  of  hu¬ 
man  cognition  that  are  relatively  constant  over  time  and  rela¬ 
tively  independent  of  task.” 

This  idea  has  been  a  part  of  cognitive  science  since  the  early 
days  of  cognitive  psychology  and  artificial  intelligence,  as  mani¬ 
fested  in  the  General  Problem  Solver  or  GPS  (Newell  &  Simon, 
1963),  one  of  the  first  successful  computational  cognitive  mod¬ 
els.  These  theories  have  progressed  a  great  deal  since  GPS  and 
are  gradually  becoming  broader.  One  of  the  best  descriptions  of 
the  vision  for  this  area  is  presented  in  Newell’s  (1990)  book  Uni¬ 
fied  Theories  of  Cognition.  In  it,  Newell  argued  that  the  time  has 
come  for  cognitive  psychology  to  stop  collecting  disconnected 
empirical  phenomena  and  begin  seriously  considering  theoreti¬ 
cal  unification  in  the  form  of  computer  simulation  models.  Cog¬ 
nitive  architectures  are  attempts  to  do  just  this. 

Cognitive architectures are distinct from engineering
approaches to artificial intelligence, which strive to construct
intelligent computer systems by whatever technologies best serve
that  purpose.  Cognitive  architectures  are  designed  to  simulate 
human  intelligence  in  a  humanlike  way  (Newell,  1990).  For  ex¬ 
ample,  the  chess  program  that  defeated  Kasparov,  Deep  Blue, 
would  not  qualify  as  a  cognitive  architecture  because  it  does  not 
solve  the  problem  (chess)  in  a  humanlike  way.  Deep  Blue  uses 
massive  search  of  the  game  space,  while  human  experts  gener¬ 
ally  look  only  a  few  moves  ahead,  but  concentrate  effectively 
on  quality  moves. 

Cognitive  architectures  differ  from  traditional  research  in 
psychology  in  that  work  on  cognitive  architecture  is  integra¬ 
tive — that  is,  architectures  include  mechanisms  to  support  at¬ 
tention,  memory,  problem  solving,  decision  making,  learning, 
and  so  forth.  Most  theorizing  in  psychology  follows  a  divide- 
and-conquer  strategy  that  tends  to  generate  highly  specific  the¬ 
ories  of  a  very  limited  range  of  phenomena;  this  has  changed  lit¬ 
tle  since  the  1970s  (Newell,  1973).  This  limits  the  usefulness  of 
such  theories  for  an  applied  domain  such  as  HCI,  where  users 
employ  a  wide  range  of  cognitive  capabilities  in  even  simple 
tasks.  Instead  of  asking,  “How  can  we  describe  this  isolated  phe¬ 
nomenon?”  people  working  with  cognitive  architectures  can 
ask,  “How  does  this  phenomenon  fit  in  with  what  we  already 
know  about  other  aspects  of  cognition?” 

Another  important  feature  of  cognitive  architectures  is  that 
they  specify  only  the  human  “virtual  machine,”  the  fixed  ar¬ 
chitecture.  A  cognitive  architecture  alone  cannot  do  anything. 
Generally,  the  architecture  has  to  be  supplied  with  the  knowl¬ 
edge  needed  to  perform  a  particular  task.  The  combination 
of  an  architecture  and  a  particular  set  of  knowledge  is  gener¬ 
ally  referred  to  as  a  “computational  cognitive  model,”  or  just 
a  “model.”  In  general,  it  is  possible  to  construct  more  than  one 
model  for  any  particular  task.  The  specific  knowledge  incor¬ 
porated  into  a  particular  model  is  determined  by  the  modeler. 
Because  the  relevant  knowledge  must  be  supplied  to  the  ar¬ 
chitecture,  the  knowledge  engineering  task  facing  modelers 
attempting  to  model  performance  on  complex  tasks  can  be 
formidable. 
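
The split between a fixed architecture and modeler-supplied knowledge can be sketched with a toy production system. This is only a minimal illustration under strong assumptions: the `run_model` cycle, the rule format, and the menu task are invented here and do not correspond to the mechanisms of any particular cognitive architecture.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
# A rule is (name, condition, action); the action returns an updated state.
Rule = Tuple[str, Callable[[State], bool], Callable[[State], State]]


def run_model(rules: List[Rule], state: State, max_cycles: int = 20) -> List[str]:
    """Fixed 'architecture': repeatedly fire the first rule whose condition matches."""
    trace: List[str] = []
    for _ in range(max_cycles):
        for name, condition, action in rules:
            if condition(state):
                state = action(state)
                trace.append(name)
                break
        else:
            break  # no rule matched, so the model halts
    return trace


# Task 'knowledge' supplied by the modeler: open a menu, then pick an item.
menu_rules: List[Rule] = [
    ("click-menu", lambda s: not s["menu_open"], lambda s: {**s, "menu_open": True}),
    ("click-item", lambda s: s["menu_open"] and not s["done"], lambda s: {**s, "done": True}),
]

trace = run_model(menu_rules, {"menu_open": False, "done": False})
```

Here `run_model` alone does nothing useful; only when paired with `menu_rules` does it become a "model" of a task. A different modeler could write a different rule set for the same task, which is the sense in which more than one model is possible.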

A  third  centrally  important  feature  of  cognitive  architectures 
is  that  they  are  software  artifacts  constructed  by  human  pro¬ 
grammers.  This  has  a  number  of  relevant  ramifications.  First,  a 
model  of  a  task  constructed  in  a  cognitive  architecture  is 
runnable  and  produces  a  sequence  of  behaviors.  These  behav¬ 
ior  sequences  can  be  compared  with  the  sequences  produced 
by  human  users  to  help  assess  the  quality  of  a  particular  model. 
They  may  also  provide  insight  into  alternate  ways  to  perform  a 
task — that  is,  they  may  show  possible  strategies  that  are  not  ac¬ 
tually  utilized  by  the  people  performing  the  task.  This  can  be 
useful  in  guiding  interface  design  as  well.  Another  feature  of 
many  architectures  is  that  they  enable  the  creation  of  quantita¬ 
tive  models.  For  instance,  the  model  may  say  more  than  just 
“click  on  button  A  and  then  menu  B,”  but  may  include  the  time 
between  the  two  clicks  as  well.  Models  based  on  cognitive  ar¬ 
chitectures  can  produce  execution  times,  error  rates,  and  even 
learning  curves.  This  is  a  major  strength  of  cognitive  architec¬ 
tures  as  an  approach  to  certain  kinds  of  HCI  problems  and  will 
be  discussed  in  more  detail  in  the  next  section. 
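
To make the idea of a timed prediction concrete, the sketch below uses a much simpler additive estimate in the style of the Keystroke-Level Model (Card, Moran, & Newell, 1983) instead of a full architecture-based model. The operator durations are commonly cited approximate values and should be treated as assumptions; the operator sequence chosen for the task is likewise illustrative.

```python
# Approximate operator durations in seconds; treat these as illustrative assumptions.
OPERATOR_SECONDS = {
    "M": 1.35,  # mental preparation
    "P": 1.10,  # point at a target with the mouse
    "K": 0.20,  # key or button press
    "H": 0.40,  # move hand between keyboard and mouse
}


def predicted_time(operators: str) -> float:
    """Sum the durations of a sequence of operators such as 'MPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# "Click on button A and then menu B": think, point, click, done twice.
click_a_then_b = "MPK" + "MPK"
total = predicted_time(click_a_then_b)
```

Under these assumed values the two clicks are predicted to take about 5.3 seconds. An architecture-based model would derive such times from the interaction of its perceptual, cognitive, and motor mechanisms and would not rely on a fixed lookup table.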

On  the  other  hand,  cognitive  architectures  are  large  software 
systems,  which  are  often  considered  difficult  to  construct  and 
maintain. Individual models are also essentially programs,
written in the “language” of the cognitive architecture. Thus,
individual modelers need to have solid programming skills.

Finally,  cognitive  architectures  are  not  in  wide  use  among 
HCI  practitioners.  Right  now,  they  exist  primarily  in  academic  re¬ 
search  laboratories.  One  of  the  barriers  for  practitioners  is  that 
learning  and  using  most  cognitive  architectures  is  itself  gener¬ 
ally  a  difficult  task;  however,  this  is  gradually  changing,  and 
some  of  the  issues  being  addressed  in  this  regard  will  be  dis¬ 
cussed  in  section  4.  Furthermore,  even  if  cognitive  architectures 
are  not  in  wide  use  by  practitioners,  this  does  not  mean  that 
they  are  irrelevant  to  practitioners.  The  next  section  highlights 
why  cognitive  architectures  are  relevant  to  a  wide  HCI  audience. 


RELEVANCE  TO  HUMAN-COMPUTER 
INTERACTION 


For  some  readers,  the  relevance  of  models  that  produce  quan¬ 
titative  predictions  about  human  performance  will  be  obvious. 
For  others,  this  may  be  less  immediately  clear.  Cognitive  archi¬ 
tectures  (a)  are  relevant  to  usability  as  an  engineering  discipline, 
(b)  have  several  HCI-relevant  applications  in  computing  systems, 
and  (c)  serve  an  important  role  in  HCI  as  a  theoretical  science. 

At nearly all HCI-oriented conferences, and on many online
resources, there are areas where corporations recruit HCI
professionals. A common job title in these forums is “usability
engineer.” Implicit in this title is the view that usability is, at least in
part,  an  engineering  enterprise.  In  addition,  while  people  with 
this  job  title  are  certainly  involved  in  product  design,  there  is  a 
sense  in  which  most  usability  engineering  would  not  be  recog¬ 
nized  as  engineering  by  people  trained  in  more  traditional  engi¬ 
neering  disciplines,  such  as  electrical  or  aerospace  engineering. 
In  traditional  engineering  disciplines,  design  is  generally  guided 
at  least  in  part  by  quantitative  theory.  Engineers  have  at  their  dis¬ 
posal  hard  theories  of  the  domain  in  which  they  work,  and  these 
theories  allow  them  to  derive  quantitative  predictions.  Consider 
an  aerospace  engineer  designing  a  wing.  Like  a  usability  engi¬ 
neer,  the  aerospace  engineer  will  not  start  with  nothing;  a  pre¬ 
existing  design  often  provides  a  starting  point;  however,  when 
the  aerospace  engineer  decides  to  make  a  change  in  that  design, 
there  is  usually  quantitative  guidance  about  how  the  perfor¬ 
mance  of  the  wing  will  change  because  of  the  change  in  design. 
This  guidance,  while  quantitative,  is  not  infallible,  hence  the 
need  for  evaluation  tools  such  as  wind  tunnels.  This  is  similar  to 
the  usability  engineer’s  usability  test.  Unlike  usability  testing, 
however,  the  aerospace  engineer  has  some  quantitative  idea 
about  what  the  outcome  of  the  test  will  be,  and  this  is  not  guided 
simply  by  intuition  and  experience,  but  by  a  mathematical  theory 
of  aerodynamics.  In  fact,  this  theory  is  now  so  advanced  that  few 
wind  tunnels  are  built  anymore.  Instead,  they  are  being  replaced 
by  computer  simulations  based  on  “computational  fluid  dynam¬ 
ics,”  an  outcome  of  the  application  of  computational  techniques 
to  complex  problems  in  aerodynamics.  Simulations  have  not  en¬ 
tirely  replaced  wind  tunnels,  but  the  demand  for  wind  tunnel 
time  has  clearly  been  affected  by  this  development. 

For the most part, the usability engineer lacks the quantitative
tools available to the aerospace engineer. Every design must
be subjected to its own wind tunnel (usability) test, and the
engineer has little guidance about what to expect other than from
intuition  and  experience  with  similar  tests.  While  intuition  and 
experience  can  certainly  be  valuable  guides,  they  often  fall  short 
of  more  “hard”  quantitative  methods.  Perhaps  the  engineer  can 
intuit  that  interface  “X”  will  allow  users  to  complete  tasks  faster 
than  with  interface  “Y,”  but  how  much  faster?  Ten  percent? 
Twenty  percent?  Even  small  savings  in  execution  times  can  add 
up  to  large  financial  savings  for  organizations  when  one  con¬ 
siders  the  scale  of  the  activity.  The  paradigm  example  is  the  tele¬ 
phone  operators  studied  by  Gray,  John,  and  Atwood  (1993), 
where  even  a  second  saved  on  an  average  call  would  save  the 
telephone  company  millions  of  dollars. 
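
The scale argument is simple arithmetic, sketched below. All input figures are hypothetical and were chosen only to show how a one-second saving multiplies; they are not taken from Gray, John, and Atwood (1993).

```python
def annual_savings(seconds_saved_per_call: float, calls_per_year: int,
                   cost_per_operator_hour: float) -> float:
    """Convert a per-call time saving into a yearly labor-cost saving."""
    hours_saved = seconds_saved_per_call * calls_per_year / 3600.0
    return hours_saved * cost_per_operator_hour


# Hypothetical inputs, chosen only to show the scale effect.
savings = annual_savings(seconds_saved_per_call=1.0,
                         calls_per_year=360_000_000,
                         cost_per_operator_hour=30.0)
```

With these assumed inputs, one second per call amounts to 100,000 operator hours, or 3,000,000 in whatever currency the hourly cost is expressed.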

Computational  models  based  on  cognitive  architectures 
have  the  potential  to  provide  detailed  quantitative  answers,  and 
for more than just execution times. Error rates, transfer of
knowledge, learning rates, and other kinds of performance measures
are all metrics that can often be provided by architecture-based
models. Even if such models are not always precisely accurate in
an absolute sense, they may still be useful in a comparative
sense. For example, if a usability engineer is comparing interface
A  with  interface  B  and  the  model  at  his  or  her  disposal  does  not 
accurately  predict  the  absolute  times  to  complete  some  set  of 
benchmark  tasks,  it  may  still  accurately  capture  the  ordinal  dif¬ 
ference  between  the  two  interfaces,  which  may  be  enough. 
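
A minimal sketch of this comparative use follows. The function name and the task times are hypothetical; the point is only that a model can be wrong about every absolute time and still rank the interfaces correctly.

```python
from typing import Dict, List


def ranking(times: Dict[str, float]) -> List[str]:
    """Order interface names from fastest to slowest."""
    return sorted(times, key=lambda name: times[name])


def same_ordering(predicted: Dict[str, float], observed: Dict[str, float]) -> bool:
    """True when the model ranks the interfaces in the same order as the data."""
    return ranking(predicted) == ranking(observed)


# Hypothetical benchmark times in seconds: the model underestimates both
# interfaces but still identifies A as the faster one.
predicted = {"A": 8.0, "B": 10.0}
observed = {"A": 12.5, "B": 14.0}
agrees = same_ordering(predicted, observed)
```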

Additionally,  there  are  certain  circumstances  when  usability 
tests  are  either  impractical,  prohibitively  costly,  or  both.  For 
example,  access  to  certain  populations  such  as  physicians  or 
astronauts  may  be  difficult  or  expensive,  so  bringing  them  in  for 
repeated  usability  tests  may  not  be  feasible.  While  developing  a 
model  of  a  pilot  or  an  air  traffic  controller  performing  an  expert 
task  with  specialized  systems  may  be  difficult  at  first,  rerunning 
that  model  to  assess  a  change  made  to  the  user  interface  should 
be  much  more  straightforward  than  performing  a  new  usability 
test  for  each  iteration  of  the  system.  This  is  possible  only  with  a 
quantitatively  realistic  model  of  the  human  in  the  loop,  one  that 
can  produce  things  such  as  execution  times  and  error  rates. 
Computational  models  can,  in  principle,  act  as  surrogate  users 
in  usability  testing,  even  for  special  populations. 

Of  course,  some  of  these  measures  can  be  obtained  through 
other  methods  such  as  GOMS  (Goals,  Operators,  Methods,  and 
Selection  rules)  analysis  (Card  et  al.,  1983;  John  &  Kieras,  1996) 
or cognitive walkthrough (Polson, Lewis, Rieman, & Wharton,
1992);  however,  these  techniques  were  originally  grounded  in 
the  same  ideas  as  some  prominent  cognitive  architectures  and 
are  essentially  abstractions  of  the  relevant  architectures  for  par¬ 
ticular  HCI  purposes.  In  addition,  architecture-based  computa¬ 
tional  models  provide  things  that  GOMS  models  and  cognitive 
walkthroughs  do  not.  First,  models  are  executable  and  genera¬ 
tive.  A  GOMS  analysis,  on  the  other  hand,  is  a  description  of  the 
procedural  knowledge  the  user  must  have  and  the  sequence  of 
actions  that  must  be  performed  to  accomplish  a  specific  task 
instance,  while  the  equivalent  computational  model  actually 
generates  the  behaviors,  often  in  real  time  or  faster.  Equally  im¬ 
portantly,  computational  models  have  the  capacity  to  be  reac¬ 
tive  in  real  time.  So  while  it  may  be  possible  to  construct  a 
GOMS  model  that  describes  the  knowledge  necessary  and  the 
time  it  will  take  an  operator  to  classify  a  new  object  on  an  air 
traffic controller’s screen, a paper-and-pencil GOMS model
cannot actually execute the procedure in response to the
appearance of such an object. A running computational model, on the
other  hand,  can. 

Because  of  this  property,  architecture-based  computational 
models  have  some  other  important  uses  beyond  acting  as  virtual 
users  in  usability  tests.  One  such  use  is  in  intelligent  tutoring 
systems  (ITSs).  Consider  the  Lisp  tutor  (Anderson,  Conrad,  & 
Corbett,  1989).  This  tutoring  system  contained  an  architecture- 
based  running  computational  model  of  the  knowledge  neces¬ 
sary  to  implement  the  relevant  Lisp  functions,  as  well  as  a  mod¬ 
ule  for  assessing  which  pieces  of  this  knowledge  were  mastered 
by  the  student.  Because  the  model  was  executable,  it  could  pre¬ 
dict  what  action  the  student  would  take  if  the  student  had  cor¬ 
rect  knowledge  of  how  to  solve  the  problem.  When  the  student 
took  a  different  action,  this  told  the  ITS  that  the  student  was 
missing  one  or  more  relevant  pieces  of  knowledge.  The  student 
could  then  be  given  feedback  about  what  knowledge  was  miss¬ 
ing  or  incomplete,  and  problems  that  exercise  this  knowledge 
could  be  selected  by  the  ITS.  It  is  possible  to  generate  more  ef¬ 
fective  educational  experiences  by  identifying  students’  knowl¬ 
edge,  as  well  as  the  gaps  in  that  knowledge.  Problems  that  con¬ 
tain  knowledge  students  have  already  mastered  can  be  avoided, 
so  as  not  to  bore  students  with  things  they  already  know.  This 
frees  up  students  to  concentrate  on  the  material  they  have  not 
yet  mastered,  resulting  in  improved  learning  (Anderson  et  al., 
1989).  While  the  Lisp  tutor  is  an  old  research  system,  ITSs  based 
on  the  same  underlying  cognitive  architecture  with  the  same  es¬ 
sential  methodology  have  been  developed  for  more  pressing 
educational  needs  such  as  algebra  and  geometry  and  are  now 
sold commercially (see http://www.carnegielearning.com).
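
The comparison between what an executable model would do and what the student actually did can be sketched as follows. This is a hypothetical illustration, not the Lisp tutor's implementation: the problem states, action names, and skill labels are invented for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical "ideal student" knowledge: problem state -> (expected action, skill used).
IDEAL_MODEL: Dict[str, Tuple[str, str]] = {
    "need-function-header": ("type-defun", "define-function"),
    "need-first-element": ("type-car", "list-access"),
}


def trace_step(state: str, student_action: str) -> Optional[str]:
    """Return the skill that appears to be missing, or None if the action matches."""
    expected_action, skill = IDEAL_MODEL[state]
    if student_action == expected_action:
        return None
    return skill  # mismatch: flag this skill for feedback and further practice
```

A tutor built along these lines could, for instance, call `trace_step("need-first-element", "type-cdr")`, receive `"list-access"`, and then select problems that exercise that skill while skipping skills that never produce a mismatch.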

Another HCI-relevant application for high-fidelity cognitive
models  is  populating  simulated  worlds  or  situations.  For  exam¬ 
ple,  training  an  F-16  fighter  pilot  is  expensive,  even  in  a  simula¬ 
tor,  because  that  trainee  needs  to  face  realistic  opposition.  Real¬ 
istic  opposition  consists  of  other  trained  pilots,  so  training  one 
person  requires  taking  several  trained  pilots  away  from  their  nor¬ 
mal  duties  (e.g.,  flying  airplanes  on  real  missions).  This  is  difficult 
and  expensive.  If,  however,  the  other  pilots  could  be  realistically 
simulated,  the  trainee  could  face  opposition  that  would  have 
useful  training  value  without  having  to  remove  already  trained 
pilots  from  their  duties.  Many  training  situations  such  as  this  ex¬ 
ist,  where  the  only  way  to  train  someone  is  to  involve  multiple 
human  experts  who  must  all  be  taken  away  from  their  regular 
jobs.  The  need  for  expensive  experts  can  potentially  be  elimi¬ 
nated  (or  at  least  reduced),  however,  by  using  architecturally 
based  cognitive  models  in  place  of  the  human  experts.  The  U.S. 
military  has  already  started  to  experiment  with  just  such  a  sce¬ 
nario  (Jones,  Laird,  Nielsen,  Coulter,  Kenny,  &  Koss,  1999).  Having 
realistic  opponents  is  desirable  in  domains  other  than  training  as 
well,  such  as  video  games.  Besides  things  like  texture-mapped 
3-D  graphics,  one  of  the  features  often  used  to  sell  games  is  net¬ 
work  play.  This  enables  players  to  engage  opponents  whose  ca¬ 
pabilities  are  more  comparable  to  their  own  than  typical  com¬ 
puter-generated  opponents;  however,  even  with  network  play, 
it  is  not  always  possible  for  a  game  to  find  an  appropriate  oppo¬ 
nent.  If  the  computer-generated  opponent  were  a  more  high- 
fidelity  simulation  of  a  human  in  terms  of  cognitive  and  percep¬ 
tual-motor  capabilities,  then  video  game  players  would  have  no 
difficulty finding appropriate opponents without relying on
network play. While this might not be the most scientifically
interesting use of cognitive architectures, it seems inevitable that
cognitive architectures will be used in this way.

Cognitive  architectures  are  also  theoretically  important  to 
HCI  as  an  interdisciplinary  field.  Many  people  (including  some 
cognitive  psychologists)  find  terms  from  cognitive  psychology 
such  as  “working  memory”  or  “mental  model”  vague  and  ill  de¬ 
fined.  A  harsh  evaluation  of  explanations  relying  on  such  terms 
is  found  in  Salthouse  (1988,  p.  3):  “It  is  quite  possible  that  in¬ 
terpretations  relying  on  such  nebulous  constructs  are  only  mas¬ 
querading  ignorance  in  what  is  essentially  vacuous  terminol¬ 
ogy.”  Computational  cognitive  architectures,  on  the  other  hand, 
require  explicit  specifications  of  the  semantics  of  theoretical 
terms.  Even  if  the  architectures  are  imperfect  descriptions  of 
the  human  cognitive  system,  they  are  at  a  minimum  well  speci¬ 
fied  and  therefore  clearer  in  what  they  predict  than  strictly  ver¬ 
bal  theories. 

A  second  theoretical  advantage  of  computational  theories 
such  as  cognitive  architectures  is  that  they  provide  a  window 
into  how  the  theory  actually  works.  As  theories  grow  in  size  and 
number  of  mechanisms,  the  interactions  of  those  mechanisms 
become increasingly difficult to predict analytically. Computer
simulations  permit  relatively  rapid  evaluations  of  complex 
mechanisms  and  their  interactions.  (For  an  excellent  discussion 
of this topic, see Simon, 1996.) Another problem with theories
based solely on verbal descriptions is that their internal coherence
can be very difficult to assess, while such assessment is much
more straightforward with computational models. Verbal
theories can easily hide subtle (and not so subtle) inconsistencies
that  make  them  poor  scientific  theories.  Computational  models, 
on  the  other  hand,  force  explanations  to  have  a  high  level  of  in¬ 
ternal  coherence;  theories  that  are  not  internally  consistent  are 
typically  impossible  to  implement  on  real  machines. 

Finally,  HCI  is  an  interdisciplinary  field,  and  thus  theories 
that  are  fundamentally  interdisciplinary  in  nature  are  appropri¬ 
ate.  Cognitive  architectures  are  such  theories,  combining  com¬ 
putational  methods  and  knowledge  from  the  artificial  intelli¬ 
gence  end  of  computer  science  with  data  and  theories  from 
cognitive psychology. While cognitive psychology and computer
science  are  certainly  not  the  only  disciplines  that  participate  in 
HCI,  they  are  two  highly  visible  forces  in  the  field.  Psychologi¬ 
cal  theories  that  are  manifested  as  executable  programs  should 
be  less  alien  to  people  with  a  computer  science  background 
than  more  traditional  psychological  theories. 

Thus,  cognitive  architectures  are  clearly  relevant  to  HCI  at  a 
number  of  levels.  This  fact  has  not  gone  unnoticed  by  the  HCI 
research  community.  In  fact,  cognitive  architectures  have  a  long 
history  in  HCI,  dating  back  to  the  original  work  of  Card,  Moran, 
and  Newell  (1983). 


BRIEF  LOOK  AT  PAST  SYSTEMS  IN  HCI 


The  total  history  of  cognitive  architectures  and  HCI  would  be  far 
too  long  to  document  in  a  single  chapter;  however,  it  is  possible 
to touch on some highlights. While not all of the systems
described in this section qualify as complete cognitive
architectures, they all share intellectual history with more current
architectures and influenced their development and use in HCI.
Finally, many of the concepts developed in these efforts are still
central  parts  of  the  ongoing  research  on  cognitive  architecture. 
In  addition,  there  is  a  natural  starting  point: 

The  Model  Human  Processor  (MHP)  and  GOMS 

The  Psychology  of  Human-Computer  Interaction  (Card,  Moran, 
&  Newell,  1983)  is  clearly  a  seminal  work  in  HCI,  one  of  the 
defining  academic  works  in  the  early  days  of  the  field.  While  that 
work  did  not  produce  a  running  cognitive  architecture,  it  was 
clearly  in  the  spirit  of  cognitive  architectures  and  was  quite  in¬ 
fluential  in  the  development  of  current  cognitive  architectures. 
Two  particular  pieces  of  that  work  are  relevant  here,  the  Model 
Human  Processor  (MHP)  and  GOMS. 

The  MHP  represents  a  synthesis  of  the  literature  on  cognitive 
psychology  and  human  performance  up  to  that  time,  and 
sketches  the  framework  around  which  a  cognitive  architecture 
could  be  implemented.  The  MHP  is  a  system  with  multiple 
memories  and  multiple  processors,  and  many  of  the  properties 
of  those  processors  and  memories  are  described  in  some  detail. 
(See  Fig.  5.1.)  Card,  Moran,  and  Newell  (1983)  also  specified  the 
interconnections  of  the  processors  and  a  number  of  general 
operating principles.

FIGURE 5.1. Model Human Processor. Based on Card, Moran,
and Newell (1983).

In this system, there are three processors:
(a) one cognitive, (b) one perceptual, and (c) one motor. In
some  cases,  the  system  essentially  behaves  serially.  For  instance, 
in  order  for  the  system  to  press  a  key  in  response  to  the  ap¬ 
pearance  of  a  light,  the  perceptual  processor  must  detect  the 
appearance  of  the  light  and  transmit  this  information  to  the  cog¬ 
nitive  processor.  The  cognitive  processor’s  job  is  to  decide  what 
the  appropriate  response  should  be,  and  then  transmit  that  to 
the  motor  processor,  which  is  responsible  for  actually  executing 
the  appropriate  motor  command.  In  this  situation,  the  proces¬ 
sors  act  serially,  one  after  another;  however,  in  more  complex 
tasks  such  as  transcription  typing,  all  three  processors  often 
work  in  parallel. 
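
As a rough illustration of the serial case, the light-to-keypress
example can be sketched as one cycle of each processor, added
together. The nominal cycle times used below (100, 70, and 70 ms)
are the typical values usually quoted from Card, Moran, and Newell
(1983); they are not given in this chapter and should be read as
assumptions, as should the function and variable names.

```python
# Nominal processor cycle times in milliseconds. These are the typical
# values usually attributed to Card, Moran, and Newell (1983); they are
# assumptions here, since the chapter itself does not list them.
CYCLE_MS = {"perceptual": 100, "cognitive": 70, "motor": 70}


def serial_response_time(stages, cycle_ms=CYCLE_MS):
    """Sum one cycle per stage when the processors act strictly in series."""
    return sum(cycle_ms[stage] for stage in stages)


# Light appears -> perceive it -> decide on a response -> press the key.
simple_reaction = serial_response_time(["perceptual", "cognitive", "motor"])
print(simple_reaction)
```

A task that needs an extra decision step would simply add another
cognitive cycle to the list. The parallel case (e.g., transcription
typing) cannot be captured by a plain sum like this.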

Besides  the  specification  of  the  timing  for  each  processor 
and  the  connectivity  of  the  processors,  Card,  Moran,  and  Newell 
named  some  general  operating  principles  ranging  from  very 
general  and  qualitative  to  detailed  and  quantitative.  For  exam¬ 
ple,  Principle  P9,  the  Problem  Space  Principle,  stated: 

The  rational  activity  in  which  people  engage  to  solve  a  problem  can  be 
described  in  terms  of  (1)  a  set  of  states  of  knowledge,  (2)  operators  for 
changing  one  state  into  another,  (3)  constraints  on  applying  operators, 
and  (4)  control  knowledge  for  deciding  which  operator  to  apply  next. 
(Card  et  al.  1983,  p.  27) 

This  is  a  particularly  general  and  somewhat  vague  principle. 
In  contrast,  consider  Principle  P5,  Fitts’  Law: 

The time T_{pos} to move the hand to a target of size S which lies a
distance D away is given by:

T_{pos} = I_M \log_2 (D/S + .5)

where I_M = 100 [70 ~ 120] ms/bit. (Card et al. 1983, p. 27).

This  is  a  very  specific  principle  that  quantitatively  describes 
hand  movement  behavior,  which  is  highly  relevant  to,  say,  point¬ 
ing  with  a  mouse.  Overall,  the  specification  of  the  MHP  is  quite 
thorough,  and  it  lays  out  a  basis  for  a  cognitive  architecture  able 
to do a wide variety of HCI-relevant tasks; however, Card et al.
(1983)  did  not  implement  the  MHP  as  a  running  cognitive  ar¬ 
chitecture.  This  is  likely  for  pedagogical  reasons;  it  is  not  nec¬ 
essary  to  have  a  complete,  running  cognitive  architecture  for 
the  general  properties  of  that  architecture  to  be  useful  for  guid¬ 
ing  HCI  researchers  and  practitioners.  At  the  time,  computa¬ 
tional  modeling  was  the  domain  of  a  very  specialized  few  in  cog¬ 
nitive  psychology. 
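
To make the quantitative character of Principle P5 concrete, the
following minimal sketch computes a positioning time, reading the
quoted principle as T_pos = I_M * log2(D/S + .5) with I_M in ms/bit.
The function name, argument names, and the example distances are
illustrative assumptions, not part of the original specification.

```python
import math


def fitts_time_ms(distance, size, i_m=100.0):
    """Positioning time T_pos = I_M * log2(D/S + 0.5), in milliseconds.

    i_m is the slope in ms/bit; the quoted principle gives 100 with a
    range of roughly 70 to 120.
    """
    if distance <= 0 or size <= 0:
        raise ValueError("distance and size must be positive")
    return i_m * math.log2(distance / size + 0.5)


# Hypothetical pointing task: a target 1 unit wide, 7.5 units away.
print(fitts_time_ms(7.5, 1.0))
```

Because only the ratio D/S enters the formula, the units of distance
and size do not matter as long as they agree. Doubling the target
size has the same effect as halving the distance.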

Card  et  al.  (1983)  laid  out  another  concept  that  has  been 
highly  influential  throughout  HCI  and  particularly  in  the  com¬ 
munity  of  computational  modelers.  This  is  GOMS,  which  stands 
for  Goals,  Operators,  Methods,  and  Selection  rules.  GOMS  is  a 
framework  for  task  analysis  that  describes  routine  cognitive 
skills  in  terms  of  the  four  listed  components.  Routine  cognitive 
skills  are  those  where  the  user  knows  what  the  task  is  and  how 
to  do  the  task  without  doing  any  problem  solving.  Text  editing 
with  a  familiar  editor  is  the  prototypical  case  of  this,  but  clearly, 
a  great  many  tasks  of  interest  in  HCI  could  be  classified  as  rou¬ 
tine  cognitive  skills.  Thus,  the  potential  applicability  of  GOMS 
is  quite  broad.  Indeed,  GOMS  has  been  applied  to  a  variety 
of tasks; the website http://www.gomsmodel.org lists 143 GOMS-related
papers in its bibliography.

What does a GOMS analysis provide? Essentially, a GOMS
analysis  of  a  task  describes  the  hierarchical  procedural  knowl¬ 
edge  a  person  must  have  to  successfully  complete  that  task. 
Based  on  that,  and  the  sequence  of  operators  that  must  be  ex¬ 
ecuted,  it  is  possible  to  make  quantitative  predictions  about  the 
execution  time  for  a  particular  task.  Other  analyses,  such  as  pre¬ 
dictions  of  error,  functionality  coverage,  and  learning  time,  are 
also  sometimes  possible.  Since  the  original  formulation  pre¬ 
sented  in  Card  et  al.  (1983),  a  number  of  different  forms  of 
GOMS  analysis  have  been  developed,  each  with  slightly  differ¬ 
ent  strengths  and  weaknesses  (John  &  Kieras,  1996). 

The  core  point  as  it  relates  to  cognitive  architectures  is  that 
GOMS analysis was originally based on a production rule analysis
(Card,  personal  communication,  1999).  Because  this  will  come 
up  several  times,  a  brief  introduction  to  production  systems  is 
warranted.  Production  rules  are  IF-THEN  condition-action  pairs, 
and  a  set  of  production  rules  (or  simply  “productions,”  or  just 
“rules”)  and  a  computational  engine  that  interprets  those  pro¬ 
ductions  is  called  a  “production  system.”  In  addition  to  produc¬ 
tions,  production  systems  contain  some  representation  of  the 
current  state.  This  representation  typically  consists  of  a  set 
of  loosely  structured  data  elements  such  as  propositions  or 
attribute-value  pairs.  This  set  is  called  the  “working  memory” 
or  “declarative  memory.”  Because  “working  memory”  is  also 
a  psychological  term  with  somewhat  different  meaning  in 
that  literature,  “declarative  memory”  will  be  used  in  all  further 
discussions. 

The  operation  of  a  production  system  is  cyclic.  On  each  cy¬ 
cle,  the  system  first  goes  through  a  pattern-matching  process. 
The  IF  side  of  each  production  tests  for  the  presence  of  a  par¬ 
ticular  pattern  in  declarative  memory.  When  the  IF  conditions  of 
a  production  are  met,  the  production  is  said  to  fire,  and  the  ac¬ 
tions  specified  on  the  THEN  side  are  executed.  The  actions  can 
be  things  such  as  pressing  a  button,  or  even  some  higher  level 
abstraction of action (e.g., “turn left”). Actions also include modifying
the contents of declarative memory, which usually means
that  a  different  production  or  productions  will  match  on  the 
next  cycle.  At  this  abstract  and  purely  symbolic  level,  production 
systems  are  Turing  complete  and  thus  can  compute  anything 
that is computable (Newell, 1990); thus, they should be flexible
enough  to  model  the  wide  array  of  computations  performed 
by  the  human  cognitive  system. 
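
The match-fire cycle just described can be sketched in a few lines.
This is only a toy interpreter under stated assumptions: declarative
memory is a set of attribute-value tuples, the first matching
production wins (real architectures use more elaborate conflict
resolution), and the two rules and all names are hypothetical.

```python
def run_production_system(productions, memory, max_cycles=20):
    """Cycle: find a production whose IF side matches, run its THEN side."""
    memory = frozenset(memory)
    fired = []
    for _ in range(max_cycles):
        for name, condition, action in productions:
            if condition(memory):                   # IF side matches
                memory = frozenset(action(memory))  # THEN side executes
                fired.append(name)
                break
        else:
            break  # no production matched on this cycle, so stop
    return memory, fired


LIGHT = ("light", "on")
GOAL = ("goal", "respond")
DONE = ("key", "pressed")

# Hypothetical two-rule model: notice a light, then press a key.
productions = [
    ("notice-light",
     lambda m: LIGHT in m and GOAL not in m and DONE not in m,
     lambda m: m | {GOAL}),
    ("press-key",
     lambda m: GOAL in m,
     lambda m: (m - {GOAL}) | {DONE}),
]

final_memory, fired = run_production_system(productions, {LIGHT})
print(fired)
```

Note how the first rule's action changes declarative memory so that
a different rule matches on the next cycle, which is the behavior
the text describes.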

This  is  relevant  to  cognitive  architectures  because  most  cog¬ 
nitive  architectures  are  (or  contain)  production  systems.  GOMS 
was  actually  abstracted  from  production  rule  analysis.  Card  et  al. 
(1983)  discovered  that,  for  routine  cognitive  skills,  the  structure 
of  the  productions  was  quite  similar  across  tasks  and  a  more 
abstract  representation  was  possible.  This  representation  is  the 
original  GOMS  formulation.  Thus,  translating  a  GOMS  analysis 
into  production  rules,  the  language  of  most  cognitive  architec¬ 
tures,  is  generally  straightforward.  Similarly,  for  routine  cogni¬ 
tive  skills,  it  is  often  relatively  simple  to  derive  a  GOMS  analysis 
from  the  set  of  productions  used  to  model  the  task.  Models 
based  on  cognitive  architectures  can  go  well  beyond  routine 
cognitive  skills,  but  this  connection  has  certainly  influenced  the 
evolution  of  research  on  cognitive  architecture  and  HCI.  This 
connection  has  also  fed  back  into  research  and  development  of 
GOMS  techniques  themselves,  such  as  Natural  GOMS  Language 
(NGOMSL;  Kieras,  1988).  NGOMSL  allows  the  prediction  of 


learning  time  for  the  knowledge  described  in  a  GOMS  model 
based  on  a  theory  of  transfer  of  training  referred  to  as  cogni¬ 
tive  complexity  theory  (CCT),  which  will  be  described  in  more 
detail  in  the  next  section. 


Cognitive  Complexity  Theory  (CCT) 

When  someone  has  learned  to  perform  a  task  with  a  particular 
interface  and  must  switch,  doing  the  same  task  with  a  new  in¬ 
terface,  how  much  better  off  will  they  be  than  someone  just 
learning  to  do  the  task  with  the  new  interface? — that  is,  how 
much  is  the  knowledge  gained  from  using  the  old  interface 
“transferred”  to  using  the  new  interface?  This  kind  of  question 
has  intrigued  psychologists  for  at  least  a  century,  and  having 
some  answers  to  this  question  has  implications  for  training  pro¬ 
grams,  user  interface  design,  and  many  other  areas.  Cognitive 
complexity theory (Bovair, Kieras, & Polson, 1990; Kieras & Polson,
1985) is a psychological theory of transfer of training applied
to HCI. Most relevant to the current discussion, this theory
is based on production rules. The major points of CCT are
as  follows: 

•  Knowledge  of  the  procedures  that  people  need  to  execute 
to  perform  routine  tasks  can  be  represented  with  production 
rules.  The  relevant  production  rules  can  be  generated  based 
on  a  GOMS  analysis  of  the  task  to  be  modeled. 

•  The  complexity  of  a  task  will  be  reflected  in  the  number  and 
content  of  the  production  rules.  When  certain  conventions 
are  adopted  about  the  style  of  those  production  rules,  com¬ 
plexity  is  reflected  almost  entirely  in  the  number  of  rules. 

•  The  time  it  takes  to  execute  a  procedure  can  be  predicted 
with  a  production  system  that  interprets  those  rules  along 
with  a  set  of  typical  operator  times,  for  example,  the  time  it 
takes  to  type  a  three-letter  command.  The  production  inter¬ 
preter  used  in  this  work  was  not  intended  to  be  a  general 
cognitive  architecture,  but  the  production  system  framework 
is  certainly  consistent  with  current  architectures. 

•  The  time  it  takes  to  learn  a  task  is  a  function  of  the  number  of 
new  rules  that  the  user  must  learn.  “New”  is  clearly  defined  in 
this  context.  If  the  user  already  has  a  production,  and  a  new 
task  requires  a  rule  that  is  similar  (again,  similarity  is  well  de¬ 
fined  based  on  the  production  rule  syntax),  then  the  rule  for 
the  new  task  need  not  be  learned. 

•  Some  predictions  about  errors  and  speedup  with  practice  can 
also  be  gleaned  from  the  contents  of  the  production  rules. 

Obviously,  this  was  an  ambitious  agenda,  and  there  are  many 
subtleties. For example, the notion of a “task” as the term was
used  in  the  description  of  CCT  actually  includes  more  than  just 
the  task  at  an  abstract  level.  Consider  a  simple  instance  of  a  text¬ 
editing  task,  deleting  the  word  “redux”  from  the  middle  of  a  sen¬ 
tence.  The  actual  commands  needed  to  accomplish  this  task 
could be very different in different text editors; thus, modeling
the “delete word” task would require two different sets of productions,
one for each editor. That is, the necessary knowledge,
and thus the production rules for representing it, is actually
a function both of the task from the user’s point of view
(e.g., “delete word”) and the interface provided to accomplish
the  task.  Transfer  from  one  text  editor  to  another  therefore  de¬ 
pends  a  great  deal  on  the  particulars  of  each  interface.  CCT  thus 
predicts  asymmetrical  transfer:  learning  editor  A  after  editor  B 
should  not  be  the  same  as  learning  editor  B  after  editor  A. 
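
The asymmetry follows directly from counting new rules, as the
sketch below suggests. It makes several simplifying assumptions:
rules are reduced to names, "similar" is reduced to exact identity,
learning time is strictly linear in the number of new rules, and the
per-rule cost and both rule sets are invented for illustration.

```python
def new_rules(known, target):
    """Rules in the target interface that the user does not already have."""
    return set(target) - set(known)


def learning_time(known, target, seconds_per_rule=30.0):
    """Learning time as a linear function of the number of new rules.

    seconds_per_rule is a made-up placeholder, not an empirical estimate.
    """
    return seconds_per_rule * len(new_rules(known, target))


# Hypothetical rule sets for the "delete word" task in two editors.
editor_a = {"locate-word", "move-cursor", "press-delete-word-key"}
editor_b = {"locate-word", "move-cursor", "mark-start", "mark-end", "issue-cut"}

a_then_b = learning_time(known=editor_a, target=editor_b)
b_then_a = learning_time(known=editor_b, target=editor_a)
print(a_then_b, b_then_a)
```

With these invented sets, a user moving from editor A to editor B
has three rules to learn, while a user moving the other way has only
one, so the two transfer directions predict different learning times.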

CCT models, like GOMS analyses, omit modeling many
details  of  user  behavior.  In  general,  anything  that  falls  outside 
the  domain  of  procedural  knowledge  (how-to-do-it  knowledge) 
is  not  modeled.  This  means  that  the  model  does  not  model  mo¬ 
tor  actions  such  as  keypresses,  and  instead  has  a  “DoKeystroke” 
primitive  operator.  Nor  do  CCT  models  model  things  such  as 
natural  language  comprehension,  clearly  a  requirement  in  text 
editing.  CCT  models  also  do  not  include  any  model  of  the  per¬ 
ceptual  processes  required  by  users — the  model  was  simply 
given  information  about  the  state  of  the  display  and  did  not 
have  to,  for  example,  look  to  see  if  the  cursor  was  over  a  partic¬ 
ular  character.  This  is  the  same  scope  as  a  typical  GOMS  model, 
though  a  CCT  model  is  more  formalized  and  quantitative  than 
the  GOMS  models  described  by  Card  et  al.  (1983). 

In  spite  of  these  limitations  (or  perhaps  in  part  because 
these  limitations  allowed  the  researchers  to  concentrate  on  the 
most  central  aspects  of  the  phenomena),  CCT  fared  very  well. 
Numerous  laboratory  experiments  provide  empirical  support 
for many of the claims of CCT (see especially Bovair et al., 1990).
The CCT framework was developed and validated in greatest detail
for pre-GUI text editing, but it has also been applied to menu-based
systems (Polson, Muncher, & Engelbeck, 1986) and a control
panel device (Kieras & Bovair, 1986). Singley and Anderson
(1989)  provided  a  strikingly  similar  analysis  of  transfer  of  train¬ 
ing,  as  well  as  supporting  empirical  results,  lending  credence 
to  the  CCT  analysis.  CCT  was  certainly  one  of  the  most  promi¬ 
nent  early  successes  of  computational  modeling  in  HCI. 

CAPS 

CAPS  (Collaborative  Activation-based  Production  System;  Just  & 
Carpenter,  1992)  is  a  cognitive  architecture  designed  to  model 
individual  differences  in  working  memory  (WM)  capacity  and 
the  effects  of  working  memory  load.  This  speciality  is  applica¬ 
ble  to  a  number  of  HCI  situations.  Certainly,  some  kinds  of  user 
interfaces  can  create  excessive  working  memory  demands,  for 
example,  phone-based  interfaces.  In  phone-based  interaction 
(PBI),  options  do  not  remain  on  a  screen  or  in  any  kind  of  avail¬ 
able  storage;  rather,  users  are  forced  to  remember  the  options 
presented.  This  seems  like  a  prime  candidate  for  modeling  with 
a  system  designed  to  capture  the  effects  of  working  memory  de¬ 
mand,  and  this  is  exactly  what  Huguenard,  Lerch,  Junker,  Patz, 
and  Kass  (1997)  did.  Their  data  showed  that,  contrary  to  guide¬ 
line  advice  and  most  people’s  intuition,  restricting  phone 
menus  to  only  a  few  (three)  items  each  does  not  reduce  error 
rates.  The  CAPS-based  model  provided  a  clear  theoretical 
account  of  this  phenomenon.  The  model  showed  that  short 
menus  are  not  necessarily  better  in  PBI  because  of  two  side  ef¬ 
fects  of  designing  menu  hierarchies  with  few  options  at  each 
level.  First,  for  the  same  number  of  total  items,  this  increases 
menu  depth,  which  creates  working  memory  demand.  Second, 
with  fewer  items  at  each  level,  each  individual  item  has  to  be 
more  general  and  therefore  more  vague,  especially  at  the  top 


levels  of  the  hierarchy.  This  forces  users  to  spend  WM  resources 
on  disambiguating  menu  items  when  they  are  in  a  situation 
where  WM  demands  outstrip  supply. 

Another  application  of  CAPS  that  is  HCI  relevant  is  the  ac¬ 
count  of  postcompletion  error  provided  by  Byrne  and  Bovair 
(1997). What is a postcompletion error? Anecdotal evidence and
intuition suggest that, when interacting with man-made artifacts,
certain kinds of errors occur with greater frequency than
others.  In  particular,  one  entire  family  of  errors  seems  anecdo¬ 
tally  common:  errors  that  are  made  when  some  part  of  a  task  oc¬ 
curs  after  the  main  goal  of  the  task  has  been  accomplished 
(hence,  “postcompletion”).  Nearly  everyone  reports  having 
made  an  error  of  this  type  at  one  time  or  another.  Here  are  two 
prototypical  examples: 

1.  Leaving  the  original  on  the  glass  of  a  photocopier.  The  main 
goal  one  generally  has  when  using  a  photocopier  is  “get 
copies,” and this goal is satisfied before one removes the original
document. This error is less common now that many
photocopiers  include  document  feeders;  the  more  current 
equivalent  is  leaving  a  document  on  the  glass  in  a  flatbed 
scanner. 

2.  Leaving  one’s  bank  card  in  an  automated  teller  machine 
(ATM).  Again,  the  main  goal  is  something  on  the  order  of 
“get  cash,”  and  in  many  ATMs,  card  removal  occurs  after  the 
cash  is  dispensed.  This  error  was  common  enough  in  the 
first  generation  of  ATMs  that  many  ATMs  are  now  designed  in 
such  a  way  that  this  error  is  now  impossible  to  make. 

Other  postcompletion  errors  include  leaving  the  gas  cap  off 
after  filling  up  the  car’s  gas  tank,  leaving  change  in  vending  ma¬ 
chines,  and  more — most  readers  can  probably  think  of  several 
others.  While  numerous  HCI  researchers  were  aware  of  this 
class of error (e.g., Young, Barnard, Simon, & Whittington, 1989;
Polson et al., 1992), no previous account explained why this
type  of  error  is  persistent,  yet  not  so  frequent  that  it  occurs 
every  time.  The  CAPS  model  provides  just  such  an  account,  and 
can  serve  as  a  useful  example  of  the  application  of  a  cognitive  ar¬ 
chitecture  to  an  HCI  problem. 

Like  most  other  production  systems,  CAPS  contains  two 
kinds  of  knowledge:  (a)  declarative  memory  and  (b)  produc¬ 
tions.  Declarative  memory  elements  in  CAPS  also  have  associ¬ 
ated  with  them  an  activation  value,  and  elements  below  a 
threshold  level  of  activation  cannot  be  matched  by  productions’ 
IF  sides.  Additionally,  unlike  most  other  production  systems,  the 
THEN  side  of  a  CAPS  production  may  request  that  the  activation 
of  an  element  be  incremented.  For  this  to  be  truly  useful  in 
modeling  working  memory,  there  is  a  limit  to  the  total  amount 
of  activation  available  across  all  elements.  If  the  total  activation 
exceeds  this  limit,  then  all  elements  lose  some  activation  to 
bring  the  total  back  within  the  limit.  This  provides  a  mechanism 
for  simulating  human  working  memory  limitations. 
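
A minimal sketch of this capacity mechanism follows. It is not the
CAPS implementation: the proportional rescaling rule is an
assumption (the description above says only that all elements lose
some activation), and the element names, cap, and threshold values
are hypothetical.

```python
def increment(activations, element, amount, cap):
    """Raise one element's activation, then rescale if the total exceeds cap.

    Proportional rescaling is an assumption; the text only says that all
    elements lose some activation when the limit is exceeded.
    """
    updated = dict(activations)
    updated[element] = updated.get(element, 0.0) + amount
    total = sum(updated.values())
    if total > cap:
        scale = cap / total
        updated = {name: value * scale for name, value in updated.items()}
    return updated


def matchable(activations, threshold):
    """Elements at or above threshold can be matched by an IF side."""
    return {name for name, value in activations.items() if value >= threshold}


memory = {"goal:get-copies": 1.0, "subgoal:remove-original": 0.6}
memory = increment(memory, "menu-option-3", 1.4, cap=2.0)
print(sorted(matchable(memory, threshold=0.5)))
```

In this example, activating a third element pushes the total over
the cap, and the weakest element falls below threshold and can no
longer be matched, which is the sense in which load on working
memory displaces existing contents.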

In  Byrne  and  Bovair’s  (1997)  postcompletion  error  model, 
there  is  a  production  that  increments  the  activation  of  subgoals 
when  the  parent  goal  is  active  and  unsatisfied.  Therefore,  to  use 
the photocopier example, the “get copies” goal supplies activation
to all the unfulfilled subgoals throughout the task; however,
when the “get copies” goal is satisfied, the activation supply
to the subgoals stops. Because the goal to remove the original
is a subgoal of that goal, it loses its activation supply. Thus,
what  the  model  predicts  is  that  the  postcompletion  subgoals  are 
especially vulnerable to working memory load, and lower-capacity
individuals are more “at risk” than higher-capacity individuals.
Byrne and Bovair conducted an experiment to test this
prediction,  and  the  data  supported  the  model. 

This  is  a  good  demonstration  of  the  power  of  cognitive 
architectures.  Byrne  and  Bovair  neither  designed  nor  imple¬ 
mented  the  CAPS  architecture,  but  were  able  to  use  the  theory 
to  construct  a  model  that  made  empirically  testable  predictions, 
and  those  predictions  were  borne  out.  Though  it  seems  unlikely 
that  CAPS  will  guide  future  HCI  work  (since  its  designers  have 
gone  in  a  different  direction),  it  provides  an  excellent  example 
case  for  a  variety  of  reasons. 

Soar 

The development of Soar is generally credited to Allen Newell
(especially,  Newell,  1990),  and  Soar  has  been  used  to  model  a 
wide  variety  of  human  cognitive  activity  from  syllogistic  reason¬ 
ing  (Polk  &  Newell,  1995)  to  flying  combat  aircraft  in  simulated 
war  games  (Jones  et  al.,  1999).  Soar  was  Newell’s  candidate 
“unified  theory  of  cognition”  and  was  the  first  computational 
theory to be offered as such.

While Soar is a production system like CAPS, it is possible to
think  of  Soar  at  a  more  abstract  level.  The  guiding  principle  be¬ 
hind  the  design  of  Soar  is  Principle  P9  from  the  Model  Human 
Processor,  the  Problem  Space  Principle.  Soar  casts  all  cognitive 
activity  as  occurring  in  a  problem  space,  which  consists  of  a  num¬ 
ber  of  states.  States  are  transformed  through  the  applications  of 
operators.  Consider  Soar  playing  a  simple  game  like  tic-tac-toe  as 
player  “X.”  The  problem  space  is  the  set  of  all  states  of  the  tic-tac- 
toe  board — not  a  very  large  space.  The  operators  available  at  any 
given  state  of  that  space  are  placing  an  X  at  any  of  the  available 
open  spaces  on  the  board.  Obviously,  this  is  a  simplified  exam¬ 
ple;  the  problem  space  and  the  available  operators  for  flying  an 
F-16  in  a  simulated  war  game  are  radically  more  complex. 
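
The tic-tac-toe problem space is small enough to write down
directly. The sketch below is not Soar code; it only illustrates
the idea of states and operators, and the board representation
(a tuple of nine squares) and function names are assumptions made
for the example.

```python
EMPTY = " "


def available_operators(board):
    """Operators for player X: the indices of the open squares."""
    return [i for i, mark in enumerate(board) if mark == EMPTY]


def apply_operator(board, square, player="X"):
    """Return the successor state produced by marking one open square."""
    if board[square] != EMPTY:
        raise ValueError("square is already occupied")
    return board[:square] + (player,) + board[square + 1:]


# A state is a tuple of nine squares, read row by row.
state = ("X", "O", EMPTY,
         EMPTY, "O", EMPTY,
         EMPTY, EMPTY, "X")
successors = [apply_operator(state, sq) for sq in available_operators(state)]
print(len(successors))
```

Each operator application yields a new state rather than changing
the old one, so the set of states reachable from any position can
be enumerated by repeated application.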

Soar’s  operation  is  also  cyclic,  but  the  central  cycle  in  Soar’s 
operation  is  called  a  decision  cycle.  Essentially,  on  each  decision 
cycle, Soar answers the question, “What do I do next?” Soar
does this in two phases. First, all productions that match the
current contents of declarative memory fire. This usually causes
changes  in  declarative  memory,  so  other  productions  may  now 
match.  Those  productions  are  allowed  to  fire,  and  this  contin¬ 
ues  until  no  new  productions  fire.  At  this  time,  the  decision  pro¬ 
cedure  is  executed,  in  which  Soar  examines  a  special  kind  of  de¬ 
clarative  memory  element,  the  preference.  Preferences  are 
statements  about  possible  actions,  for  example,  “operator  o3  is 
better than o5 for the current operator,” or “s10 rejected for supergoal
state s7.” Soar examines the available preferences and
selects  an  action.  Thus,  each  decision  cycle  may  contain  many 
production cycles. When modeling human performance, the
convention  is  that  each  decision  cycle  lasts  50  ms,  so  produc¬ 
tions  in  Soar  are  very  low-level,  encapsulating  knowledge  at  a 
small  grain  size. 
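
The two phases of a decision cycle can be sketched as follows.
This is a deliberately simplified illustration, not Soar's actual
preference semantics: productions are plain functions that return
elements to add, only three preference types are handled, and the
tuple encoding and all names are assumptions.

```python
def elaborate(memory, productions):
    """Fire every matching production until no new elements are added."""
    memory = set(memory)
    while True:
        added = set()
        for production in productions:
            added |= production(memory) - memory
        if not added:  # quiescence: nothing new fired
            return memory
        memory |= added


def decide(memory):
    """Pick an operator from preference elements, or None on an impasse."""
    acceptable = {p[1] for p in memory if p[0] == "acceptable"}
    rejected = {p[1] for p in memory if p[0] == "reject"}
    worse = {p[2] for p in memory if p[0] == "better"}
    candidates = acceptable - rejected - worse
    return next(iter(candidates)) if len(candidates) == 1 else None


def decision_cycle(memory, productions):
    memory = elaborate(memory, productions)
    return memory, decide(memory)


def propose(memory):
    """Hypothetical rule: propose two operators in state s1."""
    if ("state", "s1") in memory:
        return {("acceptable", "o3"), ("acceptable", "o5")}
    return set()


def compare(memory):
    """Hypothetical rule: prefer o3 over o5 once both are proposed."""
    if {("acceptable", "o3"), ("acceptable", "o5")} <= memory:
        return {("better", "o3", "o5")}
    return set()


memory, chosen = decision_cycle({("state", "s1")}, [propose, compare])
print(chosen)
```

Here the comparison rule can only match after the proposal rule
has fired, so one decision cycle contains more than one wave of
production firing. Without the comparison rule, `decide` returns
`None`, which corresponds to a situation where no single action
can be selected.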

Other  than  the  ubiquitous  application  of  the  problem  space 
principle, Soar’s most defining characteristics come from two


mechanisms developed specifically in Soar: (a) universal subgoaling
and (b) a general-purpose learning mechanism. Because
the  latter  depends  on  the  former,  universal  subgoaling  will  be 
described  first.  One  of  the  features  of  Soar’s  decision  process 
is that it is not guaranteed to have an unambiguous set of preferences
to work with. Alternatively, there may be no preferences
listing  an  acceptable  action.  Perhaps  the  system  does  not  know 
any  acceptable  operators  for  the  current  state,  or  perhaps  the 
system lacks the knowledge of how to apply the best operator.
Whatever the reason, if the decision procedure is unable to select
an action, an impasse is said to occur. Rather than halting or
entering some kind of failure state, Soar sets up a new state in
a  new  problem  space  with  the  goal  of  resolving  the  impasse.  For 
example,  if  multiple  operators  were  proposed,  the  goal  of  the 
new problem space is to choose between the proposed operators.
In the course of resolving one impasse, Soar may encounter
another  impasse  and  create  another  new  problem  space,  and  so 
on.  As  long  as  the  system  is  provided  with  some  fairly  generic 
knowledge  about  resolving  degenerate  cases  (e.g.,  if  all  else 
fails,  choose  randomly  between  the  two  good  operators),  this 
universal  subgoaling  allows  Soar  to  continue  even  in  cases 
where  there  is  little  knowledge  present  in  the  system. 

Learning in Soar is a by-product of universal subgoaling.
Whenever an impasse is resolved, Soar creates a new production
rule. This rule summarizes the processing that went on in
the  substate.  The  resolution  of  an  impasse  makes  a  change  to 
the  superstate  (the  state  in  which  the  impasse  originally  oc¬ 
curred);  this  change  is  called  a  result.  This  result  becomes  the 
action,  or  THEN,  side  of  the  new  production.  The  condition,  or 
IF,  side  of  the  production  is  generated  through  a  dependency 
analysis  by  looking  at  any  declarative  memory  item  matched  in 
the course of determining this result. When Soar learns, it learns
only  new  production  rules,  and  it  only  learns  as  the  result  of  re¬ 
solving  impasses.  It  is  important  to  realize  that  an  impasse  is  not 
equated  with  failure  or  an  inability  to  proceed  in  the  problem 
solving,  but  may  arise  simply  because,  for  example,  there  are 
multiple  good  actions  to  take  and  Soar  has  to  choose  one  of 
them.  Soar  impasses  regularly  when  problem  solving,  and  thus 
learning  is  pervasive  in  Soar. 
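
The relationship between impasses and learning can be illustrated
with a toy sketch. It departs from Soar in important ways, all of
which are assumptions of the example: the learned rule's condition
is the whole state plus the set of proposals (Soar derives
conditions through dependency analysis), and `resolve` is a
hypothetical stand-in for whatever processing occurs in the
substate.

```python
def select_with_learning(state, proposed, learned, resolve):
    """Choose among proposed operators, learning a rule on each impasse.

    learned maps a condition (state, frozenset of proposals) to a result;
    resolve stands in for the processing that happens in the substate.
    Returns (chosen operator, whether an impasse occurred).
    """
    condition = (state, frozenset(proposed))
    if condition in learned:            # a learned rule fires: no impasse
        return learned[condition], False
    if len(proposed) == 1:              # unambiguous: no impasse
        return next(iter(proposed)), False
    result = resolve(state, proposed)   # impasse: work in a substate
    learned[condition] = result         # new rule: IF condition THEN result
    return result, True


learned_rules = {}
pick_first_alphabetically = lambda state, ops: min(ops)  # placeholder policy

first = select_with_learning("s7", ["o5", "o3"], learned_rules,
                             pick_first_alphabetically)
second = select_with_learning("s7", ["o5", "o3"], learned_rules,
                              pick_first_alphabetically)
print(first, second)
```

The first call encounters a tie, resolves it, and stores a rule;
the second call in the same situation is answered by that rule
without an impasse. The tie here involves two acceptable actions,
not a failure, which mirrors the point made above.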

Not surprisingly, Soar has been applied to a number of learning-oriented
HCI situations. One of the best examples is the recent
work by Altmann (Altmann & John, 1999; Altmann, 2001).
Altmann  collected  data  from  an  experienced  programmer  while 
she  worked  at  understanding  and  updating  a  large  computer 
program  by  examining  a  trace.  These  data  included  verbal  pro¬ 
tocols  (e.g.,  thinking  aloud)  as  well  as  a  log  of  the  actions  taken 
(keystrokes  and  scrolling).  The  programmer  occasionally  scrolled 
back  to  find  a  piece  of  information  that  had  previously  been  dis¬ 
played.  Altmann  constructed  a  Soar  model  of  her  activity.  This 
model  simply  attempts  to  gather  information  about  its  environ¬ 
ment;  it  is  not  a  complete  model  of  the  complex  knowledge  of 
an  experienced  programmer.  The  model  attends  to  various 
pieces  of  the  display,  attempts  to  comprehend  what  it  sees,  and 
then  issues  commands.  “Comprehension”  in  this  model  is  man¬ 
ifested  as  an  attempt  to  retrieve  information  about  the  object 
being comprehended.

When an item is attended, this creates a production that
notes that the object was seen at a particular time. Because
learning is pervasive, Soar creates many new rules like this, but
because of the dependency-based learning mechanism, these
new  productions  are  quite  specific  to  the  context  in  which  the 
impasse  originally  occurred.  Thus,  the  “index”  into  the  model’s 
rather  extensive  episodic  memory  consists  of  very  specific  cues, 
usually  found  on  the  display.  Seeing  a  particular  variable  name  is 
likely  to  trigger  a  memory  for  having  previously  seen  that  vari¬ 
able  name.  Importantly,  this  memory  is  generated  automatically, 
without  need  for  the  model  to  deliberately  set  goals  to  remem¬ 
ber  particular  items. 

Altmann  (2001)  discussed  some  of  the  HCI  ramifications  for 
this  kind  of  always-on  episodic  memory  trace  in  terms  of  display 
clutter.  While  avoiding  display  clutter  is  hardly  new  advice,  it  is 
generally  argued  that  it  should  be  avoided  for  visual  reasons 
(e.g.,  Tullis,  1983).  What  Altmann’s  model  shows,  however,  is 
that  clutter  can  also  have  serious  implications  for  effective  use  of 
episodic  memory.  Clutter  can  create  enormous  demands  for  re¬ 
trieval.  Since  more  objects  will  generally  be  attended  on  a  clut¬ 
tered  display,  the  episodic  trace  will  be  large,  lowering  the  pre¬ 
dictive  validity  for  any  single  cue.  It  is  unlikely  that  kind  of 
analysis  would  have  been  generated  if  it  had  not  been  guided  by 
a  cognitive  architecture  that  provided  the  omnipresent  learning 
of  Soar. 

Soar  has  also  been  used  to  implement  models  of  exploratory 
learning, somewhat in the spirit of LICAI (described later).
There are two prominent models here, (a) one called IDXL (Rieman,
Young, & Howes, 1996) and (b) a related model called
Task-Action  Learner  (Howes  &  Young,  1996).  These  models 
both  attempt  to  learn  unfamiliar  GUI  interfaces.  IDXL  operates 
in  the  domain  of  using  software  to  construct  data  graphs,  while 
the Task-Action Learner starts with even less knowledge and
learns  basic  GUI  operations  such  as  how  to  open  a  file.  For 
brevity,  only  IDXL  will  be  described  in  detail. 

IDXL  goes  through  many  of  the  same  scanning  processes  as 
LICAI,  but  must  rely  on  very  different  mechanisms  for  evalua¬ 
tion  since  Soar  is  fundamentally  different  from  LICAI.  IDXL 
models  evaluation  of  various  display  elements  as  search  through 
multiple  problem  spaces,  one  that  is  an  internal  search  through 
Soar’s  internal  knowledge  and  the  other  a  search  through  the 
display. As items are evaluated in the search, Soar learns productions
that summarize the products of each evaluation. At
first,  search  is  broad  and  shallow,  with  each  item  receiving  a 
minimum  of  elaboration;  however,  that  prior  elaboration  guides 
the  next  round  of  elaboration,  gradually  allowing  IDXL  to  focus 
in  on  the  “best”  items.  This  model  suggests  a  number  of  ways 
in which interface designers could help learners acquire the
knowledge needed to use a new interface. Like the
LICAI  work,  the  IDXL  work  highlights  the  need  for  good  labels 
to  guide  exploration.  A  more  radical  suggestion  is  based  on  one 
of the more subtle behaviors of users and IDXL. When exploring
and evaluating alternatives, long pauses often occur on particular
menu items. During these long pauses, IDXL is attempting
to determine the outcome of selecting the menu item being
considered.  Thus,  one  suggestion  for  speeding  up  learning  of  a 
new  menu-driven  GUI  is  to  detect  such  pauses,  and  show  (in 
some  easily  undoable  way)  what  the  results  of  selecting  that 
item  would  be.  For  instance,  if  choosing  that  item  brings  up  a  di¬ 
alog  box  for  specifying  certain  options,  that  dialog  box  could 
be  shown  in  some  grayed-out  form,  and  would  simply  vanish  if 
the user moved off of that menu item. This would make the
evaluation of the item much more certain and would be an excellent
guide for novice users. This is not unlike ToolTips for
toolbar  icons,  but  on  a  much  larger  scale. 

A  model  that  does  an  excellent  job  of  highlighting  the  power 
of  cognitive  architectures  is  NTD-Soar  (Nelson,  Lehman,  & 
John,  1994).  NTD  stands  for  “NASA  Test  Director,”  who 

...  is  responsible  for  coordinating  many  facets  of  the  testing  and  prepa¬ 
ration  of  the  Space  Shuttle  before  it  is  launched.  He  must  complete  a 
checklist  of  launch  procedures  that,  in  its  current  form,  consists  of  3000 
pages  of  looseleaf  manuals  ...  as  well  as  graphical  timetables  describ¬ 
ing  the  critical  timing  of  particular  launch  events.  To  accomplish  this,  the 
NTD  talks  extensively  with  other  members  of  launch  team  over  a  two- 
way  radio.  ...  In  addition  to  maintaining  a  good  understanding  of  the 
status of the entire launch, the NTD is responsible for coordinating
troubleshooting attempts by managing the communication between
members of the launch team who have the necessary expertise. (p. 658)

Constructing  a  model  that  is  even  able  to  perform  this  task  at 
all is a significant accomplishment. Nelson, Lehman, and John
(1994) not only built such a model, but the model also produced
a timeline of behavior that closely matched the timeline produced
by the actual NTD being modeled; that is, the ultimate result was
a quantitative model of human performance, and an accurate one
at that.

It  is  unlikely  that  such  an  effort  could  have  been  accom¬ 
plished  without  the  use  of  an  integrated  cognitive  architecture. 
The NTD model made extensive use of other Soar models. Nelson,
Lehman, and John (1994) did not have to generate and implement
a theory of natural language understanding to model the
communication  between  the  NTD  and  others,  or  the  NTD  read¬ 
ing  the  pages  in  the  checklist,  because  one  had  already  been 
constructed  in  Soar  (Lewis,  1993).  They  did  not  have  to  con¬ 
struct  a  model  of  visual  attention  to  manage  the  scanning  and  vi¬ 
sual  search  of  those  3000  pages,  because  such  a  model  already 
existed  in  Soar  (Wiesmeyer,  1992).  A  great  deal  of  knowledge 
engineering  still  took  place  in  order  to  understand  and  model 
this  complex  task,  but  using  an  integrated  architecture  greatly 
eased  the  task  of  the  modelers. 

While this modeling effort was not aimed at a particular HCI
problem,  it  is  not  difficult  to  see  how  it  would  be  applicable  to 
one.  If  one  wanted  to  replace  the  3000-page  checklist  with 
something  like  a  personal  digital  assistant  (PDA),  how  could  the 
PDA  be  evaluated?  There  are  very  few  NTDs  in  the  world,  and  it 
is  unlikely  that  they  would  be  able  to  devote  much  time  to  par¬ 
ticipatory  design  or  usability  testing.  Because  an  appropriate 
quantitative  model  of  the  NTD  exists,  however,  it  should  be  pos¬ 
sible  to  give  the  model  a  simulated  PDA  and  assess  the  impact  of 
that  change  on  the  model’s  performance.  Even  if  the  model 
does  not  perfectly  capture  the  effects  of  the  change,  it  is  likely 
that  the  model  would  identify  problem  areas  and  at  least  guide 
the  developers  in  using  any  time  they  have  with  an  actual  NTD. 

Soar  has  also  been  used  as  the  basis  for  simulated  agents  in 
war  games,  as  mentioned  previously  (Jones  et  al.,  1999).  This 
model  (TacAir-Soar)  participates  in  a  virtual  battle  space  in 
which  humans  also  participate.  TacAir-Soar  models  take  on  a 
variety  of  roles  in  this  environment,  from  fighter  pilots  to  heli¬ 
copter  crews  to  refueling  planes.  Their  interactions  are  more 
complex  than  simple  scripted  agents,  and  they  can  interact  with 
humans  in  the  environment  with  English  natural  language.  One 
of the major goals of the project is to make sure that TacAir-Soar
produces  humanlike  behavior,  because  this  is  critical  to  their 
role,  which  is  to  serve  as  part  of  training  scenarios  for  human 
soldiers.  In  large-scale  simulations  with  many  entities,  it  is  much 
cheaper  to  use  computer  agents  than  to  have  humans  fill  every 
role  in  the  simulation.  While  agents  (other  than  the  ubiquitous 
and  generally  disliked  paper  clip)  have  not  widely  penetrated 
the  common  desktop  interface,  this  remains  an  active  HCI 
research  area,  and  future  agents  in  other  roles  could  also  be 
based  on  cognitive  architecture  rather  than  more  engineering- 
oriented  AI  models. 

While  the  Soar  architecture  is  still  being  used  for  applied  AI 
work  such  as  TacAir-Soar,  there  have  been  essentially  no  new 
HCI-oriented  Soar  efforts  in  the  last  several  years.  Nothing,  in 
principle, has precluded such efforts, but for the time being,
Soar is primarily an example of previous achievements using
cognitive  architectures  in  HCI,  as  opposed  to  a  source  of  new 
insight. 


CONTEMPORARY  ARCHITECTURES 


Current  cognitive  architectures  are  being  actively  developed, 
updated,  and  applied  to  HCI-oriented  tasks.  Two  of  the  three 
most prominent systems (EPIC and ACT-R) are production systems,
or rather are centrally built around production systems.
While  all  contain  production  rules,  the  level  of  granularity  of  an 
individual  production  rule  varies  considerably  from  architec¬ 
ture  to  architecture.  Each  one  has  a  different  history  and 
unique  focus,  but  they  share  a  certain  amount  of  intellectual 
history;  in  particular,  they  have  all  been  influenced  one  way  or 
another  by  both  the  MHP  and  each  other.  At  some  level,  they 
may  have  more  similarities  than  differences;  whether  this  is 
because  they  borrow  from  one  another  or  because  the  science 
is  converging  is  still  an  open  question.  The  third  system  is 
somewhat  different  from  these  two  production  system  models 
and  will  be  considered  first. 

LICAI/CoLiDeS

LICAI (Kitajima & Polson, 1997) is a primary example of an HCI-oriented
cognitive architecture not based on the production system
framework. With the exception of the Soar work on exploratory
learning, all of the work discussed up to this point more
or  less  assumes  that  the  modeled  users  are  relatively  skilled  with 
the  specific  interface  being  used;  these  approaches  do  a  poor 
job  of  modeling  relatively  raw  novices.  One  of  the  main  goals  of 
LICAI  is  to  address  this  concern.  The  paradigm  question  ad¬ 
dressed  by  LICAI  is,  “How  do  users  explore  a  new  interface?” 

Unlike the other architectures discussed, LICAI’s central control
mechanisms are not based on a production system. Instead,
LICAI  is  built  around  an  architecture  originally  designed  to 
model human discourse comprehension: construction-integration
(C-I; Kintsch, 1998). Like production systems, C-I’s operation
is cyclic. What happens on those cycles, however, is somewhat
different from what happens in a production system. Each
cycle is divided into two phases: (a) construction and (b) integration
(hence, the name). In the construction phase, an initial
input  (e.g.,  the  contents  of  the  current  display)  is  fed  into  a 
weakly  constrained,  rule-based  process  that  generates  a  net¬ 
work  of  propositions.  Items  in  the  network  are  linked  based  on 
their  argument  overlap.  For  example,  the  goal  of  “graph  data” 
might  be  represented  with  the  proposition  (PERFORM  GRAPH 
DATA). Any proposition containing GRAPH or DATA would thus
be  linked  to  that  goal. 
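
The linking rule described above can be illustrated with a short sketch. This is not the LICAI implementation: the tuple representation of propositions, the function name `link_by_overlap`, the choice to ignore the predicate when comparing, and every proposition other than (PERFORM GRAPH DATA) are assumptions made purely for illustration.

```python
from itertools import combinations

def link_by_overlap(propositions):
    """Return the set of linked pairs: two propositions are linked
    when they share at least one argument. The predicate (first
    element) is ignored here, which is an assumption of this sketch."""
    links = set()
    for p, q in combinations(propositions, 2):
        # p[0] is the predicate; p[1:] are the arguments.
        if set(p[1:]) & set(q[1:]):
            links.add((p, q))
    return links

# Hypothetical network contents; only the goal comes from the text.
goal = ("PERFORM", "GRAPH", "DATA")
props = [
    goal,
    ("MENU-ITEM", "GRAPH"),   # shares GRAPH with the goal
    ("WINDOW", "DATA"),       # shares DATA with the goal
    ("MENU-ITEM", "PRINT"),   # shares no argument with the goal
]

links = link_by_overlap(props)
# The goal is first in the list, so it is always the first member of its pairs.
linked_to_goal = sorted(q for p, q in links if p == goal)
```

In this toy network only the two propositions mentioning GRAPH or DATA end up connected to the goal, which is the property the integration phase later exploits.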

Once  the  construction  phase  is  completed,  the  system  is  left 
with a linked network of propositions. What follows is the integration
phase, in which activation propagates through the network
in a neural-network-like fashion. Essentially, this phase is
a  constraint-satisfaction  phase,  which  is  used  to  select  one  of 
the  propositions  in  the  network  as  the  “preferred”  one.  For  ex¬ 
ample,  the  system  may  need  to  select  the  next  action  to  perform 
while  using  an  interface.  Action  representations  will  be  added  to 
the  network  during  the  construction  phase,  and  an  action  will 
be  selected  during  the  integration  phase.  The  action  will  be  per¬ 
formed,  and  the  next  cycle  initiated.  Various  C-I  models  have 
used  this  basic  process  to  select  things  other  than  actions.  The 
original  C-I  system  used  these  cycles  to  select  between  differ¬ 
ent  interpretations  of  sentences. 
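
A rough sketch of the integration phase follows, assuming a simple scheme in which activation is repeatedly multiplied through the link matrix and rescaled until it stops changing. The link weights, node labels, tolerance, and the function name `integrate` are hypothetical, and the published C-I models may differ in their details.

```python
def integrate(weights, activation, tol=1e-6, max_iter=1000):
    """Spread activation through a symmetric link matrix until it settles.
    After each pass the vector is rescaled so its largest value is 1."""
    n = len(activation)
    for _ in range(max_iter):
        new = [sum(weights[i][j] * activation[j] for j in range(n))
               for i in range(n)]
        peak = max(new)
        if peak > 0:
            new = [x / peak for x in new]
        if max(abs(a - b) for a, b in zip(new, activation)) < tol:
            return new
        activation = new
    return activation

# Hypothetical three-node network: a goal and two candidate actions.
nodes = ["goal", "click Graph menu", "click unlabeled axis"]
weights = [
    [1.0, 1.0, 0.2],   # goal: strong link to the labeled item, weak to the other
    [1.0, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
final = integrate(weights, [1.0, 1.0, 1.0])

# The action with the highest settled activation is the "preferred" one.
actions = nodes[1:]
chosen = max(actions, key=lambda name: final[nodes.index(name)])
```

The well-connected action ends up with most of the activation even though all nodes started out equal, which is the sense in which integration acts as constraint satisfaction.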

LICAI  has  three  main  types  of  cycles:  (a)  one  to  select  ac¬ 
tions,  (b)  one  to  generate  goals,  and  (c)  one  to  select  goals.  This 
contrasts  with  how  most  HCI  tasks  have  been  modeled  in  pro¬ 
duction  system  architectures;  in  such  systems,  the  goals  are  usu¬ 
ally  included  in  the  knowledge  given  to  the  system.  This  is  not 
true  in  LICAI;  in  fact,  the  knowledge  given  to  LICAI  by  the  mod¬ 
elers  is  minimal.  For  the  particular  application  of  LICAI,  which 
was  modeling  users  who  knew  how  to  use  a  Macintosh  for  other 
tasks  (e.g.,  word  processing)  and  were  now  being  asked  to  plot 
some  data  using  a  Macintosh  program  called  CricketGraph  (one 
group of users actually worked with Microsoft Excel), it included
some  very  basic  knowledge  about  the  Macintosh  GUI  and  some 
knowledge  about  graphing.  Rather  than  supply  the  model  with 
the goal hierarchy, Kitajima and Polson (1997) gave the model
the  same  minimal  instructions  that  were  given  to  the  subjects. 
One  of  the  major  jobs  of  the  LICAI  model,  then,  was  to  generate 
the  appropriate  goals  as  they  arose  while  attempting  to  carry 
out  the  instructions. 

Again,  this  illustrates  one  of  the  benefits  of  using  a  cognitive 
architecture to model HCI tasks. Kitajima and Polson (1997) did
not  have  to  develop  a  theory  of  text  comprehension  for  LICAI  to 
be  able  to  comprehend  the  instructions  given  to  subjects;  since 
LICAI  is  based  on  an  architecture  originally  designed  for  text 
comprehension,  they  essentially  got  that  functionality  gratis.  Ad¬ 
ditionally  they  did  not  include  just  any  text  comprehension  en¬ 
gine,  but  one  that  makes  empirically  validated  predictions  about 
how  people  represent  the  text  they  read.  Thus,  the  claim  that 
the  model  started  out  with  roughly  the  same  knowledge  as  the 
users  is  highly  credible. 

The  actual  behavior  of  the  model  is  also  revealing,  as  it  ex¬ 
hibits  many  of  the  same  exploratory  behaviors  seen  in  the  users. 
First, the model pursues a general strategy that can be classified
as label-following (Polson et al., 1992). The model, like the
users,  had  a  strong  tendency  to  examine  anything  on  the  screen 
that  had  a  label  matching  or  nearly  matching  (e.g.,  a  near  syn¬ 
onym)  a  key  word  in  the  task  instructions.  When  the  particular 
subtask  being  pursued  by  the  model  contained  well-labeled 
steps, the users were rapid, which was predicted by the model.
While  this  prediction  is  not  counter-intuitive,  it  is  important 
to  note  that  LICAI  is  not  programmed  with  this  strategy.  This 
strategy  naturally  emerges  through  the  normal  operation  of 
construction-integration  through  the  linkages  created  by  shared 
arguments.  The  perhaps  less-intuitive  result — modeled  suc¬ 
cessfully  by  LICAI — is  the  effect  of  the  number  of  screen  objects. 
During  exploration  in  this  task,  users  were  slower  to  make 
choices  if  there  were  more  objects  on  the  screen,  but  only  if 
those  items  all  had  what  were  classified  as  “poor”  labels.  In  the 
presence  of  “good”  labels  (literal  match  or  near  synonym),  the 
number  of  objects  on  the  screen  did  not  affect  decision  times  of 
either  the  users  or  LICAI. 

The  programmers  who  implemented  the  programs  operated 
by  the  users  put  in  several  clever  direct  manipulation  tricks.  For 
example,  to  change  the  properties  of  a  graph  axis,  one  double¬ 
clicks  on  the  axis  and  a  dialog  box  specifying  the  properties  of 
that  axis  appears.  Microsoft  Excel  has  some  functionality  that  is 
most  easily  accessed  by  drag-and-drop.  Franzke  (1994)  found 
that  in  a  majority  of  first  encounters  with  these  types  of  direct 
manipulations,  users  required  hints  from  the  experimenter  to 
be  able  to  continue,  even  after  two  minutes  of  exploration.  LI¬ 
CAI  also  fails  at  these  interactions  because  no  appropriate  links 
are  formed  between  any  kind  of  task  goal  and  these  unlabeled, 
apparently  static  screen  objects  during  the  construction  phase. 
Thus,  these  screen  objects  tend  to  receive  little  activation  dur¬ 
ing  the  integration  phase,  and  actions  involving  other  objects 
are  always  selected. 

Overall,  LICAI  does  an  excellent  job  of  capturing  many  of  the 
other  empirical  regularities  in  exploratory  learning  of  a  GUI  in¬ 
terface.  This  is  an  important  issue  for  many  interfaces — partic¬ 
ularly,  any  interface  that  is  aimed  at  a  walk-up-and-use  audience. 
While currently common walk-up-and-use interfaces, such as
ATMs,  provide  simple  enough  functionality  that  this  is  not  al¬ 
ways  enormously  difficult,  this  is  not  the  case  for  more  sophis¬ 
ticated  systems,  such  as  many  information  kiosks. 

More  recently,  LICAI  has  been  updated  (and  renamed  to 
CoLiDeS,  for  Comprehension-based  Linked  model  of  Deliberate 
Search; Kitajima, Blackmon, & Polson, 2000) to handle interaction
with web pages. This involves goals that are considerably
less  well  elucidated  and  interfaces  with  a  much  wider  range  of 
semantic  content.  In  order  to  help  deal  with  these  complexities, 
LICAI  has  been  updated  with  a  more  robust  attentional  mecha¬ 
nism  and  a  much  more  sophisticated  notion  of  semantic  simi¬ 
larity  based  on  Latent  Semantic  Analysis  (LSA;  Landauer  &  Du- 
mais,  1997).  LSA  is  a  technique  for  mathematically  representing 
the  meaning  of  words.  While  the  details  of  the  mathematics  are 
complex,  the  underlying  idea  is  straightforward.  To  perform 
LSA,  one  needs  a  large  corpus  of  documents.  An  enormous  ma¬ 
trix  is  created,  showing  which  words  appear  in  which  contexts 
(documents). This matrix is then reduced by a mathematical
technique  called  “singular  value  decomposition”  (a  cousin  of 
factor  analysis)  to  a  smaller  dimensional  space  of,  say,  300  by  300 
dimensions.  Each  word  can  then  be  represented  by  a  vector  in 
this  space.  The  “meaning”  of  a  passage  is  simply  the  sum  of  the 
vectors  of  the  individual  words  in  the  passage.  The  similarity  of 
two  words  (or  two  passages)  can  be  quantified  as  the  cosine 
between  the  vectors  representing  the  words  (or  passages),  with 
larger  numbers  signifying  greater  semantic  similarity. 
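
The last two steps, summing word vectors into a passage vector and comparing vectors by cosine, can be sketched as follows. The three-dimensional word vectors are made up for illustration; a real LSA space would come from a singular value decomposition of a large corpus and would have far more dimensions. The function names are likewise hypothetical.

```python
import math

def passage_vector(words, word_vectors):
    """The "meaning" of a passage: the sum of its word vectors."""
    dims = len(next(iter(word_vectors.values())))
    total = [0.0] * dims
    for w in words:
        vec = word_vectors.get(w)
        if vec is None:
            continue  # words missing from the space are skipped
        total = [t + v for t, v in zip(total, vec)]
    return total

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up vectors standing in for a real LSA space.
word_vectors = {
    "graph": [0.9, 0.1, 0.0],
    "plot":  [0.8, 0.2, 0.1],
    "data":  [0.2, 0.9, 0.1],
    "print": [0.0, 0.1, 0.9],
}

goal = passage_vector(["graph", "data"], word_vectors)
sim_plot = cosine(goal, passage_vector(["plot"], word_vectors))
sim_print = cosine(goal, passage_vector(["print"], word_vectors))
```

With these toy vectors the label "plot" scores as much closer to the goal "graph data" than "print" does, even though neither label literally appears in the goal.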


While  LSA  is  certainly  surrounded  by  a  host  of  mathematical 
and  theoretical  issues  (it  should  be  noted  that  many  similar 
techniques  that  differ  in  a  variety  of  subtle  ways  have  also  been 
developed),  the  key  result  from  an  HCI  point  of  view  is  that  the 
technique  works.  Similarity  measures  generated  by  LSA  gener¬ 
ally  agree  very  well  with  human  judges,  even  for  complex  lan¬ 
guage  understanding  tasks  such  as  grading  essays.  (In  fact,  an 
LSA-based  system  for  grading  essays  and  similar  tasks  has  been 
turned into a successful commercial product; see
http://www.pearsonkt.com.)

Based  on  the  LSA-augmented  CoLiDeS,  Blackmon  and  col¬ 
leagues  (2002,  2003,  2005)  have  developed  a  web  usability  as¬ 
sessment  and  repair  technique  they  call  “Cognitive  Walkthrough 
for the Web” (CWW). CWW is based on the idea that when users
are  exploring  a  large  website,  they  go  through  a  two-stage 
process  of  parsing  and  understanding  the  page,  and  ultimately 
generate  descriptions  of  the  available  actions.  Using  those  de¬ 
scriptions,  they  judge  which  action  is  most  similar  to  their  goal, 
and  then  select  the  link  or  widget  that  corresponds  with  that  ac¬ 
tion.  This  is  somewhat  more  complex  than  simply  applying  LSA 
to  all  of  the  labels  on  the  page  and  comparing  each  one  to  the 
goal;  the  page  is  parsed  into  “regions,”  each  of  which  is  as¬ 
sessed,  and  smaller  bits  of  text  (say,  link  labels)  are  elaborated 
with  a  C-I-based  process.  It  is  those  elaborations,  rather  than  the 
raw  link  labels,  that  are  compared  via  LSA  to  the  description  of 
the  user’s  goal. 

Based  on  this,  CWW  identifies  where  users  with  a  particular 
goal  are  likely  to  have  navigation  problems  and  even  generates 
problem severity estimates. These predictions have been validated
in a series of experiments (Blackmon, Kitajima, & Polson,
2003; Blackmon, Kitajima, & Polson, 2005), and tools which partially
automate the process are available on the web (see
http://autocww.colorado.edu/). This is a significant advance in HCI, as
the  problem  of  predicting  how  people  use  semantic  similarity  to 
guide  search  on  the  web  is  not  a  trivial  one  and,  prior  to  tech¬ 
niques  like  this,  has  generally  required  extensive  iteration  of 
user  tests. 


EPIC 

With  the  possible  exception  of  the  NTD  model,  the  models  dis¬ 
cussed  up  to  this  point  have  been  primarily  “pure”  cognitive 
models — that  is,  the  perception  and  action  parts  of  the  mod¬ 
els  have  been  handled  in  an  indirect,  abstract  way.  These  mod¬ 
els  focus  on  the  cognition  involved,  which  is  not  surprising 
given  the  generally  cognitive  background  of  these  systems; 
however,  even  the  original  formulation  of  the  MHP  included 
processors  for  perception  and  motor  control.  Additionally,  in 
fact,  user  interfaces  have  also  moved  from  having  almost  ex¬ 
clusively  cognitive  demands  (e.g.,  one  had  to  remember  or 
problem-solve  to  generate  command  names)  to  relying  much 
more  heavily  on  perceptual-motor  capabilities.  This  is  one  of 
the  hallmarks  of  the  GUI,  the  shift  to  visual  processing  and  di¬ 
rect  manipulation  rather  than  reliance  on  complex  composi¬ 
tion  of  commands. 

Providing  accurate  quantitative  models  for  this  kind  of  activ¬ 
ity,  however,  requires  a  system  with  detailed  models  of  human 
perceptual  and  motor  capabilities.  This  is  one  of  the  major  foci 
and contributions of EPIC (for executive process interactive
control).  EPIC  is  the  brainchild  of  Kieras  and  Meyer  (see  espe¬ 
cially  1996,  1997;  Kieras,  Wood,  &  Meyer,  1997).  The  overall 
structure  of  the  processors  and  memories  in  EPIC  is  shown  in 
Fig. 5.2. This certainly bears some surface similarity to the MHP,
but  EPIC  is  substantially  more  detailed.  EPIC  was  explicitly  de¬ 
signed  to  pair  high-fidelity  models  of  perception  and  motor 
mechanisms  with  a  production  system.  The  perceptual-motor 
processors  represent  a  new  synthesis  of  the  human  perfor¬ 
mance  literature,  while  the  production  system  is  the  same  one 
used  in  the  CCT  work  discussed  earlier. 

All  EPIC  processors  run  in  parallel  with  one  another.  There¬ 
fore,  while  the  Visual  Processor  is  recognizing  an  object  on  the 
screen,  the  Cognitive  Processor  can  decide  what  word  should 
be  spoken  in  response  to  some  other  input,  while  at  the  same 
time  the  Manual  Motor  processor  is  pressing  a  key.  The  infor¬ 
mation  flow  is  typical  of  traditional  psychological  models,  with 
information  coming  in  through  the  eyes  and  ears,  and  outputs 
coming  from  the  mouth  and  hands.  More  specifically,  what  is 
modeled  in  each  of  the  processors  is  primarily  time  course. 
EPIC’s Visual Processor does not take raw pixels as input and
compute that those pixels represent a letter “A.” Instead, it determines
whether the object on the screen can be seen and at
what  level  of  detail,  as  well  as  how  long  it  will  take  for  a  repre¬ 
sentation  of  that  object  to  be  delivered  to  EPIC’s  declarative 
memory once the letter becomes available to the Visual Processor.
The appearance of the letter can actually cause a number
of  different  elements  to  be  deposited  into  EPIC’s  declarative 
memory  at  different  times.  For  example,  information  about  the 
letter’s  color  will  be  delivered  before  information  about  the  let¬ 
ter’s  identity. 

Similarly,  on  the  motor  side,  EPIC  does  not  simulate  the  com¬ 
putation  of  the  torques  or  forces  needed  to  produce  a  particu¬ 
lar  hand  movement.  Instead,  EPIC  computes  the  time  it  will  take 
for  a  particular  motor  output  to  be  produced  after  the  Cognitive 
Processor  has  requested  it.  This  is  complicated  by  the  fact  that 
movements  happen  in  phases.  Most  importantly,  each  move¬ 
ment  includes  a  preparation  phase  and  an  execution  phase.  The 
time  to  prepare  a  movement  is  dependent  on  the  number  of 
movement  features  that  must  be  prepared  for  each  movement 
and  the  features  of  the  last  movement  prepared.  Features  that 
have  been  prepared  for  the  previous  movement  can  sometimes 
be  reused  to  save  time.  EPIC  can  make  repeated  identical  move¬ 
ments  rapidly  because  no  feature  preparation  time  is  necessary 
if  the  movements  are  identical.  If  they  are  not  identical,  the 
amount  of  savings  is  a  function  of  how  different  the  current 
movement  is  from  the  previous  one.  After  being  prepared,  a 
movement  is  executed.  The  execution  time  for  a  movement  cor¬ 
responds  roughly  to  the  time  it  takes  to  physically  execute  the 
movement; the execution time for aimed movements of the
hands or fingers is governed by Fitts’ Law. EPIC’s motor
processors can only prepare one movement at a time, and can
only execute one movement at a time, but may be preparing
one movement while executing another. Thus, in some tasks, it
may be possible to effectively pipeline movements in order to
generate very rapid sequences of movements.

FIGURE 5.2. Overall structure of the EPIC architecture. Based on Kieras and Meyer (1996).
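
The timing logic for movements can be sketched roughly as below. This is a simplified illustration rather than EPIC's actual code: the per-feature preparation time, the Fitts' Law coefficients, the feature names, and the function names are all assumed values chosen for the example, and the pipelining rule is reduced to "preparation of the next movement overlaps execution of the current one."

```python
import math

FEATURE_PREP_MS = 50.0  # assumed time to prepare one movement feature

def prep_time(features, previous=None):
    """Preparation time: only features that differ from the previously
    prepared movement need to be prepared again."""
    previous = previous or {}
    changed = [k for k, v in features.items() if previous.get(k) != v]
    return FEATURE_PREP_MS * len(changed)

def fitts_time(distance, width, a=0.0, b=100.0):
    """Fitts' Law (Shannon form): a + b * log2(distance / width + 1), in ms.
    The coefficients a and b are illustrative, not fitted values."""
    return a + b * math.log2(distance / width + 1)

def sequence_time(movements, pipelined):
    """Total time for a list of (features, distance, width) movements."""
    total = 0.0
    prev_features, prev_exec = None, 0.0
    for features, distance, width in movements:
        prep = prep_time(features, prev_features)
        exec_ = fitts_time(distance, width)
        if pipelined:
            # Preparation ran during the previous execution; only the part
            # that outlasts that execution adds to the total.
            total += max(0.0, prep - prev_exec) + exec_
        else:
            total += prep + exec_
        prev_features, prev_exec = features, exec_
    return total

# Two hypothetical aimed movements that share the "hand" feature.
moves = [
    ({"hand": "right", "direction": "left"}, 120.0, 20.0),
    ({"hand": "right", "direction": "up"}, 60.0, 20.0),
]
serial = sequence_time(moves, pipelined=False)
piped = sequence_time(moves, pipelined=True)
```

In this example the second movement reuses the "hand" feature, so only one feature must be prepared, and with pipelining even that preparation is hidden behind the first movement's execution.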

EPIC’s Cognitive Processor is a production system, the same
one  that  was  used  for  the  earlier  CCT  work.  One  highly  salient 
feature  of  this  system  is  that  multiple  rules  can  fire  on  a  pro¬ 
duction  cycle.  In  fact,  there  is  no  upper  bound  on  the  number 
of  productions  that  can  fire  on  a  cycle.  Productions  in  this  sys¬ 
tem  are  at  a  much  higher  grain  size  than  productions  in  Soar, 
which  gives  EPIC  a  parallel  quality  at  all  levels — that  is,  all  the 
processors  work  in  parallel,  and  EPIC’s  cognitive  processor  is 
itself  capable  of  parallel  processing. 
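
As a minimal sketch of what "multiple rules can fire on a production cycle" means, the following toy interpreter fires every rule whose conditions are satisfied, with no conflict resolution step that would pick a single winner. The rule format, rule names, and memory items are hypothetical and are not taken from EPIC's production system.

```python
def run_cycle(rules, memory):
    """One cycle: every rule whose conditions are all in memory fires.
    Matching uses memory as it stood at the start of the cycle, so rules
    firing in the same cycle cannot trigger one another."""
    snapshot = frozenset(memory)
    fired = [rule for rule in rules if rule["if"] <= snapshot]
    for rule in fired:
        memory |= rule["add"]  # in-place update of the caller's set
    return [rule["name"] for rule in fired]

# Hypothetical rules for selecting an item on the screen.
rules = [
    {"name": "look-at-target",
     "if": {"goal: select item", "visual: item visible"},
     "add": {"motor: move eyes"}},
    {"name": "prepare-hand",
     "if": {"goal: select item"},
     "add": {"motor: prepare hand"}},
    {"name": "click",
     "if": {"motor: move eyes", "motor: prepare hand"},
     "add": {"motor: click"}},
]

memory = {"goal: select item", "visual: item visible"}
first = run_cycle(rules, memory)    # two rules fire together
second = run_cycle(rules, memory)   # "click" can fire only on the next cycle
```

This sketch omits anything that would stop a rule from firing again on later cycles; it is meant only to show parallel firing within a cycle.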

This  allows  EPIC  particular  leverage  in  multiple-task  situa¬ 
tions.  When  more  than  one  task  is  being  performed,  the  tasks 
can  execute  in  parallel;  however,  many  of  the  perceptual-motor 
processors  are  effectively  serial.  People  only  have  one  set  of  eyes 
that  can  only  be  aimed  at  one  place  at  a  time,  so  if  multiple  tasks 
are  ongoing  and  they  both  require  the  eyes,  there  must  be 
something  that  arbitrates.  In  EPIC,  this  additional  knowledge 
about  how  to  manage  multiple  tasks  is  termed  “executive” 
knowledge,  and  the  productions  that  implement  this  knowl¬ 
edge  execute  in  parallel  with  the  productions  implementing  the 
task knowledge.

Why is all this machinery and extra knowledge necessary? Because
the world of HCI is changing. The GUI forced designers
and  analysts  to  consider  more  seriously  the  perceptual-motor 
constraints,  and  the  propagation  of  computers  with  user  inter¬ 
faces  away  from  the  desktop  and  into  mobile  phones,  kiosks,  au¬ 
tomobiles,  and  many,  many  other  places  creates  a  huge  demand 
on  people’s  ability  to  multitask.  Multiple-task  issues  have  largely 
gone  unmodeled  and  have  been  outside  the  theoretical  scope 
of  most  psychological  accounts  in  HCI,  at  least  before  EPIC. 

While  LICAI  and  Soar  have  not  been  adequately  equipped 
to  deal  with  high-performance  perception  and  action  compo¬ 
nents  of  many  tasks,  EPIC  is  not  equipped  to  handle  some  of  the 
issues  covered  by  other  architectures.  In  particular,  EPIC  does 
not  include  any  learning  mechanisms,  so  it  would  be  difficult 
to  generate  EPIC  models  for  many  of  the  domains  Soar  has  ap¬ 
proached  successfully.  This  is  not  a  fatal  shortcoming,  however, 
as  there  are  a  wide  variety  of  domains  in  which  learning  is  not  a 
key  component  and  where  high-fidelity  modeling  of  perception 
and  action,  along  with  multiple  tasking,  are  central. 

Naturally,  these  are  the  kinds  of  domains  to  which  EPIC  has 
been  applied.  One  of  the  first  major  applications  of  EPIC  was  to 
a  deceptively  simple  dual-task  paradigm  known  as  the  “psycho¬ 
logical  refractory  period”  (or  PRP;  see  Meyer  &  Kieras,  1997). 
In  this  task,  laboratory  subjects  are  typically  confronted  with 
two  choice  reaction  time  tasks,  something  on  the  order  of, 
“Either  a  red  light  or  a  green  light  will  appear.  If  it’s  red,  hit  the 
‘L’  key.  If  it’s  green,  hit  the  ‘J’  key.”  This  sounds  simple,  but  the 
empirical  literature  is  rich  and  shows  a  variety  of  subtle  effects, 
for  which  EPIC  provides  the  first  unified  account.  Critically,  what 
the  EPIC  models  of  these  experiments  show  is  that  people’s 
low-level  strategies  for  scheduling  the  tasks  play  a  large  role  in 
determining  performance  in  this  simple  paradigm. 

EPIC has been used to model several tasks with a more directly
HCI-oriented flavor. One of those tasks is menu selection
(Kieras & Meyer, 1997; Hornof & Kieras, 1997, 1999), but for
brevity, a detailed description of these models will be omitted.
Another  application  of  EPIC  that  definitely  merits  mention  is  the 
model of telephone assistance operators (TAOs), data originally
presented  in  Gray  et  al.  (1993).  When  a  telephone  customer  di¬ 
als  “0,”  a  TAO  is  the  person  who  answers.  The  TAOs  modeled 
here  sat  at  a  “dumb  terminal”  style  workstation  and  assisted  cus¬ 
tomers  in  completing  telephone  calls.  In  particular,  TAOs  de¬ 
termine  how  calls  should  be  billed,  and  this  is  done  by  speak¬ 
ing  to  the  customer.  The  detailed  EPIC  models  (Kieras  et  al., 
1997)  covered  a  subset  of  the  possible  billing  types. 

This provided a good test of EPIC for several reasons. First, the
task was performed under time pressure, and seconds (actually,
milliseconds) counted in task performance. Second, this task is
multimodal. The TAO must speak, type, listen, and look at a display.
Third,  very  fine-grained  performance  data  were  available  to  help 
guide  model  construction.  By  now,  it  should  come  as  no  sur¬ 
prise  to  the  reader  that  it  was  possible  to  construct  an  EPIC 
model  that  did  an  excellent  job  of  modeling  the  time  course  of 
the  TAO’s  interaction  with  the  customer;  however,  this  model¬ 
ing  effort  went  beyond  just  that  and  provided  some  insight  into 
the  knowledge  engineering  problem  facing  modelers  utilizing 
cognitive  architectures  as  well. 

Like  other  production  system  models,  EPIC  provides  a  cer¬ 
tain  amount  of  freedom  to  the  modeler  in  model  construction. 
While  the  architecture  used  provides  certain  kinds  of  con¬ 
straints,  and  these  constraints  are  critical  in  doing  good  science 
and  affecting  the  final  form  of  the  model  (Howes  &  Young, 
1997),  the  modeler  does  have  some  leeway  in  writing  the  pro¬ 
duction  rules  in  the  model.  This  is  true  even  when  the  produc¬ 
tion  rule  model  is  derived  from  another  structured  representa¬ 
tion  such  as  a  GOMS  model,  which  was  the  case  in  the  TAO 
model.  In  EPIC,  it  is  possible  to  write  a  set  of  “aggressive”  pro¬ 
ductions  that  maximize  the  system’s  ability  to  process  things  in 
parallel,  while  it  is  also  possible  to  write  any  number  of  less  ag¬ 
gressive  sets  representing  more  conservative  strategies.  EPIC 
will  produce  a  quantitative  performance  prediction  regardless 
of  the  strategy,  but  which  kind  of  strategy  should  the  modeler 
choose?  There  is  generally  no  a  priori  basis  for  such  a  decision, 
and  it  is  not  clear  that  people  can  accurately  self-report  on  such 
low-level  decisions. 

Kieras  et  al.  (1997)  generated  an  elegant  approach  to  this 
problem,  later  termed  “bracketing”  (Kieras  &  Meyer,  2000).  The 
idea  is  this:  construct  two  models,  one  of  which  is  the  maximally 
aggressive  version.  At  this  end  of  the  strategy  spectrum,  the 
models  contain  very  little  in  the  way  of  cognition.  The  Cognitive 
Processor  does  virtually  no  deliberation  and  spends  most  of  its 
cycles  simply  reading  off  perceptual  inputs  and  immediately 
generating  the  appropriate  motor  output.  This  represents  the 
“super-expert”  whose  performance  is  limited  almost  entirely  by 
the  rate  of  information  flow  through  the  perceptual-motor  sys¬ 
tems.  At  the  other  end  of  the  spectrum,  a  model  incorporating 
the  slowest  reasonable  strategy  is  produced.  The  slowest  rea¬ 
sonable  strategy  is  one  where  the  basic  task  requirements  are 
met,  but  with  no  strategic  effort  made  to  optimize  scheduling  to 
produce  rapid  performance.  The  idea  is  that  observed  perfor¬ 
mance  should  fall  somewhere  in  between  these  two  extremes, 
hence,  the  data  should  be  bracketed  by  the  two  versions  of  the 
model.  Different  users  will  tend  to  perform  at  different  ends  of 
this  range  for  different  tasks,  so  this  is  an  excellent  way  to 
accommodate  some  of  the  individual  differences  that  are  always 
observed  in  real  users. 
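
The bracketing idea can be sketched in a few lines of code. This is only an illustration of the logic described above, not the EPIC implementation; the class name, method names, and the timing numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bracket:
    """Predicted task times (seconds) from the two bracketing models."""
    fastest_possible: float      # maximally aggressive strategy
    slowest_reasonable: float    # no strategic scheduling effort

    def contains(self, observed: float) -> bool:
        """True if the observed time falls between the two predictions."""
        return self.fastest_possible <= observed <= self.slowest_reasonable

    def relative_position(self, observed: float) -> float:
        """0.0 = at the fastest-possible bound, 1.0 = at the slowest-reasonable bound."""
        width = self.slowest_reasonable - self.fastest_possible
        if width <= 0:
            raise ValueError("slowest_reasonable must exceed fastest_possible")
        return (observed - self.fastest_possible) / width


# Hypothetical numbers, for illustration only.
bracket = Bracket(fastest_possible=4.0, slowest_reasonable=8.0)
print(bracket.contains(5.0))           # True
print(bracket.relative_position(5.0))  # 0.25
```

A user whose relative position is near 0.0 is behaving like the "super-expert"; one near 1.0 is meeting the task requirements without optimizing the schedule.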

What  was  discovered  by  applying  this  bracketing  procedure 
to  the  TAO  models  was  surprising.  Despite  the  fact  that  the 
TAOs  were  under  considerable  time  pressure  and  were  ex¬ 
tremely  well  practiced  experts,  their  performance  rarely  ap¬ 
proached  the  fastest  possible  model.  In  fact,  their  performance 
most  closely  matched  a  version  of  the  model  termed  the  “hier¬ 
archical  motor-parallel  model.”  In  this  version  of  the  model,  eye, 
hand,  and  vocal  movements  are  executed  in  parallel  with  one 
another  when  possible;  furthermore,  the  motor  processor  is 
used  somewhat  aggressively,  preparing  the  next  movement 
while  the  current  movement  is  in  progress.  The  primary  place 
where  EPIC  could  be  faster  but  the  data  indicated  the  TAOs 
were  not  was  in  the  description  of  the  task  knowledge.  It  is  pos¬ 
sible  to  represent  the  knowledge  for  this  task  as  one  single,  flat 
GOMS  method  with  no  use  of  subgoals.  On  the  other  hand,  the 
EPIC  productions  could  represent  the  full  subgoal  structure  of 
a  more  traditional  GOMS  model.  Retaining  the  hierarchical 
representation  (thus  incurring  time  costs  for  goal  management) 
provided  the  best  fit  to  the  TAOs’  performance.  This  provides 
solid  evidence  for  the  psychological  reality  of  the  hierarchical 
control  structure  inherent  in  GOMS  analysis,  since  even  well- 
practiced  experts  in  fairly  regular  domains  do  not  abandon  it  for 
some  kind  of  faster  knowledge  structure. 

The  final  EPIC  model  that  will  be  considered  is  the  model  of 
the  task  first  presented  in  Ballas,  Heitmeyer,  and  Perez  (1992). 
Again,  this  model  first  appeared  in  Kieras  and  Meyer  (1997),  but 
a  richer  version  of  the  model  is  described  in  more  detail  later, 
in  Kieras,  Meyer,  and  Ballas  (2001).  The  display  used  is  a  split 
screen.  On  the  right  half  of  the  display,  the  user  was  confronted 
with  a  manual  tracking  task,  which  was  performed  using  a  joy¬ 
stick.  The  left  half  of  the  display  was  a  tactical  task  in  which  the 
user  had  to  classify  objects  as  hostile  or  neutral  based  on  their 
behavior.  There  were  two  versions  of  the  interface  to  the  tactical 
task,  one  a  command-line  style  interface  using  a  keypad  and  one 
a  direct-manipulation-style  interface  using  a  touch  screen.  The 
performance  measure  of  interest  in  this  task  is  the  time  taken  to 
respond  to  events  (such  as  the  appearance  of  a  new  object  or  a 
change  in  state  of  an  object)  on  the  tactical  display. 

This  is  again  a  task  well  suited  to  EPIC  because  the  perceptual- 
motor  demands  are  extensive.  This  is  not,  however,  what  makes 
this  task  so  interesting.  What  is  most  interesting  is  the  human 
performance  data:  in  some  cases,  the  keypad  interface  was 
faster  than  the  touch  screen  interface,  and  in  many  cases,  the 
two  yielded  almost  identical  performance,  while  in  some  other 
cases,  the  touch  screen  was  faster.  Thus,  general  claims  about 
the  superiority  of  GUIs  do  not  apply  to  this  case  and  a  more  pre¬ 
cise  and  detailed  account  is  necessary. 

EPIC  provides  just  the  tools  necessary  to  do  this.  Two  models 
were  constructed  for  each  interface,  again  using  the  bracketing 
approach.  The  results  were  revealing.  In  fact,  the  fastest  possible 
models  showed  no  performance  advantage  for  either  interface. 
The  apparent  direct-manipulation  advantage  of  the  touch  screen 
for  initial  target  selection  was  almost  perfectly  offset  by  some 
type-ahead  advantages  for  the  keypad.  The  reason  for  the  in¬ 
consistent  results  is  that  the  users  generally  did  not  operate  at  the 
speed  of  the  fastest  possible  model;  they  tended  to  work  some¬ 
where  in  between  the  brackets  for  both  interfaces.  However,  they 
tended  to  work  more  toward  the  upper  (slowest-reasonable) 
bracket  for  the  touch  screen  interface.  This  suggests  an  advan¬ 
tage  for  the  keypad  interface,  but  the  caveat  is  that  the  slowest 
reasonable  performance  bound  for  the  touch  screen  was  faster 
than  the  slowest  possible  for  the  keypad.  Thus,  any  strategy 
changes  made  by  users  in  the  course  of  doing  the  task,  perhaps 
as  a  dynamic  response  to  changes  in  workload,  could  affect 
which  interface  would  be  superior  at  any  particular  point  in  the 
task.  Thus,  results  about  which  interface  is  faster  are  likely  to 
be  inconsistent — which  is  exactly  what  was  found. 
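
A small sketch may make this argument concrete. The bracket values below are invented purely to reproduce the qualitative pattern described above (equal fastest-possible bounds, with the touch screen having the lower slowest-reasonable bound); they are not the published model predictions, and the function name is hypothetical.

```python
def faster_interface(keypad, touch, keypad_pos, touch_pos):
    """Return which interface is faster given where the user sits in each bracket.

    keypad, touch: (fastest_possible, slowest_reasonable) times in seconds.
    keypad_pos, touch_pos: position within the bracket, 0.0 (fastest) to 1.0 (slowest).
    """
    def time_at(bracket, pos):
        fastest, slowest = bracket
        return fastest + pos * (slowest - fastest)

    k, t = time_at(keypad, keypad_pos), time_at(touch, touch_pos)
    if k == t:
        return "tie"
    return "keypad" if k < t else "touch screen"


# Hypothetical brackets, for illustration only.
keypad = (2.0, 4.0)
touch = (2.0, 3.5)

print(faster_interface(keypad, touch, 0.25, 0.75))  # keypad
print(faster_interface(keypad, touch, 1.0, 1.0))    # touch screen
print(faster_interface(keypad, touch, 0.0, 0.0))    # tie
```

Because the answer flips as the user's position within each bracket shifts, a strategy change made in response to workload is enough to reverse the apparent winner.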

This  kind  of  analysis  would  be  impossible  to  conduct  without 
a  clear,  quantitative  human  performance  model.  Constructing 
and  applying  such  a  model  also  suggested  an  alternative  inter¬ 
face  that  would  almost  certainly  be  faster  than  either,  which  is 
one  using  a  speech-driven  interface.  One  of  the  major  perfor¬ 
mance  bottlenecks  in  the  task  was  the  hands,  and  so  in  this  case, 
voice-based  interaction  should  be  faster.  Again,  this  could  only 
be  clearly  identified  with  the  kind  of  precise  quantitative  mod¬ 
eling  enabled  by  something  like  the  EPIC  architecture. 

ACT-R  5.0 

ACT-R  5.0  (Anderson  et  al.,  2004)  represents  another  approach 
to  a  fully  unified  cognitive  architecture,  combining  a  very  broad 
model  of  cognition  with  rich  perceptual-motor  capabilities.  ACT-R 
5.0  is  the  most  recent  iteration  of  the  ACT-R  cognitive  architec¬ 
ture  (introduced  in  Anderson,  1993).  ACT-R  has  a  long  history 
within  cognitive  psychology,  as  various  versions  of  the  theory 
have  been  developed  over  the  years.  In  general,  the  ACT  family 
of  theories  has  been  concerned  with  modeling  the  results  of  psy¬ 
chology  experiments,  and  this  is  certainly  true  of  the  current  in¬ 
carnation,  ACT-R.  Anderson  and  Lebiere  (1998)  show  some  of 
this  range,  covering  areas  as  diverse  as  list  memory  (chapter  7), 
choice  (chapter  8),  and  scientific  reasoning  (chapter  11). 

ACT-R  is,  like  EPIC  and  Soar,  a  production  system  with  activ¬ 
ity  centered  on  the  production  cycle,  which  is  also  set  at  50  ms 
in  duration;  however,  there  are  many  differences  between  ACT-R 
and  the  other  architectures.  First,  ACT-R  can  only  fire  one 
production  rule  per  cycle.  When  multiple  production  rules  match 
on  a  cycle,  an  arbitration  procedure  called  conflict  resolution 
comes  into  play.  Second,  ACT-R  has  a  well-developed  theory  of 
declarative  memory.  Unlike  EPIC  and  Soar,  declarative  memory 
elements  in  ACT-R  are  not  simply  symbols.  Each  declarative  el¬ 
ement  in  ACT-R  also  has  an  activation  value  associated  with  it, 
which  determines  whether  and  how  rapidly  it  may  be  accessed. 
Third,  ACT-R  contains  learning  mechanisms,  but  is  not  a  perva¬ 
sive  learning  system  in  the  same  sense  as  Soar.  These  mecha¬ 
nisms  are  based  on  a  “rational  analysis”  (Anderson,  1990)  of  the 
information  needs  of  an  adaptive  cognitive  system. 

For  an  example  of  rational  analysis,  consider  conflict  resolution. 
Each  production  in  ACT-R  has  associated  with  it  several  numeric 
parameters,  including  numbers  which  represent  the  probabil¬ 
ity  that  if  the  production  fires,  the  goal  will  be  reached  and  the 
cost,  in  time,  that  will  be  incurred  if  the  production  fires.  These 
values  are  combined  according  to  a  formula  that  trades  off  prob¬ 
ability  of  success  vs.  cost  and  produces  an  “expected  gain”  for 
each  production.  The  matching  production  with  the  highest 
expected  gain  is  the  one  that  gets  to  fire  when  conflict  resolution 
is  invoked.  The  expected  gain  values  are  noisy,  so  the  system’s 
behavior  is  somewhat  stochastic,  and  the  probability  and 
cost  values  are  learned  over  time  so  that  ACT-R  can  adapt  to 
changing  environments. 
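
The conflict resolution scheme described above can be sketched as follows. The text does not give the formula; the form E = PG - C (with G the value of the goal in seconds) is the one usually given in the ACT-R literature of this period and is used here as an assumption, as are the default G of 20, the logistic noise, and all names and parameter values. Learning of P and C is omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Production:
    name: str
    p_success: float   # P: estimated probability the goal is reached if this fires
    cost: float        # C: estimated time cost in seconds


def expected_gain(prod, goal_value, noise_s=0.0, rng=random):
    """E = P*G - C, plus optional zero-mean logistic noise with scale noise_s."""
    gain = prod.p_success * goal_value - prod.cost
    if noise_s > 0.0:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        gain += noise_s * math.log(u / (1.0 - u))
    return gain


def resolve_conflict(matching, goal_value=20.0, noise_s=0.0, rng=random):
    """Return the matching production with the highest (noisy) expected gain."""
    return max(matching, key=lambda p: expected_gain(p, goal_value, noise_s, rng))


# Hypothetical productions that both match on the current cycle.
careful = Production("careful", p_success=0.95, cost=2.0)   # E = 17.0
fast = Production("fast", p_success=0.80, cost=0.5)         # E = 15.5
print(resolve_conflict([careful, fast]).name)               # careful
```

With `noise_s` above zero, the lower-gain production will occasionally win, which is the source of the stochastic behavior mentioned above.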

Similarly,  the  activation  of  elements  in  declarative  memory 
is  based  on  a  Bayesian  analysis  of  the  probability  that  a  declara¬ 
tive  memory  element  will  be  needed  at  a  particular  time.  This 
is  a  function  of  the  general  utility  of  that  element,  reflected  in 
what  is  termed  its  “base-level”  activation,  and  that  element’s  as¬ 
sociation  with  the  current  context.  The  more  frequently  and  re¬ 
cently  an  element  has  been  accessed,  the  higher  its  base-level 
activation  will  be,  and  thus  the  easier  it  is  to  retrieve.  This  value 
changes  over  time  according  to  the  frequency  and  recency  of 
use;  that  is,  the  value  is  learned.  These  kinds  of  mechanisms  have 
helped  enable  ACT-R  to  successfully  model  a  wide  range  of  cog¬ 
nitive  phenomena. 
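
The effect of frequency and recency on base-level activation can be illustrated with a short sketch. The equations are not stated in the text; the forms used here (base-level activation as the log of summed power-law decays, with decay 0.5, and retrieval latency as F times exp(-A)) are the ones commonly given for ACT-R and should be read as assumptions. The contribution of the current context is left out, and the function names and time stamps are hypothetical.

```python
import math


def base_level_activation(use_times, now, decay=0.5):
    """B = ln( sum_j (now - t_j) ** -decay ) over all past uses t_j < now."""
    ages = [now - t for t in use_times if t < now]
    if not ages:
        raise ValueError("element has no uses before 'now'")
    return math.log(sum(age ** -decay for age in ages))


def retrieval_latency(activation, latency_factor=1.0):
    """Time to retrieve: F * exp(-A); higher activation means faster retrieval."""
    return latency_factor * math.exp(-activation)


# Hypothetical access histories (times in seconds).
rarely_used = base_level_activation([0.0], now=100.0)
often_used = base_level_activation([0.0, 50.0, 90.0, 99.0], now=100.0)

print(often_used > rarely_used)                                        # True
print(retrieval_latency(often_used) < retrieval_latency(rarely_used))  # True
```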

The  current  version  of  the  theory,  5.0,  incorporates  several 
important  changes  over  previous  versions  of  the  theory.  First, 
the  architecture  is  now  a  fully  modular  architecture,  with  fully 
separate  and  independent  modules  for  different  kinds  of  information 
processing  (such  as  declarative  memory  and  vision). 
Second,  ACT-R  5.0  no  longer  contains  a  “goal  stack”  for  auto¬ 
mated  goal  management;  goals  are  now  simply  items  in  declar¬ 
ative  memory  and  subject  to  processes  like  learning  and  decay. 
Third,  the  new  modular  architecture  allows  processing  in  ACT-R 
to  be  mapped  onto  the  human  brain,  with  different  modules 
identified  with  different  brain  regions.  Finally,  a  production  rule 
learning  mechanism  has  been  fully  incorporated,  so  learning  is 
now  more  pervasive  in  ACT-R.  The  5.0  version  of  the  theory  sub¬ 
sumes  all  previous  versions  of  the  theory  including  ACT-R/PM, 
which  is  now  considered  obsolete. 

In  ACT-R  5.0,  the  basic  production  system  is  augmented  with 
four  EPIC-like  peripheral  modules,  as  well  as  a  goal  module  and 
a  declarative  memory  module,  as  depicted  in  Fig.  5.3.  Like  EPIC, 
all  of  these  modules  run  in  parallel  with  one  another,  giving 
ACT-R  the  ability  to  overlap  processing.  The  peripheral  modules 
come  from  a  variety  of  sources.  ACT-R’s  Vision  Module  is  based 
on  the  ACT-R  Visual  Interface  described  in  Anderson,  Matessa, 
and  Lebiere  (1997).  This  is  a  feature-based  attentional  visual 
system,  but  does  not  explicitly  model  eye  movements.  Recently, 
the  Vision  Module  has  been  extended  to  include  an  eye-move¬ 
ment  model  (Salvucci,  2001a)  as  well.  The  Motor  Module  is 
nearly  identical  to  the  Manual  Motor  Processor  in  EPIC  and  is 
based  directly  on  the  specification  found  in  Kieras  and  Meyer 
(1996),  and  the  Speech  Module  is  similarly  derived  from  EPIC. 
The  Audition  Module  is  a  hybrid  of  the  auditory  system  found  in 
EPIC  and  the  attentional  system  in  ACT-R’s  Vision  Module. 

One  other  important  property  of  ACT-R  is  that  it  is  possible 
to  have  models  interact  with  the  same  software  as  the  human 
users  being  modeled.  The  software  development  conditions  are 
fairly  restrictive,  but  if  these  conditions  are  met,  then  both  the 
user  and  the  model  are  forced  to  use  the  same  software.  This  re¬ 
duces  the  number  of  degrees  of  freedom  available  to  the  mod¬ 
eler  in  that  it  becomes  impossible  to  force  any  unpleasant  mod¬ 
eling  details  into  the  model  of  the  user  interface,  because  there 
is  no  model  of  the  user  interface.  More  will  be  said  about  this 
issue  in  the  next  section. 

ACT-R  has  been  applied  to  numerous  HCI  domains.  The  first 
example  comes  from  the  dissertation  work  of  Ehret  (1999). 


FIGURE  5.3.  Overall  structure  of  the  ACT-R  5.0  architecture. 


Among  other  things,  Ehret  developed  an  ACT-R/PM  model  of  a 
fairly  simple,  but  subtle,  experiment.  In  that  experiment,  sub¬ 
jects  were  shown  a  target  color  and  asked  to  click  on  a  button 
that  would  yield  that  color.  The  buttons  themselves  had  four 
types:  (a)  blank,  (b)  arbitrary  icon,  (c)  text  label,  and  (d)  color. 
In  the  color  condition,  the  task  was  simple:  users  simply  found 
the  color  that  matched  the  target  and  then  clicked  the  button. 
In  the  text  label  condition,  the  task  was  only  slightly  more  diffi¬ 
cult:  users  could  read  the  labels  on  the  buttons  and  select  the 
correct  one  because  the  description  matched  the  color.  In  the 
arbitrary  icon  condition,  more  or  less  random  pictures  appeared 
on  each  icon  (e.g.,  a  mailbox).  Users  had  to  either  memorize  the 
picture-to-color  mapping,  which  they  had  to  discover  by  trial 
and  error,  or  memorize  the  location  of  each  color,  since  the  but¬ 
tons  did  not  change  their  functions  over  time.  In  the  hardest 
condition,  the  blank  condition,  users  simply  had  to  memorize 
the  mapping  between  button  location  and  color,  which  they  had 
to  discover  through  trial  and  error. 

Clearly,  the  conditions  will  produce  different  average  re¬ 
sponse  times,  and  what  Ehret  (1999)  found  is  that  they  also  pro¬ 
duced  somewhat  different  learning  curves  over  time.  Ehret 
added  an  additional  manipulation  as  well:  after  users  performed 
the  task  for  some  time,  all  labeling  was  removed.  Not  surpris¬ 
ingly,  the  amount  of  disruption  was  different  in  the  different  con¬ 
ditions,  reflecting  the  amount  of  incidental  location  learning  that 
went  on  as  subjects  performed  the  task.  The  ACT-R  model  that 
Ehret  constructed  did  an  excellent  job  of  explaining  the  results. 
This  model  represented  the  screen  with  the  built-in  visual  mech¬ 
anisms  from  ACT-R  and  learned  the  mappings  between  color  and 
location  via  ACT-R’s  standard  associative  learning  mechanisms. 
The  initial  difference  between  the  various  conditions  was  reproduced, 
as  were  the  four  learning  curves.  The  model  also  suffered 
disruptions  similar  to  those  suffered  by  the  human  users  when 
the  labels  were  removed.  This  model  is  an  excellent  demonstration 
of  the  power  of  ACT-R,  exercising  both  the  perceptual-motor 
capabilities  of  the  system  and  the  graded  learning  in 
ACT-R’s  rational-analysis  driven  mechanisms. 

Salvucci  (2001b)  described  an  ACT-R  model  that  tests  ACT- 
R’s  ability  to  handle  multimodal,  high-performance  situations  in 
a  very  compelling  task:  this  model  drives  an  automobile  driving 
simulator.  This  is  not  a  robotics  project;  the  ACT-R  model  does 
not  actually  turn  the  steering  wheel  or  manipulate  the  pedals; 
rather,  it  communicates  with  the  automobile  simulation  soft¬ 
ware.  The  model’s  primary  job  is  to  maintain  lane  position  as 
the  car  drives  down  the  road.  Salvucci  (2001b)  added  an  addi¬ 
tional  task  that  makes  it  particularly  interesting:  the  model  di¬ 
als  telephone  numbers  on  a  variety  of  mobile  phone  interfaces. 
Two  factors  were  crossed:  (a)  whether  the  telephone  was  di¬ 
aled  manually  via  keypad  vs.  dialed  by  voice,  and  (b)  whether 
the  full  telephone  number  needed  to  be  dialed  vs.  a  shortened 
“speed  dial”  system.  The  model  was  also  validated  by  compari¬ 
son  with  data  from  human  users. 

Both  the  model  and  the  human  users  showed  that  dialing 
while  not  driving  is  faster  than  dialing  while  driving,  and  that 
steering  performance  can  be  disrupted  by  telephone  dialing. 
Not  surprisingly,  the  most  disruptive  interface  was  the  “full- 
manual”  interface,  where  the  full  phone  numbers  were  dialed 
on  a  keypad.  This  is  due  largely  to  the  fact  that  dialing  with  the 
keypad  requires  visual  guidance,  causing  the  model  (and  the 
users)  to  take  their  eyes  off  the  road.  There  was  very  little  dis¬ 
ruption  associated  with  the  voice  interfaces,  regardless  of 
whether  full  numbers  or  speed-dial  was  used. 

This  is  a  powerful  illustration  of  the  value  of  cognitive 
architectures  for  a  number  of  reasons.  First,  the  basic  driving 
model  could  simply  be  reused  for  this  task;  it  did  not  have  to  be 
reimplemented.  Second,  the  model  provides  an  excellent 
quantitative  fit  to  the  human  data.  Third,  this  is  an  excellent 
example  of  a  situation  where  testing  human  users  can  be  diffi¬ 
cult.  Testing  human  drivers  with  interfaces  that  degrade  driving 
performance  is  dangerous,  so  simulators  are  generally  used  for 
this  kind  of  evaluation;  however,  maintaining  a  driving  simula¬ 
tor  requires  a  great  deal  of  space  and  is  quite  expensive.  If 
someone  wanted  to  test  another  variant  of  the  telephone  in¬ 
terface,  it  would  be  much  faster  and  cheaper  to  give  that  inter¬ 
face  to  Salvucci’s  (2001b)  model  than  it  would  be  to  recruit  and 
test  human  drivers. 

One  of  the  long-standing  criticisms  of  cognitive  architectures 
in  HCI  is  that  they  are  theoretical  tools  designed  to  support  re¬ 
search  and  are  too  difficult  for  the  average  practitioner  to  use 
in  a  more  applied  context.  This  criticism  certainly  holds  some 
truth  (though  how  much  may  be  open  to  debate);  however, 
some  researchers  are  actively  trying  to  address  it.  One  of  the 
most  interesting  developments  in  the  cognitive  architecture 
arena  in  the  past  few  years  has  been  the  work  of  John  and  colleagues 
(2004)  on  something  they  term  “CogTool”  (see 
http://www.cogtool.org).  CogTool  allows  interface  designers  to  mock 
up  an  interface  in  HTML,  demonstrate  the  tasks  to  be  per¬ 
formed  in  a  GUI  environment,  and  have  an  ACT-R  based  cogni¬ 
tive  model  of  the  tasks  automatically  synthesized.  Alternative 
interfaces  can  be  mocked  up,  and  predicted  performance  com¬ 
pared,  giving  the  designer  important  information  about  how 
changes  in  the  interface  may  impact  user  performance.  This  is 
not  limited  strictly  to  prediction  of  time  of  execution,  either,  as 
the  full  model  output — a  complete  trace  of  the  predicted  user 
behavior — is  also  available  for  inspection. 

This  is  an  important  advance,  as  it  makes  the  power  of  com¬ 
plete  and  detailed  computational  cognitive  modeling  available 
to  people  with  no  experience  with  cognitive  architectures.  It 
also  has  important  ramifications  for  the  methodology  of  model 
building,  since  models  generated  by  CogTool  are  not  hand- 
tweaked  in  an  attempt  to  optimize  fit  to  data;  in  fact,  the  intent 
is  to  make  predictions  about  nonexistent  data.  Some  validation, 
however,  has  been  performed,  and  early  explorations  with  Cog¬ 
Tool  have  found  the  time  predictions  to  be  quite  accurate.  As 
exciting  a  development  as  this  is,  it  is  important  to  note  that 
there  are  clearly  limitations.  The  language  of  available  actions 
is  based  on  the  keystroke-level  model  (KLM)  of  Card,  Moran, 
and  Newell  (1983)  and  is  thus  somewhat  limited.  In  particular, 
it  does  not  provide  a  model  for  any  kind  of  cognition  more  complex 
than  very  simple  decisions  about  where  to  point.  For  example, 
if  one  wanted  to  model  use  of  an  ATM,  CogTool  would 
not  model  the  process  of  the  user  trying  to  retrieve  a  security 
code  from  memory.  (However,  if  the  user  was  highly  practiced 
such  that  this  retrieval  is  very  rapid  and  always  successful,  then 
CogTool  could  be  appropriate.)  Efforts  are  underway  to  make 
automatic  translation  of  slightly  more  complex  GOMS-style 
models  into  ACT-R  models  possible  (St.  Amant,  Freed,  &  Ritter, 
2005),  but  that  still  would  not  address  the  more  complex  cog¬ 
nition  required  by  many  interfaces.  Nonetheless,  CogTool  rep¬ 
resents  a  significant  advance  in  making  architecture-based  mod¬ 
eling  accessible  to  those  who  are  not  experts  in  the  field. 
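
To give a feel for the keystroke-level action language mentioned above, here is a minimal KLM-style calculation. It is not CogTool and does not use CogTool's parameters; the operator durations are approximate values commonly cited from Card, Moran, and Newell's work, and the two operator sequences are hypothetical designs invented for the example.

```python
# Approximate, commonly cited KLM operator durations in seconds (assumed values).
KLM_SECONDS = {
    "K": 0.28,  # keystroke or button press
    "P": 1.10,  # point with a mouse to a target
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_time(operators):
    """Sum operator durations for a string such as 'MHPK'."""
    return sum(KLM_SECONDS[op] for op in operators)


# Two hypothetical designs for the same task.
menu_design = "HMPKMPK"   # home to mouse, then two think-point-click steps
shortcut_design = "MKK"   # think, then a two-key shortcut

print(f"{klm_time(menu_design):.2f}")      # 5.86
print(f"{klm_time(shortcut_design):.2f}")  # 1.91
```

Note what is absent: there is no operator for retrieving a security code from memory or for any other nontrivial cognition, which is exactly the limitation described above.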

The  last  application  of  ACT-R  in  HCI  to  be  covered  in  de¬ 
tail  is  based  on  Pirolli  and  Card’s  (1999)  theory  of  information 
foraging.  This  theory  is  based  on  the  idea  that  people  forage 
for  information  in  much  the  same  way  that  animals  forage  for 
food.  The  literature  on  the  latter  type  of  foraging  is  quite  rich, 
including  equations  that  describe  when  foragers  will  give  up 
on  a  particular  patch  of  information  and  move  on  to  the  next 
one.  Foraging  for  food  is  often  guided  by  proximal  cues  (e.g., 
scent)  that  indicate  the  presence  of  distal  content  (e.g.,  food); 
analogously,  information  foragers  may  use  proximal  cues  (e.g., 
visible  link  labels)  to  guide  their  search  for  information  that 
they  cannot  presently  access.  In  the  spirit  of  the  analogy,  Pirolli 
and  Card  term  these  proximal  cues  “information  scent.”  That 
such  a  theory  should  be  applicable  to  the  web  should  be  fairly 
obvious. 

This  theory  has  spawned  a  number  of  models  based  on 
modified  versions  of  ACT-R.  The  original  model  from  Pirolli  and 
Card  (1999)  was  called  ACT-IF  (for  “information  foraging”). 
Later  versions  were  SNIF-ACT  1.0  (Card  et  al.,  2001)  and  SNIF- 
ACT  2.0  (Pirolli  &  Fu,  2003),  where  SNIF  stands  for  “scent-based 
navigation  and  information  foraging.”  These  models  use  a  mod¬ 
ified  form  of  ACT-R’s  spreading  activation  to  compute  informa¬ 
tion  scent,  which  drives  the  model’s  choices  as  it  browses 
through  web  pages.  SNIF-ACT  1.0  and  later  also  use  an  altered 
version  of  ACT-R’s  conflict  resolution  mechanism  to  determine 
when  to  abandon  the  current  website  (or  information  “patch”) 
and  move  on  to  a  new  site.  SNIF-ACT  2.0  adds  a  further  im¬ 
provement,  which  is  the  use  of  a  new  measure  to  compute  the 
strengths  of  association  (and  therefore  spreading  activation) 
between  words.  This  new  measure,  termed  “pointwise  mutual 
information”  (PMI),  is  much  easier  to  compute  than  LSA,  but 
generally  produces  similar  results.  SNIF-ACT  2.0  was  used  to 
predict  link  selection  frequency  in  a  large  (N  =  244)  Web  be¬ 
havior  study  with  impressive  results,  accounting  for  72%  of 
the  selection  variance  on  one  site  and  90%  of  the  variance  on 
a  second  site. 
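
The following sketch shows why pointwise mutual information is cheap to compute: it needs only occurrence and co-occurrence counts. The definition used is the standard one, PMI(a, b) = log[p(a, b) / (p(a) p(b))]; estimating the probabilities as document fractions, the base-2 logarithm, and the tiny corpus are assumptions made for illustration and may differ from what SNIF-ACT 2.0 actually does.

```python
import math


def pmi(word_a, word_b, documents):
    """log2( p(a, b) / (p(a) * p(b)) ), with probabilities estimated as the
    fraction of documents containing the word(s)."""
    n = len(documents)
    doc_sets = [set(doc.lower().split()) for doc in documents]
    count_a = sum(word_a in d for d in doc_sets)
    count_b = sum(word_b in d for d in doc_sets)
    count_ab = sum(word_a in d and word_b in d for d in doc_sets)
    if count_ab == 0:
        return float("-inf")  # the words never co-occur
    return math.log2((count_ab / n) / ((count_a / n) * (count_b / n)))


# Hypothetical four-document corpus.
docs = [
    "cheap flights to paris",
    "paris hotel deals",
    "cheap hotel deals",
    "python tutorial for beginners",
]

print(pmi("hotel", "deals", docs))   # 1.0
print(pmi("cheap", "paris", docs))   # 0.0
print(pmi("python", "paris", docs))  # -inf
```

A positive value means the two words co-occur more often than chance would predict, so a link label containing one would carry scent for a goal containing the other.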

This  work  is  clearly  related  to  CWW.  In  fact,  the  foraging 
work  has  also  led  to  a  tool  to  help  automate  the  process  of 
web  usability  assessment,  this  one  named  “Bloodhound”  (Chi 
et  al.,  2003).  Bloodhound  takes  a  site  and  a  goal  description  and 
uses  scent  metrics  combined  with  Monte  Carlo  methods  to 
compute  usage  patterns  and  identify  navigation  problems.  The 
two  projects  clearly  have  similar  goals  and  similar  outputs.  What 
is  less  clear  is  the  extent  to  which  the  differences  in  the  under¬ 
lying  cognitive  architecture  and  differences  in  other  aspects  of 
the  techniques  (such  as  LSA  vs.  PMI)  actually  lead  to  different 
predictions  about  user  behavior.  As  of  this  writing,  the  two  sys¬ 
tems  have  never  both  been  applied  to  the  same  website.  How¬ 
ever  appealing  such  a  “bake-off”  might  be,  it  is  important  not 
to  lose  sight  of  the  advance  even  the  possibility  of  such  a  com¬ 
parison  represents:  it  is  only  recently  that  the  field  has  had 
psychologically  informed  and  quantitative  predictions  about  some¬ 
thing  as  complex  as  Web  behavior  available  for  potential 
comparison.  Thus,  this  is  a  concrete  example  of  cognitive  ar¬ 
chitectures  paying  off  in  HCI. 
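
As a rough illustration of how scent metrics and Monte Carlo methods can be combined to flag navigation problems, consider the sketch below. It is a generic simulation, not Bloodhound's algorithm: the rule that simulated users pick links with probability proportional to scent, the give-up conditions, the site structure, and every scent value are assumptions made for the example.

```python
import random


def simulate_success_rate(site, scent, start, target, n_users=1000, max_clicks=10, seed=0):
    """Fraction of simulated users who reach `target` within `max_clicks`.

    site:  dict mapping page -> list of linked pages
    scent: dict mapping (page, linked_page) -> non-negative scent of that link
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_users):
        page = start
        for _ in range(max_clicks):
            if page == target:
                break
            links = site.get(page, [])
            weights = [scent.get((page, link), 0.0) for link in links]
            if not links or sum(weights) <= 0.0:
                break  # dead end or no scent: the user gives up
            page = rng.choices(links, weights=weights, k=1)[0]
        successes += page == target
    return successes / n_users


# Hypothetical site; the user's goal is to find the "printers" page.
site = {"home": ["products", "about"], "products": ["printers", "laptops"]}
clear_labels = {("home", "products"): 1.0, ("home", "about"): 0.0,
                ("products", "printers"): 1.0, ("products", "laptops"): 0.0}
vague_labels = {("home", "products"): 0.5, ("home", "about"): 0.5,
                ("products", "printers"): 0.5, ("products", "laptops"): 0.5}

print(simulate_success_rate(site, clear_labels, "home", "printers"))  # 1.0
print(simulate_success_rate(site, vague_labels, "home", "printers"))  # well below 1.0
```

Pages where many simulated users stall or wander off the path to the goal are candidates for the navigation problems such a tool would report.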

Other  published  and  ongoing  work  applying  ACT-R  to  HCI- 
oriented  problems  is  available  (e.g.,  Taatgen  &  Lee,  2003;  Pee¬ 
bles  &  Cheng,  2003;  Byrne,  2001;  Schoelles  &  Gray,  2000),  but 
space  considerations  prohibit  a  more  exhaustive  review.  The  vi¬ 
tality  of  the  research  effort  suggests  that  ACT-R’s  combination  of 
perceptual-motor  modules  and  a  strong  theory  of  cognition  will 
continue  to  pay  dividends  as  an  HCI  research  tool. 

In  fact,  this  is  not  limited  to  ACT-R;  overall,  cognitive  archi¬ 
tectures  are  an  exciting  and  active  area  of  HCI  research.  The  sys¬ 
tems  described  here  all  take  slightly  different  approaches  and 
focus  on  slightly  different  aspects  of  various  HCI  problems,  but 
there  is  clearly  a  great  deal  of  cross-pollination.  Lessons  learned 
and  advancements  made  in  one  architecture  often  affect  other 
systems;  for  example,  the  development  of  EPIC’s  peripheral  sys¬ 
tems  clearly  impacted  ACT-R. 

Comparisons 

An  exhaustive  and  detailed  comparison  of  the  major  cognitive 
architectures  is  beyond  the  scope  of  this  chapter;  however,  an 
excellent  comparison  that  includes  a  number  of  other  architec¬ 
tures  can  be  found  in  Pew  and  Mavor  (1998).  Certainly,  the  three 
production  systems  are  related:  (a)  Soar,  (b)  EPIC,  and  (c)  ACT-R. 
A  major  difference  between  them  is  their  original  focus;  they 
were  originally  developed  to  model  slightly  different  aspects  of 
human  cognition.  As  they  develop,  however,  there  appears  to 
be  more  convergence  than  divergence.  This  is  generally  taken  as 
a  good  sign  that  the  science  is  cumulating.  Still,  there  are  differences, 
and  certainly  between  the  production  systems  and  LICAI/CoLiDeS. 
Many  of  the  relevant  comparisons  are  summarized  in  Table  5.1. 

Most  of  the  entries  on  this  table  have  been  previously  dis¬ 
cussed,  with  the  exception  of  the  last  two.  Support  for  learning 
will  be  discussed  in  the  next  section.  The  presence  and  size  of 
the  user  community  has  not  been  discussed,  as  it  is  not  clear 
what  role  (if  any)  such  a  community  plays  in  the  veridicality  of 
the  predictions  made  by  the  system;  however,  it  may  be  relevant 
to  researchers  for  other  reasons,  particularly  those  trying  to 
learn  the  system. 

In  addition,  many  of  the  values  on  this  table  are  likely  to 
change  in  the  future.  For  example,  the  6.0  version  of  ACT-R  will 
be  available  by  the  time  this  chapter  appears  (though  there  are 
few  major  theoretical  changes  planned),  and  planning  for  a  ma¬ 
jor  revision  for  version  7.0  is  already  underway. 

It  is  difficult  to  classify  the  value  an  architecture  has  on  a  par¬ 
ticular  attribute  as  an  advantage  or  a  disadvantage,  because  what 
constitutes  an  advantage  for  modeling  one  phenomenon  may 
be  a  disadvantage  for  modeling  others.  For  example,  consider 
learning  in  Soar.  Certainly,  when  attempting  to  model  the  improvement 
of  users  over  time  with  a  particular  interface,  Soar’s 
learning  mechanism  is  critical;  however,  there  are  many  appli¬ 
cations  for  which  modeling  learning  is  not  critical,  and  Soar’s 
pervasive  learning  feature  occasionally  causes  undesired  side  ef¬ 
fects  that  can  make  model  construction  more  difficult. 


TABLE  5.1.  Architecture  Feature  Comparison 

Feature                            LICAI/CoLiDeS             EPIC                         ACT-R 5.0
Original focus                     Text comprehension        Multiple-task performance    Memory and problem-solving
Basic cycle                        Construction-integration  Production cycle (parallel)  Production cycle (serial)
Symbolic or activation-based?      Both                      Symbolic                     Both
Architectural goal management      Special cycle types,      None                         None
                                   supports vague goals
Detailed perceptual-motor systems  No                        Yes                          Yes
Learning mechanisms                No                        No                           Yes
Large, integrated models           No                        No                           Soon
Extensive natural language         Yes                       No                           No
Support for learning system        None                      None                         Extensive tutorial materials,
                                                                                          summer school
User community*                    None                      None                         Growing, primarily psychology

*Outside  of  the  researchers  who  have  developed  the  system. 

THE  FUTURE  OF  COGNITIVE 
ARCHITECTURES  IN  HCI 


Beyond  the  incremental  development  and  application  of  archi¬ 
tectures  such  as  EPIC  and  ACT-R,  what  will  the  future  hold  for 
cognitive  architectures  in  HCI?  What  challenges  are  faced,  and 
what  is  the  ultimate  promise?  Indeed,  a  number  of  important 
limitations  for  cognitive  architectures  currently  exist.  There  are 
questions  they  cannot  yet  address,  and  it  is  hard  to  see  how  they 
even  would  address  some  questions.  Other  limitations  are  more 
pragmatic  than  in  principle,  but  these  are  relevant  as  well. 

First,  there  is  a  wide  array  of  HCI  problems  that  are  simply 
outside  the  scope  of  current  cognitive  architectures.  Right  now, 
these  architectures  focus  on  cognition  and  performance,  but 
not  on  other  aspects  of  HCI,  such  as  user  preference,  boredom, 
aesthetics,  fun,  and  so  on.  Another  important  challenge,  though 
one  that  might  be  overcome,  is  that  these  architectures  gener¬ 
ally  have  not  been  applied  to  social  situations,  such  as  those  en¬ 
countered  in  groupware  or  online  communities  (Olson  &  Ol¬ 
son,  this  volume;  Preece,  this  volume).  In  principle,  it  is  possible 
to  implement  a  model  of  social  interaction  in  a  cognitive  archi¬ 
tecture;  however,  the  knowledge  engineering  problem  here 
would  certainly  be  a  difficult  one.  How  does  one  characterize 
and  implement  knowledge  about  social  situations  with  enough 
precision  to  be  implemented  in  a  production  system?  It  may  ul¬ 
timately  be  possible  to  do  so,  but  it  is  unlikely  that  this  will  hap¬ 
pen  anytime  soon. 

One  problem  that  will  never  entirely  be  resolved,  no  matter 
how  diligent  the  modelers  are,  is  the  knowledge  engineering 
problem.  Every  model  constructed  using  a  cognitive  architec¬ 
ture  still  needs  knowledge  about  how  to  use  the  interface  and 
what  the  tasks  are.  By  integrating  across  models,  the  knowledge 
engineering demands when entering a new domain may be reduced
(the NTD is a good example), but they will never be eliminated.
This requirement will persist even if an architecture were to
contain a perfect theory of human learning, and there is still
considerable work to be done to meet that goal.

Another  barrier  to  the  more  widespread  use  of  cognitive  ar¬ 
chitectures  in  HCI  is  that  the  architectures  themselves  are  large 
and  complex  pieces  of  software,  and  (ironically)  little  work  has 
been  done  to  make  them  usable  or  approachable  for  novices. 
For  example,  “EPIC  is  not  packaged  in  a  ‘user-friendly’  manner; 
full-fledged  Lisp  programming  expertise  is  required  to  use  the 
simulation  package,  and  there  is  no  introductory  tutorial  or 
user’s  manual”  (Kieras  &  Meyer,  1997,  p.  399).  The  situation  is 
slightly  better  for  ACT-R:  tutorial  materials,  documentation,  and 
examples  for  ACT-R  are  available,  and  most  years  there  is  a  two- 
week  “summer  school”  for  those  interested  in  learning  ACT-R 
(see  http://act.psy.cmu.edu). 

Another limiting factor is implementation. In order for a cognitive
architecture to accurately model interaction with an interface, it
must be able to communicate with that interface. Because most user
interfaces are “closed” pieces of software with no built-in support
for supplying a cognitive model with the information it needs for
perception (e.g., what is on the screen where) or accepting input
from a model, this creates a technical problem. Somehow, the
interface and the model must be connected. An excellent summary of
this problem can be found in Ritter, Baxter, Jones, and Young (2000).
A number of different approaches have been taken. In general, the
EPIC solution to this problem has been to reimplement the interface
to be modeled in Lisp (more recent versions of EPIC require C++), so
the model and the interface can communicate via direct function
calls. The ACT-R solution is not entirely dissimilar. In general,
ACT-R has only been applied to relatively new experiments or
interfaces that were initially implemented in Lisp, and thus ACT-R
and the interface can communicate via function calls. In order to
facilitate the construction of models and reduce the modeler’s
degrees of freedom in implementing a custom interface strictly for
use by a model, ACT-R does provide some abilities for automatically
managing this communication when the interface is built with the
native GUI builder for Macintosh Common Lisp under MacOS and Allegro
Common Lisp under Windows. If the interface is implemented this way,
both human users and the models can interact with the same interface.
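As a rough illustration of what communicating "via function calls"
can look like, the following sketch shows a toy interface that
reports what is on the screen to a model and accepts simulated
clicks from it. It is written in Python purely for brevity (the
architectures discussed here use Lisp or C++), and every class and
function name in it is hypothetical; none of them belongs to EPIC or
ACT-R.

```python
from dataclasses import dataclass, field


@dataclass
class Widget:
    """A visible object that the toy interface can report to a model."""
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass
class ToyInterface:
    """Hypothetical reimplemented interface that a model calls directly."""
    widgets: list = field(default_factory=list)
    clicked: list = field(default_factory=list)

    def visible_objects(self):
        # Perception side: report each object's name and center position.
        return [(w.name, w.x + w.width // 2, w.y + w.height // 2)
                for w in self.widgets]

    def click(self, px: int, py: int):
        # Motor side: accept a simulated mouse click from the model.
        for w in self.widgets:
            if w.contains(px, py):
                self.clicked.append(w.name)
                return w.name
        return None


def toy_model_step(interface: ToyInterface, target: str):
    """Stand-in for a model: look for the target, then click its center."""
    for name, cx, cy in interface.visible_objects():
        if name == target:
            return interface.click(cx, cy)
    return None


ui = ToyInterface([Widget("OK", 10, 10, 40, 20),
                   Widget("Cancel", 60, 10, 40, 20)])
toy_model_step(ui, "Cancel")
```

Because the same object could also be driven by ordinary input
events, a design along these lines would let human users and a model
share one interface, which is the property the text attributes to
the ACT-R facilities.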

The most intriguing development along this line, however, is
recent work by St. Amant and colleagues (St. Amant & Riedl,
2001; St. Amant, Horton, & Ritter, in press). They have implemented
a system called SegMan, which directly parses the screen bitmap on
Windows systems; that is, given a Windows display (any Windows
display), SegMan can parse it and identify things such as text,
scroll bars, GUI widgets, and the like. It also has facilities for
simulating mouse and keyboard events. This is an intriguing
development, because in principle, it should be possible to connect
this to an architecture such as EPIC or ACT-R, which would enable
the architecture to potentially work with any Windows application in
its native form.

While a number of technical details would have to be worked
out,  this  has  the  potential  of  fulfilling  one  of  the  visions  held  by 
many  in  the  cognitive  architecture  community:  a  high-fidelity 
“virtual  user”  that  could  potentially  use  any  application  or  even 
combination  of  applications.  Besides  providing  a  wide  array  of 
new  domains  to  researchers,  this  could  be  of  real  interest  to 
practitioners  as  well  because  this  opens  the  door  for  at  least 
some  degree  of  automated  usability  testing.  This  idea  is  not  a 
new one (e.g., Byrne, Wood, Sukaviriya, Foley, & Kieras, 1994;
St.  Amant,  2000),  but  technical  and  scientific  issues  have  pre¬ 
cluded  its  adoption  on  even  a  limited  scale.  This  would  not  elim¬ 
inate  the  need  for  human  usability  testing  (see  Ritter  &  Young, 
2001,  for  a  clear  discussion  of  this  point)  for  some  of  the  reasons 
listed  above,  but  it  could  significantly  change  usability  engi¬ 
neering  practice  in  the  long  run. 

The  architectures  themselves  will  continue  to  be  updated 
and  applied  to  more  tasks  and  interfaces.  As  mentioned,  a  new 
version  of  ACT-R  (version  6.0)  is  currently  under  development, 
and  this  new  version  has  definitely  been  influenced  by  issues 
raised  by  HCI  concerns.  New  applications  of  EPIC  result  in  new 
mechanisms  (e.g.,  similarity-based  decay  in  verbal  working 
memory  storage;  Kieras,  Meyer,  Mueller,  &  Seymour,  1999)  and 
new movement styles (e.g., click-and-point; Hornof & Kieras,
1999). Applications such as the World Wide Web are likely to
drive  these  models  into  more  semantically  rich  domain  areas, 
and  tasks  that  involve  greater  amounts  of  problem  solving  are 
also  likely  candidates  for  future  modeling. 

Despite their limitations, this is a particularly exciting time to
be involved in research on cognitive architectures in HCI. There
is a good synergy between the two areas: just as cognitive
architectures are certainly useful to HCI, so HCI is also useful for
cognitive architectures. HCI is a complex and rich yet still
tractable domain, which makes it an ideal candidate for testing
cognitive architectures. HCI tasks are more realistic and require
more integrated capabilities than typical cognitive psychology
laboratory experiments, and thus cognitive architectures are the
best theoretical tools available from psychology. Theories like
EPIC-Soar (Chong & Laird, 1997) and ACT-R are the first complete
psychological models that go from perception to cognition to action
in detail. This is a significant advance and holds a great deal of
promise.


The  need  for  truly  quantitative  engineering  models  will  only 
grow  as  user  interfaces  propagate  into  more  and  more  places 
and  more  and  more  tasks.  Cognitive  architectures,  which  al¬ 
ready  have  a  firmly  established  niche  in  HCI,  seem  the  most 
promising  road  toward  such  models.  Thus,  as  the  architectures 
expand  their  range  of  application  and  their  fidelity  to  the  hu¬ 
mans  they  are  attempting  to  model,  this  niche  is  likely  to  ex¬ 
pand.  HCI  is  an  excellent  domain  for  testing  cognitive  architec¬ 
tures  as  well,  so  this  has  been,  and  will  continue  to  be,  a  fruitful 
two-way street.

SZALMA AND HANCOCK


TASKS  AS  STRESSORS:  THE  CENTRALITY  OF 
STRESS  IN  HUMAN-COMPUTER  INTERACTION 


Those whose professional lives revolve around human-computer
interaction (HCI) may ask themselves why they should even glance at
a chapter on stress. It is evident that many computer systems have
to support people operating in stressful circumstances, and there
are important design issues concerning how to present information in
these very demanding circumstances. However, one could still ask,
are these of central interest to those in the mainstream of HCI?
Indeed, if these were the only issues, we would agree and would
recommend that the reader pass quickly on to something of much more
evident relevance. However, we hope to persuade you that the
relevance of stress research to HCI is not limited to such concerns.
Indeed, we hope to convince the reader that stress, in the form of
task loading, is central to all HCI. To achieve this, we first
present a perspective that puts stress front and center in the HCI
realm. Traditionally, stress has been seen as exposure to some
adverse environmental circumstances, such as excessive heat, cold,
noise, vibration, and so forth, and its effects manifest themselves
primarily in relation to the physiological system most perturbed by
the stress at hand. However, Hancock and Warm (1989) observed that
stress effects are virtually all mediated through the brain, but for
the cortex such effects are almost always of secondary concern since
the brain is primarily involved with the goals of ongoing behavior
or, more simply, the current task. Therefore, we want to change the
orientation of concern so that stress is seen not just as a result
of peripheral interference; rather, the primary source of stress is
the ongoing task itself. If we now see the task itself as the
primary driving influence, then stress concerns are central to all
HCI issues.

It  is  one  of  the  most  evident  paradoxes  of  modern  work 
that  computer-based  systems,  which  are  designed  to  reduce 
task  complexity  and  cognitive  workload,  actually  often  impose 
even  greater  demands  and  stresses  on  the  very  individuals 
they  are  supposed  to  be  helping.  How  individuals  cope  with 
such  stress  has  both  immediate  and  protracted  effects  on  their 
performance  and  well-being.  Although  operational  environ¬ 
ments  and  their  associated  tasks  vary  considerably  (e.g.,  air 
traffic  control,  baggage  screening,  hospital  patient  monitoring, 
power  plant  operations,  command  and  control,  and  banking 
and  finance),  there  are  certain  mechanisms  that  are  common 
to  the  stress  appraisal  of  all  task  demands.  Thus,  there  are  de¬ 
sign  and  HCI  principles  for  stress  that  generalize  across  multi¬ 
ple  domains  (Hancock  &  Szalma,  2003a).  In  this  chapter  we  ex¬ 
plore  such  principles  to  understand  stress  effects  in  the  HCI 
domain. 

The  structure  of  our  chapter  flows  from  these  fundamental 
observations.  First,  we  provide  the  reader  with  a  brief  over¬ 
view  of  stress  theory  and  its  historical  development  to  set  our 
observations  in  context.  Second,  we  articulate  areas  for  fu¬ 
ture  research  needed  to  more  completely  understand  how 
stress  and  workload  impact  human-computer  interaction  and 
how  to  exploit  the  positive  effects  while  mitigating  their  neg¬ 
ative  effects. 


TRADITIONAL  APPROACHES 
TO  STRESS  RESEARCH 


Traditionally stress has been conceived of as either (a) an
external, aversive stimulus (constituted of either physical,
cognitive, or social stimulation patterns) imposed upon an
individual or (b) a response to such perturbations. Each of these views
presents  certain  intrinsic  operational  difficulties.  Considering 
stress  as  external  stimulation  is  useful  for  categorizing  effects 
of  the  physical  environments  (e.g.,  heat,  noise,  vibration),  but 
such  an  approach  cannot  explain  why  the  same  stimulus  pattern 
has  vastly  different  effects  on  different  individuals.  Physiologi¬ 
cal  interpretations  (e.g.,  Selye,  1976)  have  tried  to  promulgate 
arousal  explanations  of  stress.  However,  the  more  recent  recog¬ 
nition  that  different  sources  of  stress  are  associated  with  differ¬ 
ent  patterns  of  cognitive  effects  made  clear  that  adaptation  or 
arousal theories of stress do not completely address the issue
either (Hockey, R., 1984; Hockey, R. & Hamilton, 1983; Hockey,
G. R. J., Gaillard, & Coles, 1986).

Thus  to  understand  stress  effects,  we  now  have  to  embrace 
an  even  wider,  multidimensional  perspective  (e.g.,  Matthews, 
2001).  Here  we  emphasize  a  view  of  stress  as  primarily  an  out¬ 
come  of  the  appraisal  of  environmental  demands  as  either  taxing 
or  exceeding  an  individual’s  resources  to  cope  with  that  demand. 
These  person-environment  transactions  (Lazarus  &  Folkman, 
1984) occur at multiple levels within the organism (Matthews,
2001; van Reekum & Scherer, 1997). Further, these processes
represent efforts by the organism to adapt to demands imposed via
regulation of both the internal state and the external environment.
In the following section, we describe the theoretical frameworks
that guide our observations on HCI. These perspectives emerge
from the work of Hancock and Warm (1989), G. R. J. Hockey
(1997), and Lazarus (1999; see also Lazarus & Folkman, 1984).


THEORETICAL  FRAMEWORKS 


Appraisal  Theory 

Among  the  spectrum  of  cognitive  theories  of  stress  and  emo¬ 
tion,  perhaps  the  best  known  is  the  relational  theory  proposed 
by  Richard  Lazarus  and  his  colleagues  (see  Lazarus,  1991,  1999; 
Lazarus  &  Folkman,  1984).  This  theory  is  cognitive  in  that  stress 
and  emotion  each  depend  upon  an  individual’s  cognitive  ap¬ 
praisals  of  internal  and  external  events,  and  these  appraisals 
depend in part on the person’s knowledge and experience (cf.
Bless, 2001). The theory is motivational in that emotions in
general, including stress responses, are reactions to one’s perceived
state  of  progress  toward  or  away  from  one’s  goals  (see  Carver 
&  Scheier,  1998).  The  relational  aspect  emphasizes  the  impor¬ 
tance  of  the  transaction  between  individuals  and  their  environ¬ 
ment.  Together  these  three  components  shape  the  emotional 
and  stress  state  of  an  individual.  The  outcomes  of  these  pro¬ 
cesses  are  patterns  of  appraisal  that  Lazarus  (1991)  referred  to 
as  “core  relational  themes.”  For  instance,  the  core  relational 
theme for anxiety is uncertainty and existential threat, while that
for happiness is evident progress toward goal achievement.
Thus, when individuals appraise events relative to their desired
outcomes (goals), these can produce negative, goal-incongruent
emotions and stress if such events are appraised as hindering
progress. Conversely, promotion of well-being and pleasure
occurs when events are appraised as facilitating progress toward
a goal (i.e., goal-congruent emotions). Promotion of pleasure
and  happiness  (see  Hancock,  Pepe,  &  Murphy,  2005;  Ryan  & 
Deci,  2001)  therefore  requires  the  design  of  environments  and 
tasks  themselves  that  afford  goal-congruent  emotions.  The  un¬ 
derstanding  of  interface  characteristics  in  HCI  which  facilitate 
positive  appraisals  and  reduce  negative  appraisals  is  thus  a  cru¬ 
cial  issue  and  an  obvious  avenue  in  which  HCI  and  stress  re¬ 
search  can  fruitfully  interact. 

A major limitation of all appraisal theories, however, is their
neglect of how task parameters influence the resulting coping
response. While the appraisal mechanism itself may
be  similar  across  individuals  and  contexts  (e.g.,  see  Scherer, 
1999),  the  specific  content  (e.g.,  which  events  are  appraised  as 
a  threat  to  well-being)  does  vary  across  individuals  and  contexts. 
One  would  expect  that  the  criteria  for  appraisal  (e.g.,  personal 
relevance,  self-efficacy  for  coping)  would  be  similar  across  indi¬ 
viduals  for  specific  task  parameters  as  for  any  other  stimulus  or 
event.  Individual  differences  occur  in  the  specific  content  of  the 
appraisal  (e.g.,  one  person’s  threat  is  another’s  challenge)  and 
in  the  resultant  response.  An  understanding  of  stress  effects  in 
HCI  therefore  requires  understanding  the  task  and  person  fac¬ 
tors,  and  treating  the  transaction  between  the  human  and  the 
system  as  the  primary  unit  of  analysis  (see  Lazarus  &  Folkman, 
1984).  This  entails  knowing  how  different  individuals  appraise 
specific  task  parameters  and  how  changes  in  knowledge  struc¬ 
tures  might  ameliorate  negative  stress  effects  and  promote  pos¬ 
itive  affect  in  human-technology  interaction.  A  visual  represen¬ 
tation  of  this  emergent  unit  of  analysis  that  comes  from  the 
interaction  of  person  and  environment,  including  the  task,  is 
shown  in  Fig.  6.1  (Hancock,  1997). 

Adaptation  Under  Stress 

A theoretical framework developed specifically for stress as it
relates to performance is the maximal adaptability model presented
by Hancock and Warm (1989). They identified the three facets of
stress distinguished above, and they labeled these the trinity of
stress, shown in Fig. 6.2. The first, “Input,” refers to the
environmental events to which the individual is exposed, which
include information (e.g., displays) as well as traditional input
categories such as temperature, noise, vibration, and so forth
(e.g., Conway, Szalma, & Hancock, 2007; Hancock, Ross, & Szalma, in
press; Pilcher, Nadler, & Busch, 2002). The second, “Adaptation,”
encompasses the appraisal mechanisms referred to previously. The
third and final component, “Output,” is the level which indicates
how the organism behaves in respect to goal achievement. A
fundamental tenet of the Hancock and Warm (1989) model is that in
the large majority of situations (and even in situations of quite
high demand) individuals do adapt effectively to the input
disturbance. That is, they can tolerate high levels of either
overload or underload without enormous change to their performance
capacity. Adaptive processes occur at multiple levels, including
the physiological, behavioral (e.g., performance), and
subjective/affective levels. These adaptations are represented in
the model as a series of nested, extended, inverted-U functions (see
Fig. 6.3) that reflect the fact that under most conditions the
adaptive state of the organism is stable. However, under extremes
of environmental underload or overload, failures in adaptation do
occur. Thus, as the individual is perturbed by the input, the first
threshold they traverse is subjective comfort. This is followed by
behavioral effects, and finally failure of the physiological system
(e.g., loss of consciousness). Examples of such extreme failures
are relatively rare in most work settings, although when they do
occur they are often catastrophic for the individual and the system
they are operating (e.g., Harris, Hancock, & Harris, 2005).


FIGURE 6.1. An illustration of the emergence of a supra-ordinate
unit of analysis that derives from the interaction of the
individual (person), the tool they use (computers), the task they
have to perform, and the context (environment) in which action
occurs. From Hancock (1997).


FIGURE 6.2. The trinity of stress, which identifies three possible
“loci” of stress. It can be viewed as an input from the physical
environment which can be described deterministically. Since such a
profile is by definition unique, it is referred to as a stress
signature. The second locus is adaptation, which describes the
populational or nomothetic reaction to the input itself. It is most
evidently measurable in the processes of compensation. The third
and final locus is the output, which is expressed as the impact on
the on-going stream of behavior. Since the goals of different
individuals almost always vary, this output is largely idiographic
or person-specific. It is this facet of stress which has been very
much neglected in prior and contemporary research. From Hancock and
Warm (1989).


[Figure labels: Physiological Zone of Maximal Adaptability; Stress Level]

FIGURE 6.3. The extended-U relationship between stress level and
response capacity. As is evident, the form of degradation is common
across the different forms of response. At the center of the
continuum is the normative zone which reflects optimal subjective
functioning. Outside of this is the comfort zone which reflects the
behavioral recognition of a state of satisfaction. Beyond this lies
the reaction of psychological or cognitive performance capacity.
Finally the outer envelope is composed of physiological functioning.
There are proposed strong linkages between the deviation from
stability at one level being matched to the onset of radical failure
at the more vulnerable level which is nested within it. The model
is symmetrical in that underload (hypostress) has mirror effects to
overload (hyperstress). The latter is considered the commonly
perceived interpretation of stress. From Hancock and Warm (1989).
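The ordering of thresholds in the maximal adaptability model
(subjective comfort first, then behavioral effects, then
physiological failure) can be made concrete with a small sketch.
The numeric half-widths below are arbitrary placeholders chosen for
illustration, not values from Hancock and Warm (1989), and the
signed stress scale (negative for underload, positive for overload)
is likewise an assumption of this example.

```python
# Hypothetical half-widths of the nested zones of stable adaptation,
# ordered from the most vulnerable level to the most robust one.
# The numbers are placeholders for illustration only.
ZONES = [
    ("subjective comfort", 1.0),
    ("behavioral performance", 2.0),
    ("physiological function", 3.0),
]


def failed_levels(stress_level: float) -> list:
    """Return the levels whose adaptation has failed at this stress level.

    stress_level is signed: negative values stand for underload
    (hypostress) and positive values for overload (hyperstress).
    The thresholds are treated as symmetric around zero.
    """
    magnitude = abs(stress_level)
    return [name for name, half_width in ZONES if magnitude > half_width]


# Moderate underload disturbs comfort only; extreme overload
# eventually reaches the physiological level as well.
mild = failed_levels(-1.5)
extreme = failed_levels(3.5)
```

Because the zones are nested, any level that has failed implies
that every more vulnerable level has failed too, which is what the
list returned by the function reflects.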

This  model  is  unique  in  that  it  provides  explicit  recognition 
that  the  proximal  form  of  stress  in  almost  all  circumstances  is 
the task itself. Task characteristics are incorporated in the model
by two distinct base axes representing spatial and temporal
components of any specified task. Information structure (the
spatial dimension) represents how task elements are organized,
including challenges to psychological capacities such as
working  memory,  attention,  decision  making,  response  capacity, 
and  the  like.  The  temporal  dimension  is  represented  by  infor¬ 
mation  rate.  Together  these  dimensions  can  be  used  to  form  a 
vector  (see  Fig.  6.4)  that  serves  to  identify  the  current  state  of 
adaptation  of  the  individual.  Thus,  if  the  combination  of  task 
characteristics  and  an  individual’s  stress  level  can  be  specified, 
a  vector  representation  can  be  used  to  predict  behavioral  and 
physiological  adaptation.  The  challenge  lies  in  quantifying  the 
information  processing  components  of  cognitive  work  (see 
Hancock,  Szalma,  &  Oron-Gilad,  2005). 
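A minimal sketch of the vector idea follows, assuming that each
source of demand has already been scored on the two base axes. The
scoring itself is exactly the unsolved quantification problem noted
above, so the numbers here are invented, and using the Euclidean
length of the summed vector as a single summary value is an
assumption of this example rather than part of the model.

```python
import math
from typing import Iterable, Tuple


def stress_vector(sources: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Sum per-source (information_rate, information_structure) demands."""
    rate = 0.0
    structure = 0.0
    for source_rate, source_structure in sources:
        rate += source_rate
        structure += source_structure
    return rate, structure


def vector_magnitude(vector: Tuple[float, float]) -> float:
    """Length of the combined vector (an assumed summary measure)."""
    return math.hypot(vector[0], vector[1])


# Two hypothetical sources: one purely temporal, one mostly structural.
combined = stress_vector([(1.0, 0.0), (2.0, 4.0)])
overall = vector_magnitude(combined)
```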


Although  the  model  shown  in  Fig.  6.4  describes  the  level  of 
adaptive  function,  it  does  not  articulate  the  mechanisms  by 
which  such  adaptation  occurs.  Hancock  and  Warm  (1989)  argued 
that  one  way  in  which  individuals  adapt  to  stress  is  to  narrow 
their attention by excluding task-irrelevant cues (Easterbrook,
1959). Such effects are known to occur in spatial perception
(e.g., Bursill, 1958; Cornsweet, 1969), and narrowing can occur
at levels of both the central and peripheral neural systems
(Dirkin & Hancock, 1984, 1985; Hancock & Dirkin, 1983). More
recently, Hancock and Weaver (2005) argued that distortions of
temporal  perception  under  stress  are  also  related  to  this  nar¬ 
rowing  effect.  However,  recent  evidence  suggests  that  these  two 
perceptual  dimensions  (space  and  time)  may  not  share  com¬ 
mon  perceptual  mechanisms  (see  Ross,  Szalma,  Thropp,  &  Han¬ 
cock,  2003;  Thropp,  Szalma,  &  Hancock,  2004). 

The  Cognitive-Energetic  Framework 

FIGURE 6.4. The description given in Figure 6.3 is now expanded
into a three-dimensional representation by parsing the base
“hypostress-hyperstress” axis into its two component elements.
These divisions are composed of information rate (the temporal axis)
and information structure (the spatial axis). Note that any one
source of input stress can be described as a scalar on the base axis
and these scalars can be summed to provide a multi-input stress
vector which then provides a prediction of both performance and
physiological adaptability which are the primary descriptors on the
vertical axis. From Hancock and Warm (1989).


The Hancock and Warm (1989) model accounts for the levels of
adaptation and adaptation changes under the driving forces of
stress. However, it does not articulate how effort is allocated
under stress or the mechanisms by which individuals appraise the
task parameters that are the proximal source of stress. The effort
allocation issue is addressed by a cognitive-energetical framework
described by G. R. J. Hockey (1997). The compensatory control model
is based upon three assumptions: behavior is goal directed;
self-regulatory processes control goal states; and regulatory
activity has energetic costs (e.g., consumes resources).
In  this  model  a  compensatory  control  mechanism  allocates  re¬ 
sources  dynamically  according  to  the  goals  of  the  individual  and 
the  environmental  constraints.  The  mechanisms  operate  at  two 
levels  (see  Fig.  6.5).  The  lower  level  is  more  or  less  automatic 
and  represents  established  skills.  Regulation  at  this  level  requires 
few  energetic  resources  or  active  regulation  and  effort  (cf., 
Schneider  &  Shiffrin,  1977).  The  upper  level  is  a  supervisory 
controller  which  can  shift  resources  (effort)  strategically  to 
maintain  adaptation  and  reflects  effortful  and  controlled  pro¬ 
cessing.  The  operation  of  the  automatic  lower  loop  is  regulated 
by  an  effort  monitor,  which  detects  changes  in  the  regulatory 
demands placed on the lower loop. When demand increases beyond
the capacity of the lower loop, control is shifted to the higher,
controlled processing loop. Two strategic responses of the
supervisory system are increased effort and changing the goals.
Goals can be modified in their kind (change the goal itself) or in
strength (e.g., lowering the criterion for performance).
Essentially, this is adjusting the discrepancy between goal state
and current state by increasing effort or changing the goal (see
also Carver & Scheier, 1998).


FIGURE 6.5. The two-level effort regulation model by Hockey.
This model provides a mechanism by which an individual allocates
limited cognitive resources to different aspects of performance.
From Hockey (1997).
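The two-level regulation just described can be sketched as a small
decision procedure. This is only an illustration under stated
assumptions: the names, the single shared unit for demand and
capacity, the choice to try extra effort before lowering the goal,
and the proportional rule for lowering it are all invented for the
example and are not taken from Hockey (1997).

```python
from dataclasses import dataclass


@dataclass
class Regulation:
    loop: str      # "lower" (automatic) or "upper" (supervisory)
    effort: float  # extra effort invested by the supervisory controller
    goal: float    # performance criterion after any adjustment


def regulate(demand: float, lower_capacity: float,
             effort_budget: float, goal: float) -> Regulation:
    """Toy compensatory control: pick a loop, then effort or goal change."""
    if demand <= lower_capacity:
        # Established skills cope; no active regulation is needed.
        return Regulation("lower", 0.0, goal)

    # The effort monitor detects the overload and hands control upward.
    shortfall = demand - lower_capacity
    if shortfall <= effort_budget:
        # Strategy 1: increase effort and keep the goal unchanged.
        return Regulation("upper", shortfall, goal)

    # Strategy 2 (assumed fallback): spend the whole budget and lower
    # the goal in strength, in proportion to the demand left unmet.
    unmet = shortfall - effort_budget
    lowered_goal = goal * (demand - unmet) / demand
    return Regulation("upper", effort_budget, lowered_goal)


easy = regulate(demand=3.0, lower_capacity=4.0, effort_budget=2.0, goal=1.0)
hard = regulate(demand=10.0, lower_capacity=4.0, effort_budget=2.0, goal=1.0)
```

Changing a goal in kind (replacing it with a different goal) is not
represented here; only the change in strength is.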
RELATIONSHIP BETWEEN STRESS
AND COGNITIVE WORKLOAD


Cognitive  Workload  as  a  Form  of  Stress 

The  Hancock  and  Warm  (1989)  model  explicitly  identified  the 
task  itself  as  the  proximal  source  of  stress.  In  operational  envi¬ 
ronments,  this  is  often  manifested  as  increases  or  decreases  in 
cognitive  workload  (Moray,  1979).  As  in  the  case  of  stress,  work¬ 
load  is  easy  to  define  anecdotally  but  difficult  to  define  opera¬ 
tionally.  Workload  can  manifest  in  terms  of  the  amount  of  infor¬ 
mation  to  be  processed  (an  aspect  of  information  structure), 
and  the  time  available  for  processing  (information  rate).  Thus, 
the  base  axes  of  the  Hancock  and  Warm  model  captured  di¬ 
mensions  of  workload  as  well  as  stress  (see  Hancock  &  Caird, 
1993).  Indeed,  physiological  measures  of  workload  (O’Donnell 
&  Eggemeier,  1986)  are  often  the  same  as  those  measures  used 
to  assess  physiological  stress.  Similarly,  subjective  measures  of 
workload  and  stress  reflect  appraisals  of  the  task  environment 
and  of  its  perceived  effect  on  the  individual  (Hart  &  Staveland, 
1988).  Although  the  two  concepts  developed  in  separate  re¬ 
search  traditions,  the  artificial  boundary  between  them  should 
be  dissolved,  as  each  term  refers  to  similar  processes.  The  im¬ 
plication  for  HCI  is  that  computer-based  tasks  that  impose  ei¬ 
ther  too  much  or  too  little  demand  will  likely  be  appraised  as 
stressful.  Thus,  the  design  process  for  development  of  com¬ 
puter  interfaces  should  include  assessment  of  perceived  work¬ 
load  as  well  as  affective  state. 

Performance  and  Workload:  Associations/Dissociations 

It  is  often  the  case  that  performance  is  maintained  under  in¬ 
creased  workload  and  stress,  which  is  reflected  in  the  extended- 
U  model  described  by  Hancock  and  Warm  (1989)  and  in  the 
mechanisms  of  Hockey’s  (1997)  energetic  model  of  compen¬ 
satory  control.  Maintaining  performance  under  stress  has  costs, 
both  physiologically  and  cognitively.  Further,  one  would  expect 
that  in  easier  tasks  performance  is  not  as  costly  and  that  there 
should  therefore  be  a  direct  association  between  task  difficulty 
and  perceived  workload.  Such  performance-workload  associa¬ 
tions  do  occur,  most  prevalently  in  vigilance  (Warm,  Dember,  & 
Hancock,  1996;  see  also  Szalma  et  al.,  2004).  However,  other 
forms  of  workload-performance  relations  can  occur.  For  in¬ 
stance,  perceived  workload  may  change  as  a  function  of  changes 
in  task  demand,  but  performance  remains  constant.  Hancock 
(1996)  referred  to  these  situations  as  “insensitivities,”  which  can 
be  diagnostic  with  respect  to  the  relation  between  the  individ¬ 
ual  and  the  task  (see  also  Parasuraman  &  Hancock,  2001).  Thus, 
consistent  with  the  frameworks  of  both  Hancock  and  Warm  (1989)  and  Hockey
(1997),  one  response  to  increased  task  demand  is
to  exert  more  effort,  thereby  maintaining  performance  but  in¬ 
creasing  perceived  workload.  Alternatively,  one  could  have  a 
situation  in  which  task  demands  increase,  performance  de¬ 
creases,  but  perceived  workload  does  not  change.  This  suggests 
that  the  appraisals  of  the  task  are  not  sensitive  to  actual  changes 
in  that  task. 


Interesting  corollaries  are  the  performance-workload  disso¬ 
ciations  that  sometimes  occur  (Hancock,  1996;  Yeh  &  Wickens, 
1988).  In  such  cases,  decreased  performance  is  accompanied  by 
decreased  workload.  One  possible  reason  for  such  a  result  might 
be  disengagement  of  the  individual  from  the  task  (e.g.,  the  per¬ 
son  gives  up;  see  Hancock,  1996).  In  the  case  where  increased 
performance  is  observed  to  be  accompanied  by  increased  per¬ 
ceived  workload,  the  pattern  suggests  effective  improvement 
of  performance  at  the  cost  of  increased  effort  allocation.  An  area 
of  much  needed  research  is  establishing  which  task  parameters 
control  the  patterns  of  performance-workload  associations  and 
dissociations,  and  how  these  change  dynamically  as  a  function 
of  time  on  task.  It  may  well  be  that  reformulating  the  task  through
innovations  in  the  interface  itself  will  address  these  crucial
concerns  (see  Hancock,  1997).  Indeed,  the  structure  and  orga¬ 
nization  of  computer  interfaces  will  be  a  major  factor  in  both 
performance  under  stress  and  in  the  relation  of  performance 
to  perceived  workload. 
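
The patterns described above can be summarized as a small decision rule. The following is a minimal sketch, not a method given in the text: the function name `classify_pattern`, the `tolerance` threshold, and the exact label strings are illustrative assumptions. It takes the change in performance and the change in perceived workload that accompany a change in task demand, and labels the pair as an association (the two move in opposite directions), a dissociation (the two move in the same direction), or an insensitivity (one changes while the other does not).

```python
def classify_pattern(delta_performance, delta_workload, tolerance=0.05):
    """Label the joint change in performance and perceived workload.

    Both deltas are signed changes on whatever scale the analyst uses.
    `tolerance` is a hypothetical threshold below which a change is
    treated as "no change"; it is an assumption of this sketch.
    """
    def direction(delta):
        # Collapse a signed change to -1, 0, or +1.
        if abs(delta) <= tolerance:
            return 0
        return 1 if delta > 0 else -1

    perf = direction(delta_performance)
    load = direction(delta_workload)

    if perf == 0 and load == 0:
        return "no change"
    if perf == 0:
        return "insensitivity: workload changes, performance constant"
    if load == 0:
        return "insensitivity: performance changes, workload constant"
    if perf != load:
        # Performance falls as workload rises, or the reverse.
        return "association"
    # Performance and workload move in the same direction.
    return "dissociation"


# Performance maintained through extra effort as demand rises.
print(classify_pattern(delta_performance=0.0, delta_workload=0.4))
# Performance and workload both fall, e.g., the person gives up.
print(classify_pattern(delta_performance=-0.3, delta_workload=-0.4))
```

A rule of this kind only labels an observed pattern; it says nothing about which task parameters produce it, which is the open research question noted above.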


MITIGATION  OF  STRESS 


If  changing  the  fundamental  nature  of  the  demand  is  one  so¬ 
lution,  we  now  look  at  other  approaches  to  mitigation  of  the 
negative  effects  of  stress  and  workload.  These  strategies  in¬ 
clude  skill  development  (e.g.,  Hancock,  1986),  and  specific  dis¬ 
play  design  (Hancock  &  Szalma,  2003a;  Wickens,  1996),  as  well 
as  technologies  employing  adaptive  automation  and  decision 
aids  (Hancock  &  Chignell,  1987).  Developing  skills  so  that  they 
rely  on  relatively  automatic  rather  than  controlled  processing
(Schneider  &  Shiffrin,  1977)  and  developing  expertise  can  mit¬ 
igate  some  of  the  negative  effects  of  stress.  In  regard  to  display 
design,  simple,  easily  perceivable  graphics  can  permit  quick,  di¬ 
rect  extraction  of  information  when  cognitive  resources  are  re¬ 
duced  by  stress  and  workload  (Hancock  &  Szalma,  2003a). 
Adaptive  automation  can  be  employed  by  adjusting  the  level 
of  automation  and  the  management  of  automation  according 
to  stress  state  (e.g.,  Scerbo,  Freeman,  &  Mikulka,  2003).  In  ad¬ 
dition,  adapting  the  form  of  automation  (e.g.,  level,  manage¬ 
ment  type)  to  the  operator  based  on  individual  differences 
can  serve  to  improve  its  utility  for  aiding  performance  and  re¬ 
ducing  stress  and  workload  (see  Thropp,  Oron-Gilad,  Szalma, 
&  Hancock,  2007). 


Changing  the  Person 

Training/skill  development.  Clearly,  the  greater  the
skill  of  the  individual,  the  more  resilient  their  performance  will
be  under  stress  (Hancock,  1986).  This  well-established  phe¬ 
nomenon  is  incorporated  into  the  energetic  theories  of  stress 
and  performance  (Hancock  &  Warm,  1989;  Hockey,  1997)  and 
is  an  approach  most  often  taken  to  mitigate  workload  and 
stress  effects.  However,  training  on  relevant  tasks  is  only  one 
method  of  training  for  stress.  There  are  also  techniques  for 
training  individuals  to  cope  more  effectively  with  stress,  essentially
building  stress  coping  skills.  An  example  of  such  an  approach  is
stress  exposure  training  (SET;  Johnston  &  Cannon-Bowers,  1996),
a  three-phase  procedure  in  which  individuals
are  provided  with  information  regarding  the  stress  associated 
with  task  performance,  are  provided  with  training  on  the  task, 
and  then  practice  their  task  skills  under  simulated  stress  con¬ 
ditions.  This  technique  has  been  shown  to  be  effective  in  re¬ 
ducing  anxiety  and  enhancing  performance  (Saunders, 
Driskell,  Johnston,  &  Salas,  1996)  and  there  is  preliminary  evi¬ 
dence  that  coping  skills  learned  with  a  particular  type  of  stres¬ 
sor  and  task  can  transfer  to  novel  stressors  and  tasks  (Driskell, 
Johnston,  &  Salas,  2001).  For  such  an  intervention  to  succeed, 
however,  it  is  crucial  that  the  training  be  designed  based  on  an 
analysis  of  the  task  environment  (Johnston  &  Cannon-Bowers, 
1996).  If  the  task  parameters  that  are  most  responsible  for  the 
workload  and  stress  are  identified,  these  can  be  targeted  for 
attention  in  training. 

An  additional  issue  for  training  for  more  effective  stress  cop¬ 
ing  is  to  modify  the  individual’s  appraisal  of  events.  By  induc¬ 
ing  automaticity  in  some  skills,  not  only  are  more  resources 
freed  for  stress  coping,  but  the  task  environment  itself  will  be 
appraised  as  less  threatening.  Even  if  the  event  is  appraised  as 
a  threat  to  an  individual’s  psychological  or  physical  well-being, 
the  highly  skilled  individual  will  appraise  his  or  her  coping  abil¬ 
ity  as  sufficient  to  handle  the  increased  demand.  However,  there 
has  been  limited  research  on  how  individuals  who  develop  ex¬ 
pertise  also  develop  the  capacity  to  effectively  cope  with  the 
stress  that  accompanies  performance  in  a  given  domain,  and 
the  extent  to  which  stress  coping  skills  in  one  domain  transfer 
to  other  domains.  Deliberate  practice  generally  facilitates  skill 
development  (Ericsson,  2007).  If  one  considers  coping  with 
stress  to  be  a  skill,  then  in  principle  deliberate  practice  should 
permit  the  development  of  expertise  in  coping.  This  will  likely 
involve  parsing  the  task  into  components,  based  on  cognitive 
task  analysis,  and  designing  training  procedures  that  target  the 
stressful  aspects  of  the  task.  However,  such  efforts  will  require 
understanding  of  how  different  forms  of  stress  affect  different 
forms  of  information  processing.  Since  these  variables  are  diffi¬ 
cult  to  quantify,  establishing  these  linkages  must  be  theory  dri¬ 
ven.  Elucidation  of  these  issues  will  provide  the  groundwork  for 
future  development  of  stress  mitigation  tools  during  training 
and  skill  development. 

Personnel  selection.  Selection  techniques  have  been  a 
popular  choice  for  matching  individuals  to  specific  jobs,  but  the 
focus  has  typically  been  on  intellectual  skills  (e.g.,  Yerkes,  1918). 
Selection  for  stress-coping  capability  has  been  applied  to
police  officers,  who  therefore  tend  to  be  as  emotionally  stable
as,  or  more  emotionally  stable  than,  the
rest  of  the  population  (for  a  review,  see  Brown  &  Campbell,
1994).  However,  research  is  needed  that  links  particular  traits  to 
stress  coping  skills  for  specific  task  environments.  The  effec¬ 
tiveness  of  general  life  stress  coping,  such  as  that  observed  in  in¬ 
dividuals  who  are  extraverted  (McCrae  &  Costa,  1986;  Penley  & 
Tomaka,  2002)  or  optimistic  (Aspinwall,  Richter,  &  Hoffman, 
2002;  Scheier  &  Carver,  1985),  may  not  predict  effective  coping 
in  specific  task  domains.  Understanding  which  individuals  will 
likely  cope  effectively  with  a  particular  task  therefore  requires
a  thorough  understanding  of  the  perceptual,  cognitive,  and  psychomotor
demands  of  the  task,  and  then  linking  these  parameters  to  trait
profiles.  By  far  the  most  research  on  the  relation  of  affective  traits
to  task  performance  has  concerned  Extraversion  and  trait
anxiety/Neuroticism  (for  a  review,  see  Matthews,  Deary,  &  Whiteman,
2003).  However,  the  characteristics  of  greatest  interest  may  vary
somewhat  across  domains,  although  some  general  traits  (e.g.,
emotional  stability,  conscientiousness)  would
be  expected  to  moderate  performance  across  a  variety  of  task 
environments. 

Changing  the  Task 

Display  design.  Although  training  and  selection  can  mit¬ 
igate  stress  effects,  the  tasks  themselves  should  be  redesigned, 
for  two  reasons.  First,  there  will  be  many  instances  where  selec¬ 
tion  is  not  possible  and  expenditure  of  significant  resources  on 
training  is  undesirable.  Second,  there  are  instances  in  which  one 
wishes  to  design  an  interface  that  requires  little  or  no  training 
and  that  can  be  used  by  any  member  of  a  large  population  of 
individuals  (e.g.,  consumers).  Particularly  in  light  of  the  obser¬ 
vation  that  the  task  represents  the  proximal  source  of  stress,  fu¬ 
ture  work  in  stress  mitigation  for  HCI  should  focus  on  redesign 
of  the  task  and  the  interface  itself.  We  have  previously  argued
that  existing  display  design  techniques  that  are  simple  and  easily 
perceived  would  be  the  best  choice  for  an  interface  that  will  be 
used  in  stressful  environments  (Hancock  &  Szalma,  2003a). 
Specifically,  configural  or  object  displays  can  represent  complex, 
multivariable  systems  as  simple  geometric  shapes  or  emergent 
features  if  those  features  are  well-mapped  to  system  dynamics 
(see  Bennett  &  Flach,  1992).  Under  stress,  the  complex  problem 
solving  and  analytical  skills  are  the  most  vulnerable  and  decline 
first.  A  display  that  allows  fast  extraction  of  information  with 
minimal  cost  in  working  memory  load  can  mitigate  stress  effects 
(Hancock  &  Szalma,  2003a;  Wickens,  1996).  A  combination  of
training  to  automaticity  and  displays  of  information  that  can  be 
perceived  directly  with  a  minimum  of  information  processing 
requirements  is  currently  one  of  the  best  approaches  for  stress 
mitigation  in  cognitively  complex  environments. 

Adaptive  automation.  Another  approach  for  stress  mit¬ 
igation  is  the  allocation  of  function  to  automated  systems  (Han¬ 
cock  &  Chignell,  1987).  The  advent  of  modern  automated  sys¬ 
tems  allows  for  automation  to  adapt  to  the  state  of  the  individual 
(Scerbo  et  al.,  2003).  Thus,  at  points  in  time  when  an  operator  is 
overtaxed,  the  system  can  assume  control  of  some  task  func¬ 
tions,  thereby  freeing  resources  to  effectively  cope  with  increased 
task  demand.  Two  potential  problems  for  automated  systems 
are  that  overreliance  can  occur  and  operator  skills  can  atrophy.
However,  a  dynamic  (adaptive)  automated  system  that  permit¬ 
ted  or  required  the  operator  to  perform  functions  at  different 
points  in  time  could  reduce  the  probability  of  skill  atrophy  while 
still  relieving  the  workload  and  stress  of  task  performance. 

However,  the  introduction  of  automation  can  itself  induce 
stress.  Operators  who  work  with  automated  systems,  particularly 
static,  inflexible  automation,  are  relegated  to  the  role  of  monitors 
who  must  respond  only  when  untoward  events  occur.  Sustained
attention  requirements  are  in  fact  quite  stressful  (Szalma,  1999;
Warm,  1993),  and  paradoxically  induce  higher  levels  of  perceived 
workload  (Warm  et  al.,  1996).  Adaptive  automation  can  miti¬ 
gate  this  problem  by  dynamically  assigning  tasks  to  the  machine 
or  the  human  depending  on  the  environmental  conditions  and 
the  state  of  the  operator  (Hancock  &  Chignell,  1987).  Indeed,  ef¬ 
forts  to  use  operator  neurological  state  to  adjust  automation 
are  currently  underway  (e.g.,  Scerbo,  2007). 
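
The allocation logic described above can be illustrated with a short sketch. This is a hypothetical illustration only: the class name `AdaptiveAllocator`, the normalized workload estimate in the range 0 to 1, and the two threshold values are assumptions made for the example, not parameters of any system cited here. When estimated workload is high, one task function is handed to the automation; when it is low, a function is handed back to the operator, which is one way to limit both skill atrophy and a purely monitoring role.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveAllocator:
    """Toy allocator that moves task functions between operator and machine.

    The thresholds are hypothetical values on a 0-1 workload scale.
    """
    operator_tasks: list = field(default_factory=list)
    automated_tasks: list = field(default_factory=list)
    high_threshold: float = 0.75   # above this, offload a task to automation
    low_threshold: float = 0.35    # below this, return a task to the operator

    def update(self, workload):
        """Reallocate at most one task given the current workload estimate."""
        if workload > self.high_threshold and len(self.operator_tasks) > 1:
            # Operator is overtaxed: automation assumes the most recently listed task.
            self.automated_tasks.append(self.operator_tasks.pop())
        elif workload < self.low_threshold and self.automated_tasks:
            # Operator is underloaded: return a task to keep them engaged.
            self.operator_tasks.append(self.automated_tasks.pop())
        # Between the two thresholds nothing changes, which avoids rapid switching.
        return list(self.operator_tasks), list(self.automated_tasks)


allocator = AdaptiveAllocator(operator_tasks=["navigate", "communicate", "monitor"])
for estimate in (0.9, 0.5, 0.2):
    print(estimate, allocator.update(estimate))
```

The gap between the two thresholds is a design choice of this sketch: without it, an estimate hovering near a single cutoff would cause tasks to switch back and forth, which could itself become a source of workload.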

Hedonomics:  Promoting  Enjoyable 
Human-Computer  Interaction 

Stress  research  has  traditionally  followed  the  edict  of  er¬ 
gonomics  and  human  factors  in  general,  to  do  no  harm  and  to 
prevent  pain  and  injury.  As  with  the  rest  of  behavioral  science, 
stress  researchers  sought  to  treat  the  symptoms  of  stress  and 
mitigate  its  negative  effects  on  performance.  However,  with  the 
advent  of  positive  psychology  (Seligman  &  Csikszentmihalyi, 
2000)  there  has  been  a  movement  to  incorporate  the  promo¬ 
tion  of  pleasure  and  well-being  rather  than  restricting  efforts 
to  pain  prevention.  Hancock  coined  the  term  hedonomics  and 
defined  it  as  that  branch  of  science  that  facilitates  the  pleasant 
or  enjoyable  aspects  of  human-technology  interaction  (Han¬ 
cock,  Pepe,  &  Murphy,  2005).  In  short,  the  goal  for  hedonomics 
is  to  design  for  happiness.  Hedonomics  is  a  fairly  new  research 
area,  but  during  the  last  10  years,  there  has  been  a  rapid  growth 
in  research  concerning  affect  and  pleasure.  Affective  evalua¬ 
tions  provide  a  new  and  different  perspective  in  Human  Factors 
Engineering.  It  is  not  how  to  evaluate  users — it  is  how  the  user 
evaluates  (Hancock,  Pepe,  et  al.,  2005).  The  research  on  hedo¬ 
nic  values  and  seductive  interfaces  is  in  fact  a  welcome  contrast 
to  safety  and  productivity,  which  have  dominated  human  factors 
and  ergonomics.  Note,  however,  that  pleasurable  interaction 
with  technology  is  not  necessarily  conducive  to  happiness.  In¬ 
dulging  pleasures  can  sometimes  interfere  with  happiness  and 
well-being  (see  Fromm,  1976;  Kasser,  2002;  Ryan  &  Deci,  2001). 

Our  argument  is  not  that  we  should  discard  current  methods 
in  human  factors  and  ergonomics.  Clearly  functionality  and  us¬ 
ability  are  necessary  conditions  for  pleasurable  interaction  with 
technology.  If  an  interface  does  not  function  in  a  way  congruent 
with  the  user’s  goals,  so  that  the  user  appraises  the  technology 
as  an  agent  that  is  interfering  with  goal  achievement,  that  inter¬ 
action  is  likely  to  be  stressful  and  system  performance  more  vul¬ 
nerable  to  decline.  However,  function  and  usability  are  not  suffi¬ 
cient  conditions  for  pleasurable  interactions  with  technology.  The 
interface  should  be  designed  such  that  it  affords  appraisals  of  the 
technology  as  a  convivial  tool  (Illich,  1973)  or  aid.  One  can  also 
utilize  the  human  tendency  to  anthropomorphize  technology  to 
facilitate  appraisals  of  the  technology  as  helpful  and  supportive 
rather  than  as  an  enemy  (Luczak,  Roetting,  &  Schmidt,  2003). 

Hedonomic  design  will  be  of  obvious  importance  for  devel¬ 
opment  of  consumer  products,  but  in  principle,  it  can  also 
transform  the  very  nature  of  work,  rendering  it  fun.  Although 
there  will  be  some  tasks  which  will  never  be  enjoyable,  there  are 
many  individuals  who  have  jobs  that  could  be  made  more  en¬ 
joyable  by  designing  the  tasks  such  that  they  promote  teletic 
work  (Csikszentmihalyi,  1990)  and  facilitate  intrinsic  motiva¬ 
tion  (Deci  &  Ryan,  2000). 


Teletic  work  and  intrinsic  motivation.  A  useful  theo¬ 
retical  framework  for  hedonomics  is  Self-Determination  The¬ 
ory  (SDT;  Deci  &  Ryan,  1985,  2000;  Ryan  &  Deci,  2000,  2001). 
From  this  perspective  there  are  three  organismic  needs  that  are 
essential  for  facilitating  intrinsic  motivation  for  task  activity  and 
the  positive  affect  that  can  accompany  such  states.  These  needs 
are  for  competence  (self-efficacy;  see  also  Bandura,  1997),  au¬ 
tonomy  (personal  agency,  not  independence  per  se),  and  relat¬ 
edness.  An  important  difference  between  this  theory  and  other 
theories  of  motivation  is  the  recognition  that  there  are  qualitatively
different  forms  of  motivation  (Gagne  &  Deci,  2005).  Thus,
in  SDT  five  categories  of  motivated  behavior  are  identified  that 
vary  in  the  degree  to  which  the  motivation  is  self-determined. 
Four  of  the  categories  reflect  extrinsic  motivation  and  one  cat¬ 
egory  is  intrinsic  motivation.  In  the  latter  case,  an  individual  is 
inherently  motivated  to  engage  in  activity  for  its  own  sake  or 
for  the  novelty  and  challenge.  The  four  extrinsic  motivation  cat¬ 
egories  vary  in  the  degree  to  which  regulation  of  behavior  is  in¬ 
ternalized  by  the  individual  and  therefore  more  autonomous 
and  self-determined  (Ryan  &  Deci,  2000).  The  process  of  inter¬ 
nalization  involves  transforming  an  external  regulation  or  value 
into  one  that  matches  one’s  own  values.  The  development  of 
such  autonomous  motivation  is  crucial  to  skill  development, 
since  the  person  must  maintain  effort  throughout  a  long  and  ar¬ 
duous  process.  Individuals  who  are  autonomously  motivated 
to  learn  are  those  who  develop  a  variety  of  effective  self-regulation
strategies,  have  high  self-efficacy,  and  set  a  number  of
goals  for  themselves  (Zimmerman,  2000).  Further,  effective  self-regulation
develops  in  four  stages:  observation,  emulation,  self-control,
and  self-regulation.  Successful  skill  development  involves
focus  on  process  goals  in  early  stages  of  learning  and
outcome  goals  in  the  fourth  stage  (Zimmerman,  2000). 

Intrinsic  motivation  and  skill  development.  Re¬ 
search  has  established  that  intrinsic  motivation  is  facilitated  by 
conditions  promoting  autonomy,  competence,  and  relatedness 
(see  Deci  &  Ryan,  2000).  Three  factors  that  support  autonomy 
are  (a)  meaningful  rationale  for  doing  the  task,  (b)  acknowl¬ 
edgement  that  the  task  might  not  be  interesting,  and  (c)  an  em¬ 
phasis  on  choice  rather  than  control.  It  is  important  to  note  that 
externally  regulated  motivation  predicts  poorer  performance  on 
heuristic  tasks  (Gagne  &  Deci,  2005),  suggesting  that  as  experts 
develop  better  knowledge  representations  it  will  be  crucial  to 
promote  internal  regulation  of  motivation.  Although  intrinsic 
motivation  has  been  linked  to  how  task  activities  and  environ¬ 
mental  contexts  meet  psychological  needs,  it  is  not  clear  why 
skilled  performers  are  able  to  meet  these  needs,  or  why  indi¬ 
viduals  choose  a  particular  computer  interface.  It  is  likely  that 
interest  in  activities  codevelops  with  abilities  and  traits  (see  Ack¬ 
erman  &  Heggestad,  1997),  but  this  issue  needs  thorough  in¬ 
vestigation  in  the  context  of  complex  computer  environments 
that  require  highly  skilled  workers. 

In  addition  to  the  issue  of  efficacy  and  self-regulation,  there 
is  a  need  to  examine  the  process  by  which  individuals  internal¬ 
ize  extrinsic  motivation  as  they  gain  experience  with  a  particular 
interface  or  system.  In  particular,  Gagne  and  Deci  (2005)  noted 
that  little  research  has  examined  the  effect  of  reward  structures 
and  work  environments  on  the  internalization  process.  It  is 
likely  that  those  environments  that  are  structured  to  meet  basic
needs  will  better  facilitate  internalization  processes  and
inoculate  learners  against  the  trials  and  tribulations  that  face 
them  as  they  interact  with  new  technologies. 

Teletic  work  and  motivational  affordances.  Teletic,
or  autoteletic,  work  refers  to  “work”  that  is  experienced  as  en¬ 
joyable  and  is  associated  with  flow  or  optimal  experience  char¬ 
acterized  by  a  sense  of  well  being  and  harmony  with  one’s  sur¬ 
roundings  (Csikszentmihalyi,  1990).  There  is  variation  in  both 
tasks  and  individuals  with  respect  to  the  degree  to  which  the 
human-technology  interaction  is  teletic.  There  are  four  cate¬ 
gories  in  which  individuals  tend  to  fall  with  respect  to  their  rela¬ 
tion  to  work.  First,  there  is  a  small  proportion  of  the  population 
that  are  always  happy  in  life,  regardless  of  their  activity,  which 
Csikszentmihalyi  referred  to  as  individuals  with  an  autotelic  per¬ 
sonality.  There  are  also  individuals  who  are  predisposed  to  hap¬ 
piness  about  a  specific  task.  They  appraise  such  tasks  as  enjoy¬ 
able,  and  seek  out  these  activities.  The  third  group  consists  of 
individuals  who  enjoy  specific  activities  but  cannot  do  them  pro¬ 
fessionally,  such  as  amateur  athletes.  The  vast  majority  of  people, 
however,  do  work  for  purely  functional  reasons  (e.g.,  security). 
For  these  individuals  work  is  boring  and  grinding  because  the 
task  itself  is  aversive.  A  goal  for  hedonomics,  then,  is  to  design 
work  that  can  be  enjoyed  to  the  greatest  extent  possible.  This 
means  structuring  the  environment  as  an  entire  system,  ranging 
from  the  specific  cognitive  and  psychomotor  demands  to  the 
organization  in  which  the  person  works.  Even  in  jobs  that  are  not 
inherently  enjoyable,  some  degree  of  positive  affect  can  be  experienced
by  workers  if  their  environment  is  structured  to
facilitate  a  sense  of  autonomy  (personal  agency),  competence, 
and  relatedness  (Deci  &  Ryan,  2000;  see  also  Gagne  &  Deci, 
2005).  From  an  ecological  perspective  (e.g.,  Flach,  Hancock, 
Caird,  &  Vicente,  1995),  this  means  identifying  the  motivational 
affordances  in  the  task  and  work  environment,  and  designing 
for  these  affordances.  Thus,  just  as  one  might  analyze  the  affordance
structure  of  an  interface  using  ecological  interface  design
methods  (e.g.,  Vicente  &  Rasmussen,  1992),  one  can  design  an 
environment  so  that  the  elements  of  the  physical  and  social  en¬ 
vironment  afford  stress  reduction  and  enhanced  intrinsic  moti¬ 
vation.  Note  that  although  an  affordance  is  a  property  of  the 
environment,  it  does  not  exist  independently  of  the  individual 
in  the  environment.  Affordances  therefore  share  conceptual  el¬ 
ements  of  person-environment  transactions  that  drive  emotion 
and  stress.  They  differ  in  that  the  classical  definition  of  an  affor¬ 
dance  is  a  physical  property  of  the  environment  (Gibson,  1966, 
1979),  while  a  transaction  emphasizes  the  individual’s  subjec¬ 
tive  appraisal  of  the  environment.  In  both  cases,  however,  one 
cannot  define  the  concept  by  isolating  either  the  individual  or 
the  context.  Thus,  although  affordances  and  invariants  are  con¬ 
sidered  physical  properties  of  the  environment,  these  concepts 
are  still  relevant  for  motivational  processes  (see  also  Reed,  1996).

Motivational  affordances  may  be  conceived  as  elements  of 
the  work  environment  that  facilitate  and  nurture  intrinsic  moti¬ 
vation.  The  key  for  design  is  to  identify  motivational  invariants 
or  environmental  factors  that  consistently  determine  an  indi¬ 
vidual’s  level  of  intrinsic  motivation  across  contexts.  There  are 
some  aspects  of  work  that  have  been  identified  as  important 
for  facilitating  intrinsic  motivation  and  would  thus  be  considered
motivational  invariants.  For  instance,  providing  feedback
that  is  perceived  as  controlling  rather  than  informative  tends  to
undermine  a  sense  of  autonomy  and  competence  and  thereby 
reduces  intrinsic  motivation  (Deci,  Ryan,  &  Koestner,  1999). 
Careful  analyses  of  the  motivational  affordance  structure  will 
permit  design  of  tasks  that  are  more  likely  to  be  enjoyable  by 
rendering  the  tools  convivial  (Illich,  1973)  and  thereby  facilitat¬ 
ing  human-machine  synergy  (and  see  Hancock,  1997). 


PROBLEMS  FOR  FUTURE  RESEARCH 


In  this  section,  we  will  identify  the  areas  for  future  research. 
These  include  a  better  understanding  of  resources  and  quanti¬ 
fying  task  dimensions  defined  in  the  Hancock  and  Warm  (1989) 
model,  which  will  likely  reduce  to  the  thorny  problem  of  quan¬ 
tifying  human  information  processing  (see  Hancock,  Szalma,  & 
Oron-Gilad,  2005).  Further,  we  will  discuss  the  need  for  research 
on  performance-workload  associations  and  dissociations,  and 
the  evident  need  for  programmatic  investigation  of  individual 
differences  in  performance,  workload,  and  stress. 

The  Hancock  and  Warm  (1989)  model  of  stress  explicitly  iden¬ 
tified  task  dimensions  that  influence  stress  state  and  behavioral 
adaptability.  However,  the  metrics  for  these  dimensions,  and 
how  specific  task  characteristics  map  to  them,  have  yet  to  be 
fully  understood.  Thus,  future  research  should  aim  to  examine 
how  different  task  components  relate  to  performance  and  sub¬ 
jective  and  physiological  state.  Development  of  a  quantitative 
model  of  task  characteristics  will  permit  the  derivation  of  vec¬ 
tors  for  the  prediction  of  adaptability  under  stress.  Cognitive 
Neuroscience  and  Neuroergonomics  in  particular  offer  one 
promising  approach  to  such  development.  An  additional  step  in 
this  direction,  however,  will  be  facilitated  by  improved  quanti¬ 
tative  models  of  how  humans  process  information  (Hancock, 
Szalma,  &  Oron-Gilad,  2005). 

Understanding  Mental  Resources 

One  of  the  challenges  for  quantifying  human  information  pro¬ 
cessing  is  that  there  is  little  understanding  or  consensus  re¬ 
garding  the  capacities  that  process  the  information.  A  central 
concept  in  energetic  models  of  human  performance  is  mental 
resources.  Resource  theory  replaced  arousal  and  served  as  an 
intervening  variable  to  explain  the  relations  between  task  de¬ 
mand  and  performance.  However,  a  continual  problem  for  the 
resource  concept  is  to  operationally  define  what  it  is.  Most 
treatments  of  resources  use  that  term  metaphorically  (Navon 
&  Gopher,  1979;  Wickens,  1980,  1984),  and  failures  to  specify 
what  resources  are  have  led  some  to  challenge  the  utility  of  the 
concept  (Navon,  1984).  As  resource  theory  is  a  central  concept 
in  the  theories  of  stress  discussed  herein,  and  represents  one 
of  the  most  important  issues  to  be  resolved  in  future  research 
on  stress  and  performance,  we  now  turn  to  the  definitional 
concerns  associated  with  the  resource  construct  and  the  im¬ 
perative  for  future  research  to  refine  the  concept. 

Resource  metaphors.  Two  general  categories  of  resource 
metaphors  may  be  identified:  structural  metaphors  and  energetic
metaphors.  One  of  the  earliest  conceptualizations  of  resource
capacity  used  a  computer-based  metaphor  (Moray,  1967).
Thus,  cognitive  capacity  was  viewed  as  analogous  to  the  RAM 
and  processing  chip  of  a  computer,  consisting  of  information 
processing  units  that  can  be  deployed  for  task  performance. 
However,  the  structural  metaphor  has  been  applied  more  to 
theories  of  working  memory  than  to  attention  and  resource  the¬ 
ory.^  Most  early  resource  theories,  including  Kahneman’s  (1973) 
original  view  and  modifications  by  Norman  and  Bobrow  (1975), 
Navon  and  Gopher  (1979),  and  Wickens  (1980,  1984),  applied 
energetic  metaphors  to  resources,  and  conceptualized  them  as 
commodities  or  as  pools  of  energy  to  be  spent  on  task  perfor¬ 
mance.  In  general,  energetic  approaches  tend  to  employ  either 
economic  or  thermodynamic/hydraulic  metaphors.  The  eco¬ 
nomic  model  is  reflected  in  the  description  of  resources  in  terms 
of  supply  and  demand:  Performance  on  one  or  more  tasks  suf¬ 
fers  when  the  resource  demands  of  the  tasks  exceed  available 
supply.  Presumably,  the  total  amount  of  this  supply  fluctuates 
with  the  state  of  the  individual,  with  the  assets  diminishing  with 
increases  in  the  intensity  or  duration  of  stress.  Although  Kahneman’s
original  conception  allowed  for  dynamic  variation  in  available
resource  capacity,  most  early  models  assumed  a  fixed
amount  of  resources  (see  Navon  &  Gopher,  1979).  In  thermo¬ 
dynamic  analogies,  resources  are  a  fuel  that  is  consumed,  or  a 
tank  of  liquid  to  be  divided  among  several  tasks,  and  under 
stressful  conditions  the  amount  of  resources  available  is  de¬ 
pleted  and  performance  suffers.  In  discussing  his  version  of 
resource  theory,  Wickens  (1984)  warned  that  the  hydraulic 
metaphor  should  not  be  taken  too  literally,  but  most  subsequent 
descriptions  of  resources  have  employed  visual  representations 
of  resources  as  just  this  form  (e.g.,  a  tank  of  liquid).  Similarly, 
many  discussions  of  resource  availability  and  expenditure  adopt 
the  economic  language  of  supply  and  demand,  and  Navon  and 
Gopher  explicitly  adopted  principles  of  microeconomics  in  de¬ 
veloping  their  approach.  An  additional  problem  for  resource 
theory  is  that  in  most  cases  (e.g.,  Navon  &  Gopher,  1979;  Wick¬ 
ens,  1980,  1984),  the  structural  and  energetic  metaphors  were 
treated  as  interchangeable,  a  further  testament  to  the  ambiguity 
of  the  construct. 
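
To make the economic (supply and demand) metaphor concrete, the following is a deliberately simplified sketch. It is not a model proposed by any of the authors cited: the function names, the linear reduction of supply with a stress value between 0 and 1, and the proportional sharing of a shortfall across tasks are all assumptions chosen for illustration. It shows only the core claim of the metaphor, that performance suffers once the combined demand of the tasks exceeds the available supply, and that stress shrinks that supply.

```python
def available_supply(base_supply, stress):
    """Supply remaining under stress (hypothetical linear depletion).

    `stress` is clamped to the range 0-1; 0 leaves supply intact,
    1 depletes it entirely.
    """
    stress = min(max(stress, 0.0), 1.0)
    return base_supply * (1.0 - stress)


def task_performance(demands, supply):
    """Return performance (0-1) per task under a single shared resource pool.

    If total demand fits within supply, every task is performed fully.
    Otherwise the shortfall is shared in proportion to demand, so each
    task receives the same fraction of what it requires.
    """
    total_demand = sum(demands.values())
    if total_demand <= supply:
        return {task: 1.0 for task in demands}
    fraction_met = supply / total_demand
    return {task: fraction_met for task in demands}


demands = {"tracking": 40.0, "monitoring": 40.0}
for stress in (0.0, 0.5):
    supply = available_supply(100.0, stress)
    print(stress, task_performance(demands, supply))
```

The sketch treats resources as a single undifferentiated pool of fixed base size, which is exactly the kind of simplification that the criticisms of the resource construct target.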

A  problem  with  using  nonbiological  metaphors  to  represent 
biological  systems  is  that  such  models  often  fail  to  capture  the 
complexity  and  the  unique  dynamic  characteristics  (e.g.,  adaptive
responses)  of  living  systems.  For  instance,  a  hydraulic  model  of 
resources  links  the  activity  of  a  tank  of  liquid,  governed  by  ther¬ 
modynamic  principles,  to  the  action  of  arousal  mechanisms  or 
energy  reserves  that  are  allocated  for  task  performance.  How¬ 
ever,  a  thermodynamic  description  of  the  physiological  pro¬ 
cesses  underlying  resources  is  at  a  level  of  explanation  that  may 
not  adequately  describe  the  psychological  processes  that  gov¬ 
ern  performance.  Thermodynamic  principles  can  be  applied  to 
the  chemical  processes  that  occur  within  and  between  neurons, 
but  they  may  be  less  useful  in  describing  the  behavior  of  large 
networks  of  neurons.^  Similarly,  economic  metaphors  of  supply
and  demand  may  not  adequately  capture  the  relation  between
cognitive  architecture  and  the  energy  allocated  for  its  function.
Economic  models  of  resources  define  them  as  commodities  to 
be  spent  on  one  or  more  activities,  and  they  assume  an  isomorphism
between  human  cognitive  activity  and  economic  activity,
an  assumption  that  may  not  be  tenable.  Indeed,  Navon  and
Gopher  (1979)  admitted  that  their  static  economic  metaphor 
for  multiple  resources  may  need  to  be  replaced  by  a  dynamic 
one  that  includes  temporal  factors  (e.g.  serial  versus  parallel 
processing;  activity  of  one  processing  unit  being  contingent 
upon  the  output  of  another).  Such  concerns  over  the  metaphors 
used  to  describe  resources  are  hardly  new  (Navon,  1984;  Wick¬ 
ens,  1984),  but  their  use  has  become  sufficiently  ingrained  in 
thinking  about  resources  and  human  performance  that  reeval¬ 
uation  of  the  metaphors  is  warranted.  A  regulatory  model  based 
on  physiology  may  serve  as  a  better  metaphor  (and,  in  the  future,
may  serve  to  describe  resources  themselves  to  the  extent  that 
they  can  be  established  as  a  hypothetical  construct)  to  describe 
the  role  of  resources  in  human  cognition  and  performance. 
However,  even  a  physiologically-based  theory  of  resources  must 
be  tempered  by  the  problems  inherent  in  reducing  psychologi¬ 
cal  processes  to  physiological  activity. 

Function  of  resources.  Another  problem  for  resource 
theory  is  the  absence  of  a  precise  description  of  how  resources 
control  different  forms  of  information  processing.  Do  resources 
determine  the  energy  allocated  to  an  information  processor 
(Kahneman,  1973),  do  they  provide  the  space  within  which  the 
processing  structure  works  (Moray,  1967),  or  does  the  processor 
draw  on  the  resources  as  needed  (and  available)?  In  the  latter 
case,  the  cognitive  architecture  would  drive  energy  consump¬ 
tion  and  allocation,  but  the  locus  of  control  for  the  division  of 
resources  remains  unspecified  in  any  case.  Presumably,  an  ex¬ 
ecutive  function  that  either  coordinates  information  processors 
drawing  on  different  pools  of  resources  or  decides  how  re¬ 
sources  will  be  allocated  must  itself  consume  resources,  in 
terms  of  both  energy  required  for  decision  making  and  mental 
space  or  structure  required.  Hence,  resource  theory  does  not 
solve  the  homunculus  problem  for  theories  of  attention,  nor 
does  it  adequately  describe  resource  allocation  strategies  be¬ 
hind  performance  of  information  processing  tasks. 

Empirical  tests  of  the  model.  Navon  and  Gopher  (1979) 
commented  on  the  problem  of  empirically  distinguishing  de¬ 
clines  in  performance  due  to  insufficient  supply  from  those 
resulting  from  increases  in  demand.  They  asked,  “When  the  per¬ 
formance  of  a  task  deteriorates,  is  it  because  the  task  now  gets 
fewer  resources  or  because  it  now  requires  more?”  (p.  243). 
Navon  and  Gopher  characterized  the  problem  as  distinguish¬ 
ing  between  changes  in  resources  and  changes  in  the  subject- 
task  parameters  that  constrain  resource  utilization,  and  they  of¬ 
fered  two  approaches  to  avoid  this  difficulty.  One  approach  is  to 
define  the  fixed  constraints  of  the  task  and  observe  how  the  in¬ 
formation  processing  system  manages  the  processes  within  those 
constraints.  The  degree  of  freedom  of  the  system,  in  this  view,


^This  is  a  curious  historical  development,  since  these  relatively  separate  areas  of  research  converge  on  the  same  psychological  processes. 

^The  argument  here  is  not  that  neural  structures  are  not  constrained  by  the  laws  of  thermodynamics — clearly  they  are — but  that  thermodynamic  prin¬ 
ciples  implied  by  the  metaphor  are  not  sufficient  for  the  development  of  a  complete  description  of  resources  and  their  relation  to  cognitive  activity. 
is  the  pool  of  resources  available,  in  which  the  term  resource  is
interpreted  broadly  to  include  quality  of  information,  number  of 
extracted  features,  or  visual  resolution.  The  subject-task  para¬ 
meters  define  what  is  imposed  on  the  system  (the  demands) 
and  the  resources  refer  to  what  the  system  does  in  response  to 
the  demands  (allocation  of  processing  units).  From  this  per¬ 
spective  resources  can  be  manipulated  by  the  information  pro¬ 
cessing  system  within  the  constraints  set  by  the  subject-task 
parameters.  A  second  approach  is  to  distinguish  the  kind  of  con¬ 
trol  the  system  exerts  on  resources,  between  control  on  the  use 
of  processing  devices  (what  we  have  called  “structure”)  and  the 
control  of  the  properties  of  the  inputs  that  go  into  these  de¬ 
vices.  The  devices  are  processing  resources.  The  other  kind  of 
control  is  exerted  on  input  resources,  which  represents  the  flex¬ 
ibility  the  person  has  for  determining  which  inputs  are  oper¬ 
ated  on,  as  determined  by  subject-task  parameters.  Processing 
resources  are  limited  by  the  capacities  of  the  information 
processors,  while  the  input  resources  are  limited  by  subject-task 
parameters  (and  allocation  strategies  that  determine  which  in¬ 
formation  the  operator  attends  to).  Presumably,  the  individual 
would  have  some  control  over  the  allocation  strategy,  in  terms 
of  the  processing  resources  devoted  to  a  task,  although  these 
can  also  be  driven  by  task  demands  (e.g.  a  spatial  task  requires 
spatial  processing  units).  Navon  and  Gopher  did  not  advocate 
either  approach,  but  presented  them  as  alternatives  for  further 
investigation.  The  implication  for  examining  the  resource  model 
of  stress  is  that  one  must  manipulate  both  the  subject-task  pa¬ 
rameters  (e.g.  by  varying  the  psychophysical  properties  of  the 
stimulus,  manipulating  the  state  of  the  observer,  or  varying  the 
kind  of  information  processing  demanded  by  the  task)  as  well  as 
the  allocation  strategies  the  operator  uses  (the  input  resources — 
e.g.  payoff  matrices,  task  instructions).  This  would  provide  in¬ 
formation  regarding  how  specific  stressors  impair  specific  in¬ 
formation  processing  units  and  how  they  change  the  user’s  re¬ 
source  allocation  strategies  in  the  presence  of  stress  that  is 
continuously  imposed  on  operators  of  complex  computer-based 
systems. 
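
The supply-versus-demand ambiguity that Navon and Gopher describe can be illustrated with a minimal sketch. It assumes, purely for illustration, that performance equals the fraction of demanded resources that is actually supplied, capped at 1. The function name, the functional form, and the numeric values are hypothetical and are not taken from Navon and Gopher (1979).

```python
def performance(supply: float, demand: float) -> float:
    """Toy performance-resource function (an assumption, not a published model).

    Returns the proportion of the task's demand that is met by the
    resources supplied to it, capped at 1.0.
    """
    if demand <= 0:
        raise ValueError("demand must be positive")
    return min(1.0, supply / demand)


# Baseline: the task receives exactly what it requires.
baseline = performance(supply=8.0, demand=8.0)

# Scenario A: the task now gets fewer resources (supply is halved).
reduced_supply = performance(supply=4.0, demand=8.0)

# Scenario B: the task now requires more (demand is doubled).
increased_demand = performance(supply=8.0, demand=16.0)
```

Under this assumed form both scenarios yield the same performance score, so the score alone cannot reveal which of the two changes occurred. This is why the text argues for manipulating subject-task parameters and allocation strategies separately.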

In  a  later  article,  Navon  (1984)  moved  to  a  position  less  favorable
toward  resources  than  the  earlier  approach,  asserting
that  predictions  derived  from  resource  theory  could  be  made,  and
results  explained,  without  appealing  to  the  resource  concept 
(see  also  Rugg,  1986).  One  could  instead  interpret  effects  in 
terms  of  the  outputs  of  information  processors.  Most  manipu¬ 
lations,  such  as  difficulty  (which  in  his  view  influences  the  effi¬ 
ciency  of  a  unit  of  resources)  or  complexity  (which  affects  the 
load,  or  the  number  of  operations  required)  influence  the  de¬ 
mand  for  processing,  with  supply  having  no  impact  upon  their 
interaction.  However,  this  approach  assumes  a  clear  distinction 
between  outputs  of  a  processing  system  and  the  concept  of  a  re¬ 
source,  and  Navon’s  notion  of  specific  processors  seems  blurred 
with  the  notion  of  a  resource,  as  both  are  utilized  for  task  performance.
Nevertheless,  his  critique  regarding  the
vagueness  of  the  resource  concept  is  relevant,  and  Navon
argued  that  if  resources  are  viewed  as  an  intervening  variable 
rather  than  a  hypothetical  construct,  the  concept  has  utility  for 
describing  the  process. 

Structural  mechanisms.  If  different  kinds  of  information 
processing  draw  on  different  kinds  of  resources,  in  terms  of  the
information  processors  engaged  in  a  task,  stressors  may  have
characteristic  effects  on  each  resource.  In  addition,  as  Navon  and 
Gopher  (1979)  noted,  an  aspect  of  resource  utilization  is  the  ef¬ 
ficiency  of  each  resource  unit.  It  may  be  that  stress  degrades  the 
efficiency  of  information  processing  units,  independent  of  energy
level  or  allocation  strategy  (cf.  Eysenck,  M.  W.,  &  Calvo,
1992).  Investigation  of  such  effects  could  be  accomplished  by
transitioning  between  tasks  requiring  different  kinds  of  information
processing  and  determining  if  the  effects  of  stress  on  one
structure  impact  the  efficiency  of  a  second  structure.

The  quality  of  resources  can  vary  not  only  in  terms  of  the 
kind  of  information  processing  unit  engaged,  but  also  in  terms 
of  the  kind  of  task  required.  Following  Rasmussen’s  (1983) 
classification  system  for  behavior  as  a  heuristic  for  design, 
some  tasks  require  knowledge-based  processing,  in  which  the 
operator  must  consciously  rely  on  his  or  her  mental  model  of 
the  system  in  order  to  achieve  successful  performance.  Other 
tasks  fall  under  the  category  of  rule-based  behavior,  in  which 
a  set  of  rules  or  procedures  define  task  performance.  The  third
category  is  skill-based  behavior,  in  which  the  task  is  performed 
with  a  high  degree  of  automaticity.  Presumably  each  kind  of 
task  requires  different  amounts  of  resources,  but  they  may  also 
represent  qualitatively  different  forms  of  resource  utilization. 
In  other  words,  these  tasks  may  differ  in  the  efficiency  of  a  unit
of  resources  as  well  as  in  effort  allocation  strategies.  As
task  performance  moves  from  knowledge-  to  rule-  to  skill-based
processing  (e.g.,  with  training),  the  cognitive  architecture  may
change  such  that  fewer  information  processing  units  are  required,
and  those  that  are  engaged  become  more  efficient.
Moreover,  the  way  in  which  each  of  these  systems  degrades
with  time  under  stress  may  be  systematic,  with  the  more  fragile
knowledge-based  processing  degrading  first,  followed  by
rule-based  processing,  with  skill-based  processing  degrading
last  (at  this  point,  one  may  begin  to  see  breakdown  of  not  only
psychological  processes  but  physiological  ones  as  well;  see
Hancock  &  Warm,  1989).  This  degradation  may  follow  a  hysteresis
function,  such  that  a  precipitous  decline  in  performance
occurs  as  the  operator’s  resource  capacity  is  reduced
below  a  minimum  threshold  for  performance.  Moreover,  these
processes  may  recover  in  an  inverse  form,  with  skill-based  pro¬ 
cessing  recovering  first,  followed  by  rule  and  knowledge-based 
processing. 
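
The proposed ordering of degradation and recovery can be sketched as a simple threshold model. The sketch assumes that each level of processing stops functioning when available resource capacity falls below a level-specific minimum, with knowledge-based processing having the highest minimum. The threshold values, the capacity scale, and the function names are hypothetical. Path dependence (true hysteresis, in which recovery thresholds differ from failure thresholds) is deliberately left out.

```python
# Hypothetical minimum capacity needed for each level of processing.
# Knowledge-based processing is assumed to be the most fragile.
THRESHOLDS = {"knowledge": 0.6, "rule": 0.4, "skill": 0.2}


def functioning_levels(capacity):
    """Return the processing levels that still function at this capacity."""
    return [level for level, minimum in THRESHOLDS.items() if capacity >= minimum]


def order_of_change(capacities):
    """Return the order in which levels fail or recover along a trajectory.

    `capacities` is a sequence of capacity values over time. A level is
    recorded each time its status (functioning or not) changes.
    """
    order = []
    previous = set(functioning_levels(capacities[0]))
    for capacity in capacities[1:]:
        current = set(functioning_levels(capacity))
        for level in THRESHOLDS:
            if (level in previous) != (level in current):
                order.append(level)
        previous = current
    return order


decline = [0.7, 0.5, 0.3, 0.1]      # capacity falling under sustained stress
recovery = list(reversed(decline))  # capacity returning after stress ends

failure_order = order_of_change(decline)    # ['knowledge', 'rule', 'skill']
recovery_order = order_of_change(recovery)  # ['skill', 'rule', 'knowledge']
```

With a single ordered set of thresholds, the inverse recovery order follows automatically. Whether real operators show this symmetry is the empirical question raised in the text.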

Note  that  it  may  be  difficult  to  distinguish  pure  knowledge- 
based  processing  from  rule-  or  skill-based  activity.  An  alternative 
formulation  is  the  distinction  between  controlled  and  automatic 
processing  (Schneider  &  Shiffrin,  1977).  Although  originally 
conceived  as  categories,  it  is  likely  that  individuals  engaged  in 
real-world  tasks  utilize  both  automatic  and  controlled  process¬ 
ing  for  different  aspects  of  performance  and  that  for  a  given  task 
there  are  levels  of  automaticity  possible.  Treating  skills  as  a  con¬ 
tinuum  rather  than  as  discrete  categories  may  be  a  more  theo¬ 
retically  useful  framework  for  quantifying  resources  and  infor¬ 
mation  processing,  and  thereby  elucidating  the  effects  of  stress 
on  performance. 

Energetic  mechanisms.  To  investigate  the  energetic  as¬ 
pects  of  resources,  one  must  manipulate  environmentally- 
based  perturbations,  in  the  form  of  external  stressors  (noise, 
heat)  and  task  demands,  to  systematically  affect  inflow  versus 
outflow  of  energy.  Presumably,  inflow  is  controlled  by  arousal
levels,  physiological  energy  reserves,  and  effort.  One  could  examine
performance  under  manipulations  of  energetic  resources
during  dual-task  performance  (e.g.,  what  happens  to
performance  on  two  tasks  under  sleep  deprivation  or  caffeine
consumption?).  For  example,  the  steady  state  can  be  perturbed
by  increasing  (e.g.,  caffeine)  or  decreasing  (e.g.,  sleep
deprivation)  energy  while  systematically  varying  the  demands
for  two  tasks.

Structure  and  energy.  Another  empirical  challenge  is  to 
distinguish  resources  as  structure  from  resources  as  energy. 
Given  the  definitional  problems  associated  with  the  resource 
concept,  it  is  not  clear  whether  performance  declines  because 
of  reduction  in  energy  level  or  degradation  in  structures  (e.g., 
failures  or  declines  in  the  efficiency  of  the  processing  units),  or 
a  combination  of  both.  If  structure  and  energy  are  distinct  ele¬ 
ments  of  resources,  it  is  hypothetically  possible  to  manipulate 
one  while  holding  the  other  constant,  although  the  validity  of 
that  assumption  is  questionable.  Is  it  possible  to  manipulate 
specific  forms  of  information  processing  under  constant  energy 
level?  Is  it  possible  to  manipulate  energy  level  independent  of 
which  cognitive  processes  are  utilized?  If  the  decline  in  available 
resources  is,  at  least  in  part,  due  to  the  degradation  of  particular 
information  processing  units,  then  transferring  to  a  task  requir¬ 
ing  the  same  processor  should  lead  to  worse  performance  than 
transferring  to  one  that  is  different  (cf.  Wickens,  1980,  1984).
For  instance,  if  a  person  engages  in  a  task  requiring  verbal  work¬ 
ing  memory  while  under  stress,  then  transitions  to  a  task  re¬ 
quiring  spatial  discrimination,  performance  on  the  latter  should 
depend  only  on  energetic  factors,  not  on  structural  ones.  Note, 
however,  that  in  this  case  the  effects  of  different  mental  capaci¬ 
ties  would  be  confounded  with  the  effects  of  novelty  and  moti¬ 
vation  on  performance. 
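
The logic of this transfer test can be written out as a small sketch. It assumes, only for illustration, that predicted performance is the product of a global energy level and the efficiency of the specific processor a task engages. The multiplicative form, the processor labels, and all numeric values are hypothetical. The novelty and motivation confounds noted above are not modeled.

```python
def predicted_performance(energy, efficiency, processor):
    """Toy prediction: global energy scaled by processor-specific efficiency."""
    return energy * efficiency[processor]


# Hypothetical state after a stressful verbal working memory task:
# the verbal processor is degraded, the spatial processor is intact.
efficiency = {"verbal": 0.5, "spatial": 1.0}
energy = 0.8  # hypothetical energy level, reduced from a nominal 1.0

same_processor = predicted_performance(energy, efficiency, "verbal")
different_processor = predicted_performance(energy, efficiency, "spatial")
```

If structural degradation contributes to the decline, transfer to the same processor should be worse than transfer to a different one. In this sketch, performance on the different-processor task reflects the energy level alone.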

Application  of  neuroergonomics.  The  burgeoning 
field  of  Neuroergonomics  seeks  to  identify  the  neural  bases  of 
psychological  processes  involved  in  real-world  human-technology
interaction  (Parasuraman,  2003).  As  we  have  stated  elsewhere
(Hancock  &  Szalma,  2007),  recent  advances  in  Neuroergonomics
promise  to  identify  cognitive  processes  and  their  link
to  neurological  processes.  This  may  permit  a  more  robust  and 
quantitative  definition  of  resources,  although  we  caution  that  a 
reductionistic  approach  is  not  likely  to  be  fruitful  (and  see  Han¬ 
cock  &  Szalma,  2003b).  In  addition,  the  stress  concept  itself 
rests  in  part  on  more  precise  definitions  of  resources  (Hancock 
&  Szalma,  2007).  Thus,  resolution  of  the  resource  issue  in  re¬ 
gard  to  cognitive  processing  and  task  performance  would  also 
clarify  the  workload  and  stress  concepts.  We  view  Neuroergonomics
as  one  promising  avenue  for  future  research  to  refine
the  workload,  stress,  and  resource  concepts.

Development  of  the  Adaptation  under  Stress  Model 

Quantify  the  task  dimensions.  A  major  challenge  for 
the  Hancock  and  Warm  (1989)  model  is  the  quantification  of  the 
base  axes  representing  task  dimensions.  Specification  of  these 
dimensions  is  necessary  if  the  vector  representation  postulated
by  Hancock  and  Warm  is  to  be  developed  and  if  the  resource
construct  is  to  be  more  precisely  defined  and  quantified.  How¬ 
ever,  task  taxonomies  that  are  general  across  domains  present  a 
theoretical  challenge,  because  they  require  an  understanding 
and  quantification  of  how  individuals  process  information  along 
the  spatial  and  temporal  task  dimensions,  and  how  these  change 
under  stressful  conditions.  Quantification  of  information  pro¬ 
cessing,  and  subsequent  quantification  of  the  base  axes  in  the 
Hancock  and  Warm  model,  permit  the  formalization  of  the  vec¬ 
tor  representation  of  adaptive  state  under  stress  (see  Fig.  6.4). 

Attentional  narrowing.  Recall  that  Hancock  and  Weaver 
(2005)  argued  that  the  distortions  of  spatial  and  temporal  per¬ 
ception  have  a  common  attentional  mechanism.  Two  implica¬ 
tions  of  this  assertion  are  (a)  that  events  (internal  or  external) 
that  distort  one  dimension  will  distort  the  other,  and  (b)  that 
these  distortions  are  unlikely  to  be  orthogonal.  With  very  few 
exceptions,  little  research  has  addressed  the  possibility  of  an  in¬ 
teraction  between  distortions  of  spatial  and  temporal  percep¬ 
tions  in  stressful  situations  on  operator  performance.  Preliminary 
evidence  suggests  that  these  two  dimensions  may  in  fact  not 
share  a  common  mechanism  (Ross  et  al.,  2003;  Thropp  et  al., 
2004),  although  further  research  is  needed  to  confirm  these  find¬ 
ings.  An  additional  important  issue  for  empirical  research  is 
whether  we  are  dealing  with  time-in-memory  or  time-in-passing 
(and  to  some  extent  space-in-memory  vs.  space-in-passing). 
Thus,  the  way  in  which  perceptions  of  space  and  time  interact 
to  influence  operator  state  will  depend  upon  how  temporal  per¬ 
ceptions  (and  spatial  perception,  for  that  matter)  are  measured. 

A  possible  explanation  for  perceptual  distortions  under  con¬ 
ditions  of  heavy  workload  and  stress  concerns  the  failure  to 
switch  tasks  when  appropriate.  Switching  failures  may  be  re¬ 
sponsible  for  the  observation  in  secondary  task  methodology 
that  some  participants  have  difficulty  dividing  their  time  be¬ 
tween  tasks  as  instructed  (e.g.,  70%  to  the  primary  task  and  30% 
to  the  secondary  task).  This  difficulty  may  result  from  the  participant’s
inability  to  accurately  judge  how  long  he  or  she  has  attended
to  each  task  during  a  given  time  period.  The  degree  to
which  distortions  in  perception  of  space-time  are  related  to  im¬ 
pairments  in  task  switching  under  stressful  conditions,  and  the 
degree  to  which  these  distortions  are  related  to  attention  allo¬ 
cation  strategies  in  a  secondary  task  paradigm,  are  questions  for 
empirical  resolution. 
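
One way to quantify such switching failures is to compare the instructed division of time with the division actually observed. The sketch below assumes that a log of (task, seconds attended) episodes is available, for example from eye tracking or interface events. The log format, the function names, and the example values are hypothetical placeholders, not data from any study.

```python
def allocation(log):
    """Return the observed share of total time spent on each task."""
    totals = {}
    for task, seconds in log:
        totals[task] = totals.get(task, 0.0) + seconds
    grand_total = sum(totals.values())
    return {task: seconds / grand_total for task, seconds in totals.items()}


def allocation_error(observed, instructed):
    """Observed minus instructed share for each task in the instructions."""
    return {
        task: observed.get(task, 0.0) - share
        for task, share in instructed.items()
    }


# Placeholder episode log: alternating attention between two tasks.
log = [("primary", 30), ("secondary", 30), ("primary", 20), ("secondary", 20)]
instructed = {"primary": 0.70, "secondary": 0.30}

observed = allocation(log)
error = allocation_error(observed, instructed)
```

In this placeholder log the participant splits time evenly, so the primary task receives about 20 percentage points less than instructed. Relating such errors to measures of temporal distortion would address the question posed above.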

Stressor  characteristics.  Even  if  space  and  time  do  pos¬ 
sess  a  common  mechanism,  it  may  be  that  specific  stressors  do 
not  affect  spatial  and  temporal  perceptions  in  the  same  way.  For 
instance,  heat  and  noise  may  distort  perception  of  both  space 
and  time,  but  not  to  the  same  degree  or  in  the  same  fashion.  It 
is  important  to  note  that  spatial  and  temporal  distortions  may 
be  appraised  as  stressful,  as  they  might  interfere  with  the  infor¬ 
mation  processing  requirements  of  a  task.  Consequently,  some 
kinds  of  information  processing  might  be  more  vulnerable  to 
one  or  the  other  kind  of  perceptual  distortion.  Clearly,  perfor¬ 
mance  on  tasks  requiring  spatial  abilities,  such  as  mental  rota¬ 
tion,  could  suffer  as  a  result  of  spatial  distortion,  but  they  might 
be  unaffected  (or,  in  some  cases,  facilitated)  by  temporal  distortion.
Other  tasks,  such  as  those  that  rely  heavily  on  working
memory,  mathematical  ability,  or  target  detection,
could  each  show  different  patterns  of  change  in  response
to  space-time  distortion.

Potential  benefits  of  space-time  distortion.  Under 
certain  conditions,  the  narrowing  of  spatial  attention  can  benefit
performance  through  the  elimination  of  irrelevant  cues.  The
precise  conditions  under  which  this  occurs,  however,  remain
unclear.  In  addition,  it  is  important  to  identify  the  circumstances
under  which  time  distortion  might  actually  prove  beneficial. 
Here,  operators  perceive  that  they  have  additional  time  to  com¬ 
plete  the  task  at  hand  (Hancock  &  Weaver,  2005).  This  would 
have  great  benefit  in  task  performance  situations  where  atten- 
tional  narrowing  is  less  likely  to  have  deleterious  effects.  At  this 
point,  this  is  an  empirical  question  that  might  be  amenable  to 
controlled  testing. 

Changes  in  adaptation:  the  roles  of  time  and  intensity.  The  degree
to  which  a  task  or  the  physical  and  social  environment
imposes  stress  is  moderated  by  the  characteristics  of  the
stimuli  as  well  as  the  context  in  which  events  occur.  However, 
two  factors  that  seem  to  ubiquitously  influence  how  much  stress 
impairs  adaptation  are  the  (appraised)  intensity  of  the  stressor 
and  the  duration  of  exposure.  We  have  recently  reported  meta- 
analytic  evidence  that  these  two  factors  jointly  impact  task  per¬ 
formance  across  different  orders  of  task  (e.g.,  vigilance,  prob¬ 
lem  solving,  tracking;  see  Hancock,  Ross,  &  Szalma,  in  press). 
Duration  is  further  implicated  in  information  processing  itself, 
and  may  be  a  central  organizing  principle  for  information  pro¬ 
cessing  in  the  brain  (Hancock,  Szalma,  &  Oron-Gilad,  2005). 
Empirical  research  is  needed,  however,  to  programmatically  ex¬ 
plore  the  interactive  effects  of  these  two  variables  across  multi¬ 
ple  forms  of  information  processing. 

Understanding  Performance-Workload 
Associations/Dissociations 

Task  factors.  Although  Hancock  (1996)  and  Yeh  and 
Wickens  (1988)  articulated  the  patterns  of  performance-work¬ 
load  relations  and  how  these  are  diagnostic  with  respect  to  pro¬ 
cessing  requirements,  there  has  been  little  systematic  effort  to 
further  investigate  these  associations/dissociations.  The  primary 
question  is  what  factors  drive  dissociations  and  insensitivities 
when  they  occur.  For  instance,  for  vigilance,  mostly  associations
are  observed,  while  for  other  tasks,  such  as  those  with  high 
working  memory  demand,  dissociations  are  more  common 
(Yeh  &  Wickens,  1988).  Enhanced  understanding  of  these  rela¬ 
tions  would  inform  the  Hancock  and  Warm  (1989)  model  by 
permitting  specification  of  the  conditions  under  which  individ¬ 
uals  pass  over  the  thresholds  of  failure  at  each  level  of  person- 
environment  transaction/adaptation. 

Multidimensionality  of  workload.  To  date,  considera¬ 
tion  of  performance-workload  dissociations  has  been  primarily 
concerned  with  global  measures  of  perceived  workload.  How¬ 
ever,  there  is  clear  evidence  that  perceived  workload  is  in  fact 
multidimensional.  For  instance,  vigilance  tasks  are  characterized 
by  high  levels  of  mental  demand  and  frustration  (Warm,  Dember, 
&  Hancock,  1996).  It  is  likely  that  the  pattern  of  performance-workload
links  will  differ  not  only  for  different  orders  of  performance
(different  tasks)  but  also  for  different  dimensions  of
workload.  One  approach  to  addressing  this  question  would  be  to
systematically  manipulate  combinations  of  these  two  variables. 
For  instance,  if  we  consider  performance  in  terms  of  detection
sensitivity,  memory  accuracy,  and  speed  of  response,  and  consider
the  dimensions  of  workload  defined  by  the  NASA  Task  Load  Index
(Hart  &  Staveland,  1988),  one  could  examine  how  variations
in  memory  load  or  discrimination  difficulty  link  to  each  subscale.
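
The systematic crossing suggested here can be enumerated explicitly. The sketch below lists every combination of task manipulation, performance measure, and NASA Task Load Index subscale that such a program of research would need to examine. The manipulations and performance measures are those named in the text, and the six subscale names are the standard ones of the index. The variable names and the dictionary layout are illustrative assumptions.

```python
from itertools import product

# Task manipulations and performance measures named in the text.
manipulations = ["memory load", "discrimination difficulty"]
performance_measures = [
    "detection sensitivity",
    "memory accuracy",
    "speed of response",
]

# The six subscales of the NASA Task Load Index.
tlx_subscales = [
    "Mental Demand",
    "Physical Demand",
    "Temporal Demand",
    "Performance",
    "Effort",
    "Frustration",
]

# Each cell is one performance-workload link to be tested for
# association, dissociation, or insensitivity.
design = [
    {"manipulation": m, "performance": p, "workload": w}
    for m, p, w in product(manipulations, performance_measures, tlx_subscales)
]

n_cells = len(design)  # 2 manipulations x 3 measures x 6 subscales = 36
```

Even this small example yields 36 links, which suggests why a task taxonomy is needed to decide which cells deserve priority.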


INDIVIDUAL  DIFFERENCES  IN  PERFORMANCE, 
WORKLOAD,  AND  STRESS 


Elsewhere,  we  reviewed  the  relation  of  individual  differences
in  state  and  trait  to  efforts  to  quantify  human  information
processing  (Szalma  &  Hancock,  2005).  Here,  we  address
how  individual  differences  (state  and  trait)  are  related  to  stress 
and  coping. 

Trait  Differences 

Individual  differences  research  has  been  a  relatively  neglected 
area  in  human  factors  and  experimental  psychology.  Much  of 
the  early  work  on  individual  differences  was  done  by  individuals 
not  concerned  with  human-technology  interaction,  to  the 
extent  that  a  bifurcation  between  two  kinds  of  psychology 
occurred  (Cronbach,  1957).  There  is  evidence,  however,  that  af¬ 
fective  traits  influence  information  processing  and  perfor¬ 
mance.  Thus,  extraversion  is  associated  with  superior  perfor¬ 
mance  in  working  memory  tasks  and  divided  attention,  but  also 
with  poorer  sustained  attention  (cf.  Koelega,  1992).  Trait  anxiety
is  associated  with  poorer  performance,  although  results  vary
across  task  types  and  contexts  (Matthews  et  al.,  2003).  A  possi¬ 
ble  next  step  for  such  research  will  be  to  systematically  vary  task 
elements,  as  discussed  previously  in  the  context  of  the  Hancock 
and  Warm  (1989)  model,  and  test  hypotheses  regarding  how 
trait  anxiety  relates  to  specific  task  components  (for  an  example 
applied  to  Extraversion,  see  Matthews,  1992).  The  theoretical 
challenge  for  such  an  undertaking  is  that  it  requires  a  good  tax¬ 
onomic  scheme  for  tasks  as  well  as  a  well-articulated  theory  of 
traits  and  performance.  However,  trait  theories  have  neglected 
specific  task  performance,  focusing  instead  on  global  measures 
(e.g.,  see  Barrick,  Mount,  &  Judge,  2001),  and  there  is  a  lack  of 
a  comprehensive  theory  to  account  for  trait-performance  rela¬ 
tions  (Matthews  et  al.,  2003).  Most  current  theories  are  more 
like  frameworks  that  do  not  provide  specific  mechanisms  for 
how  personality  impacts  cognition  and  performance  (e.g.,  see 
McCrae  &  Costa,  1999).  Although  H.  J.  Eysenck  (1967)  proposed 
a  theory  of  personality  based  on  arousal  and  activation,  which 
has  found  some  support  (Eysenck  &  Eysenck,  1985),  there  has 
also  been  evidence  that  arousal  and  task  difficulty  fail  to  inter¬ 
act  as  predicted  (Matthews,  1992).  H.  J.  Eysenck’s  (1967)  theory 
was  also  weakened  by  the  general  problems  associated  with 
arousal  theory  accounts  of  stress  effects  (Hockey,  R.,  1984).
An  alternative  formulation  is  that  of  Gray  (1991),  who  argued
for  two  systems,  one  responding  to  reward  signals  and  one  to
punishment  signals.  The  behavioral  activation  system  (BAS)  is  associated
with  positive  affect,  while  the  behavioral  inhibition  system
(BIS)  is  associated  with  negative  affect.  In  a  review  and  comparison  of  the  H.  J.
Eysenck  and  Gray  theories,  Matthews  and  Gilliland  (1999)  concluded
that  both  theories  have  only  been  partially  supported,
but  that  Gray’s  BAS/BIS  distinction  provides  a  superior  match 
to  positive  and  negative  affect  relative  to  H.  J.  Eysenck’s  arousal 
dimensions.  Further,  the  BAS/BIS  accords  with  theories  of  ap¬ 
proach/avoidance  motivation  (e.g.,  Elliot  &  Covington,  2001). 
There  are  also  theories  that  focus  on  a  particular  trait,  such  as 
Extraversion  (Humphreys  &  Revelle,  1984)  or  trait  anxiety 
(Eysenck,  M.  W.,  &  Calvo,  1992).  While  useful,  such  specific  theories
do  not  encompass  other  traits  or  interactions  among  traits.
Such  interactive  effects  can  influence  cognitive  performance 
and  perceived  stress  and  workload  (Szalma,  Oron-Gilad,  Stafford, 
&  Hancock,  2005).  These  interactions  should  be  further  studied 
with  an  eye  to  linking  them  to  information  processing  theories. 

Affective  State  Differences 

It  is  intuitive  that  stress  would  induce  more  negative  affective 
states,  and  that  traits  would  influence  performance  via  an  effect
on  states.  For  instance,  one  would  expect  that  trait  anxiety
would  influence  performance  because  high  trait  anxious  indi¬ 
viduals  experience  state  anxiety  more  frequently  than  those  low 
on  that  trait.  While  such  mediation  effects  are  observed,  there  is
also  evidence  that,  for  certain  processes,  such  as  hypervigilance
to  threat,  trait  anxiety  is  a  better  predictor  of  performance  than
state  anxiety  (Eysenck,  M.  W.,  1992).  In  terms  of  appraisal  theory,
traits  may  influence  the  form  and  content  of  appraisal,  as  well  as 
the  coping  skills  the  individual  can  deploy  to  deal  with  the 
stress.  In  regard  to  adaptation,  it  is  likely  that  individual  differences
in  both  trait  and  state  will  influence  adaptation,  both
behavioral  and  physiological,  by  affecting  the  width  of  the 
plateau  of  effective  adaptation  at  a  given  level,  and  by  changing 
the  slope  of  decline  in  adaptation  when  the  adaptation  thresh¬ 
old  has  been  reached.  That  is,  higher  skill  levels  protect  from 
declines  in  adaptive  function  by  increasing  the  threshold  for  fail¬ 
ure  at  a  given  level  (e.g.,  comfort,  performance,  physiological 
response).  The  modification  of  the  Hancock  and  Warm  (1989) 
model,  illustrating  these  individual  differences  effects,  is  shown 
in  Fig.  6.6.  Multiple  frameworks  of  state  dimensions  exist,  but 
most  focus  on  either  two  (e.g.,  Thayer,  1989;  Watson  &  Tellegen, 
1985),  or  three  (Matthews  et  al.,  1999,  2002).  In  the  context  of 
task performance, Matthews and his colleagues identified three
broad state dimensions reflecting the cognitive, affective, and
motivational aspects of an individual’s current psychological
state. These dimensions are “worry,” which reflects the cognitive
dimension of stress, and “task engagement” and “distress,”
which reflect the affective, cognitive, and motivational components
of state. Specifically, a high level of distress is indicative
of overload in processing capacity, while task engagement reflects
a theme of commitment to effort (Matthews et al., 2002).
Matthews and his colleagues (2002) demonstrated that changes
in task demand influence the pattern of stress state. It is therefore
important to incorporate assessment of operator state into
the interface design process so that the interaction with the
technology fosters task engagement and minimizes distress
and worry.

FIGURE 6.6. The adaptability model of Hancock and Warm (1989) shown in Figure 3 has been
modified to illustrate how individual differences may influence stress and adaptation. It is likely
that cognitive and affective traits influence both the width of the comfort and performance zones
(i.e., the “thresholds” for declines in adaptation) as well as the rate of decline in adaptability
when a threshold has been crossed. For instance, individuals high in trait anxiety would likely have
a narrower plateau of stability and would therefore manifest lower thresholds for discomfort and
performance degradation than individuals low on that trait. Further, the rate of decline in adaptation
may increase as a function of trait anxiety.
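
The threshold-and-slope account of individual differences illustrated in Fig. 6.6 can be made concrete with a small sketch. The piecewise-linear form, the parameter names, and the coefficients that map trait anxiety onto plateau width and decline rate are all assumptions introduced here for illustration; the chapter does not specify a functional form or any numerical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationProfile:
    """Hypothetical parameters for one level of adaptation (e.g., performance)."""
    plateau_half_width: float  # stress deviation tolerated before decline begins
    decline_slope: float       # adaptability lost per unit of stress past the threshold


def adaptability(stress_deviation: float, profile: AdaptationProfile) -> float:
    """Return adaptability in [0, 1] for a signed deviation from optimal stress.

    Assumed shape: flat at 1.0 inside the plateau, then a linear decline
    once the threshold is crossed, floored at 0.
    """
    excess = abs(stress_deviation) - profile.plateau_half_width
    if excess <= 0:
        return 1.0
    return max(0.0, 1.0 - profile.decline_slope * excess)


def profile_for_trait_anxiety(trait_anxiety: float) -> AdaptationProfile:
    """Map trait anxiety in [0, 1] to a profile (coefficients are invented)."""
    return AdaptationProfile(
        plateau_half_width=2.0 - 1.0 * trait_anxiety,  # narrower plateau when anxious
        decline_slope=0.5 + 0.5 * trait_anxiety,       # steeper decline when anxious
    )


low = profile_for_trait_anxiety(0.1)
high = profile_for_trait_anxiety(0.9)
for stress in (0.5, 1.5, 2.5):
    print(stress, adaptability(stress, low), adaptability(stress, high))
```

Under these assumed values, the two hypothetical users are indistinguishable at low stress, and the high-anxiety profile leaves its plateau earlier and loses adaptability faster, which is the qualitative pattern the figure caption describes.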

Attentional narrowing and adaptive response. As with other
aspects of perception, there are individual differences in the
perception of space and time (Hancock & Weaver,
2002;  Wachtel,  1967).  Further,  because  the  subjective  experi¬ 
ence  of  stress  is  often  multidimensional,  it  may  be  that  two  in¬ 
dividuals  are  subjectively  stressed  by  the  same  situation  but  that 
their  stress  state  profiles  differ.  Individuals  are  also  likely  to  dif¬ 
fer  in  the  strategies  they  employ  to  cope  with  the  distortions  of 
space-time  they  experience  while  in  a  stressful  environment, 
and  these  coping  differences,  if  they  exist,  might  depend  on 
the quality (e.g., noise, heat, low signal salience) and source (e.g.,
the environment, the task) of the stress and the personality traits
of  the  individual. 

Hedonomics  and  individual  differences.  In  addition 
to  application  of  individual  differences  research  to  development 
of training or selection procedures, identification of relevant
individual difference variables can promote hedonomic approaches
to  design  and  facilitate  individuation  in  interface  design.  Thus,  if 
the  traits  that  influence  the  subjective  experience  of  an  interac¬ 
tion  with  technology  are  identified,  that  interface  can  then  be 
configured  to  meet  the  preferences  and  the  trait/state  profile 
of  the  individual  user  and  promote  positive  affective  states. 
However,  for  such  efforts  to  succeed,  the  relations  among  traits 
and  cognitive,  perceptual,  and  motor  performance  will  need  to 
be  established  via  theory-guided  empirical  research. 


IMPLICATIONS  OF  STRESS 
FOR  RESEARCHERS  AND  PRACTITIONERS 


For  both  research  and  design  application,  the  extant  research 
on stress and performance indicates that assessment of workload
and affective state is important for a more complete understanding
of HCI. Such assessments can aid in identifying
which components of an interface or task are appraised as
stressful, so that designs can mitigate their negative effects.
For  instance,  research  is  needed  to  establish  which  task  para¬ 
meters  control  the  patterns  of  performance-workload  associa¬ 
tions  and  dissociations,  and  how  these  change  dynamically  as  a 
function  of  time  on  task.  The  Hancock  and  Warm  (1989)  model 
of  stress  established  general  task  dimensions  (space-time)  that 
influence stress state and behavioral adaptability, but the metrics
for these dimensions remain elusive. This problem results from
the  central  issue  regarding  how  to  quantify  human  information 
processing  (Hancock,  Szalma,  &  Oron-Gilad,  2005)  and  define 
mental  resources  more  precisely  (Hancock  &  Szalma,  2007).  Ef¬ 
forts  to  resolve  these  definitional  problems  would  improve 
stress  theory  and  its  application  to  interface  design.  Future  re¬ 
search  should  therefore  examine  the  relations  between  task  di¬ 
mensions  and  user  characteristics,  and  how  these  change  over 
time  and  under  high-stress  conditions. 

In  addition  to  changing  the  task,  there  are  other  techniques 
that  can  be  applied  to  the  design  of  HCIs  for  use  in  stressful  en¬ 
vironments.  These  include  skill  development  (e.g.,  Hancock, 
1986)  and  use  of  configural  displays  (Hancock  &  Szalma,  2003a; 
Wickens,  1996),  as  well  as  technologies  employing  adaptive  au¬ 
tomation  and  decision  aids  (Hancock  &  Chignell,  1987).  In  re¬ 
gard  to  skill  development  in  particular,  an  area  in  need  of  re¬ 
search  is  how  individuals  who  develop  expertise  also  learn  how 
to  cope  with  stress  while  performing  the  task.  Understanding 
how  individuals  accomplish  this  will  require  advances  in  under¬ 
standing  how  different  forms  of  stress  influence  different  forms 
of  information  processing. 

It  is  also  important  for  both  researchers  and  practitioners 
to  consider  the  characteristics  of  the  user  and  to  consider  how 
these  characteristics  interact  with  the  task  or  interface  to  influ¬ 
ence  performance.  Understanding  how  individual  differences 
influence  human-computer  interaction  can  facilitate  develop¬ 
ment  of  tailored  training  regimens  as  well  as  interfaces  that 
more  effectively  adapt  to  the  user.  Systems  that  can  respond 
to  changes  in  operator  affective  state  will  achieve  the  desired 
human-machine synergy in HCI (cf. Hancock, 1997). Realizing
these  goals,  however,  will  require  adequate  theory  develop¬ 
ment  and  subsequent  empirical  research  to  determine  the  na¬ 
ture  of  the  relations  among  the  person  and  environmental  vari¬ 
ables.  It  will  be  particularly  important  to  design  interfaces  that 
permit  autonomous  motivation  (Deci  &  Ryan,  2000),  and  to  un¬ 
derstand  how  operators  of  computer-based  systems  can  inter¬ 
nalize  extrinsic  motivation  as  they  gain  experience  with  the 
task (Gagné & Deci, 2005). We suggest here that researchers
and  designers  identify  the  motivational  affordances  in  the  task 
environment  and  utilize  these  to  enhance  the  experience  of 
HCI  and  improve  overall  system  performance  under  stress.  Mo¬ 
tivational  affordances  will  be  elements  of  the  work  environ¬ 
ment  that  facilitate  and  nurture  intrinsic  motivation.  Particu¬ 
larly  important  for  design  will  be  to  identify  motivational 
invariants,  which  are  those  environmental  factors  that  consis¬ 
tently  determine  an  individual’s  level  of  intrinsic  (or  extrinsic) 
motivation  across  contexts.  Careful  analyses  of  the  motivational 
affordance  structure  will  permit  design  of  tasks  that  are  more 
likely  to  be  enjoyable  by  rendering  the  tools  convivial  (Illich, 
1973)  and  thereby  facilitating  human-machine  synergy  (and  see 
Hancock,  1997). 


SUMMARY  AND  CONCLUSIONS 


In  this  chapter,  we  reviewed  three  theories  of  stress  and  perfor¬ 
mance  and  their  relevance  for  human-technology  interaction. 
We also showed that despite separate research traditions, workload
and stress might be viewed as different perspectives on the
same  problem.  We  outlined  some  general  principles  for  stress 
mitigation,  and  issues  requiring  further  research.  Of  particular 
importance  will  be  establishing  sound  measures  of  information 
processing  and  mental  resources,  as  well  as  articulating  the  rel¬ 
evant  task  dimensions  and  how  they  relate  to  self-regulatory 
mechanisms.  Given  that  stress  can  only  be  understood  in  rela¬ 
tion  to  the  transaction  between  an  individual  and  the  environ¬ 
ment,  it  will  be  crucial  to  establish  how  traits  and  states  of  the  in¬ 
dividual  influence  their  appraisals  of  their  environments.  Finally, 
it  will  be  important  in  practical  application  to  treat  stress  at  mul¬ 
tiple  levels,  ranging  from  the  physiological  to  the  organizational 
sources of adverse performance effects. Traditional attempts to
treat stress problems unidimensionally will continue to fail until
the  person,  task,  and  physical,  social,  and  organizational  envi¬ 
ronments  are  treated  as  a  system.  Researchers  and  practition¬ 
ers  in  HCI  should  therefore  expand  their  efforts  beyond  the  de¬ 
sign  of  the  displays  and  controls  of  interfaces  and  include 
assessment  of  the  person  factors  that  influence  performance  as 
well  as  the  design  of  the  physical  and  social  environment  in 
which  the  human-computer  interaction  occurs. 

The  argument  here  is  not  that  neural  structures  are  not  con¬ 
strained  by  the  laws  of  thermodynamics — clearly  they  are — but 
that  thermodynamic  principles  implied  by  the  metaphor  are  not 
sufficient  for  the  development  of  a  complete  description  of  re¬ 
sources  and  their  relation  to  cognitive  activity. 

Since the advent of modern computing in 1946, the uses of
computing  technology  have  expanded  far  beyond  their  initial 
role  of  performing  complex  calculations  (Denning  &  Metcalfe, 
1997).  Computers  are  not  just  for  scientists  any  more;  they 
are  an  integral  part  of  workplaces  and  homes.  The  diffusion 
of computers has led to new uses for interactive technology,
including  the  use  of  computers  to  change  people’s  attitudes 
and  behavior — in  a  word:  persuasion.  Computing  pioneers  of 
the  1940s  probably  never  imagined  computers  being  used  to 
persuade. 

Today,  creating  successful  human-computer  interactions 
(HCIs)  requires  skills  in  motivating  and  persuading  people. 
However,  interaction  designers  don’t  often  view  themselves  as 
agents  of  influence.  They  should.  The  work  they  perform  often 
includes  crafting  experiences  that  change  people — the  way  peo¬ 
ple  feel,  what  they  believe,  and  the  way  in  which  they  behave. 
Consider  these  common  challenges:  How  can  designers  moti¬ 
vate  people  to  register  their  software?  How  can  they  get  peo¬ 
ple  to  persist  in  learning  an  online  application?  How  can  they 
create  experiences  that  build  product  loyalty?  Often,  the  success 
of  today’s  interactive  product  hinges  on  changing  people’s  atti¬ 
tudes  or  behaviors. 

Sometimes  the  influence  elements  in  HCI  are  small,  almost 
imperceptible,  such  as  creating  a  feeling  of  confidence  or  trust 
in  what  the  computing  product  says  or  does.  Other  times,  the 
influence  element  is  large,  even  life  altering,  such  as  motivating 
someone  to  quit  smoking.  Small  or  large,  elements  of  influence 
are  increasingly  present  on  Websites,  in  productivity  tools,  in 
video  games,  in  wearable  devices,  and  in  other  types  of  inter¬ 
active  computing  products.  Due  to  the  growing  use  of  comput¬ 
ing  products  and  to  the  unparalleled  ability  of  software  to  scale, 
interaction  designers  may  well  become  leading  change  agents  of 
the  future.  Are  we  ready? 

The  study  and  design  of  computers  as  persuasive  technolo¬ 
gies,  referred  to  as  captology,  is  a  relatively  new  endeavor  when 
compared  to  other  areas  of  HCI  (Fogg,  1997,  1998,  2003).  For¬ 
tunately,  understanding  in  this  area  is  growing.  HCI  profession¬ 
als  have  established  a  foundation  that  outlines  the  domains  of 
applications,  useful  frameworks,  methods  for  research,  design 
guidelines,  best-in-class  examples,  as  well  as  ethical  issues 
(Berdichevsky  &  Neuenschwander,  1999;  Fogg,  1999;  Khaslavsky 
&  Shedroff,  1999;  King  &  Tester,  1999;  Tseng  &  Fogg,  1999).  This 
chapter  will  not  address  all  these  areas  in-depth,  but  it  will  share 
some  key  perspectives,  frameworks,  and  design  guidelines  re¬ 
lating  to  captology. 


DEFINING PERSUASION AND GIVING
HIGH-TECH  EXAMPLES 


What  is  “persuasion”?  As  one  might  predict,  scholars  do  not 
agree on the precise definition. For the sake of this chapter, persuasion
is a noncoercive attempt to change attitudes or behaviors.
There are some important things to note about this definition.
First, persuasion is noncoercive. Coercion, the use of
force, is not persuasion; neither is manipulation or deceit.
These  methods  are  shortcuts  to  changing  how  people  believe 
or  behave,  and  for  interaction  designers  these  methods  are 
rarely  justifiable. 

Next,  persuasion  requires  an  attempt  to  change  another  per¬ 
son.  The  word  attempt  implies  intentionality.  If  a  person 
changes  someone  else’s  attitude  or  behavior  without  intent  to 
do  so,  it  is  an  accident  or  a  side  effect;  it  is  not  persuasion.  This 
point  about  intentionality  may  seem  subtle,  but  it  is  not  trivial. 
Intentionality  distinguishes  between  a  side  effect  and  a  planned 
effect  of  a  technology.  At  its  essence,  captology  focuses  on  the 
planned  persuasive  effects  of  computer  technologies. 

Finally,  persuasion  deals  with  attitude  changes  or  behavior 
changes  or  both.  While  some  scholars  contend  persuasion  per¬ 
tains  only  to  attitude  change,  other  scholars  would  concur  with 
our  view:  including  behavior  change  as  a  target  outcome  of  per¬ 
suasion.  Indeed,  these  two  outcomes — attitude  change  and  be¬ 
havior  change — are  fundamental  in  the  study  of  computers  as 
persuasive  technologies. 

Note  how  attitude  and  behavior  changes  are  central  in  two 
examples  of  persuasive  technology  products.  First,  consider  the 
CD-ROM product 5 A Day Adventures (www.dole5aday.com).
Created  by  Dole  Foods,  this  computer  application  was  designed 
to  persuade  kids  to  eat  more  fruits  and  vegetables.  Using  5  A 
Day  Adventures,  children  enter  a  virtual  world  with  characters 
like  “Bobby  Banana”  and  “Pamela  Pineapple,”  who  teach  kids 
about  nutrition  and  coach  them  to  make  healthy  food  choices. 
The  program  also  offers  areas  where  children  can  practice  mak¬ 
ing  meals  using  fresh  produce,  and  the  virtual  characters  offer 
feedback  and  praise.  This  product  clearly  aims  to  change  the 
attitudes  children  have  about  eating  fruits  and  vegetables.  How¬ 
ever,  even  more  important,  the  product  sets  out  to  change  their 
eating  behaviors. 

Next,  consider  a  more  mundane  example:  Amazon.com.  The 
goal  of  this  Website  is  to  persuade  people  to  buy  products  again 
and again from Amazon.com. Everything on the Website contributes
to this result: user registration, tailored information,
limited-time  offers,  third-party  product  reviews,  one-click  shop¬ 
ping,  confirmation  messages,  and  more.  Dozens  of  persuasion 
strategies  are  integrated  into  the  overall  experience.  Although 
the  Amazon  online  experience  may  appear  to  be  focused  on 
providing  mere  information  and  seamless  service,  it  is  really 
about  persuasion — buy  things  now  and  come  back  for  more. 


THE  FOURTH  WAVE:  PERSUASIVE 
INTERACTIVE  TECHNOLOGY 


Computing  systems  did  not  always  contain  elements  of  influ¬ 
ence.  It  has  only  been  in  recent  years  that  interactive  computing 
became  mature  enough  to  spawn  applications  with  explicit  ele¬ 
ments  of  influence.  The  dramatic  growth  of  technologies  de¬ 
signed  to  persuade  and  motivate  represents  the  fourth  wave  of 
focus  in  end-user  computing.  The  fourth  wave  leverages  ad¬ 
vances  from  the  three  previous  waves  (Fig.  7.1). 

The  first  wave  of  computing  began  over  50  years  ago  and 
continues  today.  The  energy  and  attention  of  computer  profes¬ 
sionals  mainly  focused  on  getting  computing  devices  to  work 
properly,  and  then  to  make  them  more  and  more  capable.  In 
short,  the  first  wave  is  function. 

The  second  wave  of  computing  began  in  the  1970s  with  the 
emergence  of  digital  gaming,  first  represented  by  companies 
like  Atari  and  with  products  like  Pong.  This  wave  is  entertain¬ 
ment,  and  it  continues  to  swell  because  of  continued  attention 
and  energy  devoted  to  computer-based  fun. 

The  third  wave  of  computing  came  in  the  1980s  when  hu¬ 
man  factors  specialists,  designers,  and  psychologists  sought  to 
create  computers  for  ordinary  people.  This  third  wave  is  ease 
of use. Although new developments, like the computer mouse
and the graphical user interface, came before 1980, a consumer
product, the Apple Macintosh, generated widespread attention
and energy to making computers easier to use. Like the
previous  two  waves,  the  third  wave  keeps  rolling  today.  It  pro¬ 
vides  the  foundation  for  most  work  in  HCI  arenas. 

This brings us to the fourth wave: computers designed
to persuade. Early signs of this wave appeared in the
1970s  and  1980s  with  a  handful  of  computing  systems  designed 
to  motivate  health  behaviors  and  work  productivity.  However, 
it  wasn’t  until  the  late-1990s — specifically  during  the  rise  of  the 
World  Wide  Web — that  more  than  a  handful  of  people  began  to 
devote  attention  and  energy  to  making  interactive  systems  ca¬ 
pable  of  motivating  and  influencing  users.  This  fourth  wave — 
persuasion — is  new  and  could  be  as  significant  as  the  three 
waves  that  have  come  before  it. 


DOMAINS  WHERE  PERSUASION 
AND  MOTIVATION  MATTER 


Captology  is  relevant  to  systems  designed  for  many  facets  of  hu¬ 
man  life.  The  most  obvious  domain  is  in  promoting  com¬ 
merce — buying  and  branding,  especially  via  the  Web.  While 
promoting  commerce  is  perhaps  the  most  obvious  and  lucrative 
application,  at  least  11  other  domains  are  potential  areas  for  per¬ 
suasive  technology  products.  The  various  domains,  along  with  a 
sample  target  behavior  change,  are  summarized  in  Table  7.1. 

The  domains  in  the  table  reflect  how  much  persuasion  is 
part  of  ordinary  human  experience,  from  personal  relationships 
to  environmental  conservation.  Interactive  technologies  have 
been — and  will  continue  to  be — created  to  influence  people  in 
these  12  domains,  as  well  as  in  others  that  are  less  apparent.  The 
way  various  computing  products  incorporate  persuasion  and 
motivation principles will evolve as computing technology advances
and as people adopt a wider array of interactive systems
for a wider range of human activities. The influence elements
in these systems can be readily apparent, or they can be woven
into the fabric of an interactive experience, a distinction explored
in the next section.

FIGURE 7.1. Persuasion is the fourth major wave of computing.

TABLE 7.1. 12 Domains for Persuasive Technology

Domains for Persuasive Technologies    Example
-----------------------------------    -------------------------------------------------
Commerce: Buying and Branding          To buy a certain product
Education, Learning, and Training      To engage in activities that promote learning
Safety                                 To drive more safely
Environmental Conservation             To reuse shopping bags
Occupational Productivity              To set and achieve goals at work
Preventative Health Care               To quit smoking
Fitness                                To exercise with optimal intensity/frequency
Disease Management                     To manage diabetes better
Personal Finance                       To create and adhere to a personal budget
Community Involvement/Activism         To volunteer time at a community center
Personal Relationships                 To keep in touch with their aging parents
Personal Management and Improvement    To avoid procrastination


PERSUASION  AND  INTERACTIVE 
TECHNOLOGY:  TWO  LEVELS  OF  ANALYSIS 


One  key  insight  in  captology  is  to  see  that  persuasion  in  com¬ 
puting  products  takes  place  on  two  levels:  macro  and  micro. 
On  the  macro  level,  one  finds  products  designed  for  an  overall 
persuasive  outcome.  For  example,  the  Dole  5  A  Day  CD-ROM 
and  the  Amazon.com  Website  are  designed  specifically  for  per¬ 
suasion.  For  these  and  other  products,  persuasion  and  moti¬ 
vation  are  the  sole  reason  these  products  exist.  We  use  the 
word  “macrosuasion”  to  describe  this  type  of  big-picture  target 
outcome. 

On  the  other  hand,  one  finds  computing  products  with 
what I call “microsuasion.” These products could be word-processing
programs or spreadsheets; they do not necessarily have
a persuasive outcome as the overall goal of the product. However,
they will incorporate smaller elements of influence to
achieve other goals. Microsuasion can be incorporated into dialogue
boxes, visual elements, interaction sequences, and
more.  In  productivity  software,  microsuasion  can  lead  to  in¬ 
creased  productivity  or  stronger  brand  loyalty.  The  following 
examples  will  help  clarify  the  distinction  between  macrosua¬ 
sion  and  microsuasion  (Fogg,  2003). 

Examples  of  Macrosuasion 

One  notable  example  of  macrosuasion  is  a  product  named  Baby 
Think  It  Over.  A  U.S.  company  (www.btio.com)  designed  this 
computerized  doll  to  simulate  the  time  and  energy  required  to 
care  for  a  baby,  with  the  overall  purpose  of  persuading  teens  to 
avoid  becoming  parents  prematurely.  Used  as  part  of  many 
school  programs  in  the  United  States,  the  Baby  Think  It  Over  in¬ 
fant  simulator  looks,  weighs,  and  cries  something  like  a  real 
baby.  The  computer  embedded  inside  the  doll  triggers  a  crying 
sound  at  random  intervals;  in  order  to  stop  the  crying  sound, 
the  teen  caregiver  must  pay  immediate  attention  to  the  doll.  If 
the caregiver fails to respond appropriately, the computer embedded
inside the doll records the neglect. After a few days of
caring  for  the  simulated  infant,  teenagers  generally  report  less 
interest in becoming a parent in the near future (see www.btio.com),
which, along with reduced teen pregnancy rates, is the
intended  outcome  of  the  device. 
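
The behavior described above (crying at random intervals, requiring a prompt response, and recording neglect) can be sketched as a toy simulation. This is not the product's actual firmware; the class name, the exponential gap between cries, the two-minute response window, and the simulated caregiver are all hypothetical choices made only to illustrate the logic.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InfantSimulator:
    """Toy model of a doll that cries at random times and logs neglect."""
    response_window: float = 2.0  # minutes allowed before a cry counts as neglect (assumed)
    neglect_log: List[float] = field(default_factory=list)

    def schedule_cries(self, hours: float, mean_gap_minutes: float,
                       rng: random.Random) -> List[float]:
        """Draw cry times (in minutes) separated by exponentially distributed gaps."""
        times: List[float] = []
        horizon = hours * 60
        t = rng.expovariate(1.0 / mean_gap_minutes)
        while t < horizon:
            times.append(t)
            t += rng.expovariate(1.0 / mean_gap_minutes)
        return times

    def record_response(self, cry_time: float,
                        response_delay: Optional[float]) -> bool:
        """Return True if the caregiver responded in time; otherwise log the neglect."""
        if response_delay is not None and response_delay <= self.response_window:
            return True
        self.neglect_log.append(cry_time)
        return False


rng = random.Random(0)
doll = InfantSimulator()
cries = doll.schedule_cries(hours=24, mean_gap_minutes=90, rng=rng)
for cry in cries:
    # Hypothetical caregiver: ignores 10% of cries, otherwise takes 0 to 4 minutes.
    delay = None if rng.random() < 0.1 else rng.uniform(0, 4)
    doll.record_response(cry, delay)
print(f"{len(cries)} cries, {len(doll.neglect_log)} logged as neglected")
```

The persuasive element lies less in the log itself than in the experience of being interrupted unpredictably; the log simply gives a teacher or program coordinator something to review afterward.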

Next,  consider  Scorecard.org  as  another  example  of  macro¬ 
suasion.  Created  by  the  Environmental  Defense  Foundation, 
this  Website  helps  people  find  information  about  pollution 
threats  in  their  neighborhoods.  When  users  enter  their  zip 
code,  the  site  lists  names  of  the  polluting  institutions  in  their 
area,  gives  data  on  chemicals  being  released,  and  outlines  the 
possible  health  consequences.  But  that’s  not  all.  Scorecard.org 
then encourages users to take action against the polluting organizations
and makes it easy to contact policymakers to express
concerns. This Website aims to increase community activism
in order to pressure officials and offending institutions
into  cleaning  up  the  environment.  The  entire  point  of  this  Web¬ 
site  is  to  get  people  to  take  action  against  polluting  institutions 
in  their  neighborhoods.  This  is  macrosuasion. 

Examples  of  Microsuasion 

Most  computing  products  were  not  created  with  persuasion  as 
the  main  focus.  Larger  software  categories  include  applications 
for  productivity,  entertainment,  and  creativity.  Yet  these  same 
products  often  use  influence  elements,  microsuasion,  as  part 
of  the  overall  experience.  Examples  of  interactive  products  us¬ 
ing  microsuasion  are  plentiful — and  sometimes  subtle.  A  word¬ 
processing  program  may  encourage  users  to  spell  check  text,  or 
a Website devoted to high-school reunions may reward alumni
for  posting  a  current  photograph  online.  This  is  persuasion  on  a 
microlevel. 

For  a  deeper  look  at  microsuasion,  consider  the  personal  fi¬ 
nance  application  Quicken,  created  by  Intuit  (www.intuit.com). 
Quicken  is  a  productivity  software  product.  Its  overall  goal  is  to 
simplify  the  process  of  managing  personal  finances.  Quicken 
uses  microsuasion  to  accomplish  this  overall  goal.  For  exam¬ 
ple,  the  application  reminds  users  to  take  financial  responsibil¬ 
ity,  such  as  paying  bills  on  time.  In  addition,  the  software  tracks 
personal  spending  habits  and  shows  results  in  graphs,  allowing 
projections into future financial scenarios. Finally, the software
praises users for doing necessary but menial tasks, like balancing
their online check registry. These microsuasion elements
(reminders, visualizations, and praise) are influence
elements  embedded  in  the  Quicken  experience  in  order  to 
change  what  users  think  and  how  they  act.  Ideally,  when  these 
microsuasion  elements  succeed,  users  benefit  from  Quicken’s 
approach  to  managing  personal  finances. 

Like  Quicken,  educational  software  often  uses  microsuasion. 
The  overall  point  of  most  educational  applications  and  interac¬ 
tive  experiences  is  to  teach  facts  and  skills,  not  to  persuade. 
However,  in  order  to  get  users  to  stick  with  the  program  or  to 
believe what is presented, many products will incorporate motivational
elements and seek to build credibility perceptions of
the  content.  The  product  may  seek  to  persuade  the  learner  that 
the  content  is  important,  that  the  learner  is  able  to  successfully 
master  it,  and  that  following  the  guidelines  of  the  program  will 
lead  to  the  greatest  success.  Note  how  these  smaller  elements  of 
the  program — the  microsuasions — contribute  to  the  overall 
goal:  learning.  Furthermore,  interactive  educational  products 
will  often  incorporate  elements  of  games,  which  leads  to  a  large 
area  related  to  microsuasion:  computer-based  gaming. 

Video games are typically rich in microsuasion elements. The
overall goal of most games is to provide entertainment, not to
persuade. However, during the entertainment experience players
can be bombarded with microsuasion elements, sometimes
continuously. Video games can leverage the seven basic intrinsic
motivators: challenge, curiosity, fantasy, control, competition,
cooperation, and recognition (Malone & Lepper, 1987). Video
games can also incorporate other categories of microsuasion,
such as social-influence dynamics.

Captology is relevant to computing products designed with
macrosuasion  in  mind — like  Baby  Think  It  Over — and  to  those 
that  simply  use  microsuasion  in  order  to  make  the  product 
more  successful — like  Quicken.  In  both  cases,  designers  must 
understand  how  to  create  interactive  experiences  that  change 
the  way  people  think  and  behave,  whether  it  is  for  a  single  over¬ 
all  outcome  or  for  near-term  outcomes  that  are  the  building 
blocks  of  a  larger  experience. 


NO  UNIVERSAL  THEORY  OF  PERSUASION 


Creating  interactive  technology  experiences  that  motivate  and 
influence  users  would  be  easy  if  persuasion  were  fully  under¬ 
stood.  It’s  not.  Our  understanding  is  limited,  despite  the  fact 
that  the  study  of  persuasion  extends  back  at  least  2,000  years. 
The  fields  of  psychology,  marketing,  advertising,  public-infor¬ 
mation  campaigns,  and  others  have  developed  theories  and  per¬ 
spectives  on  how  to  influence  and  motivate  people,  but  all 
approaches  have  limitations.  The  reality  is  this:  we  have  no  uni¬ 
versal  theory  or  framework  for  persuasion.  In  other  words,  no 
single  set  of  principles  fully  explains  what  motivates  people, 
what  causes  them  to  adopt  certain  attitudes,  and  what  leads 
them  to  perform  certain  behaviors  (Fogg,  2003;  Ford,  1992).  In 
some  ways,  this  is  not  a  surprise.  Human  psychology  is  complex, 
and  persuasion  is  a  large  domain,  often  with  fuzzy  boundaries. 
Without a universal theory of persuasion, we must draw from a
set  of  theories  and  models  that  describe  influence,  motivation, 
or  behavior  change  in  specific  situations  and  for  specific  types  of 
people.  This  limitation  creates  an  additional  challenge  for  de¬ 
signers  of  persuasive  technology  products. 

Because  computing  technology  creates  new  possibilities  for  in¬ 
fluencing  people,  work  in  captology  can  lead  to  new  frameworks, 
which,  although  not  perfect,  enhance  the  knowledge  and  practice 
in HCI. One such framework is the “Functional Triad” (Fogg, 2003).


THE  FUNCTIONAL  TRIAD:  A  FRAMEWORK 
FOR  PERSUASIVE  TECHNOLOGY 


Computers  play  many  roles,  some  of  which  go  unseen  and  un¬ 
noticed.  From  a  user’s  perspective,  computers  function  in  three 
basic  ways:  as  (a)  tools,  as  (b)  media,  and  as  (c)  social  actors.  In 
the  last  two  decades,  researchers  and  designers  have  discussed 
variants  of  these  functions,  usually  as  metaphors  for  computer 
use (e.g., Kay, 1984; Verplank, Fulton, Black, & Moggridge, 1993).
However,  these  three  categories  are  more  than  metaphors;  they 
are  basic  ways  that  people  view  or  respond  to  computing  tech¬ 
nologies.  These  categories  also  represent  three  basic  types  of 
experiences  that  motivate  and  influence  people. 

Described  in  more  detail  elsewhere  (Fogg,  1999,  2000,  2003), 
the  Functional  Triad  is  a  framework  that  makes  explicit  these 
three  computer  functions — tools,  media,  and  social  actors.  First, 
as  this  framework  suggests,  computer  applications  or  systems 
function  as  tools,  providing  users  with  new  abilities  or  powers. 
Using  computers  as  tools,  people  can  do  things  they  could  not 
do  before,  or  they  can  do  things  more  easily. 


The  Functional  Triad  also  suggests  that  computers  function 
as  media,  a  role  that  has  grown  dramatically  during  the  1990s 
as  computers  became  increasingly  powerful  in  displaying  graph¬ 
ics  and  in  exchanging  information  over  a  network  such  as  the 
Internet.  As  a  medium,  a  computer  can  convey  either  symbolic 
content (e.g., text, data graphs, icons) or sensory content (e.g.,
real-time video, virtual worlds, simulation).

Finally,  computers  also  function  as  social  actors.  Empirical  re¬ 
search  demonstrates  that  people  form  social  relationships  with 
technologies  (Reeves  &  Nass,  1996).  The  precise  causal  factors 
for  these  social  responses  have  yet  to  be  outlined  in  detail,  but 
I  propose  that  users  respond  socially  when  computers  do  at 
least one of the following: (1) adopt animate characteristics (e.g.,
physical features, emotions, voice communication), (2) play animate
roles (e.g., coach, pet, assistant, opponent), or (3) follow
social rules or dynamics (e.g., greetings, apologies, taking turns)
(Fogg,  2003). 

The  Functional  Triad  is  not  a  theory;  it  is  a  framework  for 
analysis  and  design.  In  all  but  the  most  extreme  cases,  a  single 
interactive  technology  is  a  mix  of  these  three  functions,  com¬ 
bining  them  to  create  an  overall  user  experience. 

In  captology  the  Functional  Triad  is  useful  because  it  helps 
show  how  computer  technologies  can  employ  different  tech¬ 
niques  for  changing  attitudes  and  behaviors.  For  example,  com¬ 
puters  as  tools  persuade  differently  than  computers  as  social  ac¬ 
tors.  The  strategies  and  theories  that  apply  to  each  function 
differ.  The  paragraphs  that  follow  use  the  Functional  Triad  to 
highlight  aspects  of  persuasive  technology,  including  general 
design  strategies  and  approaches  for  creating  computing  prod¬ 
ucts  that  persuade  and  motivate. 

Computers  as  Persuasive  Tools 

In general, computers as persuasive tools bring about attitude and
behavior  changes  by  increasing  a  person’s  abilities  or  making 
something  easier  to  do  (Tombari,  Fitzpatrick,  &  Childress,  1985). 
Although  one  could  propose  numerous  possibilities  for  persua¬ 
sion  in  this  manner,  below  are  four  general  ways  in  which  com¬ 
puters  persuade  as  tools:  by  (a)  increasing  self-efficacy,  (b)  pro¬ 
viding  tailored  information,  (c)  triggering  decision  making,  and 
(d)  simplifying  or  guiding  people  through  a  process. 

Computers  That  Increase  Self-Efficacy 

Computers  can  increase  self-efficacy  (Lieberman,  1992),  an 
important  contributor  to  attitude  and  behavior  change 
processes.  Self-efficacy  describes  individuals’  beliefs  in  their  abil¬ 
ity  to  take  successful  action  in  specific  domains  (Bandura,  1997; 
Bandura, Georgas, & Manthouli, 1996). When people perceive
high self-efficacy in a given domain, they are more likely to take
action. In addition, because self-efficacy is a perceived quality,
even if individuals merely believe that their actions are more
effective and productive (perhaps because they are using a specific
computing technology), they are more likely to perform a
particular behavior (Bandura, 1997; Bandura, Georgas, & Manthouli,
1996). As a result, functioning as tools, computing technologies
can make individuals feel more efficient, productive, in
control, and generally more effective (DeCharms, 1968; Kernal,
1999;  Pancer,  George,  &  Gebotys,  1992).  For  example,  a  heart- 
rate  monitor  may  help  people  feel  more  effective  in  meeting 
their  exercise  goals  when  it  provides  ongoing  information  on 
heart  rate  and  calories  burned.  Without  the  heart-rate  monitor, 
people  could  still  take  their  pulse  and  calculate  calories,  but  the 
computer  device — whether  it  be  worn  or  part  of  the  exercise 
machinery — makes  these  tasks  easier.  The  ease  of  tracking  heart 
rate  and  calories  burned  likely  increases  self-efficacy  in  fitness 
behavior,  making  it  more  likely  the  individual  will  continue  to 
exercise  (Brehm,  1997;  Strecher,  DeVellis,  Becker,  &  Rosenstock, 
1986;  Thompson,  1992). 

Computers  That  Provide  Tailored  Information 

Next,  computers  act  as  tools  when  they  tailor  information, 
offering  people  content  that  is  pertinent  to  their  needs  and  con¬ 
texts.  Compared  to  general  information,  tailored  information  in¬ 
creases  the  potential  for  attitude  and  behavior  change  (Beniger, 
1987; Dijkstra, Librand, & Timminga, 1998; Jimison, Street, &
Gold, 1997; Nowak, Shamp, Hollander, Cameron, Schumann,
& Thorson, 1999; Strecher, 1999; Strecher, Kreuter, Den Boer,
Kobrin, Hospers, & Skinner, 1994).

One notable example of a tailoring technology is the Website
discussed earlier, Chemical Scorecard (www.scorecard.org),
which  generates  information  according  to  an  individual’s  geo¬ 
graphical  location  in  order  to  achieve  a  persuasive  outcome.  Af¬ 
ter  people  enter  their  zip  code  into  this  Website,  the  Web  tech¬ 
nology  reports  on  chemical  hazards  in  their  neighborhood, 
identifies  companies  that  create  those  hazards,  and  describes 
the potential health risks. Although no published studies document
the persuasive effects of this particular technology, outside
research and analysis suggest that making information relevant
to individuals increases their attention and arousal, which
can ultimately lead to increased attitude and behavior change
(Beniger, 1987; MacInnis & Jaworski, 1989; MacInnis, Moorman,
& Jaworski, 1991; Strecher, 1999).
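
The tailoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tailored_report`, the `HAZARD_REPORTS` table, and every record in it are hypothetical placeholders, not data or code from Chemical Scorecard. The point is simply that the same lookup yields different content depending on what the user enters.

```python
# Hypothetical placeholder records keyed by ZIP code. A real service
# would query a maintained database instead of an in-memory table.
HAZARD_REPORTS = {
    "00001": [
        {"facility": "Example Plant A", "chemical": "example solvent"},
        {"facility": "Example Plant B", "chemical": "example metal compound"},
    ],
    "00002": [
        {"facility": "Example Plant C", "chemical": "example pesticide"},
    ],
}


def tailored_report(zip_code, reports=HAZARD_REPORTS):
    """Build a short report that is specific to the user's ZIP code."""
    entries = reports.get(zip_code, [])
    if not entries:
        return f"No reported hazards on file for ZIP code {zip_code}."
    lines = [f"Reported hazards near ZIP code {zip_code}:"]
    for entry in entries:
        lines.append(f"- {entry['facility']}: {entry['chemical']}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two users entering different ZIP codes see different content.
    print(tailored_report("00001"))
    print(tailored_report("00002"))
```

The design choice worth noting is that the content is keyed to something the user supplies, which is what makes the result personally relevant rather than general.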

Computers  That  Trigger  Decision-Making 

Technology  can  also  influence  people  by  triggering  or  cueing 
a  decision-making  process.  For  example,  today’s  web  browsers 
launch  a  new  window  to  alert  people  before  they  send  infor¬ 
mation  over  insecure  network  connections.  The  message  win¬ 
dow  serves  as  a  signal  to  consumers  to  rethink  their  planned  ac¬ 
tions.  A  similar  example  exists  in  a  very  different  context.  Cities 
concerned  with  automobile  speeding  in  neighborhoods  can  use 
a  stand-alone  radar  trailer  that  senses  the  velocity  of  an  oncom¬ 
ing  automobile  and  displays  that  speed  on  a  large  screen.  This 
technology  is  designed  to  trigger  a  decision-making  process  re¬ 
garding  driving  speed. 
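
As a rough sketch of the browser example, the following Python fragment interrupts a planned action with a question whenever information would be sent over a connection that is not HTTPS. The function names `needs_confirmation` and `submit`, and the wording of the prompt, are assumptions made for illustration; they do not describe how any particular browser is implemented.

```python
from urllib.parse import urlparse


def needs_confirmation(destination_url):
    """Return True when data would travel over a non-HTTPS connection."""
    return urlparse(destination_url).scheme.lower() != "https"


def submit(destination_url, confirm):
    """Send data, pausing first to ask the user if the connection is insecure.

    `confirm` is any callable that shows a message and returns True or False.
    """
    if needs_confirmation(destination_url):
        message = (
            "You are about to send information over an insecure "
            f"connection to {destination_url}. Continue?"
        )
        if not confirm(message):
            return "cancelled"
    return "sent"


if __name__ == "__main__":
    # Simulate a user who declines whenever the warning appears.
    decline = lambda message: False
    print(submit("https://example.com/checkout", decline))
    print(submit("http://example.com/checkout", decline))
```

The tool does not decide for the user; it only inserts a prompt at the moment of action, which is the triggering role described above.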

Computers  That  Simplify  or  Guide  People  Through  a  Process 

By  facilitating  or  simplifying  a  process  for  users,  technology 
can minimize barriers that may impede a target behavior. For
example, in the context of web commerce, technology can simplify
a multistep process down to a few mouse clicks. Typically,
in  order  to  purchase  something  online,  a  consumer  needs  to  se¬ 
lect  an  item,  place  it  in  a  virtual  shopping  cart,  proceed  to 
checkout,  enter  personal  and  billing  information,  and  verify  an 
order confirmation. Amazon.com and other e-commerce companies
have simplified this process by storing customer information
so that consumers need not reenter information for every
transaction. By lowering the time commitment and reducing
the  steps  to  accomplish  a  goal,  these  companies  have  reduced 
the barriers for purchasing products from their sites. The principle
used by Web and other computer technology (Todd &
Benbasat, 1994) is similar to the dynamic that Ross and Nisbett
(1991) described: facilitating behaviors by modifying
the situation.

In  addition  to  reducing  barriers  for  a  target  behavior,  com¬ 
puters  can  also  lead  people  through  processes  to  help  them 
change  attitudes  and  behaviors  (Muehlenhard,  Baldwin,  Bourg, 
&  Piper,  1988;  Tombari,  Fitzpatrick,  &  Childress,  1985).  For  ex¬ 
ample,  a  computer  nutritionist  can  guide  individuals  through  a 
month  of  healthy  eating  by  providing  recipes  for  each  day  and 
grocery  lists  for  each  week.  In  general,  by  following  a  computer- 
led  process,  users  (a)  are  exposed  to  information  they  may  not 
have  seen  otherwise,  and  (b)  are  engaged  in  activities  they  may 
not  have  done  otherwise  (Fogg,  2000,  2003). 

Computers  as  Persuasive  Media 

The  next  area  of  the  Functional  Triad  deals  with  computers  as 
persuasive  media.  Although  “media”  can  mean  many  things, 
here  the  focus  is  on  the  power  of  computer  simulations.  In  this 
role  computer  technology  provides  people  with  experiences, 
either  first-hand  or  vicarious.  By  providing  simulated  experi¬ 
ences,  computers  can  change  people’s  attitudes  and  behaviors. 
Outside  the  world  of  computing,  experiences  have  a  powerful 
impact  on  people’s  attitudes,  behaviors,  and  thoughts  (Reed, 
1996).  Experiences  offered  via  interactive  technology  have  sim¬ 
ilar  effects  (Bullinger,  Roessler,  Mueller-Spahn,  Riva,  &  Wieder- 
hold,  1998;  Fogg,  2000). 

Three  types  of  computer  simulations  are  relevant  to  persua¬ 
sive  technologies: 

•  simulated  cause-and-effect  scenarios 

•  simulated  environments 

•  simulated  objects 

The  paragraphs  that  follow  discuss  each  simulation  type  in 
turn. (Note that other taxonomies for simulations exist; see, for
example, Gredler, 1986; de Jong, 1991; and Alessi, 1991.)

Computers  That  Simulate  Cause  and  Effect 

One  type  of  computer  simulation  allows  users  to  vary  the  in¬ 
puts  and  observe  the  effects  (Hennessy  &  O’Shea,  1993) — what 
one  could  call  “cause-and-effect  simulators.”  The  key  to  effec¬ 
tive  cause-and-effect  simulators  is  their  ability  to  demonstrate  the 
consequence  of  actions  immediately  and  credibly  (Alessi,  1991; 
Balci, 1998; Balci, Henrikson, & Roberts, 1986; Crosbie & Hay,
1978; de Jong, 1991; Hennessy & O’Shea, 1993; Zietsman & Hewson,
1986). These computer simulations give people first-hand
insight  into  how  inputs  (such  as  putting  money  in  a  savings  ac¬ 
count)  affect  an  output  (such  as  accrued  retirement  savings). 
By  allowing  people  to  explore  causes  and  effects  of  situations, 
these  computer  simulations  can  shape  attitudes  and  behaviors. 
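
The savings example lends itself to a minimal cause-and-effect simulator. The Python sketch below is an illustration only: the function name `simulate_savings`, the monthly compounding rule, and the 5% rate and deposit amounts are assumptions chosen for the example, not figures from the studies cited above.

```python
def simulate_savings(monthly_deposit, annual_rate, years):
    """Return the account balance at the end of each year.

    Assumes a fixed deposit at the end of every month and interest
    compounded monthly at annual_rate / 12.
    """
    balance = 0.0
    monthly_rate = annual_rate / 12
    yearly_balances = []
    for month in range(1, years * 12 + 1):
        balance = balance * (1 + monthly_rate) + monthly_deposit
        if month % 12 == 0:
            yearly_balances.append(round(balance, 2))
    return yearly_balances


if __name__ == "__main__":
    # Vary one input (the monthly deposit) and observe the output.
    for deposit in (100, 200, 400):
        final = simulate_savings(deposit, annual_rate=0.05, years=30)[-1]
        print(f"{deposit:>4} per month -> {final:,.2f} after 30 years")
```

Because the result is recomputed as soon as an input changes, the user sees the consequence of a choice immediately, which is the property the text identifies as key to an effective simulator.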

Computers  That  Simulate  Environments 

A  second  type  of  computer  simulation  is  the  environment 
simulator.  These  simulators  are  designed  to  provide  users  with 
new  surroundings,  usually  through  images  and  sound.  In  these 
simulated  environments,  users  have  experiences  that  can  lead 
to  attitude  and  behavior  change  (Bullinger  et  al.,  1998),  includ¬ 
ing  experiences  that  are  designed  as  games  or  explorations 
(Lieberman, 1992; Schlosser & Kanfer, 1999; Schneider, 1985;
Woodward,  Carnine,  &  Davis,  1986). 

The  efficacy  of  this  approach  is  demonstrated  by  research  on 
the Tectrix Virtual Reality Bike (an exercise bike that includes a
computer  and  monitor  that  shows  a  simulated  world).  Porcari 
and  colleagues  (1998)  found  that  people  using  an  exercise  de¬ 
vice  with  computer  simulation  of  a  passing  landscape  exercised 
harder  than  those  who  used  an  exercise  device  without  simula¬ 
tion.  Both  groups,  however,  felt  that  they  had  exerted  them¬ 
selves  a  similar  amount.  This  outcome  caused  by  simulating  an 
outdoor experience mirrors findings from other research: people
exercise harder when outside than inside a gym (Ceci & Hassmen,
1991).

Environmental  simulators  can  also  change  attitudes.  Using  a 
virtual reality environment in which people saw and felt a
simulated spider, Carlin and colleagues (1997) were able to decrease
the fear of spiders in their participants. In this research,
participants wore a head-mounted display that immersed them
in a virtual room, and they were able to control both the number
of spiders and their proximity. In this case study, Carlin and
colleagues found that the virtual reality treatment reduced the fear
of spiders in the real world. Other similar therapies have been used
for  fear  of  flying  (Klein,  1999;  Wiederhold,  Davis,  Wiederhold, 
&  Riva,  1998),  agoraphobia  (Ghosh  &  Marks,  1987),  claustro¬ 
phobia  (Bullinger  et  al.,  1998),  and  fear  of  heights  (Bullinger), 
among  others  (Kirby,  1996). 

Computers  That  Simulate  Objects 

The third type of computer simulation is the “object simulator.”
These computerized devices simulate an object (as opposed
to an environment). The Baby Think It Over infant simulator
described earlier in this chapter is one such device.
Another  example  is  a  specially  equipped  car  created  by  Chrysler 
Corporation,  designed  to  help  teens  experience  the  effect  of 
alcohol on their driving. In the high-school programs that use the
car, teen drivers first navigate it under normal conditions.
Then the operator activates an onboard computer system,
which simulates how an inebriated person would drive: braking
sluggishly, steering inaccurately, and so on. This computer-enhanced
car provides teens with an experience designed to
change their attitudes and behaviors about drinking and driving.
Although the sponsors of this car do not measure the impact
of this intervention, the anecdotal evidence is compelling (e.g.,
see Machrone, 1998).

Table 7.2 lists the three types of simulations just discussed
and outlines the advantages each type offers for persuasion
and motivation.

Computers  as  Persuasive  Social  Actors 

The  final  corner  of  the  Functional  Triad  focuses  on  computers 
as  “persuasive  social  actors,”  a  view  of  computers  that  has  only 
recently  become  widely  recognized.  Past  empirical  research  has 
shown  that  individuals  form  social  relationships  with  technology 
even  when  the  stimulus  is  rather  impoverished  (Fogg,  1997;  Mar¬ 
shall  &  Maguire,  1971;  Moon  &  Nass,  1996;  Muller,  1974;  Nass, 
Fogg,  &  Youngme,  1996;  Nass,  Moon,  Fogg,  Reeves,  &  Dryer, 
1995;  Nass  &  Steuer,  1993;  Nass,  Youngme,  Morkes,  Eun-Young, 
& Fogg, 1997; Parise, Kiesler, Sproull, & Waters, 1999; Quintanar,
Crowell, & Pryor, 1982; Reeves & Nass, 1996). For example,
individuals share reciprocal relationships with computers (Fogg &
Nass, 1997a; Parise, Kiesler, Sproull, & Waters, 1999), can be
flattered by computers (Fogg & Nass, 1997b), and are polite to
computers (Nass, Moon, & Carney, 1999).

In general, we propose that computers as social actors can
persuade  people  to  change  their  attitudes  and  behaviors  by 
(a)  providing  social  support,  (b)  modeling  attitudes  or  behav¬ 
iors,  and  (c)  leveraging  social  rules  and  dynamics  (Fogg,  2003). 

TABLE 7.2. Captology Includes Three Types
of Persuasive Simulations

Simulation Type: Key Advantages

Cause-and-effect simulators
• Allow users to explore and experiment
• Show cause-and-effect relationships clearly and quickly
• Persuade without being overly didactic

Environment simulators
• Can create situations that reward and motivate people for a
target behavior
• Allow rehearsal: practicing a target behavior
• Can control exposure to new or frightening situations
• Facilitate role playing: adopting another person's perspective

Object simulators
• Fit into the context of a person's normal life
• Are less dependent on imagination or suspension of disbelief
• Make clear the impact on normal life

Computers That Provide Social Support

Computers can provide a form of social support in order to
persuade, a dynamic that has long been observed in human-human
interactions (Jones, 1990). While the potential for effective
social support from computer technology has yet to be
fully explored, a small set of empirical studies provide evidence
for this phenomenon (Fogg, 1997; Fogg & Nass, 1997b; Nass
et al., 1996; Reeves & Nass, 1996). For example, computing
technology can influence individuals by providing praise or criticism,
thus manipulating levels of social support (Fogg & Nass, 1997b;
Muehlenhard et al., 1988).

Outside  the  research  context,  various  technology  products 
use  the  power  of  praise  to  influence  users.  For  example,  the 
Dole  5  A  Day  CD-ROM,  discussed  earlier,  uses  a  cast  of  over 
30  onscreen  characters  to  provide  social  support  to  users  who 
perform  various  activities.  Characters  such  as  “Bobby  Banana” 
and  “Pamela  Pineapple”  praise  individuals  for  checking  labels  on 
virtual  frozen  foods,  for  following  guidelines  from  the  food 
pyramid,  and  for  creating  a  nutritious  virtual  salad. 

Computers  That  Model  Attitudes  and  Behaviors 

In  addition  to  providing  social  support,  computer  systems 
can  persuade  by  modeling  target  attitudes  and  behaviors.  In  the 
natural  world,  people  learn  directly  through  first-hand  experi¬ 
ence  and  indirectly  through  observation  (Bandura,  1997).  When 
a  behavior  is  modeled  by  an  attractive  individual  or  is  shown  to 
result  in  positive  consequences,  people  are  more  likely  to  en¬ 
act  that  behavior  (Bandura).  Lieberman’s  research  (1997)  on  a 
computer  game  designed  to  model  health-maintenance  behav¬ 
iors  shows  the  positive  effects  that  an  onscreen  cartoon  model 
had  on  those  who  played  the  game.  In  a  similar  way,  the  product 
“Alcohol 101” (www.centurycouncil.org/underage/education/a101.cfm)
uses navigable onscreen video clips of human actors
dealing  with  problematic  situations  that  arise  during  college 
drinking  parties.  The  initial  studies  on  the  Alcohol  101  inter¬ 
vention  show  positive  outcomes  (Reis,  1998).  Computer-based 
characters,  whether  artistically  rendered  or  video  images,  are  in¬ 
creasingly  likely  to  serve  as  models  for  attitudes  and  behaviors. 

Computers  That  Leverage  Social  Rules  and  Dynamics 

Computers  have  also  been  shown  to  be  effective  persuasive 
social  actors  when  they  leverage  social  rules  and  dynamics 
(Fogg,  1997;  Friedman  &  Grudin,  1998;  Marshall  &  Maguire, 
1971; Parise et al., 1999). These rules include turn taking,
politeness norms, and sources of praise (Reeves & Nass, 1996).
The rule of reciprocity, that we must return favors to others,
is among the most powerful social rules (Gouldner, 1960) and is
one  that  has  also  been  shown  to  have  force  when  people  inter¬ 
act  with  computers.  Fogg  and  Nass  (1997a)  showed  that  peo¬ 
ple  performed  more  work  and  better  work  for  a  computer  that 
assisted  them  on  a  previous  task.  In  essence,  users  reciprocated 
help to a computer. On the retaliation side, the inverse of
reciprocity, the research showed that people performed lower-quality
work for a computer that had served them poorly in a previous
task. In a related vein, Moon (1998) found that individuals
followed  rules  of  impression  management  when  interacting 
with  a  computer.  Specifically,  when  individuals  believed  that  the 
computer  interviewing  them  was  in  the  same  room,  they  pro¬ 
vided  more  honest  answers,  compared  to  interacting  with  a 
computer  believed  to  be  a  few  miles  away.  In  addition,  subjects 
were  more  persuaded  by  the  proximate  computer. 

The previous paragraphs outline some of the early demonstrations
of computers as social actors that motivate and influence
people in predetermined ways, often paralleling findings
from long-standing human-human research.

Functional  Triad  Summary 

Table  7.3  summarizes  the  Functional  Triad  and  the  persuasive 
affordances  that  each  element  offers. 

In  summary,  the  Functional  Triad  can  be  a  useful  framework 
in captology, the study of computers as persuasive technologies.
It  makes  explicit  how  a  technology  can  change  attitudes  and  be¬ 
haviors — either  by  increasing  a  person’s  capability,  by  providing 
users  with  an  experience,  or  by  leveraging  the  power  of  social  re¬ 
lationships.  Each  of  these  paths  suggests  related  persuasion 
strategies,  dynamics,  and  theories.  One  element  that  is  common 
to  all  three  functions  is  the  role  of  credibility.  Credible  tools, 
credible  media,  and  credible  social  actors  will  all  lead  to  in¬ 
creased  power  to  persuade.  This  is  the  focus  of  the  next  section. 


TABLE 7.3. Captology Outlines Three Ways That Computers Influence People

Function: Computer as tool or instrument
Essence: Increases capabilities
Persuasive Affordances:
• Reduces barriers (time, effort, cost)
• Increases self-efficacy
• Provides information for better decision making
• Changes mental models

Function: Computer as medium
Essence: Provides experiences
Persuasive Affordances:
• Provides first-hand learning, insight, visualization, resolve
• Promotes understanding of cause-and-effect relationships
• Motivates through experience, sensation

Function: Computer as social actor
Essence: Creates relationship
Persuasive Affordances:
• Establishes social norms
• Invokes social rules and dynamics
• Provides social support or sanction


COMPUTERS AND CREDIBILITY


One key issue in captology is computer credibility, a topic that
suggests questions such as, “Do people find computers to be
credible sources?,” “What aspects of computers boost credibility?,”
and “How do computers gain and lose credibility?” Understanding
the elements of computer credibility promotes a
deeper  understanding  of  how  computers  can  change  attitudes 
and  behaviors,  as  credibility  is  a  key  element  in  many  persuasion 
processes  (Gahm,  1986;  Lerch  &  Prietula,  1989;  Lerch,  Prietula, 
&  Kulik,  1997). 

Credibility  has  been  a  topic  of  social  science  research  since 
the  1930s  (for  reviews,  see  Petty  &  Cacioppo,  1981;  Self,  1996). 
Virtually  all  credibility  researchers  have  described  credibility  as  a 
perceived quality made up of multiple dimensions (e.g., Buller &
Burgoon,  1996;  Gatignon  &  Robertson,  1991;  Petty  &  Cacioppo, 
1981;  Self,  1996;  Stiff,  1994).  This  description  has  two  key  com¬ 
ponents  germane  to  computer  credibility.  First,  credibility  is  a 
perceived  quality;  it  does  not  reside  in  an  object,  a  person,  or  a 
piece  of  information.  Therefore,  in  discussing  the  credibility  of 
a  computer  product,  one  is  always  discussing  the  perception  of 
credibility  for  the  computer  product. 

Next,  researchers  generally  agree  that  credibility  perceptions 
result  from  evaluating  multiple  dimensions  simultaneously.  Al¬ 
though  the  literature  varies  on  exactly  how  many  dimensions 
contribute  to  the  credibility  construct,  the  majority  of  re¬ 
searchers  identify  trustworthiness  and  expertise  as  the  two  key 
components  of  credibility  (Self,  1996).  Trustworthiness,  a  key  el¬ 
ement  in  the  credibility  calculus,  is  described  by  the  terms  well 
intentioned,  truthful,  unbiased,  and  so  on.  The  trustworthiness 
dimension  of  credibility  captures  the  perceived  goodness  or 
morality  of  the  source.  Expertise,  the  other  dimension  of  credi¬ 
bility,  is  described  by  terms  such  as  knowledgeable,  experienced, 
competent,  and  so  on.  The  expertise  dimension  of  credibility 
captures  the  perceived  knowledge  and  skill  of  the  source. 

Extending  research  on  credibility  to  the  domain  of  comput¬ 
ers,  we  have  proposed  that  highly  credible  computer  products 
will  be  perceived  to  have  high  levels  of  both  trustworthiness 
and  expertise  (Fogg  &  Tseng,  1999).  In  evaluating  credibility,  a 
computer  user  will  assess  the  computer  product’s  trustworthi¬ 
ness  and  expertise  to  arrive  at  an  overall  credibility  assessment. 

When  Does  Credibility  Matter? 

Credibility  is  a  key  component  in  bringing  about  attitude 
change.  Just  as  credible  people  can  influence  other  people, 
credible  computing  products  also  have  the  power  to  persuade. 
Computer  credibility  is  not  an  issue  when  there  is  no  aware¬ 
ness  of  the  computer  itself  or  when  the  dimensions  of  com¬ 
puter  credibility — trustworthiness  and  expertise — are  not  at 
stake.  In  these  cases  computer  credibility  does  not  matter  to  the 
user.  However,  in  many  cases  credibility  is  key.  The  following 
seven  categories  outline  when  credibility  matters  in  HCI  (Tseng 
&  Fogg,  1999). 

1.  When  computers  act  as  a  knowledge  repository. 

Credibility  matters  when  computers  provide  data  or  knowledge 
to  users.  The  information  can  be  static  information,  such  as  sim¬ 
ple  web  pages  or  an  encyclopedia  on  CD-ROM.  But  computer 
information  can  also  be  dynamic.  Computers  can  tailor  infor¬ 
mation  in  real  time  for  users,  such  as  providing  information  that 
matches  interests,  personality,  or  goals.  In  such  cases,  users  may 
question  the  credibility  of  the  information  provided. 


2.  When  computers  instruct  or  tutor  users.  Computer 
credibility  also  matters  when  computers  give  advice  or  provide 
instructions  to  users.  Sometimes  it’s  obvious  why  computers 
give  advice.  For  example,  auto-navigation  systems  give  advice 
about  which  route  to  take,  and  online  help  systems  advise  users 
on  how  to  solve  a  problem.  These  are  clear  instances  of  com¬ 
puters  giving  advice.  However,  at  times  the  advice  from  a  com¬ 
puting  system  is  subtle.  For  example,  interface  layout  and  menu 
options  can  be  a  form  of  advice.  Consider  a  default  button  on  a 
dialogue  box.  The  fact  that  one  option  is  automatically  selected 
as  the  default  option  suggests  that  certain  paths  are  more  likely 
or  profitable  for  most  users.  One  can  imagine  that  if  the  default 
options  are  poorly  chosen,  the  computer  program  could  lose 
some  credibility. 

3.  When  computers  report  measurements.  Computer 
credibility is also at stake when computing devices act as measuring
instruments. These can include engineering measurements
(e.g., an oscilloscope), medical measurements (e.g., a glucose
monitor), geographical measurements (e.g., devices with
GPS technology), and others. In this area we observed an interesting
phenomenon in the 1990s when digital test and measurement
equipment was created to replace traditional analog
devices.  Many  engineers,  usually  those  with  senior  status,  did 
not  trust  the  information  from  the  digital  devices.  As  a  result, 
some  engineers  rejected  the  convenience  and  power  of  the  new 
technology  because  their  old  analog  equipment  gave  informa¬ 
tion  they  found  more  credible. 

4.  When  computers  report  on  work  performed. 

Computers  also  need  credibility  when  they  report  to  users  on 
work  the  computer  has  performed.  For  example,  computers 
report  the  success  of  a  software  installation  or  the  eradication  of 
viruses. In these cases and others, the credibility of the computer
is at issue if the work the computer reports does not match
what  actually  happened.  For  example,  suppose  a  user  runs  a 
spell  check  and  the  computer  reports  no  misspelled  words.  If 
the  user  later  finds  a  misspelled  word,  then  the  credibility  of  the 
program  will  suffer. 

5.  When  computers  report  about  their  own  state. 

Computers  also  report  their  own  state,  and  these  reports  have 
credibility  implications.  For  example,  computers  may  report 
how  much  disk  space  they  have  left,  how  long  their  batteries 
will  last,  how  long  a  process  will  take,  and  so  on.  A  computer 
reporting  about  its  own  state  raises  issues  about  its  competence 
in  conveying  accurate  information  about  itself,  which  is  likely 
to  affect  user  perceptions  of  credibility. 

6.  When  computers  run  simulations.  Credibility  is  also 
important  when  computers  run  simulations.  This  includes  sim¬ 
ulations  of  aircraft  navigation,  chemical  processes,  social  dy¬ 
namics,  nuclear  disasters,  and  so  on.  Simulations  can  show 
cause-and-effect  relationships,  such  as  the  progress  of  a  disease 
in  a  population  or  the  effects  of  global  warming.  Similarly,  sim¬ 
ulations  can  replicate  the  dynamics  of  an  experience,  such  as 
piloting  an  aircraft  or  caring  for  a  baby.  Based  on  rules  that 
humans  provide,  computer  simulations  can  be  flawed  or  biased. 
Even  if  the  bias  is  not  intentional,  when  users  perceive  that  the 
computer simulation lacks veridicality, the computer application
will  lose  credibility. 

7.  When  computers  render  virtual  environments. 

Related  to  simulations  is  the  computer’s  ability  to  create  virtual 
environments  for  users.  Credibility  is  important  in  making  these 
environments  believable,  useful,  and  engaging.  However,  virtual 
environments  don’t  always  need  to  match  the  physical  world; 
they  simply  need  to  model  what  they  propose  to  model.  For  ex¬ 
ample,  like  good  fiction  or  art,  a  virtual  world  for  a  fanciful  arcade 
game  can  be  highly  credible  if  the  world  is  internally  consistent. 

Web  Credibility  Research  and  Guidelines  for  Design 

When it comes to credibility, the Web is unusual. The Web can
be the most credible source of information, and the Web can
be among the least credible sources. Limitations inherent to
traditional media (most notably modality, interactivity, and space
limitations) are often avoidable on the Web. As a result, online
information  has  the  potential  to  be  more  complete  and  en¬ 
hanced  by  interactive  options  for  users  to  more  thoughtfully 
process  what  they  read. 

However,  this  potential  is  accompanied  by  several  features  of 
the Web that can erode its credibility as a medium (Danielson,
2005). First, the Web lacks the traditional gatekeeping and
quality-control  mechanisms  that  are  commonplace  to  more  tra¬ 
ditional  publishing,  such  as  editing  and  fact  checking.  Second, 
because  digital  information  can  be  manipulated,  disseminated, 
and  published  with  relative  ease,  online  information  seekers 
must  learn  to  account  for  incorrect  information  being  widely 
and  quickly  duplicated  (Metzger,  Flanagin,  &  Zwarun,  2003),  as 
in  the  case  of  ubiquitous  “Internet  hoaxes.”  Third,  where  in 
most  media  environments  prior  to  the  Web  and  in  face-to-face 
interactions  the  speaker  or  writer  of  proposed  ideas  and  facts 
was  typically  clear  to  the  listener  or  reader,  source  ambiguity  is 
often  the  rule  rather  than  the  exception  in  web  information 
seeking.  Finally,  as  with  any  new  media  technology,  the  Web  re¬ 
quires  users  to  develop  new  skills  when  evaluating  various 
claims  (Greer,  2003),  as  in  the  case  of  checking  Uniform  Re¬ 
source  Locators  (URLs)  as  an  indicator  of  site  credibility. 

Many  Websites  today  offer  users  low-quality — or  outright 
misleading —  information.  As  a  result,  credibility  has  become  a 
major  concern  for  those  seeking  or  posting  information  on  the 
Web  (Burbules,  2001;  Caruso,  1999;  Johnson  &  Kaye,  1998;  Kil¬ 
gore,  1998;  McDonald,  Schumann,  &  Thorson,  1999;  Nielsen, 
1997;  Sullivan,  1999).  Web  users  are  becoming  more  skeptical  of 
what  they  find  online  and  may  be  wary  of  Web-based  experi¬ 
ences  in  general. 

There’s  a  direct  connection  between  web  credibility  and  persuasion
via  the  Web.  When  a  site  gains  credibility,  it  also  gains
the  power  to  change  attitudes,  and,  at  times,  behaviors.  When
a  Website  lacks  credibility,  it  will  not  be  effective  in  persuading 
or  motivating  users.  In  few  arenas  is  this  connection  more  direct 
than  in  e-commerce,  where  various  online  claims  and  promises 
about  products  and  services  provide  the  primary  or  sole  basis 
for  buying  decisions. 

As  part  of  the  Persuasive  Technology  Lab,  we  have  been  investigating
factors  influencing  Website  credibility  and  user
strategies  for  making  such  assessments.  A  general  framework
for  research  in  this  relatively  young  field  is  captured  in  Fogg’s 
Prominence-Interpretation  Theory  (Fogg,  2002,  2003).  Credibil¬ 
ity  assessment  is  an  iterative  process  driven  by  (a)  the  likelihood 
that  particular  Website  elements  (such  as  its  privacy  policy,
advertisements,  attractiveness,  etc.)  will  be  noticed  by  an  information
seeker  (prominence),  and  (b)  the  value  that  element
will  be  assigned  by  the  user  in  making  a  credibility  judgment 
(i.e.,  increases  or  decreases  perceived  credibility)  (interpreta¬ 
tion).  Several  factors  can  influence  the  likelihood  of  an  element 
being  noticed,  including  the  user’s  level  of  involvement,  the  sig¬ 
nificance  of  the  information  sought,  and  the  user’s  level  of  Web 
experience,  domain  expertise,  and  other  individual  differences. 
Similarly,  interpretation  is  influenced  by  such  individual  and 
contextual  factors.  Noticeable  Website  elements  are  evaluated 
until  either  the  user  is  satisfied  with  an  overall  credibility  as¬ 
sessment,  or  a  constraint  (often  associated  with  lack  of  time  or 
motivation)  is  reached. 
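
One way to make the two components concrete is the following minimal Python sketch. It assumes, purely for illustration, that prominence can be expressed as a likelihood of noticing an element, that interpretation can be expressed as a signed value, that their product gives the element's contribution to the overall judgment, and that the user stops after a fixed number of elements (a stand-in for the time or motivation constraint). The element names, the numbers, and the `assess_credibility` function are hypothetical and are not taken from the studies cited.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    prominence: float      # assumed: likelihood of being noticed, 0.0 to 1.0
    interpretation: float  # assumed: signed judgment, negative lowers credibility


def assess_credibility(elements, max_elements):
    """Sum prominence * interpretation over the most noticeable elements.

    max_elements stands in for the constraint (time, motivation) that
    ends the evaluation; both the product and the cutoff are assumptions.
    """
    ranked = sorted(elements, key=lambda e: e.prominence, reverse=True)
    considered = ranked[:max_elements]
    return sum(e.prominence * e.interpretation for e in considered)


site = [
    Element("privacy policy", prominence=0.2, interpretation=+1.0),
    Element("pop-up ads", prominence=0.9, interpretation=-1.0),
    Element("professional layout", prominence=0.8, interpretation=+1.0),
]

hurried = assess_credibility(site, max_elements=1)  # notices only the ads
careful = assess_credibility(site, max_elements=3)  # weighs all three
```

In this toy example the hurried visitor ends with a negative impression because only the most prominent element is weighed, whereas the careful visitor's judgment is slightly positive.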

Perhaps  more  than  with  any  other  medium,  Web-interaction
designers  face  increasing  challenges  to  design  Web  experiences 
that  first  and  foremost  hold  the  attention  and  motivation  of  in¬ 
formation  seekers;  the  second  hill  to  climb  is  in  persuading  Web 
users  to  adopt  specific  behaviors,  such  as  the  following: 

•  register  personal  information 

•  purchase  things  online 

•  fill  out  a  survey 

•  click  on  the  ads 

•  set  up  a  virtual  community 

•  download  software 

•  bookmark  the  site  and  return  often 

If  web  designers  can  influence  people  to  perform  these  ac¬ 
tions,  they  have  been  successful.  These  are  key  behavioral  out¬ 
comes.  But  what  do  users  notice  when  evaluating  web  content, 
and  how  are  those  noticed  elements  interpreted?  What  makes
a  Website  credible?  We  offer  the  following  broad  guidelines,  aris¬ 
ing  out  of  our  lab’s  experimental  work: 

Guideline  #1:  Design  websites  to  convey  the  “real 
world”  aspect  of  the  organization.  Perhaps  the  most  ef¬ 
fective  way  to  enhance  the  credibility  of  a  Website  is  to  include 
elements  that  highlight  the  brick-and-mortar  nature  of  the  or¬ 
ganization  it  represents.  Despite  rampant  source  ambiguity  on 
the  Web,  web  users  show  a  strong  reliance  on  indicators  of 
identity  (Rieh,  2002),  including  credentials,  photos,  and  contact
information  (Fogg,  Marshall,  Laraki,  Osipovich,  Varma,
Fang,  et  al.,  2001).  The  overall  implication  seems  clear:  To  create 
a  site  with  maximum  credibility,  designers  should  highlight  fea¬ 
tures  that  communicate  the  legitimacy  and  accessibility  of  the 
organization. 

Guideline  #2:  Invest  resources  in  visual  design. 

Web  users  depend  to  a  surprisingly  large  degree  on  the  visual 
design  of  Websites  when  making  credibility  judgments.  In  one 
study,  we  found  “design  look”  to  be  the  single  most  mentioned 
category  by  a  sample  of  more  than  2,800  users  when  evaluating 
the  credibility  of  sites  across  a  wide  variety  of  domains  (Fogg, 
Soohoo,  Danielson,  Marable,  Stanford,  &  Tauber,  2003).  Similar
to  the  assessment  of  human  communicators,  the  attractiveness
and  professional  design  of  a  Website  are  often  used  as  a  first
indicator  of  credibility.

Guideline  #3:  Make  Websites  easy  to  use.  In  the  HCI
community  we  have  long  emphasized  ease  of  use,  so  a  guideline
advocating  ease  of  use  is  not  new.  However,  our  work  adds  an¬ 
other  important  reason  for  making  Websites  usable:  it  will  en¬ 
hance  the  site’s  credibility.  In  one  study  (Fogg  et  al.,  2001),  people 
awarded  a  Website  credibility  points  for  being  usable  (e.g.,  “The
site  is  arranged  in  a  way  that  makes  sense  to  you”),  and  they  deducted
credibility  points  for  ease-of-use  problems  (e.g.,  “the  site  is
difficult  to  navigate”).  While  this  information  should  not  change
how  we,  as  HCI  professionals,  design  user  experiences  for  the 
Web,  it  does  add  a  compelling  new  reason  for  investing  time  and 
money  in  usable  design — it  makes  a  site  more  credible.  Going  be¬ 
yond  the  data,  one  could  reasonably  conclude  that  a  simple,  us¬ 
able  Website  would  be  perceived  as  more  credible  than  a  site  that 
has  extravagant  features  but  is  lacking  in  usability. 

Guideline  #4:  Include  markers  of  expertise.  Exper¬ 
tise  is  a  key  component  in  credibility,  and  our  work  supports 
the  idea  that  Websites  that  convey  expertise  can  gain  credibility 
in  users’  eyes.  Important  “expertise  elements”  include  listing 
an  author’s  credentials  and  including  citations  and  references. 
It’s  likely  that  many  other  elements  also  exist.  Many  Websites 
today  miss  opportunities  to  legitimately  convey  expertise  to
their  users. 

Guideline  #5:  Include  markers  of  trustworthiness. 

Trustworthiness  is  another  key  component  in  credibility.  As  with 
expertise,  Website  elements  that  convey  trustworthiness  will  lead
to  increased  perceptions  of  credibility.  Such  elements  include  link¬ 
ing  to  outside  materials  and  sources,  stating  a  policy  on  content, 
and  so  on.  Making  information  verifiable  on  a  Website  increases 
credibility  despite  the  fact  that  users  are  unlikely  to  follow  through 
on  verification  (Metzger  et  al.,  2003).  Thus,  the  mere  presence  of 
some  design  elements  will  influence  user  perceptions.  We  pro¬ 
pose  that  Website  designers  who  concentrate  on  conveying  the 
honest,  unbiased  nature  of  their  Website  will  end  up  with  a  more 
credible — and  therefore  more  effective — Website.

Guideline  #6:  Tailor  the  user  experience.  Although 
not  as  vital  as  the  previous  suggestions,  tailoring  does  make  a 
difference.  Our  work  shows  that  tailoring  the  user  experience 
on  a  Website  leads  to  increased  perceptions  of  web  credibility. 
For  example,  people  think  a  site  is  more  credible  when  it  ac¬ 
knowledges  that  the  individual  has  visited  it  before.  To  be  sure, 
tailoring  and  personalization  can  take  place  in  many  ways.  Tai¬ 
loring  extends  even  to  the  type  of  ads  shown  on  the  page:  ads 
that  match  what  the  user  is  seeking  seem  to  increase  the  per¬ 
ception  of  Website  credibility. 

Guideline  #7:  Avoid  overly  commercial  elements  on
a  Website.  Although  most  Websites,  especially  large  Websites, 
exist  for  commercial  purposes,  our  work  suggests  that  users 
penalize  sites  that  have  an  aggressively  commercial  flavor.  For 
example,  web  pages  that  mix  ads  with  content  to  the  point  of
confusing  readers  will  be  perceived  as  not  credible.  Fogg  et  al.
(2001)  found  that  mixing  ads  and  content  received  the  most 
negative  response  of  all.  However,  it  is  important  to  note  that 
ads  don’t  always  reduce  credibility.  In  this  study  and  elsewhere 
(Kim,  1999),  quantitative  research  shows  that  banner  ads  done 
well  can  enhance  the  perceived  credibility  of  a  site.  It  seems  rea¬ 
sonable  that,  as  with  other  elements  of  people’s  lives,  we  accept 
commercialization  to  an  extent  but  become  wary  when  it  is 
overdone. 

Guideline  #8:  Avoid  the  pitfalls  of  amateurism.

Most  web  designers  seek  a  professional  outcome  in  their  work. 
Organizations  that  care  about  credibility  should  be  ever  vigi¬ 
lant — and  perhaps  obsessive — to  avoid  small  glitches  in  their 
Websites.  These  “small”  glitches  seem  to  have  a  large  impact  on 
web  credibility  perceptions.  Even  one  typographical  error  or  a 
single  broken  link  is  damaging.  While  designers  may  face  pres¬ 
sures  to  create  dazzling  technical  features  on  Websites,  failing  to 
correct  small  errors  undermines  that  work. 

Despite  the  growing  body  of  research,  much  remains  to  be 
discovered  about  web  credibility.  The  study  of  web  credibility 
needs  to  be  an  ongoing  concern  because  three  things  continue 
to  evolve:  (a)  Web  technology,  (b)  the  type  of  people  using  the 
Web,  and  (c)  people’s  experiences  with  the  Web.  Fortunately, 
what  researchers  learn  about  designing  for  web  credibility  can 
translate  into  credible  experiences  in  other  high-tech  devices 
that  share  information,  from  mobile  phones  to  gas  pumps. 


POSITIVE  AND  NEGATIVE  APPLICATIONS 
OF  PERSUASIVE  TECHNOLOGY 


As  the  power  of  persuasive  techniques  becomes  more  under¬ 
stood,  we  are  beginning  to  see  more  examples  of  persuasive 
technologies  being  created.  Many  have  positive  goals  in  mind, 
but  there  are  also  many  technologies  designed  to  negatively  in¬ 
fluence  attitudes  and  behaviors. 

An  example  of  a  positive  technology  is  a  mobile  application 
called  MyFoodPhone.  Mobile  persuasive  devices  have  several 
unique  properties  that  may  improve  their  abilities  to  persuade. 
First,  they  are  personal  devices:  people  carry  their  mobile 
phones  everywhere,  customize  them,  and  store  personal  infor¬ 
mation  in  them.  Second,  intrinsic  to  them  being  mobile,  these 
devices  have  the  potential  to  intervene  at  the  right  moment,  a 
concept  called  kairos. 

MyFoodPhone  is  an  application  for  the  camera  phone  that 
helps  people  watch  what  they  eat — whether  they  want  to  change
their  weight  or  just  eat  right.  Whenever  a  user  is  concerned  with
an  item  they  are  about  to  eat,  they  simply  take  a  picture  of  it  with
their  camera,  then  use  MyFoodPhone  to  send  it  to  a  system  that
shares  the  images  with  an  expert  dietician.  The  user  receives  expert
evaluation  and  feedback.  In  this  case,  the  simplicity  and  appropri¬ 
ate  timing  of  the  application  make  it  a  powerful  persuasive  tool. 

On  the  Web,  GoDaddy  (www.godaddy.com),  a  popular  Web-hosting
company,  attempts  to  persuade  users  to  purchase  more
expensive  hosting  solutions  by  “disguising”  links  to  its  less-expensive
plans  as  plain  text  links,  while  links  to  pricier
upgrades  appear  as  large,  brightly  colored  buttons.

A  more  negative  example  can  be  found  in  the  rise  in  “Pro
Anorexia”  Websites,  encouraging  self-starvation  and  sharing  tips
for  losing  weight.  Though  they  reached  their  height  in  the  earlier
part  of  the  decade,  many  of  these  sites  are  still  being  operated.
By  creating  social  networks  around  the  disorder,  these  sites  support
people  suffering  from  anorexia  and  encourage  them  to  continue  their
unhealthy  habits.  Many  of  these  Websites  use  the  web  credibility
techniques  discussed  earlier:  the  sites  are  well  designed  and 
contain  expert  advice. 

As  the  power  of  persuasive  technologies  becomes  more  un¬ 
derstood,  the  consideration  of  the  ethical  ramifications  of  these 
technologies  becomes  essential. 


THE  ETHICS  OF  COMPUTING  SYSTEMS 
DESIGNED  TO  PERSUADE 


In  addition  to  research  and  design  issues,  captology  addresses 
the  ethical  issues  that  arise  from  designing  or  distributing  persuasive
interactive  technologies.  Persuasion  is  a  value-laden  activity.
By  extension,  creating  or  distributing  an  interactive  technology 
that  attempts  to  persuade  is  also  value  laden.  Ethical  problems 
arise  when  the  values,  goals,  and  interests  of  the  creators  don’t 
match  with  those  of  the  people  who  use  the  technology.  HCI 
professionals  can  ask  a  few  key  questions  to  get  insight  into  pos¬ 
sible  ethical  problem  areas: 

•  Does  the  persuasive  technology  advocate  what’s  good  and  fair? 

•  Is  the  technology  inclusive,  allowing  access  to  all,  regardless 
of  social  standing? 

•  Does  it  promote  self-determination? 

•  Does  it  represent  what’s  thought  to  be  true  and  accurate? 

Answering  no  to  any  of  these  questions  suggests  the  persua¬ 
sive  technology  at  hand  could  be  ethically  questionable  and  per¬ 
haps  downright  objectionable  (for  a  longer  discussion  on  ethics, 
see  Friedman  &  Kahn,  later  in  this  volume). 

While  it’s  clear  that  deception  and  coercion  are  unethical  in
computing  products,  some  behavior  change  strategies,  such  as
conditioning,  surveillance,  and  punishment,  are  less  clear-cut.
For  example,  operant  conditioning — a  system  of  rewards — can
powerfully  shape  behaviors.  By  providing  rewards,  a  computer 
product  could  get  people  to  perform  new  behaviors  without 
their  clear  consent  or  without  them  noticing  the  forces  of  in¬ 
fluence  at  work. 

Surveillance  is  another  common  and  effective  way  to  change 
behavior.  People  who  know  they  are  being  watched  behave  differently.
Today,  computer  technologies  allow  surveillance  in
ways  that  were  never  before  possible,  giving  institutions  remarkable
new  powers.  Although  advocates  of  computer-based
employee  surveillance  (e.g.,  DeTienne,  1993)  say  that  monitoring
can  “inspire  employees  to  achieve  excellence,”  they  and  opponents
agree  that  such  approaches  can  hurt  morale  or  create  a
more  stressful  workplace.  When  every  keystroke  and  every  restroom
break  is  monitored  and  recorded,  employees  may  feel
they  are  part  of  an  electronic  sweatshop. 

Another  area  of  concern  is  when  technologies  use  punish¬ 
ment — or  threats  of  punishment — to  shape  behaviors.  Al¬ 
though  punishment  is  an  effective  way  to  change  outward  be¬ 
haviors  in  the  short  term,  punishment  has  limited  outcomes 
beyond  changing  observable  behavior,  and  many  behavior 
change  experts  frown  on  using  it.  The  problems  with  punish¬ 
ment  increase  when  a  computer  product  punishes  people.  The 
punishment  may  be  excessive  or  inappropriate  to  the  situation. 
Also,  the  long-term  effects  of  punishment  are  likely  to  be  nega¬ 
tive.  In  these  cases,  who  bears  responsibility  for  the  outcome? 

As  discussed  elsewhere  in  more  detail  (Berdichevsky,  1999;  Fogg,
1998,  2003),  those  who  create  or  distribute  persuasive  technologies
have  a  responsibility  to  examine  the  moral  issues  involved.


PERSUASIVE  TECHNOLOGY: 
POTENTIAL  AND  RESPONSIBILITY 


Computer  systems  are  now  becoming  a  common  part  of  everyday
life.  Whatever  the  form  of  the  system,  from  a  desktop  computer
to  a  smart  car  interior  to  a  mobile  phone,  these  interactive
experiences  can  be  designed  to  influence  our  attitudes  and  af¬ 
fect  our  behaviors.  They  can  motivate  and  persuade  by  merg¬ 
ing  the  power  of  computing  with  the  psychology  of  persuasion. 

We  humans  are  still  the  supreme  agents  of  influence — and  this
won’t  change  any  time  soon.  Computers  are  not  yet  as  effective  as 
skilled  human  persuaders  are,  but  at  times  computing  technology 
can  go  beyond  what  humans  can  do.  Computers  never  forget, 
they  don’t  need  to  sleep,  and  they  can  be  programmed  to  never 
stop  trying.  For  better  or  worse,  computers  provide  us  with  a  new 
avenue  for  changing  how  people  think  and  act. 

To  a  large  extent,  we  as  a  community  of  HCI  professionals 
will  help  create  the  next  generation  of  technology  products,  including
those  products  designed  to  change  people’s  attitudes  and
behaviors.  If  we  take  the  right  steps — raising  awareness  of  per¬ 
suasive  technology  in  the  general  public  and  encouraging  tech¬ 
nology  creators  to  follow  guidelines  for  ethical  interactive  tech¬ 
nologies — we  may  well  see  persuasive  technology  reach  its 
potential,  enhancing  the  quality  of  life  for  individuals,  commu¬ 
nities,  and  society. 


HUMAN  ERROR


We  are  all  familiar  with  the  annoyance  of  errors  we  make  with 
everyday  devices,  such  as  turning  on  the  heat  under  an
empty  kettle,  or  making  mistakes  in  the  programming  sequence 
of  our  videocassette  recorders.  People  have  a  tendency  to  blame 
themselves  for  “human  error.”  However,  the  use  and  abuse  of 
the  term  has  led  some  to  question  the  very  notion  of  “human 
error”  (Wagenaar  &  Groeneweg,  1988).  “Human  error”  is  often 
invoked  in  the  absence  of  technological  explanations.  Chapa- 
nis  (1999)  wrote  that  back  in  the  1940s  he  noted  that  “pilot  er¬ 
ror”  was  really  “designer  error.”  This  was  a  challenge  to  con¬ 
temporary  thinking,  and  showed  that  design  is  all-important  in 
human-error  reduction.  Chapanis  became  interested  in  why  pi¬ 
lots  often  retracted  the  landing  gear  instead  of  the  landing  flaps 
after  landing  the  aircraft.  He  identified  the  problem  as  designer 
error  rather  than  pilot  error,  as  the  designer  had  put  two  iden¬ 
tical  toggle  switches  side  by  side — one  for  the  landing  gear,  the 
other  for  the  flaps.  Chapanis  proposed  that  the  controls  should 
be  separated  and  coded.  The  separation  and  coding  of  controls 
is  now  standard  human-factors  practice.  Half  a  century  after 
Chapanis’s  original  observations,  the  idea  that  one  can  design 
error-tolerant  devices  is  beginning  to  gain  credence  (Baber  & 
Stanton,  1994).  One  can  argue  that  human  error  is  not  a  simple 
matter  of  one  individual  making  one  mistake,  so  much  as  the 
product  of  a  design  which  has  permitted  the  existence  and  con¬ 
tinuation  of  specific  activities  which  could  lead  to  errors  (Rea¬ 
son,  1990). 

Human  error  is  an  emotive  topic  and  psychologists  have 
been  investigating  its  origins  and  causes  since  the  dawn  of  the 
discipline  (Reason,  1990).  Traditional  approaches  have  attrib¬ 
uted  errors  to  individuals.  Indeed,  so-called  “Freudian  slips” 
were  considered  the  unwitting  revelation  of  intention,  errors  re¬ 
vealing  what  a  person  was  really  thinking  but  did  not  wish  to  dis¬ 
close.  More  recently,  cognitive  psychologists  have  considered 
the  issues  of  error  classification  and  explanation  (Senders  & 
Moray,  1991).  The  taxonomic  approaches  of  Norman  (1988)  and
Reason  (1990)  have  fostered  the  development  and  formal  defi¬ 
nition  of  several  categories  of  human  error  (e.g.,  capture  errors, 
description  errors,  data  driven  errors,  associated  activation  er¬ 
rors,  and  loss  of  activation  errors)  while  the  work  of  Reason 
(1990)  and  Wickens  (1992)  attempted  to  understand  the  psy¬ 
chological  mechanisms  which  combine  to  cause  errors  (e.g., 
failure  of  memory,  poor  perception,  errors  of  decision  making, 
and  problems  of  motor  execution).  Reason  (1990)  in  particular 
has  argued  that  we  need  to  consider  the  activities  of  the  indi¬ 
vidual  if  we  are  to  be  able  to  identify  what  may  go  wrong.  Rather 
than  viewing  errors  as  unpredictable  events,  this  approach  regards
them  as  wholly  predictable  occurrences  based  on  an
analysis  of  an  individual’s  activities.  Reason’s  definition  proposes
that  errors  are  “those  occasions  in  which  a  planned  sequence
of  mental  or  physical  activities  fails  to  achieve  its  intended
outcome,  [and]  when  these  failures  cannot  be  attributed  to  the
intervention  of  some  chance  agency”  (p.  9).

If  errors  are  no  longer  to  be  considered  as  random  occur¬ 
rences,  then  it  follows  that  we  should  be  able  to  identify  them 
and  predict  their  likelihood.  The  impetus  to  achieve  this  has 
been  fueled  in  the  wake  of  several  recent  and  significant  incidents,
most  notably  in  the  nuclear  industry,  where  there  now  exist
several  human-error  identification  (HEI)  techniques.  The
aims  of  this  chapter  are  to:

1.  Consider  human-error  classifications; 

2.  Look  at  systems  approaches  to  human  error; 

3.  Consider  how  human  error  can  be  predicted; 

4.  Examine  the  validation  evidence;  and 

5.  Look  at  human  error  in  the  context  of  design. 


HUMAN-ERROR  CLASSIFICATION 


The  development  of  formal  human-error  classification  schemes 
has  assisted  in  the  anticipation  and  analysis  of  error.  The  antici¬ 
pation  of  error  has  come  about  through  the  development  of  for¬ 
mal  techniques  for  predicting  error,  which  is  dealt  with  in  the 
Predicting  Human  Error  section.  The  analysis  of  error  is  assisted 
by  taxonomic  systems  and  the  interpretation  of  underlying  psychological
mechanisms.  Three  contemporary  systems  were  presented
in  the  work  of  Norman  (1981),  Reason  (1990),  and  Wickens
(1992).

Norman’s  (1981)  research  focused  on  the  categorization  of 
action  slips,  in  which  he  presented  the  analysis  of  1,000  inci¬ 
dents.  Underpinning  his  analysis  was  a  psychological  theory  of 
schema  activation.  He  argued  that  action  sequences  are  trig¬ 
gered  by  knowledge  structures  (organized  as  memory  units  and 
called  “schemas”).  Within  the  mind  is  a  hierarchy  of  schemas
that  are  invoked  (or  triggered)  if  particular  conditions  are  satis¬ 
fied  or  certain  events  occur.  The  theory  seems  particularly  per¬ 
tinent  as  a  description  of  skilled  behavior.  The  classification 
scheme  is  presented  in  Table  8.1. 

In  Neisser’s  (1976)  seminal  work  “Cognition  and  Reality,”  he 
contended  that  human  thought  is  closely  coupled  with  a  per¬ 
son’s  interaction  with  the  world.  He  argued  that  knowledge  of 
how  the  world  works  (e.g.,  mental  models)  leads  to  the  antici¬ 
pation  of  certain  kinds  of  information,  which  in  turn  directs  be¬ 
havior  to  seek  out  certain  kinds  of  information  and  to  provide 
a  ready  means  of  interpretation.  During  this  process,  as  the  en¬ 
vironment  is  sampled,  the  information  garnered  serves  to  up¬ 
date  and  modify  the  internal,  cognitive  schema  of  the  world, 
which  will  again  direct  further  search.  An  illustration  of  the  perceptual
cycle  is  shown  in  Fig.  8.1.

The  perceptual  cycle  can  be  used  to  explain  human  infor¬ 
mation  processing  in  control  rooms.  For  example — assuming 
that  an  individual  has  the  correct  knowledge  of  a  videocassette 
recorder  he  is  programming — his  mental  model  will  enable  him 
to  anticipate  events  (such  as  the  menu  items  he  expects  to  see), 
search  for  confirmatory  evidence  (look  at  the  panel  on  the 
video  machine),  direct  a  course  of  action  (select  a  channel,  day 
of  the  week,  start  time,  end  time,  etc.),  and  continually  check 
that  the  outcome  is  as  expected  (menu  item  and  data  field  respond
as  anticipated).  If  the  individual  uncovers  some  data  he
does  not  expect  (such  as  a  menu  item  not  previously  encoun¬ 
tered,  or  the  data  field  not  accepting  his  input),  he  is  required  to 
access  a  source  of  wider  knowledge  of  the  world  to  consider
possible  explanations  that  will  direct  future  search  activities. 
The  completeness  of  the  model  is  in  the  description  of  process
(the  cyclical  nature  of  sampling  the  world)  and  product  (the  updating
of  the  world  model  at  any  point  in  time).

TABLE  8.1.  Taxonomy  of  Slips  with  Examples

Taxonomy  of  Slips  /  Examples  of  Error  Types

Slips  that  result  from  errors  in  the  formation  of  intention
    Mode  errors:  erroneous  classification  of  the  situation
    Description  errors:  ambiguous  or  incomplete  specification  of  intention

Slips  that  result  from  faulty  activation  of  schemas
    Capture  errors:  similar  sequences  of  action,  where  stronger  sequence  takes  control
    Data-driven  activation  errors:  external  events  that  cause  the  activation  of  schemas
    Association-activation  errors:  currently  active  schemas  that  activate  other  schemas  with  which  they  are  associated
    Loss-of-activation  errors:  schemas  that  lose  activation  after  they  have  been  activated

Slips  that  result  from  faulty  triggering  of  active  schemas
    Blend  errors:  combination  of  components  from  competing  schemas
    Premature  activation  errors:  schemas  that  are  activated  too  early
    Failure  to  activate  errors:  failure  of  the  trigger  condition  or  event  to  activate  the  schema

FIGURE  8.1.  The  perceptual  cycle.

This  interactive  schema  model  works  well  for  explaining  how 
we  act  in  the  world.  As  Norman’s  (1981)  research  has  shown,  it 
may  also  explain  why  errors  occur  as  they  do.  If,  as  schema  the¬ 
ory  predicts,  action  is  directed  by  schema,  then  faulty  schemas 
or  faulty  activation  of  schemas  will  lead  to  erroneous  perfor¬ 
mance.  As  Table  8.1  shows,  this  can  occur  in  at  least  three  ways. 
First,  we  can  select  the  wrong  schema  due  to  misinterpretation 
of  the  situation.  Second,  we  can  activate  the  wrong  schema  be¬ 
cause  of  similarities  in  the  trigger  conditions.  Third,  we  can  ac¬ 
tivate  schemas  too  early  or  too  late.  Examples  of  these  types  of 
errors  are  presented  in  Table  8.1. 
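
For readers who want to apply the taxonomy when coding incident reports, the three groups in Table 8.1 can be held in a simple lookup structure. The sketch below is only an illustration in Python; the dictionary layout and the `group_of` helper are assumptions made here and are not part of Norman's (1981) scheme.

```python
# Table 8.1 as a lookup: slip group -> error types in that group.
SLIP_TAXONOMY = {
    "errors in the formation of intention": [
        "mode error",
        "description error",
    ],
    "faulty activation of schemas": [
        "capture error",
        "data-driven activation error",
        "association-activation error",
        "loss-of-activation error",
    ],
    "faulty triggering of active schemas": [
        "blend error",
        "premature activation error",
        "failure to activate error",
    ],
}


def group_of(error_type):
    """Return the slip group that contains error_type, or None if absent."""
    for group, error_types in SLIP_TAXONOMY.items():
        if error_type in error_types:
            return group
    return None
```

A call such as `group_of("mode error")` returns the first group, which corresponds to selecting the wrong schema through misinterpretation of the situation.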

Of  particular  interest  is  the  problem  of  mode  errors.  Norman 
(1981)  singled  out  this  type  of  error  as  requiring  special  atten¬ 
tion  in  the  design  of  computing  systems.  He  pointed  out  that 
the  misclassification  of  the  computing  system  mode  could  lead 
to  input  errors  which  may  result  in  serious  consequences.  In 
word  processors  this  may  mean  the  loss  of  documents;  in  video 
recorders  this  may  mean  the  loss  of  recordings;  on  flight  decks 
this  may  mean  damage  to  aircraft. 

Casey  (1993)  described  a  case  in  which  an  apparently  sim¬ 
ple  mode  error  by  a  radiotherapy  technician  working  in  a  can¬ 
cer  care  center  led  to  the  death  of  a  patient.  The  Therac-25  she 
was  operating  was  a  state-of-the-art,  million-dollar  machine
that  could  be  used  as  both  a  high-power  x-ray  machine  (25  million
electron  volts  delivered  by  typing  an  “x”  on  the  keyboard)
and  a  low-power  electron  beam  machine  (200  rads  delivered  by
typing  an  “e”  on  the  keyboard).  After  preparing  the  patient  for
radiation  therapy,  the  radiotherapist  went  to  her  isolated  con¬ 
trol  room.  She  accidentally  pressed  the  “x”  key  instead  of  the 
“e”  key,  but  quickly  realized  this  and  selected  the  edit  menu 
so  that  she  could  change  the  setting  from  x-ray  to  electron- 
beam.  Then  she  returned  to  the  main  screen  to  wait  for  the 
“beam  ready”  prompt.  All  of  this  occurred  within  eight  sec¬ 
onds.  Unknown  to  her,  this  rapid  sequence  of  inputs  had 
never  been  tested  on  the  machine  before  and  it  had  actually 
entered  a  hybrid  mode,  delivering  blasts  of  25,000  rads,  which 
was  more  than  125  times  the  prescribed  dose.  When  the 
“beam  ready”  prompt  was  displayed,  the  radiotherapist 
pressed  the  “b”  key  to  fire  the  beam.  The  high-energy  beam 
was  delivered  to  the  patient  and  the  computer  screen  dis¬ 
played  the  prompt  “Malfunction  54.”  Unaware  that  the  ma¬ 
chine  had  already  fired,  the  operator  reset  it  and  pressed  “b” 
again.  This  happened  a  third  time  before  the  patient  ran  out
of  the  room  reporting  painful  electric  shocks.  On  investiga¬ 
tion,  the  problem  with  the  machine  modes  was  found,  but  not 
before  other  overdoses  had  been  given.  This  case  study 
demonstrates  the  need  to  consider  the  way  in  which  the  design
of  a  system  can  induce  errors  in  users. 
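
One general defense against mode errors of this kind is to make the system refuse a consequential command unless the mode shown to the operator matches the mode the hardware has actually reached. The Python sketch below illustrates that idea only. It is not a model of the Therac-25 software, and the class, method, and mode names are hypothetical.

```python
class BeamController:
    """Toy controller that fires only when displayed and actual modes agree."""

    def __init__(self):
        self.displayed_mode = None  # what the operator sees on screen
        self.hardware_mode = None   # what the machine is configured for

    def select_mode(self, mode):
        # The display updates immediately; the hardware needs a separate step.
        self.displayed_mode = mode

    def hardware_settled(self):
        # Called when the (simulated) hardware finishes reconfiguring.
        self.hardware_mode = self.displayed_mode

    def fire(self):
        if self.displayed_mode is None or self.hardware_mode != self.displayed_mode:
            return "refused: mode not confirmed"
        return f"fired in {self.hardware_mode} mode"


controller = BeamController()
controller.select_mode("x-ray")
controller.hardware_settled()
controller.select_mode("electron")  # quick correction by the operator
too_soon = controller.fire()        # hardware is still set for x-ray
controller.hardware_settled()
after_settling = controller.fire()
```

The design choice here is that a rapid correction by the operator can never leave the machine firing in a mode that differs from the one on screen; the command is simply refused until the two agree.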

A  thorough  understanding  of  human  error  is  required  by  the 
design  team.  Error  classification  schemes  can  certainly  help,  but 
they  need  to  be  supported  by  formal  error-prediction  tech¬ 
niques  within  a  user-centered  design  approach. 

Reason  (1990)  developed  a  higher-level  error  classification 
system,  incorporating  slips,  lapses,  and  mistakes.  Slips  and 
lapses  are  defined  by  attentional  failures  and  memory  failures, 
respectively.  Both  slips  and  lapses  are  examples  of  where  the  ac¬ 
tion  was  unintended,  whereas  mistakes  are  associated  with  in¬ 
tended  action.  This  taxonomy  is  presented  in  Table  8.2. 

Wickens  (1992),  taking  the  information  processing  frame¬ 
work,  considered  the  implications  of  psychological  mechanisms 
at  work  in  error  formation.  He  argued  that  with  mistakes,  the 
situation  assessment  and/or  planning  are  poor  while  the  re¬ 
trieval  action  execution  is  good;  with  slips,  the  action  execu¬ 
tion  is  poor  whereas  the  situation  assessment  and  planning  are 
good; and finally, with lapses, the situation assessment and action
execution are good, but memory is poor. A summary of
these distinctions is shown in Table 8.3.

TABLE 8.2. Basic Error Types with Examples

Basic Error Type    Example of Error Type
Slip                Action intrusion
                    Omission of action
                    Reversal of action
                    Misordering of action
                    Mistiming of action
Lapse               Omitting of planned actions
                    Losing place in action sequence
                    Forgetting intended actions
Mistake             Misapplication of good procedure
                    Application of a bad procedure
                    Misperception
                    Poor decision making
                    Failure to consider alternatives
                    Overconfidence

Wickens  (1992)  was  also  concerned  with  mode  errors,  with 
particular  reference  to  technological  domains.  He  suggested 
that  a  pilot  raising  the  landing  gear  while  the  aircraft  is  still  on 
the  runway  is  an  example  of  a  mode  error.  Wickens  proposed 
that  mode  errors  are  the  result  of  poorly  conceived  system  de¬ 
sign  that  allows  the  mode  confusion  to  occur  and  the  opera¬ 
tion  in  an  inappropriate  mode.  Chapanis  (1999)  argued  that  the 
landing  gear  switch  could  be  rendered  inoperable  if  the  land¬ 
ing  gear  could  be  configured  to  detect  weight  on  the  wheels, 
as  the  aircraft  would  be  on  the  ground. 
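
The interlock that Chapanis describes can be sketched in a few lines. This is only an illustration of the design idea, not avionics code; the class and attribute names (LandingGear, weight_on_wheels, raise_gear) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LandingGear:
    """Toy model of a gear lever guarded by a weight-on-wheels sensor (hypothetical)."""
    weight_on_wheels: bool = True   # True while the aircraft is on the ground
    gear_down: bool = True

    def raise_gear(self) -> bool:
        """Try to retract the gear; return True only if the command is accepted."""
        if self.weight_on_wheels:
            # Interlock: the lever is inoperable on the ground, so the
            # mode error (raising the gear on the runway) cannot occur.
            return False
        self.gear_down = False
        return True


gear = LandingGear()
on_ground_result = gear.raise_gear()   # rejected: aircraft is still on the runway
gear.weight_on_wheels = False          # aircraft has lifted off
airborne_result = gear.raise_gear()    # accepted
```

The point of the sketch is that the error is removed by the design of the device rather than by relying on the operator to notice which mode the system is in.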

Taxonomies  of  errors  can  be  used  to  anticipate  what  might 
go  wrong  in  any  task.  Potentially,  every  task  or  activity  could  be 
subject  to  a  slip,  lapse,  or  mistake.  The  two  approaches  repre¬ 
sented  within  the  taxonomies  are  a  schema-based  approach  and 
an  error-list-based  approach.  Examples  of  these  two  approaches 
will  be  presented  next  in  the  form  of  formal  human-error  iden¬ 
tification  techniques. 


PREDICTING  HUMAN  ERROR 


An  abundance  of  methods  for  identifying  human  error  exists; 
some  of  these  methods  may  be  appropriate  for  the  analysis  of 
consumer  products.  In  general,  most  of  the  existing  techniques 
have  two  key  problems.  The  first  of  these  problems  relates  to  the 
lack  of  representation  of  the  external  environment  or  objects. 
Typically,  human-error  analysis  techniques  do  not  represent  the 
activity  of  the  device  nor  the  material  that  the  human  interacts 
with  in  more  than  a  passing  manner.  Hollnagel  (1993)  empha¬ 
sized  that  Human  Reliability  Analysis  (HRA)  often  fails  to  take  ad¬ 
equate  account  of  the  context  in  which  performance  occurs.  Sec¬ 
ond,  there  tends  to  be  an  overdependence  on  the  analyst’s 
judgment. Different analysts with different experiences may
make different predictions regarding the same problem (inter-analyst
reliability). Similarly, the same analyst may make different
judgments on different occasions (intra-analyst reliability).
This  subjectivity  of  analysis  may  weaken  the  confidence  that  can 
be placed in any predictions made. The analyst is required to be
an expert in the technique as well as in the operation of the device
being analyzed if the analysis has any potential for validity.

TABLE 8.3. Error Types and Associated Psychological Mechanisms

Error Type               Associated Psychological Mechanism
Slip                     Action execution
Lapse and mode errors    Memory
Mistake                  Planning and intention of action
Mistake                  Interpretation and situation assessment
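
The two reliability problems raised above (disagreement between analysts, and disagreement of one analyst with his or her own earlier analysis) can be made concrete with a simple overlap score. The sketch below is an assumption for illustration: it uses the proportion of shared predictions (intersection over union), which is not necessarily the statistic used in the studies cited in this chapter, and the predicted error codes are invented.

```python
def agreement(first, second):
    """Share of predicted errors common to both analyses (intersection over union)."""
    first, second = set(first), set(second)
    union = first | second
    return len(first & second) / len(union) if union else 1.0


# Hypothetical sets of predicted error modes for the same task
analyst_a_first_pass = {"A8", "C1", "I1", "I2"}
analyst_a_second_pass = {"A8", "C1", "I1"}
analyst_b_first_pass = {"A8", "A3", "I2"}

# Same analyst on two occasions
within_analyst = agreement(analyst_a_first_pass, analyst_a_second_pass)
# Two different analysts on the same occasion
between_analysts = agreement(analyst_a_first_pass, analyst_b_first_pass)
```

A score of 1.0 means the two analyses predicted exactly the same errors; lower scores indicate that the predictions depend on who made them, or when.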

Two  techniques  are  considered  here  because  of  the  inher¬ 
ent  differences  in  the  way  the  methods  work.  Systematic  Human 
Error  Reduction  and  Prediction  Approach  (SHERPA)  is  a  diver¬ 
gent  error-prediction  method;  it  works  by  associating  up  to  10 
error  modes  with  each  action.  In  the  hands  of  a  novice,  it  is  typ¬ 
ical  for  there  to  be  an  over-inclusive  strategy  for  selecting  error 
modes. The novice user would rather be safe than sorry and
would  tend  to  predict  many  more  errors  than  would  actually  oc¬ 
cur.  This  might  be  problematic;  “crying  wolf”  too  many  times 
might  ruin  the  credibility  of  the  approach.  Task  Analysis  For  Er¬ 
ror  Identification  (TAFEI),  by  contrast,  is  a  convergent  error- 
prediction  technique;  it  works  by  identifying  the  possible  tran¬ 
sitions  between  the  different  states  of  a  device  and  uses  the 
normative  description  of  behavior  (provided  by  the  Hierarchical 
Task  Analysis  (HTA))  to  identify  potentially  erroneous  actions. 
Even  in  the  hands  of  a  novice  the  technique  seems  to  prevent 
the  individual  from  generating  too  many  false  alarms,  certainly 
no  more  than  they  do  using  heuristics.  In  fact,  by  constraining 
the  user  of  TAFEI  to  the  problem  space  surrounding  the  transi¬ 
tions  between  device  states,  it  should  exclude  extraneous  error 
prediction. Indeed, this was one of the aims for the technique
when it was originally developed (Baber & Stanton, 1994).

Systematic  Human-Error  Reduction  and  Prediction 
Approach  (SHERPA) 

SHERPA  represents  the  error-list  approach.  At  its  core  is  an  error 
taxonomy  that  is  not  unlike  the  classification  schemes  pre¬ 
sented  in  the  previous  section.  The  idea  is  that  each  task  can 
be classified into one of five basic types. SHERPA (Embrey, 1986;
Stanton,  2005)  uses  Hierarchical  Task  Analysis  (HTA;  Annett, 
Duncan,  Stammers,  &  Gray,  1971)  together  with  an  error  tax¬ 
onomy  to  identify  credible  errors  associated  with  a  sequence 
of  human  activity.  In  essence,  the  SHERPA  technique  works  by 
indicating  which  error  modes  are  credible  for  each  task  step  in 
turn,  based  on  an  analysis  of  work  activity.  This  indication  is 
based  on  the  judgment  of  the  analyst,  and  requires  input  from 
a subject matter expert to be realistic. A summary of the procedure
is shown in Fig. 8.2.

The  process  begins  with  the  analysis  of  work  activities,  us¬ 
ing  Hierarchical  Task  Analysis.  HTA  (Annett  et  al.,  1971;  Annett, 
2004,  2005)  is  based  on  the  notion  that  task  performance  can  be 
expressed  in  terms  of  a  hierarchy  of  goals  (what  the  person  is 
seeking  to  achieve),  operations  (the  activities  executed  to 
achieve the goals), and plans (the sequence in which the operations
are executed). An example of HTA for the programming
of a videocassette recorder is shown in Fig. 8.3.

FIGURE 8.2. The SHERPA procedure.


For  the  application  of  SHERPA,  each  task  step  from  the  bot¬ 
tom  level  of  the  analysis  is  taken  in  turn.  First,  each  task  step  is 
classified  into  one  of  the  following  types  from  the  taxonomy: 

•  Action  (e.g.,  pressing  a  button,  pulling  a  switch,  opening  a 
door) 

•  Retrieval  (e.g.,  getting  information  from  a  screen  or  manual) 

•  Checking  (e.g.,  conducting  a  procedural  check) 

•  Information  communication  (e.g.,  talking  to  another  party) 

•  Selection  (e.g.,  choosing  one  alternative  over  another) 

This  classification  of  the  task  step  then  leads  the  analyst  to 
consider  credible  error  modes  associated  with  that  activity,  as 
shown  in  Table  8.4. 
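
The first two SHERPA steps (take each bottom-level task step from the HTA, classify it, and look up the error modes that could apply) can be sketched as a simple lookup. The error-mode codes are those of Table 8.4 and the step numbers follow Table 8.5; the nested-dictionary representation, the goal wording, the plan text, and the function names are assumptions made for this sketch. Deciding which of the candidate modes are actually credible remains a matter of analyst and expert judgment.

```python
# Error modes per task type, transcribed from Table 8.4.
ERROR_MODES = {
    "action": [f"A{i}" for i in range(1, 11)],
    "retrieval": ["R1", "R2", "R3"],
    "checking": [f"C{i}" for i in range(1, 7)],
    "communication": ["I1", "I2", "I3"],
    "selection": ["S1", "S2"],
}

# Fragment of an HTA: a goal, a plan, and subordinate operations.
# Goal wording and "type" labels are illustrative assumptions.
HTA = {
    "id": "1",
    "goal": "Prepare VCR",
    "plan": "do 1.1, then 1.2, then 1.3",
    "subtasks": [
        {"id": "1.1", "goal": "Switch VCR on", "type": "action"},
        {"id": "1.2", "goal": "Check clock", "type": "checking"},
        {"id": "1.3", "goal": "Insert cassette", "type": "action"},
    ],
}


def bottom_level_steps(node):
    """Yield the leaf operations of the hierarchy, depth first."""
    subtasks = node.get("subtasks", [])
    if not subtasks:
        yield node
    for child in subtasks:
        yield from bottom_level_steps(child)


def candidate_error_modes(step):
    """Return every error mode the taxonomy associates with the step's type."""
    return ERROR_MODES[step["type"]]


candidates = {s["id"]: candidate_error_modes(s) for s in bottom_level_steps(HTA)}
```

Here candidates["1.2"] holds the six checking modes, from which the analyst would retain only those judged credible for checking the clock.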

For each credible error (i.e., those judged by a subject matter
expert to be possible) a description of the form that the error
would take is given, as illustrated in Table 8.5. The consequence
of the error on the system needs to be determined next,
as this has implications for the criticality of the error. The last
four steps consider the possibility for error recovery, the ordinal
probability of the error (high, medium, or low), its criticality
(either critical or not critical), and potential remedies. Again,
these are shown in Table 8.5.
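
One row of a SHERPA description can be held as a small record, which makes it easy to pull out the errors that most need design attention. The four rows below are taken from Table 8.5; the record layout, the field names, and the ordering rule (critical errors only, most probable first) are assumptions for this sketch rather than part of the published method.

```python
from dataclasses import dataclass


@dataclass
class SherpaEntry:
    task_step: str
    error_mode: str
    description: str
    consequence: str
    recovery: str      # "Immediate", "None", or the task step where recovery occurs
    probability: str   # ordinal: "L", "M", or "H"
    critical: bool
    remedy: str


PROBABILITY_RANK = {"L": 0, "M": 1, "H": 2}

entries = [
    SherpaEntry("1.1", "A8", "Fail to switch VCR on", "Cannot proceed",
                "Immediate", "L", False, "Press any button to switch VCR on"),
    SherpaEntry("1.2", "C1", "Omit to check clock", "VCR clock time may be incorrect",
                "None", "L", True, "Automatic clock setting and adjust via radio transmitter"),
    SherpaEntry("4.1.1", "A8", "Fail to press UP button", "Wrong channel selected",
                "None", "M", True, "Enter channel number directly from keypad"),
    SherpaEntry("4.4", "A1", "Fail to wait", "Start time not set",
                "Task 4.5", "L", False, "Remove need to wait"),
]


def design_priorities(rows):
    """Critical errors, most probable first (an assumed ordering rule)."""
    critical = [row for row in rows if row.critical]
    return sorted(critical, key=lambda row: PROBABILITY_RANK[row.probability], reverse=True)


priority_steps = [row.task_step for row in design_priorities(entries)]
```

With these four rows, the medium-probability channel-selection error is listed ahead of the low-probability clock check, and the two non-critical errors are left out.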

As  Table  8.5  shows,  there  are  six  basic  error  types  associated 
with  the  activities  of  programming  a  VCR.  These  are, 

1.  Failing  to  check  that  the  VCR  clock  is  correct; 

2.  Failing  to  insert  a  cassette; 

3.  Failing  to  select  the  program  number; 

4.  Failing  to  wait; 

5.  Failing  to  enter  programming  information  correctly; 

6.  Failing  to  press  the  confirmatory  buttons. 

The  purpose  of  SHERPA  is  not  only  to  identify  potential  er¬ 
rors  with  the  current  design,  but  also  to  guide  future  design 
considerations.  The  structured  nature  of  the  analysis  can  help  to 
focus  the  design  remedies  for  solving  problems,  as  shown  in  the 
remedial  strategies  column.  As  this  analysis  shows,  quite  a  lot 
of  improvements  could  be  made.  It  is  important  to  note,  how¬ 
ever,  that  the  improvements  are  constrained  by  the  analysis. 
This  does  not  address  radically  different  design  solutions  that 
may  remove  the  need  to  program  at  all. 


Task Analysis For Error Identification (TAFEI)

TAFEI  represents  the  schema-based  approach.  It  explicitly  ana¬ 
lyzes  the  interaction  between  people  and  machines  (Baber  & 
Stanton,  1994;  Stanton  &  Baber,  1996;  Stanton  &  Baber,  2005a). 
TAFEI  analysis  is  concerned  with  task-based  scenarios.  This 
analysis  is  done  by  mapping  human  activity  onto  machine 
states.  An  overview  of  the  procedure  is  shown  in  Fig.  8.4.  TAFEI 
analysis  consists  of  three  principal  components:  Hierarchical 
Task  Analysis  (HTA);  State-Space  Diagrams  (SSD),  which  are 
loosely based on finite state machines (Angel & Bekey, 1968);
and Transition Matrices (TM). HTA provides a description of human
activity, SSDs provide a description of machine activity, and
TM  provides  a  mechanism  for  determining  potential  erroneous 
activity  through  the  interaction  of  the  human  and  the  device. 
In  a  similar  manner  to  Newell  and  Simon  (1972),  legal  and  illegal 
operators  (called  “transitions”  in  the  TAFEI  methodology)  are 
identified. 

In brief, the TAFEI methodology is as follows. First, the system
needs to be defined. Next, the human activities and machine
states are described in separate analyses. The basic building
blocks are HTA (describing human activity; see Fig. 8.3) and
state space diagrams (describing machine activity). These two
types of analysis are then combined to produce the TAFEI description
of human-machine interaction, as shown in Fig. 8.5.

From the TAFEI diagram, a transition matrix is compiled
and each transition is scrutinized in turn until all transitions
have been analyzed. In Table 8.6, each transition has been
classified as “impossible” (the transition cannot
be performed), “illegal” (the transition can be performed but
it does not lead to the desired outcome), or “legal” (the transition
can be performed and is consistent with the description
of error-free activity provided by the HTA). Finally, “illegal”
transitions are addressed in turn as potential errors, to consider
changes that may be introduced.

Thirteen of the transitions are defined as “illegal,” and these
can  be  reduced  to  a  subset  of  six  basic  error  types: 

1.  Switch  VCR  off  inadvertently 

2.  Insert  cassette  into  machine  when  switched  off 

3.  Program  without  cassette  inserted 

4.  Fail  to  select  program  number 

5.  Fail  to  wait  for  “on”  light 

6.  Fail  to  enter  programming  information 

In  addition,  one  legal  transition  has  been  highlighted  be¬ 
cause  it  requires  a  recursive  activity  to  be  performed.  These  ac¬ 
tivities  seem  to  be  particularly  prone  to  errors  of  omission. 
These  predictions  then  serve  as  a  basis  for  the  designer  to  ad¬ 
dress  the  redesign  of  the  VCR.  A  number  of  illegal  transitions 
could  be  dealt  with  relatively  easily  by  considering  the  use  of 
modes  in  the  operation  of  the  device,  such  as  switching  off  the 
VCR without stopping the tape and pressing play without inserting
the tape. As with the SHERPA example, the point of the
analysis is to help guide design efforts to make the product
error-tolerant (Baber & Stanton, 2004).
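
The classification of transitions used in TAFEI can be sketched as a small transition matrix. The device below is a deliberately reduced toy with four invented states; it is not the state-space diagram of Fig. 8.5 or the matrix of Table 8.6, and the sets of permitted transitions are assumptions. A transition the device does not permit is "impossible"; a permitted transition on the error-free path described by the HTA is "legal"; any other permitted transition is "illegal" and is therefore a predicted error.

```python
from itertools import product

STATES = ["off", "on", "cassette_in", "programmed"]

# Transitions the (hypothetical) device physically permits.
POSSIBLE = {
    ("off", "on"), ("off", "cassette_in"),
    ("on", "off"), ("on", "cassette_in"), ("on", "programmed"),
    ("cassette_in", "off"), ("cassette_in", "programmed"),
}

# Error-free sequence of states implied by the task analysis (assumed).
HTA_PATH = {("off", "on"), ("on", "cassette_in"), ("cassette_in", "programmed")}


def classify(transition):
    """Label one (from_state, to_state) pair as impossible, legal, or illegal."""
    if transition not in POSSIBLE:
        return "impossible"
    if transition in HTA_PATH:
        return "legal"
    return "illegal"


# Transition matrix: every ordered pair of states and its label.
matrix = {pair: classify(pair) for pair in product(STATES, repeat=2)}

# Illegal transitions are the candidate errors to address in redesign.
illegal = sorted(pair for pair, label in matrix.items() if label == "illegal")
```

In this toy matrix the illegal transitions correspond to switching the machine off inadvertently, inserting a cassette while it is off, and programming without a cassette. A redesign would aim to turn each of them into an impossible transition or to make it recoverable.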


VALIDATION  OF  HUMAN-ERROR 
IDENTIFICATION 

There  have  been  a  few  attempts  to  validate  HEI  techniques 
(Williams,  1989;  Whalley  &  Kirwan,  1989;  Kirwan,  1992a,  1992b; 
Kennedy,  1995;  Baber  &  Stanton,  1996).  For  instance,  Whalley 
and  Kirwan  (1989)  evaluated  six  HEI  techniques  (Heuristics, 
PHECA, SRK, SHERPA, THERP, and HAZOP) for their ability to
account  for  the  errors  known  to  have  contributed  to  four  gen¬ 
uine  incidents  within  the  nuclear  industry.  More  recently,  Kir¬ 
wan  (1992b)  has  developed  a  comprehensive  list  of  eight  crite¬ 
ria  to  evaluate  the  acceptability  of  these  techniques  at  a  more 
qualitative  level.  In  an  unpublished  study,  Kennedy  (1995)  has 
included  Kirwan’s  criteria  when  examining  the  ability  of  the 
techniques to predict ten actual incidents retrospectively. While
these studies failed to identify a clear favorite from among these
HEI techniques, all three studies indicated impressive general
performance using the SHERPA method. SHERPA achieved the
highest overall rankings and Kirwan (1992b) recommended a
combination of expert judgment together with the SHERPA
technique as the most valid approach.

TABLE 8.4. Error Modes and their Description

Error Mode    Error Description

Action
A1            Operation too long/short
A2            Operation mistimed
A3            Operation in wrong direction
A4            Operation too much/little
A5            Misalign
A6            Right operation on wrong object
A7            Wrong operation on right object
A8            Operation omitted
A9            Operation incomplete
A10           Wrong operation on wrong object

Information Retrieval
R1            Information not obtained
R2            Wrong information obtained
R3            Information retrieval incomplete

Checking
C1            Check omitted
C2            Check incomplete
C3            Right check on wrong object
C4            Wrong check on right object
C5            Check mistimed
C6            Wrong check on wrong object

Information Communication
I1            Information not communicated
I2            Wrong information communicated
I3            Information communication incomplete

Selection
S1            Selection omitted
S2            Wrong selection made

The  strength  of  these  studies  lies  in  the  high  level  of  eco¬ 
logical  or  face  validity  that  they  achieve.  The  methodologies 
make  use  of  the  opinions  of  expert  assessors  for  the  predic¬ 
tion  of  errors  contributing  to  real  world  events.  However,  these 
studies  do  raise  several  methodological  concerns.  Specifically, 
the  number  of  assessors  using  each  technique  is  small  (typically 
one  to  three),  and  the  equivalence  of  assessment  across  tech¬ 
niques  is  brought  into  question  because  different  people  are 
assessing  each  HEI  technique.  A  second  methodological  con¬ 
cern  centers  on  the  use  of  subjective  rating  scales.  It  is  doubtful 
that  the  assessors  will  share  the  same  standards  when  rating 
the  acceptability  or  usefulness  of  an  HEI  technique.  This  fac¬ 
tor,  combined  with  the  small  number  of  assessors  for  each 
technique,  means  that  these  data  should  be  accepted  with 
some  degree  of  caution. 

In  light  of  these  criticisms,  Baber  and  Stanton  (1996)  aimed  to 
provide  a  more  rigorous  test  for  the  predictive  validity  of  SHERPA 
and TAFEI. Predictive validity was tested by comparing the errors
identified by an expert analyst with those observed during 300
transactions  with  a  ticket  machine  on  the  London  Underground. 
Baber  and  Stanton  (1996)  suggested  that  SHERPA  and  TAFEI  pro¬ 
vided  an  acceptable  level  of  sensitivity  based  on  the  data  from 
two  expert  analysts  (d’  =  0.8).  The  strength  of  this  latter  study 
over Kirwan’s is that it reports the use of the method in detail
as  well  as  the  error  predictions  made  using  SHERPA  and  TAFEI. 
Stanton  and  Baber  (2002)  reported  a  study  on  the  performance 
of  SHERPA  and  TAFEI  using  a  larger  pool  of  analysts,  including 
novice  analysts,  to  examine  the  important  issue  of  ease  of  ac¬ 
quiring  the  method.  They  reported  reliability  values  of  between 
0.4  and  0.6  and  sensitivity  values  of  between  0.7  and  0.8.  This 
compares  favorably  with  Hollnagel,  Kaarstad,  and  Lee’s  (1998) 
analysis  of  the  Cognitive  Reliability  and  Error  Analysis  Method 
(CREAM),  for  which  they  claim  a  68.6%  match  between  pre¬ 
dicted  outcomes  and  actual  outcomes. 
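
A sensitivity figure such as d' comes from signal detection theory, in which predicted errors are scored against observed errors. The sketch below shows the textbook calculation, d' = z(hit rate) - z(false-alarm rate), where a hit is an error that was both predicted and observed, a miss was observed but not predicted, a false alarm was predicted but not observed, and a correct rejection was neither. The counts are invented, and the chapter does not spell out the exact computation used in the cited studies, so this should be read as an assumption about the general approach.

```python
from statistics import NormalDist


def rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate) from the four outcome counts."""
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: separation of the two rates in standard-normal units.

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    statistics.StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical comparison of one analyst's predictions with observed errors
hit_rate, false_alarm_rate = rates(hits=30, misses=10,
                                   false_alarms=20, correct_rejections=40)
sensitivity = d_prime(hit_rate, false_alarm_rate)
```

A d' of zero means the predictions are no better than chance; larger values mean that predicted errors are more clearly separated from errors that do not occur.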

The  research  into  HEI  techniques  suggested  that  the  meth¬ 
ods  enable  analysts  to  structure  their  judgment  (Stanton  & 
Baber, 2005b). However, the results run counter to the literature
in some areas (such as usability evaluation), which suggests
the  superiority  of  heuristic  approaches  (Nielsen,  1993).  The 
views  of  Lansdale  and  Ormerod  (1994)  may  help  us  to  recon¬ 
cile  these  findings.  They  suggested  that,  to  be  applied  success¬ 
fully,  a  heuristic  approach  needs  the  support  of  an  explicit 
methodology  to  “ensure  that  the  evaluation  is  structured  and 
thorough”  (p.  257).  Essentially,  SHERPA  and  TAFEI  provide  a 
semi-structured  approach  which  forms  a  framework  for  the 
judgment of the analyst without constraining it. They seem to succeed
precisely because of their semi-structured nature, which alleviates
the burden otherwise placed on the analyst’s memory
while allowing them room to use their own heuristic judgment.


APPLYING  TAFEI  TO  INTERFACE  DESIGN 


A  study  by  Baber  and  Stanton  (1999)  has  shown  how  TAFEI  can 
be  used  as  part  of  the  interface  design  process  for  a  computer 
workstation.  Their  case  study  is  based  on  a  design  project  for  a 
medical  imaging  software  company.  The  software  was  used  by 
cytogeneticists  in  research  and  hospital  environments.  The  ex¬ 
isting  software  was  a  menu-driven,  text-based  interface.  The  task 
of  metaphase  finding  has  six  main  subtasks:  (a)  the  set-up  task 
(where  the  computer  is  fired  up,  the  microscope  is  calibrated, 
the  scan  is  defined,  and  the  cells  are  prepared);  (b)  the  capture 
task  (where  the  drug  is  applied  to  the  cells  on  the  slide  using  a 
pipette  and  the  images  are  captured);  (c)  the  processing  task 
(where  the  background  is  subtracted  from  the  cell  image  and 
the  image  processing  is  performed);  (d)  the  analysis  task 
(where  the  data  are  graphed  and  tabulated);  (e)  the  storage  task 
(where the images are assessed and selected images and data
are  saved);  and  (f)  the  shut-down  task  (where  the  computer 
and  microscope  are  shut  down).  Using  TAFEI  to  model  the 
existing system, Baber and Stanton (1999) found that the sequence
of activities required to conduct the metaphase-finding
tasks was not supported logically by the computer interface.
The  two  main  problems  were  the  confusing  range  of  choices 
offered to users in each system state and the number of recursions
in the task sequence required to perform even the simplest
of tasks. Interviews with the cytogeneticists revealed
some fundamental problems with the existing software, such as:
(a) the existing system was not error tolerant, (b) there was a
lack of consistency between commands, (c) the error and feedback
messages were not meaningful to users, and (d) the same
data had to be entered several times in the procedure. The lack
of trust that the users had in the system meant that they also
kept a paper-based log of the information they entered into the
computer, meaning even greater duplication of effort.

TABLE 8.5. The SHERPA description.

P = probability of the error (L = low, M = medium); C = criticality (! = critical, blank = not critical).

Task Step  Error Mode  Error Description                 Consequence                      Recovery   P  C  Remedial Strategy
1.1        A8          Fail to switch VCR on             Cannot proceed                   Immediate  L     Press any button to switch VCR on
1.2        C1          Omit to check clock               VCR clock time may be incorrect  None       L  !  Automatic clock setting and adjust via radio transmitter
           C2          Incomplete check
1.3        A3          Insert cassette wrong way around  Damage to VCR                    Immediate  L  !  Strengthen mechanism
           A8          Fail to insert cassette           Cannot record                    Task 3     L     On-screen prompt
2          A8          Fail to pull down front cover     Cannot proceed                   Immediate  L     Remove cover to programming
3.1        S1          Fail to move timer selector       Cannot proceed                   Immediate  L     Separate timer selector from programming function
3.2        A8          Fail to press PROGRAM             Cannot proceed                   Immediate  L     Remove this task step from sequence
3.3        A8          Fail to press ON button           Cannot proceed                   Immediate  L     Label button START TIME
4.1.1      A8          Fail to press UP button           Wrong channel selected           None       M  !  Enter channel number directly from keypad
4.1.2      A8          Fail to press DOWN button         Wrong channel selected           None       M  !  Enter channel number directly from keypad
4.2        A8          Fail to press DAY button          Wrong day selected               None       M  !  Present day via a calendar
4.3        I1          No time entered                   No program recorded              None       L  !  Dial time in via analogue clock
           I2          Wrong time entered                Wrong program recorded           None       L  !  Dial time in via analogue clock
4.4        A1          Fail to wait                      Start time not set               Task 4.5   L     Remove need to wait
4.5        A8          Fail to press OFF button          Cannot set finish time                            Label button FINISH TIME
4.6        I1          No time entered                   No program recorded              None       L  !  Dial time in via analogue clock
           I2          Wrong time entered                Wrong program recorded           None       L  !  Dial time in via analogue clock
4.7        A8          Fail to set timer                 No program recorded              None       L  !  Separate timer selector from programming function
4.8        A8          Fail to press TIME RECORD button  No program recorded              None       L  !  Remove this task step from sequence

From their analysis of the task sequences, Baber and Stanton
(1999) developed a TAFEI diagram of the ideal system states
that  would  lead  the  user  through  the  metaphase-finding  task  in 
a  logical  manner.  One  of  the  software  engineers  described  the 
analysis  as  “a  sort  of  video  playback  of  someone  actually  doing 
things  with  the  equipment.”  The  TAFEI  analysis  was  a  revela¬ 
tion  to  the  company,  who  had  some  vague  idea  that  all  was  not 
well  with  the  interface,  but  had  not  conceived  of  the  problem  as 
defining  the  task  sequence  and  user  actions.  As  a  modeling  ap¬ 
proach,  TAFEI  does  this  rather  well.  The  final  step  in  the  analy¬ 
sis  was  to  define  and  refine  the  interface  screen.  This  was  based 
on  the  task  sequence  and  TAFEI  description,  to  produce  a  pro¬ 
totype  layout  for  comment  with  the  users  and  software  engi¬ 
neers.  Specific  task  scenarios  were  then  tried  out  to  see  if  the 
new  interface  would  support  user  activity.  Following  some  mi¬ 
nor  modification,  the  company  produced  a  functional  prototype 
for  user  performance  trials. 


Baber  and  Stanton  (1999)  reported  successful  completion 
of  the  interface  design  project  using  TAFEI.  They  argued  that 
the  method  supports  analytical  prototyping  activity.  TAFEI  en¬ 
ables  designers  to  focus  attention  on  the  task  sequence,  user  ac¬ 
tivity,  and  interface  design.  It  also  highlights  potential  problem 
points in the interaction, where errors are likely to occur. While
it is accepted that these considerations might also form part of
other methods (e.g., storyboards), TAFEI addresses them
in a structured, systematic, rigorous, and auditable way, with
explicit consideration of human-error potential in system design.


APPLYING  SHERPA  TO  SAFETY 
CRITICAL  SYSTEMS 


Studies  reported  on  SHERPA  show  how  it  can  be  applied  to  the 
evaluation  of  energy  distribution  (Glendon  &  Stanton,  1999) 
and  oil  extraction  (Stanton  &  Wilson,  2000)  in  the  safety-critical 
industries.  Both  of  these  represent  multiperson  systems.  The 
study  by  Glendon  and  Stanton  (1999)  compared  the  impact  of 
the implementation of a new safety management system on error
potential in an energy-distribution company. The first assessment
of human-error potential in electrical switching operations
was undertaken in 1994. Following this, the company
undertook major organization-restructuring activities. This included
a major safety-management initiative, which was to separate
operational control of the energy distribution system from
safety management. In the original assessment of the system in
1994 it became obvious that safety was often secondary to operational
performance. In order to remove this conflict, a separate
safety-management center was established. This enabled
the operational center to pass control of part of the energy-distribution
system over to the safety-management center when
safety issues arose. Thus operational control and safety management
were not in conflict. When the safety issues were resolved,
control of the energy-distribution system was passed
back to the operational center. The effect on reducing error potential
in the system is illustrated in Fig. 8.6.

FIGURE 8.4. The TAFEI procedure.

Statistical analyses, using the binomial test, were conducted
to determine whether the error rates in 1994 differed from
those in 1997. All comparisons between critical
(low and medium likelihood) errors proved statistically significant
(p < 0.001). It should be noted that there were no “high”
likelihood errors in the analyses of this system, all of which
had probably been designed out of the system as it evolved.
Fig. 8.6 shows a reduction in the error potential by over 50%
for critical errors from 1994 to 1997. This suggests that the system
is likely to be safer. The SHERPA technique assisted in the
redesign process after the 1994 analysis by proposing design-reduction
strategies. The results of these proposals are seen in
the 1997 analysis.
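
A binomial comparison of two error counts can be sketched as follows. The chapter does not say how the test was set up, so this is one plausible reading offered as an assumption: if the safety initiative had no effect, each predicted critical error would be equally likely to fall in either year (p = 0.5), and the test asks how surprising the observed split is. The counts are invented for illustration and are not the data behind Fig. 8.6.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one (small tolerance added for floating-point ties).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(value for value in pmf if value <= threshold))


# Hypothetical counts of critical errors predicted in each analysis
errors_1994 = 60
errors_1997 = 25

p_value = binomial_two_sided(errors_1997, errors_1994 + errors_1997)
significant = p_value < 0.001
```

With these invented counts the reduction is a little over 50% and the split is very unlikely under the no-effect assumption, which is the pattern of result the chapter reports.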

Stanton  and  Wilson  (2000)  described  the  assessment  of  an 
oil  drilling  task  undertaken  offshore  using  the  SHERPA  tech¬ 
nique.  Surprisingly,  the  oil  industry  lacked  a  method  for  assess¬ 
ing  the  robustness  of  its  working  practices.  SHERPA  scrutinized 
the  minutiae  of  human  action,  and  provided  some  basis  for  as¬ 
sessing  the  magnitude  of  increased  risk  from  human  activity, 
which  could  then  be  used  to  inform  which  intervention  strate¬ 
gies  were  likely  to  make  the  working  environment  safer. 
Drilling is a highly integrated team activity comprising a site
controller,  drilling  supervisor,  geologist,  toolpusher,  directional 
driller,  assistant  driller,  mudlogger,  mud  engineer,  derrickman, 
crane  operator,  roughnecks,  and  roustabouts.  It  is  also  a  fairly 
dangerous  occupation,  as  the  incidents  at  Piper  Alpha  and 
Ocean  Odyssey  show.  It  is  argued  that  the  risks  arising  from  hu¬ 
man  operations  can  be  controlled  in  the  same  way  as  engi¬ 
neering  risk.  Stanton  and  Wilson  (2000)  conducted  a  detailed 
task  analysis  of  drilling  operations,  covering  monitoring  the  sta¬ 
tus  of  the  well  (monitor),  detecting  abnormalities  (detect),  and 
dealing  with  kicks  (deal).  The  results  of  the  SHERPA  analysis  are 
shown  in  Fig.  8.7. 

As Fig. 8.7 shows, the error potential in the tasks of detecting
abnormalities and dealing with kicks is considerably
greater than in the task of monitoring the well. This should be
some  cause  for  concern,  and  certainly  explains  some  of  the 
problems  that  oil  companies  have  observed.  The  proposed 
remedies  from  the  SHERPA  analysis  included  strategies  such 
as  computer-prompted  checks  of  drilling  variables,  placement 
of  drilling  parameters  on  separate  displays,  redesign  of  trend 
displays,  computer-prompted  alarm  levels,  automatic  transmis¬ 
sion  of  information,  electronic  links  between  the  drilling  team 
members,  computer-generated  tables  of  mud  loss,  computer- 
based  procedures,  online  weather  systems,  and  automated  shut¬ 
down  procedures.  All  of  these  proposals  are  really  about  the 
design  of  the  human-computer  interface.  Stanton  and  Wilson 
(2000)  argued  that  adopting  these  approaches  should  help  or¬ 
ganizations  design  safer  systems  and  help  prevent  catastrophic 
disasters  in  the  future. 


CONCLUSIONS 


Although most HEI techniques, and the ergonomics methods resulting
from them, are used in an evaluative and summative manner,
it is entirely feasible for them to be used in a formative design
approach. Stanton and Young (1999) argued that the
methods  may  have  the  greatest  impact  at  the  prototyping  stage, 
particularly  considering  one  of  the  key  design  stages:  analytic 
prototyping. Although in the past it may have been costly, or
even impossible, to alter design at the structural prototyping
stage, the advent of computer-aided design has made this
much simpler. It may even be possible to compare alternative
designs  at  this  stage  with  such  technology.  In  terms  of  the  ana¬ 
lytical  prototyping  of  human  interfaces,  Baber  and  Stanton 
(1999) argued that there are three main forms: functional analysis
(e.g., consideration of the range of functions the device supports),
scenario analysis (e.g., consideration of the device with
regard to a particular sequence of activities), and structural
analysis (e.g., non-destructive testing of the interface from a
user-centered perspective). The case studies of how TAFEI and
SHERPA were applied to interface design projects demonstrate
how system improvements may be made. Stanton and Young
(1999) have extended this analysis to other ergonomics methods.

FIGURE 8.5. The TAFEI description.

There are three take-home messages from this chapter:

1.  Most  technology-induced  errors  are  entirely  predictable; 

2.  Structured  methods,  such  as  SHERPA  and  TAFEl,  produce  re¬ 
liable  and  valid  error  data; 

3.  Ergonomics  methods  should  be  used  as  part  of  the  formative 
design  process,  to  improve  design  and  reduce  errors. 

Successful  design  will  require  expertise  in  the  subject  of 
analysis  as  well  as  expertise  in  the  methodology  being  used.  Ex¬ 
ploring  design  weaknesses  through  SHERPA  and  TAFEl  will  help 
in  developing  more  error- tolerant  devices  and  products. 
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FIGURE  8.6.  SHERPA  analysis  of  an  energy-distribution  system. 


FIGURE  8.7.  SHERPA  analysis  of  drilling  for  oil  tasks. 
WHAT'S AN INPUT DEVICE ANYWAY?


Input  devices  sense  physical  properties  of  people,  places,  or 
things.  Yet  any  treatment  of  input  devices  without  regard  to  the 
corresponding  visual  feedback  is  like  trying  to  use  a  pen  without 
paper.  Small-screen  devices  with  integrated  sensors  underscore 
the  indivisibility  of  input  and  output.  This  chapter  treats  input 
technologies  at  the  level  of  interaction  techniques,  which  pro¬ 
vide  a  way  for  users  to  accomplish  tasks  by  combining  input 
with  appropriate  feedback.  An  interaction  designer  must  con¬ 
sider  (a)  the  physical  sensor,  (b)  the  feedback  presented  to  the 
user,  (c)  the  ergonomic  and  industrial  design  of  the  device,  and 
(d)  the  interplay  between  all  of  the  interaction  techniques  sup¬ 
ported  by  a  system. 

This  chapter  enumerates  properties  of  input  devices  and 
provides  examples  of  how  these  properties  apply  to  common 
pointing  devices,  as  well  as  mobile  devices  with  touch  or  pen 
input.  We  will  discuss  how  to  use  input  signals  in  applications,  as 
well  as  models  and  theories  that  help  to  evaluate  interaction 
techniques  and  reason  about  design  options.  We  will  also  dis¬ 
cuss  discrete  symbolic  entry,  including  mobile  and  keyboard- 
based  text  entry.  The  chapter  concludes  with  some  thoughts 
about  future  trends. 


UNDERSTANDING  INPUT  TECHNOLOGIES 


A  designer  who  understands  input  technologies  and  the  task  re¬ 
quirements  of  users  has  a  better  chance  of  designing  interaction 
techniques  that  match  a  user’s  natural  workflow.  Making  an  op¬ 
timal  choice  for  tasks  in  isolation  leads  to  a  poor  design,  so  the 
designer  must  weigh  competing  design  requirements,  as  well  as 
transitions  between  tasks. 

Input  Device  Properties 

The  variety  of  pointing  devices  is  bewildering,  but  a  few  important 
properties  characterize  most  input  sensors.  These  properties  help 
a  designer  understand  a  device  and  anticipate  potential  problems. 
We  will  first  consider  these  device  properties  in  general,  and  then 
show  how  they  apply  to  some  common  input  devices. 

Property sensed. Most devices sense linear position, motion,
or force; rotary devices sense angle, change in angle, and
torque (Buxton, 1995c; Card, Mackinlay, & Robertson, 1991).
For example, tablets sense position of a pen, mice sense motion
(change in position), and isometric joysticks sense force. The
property sensed determines the mapping from input to output,
or transfer function, that is most appropriate for the
device. Position-sensing devices are absolute input devices,
whereas motion-sensing devices are relative input devices. A
relative device, such as the mouse, requires visual feedback in
the form of a cursor to indicate a screen location. With absolute
devices, the nulling problem (Buxton, 1983) arises if the
position of a physical intermediary, such as a slider on a mixing
console, is not in agreement with a value set in software. This
problem cannot occur with relative devices, but users may waste
time clutching: the user must occasionally lift a mouse to
reposition it.

Number of dimensions. Devices sense one or more input
dimensions. For example, a mouse senses two linear dimensions
of motion, a knob senses one angular dimension, and a six
degree-of-freedom magnetic tracker measures three position
dimensions and three orientation dimensions. A pair of knobs
or a mouse with a scroll wheel sense separate input dimensions
and thus form a “1D + 1D” device, or a “2D + 1D” multichannel
device, respectively (Zhai, Smith, & Selker, 1997).
Multi-degree-of-freedom devices (3D input devices) sense three
or more simultaneous dimensions of spatial position or
orientation (Bowman, Kruijff, LaViola, & Poupyrev, 2004;
Hinckley, Pausch, Goble, & Kassell, 1994; Hinckley, Sinclair,
Hanson, Szeliski, & Conway, 1999; Zhai, 1998).

Indirect  versus  direct.  A  mouse  is  an  indirect  input  de¬ 
vice  because  the  user  must  move  the  mouse  to  indicate  a  point 
on  the  screen,  whereas  a  direct  input  device  has  a  unified  in¬ 
put  and  display  surface.  Direct  devices  such  as  touchscreens, 
or  display  tablets  operated  with  a  pen,  are  not  necessarily  eas¬ 
ier  to  use  than  indirect  devices.  Direct  devices  lack  buttons  for 
state  transitions.  Occlusion  is  also  a  major  design  challenge.  The 
finger  or  pen  occludes  the  area  at  which  a  user  is  pointing,  so 
the  user  may  not  realize  that  he  or  she  has  activated  a  control; 
occlusion  by  the  hand  and  arm  also  may  cause  the  user  to  over¬ 
look  pop-up  menus,  dialogs,  or  status  indicators. 

Device  acquisition  time.  The  average  time  to  move 
one’s  hand  to  a  device  is  known  as  acquisition  time.  Homing 
time  is  the  time  to  return  from  a  device  to  a  “home”  position 
(e.g.,  return  from  mouse  to  keyboard).  For  common  desktop 
workflows  that  involve  switching  between  text  entry  and  point¬ 
ing,  the  effectiveness  of  a  device  for  pointing  tends  to  domi¬ 
nate  acquisition  and  homing  time  costs  (Douglas  &  Mithal, 
1994).  Thus,  integration  of  a  pointing  device  with  the  keyboard 
may  not  improve  overall  performance,  but  evaluations  still  must 
assess any influence of acquisition times (Dillon, Eday, &
Tombaugh,  1990;  Hinckley  et  al.,  2006). 

Gain. Also known as control-to-display (C:D) gain or C:D
ratio, gain is defined as the distance moved by an input
device divided by the distance moved on the display. Gain
confounds what should be two measurements, (a) device size
and (b) display size, with one arbitrary metric (Accot & Zhai,
2001; MacKenzie, 1995), and is therefore suspect as a factor for
experimental study. In experiments, gain typically has little or
no effect on the time to perform pointing movements, but
variable gain functions may provide some benefit by reducing
the required footprint (physical movement space) of a device.
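The definition above can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not a method from the literature cited here: the function names, the units, and in particular the speed-dependent `variable_cd_ratio` (a simple linear interpolation with made-up default values) are assumptions introduced only for illustration.

```python
def cd_ratio(device_distance: float, display_distance: float) -> float:
    """C:D ratio as defined above: device travel divided by display travel."""
    if display_distance == 0:
        raise ValueError("display distance must be nonzero")
    return device_distance / display_distance


def device_travel(display_distance: float, ratio: float) -> float:
    """Physical travel (footprint) needed to cover a display distance."""
    return display_distance * ratio


def variable_cd_ratio(device_speed: float,
                      slow_ratio: float = 1.0,
                      fast_ratio: float = 0.25,
                      fast_speed: float = 200.0) -> float:
    """Hypothetical variable gain function (an assumption, not from the text).

    Interpolates linearly from slow_ratio at rest to fast_ratio at
    fast_speed, and clamps outside that range. A smaller ratio means
    less device travel per unit of cursor travel.
    """
    t = min(max(device_speed / fast_speed, 0.0), 1.0)
    return slow_ratio + t * (fast_ratio - slow_ratio)


# Illustrative numbers only: 50 mm of device travel moves the cursor 200 mm.
ratio = cd_ratio(50.0, 200.0)
footprint_mm = device_travel(400.0, ratio)
```

With a fixed ratio, the footprint grows in proportion to the display distance; a variable function of this kind lets fast movements cover the display with less physical travel, which is the benefit the paragraph above describes.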

Other  metrics.  System  designers  must  weigh  other  per¬ 
formance  metrics,  including  pointing  speed  and  accuracy,  er¬ 
ror  rates,  learning  time,  footprint,  user  preference,  comfort,  and 
cost  (Card,  Mackinlay,  &  Robertson,  1990).  Other  important  en¬ 
gineering  parameters  include  sampling  rate,  resolution,  accu¬ 
racy,  and  linearity  (MacKenzie,  1995). 
A Brief Tour of Pointing Devices

Most  operating  systems  treat  all  input  devices  as  virtual  devices, 
which  tempts  one  to  believe  that  pointing  devices  are  inter¬ 
changeable;  however,  the  details  of  what  the  input  device  senses, 
such  as  how  it  is  held,  the  presence  or  absence  of  buttons,  and 
many  other  properties,  can  significantly  impact  the  interaction 
techniques — and  hence,  the  end-user  tasks — that  a  device  can 
effectively  support.  The  following  tour  discusses  important  prop¬ 
erties  of  several  common  pointing  devices. 

Mice. Douglas Engelbart and colleagues (English, Engelbart,
& Berman, 1967) invented the mouse in 1967 at the Stanford
Research Institute. Forty years later, the mouse persists
because its properties provide a good match between human
performance and the demands of graphical interfaces
(Balakrishnan, Baudel, Kurtenbach, & Fitzmaurice, 1997). For
typical pointing tasks on a desktop computer, one can point with
the mouse about as well as with the hand itself (Card, English, &
Burr, 1978). Because the mouse stays put when the user releases
it (unlike a stylus, for example), it is quick for users to
reacquire and allows designers to integrate multiple buttons or
other controls on its surface. Users exert force on mouse
buttons in a direction orthogonal to the mouse’s plane of motion,
thus minimizing inadvertent motion. Finally, with mice, all of
the muscle groups of the hand, wrist, arm, and shoulder
contribute to pointing, allowing high performance for both rapid,
coarse movements and slow, precise movements (Guiard, 1987;
Zhai, Milgram, & Buxton, 1996). These advantages suggest the
mouse is hard to beat; it will remain the pointing device of
choice for desktop graphical interfaces.

Trackballs.  A  trackball  senses  the  relative  motion  of  a 
partially  exposed  ball  in  two  degrees  of  freedom.  Trackballs  have 
a small working space (footprint), and afford use on an angled
surface.  Trackballs  may  require  frequent  clutching  movements 
because  users  must  lift  and  reposition  their  hand  after  rolling 
the  ball  a  short  distance.  The  buttons  are  located  to  the  side 
of  the  ball,  which  can  make  them  awkward  to  hold  while  rolling 
the  ball  (MacKenzie,  Sellen,  &  Buxton,  1991).  A  trackball  en¬ 
gages  different  muscle  groups  than  a  mouse,  offering  an  alter¬ 
native  for  users  who  experience  discomfort  when  using  a  mouse. 

Isometric  joysticks.  An  isometric  joystick  (e.g.,  the  IBM 
Trackpoint)  is  a  force-sensing  joystick  that  returns  to  center 
when  released.  Most  isometric  joysticks  are  stiff,  offering  little 
feedback  of  the  joystick’s  displacement.  The  rate  of  cursor 
movement  is  proportional  to  the  force  exerted  on  the  stick;  as 
a  result,  users  must  practice  in  order  to  achieve  good  cursor 
control.  Isometric  joysticks  may  offer  the  only  pointing  option 
when  space  is  at  a  premium  (Douglas  &  Mithal,  1994;  Rutledge 
&  Selker,  1990;  Zhai  et  al.,  1997). 
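The rate-control mapping described above, in which cursor velocity is proportional to applied force, can be sketched as a single update step. This is an illustration under assumed units (force in newtons, a hypothetical `gain` in pixels per newton-second); practical devices typically use more elaborate, nonlinear transfer functions.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def rate_control_step(position: Vec2, force: Vec2,
                      gain: float, dt: float) -> Vec2:
    """Advance the cursor by velocity = gain * force over a time step dt."""
    x, y = position
    fx, fy = force
    return (x + gain * fx * dt, y + gain * fy * dt)


# Holding a constant force keeps the cursor moving; releasing the stick
# (zero force) stops it, because the force itself is the only input.
cursor = (0.0, 0.0)
for _ in range(3):
    cursor = rate_control_step(cursor, force=(2.0, 0.0), gain=100.0, dt=0.5)
cursor_after_release = rate_control_step(cursor, force=(0.0, 0.0),
                                         gain=100.0, dt=0.5)
```

Because position is the integral of the input rather than the input itself, the user has to learn how long and how hard to press, which is consistent with the practice requirement noted above.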

Isotonic joysticks. Isotonic joysticks sense angle of
deflection. Some hybrid designs blur the distinctions of
isometric and isotonic joysticks, but the main questions are,
“Does the joystick sense force or angular deflection?” “Does the
stick return to center when released?” and “Does the stick move
from the starting position?” For a discussion of the complex
design space of joysticks, see Lipscomb and Pique (1993).

Indirect  tablets.  Indirect  tablets  report  the  absolute  po¬ 
sition  of  a  pointer  on  a  sensing  surface.  Touch  tablets  sense  a 
bare  finger,  whereas  graphics  tablets  or  digitizing  tablets  typi¬ 
cally  sense  a  stylus  or  other  physical  intermediary.  Tablets  can 
operate in absolute mode, with a fixed C:D gain between the
tablet  surface  and  the  display,  or  in  relative  mode,  in  which  the 
tablet  responds  only  to  motion  of  the  stylus.  If  the  user  touches 
the  stylus  to  the  tablet  in  relative  mode,  the  cursor  resumes  mo¬ 
tion  from  its  previous  position;  in  absolute  mode,  it  would  jump 
to  the  new  position.  Absolute  mode  is  generally  preferable  for 
tasks  such  as  drawing,  handwriting,  tracing,  or  digitizing,  but  rel¬ 
ative  mode  may  be  preferable  for  typical  desktop  interaction 
tasks  such  as  selecting  graphical  icons  or  navigating  through 
menus.  Tablets  thus  allow  coverage  of  many  tasks  (Buxton,  Hill, 
&  Rowley,  1985),  whereas  mice  only  operate  in  relative  mode. 
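The difference between the two tablet modes is easiest to see when the stylus is lifted and set down somewhere else. The sketch below is a hypothetical model, not the behavior of any particular driver; the class name, the `scale` parameter (display units per tablet unit), and the event methods are assumptions made for illustration.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


class TabletCursor:
    """Toy model of a tablet driving a cursor in absolute or relative mode."""

    def __init__(self, mode: str, scale: float = 1.0) -> None:
        if mode not in ("absolute", "relative"):
            raise ValueError("mode must be 'absolute' or 'relative'")
        self.mode = mode
        self.scale = scale              # display units per tablet unit (assumed)
        self.cursor: Point = (0.0, 0.0)
        self._last_pen: Optional[Point] = None

    def pen_down(self, pen: Point) -> Point:
        # Absolute mode: the cursor jumps to the touched position.
        # Relative mode: the cursor stays where it was.
        if self.mode == "absolute":
            self.cursor = (pen[0] * self.scale, pen[1] * self.scale)
        self._last_pen = pen
        return self.cursor

    def pen_move(self, pen: Point) -> Point:
        if self.mode == "absolute":
            self.cursor = (pen[0] * self.scale, pen[1] * self.scale)
        elif self._last_pen is not None:
            dx = pen[0] - self._last_pen[0]
            dy = pen[1] - self._last_pen[1]
            self.cursor = (self.cursor[0] + dx * self.scale,
                           self.cursor[1] + dy * self.scale)
        self._last_pen = pen
        return self.cursor

    def pen_up(self) -> None:
        self._last_pen = None


relative = TabletCursor("relative")
relative.pen_down((10.0, 10.0))
relative.pen_move((15.0, 10.0))                 # cursor moves by (5, 0)
relative.pen_up()
resumed = relative.pen_down((100.0, 100.0))     # cursor does not jump

absolute = TabletCursor("absolute")
jumped = absolute.pen_down((100.0, 100.0))      # cursor jumps to the pen
```

In relative mode the lift-and-replace sequence is a clutch; in absolute mode the same sequence is a direct jump to the new location.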

Touchpads.  Touchpads  are  small,  touch-sensitive  tablets 
often  found  on  laptop  computers.  Touchpads  use  relative  mode 
for  cursor  control  because  they  are  too  small  to  map  to  an  entire 
screen,  but  most  touchpads  also  have  an  absolute  mode  to  al¬ 
low  features  such  as  sliding  along  the  edge  of  the  pad  to  scroll. 
Touchpads  support  clicking  by  recognizing  tapping  or  double¬ 
tapping  gestures,  but  accidental  contact  (or  loss  of  contact)  can 
erroneously  trigger  such  gestures  (MacKenzie  &  Oniszczak,  1998). 
Like  trackballs,  the  small  size  of  touchpads  necessitates  frequent 
clutching,  and  touchpads  can  be  awkward  to  use  while  holding 
down  a  button,  unless  the  user  employs  his  or  her  other  hand. 

Touchscreens  and  pen-operated  devices.  Touch¬ 
screens  are  transparent,  touch-sensitive  tablets  mounted  on  a 
display.  Some  touchscreens  can  only  sense  a  bare  finger;  oth¬ 
ers  can  sense  either  a  plastic  stylus  or  a  bare  finger.  Transparent 
electromagnetic  digitizers,  such  as  those  found  on  the  Tablet 
PC,  cannot  sense  touch,  and  require  the  use  of  a  special  pen. 
Parallax  error  is  a  mismatch  between  the  sensed  input  position 
and  the  apparent  input  position  due  to  viewing  angle;  look  to 
minimize  the  displacement  between  the  sensing  and  display 
surfaces  to  avoid  this  problem.  Depending  on  the  mounting 
angle,  a  touch  or  pen-operated  display  may  result  in  arm  or 
neck  fatigue  (Sears,  Plaisant,  &  Shneiderman,  1992).  A  touch¬ 
screen  that  is  integrated  with  a  mobile  device  can  be  prone  to 
accidental  contact  when  the  user  picks  up  the  device.  Yet  mo¬ 
bile  devices  that  only  sense  a  pen  require  the  user  to  unsheathe 
the  stylus  to  perform  any  interaction;  the  user  cannot  quickly 
poke  at  the  screen  with  a  finger.  The  limited  states  and  events 
sensed  by  pen  or  touch-operated  devices  raise  additional  design 
challenges,  as  discussed  below. 


Input  Device  States 

There is a fundamental mismatch between the demands of
graphical interfaces and the states and events that can be sensed
by devices such as touchscreens and pen-operated devices,
which makes it difficult to support the full set of graphical
interface primitives, including click, drag, double-click, and
right-click. With nonstandard pointing devices, no easy solution
avoids design compromises. When considering such devices, one
of the first things a designer should do is diagram all of
these states and transitions, so that device limitations and
differences become concrete.

Input  devices  taken  in  general  support  three  possible  states 
(Fig.  9.1):  (a)  out-of-range,  (b)  tracking,  and  (c)  dragging;  prac¬ 
titioners  refer  to  these  as  State  0,  State  1,  and  State  2,  respec¬ 
tively,  of  the  three-state  model  (Buxton,  1990).  This  model  is 
useful  for  reasoning  about  the  relationship  between  the  events 
sensed  by  an  input  device  and  the  demands  of  interaction 
techniques. 

The  three-state  model  describes  the  mouse  as  a  two-state  de¬ 
vice,  supporting  State  1,  the  cursor  tracking  state,  as  well  as 
State  2,  the  dragging  state.  State  1  provides  cursor  feedback  of 
the  screen  position  that  the  device  will  act  upon,  while  State  2 
allows  the  user  to  drag  an  object  by  holding  down  the  primary 
mouse  button  while  moving  the  mouse.  The  mouse  senses 
movement  in  both  the  tracking  and  dragging  states,  as  repre¬ 
sented  by  the  dx,  dy  in  each  state  (Fig.  9.2,  left),  indicating  rela¬ 
tive  motion  tracking  capability. 

Many  touch-activated  devices  such  as  touchscreens,  touch- 
pads,  and  PDA  screens  are  also  two-state  devices,  but  do  not 
sense  the  same  two  states  as  the  mouse  (Fig.  9.2,  right).  For 
example,  a  PDA  can  sense  a  finger  when  it  is  in  contact  with 
the  screen;  this  is  the  equivalent  of  the  mouse  dragging  state 
(State  2).  The  PDA  can  also  sense  when  the  finger  is  removed 
from  the  screen,  but  once  the  finger  breaks  contact,  this  enters 
State  0  (out-of-range),  where  no  motion  can  be  detected  (em¬ 
phasized  by  the  nil  in  state  0  of  Fig.  9.2,  right).  Thus,  although 
the  mouse  and  PDA  screen  both  sense  two  states,  the  lack  of 
a  second  motion-sensing  state  on  the  PDA  means  that  it  will  be 
difficult  to  support  the  same  interaction  techniques  as  a 
mouse.  For  example,  should  sliding  one’s  finger  on  the  screen 
move  a  cursor,  or  drag  an  object?  The  designer  must  choose 
one;  the  PDA  screen  cannot  support  both  behaviors  at  the 
same  time. 

The  Tablet  PC  is  an  example  of  a  pen-operated  device  that 
senses  all  three  states  of  the  three-state  model  (Fig.  9.3).  The 
Tablet  PC  senses  the  location  of  the  stylus  when  it  is  proximate 
to  the  screen.  The  pen  triggers  an  event  when  it  makes  contact 
with  the  screen,  as  well  as  when  it  enters  or  leaves  proximity. 
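The comparison of devices in the preceding paragraphs can be summarized as sets of sensed states. The sketch below encodes only what the text states about the mouse, touch-operated screens, and the Tablet PC pen; the dictionary keys and function names are assumptions chosen for illustration.

```python
from enum import IntEnum
from typing import Set


class State(IntEnum):
    OUT_OF_RANGE = 0   # State 0: device outside its physical tracking range
    TRACKING = 1       # State 1: device motion moves only the cursor
    DRAGGING = 2       # State 2: device motion moves objects on the screen


# States each device senses, as described in the text.
SENSED_STATES = {
    "mouse": {State.TRACKING, State.DRAGGING},
    "touchscreen": {State.OUT_OF_RANGE, State.DRAGGING},
    "tablet_pc_pen": {State.OUT_OF_RANGE, State.TRACKING, State.DRAGGING},
}

# No motion can be detected in State 0.
MOTION_STATES = {State.TRACKING, State.DRAGGING}


def motion_sensing_states(device: str) -> Set[State]:
    """States in which the device reports motion."""
    return SENSED_STATES[device] & MOTION_STATES


def separates_tracking_from_dragging(device: str) -> bool:
    """True if the device can both move a cursor and drag an object."""
    return motion_sensing_states(device) == MOTION_STATES


summary = {name: separates_tracking_from_dragging(name)
           for name in SENSED_STATES}
```

Writing the states down in this way makes the touchscreen's limitation explicit: it senses two states, but only one of them carries motion, so the designer has to decide whether a sliding finger tracks or drags.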

Unfortunately, even with all three states, it is still difficult
for a pen to support all the interaction techniques offered by a
mouse. To help illustrate why this is the case, we can extend
the three-state model to more fully characterize the interaction
techniques at the core of graphical user interfaces (Fig. 9.4).
The resulting five states of graphical user interfaces are
(a) Tracking (1), (b) Hover (1_H), (c) Left Click (2_L),
(d) Dragging (2_D), and (e) Right Click (2_R).

State  Description
0      Out Of Range: The device is not in its physical tracking
       range.
1      Tracking: Device motion moves only the cursor.
2      Dragging: Device motion moves objects on the screen.

FIGURE 9.1. Summary of states in Buxton’s three-state model.
Adapted from (Buxton, 1990).

FIGURE 9.2. States sensed by a mouse (left) versus states
sensed by touch-operated devices such as touchpads (right).
Adapted from (Hinckley, Czerwinski, & Sinclair, 1998a).

FIGURE 9.3. States sensed by a Tablet PC pen. Adapted from
(Hinckley et al., 1998a).

FIGURE 9.4. The five states and transition rules at the core of
graphical user interfaces.

This diagram suggests that it will be difficult to support all
of these interactions on a pen or touch-based device; there is
no elegant solution in the literature. In this five-state model,
a click is the series of transitions 1-2_L-1 with no motion in
state 2_L, and a double click is 1-2_L-1-2_L-1 with no motion
between the two clicks. Even if the device can sense State 1,
hovering (1_H) for help text requires holding the pointer
motionless above the display. These gestures are all difficult
to perform with a pen or finger because touching or lifting
disturbs the cursor position (Buxton et al., 1985). Furthermore,
because the pen may be out of range, and because the user must
move the pen through the tracking zone to enter State 2,
pen-operated input devices lack a well-defined current cursor
position.
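The transition sequences for click and double click can be checked mechanically. The sketch below is an assumption-laden illustration: it labels the tracking state "1" and the left-click state "2L", represents input as (state, moved) samples, and treats any reported motion as disqualifying, with no tolerance for jitter.

```python
from typing import List, Tuple

Sample = Tuple[str, bool]   # (state label, whether motion occurred)


def collapse(samples: List[Sample]) -> List[Sample]:
    """Merge consecutive samples in the same state into one visit."""
    visits: List[Sample] = []
    for state, moved in samples:
        if visits and visits[-1][0] == state:
            visits[-1] = (state, visits[-1][1] or moved)
        else:
            visits.append((state, moved))
    return visits


def classify(samples: List[Sample]) -> str:
    """Classify a sample stream as 'click', 'double-click', or 'other'."""
    visits = collapse(samples)
    states = [state for state, _ in visits]
    # Motion is allowed while tracking before and after the gesture,
    # but not in any visit between the first and last tracking states.
    inner_motion = any(moved for _, moved in visits[1:-1])
    if states == ["1", "2L", "1"] and not inner_motion:
        return "click"
    if states == ["1", "2L", "1", "2L", "1"] and not inner_motion:
        return "double-click"
    return "other"


mouse_click = [("1", True), ("2L", False), ("1", False)]
pen_tap_with_jitter = [("1", False), ("2L", True), ("1", False)]
results = (classify(mouse_click), classify(pen_tap_with_jitter))
```

The second stream shows the problem described above: if touching the surface disturbs the reported position, the strict rule no longer recognizes the tap as a click, so a practical recognizer would need some motion tolerance.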

Pen and touch devices also lack a second button for
right-click. A finger obviously has no buttons, but even a barrel
button on a pen is slow to access and is prone to inadvertent
activation (Li, Hinckley, Guan, & Landay, 2005). Some mobile
devices use dwelling with the pen or finger as a way to simulate
right-click, but the timeout introduces an unavoidable delay; for
rapid activation, it should be as short as possible, but to avoid
inadvertent activation (e.g., if the user is thinking about the
next action to take while resting the pen on the screen), the
timeout must be as long as possible. A 500-millisecond timeout
offers a reasonable compromise (Hinckley, Baudisch, Ramos, &
Guimbretiere, 2005a). Even techniques designed for pen-operated
devices (Apitz & Guimbretiere, 2004; Kurtenbach & Buxton,
1991; Moran, Chiu, & van Melle, 1997) require rapid and
unambiguous activation of one of several possible actions as
fundamental building blocks; otherwise, inefficient or highly
modal interactions become necessary and may reduce the appeal of
such devices (Hinckley et al., 2006).
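A dwell detector of the kind described above might look like the following sketch. The 500-millisecond default comes from the text; the sample format, the function name, and the 3-pixel movement tolerance are assumptions added for illustration.

```python
import math
from typing import List, Tuple

ContactSample = Tuple[float, float, float]   # (time in seconds, x, y)


def dwell_triggered(samples: List[ContactSample],
                    timeout_s: float = 0.5,
                    tolerance_px: float = 3.0) -> bool:
    """Return True if contact stays near its first point for timeout_s.

    `samples` covers one continuous contact, in time order. The
    tolerance is a hypothetical allowance for small hand tremor.
    """
    if not samples:
        return False
    t0, x0, y0 = samples[0]
    for t, x, y in samples:
        if math.hypot(x - x0, y - y0) > tolerance_px:
            return False          # the pen moved: treat as a drag, not a dwell
        if t - t0 >= timeout_s:
            return True
    return False                  # contact ended before the timeout


resting_pen = [(0.0, 0.0, 0.0), (0.3, 1.0, 0.0), (0.5, 1.0, 1.0)]
quick_tap = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
outcomes = (dwell_triggered(resting_pen), dwell_triggered(quick_tap))
```

The tradeoff in the text is visible in the single `timeout_s` parameter: lowering it speeds activation but makes a pause for thought more likely to trigger the action.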

Similar  issues  plague  other  interaction  modalities,  such  as 
(a)  motion-sensing  mobile  devices  (Hinckley,  Pierce,  Horvitz, 
&  Sinclair,  2005b),  (b)  camera-based  tracking  of  the  hands  (Wil¬ 
son  &  Oliver,  2003),  and  (c)  3D  input  devices  (Hinckley  et  al., 
1994).  All  of  these  techniques  require  a  method  for  users  to 
move  the  device  or  their  hands  without  accidentally  performing 
an  action.  Thus  state  transitions  form  fundamental  indications 
of  intent  that  are  essential  for  rapid  and  dependable  interaction. 


WHAT'S  AN  INPUT  DEVICE  FOR? 

THE  COMPOSITION  OF  USER  TASKS 

One  way  of  reasoning  about  input  devices  and  interaction  tech¬ 
niques  is  to  view  a  device  or  technique  in  light  of  the  tasks  that 
it  can  express.  But  what  sort  of  tasks  are  there? 


Elemental  Tasks 

While  computers  can  support  many  activities,  at  the  input  level, 
some  subtasks  appear  repeatedly  in  graphical  user  interfaces, 
such  as  pointing  at  a  target  on  the  screen  or  typing  a  character. 
Foley  and  colleagues  (Foley,  Wallace,  &  Chan,  1984)  identified 
elemental  tasks  including  (a)  text  (entering  symbolic  data),  (b)  se¬ 
lect  (indicating  objects  from  a  set  of  alternatives),  (c)  position 
(pointing  to  a  screen  coordinate),  and  (d)  quantify  (specifying 
an  exact  numeric  value).  If  these  are  elemental  tasks,  however, 
then  where  do  devices  such  as  global  positioning  system  (GPS) 
readers,  cameras,  or  fingerprint  scanners  fit  in?  These  offer  new 
“elemental” data types (e.g., location, images, and identity).
Advances in technology will continue to yield data types that
enable new tasks and scenarios of use.

Compound  Tasks  and  Chunking 

Another problem with the elemental task approach is that the
level of analysis for “elemental” tasks is not well defined. For
example, a mouse indicates an (x, y) position on the screen, but
an Etch A Sketch® separates positioning into two subtasks by
providing a single knob for x and a single knob for y (Buxton,
1986b). If position is an elemental task, why must we subdivide
this task for some devices but not others? One way to resolve
this puzzle is to view all tasks as hierarchies of subtasks
(Fig. 9.5). Whether or not a task is “elemental” depends on the
input device being used: the Etch A Sketch supports separate
QuantifyX and QuantifyY tasks, whereas the mouse supports a
compound 2D Position task (Buxton, 1986a).

From the user’s perspective, a series of elemental tasks may
seem like a single task. For example, scrolling a web page to
click on a link could be conceived as an elemental 1D
positioning task followed by a 2D selection task, or it can be
viewed as a compound navigation/selection task (Buxton & Myers,
1986). An interaction technique can encourage the user to work
at the higher level of the compound task, for example, by
scrolling with one hand while pointing to the link with the
other hand. This is known as chunking.

These  examples  show  that  the  choice  of  device  influences 
the  level  at  which  the  user  is  required  to  think  about  the  indi¬ 
vidual  actions  that  must  be  performed  to  achieve  a  goal.  The 
design  of  input  devices  and  interaction  techniques  can  help  to 
structure  the  interface  such  that  there  is  a  more  direct  match 
between  the  user’s  tasks  and  the  low-level  syntax  of  the  indi¬ 
vidual  actions  that  must  be  performed  to  achieve  those  tasks. 
The  choice  of  device  and  technique  thus  directly  influences  the 
steps  required  of  the  user  and  hence  the  apparent  complexity  of 
an  interface  design  (Buxton,  1986a). 


EVALUATION  AND  ANALYSIS 
OF  INPUT  DEVICES 

Beyond standard usability engineering techniques (see Part VI:
The Development Process), a number of techniques tailored to
the study of input devices also exist. Representative tasks
(Buxton, 1995c), such as target acquisition, pursuit tracking,
freehand drawing, and dragging versus tracking performance
(MacKenzie et al., 1991), can be used to formally or informally
evaluate devices. Here, we focus on formal analysis using Fitts’
Law, the Steering Law, and the Keystroke-Level Model.

1D Position: QuantifyX
2D Position: QuantifyX, QuantifyY
3D Position: QuantifyX, QuantifyY, QuantifyZ

FIGURE 9.5. Task hierarchies for 1D, 2D, and 3D position tasks.


166 • HINCKLEY


Fitts'  Law  and  Hick's  Law 

Fitts’ Law (Fitts, 1954) is an experimental paradigm that has
been widely applied to the comparison and optimization of
pointing devices. Fitts’ Law is used to measure how effectively
a pointing device can acquire targets on the screen. Fitts’ Law
was first applied to the study of input devices by Card, English,
and Burr (1978); it is now a standard for device comparisons
(Douglas, Kirkpatrick, & MacKenzie, 1999). Fitts’ Law applies to
remarkably diverse task conditions, including (a) rate-controlled
devices (MacKenzie, 1992a), (b) area cursors (Kabbash & Buxton,
1995), (c) scrolling (Hinckley et al., 2002), and (d) zooming
(Guiard, Bourgeois, Mottet, & Beaudouin-Lafon, 2001). For further
guidance on conducting Fitts’ Law studies, see Douglas and
colleagues (1999), MacKenzie (1992b), and Raskin (2000).

The standard Fitts’ task paradigm measures the movement
time MT between two targets separated by amplitude A, with a
width W of error tolerance (Fig. 9.6).

Fitts’ Law states that a logarithmic function of the ratio of A
to W predicts the average movement time MT. The Fitts’ Law
formulation typically used for input device studies is

MT  =  a  +  b  log2(A/W  +  1)  (9.1) 

Here,  the  constants  a  and  b  are  coefficients  fit  to  the  average 
of  all  observed  MT  for  each  combination  of  A  and  W  in  the  ex¬ 
periment.  One  calculates  a  and  b  via  linear  regression  using  a 
statistical  package  or  spreadsheet.  The  constants  a  and  b  de¬ 
pend  heavily  on  the  exact  task  setting  and  input  device,  so  be 
wary  of  substituting  “typical”  values  for  these  constants,  or  of 
comparing  constants  derived  from  different  studies. 
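The regression step can be sketched in a few lines, here using ordinary least squares written out by hand rather than a statistical package. The function names are assumptions, and the data are synthetic values generated from arbitrary constants purely to show that the fit recovers them; they are not measurements from any study.

```python
import math
from typing import List, Tuple


def index_of_difficulty(A: float, W: float) -> float:
    """The logarithmic term of Equation 9.1: log2(A/W + 1)."""
    return math.log2(A / W + 1)


def fit_fitts(conditions: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Fit MT = a + b * log2(A/W + 1) by least squares.

    `conditions` holds one (A, W, mean MT) triple per combination of
    A and W, as described above. Returns the pair (a, b).
    """
    ids = [index_of_difficulty(A, W) for A, W, _ in conditions]
    mts = [mt for _, _, mt in conditions]
    n = len(conditions)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, mts))
    b = sxy / sxx
    a = mean_mt - b * mean_id
    return a, b


# Synthetic, noise-free data generated from a = 0.1 s and b = 0.2 s/bit.
synthetic = [(A, W, 0.1 + 0.2 * index_of_difficulty(A, W))
             for A in (64.0, 128.0, 256.0) for W in (8.0, 16.0, 32.0)]
a_hat, b_hat = fit_fitts(synthetic)
```

With real data the fitted constants carry the caveat given above: they describe one device in one task setting and should not be carried over to another.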

Psychomotor interpretations for Fitts’ Law have been proposed
(Douglas & Mithal, 1997); however, since the law characterizes
the central tendency of a large number of pointing
movements, the law may simply reflect information-theoretic
entropy (MacKenzie, 1989). For example, Hick’s Law, a model of
decision time for a set of choices (e.g., in a menu), has a
general form almost identical to Fitts’ Law:

H = log2(n + 1)  (9.2)

Here, n is the number of equally probable alternatives, and H
is the entropy of the decision. If we view Fitts’ task
(Fig. 9.6) as a “decision” along the amplitude A between n
discrete targets of width W, this raises the possibility that
Fitts’ Law and Hick’s Law are fundamentally the same law where
n = A/W.

FIGURE 9.6. Fitts’ task paradigm (Fitts, 1954).
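The substitution n = A/W can be verified numerically. This short sketch only restates Equation 9.2 and the logarithmic term of Equation 9.1 as functions (the names are assumptions) and shows that they coincide when the amplitude is divided into targets of width W.

```python
import math


def hick_entropy(n: float) -> float:
    """Equation 9.2: H = log2(n + 1) for n equally probable alternatives."""
    return math.log2(n + 1)


def fitts_log_term(A: float, W: float) -> float:
    """The logarithmic term of Equation 9.1: log2(A/W + 1)."""
    return math.log2(A / W + 1)


# An amplitude of 14 units divided into targets 2 units wide gives n = 7.
A, W = 14.0, 2.0
same = hick_entropy(A / W) == fitts_log_term(A, W)
```

The two expressions are identical by construction, which is the sense in which the two laws may be the same law; the fitted constants a and b of Equation 9.1 have no counterpart in Equation 9.2 as written.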


The  Steering  Law  and  Minimum  Jerk  Law 

Steering a cursor through a narrow tunnel, as required to
navigate a pull-down menu, is not a Fitts’ task because steering
requires a continuous accuracy constraint: the cursor must stay
within the tunnel at all times. For a straight-line tunnel
(Fig. 9.7) of width W and length A, for example, the Steering Law
predicts that movement time is a linear function of A and W:

MT = a + b A/W  (9.3)

The  Steering  Law  can  also  model  arbitrary  curved  paths,  as 
well  as  instantaneous  velocity  (Accot  &  Zhai,  1997).  A  limitation 
of  the  Steering  Law  is  that  it  only  models  successful  completion 
of  the  task;  errors  are  not  considered. 
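Equation 9.3 is simple enough to evaluate directly. In the sketch below, `steering_time_straight` restates the equation; `steering_index_path` is an added assumption that approximates a path of varying width by summing length/width over short segments, in the spirit of the integral form given by Accot and Zhai (1997). The constants a and b in the example are arbitrary illustrative values.

```python
from typing import List, Tuple


def steering_time_straight(A: float, W: float, a: float, b: float) -> float:
    """Equation 9.3: MT = a + b * A / W for a straight tunnel."""
    return a + b * A / W


def steering_index_path(segments: List[Tuple[float, float]]) -> float:
    """Approximate path difficulty as the sum of length / width per segment.

    Each segment is (length, width). For a single segment this reduces
    to A / W. This piecewise form is an assumption for illustration.
    """
    return sum(length / width for length, width in segments)


# Arbitrary constants: a = 0.1 s, b = 0.05 s per unit of A/W.
wide = steering_time_straight(A=200.0, W=20.0, a=0.1, b=0.05)
narrow = steering_time_straight(A=200.0, W=10.0, a=0.1, b=0.05)

# A menu-like path: a long wide run followed by a short narrow one.
index = steering_index_path([(100.0, 10.0), (50.0, 5.0)])
```

Halving the tunnel width doubles the A/W term, so narrow cascading menus become disproportionately slow to traverse; as noted above, the prediction covers only successful traversals.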

The  Minimum  Jerk  Law  (Viviani  &  Flash,  1995)  characterizes 
the  dynamics  of  motions  that  may  follow  a  curved  path  but  do 
not  have  a  continuous  accuracy  constraint.  The  law  states  that 
unconstrained  human  movement  trajectories  tend  to  minimize 
the  derivative  of  acceleration  (jerk);  one  of  its  implications  is 
that  there  is  a  two-thirds  power  law  linking  tangential  velocity 
and  path  curvature;  however,  no  one  has  yet  formulated  a  uni¬ 
versal  law  that  handles  varying  accuracy  constraints  and  curva¬ 
ture  (Lank  &  Saund,  2005). 


The  Keystroke-Level  Model  (KLM)  and  GOMS  Analysis 

The  KLM  is  an  engineering  and  analysis  tool  that  can  be  used 
to  estimate  the  time  needed  for  experts  to  complete  a  routine 
task  (Card,  Moran,  &  Newell,  1980).  To  apply  the  KLM,  count  the 
elemental  inputs  required  to  complete  a  task,  including  (a)  key¬ 
strokes,  (b)  homing  times  to  acquire  input  devices,  (c)  pauses 
for  mental  preparation,  and  (d)  pointing  at  targets.  For  each  el¬ 
emental  input,  substitute  a  constant  estimate  of  the  average 
time  required  using  the  values  from  Card  et  al.  (1980),  or  by  col¬ 
lecting  empirical  data  (Hinckley  et  al.,  2006),  and  sum  them  to 
yield  an  overall  time  estimate.  The  model  assumes  error-free  ex¬ 
ecution,  so  it  cannot  estimate  time  for  the  problem-solving  be¬ 
haviors  of  novices,  but  it  does  employ  several  heuristics  to 
model  mental  pauses  (Raskin,  2000). 
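
The counting procedure described above can be sketched in a few lines of Python. The operator times below are the values commonly quoted from Card et al. (1980) for a skilled typist; treat them as assumptions to be checked against the original paper or replaced with empirical data. The task encoding (one letter per elemental input) and the example task are hypothetical.

```python
# Operator times in seconds, as commonly quoted from Card, Moran, & Newell (1980).
# These are assumed averages; substitute measured values where available.
KLM_TIMES = {
    "K": 0.20,  # keystroke (skilled typist)
    "P": 1.10,  # point at a target with a pointing device
    "H": 0.40,  # home the hand between keyboard and pointing device
    "M": 1.35,  # mental preparation
}


def klm_estimate(sequence: str, times: dict = KLM_TIMES) -> float:
    """Sum the constant time estimate for each elemental input in the sequence."""
    return sum(times[op] for op in sequence)


# Hypothetical task: reach for the mouse, think, point at a text field,
# return to the keyboard, then type four characters.
task = "HMPH" + "K" * 4
estimate = klm_estimate(task)
print(f"estimated expert completion time: {estimate:.2f} s")
```

Because the model assumes error-free expert execution, an estimate like this is best used to compare alternative designs rather than to predict absolute times.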

GOMS (Goals, Operators, Methods, and Selection rules) models
extend the keystroke-level model (John & Kieras, 1996). Some GOMS
models can account for user knowledge and interleaving of tasks,
but are more difficult to apply than the KLM. Both GOMS and KLM
models are engineering tools that produce estimates for expert
completion time of routine tasks. These models do not replace the
need for usability testing and evaluation, but they do offer a
means to assess a design without implementing software, training
end users, and evaluating their performance (Olson & Olson, 1990).
Physical articulation times derived from KLM or GOMS analyses can
also be used to help interpret results of empirical studies
(Hinckley et al., 2006; see also chapter 60, “Model-based
Evaluations,” and chapter 5, “Cognitive Architecture”).

FIGURE 9.7. The Steering Law for a straight tunnel. The user
must follow the dotted line without moving beyond the borders.
Adapted from Accot & Zhai (1999).


TRANSFER FUNCTIONS:
HOW TO TRANSFORM AN INPUT SIGNAL


A  transfer  function  is  a  mathematical  transformation  that  scales 
the  data  from  an  input  device.  Typically,  the  goal  is  to  provide 
more  stable  and  more  intuitive  control,  but  one  can  easily  de¬ 
sign  a  poor  transfer  function  that  hinders  performance.  A  trans¬ 
fer  function  that  matches  the  properties  of  an  input  device  is 
known  as  an  appropriate  mapping.  For  force  sensing  input  de¬ 
vices,  the  transfer  function  should  be  a  force-to-velocity  func¬ 
tion;  for  example,  the  force  one  exerts  on  the  IBM  Trackpoint 
isometric  joystick  controls  the  speed  at  which  the  cursor  moves. 
Other  appropriate  mappings  include  position-to-position  or 
velocity-to-velocity  functions,  used  with  tablets  and  mice, 
respectively. 

A  common  example  of  an  inappropriate  mapping  is  calculat¬ 
ing  a  velocity  based  on  the  position  of  the  mouse  cursor,  such 
as  to  scroll  a  document.  The  resulting  input  is  difficult  to  control, 
and  this  inappropriate  rate  mapping  is  only  necessary  because 
the  operating  system  clips  the  cursor  to  the  screen  edge.  A  bet¬ 
ter  solution  would  be  to  ignore  the  cursor  position  and  instead 
use  the  relative  position  information  reported  by  the  mouse  to 
directly  control  the  change  of  position  within  the  document. 

Self-centering  devices.  Rate  mappings  suit  force-sensing 
devices  or  other  devices  that  return  to  center  when  released 
(Zhai,  1993;  Zhai  et  al.,  1997).  This  property  allows  the  user  to 
stop  quickly  by  releasing  the  device.  The  formula  for  a  nonlinear 
rate  mapping  is 

dx = K x^α (9.4)

where x is the input signal, dx is the resulting rate, K is a gain
factor, and α is the nonlinear parameter. The best values for K
and α depend on the details of the device and application, and
appropriate  values  must  be  identified  by  experimentation  or 
optimal  search  (Zhai  &  Milgram,  1993).  Many  commercial  de¬ 
vices  use  more  complex  mappings  (Rutledge  &  Selker,  1990). 
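
A minimal sketch of the nonlinear rate mapping in Equation 9.4 follows. The equation as stated does not say how to treat negative input; applying the power to the magnitude and restoring the sign afterward is an assumption made here so that the mapping is symmetric about the rest position. The gain and exponent values are hypothetical.

```python
import math


def rate_mapping(x: float, gain: float, alpha: float) -> float:
    """Map a self-centering device's displacement or force x to a rate dx = K * |x|**alpha.

    The sign of x is preserved (an assumption; Eq. 9.4 only gives the magnitude).
    """
    return math.copysign(gain * abs(x) ** alpha, x)


# Hypothetical parameters: real values must be found by experimentation.
K = 2.0
ALPHA = 2.0

print(rate_mapping(0.5, K, ALPHA))   # gentle push gives a slow rate
print(rate_mapping(-0.5, K, ALPHA))  # same magnitude in the opposite direction
print(rate_mapping(0.0, K, ALPHA))   # released device: the rate drops to zero
```

With an exponent greater than 1, small inputs yield fine control while large inputs yield fast motion, and releasing the device (x = 0) stops movement immediately.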

Motion-sensing devices. Desktop systems use an exponential
transformation of the mouse velocity, known as an acceleration
function, to modify the cursor response (Microsoft Corp., 2002).
Acceleration functions do not directly improve pointing
performance, but do limit the footprint required by a device
(Jellinek & Card, 1990), which may lead to greater comfort or
less frequent clutching (Hinckley et al., 2002).

Absolute devices. It is possible to temporarily violate the 1:1
control-to-display mapping of absolute devices such as touchscreens
by damping small motions to provide fine adjustments; large motions
revert to an absolute 1:1 mapping (Sears & Shneiderman, 1991). A
drawback of this technique is that cursor feedback in the tracking
state becomes the default behavior of the device, rather than
dragging (Buxton, 1990), but researchers are exploring ways to
overcome this (Benko, Wilson, & Baudisch, 2006).


FEEDBACK:  WHAT  HAPPENS 
IN  RESPONSE  TO  AN  INPUT? 


From  the  technology  perspective,  one  can  consider  feedback 
as  active  or  passive.  Active  feedback  is  under  computer  control; 
passive  feedback  is  not,  and  may  result  from  internal  sensations 
within  the  user’s  own  body,  such  as  muscle  tension  from  hold¬ 
ing  down  a  button,  or  physical  properties  of  the  device,  such 
as  the  feel  of  clicking  its  buttons. 

The  industrial  design  suggests  how  to  use  a  device  before  a 
user  touches  it  (Norman,  1990).  Mechanical  sounds  and  vibra¬ 
tions  produced  by  a  device  provide  positive  feedback  for  the 
user’s  actions  (Lewis,  Potosnak,  &  Magyar,  1997).  The  shape  of 
the  device  and  the  presence  of  landmarks  can  help  users  ac¬ 
quire  a  device  without  having  to  look  at  it  (Hinckley,  Pausch, 
Proffitt,  &  Kassell,  1998b). 

Proprioceptive  and  Kinesthetic  Feedback 

Internal  sensations  of  body  posture,  motion,  and  muscle  ten¬ 
sion  (Burdea,  1996;  Gibson,  1962)  may  allow  users  to  feel  how 
they  are  moving  an  input  device  without  either  looking  at  the 
device  or  receiving  visual  feedback  on  a  display.  This  is  impor¬ 
tant  when  the  user’s  attention  is  divided  between  multiple  tasks 
and  devices  (Balakrishnan  &  Hinckley,  1999;  Fitzmaurice  &  Bux¬ 
ton,  1997;  Mine,  Brooks,  &  Sequin,  1997).  Muscular  tension  can 
help  to  phrase  together  multiple  related  inputs  (Buxton,  1986a) 
and  may  make  mode  transitions  more  salient  to  the  user 
(Hinckley  et  al.,  2006;  Raskin,  2000;  Sellen,  Kurtenbach,  &  Bux¬ 
ton,  1992). 

Kinesthetic  Correspondence 

Graphical feedback on the screen should correspond to the direction
that the user moves the input device (Britton, Lipscomb, & Pique,
1978). If the user moves a device to the left, then the object on
the screen should likewise move left; however, users can easily
adapt to certain kinds of noncorrespondences: when the user moves a
mouse forward and backward, the cursor actually moves up and down
on the screen; if the user drags a scrollbar downward, the text on
the screen scrolls upwards. Researchers have also found that the
dimensions of an input device should match the perceptual structure
of a task (Jacob, Sibert, McFarlane, & Mullen, 1994).

Snapping  Behaviors  and  Active  Haptic  Feedback 

Software constraints, such as snapping (Baudisch, Cutrell, Hinckley,
& Eversole, 2005), often suffice to support a user’s tasks. Active
force or tactile feedback (Burdea, 1996) can provide attractive
forces for a target, or additional feedback for the boundaries of a
target, but such feedback typically yields little or no performance
advantage even for isolated target selection (Akamatsu & MacKenzie,
1996; MacKenzie, 1995). Such techniques must be evaluated for
selection among multiple targets, because haptic feedback or
snapping behavior for one target interferes with the selection of
others (Grossman & Balakrishnan, 2005; Oakley, Brewster, & Gray,
2001). Visual dominance refers to the tendency for vision to
dominate other modalities (Wickens, 1992); haptic feedback
typically must closely match visual feedback, which limits its
utility as an independent modality (Campbell, Zhai, May, & Maglio,
1999). One promising use of tactile feedback is to improve state
transitions (Poupyrev & Maruyama, 2003; Snibbe & MacLean, 2001).
For further discussion of active feedback modalities, see chapter
12, “Haptic Interfaces,” and chapter 13, “Nonspeech Auditory
Output.”
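
To make the idea of a software snapping constraint concrete, here is a minimal sketch that snaps a cursor to the nearest target within a fixed radius. It is not the technique of any cited paper; the function name, the nearest-target rule, and the radius are assumptions for illustration.

```python
import math


def snap(cursor, targets, radius):
    """Return the nearest target within `radius` of the cursor, else the cursor itself."""
    best = None
    best_dist = radius
    for target in targets:
        d = math.dist(cursor, target)
        if d <= best_dist:
            best, best_dist = target, d
    return best if best is not None else cursor


# Hypothetical layout: two targets, snapping radius of 5 pixels.
targets = [(12, 10), (50, 50)]
print(snap((10, 10), targets, radius=5))  # close to the first target: snaps
print(snap((30, 30), targets, radius=5))  # no target nearby: cursor unchanged
```

Even this simple rule shows the multiple-target problem noted above: with closely spaced targets, a large radius makes it hard to place the cursor near one target without being captured by its neighbor.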


KEYBOARDS  AND  TEXT  ENTRY 


Typewriters have been in use for over 100 years; the QWERTY key
layout dates to 1868 (Yamada, 1980). Despite the antiquity of the
design, QWERTY keyboards are extremely well suited to human
performance and are unlikely to be supplanted by new key layouts,
speech recognition technologies, or other techniques any time soon.
Many factors can influence typing performance, including the size,
shape, activation force, key travel distance, and the tactile and
auditory feedback provided by striking the keys (Lewis et al.,
1997), but these well-established design details are not our focus
here.

Procedural  Memory 

Procedural  memory  allows  performance  of  complex  sequences 
of  practiced  movements,  seemingly  without  any  cognitive  effort 
(Anderson,  1980).  Procedural  memory  enables  touch  typing  on 
a  keyboard  with  minimal  attention  while  entering  commonly 
used  symbols.  As  a  result,  users  can  focus  attention  on  mental 
composition  and  verification  of  the  text  appearing  on  the  screen. 
Dedicated  or  chorded  key  presses  for  frequently  used  com¬ 
mands  (hotkeys)  likewise  allow  rapid  command  invocation 
(McLoone,  Hinckley,  &  Cutrell,  2003).  The  automation  of  skills  in 
procedural  memory  is  described  by  the  power  law  of  practice: 

T = aP^b (9.5)

Here, T is the time to perform a task, P is the amount of practice,
and the multiplier a and exponent b are fit to the observed data
(Anderson, 1980). For a good example of applying the power law of
practice to text entry research, see MacKenzie, Kober, Smith,
Jones, and Skepner (2001).
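
One common way to fit the multiplier a and exponent b is ordinary least squares on log-transformed data, since T = aP^b implies log T = log a + b log P. The sketch below assumes that approach; the function name is hypothetical and the data are synthetic, generated from known parameters purely to check the fit.

```python
import math


def fit_power_law(practice, times):
    """Fit T = a * P**b by least squares on (log P, log T); returns (a, b)."""
    xs = [math.log(p) for p in practice]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data (not from any study): task time falls as practice accumulates.
trials = [1, 2, 4, 8, 16, 32]
observed = [10.0 * p ** -0.4 for p in trials]

a, b = fit_power_law(trials, observed)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative fitted exponent indicates that task time decreases with practice, with diminishing returns as practice accumulates.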

Alternative keyboard layouts such as Dvorak offer about a 5%
performance gain (Lewis et al., 1997), but the power law of
practice suggests this small gain comes at a substantial cost for
retraining time; however, ergonomic QWERTY keyboards do preserve
much of a user’s skill for typing. These split-angle keyboards are
not faster, but some can help maintain neutral posture of the wrist
and thus avoid ulnar deviation (Honan, Serina, Tal, & Rempel, 1995;
Marklin, Simoneau, & Monroe, 1997; Smutz, Serina, Bloom, & Rempel,
1994), which has been associated with increased pressure in the
carpal tunnel (Putz-Anderson, 1988; Rempel, Bach, Gordon, & Tal,
1998).

Mobile  Text  Entry,  Character  Recognition, 
and  Handwriting  Recognition 

The difficulty of entering text on handheld devices and cell phones
has led to many new text-entry techniques (MacKenzie, 2002), but
most offer typing rates of only 10 to 20 words per minute (wpm),
compared to approximately 60 wpm for a touch typist.

Many  designs  for  cell  phones  and  other  handheld  devices, 
such  as  the  RIM  Blackberry,  offer  two-thumb  keyboards  with 
QWERTY  key  layouts.  The  principal  virtue  of  QWERTY  is  that 
common  pairs  of  letters  tend  to  occur  on  opposite  hands.  This 
alternation  is  a  very  efficient  movement  pattern  for  both  stan¬ 
dard  and  two-thumb  keyboards,  since  one  hand  completes  a 
key  press  while  the  other  hand  moves  to  the  next  key  (MacKen¬ 
zie  &  Soukoreff,  2002).  A  recent  study  found  that  two-thumb 
keyboards  offer  text  entry  rates  approaching  60  wpm  (Clarkson, 
Clawson,  Lyons,  &  Starner,  2005).  This  suggests  that  one-handed 
text  entry  rates  are  fundamentally  limited  due  to  the  serial  na¬ 
ture  of  character  entry,  despite  novel  improvements  (MacKenzie 
et  al.,  2001;  Wigdor  &  Balakrishnan,  2003).  Word  prediction  may 
help,  but  also  requires  overhead  for  users  to  monitor  and  de¬ 
cide  whether  to  use  the  predictions. 

Soft  keyboards  and  character  recognition  techniques  are 
popular  for  pen-operated  devices,  but  likewise  are  limited  to  se¬ 
rial  entry.  Soft  keyboards  depict  keys  in  a  graphical  user  inter¬ 
face  to  allow  typing  with  a  touchscreen  or  stylus.  Design  issues 
for  soft  keyboards  differ  from  mechanical  keyboards.  Soft  key¬ 
boards  demand  visual  attention  because  the  user  must  look  at 
the  keyboard  to  aim  the  pointing  device.  Only  one  key  at  a  time 
can  be  touched,  so  much  of  the  time  is  spent  moving  back  and 
forth  between  keys  (Zhai,  Hunter,  &  Smith,  2000).  A  soft  key¬ 
board  can  allow  the  user  to  draw  gestures  across  multiple  keys; 
in  combination  with  a  language  model,  this  can  allow  entire 
words  to  be  entered  with  a  single  pen  gesture  (Kristensson  & 
Zhai,  2004). 

Handwriting (even on paper, with no “recognition” involved)
proceeds at about 15 wpm. Ink has significant value as a natural
data type without recognition: it offers an expressive mix of
writing, sketches, and diagrams. Although recognition technology
continues to improve, recognizing natural handwriting remains
difficult and error prone for computers, and demands error
correction input from the user. To make performance more
predictable for the user, some devices instead rely on
single-stroke gestures, known as unistrokes (Goldberg & Richardson,
1993), including Graffiti for the PalmPilot. Unistroke alphabets
attempt to strike a design balance such that each letter is easy
for a computer to distinguish, yet also straightforward for users
to learn (MacKenzie & Zhang, 1997).


MODALITIES  OF  INTERACTION 


In  the  search  for  designs  that  enhance  interfaces  and  enable 
new  usage  scenarios,  researchers  have  explored  several  strate¬ 
gies  that  transcend  any  specific  type  of  device. 


Speech  and  Multimodal  Input 

Speech  has  substantial  value  without  recognition.  Computers 
can  augment  human-human  communication  across  both  time 
and  space  by  allowing  users  to  record,  edit,  replay,  or  transmit 
digitized speech and sounds (Arons, 1993; Buxton, 1995b;
Stifelman, 1996). Systems can also use microphone input to detect
ambient  speech  and  employ  this  as  a  cue  to  help  prioritize  no¬ 
tifications  (Horvitz,  Jacobs,  &  Hovel,  1999;  Sawhney  &  Schmandt, 
2000;  Schmandt,  Marmasse,  Marti,  Sawhney,  &  Wheeler,  2000). 

Computer understanding of human speech does not enable users to
talk to a computer as one would converse with another person (but
see also chapter 19, “Conversational Interfaces and Technologies”).
Speech recognition can succeed for a limited vocabulary, such as
speaking the name of a person from one’s contact list to place a
cell phone call; however, error rates increase as the vocabulary
and complexity of the grammar grow, if the microphone input is
poor, or if users employ “out-of-vocabulary” words. It is difficult
to use speech to refer to spatial locations, so it cannot eliminate
the need for pointing (Cohen & Sullivan, 1989; Oviatt, DeAngeli, &
Kuhn, 1997; see also chapter 21, “Multimodal Interfaces”).
Currently, keyboard-mouse text entry for the English language is
about twice as fast as automatic speech recognition (Karat,
Halverson, Horn, & Karat, 1999); furthermore, speaking can
interfere with one’s ability to compose text and remember words
(Karl, Pettey, & Shneiderman, 1993). Finally, speech is inherently
nonprivate in public situations. Thus, speech has an important role
to play, but claims that speech will soon supplant manual input
devices should be considered with skepticism.

Bimanual  Input 

People use both hands to accomplish most real-world tasks (Guiard,
1987), but computer interfaces make little use of the nonpreferred
hand for tasks other than typing. Bimanual input enables compound
input tasks such as navigation/selection tasks, where the user can
scroll with the nonpreferred hand while using the mouse in the
preferred hand (Buxton & Myers, 1986). This assignment of roles to
the hands corresponds to Guiard’s kinematic chain theory (Guiard,
1987): the nonpreferred hand sets a frame of reference (scrolling
to a location in the document) for the action of the preferred hand
(selecting an item within the page using the mouse). Other
applications for bimanual input include command selection (Bier,
Stone, Pier, Buxton, & DeRose, 1993; Kabbash, Buxton, & Sellen,
1994), drawing tools (Kurtenbach, Fitzmaurice, Baudel, & Buxton,
1997), and virtual camera control and manipulation (Balakrishnan &
Kurtenbach, 1999; Hinckley et al., 1998b). Integrating additional
buttons and controls with keyboards to encourage bimanual
interaction can also improve the efficiency of some common tasks
(MacKenzie & Guiard, 2001; McLoone et al., 2003).

Pen  and  Gesture  Input 

The Palm Pilot and Tablet PC have led to a renaissance in pen and
gesture research. Pens lend themselves to command gestures
analogous to proofreader’s marks, such as crossing out a word to
delete it. Note that in this example, the gesture integrates the
selection of a delete command with the selection of the word to be
deleted. Another example is moving a paragraph by circling it and
drawing a line to its new location. This integrates the verb,
object, and indirect object by specifying the command, the extent
of text to move, and the new location for the text (Hinckley et
al., 2005a; Kurtenbach & Buxton, 1991). Marking menus use
straight-line gestures along the primary compass directions for
rapid command selection (Kurtenbach, Sellen, & Buxton, 1993; Zhao &
Balakrishnan, 2004).
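
As a rough illustration of how a straight-line mark could be mapped to a compass direction, the sketch below classifies a stroke by the angle between its start and end points. It is not the algorithm of the cited marking-menu work; the eight-way layout, the minimum stroke length, and the function name are assumptions.

```python
import math

# Eight 45-degree sectors, counterclockwise starting from east (assumed layout).
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def classify_mark(x0, y0, x1, y1, min_length=20.0):
    """Return the compass direction of a stroke, or None if it is too short.

    Coordinates are in screen pixels with y increasing downward.
    """
    dx = x1 - x0
    dy = y0 - y1  # flip y so that moving up the screen counts as north
    if math.hypot(dx, dy) < min_length:
        return None  # treat very short strokes as taps, not marks
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    sector = int((angle + 22.5) // 45) % 8
    return DIRECTIONS[sector]


print(classify_mark(0, 0, 100, 0))    # stroke to the right
print(classify_mark(0, 0, 0, -100))   # stroke toward the top of the screen
print(classify_mark(0, 0, -70, 70))   # stroke down and to the left
print(classify_mark(0, 0, 5, 5))      # too short to count as a mark
```

Because only the stroke direction matters, a classifier of this kind does not depend on where the mark is drawn, which is what allows practiced users to issue marks without looking at a menu.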

Pen  interfaces  must  decide  whether  to  treat  pen  strokes  as 
ink  content  or  gesture  commands.  Some  applications  avoid 
this  recognition  problem  by  treating  all  strokes  as  commands 
(Kurtenbach  &  Buxton,  1991),  but  for  a  free-form  drawing  or 
note-taking  application,  users  need  to  interleave  ink  content 
and  command  input.  The  status-quo  solution  presents  com¬ 
mands  in  a  toolbar  or  menu  at  the  edge  of  the  screen;  however, 
this  necessitates  round  trips  between  the  work  area  and  the 
command  area  (Fitzmaurice,  Khan,  Pieke,  Buxton,  &  Kurten¬ 
bach,  2003a),  which  becomes  inconvenient  in  direct  proportion 
to  the  display  size.  Pressing  a  button  with  the  nonpreferred 
hand  is  a  fast  and  robust  means  to  switch  between  ink  and  ges¬ 
ture  modes  (Li  et  al.,  2005). 

Techniques to automatically distinguish ink and gestures have been
proposed, but only for highly restricted gesture sets (Saund &
Lank, 2003). Punctuation (tapping) has also been explored as a way
to both identify and delimit command phrases (LaViola, 2004). A
fundamental problem with both of these approaches is that the
system cannot classify a set of strokes as a gesture or as ink
until after the user has finished drawing the entire command
phrase. This makes it difficult to provide interactive feedback or
to prompt the user with the available commands before the user
commits to an operation.

While moving the pen to toolbars at the edge of the screen seems
slow on a tablet computer, in practice, this “round trip strategy”
(Fitzmaurice et al., 2003a) is difficult to improve upon. On a
tablet the size of a standard 8.5 × 11 inch sheet of paper, a round
trip requires approximately 1.5 seconds; however, the user can
mentally prepare for the next step of the interaction while moving
the pen. A locally drawn gesture (such as a straight-line marking
menu command) may take less time to articulate, but thinking about
what command to select requires additional time unless the task is
a routine one. Pressing a button for gesture mode also requires
some overhead, as does lifting the pen at the end of the gesture.
Also note that performing a sequence of gestures (e.g., tagging
words in a document as keywords by circling them) requires time to
travel between screen locations. The round-trip strategy absorbs
this travel time into the round trip itself, but with gestures,
this is an extra cost that reduces the benefit of keeping the
interaction localized.

Thus,  on  a  tablet-sized  device,  it  is  difficult  to  realize  a  sub¬ 
stantial  time  savings  just  by  reducing  round  trips.  For  our  hy¬ 
pothetical  task  of  tagging  keywords  in  a  document,  Fig.  9.8 
illustrates  this  predicament  for  average  task  times  drawn  from 
recent  studies  (Hinckley  et  al.,  2006;  Li  et  al.,  2005).  The  chart 
shows  two  successive  command  selections,  and  assumes  some 
mental  preparation  is  required  before  issuing  each  command. 
Thus, the potential benefit of pen gestures depends on the sequence
of operations as well as the elimination of multiple round trips,
as may be possible with techniques that integrate selection of
verb, object, and indirect object (Hinckley et al., 2005a;
Kurtenbach & Buxton, 1991). Localized interaction may also offer
indirect benefits by reducing physical effort and by keeping the
user’s visual attention focused on their work (Grossman, Hinckley,
Baudisch, Agrawala, & Balakrishnan, 2006; Kabbash et al., 1994).

Whole  Hand  Input 

Humans naturally gesture and point using their hands during verbal
communication, which has motivated research into freehand gestures,
often in combination with speech recognition (Bolt, 1980;
Hauptmann, 1989; Wilson & Oliver, 2003). Cadoz categorized hand
gestures as semiotic, ergotic, or epistemic. Semiotic gestures,
such as “thumbs up,” communicate information (Rime & Schiaratura,
1991). Ergotic gestures manipulate physical objects. Epistemic
gestures are exploratory motions that gather information (Kirsh,
1995; Kirsh & Maglio, 1994). The interaction literature focuses on
empty-handed semiotic gestures (Freeman & Weissman, 1995; Jojic,
Brumitt, Meyers, & Harris, 2000; Maes, Darrell, Blumberg, &
Pentland, 1997). A major challenge is to correctly identify when a
gesture, as opposed to an incidental hand movement, starts and
stops (Baudel & Beaudouin-Lafon, 1993; Wilson & Oliver, 2003). The
lack of deterministic state transitions (Buxton, 1990; Vogel &
Balakrishnan, 2005) can lead to errors of user intent or errors of
computer interpretation (Bellotti, Back, Edwards, Grinter, Lopes, &
Henderson, 2002). Other problems include fatigue from extending
one’s arms for long periods, and the imprecision of pointing at a
distance. By contrast, tangible interaction techniques (Ishii &
Ullmer, 1997) and augmented devices (Harrison, Fishkin, Gujar,
Mochon, & Want, 1998) sense ergotic gestures via a physical
intermediary (Hinckley et al., 1998b; Zhai et al., 1996). The
emergence of cameras, cell phones, and tablets augmented with
accelerometers and other sensors suggests this is a promising
design space.

Background  Sensing  Techniques 

FIGURE 9.8. Chart comparing task times for performing two
successive commands on a tablet, with either in-place command
selection (top) or round-trip command selection (bottom). The
individual boxes in each subtask represent 100-millisecond
intervals.

Sensors can enable a mobile device to sense when the user picks up,
puts down, looks at, holds, or walks around with the device. These
actions give a device information about the context of its use, and
represent a hidden vocabulary of naturally occurring gestures that
people spontaneously exhibit in day-to-day activity. For example,
commercially available digital cameras now employ a tilt sensor to
detect the orientation of the camera, and use this to automatically
save photos in the correct orientation, as well as to interactively
switch the display between portrait/landscape formats (Hinckley et
al., 2005b; Hinckley, Pierce, Sinclair, & Horvitz, 2000). Here the
sensor allows the device to adapt its behavior to the user’s needs,
rather than requiring the user to take extra steps to control the
photo orientation and display format (Buxton, 1995a).
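
The following sketch suggests how a tilt reading might drive a portrait/landscape switch. It is an illustration under stated assumptions rather than the method of any cited system: the sensor is assumed to report gravity components (in units of g) along the device's x and y screen axes, and the flatness threshold and hysteresis margin are hypothetical values added so the display does not flicker near the 45-degree boundary.

```python
import math


def tilt_orientation(ax, ay, current="portrait", margin_deg=15.0):
    """Choose a display format from gravity components along the screen's x and y axes.

    ax, ay: assumed accelerometer readings in units of g.
    current: the format currently shown, used for hysteresis.
    """
    if math.hypot(ax, ay) < 0.3:
        return current  # device is lying nearly flat; tilt is ambiguous
    # 0 degrees means gravity lies along the y axis (upright), 90 along the x axis.
    angle = math.degrees(math.atan2(abs(ax), abs(ay)))
    if current == "portrait" and angle > 45.0 + margin_deg:
        return "landscape"
    if current == "landscape" and angle < 45.0 - margin_deg:
        return "portrait"
    return current


print(tilt_orientation(0.0, -1.0))                       # held upright
print(tilt_orientation(1.0, 0.0))                        # turned on its side
print(tilt_orientation(0.7, 0.7, current="landscape"))   # near 45 degrees: no change
```

Keeping the current format inside the hysteresis band reflects the goal described above: the device should adapt in the background without drawing attention to itself through spurious switches.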

Sensors  can  also  be  embedded  in  the  environment.  When 
one  walks  into  a  modern  department  store,  there  is  no  explicit 
command  to  open  the  doors:  the  doors  sense  motion  and  au¬ 
tomatically  open.  Researchers  are  investigating  new  ways  to 
leverage  such  contextual  sensing  to  enrich  and  simplify  inter¬ 
action  with  devices  and  digital  environments  (Abowd  &  Mynatt, 
2000;  Schilit,  Adams,  &  Want,  1994). 

Multitouch  Tables  and  Screens 

Technical advances have led to much recent research in
touch-sensitive tables and projection screens. These technologies blur
the  distinction  between  whole-hand  gestural  input  and  tradi¬ 
tional  single-point  touchscreens  and  touch  tablets.  Recent  pro¬ 
totype  systems  demonstrate  capture  of  the  shape  formed  by  the 
hand(s)  in  contact  with  a  surface  (Wilson,  2005),  multiple  points 
of  contact  (Han,  2005),  or  even  images  of  objects  placed  on 
or  near  a  surface  (Matsushita  &  Rekimoto,  1997;  Wilson,  2004). 
The  DiamondTouch  (Dietz  &  Leigh,  2001)  is  unique  in  its  abil¬ 
ity  to  determine  which  user  produces  each  point  of  contact, 
which  has  led  to  a  number  of  innovative  applications  and  tech¬ 
niques  (Shen,  Everitt,  &  Ryall,  2003;  Wu  &  Balakrishnan,  2003). 
For  an  overview  of  design  issues  for  tabletop  systems,  see  Scott, 
Grant,  and  Mandryk  (2003). 

A  Society  of  Devices 

Wireless networking is the technology that will most disrupt
traditional approaches to human-computer interaction in the coming
years, because it breaks down barriers between devices and enables
a new society of devices that fill specific roles, yet can still
coordinate their activities (Fitzmaurice, Khan, Buxton, Kurtenbach,
& Balakrishnan, 2003b; Want & Borriello, 2000). The interaction
designer thus must consider the full range of scale for display
sizes and form-factors that may embody an interaction task, as well
as the interactions between different types of devices. How can
interaction migrate from watches, cell phones, handheld devices,
and tablets, all the way up to desktop monitors, digital
whiteboards, and interactive wall-sized displays?

As digital devices become physically smaller, the displays and
input mechanisms they offer are shrinking. Considerable effort has
been devoted to supporting web browsing in limited screen space
(Buyukkokten, Garcia-Molina, & Paepcke, 2001; Jones, Marsden,
Mohd-Nasir, Boone, & Buchanan, 1999; Trevor, Hilbert, Schilit, &
Koh, 2001). Techniques to make small displays virtually larger
include (a) peephole displays (Fitzmaurice, 1993; Yee, 2003), (b)
transparent overlays (Harrison, Ishii, Vicente, & Buxton, 1995;
Kamba, Elson, Harpold, Stamper, & Sukaviriya, 1996), and (c) using
on-screen visuals to suggest the locations of off-screen objects
(Baudisch & Rosenholtz, 2003). Touchscreens and touch-sensitive
controls minimize the vertical profile of devices (Wherry, 2003),
but may suffer from inadvertent activation. Physical manipulations
such as tilting that use the device itself as an interface seem
particularly well suited to small devices (Harrison et al., 1998;
Hinckley et al., 2000; Rekimoto, 1996). Tiny, bright, and
inexpensive laser or LED projectors are just around the corner;
progress on computer vision techniques suggests that interactive
projection may allow small devices to project large displays and
sensing surfaces (Raskar et al., 2004; Wilson, 2005).

At the other end of the spectrum, large-format displays are now
affordable and common. Large displays lend themselves to
collaboration and sharing of information with groups (Funkhouser &
Li, 2000; Swaminathan & Sato, 1997), as well as giving a
substantial physical presence to virtual activities (Buxton,
Fitzmaurice, Balakrishnan, & Kurtenbach, 2000; Trimble, Wales, &
Gossweiler, 2003). Researchers have explored pen and touchscreen
interaction techniques for large displays (Guimbretiere, Stone, &
Winograd, 2001; Moran et al., 1997). Unfortunately, many
technologies sense only a single point of contact. For interaction
at a distance with large displays, it remains unclear what
interaction techniques work best (Olsen & Nielsen, 2001; Vogel &
Balakrishnan, 2004; Vogel & Balakrishnan, 2005); even when a user
is close to a large display, interacting with portions of the
display that are out of view or beyond arm’s length raises
challenges (Bezerianos & Balakrishnan, 2005; Khan, Matejka,
Fitzmaurice, & Kurtenbach, 2005).

Displays of various sizes support different activities and social
conventions; one of the principal challenges of ubiquitous
computing (Weiser, 1991) is finding techniques that make
it  easy  for  users  to  work  within  a  digital  ecology  that  supports 
a  range  of  tasks  spanning  multiple  computers,  displays,  and  in¬ 
teractive  surfaces.  Even  on  a  single  computer,  users  do  not 
treat  multiple  monitors  as  one  large  display,  but  instead  em¬ 
ploy  the  boundary  between  displays  to  partition  their  tasks 
(Grudin,  2001).  Several  projects  have  probed  how  to  use  small 
displays  as  adjuncts  to  large  ones  (Myers,  Lie,  &  Yang,  2000; 
Myers,  Stiel,  &  Garguilo,  1998;  Rekimoto,  1998),  allowing  si¬ 
multaneous  interaction  with  private  information  on  a  personal 
device,  and  a  shared  or  public  context  on  a  larger  display. 
Users  need  techniques  that  allow  them  to  access  and  share  in¬ 
formation  across  the  boundaries  of  individual  devices,  as  well 
as  to  dynamically  bind  together  multiple  devices  and  displays 
to  accomplish  their  tasks  (Hinckley,  Ramos,  Guimbretiere, 
Baudisch,  &  Smith,  2004;  Rekimoto,  Ayatsuka,  &  Kohno,  2003). 
Such  interaction  techniques  inherently  involve  multiple  per¬ 
sons,  and  thus  must  consider  how  people  use  physical  prox¬ 
imity  and  relative  body  orientation,  a  field  known  as proxemics 
(Deasy  &  Lasswell,  1985;  Hinckley  et  al.,  2004;  Sommer,  1965). 


CURRENT  AND  FUTURE  TRENDS  FOR  INPUT 


The  designer  of  an  interactive  system  should  take  a  broad  view 
of  input,  and  consider  not  only  traditional  pointing  techniques 
and  graphical  user  interface  widgets,  but  also  issues  such  as 
search strategies to access information in the first place, sensor
inputs  that  enable  entirely  new  data  types,  synthesis  techniques 
to  extract  meaningful  structure  from  data,  and  integration  of 
traditional  technologies  such  as  paper  that  offer  fundamental 
strengths. 

Good  search  tools  may  reduce  the  many  inputs  needed  to 
manually  search  and  navigate  file  systems.  Knowledge  work  re¬ 
quires  integration  of  external  information  from  web  pages  or 
databases  (Yee,  Swearingen,  Li,  &  Hearst,  2003)  as  well  as  reuse  of 
personal  information  from  documents,  electronic  mail  messages, 
and  other  content  authored  or  viewed  by  a  user  (Dumais,  Cutrell, 
Cadiz,  Jancke,  Sarin,  &  Robbins,  2003;  Lansdale  &  Edmonds, 
1992).  Unified  full-text  indexing  allows  users  to  quickly  query 
their  personal  information  across  multiple  information  silos, 
and  can  present  information  in  the  context  of  memory  land¬ 
marks  such  as  the  date  a  message  was  sent,  who  authored  the 
text,  or  the  application  used  to  create  a  document  (Cutrell,  Rob¬ 
bins,  Dumais,  &  Sarin,  2006). 

New  sensor  inputs  such  as  location  and  tagging  technologies 
are  coming  to  fruition.  Radio-frequency  identification  (RFID) 
tags  (Want,  Fishkin,  Gujar,  &  Harrison,  1999)  allow  computers 
to  identify  tagged  physical  objects,  thus  enabling  manipulation 
of  ordinary  objects  as  an  input  for  computers.  A  mobile  tag 
reader  can  identify  tagged  physical  locations  (Smith,  Davenport, 
&  Hwa,  2003).  Wireless  communication  technologies  are  poised 
to  deliver  ubiquitous  location-based  services  (Schilit  et  al.,  1994; 
Want,  Hopper,  Falcao,  &  Gibbons,  1992).  Cell  phones  and  low- 
power  radios  for  wireless  networking  can  sense  their  location  or 
proximity  to  other  devices  via  analysis  of  signal  strengths  (Bahl 
&  Padmanabhan,  2000;  Krumm  &  Hinckley,  2004).  As  another 
example, attempting to type a secure password on a mobile
phone keypad quickly convinces one that biometric sensors or
some other convenient means for establishing identity is essential
for these devices to succeed. Such sensors could also make
services  such  as  personalization  of  interfaces  much  simpler. 

The  need  to  extract  models  and  synthesize  structure  from 
large  quantities  of  low-level  inputs  suggests  that  data  mining  and 
machine  learning  techniques  will  become  important  adjuncts 
to interaction (Fails & Olsen, 2003; Fitzmaurice, Balakrishnan, &
Kurtenbach,  1999;  Horvitz,  Breese,  Heckerman,  Hovel,  &  Rom- 
melse,  1998).  Whenever  a  system  considers  automatic  actions  on 
behalf  of  the  user,  however,  an  important  design  principle  in  the 
face  of  uncertainty  is  to  “do  less,  but  do  it  well”  (Horvitz,  1999). 

With  so  many  new  technologies,  it  is  easy  to  forget  that  pa¬ 
per  remains  an  important  input  and  display  medium  (Sellen  & 
Harper,  2002).  Paper  is  inexpensive,  easy  to  annotate,  rapid  to  ac¬ 
cess,  comfortable  to  read,  light  to  carry,  and  can  be  accessed  af¬ 
ter  tens  or  even  hundreds  of  years.  Because  technology  will  not 
replace  paper  any  time  soon,  researchers  are  studying  how  to 
bring  paperlike  interfaces  to  digital  devices  (Schilit,  Golovchin- 
sky  &  Price,  1998;  Wolf,  Rhyne,  &  Ellozy,  1989),  as  well  as  how 
to  augment  paper  with  digital  capabilities  (Guimbretiere,  2003; 
Liao,  Guimbretiere,  &  Hinckley,  2005;  Stifelman,  1996). 

We  must  make  substantial  progress  in  all  of  these  areas  to  ad¬ 
vance  human  interaction  with  technology.  The  forms  and  capa¬ 
bilities  of  these  and  other  technologies  will  continue  to  advance, 
but  human  senses  and  cognitive  skills  will  not.  We  will  continue 
to  interact  with  computers  using  our  hands  and  physical  inter¬ 
mediaries,  not  necessarily  because  our  technology  requires  us 
to  do  so,  but  because  touching,  holding,  and  moving  physical 
objects  is  the  foundation  of  the  long  evolution  of  tool  use  in 
the  human  species  (Wilson,  1998). 
INTRODUCTION


Sensors  convert  a  physical  signal  into  an  electrical  signal  that 
may  be  manipulated  symbolically  on  a  computer.  A  wide  vari¬ 
ety  of  sensors  have  been  developed  for  aerospace,  automotive 
and  robotics  applications  (Fraden,  2003).  Continual  innovations 
in  manufacturing  and  reductions  in  cost  have  allowed  many 
sensing  technologies  to  find  application  in  consumer  products. 
An  interesting  example  is  the  development  of  the  ubiquitous 
computer  mouse.  Douglas  Engelbart’s  original  mouse,  so  named 
because  its  wire  “tail”  came  out  of  its  end,  used  two  metal 
wheels  and  a  pair  of  potentiometers  to  sense  the  wheels  rolling 
over  a  desk  surface.  Soon,  mice  used  a  ball  and  a  pair  of  optical 
encoders  to  convert  the  movement  of  the  hand  into  digital  sig¬ 
nals  indicating  precise  relative  motion.  Now,  even  the  most 
inexpensive  mice  employ  a  specialized  camera  and  image  pro¬ 
cessing  algorithms  to  sense  motions  at  the  scale  of  one  one- 
thousandth  of  an  inch  several  thousand  times  per  second.  Ac¬ 
celerometers,  devices  that  sense  acceleration  due  to  motion  and 
the  constant  acceleration  due  to  gravity,  are  another  interest¬ 
ing  example.  Today’s  tiny  accelerometers  were  originally  devel¬ 
oped  for  application  in  automotive  air-bag  systems.  Digital  cam¬ 
eras  now  incorporate  accelerometers  to  sense  whether  a  picture 
is  taken  in  landscape  or  portrait  mode,  and  save  the  digital 
photo  appropriately.  Many  laptops  with  built-in  hard  disks  also 
include  accelerometers  to  detect  when  the  laptop  has  been 
dropped,  and  park  the  hard  drive  before  impact.  Meanwhile, 
mobile  phone  manufacturers  are  experimenting  with  phones 
that  use  accelerometers  to  sense  motion  for  use  in  interaction, 
such  as  in-the-air  dialing,  scrolling,  and  detecting  the  user’s 
walking  pattern. 

Research  in  human-computer  interaction  (HCI)  explores  the 
application  of  sensing  to  enhance  interaction.  The  motivation  of 
this work is varied. Some researchers seek to expand the array of
desktop  input  options,  or  to  build  completely  new  computing 
form  factors  such  as  mobile  devices  that  know  where  they  are 
pointed  and  intelligent  environments  that  are  aware  of  their  in¬ 
habitants.  Other  researchers  are  interested  in  using  sensors  to 
make  our  machines  behave  more  like  we  do,  or  alternatively  to 
make  them  complement  human  abilities.  Entertainment,  sur¬ 
veillance,  safety,  productivity,  mobile  computing,  and  affective 
computing are all active areas in which researchers are applying
sensors in interesting ways.

While a wide array of sensors is available to researchers,
rarely  does  a  sensor  address  exactly  the  needs  of  a  given  appli¬ 
cation.  Consider  building  into  a  computer  the  capability  to 
sense  when  its  user  is  frustrated.  Detection  of  user  frustration 
would  allow  a  computer  to  respond  by  adopting  a  new  strategy 
of  interaction,  playing  soothing  music,  or  even  calling  technical 
support;  however,  today,  no  “frustration  meter”  may  be  pur¬ 
chased  at  the  local  electronics  store.  What  are  the  alternatives? 
A  microphone  could  be  used  to  sense  when  the  user  mutters 
or  yells  at  the  machine.  A  pressure  sensor  in  the  mouse  and  key¬ 
board  could  detect  whether  the  user  is  typing  harder  or  squeez¬ 
ing  the  mouse  in  frustration  (Klein,  Moon,  &  Picard,  2002; 
Reynolds, 2001). A webcam might detect scowling or furrowing
of the eyebrows. Sensors in the chair could detect user agitation
(Tan, Slivovsky, & Pentland, 2001). Ultimately, the system chosen
should  probably  exploit  a  consistent,  predictable  relationship 
between  the  output  of  one  of  these  sensors  and  the  user’s  frus¬ 
tration  level;  for  example,  if  the  mouse  is  squeezed  at  a  level 
exceeding  some  set  threshold,  the  computer  may  conclude  that 
the  user  is  frustrated. 
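
As a minimal sketch of the thresholding idea just described, the following Python fragment flags frustration only when the squeeze pressure stays above a set threshold for several consecutive samples. The function name, the normalized pressure scale, and the default values are illustrative assumptions, not figures from the studies cited above.

```python
def detect_frustration(pressures, threshold=0.8, min_samples=3):
    """Return True if pressure exceeds `threshold` for at least
    `min_samples` consecutive readings.

    `pressures` is a sequence of normalized readings in [0, 1];
    the threshold and sample count are hypothetical values.
    """
    run = 0  # length of the current above-threshold run
    for p in pressures:
        run = run + 1 if p > threshold else 0
        if run >= min_samples:
            return True
    return False
```

Requiring a short run of samples rather than a single reading is one simple way to avoid reacting to a momentary spike; the right threshold would still have to be found by observing real users.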

In  our  effort  to  build  a  frustration  detector,  we  may  find  a 
number  of  issues  confounding  the  relationship  between  the 
sensors  and  the  state  to  be  detected: 

•  There  is  no  easy  a  priori  mapping  between  the  output  of  the 
sensors  and  the  presumed  state  of  frustration  in  the  user.  Im¬ 
plementation  of  a  pressure  sensor  on  the  mouse  requires  ob¬ 
servation  of  the  user  over  time  to  determine  how  much  pres¬ 
sure  reliably  indicates  frustration.  Implementation  of  the 
more  complex  approach  of  detecting  furrowed  brows  by 
computer  vision  requires  an  elaborate  image  processing 
algorithm. 

•  The  output  of  the  sensors  is  noisy  and  often  accompanied 
by  a  degree  of  uncertainty. 

•  Initial  experimentation  reveals  that  while  no  single  sensor 
seems  satisfactory,  it  sometimes  may  suffice  to  combine  the 
output  of  multiple  sensors. 

•  Our  preconceived  notions  of  frustration  may  not  correspond 
to  what  the  sensors  observe.  This  may  cause  us  to  revisit  our 
understanding  of  how  people  express  frustration,  which,  in 
turn,  may  lead  us  to  a  different  choice  of  sensors. 

•  The  manner  in  which  the  user  expresses  frustration  depends 
greatly  on  the  user’s  current  task  and  other  contextual  factors, 
such  as  the  time  of  day  and  level  of  arousal.  Exploiting  knowl¬ 
edge  of  the  user’s  current  application  may  address  many  cases 
where  our  algorithm  for  detecting  frustration  fails. 

•  After  realizing  that  our  frustration  detector  does  not  perform 
flawlessly,  we  struggle  to  balance  the  cost  of  our  system  mak¬ 
ing  mistakes  with  the  benefit  the  system  provides. 

These  are  just  some  of  the  considerations  that  are  typical  in  a 
nontrivial  application  of  sensors  to  recognition  in  HCI.  While  this 
article  does  not  propose  to  solve  the  problem  of  detecting  and 
responding  to  user  frustration,  it  will  survey  aspects  of  sensor- 
based  recognition  highlighted  by  this  example.  In  particular,  this 
article  presents  the  variety  of  available  sensors  and  how  they  are 
often  used  in  interactive  systems.  Signal  processing,  recognition 
techniques,  and  further  considerations  in  designing  sensor  and 
recognition-based  interactive  systems  are  briefly  addressed. 


SENSORS  AND  SENSING  MODES 


This  article  focuses  only  on  those  sensors  relevant  to  interactive 
applications  and  their  typical  modes  of  use.  Experimenting  with 
such  sensors  has  never  been  easier.  Microcontrollers  such  as  the 
Microchip  PIC  and  BasicStamp  can  interface  sensors  to  PCs  and 
other  devices,  and  can  be  programmed  using  high-level  lan¬ 
guages  such  as  C  and  BASIC.  The  Phidgets  hardware  toolkit 
(Greenberg  &  Fitchett,  2001)  enables  effortless  “plug  and  play” 
prototyping with modular sensors and actuators that are
plugged  into  a  base  interface  board,  and  provides  software  APIs. 
The  Berkeley  and  Intel  Mote  projects  also  offer  wireless  sensor 
packages  useful  in  data  collection  and  sensor  networks  (Kahn, 
Katz,  &  Pister,  2000;  Nachman,  Kling,  Huang,  &  Hummel,  2005). 
Another  source  for  inexpensive  sensors  and  sensor  interface 
kits  is  the  hobbyist  robotics  community. 

Occupancy  and  Motion 

Probably  owing  to  the  importance  of  sensing  technology  in  se¬ 
curity  applications,  many  devices  and  techniques  exist  to  sense 
either  motion  or  a  person’s  presence  (occupancy).  Among 
these  are 

•  Air  pressure  sensors  that  detect  changes  in  air  pressure  re¬ 
sulting  from  the  opening  of  doors  and  windows. 

•  Capacitive  sensors  that  detect  capacitance  changes  induced 
by  the  body. 

•  Acoustic  sensors. 

•  Photoelectric  and  laser-based  sensors  that  detect  disruption 
of  light. 

•  Optoelectric  sensors  that  detect  variations  in  illumination. 

•  Pressure  mat  switches  and  strain  gauges. 

•  Contact  and  noncontact  (magnetic)  switches. 

•  Vibration  detectors. 

•  Infrared  motion  detectors. 

•  Active  microwave  and  ultrasonic  detectors. 

•  Triboelectric  detectors  that  detect  the  static  electric  charge  of 
a  moving  object  (Fraden,  2003). 

Perhaps  one  of  the  most  familiar  motion  detectors  is  the  pas¬ 
sive  infrared  (PIR)  detector,  which  is  sensitive  to  small  changes 
in  the  pattern  of  infrared  radiation  within  the  spectral  range  of 
4  to  20  micrometers  (“far  infrared”).  PIR  detectors  sense  heat 
changes  over  a  small  duration  of  time,  indicating  the  presence  of 
a  person  moving  through  the  room.  These  devices  often  control 
lights  in  office  buildings,  and  can  be  useful  in  office-awareness 
applications  when  combined  with  other  sensors. 

More  selective  motion  and  occupancy  detection  can  be  ob¬ 
tained  with  video  cameras  and  simple  computer  vision  tech¬ 
niques.  For  example,  a  computer  vision  system  allows  for  the 
definition  of  multiple  regions  of  interest  (ROI)  that  allow  fine 
distinctions  regarding  the  location  of  motion.  Such  a  system 
may  thus  be  able  to  ignore  distracting  motion. 

Range  Sensing 

Range  sensors  calculate  the  distance  to  a  given  object.  Such  de¬ 
tectors  can  be  used  as  occupancy  detectors,  and  are  also  useful 
in  motion-  and  gesture-driven  interfaces. 

Many  range  and  proximity  sensors  triangulate  the  position 
of  the  nearest  object.  For  example,  the  Sharp  IR  Ranger  emits  a 
controlled burst of near-infrared light from a light-emitting diode
(LED). This light is reflected by any object within a few feet and
is focused onto a small linear charge-coupled device (CCD)
array that is displaced slightly from the emitter. The position of
the reflection on the sensor can be related to the distance to the
object  by  trigonometry.  Similar  approaches  can  be  used  over  a 
longer  effective  distance  with  the  use  of  lasers  rather  than  LEDs. 
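
The trigonometry involved can be sketched with an idealized pinhole model: if the emitter and the sensor lens are separated by a baseline b, and the lens has focal length f, a reflection that lands at offset x from the optical axis corresponds by similar triangles to a distance of f·b/x. The function name and the default dimensions below are assumptions for illustration, not the specifications of any real part.

```python
def triangulate_distance(spot_offset_mm, baseline_mm=20.0, focal_length_mm=10.0):
    """Distance to the reflecting object, in millimeters.

    Idealized similar-triangles model: distance = f * b / x, where
    x is the offset of the reflected spot on the linear sensor.
    Default baseline and focal length are hypothetical values.
    """
    if spot_offset_mm <= 0:
        raise ValueError("spot offset must be positive (object in range)")
    return focal_length_mm * baseline_mm / spot_offset_mm
```

Because distance varies inversely with the spot offset, resolution is best for nearby objects and degrades as the object moves away, which is consistent with the short working range of such sensors.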

Stereo  computer  vision  systems  similarly  use  triangulation  to 
calculate  depth.  If  the  same  object  is  detected  in  two  displaced 
views,  the  difference  in  their  sensed  2D  positions,  called  “dis¬ 
parity,”  can  be  related  to  the  depth  of  the  object  (Forsyth  & 
Ponce,  2002;  Horn,  1986).  Stereo  vision  techniques  may  be  used 
to  determine  the  depth  of  a  discrete  object  in  the  scene,  or  to 
compute  depth  at  each  point  in  the  image  to  arrive  at  a  full 
range  image. 
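
For reference, the relation between disparity and depth can be written out under the common simplifying assumption of two identical, parallel (rectified) pinhole cameras. The symbols are introduced here for illustration: f is the focal length in pixels, B the baseline between the cameras, (X, Z) the object's lateral position and depth relative to the left camera, and x_l, x_r its horizontal image positions in the left and right views.

```latex
x_l = \frac{f X}{Z}, \qquad x_r = \frac{f (X - B)}{Z}
\quad\Longrightarrow\quad
d = x_l - x_r = \frac{f B}{Z}
\quad\Longrightarrow\quad
Z = \frac{f B}{d}
```

Depth is inversely proportional to disparity, so a fixed error in the measured disparity produces a larger depth error for distant objects than for near ones.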

A  second  approach  to  calculating  range  is  based  on  measur¬ 
ing  the  time-of-flight  of  an  emitted  signal.  The  Polaroid  ultra¬ 
sonic  ranging  device,  for  example,  was  originally  developed  for 
auto-focus  cameras,  and  subsequently  became  popular  in  ro¬ 
botics.  Such  sensors  emit  a  narrow  ultrasonic  “chirp”  and  later 
detect  the  chirp’s  reflection.  The  duration  in  time  between  the 
chirp  and  the  detection  of  the  reflection  is  used  to  calculate  the 
distance  to  the  object.  Ultrasonic  range  finders  can  sometimes 
be  confused  by  multiple  reflections  of  the  same  chirp;  such  dif¬ 
ficulties  are  eliminated  by  measuring  the  time  of  flight  of  emit¬ 
ted  light  rather  than  sound,  but  such  sensors  are  still  compara¬ 
tively  exotic. 
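
The time-of-flight calculation itself is simple, as the following sketch shows. It assumes a nominal speed of sound of 343 m/s (dry air at roughly 20 degrees Celsius); the function and constant names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # nominal value for dry air near 20 C (assumption)

def range_from_echo(round_trip_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance in meters to the reflecting object.

    The chirp travels to the object and back, so the one-way
    distance is half of speed * elapsed time.
    """
    if round_trip_s < 0:
        raise ValueError("elapsed time cannot be negative")
    return speed_m_s * round_trip_s / 2.0
```

With these assumptions, an echo that returns after 10 milliseconds corresponds to an object about 1.7 meters away.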

Position 

Designers  of  sensing-based  interactive  systems  would  probably 
most  like  a  low-power,  wireless,  inexpensive  3D  position  sensor 
that does not rely on the installation of complicated infrastructure.
Originally designed for military application, Global Positioning
System (GPS) devices are useful for sensing street-level
movement but are limited to outdoor application. Unfortunately,
no  indoor  tracking  standard  has  gained  the  popularity  of  GPS. 

The  position  of  a  wireless  RF  receiver  can  be  determined  by 
measuring  signal  strengths  to  RF  transmitters  of  known  position 
using Wi-Fi, Bluetooth, and GSM standards (LaMarca et al.,
2005). Under the assumption that signal strength approximates
distance, position may be determined by triangulation, but often
interference from buildings, walls, furniture, and even people
can be troublesome. Another approach is to treat the pattern
of signal strengths as a “signature” which, when recognized later,
indicates the position associated with the signature (Krumm &
Horvitz, 2004). Using Wi-Fi transceivers and a number of Wi-Fi
access  points,  position  can  be  calculated  within  several  feet  of 
accuracy  under  ideal  laboratory  conditions  using  a  combina¬ 
tion  of  approaches  (Letchner,  Fox,  &  LaMarca,  2005).  Finally,  the 
Ubisense  location  system  achieves  accuracy  on  the  order  of 
15  cm  indoors  by  using  arrival  time  and  angle  of  RF  signals  in 
ultrawideband  (UWB)  frequencies. 
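
The "signature" approach can be illustrated with nearest-neighbor matching, which is one simple way to recognize a stored pattern of signal strengths and is not necessarily the method used in the cited work. In this sketch, signatures are dictionaries mapping a transmitter identifier to a received signal strength in dBm; the function name, the placeholder value for unheard transmitters, and the sample data are all assumptions.

```python
import math

MISSING_DBM = -100.0  # stand-in strength for a transmitter that was not heard

def locate_by_signature(observed, fingerprints):
    """Return the label of the stored signature closest to `observed`.

    observed: dict of transmitter id -> signal strength (dBm).
    fingerprints: dict of location label -> signature dict recorded
    earlier at that location. Closeness is Euclidean distance in
    signal-strength space.
    """
    def distance(signature):
        ids = set(observed) | set(signature)
        return math.sqrt(sum(
            (observed.get(i, MISSING_DBM) - signature.get(i, MISSING_DBM)) ** 2
            for i in ids))
    return min(fingerprints, key=lambda label: distance(fingerprints[label]))
```

For example, with stored signatures for an "office" (strong ap1, weak ap2) and a "lab" (weak ap1, strong ap2), an observation dominated by ap1 is matched to the office.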

Commercially  available  motion  capture  systems  employ  a  va¬ 
riety  of  strategies  to  track  the  position  and  orientation  of  multi¬ 
ple  points.  These  systems  are  generally  used  to  record  human 
motions  for  applications  such  as  video  game  character  anima¬ 
tion.  Most  require  the  performer  to  wear  several  small  tracking 
devices. For example, when precisely calibrated, the Polhemus
and Ascension magnetic tracking devices achieve millimeter-accurate,
six-degree-of-freedom (position and orientation) tracking
of multiple points, but rely on technology that connects
each  tracking  device  to  a  base  station  with  wires.  Such  prod¬ 
ucts  have  been  very  useful  in  prototyping  gesture-based  inter¬ 
actions  that  require  accurate  3D  position  and  orientation  (Fitz- 
maurice,  Ishii,  &  Buxton,  1995;  Hinckley,  Pausch,  Proffitt,  & 
Kassel,  1998;  Ware,  1990;  Ware  &  Jessome,  1988),  but  often  are 
too  expensive  for  widespread  use. 

Much  research  in  computer  vision  focuses  on  accurate  and 
reliable  object  tracking.  Where  multiple  cameras  are  available,  it 
is  possible  to  compute  the  3D  position  of  a  tracked  object  using 
triangulation.  To  recover  3D  position  useful  for  interactive  ap¬ 
plications  in  a  typical  room  setting,  such  cameras  require  care¬ 
ful  calibration.  Several  prototype  interactive  systems  use  vision 
techniques  to  track  the  hands,  head,  and  body  of  a  user.  Often, 
a  model  of  shape  and  appearance  that  includes  typical  skin  color 
is  used  to  track  the  head  and  hands.  The  ALIVE  system,  for  ex¬ 
ample,  determines  the  2D  position  of  the  head  and  hands  of  a 
user  by  first  extracting  the  user’s  silhouette  against  a  controlled 
(static) background. Points of high curvature along this contour
are  then  extracted  and  tracked  as  hands  or  head.  Later  variants 
exploit  color  information  as  well.  Depth  is  computed  by  assum¬ 
ing  the  user  is  standing  on  a  flat  floor  (Maes,  Darrell,  Blumberg, 
&  Pentland,  1995;  Wren,  Azarbayejani,  Darrell,  &  Pentland,  1995). 

Computer  vision-based  tracking  systems  often  suffer  from 
poor  tracking  reliability  and  sensitivity  to  variations  in  back¬ 
ground  illumination.  Tracking  reliability  can  be  enhanced  by 
controlling  the  appearance  of  the  object  so  that  it  can  be 
tracked  unambiguously.  A  number  of  professional  motion  cap¬ 
ture  systems,  such  as  the  Vicon  Peak  system,  rely  on  passive, 
wireless,  infrared-reflective  pieces,  but  also  require  a  powerful 
infrared  light  source  and  multiple,  redundant  sensors  (cameras) 
to  minimize  missing  data  resulting  from  occlusion.  Alternatively, 
cameras  sensitive  in  the  infrared  domain  can  be  used  to  track  an 
infrared  LED  (IR-LED).  The  position  sensitive  device  (PSD),  for 
example,  is  an  inexpensive,  camera-like  device  that  reports  the 
brightest  spot  on  its  imaging  array  and  is  thus  suitable  for  inex¬ 
pensive  IR-LED-based  tracking  systems.  Multiple  IR-LEDs  can  be 
tracked  using  a  PSD  by  carefully  controlling  when  each  is  illu¬ 
minated.  Gross  room-level  location  can  be  determined  using 
the  simplest  infrared  detectors  and  IR-LEDs  that  transmit  the 
identity of the user by blinking specific patterns over time,
much  like  a  television  remote  control  (Want,  Hopper,  Falcao,  & 
Gibbons,  1992). 

Acoustic  tracking  systems  are  able  to  triangulate  position  us¬ 
ing  time-of-flight  measurements.  One  approach  is  to  equip  the 
room  with  multiple  detectors  that  are  able  to  hear  a  mobile 
tracking  device  equipped  to  emit  a  sound  at  a  known  frequency. 
This  configuration  can  also  be  inverted,  with  the  detector  on  the 
tracked  device  and  the  emitters  in  the  environment  (Smith,  Bal- 
akrishnan,  Goraczko,  &  Priyantha,  2004;  Ward,  Jones,  &  Hopper, 
1997).  Related  signal  processing  algorithms  can  combine  the 
output  of  two  or  more  microphones  to  triangulate  the  position 
of  an  arbitrary  sound  source  (Rui  &  Florencio,  2003).  This  ap¬ 
proach  can  be  particularly  effective  when  combined  with  other 
techniques  such  as  computer  vision-based  face  tracking  (Zhang 
& Hershey, 2005).
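
As a rough illustration of how two microphones can localize a sound source, the sketch below converts the difference in arrival time at the two microphones into a bearing. It assumes a distant source (so the wavefront is approximately planar), a nominal speed of sound of 343 m/s, and that the delay has already been estimated by some other means such as cross-correlation; the function name and parameters are illustrative.

```python
import math

def bearing_from_delay(delay_s, mic_spacing_m, speed_m_s=343.0):
    """Bearing of a distant sound source, in degrees from broadside.

    Far-field model: sin(angle) = speed * delay / spacing.
    0 degrees means the source is directly in front of the pair;
    +/-90 degrees means it lies along the line joining them.
    """
    ratio = speed_m_s * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))
```

A single pair yields only a direction; combining bearings from two or more pairs at known positions is what allows a position to be triangulated.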


Movement  and  Orientation 

Unlike  most  tracking  technologies,  a  number  of  movement  and 
orientation  sensors  do  not  rely  on  external  infrastructure.  Iner¬ 
tial  sensors,  for  example,  sense  spatial  and  angular  motion 
(translation and rotation). They can be used for activity recognition
as well as gesture and body motion-based interactive applications
where it is acceptable to wear or hold a small wireless
sensor package (Bao & Intille, 2004; Hinckley, Pierce,
Sinclair,  &  Horvitz,  2000;  Lee  &  Mase,  2002). 

Very  simple  tilt  sensors  such  as  mercury  switches  have  been 
used  for  years  to  sense  gross  orientation.  More  recently,  inex¬ 
pensive  accelerometer  devices  using  micro-electro-mechanical 
systems  (MEMS)  technology  were  developed  for  application  in 
automotive  airbag  systems  (Kovacs,  1998).  MEMS  accelerome¬ 
ters  feature  a  tiny  proof  mass  or  cantilever  beam  and  deflection 
sensing  circuitry  to  sense  both  varying  accelerations  due  to 
movement  as  well  as  the  constant  acceleration  due  to  gravity. 
Two-axis  MEMS  accelerometers  can  be  applied  to  sensing  tilt 
(pitch  and  roll)  and  have  been  used  in  gaming  controllers.  But 
nonmilitary  accelerometers  are  not  sufficiently  precise  to  sup¬ 
port  the  double  integration  of  acceleration  necessary  to  calcu¬ 
late  position  information  for  more  than  a  few  seconds.  In  such 
applications,  it  may  suffice  to  add  a  coarse  position  sensor  to 
combat  the  effects  of  drift. 
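
Both points can be sketched briefly. The first function below estimates pitch and roll from a two-axis accelerometer that is at rest, so that it measures only gravity; the second shows why double integration drifts, since a constant bias b in the measured acceleration grows into a position error of b·t²/2. The function names, axis assignments, and units are assumptions made for illustration.

```python
import math

def tilt_from_gravity(ax_g, ay_g):
    """Pitch and roll in degrees from a two-axis accelerometer.

    ax_g, ay_g: acceleration along each axis in units of g, measured
    while the device is otherwise at rest. Each axis reads
    sin(tilt angle) of gravity, so the angle is the arcsine.
    """
    def clamp(v):
        return max(-1.0, min(1.0, v))  # noise can push readings past 1 g
    pitch = math.degrees(math.asin(clamp(ax_g)))
    roll = math.degrees(math.asin(clamp(ay_g)))
    return pitch, roll

def drift_from_bias(bias_m_s2, seconds):
    """Position error (m) from double-integrating a constant bias."""
    return 0.5 * bias_m_s2 * seconds ** 2
```

Under this model, a bias of only 0.05 m/s² accumulates to 2.5 meters of position error after 10 seconds, which is why a coarse absolute position sensor is a useful complement.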

MEMS technology has also allowed the development of gyroscope
devices that sense angular velocity rather than absolute
orientation. These devices have been used in stabilizing
handheld  cameras  and  in  the  Gyromouse  product,  which  maps 
relative  change  in  gyroscope  orientation  to  the  relative  motion 
of  the  mouse  cursor. 

Magnetometers are compact, solid-state devices able to detect
the strength of the earth’s magnetic field along their principal
axis, and so are useful in determining absolute orientation
information. The output of a pair of orthogonally mounted
magnetometers held level may be combined to find magnetic north.
It is common to combine a two-axis accelerometer with a two-axis
magnetometer to “correct” the output of the magnetometer
when it is not held level (Caruso, 1997). Three-axis magnetometers
are available, but alone do not give a true 3D orientation
(e.g.,  a  magnetometer’s  reading  does  not  change  when  it  is  ro¬ 
tated  about  magnetic  north). 
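
For the level case, combining the two magnetometer outputs amounts to a single arctangent. The sketch below assumes one sensing axis points forward and the other points to the right of the device, and that the device is held level; the function name and axis convention are assumptions, and the tilt correction mentioned above is deliberately left out.

```python
import math

def heading_degrees(mx, my):
    """Heading in degrees clockwise from magnetic north.

    mx: field strength along the device's forward axis.
    my: field strength along the device's right-hand axis.
    Valid only while the device is held level (no tilt compensation).
    """
    return math.degrees(math.atan2(-my, mx)) % 360.0
```

With this convention, a device facing magnetic north reads all of the horizontal field on its forward axis (heading 0), while a device facing east sees the field pointing to its left and reports a heading of 90.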

Touch 

The  microswitch  typical  of  today’s  mouse  requires  a  certain 
amount  of  force  to  activate,  thus  allowing  a  user  to  comfortably 
rest  their  forefinger  on  the  button  without  accidentally  click¬ 
ing.  Pressure  sensors,  on  the  other  hand,  sense  a  continuous 
range  of  pressure  states.  Historically,  these  have  been  useful  in 
robotics,  where  they  play  an  important  role  in  designing  control 
systems for manipulators. Polyvinylidene fluoride (PVDF) films
and  force  sensitive  resistors  (FSRs)  are  two  inexpensive  types 
of  pressure  sensors  with  good  dynamic  range  and  form  factors 
useful  for  small  devices  and  interactive  systems.  Flexible  strain 
gauges  utilizing  the  piezoresistive  effect  have  a  resistance  re¬ 
lated  to  the  amount  of  deformation  (“bend”)  applied  to  the  sen¬ 
sor.  Such  gauges  have  been  used  as  the  basis  for  inexpensive 
glove devices that sense the deflection of each of the fingers of
the  hand. 

Capacitive  sensing  is  based  on  the  property  that  nearly  any 
object  is  capable  of  storing  electric  charge,  and  that  charge  will 
flow  between  two  objects  when  touching  or  in  close  proximity. 
A  zero-force  touch  sensor  can  be  implemented  with  a  charge 
transfer  technique,  whereby  the  capacitance  of  an  electrode  is 
estimated  by  measuring  the  time  taken  for  an  electrode  to  dis¬ 
charge  a  small  applied  charge  (Hinckley  &  Sinclair,  1999).  This 
time  drops  dramatically  when  the  user  places  a  finger  on  the 
electrode,  since  the  user’s  body  takes  on  much  of  the  charge. 
Other  capacitive  sensing  techniques  can  sense  an  object  before 
it  touches  the  electrode,  making  them  suitable  as  proximity  sen¬ 
sors  for  a  wide  variety  of  interactive  applications  (Baxter,  1996; 
Smith,  1999;  Vranish,  McConnell,  &  Mahalingam,  1991).  Multi¬ 
ple  electrodes  can  be  used  to  implement  position  sensitive  slid¬ 
ers,  such  as  the  wheel  on  the  Apple  iPod. 
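
One simple way to turn several electrodes into a position-sensitive slider is to take the centroid of the per-electrode signals. The sketch below assumes each electrode yields a reading in arbitrary counts that rises above its untouched baseline when a finger is near; the function name, the reading convention, and the touch threshold are illustrative assumptions rather than a description of any shipping product.

```python
def slider_position(readings, baseline, touch_threshold=10.0):
    """Estimate finger position along a row of electrodes.

    readings, baseline: per-electrode values in arbitrary counts,
    where a touch is assumed to raise a reading above its baseline.
    Returns a fractional electrode index (0 .. n-1), or None when
    the total signal is too small to count as a touch.
    """
    deltas = [max(0.0, r - b) for r, b in zip(readings, baseline)]
    total = sum(deltas)
    if total < touch_threshold:
        return None
    # Weighted average of electrode indices, weighted by signal strength.
    return sum(i * d for i, d in enumerate(deltas)) / total
```

Because a finger usually overlaps two adjacent electrodes, the centroid resolves positions between electrodes, giving finer resolution than the electrode count alone would suggest.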

Most  common  touch  screens  report  the  single  2D  position 
of  the  user’s  finger  touching  or  pressing  the  screen  (Sears, 
Plaisant, & Shneiderman, 1992). Resistive touch screens use two
large  transparent  conductive  overlays  that  vary  in  resistance 
over  their  length.  When  the  user  presses  on  the  screen,  the 
overlays  are  brought  into  contact,  and  a  voltage  applied  to  one 
or  the  other  overlay  is  used  to  detect  the  horizontal  or  vertical 
position  of  the  finger.  Capacitive  touch  screens  use  capacitive 
sensing  to  sense  touch,  and  the  relative  difference  in  the  charge 
sensed  at  each  corner  of  the  screen  to  determine  position.  Re¬ 
cently,  the  ability  to  more  precisely  sense  the  touch  location  and 
also  sense  the  area  and  shape  of  the  touching  object  has  been 
enabled  by  embedding  multiple  capacitive  sensors  in  the  display 
surface  (Dietz  &  Leigh,  2001;  Rekimoto,  2002).  Finally,  surface 
acoustic  wave  systems  rely  on  sensing  the  finger’s  disruption  of 
surface  acoustic  waves  applied  to  a  screen  surface  (Pickering, 
1986). 
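
The position calculations for resistive and corner-sensed capacitive screens can be sketched under idealized, perfectly linear assumptions. In the resistive case the measured voltage is taken to be proportional to position along the driven overlay; in the capacitive case the share of signal at the right-hand and bottom corners is taken to indicate the horizontal and vertical position. Function names, parameters, and the linear model are assumptions; real controllers add calibration and linearization.

```python
def resistive_axis_position(adc_value, adc_max, length_mm):
    """Position along one axis of a resistive overlay, in millimeters.

    The overlay acts as a voltage divider, so the sampled voltage
    (as a fraction of full scale) maps linearly to position.
    """
    return length_mm * adc_value / adc_max

def corner_touch_position(top_left, top_right, bottom_left, bottom_right):
    """Normalized (x, y) in [0, 1] from four corner signals.

    x grows to the right and y grows downward. Returns None when
    no signal is present (no touch).
    """
    total = top_left + top_right + bottom_left + bottom_right
    if total <= 0:
        return None
    x = (top_right + bottom_right) / total
    y = (bottom_left + bottom_right) / total
    return x, y
```

Equal signals at all four corners place the touch at the center of the screen, while a signal concentrated at one corner places it at that corner.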

Finally,  computer  vision  techniques  have  been  applied  to 
sense  touch  (Fails  &  Olsen,  2002;  Han,  2005;  Matsushita  &  Reki¬ 
moto,  1997;  Smart  Technologies,  Inc.,  2007;  Tomasi,  Rafii,  & 
Torunoglu,  2003;  Wellner,  1993;  Wilson,  2005).  Using  computer 
vision  to  sense  touch  over  an  area  has  a  number  of  advantages: 
first,  these  techniques  usually  do  not  require  a  special  instru¬ 
mented  surface  as  do  most  touch  screens.  Secondly,  vision  tech¬ 
niques  naturally  support  multiple  touch  points.  Finally,  vision 
techniques  enable  the  ability  to  detect  and  recognize  a  variety  of 
objects  besides  fingers.  For  example,  barcode-like  visual  codes 
may  be  applied  to  uniquely  identify  objects  such  as  game  pieces 
placed  on  a  surface. 

Gaze  and  Eyetracking 

Gaze  detection  refers  to  determining  where  a  person  is  looking 
and  is  principally  the  domain  of  computer  vision.  It  is  possible 
to  very  coarsely  determine  head  orientation  using  techniques 
related  to  face  detection  (Wu,  Toyama,  &  Huang,  2000),  but 
head  orientation  is  often  a  poor  indicator  of  where  someone  is 
looking.  The  goal  of  eyetracking  systems  is  precisely  determine 
where  the  user  is  looking,  or  foveating.  Usually,  these  tech¬ 
niques  are  based  on  precise  tracking  of  multiple  reflections  of 
an infrared illuminant off the eye’s cornea. For good performance,
however, eyetracking systems require careful per-user
calibration,  and  so  have  seen  limited  general  application  in  inter¬ 
active  systems  (Beymer  &  Flickner,  2003;  Jacob,  1993;  Tobii, 
2005;  Zhai,  Morimoto,  &  Ihde,  1999). 

Rather  than  determining  gaze  in  a  general  fashion  only  to 
later  match  the  gaze  direction  to  one  of  several  known  objects, 
an alternative is to determine only whether the user is looking at
the object. The detector can then be embedded in the object itself.
Furthermore, the reflection of an infrared illuminant by the
cornea  and  retina  can  be  detected  by  simple  image  processing 
techniques when the camera and infrared illuminant are colocated
(Haro, Flickner, & Essa, 2000; Shell et al., 2004).


Speech 

The  long  history  of  research  on  speech  recognition  techniques 
has  resulted  in  commodity  systems  that  bring  modern  speech 
recognition  to  anyone  with  a  PC  and  an  inexpensive  micro¬ 
phone  (Rabiner  &  Juang,  1993).  New  interactive  systems,  how¬ 
ever,  highlight  the  need  for  further  work.  Current  systems,  for 
example,  function  poorly  without  a  “close-talk”  microphone  and 
in  noisy  environments,  and  so  are  unsuited  for  use  in  such  con¬ 
texts  as  intelligent  rooms  and  mobile  scenarios. 

The  array  microphone  combines  audio  from  multiple  mi¬ 
crophones  to  address  the  problems  of  multiple  sound  sources 
and  noisy  environments.  Through  the  process  of  beamforming, 
the outputs of the multiple microphones of an array are combined
to form a single audio signal in which all but the dominant
speaker’s  signal  has  been  removed.  Beamforming  can  also  re¬ 
veal  information  about  the  position  of  the  speaker  (Tashev  & 
Malvar,  2005). 
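The simplest form of beamforming, delay-and-sum, can be sketched as follows. This is a minimal illustration only: it assumes the per-microphone arrival delays are already known and are whole numbers of samples, and the function name delay_and_sum is hypothetical. Practical array microphones typically estimate the delays, use fractional delays, and apply adaptive weighting.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   per-channel integer delay with which the target speaker's
              sound arrives (assumed known for this sketch).
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        total = 0.0
        for samples, delay in zip(channels, delays):
            idx = t + delay              # look ahead to undo the delay
            if 0 <= idx < n:
                total += samples[idx]
        out.append(total / len(channels))
    return out


# Hypothetical two-microphone example: the second microphone hears the
# same source two samples later than the first.
source = [0, 1, 0, -1, 0, 1, 0, -1]
mic0 = list(source)
mic1 = [0, 0] + source[:-2]
aligned = delay_and_sum([mic0, mic1], [0, 2])
misaligned = delay_and_sum([mic0, mic1], [0, 0])
```

With the correct delays the source is reinforced, while a sound arriving with a different delay pattern (here simulated by using the wrong delays) is attenuated. Searching over candidate delays for the strongest output is one way such a scheme can also indicate the speaker's direction.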

To  achieve  robustness,  speech  may  also  be  combined  with 
other  input  modalities  such  as  pen  gestures  (Oviatt,  2002).  Such 
approaches  usually  require  a  sophisticated  model  of  the  user’s 
interaction.  Perhaps  inspired  by  HAL  in  2001:  A  Space  Odyssey, 
some  researchers  have  proposed  incorporating  computer  vi¬ 
sion-based  lip-reading  techniques  into  the  speech  interpreta¬ 
tion  process  (Stork,  1998).  Finally,  information  such  as  intona¬ 
tion,  prosody  and  conversational  turn  taking  can  be  valuable  in 
interactive  systems  (Bilmes  et  al.,  2005;  Choudhury  &  Pentland, 
2003;  Pentland,  2004). 


Gesture 

Many notions of gesture exist in interactive systems, and thus
many  sensor  systems  are  applicable.  A  gesture  can  be  thought  of 
as  a  specific  hand  pose,  a  spatial  trajectory  of  the  hands  or  sty¬ 
lus,  pointing  or  other  motion  to  indicate  an  object,  or  the  qual¬ 
ity  of  a  motion  of  almost  any  body  part  as  it  relates  to  a  given  ap¬ 
plication  context  (McNeill,  1992). 

Many  of  the  previously  mentioned  tracking  and  movement 
sensing  technologies  have  been  applied  to  sense  and  recognize 
gestures.  For  example,  a  wireless  sensor  package  with  multi¬ 
ple  accelerometers  or  gyros  can  capture  motion  information 
useful in recognizing many gestures. Computer vision techniques
also can be used to track body parts such as the hands
and head, as well as overall motion qualities that can be interpreted
as gesture. Often such systems ease the sensing task by
requiring  the  user  to  wear  brightly  colored  gloves,  or  by  train¬ 
ing  precise  models  of  skin  color  (Brashear,  Starner,  Lukowicz, 
&  Junker,  2003). 

Pen  gestures  are  often  studied  on  the  tablet  computer,  a 
form  factor  that  often  uses  the  Wacom  electromagnetic  posi¬ 
tioning  technology.  This  system  uses  coils  embedded  in  the  pen 
and  under  the  display  to  find  pen  position,  limited  height 
(hover)  above  the  surface,  tilt,  pressure,  and  button  state.  It  can 
also  support  multiple  simultaneous  inputs.  The  gestures  them¬ 
selves  are  usually  modeled  as  simple  “flick”  gestures  or  spatial 
trajectories. 

While  computer  vision  techniques  have  been  explored  to 
recover  detailed  hand  pose  information,  gloves  with  built-in 
sensors  are  more  commonly  worn  for  this  purpose  (Baudel  & 
Beaudouin-Lafon, 1993). Early virtual reality (VR) systems, for
example,  used  magnetic  trackers  attached  to  gloves  equipped 
with  bend  sensors  to  recover  the  position,  orientation,  and  pose 
of  the  hands.  More  recently,  vision-based  professional  motion 
capture  systems  that  track  infrared  retro-reflective  balls  have 
been  used  in  a  similar  fashion.  With  such  precise  hand  shape 
information,  it  is  possible  to  point  at  an  object  with  the  index 
finger,  and  then  make  a  motion  similar  to  pulling  the  trigger  of 
a  gun  to  effect  an  action  (Vogel  &  Balakrishnan,  2005). 

Identity 

In  interactive  systems,  it  is  often  useful  to  know  the  identity  of 
an  object  or  user,  and  a  variety  of  sensing  systems  are  designed 
to  recognize  known  objects.  Object  recognition  is  an  active  re¬ 
search  area  in  computer  vision.  There  are  practical  techniques 
for  quickly  recognizing  one  of  many  known  flat  objects  such 
as  photos  or  book  covers,  for  example  (Lowe,  2004).  Computer 
vision-based  face  recognition  techniques  have  also  been  shown 
to work in fairly controlled settings (Li & Jain, 2005). General object
recognition and face recognition in uncontrolled settings
are still difficult, however.

Beyond  face  recognition,  biometrics  uses  a  variety  of  sensing 
technologies.  Fingerprint  recognition  hardware,  for  example, 
uses  optical  scanning  technology,  or  an  array  of  tiny  capacitive 
sensors,  to  construct  an  overall  picture  of  the  fingerprint.  Since 
Johansson’s early experiments demonstrating an ability to recognize
human motion from point-light displays (Johansson,
1973),  researchers  have  worked  on  gait  recognition  techniques 
from  video  (Boyd  &  Little,  2005).  Iris,  retina,  hand  geometry, 
vascular  pattern,  handwritten  signature,  and  voice  dynamics  are 
other  biometric  techniques  employing  sensing  technology 
(Sugiura  &  Koseki,  1997;  Wayman,  Jain,  Maltoni,  &  Maio,  2004). 

In  the  absence  of  reliable  techniques  to  recognize  an  object 
by  its  natural  properties,  it  is  often  useful  to  “tag”  an  object  with 
a  standard,  easily  recognizable  marker  that  reveals  the  object’s 
identity.  Visual  codes  such  as  the  ubiquitous  UPC  bar  code  sym¬ 
bols,  for  example,  are  read  by  laser  scanning  systems,  which  are 
now  small  enough  to  be  incorporated  into  mobile  devices.  Two- 
dimensional  “matrix  codes”  such  as  the  QR  code  pack  more  bits 
into the same space, and have been used in a variety of interactive
systems that recognize them by image analysis (Kato,
Billinghurst, Poupyrev, Imamoto, & Tachibana, 2000; Rekimoto
&  Ayatsuka,  2000). 

Recently,  radio  frequency  identification  (RFID)  tags  have 
gained  in  popularity.  RFID  tags  themselves  are  usually  passive, 
can  be  made  small  and  unobtrusive,  are  cheap  to  manufacture, 
and  can  be  read  at  a  distance.  A  scanning  antenna  that  emits  an 
RF  signal  reads  the  tags;  this  signal,  in  turn,  powers  the  tags  with 
enough  energy  to  respond  with  an  identification  code.  RFID  sys¬ 
tems  vary  in  terms  of  range,  power  requirements,  antenna  and 
tag  form  factors,  bit  depth,  and  so  on  (Garfinkel  &  Rosenberg, 
2005). They  are  thus  particularly  attractive  for  commercial  in¬ 
ventory  management  applications  and  have  been  applied  to  in¬ 
teractive  systems  (Want,  Fishkin,  Gujar,  &  Harrison,  1999). 

Context 

Sensors  can  provide  important  information  about  the  context  of 
the  user  or  device.  For  example,  a  computer  may  listen  in  on 
an  office  to  determine  whether  a  meeting  is  in  progress,  and  if 
so  withhold  noncritical  notifications  (Oliver  &  Horvitz,  2005).  A 
later  section  explores  the  role  of  context  in  interpreting  the  out¬ 
put  of  sensors. 

Simple  context  sensors  are  especially  useful  in  mobile  appli¬ 
cations.  Environmental  sensors  that  detect  such  information  as 
air  temperature,  lighting  quality,  and  air  pressure  may  be  more 
directly  relevant  to  the  application  than  the  absolute  location 
given  by  a  GPS  sensor  (Lester  et  al.,  2006;  Schmidt,  Beigl,  & 
Gellersen,  1999).  Context  sensors  may  be  used  to  determine  the 
user’s  activity,  or  what  the  user  is  currently  doing.  In  mobile  ap¬ 
plications,  an  inertial  sensor  may  be  used  to  determine  the  cur¬ 
rent  transportation  mode  of  the  user,  while  a  microphone  may 
be  used  to  conclude  that  the  user  is  engaged  in  a  conversation. 
An  array  of  simple  switches  placed  throughout  a  household  en¬ 
vironment,  such  as  on  kitchen  cabinets  and  drawers,  may  be  all 
that  is  needed  to  reliably  determine  the  activities  of  its  inhabi¬ 
tants  (Tapia,  Intille,  &  Larson,  2004;  Wilson  &  Atkeson,  2005). 

Affect 

In  psychology,  affect  refers  to  an  emotion  or  subjective  feeling. 
Recently  there  has  been  interest  in  applying  sensing  technol¬ 
ogy  to  allow  interactive  systems  to  respond  appropriately  to 
(and  perhaps  influence)  the  user’s  affect  (Picard,  2000).  A  sys¬ 
tem  might  respond  to  the  user’s  boredom,  interest,  pleasure, 
stress,  or  frustration  (as  in  the  example  in  the  introduction)  by 
changing  aspects  of  the  interaction. 

Like  other  multimodal  systems,  an  affective  computing  sys¬ 
tem  is  likely  to  integrate  a  variety  of  conventional  sensors.  There 
is  an  emphasis,  however,  on  the  use  of  physiological  sensors  to 
recover  physical  data  that  may  be  related  to  the  user’s  affective 
state.  For  example,  the  galvanic  skin  response  (GSR)  sensor 
measures  the  skin’s  conductivity,  which  increases  quickly  when 
the  user  is  startled  or  experiences  anxiety.  The  blood  volume 
pulse  (BVP)  sensor  measures  blood  pressure  over  a  local  region 
by  measuring  the  reflectance  of  a  bright  infrared  light,  and  can 
detect  certain  states  of  arousal  when  applied  to  the  fingertips. 
Respiration rate can be sensed by measuring the amount of
stretch in an elastic band worn around the chest. The electromyogram
(EMG) sensor measures the amount of electrical
activity  produced  when  the  muscle  it  is  placed  over  contracts, 
and  is  useful  in  detecting  jaw  clenching,  and  contraction  of  var¬ 
ious  muscles  related  to  facial  expressions.  Finally,  the  electro¬ 
cardiogram  (ECG  or  EKG)  measures  heart  rate. 

In  designing  affective  computing  systems,  it  is  often  difficult 
to determine the mapping of sensor outputs to application-specific
quantities, such as emotional state. Particularly challenging
is the task of identifying specific physical correlates for
broadly  defined  emotional  states  such  as  “frustration.”  Finally, 
physiological  sensors  are  unsuitable  for  many  applications  be¬ 
cause  of  the  difficulty  in  deploying  them:  many  must  be  placed 
on  particular  locations  on  the  body,  may  require  good  contact 
with  the  skin,  and  are  susceptible  to  differences  among  individ¬ 
ual  users  or  even  the  same  user  from  day  to  day. 

Brain  Interfaces 

Advances  in  cognitive  neuroscience  and  brain  imaging  technol¬ 
ogy  have  spurred  initial  explorations  into  interfacing  computers 
directly  with  a  user’s  brain  activity.  Much  of  the  work  is  moti¬ 
vated  by  a  desire  to  help  individuals  who  have  lost  the  motor 
skills  necessary  to  use  traditional  interfaces.  Thus,  the  goal  of 
brain-computer  interfaces  (BCI)  is  often  to  enable  users  to  ex¬ 
plicitly  manipulate  brain  activity  in  order  to  provide  input  to  a 
system.  Such  interfaces  typically  emulate  traditional  interfaces 
by  triggering  keystrokes  and  cursor  control.  However,  future  ap¬ 
plications  will  likely  take  advantage  of  the  unique  abilities  of  BCI 
systems  to  enable  completely  new  styles  of  interaction  (Hjelm  & 
Browall,  2000). 

BCI  is  generally  limited  to  brain  imaging  techniques  that 
are  noninvasive  and  do  not  require  bulky,  expensive  equip¬ 
ment.  The  electroencephalograph  (EEG)  measures  electrical 
activity  at  local  parts  of  the  brain  using  electrodes  placed  care¬ 
fully  on  the  scalp.  EEG  has  low  spatial  resolution  compared  to 
other  brain  imaging  techniques,  but  has  relatively  good  tem¬ 
poral  resolution.  Functional  near  infrared  (fNIR)  imaging 
measures  blood  flow  in  local  regions  of  the  brain  by  calculating 
the  absorption  of  infrared  light  directed  into  the  scalp.  The 
technology  suffers,  however,  from  low  temporal  resolution, 
but  obtains  higher  spatial  resolution  than  EEG  and  generates 
results  that  are  similar  to  more  impractical  blood-flow  related 
imaging  techniques  such  as  functional  magnetic  resonance 
imaging  (fMRI). 

With today’s BCI systems, users must learn how to manipulate
their brain activity effectively for a given application, either
through  operant  conditioning  or  by  executing  certain  prede¬ 
termined  cognitive  tasks  that  are  distinguishable  to  the  sensors. 
Imagining  the  performance  of  a  motor  skill,  for  example,  exer¬ 
cises  specific  parts  of  the  brain  (Curran  &  Stokes,  2003).  This 
specific  activity  may  be  detectable  by  an  imaging  technique  of 
coarse  resolution.  Alternatively,  applications  can  be  specifically 
designed  to  take  advantage  of  naturally  occurring  brain  activity, 
such  as  that  associated  with  a  flashing  light.  Under  carefully  con¬ 
trolled  conditions,  it  is  possible  to  classify  the  user’s  engage¬ 
ment  in  cognitive  tasks,  such  as  rest,  mental  arithmetic,  and 
mental rotation (Keirn & Aunon, 1990).


SIGNAL  PROCESSING 


It  is  rare  to  find  a  sensor  precisely  suited  to  a  given  sensing  task. 
Often,  sensor  output  must  be  manipulated  or  combined  with 
other  sensors  to  fit  the  needs  of  the  application.  This  section 
surveys  signal  processing  techniques  useful  in  applying  sensors 
to  input  and  recognition  tasks. 


Preprocessing 

Preprocessing  refers  to  the  earliest  stage  of  processing  sensor 
signals.  It  is  at  this  stage  that  noise  may  be  removed  from  raw 
sensor  signals,  or  signals  may  be  reduced  to  make  them  more 
compact  and  otherwise  easier  to  use  in  later  processing  stages. 

The  performance  of  a  sensor  relevant  to  preprocessing  can 
be  characterized  in  several  ways.  Accuracy  refers  to  the  degree 
to  which  the  sensor  readings  represent  the  true  value  of  what 
is  measured.  Precision,  or  resolution,  by  contrast,  refers  to  the 
extent  to  which  successive  readings  of  the  same  physical  phe¬ 
nomenon  agree  in  value.  It  is  important  to  realize  that  while  a 
device’s  resolution  is  often  measured  in  bits,  this  number  is  of¬ 
ten  distinct  from  the  number  of  bits  used  to  represent  or  store 
the  sensor’s  readings. 

In  general,  accuracy  and  precision  can  be  estimated  by  col¬ 
lecting  several  successive  measurements  (samples  or  observa¬ 
tions)  of  the  same  input,  and  computing  the  resultant  mean  and 
scatter (covariance). An accurate sensor will put the mean near
the  true  value,  and  a  precise  sensor  will  have  a  small  amount  of 
scatter  about  the  mean.  An  accurate  but  noisy  sensor  will  have 
low  precision  (high  scatter),  but  can  still  be  useful  by  the  Central 
Limit  Theorem  from  statistics:  If  we  make  some  assumptions 
about  the  noise,  and  average  a  sufficient  number  of  successive 
values  we  will  derive  a  good  estimate  of  the  true  value  (Hoel, 
Port,  &  Stone,  1971). 
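As a rough illustration of these definitions, the sketch below estimates the bias (a measure of accuracy) and the scatter (a measure of precision) from repeated readings of a fixed input, and shows averaging applied to a noisy but unbiased sensor. The simulated sensor, its noise level, and the function names are assumptions made for the example.

```python
import random
import statistics


def summarize_readings(readings, true_value):
    """Return (bias, scatter) for repeated readings of one fixed input."""
    mean = statistics.fmean(readings)
    scatter = statistics.stdev(readings)   # sample standard deviation
    return mean - true_value, scatter


def averaged_reading(read_sensor, n):
    """Average n successive readings to reduce the effect of noise."""
    return statistics.fmean(read_sensor() for _ in range(n))


# Hypothetical sensor: accurate (unbiased) but imprecise (noisy).
rng = random.Random(0)
TRUE_VALUE = 20.0


def noisy_sensor():
    return TRUE_VALUE + rng.gauss(0.0, 2.0)


bias, scatter = summarize_readings(
    [noisy_sensor() for _ in range(1000)], TRUE_VALUE)
estimate = averaged_reading(noisy_sensor, 400)
```

Under the assumption of independent, zero-mean noise, the scatter of an average of n readings shrinks roughly in proportion to the square root of n, so here the average of 400 readings should be about twenty times less variable than a single reading.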

Averaging of successive sensor readings is but one simple
way to smooth noisy data to obtain a less noisy estimate of what
is measured. Of course, the input in a real application is likely to
be  changing  over  time,  and  the  manner  in  which  this  average  is 
computed  can  vary.  For  example,  the  boxcar filter  is  simply  the 
average  of  the  last  n  samples,  and  is  thus  easy  to  implement; 
however,  the  boxcar  filter  suffers  because  it  requires  a  buffer  of 
samples  over  which  the  average  is  computed,  and  the  resulting 
estimate  will  lag  the  true  value  in  the  case  of  a  changing  signal. 
Related  to  the  boxcar  filter  is  a  technique  whereby  the  estimate 
is obtained as a weighted average of the new observation and the
previous estimate. This filter is even easier to implement and
requires no buffer of previous samples. In this technique, however,
each estimate depends on all previous observations, with
weights that decay exponentially over time, such that a very quickly moving
signal or a signal with many outliers will result in erratic changes
in the smoothed signal.
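The two smoothing schemes just described can be sketched side by side. The class names and the smoothing weight alpha are illustrative choices, not taken from any particular library. In the second filter, the weight given to an observation made k steps ago is proportional to (1 - alpha) raised to the power k, so older observations fade geometrically.

```python
from collections import deque


class BoxcarFilter:
    """Average of the last n samples; needs a buffer of n values."""

    def __init__(self, n):
        self.window = deque(maxlen=n)    # oldest sample drops out automatically

    def update(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)


class ExponentialFilter:
    """Weighted average of the new observation and the previous estimate."""

    def __init__(self, alpha):
        self.alpha = alpha               # 0 < alpha <= 1; larger reacts faster
        self.estimate = None

    def update(self, x):
        if self.estimate is None:
            self.estimate = x            # first sample initializes the filter
        else:
            self.estimate = self.alpha * x + (1 - self.alpha) * self.estimate
        return self.estimate


box = BoxcarFilter(3)
box_out = [box.update(x) for x in [3, 6, 9, 12]]
exp = ExponentialFilter(0.5)
exp_out = [exp.update(x) for x in [0, 8, 8]]
```

Both outputs trail a changing input: after the input steps from 0 to 8, the exponential filter reports 4 and then 6, approaching 8 only gradually. A smaller window or a larger alpha reduces this lag at the cost of less smoothing.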

The  Kalman  filter  is  a  popular  technique  for  filtering  time- 
varying  signals  and  can  be  used  to  both  smooth  and  predict  a 
signal.  It  is  the  optimal  linear  filter  in  the  sense  that  it  minimizes 
the  difference  between  the  estimate  and  the  true  value,  assum¬ 
ing  a  linear  model  of  the  input’s  changing  signal  and  Gaussian 
noise (Welch & Bishop, 2004). The most common Kalman filter
for estimating the position of a moving object models the object’s
state as a linear function of both velocity and the position
estimate  in  the  previous  time  step.  The  Kalman  filter  models  un¬ 
certainty  in  two  ways.  First,  there  is  some  uncertainty  in  the  lin¬ 
ear  model  (how  much  do  we  believe  that  the  linear  model  is 
correct?).  Secondly,  there  is  uncertainty  resulting  from  instan¬ 
taneous  noise  that  corrupts  the  observation  (how  precise  is  the 
sensor?).  A  properly  tuned  Kalman  filter  balances  these  uncer¬ 
tainties  appropriately  and  suffers  fewer  problems  with  lag  than, 
for  example,  a  boxcar  filter.  When  the  changing  signal  is  actu¬ 
ally  linear,  it  can  completely  defeat  lag  due  to  filtering.  An  im¬ 
properly  tuned  Kalman  filter,  however,  can  impart  unnecessary 
lag and overshoot, such that the estimate runs past the input before
correcting itself.
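A minimal one-dimensional constant-velocity Kalman filter is sketched below, with the 2 x 2 matrix algebra written out by hand so that no external library is needed. The class name, the default variances, and the simple diagonal process noise are assumptions for illustration; the two variances correspond to the two sources of uncertainty described above and are the quantities that must be tuned.

```python
class ConstantVelocityKalman:
    """1-D position tracker with state (position, velocity)."""

    def __init__(self, process_var=1e-4, measurement_var=1.0, dt=1.0):
        self.q = process_var        # how much we distrust the linear model
        self.r = measurement_var    # how noisy the sensor is
        self.dt = dt
        self.p, self.v = 0.0, 0.0   # state estimate
        # Large initial covariance: little is known before the first reading.
        self.P = [[100.0, 0.0], [0.0, 100.0]]

    def update(self, z):
        dt, (a, b), (c, d) = self.dt, self.P[0], self.P[1]
        # Predict: advance the state with the linear model, grow uncertainty.
        p_pred = self.p + dt * self.v
        P00 = a + dt * (b + c) + dt * dt * d + self.q
        P01 = b + dt * d
        P10 = c + dt * d
        P11 = d + self.q
        # Correct: blend prediction and observation using the Kalman gain.
        innovation = z - p_pred
        s = P00 + self.r
        k0, k1 = P00 / s, P10 / s
        self.p = p_pred + k0 * innovation
        self.v = self.v + k1 * innovation
        self.P = [[(1 - k0) * P00, (1 - k0) * P01],
                  [P10 - k1 * P00, P11 - k1 * P01]]
        return self.p

    def predict_ahead(self, steps):
        """Extrapolate the position a number of time steps into the future."""
        return self.p + self.v * steps * self.dt


kf = ConstantVelocityKalman()
for k in range(50):
    position = kf.update(2.0 * k)   # input moving at 2 units per step
```

Because the input here really is linear, the filter recovers the velocity and its estimate keeps up with the input rather than trailing it, and predict_ahead can be used to compensate for known latency.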

Often,  rather  than  obtaining  a  continuous  estimate  of  a 
sensed  quantity,  we  are  interested  only  in  obtaining  a  binary 
result: Is the switch on or off? When a user throws a switch in
a  real  system,  the  output  state  of  the  switch  can  change  very 
rapidly  from  off  to  on  and  back  several  times  before  settling  to 
a  single  stable  state.  Debouncing  techniques  combat  this  ef¬ 
fect;  one  simple  technique  is  to  ignore  the  switch  for  some 
small,  fixed  time  after  seeing  the  first  change  in  switch  state 
(e.g., 40 milliseconds).
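The fixed hold-off technique can be sketched as follows. The class name and the convention of passing in a millisecond timestamp are assumptions for the example; in a real system the timestamp would come from a hardware timer or system clock.

```python
class Debouncer:
    """Accept the first change in switch state, then ignore the switch
    for a fixed hold-off period while the contacts settle."""

    def __init__(self, initial_state=False, hold_ms=40):
        self.state = initial_state
        self.hold_ms = hold_ms
        self._ignore_until = None

    def update(self, raw, now_ms):
        # During the hold-off window, report the state already accepted.
        if self._ignore_until is not None and now_ms < self._ignore_until:
            return self.state
        if raw != self.state:
            self.state = raw
            self._ignore_until = now_ms + self.hold_ms
        return self.state


# Hypothetical trace: the switch bounces for a few milliseconds when
# turned on at t=1 and again when turned off at t=60.
trace = [(0, False), (1, True), (2, False), (3, True), (5, True),
         (50, True), (60, False), (61, True), (62, False), (110, False)]
switch = Debouncer()
debounced = [switch.update(raw, t) for t, raw in trace]
```

Each burst of bouncing yields exactly one reported transition. The hold-off period must be longer than the switch's settling time but short enough that a deliberate second press is not missed.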

More  difficult  is  the  situation  in  which  a  truly  continuous 
quantity  must  be  transformed  into  a  binary  signal.  This  is  com¬ 
monly  done  by  choosing  a  threshold  below  which  we  report  the 
output  as  “zero,”  and  otherwise,  “one.”  For  example,  in  using  a 
continuous-valued  tilt  sensor  such  as  an  accelerometer  to  de¬ 
termine  whether  a  tablet  PC  is  being  used  in  “portrait”  or  “land¬ 
scape”  mode,  it  is  necessary  to  transform  the  tilt  information 
into  a  binary  quantity  indicating  “portrait”  or  “landscape.”  An 
alternative  to  a  single  threshold  is  a  scheme  with  two  thresholds 
and  a  region  between  (a  deadband)  in  which  no  change  to  the 
output  is  made.  Similar  to  debouncing,  this  approach  can  pre¬ 
vent  fluctuation  of  the  output  around  a  single  threshold. 
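A two-threshold scheme of this kind might be sketched as below. The threshold values of 35 and 55 degrees and the class name are illustrative assumptions, not values from any particular device.

```python
class HysteresisThreshold:
    """Binary output with two thresholds and a deadband between them."""

    def __init__(self, low, high, initial=False):
        if low >= high:
            raise ValueError("low must be less than high")
        self.low, self.high = low, high
        self.state = initial

    def update(self, x):
        if x >= self.high:
            self.state = True
        elif x <= self.low:
            self.state = False
        # Inside the deadband the previous output is kept.
        return self.state


# Hypothetical tilt angles (degrees) hovering around 45.
angles = [30, 44, 46, 60, 46, 44, 36, 34]
detector = HysteresisThreshold(low=35, high=55)
landscape = [detector.update(a) for a in angles]
single = [a >= 45 for a in angles]      # single threshold, for comparison
```

The single threshold switches to landscape as soon as the angle creeps past 45 degrees and back again when it dips below, whereas the deadband version changes only after a decisive move past 55 degrees or below 35.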

Choosing  threshold  values  is  generally  challenging,  and  poorly 
designed  complex  systems  are  frequently  awash  in  thresholds 
that  require  modification  to  achieve  acceptable  performance. 
Ultimately,  the  thresholding  process  destroys  information;  de¬ 
pending  on  the  nature  of  subsequent  processing,  this  loss  can 
be  detrimental  to  a  system’s  overall  performance.  This  is  par¬ 
ticularly  true  for  borderline  cases  where  a  system  is  likely  to 
make  erroneous  decisions  as  the  result  of  either  an  improperly 
chosen  threshold  or  a  noisy  input.  Such  concerns  may  be  eased 
by  adopting  a  “soft”  threshold  that  reports  intermediate  results 
around  the  “hard”  threshold;  the  logistic  function  (Bishop, 
1995)  can  be  useful  in  this  approach. 
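One way to realize such a soft threshold with the logistic function is sketched below. The width parameter, which controls how gradual the transition is, and the function name are assumptions introduced for the example.

```python
import math


def soft_threshold(x, threshold, width):
    """Logistic soft threshold: 0.5 at the threshold, approaching 0 or 1
    as x moves several widths below or above it."""
    z = (x - threshold) / width
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


borderline = soft_threshold(45.0, threshold=45.0, width=5.0)
above = soft_threshold(50.0, threshold=45.0, width=5.0)
below = soft_threshold(40.0, threshold=45.0, width=5.0)
```

Later processing stages can then treat the output as a degree of confidence rather than a hard decision, so a borderline reading is passed on as borderline instead of being forced to zero or one.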

The  signal’s  effective  range  of  output,  or  dynamic  range, 
must often be considered both in thresholding and subsequent
processing. The relevant range of the property to be
sensed  must  of  course  lie  within  the  dynamic  range  of  the  sen¬ 
sor.  If  the  dynamic  range  changes — as  a  consequence,  for  ex¬ 
ample,  of  temperature  change,  lighting  change,  or  even  varia¬ 
tions  in  installation — it  may  be  necessary  to  calibrate  the  signal 
to  achieve  a  normative  range.  One  strategy  is  to  find  the  sen¬ 
sor’s  minimum  and  maximum  output  during  normal  use,  and  to 
map  these  to  some  canonical  range.  For  example,  the  output 
of  a  photodetector  may  be  mapped  to  a  range  from  zero  to  one 
by recording the value of the sensor in the darkest and brightest
conditions of regular use. Another strategy is to calibrate the
sensor  to  ground  truth  values  that  are  collected  by  some  other 
more  trusted  sensor.  Both  of  these  approaches  require  care  if 
the  sensor  is  not  linear  in  its  response;  it  may  then  be  neces¬ 
sary  to  fit  a  curve  (e.g.,  polynomial)  to  map  the  sensor  to  normal 
values.  Characteristics  such  as  dynamic  range,  linearity  of  the  re¬ 
sponse,  and  variation  due  to  temperature  are  often  detailed  in 
a  sensor’s  “data  sheet,”  available  from  the  manufacturer. 
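The first calibration strategy, mapping the observed minimum and maximum to a canonical range, can be sketched as follows. The raw values of 120 and 920 are made-up photodetector readings, and the sketch assumes a linear sensor response; a nonlinear sensor would need a fitted curve in place of the linear map.

```python
def make_range_calibrator(observed_min, observed_max):
    """Return a function mapping raw readings onto the range [0, 1]."""
    span = observed_max - observed_min
    if span <= 0:
        raise ValueError("observed_max must exceed observed_min")

    def calibrate(raw):
        normalized = (raw - observed_min) / span
        # Clamp readings that fall outside the range seen in calibration.
        return min(1.0, max(0.0, normalized))

    return calibrate


# Hypothetical raw readings recorded in the darkest and brightest
# conditions of regular use.
calibrate = make_range_calibrator(observed_min=120, observed_max=920)
mid_level = calibrate(520)
```

If the dynamic range drifts, for example with temperature, the calibrator can simply be rebuilt from fresh minimum and maximum readings.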

In  time-varying  systems,  we  are  often  concerned  with  the 
frequency  with  which  we  receive  new  samples  from  the  sen¬ 
sor.  An  overly  high  sampling  rate  can  result  in  too  much  data 
to  process,  and  can  be  reduced  by  downsampling.  By  con¬ 
trast,  many  interactive  systems  will  seem  to  lose  their  respon¬ 
siveness  if  the  overall  latency  is  greater  than  100  milliseconds. 
Latency  or  lag  refers  to  any  delay  present  in  the  sensor’s  re¬ 
sponse  to  a  change  in  the  sensed  property  of  the  world,  and 
can  limit  the  responsiveness  of  an  interactive  system  built  on 
the sensor (MacKenzie & Ware, 1993). A low sampling rate imparts
latency, which may be remedied by predictive techniques
such  as  the  Kalman  filter. 

Finally,  it  is  important  to  consider  the  true  distribution  of  any 
noise  in  filtering  and  many  subsequent  processing  techniques. 
Many  techniques — including  the  simplest  averaging,  Kalman 
filters,  and  many  probabilistic  approaches — assume  a  Gaussian 
or  uniform  distribution  of  noise.  Outliers  violating  this  assump¬ 
tion  can  be  troublesome  and  should  be  removed  by  ad  hoc 
means  or  techniques  from  the  field  of  robust  statistics  (Fischler 
&  Bolles,  1981;  Huber,  1981).  For  example,  the  median  filter, 
in  which  values  of  a  sequence  are  replaced  by  the  median  value, 
is  easy  to  implement,  yet  more  robust  than  simple  averaging. 
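A sliding-window median filter can be sketched in a few lines. The window size of three and the shortened window at the ends of the sequence are choices made for this example.

```python
import statistics


def median_filter(values, window=3):
    """Replace each value by the median of the window centered on it."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)   # window shrinks at the ends
        out.append(statistics.median(values[lo:hi]))
    return out


# A steady signal with one outlier.
readings = [1, 1, 100, 1, 1]
filtered = median_filter(readings)
averaged_middle = sum(readings[1:4]) / 3      # three-sample mean, for comparison
```

The outlier disappears entirely from the median-filtered sequence, whereas a three-sample average centered on it yields 34, far from the underlying value.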

Feature  Selection 

In  the  context  of  recognition,  a  feature  can  refer  to  a  particular 
sensor  or  a  piece  of  information  derived  from  one  or  more  sen¬ 
sors,  or  even  derived  from  other  features.  Often  thought  of  as 
a  preprocessing  step,  feature  selection  refers  to  the  process  of 
determining  which  features  are  to  be  computed  from  the  raw 
inputs  and  passed  to  the  next  level  of  processing.  Appropriate 
feature  selection  can  sometimes  make  difficult  recognition 
problems  easy.  For  example,  one  somewhat  unusual  approach 
to  detecting  faces  in  video  is  to  detect  eye-blinking  patterns. 
Blinking  provides  a  signal  that  is  easily  detected  by  simple  image 
processing  operations,  and  is  further  supported  by  the  fact  that 
both  eyes  blink  together  and  are  arranged  in  a  symmetric  spatial 
configuration on the face. Blinking thus may be highly diagnostic
for faces (Crowley & Berard, 1997).

Feature selection begins by determining a set of sensors relevant
to the task at hand, often with knowledge of the task or
domain.  In  the  course  of  development  of  a  new  sensing-based 
system,  it  can  be  beneficial  to  incorporate  as  many  physical  sen¬ 
sors  as  possible,  with  the  idea  that  subsequent  feature  selec¬ 
tion  processes  will  indicate  which  sensors  are  necessary  and  suf¬ 
ficient.  Furthermore,  a  number  of  sensors  taken  in  combination 
may  provide  the  best  overall  performance. 

Having  selected  a  number  of  sensors,  often  the  next  step  is 
to compute derived features from the raw sensor inputs. For example,
when an unimportant and unpredictable offset is present
in the sensor’s raw input levels, it may be easier to work
with its derivative instead. Again, these features are determined
in  an  ad  hoc  fashion,  in  light  of  special  domain  or  application 
knowledge.  For  example,  early  stylus  gesture  recognition  algo¬ 
rithms  relied  on  simple  derived  features  such  as  the  (a)  initial 
angle  of  the  stroke,  (b)  maximum  speed  obtained,  (c)  size  of 
the  containing  bounding  box,  (d)  duration  of  the  stroke,  (e) 
amount  of  change  in  curvature  along  the  gesture,  and  so  on 
(Rubine,  1991).  Early  face  recognition  approaches  relied  on  fea¬ 
tures  such  as  the  distances  between  facial  features  such  as  the 
eyes,  nose,  and  mouth  (Zhao,  Chellappa,  Phillips,  &  Rosenfeld, 
2003).  In  the  domain  of  audio,  the  Linear  Predictive  Coding 
(LPC)  and  the  Fourier  transform  are  useful  derived  feature 
spaces.  The  Fourier  transform  in  particular  has  general  applica¬ 
bility  to  signals  with  periodicity.  For  example,  the  Fourier  trans¬ 
form  of  a  body-worn  accelerometer  may  reveal  patterns  of  the 
user’s  walking  (Hinckley  et  al.,  2000). 
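As an illustration of a Fourier-derived feature, the sketch below finds the dominant frequency in an accelerometer trace using a direct discrete Fourier transform written in plain Python. The synthetic signal (a 2 Hz oscillation on top of a constant gravity offset, sampled at 50 Hz) and the function name are assumptions; a practical system would more likely use a fast Fourier transform from a numerical library.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest periodic component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]    # remove the constant offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate_hz / n


# Synthetic stand-in for four seconds of vertical acceleration while
# walking at two steps per second.
RATE_HZ = 50
signal = [9.8 + 0.5 * math.sin(2 * math.pi * 2.0 * t / RATE_HZ)
          for t in range(4 * RATE_HZ)]
step_rate = dominant_frequency(signal, RATE_HZ)
```

The resulting single number, about two cycles per second here, is a far more compact and stable feature for recognizing walking than the 200 raw samples from which it is derived.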

Often,  it  is  desirable  to  put  spatial  features  in  a  local  coordi¬ 
nate  system.  For  example,  a  gesture  recognition  system  may  be¬ 
gin  with  the  position  of  the  head  and  hands  of  the  user.  To  re¬ 
move  the  effects  of  the  person  moving  about  the  room,  the 
head  position  may  be  subtracted  from  the  position  of  each 
hand,  yielding  a  “head-centric”  coordinate  system.  In  the  spirit 
of  asymmetric  bimanual  models  of  gesture,  we  might  also  con¬ 
sider  a  coordinate  system  centered  on  the  nondominant  hand 
(Guiard,  1987;  Hinckley  et  al.,  1998).  In  many  cases  switching 
to  a  local  coordinate  system  eliminates  a  large  source  of  the  ir¬ 
relevant  variation  present  in  the  raw  signal,  thus  easing  subse¬ 
quent  modeling,  and  can  be  superior  to  using  only  derivative 
information. 
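The head-centric transformation amounts to a vector subtraction, as in the sketch below. The coordinates are invented for the example, and the sketch removes only translation; compensating for the direction the user is facing would additionally require a rotation.

```python
def to_local(point, origin):
    """Express a 3-D point relative to a moving origin such as the head."""
    return tuple(p - o for p, o in zip(point, origin))


# The same pose performed at two different places in the room
# (hypothetical coordinates in meters).
head_a, hand_a = (1.0, 1.7, 2.0), (1.3, 1.2, 2.1)
head_b, hand_b = (4.0, 1.7, 0.0), (4.3, 1.2, 0.1)

local_a = to_local(hand_a, head_a)
local_b = to_local(hand_b, head_b)
```

Although the raw hand positions differ by several meters, the two head-centric positions agree, so a model trained on one is directly applicable to the other.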

If  there  is  a  large  number  of  sensors,  or  if  each  sensor  is  of 
high  dimension  (e.g.,  images  taken  from  video  cameras),  each 
sensor’s  value  is  unlikely  to  be  statistically  independent  from  one 
another.  To  remove  redundancy  and  make  the  input  a  more 
manageable  size,  some  form  of  dimensionality  reduction  may 
be  used  to  transform  each  observation  into  one  of  lower  dimen¬ 
sion.  One  broad  class  of  techniques  involves  approximating  each 
sample as the linear combination of a small number of basis functions;
the coefficients in this linear combination form the corresponding
sample in the new, smaller feature space. Principal
Components Analysis (PCA) is a popular technique and is the
optimal linear technique in the mean-square error sense. PCA
finds orthogonal vectors (“principal components,” or eigenvectors)
in the input space as basis vectors, each successive vector
accounting for as much of the remaining variance (scatter) in the
data set as possible. PCA has been used in a wide variety
of recognition systems, such as face recognition from images,
where often fewer than 50 components are necessary to perform
recognition (Pentland, Moghaddam, & Starner, 1994).
Today,  there  are  numerous  techniques  related  to  PCA,  many  of 
which  are  more  suited  to  classification  (Fodor,  2002). 
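The following sketch finds only the first principal component, using power iteration on the sample covariance matrix in plain Python. It is meant to convey the idea rather than serve as a practical implementation: the function names are hypothetical, the starting vector is assumed not to be orthogonal to the answer, and real systems would normally use an eigenvalue or singular value decomposition routine from a numerical library to obtain all components.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (mean, unit vector) for the direction of greatest variance."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix of the centered data.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d            # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break                           # no variance along this direction
        v = [x / norm for x in w]
    return mean, v


def project(sample, mean, component):
    """Reduce a sample to its single coefficient along the component."""
    return sum((x - m) * c for x, m, c in zip(sample, mean, component))


# Two perfectly correlated sensors: all variation lies along (1, 2).
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, component = first_principal_component(data)
coefficients = [project(s, mean, component) for s in data]
```

In this contrived case the two-dimensional readings are summarized without loss by one coefficient each, which is the sense in which redundancy among sensors is removed.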

Where the number of input features is not large, automatic
feature selection techniques may be used to determine the subset
of features that matter. While the topic is an active area of research,
one technique of general applicability is cross validation
(Bishop, 1995; Mitchell, 1997). The simplest form of cross
validation  is  the  holdout  method,  which  begins  by  dividing  the 
data  set  into  two  halves.  Several  variations  of  the  model  are  then 
trained on one half of the data and tested on the other. The variation
with the best performance on the test set is selected as the
best model. In the case of feature selection, each variation employs
a particular subset of the original input features; after trying
all such subsets, we are left with the best performing subset
of features. For more than a handful of original features this
approach  will  be  impractical,  so  various  greedy  approximations 
are  often  used,  such  as  starting  with  the  full  set  and  eliminating 
one  at  a  time,  or  successively  adding  features  from  a  small  set. 
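Exhaustive feature selection by the holdout method can be sketched as follows. The nearest-centroid classifier is simply a stand-in chosen for brevity (any trainable model could take its place), and the tiny data set, in which feature 0 is informative and feature 1 is large-magnitude noise, is invented for the example.

```python
from itertools import combinations


def train_centroids(rows, labels, subset):
    """Mean of each class over the chosen feature subset."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(subset))
        for i, f in enumerate(subset):
            acc[i] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict(centroids, row, subset):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    def dist2(center):
        return sum((row[f] - center[i]) ** 2 for i, f in enumerate(subset))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def holdout_accuracy(train, test, subset):
    centroids = train_centroids(*train, subset)
    rows, labels = test
    hits = sum(predict(centroids, r, subset) == y
               for r, y in zip(rows, labels))
    return hits / len(rows)


def best_subset(train, test, n_features):
    """Try every non-empty subset; ties go to the smaller subset."""
    candidates = (c for k in range(1, n_features + 1)
                  for c in combinations(range(n_features), k))
    return max(candidates, key=lambda s: holdout_accuracy(train, test, s))


train = ([(0.0, 50), (0.2, -40), (1.0, -50), (0.8, 40)], ["a", "a", "b", "b"])
test = ([(0.1, -45), (0.9, 45), (0.1, 45), (0.9, -45)], ["a", "b", "a", "b"])
chosen = best_subset(train, test, n_features=2)
```

With n features there are 2^n - 1 non-empty subsets to try, which is why the greedy variants that add or remove one feature at a time are used beyond a handful of features.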

Classification  and  Modeling 

Classification  refers  to  the  process  of  determining  which  of  sev¬ 
eral  known  classes  a  given  sample  or  observation  is  drawn  from, 
and  is  typically  the  means  by  which  a  novel  input  is  recognized. 
A  classifier  can  be  used,  for  example,  to  recognize  which  of  sev¬ 
eral  known  gestures  the  user  has  performed  by  the  motion  of 
the  pen  on  a  tablet.  Detection  refers  to  determining  the  pres¬ 
ence  of  an  observation  drawn  from  a  known  class  against  a  back¬ 
ground  of  many  other  observations.  The  distinction  between 
classification  and  detection  is  often  rather  semantic.  For  exam¬ 
ple,  a  face  detection  system  will  determine  if  there  is  any  face 
present in an image, while a face recognition system will
determine the identity of the detected face. While both operations
can be thought of as classification, often they call for different
techniques.

When simple thresholding or feature selection operations
are  not  enough  to  transform  a  group  of  sensor  readings  into  a 
signal  that  is  readily  consumed  by  the  application,  it  is  often  nec¬ 
essary  to  exploit  more  sophisticated  classification  and  modeling 
techniques.  These  techniques  are  particularly  useful  in  cases 
where  it  is  necessary  to  use  many  sensors  together,  and  when 
there  are  dependencies  among  them  that  are  difficult  to  untan¬ 
gle  by  simple  inspection.  Modeling  refers  to  the  choices  in  rep¬ 
resentation  of  sensor  values,  their  dependencies,  and  the  com¬ 
putations  performed  on  them. 

There  are  many  ways  to  classify  a  new  sensor  observation  as 
belonging  to  one  of  several  known  classes.  Approaches  in  which 
a  model  is  trained  automatically  from  a  set  of  training  examples 
are  the  most  relevant  to  sensor-based  systems.  These  tech¬ 
niques  are  typically  the  domain  of  machine  learning.  The 
canonical  introductory  technique  is  Fisher’s  linear  discrimi¬ 
nant  (Bishop,  1995)  in  which  a  closed-form  training  procedure 
determines  a  line  in  the  feature  space  that  optimally  divides  two 
classes of training data. A new, unseen sample may then be
classified by determining on which side of the line the sample lies.
Beyond the two-class case, samples are often classified by
computing the likelihood that the sample was drawn from each
class, and choosing the class with the largest likelihood.
Assuming a new observation $x$ and classes $C_i$, we choose $C^*$ as
the class that maximizes $P(C_i)\,P(x \mid C_i)$. The prior $P(C_i)$
indicates our belief that a sample is drawn from a class before we
even record it, and is often ignored. There are a variety of
techniques to derive such probabilistic models from a set of examples.
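
As a small illustration of the decision rule above, the following
sketch picks the class that maximizes the product of prior and
likelihood. The one-dimensional Gaussian likelihoods, the class
names, and all numeric values are assumptions chosen for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as the class likelihood P(x|C)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify(x, classes):
    """Return the class C that maximizes P(C) * P(x|C).
    `classes` maps a class name to (prior, mean, std)."""
    def score(name):
        prior, mean, std = classes[name]
        return prior * gaussian_pdf(x, mean, std)
    return max(classes, key=score)

# Hypothetical pressure-sensor classes; the numbers are illustrative only.
classes = {"light touch": (0.7, 1.0, 0.5), "firm press": (0.3, 3.0, 0.5)}
near_light = classify(1.2, classes)
near_firm = classify(2.9, classes)
midpoint = classify(2.0, classes)   # equal likelihoods, so the prior decides
```

At the midpoint between the two means the likelihoods are equal, so
the outcome is decided entirely by the prior; ignoring the prior
amounts to treating all classes as equally probable.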

Common  to  all  these  approaches  is  the  ability  to  characterize 
the  quality  of  a  recognition  result.  A  sample  that  is  correctly  clas¬ 
sified  as  belonging  to  a  given  class  is  a  true  positive.  A  sample 
that  is  incorrectly  classified  as  belonging  to  the  class  is  a  false 
positive.  A  sample  that  is  correctly  classified  as  not  belonging 
to a given class is a true negative, while a sample that is
incorrectly classified as not belonging is a false negative. In the
context of interactive systems, a false negative might correspond
to  when  the  user  provides  an  input  and  the  system  fails  to  rec¬ 
ognize  it.  A  high  false-negative  rate  can  lead  to  an  overall  im¬ 
pression  of  unresponsiveness,  or  a  sense  on  the  part  of  the  user 
that  they  are  doing  something  wrong.  False  positives,  on  the 
other  hand,  may  correspond  to  when  the  system  takes  an  action 
when  the  user  had  no  such  intention,  and  can  lead  to  an  im¬ 
pression  that  the  system  is  erratic  or  overly  sensitive  (Zhai  & 
Bellotti,  2005). 

In  most  situations,  a  clear  trade-off  exists  between  the  rate  of 
true  positives  and  false  positives.  Lower  the  bar  for  acceptance 
to  increase  the  true  positive  rate,  and  the  rate  of  false  positives 
is  likely  to  increase.  In  the  context  of  interactive  systems,  this 
trade-off is especially important to consider when developing
criteria for when the system takes action as the result of a
recognition process. The receiver operating characteristic curve
(ROC  curve)  plots  true  positive  rate  against  false  positive  rate 
and  best  characterizes  this  trade-off  (see  Fig.  10.1).  The  ROC 
curve  is  also  an  established  method  to  compare  the  performance 
of  recognition  techniques,  without  regard  to  any  application- 
specific  choice  on  how  tolerant  we  are  to  false  positives. 
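
An ROC curve of the kind described above can be traced by sweeping
the acceptance threshold over a recognizer's scores. The sketch
below assumes the recognizer returns a numeric score per sample and
that labels are 1 for the target class and 0 otherwise; the scores
and labels shown are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering the acceptance threshold from strict to lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]   # a threshold above every score accepts nothing
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Illustrative recognizer scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

Each point corresponds to one choice of threshold, so an application
can pick the point whose false positive rate it is willing to
tolerate.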

In  a  given  application,  it  is  also  instructive  to  break  out  clas¬ 
sification  performance  by  each  class.  The  confusion  matrix 
summarizes  how  a  labeled  test  set  is  classified  by  each  class,  and 
may  reveal  that  much  of  the  overall  classification  error  can  be 
traced  to  errors  classifying  observations  from  a  small  number 
of  classes.  This  can  thus  inform  design  of  the  classifier  or  the  set 
of  application-relevant  categories.  Boosting  is  one  technique  in 
which  misclassified  samples  are  emphasized  in  subsequent 
training  of  the  model  to  reduce  the  overall  error  rate  (Schapire, 
2003). 
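
A confusion matrix can be tabulated directly from a labeled test
set. In the following sketch, rows are true classes and columns are
predicted classes; the gesture names and labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Return (classes, matrix) where matrix[i][j] counts samples of true
    class classes[i] that were classified as classes[j]."""
    counts = Counter(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    return classes, [[counts[(t, p)] for p in classes] for t in classes]

# Hypothetical gesture test set.
truth = ["flick", "flick", "roll", "roll", "roll", "tap"]
predicted = ["flick", "roll", "roll", "roll", "flick", "tap"]
classes, matrix = confusion_matrix(truth, predicted)
```

Here every error involves confusing "flick" with "roll", which
suggests that those two gestures, rather than the classifier as a
whole, deserve attention.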

FIGURE 10.1. ROC curves illustrate the trade-off between the
rate of true positives and false positives, and can be useful in
comparing recognition techniques. Here we see that for a given
tolerable rate of false positives, Technique 2 yields better
recognition performance than Technique 1. (Horizontal axis: false
positive fraction; curves: Technique 1, Technique 2.)

The naive Bayes classifier assumes that, given the class, the value
of a given feature is independent of all the others. This property of
conditional independence may not actually apply to the data set,
but its assumption simplifies computation and often may not
matter in practice (hence the label “naive”). Assuming
observations of the form $x = (x_1, x_2, \ldots, x_n)$, the posterior
probability of a class $C$ is $P(C \mid x) \propto P(C)\,P(x \mid C)$
by the Bayes rule. Naive Bayes treats each feature as independent:
$P(C \mid x) \propto P(C) \prod_i P(x_i \mid C)$. Because each
feature is modeled independently, naive Bayes is particularly suited
to high-dimensional feature spaces and large data sets. Each feature
can be continuous or discrete. Discrete variables are often modeled
as a histogram (or probability mass function), while continuous
variables can be quantized or binned to discrete values, or modeled
as a Gaussian or other parametric distribution.
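
The naive Bayes model above can be sketched with each continuous
feature modeled as a Gaussian. This is a minimal illustration; the
function names, the small variance floor, and the training samples
are assumptions made for the example.

```python
import math
from collections import defaultdict

def train_gaussian_nb(samples):
    """samples: list of (feature_vector, label). Returns, per class, the
    prior and the per-feature means and variances."""
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps the variance away from zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(samples), means, variances)
    return model

def log_score(model, x, label):
    """log P(C) + sum_i log P(x_i|C): the unnormalized log posterior."""
    prior, means, variances = model[label]
    total = math.log(prior)
    for v, m, var in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    return total

def predict(model, x):
    return max(model, key=lambda c: log_score(model, x, c))

# Made-up two-feature training set with two classes.
samples = [([1.0, 5.0], "a"), ([1.2, 4.8], "a"), ([0.8, 5.2], "a"),
           ([3.0, 1.0], "b"), ([3.2, 0.8], "b"), ([2.8, 1.2], "b")]
model = train_gaussian_nb(samples)
```

Working with sums of logarithms rather than products of
probabilities avoids numerical underflow when there are many
features.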

A  number  of  other  popular  classification  techniques  do  not 
have  obvious  probabilistic  interpretations.  The  neural  network, 
for  example,  is  best  thought  of  as  a  function  approximation 
technique.  Often,  as  applied  to  classification,  the  input  of  the 
approximated  function  is  the  observation  itself,  and  the  output 
is a vector whose $i$th component indicates belief that the
observation belongs to the $i$th class.

Decision  trees  can  be  a  powerful  classification  technique  that 
leads  to  very  compact  representations  for  some  problems.  Each 
node  of  a  decision  tree  corresponds  to  an  assertion  about  the 
value  of  a  feature  in  the  observation,  and  yields  a  split  in  the  data 
set.  The  leaves  of  the  tree  then  indicate  the  class  to  which  the 
observation  belongs.  Classification  of  a  new  sample  is  then 
rather  like  the  children’s  game  of  “twenty  questions.”  Training 
the  model  involves  determining  how  to  make  the  splits  to  opti¬ 
mize  classification  performance,  and  possibly,  the  size  of  the 
tree (Breiman, Friedman, Olshen, & Stone, 1984).

The  Support  Vector  Machine  (SVM)  is  a  powerful  modern 
alternative  to  the  Fisher  linear  discriminant  (Cristianini  & 
Shawe-Taylor,  2000).  SVMs  determine  the  split  between  the  two 
classes  to  maximize  performance  on  unseen  data.  Furthermore, 
SVMs  gain  much  of  their  power  by  allowing  nonlinear  splits  of 
the feature space, but are often thought of as being
computationally intensive (though see Platt, 1999).

Where  the  conditional  independence  assumption  of  naive 
Bayes  is  too  strong,  other  techniques  that  directly  model  the 
joint  probabilities  are  applicable.  For  example,  a  mixture  of 
Gaussians  uses  a  sum  of  multiple  Gaussian  distributions  to 
model arbitrarily complex joint distributions:
$P(x \mid C) = \sum_i P(\omega_i)\,P(x \mid \omega_i)$, where
$P(x \mid \omega_i)$ is Gaussian with mean $\mu_i$ and covariance
$\Sigma_i$. Such mixture models may be trained by the expectation
maximization  (EM)  algorithm  (Mitchell,  1997;  Neal  &  Hinton, 
1999).  The  EM  algorithm  is  very  similar  to  clustering  approaches 
such  as  k-means,  in  which  k  points  in  the  feature  space  are  cho¬ 
sen  as  representative  of  the  overall  set  of  samples. 
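
To make the EM procedure concrete, the following sketch fits a
one-dimensional mixture of two Gaussians. It is a simplified
illustration under stated assumptions: scalar data, a fixed number
of iterations, a variance floor (min_var), and hand-picked starting
values, none of which come from the cited sources.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_mixture(data, means, variances, weights, iterations=20, min_var=1e-3):
    """Fit a 1-D mixture of Gaussians with EM from the given starting point.
    The parameter lists are updated in place and also returned."""
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                 for j in range(k)]
            total = sum(p) or 1e-300      # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(
                r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj)
    return means, variances, weights

# Two well-separated clusters; the starting means are deliberately off.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights = em_mixture(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

The resemblance to k-means is visible in the structure: the E-step
is a soft version of assigning samples to the nearest center, and
the M-step recomputes each center from the samples assigned to it.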

Often,  there  are  advantages  in  treating  some  subset  of  the 
variables  as  conditionally  independent  from  others.  For  exam¬ 
ple,  a  full  joint  probability  distribution  can  require  a  lot  of  data 
to  train;  there  may  be  clear  constraints  from  the  application  that 
imply  conditional  independence,  and  there  may  be  some  subset 
of  the  variables  that  are  most  effectively  modeled  with  one  tech¬ 
nique  while  the  rest  are  best  modeled  with  another.  In  this  case, 
it  may  be  helpful  to  selectively  apply  conditional  independence 
to break the problem into smaller pieces. For example, we
might take $P(x \mid C) = P(x_1, x_2 \mid C)\,P(x_3 \mid C)$ for a
three-dimensional feature space, model $P(x_1, x_2 \mid C)$ with a
mixture of Gaussians, and $P(x_3 \mid C)$ as a histogram. This overall
model amounts to an assertion of conditional independence between
$x_3$ and the joint space of $x_1$ and $x_2$.

This  modularity  afforded  by  assumptions  of  conditional  in¬ 
dependence  is  taken  to  its  logical  conclusion  in  the  Bayesian 
network,  which  is  commonly  represented  as  a  directed  acyclic 
graph, where each node corresponds to a random variable $x_i$
with probability distribution $P(x_i \mid \mathrm{parents}(x_i))$,
and each variable is conditionally independent of its
non-descendants given its parents (Jensen, 2001). Nodes in a Bayesian
network for which we have observations are called evidence nodes,
while others are considered hidden. Observations may be entered into
the network, and through an inference procedure, the likelihood of
the observation may be calculated, as well as posterior distributions
over  any  hidden  nodes.  With  Bayesian  networks,  designers  may 
craft  complex  probability  models  without  becoming  mired  in 
mathematical  notation,  and  software  packages  allow  graphical 
manipulation  of  the  networks  directly  (Kadie,  Hovel,  &  Horvitz, 
2001). 

The  dynamic  Bayesian  network  (DBN)  models  time-varying 
sequences,  and  thus  is  relevant  to  systems  in  which  interactions 
take  place  over  durations  of  time.  A  DBN  can  be  thought  of  as 
a  Bayesian  network  where  certain  nodes  depend  on  the  same 
Bayesian network instantiated on the previous time slice.
Dependencies can be within a time slice or across time slices.
For example, a state variable may depend on its past value as
$P(x_t \mid x_{t-1})$. Such relationships with past values of the
same variable can encode a probabilistic finite state machine or
Markov model, where the distribution $P(x_t \mid x_{t-1})$ is
considered a transition matrix. With this dependency on random
variables in the past, DBNs can effectively encode a time-varying
state or “memory” that may be relevant for an interactive
application. For example, it can be useful in modeling interactions
composed of a
sequence  of  steps,  or  where  application  state  itself  can  be  mod¬ 
eled  as  a  finite  state  machine.  Finally,  by  making  a  strong  de¬ 
pendency  on  the  immediate  past,  the  model  can  be  given  some 
inertia  or  “smoothed.” 

One  popular  special  case  of  the  DBN  is  the  Hidden  Markov 
Model  (HMM),  often  used  to  model  time-varying  signals  such  as 
speech, gesture, pen strokes, and so on. HMMs model
observations $y_t$ conditioned on a state variable $x_t$, which
evolves over time as $P(x_t \mid x_{t-1})$. As with many
probabilistic models, HMMs are
generative  in  nature,  meaning  one  of  the  ways  we  can  under¬ 
stand  them  is  to  consider  “running  them  forward”  to  generate 
new  observations:  an  HMM  can  be  thought  of  as  a  stochastic  fi¬ 
nite  state  machine  (Markov  model)  that  generates  observations 
drawn  from  a  distribution  associated  with  each  state.  With  each 
time  step,  the  Markov  model  takes  a  transition  to  a  new  state. 
In the inference (recognition) process, the posterior
distribution over the hidden state variable $x_t$ is computed from
the observation sequence (Rabiner, 1989). HMMs have been applied
to many types of observation sequences, including hand gestures,
handwriting  recognition,  speech  recognition,  and  so  on.  In  the 
simplest  application  paradigm,  a  separate  HMM  is  trained  for 
each  class. 
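
The inference step and the one-model-per-class paradigm can be
sketched with the forward algorithm for a discrete HMM. The two
models below, their names, and all probabilities are hypothetical
values chosen so the arithmetic is easy to follow.

```python
def forward(observations, initial, transition, emission):
    """Forward algorithm for a discrete HMM. Returns the likelihood of the
    observation sequence and the posterior over the final hidden state.
    transition[r][s] = P(x_t = s | x_{t-1} = r); emission[s][o] = P(y_t = o | x_t = s)."""
    n = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [emission[s][obs] * sum(alpha[r] * transition[r][s] for r in range(n))
                 for s in range(n)]
    likelihood = sum(alpha)
    return likelihood, [a / likelihood for a in alpha]

def classify_sequence(observations, models):
    """Pick the class whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward(observations, *models[name])[0])

# Two hypothetical two-state, two-symbol gesture models.
initial = [1.0, 0.0]
transition = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "gesture_a": (initial, transition, [[0.9, 0.1], [0.2, 0.8]]),
    "gesture_b": (initial, transition, [[0.1, 0.9], [0.8, 0.2]]),
}
likelihood_a, posterior_a = forward([0, 1], *models["gesture_a"])
best = classify_sequence([0, 1], models)
```

For long sequences the forward variables shrink toward zero, so
practical implementations usually rescale them at each step or work
in the log domain.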

Much  of  the  attraction  of  Bayesian  networks  is  due  to  their 
flexibility  to  implement  complex  probabilistic  dependencies. 
Many  probability  models  may  be  thought  of  as  particular  Bayesian 
networks.  Naive  Bayes,  mixture  of  Gaussians,  Hidden  Markov 
Models, and Kalman filters are among the models that have been
shown to be special cases of the Bayesian network (Jordan,
1999).  The  structure  of  the  Bayesian  network  is  often  deter¬ 
mined  by  hand  using  application-specific  knowledge,  while  the 
various  distributions  may  be  tuned  by  training  from  data.  Parts 
of  a  Bayesian  network  may  be  completely  handcrafted,  derived 
from  domain  knowledge  in  the  same  manner  as  expert  systems. 
Other  sections  of  the  network  may  in  fact  be  Hidden  Markov 
Models, Kalman filters, mixtures of Gaussians and various
hybrids, such as a mixture of Kalman filters. The automatic
learning of the structure of the network itself is an active area of
research (Heckerman, Geiger, & Chickering, 1994).

Many  of  the  techniques  outlined  above  can  be  applied  to  the 
more  generic  task  of  modeling,  where  we  are  interested  in 
more  than  classification  results.  For  example,  a  Bayesian  net¬ 
work  that  fully  models  the  user’s  interactions  with  a  mobile  de¬ 
vice  might  include  a  variable  representing  the  user’s  location. 
The  value  of  this  variable  will  be  hidden  (unknown)  if  there  is 
no  sensor  to  directly  observe  the  user’s  location.  We  may,  how¬ 
ever,  compute  the  posterior  distribution  of  the  variable  after 
several  other  kinds  of  observations  are  entered  in  the  network 
and  find  that  the  user’s  location  is  sometimes  known  with  some 
precision  (e.g.,  the  device  recognizes  the  nearest  wireless  access 
point  with  a  known  location).  Not  only  is  this  model  useful  in 
deducing  the  user’s  location,  it  also  enables  other  parts  of  the 
model  to  exploit  this  knowledge  even  if  we  are  not  ultimately  in¬ 
terested  in  location  information. 

Finally, sometimes the signal processing task for an
application is better thought of as approximating a function that
directly maps a set of inputs to outputs. Techniques such as neural
networks, radial basis function networks, and other manifold
learning  techniques  can  be  useful  in  learning  mappings  from 
sensor  inputs  to  application-specific  quantities.  Such  techniques 
can  be  particularly  useful  in  transforming  raw,  high-dimensional, 
nonlinear  sensor  readings  into  simple  calibrated  outputs  useful 
in  an  application.  For  example,  in  carefully  controlled  circum¬ 
stances,  it  is  possible  to  map  images  of  a  face  to  gaze  angle  by 
providing  a  number  of  face  image  and  gaze-angle  pairs.  A  func¬ 
tion  approximation  approach  can  interpolate  over  these  exam¬ 
ples  to  map  new  images  to  gaze  angle  (Beymer  &  Poggio,  1996). 


AN  EXAMPLE  SYSTEM 


The  following  example  demonstrates  a  number  of  the  tech¬ 
niques  described  above  in  a  working,  interactive,  sensor-based 
system.  After  motivating  the  overall  design  of  the  system,  a  num¬ 
ber  of  aspects  of  hardware  design  are  illustrated.  Also  described 
are  subsequent  signal  processing  steps  such  as  sensor  fusion, 
the  application  of  Bayesian  networks  for  modeling,  and  gesture 
and  speech  recognition. 

In  the  design  of  intelligent  rooms,  the  issue  of  how  the 
room’s  inhabitants  might  best  interact  with  the  room  often 
arises.  The  traditional  notions  of  desktop  computing  or  the  mul¬ 
tiple,  incompatible,  button-laden  remote  controls  typical  of  con¬ 
sumer  electronics  are  perhaps  antithetical  to  the  seamless  and 
untethered  experience  that  is  a  main  feature  of  the  vision  of  in¬ 
telligent  environments.  One  popular  notion  of  how  users  could 
control an intelligent room is borrowed directly from Star Trek:
The user of the room merely speaks to it, as in, “Computer, turn
on  the  lights.” 

In  the  development  of  one  intelligent  room  (Brumitt,  Meyers, 
Krumm,  Kern,  &  Shafer,  2000),  a  user  study  was  conducted  to 
determine  how  real  users  might  want  to  control  multiple  lights 
throughout  the  space  (Brumitt  &  Cadiz,  2001).  A  Wizard  of  Oz 
paradigm  was  adopted  so  that  the  study  would  not  be  limited 
to  designs  already  implemented.  The  experimenter,  seated  be¬ 
hind  one-way  mirrored  glass,  operated  the  lighting  controls 
manually  in  response  to  user  actions.  The  users  were  then  ex¬ 
posed  to  multiple  ways  of  controlling  the  lights:  (a)  a  traditional 
GUI  list  box,  (b)  a  graphical  touch  screen  display  depicting  a  plan 
view of the room with lights, (c) two speech-only systems,
and (d) a speech- and gesture-based system. The study
concluded that, like Captain Kirk, users preferred to use speech to
control the lights, but that the vocabulary used to indicate which
light to control was highly unpredictable. This variability in
vocabulary poses a problem for a purely speech-based interface.

Interestingly,  the  majority  of  subjects  looked  at  the  light  they 
were  trying  to  control  while  speaking.  This  observation  suggests 
that  an  intelligent  room  could  resolve  ambiguity  in  spoken  com¬ 
mands  (e.g.,  which  light  to  control)  by  using  computer  vision 
techniques  to  determine  the  user’s  gaze,  at  least  where  the  de¬ 
vice  under  control  is  within  sight.  There  are  a  number  of  general 
approaches  to  computing  gaze,  but  each  has  serious  drawbacks. 
For  example,  it  is  possible  to  roughly  compute  gaze  from  a  small 
number  of  cameras  throughout  the  room  (Wu  et  al.,  2000),  but 
such  systems  presently  lack  accuracy  and  reliability,  or  require 
a  large  number  of  cameras  to  cover  a  useful  space.  Wearing  a 
special  device  such  as  glasses  solves  some  problems,  but  may 
not  be  acceptable  to  casual  users.  Another  approach  is  to  em¬ 
bed  a  camera  in  the  device  to  determine  whether  the  user  is 
looking  at  it,  rather  than  computing  general  gaze  (Shell  et  al., 
2004).  This  technique  can  be  effective,  but  presently  scales 
poorly  to  a  large  number  of  devices. 

In light of the difficulties of determining gaze reliably, we
reasoned that pointing gestures may play a role similar to gaze in
indicating objects. While few subjects in the lighting study
spontaneously used gestures, this may be partially explained by the
near-perfect performance of the Wizard of Oz speech
recognizer (the experimenter). Furthermore, pointing may have
certain advantages over gaze. For example, pointing is typically the
result of a conscious decision to take action, while changes in
eye gaze direction may be more involuntary (Zhai et al., 1999).
On  the  other  hand,  pointing  may  be  no  easier  to  detect  by  com¬ 
puter  vision  techniques  than  gaze  (Jojic,  Brumitt,  Meyers,  Har¬ 
ris,  &  Huang,  2000). 

To  demonstrate  the  utility  of  the  combination  of  pointing 
and  speech  as  an  interface  modality  in  an  intelligent  environ¬ 
ment,  we  built  a  hardware  device  to  sense  pointing  gestures  and 
developed  associated  signal  processing  algorithms  to  combine 
speech  and  gesture  (Wilson  &  Shafer,  2003).  At  the  center  of  the 
XWand  system  is  a  handheld  device  that  may  be  used  to  select 
objects  in  the  room  by  pointing,  and  a  speech  recognition  sys¬ 
tem  for  a  simple  command  and  control  grammar.  To  turn  on  a 
light  in  the  room,  the  user  may  point  the  wand  at  a  light  and  say, 
“Turn on.” Because the pointing gesture serves to limit the
context of the interaction (the light), the speech recognition task
is reduced to recognizing the few operations available on lights:
“Turn on” or “Turn off.” Alternatively, the user may perform a
simple  gesture  in  place  of  speech  to  effect  the  same  command. 
The  user  may,  for  example,  point  at  a  media  player  device,  hold 
the  button  down,  and  roll  the  device  to  adjust  the  volume.  The 
XWand  system  illustrates  a  number  of  points  related  to  sensor 
and  recognition-based  input,  including  the  hardware  design  of 
a  composite  inertial  sensor,  sensor  fusion,  dynamic  Bayesian 
networks,  and  a  host  of  design  considerations. 

The  original  XWand  system  is  based  on  a  3D  model  of  a  room 
and  the  controllable  devices  within  it.  Using  onboard  sensors, 
the  XWand  device  can  determine  its  own  absolute  orientation, 
while  a  computer  vision  system  mounted  in  the  environment 
finds  the  position  of  the  wand.  Given  the  size  and  3D  position  of 
an  object  in  the  room,  it  is  a  simple  trigonometric  calculation 
to  determine  whether  the  XWand  is  currently  pointing  at  the 
object. 
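
One way to carry out the pointing test is to compare the angle
between the wand's pointing direction and the direction to the
object against the angle the object subtends. The sketch below
assumes each object is approximated by a bounding sphere; the
function name and the coordinates are illustrative and are not
taken from the XWand implementation.

```python
import math

def is_pointing_at(wand_pos, wand_dir, obj_pos, obj_radius):
    """True if the ray from wand_pos along wand_dir passes within
    obj_radius of obj_pos (the object is treated as a sphere)."""
    to_obj = [o - w for o, w in zip(obj_pos, wand_pos)]
    dist = math.sqrt(sum(c * c for c in to_obj))
    if dist <= obj_radius:
        return True                      # the wand is inside the sphere
    norm = math.sqrt(sum(c * c for c in wand_dir))
    cos_angle = sum(a * b for a, b in zip(to_obj, wand_dir)) / (dist * norm)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    # The sphere subtends a half-angle of asin(radius / distance).
    return angle <= math.asin(obj_radius / dist)

# Wand at the origin pointing along +x (coordinates in meters, assumed).
hit = is_pointing_at((0, 0, 0), (1, 0, 0), (3.0, 0.1, 0.0), 0.2)
miss = is_pointing_at((0, 0, 0), (1, 0, 0), (0.0, 3.0, 0.0), 0.2)
```

Because the angular tolerance shrinks with distance, small or
faraway objects are harder to select, which may argue for inflating
their radius in the room model.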

The  original  XWand  hardware  device  contains  an  Analog  De¬ 
vices  ADXL202  two-axis  MEMS  accelerometer,  a  Honeywell 
HMC1023  three-axis  magnetometer,  a  Murata  ENC-03  one-axis 
piezoelectric gyroscope, a 418 MHz FM transceiver, a PIC 16F873
microcontroller, an IR-LED, and a pushbutton mounted on a
custom printed circuit board (PCB) (see Fig. 10.2). While the
accelerometer is useful in detecting pitch and roll (recall that
gravity is an acceleration), it cannot detect the yaw attitude of the
device. The three-axis magnetometer reports direction cosines
against magnetic north, from which yaw can be determined only
if the device is held flat (some GPS devices are equipped with
two-axis magnetometers that give heading when the device is
held flat). Fortunately, pitch and roll information from the
accelerometer may be used to “correct” the output of the three-axis
magnetometer to yield a full 3D orientation with respect
to  magnetic  north. 

To compute the 3D position of the wand, the XWand system uses
a pair of FireWire cameras mounted in the corners of the room,
which are used to track the IR-LED on the device. Each camera
uses an IR-pass filter so that in a typical office environment, only
the IR-LED is visible in the image. Furthermore, the IR-LED is
programmed to flash at 15 Hz. When the host takes the video
output of each camera at 30 Hz, consecutive images may be
subtracted pixelwise so that only objects blinking at 15 Hz remain.
The IR-LED can thus be located easily in both views. In addition,
the cameras are calibrated to the geometry of the room
so  that  the  3D  position  of  the  IR-LED  is  obtained  from  its  2D  po¬ 
sition  in  both  views.  Note  that  this  arrangement  assumes  a  line 
of  sight  to  the  IR-LED  from  both  cameras. 
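
The frame-differencing step can be sketched as follows. Images are
represented as nested lists of brightness values, and the threshold
and the tiny example frames are assumptions for illustration; a
real system would operate on camera frames and would go on to
triangulate the 3D position from the two calibrated views.

```python
def locate_blinking_led(frame_a, frame_b, threshold=50):
    """Subtract two consecutive frames pixelwise and return the (row, col)
    centroid of the pixels that changed by more than `threshold`,
    or None if nothing changed."""
    row_sum = col_sum = count = 0
    for r, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        for c, (pa, pb) in enumerate(zip(row_a, row_b)):
            if abs(pa - pb) > threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count

# A steady bright pixel at (0, 0) and an LED at (2, 3) that is lit only
# in the first frame.
frame_on = [[200, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 255], [0, 0, 0, 0]]
frame_off = [[200, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
led = locate_blinking_led(frame_on, frame_off)
```

The steady light source cancels in the subtraction, so only the
blinking LED contributes to the centroid.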

To  support  speech  recognition,  an  open  microphone  (low 
impedance)  is  placed  in  the  environment.  Ultimately,  this  mi¬ 
crophone  should  be  placed  on  the  device,  perhaps  with  the  au¬ 
dio  encoded  and  relayed  off-board  for  recognition.  The  speech 
recognition engine is programmed with a simple command and
control grammar based on a command-referent pattern,
where  a  referent  can  be  a  device  in  the  environment  (e.g.,  a 
light)  and  the  command  refers  to  one  of  a  number  of  permitted 
actions  on  the  device  (e.g.,  “turn  on”). 

FIGURE 10.2. (a) The first XWand prototype includes accelerometers,
magnetometers, gyro, radio, etc. (image © ACM, 2003). (b) The Orb
device repackaged much of the XWand. (c) The WarpPointer updates
most of the components and uses Bluetooth. (d) The XWand 3D
geometry model includes the 3D position of all interactive devices
in a room (image © ACM, 2003). (e) The WorldCursor teleoperated
laser pointer is driven by the XWand.

Simple gestures made with the wand, such as flicking left,
right, up, down, and roll, are recognized by simple routines that
measure the change in attitude of the wand from the attitude
recorded when the button on the device is first pressed. We
have  also  experimented  with  using  Hidden  Markov  Models  to 
recognize  more  complex  gestures,  but  have  chosen  instead  to 
exploit  a  small  set  of  simple,  memorable,  and  reusable  gestures 
in  conjunction  with  other  contextual  information  such  as  point¬ 
ing  and  possibly  speech  information.  This  approach  allows  for 
a  very  simple  and  robust  gesture  recognition  process,  and 
avoids  training  users  on  a  gesture  set  of  greater  complexity. 
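
A routine of the kind described, comparing the current attitude
with the attitude recorded at button press, might look like the
sketch below. The angle representation (yaw, pitch, roll in
degrees), the sign conventions, and the 30-degree threshold are all
assumptions; the original routines are not given in the text.

```python
def classify_gesture(start, current, threshold_deg=30.0):
    """start and current are (yaw, pitch, roll) in degrees. Returns
    'left', 'right', 'up', 'down', 'roll', or None if the wand has not
    moved far enough from its attitude at button press."""
    def wrap(angle):
        # Map an angle difference into the range [-180, 180).
        return (angle + 180.0) % 360.0 - 180.0

    d_yaw, d_pitch, d_roll = (wrap(c - s) for s, c in zip(start, current))
    axis, delta = max((("yaw", d_yaw), ("pitch", d_pitch), ("roll", d_roll)),
                      key=lambda item: abs(item[1]))
    if abs(delta) < threshold_deg:
        return None
    if axis == "roll":
        return "roll"
    if axis == "yaw":
        return "right" if delta > 0 else "left"   # assumed sign convention
    return "up" if delta > 0 else "down"          # assumed sign convention

flick_right = classify_gesture((350.0, 0.0, 0.0), (30.0, 2.0, 1.0))
flick_down = classify_gesture((0.0, 0.0, 0.0), (5.0, -45.0, 3.0))
no_gesture = classify_gesture((0.0, 0.0, 0.0), (5.0, 5.0, 5.0))
```

Wrapping the angle difference matters for yaw, where a small
physical motion can cross the 0/360 degree boundary.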

A  dynamic  Bayesian  network  fuses  the  various  quantities  to 
arrive  at  a  multimodal  interpretation  of  the  user’s  interaction. 
It  models  the  combination  of  the  output  of  the  speech  recog¬ 
nition,  the  object  at  which  the  wand  is  currently  pointing,  any 
gesture  performed,  the  known  state  of  the  devices  under  con¬ 
trol,  and  the  state  of  the  interpretation  in  the  previous  time 
steps  (see  Fig.  10.3).  The  network  bases  this  combination  on  the 
command-referent  pattern  outlined  above,  where  the  referent 
may  be  determined  by  speech  or  pointing  gesture,  and  the  com¬ 
mand  may  be  determined  by  speech,  gesture,  button  click,  or 
any combination thereof. The ultimate action to be taken (e.g.,
“turn  on  light  #2”)  depends  on  the  command  and  referent,  as 
well  as  the  state  of  the  device  itself  (e.g.,  “turn  on  light  #2”  is 
only  permitted  if  the  light  is  off).  Finally,  both  the  command  and 
referent at the current time step depend heavily on the
command and referent from the previous time step, such that either
quantity can be specified at slightly different moments in time.

FIGURE 10.3. The XWand Dynamic Bayesian Network (DBN)
models multimodal interaction. It combines wand input
(PointingTarget, Gesture, ButtonClick), speech input (SpeechReferent,
SpeechCommand, SpeechAction), and world state (Light1, Light2,
Light3) to determine the next action (Action) as a combination
of command (Command) and referent (Referent) and past beliefs
(PrevCommand, PrevReferent). (image © ACM, 2003)

The multimodal fusion Bayes network offers a number of
interesting capabilities for the XWand system. For example, when
the state of each controlled device is represented in the
network itself, distributions over related quantities change in
appropriate ways. If the user points at a device that is
currently turned off, speech and gesture recognition results
inconsistent with that state are ignored, such that the phrase
“turn off” is effectively removed from the speech recognition
grammar. In future work, it may also be possible to infer vocabulary
by training the network dynamically (e.g., point at a light and label
it as a “light,” or point several times at a point in space to train
the position of a known light while saying, “Light”).
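
The flavor of this fusion can be conveyed with a much simpler
stand-in than the actual network. The sketch below combines a
belief carried over from the previous time step with pointing and
speech evidence by multiplying and renormalizing, and restricts
commands by device state. The function names, the persistence
parameter, and the probabilities are hypothetical, and the real
XWand DBN is richer than this.

```python
def normalize(dist):
    total = sum(dist.values())
    return {key: value / total for key, value in dist.items()}

def fuse_referent(prev_belief, pointing_likelihood, speech_likelihood,
                  persistence=0.8):
    """Posterior over referents: a prior that leans on the previous
    belief, multiplied by the evidence from each modality."""
    n = len(prev_belief)
    fused = {}
    for ref in prev_belief:
        prior = persistence * prev_belief[ref] + (1.0 - persistence) / n
        fused[ref] = prior * pointing_likelihood[ref] * speech_likelihood[ref]
    return normalize(fused)

def permitted_commands(device_is_on):
    """Only commands consistent with the device state are considered."""
    return ["turn off"] if device_is_on else ["turn on"]

prev = {"light1": 0.5, "light2": 0.5}
pointing = {"light1": 0.9, "light2": 0.1}   # the wand points near light1
speech = {"light1": 0.5, "light2": 0.5}     # the speech names no referent
belief = fuse_referent(prev, pointing, speech)
commands = permitted_commands(device_is_on=False)
```

Carrying the previous belief forward is what allows the referent
and the command to arrive at slightly different moments and still
be combined.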

We  have  continued  to  develop  the  XWand  prototype  in  var¬ 
ious  ways.  A  typical  objection  to  the  camera-based  tracking 
system  is  the  lengthy  setup  and  calibration  procedures  such 
systems  require.  We  therefore  explored  an  alternative  config¬ 
uration  that  eliminates  cameras  in  favor  of  a  single  teleoper- 
ated  laser  pointer  mounted  in  the  ceiling  (Wilson  &  Pham, 
2003).  The  WorldCursor  laser  (see  Fig.  10.2)  is  programmed  to 
match  the  motion  of  the  wand,  in  a  manner  similar  to  how  the 
standard  mouse  controls  a  cursor  with  relative  motion.  Be¬ 
cause  the  set  of  objects  the  laser  can  reach  is  limited  by  line 
of  sight,  the  original  3D  model  of  the  system  is  eliminated  in 
favor  of  a  simpler  spherical  coordinate  system  with  an  arbitrary 
origin,  thus  simplifying  setup. 

The  most  recent  iteration  includes  a  three-axis  accelerome¬ 
ter  that  can  be  combined  with  a  magnetometer  to  arrive  at  a 
true  3D  orientation  by  a  simple  cross-product  calculation  (see 
Fig.  10.2).  The  device  has  also  been  applied  to  cursor  control 
with  very  large  displays,  where  the  mixture  of  sensors  enables 
a variety of cursor control mechanisms including absolute,
orientation-based pointing (position and orientation), relative
angular motion similar to the Gyromouse, and pure absolute
position only. This flexibility allows exploration of several new
modes of interaction. The relative, gyro-based mode of pointing
allows very fine control with clutching over a small area. With
very  large  (wall-sized)  displays,  however,  it  is  easy  to  lose  the 
cursor  among  the  many  onscreen  objects.  With  the  current  de¬ 
vice,  it  is  possible  to  momentarily  adopt  one  of  the  more  ab¬ 
solute  pointing  modes  to  “warp”  the  cursor  to  a  point  directly  in 
front  of  the  user. 
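
The cross-product calculation mentioned above can be sketched as follows. This is a minimal illustration rather than the XWand implementation: it assumes the device is at rest, so that the accelerometer reading points "up" in device coordinates, and that the magnetometer reading is not parallel to gravity. The function names and the east/north/up convention are assumptions made for this example.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / length for x in v)

def orientation_from_sensors(accel, mag):
    """Return the world east, north, and up axes in device coordinates.

    accel: accelerometer reading at rest (assumed to point up).
    mag:   magnetometer reading (points north, tilted into the ground).
    """
    up = normalize(accel)
    east = normalize(cross(mag, up))   # horizontal, perpendicular to north
    north = cross(up, east)            # completes the right-handed frame
    return east, north, up

def heading_degrees(axes, device_axis=(1.0, 0.0, 0.0)):
    """Compass heading of a device axis: 0 is north, 90 is east."""
    east, north, _up = axes
    angle = math.atan2(dot(east, device_axis), dot(north, device_axis))
    return math.degrees(angle) % 360.0
```

The three returned vectors form the rows of a rotation matrix from device coordinates to world coordinates. While the device is accelerating, the accelerometer no longer measures gravity alone, so a practical system would likely filter the result or combine it with gyroscope data.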


CONSIDERATIONS  IN  DESIGNING 
RECOGNITION-BASED  SYSTEMS 


The  previous  section  touches  on  issues  common  to  complex 
sensing-based  interactive  systems,  from  hardware  design  to 
modeling  and  sensor  fusion.  Overall,  the  strategy  in  the  example 
is  to  avoid  complex  recognition  problems  whenever  possible 
through  thoughtful  choice  of  hardware  sensors,  design  of  the 
interaction,  and  strong  modeling  of  context.  In  the  following 
section,  we  expand  upon  the  motivations  behind  these  design 
choices.  Many  of  these  considerations  must  be  taken  into  ac¬ 
count  in  the  design  of  recognition-based  interactive  systems. 
While  many  involve  difficult  problems  with  no  easy  answer,  it  is 
best  to  be  aware  of  these  issues  when  designing  a  sensing-based 
interactive  system. 

Computational  Cost,  Utility,  and  the  Cost  of  Failure 

A  number  of  challenges  exist  in  designing  an  interactive  system 
that  uses  sensors  and  recognition  techniques.  First,  although 
Moore’s  Law  continually  pushes  forward  the  frontier  of  practical 
signal  processing  techniques,  many  of  these  algorithms  are  still 
computationally  intensive.  Only  in  the  last  six  years  or  so  have 
real-time  computer  vision  techniques  become  practical  on  com¬ 
modity  hardware.  Furthermore,  many  of  the  machine  learning 
and  pattern  recognition  techniques  are  data-driven,  thus  re¬ 
quiring  large  amounts  of  storage,  memory,  and  training.  In  the 
case  of  mobile  devices,  where  computational  power  lags  desk¬ 
top  computing  power  by  several  years,  many  of  these  algo¬ 
rithms  are  impractical,  and  developers  are  often  forced  to  make 
hard  choices  and  take  shortcuts.  These  concerns  are  often  mag¬ 
nified  in  interactive  systems  that  entirely  remake  the  user  inter¬ 
face  and  employ  multiple  recognition-based  techniques. 

Second,  recognition-based  systems  often  face  serious  chal¬ 
lenges  achieving  and  guaranteeing  the  level  of  robustness  re¬ 
quired  in  real  applications.  In  the  case  of  developing  consumer 
products,  for  example,  it  is  one  thing  to  demonstrate  a  tech¬ 
nique  in  a  laboratory  setting;  it  is  quite  another  to  show  that 
the  same  technique  will  work  in  the  variety  of  circumstances  in 
which  customers  will  expect  it  to  work.  Unit  testing  even  simple 
recognition-based  interactive  systems  can  be  daunting.  Com¬ 
puter  vision  techniques,  for  example,  are  often  susceptible  to 
variations  in  lighting,  while  audio-based  techniques  may  fail  in 
the  presence  of  background  noise.  Some  effort  has  been  de¬ 
voted  to  developing  signal  processing  techniques  that  adapt  to 
both  the  current  circumstances  and  user.  Adaptive  speech 
recognition  and  handwriting  recognition  techniques,  for  exam¬ 
ple,  have  become  commonplace.  Even  in  these  cases,  however, 
it  is  important  that  the  system  have  good  functionality  out  of  the 
box  or  else  the  user  may  not  use  the  system  long  enough  for 
an  adaptive  algorithm  to  improve  performance. 

In  the  development  of  a  recognition-based  interactive  sys¬ 
tem,  it  may  become  impractical  to  seek  more  improvement  in 
recognition  performance.  At  this  point,  it  is  important  to  con¬ 
sider  the  cost  of  recognition  failure:  often  the  cost  of  repairing  a 
false  positive  recognition  can  overwhelm  any  advantage  in  the 
use  of  the  system.  In  speech  recognition,  for  example,  the  repair 
of  the  errors  can  be  awkward,  slow,  and  disruptive  to  the  task 
(C.-M.  Karat,  Halverson,  C.  Karat,  &  Horn,  1999).  Only  users  that 
are  unable  to  use  a  regular  keyboard  may  accept  a  dictation  sys¬ 
tem  that  fails  three  times  out  of  one  hundred  words,  for  exam¬ 
ple  (Feng,  C.-M.  Karat,  &  Sears,  2005).  Another  consideration  is 
whether  the  system  returns  a  result  similar  to  the  desired  re¬ 
sult  when  it  fails  (“graceful  degradation”),  in  which  case  repair  is 
likely  to  be  easier  (Horvitz,  1999).  If  a  recognition  failure  is  too 
costly  to  consider  repair  (for  example,  control  of  an  air-lock  on 
a  spacecraft,  or  more  mundanely,  closing  a  window  on  a  desk¬ 
top  GUI),  the  cost  of  making  a  mistake  may  be  incorporated  di¬ 
rectly  in  the  model  so  that  false  positives  are  more  likely  to  be 
avoided.  This  can  be  done  either  by  seeking  some  kind  of  de¬ 
liberate  confirmation  from  the  user,  or  more  simply  by  moving 
thresholds  up  the  ROC  curve.  In  the  latter  approach,  it  is 
important  to  be  aware  that  often  users  modify  their  behavior  on 
successive  attempts  such  that  their  input  is  no  longer  modeled 
by  the  usual  training  corpus;  in  speech  this  is  known  as  the 
Lombard  effect  (Junqua,  1993),  but  this  phenomenon  can  be 
observed  in  other  modalities  as  well. 
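
One way to fold the cost of a mistake into the choice of operating point is sketched below. It assumes a recognizer that emits a confidence score per input and a labeled validation set; the function name, the cost parameters, and the data are hypothetical and serve only to show how a costlier false positive pushes the threshold upward.

```python
def pick_threshold(scores, labels, cost_false_positive, cost_false_negative):
    """Return (threshold, total_cost) minimizing cost on labeled data.

    An input is accepted when its score is >= threshold.
    labels[i] is True when input i really was the target event.
    """
    # Every observed score is a candidate; infinity means "never accept".
    candidates = sorted(set(scores)) + [float("inf")]
    best_threshold, best_cost = None, float("inf")
    for threshold in candidates:
        false_positives = sum(
            1 for s, y in zip(scores, labels) if s >= threshold and not y)
        false_negatives = sum(
            1 for s, y in zip(scores, labels) if s < threshold and y)
        cost = (false_positives * cost_false_positive
                + false_negatives * cost_false_negative)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Hypothetical validation data: recognizer scores and ground truth.
scores = [0.1, 0.3, 0.55, 0.4, 0.5, 0.7, 0.9]
labels = [False, False, False, True, True, True, True]

equal_costs = pick_threshold(scores, labels, 1, 1)
costly_false_positive = pick_threshold(scores, labels, 5, 1)
```

With equal costs the sketch settles on a threshold of 0.4; when a false positive is five times as costly as a miss, it moves to 0.7, accepting more misses in exchange for no false positives on this data.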

Considerations  of  computational  cost,  robustness,  and  cost 
of  errors  should  play  prominent  roles  in  the  end-to-end  design 
of  recognition-based  systems.  Ultimately,  the  designer  may  be 
forced  to  recast  the  interaction  and  possibly  the  application  al¬ 
together.  An  example  of  a  system  in  which  one  can  find  many 
of  these  issues  played  out  is  the  Sony  EyeToy,  an  add-on 
camera  for  the  Sony  PlayStation  2.  The  EyeToy  allows  the  player  to 
interact  with  a  game  through  gesture  and  body  motion,  rather 
than  through  the  usual  game  controller.  It  also  appears  to  use 
very  simple  image  processing  techniques  to  determine  the 
player’s  motion.  These  techniques  are  computationally  cheap 
and  generally  quite  robust  to  varying  light  levels  and  other  fac¬ 
tors  that  are  likely  to  be  encountered  in  a  residential  application. 
Furthermore,  the  EyeToy  works  with  a  small  number  of  games 
written  specifically  for  use  with  this  device.  These  games  take 
advantage  of  the  strengths  of  the  EyeToy,  rather  than  risk  pro¬ 
viding  a  poor  emulation  of  the  regular  game  controller. 

The  Role  of  Feedback 

Feedback  is  important  in  any  interactive  application,  but  may  be 
even  more  so  where  sensing  and  recognition  are  used  (Bellotti 
et  al.,  2002).  We  can  characterize  some  kinds  of  feedback  as 
“tight”  or  not,  where  a  “tight”  feedback  loop  provides  feedback 
in  a  frequent,  timely,  and  informative  fashion. 

Usually,  if  there  is  any  chance  of  recognition  failure,  the  sys¬ 
tem  should  provide  feedback  on  all  recognition  results,  errors 
or  otherwise.  If  possible,  the  system  should  provide  some  indi¬ 
cation  of  the  nature  of  any  failure  so  that  the  user  can  modify  his 
or  her  own  behavior  to  meet  the  system’s  expectations.  If  the 
system  can  provide  some  indication  of  the  quality  of  the  inter¬ 
pretation  as  it  is  happening,  then  it  may  be  possible  to  allow 
the  user  to  cancel  or  modify  the  interaction  on  the  fly,  so  as  to 
avoid  costly  errors  or  cumbersome  confirmation  routines  (Vo¬ 
gel  &  Balakrishnan,  2004). 

The  right  feedback  can  often  influence  the  design  of  the 
sensing  and  recognition  algorithm  itself.  For  example,  because  the 
onscreen  cursor  is  updated  so  quickly  (a  tight  feedback  loop),  a 
naive  user  might  not  ever  realize  or  care  that  the  mouse  provides 
rapid,  small,  successive  bits  of  relative  movement  information 
rather  than  true  position  information,  which  would  be  much 
harder  to  sense.  This  trick  is  used  in  the  WorldCursor  system  to 
avoid  the  use  of  cameras  required  by  the  original  XWand  system. 

Considering  the  EyeToy  again,  it  is  interesting  to  note  that 
the  image  of  the  player  is  often  incorporated  into  the  onscreen 
presentation.  By  watching  themselves  onscreen,  players  are  able 
to  interact  with  onscreen  elements  without  relying  on  sophisti¬ 
cated,  yet  more  failure  prone  and  computationally  intense, 
hand-tracking  algorithms.  This  feedback  also  cleverly  ensures 
that  the  player  stays  in  the  camera’s  field  of  view;  the  flow  of  the 
game  does  not  have  to  be  broken  to  alert  a  player  who  has  left 
the  field  of  view. 


Implicit  and  Explicit  Interaction 

In  the  previous  discussion  regarding  feedback  and  cost  of  fail¬ 
ure,  we  assume  that  interaction  is  structured  in  such  a  way  that 
the  user  takes  action  and  expects  a  timely  response  from  the 
system;  that  is,  the  user’s  actions  and  the  system’s  responses 
are  explicit.  Most  interactive  systems  can  be  characterized  in 
this  way. 

In  contrast  to  explicit  interactions,  implicit  interactions  are 
based  not  on  explicit  action  by  the  user,  but  more  commonly  on 
users’  existing  patterns  of  behavior  (Schmidt,  2000).  For  exam¬ 
ple,  with  the  frustration-sensing  system  outlined  in  the  intro¬ 
duction,  the  state  of  frustration  is  not  explicitly  entered  by  the 
user  in  order  to  elicit  some  behavior  from  the  system.  Instead, 
the  state  arises  naturally  and  perhaps  involuntarily,  and  upon  de¬ 
tection,  the  system  should  take  appropriate  action. 

Implicit  interactions  may  take  place  over  a  long  duration,  and 
may  not  exhibit  an  obvious  pattern  of  cause  and  effect.  For  ex¬ 
ample,  systems  that  adapt  to  perceived  user  preferences,  as  in¬ 
dicated  by  the  history  of  user  behavior,  might  eventually  make 
a  recommendation  to  the  user,  or  even  take  actions  so  subtle 
that  the  user  may  not  notice  them.  Such  systems  can  be  com¬ 
plex  in  terms  of  sensing  and  modeling,  and  often  tend  towards 
automating  or  refining  aspects  of  the  user’s  original  task 
(Horvitz,  Breese,  Heckerman,  Hovel,  &  Rommelse,  1998).  For 
example,  a  smart  home  may  observe  its  inhabitants’  daily  pat¬ 
terns  of  coming  and  going  to  determine  an  optimal  schedule 
to  control  the  thermostat  automatically,  balancing  comfort  and 
economy  (Mozer,  2005).  One  potential  difficulty  is  that,  unless 
the  output  of  the  system  is  designed  very  carefully,  users  may 
feel  unable  to  correct  a  mistake  made  by  the  system  or  exert 
more  explicit  control  in  the  face  of  exceptional  circumstances 
(e.g.,  a  party).  Designers  should  consider  providing  functional¬ 
ity  that  allows  the  user  to  query  the  system  for  why  it  took  a 
given  action,  provide  simple  mechanisms  to  redress  errors,  and 
finally,  revert  to  a  manual  control  mode. 

Implicit  interaction  systems  driven  by  patterns  of  ongoing 
user  behavior  are  often  not  as  critically  dependent  on  sensing 
and  recognition  reliability  and  the  nature  of  feedback.  Rather, 
it  is  more  important  that  interpretation  processes  are  more  cor¬ 
rect  than  incorrect  over  time;  accordingly,  modeling  techniques 
that  integrate  noisy  sensor  values  over  time  are  often  appropri¬ 
ate.  The  automatic  thermostat,  for  example,  should  probably  in¬ 
corporate  many  observations  of  the  users’  behavior — including 
presence  and  manual  thermostat  control — and  model  week¬ 
ends,  weekdays,  and  holidays  differently.  Sophisticated  model¬ 
ing  techniques  in  hand,  designers  of  such  systems  have  the  op¬ 
portunity  to  exploit  powerful,  though  sometimes  unreliable, 
sensing  techniques. 
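
A minimal sketch of integrating noisy observations over time is given below, using the thermostat example. It assumes a presence sensor that reports a possibly wrong yes/no reading, and it keeps a separate running estimate for each combination of day type and hour. The class name, the learning rate, and the heating threshold are all illustrative assumptions, not details of any system cited above.

```python
from collections import defaultdict

class OccupancyModel:
    """Running estimate of how likely the home is occupied, per time slot."""

    def __init__(self, learning_rate=0.1, prior=0.5):
        self.learning_rate = learning_rate
        # Each (day_type, hour) slot starts at the prior probability.
        self.estimates = defaultdict(lambda: prior)

    def observe(self, day_type, hour, present):
        """Blend one noisy presence reading into the slot's estimate."""
        key = (day_type, hour)
        old = self.estimates[key]
        self.estimates[key] = old + self.learning_rate * (float(present) - old)

    def probability(self, day_type, hour):
        return self.estimates[(day_type, hour)]

    def should_heat(self, day_type, hour, threshold=0.6):
        """Heat only when occupancy is likely enough for this slot."""
        return self.probability(day_type, hour) >= threshold

model = OccupancyModel(learning_rate=0.2)
# Weekday mornings: the house is empty, with one spurious detection.
for reading in [False] * 5 + [True] + [False] * 4:
    model.observe("weekday", 9, reading)
# Weekend mornings: someone is consistently home.
for _ in range(10):
    model.observe("weekend", 9, True)
```

Because each reading moves the estimate only slightly, the single spurious weekday detection does not change the decision, which is the sense in which the model need only be more correct than incorrect over time.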

The  Importance  of  Context 

Notions  of  context  can  play  an  important  role  in  sensing-based 
interaction  (Dey,  Kortuem,  Morse,  &  Schmidt,  2001).  “Context” 
refers  to  the  overall  environment  the  interaction  or  device  finds 
itself  in,  rather  than  the  objects  obviously  and  directly  relevant 
to  the  task  at  hand.  What  constitutes  context  often  depends  on 
point  of  view;  after  studying  the  application  in  detail,  factors  that 
once  may  have  seemed  external  and  tangentially  relevant 
(context)  may  be  central  to  the  model  after  all. 

Note  the  purposeful  vagueness  in  the  definition  of  “con¬ 
text;”  with  respect  to  sensing,  “environment”  can  refer  to  the 
actual  physical  surroundings.  The  addition  of  sensors  to  a  sys¬ 
tem  may  give  it  awareness  of  its  environment,  and  thereby  en¬ 
able  interesting  “context-dependent”  behavior.  For  example,  a 
mobile  device  equipped  with  GPS  or  other  localization  tech¬ 
nology  might  bring  up  web  search  results  corresponding  to  the 
nearest  points  of  interest  (Abowd  et  al.,  1997;  Davies,  Cheverst, 
Mitchell,  &  Efrat,  2001;  Hariharan,  Krumm,  &  Horvitz,  2005). 
Beyond  the  physical  environment,  context  can  refer  to  more 
abstract  states,  such  as  the  user’s  current  activity  (e.g.,  working  or 
not),  the  history  of  their  interactions,  and  their  preferences.  It 
may  also  refer  to  momentary  information,  such  as  the  action 
of  the  nondominant  hand,  or  anaphoric  references  that  pro¬ 
vide  scope  for  an  ongoing  interaction. 

Often  our  activities  follow  preestablished  patterns,  either  by 
design  or  by  accident.  These  patterns  can  provide  strong  con¬ 
textual  information  for  interpreting  the  user’s  interactions.  For 
example,  entertainment-based  scenarios  may  have  a  narrative 
structure  that  constrains  the  interaction  (Galyean,  1995).  The 
KidsRoom  interactive  experience,  for  example,  was  structured 
according  to  a  narrative  progression  involving  the  fantastical 
transformation  of  a  children’s  room  (Bobick  et  al.,  1999; 
Bobick  et  al.,  2000;  see  Fig.  10.4).  This  structure  in  turn  guided  the 
selection  of  recognition  at  any  given  moment. 

FIGURE  10.4.  The  KidsRoom  engaged  children  to  participate  in 
an  interactive  journey.  Computer  vision  and  projection 
technologies  were  used  to  transform  an  ordinary  room  into  a  variety  of 
settings  including  a  forest  and  river.  Image  ©  the  Massachusetts 
Institute  of  Technology  1999. 

Modeling  context  explicitly  can  be  a  powerful  way  to  solve 
difficult  recognition  problems.  The  more  context  can  be  brought 
to  bear,  often  the  easier  and  more  robust  the  recognition. 
Although  counterintuitive  at  first,  by  exploiting  context,  the 
combination  of  multiple  sensors  with  simple  signal  processing 
techniques  may  result  in  better  performance  than  the  use  of  fewer 
sensors  with  more  complex  signal  processing  techniques.  Recall 
how  in  the  XWand  system,  the  speech  recognition  process  is 
constrained  by  the  knowledge  of  what  the  user  is  currently 
pointing  at  with  the  wand,  the  current  state  of  the  device 
indicated,  and  so  forth.  In  fact,  the  interaction  can  be  so  constrained 
that,  often,  the  system  only  needs  some  indication  that  the  user 
said  anything.  Similarly,  it  suffices  to  use  simple,  learnable  and 
reusable  gestures  when  the  interaction  has  been 
contextualized  by  pointing,  speech,  or  both.  Our  frustration  detection 
system  may  be  more  robust  if  it  incorporates  the  knowledge  of  the 
application  currently  in  focus.  For  example,  it  may  be  easier  to 
write  one  frustration  detector  for  office  applications  and  a 
separate  one  for  use  while  playing  video  games,  rather  than  one 
detector  that  works  in  both  contexts.  In  the  end,  the  two  detectors 
may  only  differ  in  terms  of  how  some  threshold  is  set. 
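
The way context can constrain recognition might be sketched as follows. The example assumes a speech recognizer that returns several scored hypotheses, and a table of which commands make sense for each device in each state. The devices, states, commands, and function name are hypothetical and are not taken from the XWand system itself.

```python
# Hypothetical devices and the commands each accepts in each state.
VALID_COMMANDS = {
    ("light", "off"): {"turn on"},
    ("light", "on"): {"turn off", "dim"},
    ("media player", "stopped"): {"play"},
    ("media player", "playing"): {"pause", "stop", "volume up"},
}

def interpret(hypotheses, pointed_device, device_state):
    """Pick the best-scoring hypothesis that is valid in the current context.

    hypotheses: list of (phrase, score) pairs from a speech recognizer.
    Returns the chosen phrase, or None when nothing fits the context.
    """
    allowed = VALID_COMMANDS.get((pointed_device, device_state), set())
    valid = [(phrase, score) for phrase, score in hypotheses
             if phrase in allowed]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]

# The recognizer slightly prefers "turn off", but the light is already off.
hypotheses = [("turn off", 0.48), ("turn on", 0.45), ("play", 0.07)]
choice = interpret(hypotheses, "light", "off")
```

Here the acoustically preferred hypothesis is rejected because it makes no sense for a light that is already off. When only one command is valid for the indicated device, almost any utterance suffices, which matches the observation that the system may need only some indication that the user said anything.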


The  Importance  of  A  Priori  Knowledge 

The  development  of  a  model  for  a  sensing,  interactive  system 
can  benefit  from  specific  knowledge  of  the  domain  in  which  the 
interaction  is  situated.  Such  a  priori  knowledge  can  lead  to  in¬ 
sights  as  to  how  to  determine  meaningful  categories  from  raw 
sensor  data.  Higher  level  rules  taken  from  the  domain  can  then 
be  brought  to  bear.  For  example,  in  the  case  of  a  pen-based  sys¬ 
tem  that  automatically  parses  and  manipulates  mathematical 
equations,  the  rules  of  how  mathematical  expressions  are 
combined  can  be  a  powerful  constraint  that  drives  the  correct 
interpretation  of  the  sloppiest  drawings  (LaViola  &  Zeleznik, 
2004).  Similarly,  knowledge  of  chemistry  can  guide  the 
transformation  of  sketches  of  molecules  to  a  full  3D  model 
(Tenneson  &  Becker,  2005).  Providing  a  kind  of  context,  strong 
assumptions  taken  from  the  domain  can  limit  the  applicability  of  a 
model  but  can  often  dramatically  improve  performance. 

Many  domains  of  human  behavior  have  been  categorized 
and  described  in  terms  of  detailed  taxonomies  and  ontologies. 
For  example,  music,  gesture,  dance,  and  spoken  language  each 
have  detailed  ontologies,  notation  schemes,  and  so  on.  It  can  be 
beneficial  to  draw  from  such  knowledge  when  developing  a 
model,  but  some  aspects  of  a  categorization  scheme  may  not  be 
fully  supported  by  the  available  sensors.  For  example,  there  may 
be  some  aspect  of  the  domain  not  covered  by  the  model,  or  a 
single  category  may  “alias”  to  several  distinct  classes  as  per¬ 
ceived  by  the  sensors.  A  detailed  analysis  of  the  sensing  system 
output  may  lead  to  insights  on  the  original  domain  model. 

One  of  the  advantages  of  incorporating  bits  of  domain 
knowledge  representation  directly  into  the  model  itself  is  that 
it  becomes  more  transparent  to  its  designers  and  users,  and 
thus,  more  reusable.  If  there  is  a  problem  with  the  system,  the 
designer  may  directly  inspect  semantically  relevant  quantities 
from  the  model.  Approaches  that  do  not  rely  on  such  informed 
representations,  such  as  neural  networks,  are  often  so  difficult 
to  inspect  that  upon  discovery  of  a  problem,  it  may  be  easier  to 
retrain  the  model  from  scratch  than  to  troubleshoot  what  the 
network  really  learned.  A  good  compromise  is  to  use  the  more 
data-driven  techniques  (such  as  neural  networks  or  probabilis¬ 
tic  modeling  techniques)  to  map  the  raw  input  signals  onto  se¬ 
mantically  meaningful  mid-level  primitives.  Such  choices  in  rep¬ 
resentation  can  support  modularity  and  explorative  research. 

It  is  interesting  to  note,  however,  that  initial  research  on  a 
given  complex  sensing  problem  often  draws  heavily  from 
domain  knowledge,  only  to  be  eclipsed  later  by  more  purely 
data-driven  approaches.  Early  approaches  to  speech  recognition,  for 
example,  transformed  the  audio  signal  into  a  string  of  symbols 
representing  phoneme  categories  developed  by  linguists;  today, 
one  is  more  likely  to  see  approaches  in  which  subword  acoustic 
unit  categories  are  trained  from  audio  signals  directly  in  a  purely 
data-driven  approach.  Early  face  detection  and  recognition  ap¬ 
proaches  similarly  relied  on  prescriptive  or  ad  hoc  features, 
while  more  recent  approaches  are  more  purely  data-driven  (Li  & 
Jain,  2005;  Zhao  et  al.,  2003). 

Generalize  or  Specialize? 

The  incorporation  of  a  priori  and  context  information  leads  to 
potentially  complex,  yet  powerful,  modeling  techniques.  One  of 
the  drawbacks  of  adding  more  and  more  detail  into  a  model, 
however,  is  that  the  resulting  system  may  be  so  tailored  to  a  par¬ 
ticular  domain  or  set  of  contextual  circumstances  that  it  fails  to 
generalize  to  new  applications.  It  seems  as  if  engineering  best 
practices,  including  modularity  and  device-independence,  run 
counter  to  models  that  are  optimized  for  a  given  situation. 

For  example,  consider  the  problem  of  delivering  various 
location-based  services  to  inhabitants  of  an  indoor  space.  A 
good,  familiar  choice  for  the  representation  of  each  person’s 
location  might  be  two-dimensional  Cartesian  coordinates  on  a 
full  map  of  the  environment.  A  person-tracking  system  is  installed 
in  the  environment;  it  is  charged  with  determining  where  every¬ 
one  is  on  the  map.  This  information  is  then  passed  onto  another 
part  of  the  system  that  delivers  services  to  each  user  based  on 
the  knowledge  of  where  everyone  is  at  any  given  moment. 

Such  a  choice  of  representation  has  many  desirable 
features.  We  can  consider  a  multitude  of  person-tracking 
technologies:  as  long  as  each  reports  2D  Cartesian  coordinates, 
we  can  incorporate  its  output  into  our  map  (“device 
independence”),  and  we  can  consider  merging  the  results  of  multiple 
such  systems  in  a  simple  probabilistic  framework  where  we 
model  the  location  of  each  person  as  a  distribution  over  space 
rather  than  simple  2D  coordinates.  Furthermore,  we  can  ex¬ 
ploit  familiar  rules  of  geometry  to  calculate  interesting  proper¬ 
ties  such  as  whether  a  display  is  within  sight  of  a  particular 
user.  Most  importantly,  we  need  not  concern  ourselves  with  the 
particular  application  services  that  our  ultimate  system  will  pro¬ 
vide:  as  long  as  each  service  expects  only  our  generic  repre¬ 
sentation  we  can  expect  the  service  to  work  properly.  The  sys¬ 
tem  is  general  in  its  applicability. 
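
Merging the output of several trackers in a simple probabilistic framework could look like the sketch below. It assumes that each tracker reports a 2D position with an isotropic Gaussian uncertainty and that the trackers' errors are independent; under those assumptions the estimates combine by inverse-variance weighting. The function name and the tuple layout are illustrative choices.

```python
def fuse_estimates(estimates):
    """Fuse independent 2D position estimates of one person.

    estimates: list of (x, y, variance) tuples, one per tracker, where
    variance describes an isotropic Gaussian around (x, y).
    Returns the fused (x, y, variance).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    # Precision (inverse variance) measures how much to trust a tracker.
    total_precision = sum(1.0 / var for _, _, var in estimates)
    x = sum(xi / var for xi, _, var in estimates) / total_precision
    y = sum(yi / var for _, yi, var in estimates) / total_precision
    return x, y, 1.0 / total_precision

# Two hypothetical trackers that disagree; the second is less certain.
fused = fuse_estimates([(0.0, 0.0, 1.0), (4.0, 0.0, 3.0)])
```

The fused position lies closer to the more certain tracker, and the fused variance is smaller than either input. Nothing in the function depends on which sensing technology produced an estimate, which is the device independence described above.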

Contrast  this  approach  to  one  in  which  we  have  no  unifying 
geometric  model  or  map,  but  instead,  install  proximity  sensors 
at  each  of  the  devices  of  interest,  each  sensor  detecting 
someone  as  they  approach  the  device.  Perhaps  we  install  special 
sensors  on  some  of  the  doors  to  detect  when  someone  walks 
through  (Abowd,  Battestini,  &  O’Connell,  2003),  and  maybe  a 
sensor  on  seats  to  detect  when  someone  sits  down  (Brumitt 
et  al.,  2000).  Often,  proximity  information  is  enough  and  re¬ 
quires  no  calibration  step  (Krumm  &  Hinckley,  2004).  For  our 
location-based  services,  we  note  that  in  some  cases,  it  is  impor¬ 
tant  to  know  who  is  at  a  given  location,  but  in  many  cases,  this 
information  is  not  necessary.  We  place  various  simple  sensors 
throughout  the  environment  in  locations  that  we  think  will 
provide  the  information  needed  to  support  the  specific 
location-based  services  we  currently  have  in  mind.  The  solution  we  come 
up  with  is  quite  specialized. 

It  is  unclear  which  approach  is  superior.  The  geometric 
model-based  approach  depends  heavily  on  the  performance  of 
the  person-tracking  system.  When  it  fails,  it  may  return  no 
useful  position  information,  even  when  the  given  circumstances  do 
not  require  precise  position  information,  and  even  while  the 
sensor  and  the  underlying  signal  processing  algorithms  may 
produce  relevant  intermediate  results  that  could  be  used  at 
a  subsequent  level  of  processing.  The  assumption  of 
device-independence  ignores  the  fact  that  different  sensing 
technologies  typically  have  very  different  failure  modes.  The  resulting 
application  may  perform  poorly  due  to  its  design  around  modeling 
choices  that  follow  the  lowest  common  denominator.  On  the 
other  hand,  it  could  work  very  well  and  provide  the  utmost  in 
flexibility  if  the  person-tracking  is  very  reliable. 

The  specialized  approach  of  course  suffers  in  all  the  ways 
that  the  general  approach  excels;  each  installation  of  the  system 
may  require  significant  innovation  on  the  part  of  the  designer. 
On  the  other  hand,  because  it  is  tailored  to  the  application,  the 
system  is  more  likely  to  gracefully  handle  sensing  failure  modes, 
and  furthermore,  is  more  likely  to  be  less  wasteful  of  resources, 
both  physical  and  computational.  Not  surprisingly,  the  special¬ 
ized  approach  is  likely  to  exhibit  better  performance  for  a  given 
application  than  the  generalized  approach. 

The  consideration  of  generalized  versus  specialized  designs 
is  a  common  engineering  problem  that  is  especially  relevant  in 
the  realm  of  sensor  and  recognition-based  systems.  As  our  ex¬ 
ample  illustrates,  the  two  approaches  may  demand  completely 
different  sensors,  representations,  and  modeling  choices. 

Traditional  vs.  Nontraditional  Interfaces 

A  question  that  naturally  arises  in  the  application  of  sensing  and 
recognition  to  interactive  systems  is  whether  the  design  should 
emulate,  augment,  or  completely  replace  the  interfaces  we  al¬ 
ready  have. 

It  is  probably  not  surprising  that  so  many  interactive  sensor- 
based  systems  emulate  the  mouse.  After  all,  once  this  function¬ 
ality  is  achieved,  the  system  is  now  relevant  to  the  vast  majority 
of  the  world’s  software.  It  is  interesting  to  note,  however,  the 
degree  to  which  the  original  development  of  the  GUI  hinged  on 
the  development  of  the  mouse  itself,  and  how  to  this  day,  the 
mouse  is  still  the  favored  input  device.  This  suggests  that  un¬ 
less  the  new  interactive  system  offers  significant  new  function¬ 
ality  over  the  mouse,  it  will  not  be  adopted,  and  that  instead  of 
forcing  the  new  techniques  on  today’s  interfaces,  designers 
should  think  about  what  changes  in  the  interface  are  implied 
by  new  sensing  systems. 

Another  approach  is  to  augment  or  complement  today’s 
interfaces.  For  example,  it  is  relatively  simple  to  add  an  ac¬ 
celerometer  to  a  mobile  device  that  allows  the  user  to  position 
the  cursor  or  scroll  by  tilting  the  device,  and  it  is  not  hard  to 
imagine  how  users  could  easily  pick  up  this  interaction  (Hinck¬ 
ley  et  al.,  2000). 

Still  another  approach  is  to  completely  reinvent  the  inter¬ 
face.  The  risks  are  that  the  user  is  required  to  learn  a  completely 
new  way  of  working,  the  designer  is  faced  with  developing  the 
entire  system,  and  the  utility  of  the  system  will  inevitably  be 
compared  to  more  established  techniques.  Again,  unless  there  is 
a  significant  improvement  to  be  had,  users  are  unlikely  to  adopt 
the  new  approach. 

Many  of  these  issues  are  illustrated  by  recent  research  into 
interactive  table  systems.  In  its  simplest  form,  an  interactive 
table  system  might  be  no  more  than  a  large  flat-panel  touch 
screen  turned  horizontal  to  function  as  a  table  surface.  This 
simple  change  in  orientation  enables  new  applications  that 
exploit  the  collaborative  nature  of  multiple  users  gathering 
around  a  table  (Shen,  Vernier,  Forlines,  &  Ringel,  2004).  Fur¬ 
thermore,  sensing  techniques  have  been  developed  to  support 
multiple  points  of  contact  on  the  surface,  and  to  support  bi¬ 
manual  interaction,  or  the  multiple  hands  of  multiple  users. 
Distinct  from  wall  displays,  tables  are  able  to  hold  objects,  and 
accordingly,  some  interactive  tables  use  sensing  techniques  to 
recognize  objects  placed  upon  them.  This  ability  can  be  used 
to  support  various  tangible  user  interface  scenarios.  For  ex¬ 
ample,  a  puck  placed  on  the  table  can  enable  a  feature  of  the 
application;  the  user  might  rotate  the  puck  to  adjust  an 
associated  parameter,  or  it  might  call  up  a  group  of  photos 
(Fitzmaurice  et  al.,  1995;  Ullmer,  Ishii,  &  Glas,  1998;  see  also  Ishii, 
chapter  ,  this  volume). 

Clearly,  in  order  to  exploit  the  unique  affordances  and  sens¬ 
ing  capabilities  of  an  interactive  table,  we  must  be  willing  to  let 
go  of  many  of  today’s  GUI  interfaces.  For  example,  the  multi¬ 
ple-touch  and  multiple-user  aspect  cannot  be  supported  by  the 
single-focus  model  of  the  GUI,  and  even  the  assumption  that 
the  display  has  a  natural  reading  orientation  (an  “up”  direction) 
may  no  longer  be  valid  (Kruger,  Carpendale,  Scott,  &  Green¬ 
berg,  2004;  Shen,  Lesh,  &  Vernier,  2003;  see  also  Scott,  chap¬ 
ter  ,  this  volume).  Finally,  today’s  GUI  model  has  no  role  for 
tangible  UIs. 

The  designers  of  interactive  table  systems  tend  to  follow  a 
few  guiding  principles  to  innovate  without  alienating  the  user. 
First,  there  is  a  desire  to  leave  the  overburdened  widgets  of  the 
modern  GUI  behind  and  instead  rely  on  more  of  a  direct  ma¬ 
nipulation  style  of  interaction.  For  example,  to  rotate  a  virtual 
object  on  the  screen,  it  may  suffice  to  place  two  fingers  any¬ 
where  on  the  object  and  move  one  finger  about  the  other,  much 
in  the  same  way  that  one  might  rotate  a  piece  of  paper  sitting  on 
a  desk.  By  using  the  multiple  touch  capability  we  avoid  the 
clumsy  widget-heavy  and  mode-heavy  rotation  techniques  typi¬ 
cal  of  many  drawing  packages.  The  addition  of  pucks  and  other 
tangible  UIs  extends  this  approach;  for  example,  while  the  puck  is 
on  the  table,  it  may  behave  as  a  knob  (Patten,  Ishii,  &  Pangaro, 
2001).  Secondly,  there  is  a  trend  to  add  widget-based  interaction 
back  into  the  direct  manipulation  framework,  perhaps  mainly  to 
address  the  need  to  perform  a  variety  of  actions  on  the  digital 
representation  of  the  object.  For  example,  e-mailing,  printing, 
and  contrast  and  color  adjustment  are  just  a  few  things  that  the 
user  might  want  to  do  with  their  photos;  these  operations  are 
outside  of  the  scope  of  direct  manipulation.  These  widget-based 
interactions  can  draw  upon  the  advantages  a  multiple  touch 
table  interface  provides.  For  example,  putting  two  fingers  down 
near  an  object  may  trigger  a  scrollbar-like  widget  in  which  bring¬ 
ing  the  two  fingers  closer  or  further  apart  adjusts  a  parameter 
(Wu  &  Balakrishnan,  2003). 
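
The two-finger interactions described above reduce to simple geometry on the pair of contact points. The sketch below is one possible formulation, not the method of any cited system: rotation is taken as the change in angle of the line joining the two fingers, and the widget parameter as the ratio of finger separations. The function names are hypothetical, and the sign of the rotation depends on whether the display's y axis points up or down.

```python
import math

def rotation_between(p1_start, p2_start, p1_end, p2_end):
    """Signed rotation, in radians, of the line from finger 1 to finger 2."""
    angle_start = math.atan2(p2_start[1] - p1_start[1],
                             p2_start[0] - p1_start[0])
    angle_end = math.atan2(p2_end[1] - p1_end[1],
                           p2_end[0] - p1_end[0])
    delta = angle_end - angle_start
    # Wrap so that a small turn across the +/-180 degree seam stays small.
    return math.atan2(math.sin(delta), math.cos(delta))

def separation_ratio(p1_start, p2_start, p1_end, p2_end):
    """Ratio of final to initial finger separation (values > 1 mean spreading)."""
    start = math.hypot(p2_start[0] - p1_start[0], p2_start[1] - p1_start[1])
    end = math.hypot(p2_end[0] - p1_end[0], p2_end[1] - p1_end[1])
    if start == 0.0:
        raise ValueError("fingers start at the same point")
    return end / start

# One finger stays put while the other sweeps a quarter turn around it.
quarter_turn = rotation_between((0, 0), (1, 0), (0, 0), (0, 1))
# The fingers spread to three times their initial separation.
spread = separation_ratio((0, 0), (1, 0), (0, 0), (3, 0))
```

Because only the relative position of the two contacts matters, the fingers can be placed anywhere on the object, with no dedicated rotation handle or mode switch.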


Evaluating  Novel  Interfaces 

Determining  whether  the  novel  sensing  and  interaction  model 
actually  works  is  the  domain  of  usability  testing.  No  research 
into  sensor  and  recognition-based  input  is  complete  without 
an  evaluation  showing  its  effectiveness.  Unfortunately,  the  tech¬ 
nical  difficulty  of  getting  such  systems  to  work  at  all  often  leaves 
little  time  to  quantify  performance  on  a  standard  task.  Further¬ 
more,  when  the  new  work  is  in  a  preliminary  state,  it  may  not  be 
instructive  to  compare  the  new  technique  against  one  that  has 
had  decades  to  evolve. 

User study subjects are often so impressed by
the  “magic”  of  sensor-based  systems  that  the  sheer  novelty  of 
the  interface  can  skew  study  results.  Subjective  surveys  are  likely 
to  show  bias  in  favor  of  the  novel  design  (Nielsen,  1994).  This 
is  a  very  difficult  problem  without  an  easy  solution,  particularly 
in  the  case  of  tangible  UIs,  tabletop  interactive  systems,  and  per¬ 
ceptual  user  interface  systems.  Longitudinal  studies,  which  in¬ 
volve  repeated  sessions  spread  out  over  multiple  days,  can  be 
used  to  minimize  this  effect,  but  such  studies  are  expensive  and 
time-consuming.  As  a  result,  many  interactive  sensing-based  sys¬ 
tems  come  with  few  convincing  quantitative  user  studies  that 
prove  their  utility.  In  the  case  of  a  system  that  uses  tracking  or 
detection  to  select  an  object,  Fitts’  Law  studies  can  be  an  effec¬ 
tive  technique  to  compare  pointing  performance  across  very 
different  systems  (MacKenzie,  1992). 
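
To make the Fitts' Law comparison concrete, the sketch below computes the index of difficulty in the Shannon formulation discussed by MacKenzie (1992), ID = log2(D/W + 1), and a simple throughput figure in bits per second. The function names and the sample numbers are illustrative assumptions; a real study would average over many trials and conditions.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1.0)

def throughput(distance, width, movement_time_s):
    """Bits per second for one pointing condition (ID divided by time)."""
    return index_of_difficulty(distance, width) / movement_time_s

# Example: a target 1 unit wide at a distance of 7 units, reached in 1.5 s.
bits = index_of_difficulty(7.0, 1.0)
rate = throughput(7.0, 1.0, 1.5)
```

Because throughput folds distance, target size, and time into one number, it offers a common scale on which a camera-tracked pointer and a mouse can be compared.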

Of  course,  often  the  point  of  the  work  is  not  to  show  a  de¬ 
crease  in  task  completion  time,  a  reduction  in  errors,  or  other 
more  conventional  metrics  from  the  field  of  HCI.  Rather,  the 
goal  may  be  to  highlight  completely  new  ways  of  conceptualiz¬ 
ing  the  relationship  between  users  and  their  machines,  and  to 
demonstrate  that  such  innovations  are  technically  feasible. 
There  is  often  more  of  an  emphasis  on  invention  and  design 
than  on  evaluation.  The  novelty  that  can  sabotage  a  user  study 
may even be a desirable effect, drawing users into engaging experiences
they may not have otherwise had. Furthermore, traditional
evaluation methods seem at odds with the goals of surprising,
delighting, and entertaining the user. The field has only
recently  begun  to  recognize  the  need  to  develop  ways  to  ob¬ 
jectively  evaluate  interactive  systems  along  these  dimensions 
that are basic to the quality of life (Blythe, Overbeeke, Monk, &
Wright, 2004; Norman, 2003).

A  fantastic  example  where  simple  sensing  techniques  were 
used to great effect to surprise and delight the user is the PingPongPlus
system (Ishii, Wisneski, Orbanes, Chun, & Paradiso,
1999).  In  this  research  prototype,  the  usual  ping-pong  table  was 
augmented  with  sound  effects  and  a  top-down  video  projec¬ 
tion  onto  the  table  surface,  and  electronics  to  sense  where  the 
ping  pong  ball  hits  the  surface  during  game  play  (see  Fig.  10.5). 
This  sensing  system  used  eight  microphones  mounted  under 
the  table,  and  custom  electronics  to  triangulate  where  the  ball 
hit  the  surface.  The  sound  effects  and  video  presentation  re¬ 
acted  to  each  strike  of  the  ball,  in  ways  calculated  to  amuse  the 
players  and  augment  the  game  play  in  dramatic  ways.  For  ex¬ 
ample,  in  one  mode,  the  ball  produces  ripples  on  the  table, 
while  in  another,  thunderstorm  audio  and  video  effects  build  up 
as the length of the volley increases.

FIGURE 10.5. The PingPongPlus system uses a series of microphones to triangulate where the
ping pong ball strikes the surface of the table. This position information is used to drive a variety
of interactive graphics displayed on the table by an overhead projector. Image © 2006 Tangible
Media Group, MIT Media Laboratory.
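
The text says only that microphones and custom electronics triangulate the impact point. One plausible way to do this in software is to compare arrival-time differences, as in the sketch below, which searches a grid of candidate positions for the one whose predicted differences best match the measured ones. This is an assumption-laden illustration, not the PingPongPlus implementation: the microphone layout, wave speed, table size, and the function name `locate_impact` are all hypothetical.

```python
import math

def locate_impact(mics, arrival_times, speed, width, height, step=0.01):
    """Estimate the (x, y) impact point by grid search over the table.

    mics: list of (x, y) microphone positions (hypothetical layout).
    arrival_times: time at which each microphone heard the impact.
    speed: assumed propagation speed of the impact sound.
    """
    best, best_err = None, float("inf")
    nx, ny = int(round(width / step)), int(round(height / step))
    for i in range(nx + 1):
        for j in range(ny + 1):
            x, y = i * step, j * step
            pred = [math.hypot(x - mx, y - my) / speed for mx, my in mics]
            # Use differences relative to the first microphone, because the
            # absolute moment of impact is unknown.
            err = sum(((p - pred[0]) - (t - arrival_times[0])) ** 2
                      for p, t in zip(pred, arrival_times))
            if err < best_err:
                best, best_err = (x, y), err
    return best
```

A brute-force search is chosen here only for clarity; dedicated electronics or a closed-form solver would be needed to respond within the time of a ball bounce.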


CONCLUSION 


In  this  chapter,  we  explored  a  variety  of  sensing  technologies 
available  today,  outlined  a  number  of  signal  processing  tech¬ 
niques  common  to  using  these  sensing  technologies  in  sensor 
and  recognition-based  input  for  interactive  systems,  and  dis¬ 
cussed  further  issues  related  to  designing  such  systems. 

While  much  of  the  discussion  highlighted  difficulties  in  de¬ 
signing  systems  that  rely  on  sensors,  the  future  of  interactive 
sensing  systems  is  bright.  Advances  in  MEMS  and  nanotechnol¬ 
ogy  will  continue  to  drive  innovations  in  the  sensors  them¬ 
selves,  while  the  relentless  increase  in  commodity  CPU  power 
and  storage  capabilities  will  continue  to  enable  more  sophisti¬ 
cated  modeling  techniques  used  in  interpreting  sensor  outputs. 

Another  powerful  driver  in  the  development  of  sensing- 
based  interactive  systems  is  the  growth  of  the  computing  form 
factor  beyond  the  traditional  desktop  computer.  The  prolifera¬ 
tion  of  cell  phones,  personal  digital  assistants,  portable  gaming 
devices,  music  players,  tablet  PCs,  and  living  room-centric  PCs 
shows a trend towards the vision of ubiquitous computing, in
which computing is situated throughout our environment and
daily  life.  Individual  computing  devices  will  be  tailored  in  deep 
ways to the task at hand, away from the “one size fits all” mentality
of desktop computing. The use of sensors and recognition
techniques will play an important role in enabling this diversity,
and will naturally support and demand a variety of interaction
styles. As interactive systems become more tailored to a given
activity,  the  opportunity  to  leverage  the  techniques  described  in 
this  chapter  increases,  in  turn  enabling  the  application  of  the 
sensors  themselves.  Such  a  virtuous  cycle  may  speed  the  devel¬ 
opment  and  adoption  of  sensing-based  systems  in  ways  that  are 
hard to imagine today.


INTRODUCTION


Since the dawn of mankind, humans have been inventing
and building tools to make life easier and more comfortable.
One of mankind’s first tools with a visual display was the sundial,
which was invented more than 3,000 years ago in Babylon.
Due  to  its  basic  physical  principle,  one  of  its  significant  dis¬ 
advantages  was  that  it  could  not  show  the  time  during  the  night 
or  when  the  weather  was  cloudy.  Moreover,  the  first  sundials 
were  in  public  places,  so  that  people  had  to  make  the  effort  to 
go  there  in  order  to  check  the  time.  Later  developments  of 
clocks  (also  in  public  places)  eliminated  the  disadvantages  of 
the  sundial.  Church  clocks  could  not  only  show  the  time  under 
any  weather  condition  and  at  any  time  of  day  or  night,  but  they 
could also announce the time acoustically and therefore bridge distances
of up to a few miles. The subsequent developments are well
known: from the grandfather clock and the fob watch, through
the first analog and digital wristwatches, to today’s high-performance
wrist computers with high-resolution wrist-worn
visual displays (Luczak & Oehme, 2002). These not only display
the time accurately, but are truly multifunctional. For
instance,  some  can  visualize  individual  heart-rate  profiles,  and 
others  can  be  used  to  chart  one’s  precise  location  on  earth  with 
the  help  of  the  global-positioning  system.  A  few  can  even  re¬ 
place  personal  digital  assistants. 


BASIC  PRINCIPLES  AND  CONCEPTS 


Image  Generation 

Different  visual  displays  rely  on  different  physical  principles  to 
generate  an  image:  light  can  be  emitted,  transmitted,  or  re¬ 
flected  by  the  display.  Examples  from  noncomputer  displays 
might  be  helpful  to  explain  the  basic  principles  and  point  out 
relevant  advantages  and  restrictions. 

Writing  on  a  piece  of  paper  alters  the  reflective  properties 
of  the  paper  from  a  highly  reflective  white  to  a  less  reflective 
blue  or  black.  Ambient  light  is  needed  to  read  what  is  written  on 
the  paper,  but  the  contrast  ratio  between  text  and  background 
remains  the  same  under  different  lighting  conditions.  The  same 
principle  holds  for  reflectance-based  displays  such  as  liquid  crys¬ 
tal  displays  (LCDs)  without  backlighting  or  electronic  ink  and 
paper  (see  sections  “Nonemitter/Liquid-Crystal  Display”  and 
“Electronic Ink and Electronic Paper”).

Transparency film used with an overhead projector and
slides used with a projector are examples of transmission. Different
parts of the transparency or slide transmit light of different
wavelengths (i.e., color) in different amounts. Although
transmission is used as a basic principle in many electronic information
displays, it is often combined with other physical
principles. For instance, in a front-projection display (see section
“Classification of Electronic Information Display Technologies”),
a light source emits the light, an LCD generates the image and transmits
the rays of light, and finally the projection surface reflects
the light into the eye of the observer.


A lighthouse is an example of emission. Whereas its light can
be easily seen at night, it is barely visible in bright daylight. Examples
of emission-based displays are CRTs (see section “Cathode
Ray Tube”), electroluminescent displays (see section “Electroluminescent
Displays”), and cathodoluminescent displays (see
section “Cathodoluminescent Displays”). Similar to a lighthouse,
these displays need to be brighter than the ambient light
to be perceived properly.

Electronic  Information  Displays 

The  development  of  advanced  display  technologies  began  with 
the  cathode  ray  tube  or  CRT,  which  was  first  discovered  in  the 
nineteenth  century,  although  the  observation  of  a  glow  from  the 
electronic  excitation  of  gas  in  an  evacuated  tube  may  go  back  as 
early as the 17th century. The invention of the device itself is generally
attributed to Karl Ferdinand Braun. The “Braun tube,” reportedly
first built in Strasbourg, Germany, in 1896 or 1897, used
both  deflection  and  a  fluorescent  material  for  the  screen.  It  was 
probably  the  first  application  of  an  electronic  information  display 
in  natural  sciences  (Castellano,  1992).  CRT  displays  have  domi¬ 
nated  the  market  for  many  years  since  their  invention.  However, 
the  increasing  requirements  of  consumers  have  led  to  rapid  de¬ 
velopments  of  alternative  concepts  for  electronic  information 
displays.  Today,  a  large  variety  of  display  technologies  are  com¬ 
peting  with  CRTs  in  terms  of  image  quality,  and  new  technolo¬ 
gies,  which  are  currently  under  development,  may  soon  cause 
more  major  changes  in  display  demand  (see  section  “Classifica¬ 
tion  of  Electronic  Information  Display  Technologies”). 

There  are  two  basic  methods  for  displaying  information  vi¬ 
sually:  (a)  digital  and  (b)  analog.  A  digital  display  uses  binary 
numbers  of  arbitrary  length  for  displaying  symbols  such  as  char¬ 
acters  or  icons;  whereas  an  analog  system  uses  a  continuous 
signal  spectrum  for  information  presentation.  If  an  instant  im¬ 
pression  is  required,  analog  displays  often  present  information 
better.  Many  people  glance  quickly  at  their  analog  watch  and 
know  roughly  what  the  time  is  or  at  an  automobile  dashboard 
and  know  that  they  are  driving  too  fast.  Analog  displays  translate 
a  value  of  a  continuous  variable  into  an  angle  or  a  length.  Analog 
displays  used  for  control  devices  consist  of  a  scale  and  an  indi¬ 
cator  or  hand.  Either  the  scale  or  the  hand  moves.  There  are  a 
number of guidelines for the design of analog displays (Woodson,
1987; Baumann & Lanz, 1998).

When accuracy is a critical issue, however, digital displays are
preferred.  Reading  analog  meters  accurately  requires  time  and 
cognitive  skills,  whereas  writing  down  the  value  on  a  digital  dis¬ 
play  is  merely  a  case  of  copying  down  the  numbers.  In  cases 
where  both  accuracy  and  quick  reckoning  are  required,  hybrid 
displays  are  often  used. 
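
The contrast between the two readout styles can be sketched in a few lines of Python. The function below returns both a needle angle (the analog encoding of a value as an angle) and a formatted number (the digital encoding), which is roughly what a hybrid display combines. The scale range, the 270-degree sweep, and the function names are illustrative assumptions rather than design guidelines from the cited sources.

```python
def needle_angle(value, vmin, vmax, angle_min=-135.0, angle_max=135.0):
    """Map a value onto a needle angle in degrees (clamped to the scale)."""
    clamped = max(vmin, min(vmax, value))
    fraction = (clamped - vmin) / (vmax - vmin)
    return angle_min + fraction * (angle_max - angle_min)

def hybrid_readout(value, vmin, vmax, unit):
    """Return (needle angle, digital text) for a hypothetical hybrid display."""
    return needle_angle(value, vmin, vmax), f"{value:.1f} {unit}"

# Example: a speedometer with a 0 to 200 km/h scale showing 50 km/h.
angle, text = hybrid_readout(50.0, 0.0, 200.0, "km/h")
```

The angle supports the quick glance, while the text supports exact copying of the number.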

Because  a  computer  is  a  digital  device,  all  data  displayed  are 
binary  numbers,  and  therefore  all  commonly  used  electronic  in¬ 
formation  displays  are  digital  displays.  Nonetheless,  some  ap¬ 
plication  programs  mimic  analog  display  devices  because  of 
their  ergonomic  advantages.  If  the  spatial  resolution  and  the 
refresh  rate  of  the  digital  display  are  sufficiently  high  and  the 
covered  color  space  is  sufficiently  large,  no  significant  differ¬ 
ences  in  visual  performance  from  an  analog  display  occur  (see 
section “Resolution”).

Display Segmentation

Common  personal  computer  (PC)  displays  have  a  sufficiently 
high  resolution  to  display  a  virtually  unlimited  set  of  characters, 
symbols,  and  graphics.  Conversely,  a  wide  variety  of  displays 
with  lower  spatial  resolution  is  found  on  other  kinds  of  com¬ 
puterized  technical  devices,  such  as  mobile  phones  or  music 
players.  Here,  the  set  of  displayable  tokens  is  often  far  more 
restricted. 

Basic display elements are binary: they are either on or off.
A ternary display element can be built from a two-color light-emitting
diode (LED). To transmit more information,
display elements with more discrete states are necessary, using,
for example, different colors or brightness levels to distinguish between
states.

A  more  common  way  to  increase  the  amount  of  displayable 
information  is  the  grouping  of  a  number  of  binary  display  ele¬ 
ments  into  a  unit.  The  classical  seven-segment  display  (Fig.  11.1) 
and  many  custom  displays  are  examples  of  this  approach. 
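
The grouping idea can be illustrated with a small Python sketch that maps each decimal digit to the on/off states of seven binary elements. The segment letters a to g follow the common labeling convention (a is the top bar, then clockwise, with g in the middle), which is an assumption here because the text does not label the segments.

```python
SEGMENTS = "abcdefg"  # conventional segment labels (assumed, not from the text)

# Which segments are lit for each decimal digit.
DIGITS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}

def segment_states(digit):
    """Return a tuple of seven booleans, one per segment a..g."""
    lit = DIGITS[digit]
    return tuple(segment in lit for segment in SEGMENTS)

# Seven binary elements allow 2**7 distinct patterns; ten encode digits.
pattern_count = 2 ** len(SEGMENTS)
```

Only ten of the 128 possible patterns are needed for digits, which is why such displays can also show a few letters and symbols.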

If  multiple  binary  display  elements  are  arranged  in  matrix 
form  and  the  elements  are  enhanced  to  display  multiple  colors 
and  different  levels  of  brightness,  the  basic  concept  of  the  com¬ 
mon  CRT  and  flat-panel  displays  can  be  derived.  The  shape  of 
picture  elements  (pixels)  in  current  electronic  information  dis¬ 
plays  is  typically  square  or  rectangular.  The  pixels  are  arranged 
in  horizontal  rows  and  vertical  columns,  which  are  called  “pixel 
matrices”  or  “pixel  formats.” 

Display  Dimensionality 

Visual displays can be distinguished by means of the spatial dimensionality
of the image generated. Clearly, the more dimensions
are used, the higher the price of the display and the
complexity of the display technology.

The simplest way to display information is to rely on only one
spatial dimension. Although the notion of a one-dimensional (1D)
display is not strictly correct, since human perception requires a
display unit to have two spatial dimensions, the secondary dimension
of such displays does not provide information and is simply
a function of the primary dimension. A 1D display represents the
minimal approach to presenting information, which is encoded by
either discrete or continuous variables in a one-dimensional state
space. Nevertheless, a simple source of light, such as a bulb or a
candle, can convey a binary message of arbitrary length. For most
current applications, for example, to show the status of a device as
“on” or “off,” LEDs are used. LEDs give off light radiation when biased
in the forward direction. Most light-emitting diodes function
in the near-infrared and visible ranges, with the exception of ultraviolet
LEDs. When displaying a continuous variable in one dimension,
a control window usually indicates the actual level of the parameter
that is measured or controlled. The volume level of a
speaker or the temperature in a room can be indicated using a row
of pixels, LEDs, or a slider on a scale.

FIGURE 11.1. Seven-segment display.
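
A row of LEDs used as a level indicator can be sketched as follows. The function turns a measured value into a text bar in which `#` stands for a lit element and `.` for a dark one. The number of elements, the value range, and the name `led_bar` are assumptions made for illustration.

```python
def led_bar(level, vmin, vmax, n_leds=10):
    """Render a continuous value as a row of lit (#) and dark (.) elements."""
    clamped = min(max(level, vmin), vmax)
    fraction = (clamped - vmin) / (vmax - vmin)
    lit = round(fraction * n_leds)
    return "#" * lit + "." * (n_leds - lit)

# Example: a room temperature of 22.5 on a scale from 15 to 30.
bar = led_bar(22.5, 15.0, 30.0)
```

The only information carried is the length of the lit run, which is what makes the display one-dimensional in the sense used above.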

In  order  to  display  more  complex  information,  which  can 
only  be  shown  using  two  independent  spatial  dimensions  (length 
and width), two-dimensional (2D) displays are used. This category
of displays is by far the biggest segment of the market.
They  are  suitable  for  displaying  all  kinds  of  graphical  informa¬ 
tion.  Images  are  formed  by  activating  pixels,  each  of  which  has  a 
unique location (x, y) within a two-dimensional plane, and a
color  and  gray  scale  value  may  be  assigned  to  each.  Although 
these displays have only two dimensions, the perception of depth
is also feasible using depth cues such as relative size, height
relative  to  the  horizon,  interposition  or  occlusion,  shadows  and 
shading,  spatial  perspective,  linear  perspective,  and  texture  gra¬ 
dients.  Even  more  realistic  three-dimensional  (3D)  scenes  can 
be  viewed  on  2D  screens  by  applying  special  viewing  devices, 
such  as  shutter  glasses  or  polarizing  glasses.  In  order  to  simulate 
depth  perception,  each  of  the  eyes  must  see  a  slightly  different 
image  (binocular  parallax).  For  instance,  the  so-called  shutter 
glasses  alternately  block  each  eye’s  view  so  that  each  perceives 
only  the  image  intended  for  it.  The  alternation  of  images  oc¬ 
curs  many  times  per  second  and  these  separate  images  are 
fused  into  one  true  3D  image  at  higher  cortical  levels.  The  speed 
of  the  shutters  is  directly  proportional  to  the  refresh  rate  of  the 
monitor. In the case of polarizing glasses, different light polarization
modes are used to encode the information for the left
and  the  right  eye,  respectively.  The  polarizing  glasses  filter  out 
the unwanted photon polarization from each view, and two separate
images are projected onto the visual cortex. These techniques
are used with immersive desks or a Cave Automatic
Virtual Environment (CAVE).

One  of  the  limitations  of  such  stereoscopic  displays  is  that 
motion  parallax,  as  one  of  the  depth  cues,  is  not  provided  au¬ 
tomatically.  Display  devices  that  are  able  to  provide  the  correct 
stereoscopic  perspective  to  each  of  the  viewer’s  eyes  over  a 
range  of  viewing  positions  can  be  defined  as  auto-stereoscopic. 
Volumetric  3D  displays  fall  into  this  category.  There  is  a  grow¬ 
ing  need  to  present  complex  and  voluminous  3D  information  in 
such a way that it may be interpreted rapidly, naturally, and accurately.
Volumetric displays (also called direct volume display devices)
permit 3D-image data to be presented in a transparent
volume.  Innate  depth  cues  inherent  in  three-dimensional  ob¬ 
jects  are  then  automatically  present,  and,  in  principle,  the  three- 
dimensional  images  may  be  viewed  from  an  arbitrary  direction. 
Volumetric  display  systems  provide  a  transparent  volume  in 
space  where  animated  images  may  be  depicted.  Since  these  im¬ 
ages  have  three  physical  dimensions,  depth  cues  are  inherent 
without  the  need  to  compute  complex  transformations,  which 
is  a  considerable  benefit  of  this  type  of  display.  Furthermore,  im¬ 
ages  may  be  observed  naturally  without  the  need  to  wear  special 
glasses.  Common  to  all  volumetric  display  systems  is  a  volume 
or  region  occupying  three  physical  dimensions  within  which  im¬ 
age  components  may  be  positioned,  depicted,  and  possibly  ma¬ 
nipulated.  Volumetric  display  systems  employ  a  method  for  im¬ 
age  construction  similar  to  2D  displays.  Elements,  which  are 
called  “voxels,”  are  activated  at  the  appropriate  locations  within 
the 3D space. Each voxel has a unique position (x, y, z) and can
be assigned a color and gray-scale value. Depending on the
technology used to activate the voxels, the volumetric displays
can be assigned to several categories, such as emissive volume
displays, varifocal mirror displays, laser displays, or holographic
displays. A more detailed description of volumetric display technology
can be found in Blundell and Schwarz (2000).

Human-Related  Display  Functionality  and  Other  Criteria 

The  paradigm  of  wearable  computing  requires  the  visual  dis¬ 
plays  to  be  worn  on  the  human  body  (Azuma,  1997;  Feiner, 
2002).  A  popular  approach  for  mobile  applications  is  to  mount 
the display on the user’s head. Thus, the user can work hands-free
and is always able to perceive information in his or her field
of  vision.  These  displays  are  commonly  referred  to  as  head- 
mounted  displays  (HMDs).  From  the  technological  point  of 
view,  HMDs  can  roughly  be  split  into  three  main  categories,  ac¬ 
cording  to  the  way  the  image  is  provided  to  the  user  (Fig.  11.2). 

The  first  category,  referred  to  as  “screen-based  HMDs,”  com¬ 
prises  all  HMDs  whose  picture  elements  are  created  in  a  spa¬ 
tially  adjacent  way.  The  ocular  image  forming  displays  use  tech¬ 
nologies  such  as  CRT,  LCD,  digital  mirror  devices  (DMD),  or 
organic  light-emitting  diodes  (OLEDs).  Most  of  today’s  HMDs 
on  the  market,  however,  are  based  on  transmissive  or  reflective 
liquid-crystal  minidisplays  (LCD)  (Holzel,  1999).  Miniature  LCDs 
are  available  for  relatively  low  prices  and  provide  an  appropriate 
resolution (SXGA resolution of up to 1280 × 1024 pixels), while
being  lightweight  (von  Waldkirch,  2005). 

In  the  retinal-projection  method,  the  display  image  is  pro¬ 
jected  directly  onto  the  retina  in  the  same  way  as  a  slide  is  pro¬ 
jected  onto  the  screen.  Projection  displays  are  normally  de¬ 
signed  in  the  form  of  a  Maxwellian-view  optical  system  (Bass, 
1995), where the screen plane is optically conjugated to the
retina and the illumination source is conjugated to the eye’s
pupil plane. Consequently, such displays can only be implemented
on the basis of illuminated screens (like LCD and
DMD), and not with self-emitting technologies like OLED and
CRT (von Waldkirch, 2005).

Scanning displays, where the image is scanned pixel by pixel
directly onto the retina, are an alternative to screen-based displays.
Retinal scanning displays (RSDs), also referred to as virtual
retinal displays (VRDs), are most important in this category.
VRD technology was first proposed in 1992 by Sony. Since 1992,
researchers at the Human Interface Technology Lab (University of
Washington, Seattle) have been developing this technology to obtain a commercial
product. In 2003, they presented the first commercial
VRD, called “Nomad,” together with the U.S.-based company
Microvision. With this technology, an image is scanned directly
onto a viewer’s retina using low-power red, green, and blue
light sources, such as lasers or LEDs (Urey, Wine, & Lewis, 2002).
The  VRD  system  has  superior  brightness  and  contrast  compared 
to  LCDs  and  CRTs,  as  it  typically  uses  spectrally  pure  lasers  as 
its  light  source  (Stanney  &  Zyda,  2002).  Finally,  a  combination  of 
a  scanning  technology  and  a  screen-based  system  is  possible 
(see  Pausch,  Dwivedi,  &  Long,  1991;  Aye,  Yu,  Kostrzewski,  Sa¬ 
vant,  &  Jannson,  1998;  Fruehauf,  Aye,  Yua,  Zou,  &  Savant,  2000). 

Besides technological categorization, the various HMD concepts
can be classified according to their functionality or intended
purpose. Fig. 11.3 shows a rough functional categorization
of HMDs (von Waldkirch, 2005). Here, HMDs can be divided
into monocular, biocular, and binocular displays. In addition,
all  these  types  can  provide  the  image  in  either  a  closed-view  or 
see-through  mode. 

Monocular displays have one display source and thus provide
the image to one eye only. Therefore, in comparison with
biocular or binocular HMDs, monocular displays are lighter and
cheaper. However, in a monocular HMD, the view seen by the
eye that is not blocked by a screen may produce binocular rivalry
with the image seen through the HMD. Biocular HMDs
have two displays with separate screens and optical paths, allowing
both eyes to see exactly the same images simultaneously.
Thus, the user perceives a 2D image only, similar to a computer
screen. Binocular displays allow stereoscopic viewing with 3D
depth perception. To produce the stereoscopic view, two spatially
slightly incongruent images are provided to the left and
to the right eye (binocular parallax; see section “Human-Related
Display Functionality and Other Criteria”).

FIGURE 11.2. Systematics of head-mounted displays (von Waldkirch, 2005).

FIGURE 11.3. Head-mounted displays categorized according to functions (von Waldkirch, 2005).

Besides HMDs, important categories of displays are wrist-worn
and handheld displays. These displays are very widespread,
because  they  are  integrated  in  wristwatches,  mobile  phones,  per¬ 
sonal  digital  assistants  (PDAs),  media  players,  and  other  portable 
devices  (see  section  “Mobile  Phones  and  Handheld  Devices”). 

A  truly  “wearable”  visual  display  was  introduced  recently  by 
Philips  Research  and  Textile  Institute  TITV  Greiz.  The  company 
has been developing photonic textiles: fabrics that contain lighting
systems and therefore can serve as electronic information
displays. Multicolored light-emitting diodes (LEDs) were successfully
integrated into fabrics without compromising the softness
of the cloth. Passive matrixes of 10 × 10 red, green, and blue
LED  packages  were  placed  on  interconnecting  substrates,  made 
entirely  of  cloth.  Applied  in  soft  fabric,  the  light  from  the  small¬ 
est  pixels  diffuses,  resulting  in  a  more  or  less  continuous  light- 
emitting  surface.  Photonic  textiles  can  also  be  made  interactive. 
Philips  has  achieved  interactivity  by  incorporating  sensors  (such 
as  orientation  and  pressure  sensors)  and  communication  de¬ 
vices  (such  as  Bluetooth,  GSM)  into  the  fabric.  Photonic  tex¬ 
tiles  open  up  a  wide  range  of  possible  applications  in  the  fields 
of  communication  and  personal  health  care  (Philips  Research 
Press  Information,  2005). 


QUALITY  CRITERIA 


Display  Colors 

Different  displays  can  present  differing  numbers  of  colors.  The 
simplest  displays  have  two  pixel  states,  either  on  or  off  and, 
therefore,  are  monochrome.  The  greater  the  difference  be¬ 
tween  these  two  states,  expressed  as  a  contrast  ratio  or  the 
weighted  wavelength  differences  of  the  light  falling  upon  the 
observer’s  eye,  the  more  easily  the  viewer  can  distinguish  be¬ 
tween  them.  More  complex  displays,  called  gray-scale  displays, 
are  also  able  to  present  different  luminance  levels  of  pixels. 

Clearly, all modern electronic information displays reproduce
color as well (Hanson, 2003). The Commission Internationale
de l’Éclairage (CIE) chromaticity diagram is often used to
represent a display’s color capabilities. The diagram is based on
the CIE standard XYZ system. X, Y, and Z are hypothetical primary
colors. Fig. 11.4 depicts the equal energy-matching functions
of the standard XYZ system. The XYZ primaries were chosen
so that whenever equal amounts of them are combined,
they match white light.

Independently of the absolute values of X, Y, and Z, relative
amounts, denoted by lowercase letters, can be used to describe
a color. Because x, y, and z are relative amounts, x + y +
z = 1. Consequently, when x and y are known, z is known as
well, because z = 1 - (x + y). Therefore, a curve can be used to
depict all the possible combinations of the XYZ primaries, as
shown here in Fig. 11.5. At the intersection of x = 0.33 and y =
0.33, pure white (i.e., achromatic light) can be found. The curved
line, the locus of spectral colors, represents the x, y, and z values
for all of the spectral colors. The hue of any color within the
curved line can be determined by drawing a line from pure white
through the color of interest to the locus line. The saturation is
given by the ratio of the length of the line between white and the
color of interest to the length of the line between white and the
intersection of this line with the locus (see Fig. 11.5).

FIGURE 11.4. Equal energy-matching functions of the standard
XYZ system. The curves X, Y, and Z show the relative
amounts of the X, Y, and Z primaries needed to match the color of
the wavelength of light (cf. Kaufman, 1974).

FIGURE 11.5. The CIE chromaticity diagram. The curve, the locus
of spectral colors, represents the x, y, and z values for all of
the spectral colors between 380 nm and 770 nm. The triangle depicts
the colors that can be presented by a display that uses the
three primary colors at the corners of the triangle. The hue of any
color within the curved line can be determined by drawing a line
from pure white (x = 0.33 and y = 0.33) through the color of interest
to the locus line. A pale orange (x = 0.45 and y = 0.35) is
marked by a black dot as an example. The saturation is given by
the ratio of the length of the dashed line between white and the color
of interest to the length of the line between white and the
intersection of this line with the locus (cf. Kaufman, 1974).
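
The relative amounts can be computed directly from the tristimulus values by dividing each by their sum. The short Python sketch below does this; the function name `chromaticity` is an illustrative assumption, and the input values are arbitrary examples rather than measurements.

```python
def chromaticity(X, Y, Z):
    """Return the relative amounts (x, y, z) of the XYZ primaries."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("X + Y + Z must be non-zero")
    return X / total, Y / total, Z / total

# Equal amounts of the three primaries give the achromatic point.
x, y, z = chromaticity(1.0, 1.0, 1.0)
```

Because the three results always sum to 1, two of them are enough to place a color in the diagram, which is why the chart has only x and y axes.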

The representation of this information in the computer’s
hardware and software places a further restriction on the variety
of colors. Popular configurations devote 4 to 32 bits to the
representation of color information, yielding 16 to 16,777,216
colors with different hues and saturation. (The upper figure
corresponds to 24 bits of color; 32-bit configurations typically
reserve the remaining 8 bits for transparency or padding.)
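
The link between bit depth and the number of representable colors is a power of two. The following minimal Python sketch reproduces the figures quoted above; the function name `color_count` is hypothetical and introduced only for illustration. Note that 16,777,216 corresponds to 24 bits.

```python
def color_count(bits_per_pixel: int) -> int:
    """Return how many distinct color codes fit into the given number of bits."""
    if bits_per_pixel < 1:
        raise ValueError("bits_per_pixel must be at least 1")
    return 2 ** bits_per_pixel


if __name__ == "__main__":
    # Bit depths chosen for illustration only.
    for bits in (4, 8, 16, 24):
        print(f"{bits:2d} bits: {color_count(bits):,} colors")
```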

Because x and y are relative values, an additional parameter,
not shown in Fig. 11.5, is needed to specify the luminance.
It is denoted by Y, and hence this system of color characterization
is referred to as the “xyY-system.”

In  order  to  depict  a  great  variety  of  different  colors,  common 
displays  use  the  three  primary  colors:  red,  green,  and  blue.  The 
actual  scope  of  colors  is  restricted  by  the  position  of  each  of 
the  three  primaries  in  the  CIE  chromaticity  diagram  and  lies 
within  the  triangle  defined  by  these  primaries  (see  Fig.  11.5). 

Whereas the CIE chromaticity diagram is useful in depicting
the color space, additional systems are used to describe color in
the context of human-computer interaction (HCI). The specification
of hue, saturation, and brightness (HSB; the terms luminance
or lightness are sometimes used instead of brightness)
provides a device-independent way to describe color. For this
model, the color space is an upside-down cone (see Fig. 11.6).
On the edge of the cone base, the visible light spectrum is
arranged in a circle by joining red and violet. Hue is the actual
color and can be specified in angular degrees around the cone,
starting and ending at red (0° or 360°), in percent, or in eight-bit
values (0 to 255). Saturation is the purity of the color, measured
in percent or eight-bit values from the center of the cone (min.)
to the surface (max.). At 0% saturation, hue is meaningless.
Brightness is measured in percent or eight-bit values from black
(min.) to white (max.). At 0% brightness, both hue and saturation
are meaningless.
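
To make the HSB parameters concrete, the sketch below converts a hue in degrees and saturation and brightness in percent into eight-bit RGB values. It relies on Python's standard `colorsys` module, whose HSV model is assumed here to correspond to the HSB description above; the wrapper name `hsb_to_rgb8` is hypothetical.

```python
import colorsys


def hsb_to_rgb8(hue_deg: float, saturation_pct: float, brightness_pct: float):
    """Convert HSB (degrees, percent, percent) to an 8-bit RGB triple."""
    h = (hue_deg % 360) / 360.0       # 360 degrees wraps back to red
    s = saturation_pct / 100.0
    v = brightness_pct / 100.0
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r, g, b))


if __name__ == "__main__":
    print(hsb_to_rgb8(0, 100, 100))    # fully saturated red
    print(hsb_to_rgb8(120, 100, 100))  # fully saturated green
    print(hsb_to_rgb8(200, 0, 100))    # zero saturation: hue has no effect
    print(hsb_to_rgb8(200, 80, 0))     # zero brightness: always black
```

The last two calls illustrate why hue is meaningless at 0% saturation and why both hue and saturation are meaningless at 0% brightness.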

FIGURE 11.6. Hue, saturation and brightness system for specifying
color.

RGB stands for the three basic colors (red, green, and blue)
that are produced by the visual display. A number of other colors
can be produced by additive color mixing. If any two of the
color channels are mixed in equal proportions, new colors are
created: blue and green combine to create cyan (bright, light
blue); red and blue make magenta (a bright pink); and red and
green make yellow. If all three colors are mixed together equally
at full power, the result is white light. Colors can be specified
by giving the relative power of each color in percent or as an
8-bit number (0 to 255 in decimal or 00 to FF in hexadecimal).
So, for example, (0, 0, 0) specifies black; (255, 0, 0), a pure red;
and (255, 255, 255), white.
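
The additive mixing rules can be illustrated with 8-bit RGB triples. This is a minimal sketch: the helper names `mix_additive` and `to_hex` are hypothetical, and channel sums are simply clipped at 255.

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)


def mix_additive(*colors):
    """Add 8-bit RGB triples channel by channel, clipping each channel at 255."""
    return tuple(min(255, sum(channel)) for channel in zip(*colors))


def to_hex(rgb):
    """Format an 8-bit RGB triple as a hexadecimal string (00 to FF per channel)."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


if __name__ == "__main__":
    print("cyan   ", to_hex(mix_additive(BLUE, GREEN)))
    print("magenta", to_hex(mix_additive(RED, BLUE)))
    print("yellow ", to_hex(mix_additive(RED, GREEN)))
    print("white  ", to_hex(mix_additive(RED, GREEN, BLUE)))
```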

The CMYK model defines cyan, magenta, yellow, and black as
main colors. This system is used to specify colors on the monitor
for printing. In printing, colors are mixed subtractively and,
using  red,  green,  and  blue,  it  would  not  be  possible  to  produce 
many  colors.  By  choosing  cyan,  magenta,  and  yellow  as  basic 
colors,  many  other  colors,  including  red,  green,  and  blue,  can 
be  produced.  Theoretically,  when  all  three  basic  colors  are 
printed  over  each  other,  the  resulting  color  should  be  black. 
In  practice,  this  is  not  the  case,  and  a  fourth  printing  process 
with  black  ink  is  also  used.  Values  for  CMYK  are  often  speci¬ 
fied  as  percentages. 
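
As a rough illustration of how RGB values relate to CMYK percentages, the following sketch applies the common device-independent conversion that extracts the black component first. It is an assumption for illustration only: real printing workflows use color-managed profiles, and the function name `rgb_to_cmyk` is hypothetical.

```python
def rgb_to_cmyk(r: int, g: int, b: int):
    """Naive conversion of 8-bit RGB to CMYK percentages (no color management)."""
    if (r, g, b) == (0, 0, 0):
        return (0.0, 0.0, 0.0, 100.0)  # pure black uses black ink only
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)            # black replaces the common dark part
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return tuple(round(value * 100, 1) for value in (c, m, y, k))


if __name__ == "__main__":
    print(rgb_to_cmyk(255, 0, 0))      # red is printed as magenta plus yellow
    print(rgb_to_cmyk(0, 0, 255))      # blue is printed as cyan plus magenta
    print(rgb_to_cmyk(255, 255, 255))  # white: no ink at all
```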


Brightness  and  Glare 

Brightness can be defined as the perceived amount of light that
comes from an object. It is a physiological interpretation of
luminance (L), an important photometric quantity. It is a measure
of the luminous intensity of light (I) emitted from a light
source per unit surface area (A) normal to the direction of the
light flux (Çakir, Hart, & Stewart, 1979). In this context, θ is the
angle between the perpendicular from the surface and the direction
of measurement (Boff & Lincoln, 1988):

$$L = \frac{I}{A} \cos\theta \qquad (11.1)$$

The unit of luminance is lumen per steradian per square
meter or candela per square meter (cd/m²). In some cases the
unit foot-lambert (fL) is used to describe luminance (1 fL =
3.426 cd/m²).

The variable I is the luminous intensity (also called “radiant
intensity”), which represents the emitted flux per unit solid
angle Ω:


can  be  expressed  using  the  following  equation,  in  which  the 
integration  is  performed  over  the  range  of  visible  wavelengths 
(Macadam,  1982): 

$$\Phi = \int_{\lambda = 380}^{\lambda = 770} K(\lambda)\, P(\lambda)\, d\lambda \qquad (11.3)$$
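
Equation 11.3 can be evaluated numerically once K(λ) and P(λ) are available. The sketch below uses a trapezoidal sum over 380 nm to 770 nm. Two things in it are assumptions rather than part of the text: K(λ) is modeled by a simple Gaussian approximation scaled to a peak of 683 lm/W (tabulated standard-observer data would be used in practice), and the function names are hypothetical.

```python
import math

K_MAX_LM_PER_W = 683.0  # peak photopic luminous efficacy, near 555 nm


def photopic_k(wavelength_nm: float) -> float:
    """Approximate K(lambda) in lm/W.

    Assumption: a Gaussian fit stands in for tabulated standard-observer data.
    """
    wavelength_um = wavelength_nm / 1000.0
    v = 1.019 * math.exp(-285.4 * (wavelength_um - 0.559) ** 2)
    return K_MAX_LM_PER_W * v


def luminous_flux(spectral_power, start_nm=380, stop_nm=770, step_nm=1):
    """Trapezoidal estimate of the integral of K(lambda) * P(lambda).

    spectral_power: function returning optical power in W/nm at a wavelength.
    The range (stop_nm - start_nm) is expected to be a multiple of step_nm.
    """
    wavelengths = list(range(start_nm, stop_nm + 1, step_nm))
    samples = [photopic_k(w) * spectral_power(w) for w in wavelengths]
    return step_nm * (sum(samples) - 0.5 * (samples[0] + samples[-1]))


if __name__ == "__main__":
    # Two hypothetical narrow-band sources with the same optical power.
    def green(w):
        return 0.001 if 550 <= w <= 560 else 0.0

    def blue(w):
        return 0.001 if 445 <= w <= 455 else 0.0

    print("green source:", luminous_flux(green), "lm")
    print("blue source: ", luminous_flux(blue), "lm")
```

The green source yields a far larger luminous flux than the blue one at equal optical power, which is the wavelength dependence that K(λ) expresses.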

The  human  eye  cannot  collect  all  of  the  light  that  is  radiated 
or reflected from the source. Brightness also depends on the size
of the surface spot from which the light emanates. Figure
11.7  shows  how  the  brightness  of  a  surface  is  actually  given  by 
the  luminous  flux  per  unit  of  the  projected  area  of  the  emitting 
surface  per  unit  solid  angle  depending  on  the  viewing  angle. 

Some  displays,  such  as  LCDs,  appear  dimmer  from  an 
oblique  angle  than  from  the  normal  viewing  angle;  whereas 
most  emissive  displays,  such  as  CRTs,  emit  light  in  such  a  way 
that  the  angular  luminous  intensity  approximately  follows  Lam¬ 
bert’s  cosine  law  (Lambertian  surface),  resulting  in  approxi¬ 
mately  constant  luminance  across  all  viewing  directions. 

High  luminance  levels  in  the  field  of  view  cause  glare  dis¬ 
comfort.  Glare  caused  by  light  sources  in  the  field  of  view  is 
called  “direct  glare”;  glare  caused  by  light  being  reflected  by  a 
surface  in  the  field  of  view  is  called  “reflected  glare”  (Fig.  11.8). 

Reflected  glare  can  occur  from  specular  (smooth,  polished, 
or  mirror-like)  surfaces,  spread  (brushed,  etched,  or  pebbled) 
surfaces,  diffuse  (flat  or  matt)  surfaces  or  as  a  combination  of 
the  above  three  (compound)  (Sanders  &  McCormick,  1993). 
Glare sources are more disturbing when they have higher luminance
and when they are closer to the fixation point (Sheedy,
2005). Experiments also show that visibility is decreased by
glare,  and  the  decrease  is  greatest  when  the  source  of  the  glare 
is  in  the  line  of  vision  (Boff  &  Lincoln,  1988). 

In order to avoid glare, it is advisable to position the display
at a right angle to the window (so that the line of vision is parallel to
the window). The display can be protected with curtains, blinds,
or movable walls. Lamps that can be reflected in the monitor
must not have a mean luminance of more than 200 cd/m², and
the maximum luminance must be less than 400 cd/m² according
to German standard DIN 5035-7 (Fig. 11.9).


$$I = \frac{d\Phi}{d\Omega} \qquad (11.2)$$


Luminous  intensity  is  measured  in  candela  (cd);  the  unit  is 
lumen  per  steradian. 

Brightness  of  a  display  depends  not  only  on  the  optical 
power  generated,  transmitted,  or  reflected  by  the  display  but 
also  on  the  response  of  the  human  eye  at  certain  wavelengths. 
The measure for this relation is photopic luminosity, K(λ),
which relates optical power in watts at a given wavelength to
its effect on the human visual system. This luminosity can be
considered as the optical spectral response of the eye of a “standard
observer” (Nelson & Wullert, 1997). The mathematical relation
of luminance to the spectral distribution of optical power P(λ)


FIGURE  11.7.  Definition  of  the  brightness  of  a  surface  as  a  func¬ 
tion  of  the  direction  from  which  the  surface  is  observed  (Nelson 
&  Wullert,  1997). 
dot or filter element, and dot pitch is defined as the center-to-center
distance between adjacent green phosphor or filter element
dots. Thus, the pixel density is the reciprocal of pixel pitch,
which is equal to the dot pitch.

There  is  a  simple  relationship  between  resolution,  in  pixel/ 
inch,  and  viewable  diagonal  screen  size  in  inches  for  different 
monitor  pixel  formats.  The  resolution  quality  levels  are  defined 
as  follows  (Castellano,  1992): 

•  Low  resolution:  <50  pixels/inch 

•  Medium  resolution:  51  to  70  pixels/inch 

•  High  resolution:  71  to  120  pixels/inch 

•  Ultrahigh  resolution:  >120  pixels/inch 
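
The relationship between pixel format, viewable diagonal, and these quality levels can be sketched as follows. The function names are hypothetical, and the handling of values that fall between the listed ranges (for example, exactly 50 or 70.5 pixels/inch) is an assumption, since the list leaves those boundaries open.

```python
import math


def pixels_per_inch(h_pixels: int, v_pixels: int, diagonal_in: float) -> float:
    """Pixel density along the diagonal for a given pixel format and screen size."""
    return math.hypot(h_pixels, v_pixels) / diagonal_in


def resolution_class(ppi: float) -> str:
    """Map a pixel density to the quality levels listed above."""
    if ppi <= 50:
        return "low"
    if ppi <= 70:
        return "medium"
    if ppi <= 120:
        return "high"
    return "ultrahigh"


if __name__ == "__main__":
    # Illustrative input: 1,024 x 768 pixels on a 15.8-inch viewable diagonal.
    ppi = pixels_per_inch(1024, 768, 15.8)
    print(f"{ppi:.1f} pixels/inch: {resolution_class(ppi)} resolution")
```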

The highest resolution (the maximum number of dots) that can be
represented on a CRT monitor can be calculated as follows:


FIGURE 11.9. Glare can be avoided by limiting lamp luminance
and by positioning the display correctly.
is the effective screen diagonal, where the picture can be
viewed. The angle α (angle between the horizontal and the
screen diagonal) can be calculated using knowledge about the
aspect ratio between the vertical (V) and horizontal (H):
As an example, for a 17" monitor (effective screen diagonal =
15.8 inches) with a horizontal stripe pitch of SP = 0.26 mm and
an aspect ratio V : H = 3 : 4 (common computer monitor), a
maximum of 1,235 dots in the horizontal or 926 dots in the
vertical can be presented. For our example, this means that with
a resolution of 1,024 × 768 on the graphics card, each dot that
has to be presented is provided with at least one phosphor
stripe. However, with a resolution of 1,280 × 1,024, not all dots
can be represented clearly.
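
The numbers in this example can be reproduced with a short calculation. The sketch assumes that the maximum number of dots along each axis is the screen extent along that axis divided by the stripe pitch, with the extents obtained from the effective diagonal and the angle α. The function name, the default 4 : 3 aspect ratio, and the use of the same pitch for the vertical count are assumptions chosen so that the result matches the figures quoted in the example.

```python
import math

MM_PER_INCH = 25.4


def max_crt_dots(diagonal_in: float, stripe_pitch_mm: float,
                 aspect_h: float = 4, aspect_v: float = 3):
    """Estimate the maximum number of dots (horizontal, vertical) on a CRT."""
    alpha = math.atan2(aspect_v, aspect_h)  # angle between horizontal and diagonal
    diagonal_mm = diagonal_in * MM_PER_INCH
    horizontal = diagonal_mm * math.cos(alpha) / stripe_pitch_mm
    vertical = diagonal_mm * math.sin(alpha) / stripe_pitch_mm
    return round(horizontal), round(vertical)


if __name__ == "__main__":
    h_dots, v_dots = max_crt_dots(15.8, 0.26)
    print(h_dots, v_dots)
    # A pixel format fits only if it does not exceed the available dots.
    for fmt in ((1024, 768), (1280, 1024)):
        fits = fmt[0] <= h_dots and fmt[1] <= v_dots
        print(fmt, "fits" if fits else "exceeds the available dots")
```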


Resolution 

The arrangement of pixels into horizontal rows and vertical
columns is defined as the pixel format. It is often referred to as
a “resolution.” However, this is not a resolution parameter by
itself. The resolution of a monitor mainly depends on its screen
diagonal and its dot pitch (stripe pitch, SP). When set to lower
resolutions, a pixel encompasses multiple dots. Thus, in defining
the resolution, pixel density, or the number of pixels per linear
distance (pixels per inch or pixels per centimeter), plays an
important role. This parameter indicates how close the pixels are.

As a picture element, the pixel is the basic unit of programmable
color on a display or in a digital image. In a color CRT, the
pixel is formed from a number of phosphor dots and may consist
of a number of triads that are composed of red, green, and
blue phosphor dots. The dot pitch in a CRT display with a
shadow mask is defined as the distance between the holes in
the shadow mask, measured in millimeters (mm). In desktop
monitors, common dot pitches are 0.31 mm, 0.28 mm, 0.27 mm,
0.26 mm, and 0.25 mm. In matrix-driven flat-panel displays, every
single pixel is composed of a red, a green, and a blue phosphor


Contrast  and  Gray  Scales 

Strictly speaking, contrast is not a physiologic unit, but it is
nevertheless one of the most important photometric quantities. It is
related to many visual performance criteria, such as visual acuity,
contrast sensitivity, speed of recognition, and so forth. There
are different definitions of contrast.

The Weber Contrast (luminance contrast) in accordance
with the International Commission on Illumination (CIE) is
defined as follows:

$$C_w = \frac{L_{object} - L_{background}}{L_{background}}$$

The  definition  implies  that  a  negative  luminance  contrast  is 
created  by  presenting  a  dark  object  on  a  bright  background  and 
a  positive  contrast  is  created  by  presenting  a  bright  object  on  a 
dark  background  (Ziefle,  2002).  This  formula  is  not  symmetric; 
if  the  luminance  of  the  object  is  much  greater  than  the  back¬ 
ground  luminance,  the  numerical  values  of  contrast  are  large 
and increase rapidly. If the background luminance is much
greater than the object luminance, the luminance contrast tends
to move asymptotically to the value −1.

The  Michelson  Contrast  (also  called  “modulation  con¬ 
trast,”  “depth  of  modulation,”  or  “relative  contrast  about  the 
mean  luminance”)  is  generally  used  for  periodic  stimuli  that  de¬ 
viate  symmetrically  above  and  below  a  mean  luminance  value, 
for  example,  gratings  or  bar  patterns,  and  is  computed  as  fol¬ 
lows  (Boff  &  Lincoln,  1988): 


$$C_m = \frac{L_{max} - L_{min}}{L_{max} + L_{min}} \qquad (11.7)$$

$L_{max}$ and $L_{min}$ are the maximum and minimum luminance in the
pattern.  The  Michelson  Contrast  will  take  on  a  value  between  0 
and  1.  The  minimum  modulation  threshold,  which  means  the 
smallest  brightness  modulation  that  can  be  detected,  occurs  at 
a  brightness  modulation  of  approximately  0.003  at  about  3  cycles 
per  degree  of  the  visual  field  (Nelson  &  Wullert,  1997). 

Sanders  and  McCormick  (1993)  specified  another  popular 
possibility  for  defining  contrast  (called  luminous  contrast)  as 
the  difference  between  maximum  and  minimum  luminance  in 
relationship  to  maximum  luminance: 


$$C_l = \frac{L_{max} - L_{min}}{L_{max}} \qquad (11.8)$$


flected off the display’s surface (for a reflective type LCD). In
multiplexed  LCDs,  the  contrast  is  affected  by  the  viewing  angle. 
Therefore,  the  contrast  should  be  indicated  by  referring  to  the 
solid  angle,  known  as  the  “viewing  cone”  (Castellano,  1992). 

Large  contrast  ratios  are  also  necessary  in  order  to  satisfy 
gray-scale  requirements.  Based  on  the  idea  of  the  brightest 
areas  being  white  and  the  darkest  areas  being  black,  levels  of 
brightness  in  between  the  two  extremes  are  referred  to  as  gray 
levels  or  gray  shades,  and  the  ability  to  display  them  as  gray 
scale  (Nelson  &  Wullert,  1997).  Technically,  gray  scale  is  a  term 
that  should  be  applied  only  to  monochrome  or  “gray”  displays. 
The  term  is  now  often  applied  to  color  displays  where  interme¬ 
diate  brightness  controls  are  required  by  the  system  (Castellano, 
1992).  The  number  of  gray  scales  is  determined  both  by  the 
contrast level and the ability of the human visual system to distinguish
between the different brightness levels. Our visual system
reacts to changes in brightness level as a logarithmic
function; therefore, very small differences in brightness might
not be perceived (Theis, 1999). The acceptable ratio of brightness
levels between adjacent scales is 1.414 (the square root of 2). In
order to obtain five levels of gray scale above background, a display
must have a contrast ratio of at least 5.6 : 1 (1, 1.41, 2, 2.82,
4, and 5.65 : 1) (Castellano). Full-color displays have about 128 to
256  linear  gray  levels.  The  number  of  gray  shades  (G)  that  can 
be  displayed  can  be  defined  as  a  logarithmic  function  based  on 
contrast  ratio  (Nelson  &  Wullert): 


For  a  simple  luminance  increment  or  decrement  relative  to 
background  luminance  (such  as  a  single  point),  the  contrast 
ratio  is  used  (Boff  &  Lincoln,  1988).  This  simple  ratio  of  two  lev¬ 
els  of  luminances  is  widely  used  in  engineering  (Bosman,  1989): 

$$C_r = \frac{L_{object}}{L_{background}} \qquad (11.9)$$

These contrast measures can be converted from one to another.
For example, given the contrast ratio $C_r$ and knowledge
of positive contrast conditions ($L_{max} = L_{object}$; $L_{min} = L_{background}$),
the other visual contrasts can be calculated as follows:

Weber Contrast:

$$C_w = C_r - 1 \qquad (11.10)$$

Michelson Contrast:

$$C_m = \frac{C_r - 1}{C_r + 1} \qquad (11.11)$$

Luminous Contrast:

$$C_l = \frac{C_r - 1}{C_r} \qquad (11.12)$$
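
The contrast definitions and their conversions from the contrast ratio can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and positive contrast conditions (object brighter than background) are assumed, so that the maximum luminance is the object luminance and the minimum is the background luminance.

```python
import math


def weber(l_object: float, l_background: float) -> float:
    """Weber (luminance) contrast."""
    return (l_object - l_background) / l_background


def michelson(l_max: float, l_min: float) -> float:
    """Michelson (modulation) contrast."""
    return (l_max - l_min) / (l_max + l_min)


def luminous(l_max: float, l_min: float) -> float:
    """Luminous contrast: difference relative to the maximum luminance."""
    return (l_max - l_min) / l_max


def from_contrast_ratio(ratio: float) -> dict:
    """Derive the other contrast measures from the contrast ratio alone."""
    return {
        "weber": ratio - 1,
        "michelson": (ratio - 1) / (ratio + 1),
        "luminous": (ratio - 1) / ratio,
    }


if __name__ == "__main__":
    l_obj, l_bg = 150.0, 50.0  # cd/m^2, illustrative values
    derived = from_contrast_ratio(l_obj / l_bg)
    assert math.isclose(derived["weber"], weber(l_obj, l_bg))
    assert math.isclose(derived["michelson"], michelson(l_obj, l_bg))
    assert math.isclose(derived["luminous"], luminous(l_obj, l_bg))
    print(derived)
```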

In  a  computer  display,  the  contrast  ratio  of  symbols  to  back¬ 
ground  has  to  be  more  than  3:1.  The  signs  have  to  be  repre¬ 
sented  sharply  up  to  the  edges  of  the  screen.  Modern  LCDs 
have  a  contrast  ratio  of  500 : 1  or  more.  Since  LCDs  do  not  emit 
light,  the  luminance  in  the  previous  equation  for  contrast  refers 
to  the  luminance  of  light  either  passing  through  the  display  (for 
a  backlit  transmissive  type)  or  the  luminance  of  the  light  re- 


$$G = 1 + \frac{\log\left(L_{max} / L_{min}\right)}{\log\sqrt{2}} \qquad (11.13)$$
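
The logarithmic relation between contrast ratio and the number of gray shades can be sketched as follows. The sketch assumes that adjacent gray levels differ by a factor of the square root of 2, as stated in the text, and counts the background as the first level; the function name `gray_shades` is hypothetical.

```python
import math


def gray_shades(contrast_ratio: float, step: float = math.sqrt(2)) -> float:
    """Number of gray shades (including background) for a given contrast ratio."""
    if contrast_ratio < 1:
        raise ValueError("contrast_ratio must be at least 1")
    return 1 + math.log(contrast_ratio) / math.log(step)


if __name__ == "__main__":
    # A ratio of about 5.65 : 1 gives background plus five levels above it.
    print(round(gray_shades(5.65), 2))
    # Levels above background for a 500 : 1 display under the same assumption.
    print(int(gray_shades(500)) - 1)
```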


Refresh  Rates 

The number of times that the image on the display is drawn per
second is called the display refresh rate (also called frame rate
or vertical scanning rate). The unit of measurement is hertz
(Hz). A high display refresh rate prevents a flickering image, because
there is only a small amount of time between two successive
stimulations of a single dot. A refresh rate of between 70 and
80 Hz is needed to ensure a flicker-free image on the display
(the flicker is not perceivable) (Bosman, 1989).

In CRT technology, the phosphor dots (phosphor
stripes) are stimulated by the electron beam. Because they light
up for only a fraction of a second, stimulation must occur several
times a second. The electron beam passes line by line, writing
from left to right and then returning to the beginning
of the following line. Image build-up begins in the upper-left
corner and proceeds line by line until each dot has been stimulated
once. Then the electron beam goes back to the upper-left corner
and begins to build up the picture again.

The line frequency (or horizontal frequency, measured in
kHz) is another characteristic of CRT displays. It measures the
number of lines the electron beam can draw per second. In this
case, line frequency, display refresh rate, and resolution are directly
connected with each other. A monitor with a display
refresh rate of 70 Hz and a resolution of 1,024 × 768 needs to
have an electron beam that is capable of drawing 70 × 768 lines =
53,760 lines per second. This means that the monitor has to
process a line frequency of at least 53.76 kHz. If one wants to
use a higher resolution, it is necessary to check whether the
monitor is capable of processing such a line frequency; otherwise
damage might occur.
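
The connection between refresh rate, number of lines, and line frequency is a simple product, as the following sketch shows. Like the text, it counts only visible lines and ignores any additional blanking lines; the function names are hypothetical.

```python
def required_line_frequency_khz(refresh_hz: float, visible_lines: int) -> float:
    """Minimum line frequency (kHz) needed to draw all lines at the refresh rate."""
    return refresh_hz * visible_lines / 1000.0


def max_refresh_hz(line_frequency_khz: float, visible_lines: int) -> float:
    """Highest refresh rate a monitor can reach at a given number of lines."""
    return line_frequency_khz * 1000.0 / visible_lines


if __name__ == "__main__":
    print(required_line_frequency_khz(70, 768), "kHz")   # 1,024 x 768 at 70 Hz
    print(required_line_frequency_khz(70, 1024), "kHz")  # 1,280 x 1,024 at 70 Hz
    print(round(max_refresh_hz(53.76, 1024), 1), "Hz")   # same monitor, more lines
```

With the line frequency fixed, a higher resolution lowers the achievable refresh rate, which is why the check described above is necessary.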

In addition to the frequencies described, the image representation
mode can also lead to poor-quality representation.
There are two image representation modes: interlaced and
non-interlaced mode. With the interlaced mode, the image formation
is divided into two half images. Here, the electron beam
first builds all odd-numbered lines and then all even-numbered lines.

This is the common process for televisions (e.g., Phase Alternating
Line, PAL); 50 half-images per second are normal, which
means a vertical scanning frequency of 50 Hz but a full-screen
frequency of only 25 Hz. Nowadays, monitors with a non-interlaced
mode are commonly used. In this case, the electron beam
builds all of the lines one after another, without exception.
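
The difference between the two modes comes down to the order in which lines are written. A minimal sketch, with hypothetical function names and lines numbered from 1 at the top of the screen:

```python
def interlaced_order(total_lines: int) -> list:
    """Line order for interlaced mode: first half image, then second half image."""
    first_half = list(range(1, total_lines + 1, 2))
    second_half = list(range(2, total_lines + 1, 2))
    return first_half + second_half


def non_interlaced_order(total_lines: int) -> list:
    """Line order for non-interlaced mode: every line in sequence."""
    return list(range(1, total_lines + 1))


def full_frame_rate(half_images_per_second: float) -> float:
    """Full-screen frequency when two half images make up one complete image."""
    return half_images_per_second / 2


if __name__ == "__main__":
    print(interlaced_order(6))
    print(non_interlaced_order(6))
    print(full_frame_rate(50), "Hz")  # 50 half images per second
```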

In  general,  active  or  light  emitting  displays,  such  as  CRTs,  can 
be  written  more  quickly  than  passive  or  reflecting  displays,  such 
as  LCDs.  The  higher  speed  is  usually  related  to  increased  power 
consumption.  Multiplexed  LCDs  have  refresh  rates  of  about  40 
to 50 frames per second. The physics of the display is such that
it takes some time to turn on any pixel or line. Recent advances
in LCD technology have reduced response time to about
8 to 20 ms.


TECHNOLOGIES 


Classification  of  Electronic  Information 
Display  Technologies 

Direct-view displays. These are video displays in which
the light produced by a display device is viewed directly without
first bouncing off a screen. All CRT, LCD, and plasma televisions and
computer monitors are direct-view displays. These displays tend
to work best in bright-light conditions and have greater light
output than projection displays.

Projection  displays.  Unlike  a  direct-view  system,  a  pro¬ 
jection  display  relies  on  the  projection  of  an  image  onto  a 
screen.  There  are  front-  and  rear-projection  systems,  which 
mainly  differ  with  regard  to  screen  technology.  The  front  pro¬ 
jection  utilizes  a  reflective  screen  surface  while  a  rear  projection 
uses  a  transmissive  screen  surface.  Projection  displays  work  best 
in dimly lit environments. In particular, a front-projection setup
requires a darkened room for optimum viewing quality.

Off-screen  display  systems.  These  display  systems  do 
not  utilize  any  projection  screen.  Instead,  a  natural  medium,  like 
simple  glass  or  even  the  retina,  can  be  used  for  image  projec¬ 
tion.  Off-screen  systems  are  based  either  on  coherent  or  non¬ 
coherent  light  emission.  Coherence  is  a  property  of  waves  that 
measures  the  ability  of  the  waves  to  interfere  with  each  other. 
Usually,  laser  light  has  much  higher  coherence  than  nonlaser 
light.  VRDs  (see  section  “Laser  Displays”)  and  3D  holographic 
head-up  displays  are  examples  of  off-screen  display  systems. 


Cathode  Ray  Tube 

The  CRT  is  a  cathodoluminescent  display:  light  is  generated  by 
exciting  a  luminescent  material  with  energetic  electrons.  A  CRT 
consists  of  a  glass  bulb,  a  cathode,  an  electron  gun,  deflection 
yoke,  a  mask,  and  a  phosphor  coating  (Fig.  11.11). 


FIGURE 11.10. Classification of electronic information technologies with high information
content (Theis, 1999). [Classification tree; labels in the figure: Electronic Information
Displays; CRT (shadow mask, beam index, monochrome); flat-panel display (emitter;
nonemitter/liquid crystal (LCD)); laser; non-coherent emission (head-up display); coherent
emission (hologram); active matrix and passive matrix LCDs (TFT, MOS, MIM, plasma
addressed, STN, FLC, PDLC, PSCT, others); cathodoluminescence (flat CRT, field emitter,
vacuum fluorescence); electroluminescence (EL) (AC thin film EL, organic materials, light
emitting diode); gas discharge (plasma panel) (AC matrix, DC matrix).]

Abbreviations: CRT, cathode-ray tube; DMD, digital mirror device; MOS, metal oxide
semiconductor; MIM, metal-insulator-metal; PDLC, polymer dispersed liquid crystal;
PSCT, polymer stabilized cholesteric texture; STN, super twisted nematic; TFT, thin
film transistor.

FIGURE 11.11. Major parts and components of a cathode ray tube. In this example shadow
mask technology is depicted (cf. Hanson, 2003).


An  electron  gun  located  in  the  back  of  the  device  emits  nega¬ 
tively  charged  electrons,  which  are  attracted  and  accelerated  by 
an  anode  that  is  located  in  front  of  the  screen.  The  electron  beam 
is  diverted  by  an  electromagnetic  field,  built  up  by  the  deflection 
coils, and thus directed toward the screen. Electrons are extracted
from the cathode by thermal emission from low-surface-potential
materials (typically metallic oxides). The electron beam
generated at the cathode is then accelerated, deflected, and focused
by a series of electrostatic lenses and deflection coils.

A screen mask is attached in front of the ground-glass plate
so that the electron beam is focused and then steered onto the
phosphor layer deposited on the front surface. As a consequence,
the outside areas of the beam are prevented from
mistakenly hitting adjoining phosphor dots, which would lead
to a blurred representation and chromatic distortion. Display-screen
masks are either dot-mask screens, Trinitrons, or slot-mask
screens.

Dot-mask  screen.  As  the  name  suggests,  a  thin  metal  or 
ceramic  screen  with  multiple  holes  is  used  in  a  dot  screen  (Fig. 
11.12  (a)).  The  shadow-mask  technique  is  applied,  in  which 
three  electron  beams  pass  through  the  holes  and  focus  on  a  sin¬ 
gle  point  on  the  tube’s  phosphor  surface.  Thus,  the  other  dis¬ 
play  screen  dots  are  shaded.  The  electron  guns  are  arranged  in 
the  form  of  a  triangle  (delta  gun),  which  requires  a  high  amount 
of  adjustment  as  insufficient  adjustment  will  cause  color  defects. 
The measurement of the dot pitches of dot-type screens is
taken diagonally between adjoining dots of the same color. Because
of this diagonal measurement, they cannot be easily
compared with the horizontally measured distance between the
stripes of a Trinitron or a slot-mask screen (Blankenbach, 1999).

Slit mask (Trinitron technology). Monitors based on the
Trinitron technology use an aperture grid instead of a shadow
mask (Fig. 11.12 (b)). The phosphor surface does not consist of
colored dots but instead consists of a multitude of tiny vertical
phosphor stripes. Their arrangement is alternating, similar to the
dot mask. The electron guns are on a single line, which makes
adjustment easier. Instead of a dot-type screen, the Trinitron has
vertically taut wires. In contrast to the dot-type screen, the equivalent
of the dot pitch, the SP of these monitors, is measured by the
horizontal distance between wires. Slit masks are relatively insensitive
to the warmth that develops during use, because they do not
bend, but merely change their length. They are very sensitive to
mechanical load (i.e., vibration). Depending on the size of the
monitor, either one or two horizontal holding wires are used for
stabilization. These wires are recognizable (i.e., on the Trinitron
they are visible as small gray lines in the top and bottom third).

Slot-mask screen. This screen technology (Fig. 11.12 (c))
also has tiny phosphor stripes, similar to those of monitors
based on Trinitron technology. The openings in the shadow
mask are executed as slots (Precht, Meier, & Kleinlein, 1997).

CRT is a mature, well-understood technology, which is found
in color and monochrome TV screens, projection TVs, vehicle
displays, aircraft cockpit displays, marine instruments, VDT terminals,
communication equipment, medical devices, military
systems, etc. (Castellano, 1992). It offers high information content
at a low cost and is capable of displaying a large color gamut
(>256 colors readily available). It is capable of high resolution,
and high-pixel-count displays are readily available. Direct-view
displays can be made in diagonal screen sizes of up to 40 inches;
projection systems using smaller tubes can be made for much
larger screen viewing (Castellano). Other advantages of CRTs are
an intrinsically high gray scale, good temporal dynamics, and
consistent brightness for nearly all viewing angles (Candry,
2003). On the negative side, the color CRT is large and heavy,
power consumption and operating voltage are high, and vibration
robustness is low. Detrimental imaging characteristics include
poor ambient contrast due to diffuse reflection from the
phosphor surface, a propensity toward flickering, geometric distortion,
and a correlation between spot size and image luminance
(Candry). The operation time of CRT displays is limited to about
10,000 hours of continuous operation. The main representation
errors that occur in CRT displays are linearity error, convergence
error, and moire.

FIGURE 11.12. Dot-mask (a), slit mask (b) and slot mask (c) arrangement.

Linearity  error.  Geometric  elements  and  letters  should 
be  represented  at  the  same  size  everywhere  on  the  screen.  If 
displays  are  only  adjustable  vertically,  problems  can  arise.  A  de¬ 
flection  error  can  cause  the  marginal,  equispaced  raster  ele¬ 
ments  to  disperse.  Therefore,  the  figures’  widths  change  con¬ 
stantly  as  they  move  from  the  middle  of  the  screen. 

Convergence  error.  Color  representation  is  obtained  by 
three electron beams and three phosphor layers, one of each
for  the  colors  red,  green,  and  blue.  Each  electron  beam  hits  one 
layer.  Congruent  dots  of  the  layers  build  a  pixel  that  can  be  seen 
in  any  color,  depending  on  the  intensity  of  the  beams.  If  the 
electron  beams  are  not  adjusted  exactly,  misconvergence  is  the 
consequence.  “Ghostings”  of  the  removed  color  arise,  causing  a 
blurred  and  distorted  image. 

Moire.  Certain  color  combinations  and  pixel  arrange¬ 
ments  can  cause  interference  called  moire.  The  interference  can 
result  when  a  mask  is  deposited  imprecisely,  and  electrons  pass 
through  the  mask  imprecisely.  As  a  result,  streaks  are  seen  on 
the  screen. 

Nonemitter/Liquid-Crystal  Display 

Liquid  Crystal  Displays  (LCD)  have  become  increasingly  viable 
in  recent  years  and  now  tend  to  surpass  CRTs.  Early  commer¬ 
cial  developments  of  LCDs  concentrated  on  small  numeric  and 
alphanumeric  displays,  which  rapidly  replaced  LEDs  and  other 
technologies in applications such as digital watches and calculators
(Bosman, 1989). Now there are even more complex displays
for use in many applications, such as laptops, handheld
computers, flat-panel displays (FPDs), HMDs, and miniature
televisions (e.g., those in airplane seats). LCDs have the following
advantages:

•  Power  consumption  is  low. 

•  They  operate  at  low  voltages. 

•  Lifetime  in  normal  environments  is  very  long. 


•  Displays  may  be  viewed  either  directly  in  transmission  or  re¬ 
flection  or  may  be  projected  onto  large  screens. 

LCDs  consist  of  two  glass  plates  with  microscopic  lines  or 
grooves  on  their  inner  surfaces  and  a  liquid  crystal  layer  between 
them.  Liquid  crystal  materials  do  not  emit  light,  so  external  or 
back  illumination  must  be  provided.  The  physical  principle  is 
based  on  the  anisotropic  material  qualities  of  liquid  crystals. 
When  substances  are  in  an  odd  state  that  is  somewhat  like  a  liquid 
and  somewhat  like  a  solid,  their  molecules  tend  to  point  in  the 
same  direction,  like  the  molecules  in  a  solid,  but  can  also  move 
around  to  different  positions,  like  the  molecules  in  a  liquid.  This 
means  that  liquid  crystals  are  neither  a  solid  nor  a  liquid  but  are 
closer  to  a  liquid  state.  LCDs  operate  by  electrically  modulating 
the  anisotropy  between  optical  states  in  order  to  produce  visible 
contrast.  In  an  electric  field,  liquid  crystals  change  their  alignment 
and therefore their translucence. If no voltage is applied, light can
pass through and the pixels appear bright. When voltage is applied,
the pixels become dark (Matschulat, 1999). The light to be
modulated may originate either from ambient light or from an
additional bright light source placed behind the LCD (Theis, 1999).

The two principal flat panel technologies are the passive matrix
LCD and the active matrix LCD. Passive matrix addressing is
used in twisted nematic (TN) LCDs. In TN LCDs the microscopic
lines of the glass plates are arranged orthogonally to
each other, and the glass plates serve as polarizers (Precht et al.,
1997). Their directions of translucence lie at right angles on top
of one another, so that without the liquid crystal no light could
pass through. Because of the fine grooves on the inner surface of
the two glass panels (arranged vertically on one panel and
horizontally on the other), the liquid crystal is held between them
and can be encouraged to form neat spiral chains. These chains can
alter the polarization of light. In the so-called nematic phase, the
major axes of the crystal’s molecules tend to be parallel to each other.

Nonpolarized light from background illumination can pass
the polarization filter with just one plane of polarization. It is
twisted about 90° along the helix and can thus pass through the
second polarization layer. The display appears to be bright when
there is no electric current. By applying an electric current to
twisted nematics, they untwist and straighten, changing the angle
of the light passing through them so that it no longer
matches the angle of the top polarizing filter. Consequently, no
light passes through that portion of the LCD, which becomes
darker than the surrounding areas, and the pixel appears black.
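
The bright and dark states described above can be illustrated with an idealized model. The sketch below assumes perfect polarizers crossed at 90° and treats the liquid crystal as a pure polarization rotator, so that Malus's law gives the transmitted fraction. The function name `tn_transmission` and the intermediate rotation angles are illustrative assumptions and do not come from the cited sources; real cells deviate from this simple picture.

```python
import math


def tn_transmission(rotation_deg: float, analyzer_deg: float = 90.0) -> float:
    """Fraction of already-polarized light that passes the second polarizer.

    Idealized assumption: the liquid crystal only rotates the plane of
    polarization by rotation_deg, and the second polarizer (analyzer) is
    oriented at analyzer_deg relative to the first. Malus's law then gives
    cos^2 of the remaining mismatch angle.
    """
    mismatch = math.radians(analyzer_deg - rotation_deg)
    return math.cos(mismatch) ** 2


# 90 degrees: field off, full twist, bright pixel.
# 0 degrees: field on, helix untwisted, dark pixel.
for rotation in (90, 60, 30, 0):
    print(f"rotation {rotation:2d} deg -> transmission {tn_transmission(rotation):.2f}")
```

In this simplified model, intermediate rotation angles yield intermediate transmission, which is one way to picture how partially untwisted states can produce gray levels.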

FIGURE 11.13. Principle of operation of a twisted nematic LCD.


By applying different electric currents, gray scales can be
produced with LCD technology. One disadvantage of this is
the relatively low contrast, but this can be improved by using
liquid crystals with a steep electro-optic characteristic curve.
A low voltage is then sufficient to change the translucence. Such a
characteristic is obtained by twisting the liquid crystals by more
than 90 degrees. Such displays are called super-twisted nematic
(STN); variants include double super-twisted nematic (DSTN) and
triple super-twisted nematic (TSTN) displays (Schadt, 1996).

In passive matrix LCDs, the electric field extends beyond the pixel
to be addressed along the entire horizontal and vertical electrodes,
which results in disturbing stripes, or “ghosting.” Other
disadvantages are slow response times and a prelightening of the pixels.

Active matrix (AM) addressing places an electronic switch at
each pixel of an LCD, thus controlling the charging of the pixel
capacitor up to the voltage corresponding to the desired gray
shade, and then holding this voltage until the next video information
is written in. The available switches are thin-film transistors
(TFTs), which act like diodes (Lüder, 2003). In order to display a
full range of colors, each element on an LCD is assigned to a
primary color by a special filter, which is deposited on the faceplate
of the LCD. To guarantee high display contrast, brightness,
and high quality of color representation, many transistors are
needed. For a resolution of 1,024 × 768 pixels, 2.36 million
transistors are required, one for each primary-color subpixel.
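
The transistor count quoted above follows directly from the resolution. As a minimal sketch, assuming one transistor per subpixel and three subpixels (red, green, blue) per pixel, the arithmetic can be checked as follows; the function name `tft_count` is purely illustrative.

```python
def tft_count(width_px: int, height_px: int, subpixels_per_pixel: int = 3) -> int:
    """Number of thin-film transistors when each color subpixel has one switch."""
    return width_px * height_px * subpixels_per_pixel


xga = tft_count(1024, 768)
print(f"{xga:,} transistors (about {xga / 1e6:.2f} million)")
```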

The  traditional  representation  errors  caused  by  an  incorrect 
deflection  do  not  occur  because  of  the  fixed  matrix,  but  defec¬ 
tive  transistors  can  cause  errors,  leading  to  permanently  bright 
or  dark  pixels.  These  pixels  are  distracting,  particularly  when 
all  three  transistors  of  an  elementary  cell  are  switched  to  bright, 
which causes a white pixel to appear on the screen. Furthermore,
the representation of moving pictures is restricted. The
response latency of the liquid crystals is about 20 to 30 ms. Fast
sequences (e.g., scrolling) are blurred.
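
To see why such latencies blur moving content, it helps to compare them with the duration of a single frame. The sketch below assumes a 60 Hz refresh rate, which is an assumption for illustration and not stated in the text at this point; `frames_spanned` is a hypothetical helper name.

```python
def frames_spanned(response_ms: float, refresh_hz: float = 60.0) -> float:
    """How many frame periods one liquid-crystal transition lasts."""
    frame_period_ms = 1000.0 / refresh_hz
    return response_ms / frame_period_ms


for latency_ms in (20, 30):
    print(f"{latency_ms} ms response spans {frames_spanned(latency_ms):.1f} frames at 60 Hz")
```

Under this assumption a transition takes longer than one frame, so a pixel has not settled before the next image arrives.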


LCD monitors with a standard video graphics array (VGA) plug
have to convert the analog VGA signal back into a digital signal. If
an unsuitable A/D converter is used (e.g., an 18-bit A/D converter), the
color representation can be affected, although the LCD monitor
is actually capable of representing true color (24 bit).
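
The difference between 18-bit and 24-bit conversion can be quantified with a short calculation. This is a minimal sketch that assumes the bits are split evenly across the three color channels; the function names are illustrative.

```python
def representable_colors(bits_per_pixel: int) -> int:
    """Total number of distinct colors for a given color depth."""
    return 2 ** bits_per_pixel


def levels_per_channel(bits_per_pixel: int, channels: int = 3) -> int:
    """Intensity steps per channel, assuming an even split of the bits."""
    return 2 ** (bits_per_pixel // channels)


for depth in (18, 24):
    print(f"{depth} bit: {representable_colors(depth):,} colors, "
          f"{levels_per_channel(depth)} levels per channel")
```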

CRTs  easily  represent  black;  the  cathode  simply  does  not 
emit  electrons.  LCDs,  however,  have  to  completely  block  out 
the  light  from  the  back  light.  Technically,  this  is  impossible,  and 
as  a  consequence  the  contrast  is  reduced. 

When an LCD is viewed from an angled position, it
appears darker and color representation is distorted (see section
“Brightness and Glare”). Recently developed technologies,
such as in-plane switching (IPS), multidomain vertical alignment
(MVA) and thin-film transistors (TFT) have improved the width
of the viewing angle. The welcome effect of the restricted viewing
angle for privacy purposes, as in the case of automated teller
machines (ATMs), is worth mentioning here.

The nominal screen diagonal of an LCD is equivalent to the effective
screen diagonal. In contrast, the effective (visible) screen
diagonal of a CRT is smaller than its nominal diagonal.


Plasma  Displays 

The  oldest  electro-optical  phenomenon  able  to  produce  light 
is  an  electrical  discharge  in  gas  (Bosman,  1989).  Millions  of  years 
elapsed  before  this  effect  was  identified,  analyzed,  and  mastered 
by  humans.  The  first  attempts  to  produce  a  matrix  display  panel 
were  made  in  1954.  Since  then,  research  has  continued  and  a 
host  of  approaches  have  evolved  (Bosman).  The  beauty  of  this 
technique  is  that,  unlike  front-view  projection  screens,  one  does 
not  have  to  turn  off  the  lights  to  see  the  image  clearly  and  easily. 
Therefore,  plasmas  are  excellent  for  video  conferencing  and 
other  presentation  needs  (Pioneer,  2001). 

In plasma technology, two glass plates are laid with their parallel
thin conducting paths at right angles to one another (Precht
et al., 1997). The gap between the plates is evacuated and filled
with a gas mixture (see Fig. 11.14). If sufficiently high voltage is
applied to the cross point of two orthogonal conducting paths,
the gas ionizes and begins to shine (like a lot of small gas-discharge
lamps). The inside of one glass plate is coated with a
phosphor layer, which is, according to the fundamental colors,
composed of three different kinds of phosphor. There is a
wired matrix below the phosphor layer to trigger the plasma
display panel (PDP). The space between the glass plates is divided
into gas-filled chambers. When voltage is applied to the wired
matrix on the bottom of the display, the gas is transformed into a
plasmatic state and emits ultraviolet radiation, causing the
phosphor to glow.

The advantage of this technology is the flat construction of
large screens that perform extraordinarily well under most ambient
light conditions. For example, even very bright light does
not wash out the image on the screen. Another characteristic
of a plasma panel is the extreme viewing angles both vertically
and horizontally. With a 160° viewing angle, people sitting to the
side of the screen will still be able to see the image without losing
any information (Pioneer, 2001).

Plasma screens do have disadvantages. First, they consume
a large quantity of power, making this technology unsuitable for
battery-operated and portable computers. Further, high voltage
is required to ignite the plasma, and the phosphor layers degrade
over time. Finally, the pixels are oversized, so the user
must be situated at a distance from the display.

FIGURE 11.14. Major parts and components of a plasma display
panel.

Electroluminescent  Displays 

Electroluminescence (EL) refers to the generation of light by
nonthermal means in response to an applied electric field
(Nelson & Wullert, 1997). In early EL displays, the
phosphors were used in powder form as in CRTs. Current ELDs
use a thin film of phosphorescent substance (yellow-emitting
ZnS:Mn), which is sandwiched between two conducting electrodes.
The top electrode is transparent. One of the electrodes
is coated with vertical wires and the other with horizontal wires,
forming a grid. When electrical current is passed through a horizontal
and a vertical wire, electrons are accelerated and pass
through the phosphor layer. Light is emitted by the excited activator
atoms in the form of a pulse. The activators are transition
metal or rare-earth atoms. Once excited, these centers can decay
to the ground state through a radiating transition, and thus emit
light with their specific emission spectrum (Budin, 2003).
Electroluminescence can also be obtained from organics. Organic EL
devices generally consist of a hole-transport layer (HTL) and an
electron-transport layer (ETL) between electrodes. The radiating
recombination can be further optimized by introducing fluorescent
centers near the interface of the two layers. Such structures
are particularly useful in tuning EL colors (Theis, 1999).


High-quality color presentation is the main concern with regard
to ELDs. A large number of laboratories have experimented
with many materials, activators, and full-color solutions (Ono,
1993). Color ELDs can be manufactured in different ways. One
of the approaches, which is quite common for emitters, is that
of additive-color synthesis, using three juxtaposed patterned
phosphors. This approach, however, suffers from a reduced
spatial-fill factor for each monochromatic emitter. The second
approach to full-color ELDs consists of using a single white-emitting
structure with patterned color filters. It results in a much
simpler manufacturing process. A hybrid solution, which is being
developed into a full-color commercial product at Planar
(Fig. 11.15), consists of stacking and registering several plates
(King, 1996). The first plate is a glass plate with the active structure
on its far side, and transparent electrodes on both sides of
a patterned ZnS:Mn (filtered to red) and ZnS:Tb (green) structure
similar to that described above. The second glass plate has
a fully transparent blue-emitting Ce:Ca thiogallate structure on
top of the plate. On both plates, row electrodes are reinforced
by a thin metal bus (Theis, 1999; Budin, 2003).

There are many different types of EL technology, among
which the alternating current thin film electroluminescent displays
(ACTFEL) are the most commonly used ELDs. Currently,
ELDs are manufactured in sizes ranging from 1 to 18 inches with
resolutions from 50 to 1,000 lines per inch (Rack et al., 1996).
ELD qualities of note are the sharp edges of the pixels, good
contrast (10:1 at 400 lux), the wide viewing angle, fast response
time, durability, shock resistance, operation at high and low
temperatures, and the smallest thickness and weight compared
to the other flat panel displays (Budin, 2003). The biggest
disadvantage of ELDs is their limited ability to be used in full-color
or large-area applications (Nelson & Wullert, 1997). ELDs are
primarily used for medical, industrial, and instrumentation
applications (Budin, 2003).

EL devices also include LEDs. LED technology is used in almost
every consumer electronic product on the market. The
operational principle of LEDs can be briefly described as follows:
with no voltage, or reversed voltage, applied across the
pn junction, an energy barrier prevents the flow of electrons
and holes. When a forward bias voltage (1.5 to 2 V) is applied
across the junction, the potential barrier height is reduced,
allowing electrons to be injected into the p region and holes into
the n region. The injected minority carriers recombine with carriers
of opposite sign, resulting in the emission of photons
(Castellano, 1992).

LEDs  as  well  as  some  types  of  ELD  can  also  be  used  as  a  back¬ 
light  for  LCD  applications. 

FIGURE 11.15. Schematic of (a) a Planar multicolor EL prototype, and (b) the first full
(saturated) color dual substrate EL panel (cf. King, 1994).

Cathodoluminescent Displays

The working principles of these displays are comparable to the
CRT; therefore, they are also referred to as flat CRTs. One of the
most successful flat CRT configurations is called vacuum fluorescent
display (VFD). Vacuum fluorescent displays are broadly
used in consumer electronic products and are recognizable by
their green or blue-green glow, which is due to the ZnO:Zn
phosphor (Nelson & Wullert, 1997). A VFD is essentially a triode with
arrays of grids and phosphor-coated anodes used for coincident
addressing. The grid structure is positioned over the anode and
the cathode filaments are stretched and held above the grid.
The entire structure is then sealed in an evacuated cell. The
cathode filament is a very fine wire that is heated to a temperature
just below incandescence (about 600°C). At that temperature
it remains virtually invisible but it emits electrons. The anode
is supplied with a voltage of between 10 V and 50 V and at
the same time the grid voltage is applied to the grid of that selected
segment. The electrons, which are emitted by the cathode
filaments, are controlled by the grids. When a grid is supplied with a
positive voltage, it attracts the negatively charged electrons and
lets them pass toward the anode, whereas a grid supplied with a
negative voltage repels them. The electrons collide with the
phosphor-coated anode, resulting in photon emission. The VFD
can be driven in a dynamic or static system. The dynamic system
is a type of multiplexing where a pulse is applied to the grid of
each digit or row and to the anode of each selected digit. The
static system uses a DC signal for each anode of each digit
(Castellano, 1992; Nelson & Wullert, 1997).
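
The dynamic (multiplexed) drive scheme can be sketched as a scan loop that activates one grid at a time while pulsing the anodes belonging to that digit. The example below is a schematic illustration only: the seven-segment encoding, the four-digit string, and the names `dynamic_drive` and `duty_cycle` are assumptions made for this sketch and are not taken from the cited sources.

```python
# Standard seven-segment encoding (segments a-g); an illustrative assumption,
# since the text does not specify a particular digit layout.
SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}


def dynamic_drive(digits: str):
    """Yield one time slot per digit: (active grid index, anodes to pulse)."""
    for position, digit in enumerate(digits):
        yield position, SEGMENTS[digit]


def duty_cycle(digit_count: int) -> float:
    """Fraction of each scan cycle during which a single digit is lit."""
    return 1.0 / digit_count


for grid, anodes in dynamic_drive("1992"):
    print(f"grid {grid} pulsed, anode segments {anodes}")
print(f"duty cycle per digit: {duty_cycle(4):.2f}")
```

In a static drive, by contrast, every anode would be held at its level continuously, so no scan loop and no duty-cycle reduction would be involved.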

One of the disadvantages of VFDs is that they cannot be
scaled to arbitrarily large sizes because the front and back plates
are only supported at the edges. Another problem is equalization
of the luminous efficiencies. The blue phosphors tend to be
very low in luminous efficiency and make the realization of a
full-color matrix VFD difficult (Castellano, 1992).

Because the thermionic emission used in VFDs seems to be
incompatible with an internal array of spacers, field emission
has been proposed as a source of electrons. Internal spacers can
be incorporated between emitters to allow field-emission displays
(FEDs) to be thinner, lighter, and potentially much larger
than VFDs (Nelson & Wullert, 1997). The basic principle of FEDs
is similar to that of VFDs. In FEDs, however, electrons are not
emitted by a cathode consisting of an oxide-coated heated
filament, as in VFDs, but by field emission
from a cold cathode.


Electrons  emitted  from  a  cathode  are  accelerated  toward  a 
screen,  where  their  energy  is  transformed  into  light  (Theis, 
1999)  (Fig.  11.16). 

In real devices, efficient electron emission is obtained from a
few thousand microtips per pixel formed from Mo or Si (Spindt-type,
with tip radii below 1 µm for high field strength), or by
replacing the microtips with planar diamond-like carbon films,
which are also suitable for electron emission.

The gate voltage in Spindt-type devices is about 50 V and the
dielectric thickness separating gate and cathode is about 1 µm.
The anode-cathode distance is 200 µm. A spacer technique is
used to avoid glass bending due to atmospheric pressure
(Theis, 1999; Budin, 2003).

The greatest FED challenges include finding low-voltage
color phosphors, simplifying processing, and increasing efficiency
(Nelson & Wullert, 1997). However, the technology is
attractive since it is relatively thin and it is possible to create a
Lambertian light with 180° viewing cones and good video
response times.

FIGURE 11.16. Principal structure of field emission displays (Theis, 1997).

Laser Displays

LASER is the acronym for Light Amplification by Stimulated
Emission of Radiation. Laser light is generated in the active
medium of the laser. Energy is pumped into the active medium
in an appropriate form and is partially transformed into radiation
energy (ILT, 2001). In contrast to thermal emitters, a laser
emits a concentrated and monochromatic light with high spatial
and temporal coherence (see Fig. 11.17).

FIGURE 11.17. Dispersion of light between thermal emitters and laser light.

In contrast to conventional light sources, laser light has rays
that are nearly parallel with each other (it has a small divergence),
is focusable down to a wavelength, has a high power
density, a monochromatic character (light of one wavelength),
and high spatial and temporal coherence (the same phase). These
characteristics can be used to generate an image analogous to
a conventional CRT system: the laser is directed line by line
across the display area. The line deflection, horizontal and vertical,
is achieved by the use of rotary or oscillating mirrors. Light
output from the diode-pumped solid-state lasers is modulated
according to the input signal, and the red, green, and blue light
is combined. This combined light is then raster-scanned onto
the screen to create an image. The laser projects the image
either onto a flat area or directly onto the retina. Figure 11.18
shows the principle of a retinal-scanning laser display.

A collimated low-power laser diode is normally used as a
modulated light source. The tiny laser beam is subsequently deflected
in u- and v-direction by two uni-axial scanners. The horizontal
scanner (u-direction) operates at several kHz. The vertical
scanner frequency (v-direction) defines the image refresh rate,
which must exceed the critical fusion frequency (CFF) of about
60 Hz to prevent flickering effects. Finally, viewing optics project
the laser beam through the center of the eye pupil onto the
retina. In the retinal-scanning display system the pixels are projected
serially in time onto the retina. As the image refresh rate is
above the temporal resolution limit of the human eye, the user
does not perceive any flickering effects (von Waldkirch, 2005).
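
The relation between refresh rate, line count, and scanner frequency can be estimated with simple arithmetic. The sketch below ignores retrace and blanking time, and the resolution of 640 × 480 is a hypothetical example rather than a figure from the cited work; the function names are likewise illustrative.

```python
def horizontal_scan_hz(lines_per_frame: int, refresh_hz: float = 60.0,
                       bidirectional: bool = False) -> float:
    """Horizontal scanner frequency needed to draw every line of each frame.

    With a bidirectional (back-and-forth) scan, one mirror oscillation
    draws two lines, so the mirror frequency is half the line rate.
    """
    line_rate = lines_per_frame * refresh_hz
    return line_rate / 2 if bidirectional else line_rate


def pixel_dwell_ns(pixels_per_line: int, lines_per_frame: int,
                   refresh_hz: float = 60.0) -> float:
    """Time available per pixel in nanoseconds, since pixels are drawn serially."""
    return 1e9 / (pixels_per_line * lines_per_frame * refresh_hz)


print(f"line rate: {horizontal_scan_hz(480) / 1000:.1f} kHz")
print(f"mirror frequency (bidirectional): {horizontal_scan_hz(480, bidirectional=True) / 1000:.1f} kHz")
print(f"time per pixel: {pixel_dwell_ns(640, 480):.1f} ns")
```

Because every pixel is written one after another, the time per pixel shrinks quickly as resolution or refresh rate increases, which is why the light source must be modulated very fast.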

FIGURE 11.18. Schematic illustration of the RSD principle (according to Menozzi et al., 2001).

Electronic Ink and Electronic Paper

For nearly 2,000 years, ink on paper was the most popular way
to display words and images. It still has many advantages compared
to computer displays with regard to text readability and
price. The biggest limitation of paper displays is that the printed
symbols cannot be changed or removed without leaving noticeable
marks. The invention of electronic ink has made it possible
to overcome this disadvantage. Electronic ink displays promise
to have paper-like properties. For example, electronic ink displays
are viewed in reflective light, have a wide viewing angle,
and are thin, flexible, and relatively inexpensive. Unlike paper, they
are electrically writeable and erasable. A big advantage that they
have over other displays is their very low power consumption,
possibly extending the battery life of devices with such displays
into months or even years. The principal components of electronic
ink (see E Ink Corporation, 2001) are tiny capsules, about
100 µm in diameter, filled with a clear fluid. Suspended in the fluid
are positively charged white particles and negatively charged
black particles. The particles will move toward an oppositely
charged electrode. Therefore, when a top transparent electrode
is negatively charged, the white particles move to the top and
become visible to the user. The black particles will, at the same
time, move to a positively charged bottom electrode and the
microcapsule appears white to the user. Similarly, the microcapsule
will appear black when an electric field of the opposite direction
is applied. The microcapsules are suspended in a
“carrier medium” that can be printed onto a sheet of plastic film
(or onto many other surfaces). The film is in turn laminated to a
layer of circuitry that forms a pattern of pixels that can be controlled
similarly to other displays.
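
The switching behavior of a microcapsule can be summarized in a few lines of code. This is a schematic sketch that only encodes the polarity convention given in the text (positively charged white particles, negatively charged black particles); the function names and the text rendering are illustrative assumptions.

```python
def capsule_appearance(top_electrode: str) -> str:
    """Color seen from above for a given polarity of the top electrode.

    Follows the polarity convention of the text: white particles are
    positively charged, black particles are negatively charged.
    """
    if top_electrode == "negative":
        return "white"  # positive white particles are drawn to the top
    if top_electrode == "positive":
        return "black"  # negative black particles are drawn to the top
    raise ValueError("top_electrode must be 'negative' or 'positive'")


def render(rows):
    """Render rows of '+'/'-' top-electrode polarities as text pixels."""
    glyph = {"white": ".", "black": "#"}
    polarity = {"+": "positive", "-": "negative"}
    return "\n".join(
        "".join(glyph[capsule_appearance(polarity[p])] for p in row)
        for row in rows
    )


print(render(["-+-", "+++", "-+-"]))
```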

An  invention  made  20  years  ago  at  Xerox  Palo  Alto  Research 
Center  forms  the  basis  of  Gyricon  reusable  paper,  which  has 
similar  advantages  to  electronic  ink.  It  consists  of  an  array  of 
small  balls  of  0.03  to  0.1  mm  in  diameter  embedded  in  oil-filled 
pockets  in  transparent  silicone  rubber.  Every  ball  is  half  white 
and  half  black  and  can  rotate  freely.  Each  ball  has  an  electric 
charge,  and  when  an  electric  field  is  applied  to  the  surface  of  the 
sheet,  the  balls  rotate  showing  either  their  black  or  white  side 
and  attach  to  the  wall  of  the  sheet.  They  remain  there  displaying 
the  intended  image  as  long  as  no  new  electric  field  is  applied 
(Gibbs,  1998). 

Unlike the first prototypes of electronic ink and electronic
paper, which could only display two colors, usually black and
white, photonic ink can display any color value in the spectrum.
Photonics (from photon) is an area of study based on the utilization
of radiant energy, such as light, for various applications.
Photonic ink, or P-Ink, consists of planar arrays of silica
microspheres (an opal film) embedded in a matrix of cross-linked
polyferrocenylsilane (PFS). Photonic ink is able to display color
through controlled Bragg diffraction of light. Varying the size of
the spaces between the particles creates different colors. A polymer
gel is filled between the stacked spheres. The gel swells
when soaked in a solvent and shrinks when it dries out, changing
the distance between the tiny spheres. Because the size of
the spaces determines the wavelength that will be reflected, the
swelling and drying of the gel results in a continuously tunable
display of color across the entire visible spectral range. The
amount of solvent absorbed by the gel is controlled by applying
an electrical voltage. The optical response time of the film to a
change in solvent is less than half a second (American Chemical
Society, 2003).
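
The dependence of the reflected color on the spacing can be illustrated with the Bragg condition. The sketch below uses the simplified form for normal incidence, wavelength = 2 × effective refractive index × plane spacing / order; the effective index of 1.4 and the spacings are hypothetical values chosen for illustration and are not data from the cited source.

```python
def bragg_wavelength_nm(plane_spacing_nm: float,
                        effective_index: float = 1.4,
                        order: int = 1) -> float:
    """Reflected wavelength for Bragg diffraction at normal incidence.

    Simplified model: the sphere lattice is treated as a stack of planes
    with the given spacing in a medium of uniform effective refractive index.
    """
    return 2.0 * effective_index * plane_spacing_nm / order


# Swelling the gel increases the spacing and shifts the reflection to longer wavelengths.
for spacing_nm in (160, 200, 230):
    print(f"spacing {spacing_nm} nm -> reflected wavelength "
          f"{bragg_wavelength_nm(spacing_nm):.0f} nm")
```

With these assumed values the reflection moves from the blue through the green to the red part of the spectrum as the spacing grows, which matches the qualitative behavior described in the text.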

Electronic paper is seen as a substitute for
paper products such as books and newspapers, as rewriteable
paper in the office, as a material for price tags and retail signs, as
a basis for wall-sized and foldable displays, and as an alternative
display for PDAs and mobile phones.


VISUAL  PERFORMANCE 


Trouble-free usage and user acceptance of any new technology
depend substantially on the quality of the visual display as a central
communication unit as well as the ease with which the displays
allow visual information to be processed. Thus, a careful
visual evaluation is indispensable in order to assess the efficiency
of visual performance and to identify existing shortcomings
of the displays.

Looking back, there is quite a long history of studies concerned
with the evaluation of visual displays (Dillon, 1992;
Schlick, Daude, Luczak, Weck, & Springer, 1997; Luczak &
Oehme, 2002; Pfendler & Schlick, in press). In order to estimate
the costs and benefits of electronic displays in terms of human
performance and visual load, the main interest was to learn
which factors affect visual performance and to what extent. One
research approach was to compare display types with respect to
the efficiency of visual processing. In most cases a basic comparison
of the traditional hard copy with different electronic displays
was performed (Heppner, Anderson, Farstrup, & Weidenmann,
1985; Gould, Alfaro, Barnes, Finn, Grischkowsky, &
Minuto, 1987). Another approach was to study the effects of
specific display factors on performance (Pfendler, Widdel, &
Schlick, 2005; Sheedy, Subbaram, & Hayes, 2003; Plainis & Murray,
2000; Ziefle, 1998). This procedure helps to distinguish different
sources that account for performance shifts. Moreover,
user characteristics (Bergqvist, Wolgast, Nilsson, & Voss, 1995;
Kothiya & Tettey, 2001; Ziefle, 2003a) and workplace settings
(Sommerich, Joines, & Psihogios, 2001; Ziefle, 2003b) as well as
individual body postures received attention (Aaras, Horgen,
Bjorset, Ro, Thoresen, & Larsen, 1997; Aaras, Horgen, Bjorset,
Ro, & Thoresen, 1998; Helander, Little, & Drury, 2000) in the
context of visual performance. Beyond display characteristics,
any visual evaluation should also consider the diverse task demands,
because they can strongly influence performance outcomes.
Their influence can be either positive, by compensating for
other suboptimal visual settings, or negative, when several negative
factors accumulate, possibly leading to visual complaints.

Task  Demands 

Task demands are of crucial importance for visual evaluations
because they represent a complex entity of different elements.
The combination of (a) the task type (what the user is requested
to do), (b) user characteristics (level of expertise, visual abilities,
or working motivation), (c) text factors (font size, line
pitch), (d) display factors (e.g., contrast, refresh rates, resolution),
and (e) surrounding factors (e.g., ambient lighting or time on
task) has a considerable impact on performance outcomes. As
these factors were found to interplay, it is necessary to examine
the particular conditions and settings that were used in the different
studies concerned with visual evaluations.

In  order  to  evaluate  visual  displays,  simple  detection  tasks, 
memory  and  recognition  tasks,  visual  search  tasks,  and  (proof) 
reading  were  used.  Overall,  two  basic  task  forms  can  be  differ¬ 
entiated  between  in  this  context:  tasks  in  which  a  semantic  con¬ 
text  is  present  (i.e.,  (proof)  reading)  and  tasks  with  no  semantic 
context  (i.e.,  visual  search).  This  distinction  has  implications 
with  respect  to  the  outcomes’  ecological  validity  and  generaliz- 
ability  as  well  as  to  the  sensitivity  with  which  shortcomings  of  vi¬ 
sual  displays  are  reflected.  Tasks  that  have  a  semantic  context 
have  the  general  advantage  of  simulating  what  users  mostly  do 
when  using  displays,  thus  the  evaluation  process  is  ecologically 
valid.  However,  these  tasks  were  found  not  to  be  very  sensitive 
to  visual  degradation  effects  (Stone,  Clark,  &  Slater,  1980;  Ziefle 
1998),  because  of  two  main  reasons.  First,  the  encoding  and 
processing  of  text  material  represents  behavior  gained  through 
intensive  training:  representing  a  top-down  process,  compre¬ 
hension  guides  the  reader  through  the  text  and  possibly  masks 
degradation  effects.  The  second  objection  is  concerned  with 
reading  strategies  involving  different  combinations  of  cognitive 
and  visual  processes:  when  proofreading,  for  example,  partici¬ 
pants  may  read  for  comprehension  or  scan  for  unfamiliar  letter 
clusters  and  word  shapes.  Thus,  it  is  possible  that  performance 
outcomes  do  not  reflect  the  degradation  properly  even  though 
the  display  quality  is  suboptimal:  deteriorations  might  then  be 
overlooked,  especially  in  short  test  periods.  Even  though  eco¬ 
logical  validity  is  lower,  visual  search  and  detection  tasks 
showed a higher sensitivity to visual degradation effects, because
the visual encoding process predominantly relies on visual properties
of the display (bottom-up process) without being masked by
compensating cognitive strategies.

However,  text  factors  were  also  found  to  considerably  affect 
performance: among those, display size, the amount of information
to be processed at a time, and font size (Duncan & Humphreys, 1989;
Oehme, Wiedenmaier, Schmidt, & Luczak, 2001; Ziefle, Oehme, & Luczak,
2005) proved to be crucial in this context. The smaller the font
size and the larger and
denser  the  amount  of  (text)  material  to  be  processed,  the 
stronger  the  performance  decrements  were. 

Whenever  a  critical  visual  evaluation  procedure  is  needed 
that  reflects  effects  of  visual  displays  on  performance,  a  two-step 
procedure  is  to  be  recommended.  As  a  first  step,  a  benchmark 
procedure  is  advised.  This  procedure  includes  young  and  well- 
sighted  participants  and  a  task  that  is  visually  rather  than  cog¬ 
nitively  strenuous.  The  second  step  includes  a  broadened 
scope.  Older  users  (as  they  represent  a  major  part  of  the  work¬ 
force)  and  tasks  with  different  visual  and  cognitive  demands 
have  to  be  addressed.  In  addition,  extended  periods  of  on¬ 
screen  viewing  are  to  be  examined  in  order  to  realistically  assess 
the  long-term  effects  of  computer  displays. 

Measures  Used  for  Visual  Evaluations 

The  measures  used  to  quantify  the  effects  of  visual  display  qual¬ 
ity  also  differ  regarding  their  sensitivity  to  visual  degradation. 
Apart  from  effectiveness  and  efficiency  of  performance,  judg¬ 
ments  of  visual  strain  symptoms  and  the  presence  of  visual  com¬ 
plaints  were  also  assessed. 

Among  performance  measures,  global  and  local  parameters 
can  be  distinguished.  Global  performance  measures  are  the 
speed  and  accuracy  of  task  performance;  they  were  widely  used 
across  visual  evaluation  studies.  Reading  and  search  times  re¬ 
flect  the  basic  velocity  of  the  encoding  and  visual  processing. 
However, speed can be meaningfully interpreted only when the
accuracy of task completion is also taken into account. In this
context, the speed-accuracy trade-off has a considerable impact.
Whenever visual conditions, for example in
computer  displays,  are  suboptimal  (under  low  resolution  or 
contrast  conditions),  the  speed  and  accuracy  of  information 
processing  cannot  be  kept  at  a  constantly  high  level  simultane¬ 
ously.  Rather,  one  of  the  two  components  deteriorates  while  the 
other  is  kept  constant.  When  both  components  are  considered, 
the  overall  costs  for  the  information  processing  can  be  assessed. 
Although reliable, global measures do not provide an understanding
of the processes that cause the performance decrement. Therefore,
oculomotor behavior was examined in order to gain deeper insights
into the nature of the deteriorated encoding process (Owens &
Wolf-Kelley, 1987; Iwasaki & Kurimoto, 1988; Jaschinski, Bonacker,
& Alshuth, 1996; Best, Littleton, Gramopadhye, & Tyrell, 1996).
The spatial and temporal
characteristics  of  saccades  were  predominantly  analyzed.  Under 
visual  degradation  conditions,  more  saccades  are  executed, 
smaller  in  size,  and  accompanied  by  increased  fixation  times 
(Ziefle, 1998, 2001a, 2001b). Also, accommodation states,
pupil size, vergence efforts, and visual scan path complexity
(Schlick, Winkelholz, Motz, & Luczak, 2006) were taken as measures
to quantify the effects of display quality. The higher effort
required of the visual system when reading electronic texts is
assumed to form the physiological basis of visual fatiguing
(Wilkins, Nimmo-Smith, Tait, McManus, Della Sala, Tilley, et al.,
1984). Beyond oculomotor measures, stress indicators such as
heart-rate variability and eye blinks were used to determine the
effects of display quality (Oehme, Wiedenmaier, Schmidt, & Luczak,
2003).
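
The speed-accuracy argument above can be illustrated with a minimal sketch that folds both components into one cost figure. It uses the inverse efficiency score (mean response time divided by the proportion of correct responses). This is a swapped-in illustration, not a measure reported in the studies cited here, and the function name and all numbers are hypothetical.

```python
def inverse_efficiency(mean_time_s: float, proportion_correct: float) -> float:
    """Combine speed and accuracy into one cost value (lower is better).

    Assumption: the inverse efficiency score is used purely as an
    illustration; the cited studies report speed and accuracy separately.
    """
    if not 0.0 < proportion_correct <= 1.0:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_time_s / proportion_correct


# Hypothetical values, not data from the cited studies: search time is
# kept constant while accuracy drops under degraded viewing conditions.
good_display = inverse_efficiency(mean_time_s=2.0, proportion_correct=0.95)
poor_display = inverse_efficiency(mean_time_s=2.0, proportion_correct=0.80)

print(f"good display cost: {good_display:.2f} s per correct response")
print(f"poor display cost: {poor_display:.2f} s per correct response")
```

Looking at search time alone would suggest that the two hypothetical displays are equivalent; the combined figure shows the cost that is hidden in the accuracy component.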

Another  approach  to  quantifying  effects  of  display  quality  is 
to  collect  user  judgments  with  respect  to  what  is  called  visual  fa¬ 
tiguing,  visual  stress,  strain,  visual  load,  or  discomfort.  Visual 
fatiguing  is  considered  as  a  complex  subjective  measure  based 
on  the  awareness  of  several  symptoms:  burning,  dry,  aching, 
and  watering  eyes,  difficulties  in  reading  (text  becomes  blurred 
and  fades  away)  as  well  as  increased  blinking  and  eye  pressure 
(Stone  et  al.,  1980;  Hung,  Ciuffreda,  &  Semmlow,  1986).  User 
judgments  represent  an  important  aspect  of  display  quality,  as 
perceived visual comfort is the most direct source of users’
satisfaction. Even though ratings are easy to obtain, their validity
is  not  without  controversy.  First,  users  differ  considerably  with 
respect  to  their  responsiveness  to  visual  strain  symptoms.  Fur¬ 
thermore,  the  sensitivity  to  visual  stress  is  not  constant,  but 
changes  with  the  time  spent  on  the  task  and  age  (Wolf  &  Schaf- 
fra,  1964).  Second,  factors  like  fear  of  failing  and  the  misinter¬ 
pretation  of  one’s  own  performance  may  contaminate  judg¬ 
ments  and  make  it  necessary  to  prove  that  ratings  match 
performance  outcomes.  Third,  the  emergence  of  visual  fatigu¬ 
ing  does  not  necessarily  follow  the  same  time  course  as  perfor¬ 
mance  shifts.  Visual  fatiguing  is  therefore  not  necessarily  ac¬ 
companied  by  performance  decrements  and  vice  versa  (Yeh  & 
Wickens,  1984;  Howarth  &  Istance,  1985).  In  order  to  obtain  a 
valid  picture,  it  is  therefore  advisable  to  include  both  subjective 
and  objective  measurements. 

Effects  of  Specific  Display  Factors  on  Visual  Performance 

As  a  first  factor,  effects  of  display  resolution  (see  section  “Resolu¬ 
tion”)  are  considered.  Technically  speaking,  display  resolution  is 
defined  as  the  number  of  pixels  per  inch  out  of  which  objects  and 
letters  are  displayed  on  the  screen  surface.  The  psychophysical 
correlate of display resolution, which is of visual ergonomic
interest, is the sharpness of contours and the clarity with which
objects can be identified. High display resolutions allow more
objects to be displayed on a given screen space, as object sizes
decrease  with  increasing  resolutions.  Even  if  this  may  be  advan¬ 
tageous  when  screen  space  restrictions  are  considered,  it  might 
be  counter-productive:  the  benefit  of  higher  resolutions  in  terms 
of  contour  sharpness  may  be  negated  by  the  smaller  object  sizes, 
and  this  has  to  be  counterbalanced.  The  central  question  is 
whether  higher  resolutions  lead  to  quantifiable  improvements. 
The relevant literature shows that this is indeed the case (Gould
et al., 1987; Miyao, Hacisalihzade, Allen, & Stark, 1989; Ziefle,
1998). Due to changing technical standards over time, resolution
levels differ considerably between the studies (40-90 dpi). In
conclusion, it can be said that, up to 90 dpi, the higher the
resolution, the better the performance. Young adults’ search
performance increased by 20% in the 90 dpi condition compared to the
60 dpi condition. This greater effectiveness was caused by more
efficient oculomotor control: fixation times decreased by 11% and,
in addition, fewer saccades (5%) were executed for processing the
visual information. Furthermore, performance in the low-resolution
condition was not only found to be interrelated with the emergence
and strength of visual fatiguing symptoms, but also, the probability
of fatiguing symptoms was significantly higher when two suboptimal
viewing conditions were coincident: the longer participants worked
in the low-resolution condition, the stronger the performance
decrements were (Ziefle, 1998).
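
The relation between resolution and object size described above can be sketched numerically. The following example is an illustration only: the function names, the 12-pixel character height, and the 17-inch screen are assumptions, not values from the cited studies. It shows how the physical size of an object with a fixed pixel count shrinks as the resolution in dpi rises.

```python
import math

MM_PER_INCH = 25.4


def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density from the pixel dimensions and the screen diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in


def physical_size_mm(size_px: float, dpi: float) -> float:
    """Physical extent of an object that is `size_px` pixels tall or wide."""
    return size_px / dpi * MM_PER_INCH


# Hypothetical character height of 12 pixels at the two resolution
# levels mentioned in the text (60 dpi and 90 dpi).
for dpi in (60, 90):
    print(f"{dpi} dpi: 12 px character is {physical_size_mm(12, dpi):.2f} mm high")

# Hypothetical 17-inch screen with 1024 x 768 pixels.
print(f"pixel density: {pixels_per_inch(1024, 768, 17.0):.1f} dpi")
```

The same 12-pixel character is about a third smaller at 90 dpi than at 60 dpi, which is why the gain in contour sharpness has to be weighed against the loss in object size.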

A  second  factor  to  be  examined  extensively  was  intermittent 
light  stimulation,  which  is  characteristically  present  in  CRT 
screens,  referred  to  as  the  “refresh  rate”  (see  section  “Refresh 
Rates”).  The  perceptual  component  here  is  a  flicker  sensation, 
which predominantly occurs when low refresh rates are present.
Reading on a screen with low refresh rates (50 Hz) is extremely
hard work for users and leads to considerable eyestrain symptoms
even after short reading periods. With increasing refresh rates
(>70 Hz) the flicker sensation decreases, but it should be noted
that, physically, the intermittent stimulation is nevertheless
still present and may affect performance. Many
ergonomic  studies  have  concerned  themselves  with  screen 
flicker  and  found  that  it  is  a  major  source  of  performance 
decrements  and  visual  fatiguing,  involving  the  emergence  of 
migraines  (Boschmann  &  Roufs,  1994;  Jaschinski  et  al.,  1996; 
Küller & Laike, 1998; Lindner & Kropf, 1993). The nature of the
flicker  effect  is  referred  to  as  “a  basic  disturbance  of  eye  move¬ 
ment  control”  caused  by  the  intermittent  light  (Kennedy  & 
Murray,  1993;  Baccino,  1999).  Regarding  the  impact  of  refresh 
rates  on  performance,  it  was  shown  that  performance  with  the 
100  Hz  screen  was  14%  better  compared  to  50  Hz,  and  ocular 
efficiency  was  16%  better  (fixation  times  and  number  of  fixa¬ 
tions  per  line).  Apart  from  that,  effects  of  the  time  spent  on 
the  task  were  found  (comparing  the  beginning  and  the  end  of 
the  task  completion).  Already  in  a  test  period  of  only  30  min¬ 
utes,  visual  and  ocular  performance  decreased  by  8%  (Ziefle, 
2001a). However, effects of refresh rates on performance do not
follow a linear (“the higher the better”) but rather a curvilinear
relationship: when a 140 Hz screen was compared to a screen driven
at 100 Hz, search performance and ocular efficiency were found to
deteriorate in the 140 Hz condition (by 5-10% across measures).
This confirms that the visual system is sensitive to high-frequency
intermittent light, even though users are no longer able to detect
flicker in this high refresh-rate condition. In summary, refresh
rates of 90-100 Hz in CRTs facilitate reasonably good performance.

Third,  effects  of  luminance  (see  section  “Brightness  and 
Glare”)  and  contrast  (see  section  “Resolution”)  were  also  re¬ 
ported  to  play  a  major  role  for  visual  performance  when  com¬ 
puter  displays  are  used  (Plainis  &  Murray,  2000;  van  Schaik  & 
Ling,  2001;  Ziefle,  Groger,  &  Sommer,  2003;  Oetjen,  Ziefle,  & 
Groger, 2005). However, results are not easy to integrate. Contrast
itself is a complex factor, and it is also accompanied by several
other lighting characteristics, which are highly interdependent and
affect visual performance outcomes separately as well as in
combination. Contrast, always a ratio of two luminance levels
(background and object), was shown to markedly affect visual
performance. Low contrast ratios lead to decrements in performance
and oculomotor efficiency in a range of 10-20%. But contrast ratios
do not specify which absolute luminance levels constitute the
respective contrast and, depending on which contrast definition is
used, they mostly do not differentiate between whether information
is displayed in negative or positive polarity (bright letters on
dark vs. dark letters on a bright background). In addition, ambient
lighting considerably interferes with the display contrast. Thus,
knowing the contrast level without detailing polarity, absolute
luminance levels, and the level of ambient lighting does not allow
the interpretation of
performance  outcomes.  Generally,  it  was  found  that  positive  po¬ 
larity  displays  result  in  a  better  visual  performance  than  negative 
ones  (Bauer  &  Cavonious,  1980).  The  positive  effect  of  positive 
polarity  occurs  due  to  the  (simple)  fact  that  light  is  essential  for 
visual  encoding  (the  absolute  light  level  is  higher  with  positive 
displays),  and  because  it  is  less  demanding  when  the  computer 
display  has  the  same  polarity  as  hard  copy  (especially  because 
both  media  are  often  used  simultaneously  in  the  same  work  set¬ 
ting),  and  does  not  require  the  user  to  alternately  readapt.  Am¬ 
bient  lighting  strongly  reduces  the  contrast  levels  on  the  screen, 
because  the  two  lighting  sources  interfere.  Thus,  it  is  advanta¬ 
geous  for  performance  if  illumination  in  the  room  is  sparse 
(Kokoschka  &  Haubner,  1986).  Regarding  display  luminance  it  is 
recommended  that  luminance  levels  should  not  be  too  bright, 
preventing  interfering  glare,  especially  when  ambient  lighting 
is  low  and  the  probability  of  glare  rises  (Schenkmann,  Fukunda, 
&  Persson,  1999). 
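
The points made above about contrast can be illustrated with a small sketch. It assumes a simple additive model in which ambient light reflected from the screen surface adds the same luminance to bright and dark areas. The function names and all luminance values are hypothetical and are not taken from the cited studies.

```python
def contrast_ratio(l_bright: float, l_dark: float, l_reflected: float = 0.0) -> float:
    """Ratio of the brighter to the darker luminance (all values in cd/m^2).

    Assumption: reflected ambient light adds equally to both luminances.
    """
    darker = l_dark + l_reflected
    if darker <= 0.0:
        raise ValueError("the darker luminance must be greater than zero")
    return (l_bright + l_reflected) / darker


def michelson_contrast(l_bright: float, l_dark: float, l_reflected: float = 0.0) -> float:
    """Alternative contrast definition: (Lmax - Lmin) / (Lmax + Lmin)."""
    l_max = l_bright + l_reflected
    l_min = l_dark + l_reflected
    return (l_max - l_min) / (l_max + l_min)


# Hypothetical display: 100 cd/m^2 bright areas, 5 cd/m^2 dark areas.
print(contrast_ratio(100.0, 5.0))        # dark room
print(contrast_ratio(100.0, 5.0, 20.0))  # 20 cd/m^2 of reflected room light

# The ratio is identical for positive polarity (dark text on a bright
# background) and negative polarity (bright text on a dark background);
# only the mapping of text and background to the two luminances differs.
print(michelson_contrast(100.0, 5.0), michelson_contrast(100.0, 5.0, 20.0))
```

Under this assumed model the ratio falls from 20:1 to 4.8:1 once reflected light is added, although the display itself is unchanged. This is why a contrast value reported without ambient lighting, absolute luminance levels, and polarity is hard to interpret.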

The  importance  of  contrast  and  luminance  has  received  re¬ 
newed  interest  in  recent  studies  (Hollands,  Cassidy,  McFadden, 
&  Boothby  2001;  Hollands,  Parker,  McFadden,  &  Boothby  2002; 
Groger,  Ziefle,  &  Sommer,  2003;  Ziefle  et  al.,  2003;  Oetjen  & 
Ziefle,  2004).  Among  the  visual  factors,  TFT  screens  (see  section 
“Nonemitter/Liquid-Crystal  Display”)  have  the  basic  advantage 
of being flicker free. However, they also have one major
disadvantage: the displayed information is “perfectly” visible only
if users work directly in front of the TFT screen. Whenever this
“optimal” position is not present, visibility is distinctly worse.
This specific property of TFT screens is called “anisotropy.” A
display is called anisotropic if it shows a deviation of more than
10% of its luminance subject to the target location or viewing angle
(ISO 13406-2, 2001). The nature of anisotropy is such that
photometric measures (contrast and luminance) are not constant over
the
screen  surface,  but  rather  decrease  with  increasing  viewing  an¬ 
gle.  In  order  to  quantify  the  change  in  photometric  measures 
at different viewing angles, a measurement setup was developed
that made it possible to exactly correlate photometric measures
and visual performance (Groger et al., 2003; Ziefle et al., 2003).
The screen was virtually divided into 63 (9 × 7) fields.
The  luminance  of  bright/dark  areas  was  individually  measured 
by  a  photometer  and  contrasts  were  determined.  Then  different 
measuring  positions  were  adopted:  first,  the  “standard  view” 
was  applied,  commonly  used  by  the  industry.  The  photometer 
was  set  in  front  of  the  screen  and  displaced  gradually  from  field 
to  field,  with  the  photometer  always  set  at  right  angles  to  the 
screen  (Fig.  11.19  left). 
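
The 10% criterion and the 9 × 7 measuring grid described above can be sketched as follows. This is one possible reading of the criterion as stated in the text (relative deviation of each field from a reference luminance) and does not reproduce the measurement procedure of the standard. The luminance values are synthetic and the names are hypothetical.

```python
def is_anisotropic(luminance_grid, reference, threshold=0.10):
    """Return True if any field deviates from `reference` by more than `threshold`.

    Assumption: deviation is taken relative to a single reference luminance,
    for example the value measured at the screen center.
    """
    return any(
        abs(value - reference) / reference > threshold
        for row in luminance_grid
        for value in row
    )


ROWS, COLS = 7, 9          # 63 measuring fields, as in the text
CENTER_COL = COLS // 2
REFERENCE = 200.0          # cd/m^2, hypothetical center luminance


def synthetic_grid(loss_per_column):
    """Synthetic luminances that fall off with horizontal distance from center."""
    return [
        [REFERENCE * (1.0 - loss_per_column * abs(col - CENTER_COL)) for col in range(COLS)]
        for _ in range(ROWS)
    ]


strong_falloff = synthetic_grid(0.04)  # up to 16% loss at the left and right edges
weak_falloff = synthetic_grid(0.01)    # up to 4% loss at the left and right edges

print(is_anisotropic(strong_falloff, REFERENCE))
print(is_anisotropic(weak_falloff, REFERENCE))
```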

From  an  ergonomic  point  of  view,  this  procedure  (as  shown 
in  Fig.  11.19,  left)  is  highly  artificial,  as  users  do  not  displace 
themselves,  but  rather  change  their  view:  viewing  angles  change 
remarkably  depending  on  where  users  are  looking.  This  is  en¬ 
tirely  disregarded  in  this  measurement  procedure.  In  order  to 
simulate  real  viewing  conditions,  the  “user  view”  (Fig.  11.19,  cen¬ 
ter)  as  well  as  the  “bystander  view”  (Fig.  11.19,  right)  were  real¬ 
ized.  For  the  user  view,  the  photometer  was  positioned  centrally 
in  front  of  the  screen.  As  the  position  of  the  photometer  did  not 
change,  viewing  angles  increased  with  distance  from  the  center 
of  the  screen,  thereby  emulating  the  user’s  head  movements 
when looking toward the screen edges. For the “bystander
view,” the photometer was set to a central point of the display
and turned to the different measuring fields (30° and 50°). For
extended-viewing conditions, the photometer was set off axis,
and its view pointed to the screen from the side (left and right
side, respectively).

FIGURE 11.19. Quantifying anisotropy in TFT screens: left: “standard view” with the photometer
displaced at right angles; center: “user view” with the photometer emulating the user’s head
movements; right: “bystander view” with the photometer positioned off-axis (Ziefle et al., 2003).

FIGURE 11.20. Photometric measures in the “bystander view” for a TFT (left side) display and a
CRT (right side) display (Oetjen & Ziefle, 2004).
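
The geometry of the “user view” described above can be sketched with a short calculation: with the observation point fixed in front of the screen center, the viewing angle of each of the 63 fields grows with its distance from the center. The screen dimensions and the viewing distance below are assumptions chosen for illustration, not values from the cited studies.

```python
import math


def user_view_angles(width_mm, height_mm, distance_mm, cols=9, rows=7):
    """Viewing angle in degrees for the center of each measuring field.

    Assumption: the observer (or photometer) sits on the screen normal
    through the screen center at `distance_mm`.
    """
    angles = []
    for r in range(rows):
        # Vertical offset of the field center from the screen center.
        y = ((r + 0.5) / rows - 0.5) * height_mm
        row = []
        for c in range(cols):
            # Horizontal offset of the field center from the screen center.
            x = ((c + 0.5) / cols - 0.5) * width_mm
            offset = math.hypot(x, y)
            row.append(math.degrees(math.atan2(offset, distance_mm)))
        angles.append(row)
    return angles


# Hypothetical 340 mm x 270 mm screen viewed from 500 mm.
angles = user_view_angles(340.0, 270.0, 500.0)
print(f"center field: {angles[3][4]:.1f} degrees")
print(f"corner field: {angles[0][0]:.1f} degrees")
```

Pairing each angle with the luminance or contrast measured in the same field gives the kind of angle-dependent photometric profile that the text relates to visual performance.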

Fig. 11.20 shows that photometric measures change dramatically
as a function of viewing angle in TFT displays (Fig. 11.20, left),
and to a much lesser extent in CRTs (Fig. 11.20, right).

Performance  outcomes  showed  that  anisotropic  effects  have 
to be taken seriously. Visual performance with TFT screens
deteriorated considerably, by about 10%, for young adults when they
had to look at the screen off axis. Anisotropy plays a role in
many  real-life  work  settings:  (air,  rail)  traffic  controlling  envi¬ 
ronments  or  stock  exchanges  use  several  TFT  screens  simulta¬ 
neously  (placed  parallel  to  and/or  upon  another)  that  have  to  be 
surveyed  by  one  operator.  Another  example  is  the  schooling 
and  training  context  where  it  is  quite  usual  for  several  users  to 
work  with  only  one  screen.  Thus,  from  an  ergonomic  point  of 
view, the anisotropy of TFTs must be regarded as a visual limitation
of TFT display technology, at least when a fast and accurate
visual  detection  performance  is  of  importance. 

Comparison  of  Different  Display  Types 

During the last 30 years, since the first evaluation studies of
different displays were published, a huge number of studies have
dealt with the fundamental question as to which display type
assures the highest reading comfort and the best visual performance.
Typically, and this reflects the chronological development, hard
copy, CRT, and TFT displays were compared with respect to visual
performance. As electronic display quality has improved continuously
over time, the technical standards on which the evaluations were
based differ greatly. This is a factor that should be borne in mind.
Recent studies include new developments (augmented reality, see
section “Augmented Reality”) in visual-display technology.

However,  independently  of  the  time  and  technical  standard 
of  computer  displays,  the  one  and  only  display  that  outper¬ 
formed  all  others  with  respect  to  visual  performance  and  com¬ 
fort  is  the  traditional  hard  copy.  Its  development  covers  more 
than  200  years  (since  the  invention  of  the  industrial  produc¬ 
tion  of  hard  copy  in  France)  and  has  been  continuously 
amended  with  respect  to  readability  and  visibility,  by  the  ex¬ 
pertise  of  typesetters  and  typographers.  Thus,  hard  copy  can 
be  regarded  as  an  outstandingly  suitable  display  with  regard  to 
visual  ergonomic  demands.  It  provides  high  contrast  and  res¬ 
olution  without  disturbance  by  glare,  screen  reflections,  or 
flicker. Accordingly, the majority of studies show that performance
with CRT screens is significantly lower compared to paper
(Gould et al., 1987; Heppner et al., 1985; Ziefle, 1998). Hard
copy was also found to outperform modern TFT displays
(Ziefle, 2001a).

As the private and public need for electronically displayed
information continuously increases, the evaluation should focus
on the quality of different electronic displays. Here, the classical
comparison of CRT and TFT displays is of central interest.
The CRT, the hitherto widespread screen type, seems to be in the
process of being phased out due to its specific lighting
characteristics (flicker). The development of TFT technology was
therefore highly welcome: TFT screens, lightweight and flat, are
flicker free and display information at much higher levels of
luminance and contrast.

At first sight, the outcomes of studies that compare CRTs and
TFTs  reveal  an  inconsistent  picture:  There  are  studies  in  which 
TFT  displays  led  to  a  higher  visual  performance  compared  to 
CRT  screens  (Menozzi,  Lang,  Napflin,  Zeller,  &  Krueger,  2001; 
Ziefle  2001a,b),  but  there  are  also  studies  in  which  CRT  displays 
outperform  TFT  screens  (Oetjen  &  Ziefle,  2004;  Oetjen  et  al., 
2005;  Oetjen  &  Ziefle,  in  press).  This  inconsistency  can  be  re¬ 
solved  when  taking  the  experimental  purposes  and  evaluation 
settings  into  account.  Whenever  the  impact  of  screen  flicker  was 
of  main  interest,  the  methodological  focus  was  to  compare  both 
displays  with  respect  to  refresh  rates  (present  in  the  CRT  and 
absent in the TFT). This procedure ruled out any other differences
between both displays that may confound or compensate
flicker  effects  (anisotropy).  Thus,  participants  were  usually 
seated  in  front  of  the  screen  and  had  to  work  on  tasks  that  were 
displayed  centrally  (otherwise,  anisotropic  effects  would  have 
been  mixed  up).  In  these  cases,  visual  performance  clearly  favors 
the  TFT  and  shows  that  screen  flicker  in  CRTs  is  disadvantageous. 
In addition, it was found that the relative benefit of TFT technology
is  disproportionately  higher  for  older  users:  while  young  adults 
(20-25  years)  showed  performance  superiority  of  10%  for  the 
TFT  compared  to  the  CRT  (100  Hz),  the  benefit  from  the  TFT 
was  16%  and  27%,  respectively,  when  older  users  (40-65  years) 
were  examined  (Ziefle,  2001b).  Note,  however,  that  the  older 
adults  generally  showed  nearly  40%  lower  visual  performance. 

However,  the  benefit  of  the  TFT  thus  determined  has  not  yet 
taken  anisotropic  effects  into  account.  As  soon  as  anisotropy  is 
considered,  the  picture  changes.  Studies  concerned  with  the 
two  display  types  from  the  anisotropy  perspective  showed  that 
the  TFT  superiority  over  the  CRT  disappears  when  extended 
viewing angles were taken into account (Groger et al., 2003;
Hollands et al., 2001, 2002; Oetjen & Ziefle, 2004; Oetjen &
Ziefle, in press). When considering all screen positions, visual
performance decreased by 8% when a TFT was used instead of
a CRT screen. When a central view was applied, detection times
were 14% faster than when viewing 50° off axis. A further
aggravating factor was font size. Deterioration when detecting small
targets (1.5 mm vs. 2.4 mm) rose to almost 30% with the TFT,
in contrast to “only” 20% with the CRT. Whenever all suboptimal
factors (small font, TFT screen, off-axis viewing) occurred
simultaneously, performance decrements were found to mount
up to as much as 38% (Oetjen & Ziefle, 2004).


STANDARDS,  RULES 
AND  LEGAL  REGULATIONS 
AND  HEALTH  PROTECTION  ASPECTS 


There  are  a  large  number  of  national  and  international  stan¬ 
dards,  regulations  and  seals  of  approval  that  regulate  diverse 
aspects  of  the  design  and  use  of  visual  displays.  Among  the  areas 
covered  are  ergonomics,  emissions,  energy  consumption,  elec¬ 
trical  safety,  and  documentation.  Due  to  very  fast  technological 
development,  only  selected  standards  are  listed  in  Table  11.1. 
Up-to-date  information  can  be  found  on  the  websites  listed  in 
Table  11.2. 


TABLE 11.1. A Selected List of National and International Standards, Regulations and Seals
of Approval Regulating Design and Use of Visual Displays

Ergonomics
  Image quality                        DIN EN 29241-3, ISO 13406-2, TCO'99, GS-Mark,
                                       Ergonomics Approved Mark
  Reflection characteristics           DIN EN ISO 9241-7, ISO 13406-2, TCO'99, GS-Mark,
                                       Ergonomics Approved Mark
  Color requirements                   DIN EN ISO 9241-8, ISO 13406-2, TCO'99, GS-Mark,
                                       Ergonomics Approved Mark
  Brightness and contrast adjustable   European VDU directive 90/270 EEC, German ordinance
                                       for work with visual display units, TCO'99,
                                       GS-Mark, Ergonomics Approved Mark
  Tilt and swivel                      TCO'99, GS-Mark, German ordinance for work with
                                       visual display units
  Gloss of housing                     GS-Mark, German ordinance for work with visual
                                       display units

Emissions
  Noise                                European VDU directive 90/270 EEC, ISO 7779
                                       (ISO 9296), German ordinance for work with visual
                                       display units, TCO'99, GS-Mark
  Electrostatic potential, electrical  PrEN 50279, TCO'99, Ergonomics Approved Mark
  and magnetic fields
  X-ray radiation                      TCO'99

Energy consumption                     EPA Energy Star, TCO'99, VESA DPMS

Electrical safety                      TCO'99, GS-Mark

Documentation
  Technical documentation and user     German Equipment Safety Law, GS-Mark
  manual


TABLE 11.2. Websites that Provide Information about National and International Standards,
Regulations and Seals of Approval for Visual Displays

European directives
    European directives regarding health and safety are available via the website of the
    European Agency for Safety and Health at Work.
    Internet: http://europe.osha.eu.int/legislation/

EPA
    The US Environmental Protection Agency (EPA) promotes the manufacturing and marketing
    of energy efficient office automation equipment with its Energy Star Program.
    Internet: http://www.energystar.gov/

National laws and ordinances
    National laws, directives and regulations regarding health and safety for many European
    and some other countries are available via the website of the European Agency for
    Safety and Health at Work.
    Internet: http://europe.osha.eu.int/legislation/

ISO
    The International Organization for Standardization (ISO) is a worldwide federation of
    national standards bodies from 140 countries.
    Internet: http://www.iso.ch

EN
    European standards are available via the website of the European Agency for Safety and
    Health at Work.
    Internet: http://europe.osha.eu.int/legislation/standards

DIN
    DIN is the German Institute for Standardization.
    Internet: http://www.din.de

TCO
    TCO (The Swedish Confederation of Professional Employees) developed requirements for
    PCs. The TCO'99 label specifies ergonomic, ecological, energy consumption and emission
    requirements.
    Internet: http://www.tco.se

VESA
    The Video Electronics Standards Association (VESA) creates standards for transmissions
    between computers and video monitors that signal inactivity.
    Internet: http://www.vesa.org

GS-Mark
    The German safety approval mark shows conformity with the German Equipment Safety Law
    and signals that the product, as well as the user manual and production process, has
    been tested by an authorized institution such as TUV Rheinland.
    Internet: http://www.tuv.com

Ergonomics Approved Mark
    The Ergonomics Approved Mark demonstrates that a Visual Display Terminal complies with
    numerous ergonomic standards. The test mark is devised by TUV Rheinland.
    Internet: http://www.tuv.com


DISPLAYS FOR SELECTED APPLICATIONS


Virtual Reality

Immersive virtual reality (VR) is a technology that enables users
to “enter into” computer-generated 3D environments and interact
with them. VR technology involves the additional monitoring of body
movements using tracking devices, enabling intuitive participation
with and within the virtual world. Additional tracked peripheral
devices permit virtual navigation, pick-and-place manipulation of
virtual objects (Schlick, Reuth, & Luczak, 2000), and interaction
with humanoids and avatars by using data gloves, space joysticks,
or 3D-tracked balls (Holmes, 2003).

A  3D-view  is  generated  using  different  concepts  and  tech¬ 
nologies  (see  sections  “Human-Related  Display  Functionality 
and  other  Criteria”).  The  HMD  is  a  commonly  used  display  de¬ 
vice.  For  VR,  a  closed-view  HMD  in  non-see-through  mode  is 
used  (see  section  “Augmented  Reality”).  Visual  displays,  espe¬ 
cially  HMDs,  have  decreased  substantially  in  weight  since  the 
first  invention  of  an  immersive  head-worn  display,  but  are  still 
hindered by cumbersome designs, obstructive tethers, suboptimal
resolution, and an insufficient field of view. Recent advances
in wearable computer displays, which can incorporate miniature
LCDs directly into conventional eyeglasses or helmets, should
simplify ergonomic design and further reduce weight (Lieberman,
1999). Most of the advanced closed-view HMDs have adjustable
interpupillary distance (IPD) in order to avoid mismatches in depth
perception. They provide a horizontal field of view of 30 to 35
degrees per eye and a resolution of at least 800 × 600, and
therefore outperform predecessor systems (Stanney & Zyda, 2002).
Large images of VR scenes
can  be  reproduced  by  a  projector  on  the  front  or  back  of  one 
or  more  screens.  Stereoscopic  images  for  both  eyes  are  pro¬ 
jected  either  alternately  or  with  different  polarization  at  the 
same  time  (see  section  “Human-Related  Display  Functionality 
and  Other  Criteria”). 
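
The field-of-view and resolution figures quoted above for closed-view HMDs can be related to each other with a short calculation. The sketch assumes that the pixels are spread evenly over the horizontal field of view, which ignores optical distortion; the function names are hypothetical.

```python
def pixels_per_degree(horizontal_pixels: int, horizontal_fov_deg: float) -> float:
    """Average number of pixels covering one degree of visual angle."""
    return horizontal_pixels / horizontal_fov_deg


def arcmin_per_pixel(horizontal_pixels: int, horizontal_fov_deg: float) -> float:
    """Average visual angle covered by one pixel, in minutes of arc."""
    return 60.0 * horizontal_fov_deg / horizontal_pixels


# 800 horizontal pixels over the 30 to 35 degree range given in the text.
for fov in (30.0, 35.0):
    ppd = pixels_per_degree(800, fov)
    arcmin = arcmin_per_pixel(800, fov)
    print(f"{fov:.0f} deg: {ppd:.1f} px/deg, {arcmin:.2f} arcmin per pixel")
```

Under this assumption each pixel covers more than two minutes of arc, which is coarser than the roughly one minute of arc commonly cited for normal visual acuity. This is consistent with the text's remark that HMD resolution is still suboptimal.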


The “CAVE” (Cave Automatic Virtual Environment) is a
further  development  in  projection  technology.  It  consists  of  a 
cube  with  several  panels  onto  which  the  images  are  projected 
from behind. Depending on the construction, CAVEs are designated C3
(two walls and the floor), C4, C5, or C6. A CAVE provides
space  for  small  groups,  but  can  track  and  optimize  the  stereo¬ 
scopic  view  for  only  one  person.  The  others  perceive  distor¬ 
tions,  especially  on  corners  and  edges. 

Augmented  Reality 

Augmented Reality (AR) denotes the visual fusion of 3D virtual
objects into a 3D real environment in real time. Unlike virtual
reality or virtual environments, AR supplements reality rather
than completely replacing it (Azuma, 2001). AR can be used in
many applications, such as production (Schlick et al., 1997),
assembly and service, medicine (Park, Schmidt, & Luczak, 2005),
architecture, entertainment and edutainment, military training,
design, robotics, and telerobotics.

One  approach  to  overlaying  the  real  world  with  virtual  in¬ 
formation  is  to  use  an  HMD.  Superimposition  can  occur  in  two 
ways:  using  an  HMD  with  see-through  mode  (optical-see- 
through)  and  an  HMD  with  non-see-through  mode,  called 
video-see-through (feed-through). The HMD in non-see-through
mode, which is also used in virtual reality, visually isolates
the user completely from the surrounding environment, so the
system must use video cameras to obtain a view of the real
world. The optical see-through HMD eliminates the video channel;
the user looks directly at the real scene, and the merging of
real world and virtual augmentation is performed optically in
front of the user using a mini projector and a half-silvered
mirror. Both systems have advantages and disadvantages
in  usability  and  technology.  In  an  optical  see-through  system 
there  is  a  time  lag  between  the  real-world  information  and  the 
virtual  information  that  is  blended  into  the  field  of  view,  which 
is  caused  by  the  computing  time  for  image  generation.  In  addi¬ 
tion,  the  calibration  of  such  a  system  is  rather  complicated.  In 
video-see-through  systems  the  quality  of  the  video  depends  on 
the technology of the cameras and the display. Several factors
(e.g., limited refresh rate, displacement of the cameras from
eye level, and time delay) have an adverse effect on human
perception and hand-eye coordination (Biocca & Rolland, 1998;
Oehme et al., 2001; Luczak, Park, Balazs, Wiedenmaier, &
Schmidt, 2003; Park et al., 2005; Ziefle et al., 2005). Another
disadvantage is the severe loss of information, because
perception of the real-world environment is limited to the
maximum resolution and FOV of the displays and cameras.
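
The effect of the time lag in an optical see-through system can
be illustrated with a back-of-the-envelope estimate: while the
head rotates, the virtual overlay trails the real scene by the
angle the head covers during the lag. The sketch below is a
simplification that assumes a constant head rotation speed and a
fixed end-to-end latency; the function names and any numbers
passed to them are illustrative, not measured values from the
studies cited above.

```python
def registration_error_deg(head_speed_deg_per_s: float, latency_s: float) -> float:
    """Angular offset between real scene and overlay caused by system lag."""
    if latency_s < 0:
        raise ValueError("latency must be non-negative")
    return head_speed_deg_per_s * latency_s


def registration_error_px(head_speed_deg_per_s: float, latency_s: float,
                          pixels_per_degree: float) -> float:
    """The same offset expressed in display pixels."""
    return registration_error_deg(head_speed_deg_per_s, latency_s) * pixels_per_degree
```

For instance, a hypothetical head turn of 50 degrees per second
combined with 100 ms of lag displaces the overlay by 5 degrees.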

Another  approach  to  overlaying  the  visual  information  is  to 
use a virtual retinal display (VRD). In contrast to the screen-based HMDs just described,
the  VRD  reaches  the  retina  directly  with  a  single  stream  of  pix¬ 
els,  thus  guaranteeing  a  clear,  sharp  projection  of  different  kinds 
of  information  (Microvision,  2000).  Because  of  the  higher  laser- 
light  intensity,  the  half-silvered  mirrors  commonly  used  for  see- 
through  HMDs  can  be  optimized  to  a  maximum  translucence, 
improving  the  see-through  quality.  Furthermore,  the  display’s 
maximum resolution is no longer determined by the manufacturing
tolerances of the display pixels, but by the control logic and
the quality of the deflection mirror.

A clip-on display is a combination of the optical and
video-see-through modes. A tiny LCD screen with video-based
real-time images can be clipped onto eyeglasses or safety glasses.
The  user  can  easily  move  back  and  forth  between  the  images  on 
the  monitor  and  the  real  world  beyond  it. 

Mobile  Phones  and  Handheld  Devices 

Mobile  phones  and  handheld  electronic  devices  represent  one 
of the fastest-growing technological fields ever. As early as
1999, 500 million mobile phones were sold worldwide, and it is
forecast that about 2.6 billion mobile devices will be in use by
the end of the decade. Mobile communication technologies, such
as the Internet, universal mobile telecommunications systems
(UMTS), wireless local area networks (WLAN), and wireless
application services, have changed and will continue to change
social, economic, and communicative pathways in the private and
business sectors. Mobile computing has already expanded into
many  service  fields,  including  sales,  health  care,  administration, 
and  journalism.  Different  mobile  or  handheld  devices  can  be 
found  on  the  market:  mobile  phones,  smart  phones,  communi¬ 
cators,  and  PDAs.  The  main  reason  for  their  topicality  is  that  the 
devices  provide  on-the-go  lookup  and  entry  of  information, 
quick  communication  and  instant  messaging  (Weiss,  2002). 
Handheld  devices  may  differ  with  respect  to  physical  dimen¬ 
sions,  (color)  resolution,  contrast,  luminance,  and  touch/stylus 
sensitivity,  but  they  are  all  characterized  by  small  screen  size, 
which has a considerable impact on information processing.
Displays of mobile phones usually vary between 3 and 4 cm, and
have fewer than 6 lines with up to 20 characters per line. PDAs
and smart phones provide somewhat more display space (about
5 × 7 cm), which can result in up to 20 lines of 20 characters each.
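
To make the difference in display capacity concrete, the sketch
below counts how many screenfuls a message occupies. It assumes
plain word wrapping at a fixed number of characters per line
(real devices may use proportional fonts), and the function name
is illustrative. A 5-line and a 20-line screen, both 20
characters wide, are used as example readings of the figures
above.

```python
import math
import textwrap


def screens_needed(text: str, chars_per_line: int, lines_per_screen: int) -> int:
    """Number of full or partial screens required to show `text`."""
    if chars_per_line < 1 or lines_per_screen < 1:
        raise ValueError("screen dimensions must be positive")
    # textwrap.wrap breaks the text at word boundaries into display lines.
    lines = textwrap.wrap(text, width=chars_per_line)
    return math.ceil(len(lines) / lines_per_screen)
```

A message that wraps to 15 lines fits on one 20-line screen but
needs three screens of 5 lines, so the phone user has to scroll
where the PDA user does not.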

The  limited  screen  space  is  very  problematic  for  providing 
optimized  information  access  and  the  question  of  how  to 
“best”  present  the  information  on  the  small  display  is  challeng¬ 
ing.  At  first  glance,  the  challenge  seems  to  be  mainly  related  to 
visibility. If so, principles of visual ergonomics are to be
considered predominantly: in order to provide fast and accurate
information access, objects and letters should be big enough,
text lines should not be too close together, and information
density should be low. This is especially important for older
adults, who often have problems with sight (Brodie,
Chattratichart, Perry, & Scane, 2003; Omori, Watanabe, Takai, &
Miyao, 2002). However, visibility concerns are not the only
point  at  issue.  There  is  also  the  cognitive  aspect  of  informa¬ 
tion  visualization,  the  requirement  that  the  presentation  of  in¬ 
formation  should  help  users  to  orientate  themselves  properly. 
Disorientation  in  handheld  devices’  menus  is  a  rather  frequent 
problem  (Ziefle  &  Bay,  2005,  2006).  Users  have  to  navigate 
through  a  complex  menu  of  functions,  which  is  mostly  hidden 
from  sight,  as  the  small  window  allows  only  few  functions  or 
small  text  fragments  to  be  displayed  at  a  time.  Considering  vi¬ 
sual  and  cognitive  demands  concurrently,  two  alternatives  can 
be  contrasted:  one  alternative  is  to  display  only  a  little  infor¬ 
mation  per  screen  at  a  time,  which  helps  avoid  visibility  prob¬ 
lems  due  to  high  information  density.  The  other  alternative  is 
to  display  as  much  information  per  screen  as  possible;  this  al¬ 
lows  users  to  have  maximal  foresight  (cognitive  preview)  of 
other functions on the menu, which should benefit information
access from a cognitive point of view and minimize
disorientation. Evidently, a delicate trade-off between visual
and cognitive demands has to be struck, and it remains to be
determined whether the impact of the cognitive preview or the
impact of visual density is decisive for efficient information
access. This was examined in a recent study (Bay & Ziefle,
2004). Young adults
processed  tasks  on  a  simulated  mobile  phone  where  one, 
three,  or  seven  menu  items  were  presented  at  a  time.  Results 
corroborated  the  meaningful  impact  of  information  presenta¬ 
tion  on  small  screens  for  efficiency  when  using  the  device:  in¬ 
termediate  foresight  (three  functions)  was  found  to  lead  to  the 
best  performance.  If  only  one  menu  item  was  shown  at  a 
time — as  is  the  case  in  a  number  of  devices  on  the  market — the 
users  needed  40%  more  steps  to  process  tasks  than  when  three 
items  were  shown,  confirming  the  cognitive  facet  to  be  cru¬ 
cial.  But  also  when  information  density  was  high  (seven  func¬ 
tions),  performance  declined  by  more  than  30%  compared  to 
the  presentation  of  three  menu  functions  per  screen,  confirm¬ 
ing  the  visual  facet  as  also  playing  an  important  role. 
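
The notion of foresight can be made concrete with a small sketch
of a scrolling menu window. This is not the software used in the
study; it is a minimal illustration, under the assumption that
the window is centered on the selected item where possible and
clamped at both ends of the menu. The function name and the
example menu are hypothetical.

```python
def visible_window(items: list[str], cursor: int, window_size: int) -> list[str]:
    """Return the menu items shown on screen while `cursor` is selected.

    The window is centered on the cursor where possible and clamped so
    that it never runs past either end of the menu.
    """
    if not 0 <= cursor < len(items):
        raise IndexError("cursor outside menu")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    start = cursor - window_size // 2
    # Clamp the window start to the valid range [0, len(items) - window_size].
    start = max(0, min(start, len(items) - window_size))
    return items[start:start + window_size]
```

With a window size of one, the user sees only the selected
function; with three, the neighbors on either side are previewed;
with seven, the whole of a seven-item menu is visible at once.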

In  order  to  accommodate  the  problem  of  displaying  a  lot  of 
information  on  a  small  screen,  several  other  techniques  were 
proposed.  One  technique  is  rapid  serial  visual  presentation 
(RSVP),  which  is  based  on  the  idea  of  presenting  information 
temporally instead of spatially (Rahman & Muter, 1999;
Goldstein, Oqvist, Bayat, Ljungstrand, & Bjork, 2001). With
RSVP, one or more words are presented at a time at a fixed
location on the screen and users have to integrate the single
text fragments bit by bit, by scrolling the text forwards and
(if necessary) backwards. A similar technique is the times
square method (TSM), also known as “leading.” Here, the
information is not static, but moves across the screen (word by
word or sentence by sentence), with the text scrolling
autonomously from left to right.
Even  if  trained  readers  may  reach  a  reasonable  level  of  efficiency 
in both presentation modes (RSVP and TSM), users report
disliking both presentation forms. The low acceptance may be due
to the high cognitive and visual demands imposed by the
information presentation. With RSVP, memory load is high, as it
is difficult for users not to lose the thread while integrating
words and sentences that are displayed one after another. With
TSM, visual and attentional demands are high, as it is crucial
to catch the text contents in moving sentences; when users fail
to catch the contents on the first attempt, they have to wait
until the scrolling information appears again.
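
The temporal presentation used by RSVP can be sketched in a few
lines. The following is a minimal illustration, not the
implementation used in the cited studies: it splits a text into
frames of a fixed number of words and derives the time each frame
stays on screen from a target reading rate. The function names
and the words-per-minute parameter are assumptions introduced for
the example.

```python
def rsvp_frames(text: str, words_per_frame: int = 1) -> list[str]:
    """Split text into successive frames shown at one fixed screen location."""
    if words_per_frame < 1:
        raise ValueError("words_per_frame must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + words_per_frame])
            for i in range(0, len(words), words_per_frame)]


def frame_duration_s(words_per_minute: float, words_per_frame: int = 1) -> float:
    """Time each frame stays visible for a given presentation rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return 60.0 * words_per_frame / words_per_minute
```

Because every frame replaces the previous one, the reader has to
hold earlier frames in memory, which is the source of the memory
load described above.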

It is well known that the sense of touch is indispensable for
understanding the real world. The use of force feedback to
enhance computer-human interaction (CHI) has often been
discussed. A haptic interface is a feedback device that
generates sensation to the skin and muscles, including a sense
of touch, weight, and rigidity. Compared to ordinary visual and
auditory sensations, haptic sensation is difficult to
synthesize. Visual and auditory sensations are gathered by
specialized organs, the eyes and ears. A sensation of force, on
the other hand, can occur in any part of the human body and is
therefore inseparable from actual physical contact. These
characteristics lead to many difficulties when developing a
haptic interface. Visual and auditory media are widely used in
everyday life, whereas haptic interfaces are rarely used as
information media.

In the field of virtual reality, the haptic interface is one of
the major research areas. The last decade has seen significant
advances in the development of haptic interfaces.
High-performance haptic devices have been developed, and some of
them are commercially available. This chapter presents current
methods and issues in haptic interfaces.

The next section describes the mechanism of haptic sensation and
gives an overview of feedback technologies. It is followed by
three sections that introduce examples of haptic-interface
technologies developed by the author. The last section presents
application areas and future prospects of haptic interfaces.


MECHANISM  OF  HAPTICS  AND  METHODS 
FOR  HAPTIC  FEEDBACK 


Somatic  Sensation 

A haptic interface presents synthetic stimulation to somatic
sensation. Somatic sensation is composed of proprioception and
skin sensation. Proprioception is mediated by mechanoreceptors
of the skeletal articulations and muscles. There are three types
of joint position receptors: (a) free nerve endings, (b) Ruffini
corpuscles, and (c) Pacinian corpuscles. The Ruffini corpuscle
detects static force, whereas the Pacinian corpuscle measures
acceleration of the joint angle. The position and motion of the
human body are perceived by these receptors. Force sensation is
derived from the mechanoreceptors of muscles: muscle spindles
and Golgi tendon organs. These receptors detect contact forces
applied by an obstacle in the environment.

Skin  sensation  is  derived  from  mechanoreceptors  and  ther¬ 
moreceptors  of  skin.  Sense  of  touch  is  evoked  by  those  recep¬ 
tors.  Mechanoreceptors  of  skin  are  classified  into  four  types: 
(a)  Merkel  disks,  (b)  Ruffini  capsules,  (c)  Meissner  corpuscles, 
and  (d)  Pacinian  corpuscles.  These  receptors  detect  edges  of 
objects,  skin  stretching,  velocity,  and  vibration,  respectively. 

Proprioception  and  Force  Display 

A force display is a mechanical device that generates a reaction
force from virtual objects. Recently, much research has been
conducted on haptic interfaces, although the technology is still
in a state of trial and error. There are several approaches to
implementing haptic interfaces:

Exoskeleton-Type  of  Force  Display 

An  exoskeleton  is  a  set  of  actuators  attached  to  a  hand  or  a 
body.  In  the  field  of  robotics  research,  exoskeletons  have  often 
been  used  as  master  manipulators  for  teleoperations.  However, 
most  master  manipulators  entail  a  large  amount  of  hardware 
and  therefore  have  a  high  cost,  which  restricts  their  application 
areas.  Compact  hardware  is  needed  to  use  them  in  HCI.  The 
first  example  of  a  compact  exoskeleton  suitable  for  desktop  use 
was  published  in  1990  (Iwata,  1990).  The  device  applies  force 
to  the  fingertips  as  well  as  the  palm.  Figure  12.1  shows  an  over¬ 
all  view  of  the  system. 

Lightweight  and  portable  exoskeletons  have  also  been  de¬ 
veloped.  Burdea  used  small  pneumatic  cylinders  to  apply  the 
force  to  the  fingertips  (Burdea,  Zhuang,  Roskos,  Silver,  & 
Langlana, 1992). Cyber Grasp (Fig. 12.2) is a commercially
available exoskeleton, in which cables are used to transmit force
[http://www.vti.com].

Tool-Handling  Type  of  Force  Display 

The  tool-handling  type  of  force  display  is  the  easiest  way  to 
realize  force  feedback.  The  configuration  of  this  type  is  similar 
to  that  of  a  joystick.  Unlike  the  exoskeleton,  the  tool-handling- 
type  force  display  is  free  from  the  need  to  be  fitted  to  the  user’s 
hand.  It  cannot  generate  a  force  between  the  fingers,  but  has 
practical  advantages. 


FIGURE 12.1. Desktop force display.


FIGURE 12.2. Cyber Grasp.


FIGURE  12.3.  PHANTOM. 


A typical example of this category is the pen-based force
display (Iwata, 1993). A pen-shaped grip is supported by two
3 DOF pantographs that enable 6 DOF force/torque feedback.
Another example of this type is the Haptic Master, which was
demonstrated at The Edge venue of SIGGRAPH ’94. The device has a
ball-shaped grip to which 6 DOF force/torque is fed back
(Iwata,  1994).  This  device  employs  a  parallel  mechanism  in 
which  a  top  triangular  platform  and  a  base  triangular  platform 
are  connected  by  three  sets  of  pantographs.  This  compact  hard¬ 
ware  has  the  ability  to  carry  a  large  payload. 

Massie  and  Salisbury  developed  the  PHANToM,  which  has  a 
3  DOF  pantograph  (Massie  &  Salisbury,  1994).  A  thimble  with  a 
gimbal  is  connected  to  the  end  of  the  pantograph,  which  can 
then  apply  a  3  DOF  force  to  the  fingertips.  The  PHANToM  be¬ 
came  one  of  the  most  popular  commercially  available  haptic 
interfaces  (Fig.  12.3). 
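
A device that applies a 3 DOF force to the fingertip needs a rule
that turns the measured fingertip position into a reaction force.
A common textbook choice, which is not described in this chapter
and is used here purely as an assumption, is a penalty (spring)
model: when the fingertip penetrates a virtual surface, it is
pushed back along the surface normal with a force proportional to
the penetration depth. The sketch below applies this to a virtual
sphere; the function name, the stiffness value, and the units are
hypothetical.

```python
import math


def sphere_reaction_force(tip, center, radius, stiffness):
    """3 DOF reaction force (N) for a fingertip touching a virtual sphere.

    tip, center: (x, y, z) positions in metres; stiffness in N/m.
    Returns (0.0, 0.0, 0.0) when the fingertip is outside the sphere.
    """
    offset = [t - c for t, c in zip(tip, center)]
    distance = math.sqrt(sum(o * o for o in offset))
    penetration = radius - distance
    if penetration <= 0 or distance == 0:
        # No contact, or the normal is undefined exactly at the centre.
        return (0.0, 0.0, 0.0)
    normal = [o / distance for o in offset]  # unit vector pointing outward
    return tuple(stiffness * penetration * n for n in normal)
```

A fingertip 1 cm inside a sphere of stiffness 500 N/m would be
pushed outward with 5 N under this model.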

Object-Oriented  Type  of  Force  Display 

The  object-oriented  type  of  force  display  is  a  radical  idea  for 
the  design  of  a  haptic  interface.  The  device  moves  or  deforms  to 
simulate the shapes of virtual objects. A user can make physical
contact with the surface of the virtual object.

An  example  of  this  type  can  be  found  in  Tachi  and  colleagues’ 
work  (Tachi  et  al.,  1994).  Their  device  consists  of  a  shape- 
approximation  prop  mounted  on  a  manipulator.  The  position  of 
the  fingertip  is  measured  and  the  prop  moves  to  provide  a  con¬ 
tact  point  for  the  virtual  object.  McNeely  proposed  an  idea 
named  “Robotic  Graphics”  (McNeely,  1993),  which  is  similar  to 
Tachi and colleagues’ method. Hirota and Hirose developed a
surface display that creates a contact surface using a 4 × 4
linear actuator array (Hirota & Hirose, 1996). The device
simulates an edge or
a  vertex  of  a  virtual  object. 


Passive  Prop 

A passive input device equipped with force sensors is a
different approach to the haptic interface. Murakami and
Nakajima used a flexible prop to manipulate a three-dimensional
(3D) virtual object (Murakami & Nakajima, 1994). The force
applied by
the  user  is  measured  and  the  deformation  of  the  virtual  object  is 
determined  based  on  the  applied  force.  Sinclair  developed  a 
force-sensor  array  to  measure  pressure  distribution  (Sinclair, 
1997).  These  passive  devices  allow  users  to  interact  using  their 
bare  fingers.  However,  these  devices  have  no  actuators,  so  they 
cannot  represent  the  shape  of  virtual  objects. 
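
How a force-sensor array can summarize a pressure distribution
may be sketched as follows. The example computes the total force
and the force-weighted centroid (center of pressure) of a
rectangular grid of readings. It is an illustration only: the
grid layout, units, and function name are assumptions and are not
taken from the devices cited above.

```python
def center_of_pressure(readings):
    """Total force and force-weighted centroid (row, col) of a sensor grid.

    readings: rows of non-negative force values, one value per sensor cell.
    Returns (0.0, None) when nothing is pressed.
    """
    total = float(sum(sum(row) for row in readings))
    if total == 0:
        return 0.0, None
    row_centroid = sum(r * value
                       for r, row in enumerate(readings)
                       for value in row) / total
    col_centroid = sum(c * value
                       for row in readings
                       for c, value in enumerate(row)) / total
    return total, (row_centroid, col_centroid)
```

The centroid indicates where on the prop the user is pressing,
and the total indicates how hard, which is the kind of input a
virtual-object deformation could be driven by.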

Proprioception  and  Full-Body  Haptics 

One of the new frontiers of haptic interfaces is full-body
haptics, which includes foot haptics. Force applied to the whole
body plays a very important role in locomotion. The most
intuitive way to move about the real world is walking on foot. A
locomotion interface is a device that provides a sense of
walking while the walker’s body remains localized in the real
world. There are several approaches to realizing a locomotion
interface:

Sliding  Device 

A project named “Virtual Perambulator” developed a locomotion
interface using a specialized sliding device (Iwata & Fujii,
1996). The primary objective of the first stage was to allow the
walker’s feet to change direction; controlling steering bars or
joysticks is not as intuitive as walking. The first prototype of
the Virtual Perambulator was developed in 1989 (Iwata &
Matsuda, 1990). Figure 12.4 shows an overall view of the
apparatus.


FIGURE 12.4. The first prototype of Virtual Perambulator.


A user wore a parachute-like harness and omni-directional roller
skates. The trunk of the walker was fixed to the framework of
the system by the harness. An omni-directional sliding device
was used for changing direction by foot. Specialized roller
skates equipped with four casters were developed, which enabled
two-dimensional motion. The walker could freely move his or her
feet in any direction. Motion of the feet was measured by an
ultrasonic range detector. Based on this measurement, an image
of the virtual space corresponding to the motion of the walker
was displayed in the head-mounted display. The direction of
locomotion in virtual space was determined according to the
direction of the walker’s step.
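
Deriving a direction of locomotion from a measured step can be
sketched as follows. The example assumes that the foot position
is available as 2D coordinates in metres before and after a step,
and it ignores movements below a small threshold. The coordinate
convention, the threshold, and the function name are assumptions
made for illustration and do not describe the original prototype.

```python
import math


def step_heading_deg(prev_xy, curr_xy, min_step=0.05):
    """Heading of a foot displacement in degrees, or None for tiny movements.

    0 degrees is the +x axis; angles increase counter-clockwise and are
    returned in the range [0, 360).
    """
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    if math.hypot(dx, dy) < min_step:
        return None  # treat as sensor noise rather than a step
    return math.degrees(math.atan2(dy, dx)) % 360.0
```

The returned heading could then be used to advance the viewpoint
in the virtual space by one step length in that direction.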


Treadmill 

A simple device for virtual walking is a treadmill, a device
ordinarily used for physical fitness. An application of this
device as a virtual building simulator was developed at UNC
(Brooks, 1986). This treadmill has a steering bar similar to
that of a bicycle. A treadmill equipped with a series of linear
actuators underneath the belt was developed at ATR (Noma,
Sugihara, & Miyasato, 2000). The device was named GSS, and it
simulates the slope of virtual terrain. The Treadport developed
at the University of Utah is a treadmill combined with a large
manipulator connected to the walker (Christensen, 1998). The
manipulator provides gravitational force while the walker is
traversing a slope. Figure 12.5 shows an overall view of the
Treadport.
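
The force such a manipulator would have to supply can be
estimated from elementary mechanics: on a slope of angle θ, the
component of gravity acting along the surface is m·g·sin θ. The
sketch below assumes a flat belt and a manipulator that emulates
only this along-slope component; the function name and the
example mass are hypothetical and not taken from the cited
system.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (rounded)


def slope_force_newtons(mass_kg: float, slope_deg: float) -> float:
    """Along-slope gravity component for a walker on virtual terrain.

    Positive values resist the walker (uphill); negative values
    assist the walker (downhill).
    """
    return mass_kg * G * math.sin(math.radians(slope_deg))
```

For a 70 kg walker on a 30 degree virtual incline this gives
about 343 N; on level ground the required force is zero.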

The  omni-directional  treadmill  employs  two  perpendicular 
treadmills,  one  inside  of  the  other.  Each  belt  is  made  from  ap¬ 
proximately  3,400  separate  rollers,  which  are  woven  together 
into  a  mechanical  fabric.  Motion  of  the  lower  belt  is  transmit¬ 
ted  to  the  walker  with  rollers.  This  mechanism  enables  omni¬ 
directional  walking  (Darken,  Cockayne,  &  Carmein,  1997). 


Foot  Pads 

A foot pad applied to each foot is an alternative implementation
of a locomotion interface. Two large manipulators driven by
hydraulic actuators were developed at the University of Utah and
applied to a locomotion interface. These manipulators are
attached to the walker’s feet. The device is named BiPort
[http://www.sarcos.com]. The manipulators can present the
viscosity of virtual ground. A similar device has been developed
at the Cybernet Systems Corporation, which uses two 3 DOF motion
platforms for the feet (Poston et al., 1997). These devices,
however, have not been evaluated or applied to VE.


Pedaling  Device 

In the battlefield simulator of the NPSNET project, a unicycle-like
pedaling  device  is  used  for  locomotion  in  the  virtual  battlefield 
(Plat  et  al.,  1994).  A  player  changes  direction  by  twisting  his  or 
her  waist. 

The  OSIRIS,  a  simulator  of  night-vision  battle,  utilizes  a  stair- 
stepper  device  (Lorenzo  et  al.,  1997).  A  player  changes  direction 
using  a  joystick  or  by  twisting  his  or  her  waist. 


FIGURE 12.5. Treadport.


Gesture Recognition of Walking

Slater et al. proposed locomotion in virtual environments by
“walking in place.” They recognized the gesture of walking using
a position sensor and a neural network (Slater et al., 1994).

Skin  Sensation  and  Tactile  Display 

The tactile display that stimulates skin sensation is a
well-known technology. It has been applied to communication aids
for blind persons as well as to master systems for
teleoperation. A sense of vibration is relatively easy to
produce, and a good deal of work has been done using vibration
displays (Kontarinis & Howe, 1995; Minsky & Lederman, 1997). The
micropin array is also used for tactile displays. Such a device
has enabled the provision of teletaction and communication aids
for blind persons (Moy et al., 2000; Kawai & Tomita, 2000). It
has the ability to convey texture or 2D geometry (Burdea, 1996).

The micropin array looks similar to the object-oriented-type
force display, but it can only create skin sensation. The stroke
distance of each pin is short, so the user cannot feel the 3D
shape of a virtual object directly. The major role of the
tactile display is to convey the sense of the fine texture of an
object’s surface. The latest research activities on tactile
displays focus on selective stimulation of the mechanoreceptors
of the skin. As mentioned at the beginning of this section,
there are four types of mechanoreceptors in the skin: (a) Merkel
disks, (b) Ruffini capsules, (c) Meissner corpuscles, and
(d) Pacinian corpuscles. By stimulating these receptors
selectively, various tactile sensations such as roughness or
slip can be presented. A micro air jet (Asanura et al., 1999)
and a microelectrode array (Kajimoto et al., 1999) are used for
selective stimulation.


TECHNOLOGIES  IN  FINGER/HAND  HAPTICS 


Exoskeleton 

An  exoskeleton  is  one  of  the  typical  forms  of  haptic  interface. 
Figure 12.6 shows a detailed view of an exoskeleton (Iwata, 1990).

Force sensation contains six-dimensional information:
three-dimensional force and three-dimensional torque. The core
element of the force display is a six-degree-of-freedom parallel
manipulator. The typical design of a parallel manipulator is an
octahedron called a “Stewart platform.” In this mechanism, a top
triangular platform and a base triangular platform are connected
by six length-controllable cylinders. This compact hardware has
the ability to carry a large payload. The structure, however,
has some practical disadvantages in its small working volume and
the lack of backdrivability (reduction of friction) of the
mechanism. In our system, three sets of parallelogram linkages
(pantographs) are employed instead of linear actuators. Each
pantograph is driven by two DC motors. Each motor is powered by
a PWM (pulse width modulation) amplifier. The top end of each
pantograph is connected to a vertex of the top platform with a
spherical joint. This mechanical configuration has the same
advantages as an octahedron mechanism. The pantograph mechanism
improves the working volume and backdrivability of the parallel
manipulator. The inertia of the moving parts of the manipulator
is so small that compensation is not needed.


FIGURE 12.6. Desktop force display.

The  working  space  of  the  center  of  the  top  platform  is  a 
spherical  volume  whose  diameter  is  approximately  30  cm.  Each 
joint  angle  of  the  manipulator  is  measured  by  potentiometers. 
The linearity of the potentiometers is 1%. The maximum payload
of the manipulator is 2.3 kg, which is more than the weight of a
typical hand.

The top platform of the parallel manipulator is fixed to the
palm of the operator with a U-shaped attachment, which enables
the operator to move the hand and fingers independently.
Three  actuators  are  set  coaxially  with  the  first  joint  of  the 
thumb,  forefinger,  and  middle  finger  of  the  operator.  The  last 
three  fingers  work  together.  DC  servo  motors  are  employed  for 
each  actuator. 

Tool-Handling-Type of Haptic Interface

Exoskeletons are troublesome for users to put on or take off.
This disadvantage obstructs practical use of force displays. A
tool-handling-type haptic interface implements a force display
without a glove-like device. A pen-based force display has been
proposed as an alternative device (Iwata, 1993). A
six-degree-of-freedom force-reflective master manipulator with a
pen-shaped grip was developed. Users are familiar with pens
because much human intellectual work is done with a pen. In this
respect, the pen-based force display is easily applied to the
design of 3D shapes.

The human hand is capable of six-degree-of-freedom motion in 3D
space. If a six-degree-of-freedom master manipulator is built
using serial joints, each joint must support the weight of the
joints above it, so the manipulator hardware must be large. We
therefore use a parallel mechanism in order to reduce the size
and weight of the manipulator. The pen-based force display
employs two three-degree-of-freedom manipulators. Both ends of
the pen are connected to these manipulators. The force display
has six degrees of freedom in total. Three-degree-of-freedom
force and three-degree-of-freedom torque are applied at the pen.
An overall view of the force display is shown in Fig. 12.7. Each
three-degree-of-freedom manipulator is composed of a pantograph
linkage. With this mechanism, the pen is free from the weight of
the actuators.

Figure 12.8 shows a diagram of the mechanical configuration of
the force display. Joints MA1, MA2, MA3, MB1, MB2, and MB3 are
equipped with DC motors and potentiometers. The other joints
move passively. The positions of joints A and B are measured by
the potentiometers. A three-dimensional force vector is applied
at each of joints A and B. Joint A determines the position of
the pen point, and joint B determines the orientation of the
pen. The working space of the pen point is a part of a spherical
volume whose diameter is 44 cm. The rotational angle around the
axis of the pen is determined by the distance between joints A
and B. A screw-motion mechanism converts rotational motion of
the pen into a change in the distance between joints A and B.
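
The relation between the distance of joints A and B and the roll
angle of the pen follows the usual kinematics of a screw: one
full turn corresponds to an axial travel equal to the lead of
the screw. The sketch below states this relation; the lead of the
actual mechanism is not given in the text, so the parameter and
the function name are hypothetical.

```python
import math


def roll_angle_rad(distance_change_m: float, screw_lead_m: float) -> float:
    """Pen roll angle corresponding to a change in the A-B distance.

    screw_lead_m is the axial travel per full turn of the screw
    (a hypothetical value, since the source does not specify it).
    """
    if screw_lead_m <= 0:
        raise ValueError("screw lead must be positive")
    return 2.0 * math.pi * distance_change_m / screw_lead_m
```

With an assumed lead of 4 cm, for example, a 1 cm change in the
distance would correspond to a quarter turn of the pen.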

The force and torque applied at the pen are generated by a
combination of the forces at points A and B. If these forces
have the same direction, a translational force is applied to the
user’s hand. If the forces act in opposite directions across the
pen, torque around the yaw axis or the pitch axis is generated.
If the two forces oppose each other along the pen axis, torque
around the roll axis is generated by the screw-motion
mechanism.
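
The combination of forces at A and B can be written out
explicitly for the translational and pitch/yaw cases. The sketch
below is a rigid-body illustration under stated assumptions, not
the controller of the actual device: A and B lie on the pen axis
at equal distances from its midpoint, the desired torque is
perpendicular to the pen axis, and roll torque (which the device
produces through the screw mechanism) is not handled. All
function and parameter names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def split_wrench(force, torque, axis, separation):
    """Forces at A and B giving a desired net force and pitch/yaw torque.

    axis: unit vector pointing from B to A along the pen.
    separation: distance between A and B in metres.
    torque: desired torque about the pen midpoint, perpendicular to axis.
    """
    if separation <= 0:
        raise ValueError("separation must be positive")
    # A couple (+f at A, -f at B) produces torque = separation * (axis x f),
    # so f = (torque x axis) / separation when torque is perpendicular to axis.
    couple = tuple(c / separation for c in cross(torque, axis))
    force_a = tuple(0.5 * f + c for f, c in zip(force, couple))
    force_b = tuple(0.5 * f - c for f, c in zip(force, couple))
    return force_a, force_b
```

Equal forces at both ends reproduce the translational case, and
equal and opposite forces perpendicular to the pen reproduce the
pitch/yaw case described above.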


TECHNOLOGIES  IN  FINGER/HAND  OUTPUT: 
OBJECT-ORIENTED-TYPE HAPTIC INTERFACE


Basic  Idea  of  FEELEX 

The  author  demonstrated  haptic  interfaces  to  a  number  of  peo¬ 
ple,  and  found  that  some  of  them  were  unable  to  fully  experi¬ 
ence  virtual  objects  through  the  medium  of  synthesized  haptic 
sensation.  There  seem  to  be  two  reasons  for  this  phenomenon. 
First,  these  haptic  interfaces  only  allow  the  users  to  touch  the 
virtual  object  at  a  single  point  or  at  a  group  of  points.  These  con¬ 
tact  points  are  not  spatially  continuous,  due  to  the  hardware 
configuration  of  the  haptic  interfaces.  The  user  feels  a  reaction 
force  through  a  grip  or  thimble.  Exoskeletons  provide  more 
contact  points,  but  these  are  achieved  using  Velcro  bands  at¬ 
tached  to  a  specific  part  of  the  user’s  fingers,  which  are  not  con¬ 
tinuous.  Therefore,  these  devices  cannot  recreate  a  natural  in¬ 
teraction  sensation  when  compared  to  manual  manipulation  in 
the  real  world. 

The second reason why users fail to perceive the sensation is
related to the combination of the visual and haptic displays. A
visual image is usually combined with a haptic interface using a
conventional CRT or projection screen. Thus, the user receives
visual and haptic sensations through different displays, and
therefore has to integrate the visual and haptic images in his or
her brain. Some users, especially elderly people, have difficulty
with this integration process.


FIGURE 12.7. Pen-based force display.


FIGURE 12.8. Mechanical configuration of pen-based force
display.


Considering these problems, new interface devices have been
developed. The project is named “FEELEX.” The word FEELEX is
derived from a combination of “feel” and “flex.” The major goals
of this project are:

•  to  provide  a  spatially  continuous  surface  that  enables  users  to 
feel  virtual  objects  using  any  part  of  the  fingers  or  even  the 
whole  palm 

•  to  provide  visual  and  haptic  sensations  simultaneously  using 
a  single  device  that  doesn’t  oblige  the  user  to  wear  any  extra 
apparatus 

A  new  configuration  of  visual/haptic  display  was  designed  to 
achieve  these  goals.  Figure  12.9  illustrates  the  basic  concept  of 
the  FEELEX.  The  device  is  composed  of  a  flexible  screen,  an 
array  of  actuators,  and  a  projector.  The  flexible  screen  is  de¬ 
formed  by  the  actuators  in  order  to  simulate  the  shape  of  virtual 
objects.  An  image  of  the  virtual  objects  is  projected  onto  the  sur¬ 
face  of  the  flexible  screen.  Deformation  of  the  screen  converts 
the  2D  image  from  the  projector  into  a  solid  image.  This  con¬ 
figuration  enables  users  to  touch  the  image  directly  using  any 
part  of  their  hand.  The  actuators  are  equipped  with  force  sen¬ 
sors  to  measure  the  force  applied  by  the  user.  The  hardness  of 
the  virtual  object  is  determined  by  the  relationship  between  the 
measured force and its position on the screen. If the virtual object
is soft, a small applied force causes a large deformation.
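
The relationship between measured force and deformation can be sketched as a simple linear spring. This is an assumption made for illustration: the text does not say which force-displacement law the FEELEX uses, and the function name, parameter names, and units below are hypothetical.

```python
def rod_displacement_mm(force_gf, stiffness_gf_per_mm, max_stroke_mm):
    """Displacement of one rod under a measured force.

    Assumption: the virtual object behaves like a linear spring, so a low
    stiffness (soft object) yields a large deformation for a small force.
    The result is clamped to the actuator stroke.
    """
    if stiffness_gf_per_mm <= 0:
        raise ValueError("stiffness must be positive")
    displacement = force_gf / stiffness_gf_per_mm
    return max(0.0, min(displacement, max_stroke_mm))


soft = rod_displacement_mm(100.0, 10.0, 80.0)    # soft object
hard = rod_displacement_mm(100.0, 100.0, 80.0)   # hard object, same force
```

The same applied force deforms the soft object ten times as far as the hard one, and the clamp keeps the command within the stroke of the actuator.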


DESIGN  SPECIFICATION 
AND  IMPLEMENTATION  OF  PROTOTYPES 


FEELEX  1 

The  FEELEX  1,  developed  in  1997,  was  designed  to  enable 
double-handed  interaction  using  the  whole  palms.  Therefore, 
the  optimum  size  of  the  screen  was  determined  to  be  24  cm  x 
24  cm.  The  screen  is  connected  to  a  linear  actuator  array  that 
deforms  its  shape.  Each  linear  actuator  is  composed  of  a  screw 
mechanism  driven  by  a  DC  motor.  The  screw  mechanism  con¬ 
verts  the  rotation  of  an  axis  of  the  motor  to  the  linear  motion 
of  a  rod.  The  motor  must  generate  both  motion  and  a  reaction 
force  on  the  screen.  The  diameter  of  the  smallest  motor  that  can 
drive  the  screen  is  4  cm.  Therefore,  a  6  x  6  linear-actuator  array 
can  be  set  under  the  screen.  The  deformable  screen  is  made  of 
a  rubber  plate  and  a  white  nylon  cloth.  The  thickness  of  the  rub¬ 
ber  is  3  mm.  Figure  12.10  shows  an  overall  view  of  the  device. 

The screw mechanism of the linear actuator has a self-lock
function that maintains its position while the motor power is off.
A hard virtual wall is difficult to simulate using tool-handling-type
force displays. Considerable motor power is required to generate
the reaction force from the virtual wall, which often leads
to uncomfortable vibrations. The screw mechanism is free from
this problem. A soft wall can be represented by the computer-controlled
motion of the linear actuators based on the data from
the force sensors. A force sensor is set at the top of each linear
actuator. Two strain gauges are used as a force sensor. The strain
gauges detect small displacements of the top end of the linear
actuator caused by the force applied by the user. The position of
the top end of the linear actuator is measured by an optical encoder
connected to the axis of the DC motor. The maximum
stroke of the linear actuator is 80 mm, and the maximum speed
is 100 mm/s.


FIGURE 12.9. Basic design of FEELEX.


FIGURE 12.10. Overall view of the FEELEX 1.

The  system  is  controlled  via  a  personal  computer  (PC).  The 
DC  motors  are  interfaced  by  a  parallel  I/O  unit,  and  the  force 
sensors  are  interfaced  by  an  A/D  converter  unit.  The  force  sen¬ 
sors  provide  interaction  with  the  graphics.  The  position  and 
strength  of  the  force  applied  by  the  user  are  detected  by  a  6  X  6 
sensor  array.  The  graphics  projected  onto  the  flexible  screen  are 
changed  according  to  the  measured  force. 
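
One plausible way to obtain the position and strength of a press from the 6 x 6 sensor array is a force-weighted centroid. The sketch below is an illustration only: the chapter does not describe the actual computation, and the function name, the threshold parameter, and the use of grid indices as coordinates are assumptions.

```python
def locate_press(forces, threshold=0.0):
    """Estimate the strength and position of a press on the sensor grid.

    forces: nested list of readings, one per actuator (6 x 6 for FEELEX 1).
    Returns (total_force, row_centroid, col_centroid) in grid units,
    or None if no reading exceeds the threshold.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(forces):
        for c, f in enumerate(row):
            if f > threshold:
                total += f
                row_sum += r * f
                col_sum += c * f
    if total == 0.0:
        return None
    return total, row_sum / total, col_sum / total


grid = [[0.0] * 6 for _ in range(6)]
grid[2][3] = 4.0
grid[2][4] = 4.0
press = locate_press(grid)   # press centred between columns 3 and 4 of row 2
```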

FEELEX  2 

The FEELEX 2 is designed to improve the resolution of the haptic
surface. In order to determine the resolution of the linear actuators,
we considered the situation where a medical doctor palpates
a patient. Interviews with several medical doctors revealed
that they usually recognize a tumor using their
index finger, middle finger, and third finger. The size of a tumor
is perceived by comparing it to the width of their fingers, i.e.,
two fingers wide or three fingers wide. Thus, the distance between
the axes of the linear actuators should be smaller than the
width of a finger. Considering this condition, the
distance is set to be 8 mm. This 8 mm resolution enables the user
to  hit  at  least  one  actuator  when  he  or  she  touches  any  arbi¬ 
trary  position  on  the  screen.  The  size  of  the  screen  is  50  mm  X 
50  mm,  which  allows  the  user  to  touch  the  surface  using  three 
fingers. 

In  order  to  realize  8  mm  resolution,  a  piston-crank  mecha¬ 
nism  is  employed  for  the  linear  actuator.  The  size  of  the  motor 
is  much  larger  than  8  mm,  so  the  motor  should  be  placed  at  a 
position  offset  from  the  rod.  The  piston-crank  mechanism  can 
easily  achieve  this  offset  position.  Figure  12.11  illustrates  the 
mechanical  configuration  of  the  linear  actuator.  A  servo-motor 
from  a  radio-controlled  car  is  selected  as  the  actuator.  The  ro¬ 
tation  of  the  axis  of  the  servo-motor  is  converted  to  the  linear 
motion  of  the  rod  by  a  crank-shaft  and  a  linkage.  The  stroke  of 
the rod is 18 mm, and the maximum speed is 250 mm/s. The
maximum torque of the servo-motor is 3.2 kgf·cm, which applies
a force of 1.1 kgf at the top of each rod. This force is sufficient for
palpation using the fingers.
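
The conversion from crank rotation to rod motion can be illustrated with the textbook in-line slider-crank relation. This is an idealization, not the geometry of the actual FEELEX 2 linkage, which is offset from the rod. The link length used below is hypothetical, and the 9 mm crank radius is simply half of the 18 mm stroke under this idealization.

```python
import math


def rod_position_mm(theta_rad, crank_mm, link_mm):
    """Rod position along its axis for an ideal in-line slider-crank.

    x(theta) = r*cos(theta) + sqrt(l^2 - (r*sin(theta))^2)
    Assumption: no offset between the crank centre and the rod axis.
    """
    if link_mm <= crank_mm:
        raise ValueError("link must be longer than the crank radius")
    return (crank_mm * math.cos(theta_rad)
            + math.sqrt(link_mm ** 2 - (crank_mm * math.sin(theta_rad)) ** 2))


CRANK_MM = 9.0    # half of the 18 mm stroke (idealized)
LINK_MM = 30.0    # hypothetical link length
stroke = (rod_position_mm(0.0, CRANK_MM, LINK_MM)
          - rod_position_mm(math.pi, CRANK_MM, LINK_MM))
```

In this model the stroke equals twice the crank radius regardless of the link length, which is why only the crank radius is tied to a figure from the text.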

The  flexible  screen  is  supported  by  23  rods,  and  the  servo¬ 
motors  are  set  remotely  from  the  rods.  Figure  12.12  shows  an 
overall  view  of  the  FEELEX  2.  The  23  separate  sets  of  piston- 
crank  mechanisms  can  be  seen  in  the  picture. 

Figure 12.13 shows the top end of the rods. The photo was
taken while the flexible screen was off. The diameter of each rod is
6 mm. A strain gauge cannot be put on top of the rod because of
its small size. Thus, the electric current going to each servo-motor
is measured to sense the force. The servo-motor generates
a force to maintain the position of the crank shaft. When
the user applies force to the rod, the electric current on the motor
increases to balance the force. The relationship between the
applied force and the electric current is measured. The applied
force at the top of the rods is calculated using data from the
electric current sensor. The resolution of the force sensing
capability is 40 gf.


FIGURE 12.11. Piston-crank mechanism.


FIGURE 12.12. Overall view of the FEELEX 2.


FIGURE 12.13. Top end of the rods.
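
Current-based force sensing of this kind needs a calibration from motor current to applied force. The sketch below assumes the measured relationship is approximately linear and fits it by least squares. Both the linearity and the sample values are assumptions for illustration, and the function names are hypothetical. Only the 40 gf resolution comes from the text.

```python
def fit_line(currents, forces):
    """Least-squares fit of force = slope * current + intercept."""
    n = len(currents)
    mean_i = sum(currents) / n
    mean_f = sum(forces) / n
    sxx = sum((i - mean_i) ** 2 for i in currents)
    sxy = sum((i - mean_i) * (f - mean_f) for i, f in zip(currents, forces))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_i


def estimate_force_gf(current, slope, intercept, resolution_gf=40):
    """Convert a current reading to force, rounded to the sensing resolution."""
    raw = slope * current + intercept
    return round(raw / resolution_gf) * resolution_gf


# Made-up calibration samples: current in amperes, force in gf.
slope, intercept = fit_line([0.1, 0.2, 0.3], [100.0, 300.0, 500.0])
force = estimate_force_gf(0.25, slope, intercept)
```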

Characteristics  of  the  FEELEX 

The  performance  of  existing  haptic  interfaces  is  usually  repre¬ 
sented  by  the  dynamic  range  of  force,  impedance,  inertia,  fric¬ 
tion,  etc.  However,  these  parameters  are  only  crucial  while  the 
device  is  attached  to  the  finger  or  hand.  In  the  case  of  the  tool¬ 
handling-type  of  haptic  interface  or  the  exoskeleton,  the  devices 
move  with  the  hand  even  though  the  user  doesn’t  touch  the  vir¬ 
tual  objects.  Therefore,  inertia  or  friction  degrades  the  usability 
and  the  dynamic  range  of  force  determines  the  quality  of  the  vir¬ 
tual  surface.  On  the  other  hand,  the  FEELEX  is  entirely  sepa¬ 
rate  from  the  user’s  hand,  so  its  performance  is  determined  by 
the  resolution  and  speed  of  the  actuators.  The  resolution  of  the 
actuator  corresponds  to  the  smoothness  of  the  surface  and  the 
speed of the actuator determines the motion of the virtual object.
The FEELEX 2 has improved resolution and motion speed
compared to the FEELEX 1. Each actuator of the FEELEX 2 has
a stroke rate of up to 7 Hz, which can simulate the motion of a
very fast virtual object. The rod pushes the rubber sponge so
that the user feels as if the object were pulsating. A rate of 7 Hz is much
faster than the human pulse rate.

The major advantage of the FEELEX is that it allows natural
interaction using only the bare hand. At SIGGRAPH ’98, 1,992
subjects spontaneously enjoyed the haptic experience. One of
the contents of the FEELEX 1 system, known as Anomalocaris,
was selected as a long-term exhibition at the Ars Electronica
Center (Linz, Austria). The exhibition has been popular
among visitors, especially children.

Another advantage of FEELEX is safety. The user of FEELEX
doesn’t wear any special equipment while the interaction is taking
place. The exoskeleton and tool-handling-type force displays
have control problems in their contact surface for the virtual objects.
Vibration or unwanted forces can be fed back to the
user, which is sometimes dangerous. The contact surface of the
FEELEX is physically generated, so it is free from such control
problems.


FIGURE 12.14. Overall view of Volflex.

The  major  disadvantage  of  the  FEELEX  is  the  degree  of 
difficulty  in  its  implementation.  It  requires  a  large  number  of  ac¬ 
tuators  that  have  to  be  controlled  simultaneously.  The  drive 
mechanism  of  the  actuator  must  be  robust  enough  for  rough 
manipulation.  Since  the  FEELEX  provides  a  feeling  of  natural  in¬ 
teraction,  some  of  the  users  apply  large  forces.  Our  exhibit  at  the 
Ars  Electronica  Center  suffered  from  overload  of  the  actuators. 

Another  disadvantage  of  the  FEELEX  is  its  limitation  in  the 
shape  of  objects  that  can  be  displayed.  The  current  prototypes 
cannot  present  a  sharp  edge  on  a  virtual  object.  Furthermore, 
the  linear  actuator  array  can  only  simulate  the  front  face  of  ob¬ 
jects.  Some  of  the  participants  of  the  Anomalocaris  demonstra¬ 
tion  wanted  to  touch  the  rear  of  the  creature,  but  an  entirely 
new  mechanism  would  be  required  to  simulate  the  reverse  side 
of  the  object. 


VOLUMETRIC OBJECT-ORIENTED-TYPE
OF HAPTIC INTERFACE


Basic  Design  of  Volflex 

In order to present the side or the back of a virtual object, we designed
a volumetric object-oriented haptic interface. Volflex is
a new haptic interface that provides the user with a physical 3D
surface for interaction. The device is composed of a group of air
balloons. The balloons make up the interaction surface, and are
arranged in a body-centered cubic lattice. A tube is connected to
each balloon, and the volume of each balloon is controlled by an
air cylinder. The tubes are connected to each other by springs.
This mechanical flexibility enables an arbitrary shape of the interaction
surface. Each air cylinder is equipped with a pressure
sensor that detects force applied by the user. According to the
pressure data, the device is programmed to behave like clay.
Unlike real clay, Volflex allows the user to “undo” an operation.


FIGURE 12.15. Examples of deformation.
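
A minimal sketch of clay-like behavior with an undo operation is given below. It is not the Volflex control program. It assumes a yield threshold above which a balloon shrinks in proportion to the excess pressure, and it keeps a history of volume states so that the last operation can be reverted. The class name, parameters, and units are all hypothetical.

```python
class ClayModel:
    """Toy model of balloon volumes that deform plastically under pressure."""

    def __init__(self, volumes):
        self.volumes = list(volumes)
        self._history = []          # earlier volume states, for undo

    def press(self, pressures, threshold, compliance):
        """Shrink each balloon whose pressure exceeds the yield threshold."""
        self._history.append(list(self.volumes))
        for k, p in enumerate(pressures):
            excess = p - threshold
            if excess > 0:
                self.volumes[k] = max(0.0, self.volumes[k] - compliance * excess)

    def undo(self):
        """Restore the volumes that preceded the last press, if any."""
        if self._history:
            self.volumes = self._history.pop()


clay = ClayModel([10.0, 10.0])
clay.press([5.0, 1.0], threshold=2.0, compliance=1.0)
after_press = list(clay.volumes)
clay.undo()
after_undo = list(clay.volumes)
```

Because the deformation persists after the pressure is released, the model behaves like clay rather than like an elastic object, and the stored history is what makes the undo possible.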

A  projector  is  set  above  the  balloons.  An  image  is  projected 
onto  the  surface  of  the  device,  not  on  the  user’s  hand.  We  de¬ 
veloped  a  mechanical  rotary  shutter  that  separates  the  projector 
and  camera.  The  camera  captures  the  user’s  hand,  which  is 
eliminated  from  the  projected  image. 

Virtual  Clay  Volflex 

Virtual clay is one of the ultimate goals of interactive techniques
for 3D graphics. Digital tools for 2D painting are a mature technology.
On the other hand, tools for 3D-shape manipulation are only at a
preliminary stage. Shape design of 3D objects is one of the major
application areas of haptic interfaces. Shape design of 3D objects
requires good haptic sensation.

The Volflex provides an effective interface device for manipulating
virtual clay using a lattice of air balloons. Two-dimensional
paint tools have become popular, and digital pictures are easy
to draw. The Volflex is a new digital tool for making 3D shapes.
It has the potential to revolutionize industrial design. Designers
use their palm or the joints of their fingers to deform a
clay model when carrying out rough-design tasks. The Volflex
has the ability to support such natural manipulation.

The Volflex is not only a tool for 3D-shape design, but is also
an interactive artwork. Physical properties of the virtual object
can be designed by programming the balloons’ controllers. The
projected image can also be designed. The combination of haptic
and visual display provides a new platform for interactive
sculpture.


TECHNOLOGIES  IN  FULL-BODY  HAPTICS 


Treadmill-Based Locomotion Interface

Basic Design of the Torus Treadmill

A key principle of a treadmill-based locomotion interface is to
make the floor move in the direction opposite to that of the
walker. The motion of the floor cancels displacement of the
walker in the real world. The major problem with a treadmill-based
locomotion interface is that there is not yet a way to allow
the walker to change direction. An omni-directional active floor
enables a virtually infinite area. In order to realize an infinite
walking area, the geometric configuration of the active floor must be
chosen. A closed surface driven by actuators has the ability to
create an unlimited floor. The following requirements for implementation
of the closed surface must be considered:

1.  The  walker  and  actuators  must  be  put  outside  the  surface. 

2.  The  walking  area  must  be  a  plane  surface. 

3. The surface must be made of a material that stretches very
little.

The  shape  of  a  closed  surface,  in  general,  is  a  surface  with 
holes.  If  the  number  of  holes  is  zero,  the  surface  is  a  sphere.  The 
sphere  is  the  simplest  infinite  surface.  However,  the  walking 
area  of  the  sphere  is  not  a  plane  surface.  A  very  large  diameter  is 
required  to  make  a  plane  surface  on  a  sphere,  which  restricts 
implementation  of  the  locomotion  interface. 

A closed surface with one hole, like a doughnut, is called a torus.
A  torus  can  be  implemented  using  a  group  of  belts,  which  make 
a  plane  surface  for  the  user  to  walk  on.  A  closed  surface  with 
more  than  one  hole  cannot  make  a  plane  walking  surface.  Thus, 
the  torus  is  the  only  form  suitable  for  a  locomotion  interface. 

Mechanism  and  Performance 

The Torus Treadmill operates using a group of belts connected
to each other. The Torus Treadmill is realized by these
belts (Iwata, 1999). Figures 12.16 and 12.17 illustrate the basic
structure of the Torus Treadmill, which employs 12 treadmills.
These treadmills move the walker along the “X” direction.
The twelve treadmills are connected side by side and driven in a
perpendicular direction, a motion that moves the walker along the
“Y” direction.

Figure 12.18 shows the apparatus. Twelve treadmills are connected
to four chains and mounted on four rails. The chains drive
the walker along the Y direction. The rails support the
weight of the treadmills and the walker. An AC motor is employed
to drive the chains. The power of the motor is 200 W and
is controlled by an inverter. The maximum speed of rotation is
1.2 m/s. The maximum acceleration is 1.0 m/s². The deceleration
caused by friction is 1.5 m/s². Frequency characteristics are
limited by a circuit protector of the motor driver. The maximum
switching frequency is 0.8 Hz.

Each treadmill is equipped with an AC motor. In order to
shorten the length of the treadmill, the motor is put underneath
the belt. The power of each motor is 80 W and is controlled by
an inverter. The maximum speed of each treadmill is 1.2 m/s.
The maximum acceleration is 0.8 m/s². The deceleration caused
by friction is 1.0 m/s². The width of each belt is 250 mm and the
overall walkable area is 1 m × 1 m.

A problem with this mechanical configuration is the gap between
the belts in the walking area. In order to minimize the
gap, we placed the driver units of the treadmills alternately. The gap
is only 2 mm wide in this design.


FIGURE 12.16. Structure of Torus Treadmill (X motion).


FIGURE 12.17. Structure of Torus Treadmill (Y motion).


FIGURE 12.18. Torus Treadmill.


Control Algorithm of the Torus Treadmill


A scene of the virtual space is generated corresponding to
the results of motion tracking of the feet and head. The motion
of the feet and head is measured by a Polhemus FASTRAK. The
device measures six-degree-of-freedom motion. The sampling
rate of each point is 20 Hz. A receiver is attached to each knee.
We cannot put the sensors near the motion floor because the steel
frame distorts the magnetic field. The length and direction of a step
are calculated from the data from those sensors. The user’s viewpoint
in virtual space moves in correspondence with the length
and direction of the steps.

To keep the walker in the center of the walking area, the
Torus Treadmill must be driven in correspondence with the
walker. A control algorithm is required to achieve safe and natural
walking. From our experience in the Virtual Perambulator
project, the walker should not be connected to a harness or mechanical
linkages, since such devices restrict the motion and inhibit
natural walking. The control algorithm of the Torus Treadmill
must be safe enough to allow removal of the harness from
the walker. At the final stage of the Virtual Perambulator Project,
we succeeded in removing the harness using a hoop frame. The
walker can freely walk and turn around in the hoop, which supports
the walker’s body while he or she slides his or her feet. We
simulated the function of the hoop in the control algorithm of
the Torus Treadmill by putting a circular deadzone in the center of
the walking area. If the walker steps out of the area, the floor
moves in the opposite direction, so that the walker is carried
back into the deadzone.
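
The deadzone rule can be sketched as follows. The chapter states only that the floor moves in the opposite direction once the walker leaves the circular deadzone. The proportional gain, the function name, and the parameter values in the example are assumptions. The 1.2 m/s speed limit matches the maximum speed quoted for the treadmill.

```python
import math


def floor_velocity(x, y, deadzone_radius, gain, max_speed):
    """Floor velocity (vx, vy) that carries the walker back to the centre.

    (x, y) is the walker's position relative to the centre, in metres.
    Inside the deadzone the floor stays still. Outside it, the floor moves
    opposite to the walker's offset, with a speed proportional to the
    distance beyond the deadzone (assumed) and capped at max_speed.
    """
    dist = math.hypot(x, y)
    if dist <= deadzone_radius:
        return (0.0, 0.0)
    speed = min(gain * (dist - deadzone_radius), max_speed)
    return (-speed * x / dist, -speed * y / dist)


inside = floor_velocity(0.1, 0.0, deadzone_radius=0.3, gain=1.0, max_speed=1.2)
outside = floor_velocity(1.0, 0.0, deadzone_radius=0.3, gain=1.0, max_speed=1.2)
far = floor_velocity(0.0, 5.0, deadzone_radius=0.3, gain=1.0, max_speed=1.2)
```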


Foot-Pad-Based  Locomotion  Interface 

Methods  of  Presentation  of  Uneven  Surface 

One of the major research issues in locomotion interfaces is
the presentation of an uneven surface. Locomotion interfaces are
often applied to the simulation of buildings or urban spaces. Those
spaces usually include stairs. A walker should be provided with
a sense of climbing up or going down those stairs. In some applications
of locomotion interfaces, such as training simulators or
entertainment facilities, rough terrain should be presented.

The  presentation  of  a  virtual  staircase  was  tested  at  the  early 
stage  of  the  Virtual  Perambulator  project  (Iwata  &  Fujii,  1996). 
A string was connected to the roller skate of each foot. The
string was pulled by a motor. When the walker climbed up a stair,
the forward foot was pulled up. When the walker went down
the stairs, the backward foot was pulled up. However, this
method was not considered successful because of instability.

Later,  a  6  DOF  motion  platform  was  applied  to  a  final  version 
of  the  Virtual  Perambulator,  where  a  user  walks  in  a  hoop  frame. 
The  walker  stood  on  the  top  plate  of  the  motion  platform.  Pitch 
and  heave  motion  of  the  platform  were  used.  When  the  walker 
stepped  forward  to  climb  a  stair,  the  pitch  angle  and  vertical 
position  of  the  floor  increased.  After  finishing  the  climbing 
motion,  the  floor  went  back  to  the  neutral  position.  When  the 
walker  stepped  forward  to  go  down  a  stair,  the  pitch  angle  and 
vertical position of the floor decreased. This inclination of the
floor was intended to present height differences between the feet.
The heave motion was intended to simulate vertical acceleration.
However,  this  method  failed  in  simulation  of  stairs,  mainly  be¬ 
cause  the  floor  was  flat. 

A  possible  method  to  create  a  difference  in  height  between 
the  feet  is  the  application  of  two  large  manipulators,  such  as 
the  BiPort.  A  4  DOF  manipulator  driven  by  hydraulic  actuators  is 
connected  to  each  foot.  The  major  problem  with  this  method 
is  the  way  in  which  the  manipulators  trace  the  turning  motion 
of  the  walker.  When  the  walker  turns  around,  two  manipulators 
interfere  with  each  other. 

The  Torus  Treadmill  provides  a  natural  turning  motion  in 
which  the  walker  can  physically  turn  about  on  the  active  floor. 
This  turning  motion  using  the  feet  is  a  major  contribution  to  hu¬ 
man  spatial  recognition  performance.  Vestibular  and  proprio¬ 
ceptive  feedback  is  essential  to  the  sense  of  orientation  (Iwata  & 
Yoshida, 1999). The Torus Treadmill can be modified for simulation
of an uneven surface. If we install an array of linear actuators
on each treadmill, an uneven floor can be realized by controlling
the length of each linear actuator. However, this method
is almost impossible to implement, because a very large number
of linear actuators are required to cover the surface of the torus-shaped
treadmills and the control signal for each actuator
must be transmitted wirelessly.


Basic Design of the GaitMaster

A new locomotion interface that simulates an omni-directional
uneven surface has been designed, which is called the “GaitMaster.”
The core elements of the device are two 6 DOF motion
bases mounted on a turntable. Figure 12.19 illustrates the basic
configuration of the GaitMaster.

A  walker  stands  on  the  top  plate  of  the  motion  base.  Each 
motion  base  is  controlled  to  trace  the  position  of  the  foot.  The 
turntable  is  controlled  to  trace  the  orientation  of  the  walker. 
The  motion  of  the  turntable  removes  interference  between  the 
two  motion  bases. 

The X and Y motion of the motion base traces the horizontal
position of the foot and cancels its motion by moving in the opposite
direction. The rotation around the yaw axis traces the horizontal
orientation of the foot. The Z motion traces the vertical position
of the foot and cancels its motion. The rotation around the
roll and pitch axes simulates the inclination of a virtual surface.


Control Algorithm of the GaitMaster

The control algorithm must keep the position of the walker at
the neutral position of the GaitMaster. In order to maintain this
position, the motion platforms have to cancel the motion
of the feet. The principle of the cancellation is as follows:

1.  Suppose  the  right  foot  is  at  the  forward  position  and  the  left 
foot  is  at  the  backward  position  while  walking. 

2.  When  the  walker  steps  forward  with  the  left  foot,  the  weight 
of  the  walker  is  laid  on  the  right  foot. 

3.  The  motion  platform  of  the  right  foot  goes  backward  in  ac¬ 
cordance  with  the  displacement  of  the  left  foot,  so  that  the 
central  position  of  the  walker  is  maintained. 

4.  The  motion  platform  of  the  left  foot  follows  the  position  of 
the  left  foot.  When  the  walker  finishes  stepping  forward,  the 
motion  platform  supports  the  left  foot. 

If  the  walker  climbs  up  or  goes  down  stairs,  a  similar  proce¬ 
dure  can  be  applied.  The  vertical  motion  of  the  feet  is  canceled 
using  the  same  principle.  The  vertical  displacement  of  the  for¬ 
ward  foot  is  canceled  in  accordance  with  the  motion  of  the 
backward  foot,  so  that  the  central  position  of  the  walker  is  main¬ 
tained  at  the  neutral  height.  Figure  12.20  illustrates  the  method 
of  canceling  the  climbing-up  motion. 
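
The cancellation principle described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the GaitMaster controller: each platform position is reduced to a forward coordinate x and a height z, the swing foot's displacement is taken as already measured, and the function name is hypothetical.

```python
def cancel_step(stance, swing, swing_delta):
    """Return new (stance, swing) platform positions after the swing foot moves.

    Each position is an (x, z) tuple: x is forward and z is up, in metres.
    The swing platform follows the swing foot, while the stance platform
    moves by the same amount in the opposite direction, so the midpoint
    between the feet stays at the neutral position.
    """
    new_swing = tuple(p + d for p, d in zip(swing, swing_delta))
    new_stance = tuple(p - d for p, d in zip(stance, swing_delta))
    return new_stance, new_swing


# Right foot forward (stance), left foot backward (swing); the left foot
# steps 0.6 m forward and 0.2 m up, as when climbing a stair.
new_stance, new_swing = cancel_step((0.3, 0.0), (-0.3, 0.0), (0.6, 0.2))
midpoint = tuple((a + b) / 2 for a, b in zip(new_stance, new_swing))
```

The same function handles level walking and stair climbing, since the vertical displacement is canceled in the same way as the horizontal one.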

The turntable rotates so that the two motion platforms can
trace the rotational motion of the walker. If the walker changes
the direction of his or her steps, the turntable rotates to trace
the walker’s orientation. The orientation of the turntable is determined
according to the direction of the feet. The turntable rotates
so that its orientation is midway between the directions of the feet. The
walker can physically turn around on the GaitMaster using this
control algorithm of the turntable.
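
A target orientation midway between the two feet can be computed with a circular mean, which avoids errors when the yaw angles straddle ±180 degrees. The text does not say how the GaitMaster computes this value, so the approach and the function name below are assumptions.

```python
import math


def turntable_target_deg(left_yaw_deg, right_yaw_deg):
    """Orientation midway between the two foot directions, in degrees.

    Uses the circular mean of the two yaw angles. The result is not
    meaningful if the feet point in exactly opposite directions.
    """
    left = math.radians(left_yaw_deg)
    right = math.radians(right_yaw_deg)
    s = math.sin(left) + math.sin(right)
    c = math.cos(left) + math.cos(right)
    return math.degrees(math.atan2(s, c))


simple = turntable_target_deg(10.0, 30.0)       # about 20 degrees
wrapped = turntable_target_deg(170.0, -170.0)   # about 180 degrees, not 0
```

A plain arithmetic average of 170 and -170 degrees would give 0, which points the turntable the wrong way; the circular mean avoids that case.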

Prototype  GaitMaster 

Figure 12.21 shows the prototype GaitMaster. In order to
simplify the mechanism of the motion platform, the surface of
the virtual space was defined as sets of planar surfaces. Most
buildings or urban spaces can be simulated without inclining
the floor. Thus, we can neglect the roll and pitch axes of the motion
platforms. Each platform of the prototype GaitMaster is
composed of three linear actuators, on top of which a yaw joint is
mounted. We disassembled a 6 DOF Stewart platform and made
two XYZ stages. Three linear guides are applied to support the
orientation of the top plate of the motion platform. The payload
of each motion platform is approximately 150 kg. A rotational
joint around the yaw axis is mounted on each motion platform.
The joint is equipped with a spring that moves the feet in the
neutral direction.

A turntable was developed using a large DD motor. The maximum
angular velocity is 500 deg/sec. A 3 DOF goniometer is
connected to each foot. The goniometer measures back-and-forth
and up-and-down motion as well as yaw angle. The control
algorithm described in the previous section was implemented
and succeeded in the presentation of virtual staircases.


FIGURE  12.20.  Canceling  the  climbing-up  motion. 


FIGURE 12.21. GaitMaster.


Robot-Tile-Based Locomotion Interface

CirculaFloor Project

Locomotion interfaces often require bulky hardware, since
they have to carry the user’s whole body. Also, the hardware is
not easy to reconfigure to improve its performance or to add
new functions. Considering these issues, the goals of the CirculaFloor
project are:

1. To develop compact hardware for the creation of an infinite
surface for walking. The major disadvantage of existing locomotion
interfaces is their difficult installation. We need to
solve this problem for a demonstration at SIGGRAPH.

2. To develop a scalable hardware architecture for future improvement
of the system. Another disadvantage of existing
locomotion interfaces is the difficulty of improving the system.
We have to design a new hardware architecture that allows
us to easily upgrade the actuation mechanism and add
new mechanisms for the creation of uneven surfaces.

In  order  to  achieve  these  goals,  we  designed  a  new  configura¬ 
tion  for  a  locomotion  interface  by  using  a  set  of  omni-directional 
movable  tiles.  Each  tile  is  equipped  with  a  holonomic  mecha¬ 
nism  that  achieves  omni-directional  motion.  An  infinite  surface 
is  simulated  by  the  circulation  of  the  movable  tiles.  The  motion 
of  the  feet  is  measured  by  position  sensors.  The  tile  moves  op¬ 
posite  to  the  measured  direction  of  the  walker  so  that  the  mo¬ 
tion  of  the  step  is  canceled.  The  position  of  the  walker  is  fixed  in 
the  real  world  by  this  computer-controlled  motion  of  the  tiles. 
The circulation of the tiles has the ability to cancel the displacement
of the walker in an arbitrary direction. Thus, the walker
can freely change direction while walking. Figure 12.22 shows
an overall view of the CirculaFloor.

The CirculaFloor is a new method that takes advantage of
both the treadmill and the footpad. It creates an infinite omni-directional
surface using a set of movable tiles. The combination
of tiles provides a sufficient area for walking, and thus precise
tracing of the foot position is not required. It has the potential
to create an uneven surface by mounting an up-and-down
mechanism onto each tile.


FIGURE 12.22. CirculaFloor.

Method  of  Creating  Infinite  Surface 

The  current  method  of  circulating  the  movable  tiles  is  de¬ 
signed  to  satisfy  the  following  conditions:  (a)  Two  of  the  mov¬ 
able  tiles  are  used  for  pulling  back  the  user  to  the  center  of  the 
dead  zone;  (b)  The  rest  of  the  movable  tiles  are  used  to  create 
a  new  front  surface;  (c)  These  tiles  are  moved  the  shortest  dis¬ 
tances  to  the  next  destination,  while  they  avoid  colliding  with 
other  tiles;  (d)  The  control  program  allocates  all  destinations 
to  the  tiles,  when  the  tiles  reach  their  destination;  (e)  The  tiles 
don’t  rotate  corresponding  to  walking  direction  to  simplify  the 
algorithm. 

Considering  the  above  conditions,  the  circulation  method  is 
varied  corresponding  to  the  walking  direction.  Three  modes, 
“alternating  circulation,”  “unidirectional  circulation,”  and  “cross 
circulation”  are  designed  corresponding  to  the  direction  (Fig. 
12.23).  Representative  motion  of  each  mode  is  illustrated  in  Fig. 
12.24  and  12.25. 

Alternating circulation (Fig. 12.24): This mode is adopted for
the directions between ±15 deg and ±75-105 deg. The tiles
used for creating a new front surface (white-colored tiles in
Figure 12.24) move around to the front of the tiles for alternately
pulling back (in Figure 12.24, gray-colored tiles) from the left
(Path-1)/right (Path-2) sides.

FIGURE 12.23. Pulling-back modes corresponding to walking
direction.

FIGURE 12.24. Circulation of movable tiles in alternating mode.

FIGURE 12.25. Circulation of movable tiles in unidirectional
mode.

Unidirectional  circulation  (Fig.  12.25):  This  mode  is  adopted 
for  the  directions  between  ±15-30  deg  and  ±60-75  deg.  The 
tiles  used  for  creating  a  new  front  surface  move  around  to  the 
right/left  front  of  the  tiles  for  pulling  back  with  a  unidirectional 
circulation. 

Cross-circulation  (Fig.  12.26):  This  mode  is  adopted  for  the 
directions  between  ±30-60  deg.  The  tiles  used  for  creating  a 
new  front  surface  move  around  to  the  left/right  front  (Path-1)  or 
the  left/right  sides  (Path-2)  of  the  tiles  for  pulling  back. 
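
The three direction ranges above can be read as a lookup from walking direction to circulation mode. The sketch below is one possible reading, not the authors' control code. It assumes that the ranges repeat every 90 degrees (which reproduces the stated ±75-105 deg and ±60-75 deg bands), that directions beyond ±105 deg follow the same symmetry, and that boundary angles fall into the lower band. The function name `circulation_mode` is hypothetical.

```python
def circulation_mode(direction_deg: float) -> str:
    """Pick a circulation mode from the walking direction in degrees.

    Assumption: the mode depends only on the angle folded into the
    range 0..45 deg, using the left/right and 90-degree symmetry of
    the ranges quoted in the text.
    """
    folded = abs(direction_deg) % 90.0
    if folded > 45.0:
        folded = 90.0 - folded
    if folded <= 15.0:
        return "alternating"      # within ±15 deg, or ±75-105 deg
    if folded <= 30.0:
        return "unidirectional"   # ±15-30 deg, or ±60-75 deg
    return "cross"                # ±30-60 deg


for angle in (0, 20, 45, 70, 90, -100):
    print(angle, circulation_mode(angle))
```

Folding the angle keeps the table to three comparisons; a real controller would also need the hysteresis or phase matching that the text mentions for direction changes, which this sketch omits.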

When  a  user  of  the  CirculaFloor  switches  the  walking  direc¬ 
tion,  the  control  program  calculates  the  nearest  phase  of  each 
tile  by  using  a  template-matching  technique  corresponding  to 
the  new  direction.  Then  the  tiles  take  the  shortest  way  to  their 
destinations. 


APPLICATION  AREA  FOR  HAPTIC  INTERFACE 


Application  Area  for  Finger/Hand  Haptics 

Medicine. Medical applications for haptic interfaces are
currently growing rapidly. Various surgical simulators have been
developed using a tool-handling-type force display. We developed
a simulator for laparoscopic surgery using the Haptic Master.
Simulator software packages that use the PHANToM are commercially
available.

Palpation is typically used in medical examinations. The
FEELEX 2 is designed to be used as a palpation simulator. If we
display a virtual tumor based on a CT or an MRI image, a medical
doctor can palpate the internal organs before surgery, and this
technique can also be applied to telemedicine. Connecting two
FEELEXs together via a communication line would allow a
doctor to palpate a patient remotely.

3D-shape modeling. The design of 3D shapes definitely
requires haptic feedback. A typical application of the
tool-handling-type force display is in 3D-shape modeling. One of the
most popular applications of the PHANToM system is as a modeling
tool. Such a tool-handling-type force display allows a user
to make point contact, and point-contact manipulation is best suited
to precision-modeling tasks. However, it isn’t effective when
the modeling task requires access to the whole shape. Designers
use their palm or the joints of their fingers to deform a clay
model when carrying out rough-design tasks. The FEELEX has
the ability to support such natural manipulation.

FIGURE 12.26. Circulation of movable tiles in cross mode.

HUI  (Haptic  User  Interface).  Today,  touchscreens  are 
widely  used  in  automatic  teller  machines,  ticketing  machines, 
information kiosks, and so on. A touchscreen enables an intuitive
user interface, although it lacks haptic feedback. Users can
see  virtual  buttons,  but  they  can’t  feel  them.  This  is  a  serious 
problem  for  a  blind  person.  The  FEELEX  provides  a  barrier-free 
solution  to  the  touchscreen-based  user  interface.  Figure  12.13 
shows  an  example  of  a  haptic  touch-screen  using  the  FEELEX  1. 

Art.  Interactive  art  may  be  one  of  the  best  applications  of 
the  FEELEX  system.  As  we  discussed  earlier,  the  Anomalocaris  has 
been  exhibited  in  a  museum  in  Austria.  It  succeeded  in  evoking 
haptic  interaction  with  many  visitors.  The  FEELEX  can  be  used  for 
interactive  sculptures.  Visitors  are  usually  prohibited  from  touch¬ 
ing  physical  sculptures.  However,  they  can  not  only  touch  sculp¬ 
tures  based  around  FEELEX,  they  can  also  deform  them. 

Application  Areas  for  Locomotion  Interfaces 

As a serious application of our locomotion interface, we are
working with the Ship Research Laboratory to develop an
“evacuation simulator” (Yamao et al., 1996). The Ship Research
Laboratory is a national research institute that belongs to the
Ministry of Transportation of Japan. Analysis of the evacuation of
passengers during maritime accidents is very important for ship
safety. However, it is impossible to carry out experiments with
human subjects during an actual disaster. Therefore, they
introduced virtual-reality tools for simulating disasters in order to
analyze the evacuation of passengers. They built a virtual ship that
models the generation of smoke and the inclination of the vessel.
Evacuation experiments are carried out to construct
mathematical models of passengers’ behavior in a disaster.
The Torus Treadmill will be effective in such experiments.

Locomotion by walking is intuitive and indispensable in the study
of human behavior in virtual environments. We are applying the
system to research on a human model of evacuation in maritime
accidents. The GaitMaster can be applied to areas other than those
of the Torus Treadmill. Its applications may include rehabilitation of
walking or simulation of mountain climbing.


CONCLUSION  AND  FUTURE  PROSPECTS 


This chapter has described the major topics of haptic interfaces. A
number of methods have been proposed for implementing haptic
interfaces. Future research in this field will include the following two
issues: (a) safety issues, and (b) psychology in haptics.

Safety  Issues 

Safety is an important issue in haptic interfaces. Inadequate
control of an actuator may injure the user. The exoskeleton and
tool-handling-type force displays have control problems at their
contact surfaces with virtual objects. Vibration or unwanted forces
can be fed back to the user, which is sometimes dangerous.
One of the major advantages of the FEELEX is safety. The user
of the FEELEX doesn’t wear any special equipment while the
interaction is taking place. The contact surface of the FEELEX is
physically generated, so it is free from such control problems.


Locomotion interfaces raise even more important safety issues.
The system supports the full body of the user, so
inadequate control can cause major injury to the user. Specialized
hardware for keeping the walker safe must be developed.

Psychology  in  Haptics 

There have been many findings regarding haptic sensation.
Most of these are related to skin sensation, and very few
research activities include muscle sensation. Among these,
Lederman and Klatzky’s work was closely related to the design of
the force display (Lederman & Klatzky, 1987). Their latest work
involved spatially distributed forces (Lederman & Klatzky, 1999).
They performed an experiment involving palpation. The subjects
were asked to find a steel ball placed underneath a foam-rubber
cover. The results showed that steel balls smaller than
8 mm in diameter decreased the score. This finding supports
our specification for the FEELEX 2, in which the distance
between rods is 8 mm. This kind of psychological study will
support future development of haptic interfaces.

Haptics is indispensable for human interaction in the real
world. However, haptics is not commonly used in the field of
HCI. Although there are several commercially available haptic
interfaces, they are expensive and limited in their function. Image
display has a history of over 100 years. Today, image displays,
such as television and movies, are used in everyday life. On the
other hand, the haptic interface has only a 10-year history. There are
obstacles to overcome before haptic interfaces come into popular use.
However, the haptic interface is a new frontier of media technology and
will definitely contribute to human life.


INTRODUCTION AND A BRIEF HISTORY
OF NONSPEECH SOUND AND HCI


Our  sense  of  hearing  is  very  powerful,  and  we  can  extract  a 
wealth  of  information  from  the  pressure  waves  entering  our 
ears  as  sound.  Sound  gives  us  a  continuous,  holistic  contact 
with  our  environments;  we  hear  a  rich  set  of  sounds  from  our 
interactions  with  objects  close  to  us,  familiar  sounds  of  unseen 
friends  or  family  nearby,  noises  of  things  to  avoid,  such  as  traffic, 
and  noises  of  things  to  attend  to,  such  as  a  ringing  telephone. 
Nonspeech  sounds  (such  as  music,  environmental  sounds,  and 
sound  effects)  give  us  different  types  of  information  than  those 
provided  by  speech;  they  can  be  more  general  and  more  ambi¬ 
ent  where  speech  is  precise  and  requires  more  focus  (see  chap¬ 
ter  XXX  for  more  on  speech  output).  Nonspeech  sounds  com¬ 
plement  speech  in  the  same  way  that  visual  icons  complement 
text.  For  example,  icons  can  present  information  in  a  small 
amount  of  space  as  compared  to  text;  nonspeech  sounds  can 
present  information  in  a  small  amount  of  time  as  compared  to 
speech.  There  is  less  research  about  nonspeech  than  speech 
interfaces,  and  this  chapter  will  show  something  of  where  it  has 
been  used  and  of  what  it  is  capable. 

The  combination  of  visual  and  auditory  feedback  at  the  user 
interface  is  a  powerful  tool  for  interaction.  In  everyday  life, 
these  primary  senses  combine  to  give  complementary  informa¬ 
tion  about  the  world.  Our  visual  system  gives  us  detailed  infor¬ 
mation  about  a  small  area  of  focus,  whereas  our  auditory  system 
provides  general  information  from  all  around,  alerting  us  to 
things  outside  our  view.  Blattner  and  Dannenberg  (1992)  dis¬ 
cussed  some  of  the  advantages  of  using  this  approach  in  multi¬ 
media/multimodal  computer  systems: 

In  our  interaction  with  the  world  around  us,  we  use  many  senses. 
Through  each  sense,  we  interpret  the  external  world  using  representa¬ 
tions  and  organizations  to  accommodate  that  use.  The  senses  enhance 
each other in various ways, adding synergies or further informational
dimensions. (p. xviii)

These advantages can be brought to the multimodal (or
multisensory) human-computer interface by the addition of
nonspeech auditory output to standard graphical displays (see
chapter 21 for more on multimodal interaction). Whilst directing
our visual attention to one task, such as editing a document,
we  can  still  monitor  the  state  of  others  on  our  machine  using 
sound.  Currently,  almost  all  information  presented  by  comput¬ 
ers  uses  the  visual  sense.  This  means  information  can  be  missed 
because  of  visual  overload  or  because  the  user  is  not  looking  in 
the  right  place  at  the  right  time.  A  multimodal  interface  that  in¬ 
tegrated  information  output  to  both  senses  could  capitalize  on 
the  interdependence  between  them  and  present  information 
in  the  most  efficient  way  possible. 

The  classical  uses  of  nonspeech  sound  can  be  found  in  the 
human  factors  literature  (see  McCormick  &  Sanders,  1982). 
Here  it  is  used  mainly  for  alarms,  warnings,  and  status  informa¬ 
tion.  Buxton  (1989)  extended  these  ideas  and  suggested  that 
encoded  messages  could  be  used  to  present  more  complex 
information  in  sound.  This  type  of  auditory  feedback  will  be 
considered  here. 


The  other  main  use  of  nonspeech  sound  is  in  music  and 
sound  effects  for  games  and  other  multimedia  applications. 
These  kinds  of  sounds  indicate  to  the  user  something  about 
what  is  going  on  and  try  to  create  a  mood  for  the  piece  (much 
as music does in film and radio). As Blattner and Dannenberg
(1992)  said,  “Music  and  other  sound  in  film  or  drama  can  be 
used  to  communicate  aspects  of  the  plot  or  situation  that  are 
not  verbalized  by  the  actors.  Ancient  drama  used  a  chorus  and 
musicians  to  put  the  action  into  its  proper  setting  without 
interfering  with  the  plot”  (p.  xix).  Work  on  auditory  output  for 
interaction  takes  this  further  and  uses  sound  to  present  infor¬ 
mation,  such  as  things  that  the  user  might  not  otherwise  see  or 
important  events  that  the  user  might  not  notice. 

The  use  of  sound  to  convey  information  in  computers  is  not 
new.  In  the  early  days,  programmers  used  to  attach  speakers 
to their computers’ busses or program counters (Thimbleby,
1990). The speaker would click each time the program counter
was  changed.  Programmers  would  get  to  know  the  patterns  and 
rhythms  of  sound  and  could  recognize  what  the  machine  was 
doing.  Another  everyday  example  is  the  sound  of  a  hard  disk. 
Often,  users  can  tell  when  a  save  or  copy  operation  has  com¬ 
pleted  by  the  noise  their  disks  make.  This  allows  them  to  do 
other  things  while  waiting  for  the  copy  to  finish.  Nonspeech 
sound  is  therefore  an  important  information  provider,  giving 
users  knowledge  about  things  in  their  systems  that  they  may 
not  see. 

Two  important  events  kick-started  the  research  area  of  non¬ 
speech  auditory  output:  the  first  was  the  special  issue  of  the  HCI 
journal  on  nonspeech  sound,  edited  by  Buxton  (1989).  This  laid 
the  foundations  for  some  of  the  key  work  in  the  area;  it  included 
papers  by  Blattner  on  earcons  (Blattner,  Sumikawa,  &  Green¬ 
berg,  1989),  Gaver  (1989)  on  auditory  icons,  and  Edwards  (1989) 
on  Soundtrack.  The  second  event  was  the  First  International 
Conference  on  Auditory  Display  (ICAD’92)  held  in  Santa  Fe  in 
1992  (Kramer,  1994b).  For  the  first  time,  this  meeting  brought 
the  main  researchers  interested  in  the  area  together  (see 
http://www.icad.org  for  the  proceedings  of  the  ICAD  confer¬ 
ences).  Resulting  from  these  two  events  was  a  large  growth  in 
research  in  the  area  during  the  1990s  that  continues  today. 

The  rest  of  this  chapter  goes  into  detail  on  all  aspects  of 
auditory  interface  design.  It  presents  some  of  the  advantages 
and  disadvantages  of  using  sound  at  the  interface.  Then,  a 
brief  introduction  to  psychoacoustics,  or  the  study  of  the  per¬ 
ception  of  sound,  follows.  This  is  followed  by  an  introduction 
to  sound  sampling  and  synthesis  techniques  needed  for  audi¬ 
tory  interfaces.  The  next  sections  describe  the  main  tech¬ 
niques  that  are  used  for  auditory  information  presentation  and 
some  of  the  main  applications  of  sound  in  human-computer 
interaction.  The  chapter  finishes  with  some  conclusions  about 
the  state  of  research  in  this  area. 


WHY  USE  NONSPEECH  SOUND  IN  HCI? 


Some  Advantages  of  Sound 

It  is  advantageous  to  use  sound  at  the  user  interface  for  many 
reasons. 
Vision and hearing are interdependent. Our visual and
auditory  systems  work  well  together.  Our  eyes  provide 
high-resolution  information  around  a  small  area  of  focus 
(with  peripheral  vision  extending  further).  According  to 
Perrott,  Sadralobadi,  Saberi,  and  Strybel  (1991),  humans 
view  the  world  through  a  window  of  80°  laterally  and  60° 
vertically.  Within  this  visual  field,  focusing  capacity  is  not 
uniform.  The  foveal  area  of  the  retina  (the  part  with  the 
greatest  acuity)  subtends  an  angle  of  only  2°  around  the 
point  of  fixation.  Sounds,  on  the  other  hand,  can  be  heard 
from  above,  below,  in  front,  or  behind  the  user,  but  with  a 
much  lower  resolution.  Therefore,  our  ears  tell  our  eyes 
where  to  look:  if  there  is  an  interesting  sound  from  out¬ 
side  our  view  we  will  turn  to  look  at  it  to  get  more  detailed 
information. 

Superior temporal resolution. As Kramer (1994a) said,
“Acute temporal resolution is one of the greatest strengths
of the auditory system” (p. 2). In certain cases, reactions to
auditory stimuli have been shown to be faster than reactions
to visual stimuli (Bly, 1982).

Reduce the overload from large displays. Modern, large, or
multiple  monitor  graphical  interfaces  use  the  human  visual 
sense  very  intensively.  This  means  that  we  may  miss  impor¬ 
tant  information  because  our  visual  system  is  overloaded — 
we  have  just  too  much  to  look  at.  To  stop  this  overload,  in¬ 
formation  could  be  displayed  in  sound  so  that  the  load  could 
be  shared  between  senses. 

Reduce the amount of information needed on screen. Related
to the point above is the problem of information presentation
on devices with small visual displays, such as mobile
telephones and personal digital assistants (PDAs). These
have very small screens that can easily become cluttered. To
solve this, some information could be presented in sound
to release screen space.

Reduce  demands  on  visual  attention.  Another  issue  with 
mobile  devices  is  that  users  who  are  using  them  on  the 
move  cannot  devote  all  of  their  visual  attention  to  the  de¬ 
vice — they  must  look  where  they  are  going  to  avoid  uneven 
surfaces,  traffic,  pedestrians,  and  so  forth.  In  this  case,  visual 
information  may  be  missed  because  the  user  is  not  looking 
at  the  device.  If  this  were  played  in  sound,  then  the  infor¬ 
mation  would  be  delivered  while  the  user  was  looking  at 
something  else. 

The  auditory  sense  is  underutilized.  The  auditory  system 
is  very  powerful;  we  can  listen  to  (and  some  can  compose) 
highly  complex  musical  structures.  As  Alty  (1995)  said,  “The 
information  contained  in  a  large  musical  work  (say  a  sym¬ 
phony)  is  very  large  .  .  .  The  information  is  highly  organized 
into  complex  structures  and  substructures.  The  potential 
therefore  exists  for  using  music  to  successfully  transmit 
complex  information  to  a  user”  (p.  409). 

Sound is attention grabbing. Users can choose not to look
at something, but it is harder to avoid hearing it. This makes
sound very useful for delivering important information.

Some objects or actions within an interface may have a
more natural representation in sound. Bly (1982) suggested,
“Perception of sound is different to visual perception,
sound can offer a different intuitive view of the
information it presents . . .” (p. 14). Therefore, sound could
allow us to understand information in different ways.

To  make  computers  more  usable  by  visually  disabled  people. 
With  the  development  of  graphical  displays,  user  interfaces  be¬ 
came  much  harder  for  visually  impaired  people  to  operate.  A 
screen  reader  (see  the  section  on  sounds  for  users  with  visual 
impairments  below)  cannot  easily  read  this  kind  of  graphical  in¬ 
formation.  Providing  information  in  an  auditory  form  can  help 
solve  this  problem  and  allow  visually  disabled  persons  to  use 
the  facilities  available  on  modern  computers. 

Some  Problems  with  Sound 

Kramer (1994a) suggested some general difficulties with using
sound to present information.

Low resolution: Many auditory parameters are not suitable for
high-resolution display of quantitative information. For example,
when using sound volume, only very few different values
can be unambiguously presented (Buxton, Gaver, & Bly, 1991).
Vision has a much higher resolution. The same also applies to
spatial precision in sound. Under optimal conditions, differences
of about 1° can be detected in front of a listener (see the
section on three-dimensional sound; Blauert, 1997). In vision,
differences of an angle of two seconds can be detected in the
area of greatest acuity in the central visual field.

Presenting absolute data is difficult: Many interfaces that use
nonspeech sound to present data do it in a relative way. Users
tell if a value is going up or down through the difference
between two sounds. It is difficult to present absolute data
unless the listener has perfect pitch, which is rare. In vision, a
user needs only to look at a number to get an absolute value.

Lack of orthogonality: Changing one attribute of a sound
may affect the others. For example, changing the pitch of a
note may affect its perceived loudness and vice versa (see
the section on perception of sound).

Transience  of  information:  Sound  disappears  after  it  has 
been  presented;  users  must  remember  the  information  that 
the  sound  contained  or  use  some  method  of  replaying.  In 
vision,  the  user  can  easily  look  back  at  the  display  and  see 
the  information  again.  (This  is  not  always  the  case — think, 
for  example,  of  an  air  conditioning  system:  its  sounds  con¬ 
tinue  for  long  periods  and  become  habituated.  Sounds  often 
continue  in  the  background  and  only  become  apparent 
when  they  change  in  some  way.) 

Annoyance  due  to  auditory  feedback:  There  is  one  prob¬ 
lem  with  sound  that  has  not  yet  been  mentioned,  but  it  is  the 
one  most  commonly  brought  up  against  the  use  of  sound  in 
user  interfaces:  annoyance.  As  this  is  an  important  topic,  it 
will  be  discussed  in  detail  in  a  later  section. 

Comparing  Speech  and  Nonspeech 
Sounds  for  Interface  Design 

One obvious question is why not just use speech for output?
Why do we need to use nonspeech sounds? Many of the
advantages presented above apply to speech as well as nonspeech
sounds.  However,  there  are  some  advantages  to  nonspeech 
sounds.  If  we  think  of  a  visual  analogy,  speech  output  is  like  the 
text  on  a  visual  display  and  nonspeech  sounds  are  like  the  icons. 
Presenting  information  in  speech  is  slow  because  of  its  serial  na¬ 
ture;  to  assimilate  information,  the  user  must  typically  hear  it 
from  beginning  to  end,  and  many  words  may  have  to  be  com¬ 
prehended  before  a  message  can  be  understood.  With  non¬ 
speech  sounds,  the  messages  are  shorter  and  therefore  can  be 
heard  more  rapidly  (although  the  user  might  have  to  learn  the 
meaning  of  the  nonspeech  sound  whereas  the  meaning  is  con¬ 
tained  within  the  speech  and  so  requires  no  learning — just  like 
the  visual  case).  Speech  suffers  from  many  of  the  same  prob¬ 
lems  as  text  in  text-based  computer  systems,  as  this  is  also  a  se¬ 
rial  medium.  Barker  and  Manji  (1989)  claimed  that  an  important 
limitation  of  text  is  its  lack  of  expressive  capability:  It  may  take 
many  words  to  describe  something  fairly  simple.  Graphical  dis¬ 
plays  that  speeded  up  interactions  were  introduced,  as  users 
could  see  a  picture  of  the  application  they  wanted  instead  of 
having  to  read  its  name  from  a  list  (Barker  &  Manji,  1989).  In  the 
same  way,  an  encoded  sound  message  can  communicate  its  in¬ 
formation  in  fewer  sounds.  The  user  hears  the  sound  then  re¬ 
calls  its  meaning  rather  than  having  the  meaning  described  in 
words. Increasing the presentation rate of synthetic speech
decreases the accuracy of recognition of what is being said.
Pictorial icons can also be universal: they can mean the same thing
in many different languages and cultures, and nonspeech
sounds have similar universality (given some of the different
musical cultures).

An  important  ability  of  the  auditory  system  is  habituation, 
where  continuous  sounds  can  fade  into  the  background  of  con¬ 
sciousness  after  a  short  period.  If  the  sound  changes  (or  stops), 
then  it  would  come  to  the  foreground  of  attention  because  the 
auditory  system  is  sensitive  to  change.  Habituation  is  difficult 
to  achieve  with  speech  because  of  the  large  dynamic  range  it 
uses.  According  to  Patterson  (1982),  “The  vowels  of  speech  are 
often  30  dB  more  intense  than  the  consonants,  and  so,  if  a  voice 
warning  were  attenuated  to  produce  a  background  version  with 
the  correct  vowel  level  the  consonants  would  be  near  or  below 
masked  threshold”  (p.  11).  It  is  easier  to  habituate  certain  types 
of  nonspeech  sounds  (think  of  an  air  conditioner  where  you 
only  notice  that  it  was  on  when  it  switches  off)  and  sounds  can 
be  designed  to  facilitate  habituation  if  required. 
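
Patterson's 30 dB figure can be made concrete with the standard decibel definitions: a level difference of L dB corresponds to an intensity ratio of 10^(L/10) and a sound-pressure ratio of 10^(L/20). The sketch below is only an illustration. The function names and the example levels are assumptions introduced here, not values from Patterson (1982).

```python
def intensity_ratio(level_difference_db: float) -> float:
    """Ratio of sound intensities for a given level difference in dB."""
    return 10.0 ** (level_difference_db / 10.0)


def pressure_ratio(level_difference_db: float) -> float:
    """Ratio of sound-pressure amplitudes for a given level difference in dB."""
    return 10.0 ** (level_difference_db / 20.0)


def consonants_audible(vowel_level_db: float, masked_threshold_db: float,
                       vowel_consonant_gap_db: float = 30.0) -> bool:
    """True if consonants, assumed to sit a fixed gap below the vowels,
    are still above the masked threshold (both levels are hypothetical)."""
    return vowel_level_db - vowel_consonant_gap_db > masked_threshold_db


# Vowels 30 dB above consonants carry 1000 times the intensity.
print(intensity_ratio(30.0), pressure_ratio(30.0))

# Attenuating a voice warning to a quiet background level pushes the
# consonants under an assumed 25 dB masked threshold.
print(consonants_audible(70.0, 25.0), consonants_audible(50.0, 25.0))
```

The second call shows the problem the quotation describes: once the vowels are turned down to a background level, the consonants fall below the threshold and the message loses intelligibility.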

Baddeley  (1997)  gave  evidence  to  show  that  background 
speech,  even  at  low  intensities,  is  much  more  disruptive  than 
nonspeech  sound  when  recalling  information.  He  reports  the  un¬ 
attended  speech  effect.  Unattended  speech  (e.g.,  in  the  back¬ 
ground)  causes  information  to  be  knocked  out  of  short-term 
memory,  whereas  noise  or  nonspeech  sound  does  not.  This  prob¬ 
lem  is  unaffected  by  the  intensity  of  the  speech,  provided  that  it  is 
audible.  This  shows  a  problem  for  speech  at  the  interface,  as  it  is 
likely  to  prove  disruptive  for  other  users  in  the  same  environment 
unless  it  is  kept  at  a  very  low  intensity  and,  as  we  saw  above,  this 
can  cause  problems  with  the  ability  to  hear  consonants. 

Nonspeech  sounds  are  also  good  for  presenting  continuous 
information,  such  as  a  graph  of  stock  market  data.  In  speech, 
particular  values  can  be  spoken  out  at  particular  times,  but  there 
would be no way to monitor the overall trend of the data.
Methods for doing this in nonspeech sounds were developed over
20 years ago (Mansur, Blattner, & Joy, 1985) and have proved to
be very effective (see Sound Graphs in the perception of sound
section below for more on this).
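
A simple way to see how continuous data can be presented in nonspeech sound is to map each data value to a pitch, so that a rising series is heard as a rising melody. The sketch below illustrates that general idea under stated assumptions. It is not the method of Mansur, Blattner, and Joy (1985); the function names, the MIDI note range of 48 to 84, and the sample prices are all hypothetical.

```python
from typing import List, Sequence


def value_to_midi(value: float, lo: float, hi: float,
                  note_lo: int = 48, note_hi: int = 84) -> int:
    """Linearly map a data value in [lo, hi] onto a MIDI note number."""
    if hi == lo:
        return (note_lo + note_hi) // 2  # flat data: use the middle note
    fraction = (value - lo) / (hi - lo)
    return round(note_lo + fraction * (note_hi - note_lo))


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note (note 69 is A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def sound_graph(values: Sequence[float]) -> List[float]:
    """Return one frequency in Hz per data point, low values as low pitch."""
    lo, hi = min(values), max(values)
    return [midi_to_hz(value_to_midi(v, lo, hi)) for v in values]


# Hypothetical closing prices: the listener hears the trend as a pitch contour.
prices = [101.0, 103.5, 102.0, 107.0, 110.0]
frequencies = sound_graph(prices)
```

Each frequency would then be played as a short tone in sequence. Mapping to note numbers rather than directly to Hz keeps equal data steps as equal musical intervals, which is one reasonable design choice among several.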

This  discussion  has  shown  that  there  are  many  reasons  to 
think  of  using  nonspeech  sounds  in  addition  to  speech  in  HCIs. 
Few  interfaces  make  good  use  of  both.  Speech  in  general  is 
good  for  giving  instructions  and  absolute  values;  nonspeech 
sounds  are  good  for  giving  rapid  feedback  on  actions,  quickly 
presenting  highly  structured  information,  and  presenting  con¬ 
tinuous  data.  Together,  they  make  a  very  effective  means  of  pre¬ 
senting  information  nonvisually. 

Avoiding  Annoyance 

The  main  concern  potential  users  of  all  auditory  interfaces  have 
is  annoyance  due  to  sound  pollution.  There  are  two  aspects  to 
annoyance:  A  sound  may  be  annoying  to  the  user  whose  ma¬ 
chine  is  making  the  noise  (the  primary  user),  and  it  may  be  an¬ 
noying  to  others  in  the  same  environment  who  overhear  it  (sec¬ 
ondary  users).  Buxton  (1989)  discussed  some  of  the  problems 
of  sound  and  suggested  that  some  sounds  help  us  (information) 
and  some  impede  us  (noise).  We  therefore  need  to  design 
sounds that carry more information and less noise. Of course, one
person’s  informative  sounds  are  another’s  noise,  so  it  is  impor¬ 
tant  to  make  sure  that  the  sounds  on  one  computer  are  not  an¬ 
noying  for  colleagues  working  nearby. 

Few  studies  focus  on  the  problems  of  annoyance  due  to  non¬ 
speech  sounds  in  computers.  There  are,  however,  many  stud¬ 
ies  of  annoyance  from  speech  (e.g.,  Berglund,  Harder,  &  Preis, 
1994),  from  the  sounds  of  aircraft,  traffic,  and  other  environ¬ 
mental  noises,  and  most  of  these  suggest  that  the  primary  rea¬ 
son  for  the  annoyance  of  sound  is  excessive  volume.  In  a  differ¬ 
ent  context,  Patterson  (1989)  investigated  some  of  the  problems 
with  auditory  warnings  in  aircraft  cockpits.  Many  of  the  warn¬ 
ings  were  added  in  a  “better-safe-than-sorry”  manner  that  led 
to  them  being  so  loud  that  the  pilot’s  first  response  was  to  try  to 
turn  them  off  rather  than  deal  with  the  problem  being  indi¬ 
cated.  One  of  Patterson’s  main  recommendations  was  that  the 
volume  of  the  warnings  should  be  reduced. 

A  loud  sound  grabs  the  attention  of  the  primary  user,  even 
when  the  sound  is  communicating  an  unimportant  event.  As 
the  sound  is  loud,  it  travels  from  one  machine  to  the  ears 
of  other  people  working  nearby,  increasing  the  noise  in  their 
environments. 

So,  how  can  annoyance  be  avoided?  One  key  way  is  to  avoid 
using  intensity  as  a  cue  in  sound  design  for  auditory  interfaces. 
Quiet  sounds  are  less  annoying.  Listeners  are  also  not  good  at 
making  absolute  intensity  judgments  (see  the  next  section). 
Therefore,  intensity  is  not  a  good  cue  for  differentiating  sounds 
anyway. 

Headphones  can  be  used  so  that  sounds  are  heard  only  by 
the  primary  user.  This  may  be  fine  for  users  of  mobile  tele¬ 
phones  and  music  players,  but  is  not  always  a  good  solution  for 
desktop users who do not want to be cut off from their colleagues.
Manipulating sound parameters other than intensity
can  make  sounds  attention  grabbing  (but  not  annoying). 
FIGURE 13.1. A sine wave.


Rhythm  or  pitch  can  be  used  to  make  sounds  demanding  be¬ 
cause  the  human  auditory  system  is  very  good  at  detecting 
changing  stimuli  (see  Edworthy,  Loxley  &  Dennis,  1991).  There¬ 
fore,  if  care  is  taken  with  the  design  of  sounds  in  an  interface, 
specifically  avoiding  the  use  of  volume  changes  to  cue  the  user, 
then  many  of  the  problems  of  annoyance  can  be  avoided. 


PERCEPTION  OF  SOUND 

This  section  provides  some  basic  information  about  the  per¬ 
ception  of  sound  that  is  applicable  to  nonspeech  auditory  out¬ 
put.  The  auditory  interface  designer  must  be  conscious  of  the 
effects  of  psychoacoustics,  or  the  perception  of  sound,  when 
designing  sounds  for  the  interface.  As  Frysinger  (1990)  said, 
“The  characterization  of  human  hearing  is  essential  to  auditory 
data  representation  because  it  defines  the  limits  within  which 
auditory  display  designs  must  operate  if  they  are  to  be  effective” 
(p.  31).  Using  sounds  without  regard  for  psychoacoustics  may 
lead  to  the  user  being  unable  to  differentiate  one  sound  from 
another,  unable  to  hear  the  sounds,  or  unable  to  remember 
them  (see  Moore,  2003). 

What  are  sounds?  Sounds  are  pressure  variations  that  prop¬ 
agate  in  an  elastic  medium  (in  this  case,  the  air).  The  pressure 
variations  originate  from  the  motion  or  vibration  of  objects. 
These  pressure  variations  hit  the  listener’s  ear  and  start  the 
process  of  perceiving  the  sound.  The  pressure  variations  plot¬ 
ted  against  time  can  be  seen  in  Fig.  13.1.  This  shows  the  sim¬ 
plest  form  of  sound:  a  sine  wave  (which  might  be  produced  by 
a  tuning  fork).  A  sound  is  made  up  from  three  basic  compo¬ 
nents.  Frequency  is  the  number  of  times  per  second  the  wave 
repeats  itself  (Fig.  13.1  shows  three  cycles).  It  is  normally  mea¬ 
sured  in  Hertz  (Hz).  Amplitude  is  the  deviation  away  from  the 
mean  pressure  level,  or  force  per  unit  area  of  a  sound.  It  is  nor¬ 
mally  measured  in  decibels  (dB).  Phase  is  the  position  of  the 
start  of  the  wave  on  the  time  axis  (measured  in  milliseconds). 

Sounds  from  the  real  world  are  normally  much  more  com¬ 
plex  than  Fig.  13.1  and  tend  to  be  made  up  of  many  sine  waves 
with  different  frequencies,  amplitudes,  and  phases.  Figure  13.2 
shows  a  more  complex  sound  made  of  three  sine  wave  compo¬ 
nents  (or  partials)  and  the  resulting  waveform.  Fourier  analysis 
allows  a  sound  to  be  broken  down  into  its  component  sine 
waves  (Gelfand,  1981). 
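
The idea of a complex sound built from sine wave partials can be sketched in a few lines of Python. This is only an illustration: the 100 Hz fundamental follows the caption of Fig. 13.2, but the partial amplitudes, the zero starting phases, the 8 kHz sample rate, and the name `complex_wave` are assumptions made for the example.

```python
import math

def complex_wave(t, fundamental_hz=100.0, amplitudes=(1.0, 0.5, 0.25)):
    """Return the summed value of harmonic sine partials at time t (seconds).

    Partial k has frequency (k + 1) * fundamental_hz, so the sound is harmonic.
    The amplitudes are illustrative assumptions, not values read from Fig. 13.2.
    """
    return sum(
        amp * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
        for k, amp in enumerate(amplitudes)
    )

SAMPLE_RATE = 8000                        # samples per second (assumed)
SAMPLES_PER_PERIOD = SAMPLE_RATE // 100   # one 100 Hz period is 80 samples

# Two full periods of the complex wave.
samples = [complex_wave(n / SAMPLE_RATE) for n in range(2 * SAMPLES_PER_PERIOD)]
```

Because every partial is an integer multiple of the fundamental, the summed waveform repeats once per fundamental period, which is what makes it periodic and harmonic.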

The  sounds  in  Fig.  13.1  and  Fig.  13.2  are  periodic — they  re¬ 
peat  regularly  over  time.  This  is  very  common  for  many  types  of 
musical  instruments  that  might  be  used  in  an  auditory  interface. 
Many  natural,  everyday  sounds  (such  as  impact  sounds)  are  not 
periodic  and  do  not  repeat.  The  sound  in  Fig.  13.2  is  also  har¬ 
monic — its  partials  are  integer  multiples  of  the  lowest  (or  fun¬ 
damental)  frequency.  This  is  again  common  for  musical  instru¬ 
ments  but  not  for  everyday  sounds.  Periodic  harmonic  sounds 
have a recognizable pitch, whereas nonperiodic, inharmonic
sounds  tend  to  have  no  clear  pitch. 

FIGURE 13.2. A complex wave made up of three components
with its fundamental at 100 Hz.

The attributes of sound described above are the physical aspects.
There is a corresponding set of perceptual attributes.
Pitch is the perceived frequency of a sound. Pitch is roughly a
logarithmic function of frequency. It can be defined as the
attribute of auditory sensation in terms of which sounds may be
ordered  on  a  musical  scale  (Moore,  2003).  In  the  western  mu¬ 
sical  system,  there  are  96  different  pitches  arranged  into  eight 
octaves  of  12  notes.  Tones  separated  by  an  octave  have  the  fre¬ 
quency  ratio  two  to  one.  For  example,  middle  C  is  261.63  Hz, 
the  octave  above  this  is  at  523.25  Hz,  and  the  octave  below  at 
130.81 Hz. Pitch is one of the most useful and easily controlled
aspects of sound, which makes it valuable to auditory interface
designers. However, as Buxton et al. (1991) said, “It is important to
be  aware  of  the  myriad  interactions  between  pitch  and  other 
attributes  of  sound  .  .  .”  (p.  2.10).  For  example,  pitch  is  affected 
by  sound  intensity:  at  less  than  2  kHz  an  increase  in  intensity 
increases  the  perceived  pitch,  at  3  kHz  and  over  an  increase  in 
intensity  decreases  the  perceived  pitch  (Gelfand,  1981). 
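
The two-to-one octave ratio and the 12 notes per octave can be checked with a short calculation. The sketch below assumes equal temperament, a reference of A = 440 Hz, and the MIDI convention that note number 60 is middle C and 69 is the A above it; none of these assumptions is stated in the text, and `note_frequency` is a name chosen for the example.

```python
def note_frequency(note_number, a4_hz=440.0):
    """Frequency in Hz of a note under 12-tone equal temperament.

    Each step of 12 note numbers doubles (or halves) the frequency,
    so pitch is a logarithmic function of frequency.
    """
    return a4_hz * 2 ** ((note_number - 69) / 12)

middle_c = note_frequency(60)      # about 261.63 Hz
octave_above = note_frequency(72)  # about 523.25 Hz
octave_below = note_frequency(48)  # about 130.81 Hz
```

Under these assumptions the computed values agree with the middle C figures quoted above.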

Humans  can  perceive  a  wide  range  of  frequencies.  The  max¬ 
imum  range  we  can  hear  is  from  20  Hz  to  20  kHz.  This  de¬ 
creases  with  age,  so  that  at  age  70,  a  listener  might  only  hear  a 
maximum  of  10  kHz.  It  is  therefore  important  to  ensure  that  the 
sounds  in  an  auditory  interface  are  perceivable  by  its  users  (also 
poor quality loudspeakers may not be able to cope with the highest
or lowest frequencies). Listeners are not good at making absolute
judgments of pitch (Moore, 2003). Only 1% of the population
has perfect pitch. Another important factor is tone
deafness.  Moore  suggested  that  this  is  a  misnomer  and  almost 
everyone  is  able  to  tell  that  two  sounds  are  different;  they  are  not 
always  able  to  say  which  is  higher  or  lower.  This  can  often  be 
overcome  with  practice,  but  it  is  important  for  the  auditory  in¬ 
terface  designer  to  be  aware  of  the  problem.  Mansur  et  al.  (1985) 
gave  evidence  of  one  other  important  effect:  “There  appears  to 
be  a  natural  tendency,  even  in  infants,  to  perceive  a  pitch  that  is 
higher  in  frequency  to  be  coming  from  a  source  that  is  vertically 
higher  in  space  when  compared  to  some  lower  tone”  (p.  171). 
This  is  important  when  creating  an  auditory  interface  as  it  could 
be  used  to  give  objects  a  spatial  position.  If  only  stereo  position 
is  available  to  provide  spatial  cues  in  the  horizontal  plane,  then 
pitch  could  provide  them  in  the  vertical  plane.  Guidelines  for  the 
use  of  pitch  (and  the  other  parameters  below)  are  described  in 
the  section  on  Earcons. 

Loudness is the perceived intensity of a sound. Loudness (L)
is related to intensity (I) according to the Power Law: $L = kI^{0.3}$
(Gelfand, 1981). Therefore, a 10 dB increase in intensity doubles
the  perceived  loudness  of  a  sound.  Loudness  is  again  affected  by 
the  other  parameters  of  sound.  For  example,  sounds  of  between 
1  kHz  and  5  kHz  sound  louder  at  the  same  intensity  level  than 
those  outside  that  frequency  range.  Humans  can  perceive  a  very 
wide  range  of  intensities:  the  most  intense  sound  that  a  listener 
can  hear  is  120  dB  louder  than  the  quietest.  This  equates  to  a  ra¬ 
tio  of  1,000,000,000,000: 1  (Moore,  2003).  Buxton  et  al.  (1991) 
also  reported  that  listeners  are  “very  bad  at  making  absolute 
judgments about loudness” (p. 2.10) and also that “our ability to
make relative judgments of loudness are limited to a scale of
about three different levels” (p. 2.11). Loudness is also a primary cause
of annoyance (see the section on avoiding annoyance above),
so it should be used sparingly by auditory interface designers.
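
The relationship between decibels, intensity, and loudness can be worked through numerically. The sketch below assumes a power-law exponent of 0.3, a value chosen because it is consistent with the statement that a 10 dB increase doubles perceived loudness; the function names are hypothetical.

```python
def intensity_ratio(db_difference):
    """Ratio of two intensities that differ by db_difference decibels."""
    return 10 ** (db_difference / 10)

def loudness_ratio(db_difference, exponent=0.3):
    """Ratio of perceived loudness under a power law L = k * I**exponent.

    The constant k cancels when taking a ratio. The exponent of 0.3 is an
    assumption made for this sketch.
    """
    return intensity_ratio(db_difference) ** exponent

doubling = loudness_ratio(10)          # close to 2: 10 dB roughly doubles loudness
hearing_range = intensity_ratio(120)   # 10**12, the ratio quoted in the text
```

A 10 dB step is a tenfold change in intensity, and 10 raised to the power 0.3 is very nearly 2, which is where the doubling rule comes from.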

Timbre  is  the  quality  of  the  sound.  This  attribute  of  audi¬ 
tory  sensation  allows  a  listener  to  judge  two  sounds  with  the 
same  loudness  and  pitch  as  dissimilar.  It  is  what  makes  a  vio¬ 
lin  sound  different  from  a  piano,  even  if  both  are  playing  at 
the same pitch and at the same loudness. Timbre is based partly
on the spectrum and dynamics of a sound. Its structure and
dimensions are not yet fully understood, but it is one of the most
important attributes of sound that an interface designer can use.
As Blattner et al.
(1989)  said,  “Even  though  timbre  is  difficult  to  describe  and 
notate  precisely,  it  is  one  of  the  most  immediate  and  easily 
recognizable characteristics of sound” (p. 26). Both auditory
icons and earcons use timbre as one of their fundamental
attributes (see the section on nonspeech sound presentation
techniques below). Many of the synthesis techniques in the next
section  make  it  easy  for  a  designer  to  create  and  use  differ¬ 
ent  timbres. 

Duration  is  another  important  attribute  of  sound.  Sounds  of 
different  durations  are  used  to  form  rhythmic  structures  that  are 
a  fundamental  part  of  music.  Duration  can  also  affect  the  other 
parameters  of  sound.  For  example,  for  sounds  of  less  than  one 
second,  loudness  increases  with  duration.  This  is  important  in 
auditory  interfaces  because  short  sounds  are  often  needed  so 
that the auditory feedback can keep pace with the interactions
taking place; accordingly, they must be made loud enough for
listeners to hear.

Direction  is  the  position  of  the  sound  source.  This  is  often 
overlooked  but  is  an  important  aspect  of  sound  in  our  every¬ 
day  lives.  As  mentioned  above,  one  of  the  key  differences  be¬ 
tween  sight  and  hearing  is  that  sounds  can  be  heard  from  all 
around  the  listener.  If  a  sound  source  is  located  to  one  side  of 
the  head,  then  the  sound  reaching  the  further  ear  will  be  re¬ 
duced  in  intensity  (Interaural  Intensity  Difference;  IID)  and  de¬ 
layed  in  time  (Interaural  Time  Difference;  ITD;  Blauert,  1997). 
These  two  factors  are  key  in  allowing  a  listener  to  localize  a 
sound  in  space.  Humans  can  detect  small  changes  in  the  posi¬ 
tion  of  a  sound  source.  The  minimum  auditory  angle  (MAA)  is 
the  smallest  separation  between  two  sources  that  can  be  reliably 
detected.  Strybel,  Manligas  and  Perrott  (1992)  reported  that  in 
the  median  plane  sound  sources  only  1°  apart  can  be  detected. 
At  90°  azimuth  (directly  opposite  one  ear),  accuracy  falls  to 
±  10°.  This  has  important  implications  for  auditory  displays  that 
use  sound  position  as  a  cue  because  higher-resolution  place¬ 
ment  can  be  used  when  sounds  are  presented  in  front  of  the 
user (see the section on earcons for more on sound positioning).


TECHNOLOGY  AND  SOUND  PRODUCTION 


Most  desktop  PCs,  handheld  computers,  and  mobile  telephones 
have  sophisticated  sound  hardware  available  for  the  auditory  in¬ 
terface  designer.  This  is  normally  used  for  playing  games,  but  it 
is  sufficient  to  do  most  of  the  things  required  by  an  auditory  in¬ 
terface.  The  aim  of  this  section  is  to  describe  briefly  some  of 
the  main  technologies  that  are  important  for  designers  to  un¬ 
derstand  when  creating  interfaces. 

There  are  two  main  aspects  to  sound  production:  the  first  is 
sound  synthesis,  and  the  second  is  sound  sampling  and  play¬ 
back.  A  basic  overview  will  be  given  focusing  on  aspects  related 
to  audio  interfaces.  (For  much  more  detail  on  sound  synthesis 
and  MIDI  see  Roads,  1996;  Miranda,  1998;  for  more  on  sam¬ 
pling,  see  Pohlmann,  2005.) 

There  are  many  tools  available  for  synthesis  and  sample  play¬ 
back,  and  devices  such  as  desktop  PCs  and  mobile  telephones 
have  the  necessary  processing  power.  The  Java  programming 
language  (http://www.java.sun.com),  for  example,  has  built-in 
support  for  a  range  of  synthesis  and  sampling  techniques  for 
many different platforms. Libraries such as Fmod (http://www.fmod.org)
allow standard cross-platform sound and work on
many  different  devices  and  programming  languages.  All  current 
desktop  operating  systems  provide  support  for  synthesis  and 
sampling.  The  basic  technologies  necessary  to  make  the  sounds 
needed  for  auditory  interfaces  are  thus  readily  available,  but  it  is 
important  to  know  something  of  how  they  work  to  utilize  them 
most  effectively. 


A  Brief  Introduction  to  Sound  Synthesis  and  MIDI 

The  aim  of  sound  synthesis  is  to  generate  a  sound  from  a  stored 
model,  often  a  model  of  a  musical  instrument.  For  auditory  in¬ 
terfaces,  we  need  a  wide,  good-quality  range  of  sounds  that  we 
can generate in real time as the user interacts with the interface.
Synthesizers  come  in  three  main  forms:  soundcards  on  PCs,  ex¬ 
ternal  hardware  devices,  and  software  synthesizers. 

Most synthesizers are polyphonic, that is, able to play multiple
notes at the same time (as opposed to monophonic). This
is important for auditory interface design, as you might well
want to play a chord made up of several notes. Most synthesizers
are multitimbral, that is, able to play multiple different
instruments at the same time. This is again important, as in many
situations,  a  sound  composed  of  two  different  instruments 
might  be  required.  The  main  forms  of  synthesis  will  now  be 
briefly  reviewed. 

Wavetable  synthesis  is  one  of  the  most  common  and  low  cost 
synthesis  techniques.  Many  of  the  most  popular  PC  soundcards 
use  it  (such  as  the  SoundBlaster™  series  from  Creative  Tech¬ 
nology — http://www.creative.com).  The  idea  behind  wavetable 
synthesis  is  to  use  existing  sound  recordings  (which  are  often 
very  difficult  to  synthesize  exactly)  as  the  starting  point  and  to 
create  very  convincing  simulations  of  acoustical  instruments 
based  on  them  (Miranda,  1998;  Roads,  1996).  A  sample  (record¬ 
ing)  of  a  particular  sound  will  be  stored  in  the  soundcard.  It  can 
then  be  played  back  to  produce  a  sound.  The  sample  memory 
in  these  systems  contains  a  large  number  of  sampled  sound  seg¬ 
ments,  and  they  can  be  thought  of  as  a  table  of  sound  wave¬ 
forms  that  may  be  looked  up  and  utilized  when  needed. 
Wavetable  synthesizers  employ  a  variety  of  different  techniques, 
such  as  sample  looping,  pitch  shifting,  mathematical  interpola¬ 
tion,  and  polyphonic  digital  filtering,  to  reduce  the  amount  of 
memory  required  to  store  the  sound  samples  or  to  get  more 
types  of  sounds.  More  sophisticated  synthesizers  contain  more 
wavetables  (perhaps  one  or  more  for  the  initial  attack  part  of  a 
sound  and  then  more  for  the  sustain  part  of  the  sound  and  then 
more  for  the  final  decay  and  release  parts).  Generally,  the  more 
wavetables  that  are  used  the  better  the  quality  of  the  synthesis, 
but  this  does  require  more  storage.  It  is  also  possible  to  com¬ 
bine  multiple,  separately  controlled  wavetables  to  create  a  new 
instrument. 

Wavetable  synthesis  is  not  so  good  if  you  want  to  create  new 
timbres,  as  it  lacks  some  of  the  flexibility  of  the  other  techniques 
below.  Most  wavetable  synthesizers  contain  many  sounds  (often 
hundreds),  so  there  may  not  be  a  great  need  to  create  new 
ones.  For  most  auditory  interfaces,  the  sounds  from  a  good 
quality  wavetable  synthesizer  will  be  perfectly  acceptable.  For 
desktop  computers  the  storage  of  large  wavetables  is  no  prob¬ 
lem,  but  for  mobile  telephones  with  less  storage,  there  may  be  a 
much  smaller,  lower  quality  set,  so  care  may  be  needed  to  de¬ 
sign  appropriate  sounds. 

FM  (frequency  modulation)  Synthesis  techniques  generally 
use  one  periodic  signal  (the  modulator)  to  modulate  the  fre¬ 
quency  of  another  signal  (the  carrier;  Chowning,  1975).  If  the 
modulating  signal  is  in  the  audible  range,  then  the  result  will 
be  a  significant  change  in  the  timbre  of  the  carrier  signal.  Each 
FM  voice  requires  a  minimum  of  two  signal  generators.  Sophis¬ 
ticated  FM  systems  may  use  four  or  six  operators  per  voice,  and 
the  operators  may  have  adjustable  envelopes  that  allow  adjust¬ 
ment  of  the  attack  and  decay  rates  of  the  signal.  FM  synthesis  is 
cheap  and  easy  to  implement,  and  it  can  be  useful  for  creating 
expressive  new  synthesized  sounds.  However,  if  the  goal  is  to 
recreate the sound of an existing instrument, then FM synthesis
is not the best choice, as it can generally be done more easily
and accurately with wavetable techniques.
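
The modulator-and-carrier arrangement can be illustrated with a minimal two-operator sketch. It uses the common formulation in which the modulator is added to the carrier's phase, scaled by a modulation index; the parameter values and function names are assumptions chosen for the example, and envelopes are omitted.

```python
import math

def fm_sample(t, carrier_hz, modulator_hz, index):
    """One sample of a two-operator FM voice at time t (seconds).

    index controls how strongly the modulator changes the carrier;
    index = 0 leaves a plain sine wave at the carrier frequency.
    """
    modulation = index * math.sin(2 * math.pi * modulator_hz * t)
    return math.sin(2 * math.pi * carrier_hz * t + modulation)

def fm_tone(duration_s=0.5, sample_rate=44100,
            carrier_hz=440.0, modulator_hz=110.0, index=2.0):
    """Return a list of samples in the range [-1, 1] for one FM tone."""
    count = round(duration_s * sample_rate)
    return [fm_sample(n / sample_rate, carrier_hz, modulator_hz, index)
            for n in range(count)]
```

Raising `index` or changing the ratio between the two frequencies alters the timbre without changing the number of signal generators, which is part of why the technique is inexpensive.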

Additive  (and  Subtractive)  Synthesis  is  the  oldest  form  of 
synthesis  (Roads,  1996).  Multiple  sine  waves  are  added  together 
to  produce  a  more  complex  output  sound  (subtractive  synthe¬ 
sis  is  the  opposite:  a  complex  sound  has  frequencies  filtered  out 
to  create  the  sound  required).  It  is  theoretically  possible  to 
create  any  sound  using  this  method  (as  all  complex  sounds  can 
be  decomposed  into  sets  of  sine  waves  by  Fourier  analysis). 
However,  it  can  be  very  difficult  to  create  any  particular  sound. 
Computer musicians often use this technique, as it is very flexible
and makes it easy to create new and unusual sounds, but it may be
less  useful  for  general  auditory  interface  design. 

Physical  Modeling  Synthesis  uses  mathematical  models  of 
the  physical  acoustic  properties  of  instruments  and  objects. 
Equations  describe  the  mechanical  and  acoustic  behavior  of  an 
instrument.  The  better  the  simulation  of  the  instrument,  the 
more  realistic  the  sound  produced.  Nonexistent  instruments 
can  also  be  modeled  and  made  to  produce  sounds.  Physical 
modeling  is  an  extremely  good  choice  for  synthesis  of  many 
classical  instruments,  especially  those  of  the  woodwind  and 
brass  families.  The  downside  is  that  it  can  require  large  amounts 
of  processing  power,  which  limits  the  polyphony. 

The  Musical  Instrument  Digital  Interface — MIDI 

The  Musical  Instrument  Digital  Interface  (MIDI)  allows  the 
real-time  control  of  electronic  instruments  (such  as  synthesiz¬ 
ers,  samplers,  drum  machines,  etc.)  and  is  now  very  widely  used 
(http://www.midi.org).  It  specifies  a  hardware  interconnection 
scheme,  a  method  for  data  communications,  and  a  grammar  for 
encoding  musical  performance  information  (Roads,  1996).  For 
auditory  interface  designers,  the  most  important  part  of  MIDI 
is  the  performance  data,  which  is  a  very  efficient  method  of  rep¬ 
resenting  sounds.  Most  soundcards  support  MIDI  with  an  in¬ 
ternal  synthesizer  and  provide  a  MIDI  interface  to  connect  to  ex¬ 
ternal  devices.  Both  Apple  Mac  OS  and  Microsoft  Windows  have 
good  support  for  MIDI.  Most  programming  languages  now 
come  with  libraries  supporting  MIDI  commands. 

MIDI  performance  information  is  like  a  piano  roll:  notes  are 
set  to  turn  on  or  off  and  play  different  instruments  over  time.  A 
MIDI  message  is  an  instruction  that  controls  some  aspect  of  the 
performance  of  an  instrument.  A  MIDI  message  is  made  up  of 
a  status  byte,  which  indicates  the  type  of  the  message,  followed 
by  up  to  two  data  bytes  that  give  the  parameters.  For  example, 
the  Note  On  command  takes  two  parameters:  one  value  giving 
the  pitch  of  the  note  required  and  the  other  the  volume.  This 
makes it a very compact form of representation.
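
The status-byte-plus-data-bytes layout can be sketched as follows. The status values used here (0x90 for Note On and 0x80 for Note Off, combined with a channel number from 0 to 15) come from the MIDI 1.0 specification rather than from the text above, and the helper names are hypothetical. The sketch only builds the bytes; sending them to a synthesizer would need a platform MIDI library.

```python
NOTE_ON = 0x90   # status nibble for Note On (MIDI 1.0)
NOTE_OFF = 0x80  # status nibble for Note Off (MIDI 1.0)

def _check(channel, *data_bytes):
    """Validate the channel (0-15) and 7-bit data bytes (0-127)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if any(not 0 <= b <= 127 for b in data_bytes):
        raise ValueError("data bytes must be 0-127")

def note_on(channel, note, velocity):
    """Three-byte Note On message: status, pitch, volume (velocity)."""
    _check(channel, note, velocity)
    return bytes([NOTE_ON | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Three-byte Note Off message for the same pitch."""
    _check(channel, note, velocity)
    return bytes([NOTE_OFF | channel, note, velocity])

# Example: a button press and release mapped to one note.
press = note_on(0, 60, 100)
release = note_off(0, 60)
```

Each message is three bytes long, which illustrates how compact the performance data is.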

Performance  data  can  be  created  dynamically  from  program 
code  or  by  a  sequencer.  In  an  auditory  interface,  the  designer 
might  assign  a  particular  note  to  a  particular  interface  event — 
for  example,  a  click  on  a  button.  When  the  user  clicks  on  the 
button,  a  MIDI  Note  On  event  will  be  fired.  When  the  user  re¬ 
leases  the  button,  the  corresponding  Note  Off  event  will  be 
sent.  This  is  a  very  simple  and  straightforward  way  of  adding 
sounds.  With  a  sequencer,  data  can  be  entered  using  classical 
music  notation  by  dragging  and  dropping  notes  onto  a  stave  or 
by  using  an  external  piano-style  keyboard.  This  could  then  be 
saved to a MIDI file for later playback (or could be recorded and
played back as a sample; see the section on comparing MIDI
synthesis to sampling for auditory interface design below).

A  Brief  Introduction  to  Sampling 

In  many  ways,  sampling  is  simpler  than  synthesis.  The  aim  is  to 
make  a  digital  recording  of  an  analogue  sound  and  then  to  be 
able  to  play  it  back  later,  with  the  played  back  sound  matching 
the  original  as  closely  as  possible.  There  are  two  important  as¬ 
pects:  sample  rate  and  sample  size. 


Sample  Rate 

Sample  rate  is  the  number  of  discrete  snapshots  of  the 
sound  that  are  taken,  often  measured  per  second.  The  higher 
the  sampling  rate  is,  the  higher  the  quality  of  the  sound  when 
it  is  played  back.  With  a  low  sampling  rate,  few  snapshots  of  the 
sound are taken, and the recording will not closely match the
sound being recorded. The Sampling Theorem (Roads, 1996)
states  that  to  reconstruct  a  signal,  the  sampling  frequency  must 
be  at  least  twice  the  frequency  of  the  signal  being  sampled.  As 
mentioned above, the limit of human hearing is around 20 kHz;
therefore, a minimum rate of 40 kHz is required to be able to
record any sound that a human can hear. The standard audio
CD format uses a sample rate of 44.1 kHz, meaning that it covers
all of the frequencies that a human can hear. If a lower sampling
rate is used, then higher frequencies are lost. For example,
the  .au  sample  format  uses  a  sampling  rate  of  8  kHz,  meaning 
that  only  frequencies  of  less  than  4  kHz  can  be  recorded  (see 
Bagwell,  1998). 
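
The arithmetic behind the Sampling Theorem is simple enough to state as two helper functions. This is a sketch with hypothetical function names; it restates the factor-of-two rule from the text and nothing more.

```python
def highest_recordable_hz(sample_rate_hz):
    """Upper frequency limit that a given sample rate can capture."""
    return sample_rate_hz / 2

def minimum_sample_rate_hz(highest_signal_hz):
    """Lowest sample rate that can capture a given signal frequency."""
    return 2 * highest_signal_hz

au_limit = highest_recordable_hz(8_000)        # 4 kHz for an 8 kHz sample rate
cd_limit = highest_recordable_hz(44_100)       # 22.05 kHz, above the 20 kHz hearing limit
needed_rate = minimum_sample_rate_hz(20_000)   # 40 kHz to cover human hearing
```

A designer who knows the highest frequency present in an interface's sounds could use the same calculation to choose a lower sample rate and save storage.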

Higher  sampling  rates  generate  much  more  data  than  lower 
ones,  so  they  may  not  always  be  suitable  if  storage  is  limited  (for 
example,  on  a  mobile  computing  device).  An  auditory  interface 
designer  must  think  about  the  frequency  range  of  the  sounds 
needed  in  an  interface.  This  might  allow  the  sample  rate  to  be 
reduced.  If  the  highest  quality  is  required,  you  must  be  pre¬ 
pared  to  deal  with  large  audio  files. 


Sample  Size 

The  larger  the  sample  size  is,  the  better  the  quality  of  the 
recording,  as  more  information  is  stored  at  each  snapshot  of  the 
sound.  Sample  size  defines  the  volume  (or  dynamic)  range  of  the 
sound.  With  an  8-bit  sample,  only  256  discrete  amplitude  (or 
quantization)  levels  can  be  represented.  To  fit  an  analogue  sound 
into  one  of  these  levels  might  cause  it  to  be  rounded  up  or  down, 
and  this  can  add  noise  to  the  recording.  CD  quality  sounds  use 
16-bit  samples,  giving  65536  different  levels,  so  the  effects  of 
quantization  are  reduced.  Many  high  quality  samplers  use  24-bit 
samples  to  reduce  the  problems  of  quantization  further. 

The  two  main  bit  sizes  used  in  most  soundcards  are  8  and 
16 bits. As with sample rates, the main issue is size: 16-bit samples
require a lot of storage, especially at high sample rates. Audio
CD quality sound generates around 10 Mbytes of data per
minute. Compression techniques such as MP3 can help reduce
the amount of storage while keeping quality high.
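
The figures for quantization levels and storage can be reproduced with a short calculation. The sketch assumes uncompressed audio and, for the CD case, two channels (stereo); the function names are chosen for the example.

```python
def quantization_levels(bits_per_sample):
    """Number of discrete amplitude levels for a given sample size."""
    return 2 ** bits_per_sample

def bytes_per_minute(sample_rate_hz, bits_per_sample, channels):
    """Storage needed for one minute of uncompressed sampled audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 60

levels_8_bit = quantization_levels(8)     # 256
levels_16_bit = quantization_levels(16)   # 65536

# CD quality: 44.1 kHz, 16-bit, stereo (two channels assumed).
cd_minute = bytes_per_minute(44_100, 16, 2)   # 10,584,000 bytes, roughly 10 Mbytes
```

Halving either the sample rate or the sample size halves the storage, which is the trade-off a designer faces on devices with limited memory.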

Comparing  MIDI  Synthesis  to  Sampling 
for  Auditory  Interface  Design 

MIDI  is  very  flexible,  as  synthesizers  can  generate  sound  in  real 
time,  as  it  is  needed.  If  you  do  not  know  all  of  the  sounds  you 
might want in your auditory interface in advance, then this can
be very effective: as the sound is needed, it is just played by the
synthesizer. A system that is based around samples can only play
back  samples  that  have  been  prerecorded  and  stored. 

Another  advantage  of  MIDI  is  that  sounds  can  be  changed. 
Once  a  sample  has  been  stored,  it  is  difficult  to  change  it.  For 
example,  it  is  possible  to  change  the  speed  and  pitch  of  a 
sound  independently  with  MIDI.  If  a  sample  is  played  back  at 
a  different  speed  its  pitch  will  change,  which  may  cause  unde¬ 
sirable  effects. 

Generating sounds from code in your auditory interface is
straightforward. MIDI commands are also very small; each command
might only take up two or three bytes. For instance, files
containing high quality stereo sampled audio require about
10  Mbytes  of  data  per  minute  of  sound,  while  a  typical  MIDI  se¬ 
quence  might  consume  less  than  10  Kbytes  of  data  per  minute 
of  sound.  This  is  because  the  MIDI  file  does  not  contain  the 
sampled  audio  data;  it  contains  only  the  instructions  needed 
by  a  synthesizer  to  play  the  sounds. 

Samples  have  the  advantage  that  you,  as  the  interface  de¬ 
signer,  know  exactly  what  your  sound  will  sound  like.  With 
MIDI,  you  are  at  the  mercy  of  the  synthesizer  on  the  user’s  ma¬ 
chine.  It  may  be  of  very  poor  quality,  and  the  sounds  might  not 
sound  anything  like  the  ones  you  designed.  With  samples,  all 
of  the  information  about  the  sound  is  stored  so  that  you  can 
guarantee  it  will  sound  like  the  recording  you  made  (given  the 
possible  limitations  of  the  speakers  on  the  user’s  machine). 

It  is  also  not  possible  to  synthesize  all  sounds.  Much  of  the 
work  in  sound  synthesis  has  focused  on  synthesizing  musical  in¬ 
struments.  Few  synthesizers  can  do  a  good  job  of  creating  nat¬ 
ural,  everyday  sounds.  If  you  want  to  use  natural  sounds  in  your 
interface,  you  are  limited  to  using  samples. 

Three  Dimensional  Sound 

Much  of  the  recorded  sound  we  hear  is  in  stereo.  A  stereo 
recording  uses  differences  in  intensity  between  the  ears.  From 
these  differences,  the  listener  can  gain  a  sense  of  movement  and 
position of a sound source in the stereo field. The perceived position
is along a line between the two loudspeakers or, if the listener
is wearing headphones, inside the head between the
ears. This simple, inexpensive technique can give useful spatial
cues  at  the  auditory  interface.  This  is  being  taken  further  to 
make  sounds  appear  as  coming  from  around  a  user  (in  virtual 
3D)  when  only  a  small  number  of  loudspeakers  (or  even  just  a 
pair  of  headphones)  are  used.  Spatial  sound  can  be  used  for  a 
range  of  things,  including  giving  directional  information,  spread¬ 
ing  sound  sources  around  the  head  to  help  users  differentiate 
simultaneous sounds, and creating audio windows in which to
present information (see the section on sound for mobile and
ubiquitous computing).
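
Positioning a sound in the stereo field by intensity difference alone can be sketched with a simple panning function. The constant-power pan law used here is one common choice and is an assumption of this example, as are the function names; it is not described in the text and it does not model the virtual 3D techniques discussed below.

```python
import math

def stereo_gains(pan):
    """Left and right gains for a pan position in [-1, 1].

    -1 is hard left, 0 is center, +1 is hard right. A constant-power
    law keeps left**2 + right**2 equal to 1 at every position.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1 and 1")
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def pan_mono(samples, pan):
    """Turn mono samples into (left, right) pairs at the given position."""
    left, right = stereo_gains(pan)
    return [(s * left, s * right) for s in samples]
```

For example, `pan_mono(samples, -0.5)` would place a sound partway toward the left loudspeaker by making the left channel more intense than the right.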

As  well  as  the  ITD  and  IID,  in  the  real  world  we  use  our  pin¬ 
nae  (the  outer  ear)  to  filter  the  sounds  coming  from  different  di¬ 
rections  so  that  we  know  where  they  are  coming  from.  To  simulate 
sounds  as  coming  from  around  the  user  and  outside  of  the  head 
when  wearing  headphones,  sounds  entering  the  ear  are  recorded 
by putting microphones into the ear canals of listeners. The differences
between the sound at the sound source and at the ear
canal are then calculated. These differences, known as head-related
transfer functions (HRTFs), are used to create filters with
which stimuli can be synthesized (Blauert, 1997). This research is
important,  as  three-dimensional  auditory  interfaces  that  are  more 
natural  can  be  created,  with  sounds  presented  around  the  user, 
as  they  would  be  in  real  life.  Almost  all  current  PC  soundcards  can 
generate  such  3D  sounds,  as  they  are  often  used  in  games. 

The  main  problem  with  providing  simulated  3D  sound 
through  headphones  comes  from  the  general  HRTFs  used.  If 
your  ears  are  not  like  the  ears  of  the  head  (often  a  dummy  head) 
from  which  the  HRTFs  were  generated,  then  you  are  likely  to  feel 
that  the  sounds  are  coming  from  inside  your  head  and  not  out¬ 
side.  It  is  also  very  easy  to  confuse  front  and  back  so  that  listen¬ 
ers  cannot  tell  if  a  sound  is  in  front  or  behind.  Vertical  position¬ 
ing  is  also  difficult  to  do  reliably.  This  means  that  many  designers 
who  use  spatial  sound  in  their  interfaces  often  limit  themselves 
to  a  plane  cutting  through  the  head  horizontally  at  the  level  of 
the  ears,  creating  a  2.5D  space.  This  reduces  the  space  in  which 
sounds  can  be  presented  but  avoids  many  of  the  problems  of 
users  not  being  able  to  localize  the  sounds  properly. 

To improve quality, head tracking is often used. Once the orientation
of the user’s head is known, sounds can be respatialized
to remain in position when the head turns. Active listening
is used to disambiguate the location of a sound: listeners
naturally  make  small  head  movements  and  these  change  the  IID 
and  ITD,  cueing  the  listener  to  the  location  of  the  sound.  Using 
such  tracking  can  significantly  improve  the  performance  of 
3D  auditory  user  interfaces  (Marentakis  &  Brewster,  2004). 
Marentakis  and  Brewster  also  showed  that  targets  should  be 
around  ±  10°  in  size  when  head  tracking  is  used  to  enable  accu¬ 
rate  localization  and  selection. 
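
The core of respatializing with head tracking is a change of reference frame: the direction of a source relative to the head is its fixed direction in the room minus the current head orientation. The sketch below shows only that step, in the horizontal plane; the function names are hypothetical, and treating the ±10° figure as a half-width around the target is an interpretation made for the example.

```python
def relative_azimuth(source_deg, head_yaw_deg):
    """Azimuth of a fixed source relative to the head, in [-180, 180).

    0 means straight ahead; positive and negative values are to
    opposite sides of the listener.
    """
    return (source_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def is_facing(source_deg, head_yaw_deg, half_width_deg=10.0):
    """True when the head points within the target region of a source."""
    return abs(relative_azimuth(source_deg, head_yaw_deg)) <= half_width_deg
```

Each time the tracker reports a new head orientation, the relative azimuth would be recomputed and passed to whatever spatializer renders the sound, so that the source appears to stay in place as the head turns.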


NONSPEECH SOUND PRESENTATION TECHNIQUES


The two main types of nonspeech audio presentation techniques
commonly used are auditory icons and earcons. Substantial
research has gone into developing both of these, and
the main work is reviewed below.

Auditory Icons

Gaver (1989, 1997) developed the idea of auditory icons. These
natural, everyday sounds can be used to represent actions and
objects within an interface. Gaver defined them as “everyday
sounds mapped to computer events by analogy with everyday
sound-producing events. Auditory icons are like sound effects for
computers” (p. 68). Auditory icons rely on an analogy between the
everyday world and the model world of the computer (Gaver,
1997; for more examples of the use of auditory icons see the work on
Mercator and Audio Aura described in sections on sounds for
users with visual impairments and sound for mobile and ubiquitous
computing).

Gaver (1997) used sounds of events that are recorded from
the natural environment, such as tapping or smashing sounds.
He used an ecological listening approach (Neuhoff, 2004), suggesting
that people do not listen to the pitch and timbre of
sounds but to the sources that created them. When pouring liquid,
a listener hears the fullness of the receptacle, not the increases
in pitch. Another important property of everyday sounds
is that they can convey multidimensional data. When a door
slams, a listener may hear the size and material of the door, the
force that was used, and the size of the room in which it was
slammed. This could be used within an interface so that selection
of an object makes a tapping sound, the type of material
could represent the type of object, and the size of the tapped object
could represent the size of the object within the interface.

Gaver (1989) used these ideas to create auditory icons and
from these built the SonicFinder. This ran on the Apple Macintosh
and provided auditory representations of some objects and
actions within the interface. Files were given a wooden sound,
applications a metal sound, and folders a paper sound. The
larger the object the deeper the sound it made. Thus, selecting
an application meant tapping it: it made a metal sound that
confirmed that it was an application and the deepness of the
sound indicated its size. Copying used the idea of pouring liquid
into a receptacle. The rising of the pitch indicated that the receptacle
was getting fuller and the copy progressing.


FIGURE 13.3. An interaction showing the deletion of a folder
in the SonicFinder (from Gaver, 1989). A) Papery tapping sound
to show selection of folder. B) Scraping sound to indicate
dragging folder. C) Clinking sound to show wastebasket selected.
D) Smashing sound to indicate folder deleted.

To demonstrate how the SonicFinder worked, a simple interaction
is provided in Fig. 13.3, showing the deletion of a folder.
In  (a),  a  folder  is  selected  by  tapping  on  it;  this  causes  a  papery 
sound,  indicating  that  the  target  is  a  folder.  In  (b),  the  folder  is 
dragged  toward  the  wastebasket,  causing  a  scraping  sound.  In 
(c),  the  wastebasket  becomes  highlighted,  and  a  clinking  sound 
occurs  when  the  pointer  reaches  it.  Finally,  in  (d),  the  folder  is 
dropped  into  the  wastebasket,  and  a  smashing  sound  occurs  to 
indicate  it  has  been  deleted  (the  wastebasket  becomes  fat  to 
indicate  there  is  something  in  it). 
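
The SonicFinder mapping (object type to material, object size to depth of sound) can be illustrated with a short sketch. The material table follows the description in the text; everything else, including the names, the base pitch, and the logarithmic size-to-pitch rule, is an assumption made for illustration and is not taken from Gaver's implementation.

```python
import math
from dataclasses import dataclass

# Object type -> material of the tapped sound, as described for the SonicFinder.
MATERIAL = {"file": "wood", "application": "metal", "folder": "paper"}


@dataclass(frozen=True)
class AuditoryIcon:
    action: str       # the everyday event, e.g. "tap"
    material: str     # conveys the type of object
    pitch_hz: float   # conveys the size of the object (lower = larger)


def selection_icon(kind: str, size_kb: float, base_hz: float = 880.0) -> AuditoryIcon:
    """Build the tapping sound for selecting an object.

    Assumed rule: pitch falls as the size grows by powers of ten, so larger
    objects sound deeper. base_hz is an arbitrary illustrative value.
    """
    decades = math.log10(max(size_kb, 1.0))
    return AuditoryIcon("tap", MATERIAL[kind], base_hz / (1.0 + decades))
```

With these assumptions a 1 KB file and a 100 KB file share the wooden tapping sound but differ in pitch, so one sound carries two dimensions of information at once.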

Problems  can  occur  with  representational  systems,  such  as 
auditory  icons,  because  some  abstract  interface  actions  and 
objects  have  no  obvious  representation  in  everyday  sound. 
Gaver  (1989)  used  a  pouring  sound  to  indicate  copying  be¬ 
cause  there  was  no  natural  equivalent;  this  is  more  like  a 
sound  effect.  He  suggested  the  use  of  movie-like  sound  effects 
to  create  sounds  for  things  with  no  easy  representation.  This 
may  cause  problems  if  the  sounds  are  not  chosen  correctly,  as 
they  will  become  more  abstract  than  representational  and  the 
advantages  of  auditory  icons  will  be  lost. 

Gaver,  Smith,  and  O’Shea  (1991)  developed  the  ideas  from 
the  SonicFinder  further  in  the  ARKola  system,  which  modeled 
a  soft  drink  factory.  The  simulation  consisted  of  a  set  of  nine  ma¬ 
chines  split  into  two  groups:  those  for  input,  and  those  for  out¬ 
put.  The  input  machines  supplied  the  raw  materials;  the  out¬ 
put  machines  capped  the  bottles  and  sent  them  for  shipping. 
Each  machine  had  an  on/off  switch  and  a  rate  control.  The  aim 
of  the  simulation  was  to  run  the  plant  as  efficiently  as  possible, 
avoid  waste  of  raw  materials,  and  make  a  profit  by  shipping  bot¬ 
tles.  Two  users  controlled  the  factory,  with  each  user  able  to  see 
approximately  one  third  of  the  whole  plant.  This  form  of  plant 
was  chosen  because  it  allowed  Gaver  et  al.  to  investigate  how 
the  sounds  would  affect  the  way  users  handled  the  given  task 
and  how  people  collaborated.  It  was  also  an  opportunity  to  in¬ 
vestigate  how  different  sounds  would  combine  to  form  an  au¬ 
ditory  ecology  (integrated  set  of  sounds),  or  soundscape.  Gaver 
et  al.  related  the  way  the  different  sounds  in  the  factory  com¬ 
bined  to  the  way  a  car  engine  is  perceived.  Although  the  sounds 
are  generated  by  multiple  distinct  components,  these  combine 
to  form  what  is  perceived  as  a  unified  sound.  If  something  goes 
wrong,  the  sound  of  the  engine  will  change,  alerting  the  listener 
to  the  problem,  but  in  addition,  to  a  trained  ear,  the  change  in 
the  sound  would  alert  the  listener  to  the  nature  of  the  prob¬ 
lem.  The  sounds  used  to  indicate  the  performance  of  the  indi¬ 
vidual  components  of  the  factory  were  designed  to  reflect  the 
semantics  of  the  machine. 

Each  of  the  machines  had  a  sound  to  indicate  its  status  over 
time;  for  example,  the  bottle  dispenser  made  the  sound  of  clink¬ 
ing  bottles.  The  rhythm  of  the  sounds  reflected  the  rate  at 
which  the  machine  was  running.  If  a  machine  ran  out  of  sup¬ 
plies  or  broke  down,  its  sound  stopped.  Sounds  were  also 
added  to  indicate  that  materials  were  being  wasted.  A  splash¬ 
ing  sound  indicated  that  liquid  was  being  spilled;  the  sound  of 
smashing  bottles  indicated  that  bottles  were  being  lost.  The  sys¬ 
tem  was  designed  so  that  up  to  14  different  sounds  could  be 
played  at  once.  To  reduce  the  chance  that  all  sounds  would  be 
playing  simultaneously,  sounds  were  pulsed  once  a  second 
rather  than  playing  continuously. 


An  informal  evaluation  was  undertaken  where  pairs  of  users 
were  observed  controlling  the  plant,  either  with  or  without 
sound.  These  observations  indicated  that  the  sounds  were  ef¬ 
fective  in  informing  the  users  about  the  state  of  the  plant  and 
that  the  users  were  able  to  differentiate  the  different  sounds  and 
identify  the  problem  when  something  went  wrong.  When  the 
sounds  were  used,  there  was  much  more  collaboration  between 
the  two  users.  This  was  because  each  could  hear  the  whole 
plant  and  therefore  help  if  there  were  problems  with  machines 
that  the  other  was  controlling.  In  the  visual  only  condition, 
users  were  not  as  efficient  at  diagnosing  what  was  wrong  even 
if  they  knew  there  was  a  problem. 

One of the biggest advantages of auditory icons is the ability
to communicate meanings which listeners can easily learn and
remember; other systems (for example earcons, see the next
section) use abstract sounds where the meanings are harder to
learn. Problems did occur with some of the warning sounds
used,  as  Gaver  et  al.  (1991)  indicated:  “The  breaking  bottle 
sound  was  so  compelling  semantically  and  acoustically  that  part¬ 
ners  sometimes  rushed  to  stop  the  sound  without  understand¬ 
ing  its  underlying  cause  or  at  the  expense  of  ignoring  more  seri¬ 
ous  problems”  (p.  89).  Another  problem  was  that,  when  a 
machine  ran  out  of  raw  materials,  its  sound  just  stopped;  users 
sometimes  missed  this  and  did  not  notice  that  something  had 
gone  wrong. 

Design  Guidelines  for  Auditory  Icons 

There  have  been  few  detailed  studies  investigating  the  best 
ways  to  design  auditory  icons,  so  there  is  little  guidance  for  in¬ 
teraction  designers.  Mynatt  (1994)  proposed  the  following  basic 
design  methodology:  (a)  Choose  short  sounds  which  have  a 
wide  bandwidth,  and  where  length,  intensity,  and  sound  qual¬ 
ity  are  roughly  equal;  (b)  evaluate  the  identifiability  of  the  audi¬ 
tory  cues  using  free-form  answers;  (c)  evaluate  the  learnability 
of  the  auditory  cues  which  are  not  readily  identified;  (d)  test 
possible  conceptual  mappings  for  the  auditory  cues  using  a  re¬ 
peated  measures  design  where  the  independent  variable  is  the 
concept  that  the  cue  will  represent;  (e)  evaluate  possible  sets 
of auditory icons for potential problems with masking, discriminability,
and conflicting mappings; and (f) conduct usability
experiments with interfaces using the auditory icons.

Earcons 

Earcons  were  developed  by  Blattner  et  al.  (1989).  They  used 
abstract,  synthetic  tones  in  structured  combinations  to  create 
auditory  messages.  Blattner  et  al.  defined  earcons  as  “nonverbal 
audio  messages  that  are  used  in  the  computer/user  interface  to 
provide  information  to  the  user  about  some  computer  object, 
operation  or  interaction”  (p.  13).  Unlike  auditory  icons,  there 
is  no  intuitive  link  between  the  earcon  and  what  it  represents; 
the  link  must  be  learned.  They  use  a  more  traditional  musical 
approach  than  auditory  icons. 

Earcons  are  constructed  from  simple  building  blocks  called 
“motifs”  (Blattner  et  al.,  1989).  These  short,  rhythmic  sequences 
can be combined in different ways. Blattner et al. suggested
their  most  important  features  include  the  following: 

Rhythm: Changing the rhythm of a motif can make it sound
very different. Blattner et al. (1989) described this as the
most prominent characteristic of a motif.

Pitch: There are 96 different pitches in the western musical
system, and these can be combined to produce a large number
of different motifs.

Timbre: Motifs can be made to sound different by the use
of different timbres, for example playing one motif with the
sound of a violin and the other with the sound of a piano.

Register: This is the position of the motif in the musical
scale. A high register means a high-pitched note, and a low
register means a low-pitched note. The same motif in a different
register can convey a different meaning.

Dynamics: This is the volume of the motif. It can be made
to increase as the motif plays (crescendo) or decrease
(decrescendo).

There  are  two  basic  ways  in  which  earcons  can  be  con¬ 
structed.  The  first,  and  simplest,  are  compound  earcons.  These 
simple  motifs  can  be  concatenated  to  create  more  complex 
earcons.  For  example,  a  set  of  simple,  one-element  motifs  might 
represent  various  system  elements  such  as  create,  destroy, 
file, and string (see Fig. 13.4A). These could then be concatenated
to form earcons (Blattner et al., 1989). In the figure, the
earcon for create is a high-pitched sound that gets louder, and
for destroy, it is a low-pitched sound that gets quieter. For file,
there are two long notes that fall in pitch, and for string, two
short notes that rise. In Fig. 13.4B, the compound earcons can
be seen. For the create file earcon, the create motif is simply
followed by the file motif. This provides a simple and effective
method for building up earcons.
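
The concatenation just described can be sketched as follows. The four motifs mirror the verbal descriptions of Fig. 13.4A, but the data structure, the specific frequencies, durations, and volumes, and the choice to model "gets louder" as two notes of rising volume are all illustrative assumptions rather than values from Blattner et al.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Note:
    pitch_hz: float
    duration_s: float
    volume: float  # 0.0 (silent) to 1.0 (full)


Motif = Tuple[Note, ...]

# Hypothetical motifs following the descriptions of Fig. 13.4A.
MOTIFS = {
    "create":  (Note(880.0, 0.2, 0.4), Note(880.0, 0.2, 0.8)),    # high, gets louder
    "destroy": (Note(220.0, 0.2, 0.8), Note(220.0, 0.2, 0.4)),    # low, gets quieter
    "file":    (Note(523.25, 0.4, 0.6), Note(392.0, 0.4, 0.6)),   # two long notes, falling
    "string":  (Note(392.0, 0.1, 0.6), Note(523.25, 0.1, 0.6)),   # two short notes, rising
}


def compound(*names: str) -> Motif:
    """Concatenate the named motifs, in order, into one compound earcon."""
    notes = []
    for name in names:
        notes.extend(MOTIFS[name])
    return tuple(notes)
```

Calling `compound("create", "file")` yields the create motif followed by the file motif, which is the "create file" earcon of Fig. 13.4B under these assumptions.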

Hierarchical  earcons  are  more  complex  but  can  represent 
complex sound structures. Each earcon is a node in a tree and
inherits properties from the earcons above it. Figure 13.5 shows
a  hierarchy  of  earcons  representing  a  family  of  errors.  The  top 
level  of  the  tree  is  the  family  rhythm.  This  sound  just  has  a 
rhythm  and  no  pitch;  the  sounds  used  are  clicks.  The  rhythmic 
structure  of  level  one  is  inherited  by  level  two,  but  this  time,  a 
second  motif  is  added  where  pitches  are  put  to  the  rhythm.  At 
this  level,  Blattner  et  al.  (1989)  suggested  that  the  timbre  should 
be  a  sine  wave,  which  produces  a  colorless  sound.  This  is  done 
so  that  at  level  three,  the  timbre  can  be  varied.  At  level  three, 
the  pitch  is  also  raised  by  a  semitone  to  make  it  easier  to  differ¬ 
entiate  from  the  pitches  inherited  from  level  two.  Other  levels 
can  be  created  where  register  and  dynamics  are  varied. 
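
The inheritance scheme can be illustrated with a small sketch. It models only the motif that each level contributes: level one fixes the rhythm, level two adds pitches with a sine timbre, and level three raises the pitches by a semitone and changes the timbre. The class and function names, and the use of MIDI note numbers for pitch, are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Earcon:
    rhythm: Tuple[float, ...]            # note durations in seconds
    pitches: Optional[Tuple[int, ...]]   # MIDI note numbers; None = unpitched clicks
    timbre: str


def level_one(rhythm: Tuple[float, ...]) -> Earcon:
    """Family level: rhythm only, played as clicks."""
    return Earcon(rhythm, None, "click")


def level_two(parent: Earcon, pitches: Tuple[int, ...]) -> Earcon:
    """Inherit the rhythm; add pitches with a neutral sine timbre."""
    if len(pitches) != len(parent.rhythm):
        raise ValueError("one pitch is needed per rhythmic element")
    return replace(parent, pitches=tuple(pitches), timbre="sine")


def level_three(parent: Earcon, timbre: str) -> Earcon:
    """Inherit rhythm and pitches; raise pitch a semitone and vary the timbre."""
    if parent.pitches is None:
        raise ValueError("level three needs a pitched parent")
    raised = tuple(p + 1 for p in parent.pitches)
    return replace(parent, pitches=raised, timbre=timbre)
```

In the scheme described in the text, the earcon heard at a given node would be the motifs of its ancestors followed by its own, so a listener hears the family rhythm first and the distinguishing detail last.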

FIGURE 13.4. Compound earcons. A shows the four audio motifs
'create,' 'destroy,' 'file,' and 'string.' B shows the compound
earcons 'create file' and 'destroy string' (Blattner et al., 1989).

FIGURE 13.5. A hierarchy of earcons representing errors (from
Blattner et al., 1989).

Blattner et al. (1989) proposed the design of earcons but
did not develop or test them. Brewster, Wright, and Edwards
(1994) carried out a detailed evaluation of compound and
hierarchical earcons, comparing three designs: the one proposed
by Blattner et al. (1989), simple system beeps, and a richer
design based on more complex musical timbres and informed by
psychoacoustical research (see the section on perception of
sound). In these experiments, participants were presented
with earcons representing families of icons, menus, and
combinations of both (examples can be heard at
http://www.dcs.gla.ac.uk/~stephen/demos.shtml). They heard each
sound three times and then had to identify them when played back.
Results showed that the more complex musical earcons were
significantly more effective than both the simple beeps and Blattner’s
proposed design, with over 80% recalled correctly. Brewster
et al. (1994) found that timbre was much more important
than previously suggested, whereas pitch on its own was
difficult to differentiate. The main design features of the earcons
used were formalized into a set of design guidelines:

Timbre:  This  is  the  most  important  grouping  factor  for 
earcons.  Use  musical  instrument  timbres  with  multiple  har¬ 
monics,  as  this  helps  perception  and  can  avoid  masking. 
These  timbres  are  more  recognizable  and  differentiable. 

Pitch and Register: If listeners are to make absolute judgments
of earcons, then pitch/register should not be used as
a cue on its own. A combination of register and another
parameter gives better rates of recall. If register alone must be
used, then there should be large differences (two or three
octaves) between earcons. Much smaller differences can be
used if relative judgments are to be made. The pitch used
should be no higher than 5 kHz and no lower than 125 Hz
to 150 Hz, so that the sounds are not easily masked
and are within the hearing range of most listeners.

Rhythm,  duration  and  tempo:  Make  rhythms  as  different  as 
possible.  Putting  different  numbers  of  notes  in  each  earcon  is 
very  effective.  Earcons  are  likely  to  be  confused  if  the  rhythms 
are  similar,  even  if  there  are  large  spectral  differences.  Very 
short  note  lengths  might  not  be  noticed,  so  do  not  use  very 
short  sounds.  Earcons  should  be  kept  as  short  as  possible 
so  they  can  keep  up  with  interactions  in  the  interface  being 
sonified.  Two  earcons  can  be  played  in  parallel  to  speed  up 
presentation. 

Intensity:  This  should  not  be  used  as  a  cue  on  its  own  be¬ 
cause  it  is  a  major  cause  of  annoyance.  Earcons  should  be 
kept  within  a  narrow  dynamic  range  so  that  annoyance  can 
be  avoided  (see  section  on  avoiding  annoyance  above  for 
more on this issue).

Major/minor  mode:  Lemmens  (2005)  showed  that  by 
changing  from  a  major  to  minor  key,  he  could  change  the  af¬ 
fective  responses  of  users  to  earcons.  In  western  music,  the 
minor  mode  is  broadly  thought  of  as  sad  and  the  major 
mode  as  happy.  This  can  be  used  as  a  further  cue  to  create 
differentiable  earcons. 
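
The pitch and register guidelines above lend themselves to simple automated checks. The sketch below is one possible encoding, with hypothetical function names; it takes the upper figure of the 125 Hz to 150 Hz lower bound and the lower figure of the "two or three octaves" recommendation, both of which are choices made here rather than stated in the guidelines.

```python
import math

MIN_HZ = 150.0    # conservative end of the 125-150 Hz lower bound
MAX_HZ = 5000.0   # 5 kHz upper bound


def pitch_in_range(hz: float) -> bool:
    """True if a pitch lies inside the recommended band."""
    return MIN_HZ <= hz <= MAX_HZ


def octaves_apart(a_hz: float, b_hz: float) -> float:
    """Distance between two pitches in octaves (a doubling is one octave)."""
    return abs(math.log2(a_hz / b_hz))


def register_ok_for_absolute_judgment(a_hz: float, b_hz: float,
                                      min_octaves: float = 2.0) -> bool:
    """True if two earcons differing only in register are far enough apart."""
    return octaves_apart(a_hz, b_hz) >= min_octaves
```

A design tool could run such checks over a whole earcon set and flag pairs that rely on register alone with too small a separation.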

One  aspect  that  Brewster  also  investigated  was  musical 
ability — as  earcons  are  based  on  musical  structures,  is  it  only 
musicians  who  can  use  them?  The  results  showed  that  more 
complex  earcons  were  recalled  equally  well  by  nonmusicians  as 
they  were  by  musicians,  indicating  that  they  are  useful  to  a  more 
general  audience  of  users. 

In  a  further  series  of  experiments,  Brewster  (1998b)  looked 
in  detail  at  designing  hierarchical  earcons  to  represent  larger 
structures (with over 30 earcons at four levels). These were
designed building on the guidelines above. Users were given a
short  training  period  and  then  were  presented  with  sounds,  and 
they  had  to  indicate  where  the  sound  was  in  the  hierarchy. 

Results  were  again  good,  with  participants  recalling  over  80% 
correctly,  even  with  the  larger  hierarchy  used.  The  study  also 
looked  at  the  learning  and  memorability  of  earcons  over  time. 
Results  showed  that,  even  with  small  amounts  of  training,  users 
could  get  good  recall  rates,  and  the  recall  rates  of  the  same 
earcons  tested  a  week  later  were  unchanged. 

In  recent  work,  McGookin  and  Brewster  (2004)  looked  at 
presenting  multiple  earcons  in  parallel.  This  is  problematic  un¬ 
less  done  carefully,  as  the  structures  used  to  create  earcons 
also  cause  them  to  overlap  when  played  in  parallel.  McGookin 
and  Brewster  suggested  each  earcon  should  have  an  onset  de¬ 
lay  and  different  spatial  location  to  improve  understanding. 
(For examples of earcons in use, see the sonically enhanced
widgets and Palm III work in the section on applications of
auditory output.)
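
The recommendation to give concurrent earcons staggered onsets and distinct spatial positions can be sketched as a simple scheduling function. The 300 ms step and the even spread across a 180° frontal arc are illustrative assumptions, not values reported by McGookin and Brewster, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def schedule_parallel(names: Sequence[str],
                      onset_step_s: float = 0.3,
                      span_deg: float = 180.0) -> List[Tuple[str, float, float]]:
    """Assign each earcon an onset delay and an azimuth.

    Returns (name, onset_seconds, azimuth_degrees) tuples. Azimuths are
    spread evenly from -span/2 (left) to +span/2 (right); a single earcon
    is placed straight ahead.
    """
    n = len(names)
    plan = []
    for i, name in enumerate(names):
        azimuth = 0.0 if n == 1 else -span_deg / 2 + i * span_deg / (n - 1)
        plan.append((name, i * onset_step_s, azimuth))
    return plan
```

For three earcons this places them left, center, and right, each starting slightly after the previous one, so their opening notes do not coincide.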


Comparing  Auditory  Icons  and  Earcons 

Both  earcons  and  auditory  icons  are  effective  at  communicat¬ 
ing  information  in  sound.  There  is  more  formal  evidence  of  this 
for  earcons,  as  more  basic  research  has  looked  at  their  design. 
There  is  less  basic  research  into  the  design  of  auditory  icons,  but 
the  systems  that  have  used  them  in  practice  have  been  effective. 
Detailed  research  is  needed  into  auditory  icons  to  correct  this 
problem  and  to  provide  designers  with  design  guidance  on  how 
to  create  effective  sounds.  It  may  be  that  each  has  advantages 
over  the  other  in  certain  circumstances  and  that  a  combination 
of  both  is  the  best.  In  some  situations,  the  intuitive  nature  of  au¬ 
ditory  icons  may  make  them  favorable.  In  other  situations, 
earcons  might  be  best  because  of  the  powerful  structure  they 
contain,  especially  if  there  is  no  real-world  equivalent  of  what 
the  sounds  are  representing.  Indeed,  there  may  be  some  middle 
ground  where  the  natural  sounds  of  auditory  icons  can  be  ma¬ 
nipulated  to  give  the  structure  of  earcons. 

The advantage of auditory icons over earcons is that they
are easy to learn and remember because they are based on natural
sounds, and the sounds contain a semantic link to the objects
they represent. However, this reliance on natural sounds may make
them more difficult to associate with certain, more abstract actions
or objects within an interface. Problems of ambiguity can also occur when natural
sounds  are  taken  out  of  the  natural  environment  and  context 
is  lost  (people  may  have  their  own  idiosyncratic  mappings).  If 
the  meanings  of  auditory  icons  must  be  learned  then  they  lose 
some  of  their  advantages,  and  they  come  closer  to  earcons. 

Earcons  are  abstract  so  their  meaning  must  always  be 
learned. This may be a problem, for example, in walk-up-and-use
applications. Research has shown that little training is
needed if the sounds are well designed and structured. Leplatre
and Brewster (2000) began to show that it is possible to
learn  the  meanings  implicitly  while  using  an  interface  that  gen¬ 
erates  the  sounds  as  it  is  being  used.  However,  some  form  of 
learning  must  take  place.  According  to  Blattner  et  al.  (1989), 
earcons  may  have  an  advantage  when  there  are  many  highly 
structured  sounds  in  an  interface.  With  auditory  icons,  each  one 
must  be  remembered  as  a  distinct  entity  because  there  is  no 
structure  linking  them  together.  With  earcons,  there  is  a  strong 
structure  linking  them  that  can  easily  be  manipulated.  There  is 
not  yet  any  experimental  evidence  to  support  this. 

Pure  auditory  icons  and  earcons  make  up  the  two  ends  of  a 
presentation  continuum  from  representational  to  abstract  (see 
Fig.  13.6).  In  reality,  things  are  less  clear.  Objects  or  actions 
within  an  interface  that  do  not  have  an  auditory  equivalent  must 
have  an  abstract  auditory  icon  made  for  them.  The  auditory  icon 
then  moves  more  toward  the  abstract  end  of  the  continuum. 
When  hearing  an  earcon,  the  listener  may  hear  and  recognize  a 
piano  timbre,  rhythm,  and  pitch  structure  as  a  kind  of  catch 
phrase  representing  an  object  in  the  interface.  He  or  she  does 
not  hear  all  the  separate  parts  of  the  earcon  and  work  out  the 
meaning from them (listeners may also try to put their own
representational meanings on earcons, even if the designer did
not intend it, as found by Brewster, 1998b). The earcon then
moves more toward the representational side of the continuum.
Therefore, earcons and auditory icons are closer than they might appear.

There are not yet any systems that use both types of sounds
fully, and this would be an interesting area to investigate. Some
parts of a system may have natural analogues in sound, and
therefore, auditory icons could be used; other parts might be
more abstract or structured and earcons would be better. The
combination of the two would be the most beneficial. This is
an area ripe for further research.

[Figure: a double-headed arrow running from Representational (Auditory Icons) to Abstract (Earcons).]

FIGURE 13.6. The presentation continuum of auditory icons
and earcons.


THE  APPLICATIONS  OF  AUDITORY  OUTPUT 


Auditory  output  has  been  used  in  a  wide  range  of  different  sit¬ 
uations  and  applications.  This  section  will  outline  some  of  the 
main  areas  of  use  and  will  highlight  some  of  the  key  papers  in 
each  area  (for  more  uses  of  sound  see  the  ICAD  (http://www. 
icad.org)  or  ACM  CHI  (http://www.acm.org/sigchi)  series  of 
conferences). 

Sonic  Enhancement  of  GUIs 

One  long-running  strand  of  research  in  the  area  of  auditory  out¬ 
put  is  in  the  addition  of  sound  to  standard  graphical  displays  to 
improve  usability.  One  reason  for  doing  this  is  that  users  can  be¬ 
come  overloaded  with  visual  information  on  large,  high-resolu¬ 
tion  displays.  In  highly  complex  graphical  displays,  users  must 
concentrate  on  one  part  of  the  display  to  perceive  the  visual 
feedback,  so  that  feedback  from  another  part  may  be  missed. 
This  becomes  very  important  in  situations  where  users  must  no¬ 
tice  and  deal  with  large  amounts  of  dynamic  data.  For  example, 
imagine  you  are  working  on  your  computer  writing  a  report  and 
are  monitoring  several  ongoing  tasks,  such  as  compiling,  print¬ 
ing,  and  downloading  files  over  the  Internet.  The  word-process¬ 
ing  task  will  take  up  your  visual  attention  because  you  must  con¬ 
centrate  on  what  you  are  writing.  In  order  to  check  when  your 
printout is done, the compilation has finished, or the files have
downloaded,  you  must  move  your  visual  attention  away  from  the 
report  and  look  at  these  other  tasks.  This  causes  the  interface 
to  intrude  into  the  task  you  are  trying  to  perform.  If  informa¬ 
tion  about  these  other  tasks  was  presented  in  sound,  you  could 
continue  looking  at  the  report  and  hear  information  in  the  back¬ 
ground  about  the  other  tasks.  To  find  out  how  the  file  download 
was  progressing,  you  could  just  listen  to  the  download  sound 
without  moving  your  visual  attention  from  the  writing  task. 

One  of  the  earliest  pieces  of  work  on  sonic  enhancement  of 
an  interface  was  Gaver’s  (1989)  SonicFinder  described  above. 
This  used  auditory  icons  to  present  information  about  the  Mac¬ 
intosh  interface  redundantly  with  the  graphical  display. 

Brewster  (1998a)  investigated  the  addition  of  sound  to  en¬ 
hance  graphical  buttons.  An  analysis  of  the  way  buttons  are  used 
was  undertaken,  highlighting  some  usability  problems.  It  was 
found  that  the  existing,  visual  feedback  did  not  indicate  when 
mispresses of a button might have occurred. For example, the
selection of a graphical button is shown in Fig. 13.7 (starting with
1.A and 2.A). The button highlights when it is pressed down (Fig.
13.7 1.B and 2.B). There is no difference in feedback between a
correct selection (Fig. 13.7 1.C) and a misselection (Fig. 13.7
2.C), where the user moves the mouse off the graphical button
before  the  selection  is  complete.  The  user  could  therefore  slip 
off  the  button,  fail  to  press  it,  and  get  no  feedback.  This  error 
can  happen  when  the  user  is  moving  away  from  the  button  and 
on  to  some  other  task.  For  example,  the  user  moves  to  a  toolbar 
to  press  the  “Bold”  button  and  then  moves  back  to  the  text  to 
position  the  cursor  to  start  typing.  The  button  press  and  the 
mouse  move  overlap,  and  the  button  is  not  pressed.  It  is  hard 
for  the  user  to  notice  this  because  no  feedback  is  given. 

The  problems  could  not  easily  be  solved  by  adding  more 
graphical  feedback:  the  user  is  no  longer  looking  at  the  button’s 
location  so  any  feedback  given  there  will  be  missed.  Feedback 
could  be  given  at  the  mouse  location  but  we  cannot  be  sure  the 
user  will  be  looking  there  either.  Brewster  (1998a)  designed  a 
new  button  that  used  auditory  feedback  to  indicate  more  about 
the  state  of  the  button.  This  was  advantageous,  as  sound  is  om¬ 
nidirectional,  and  the  user  does  not  need  to  focus  attention  on 
any  part  of  the  screen  to  perceive  it. 

Three  earcons  were  used  to  improve  the  effectiveness  of 
graphical  buttons.  An  organ  timbre  was  used  for  all  of  the 
sounds. When the user moved over a button, a continuous tone
was  played  at  130  Hz  at  a  volume  just  above  the  background 
sound  level.  This  informed  the  user  the  cursor  was  over  the 
target  (but  could  easily  be  habituated).  When  the  mouse  was 
pressed  down  over  the  graphical  button,  a  continuous  tone  was 
played  at  261  Hz.  The  third  sound  indicated  that  the  graphical 
button  had  been  successfully  selected.  This  sound  consisted  of 
two short tones with a pitch of 1046 Hz and duration of 40 ms.
This  sound  was  not  played  if  a  slip-off  error  occurred.  If  the  user 
pressed  the  button  very  quickly,  then  only  the  success  sound 
was  played  to  avoid  unnecessary  feedback. 
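
The feedback rules for the sonically enhanced button can be summarized in a short sketch. The three frequencies and the 40 ms duration come from the description above; the function names, the tuple representation of a sound, and the 100 ms threshold for a "very quick" press are assumptions made for illustration and do not come from Brewster (1998a).

```python
from typing import List, Tuple

HOVER_HZ = 130.0      # continuous tone while the cursor is over the button
PRESS_HZ = 261.0      # continuous tone while the mouse is held down on it
SUCCESS_HZ = 1046.0   # two short tones on successful selection
SUCCESS_MS = 40       # duration of each success tone
QUICK_PRESS_MS = 100  # assumed threshold below which a press counts as quick

Sound = Tuple[str, float]


def sound_for_hover() -> Sound:
    """Tone played when the cursor moves over the button."""
    return ("hover_tone", HOVER_HZ)


def sounds_for_press(held_ms: float, released_over_button: bool) -> List[Sound]:
    """Sounds heard for one press, given how it ended.

    A slip-off (release outside the button) produces no success sound;
    a very quick press produces only the success sound.
    """
    sounds: List[Sound] = []
    if held_ms >= QUICK_PRESS_MS:
        sounds.append(("press_tone", PRESS_HZ))
    if released_over_button:
        sounds.append(("success", SUCCESS_HZ))
    return sounds
```

Under these assumptions a slip-off after a normal-length press yields the press tone and then silence where the success sound would be expected, which is the cue that the evaluation below found users relied on.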

An  experimental  evaluation  of  these  sounds  was  under¬ 
taken.  Results  showed  that  users  recovered  from  slip-off  errors 
significantly  faster  and  with  significantly  fewer  mouse  clicks 
when  sounds  were  present  in  the  buttons.  Users  also  signifi¬ 
cantly  preferred  the  buttons  with  sound  when  asked  to  rate 
subjective  preference.  They  also  did  not  rate  the  buttons  as 
more  annoying  than  the  standard  graphical  ones.  An  interest¬ 
ing  point  to  note  was  that  the  use  of  no  sound  could  be  atten¬ 
tion  grabbing  when  a  sound  was  expected.  The  participants 
could  easily  recognize  a  slip  off  due  to  the  demanding  nature  of 
the  success  sound  not  being  played.  This  is  important  as  re¬ 
ducing  the  amount  of  feedback  presented  is  one  way  to  make 
sure  that  it  is  not  annoying. 

Many other widgets have been successfully sonified. Beaudouin-Lafon
and Conversey (1996) showed that nonspeech
sounds could improve usability of scrollbars; Maury, Athenes,
and  Chatty  (1999)  and  Marila  (2002)  added  sounds  to  improve 
menu  selections  in  drop-down  menus.  Ronkainen  and  Pasanen 
(2005)  did  several  studies  into  the  design  of  audio  feedback  for 
buttons.  Brewster  (1998a)  investigated  a  wide  range  of  differ¬ 
ent  widgets  including  scroll-bars,  menus,  progress  bars,  tool 
palettes,  and  drag  and  drop.  These  widgets  have  been  included 
in  a  toolkit  (Crease,  Gray,  &  Brewster,  2000)  that  designers  can 
use  to  add  sound  easily  to  their  interfaces. 

FIGURE 13.7. The visual feedback presented by a graphical button
when selected. 1 shows a correct selection and 2 shows a
slip-off (From Brewster, 1998a).


Sound  for  Users  with  Visual  Impairments 

One  of  the  most  important  uses  for  nonspeech  sound  is  in  in¬ 
terfaces  for  people  with  visual  disabilities.  One  of  the  main  de¬ 
privations  caused  by  blindness  is  the  problem  of  access  to  in¬ 
formation.  A  blind  person  will  typically  use  a  screen  reader  and 
a  voice  synthesizer  to  use  a  computer.  The  screen  reader  ex¬ 
tracts  textual  information  from  the  computer’s  video  memory 
and  sends  it  to  the  speech  synthesizer  to  speak  it.  This  works 
well  for  text  but  not  well  for  the  graphical  components  of  cur¬ 
rent  user  interfaces.  It  is  still  surprising  to  find  that  many  com¬ 
mercial  applications  used  by  blind  people  make  little  use  of 
nonspeech  sound,  concentrating  on  synthetic  speech  output. 
This  is  limiting  (as  discussed  above),  as  speech  is  slow,  it  can 
overload  short-term  memory,  and  it  is  not  good  for  presenting 
certain  types  of  information;  for  example,  it  is  not  possible  to 
render  many  types  of  images  via  speech,  so  these  can  become 
inaccessible to blind people. One reason for sound’s lack of use
has been uncertainty about how to employ it effectively. As Edwards (1995) said,
“Currently  the  greatest  obstacle  to  the  exploitation  of  the  variety 
of  communications  channels  now  available  is  our  lack  of  under¬ 
standing  of  how  to  use  them”  (p.  ii).  The  combination  of  speech 
and  nonspeech  sounds  can  increase  the  amount  of  information 
presented  to  the  user.  As  long  as  this  is  done  in  a  way  that  does 
not overload the user, then it can improve access to information.
Some of the main research into the use of nonspeech audio
in interfaces for blind people will now be described (for
more on perceptual impairments in general see chapter 43).


Soundtrack  was  an  early  attempt  to  create  a  word  processor 
designed  to  be  used  by  blind  persons  and  was  developed  by 
Edwards  (1989).  It  used  earcons  and  synthetic  speech  as  output, 
and  it  was  designed  so  that  the  objects  a  sighted  user  would 
see  in  an  interface,  for  example  menus  and  dialogues,  were  re¬ 
placed  by  auditory  equivalents  that  were  analogies  of  their  visual 
counterparts.  Its  interface  was  constructed  from  auditory  ob¬ 
jects  with  which  the  user  could  interact.  They  were  defined  by 
a  location,  a  name,  a  sound,  and  an  action.  They  were  arranged 
into a grid of two layers, analogous to menus (see Fig. 13.8).

Each  auditory  object  made  a  sound  when  the  cursor  entered  it, 
and  these  could  be  used  to  navigate  rapidly  around  the  screen. 
Soundtrack  used  sine  waves  for  its  audio  feedback.  Chords  were 
built  up  for  each  menu,  dependent  on  the  number  of  menu  items. 
For  the  edit  menu,  a  chord  of  four  notes  was  played  because  there 
were  four  menu  items  within  it  (cut,  copy,  paste,  and  find). 

The base sounds increased in pitch from left to right, as in
the normal representation of a musical scale (for example, on a
piano), and the top layer used higher pitches than the bottom.
Using  these  two  pieces  of  information  a  user  could  quickly  find 
his  or  her  position  on  the  screen.  If  any  edge  of  the  screen  was 
reached,  a  warning  sound  was  played.  If  at  any  point  the  user 
got  lost  or  needed  more  precise  information,  he  or  she  could 
click on an object and it would speak its name.

The approach taken in Soundtrack was to take the visual interface
to a word processor and translate it into an equivalent
auditory form. The Mercator system (Mynatt & Edwards, 1995;
Mynatt & Weber, 1994) took a broader approach. The designers’
goal was to model and translate the graphical interfaces of
X Windows applications into sound without modifying the applications
(and thus create a more general solution than Soundtrack’s).
Their main motivation was to simulate many of the features
of graphical interfaces to make graphical applications
accessible  to  blind  users  and  keep  coherence  between  the  audio 
and  visual  interfaces  so  that  blind  and  sighted  users  could  in¬ 
teract  and  work  together  on  the  same  applications.  This  meant 
that  the  auditory  version  of  the  interface  had  to  facilitate  the 
same  mental  model  as  the  visual  one.  This  did  not  mean  that 
they  translated  every  pixel  on  the  screen  into  an  auditory  form; 
instead,  they  modeled  the  interaction  objects  that  were  present. 
Modeling  the  pixels  exactly  in  sound  was  ineffective  due  to  the 
very  different  nature  of  visual  and  auditory  media  and  the  fact 
that  graphical  interfaces  had  been  optimized  to  work  with  the 
visual  sense  (for  example,  the  authors  claim  that  an  audio  equiv¬ 
alent  of  overlapping  windows  was  not  needed  as  overlapping 
was  just  an  artifact  of  a  small  visual  display).  Nonspeech  sound 
was  an  important  aspect  of  their  design  to  make  the  iconic  parts 
of  a  graphical  interface  usable. 

FIGURE 13.8. The Soundtrack’s main screen (From Edwards, 1989).

Mercator used three levels of nonspeech auditory cues to
convey symbolic information presented as icons in the visual interface.
The first level addressed the question, what is this object?
In Mercator, the type of an interface object was conveyed
with  an  auditory  icon.  For  example,  touching  a  window  sounded 
like  tapping  a  piece  of  glass,  container  objects  sounded  like  a 
wooden  box  with  a  creaky  hinge,  and  text  fields  used  the  sound 
of a manual typewriter. While the mapping was easy for interface
components such as trashcan icons, it was less straightforward
for components that did not have simple referents in reality (e.g.,
menus  or  dialogue  boxes,  as  discussed  in  the  section  on  sonic 
enhancement  of  GUIs).  In  Mercator,  auditory  icons  were  also 
parameterized  to  convey  detailed  information  about  specific  at¬ 
tributes,  such  as  menu  length.  Global  attributes  were  also 
mapped  into  changes  in  the  auditory  icons.  For  example,  high¬ 
lighting  and  graying  out  are  common  to  a  wide  range  of  differ¬ 
ent  widgets.  To  represent  these,  Mynatt  and  Edwards  (1995) 
used  sound  filters.  A  low-pass  filter  was  used  to  make  the  sound 
of  a  grayed  out  object  duller  and  more  muffled. 
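
The muffling effect of a low-pass filter can be illustrated with a short sketch. The text above does not say which filter Mercator used, so the one-pole filter, the function name low_pass, and the smoothing factor alpha below are assumptions chosen only to show the principle: rapid (high-frequency) changes are attenuated while slow ones pass through.

```python
def low_pass(samples, alpha=0.2):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    A smaller alpha (assumed value, not from the source) removes more
    high-frequency content and so gives a duller, more muffled sound.
    """
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


# A signal that flips sign on every sample is as "bright" as a signal can be.
bright = [1.0, -1.0] * 8
muffled = low_pass(bright)

# A constant signal contains no high frequencies and passes almost unchanged.
steady = low_pass([1.0] * 50)
```

In this sketch the alternating signal is reduced to a fraction of its original amplitude, while the constant signal settles close to its input value.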

Sonification 

The idea of making data accessible builds on this work on
accessibility. Sonification, or visualization in sound,
can  be  used  to  present  complex  data  nonvisually.  There  are 
many  situations  where  sonification  can  also  be  useful  for  sighted 
users  (if  they  only  have  access  to  a  small  screen  for  example), 
or  in  combination  with  graphical  feedback  in  multimodal  visu¬ 
alization  systems.  Sonification  is  defined  as  “the  transformation 
of  data  relations  into  perceived  relations  in  an  acoustic  signal  for 
the  purposes  of  facilitating  communication  or  interpretation” 
(Kramer  &  Walker,  1999).  The  range  of  sonification  goes  from 
the  clicks  of  the  Geiger  counter  to  multidimensional  informa¬ 
tion  presentation  of  stock  market  data. 

Mansur  et  al.  (1985)  performed  one  of  the  most  significant 
studies  presenting  data  in  sound.  Their  study,  which  laid  out  the 
research  agenda  for  subsequent  research  in  sound  graphs,  used 
sound  patterns  to  represent  two-dimensional  line  graphs.  The 
value on the y-axis of the graph was mapped to pitch and the
x-axis  to  time;  this  meant  that  a  listener  could  hear  the  graph 
rise  and  fall  over  time  in  a  similar  way  that  a  sighted  person 
could  see  the  line  rising  and  falling.  This  is  the  basic  technique 
used  in  most  sonification  systems. 
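
The pitch-time mapping described above can be sketched in a few lines. This is an illustration of the general technique, not the implementation of Mansur et al.: the function name sound_graph, the 200 to 1000 Hz range, the 0.1 s note spacing, and the choice of a mapping that is linear in frequency are all assumptions made for the example.

```python
def sound_graph(y_values, f_min=200.0, f_max=1000.0, note_s=0.1):
    """Map a data series to (onset_seconds, frequency_hz) pairs.

    The sample index (x) becomes time and the value (y) becomes pitch.
    The frequency range and note spacing are assumed defaults.
    """
    if not y_values:
        return []
    lo, hi = min(y_values), max(y_values)
    span = hi - lo
    notes = []
    for i, y in enumerate(y_values):
        # Normalize y to 0..1; a flat series maps to the middle of the range.
        t = (y - lo) / span if span else 0.5
        notes.append((i * note_s, f_min + t * (f_max - f_min)))
    return notes


# A rising curve produces a rising sequence of pitches.
notes = sound_graph([0, 1, 4, 9, 16])
```

Each pair could then be passed to any synthesizer; a listener hears the curve rise as the frequencies increase from one note to the next.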

They  found  that  their  approach  was  successful  in  allowing 
distinctions  to  be  made  between  straight  and  exponential 
graphs,  varying  monotonicity  in  graphs,  convergence,  and  sym¬ 
metry.  However,  they  did  find  that  there  were  difficulties  in 
identifying  secondary  aspects  of  graphs,  such  as  the  slope  of  the 
curves.  They  suggested  that  a  full  sound  graph  system  should 
contain  information  for  secondary  aspects  of  the  graph,  such 
as  the  first  derivative.  Their  suggestion  was  to  encode  this  in¬ 
formation  by  adding  more  overtones  to  the  sound  to  change 
the  timbre.  They  also  suggested  utilizing  special  signal  tones  to 
indicate  a  graph’s  maxima  or  minima,  inflection  points,  or  dis¬ 
continuities.  Many  other  studies  have  been  undertaken  to  de¬ 
velop  this  presentation  technique  further  (Flowers  &  Hauer, 
1992,  1995;  Walker,  2002). 

Walker  and  Cothran  (2003)  produced  the  Sonification  Sand¬ 
box  to  allow  designers  to  design  sound  graphs  easily  and  rapidly. 
The software allows “users to independently map several data
sets to timbre, pitch, volume, and pan, with full control over the
default, minimum, maximum, and polarity for each attribute.”
This gives auditory interface designers the chance to prototype
sonifications, without having to create their own, custom-made
applications. (The software is freely available from
http://sonify.psych.gatech.edu/research.)
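
The idea of giving each auditory attribute its own minimum, maximum, and polarity can be sketched as follows. This is not the Sonification Sandbox's code or API; the Mapping class, its field names, and the example ranges are hypothetical, and the "default" setting mentioned in the quotation is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Range and polarity for one auditory attribute (hypothetical structure)."""
    minimum: float
    maximum: float
    positive: bool = True  # True: larger data values give larger attribute values

    def apply(self, value, data_min, data_max):
        """Scale a data value into this attribute's range."""
        span = data_max - data_min
        t = (value - data_min) / span if span else 0.0
        if not self.positive:
            t = 1.0 - t  # negative polarity inverts the relationship
        return self.minimum + t * (self.maximum - self.minimum)


# Assumed example ranges: pitch as MIDI note numbers, volume as a 0..1 gain.
pitch = Mapping(minimum=48, maximum=84)
volume = Mapping(minimum=0.2, maximum=1.0, positive=False)

temperatures = [10.0, 15.0, 20.0]
pitches = [pitch.apply(v, 10.0, 20.0) for v in temperatures]
volumes = [volume.apply(v, 10.0, 20.0) for v in temperatures]
```

With these settings, higher temperatures sound higher in pitch but quieter, because the volume mapping has negative polarity.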

Sound  for  Mobile  and  Ubiquitous  Computing 

One  of  the  major  growth  areas  in  computing  at  the  beginning  of 
the  21st  century  has  been  in  mobile  and  ubiquitous  computing. 
People  no  longer  use  computers  just  while  at  a  desk.  Mobile  tele¬ 
phones,  personal  digital  assistants  (PDAs),  and  handheld  comput¬ 
ers  are  now  widely  used  (see  chapter  23  for  more  on  mobile  and 
ubiquitous computing). One problem with these devices is that
there  is  a  very  limited  amount  of  screen  space  on  which  to  dis¬ 
play  information:  the  screens  are  small  as  the  devices  must  be  able 
to  fit  into  the  hand  or  pocket  to  be  easily  carried.  Small  screens 
can  easily  become  cluttered  with  information  and  widgets  and  this 
presents  a  difficult  challenge  for  interface  designers. 

The graphical techniques for designing desktop
interfaces do not apply well to handheld devices. Screen
resources  are  limited;  memory  and  processing  power  are  much 
reduced  from  desktop  systems.  However,  in  many  cases,  inter¬ 
face  designs  and  interaction  techniques  have  been  taken  straight 
from  standard  desktop  graphical  interfaces  (where  screen  space 
and  other  resources  are  not  a  problem)  and  applied  directly  to 
mobile  devices.  This  has  resulted  in  devices  that  are  hard  to  use, 
with  small  text  that  is  hard  to  read,  cramped  graphics  and  little 
contextual information. Speech and nonspeech sounds are an
important way of solving these problems.

Another  reason  for  using  sound  is  that  if  users  are  perform¬ 
ing  tasks  while  walking  or  driving,  they  cannot  devote  all  of  their 
visual  attention  to  the  mobile  device.  Visual  attention  must  re¬ 
main  with  the  main  task  for  safety.  It  is  therefore  hard  to  design 
a  visual  interface  that  can  work  well  under  these  circumstances. 
An  alternative,  sonically  enhanced  interface  would  require  less 
visual  attention,  and  therefore  potentially  interfere  less  in  the 
main  activity  in  which  the  user  is  engaged. 

Three  main  pieces  of  work  are  surveyed  in  this  section,  cov¬ 
ering  the  main  approaches  taken  in  this  area.  The  first  adds 
sound  to  the  existing  interface  of  a  mobile  computer  to  improve 
usability,  the  second  creates  a  purely  auditory  interface  for  a  mo¬ 
bile,  and  the  third  creates  an  ambient  auditory  environment. 

Brewster  (2002)  developed  the  ideas  of  sonified  buttons  de¬ 
scribed  in  the  section  on  sonic  enhancement  of  GUIs  and  ap¬ 
plied  them  to  buttons  on  the  3Com  Palm  series  of  pen-based 
handheld  computers.  Many  of  the  same  feedback  problems 
with  buttons  apply  in  handhelds  as  in  desktops,  but  are  worse  as 
the  screen  is  smaller  (and  may  be  hard  to  see  when  the  device  is 
moving  or  the  sun  is  shining).  In  addition,  there  is  the  problem 
of  the  stylus  (or  finger)  obscuring  the  target  on  the  display, 
which  makes  it  difficult  for  users  to  know  when  they  are  press¬ 
ing  in  the  correct  place.  Simple  earcons  were  used  to  overcome 
the  problems.  One  aim  of  the  work  was  to  see  the  effects  when 
users  were  on  the  move  and  to  see  if  adding  audio  could  reduce 
the  size  of  the  widgets  so  that  screen  space  could  be  saved. 

In  general,  the  results  confirmed  those  of  the  previous  study. 
Subjective workload in the sonically enhanced buttons was reduced,
as compared to their silent counterparts. The addition of
sound  allowed  the  participants  to  enter  significantly  more  five¬ 
digit  strings  than  in  the  corresponding  silent  treatment,  with 
smaller  sonic  buttons  as  effective  as  larger  silent  ones.  When 
walking,  there  was  a  20%  drop  in  performance  overall,  with  the 
sonic  interface  still  performing  better  than  the  standard  one. 
Participants  walked  further  when  sound  was  added,  and  small 
buttons  with  sound  allowed  as  much  text  to  be  entered  as  the 
large,  silent  buttons.  The  suggested  reason  for  this  was  that 
users  did  not  have  to  concentrate  so  much  of  their  visual  atten¬ 
tion  on  the  device;  much  of  the  feedback  needed  was  in  sound, 
so  they  could  look  where  they  were  going.  This  would  therefore 
allow  the  size  of  items  on  the  display  to  be  reduced  without  a 
corresponding  drop  in  usability. 

Sawhney  and  Schmandt  (1999,  2000)  developed  a  wearable 
computer-based  personal  messaging  audio  system  called  “no¬ 
madic  radio”  to  deliver  information  and  messages  to  users  on  the 
move.  One  of  the  aims  of  this  system  was  to  reduce  the  interrup¬ 
tions  to  a  user  caused  by  messages  being  delivered  at  the  wrong 
time  (for  example  mobile  telephone  calls  being  received  in  a 
meeting, or a PDA beeping to indicate an appointment in the middle
of a conversation). In the system, users wore a microphone
and shoulder-mounted loudspeakers that provided a basic planar
3D audio environment (see section on three-dimensional sound)
through  which  the  audio  was  presented.  A  clock  face  metaphor 
was  used  with  12:00  in  front  of  the  user’s  nose,  3:00  by  the  right 
ear,  6:00  directly  behind  the  head,  and  so  forth.  Messages  were 
then  presented  in  the  position  appropriate  to  the  time  that  they 
arrived.  The  advantage  of  the  3D  audio  presentation  (as  described 
above)  is  that  it  allows  users  to  listen  to  multiple,  simultaneous 
sound  streams  and  to  distinguish  and  separate  each  one  (the 
cocktail party effect shows that listeners can attend to one stream of
sound  amongst  many,  but  also  monitor  the  others  in  case  they 
need  attention;  Arons,  1992). 
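
The clock face metaphor amounts to a simple conversion from arrival time to a direction around the listener's head. The sketch below assumes a 12-hour face with 0 degrees straight ahead and angles increasing clockwise (so 90 degrees is the right ear); the function name arrival_azimuth is hypothetical and the actual nomadic radio implementation is not described in the text above.

```python
def arrival_azimuth(hour, minute):
    """Return the azimuth in degrees for a message arrival time.

    0 = straight ahead (12:00), 90 = right ear (3:00), 180 = behind (6:00).
    The hour hand sweeps 360 degrees in 12 hours, i.e. 30 degrees per hour.
    """
    return ((hour % 12) + minute / 60.0) * 30.0


front = arrival_azimuth(12, 0)
right = arrival_azimuth(3, 0)
behind = arrival_azimuth(6, 0)
```

A spatial audio renderer could then place each message at the returned angle in the horizontal plane.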

The  system  used  a  context-based  notification  strategy  that 
dynamically  selected  the  appropriate  notification  method 
based  on  the  user’s  focus  of  attention.  Seven  levels  of  audi¬ 
tory  presentation  were  used  from  silent  to  full  speech  ren¬ 
dering.  If  the  user  was  engaged  in  a  task,  then  the  system  was 
silent  and  no  notification  of  an  incoming  call  or  message 
would  be  given  (so  as  not  to  cause  an  interruption).  The  next 
level  used  ambient  cues  (based  on  auditory  icons)  with 
sounds  like  running  water  indicating  that  the  system  was  op¬ 
erational.  These  cues  were  designed  to  be  easily  habituated 
but  to  let  the  user  know  that  the  system  was  working.  The 
next  level  was  a  more  detailed  form  of  auditory  cue,  giving 
information  on  system  events,  task  completions,  and  mode 
transitions.  For  example,  a  ringing  telephone  sound  was  used 
to  indicate  the  arrival  of  voicemail.  These  were  more  attention 
grabbing  cues  than  the  ambient  cues  and  would  only  be 
played  if  the  user  was  not  fully  occupied.  The  next  four  levels 
of  cue  used  speech,  expanding  from  a  simple  message  sum¬ 
mary  up  to  the  full  text  of  a  voicemail  message.  These  might 
be  used  if  the  person  wearing  nomadic  radio  was  not  involved 
in  tasks  that  required  detailed  attention.  The  system  at¬ 
tempted  to  work  out  the  appropriate  level  to  deliver  the  no¬ 
tifications  by  listening  to  the  background  audio  level  near  the 
user  (using  the  built-in  microphone)  and  by  determining 
whether  the  user  was  speaking.  For  example,  if  the  user  was 
speaking, the system might use an ambient cue so as not to
interrupt the conversation. Users could also press a button on
the device to indicate they were busy and turn it to silent.
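
The context-based strategy described above can be sketched as a small decision function. Only the seven levels, the busy button, and the use of an ambient cue while the user is speaking come from the description; the names SPEECH_2 and SPEECH_3, the 60 dB threshold, and the ordering of the remaining rules are assumptions made for illustration and are not the published algorithm.

```python
from enum import IntEnum


class Level(IntEnum):
    """Seven presentation levels, from silent to full speech.

    Only the end points of the four speech levels are named in the text;
    SPEECH_2 and SPEECH_3 are placeholder names.
    """
    SILENT = 0
    AMBIENT = 1
    AUDITORY_CUE = 2
    SPEECH_SUMMARY = 3
    SPEECH_2 = 4
    SPEECH_3 = 5
    SPEECH_FULL = 6


def choose_level(busy_pressed, user_speaking, background_db, noisy_db=60.0):
    """Pick a notification level from simple context cues (assumed rules)."""
    if busy_pressed:
        return Level.SILENT          # the user explicitly asked for silence
    if user_speaking:
        return Level.AMBIENT         # do not interrupt a conversation
    if background_db >= noisy_db:
        return Level.AUDITORY_CUE    # noisy surroundings: short cue, no speech
    return Level.SPEECH_SUMMARY      # quiet and idle: speech is acceptable


in_conversation = choose_level(False, True, 40.0)
```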

Three-dimensional  sound  has  been  combined  with  gestures 
to  create  interactions  where  users  can  point  at  sound  sources  to 
choose  them.  An  early  example  was  from  Cohen  (1993)  who 
created  audio  windows  that  users  could  manipulate  with  ges¬ 
tures,  much  as  windows  on  a  desktop  computer  could  be  con¬ 
trolled.  Brewster,  Lumsden,  Bell,  Hall,  and  Tasker  (2003)  made 
this  idea  mobile  and  created  a  soundscape  of  audio  sources 
around  a  listener’s  head  that  presented  different  types  of  infor¬ 
mation.  Users  nodded  at  a  sound  source  of  interest  to  select  it. 
A  simple  study  showed  that  users  could  walk  and  nod  to  select 
items,  but  that  there  were  many  issues  with  sound  placement 
and  feedback.  Further  study  by  Marentakis  and  Brewster  (2004) 
looked  at  different  types  of  gestures  and  feedback  to  improve 
the  quality  of  mobile  3D  audio  interactions. 

There  has  been  much  work  in  the  area  of  notification  systems 
using  audio  for  ambient  displays.  Carefully  designed  nonspeech 
audio  can  grab  attention  and  then  fade  into  the  background. 
An early example was Audio Aura by Mynatt, Back, Want, Baer,
and  Ellis  (1998),  which  aimed  “to  provide  serendipitous  infor¬ 
mation,  via  background  auditory  cues,  that  is  tied  to  people’s 
physical actions in the workplace” (p. 566). In a similar way to
nomadic radio, Audio Aura used auditory icons to provide background
information that did not distract users.

The  system  used  active  badges  so  that  the  location  of  users 
could  be  identified  and  appropriate  audio  cues  given,  along 
with  wireless  headphones  so  that  users  could  hear  the  sounds 
without  distracting  others.  The  location  information  from  the 
active  badges  was  combined  with  other  data  sources  such  as  on¬ 
line  calendars  and  e-mail.  Changes  in  this  information  triggered 
audio  cues  sent  to  the  user  through  the  headphones. 

Here  are  some  examples  of  how  the  system  might  be  used.  In 
the first the user might go to the office coffee room and as he or she
enters  the  room  hear  information  about  the  number  and  type  of 
e-mail  messages  currently  waiting.  This  would  give  the  user  a  cue  as 
to  whether  to  stay  and  talk  to  colleagues  or  go  back  to  the  office 
to  answer  the  messages.  In  the  second  example,  a  user  goes  to  a 
colleague’s  office  but  the  occupant  is  not  there.  Audio  Aura  would 
play sounds indicating if the occupant has been in recently or away
for  a  longer  period.  The  authors  were  keen  to  make  sure  the 
sounds  were  not  distracting  and  attention  grabbing — they  were 
meant  to  give  background  information  and  not  to  be  alarms.  To  this 
end,  great  care  was  taken  with  the  cue  design.  They  attempted  to 
design  sonic  ecologies — groups  of  sounds  that  fitted  together  into 
a  coherent  whole.  For  example,  one  set  of  cues  was  based  on  a 
beach  scene.  The  amount  of  new  e-mail  was  mapped  to  seagull 
cries:  the  more  mail  the  more  the  gulls  cried.  Group  activity  levels 
were  mapped  to  the  sound  of  surf:  the  more  activity  going  on 
within  the  group  the  more  active  the  waves  became.  These  cues 
were very subtle and did not grab users’ attention, but some learning
of the sounds would be needed, as they are quite abstract.

This  work  was  taken  on  and  implemented  in  a  realistic  envi¬ 
ronment  by  Kilander  and  Lonnqvist  (2001).  They  created  a 
Weakly Intrusive Ambient Soundscape (WISP), where states in
the  computational  or  physical  ubiquitous  computing  environ¬ 
ment  are  presented  as  subtle,  nonintrusive  sound  cues  based 
on  auditory  icons,  with  each  cue  “sufficiently  nonintrusive  to  be 
accepted  without  disturbing  the  focus  of  the  task  at  hand,  while 
distinctive  enough  to  be  separable  from  other  cues”  (Kilander 
& Lonnqvist, 2002, p. 1). They describe a meeting room scenario
where  devices  such  as  handheld  computers,  public  PCs,  and 
clocks  might  all  be  able  to  make  sounds  and  give  cues  for  ambi¬ 
ent  awareness.  The  level  of  intrusiveness  of  the  sounds  could 
be  varied.  For  low  intrusiveness,  a  quiet  sound  with  lots  of  reverb 
was  played,  making  the  cue  sound  far  away  and  almost  inaudible; 
for  high  intrusiveness,  sharp  sounds  with  no  reverb  were  played. 
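
The relationship between intrusiveness, loudness, and reverberation can be sketched as a single function. The direction of the mapping follows the description above; the 0 to 1 scales, the linear form, the floor gain of 0.1, and the name cue_parameters are assumptions and do not come from the WISP papers.

```python
def cue_parameters(intrusiveness):
    """Return (gain, reverb_mix), each in 0..1, for an intrusiveness in 0..1.

    Low intrusiveness: quiet and heavily reverberated, so the cue sounds distant.
    High intrusiveness: loud and dry, so the cue sounds sharp and close.
    """
    x = min(1.0, max(0.0, intrusiveness))  # clamp out-of-range requests
    gain = 0.1 + 0.9 * x                   # assumed floor keeps the cue audible
    reverb_mix = 1.0 - x                   # fully wet at 0, fully dry at 1
    return gain, reverb_mix


quiet_gain, quiet_reverb = cue_parameters(0.0)
sharp_gain, sharp_reverb = cue_parameters(1.0)
```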

One problem with the system was choice of sounds; users
could be detected by the environment, and their personal sound
mappings and parameters then chosen. However, these mappings
could  conflict  with  others’  choices;  for  example,  two  users 
might  use  the  same  sound  cue  for  different  events.  This  could 
be  solved  to  some  extent  with  local  presentation  of  the  sounds, 
as  in  nomadic  radio,  but  global  cues  would  be  more  of  a  prob¬ 
lem  for  these  types  of  systems. 


CONCLUSIONS 


Research into the use of nonspeech sounds for information display
at the user interface began in the early 1990s, and there has
been rapid growth since then. It has shown its benefits in a wide
range  of  different  applications  from  systems  for  blind  people  to 
ubiquitous  computing.  There  are  many  good  examples  that  de¬ 
signers  can  look  at  to  see  how  sounds  may  be  used  effectively, 
and  design  guidelines  are  now  starting  to  appear. 

Two  areas  are  likely  to  be  important  in  its  future  growth.  The 
first  is  in  combining  sound  with  other  senses  (vision,  tactile, 
force-feedback,  etc.)  to  create  multimodal  displays  that  make 
the most of all the senses available to users. This is an area
ripe  for  further  investigation,  and  many  interesting  interaction 
problems  can  be  tackled  when  multiple  senses  are  used  to¬ 
gether.  Key  questions  here  are  around  what  sound  is  best  for 
and  how  to  combine  it  with  the  others  most  effectively.  The 
second  area  in  which  nonspeech  sound  has  a  large  part  to  play 
is  with  mobile  and  wearable  computing  devices  (again  also  in  a 
multimodal  form).  Small  screens  cause  many  difficult  presenta¬ 
tion  problems,  and  this  is  exactly  the  situation  where  sound  has 
many  advantages — it  does  not  take  up  any  precious  screen 
space,  and  users  can  hear  it  even  if  they  cannot  look  at  their  de¬ 
vice.  In  a  ubiquitous  setting,  there  may  not  even  be  a  screen  at 
all,  and  sound  can  provide  information  on  the  services  avail¬ 
able  in  a  particular  environment  in  a  nonintrusive  way. 
INTRODUCTION


In  some  ways  this  chapter  could  be  seen  as  redundant  in  an  HCI 
book — surely  networks  are  just  an  implementation  mechanism, 
a  detail  below  the  surface;  all  that  matters  are  the  interfaces  that 
are  built  on  them.  On  the  other  hand,  networked  interfaces, 
especially  the  web — and,  increasingly,  mobile  devices — have 
changed  the  way  we  view  the  world  and  society.  Even  those  bas¬ 
tions  of  conservatism,  the  financial  institutions,  have  found 
themselves in a sea change. Is a complete restructuring of
the fundamentals of businesses just an implementation detail?

Structure 

The  chapter  will  begin  with  a  brief  overview  of  types  of  net¬ 
works  and  then  deal  with  network-based  interaction  under  four 
main  headings: 

•  Networks  as  Enablers — things  that  are  only  possible  with 
networks 

•  Networks  as  Mediators — issues  and  problems  because  of 
networks 

•  Networks  as  Subjects — understanding  and  managing 
networks 

•  Networks  as  Platforms — algorithms  and  architectures  for 
distributed  interfaces 

In  addition,  there  will  be  a  section  taking  a  broader  view  of 
the  history  and  future  of  network  interaction  and  the  societal 
effects  and  paradigm  changes  engendered,  especially  by  more 
recent  developments  in  global  and  wireless  networking. 


ABOUT  NETWORKS 


The  word  network  will  probably  make  many  think  of  accessing 
the  Internet  and  the  web.  Others  may  think  of  a  jumble  of  Ether¬ 
net  wires  between  the  PCs  in  their  office.  In  fact,  the  range  of  net¬ 
working  standards,  including  physical  cabling  (or  lack  of  cabling) 
and  the  protocols  that  computers  use  to  talk  along  those  cables, 
is  extensive.  Although  most  of  the  wire-based  networks  have 
been  around  for  some  time,  they  are  in  a  state  of  flux,  due  to  in¬ 
creases  in  scale  and  the  demands  of  continuous  media.  In  the 
wireless  world  things  are  changing  even  more  rapidly  with  two 
new  generations  of  data  service  being  introduced  over  the  next 
two  years. 


As  an  aid  to  seeing  the  broader  issues  surrounding  these  chang¬ 
ing  (and,  in  some  cases,  potentially  ephemeral)  technologies,  we 
can  use  the  following  two  dimensions  to  classify  them: 

•  Global vs. Local
How spatially distant are the points connected, ranging from
machines in the same room (IrDA, Bluetooth), through those in a
building/site (LAN), to global networks (Internet, mobile
phone networks)?

•  Fixed  vs.  Flexible 
How  permanent  are  the  links  between  points  of  the  network, 
from  physically  fixed  machines,  to  self-reconfiguring  devices 
that  recognize  other  devices  in  their  vicinity? 

The  fixed  vs.  flexible  dimension  is  almost,  but  not  quite,  ter¬ 
restrial  vs.  wireless.  The  “not  quite”  is  because  fixed  networks 
increasingly  involve  wireless  links.  Also,  it  is  often  possible, 
when  visiting  another  organization,  to  plug  a  portable  computer 
into  a  (wired)  Ethernet  network  and  find  you  have  access  to 
the  local  printers,  Internet  connections,  and  so  on — flexible 
wire-based  networking. 

Let’s  look  at  a  few  network  technologies  against  these  di¬ 
mensions.  Traditional  office  LANs  (local  area  networks)  are 
squarely  in  the  local-fixed  category,  whereas  the  Internet  is 
largely  global-fixed.  Corporate  WANs  (wide  area  networks), 
connecting  offices  within  the  same  national  or  international 
company,  sit  somewhere  between. 

Mobile  phones  have  been  placed  within  the  global-fixed 
category  as  well.  This  may  seem  strange;  the  phone  can  go  any¬ 
where.  However,  the  interconnections  between  phones  are 
fixed  and  location-independent.  If  two  mobile  phones  are  in 
the  same  room  it  is  no  easier  to  connect  between  them  than  if 
they were at opposite ends of the earth (bar a shorter lag time
perhaps).

Similarly,  the  Internet,  although  increasingly  accessible 
through  mobile  devices  and  phones,  is  largely  based  on  fixed 
domain  names,  IP  numbers,  and  URLs. 

Given  the  ideas  that  the  placement  of  mobile  phones  is  a  lit¬ 
tle  ambiguous  and  it  is  possible  to  detect  the  location  of  phones 
and  thus  deliver  location-based  content,  some  of  the  phone 
technologies  have  been  listed  in  the  global-flexible  category. 
These  include  GSM  (Global  System  for  Mobile  Communica¬ 
tions),  GPRS  (General  Packet  Radio  Service),  and  3G.  There  is 
obviously  a  steadily  increasing  data  rate,  and  third-generation 
services are able to cope with heavy media content including
live  video;  in  fact,  the  “killer  app”  is  live  sports  highlights!  How¬ 
ever,  the  most  significant  differences  are  the  charging  and  con¬ 
nectivity  model.  With  GSM  you  connect  when  required  to  the 
Internet  and  this  is  treated  like  any  other  telephone  call,  usu¬ 
ally  meaning  pay-per-minute  whilst  connected.  In  contrast,  sec¬ 
ond-  and  third-generation  services  are  based  on  sending  small 
packets of data (the P in the GPRS acronym). The connection
to  the  Internet  is  treated  as  “always  on”  and  packets  of  data  are 
sent  to  or  from  the  phone  as  required.  Charging  is  also  typically 
by  data  use  or  by  fixed  charge. 

In  the  local-flexible  category  there  is  a  host  of  existing  and 
emerging  technologies.  At  the  most  mundane  are  the  now  ubiq¬ 
uitous  WiFi  networks  and  hotspots  (based  on  the  802.11  proto¬ 
col)  (IEEE,  2001).  These  merely  treat  the  machine  the  same  as  if 
it  were  plugged  into  the  local  fixed  network.  At  a  more  local 
scale, infrared (IrDA) enabled devices can talk to one another if
their  infrared  sensors  are  within  line-of-sight,  and  Bluetooth 
(Bluetooth,  2001)  or  emerging  wireless  technologies  such  as 
ZigBee  or  UWB  (Ultra  Wide  Band)  (Zigbee,  2006;  WiMedia, 
2006)  allow  flexible  connections  between  personal  devices. 
With  these,  a  laptop  can  use  a  mobile-phone  modem,  or  a  Blue¬ 
tooth  hands-free  headset  can  connect  to  a  phone  without  hav¬ 
ing  to  plug  in  with  a  piece  of  wire. 

These  same  technologies  can  also  be  used  to  establish  local 
connections  with  printers  or  other  devices  or  even  track  peo¬ 
ple  using  the  unique  addresses  that  are  often  broadcast  contin¬ 
ually.  Thus  they  offer  both  the  opportunities  of  accessing  fixed 
public  equipment  through  personal  devices,  as  well  as  the 
threat  of  surveillance  and  hacking  everywhere! 

Finally,  research  in  wearable  computers  has  suggested  using 
the  body  itself  as  the  connection  between  worn  devices  in  a  per¬ 
sonal  area  network  (PAN)  (Zimmerman,  1996).  The  future  is 
networked  and  we  will  become  the  network. 

On  the  whole  we  have  seen  in  the  last  10  years  the  main  fo¬ 
cus  of  network-based  interaction  has  moved  anti-clockwise  in 
this  picture  from  fixed/local  networks  (mainly  LAN),  through 
fixed  global  networks  (the  Internet  and  web  explosion), 
through  global  mobile  networks  (mostly  phone-based,  but  in¬ 
cluding  WAP  (Wireless  Application  Protocol),  i-mode,  etc.),  and 
moving  towards  flexible  local  connections  between  devices.  In 
both  the  local  and  global  spaces  there  has  also  been  a  growth 
of less centrally controlled networking with peer-to-peer services
establishing  decentralized  applications  over  the  Internet,  and 
wireless  ad-hoc  networks  allowing  machines  to  establish  net¬ 
works  with  no  fixed  infrastructure. 


NETWORKS  AS  ENABLERS 


Things  That  Are  Only  Possible  With  Networks 

It  can  be  the  case  that  the  network  is  no  more  than  an  imple¬ 
mentation  detail;  for  example,  using  a  networked  disk  rather 
than  a  local  one.  However,  there  are  also  many  applications,  like 
videoconferencing,  which  are  only  possible  because  the  net¬ 
work  is  there.  The  key  feature  of  networks  is  the  access  to  re¬ 
mote  resources  of  some  kind  or  other. 


Remote  Resources 

Four  kinds  of  remote  things  are  made  accessible  by  networks: 

•  People 

•  Physical  things 

•  Data 

•  Computation 

These  things  may  be  remote  because  they  are  far  away  from 
where  you  normally  are,  or  because  you  are  yourself  on  the  move 
and  hence  away  from  your  own  resources  (colleagues,  databases, 
etc.).  Hence,  mobility  can  create  a  need  for  any  or  all  the  above. 

People 

Networks  mean  we  can  communicate  and  work  with  others 
in  distant  places.  This  is  often  a  direct  action,  such  as  e-mailing 
someone  or  engaging  in  a  videoconference.  These  are  all  the 
normal  study  of  CSCW  (Computer-Supported  Cooperative 
Work)  and  groupware  (see  chapter  29) . 

Interaction  with  remote  people  may  also  be  indirect.  Rec- 
ommender  systems  gather  information  about  people’s  prefer¬ 
ences  and  use  this  to  suggest  further  information,  services,  or 
goods  based  on  their  own  preferences  and  those  of  others  who 
have  similar  tastes  (Resnick  &  Varian,  1997).  Because  the  people 
making  recommendations  are  in  different  locations,  the  data  on 
who  selected  what  must  be  stored  centrally.  If  you  have  been 
suggested  books  at  Amazon.com,  you  have  experienced  a  rec- 
ommender  system. 

Collaborative  virtual  environments  also  offer  the  ability  for  re¬ 
mote  people  to  interact,  but  by  embedding  them  within  an  ap¬ 
parently  local  virtual-reality  world.  Although  the  person  you  are 
dealing  with  may  be  half  a  world  away,  her  avatar  (a  virtual  pres¬ 
ence,  perhaps  a  cartoon  character,  photo,  or  robot-like  creature) 
may  seem  only  a  few  yards  or  meters  away  in  the  virtual  world. 

Physical  Things 

We  can  also  view  and  control  remote  things  at  a  distance.  For 
example,  live  web  cams  in  public  places  allow  us  to  see  things 
(and  people)  there.  Similarly,  the  cameras  mounted  around 
rockets  as  they  prepare  to  take  off  (and  are  then  usually  de¬ 
stroyed  during  the  launch)  allow  the  mission  controllers  to  mon¬ 
itor  critical  aspects  of  the  physical  system,  as  do  the  numerous 
telemetry sensors, which will also be relayed via some sort of
closed network. And of course the launch command itself will be
relayed to the rocket by the same closed network, as will the
ongoing mission, perhaps the Mars robots, via wireless links.

In  the  rocket  example  it  would  be  dangerous  to  be  in  the  ac¬ 
tual  location;  in  other  circumstances  it  is  merely  expensive  or  in¬ 
convenient.  Telescopes  are  frequently  mounted  in  distant  parts 
of  the  world  where  skies  are  clearer  than  those  above  the  labo¬ 
ratories  to  which  they  belong.  In  order  to  avoid  long  interna¬ 
tional  trips  to  remote  places,  some  of  these  telescopes  now 
have  some  form  of  remote  control  and  monitoring  using  the 
Internet (Lavery, Kilgour, & Sykes, 1994).

At  a  more  personal  level  the  systems  within  certain  high-end 
cars are controlled using a within-car network (called CAN). Even
an adjustable heated seat may require dozens of control wires,
but  with  a  network  only  one  power  and  one  control  cable  is 
needed.  The  engine  management  system,  lighting  assemblies, 
radio,  CD  player,  and  wind  screen  wipers  each  have  a  small 
controller  that  talks  through  the  network  to  the  driver’s  con¬ 
sole  (although  critical  engine  systems  will  usually  have  a  sepa¬ 
rate  circuit). 

Many  household  appliances  are  now  being  made  Internet- 
ready.  In  some  cases  this  may  mean  an  actual  interface;  for  ex¬ 
ample,  an  Internet  fridge  that  can  scan  the  barcodes  of  items  as 
you  put  them  in  and  out  and  then  warn  you  when  items  are  get¬ 
ting  out  of  date,  generate  a  shopping  list  of  items  for  you,  and 
even  order  from  your  favorite  store  (Electrolux,  1999).  Others 
have  instead,  or  in  addition,  connectivity  for  maintenance  pur¬ 
poses,  sending  usage  and  diagnostic  data  back  to  the  manufac¬ 
turer  so  that  technicians  can  organize  service  or  repair  visits  be¬ 
fore  the  appliance  fails  in  some  way. 

In  some  ways  Internet  shopping  can  also  be  seen  in  this 
light.  Whilst  at  one  level  it  is  merely  a  transfer  of  data,  the  ulti¬ 
mate  end  is  that  you  receive  physically  the  ordered  goods.  This 
interaction  with  the  remote  physical  goods  is  often  two-way  as 
you  track  its  progress,  and  sometimes  its  physical  location, 
through  a  web  interface. 

Data 

Anyone  using  the  web  is  accessing  remote  data.  Sometimes, 
data  is  stored  remotely  purely  for  convenience,  but  often  data  is 
necessarily  stored  remotely 

•  because  it  is  shared  by  many  remote  people; 

•  because  central  storage  helps  maintain  control,  security,  or 
privacy; 

•  because  it  is  used  by  a  single  user  at  different  locations  (e.g., 
web  e-mail); 

•  because  it  is  too  extensive  to  be  stored  locally  (e.g.,  large 
databases  and  thin  client). 


In  the  case  of  the  Web  the  data  is  remote  because  it  is  ac¬ 
cessed  by  different  people  at  different  locations,  the  author(s) 
of  the  material,  and  all  those  who  want  to  read  it. 

Even though the web is quite complex, we may perceive a web
page as a single entity. In fact it exists in many forms (Fig. 14.1).
The  author  of  the  page  will  typically  have  created  it  offline  on  his 
or  her  own  PC.  He  or  she  then  uploads  the  page  (which  effectively 
means  copying  it)  onto  the  web  server.  Any  changes  the  author 
makes  after  uploading  the  page  will  not  be  visible  to  the  world  un¬ 
til  it  is  next  uploaded.  When  a  user  wants  to  see  the  page  and  en¬ 
ters  a  URL  or  clicks  on  a  link,  his  browser  asks  the  web  server  for 
the  file,  which  is  then  copied  into  the  browser’s  memory  and  dis¬ 
played  to  the  user.  You  can  tell  the  browser  has  a  copy  as  you  can 
disconnect  from  the  Internet  and  still  scroll  within  the  file.  If  you 
access  the  same  page  again  quite  soon,  your  browser  may  choose 
to  use  the  copy  it  holds  rather  than  going  back  to  the  web  server, 
again  potentially  meaning  you  see  a  slightly  out-of-date  copy  of 
the page. Various intermediaries may also keep their own cached copies,
including  web  proxies  and  firewalls. 
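
The chain of copies described above can be sketched in a few lines of Python. This is an illustrative model only: the WebServer and Browser classes, the max_age freshness window, and the explicit "now" clock are all assumptions invented for the sketch, not a description of how any particular browser decides when to reuse its copy.

```python
class WebServer:
    """Holds the uploaded copy of each page (hypothetical model)."""

    def __init__(self):
        self.pages = {}

    def upload(self, url, content):
        # Uploading copies the author's file onto the server.
        self.pages[url] = content

    def fetch(self, url):
        return self.pages[url]


class Browser:
    """Keeps its own copy and reuses it while it is still 'fresh'."""

    def __init__(self, server, max_age=60):
        self.server = server
        self.max_age = max_age   # assumed freshness window, in seconds
        self.cache = {}          # url -> (content, time fetched)

    def get(self, url, now):
        cached = self.cache.get(url)
        if cached is not None and now - cached[1] < self.max_age:
            return cached[0]     # reuse the local copy, possibly out of date
        content = self.server.fetch(url)
        self.cache[url] = (content, now)
        return content


server = WebServer()
server.upload("/index.html", "version 1")
browser = Browser(server, max_age=60)

first = browser.get("/index.html", now=0)     # fetched from the server
server.upload("/index.html", "version 2")     # the author uploads a change
second = browser.get("/index.html", now=30)   # served from the browser's copy
third = browser.get("/index.html", now=90)    # copy too old, fetched again
```

In this sketch the second request still sees the old version even though the server already holds the new one, which is the slightly out-of-date copy the text describes.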

This  story  of  copied  data  in  various  places  is  not  just  about 
the  web,  but  is  true  to  some  extent  or  other  of  all  shared  net¬ 
worked  data.  With  people  or  physical  things,  we  do  not  expect  to 
have  the  actual  person  or  thing  locally,  just  a  representation.  This 
is  equally  true  for  shared  data  except  that  the  representation  is 
so  much  like  the  “real  thing”  it  is  far  less  obvious  to  the  user. 

For  shared  networked  data,  even  the  “real  thing”  may  be 
problematic — there  may  be  no  single  “golden  copy,”  but  instead 
many  variants,  all  with  equal  right  to  be  called  “the  real  data.” 

You  don’t  even  escape  networking  issues  if  you  only  access 
data  locally  on  your  own  PC;  networking  issues  may  still  arise  if 
your  data  is  backed-up  over  the  network. 

Computation 

Sometimes  remote  computational  resources  are  accessed 
over  the  network.  The  most  obvious  example  of  this  is  large  su¬ 
percomputers.  These  have  enormous  computational  power, 
and scientists wishing to use them will often pre-book time slots
to perform particularly intensive calculations such as global
weather simulations, analysis of chemical structure, stress calculations,
and so on. Because these machines are so expensive,
programs for them are typically developed on other, less powerful
computers and then uploaded over the network when the
supercomputer is available.

FIGURE 14.1. Copies of a web page in many places.

If  the  data  required  as  input  or  output  for  the  calculation  is 
not  too  great,  “fairly  simple”  means  can  be  used  to  upload  the 
programs  and  data.  However,  some  calculations  work  on  large 
volumes  of  data;  for  example,  data  from  microwave  readings  of 
the  upper  atmosphere  to  probe  the  ozone  hole  generate  ter¬ 
abytes  (millions  of  millions  of  bytes)  of  data  per  second.  High- 
capacity  networks  are  being  created  in  many  countries  to  enable 
both  high-volume  data  for  this  sort  of  application  as  well  as  the 
expected  data  required  for  rich  media  (Foster,  2000;  Foster  & 
Kesselman,  1999;  GRID,  2001). 

The  ease  with  which  data  and  results  can  be  shipped  back 
and  forth  across  the  Internet  has  enabled  the  growth  of  web  ser¬ 
vices,  web  applications  designed  to  be  accessed  by  other  pro¬ 
grams  supplying  services  or  data.  As  well  as  more  scientific  or 
heavy  commercial  uses  of  these,  they  have  become  a  standard 
part  of  many  consumer-oriented  applications — for  example, 
“del.icio.us”  has  a  standard  API  accessible  through  the  web  al¬ 
lowing  third-party  applications  to  interact  with  it. 

Sometimes  calculations  need  to  be  performed  centrally,  not 
because  the  central  computer  is  powerful,  but  because  the  local 
device  is  a  computational  lightweight.  For  example,  one  may 
want  to  create  a  remote  analysis  package  where  engineers  in  the 
field  enter  data  into  a  PDA  or  phone  interface,  but  where  com¬ 
plex  stress  calculations  are  carried  out  on  a  small  server  back  in 
the  office.  The  data  on  materials  and  calculations  involved  may 
not  be  extensive  by  supercomputer  standards,  but  may  still  be 
too  much  for  a  handheld  device. 

Because transporting large volumes of data is not always practical,
calculations are often performed where the data is. (In performing
any computation, the program, data, and computational engine
must all be in the same place. If they aren’t together, then
one or the other must be moved or copied to bring them together
(Ramduny & Dix, 1997).) For example, when you perform a database
access, the request for the data is usually transmitted to the
database  server  as  an  SQL  query,  for  example:  “SELECT  name, 
salary  FROM  payroll  WHERE  salary  >  70000.”  In  principle  the  com¬ 
plete  contents  of  the  payroll  database  could  be  downloaded  to 
your PC and the selection of appropriate records carried out locally.
However, it would be more costly to transmit the data; hence the
calculation is effectively transmitted to the database server. In a similar
vein, Alexa allows third parties to run programs on its servers
through its Web Search Platform (AlexaWSP, 2006); this allows
the  programs  to  access  a  100-terabyte  web  crawl  that  would  be  im¬ 
practical  (and  commercially  unacceptable)  to  transfer  to  clients. 
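
The trade-off between moving the data and moving the calculation can be sketched with Python's built-in sqlite3 module. The in-memory database here merely stands in for a remote database server, and the four payroll rows are invented for illustration; the point is only to compare how many rows would have to cross the network in each case.

```python
import sqlite3

# An in-memory database stands in for the remote database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payroll (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO payroll VALUES (?, ?)",
    [("Ann", 82000), ("Bob", 45000), ("Cy", 71000), ("Di", 39000)],
)

# Option 1: move the data to the computation.
# Every row crosses the "network"; the filtering happens on the client.
all_rows = conn.execute("SELECT name, salary FROM payroll").fetchall()
local_result = [row for row in all_rows if row[1] > 70000]

# Option 2: move the computation to the data.
# Only the query text and the matching rows cross the "network".
remote_result = conn.execute(
    "SELECT name, salary FROM payroll WHERE salary > 70000"
).fetchall()

rows_moved_option_1 = len(all_rows)
rows_moved_option_2 = len(remote_result)
```

Both options give the same answer, but the second moves only the matching rows. With a real payroll table of many thousands of rows the difference would be correspondingly larger.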

Even  when  the  volume  of  data  is  not  large  or  the  frequency 
of  access  would  make  it  cost  effective  to  transmit  it,  security  or 
privacy  reasons  may  prevent  the  download  of  data.  For  example, 
some  datasets  are  available  to  search  to  a  limited  degree  on  the 
web,  but  charge  for  a  download  or  CD  of  the  complete  dataset. 
My  own  hcibook.com  site  (Dix,  Finlay,  Abowd,  &  Beale,  1998) 
is  rather  like  this,  allowing  searching  of  the  book’s  contents  on¬ 
line  and  displaying  portions  of  the  text,  but  not  allowing  a  full 
download  as  readers  are  expected  to  buy  the  book! 


For  those  who  haven’t  come  across  it,  the  SETI  (the  Search  for 
Extra-Terrestrial  Intelligence)  project  is  analyzing  radio  signals 
from  outer  space  looking  for  patterns  or  regularities  that  may 
indicate  transmissions  from  an  alien  civilization.  You  can  down¬ 
load  a  SETI  screensaver  that  performs  calculations  for  SETI  when 
you  are  not  using  your  machine.  Each  SETI  screensaver  periodi¬ 
cally  gets  bits  of  data  to  analyze  from  the  central  SETI  servers  and 
then  returns  results.  This  means  that  the  SETI  project  ends  up 
with  the  combined  computational  resources  of  many  hundreds  of 
thousands  of  PCs. 


Security  considerations  may  also  prohibit  the  distribution  of 
programs  themselves  if  they  contain  proprietary  algorithms. 
Also,  if  the  source  of  the  program  is  not  fully  trusted,  one  may 
not  want  to  run  these  programs  locally.  The  latter  is  the  reason 
that  Java  applets  are  run  in  a  software  “sandbox”  confining  the 
ability  of  the  applet  to  access  local  files  and  other  potentially  vul¬ 
nerable  resources. 

The  Search  for  Extra-Terrestrial  Intelligence  (SETI)  is  an 
interesting example of remote computation (SETI@home,
2001).  Normally,  remote  computation  involves  a  device  of  low- 
computational  power  asking  a  central  computer  to  do  work  for 
it.  In  the  case  of  SETI,  large  calculations  are  split  up  and  distrib¬ 
uted  over  large  numbers  of  not  particularly  powerful  computers. 

The  same  technique  is  used  in  “PC  farms.”  These  occur  when 
large  numbers  of  PCs  are  networked  together  to  act  as  a  form 
of  super-computer.  For  example,  in  CERN  (the  home  of  the 
web),  data  from  high-energy  collisions  may  consist  of  many 
megabytes  of  data  for  each  event,  with  perhaps  hundreds  of  sig¬ 
nificant  events  per  second  (CERN,  2001).  The  data  from  each 
event  is  passed  to  a  different  PC  which  then  performs  calcula¬ 
tions  on  the  data.  When  the  PC  finishes  it  stores  its  results  and 
then  adds  itself  back  to  a  pool  of  available  machines. 
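
The pool-of-machines idea can be sketched with Python's standard concurrent.futures module. This is only an analogy under stated assumptions: worker threads on one machine play the part of the networked PCs, and the analyse function and the event data are hypothetical stand-ins for the real per-event calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse(event):
    """Stand-in for the real per-event calculation (hypothetical)."""
    return sum(event)


events = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]

# Three worker threads play the part of three PCs in the farm.
# Each idle worker takes the next event, processes it, and then
# returns to the pool to pick up more work.
with ThreadPoolExecutor(max_workers=3) as farm:
    results = list(farm.map(analyse, events))
```

There are more events than workers, so at least one worker must handle a second event after finishing its first, just as a PC rejoins the pool of available machines.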

During  coming  years  we  are  likely  to  see  both  forms  of  re¬ 
mote  computation.  As  devices  become  smaller  and  more 
numerous,  many  will  become  simply  sensors  or  actuators  com¬ 
municating  with  central  computational  and  data  servers  (al¬ 
though  “central”  here  may  mean  one  per  room,  or  even  one  per 
body).  On  the  other  hand,  several  companies,  inspired  by  SETI, 
are  pursuing  commercial  ways  of  harnessing  the  spare,  and  usu¬ 
ally  wasted,  computational  power  of  the  millions  of  home  and 
office  PCs  across  the  world. 

Applications 

The existence of networks, particularly the global networks offered
by the Internet and mobile phone networks, has made
many new applications possible and changed others.

Several of the major application areas made possible
by  networks  are  covered  in  their  own  chapters:  groupware 
(chapter  29),  online  communities  (chapter  30),  mobile  systems 
(chapter  32),  e-commerce  (chapter  39),  telecommunications 
(chapter  40)  and,  of  course,  the  web  (chapter  37). 

In addition, networking impinges on many other areas. Handheld
devices (chapter 32) can operate alone, but are increasingly
able to interact with one another and with fixed networks via
wireless  networking.  Similarly,  wearable  computers  (chapter  33) 
are  expected  to  be  interacting  with  one  another  via  short-range 
networks,  possibly  carried  through  our  own  bodies  (which 
makes  mobile  phones  seem  positively  safe!)  and  information  ap¬ 
pliances  (chapter  38)  will  be  Internet  connected  to  allow  remote 
control  and  maintenance.  In  the  area  of  government  and  citi¬ 
zenship  (chapter  41),  terms  such  as  e-democracy  and  e-govern- 
ment  are  used  to  denote  not  just  the  technological  ability  to  vote 
or  access  traditional  government  publications  online,  but  a 
broader  agenda  whereby  citizens  feel  a  more  intimate  connec¬ 
tion  to  the  democratic  process.  Of  course,  education,  entertain¬ 
ment,  and  game  playing  are  also  making  use  of  networks. 

Throughout  the  chapter  we  will  also  encounter  broader  issues 
of  human  abilities,  especially  concerned  with  time  and  delays, 
involving  aspects  of  virtually  all  of  part  II  (human  perception,  cog¬ 
nition,  motor  skills,  etc.).  Also,  we  will  find  that  networking  raises 
issues  of  trust  and  ethics  (chapters  65  and  62)  and  of  course  the 
global  network  increases  the  importance  of  culturally  and  linguis¬ 
tically  accessible  information  and  interfaces  (chapter  23). 

Networking  has  already  transformed  many  people’s  working 
lives,  allowing  telecommunication,  improving  access  to  corpo¬ 
rate  information  whilst  on  the  move,  and  enabling  the  forma¬ 
tion  of  virtual  organizations.  Networks  are  also  allowing  whole 
new  business  areas  to  develop,  not  just  the  obvious  applications 
in  e-shopping  and  those  concerned  with  web  design. 

The  Internet  has  forced  many  organizations  to  create  parallel 
structures  to  handle  the  more  direct  connections  between  primary 
supplier  and  consumer  (disintermediation) .  This  paradoxically  is  al¬ 
lowing  more  personalized  (if  not  personal)  services  and  often  a 
focus  on  customer-supplier  and  customer-customer  communi¬ 
cation  (Siegal,  1999;  Light  &  Wakeman,  2001).  This  restructuring 
may  also  allow  the  more  flexible  businesses  to  revolutionize  their 
high  street  (or  mall)  presence,  allowing  you  to  buy  shoes  in  differ¬ 
ent  sizes,  or  use  next-day  fitting  services  for  clothes  (Dix,  2001b). 

The  complexity  of  installing  software  and  the  need  to  have 
data  available  anywhere  at  any  time  has  driven  the  nascent  ap¬ 
plication  service  provider  (ASP)  sector.  You  don’t  install  software 
yourself,  but  instead  use  software  hosted  remotely  by  providers 
who  charge  on  a  usage  rather  than  once-off  basis.  By  storing  the 
data  with  third  parties,  an  organization  can  off-load  the  majority 
of  its  backup  and  disaster-management  requirements. 

For the individual user, widespread Internet access
has enabled many PIM (personal information management)
applications such as e-mail, calendars, bookmark lists, and
address  books.  These  things  that  would  once  have  been  seen  as 
personal  are  being  not  only  accessed  via  the  web,  but  in  many 
cases  also  shared.  These  web  communities  are  no  longer  the 
province  of  geeks,  but  have  become  part  of  the  day-to-day  lives 
of  many,  engendering  whole  new  ways  of  finding  out  and  getting 
to  know,  including  social  bookmarks,  blogs,  and  photologs. 

The  “personal”  device  has  not  become  redundant  though  in 
this  web  orientation  of  applications.  As  well  as  being  an  access 
point  to  global  services,  it  is  also  a  potential  interaction  device 
for things close by. For example, in an installation by .:thePooch:.
(thePooch,  2006)  in  an  arts  event,  the  attendees  were  encour¬ 
aged  to  send  SMS  texts  to  Andrine,  a  huge  face  projected  high 
on  the  wall.  The  texts  were  analyzed  using  natural  language¬ 
processing  techniques,  and,  depending  on  the  content,  the  face 
took on different emotions: happy, sad, or shocked (Lock, Allanson,
& Phillips, 2003). The cameras in phones are also being
used  to  enable  them  to  be  used  as  location-finding  devices  (Sar- 
vas,  Herrarte,  Wilhelm,  &  Davis,  2004),  to  enable  the  embedding 
of  SpotCodes  or  other  visual  codes  in  paper  posters  (Toye  et  al., 
2004;  Semacode,  2006),  and  for  real  time  manipulation  of  large 
public  displays  (Rohs,  Sheridan,  &  Ballagas,  2004).  Some  of 
these  applications  use  local  networking  such  as  Bluetooth;  oth¬ 
ers  paradoxically  use  the  “global”  connectivity  through  SMS  or 
WAP  to  enable  local  interactions. 


NETWORKS  AS  MEDIATORS 


Issues  and  Problems  Because  of  Networks 

This  section  takes  as  a  starting  point  that  an  application  is  net¬ 
worked  and  looks  at  the  implications  this  has  for  the  user  inter¬ 
face.  This  is  most  apparent  in  terms  of  timing  problems  of  various 
kinds.  This  section  is  really  about  when  the  network  is  largely  not 
apparent,  except  for  the  unintended  effects  it  has  on  the  user. 

We’ll  begin  with  a  technical  introduction  to  basic  properties 
of  networks  and  then  see  how  these  affect  the  user  interface 
and  media  delivery. 

Network  Properties 

Bandwidth  and  Compression 

The  most  commonly  cited  network  property  is  bandwidth,  how 
much  data  can  be  sent  per  second.  Those  who  have  used  dial¬ 
up  connections  will  be  familiar  with  56  K  modems,  and  those 
with  longer  memories  or  who  use  mobile  phone  modems  may 
recall  9.6  K  modems  or  less.  The  “K”  in  all  of  these  refers  to 
thousands  of  bits  (0/1  value)  per  second  (strictly  Kbps),  rather 
than  bytes  (single  characters)  that  are  more  commonly  seen  in 
disk  and  other  memory  sizes.  A  byte  takes  eight  bits;  taking  into 
account  a  small  amount  for  overhead,  you  can  divide  the  bits 
per  second  by  10  to  get  bytes  per  second. 

Faster  networks  between  machines  in  offices  are  more  typi¬ 
cally  measured  in  megabits  per  second  (again,  strictly  Mbps  but 
often  just  written  “M”) — for  example,  the  small  “telephone  ca¬ 
ble”  Ethernet  is  rated  at  either  10  Mbps  or  100  Mbps. 

As  numbers  these  don’t  mean  much,  but  if  we  think  about 
them  in  relation  to  real  data  the  implications  for  users  become 
apparent. 

A small word processor document may be 30 KB (kilobytes). Down a
9.6 K GSM modem this will take approximately half a minute; on a
56 K modem this is reduced to five seconds; for a 10 Mbps Ethernet
this is 30 milliseconds. A full-screen, web-quality graphic may be
300 KB, taking five minutes on a 9.6 K modem, less than a minute on
a 56 K modem, or one-third of a second on 10 Mbps Ethernet. (Note
that these are theoretical minimum times if there is nothing else
using the network.)

Note that I am using the formula:

$$T = \frac{F \times 10}{M}$$

where:

T = download time in seconds
F = size of file in bytes
10 is the number of raw bits per byte (eight data bits plus overhead)
M = modem speed in bits per second
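
The figures quoted above can be reproduced with a short Python function that applies the download-time formula (file size in bytes, times 10 raw bits per byte, divided by the link speed in bits per second). The function name and the assumption that 1 KB means 1,000 bytes are mine; the link speeds and file sizes are those used in the text.

```python
def download_time(file_bytes, bits_per_second, raw_bits_per_byte=10):
    """Theoretical minimum transfer time in seconds: T = F * 10 / M."""
    return file_bytes * raw_bits_per_byte / bits_per_second


links = {
    "9.6 K modem": 9_600,
    "56 K modem": 56_000,
    "10 Mbps Ethernet": 10_000_000,
}
files = {"30 KB document": 30_000, "300 KB graphic": 300_000}

for file_name, size in files.items():
    for link_name, speed in links.items():
        seconds = download_time(size, speed)
        print(f"{file_name} over {link_name}: {seconds:.2f} s")
```

As in the text, these are best-case numbers for an otherwise idle link; they say nothing about latency or setup time.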

Rich media, such as sound or video, puts on a greater load
still. Raw, uncompressed hi-fi quality sound needs over
200  kilobits  per  second,  and  video  tens  of  megabits  per  second. 
Happily  there  are  ways  to  reduce  this;  otherwise  digital  AV 
would  be  impossible  over  normal  networks. 

Real  media  data  has  a  lot  of  redundant  information:  areas  of 
similar  color  in  a  picture,  successive  similar  frames  in  a  video,  or 
sustained  notes  in  music.  Compression  techniques  use  this  sim¬ 
ilarity  to  reduce  the  actual  amount  of  data  that  needs  to  be  sent 
(e.g., rather than sending a whole new frame of video, it just sends
the  differences  from  the  last  frame).  Also,  some  forms  of  com¬ 
pression  make  use  of  human  perceptual  limits;  for  example,  MP3 
stores  certain  pitch  ranges  with  greater  fidelity  than  others,  as  the 
human  ear’s  sensitivity  is  different  at  different  pitches  (MPEG, 
2001);  also,  JPEG  images  give  less  emphasis  to  accurate  color  hue 
than  the  darkness/lightness  (JPEG,  2001).  Between  them,  these 
techniques  can  reduce  the  amount  of  information  that  needs  to  be 
transferred  significantly,  especially  for  richer  media  such  as  video. 
Thus  the  actual  bandwidth  and  the  effective  bandwidth,  in  terms 
of  the  sorts  of  data  that  are  transmitted,  may  be  very  different. 
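
The idea of sending only the differences between successive frames can be illustrated with a toy example. This is a deliberately simplified sketch, not how MPEG or any real codec works: a "frame" here is just a list of pixel values, and the encode_delta and apply_delta helpers are hypothetical names introduced for illustration.

```python
def encode_delta(previous, current):
    """Return only the pixels that changed, as (index, new_value) pairs."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


frame1 = [0] * 100            # a uniform 100-pixel "frame"
frame2 = list(frame1)
frame2[40:43] = [9, 9, 9]     # only three pixels change

delta = encode_delta(frame1, frame2)   # what would actually be sent
rebuilt = apply_delta(frame1, delta)   # what the receiver reconstructs
```

Here three changed pixels are sent instead of one hundred, and the receiver still recovers the second frame exactly. The saving disappears, of course, when successive frames have little in common.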

Latency  and  Start-Up 

Bandwidth  measures  how  much  data  can  be  transferred;  la¬ 
tency  is  how  long  each  bit  takes.  In  terms  of  a  highway,  band¬ 
width  would  be  the  number  of  lanes  and  latency  is  the  time  it 
takes  to  travel  the  length  of  the  highway.  The  latency  is  due  to 
two  factors.  The  first  is  the  speed  of  transmission  of  electricity 
through  wires  or  light  through  optical  networks.  This  may  seem 
insignificant,  but  for  a  beam  of  light  to  travel  across  the  Atlantic 
would  take  20  ms  and  in  practice  this  hop  takes  more  like  70  ms. 
For  satellite-based  communications,  the  return  trip  to  and  from 
a  geostationary  satellite  takes  nearly  a  second;  think  about  the 
typical  delay  you  can  hear  on  a  transcontinental  telephone  call. 
The  second  factor  contributing  to  latency  is  that  every  elec¬ 
tronic  switch  or  computer  router  has  to  temporarily  store  and 
then  decide  what  to  do  with  the  signal  before  passing  it  on  to 
the  next  along  the  chain.  Typically  this  is  a  more  major  factor 
and in practice transatlantic Internet traffic will take nearer
250 ms from source to final destination, most of which is spent
in  various  computer  centers  at  one  end  or  the  other. 

Latency is made worse by setup time. Every time you open
an Internet connection, a conversation takes place between
your computer and the machine hosting the web server:

“Hello,  are  you  there?” 

“Yes,  I’m  here.  What  do  you  want?” 

“I’d  like  to  send  you  some  data.” 

“Great.  I’m  waiting.” 

“Okay,  here  it  is,  then.” 

(This  is  called  handshaking.)  Each  turn  in  this  conversation 
involves  a  round  trip,  network  latency  on  both  outward  and  return 
paths,  and  processing  by  both  computers.  And  this  is  before  the 
web  server  proper  even  gets  to  look  at  your  request.  Similar  pat¬ 
terns  happen  as  you  dial  a  telephone  call. 

Latency  and  setup  time  are  critical  as  they  often  dominate 
the  delay  for  the  user  except  for  very  large  files  or  streaming 
audio/visual  media.  Early  web  design  advice  (given  by  those  con¬ 
cerned  about  people  with  a  slow  connection,  but  who  clearly 
had  never  used  one!)  used  to  suggest  having  only  as  much  text 
as  would  fit  on  a  single  screen.  This  was  intended  to  minimize 
the  download  time.  However,  this  ignores  setup  times.  A  long 
text  page  doesn’t  take  long  to  load  even  on  a  slow  connection, 
once  the  connection  to  the  web  server  has  been  established. 
Then  it  is  far  faster  to  scroll  in  the  browser  than  to  click  and 
wait  for  another  small  page  to  load.  A  similar  problem  is  the 
practice  of  breaking  large  images  up  into  a  jigsaw  of  small 
pieces.  There  are  valid  reasons  for  this — allowing  rollover  inter¬ 
action,  or  where  parts  of  the  image  are  of  different  kinds  (picture/ 
text) — however,  it  is  also  used  without  such  reasons  and  each 
small  image  requires  a  separate  interaction  with  the  server,  en¬ 
countering  latency  and  setup  delays. 
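
A rough model makes the cost of splitting content into many small requests concrete. The numbers are assumptions chosen for illustration: two setup round trips per connection, a one-way latency of 250 ms (the transatlantic figure quoted earlier), a 56 K modem, and 10 raw bits per byte. The model also ignores the time the user spends clicking between pages.

```python
def fetch_time(size_bytes, bits_per_second, latency_s, setup_round_trips=2):
    """Rough time in seconds to fetch one item over a fresh connection.

    Each setup turn costs a full round trip (2 * latency) before any
    data flows; the data then needs one more one-way trip plus the
    transfer time at 10 raw bits per byte.
    """
    setup = setup_round_trips * 2 * latency_s
    transfer = size_bytes * 10 / bits_per_second
    return setup + latency_s + transfer


MODEM = 56_000   # bits per second
LATENCY = 0.25   # seconds one way (assumed)

one_long_page = fetch_time(20_000, MODEM, LATENCY)
five_short_pages = 5 * fetch_time(4_000, MODEM, LATENCY)
```

The same 20,000 bytes are transferred in both cases, so the transfer time is identical. The five short pages cost four extra rounds of setup and latency, which under these assumptions adds five seconds.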

Jitter and Buffering

Suppose you send letters to a friend every three days and the
postal service typically takes two days to deliver letters (the
average latency in network terms). Your friend will receive letters
every three days, just delayed from when you sent them. Now
imagine that the postal system is a little variable: sometimes letters
take two days, but occasionally they are faster and arrive the
next day and sometimes they are slower and take three days.
You continue to send letters every three days, but if a slow letter
is followed by a fast one, your friend will receive them only one
day apart; if, on the other hand, a fast letter is followed by a slow
one, the gap becomes five days. This variability in the delay is
called “jitter.” (Note that the fast letters are just as problematic as
the slow ones; a fast letter followed by a normal-speed one still
gives a four-day gap.)

FIGURE 14.2. Bandwidth, latency, and jitter.
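
The arithmetic of the letter example can be checked with a few lines of Python. The function name arrival_gaps is a hypothetical helper; the sending interval and delays are those given in the text.

```python
SEND_INTERVAL = 3   # a letter is posted every three days


def arrival_gaps(delays):
    """Days between successive arrivals, given each letter's delay."""
    arrivals = [i * SEND_INTERVAL + d for i, d in enumerate(delays)]
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


steady = arrival_gaps([2, 2, 2, 2])       # no jitter
slow_then_fast = arrival_gaps([3, 1])     # slow letter, then a fast one
fast_then_slow = arrival_gaps([1, 3])     # fast letter, then a slow one
fast_then_normal = arrival_gaps([1, 2])   # fast letter, then a normal one
```

With a constant delay the gaps stay at three days; with variable delay the same sending pattern yields gaps of one, five, and four days respectively.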

Jitter doesn’t really matter when sending large amounts of
data, or when sending one-off messages. However, it is critical
for continuous media. If you just played video frames or sound
as they arrived, jitter would mean that the recording would
keep accelerating and slowing down (see Fig. 14.3(a) and (b)).

Jitter  can  be  partially  alleviated  by  buffering.  Imagine  that 
your  friend’s  postman  holds  back  one  letter  for  three  days  and 
then  starts  giving  letters  to  your  friend  one  every  third  day.  If 
your  mail  always  arrives  in  exactly  two  days,  the  postman  will 
always  hold  exactly  one  letter  as  mail  will  arrive  as  fast  as  he 
passes  it  on.  If  however  a  letter  arrives  quickly  he  will  simply 
hold  two  letters  for  a  few  days  and  if  it  is  slow  he  will  have  a 
spare  letter  to  give.  Your  friend’s  mail  is  now  arriving  at  a  regular 
rate,  but  the  delay  has  increased  to  (a  predictable)  five  days. 


FIGURE 14.3. (a) No jitter, no problem. (b) Jitter causes irregular perception. (Diagram labels: frames generated at fixed rate; latency varies, i.e., jitter.)

FIGURE 14.4. Buffering smooths jitter but adds delay.
Buffering in network multimedia behaves in exactly the same way, holding back a few seconds of audio/video data and then releasing it at a constant rate.
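
The postman's strategy can be sketched as a fixed playout buffer. This is a minimal illustration under stated assumptions: the function name `playout_schedule`, the three-day hold, and the sample delays are hypothetical, and real media players typically adapt the buffer size rather than fixing it.

```python
def playout_schedule(arrivals, hold, interval):
    """Release items at a constant rate after holding back the first one.

    arrivals: arrival time of each item, in sending order.
    hold:     how long the first item is held before release.
    interval: fixed spacing between releases.
    Returns (release_times, late), where late lists the indexes of items
    that had not yet arrived when their release time came round.
    """
    start = arrivals[0] + hold
    releases = [start + i * interval for i in range(len(arrivals))]
    late = [i for i, (a, r) in enumerate(zip(arrivals, releases)) if a > r]
    return releases, late


send_days = [0, 3, 6, 9, 12]   # one letter every three days
delays = [2, 1, 3, 3, 1]       # jittery post: between one and three days
arrivals = [s + d for s, d in zip(send_days, delays)]

releases, late = playout_schedule(arrivals, hold=3, interval=3)
print(releases)                                       # [5, 8, 11, 14, 17]
print(late)                                           # []
print([r - s for r, s in zip(releases, send_days)])   # [5, 5, 5, 5, 5]
```

The recipient now sees a perfectly regular three-day rhythm, at the cost of a constant five-day end-to-end delay. A delay longer than the buffer can absorb would show up as an entry in `late`.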

Reliability and Loss, Datagram, and Connection-Based Services

Virtually  all  networks  are  designed  on  the  principle  that  there 
will  be  some  loss  or  damage  to  data  en-route.  This  arises  for  var¬ 
ious  reasons;  sometimes  there  is  electrical  interference  in  a  wire, 
or  the  internal  computers  and  routers  in  the  network  may  have 
too  much  traffic  to  cope  with.  This  is  normal  and  network  soft¬ 
ware  is  built  to  detect  damaged  data  and  cope  with  lost  data. 

Because  of  this,  the  lowest  layers  of  a  network  are  assumed 
to  be  “lossy.”  Any  data  damaged  in  transit  are  discarded  and 
computers  and  hardware  en-route  can  choose  to  discard  data 
if  they  get  busy.  So  when  one  computer  sends  a  packet  of  data 
to  another  it  can  assume  that  if  the  packet  of  data  arrives  it  will 
be  intact,  but  it  may  never  arrive  at  all. 

Some  network  data,  in  particular  certain  forms  of  real-time 
multimedia  data,  are  deliberately  sent  in  this  unreliable,  message- 
at-a-time,  form  (called  “datagrams”). 

However,  it  is  usually  easier  to  deal  with  reliable  channels,  and 
so higher levels of the network create what are called connection-based services on top of the unreliable lower-level service. Internet users may have come across the term TCP/IP. IP is the name of an unreliable low-level service that routes packets of data between computers. TCP is a higher-level, connection-based service built on top of IP. The way TCP works is that the computer wanting to make a connection contacts the other and they exchange
a  few  (unreliable  IP)  messages  to  establish  the  link.  Once  the  link 
is  established,  the  sending  computer  tags  messages  it  sends  with 
sequence  data.  Whenever  the  receiving  computer  has  all  the  data 
up  to  a  certain  point  it  sends  an  acknowledgement.  If  the  sending 
computer  doesn’t  get  an  acknowledgement  after  a  certain  time 
it  resends  the  data  (Stevens,  1998,  1999). 

With TCP in its basic form, the receiving computer does not explicitly request the missing data when it notices a gap; it has to wait for the sending computer to resend, classically after the timeout. While it is awaiting the resend it cannot pass any of the later data on to the application. Notice that this means reliability is bought at the price of potential delays.
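
The way in-order delivery turns a single lost packet into a delay for everything behind it can be sketched with a toy receiver. This is a simplified model, not TCP itself: the class name `InOrderReceiver` is hypothetical, packets are numbered individually (real TCP numbers bytes), and timers and retransmission by the sender are left out.

```python
class InOrderReceiver:
    """Toy receiver that hands data to the application strictly in order."""

    def __init__(self):
        self.next_seq = 0    # lowest sequence number not yet delivered
        self.pending = {}    # out-of-order packets waiting for a gap to fill
        self.delivered = []  # what the application has actually seen

    def receive(self, seq, payload):
        """Accept one packet and return a cumulative acknowledgement.

        The acknowledgement is the next sequence number expected, meaning
        that everything below it has arrived.
        """
        if seq >= self.next_seq:
            self.pending[seq] = payload
        # Deliver as long as the next expected packet is available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq


rx = InOrderReceiver()
print(rx.receive(0, "a"))  # 1
print(rx.receive(2, "c"))  # 1  (packet 1 is missing, so "c" must wait)
print(rx.receive(3, "d"))  # 1
print(rx.delivered)        # ['a']
print(rx.receive(1, "b"))  # 4  (the resend fills the gap)
print(rx.delivered)        # ['a', 'b', 'c', 'd']
```

Packets 2 and 3 arrived promptly, yet the application could not see them until packet 1 was resent. For continuous media this stall is often worse than simply skipping the lost data, which is one reason such data may be sent as datagrams instead.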

Quality  of  Service  (QoS)  and  Reservation 

The  above  properties  are  not  just  determined  by  a  raw  net¬ 
work’s  characteristics,  such  as  the  length  of  wires,  types  of 
routers,  modems,  and  so  on.  They  are  also  affected  by  other 
traffic  and  its  volume  and  nature.  If  10  PCs  are  connected  to  a 
single  10  Mbps  network  connection  and  require  high-volume 
data  transfers  (perhaps  streaming  video),  then  there  is  only, 
on average, 1 Mbps available for each. If you are accessing a network service that requires transatlantic connections during peak hours, then intermediate routers and hubs in the network are likely to become occasionally overloaded, leading to intermittent packet loss, longer average delays, and more variability in delay, hence jitter. In the past, when the capacity of Internet backbones was lower, it was obvious in the United Kingdom when the United States “woke up” as the web ground to a crawl!

For  certain  types  of  activity,  in  particular  real-time  or  stream¬ 
ing  rich-media,  one  would  like  to  be  able  to  predict  or  guarantee 
a  minimum  bandwidth,  maximum  delay,  jitter,  and  so  forth.  These 
are  collectively  called  “quality  of  service  (QoS)”  issues  (Campbell 
&  Nahrstedt,  1997).  Some  network  protocols  allow  applications  to 
reserve  a  virtual  channel  with  guaranteed  properties;  however,  the 
most  common  large-scale  network,  the  Internet,  does  not  have 
such  guarantees — it  operates  solely  on  a  best-endeavor  basis.  Up¬ 
grades  to  the  underlying  protocol  (called  by  the  catchy  name  IPv6) 
allow  some  differentiation  of  different  types  of  traffic.  This  may  al¬ 
low  routers  to  make  decisions  to  favor  time-critical  data,  but  it  will 
still not be able to reserve guaranteed capacity. However, in practice the increased capacity of Internet backbones is allowing large-scale Voice-over-Internet services such as Skype with acceptable
end-to-end  service  (FCC,  2006;  Skype,  2006). 
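
When the network offers no guarantees, an application can at least measure the service it is actually getting. The sketch below summarizes a set of delay samples; the function name `qos_summary` and the sample figures are hypothetical, and jitter is taken here as the mean absolute difference between successive delays, which is only one of several definitions in use.

```python
from statistics import mean


def qos_summary(delays_ms, sent_count):
    """Summarize observed service quality from per-packet delays.

    delays_ms:  delays (in milliseconds) of the packets that arrived,
                in arrival order; at least two samples are needed.
    sent_count: how many packets were sent in total.
    """
    differences = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return {
        "mean_delay_ms": mean(delays_ms),
        "max_delay_ms": max(delays_ms),
        "jitter_ms": mean(differences),
        "loss_rate": 1 - len(delays_ms) / sent_count,
    }


# Five packets sent, four arrived.
summary = qos_summary([100, 120, 90, 110], sent_count=5)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

An application could compare such figures with what it needs (for example, a maximum delay for a voice call) and degrade gracefully when they are not met.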

Encryption,  Authentication,  and  Digital  Signatures 

Some  networks,  such  as  closed  office  networks,  offer  no  greater 
worries  about  security  of  information  than  talking  together  (both 
are  capable  of  being  bugged,  but  with  similar  levels  of  difficulty). 
However,  more  open  networks,  such  as  the  Internet  or  phone 
networks,  mean  that  data  is  traveling  through  a  third  party  and 
public infrastructure to get to its recipients. Increasing use of wireless devices also means that the data sent between devices can more easily be monitored or interfered with by third parties.
One  option  is  to  only  use  physically  secure  networks,  but  for  eco¬ 
nomic  reasons  this  is  often  not  an  option.  Furthermore,  solutions 
which  do  not  rely  on  the  network  itself  being  secure  are  more  ro¬ 
bust.  If  you  rely  on,  for  example,  a  private  dedicated  line  between 
two  offices  and  assume  it  is  secure,  then  if  someone  does  man¬ 
age  to  tap  into  it,  all  your  interoffice  communication  is  at  risk. 

The  more  common  approach  now  is  to  assume  the  networks 
are  insecure  and  live  with  it.  This  gives  rise  to  two  problems: 

Secrecy — how  to  stop  others  from  seeing  your  data 

Security — how  to  make  sure  data  is  not  tampered  with 

The  first  problem  is  managed  largely  by  encryption  methods, 
ensuring  that  even  if  someone  reads  all  your  communications 
they  cannot  understand  them  (Schneier,  1996).  The  “https”  in 
some  URLs  is  an  example  of  this,  denoting  that  the  communi¬ 
cation  to  the  web  server  is  encrypted. 

The  second  problem,  security,  has  various  manifestations. 
Given  communications  are  via  a  network,  how  do  you  know 
that  you  are  talking  to  the  right  person/machine?  Authentica¬ 
tion  mechanisms  deal  with  this.  In  various  ways  they  allow  one 
machine  to  verify  (usually  by  secret  information  that  can  only 
be  known  by  the  true  intended  party)  that  it  is  talking  to  the 
right  party. 

Even  if  you  know  that  you  are  talking  to  the  right  person/ 
machine,  how  do  you  know  that  the  data  you  receive  hasn’t 
been  changed?  This  is  like  receiving  a  signed  letter,  but,  unbe¬ 
knownst  to  you,  someone  has  added  some  lines  of  text  above 
the  signature — although  it  really  comes  from  the  person  you 
think,  the  message  is  not  as  was  sent.  If  data  is  being  encrypted 
then this may often implicitly solve the problem, because the encrypted data is unintelligible to a third party, who therefore cannot alter it in a meaningful way.

If secrecy is not an issue, however, encryption is an unnecessary overhead. Digital signatures can be used instead: these generate a small data block that depends on the whole of the message and on secret information known to the sender. The recipient can verify that the signature block corresponds to the data sent and to the person who is supposed to have sent it. One example of this is “signed applets,” in which the Java code is digitally signed so that you can choose to run only Java programs from trusted parties.
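
The idea of a small block that depends on the whole message and on a secret can be sketched with Python's standard `hmac` module. This is a stand-in chosen so the example needs only the standard library: an HMAC is a message authentication code based on a secret shared by both parties, whereas true digital signatures (as used for signed applets) rely on public-key cryptography so that the verifier never holds the signing secret. The helper names `sign` and `verify` and the sample messages are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, secret: bytes) -> bytes:
    """Return a short tag that depends on every byte of message and on secret."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, secret: bytes) -> bool:
    """Check that tag matches message, using a constant-time comparison."""
    return hmac.compare_digest(sign(message, secret), tag)


secret = b"shared-secret-for-illustration-only"
message = b"Pay Brian 10 pounds"
tag = sign(message, secret)

print(len(tag))                                        # 32
print(verify(message, tag, secret))                    # True
print(verify(b"Pay Brian 1000 pounds", tag, secret))   # False
```

The message itself travels in the clear; only the 32-byte tag is added. Any change to the message, such as lines added above the signature in the letter analogy, makes verification fail.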

UI Properties

Network  Transparency 

One  of  the  goals  of  many  low-level  network  systems  is  to 
achieve  transparency;  that  is,  to  make  it  invisible  to  the  user 
where  on  the  network  a  particular  resource  lies.  When  you  ac¬ 
cess  the  web  you  use  the  same  kind  of  URL  and  same  kind  of 
interface  whether  the  web  server  is  in  Arizona,  Australia,  or  Ar¬ 
menia.  I  know  that  when  I’m  at  home  and  send  an  e-mail  be¬ 
tween  two  machines  less  than  two  meters  apart,  the  message  ac¬ 
tually  goes  all  the  way  across  the  Atlantic  and  back — but  this  is 
only  because  I  have  quite  a  detailed  understanding  of  the  com¬ 
puters  involved.  As  a  user  I  press  “send  mail”  on  one  machine 
and  it  arrives  near-instantaneously  on  the  other. 

Although  network  transparency  has  many  advantages  to 
the  user — you  don’t  care  about  routes  through  the  network 
and  so  on — there  are  limits  to  its  effectiveness  and  desirability. 
Some  years  ago  I  was  at  a  Xerox  lab  in  Welwyn  Garden  City  in 
the  United  Kingdom.  Randy  Trigg  was  demonstrating  some 
new  features  of  Notecards  (an  early  hypertext  system;  Halasz, 
Moran,  &  Trigg,  1987).  The  version  was  still  under  develop¬ 
ment  and  every  so  often  would  hit  a  problem  and  a  LISP  de¬ 
bugger  window  would  appear.  After  using  it  for  a  while,  it  sud¬ 
denly  froze — no  debugger  window,  no  error  message;  it  just 
froze.  After  a  few  seconds  of  embarrassment,  Randy  hit  a  con¬ 
trol  key  and  launched  the  debugger.  A  few  minutes  of  frantic 
scanning  through  stack  dumps,  program  traces,  and  so  on, 
and  the  reason  became  clear  to  him.  He  had  demonstrated  a 
feature  that  he  had  last  used  on  his  workstation  at  Palo  Alto. 
The  feature  itself  was  not  at  fault,  but  required  an  obscure  font 
that  he  had  on  his  own  workstation,  but  not  on  the  machine 
there  in  Welwyn.  When  Notecards  had  requested  the  font  the 
system  might  have  thrown  up  an  error  window,  or  substituted 
a  similar  font.  However,  in  the  spirit  of  true  network  trans¬ 
parency,  the  location  of  the  font  should  not  matter;  having 
failed  to  find  it  on  the  local  machine,  it  proceeded  to  interro¬ 
gate  machines  on  the  local  network  to  see  if  they  had  it,  and 
then  proceeded  to  scan  the  Xerox  UK  network  and  world  net¬ 
work.  Eventually,  if  we  had  waited  long  enough,  it  would  have 
been  found  on  Randy’s  machine  in  Palo  Alto.  Network  trans¬ 
parency  rarely  extends  to  timing! 

Transparency  has  also  been  critiqued  for  CSCW  purposes 
(Mariani  &  Rodden,  1991).  It  may  well  be  very  important  to 
users where resources and people are. Mobile computing raises the same issue: an executive takes a laptop on the plane only to discover that the files needed reside on a network fileserver rather than on the machine itself. If the interface hides location, how can one predict when and where resources will be available?

Later  in  this  chapter,  we  will  discuss  recent  work  where  the 
presence  of  intermittent  connections,  limited  range,  and  variable 
signal  strength  is  being  used  as  a  deliberate  feature  in  interfaces. 

Delays  and  Time 

As  is  evident,  one  of  the  issues  that  arises  again  and  again 
when  considering  networks  is  time.  How  long  are  the  delays, 
how  long  to  transfer  data,  and  so  on?  Networking  is  not  the  only 
reason  for  delays  in  applications,  but  is  probably  one  of  the 
most noticeable; the web has often been renamed the “World-Wide Wait.” There is a long-standing literature on time and delays in user interfaces. This is not as extensive as one might
think,  however,  largely  because  for  a  long  time  the  prevailing 
perception  in  the  HCI  community  was  that  temporal  problems 
would  go  away  (with  some  exceptions),  leading  to  what  I  called 
the  “myth  of  the  infinitely  fast  machine”  (Dix,  1987). 

One  of  the  earlier  influential  papers  was  Ben  Shneiderman’s 
review of research findings on delays (Shneiderman, 1984), mainly
based  on  command-line  interfaces.  More  recently  there  have 
been  a  series  of  workshops  and  special  journal  issues  on  issues 
of  time,  sparked  largely  by  web  delays  (Johnson  &  Gray,  1996; 
Clarke,  Dix,  Ramduny  &  Trepess,  1997;  Howard  &  Fabre,  1998). 

Three  main  timescales  are  problematic  for  networked  user 
interfaces: 

100 ms        Feedback for hand-eye coordination tasks needs to be less than 100-200 milliseconds to feel fluid. This is probably related to the fact that there are delays of this length in our motor-sensory system anyway. For aural feedback, the timescales are slightly tighter again.

1 second      Timescale for apparent cause-effect links, such as popping a window after pressing a button. If the response is faster than this the effect seems “immediate.” This is related to a period of about 1 second that the brain regards as “now.”

5-10 seconds  Waits longer than this engender annoyance and make it hard to maintain task focus. This may be related to short-term memory decay.

The  100  ms  time  is  hard  to  achieve  if  the  interaction  involves 
even  local  network  traffic.  The  1  second  time  is  usually  achievable 
for local networks (as is assumed by X-Windows systems), but
more  problematic  for  long-haul  networks.  The  5-10  second  time 
is  in  principle  achievable  for  even  the  longest  transcontinental 
connections,  but,  when  combined  with  bandwidth  limitations  or 
overload  of  remote  resources,  may  become  problematic.  This  is 
especially  evident  on  web-based  services  where  the  delay  between 
hitting  a  link  and  retrieving  a  page  (especially  a  generated  page) 
may  well  exceed  these  limits,  even  for  the  page  to  begin  to  draw. 

The  lesson  for  UI  designers  is  to  understand  the  sort  of  in¬ 
teraction  required  and  to  ensure  that  parts  of  the  user  interface 
are  located  appropriately.  For  example,  if  close  hand-eye  co¬ 
ordination  is  required,  it  must  run  locally  on  the  user’s  own 
machine (in the case of the web, in an applet, in JavaScript code, and so on). If the nature of the application is such that parts of the application cannot reside close enough to the user for the type of interaction required then, of course, one should not simply have a slow version of (for example) dragging an icon around, but instead change the overall interaction style to reflect the available resources.
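
One way to make this design rule concrete is to choose the interaction style from the expected delay. The sketch below is only illustrative: the function name `interaction_style` and the wording of the styles are hypothetical, the band edges follow the three timescales above, and taking 5 seconds as the edge of the 5-10 second band is an assumption.

```python
def interaction_style(expected_delay_s):
    """Suggest an interaction style for an expected response delay in seconds."""
    if expected_delay_s <= 0.1:
        # Fast enough for hand-eye coordination.
        return "direct manipulation (e.g., dragging)"
    if expected_delay_s <= 1.0:
        # Still perceived as cause and effect.
        return "click-and-respond, feels immediate"
    if expected_delay_s <= 5.0:
        # Noticeable wait, but task focus can be kept.
        return "click-and-respond with a busy indicator"
    # Long waits: show progress and do not demand the user's attention.
    return "progress indicator, let the user do other work"


for delay in (0.05, 0.4, 3, 30):
    print(delay, "->", interaction_style(delay))
```

The point is not the exact thresholds but that the style changes with the delay, rather than the same dragging interface simply becoming slower.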

Two  of  the  factors  that  alleviate  the  effects  of  longer  delays 
are  predictability  of  the  delay  and  progress  indicators.  Both  give 
the  user  some  sense  of  control  or  understanding  over  the 
process,  especially  if  users  have  some  indication  of  expected  de¬ 
lays  before  initiating  an  action  (Johnson,  1997).  The  many  vari¬ 
able  factors  in  networked  systems  make  predicting  delays  very 
difficult,  increasing  the  importance  of  giving  users  some  sense 
of  progress.  The  psychological  effect  of  progress  indicators  is 
exploited  (cynically)  by  those  web  browsers  that  have  progress 
bars  that  effectively  lie  to  the  user,  moving  irrespective  of  any 
real  activity.  (Try  unplugging  a  computer  from  the  network  and 
attempting  to  access  a  web  page;  some  browsers  will  hit  70% 
on  the  progress  bar  before  reporting  a  problem.)  Other  net¬ 
work  applications  use  recent  network  activity  to  predict  re¬ 
maining  time  for  long  operations  (such  as  large  file  downloads). 
Other  solutions  include  generating  some  sort  of  intermediate 
low  quality  or  partial  information  while  the  full  information  is 
being generated or downloaded (e.g., progressive image formats or splash pages of Flash movies).
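
Predicting remaining time from recent network activity can be sketched as a moving-window rate estimate. The class name `EtaEstimator`, the window size, and the byte counts are hypothetical; real download managers may smooth the rate differently.

```python
from collections import deque


class EtaEstimator:
    """Estimate remaining time for a transfer from the most recent chunks."""

    def __init__(self, total_bytes, window=5):
        self.total = total_bytes
        self.done = 0
        # Only the last `window` chunks influence the rate estimate.
        self.recent = deque(maxlen=window)

    def update(self, nbytes, seconds):
        """Record that a chunk of nbytes took `seconds` to arrive."""
        self.done += nbytes
        self.recent.append((nbytes, seconds))

    def fraction_done(self):
        return self.done / self.total

    def seconds_remaining(self):
        """Return the estimated time left, or None if no rate is known yet."""
        nbytes = sum(b for b, _ in self.recent)
        seconds = sum(s for _, s in self.recent)
        if nbytes == 0 or seconds == 0:
            return None
        return (self.total - self.done) / (nbytes / seconds)


eta = EtaEstimator(total_bytes=1000, window=2)
print(eta.seconds_remaining())   # None
eta.update(100, 1.0)
eta.update(100, 1.0)
print(eta.seconds_remaining())   # 8.0
eta.update(400, 1.0)             # the network speeds up
print(eta.fraction_done())       # 0.6
print(eta.seconds_remaining())   # 1.6
```

Unlike a progress bar that moves regardless of activity, both the fraction complete and the estimate here are driven entirely by data that has actually arrived.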

For  virtual  reality  using  head-mounted  displays,  as  well  as 
hand-eye  coordination  tasks,  we  also  have  issues  of  the  coordi¬ 
nation  between  head  movements  and  corresponding  generated 
images.  The  timescales  here  are  even  tighter  as  the  sensory 
paths  are  faster  within  our  bodies,  hence  less  tolerant  of  exter¬ 
nal  delays.  The  brain  receives  various  indications  of  movement: 
the  position  and  changes  of  neck  and  related  muscles,  the  bal¬ 
ance  sensors  in  the  inner  ear,  and  visual  feedback.  Delays  be¬ 
tween  the  movement  of  the  generated  environment  and  head 
movement  lead  to  dissonance  between  these  different  senses 
and  have  an  effect  rather  like  being  at  sea,  with  corresponding 
disorientation  and  nausea.  Also,  any  delays  reduce  the  sense  of 
immersion — being  there  within  the  virtual  environment.  Early 
studies  of  virtual  reality  (VR)  showed  that  users’  sense  of  im¬ 
mersion  was  far  better  when  they  were  given  very  responsive 
wire  frame  images  than  if  they  were  given  fully  rendered  im¬ 
ages  at  a  delayed  and  lower  frame  rate  (Pausch,  1991). 

Coping  Strategies 

People  are  very  adaptable.  When  faced  with  unacceptable  de¬ 
lays  (or  other  user  interface  problems)  users  develop  ways  to 
work  around  or  ameliorate  the  problem — coping  strategies.  For 
example,  web  users  may  open  multiple  windows  so  that  they 
can  view  one  page  whilst  reading  another  (McManus,  1997)  and 
users  of  “telnet”  for  remote  command  line  interfaces  may  type 
“test” characters to see whether the system has any outstanding input (Dix, 1994).

Coping  strategies  may  hide  real  problems,  so  it  is  important 
not  to  assume  that  just  because  users  do  not  seem  to  be  com¬ 
plaining  or  failing  that  everything  is  all  right.  On  the  other  hand, 
we  can  also  use  the  fact  that  users  are  bright  and  resourceful  by 
building  interface  features  that  allow  users  to  adopt  coping 
strategies  where  it  would  be  impossible  or  impractical  to  produce 
the  interface  response  we  would  like.  For  example,  where  we 
expect delays we can ensure that continual interaction is not required (by perhaps amassing issues requiring user attention in a “batch” fashion), thus allowing users to more easily multitask.
Unfortunately,  this  latter  behavior  is  not  frequently  seen — the 
“myth”  lives  on  and  most  networked  programs  still  stop  activity 
and  await  user  interaction  whenever  problems  are  encountered. 

Timeliness of Feedback/Feedthrough, Pace

Although  feedback  is  one  of  the  most  heavily  used  terms  in 
HCI,  we  may  often  ignore  the  complex  levels  of  feedback  when 
dealing  with  near-instantaneous  responses  of  GUI  interfaces. 
In  networked  systems  with  potentially  long  delays  we  need  to 
unpack  the  concept.  We’ve  already  discussed  some  of  the  criti¬ 
cal  timescales  for  feedback.  For  hand-eye  coordination,  getting 
feedback  below  the  100  ms  threshold  is  far  more  important  than 
fidelity;  quickly  moving  wire  frames  or  simple  representations 
are  better  than  dragging  an  exact  image  with  drop  shadow. 

For  longer  feedback  cycles,  such  as  pressing  a  button,  we 
need  to  distinguish 

•  Syntactic  feedback — that  the  system  has  recognized  your 
action 

•  Intermediate  feedback — that  the  system  is  dealing  with 
the  request  implied  by  your  action  (and,  if  possible,  progress 
towards  that  request) 

•  Semantic  feedback — that  the  system  has  responded  and 
the  results  were  obtained 

The  direct-manipulation  metaphor  has  led  to  identification 
and  hence  confusion  between  these  levels,  and  many  systems 
provide  little  in  the  way  of  syntactic  or  intermediate  feedback, 
relying  solely  on  semantic  feedback.  In  networked  systems 
where  the  semantic  feedback  includes  some  sort  of  remote  re¬ 
source,  it  is  crucial  to  introduce  specific  mechanisms  to  supply 
syntactic  and  intermediate  feedback;  otherwise,  the  system  may 
simply  appear  to  have  ignored  the  user’s  action  (leading  to  re¬ 
peated  actions  with  potentially  unforeseen  consequences)  or 
even  frozen  or  crashed. 
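
The three levels can be made explicit in code that handles a user action. The sketch below is deliberately simplified and its names (`handle_click`, `notify`, `request`) are hypothetical; in a real interface the remote request would run asynchronously so that the first two notifications are displayed before the wait begins.

```python
def handle_click(request, notify):
    """Run a remote request, reporting each level of feedback separately.

    request: zero-argument callable performing the (possibly slow) remote work.
    notify:  callable taking (level, message) that updates the interface.
    """
    notify("syntactic", "button pressed")            # the action was recognized
    notify("intermediate", "contacting server...")   # the request is under way
    result = request()                               # potentially long delay
    notify("semantic", result)                       # the actual outcome
    return result


events = []
handle_click(lambda: "order saved", lambda level, msg: events.append((level, msg)))
for level, message in events:
    print(f"{level}: {message}")
```

A system that relies solely on semantic feedback would show nothing at all until `request()` returned.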

This  also  reminds  us  of  a  crucial  design  rule  for  slow  systems: 
Wherever  possible,  make  actions  idempotent — that  is,  that  in¬ 
voking  the  same  action  twice  should,  where  possible,  have  the 
same  effect  as  a  single  action.  This  means  that  the  “try  again” 
response  to  a  slow  system  does  not  lead  to  strange  results. 
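
One common way to obtain idempotence is to attach a client-generated identifier to each request, so that a repeat of the same request is recognized and ignored. This technique is an assumption of the sketch rather than something prescribed in the text, and the class `OrderService` and its method are hypothetical.

```python
import uuid


class OrderService:
    """Toy service in which placing the same order twice has no extra effect."""

    def __init__(self):
        self.orders = {}  # request_id -> item

    def place_order(self, request_id, item):
        # Only the first request with a given identifier changes the state.
        if request_id not in self.orders:
            self.orders[request_id] = item
        return request_id


service = OrderService()
request_id = str(uuid.uuid4())   # generated once, when the user first clicks

service.place_order(request_id, "book")
service.place_order(request_id, "book")   # impatient "try again"
print(len(service.orders))                # 1
```

Without the identifier, the second click would be indistinguishable from a genuine second order.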

For collaborative systems or those involving external or autonomous resources (e.g., remote controlled objects, environmental sensors, software agents), we must also consider feedthrough. Feedback is experiencing the effect of one’s own actions; feedthrough is the effect of one’s own actions on other people and things and experiencing the effects of their actions oneself. For example, in an online chat system, you type a short
message,  press  “send,”  and  your  message  appears  in  your  tran¬ 
script  (feedback);  then,  sometime  later,  it  also  appears  in  the 
transcript  of  the  other  chat  participants  (feedthrough). 

Feedback  is  needed  to  enable  us  to  work  out  whether  the 
actions  we  have  performed  are  appropriate,  hence  (typically) 
needs  to  be  much  quicker  than  feedthrough  responses.  This  is 
fortunate  as  feedthrough  by  its  very  nature  usually  requires  net¬ 
work  transmission  and  ensuing  delays.  The  exception  to  the 
rule  that  feedthrough  can  afford  to  be  slower  is  where  the  users 
are attempting to perform some close collaborative task (e.g.,
positioning  some  items  using  direct  manipulation)  or  where 
there  is  a  second,  fast  communication  channel  (e.g.,  on  the  tele¬ 
phone,  User  A  says  to  User  B,  “See  the  red  box,”  but  the  relevant 
item  hasn’t  appeared  yet  on  User  B’s  screen). 

Potentially  more  important  to  users  of  collaborative  systems 
than  bandwidth  or  even  raw  delays  is  pace,  the  rate  at  which  it 
is  possible  to  interact  with  a  remote  resource  or  person.  This  is 
partly  determined  by  lower  level  timings,  but  is  also  heavily  in¬ 
fluenced  by  interface  design.  For  example,  you  know  that  some¬ 
one  is  sitting  at  their  desk  and  send  them  an  urgent  e-mail.  The 
time  that  it  takes  to  get  a  response  will  be  hardly  affected  by  the 
raw  speeds  between  your  machine  and  your  colleague,  and 
more  determined  by  factors  such  as  how  often  the  e-mail  client 
checks  the  server  for  new  e-mail  and  whether  it  sounds  an  alert 
when  new  e-mail  arrives,  or  simply  waits  there  until  your  col¬ 
league  chooses  to  check  the  in-box. 
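
A back-of-the-envelope model shows why interface design dominates pace in the e-mail example. The function name and all figures below are hypothetical; the model assumes the message arrives at a random moment between periodic checks, so the average wait for a check is half the checking interval.

```python
def expected_notice_delay(network_delay_s, poll_interval_s,
                          alerts_user, user_check_interval_s=0.0):
    """Estimate the average time (seconds) until the recipient sees a message."""
    # Transmission time plus the average wait for the client's next poll.
    delay = network_delay_s + poll_interval_s / 2
    if not alerts_user:
        # With no alert, the message also waits for the user to look.
        delay += user_check_interval_s / 2
    return delay


# Client polls every 5 minutes and sounds an alert: about 150 seconds.
print(expected_notice_delay(0.2, 300, alerts_user=True))

# Same network, no alert, user looks at the in-box hourly: about 1950 seconds.
print(expected_notice_delay(0.2, 300, alerts_user=False,
                            user_check_interval_s=3600))
```

In both cases the 0.2-second network delay is negligible; halving it would make no perceptible difference, whereas changing the polling interval or adding an alert changes the pace dramatically.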

Race  Conditions  and  Inconsistent  Interface  States 

Alison  and  Brian  are  using  an  online  chat  program. 

Alison  writes,  “It’s  a  beautiful  day.  Let’s  go  out  after  work,” 
and  then  begins  to  think  about  it. 

Brian  writes,  “I  agree  totally,”  and  then  has  to  leave  to  go  to 
a  meeting. 

At  almost  the  same  time  Alison  writes,  “Perhaps  not;  I  look 
awful  after  the  late  party.” 

Unfortunately  the  messages  are  so  close  to  simultaneous 
that both Alison and Brian’s machines put their own contribution first, so Alison sees the chat window as in Fig. 14.5(a) and Brian sees it as in Fig. 14.5(b). Brian thinks for a few moments,
and  then  writes,  “No,  you  look  lovely  as  ever,”  but  unfortunately 
Alison  never  sees  this  as  she  took  one  look  at  Brian’s  previous 
remark  and  shut  down  the  chat  program. 

This  type  of  incident  where  two  events  happen  so  close  to¬ 
gether  that  their  effects  overlap  is  called  “a  race  condition.”  Race 
conditions  may  lead  to  inconsistent  states  for  users  as  in  this 
example,  or  may  even  lead  to  the  software  crashing.  Although  in 
principle  race  conditions  are  possible  however  fast  the  under¬ 
lying  network,  the  likelihood  of  races  occurring  gets  greater  as 
the  network  (and  other)  delays  get  longer. 
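
The chat incident can be reproduced in a few lines. The sketch first shows each client placing its own message ahead of the one still in transit, and then one possible remedy: ordering by an agreed key. The remedy, the function names, and the logical `time` field are assumptions made for illustration, not the mechanisms the chapter goes on to discuss.

```python
def local_first_view(own_message, remote_message):
    """Transcript order on a client that shows its own message immediately
    and the other person's message only when it arrives over the network."""
    return [own_message, remote_message]


def agreed_view(messages):
    """Transcript order that is the same on every client: sort by a logical
    time, breaking ties by sender name."""
    return sorted(messages, key=lambda m: (m["time"], m["sender"]))


# Both messages are written at (logically) the same moment.
alison = {"sender": "Alison", "time": 2,
          "text": "Perhaps not; I look awful after the late party."}
brian = {"sender": "Brian", "time": 2, "text": "I agree totally"}

alison_sees = local_first_view(alison, brian)
brian_sees = local_first_view(brian, alison)
print(alison_sees == brian_sees)                             # False

print(agreed_view(alison_sees) == agreed_view(brian_sees))   # True
```

Note that an agreed order only guarantees that both users see the same transcript. It does not record which message Brian was replying to, so the misunderstanding itself could still occur; it would simply be visible to both parties.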


(a) Alison's chat window (Fig. 14.5):

Alison: It's a beautiful day. Let's go out after work.
Alison: perhaps not, I look awful after the late party
Brian: I agree totally


Even  some  of  the  earliest  studies  in  collaborative  systems  have 
shown  the  disorienting  effects  of  users  seeing  different  views  of 
their  shared  information  space,  even  when  this  is  simply  a  mat¬ 
ter  of  seeing  different  parts  of  the  same  space  (Stefik,  Bobrow, 
Foster, Lanning, & Tatar, 1987).

Consistency  becomes  an  even  greater  problem  in  mobile  sys¬ 
tems  where  wireless  connections  may  be  temporarily  lost,  or  de¬ 
vices  may  be  unplugged  from  fixed  networks  whilst  on  the  move. 
During  these  periods  of  disconnection,  it  is  easy  for  several  peo¬ 
ple  to  be  updating  the  same  information,  leading  to  potential 
problems  when  their  devices  next  become  network-connected. 

Later  in  this  chapter,  we  will  discuss  mechanisms  and  algo¬ 
rithms  that  can  be  used  to  maintain  consistency  even  when  de¬ 
lays  are  long  and  race  conditions  are  likely  to  occur. 

Awareness 

Returning  to  Alison  and  Brian,  after  Brian  has  typed  his 
response  he  may  not  know  that  Alison  hasn’t  seen  his  second 
contribution. 

Awareness  of  who  is  around  and  what  they  are  doing  is  a 
major  issue  in  CSCW  (see  Dourish  &  Bellotti  (1992)  and  McDaniel 
&  Brinck  (1997)).  It  has  various  forms: 

•  Being  able  to  tell  easily,  when  you  want  to  know,  what  other 
people  are  doing; 

•  Being  made  aware  (via  alerts,  very  salient  visual  cues,  etc.) 
when  significant  events  occur  (e.g.,  new  user  arrives,  some¬ 
one  makes  a  contribution,  etc.); 

•  Having  a  peripheral  awareness  of  who  is  around  and  what 
they  are  up  to. 

Awareness  isn’t  just  about  other  people.  In  any  circumstance 
where  the  environment  may  change,  but  not  through  your  own 
direct  action,  you  may  need  to  know  what  the  current  state  is 
and  what  is  happening.  This  is  not  confined  to  networked  ap¬ 
plications,  but  applies  to  any  hidden  or  invisible  phenomena, 
for  example,  background  indexing  of  your  hard-disk  contents. 
In  networked  applications,  anything  distant  is  invisible  unless  it 
is  made  visible  (audible)  in  the  interface. 

One  of  the  earlier  influential  experiments  to  demonstrate 
the  importance  of  peripheral  awareness  was  ArKola  (Gaver, 
Smith,  &  O’Shea,  1991).  This  was  a  simulated  bottling  factory 


(b) Brian's chat window:

Alison: It's a beautiful day. Let's go out after work.
Brian: I agree totally
Alison: perhaps not, I look awful after the late party

FIGURE 14.5. Consistency breakdown. (a) Alison's chat window. (b) Brian's chat window.




where  two  people  worked  together  to  maintain  the  factory,  sup¬ 
plying,  maintaining,  and  so  on.  The  participants  couldn’t  see  the 
entire factory at once, so relied on the sounds produced to be aware of its smooth running or of any problems. For example, the sound of breaking glass might suggest that the end
of  the  production  line  has  run  out  of  crates,  but  if  it  immediately 
stopped,  one  would  assume  that  the  other  participant  had 
sorted  the  problem  out. 

The  numerous  forms  of  shared  video  and  audio  spaces  are 
another  example  of  this.  Several  people,  usually  in  distant  offices, 
establish  long-term,  always-on  audio,  video,  or  audio-video  links 
between their offices (Buxton & Moran, 1990; Olson & Bly, 1991). Sometimes these are used for direct communication, but
most  of  the  time  they  just  give  a  peripheral  awareness  that  the 
other  person  is  there  and  the  sort  of  activity  he  or  she  is  doing. 
This  can  be  used  for  functional  purposes  (e.g.,  knowing  when 
the other person is interruptible), but also for social purposes:
feeling  part  of  a  larger  virtual  office.  Other  systems  have  allowed 
larger  numbers  of  (usually  deliberately  low  quality  and  so  less 
intrusive)  web  cam  views  of  colleagues’  offices  and  shared  areas 
(Roussel,  1999,  2001).  The  aim  is  the  same:  to  build  social  co¬ 
hesion,  to  allow  at-a-glance  reading  of  one  another’s  situation, 
and  to  promote  “accidental”  encounters. 

A  form  of  awareness  mechanism  is  now  common  on  the  web 
with  “buddy  lists”  that  tell  you  when  friends  are  online  (ICQ, 
2001).  I  still  know  of  no  examples  of  rich  media  experiments  in 
a  domestic  environment,  for  example  a  virtual  kitchen-share  with 
your  elderly  mother  in  Minnesota.  However,  various  forms  of 
domestic  sharing  via  the  Internet  are  becoming  more  common, 
including the early Casablanca project at Interval (Hindus, Mainwaring, Leduc, Hagstrom, & Bayley, 2001) using shared electronic sketchpads in the home, and more recent projects (CASIDE, 2005; Taylor, Izadi, Swan, Harper, & Buxton, 2006). Some of
these  applications  are  fairly  standard  “computer”  interfaces; 
some  are  soft  surveillance,  such  as  monitoring  an  elderly  relative; 
however,  there  is  also  a  stream  of  research  looking  at  more  inti¬ 
mate  ways  of  sharing  presence,  including  pads  that  glow  or  warm 
when  a  loved  one  touches  them  far  away  or  Jenny  Tillotson’s 
“Scent Whisper,” a pair of Internet-connected brooches which
emit  a  pleasant  scent  when  the  loved  one  whispers  on  the  other 
end  (Tillotson,  2005). 

Trying  to  capture  all  this  information  within  a  computer  dis¬ 
play  can  be  distracting  and  can  use  up  valuable  screen  space — 
and  of  course  assumes  that  the  computer  is  there  and  on.  For  this 
reason,  several  projects  have  looked  at  ambient  interfaces,  which 
in  various  ways  make  the  physical  environment  reflect  the  virtual. 
These  interfaces  monitor  various  events  in  the  electronic  worlds 
and  then  change  things  in  the  physical  environment:  lights  on  the 
wall,  moving  strings  hung  from  the  ceiling,  or  even  a  shaking  pot 
plant  (Lock,  Allanson,  &  Phillips,  2000).  Again,  this  is  not  funda¬ 
mentally  limited  to  networked  environments,  but  is  of  course  not 
very  useful  when  the  relevant  activity  is  close  at  hand  anyway. 

The other side of this is finding out what people are doing in
order to signal this to others. For computer activity (are you
logged on, have you been typing recently, and what web page are
you viewing?) this is in principle available, although the various
layers of software may make it hard for an awareness service
to discover. For non-computer aspects (are you in the room,
busy, with other people?) this is more problematic and
may require a range of sensors in the environment, ultrasound,
video, and so on, with corresponding privacy issues (see, for example,
Bellotti (1993)). Monitoring of everyday objects is another
way to achieve this; for example, one experiment used
electronic coffee cups with sensors to tell when they were picked
up and moved around (Gellersen, Beigl, & Krull, 1999). As more
and more devices become networked, it may be that we don’t
need special sensors and can just use the combined information from
those available, although the privacy issues remain.

In  collaborative  virtual-reality  environments,  knowing  that 
other  people  are  around  (as  avatars)  is  as  important  as  in  a  phys¬ 
ical  world,  but  harder  due  to  limited  senses  (usually  just  vision 
and  sound)  and  limited  field  of  view.  Furthermore,  there  are 
computational  costs  in  passing  information  such  as  audio  or 
even  detailed  positional  information  around  the  network  when 
there are tens, hundreds, or thousands of users. Various spatial
models  have  been  developed  to  analyze  and  implement  the  idea 
of  proximity  in  virtual  space  (Benford,  Bowers,  Fahlen,  Mariani, 
&  Rodden,  1994;  Rodden,  1996;  Sandor,  Bogdan,  &  Bowers, 
1997;  Dix  et  al.,  2000).  These  seek  to  formalize  concepts  of 
(a)  where  your  focus  of  attention  is  within  the  virtual  world  and 
thus  whether  you  require  full  quality  audio  and  visual  represen¬ 
tation  of  others,  (b)  broader  areas  where  you  would  expect  some 
peripheral  awareness  where  potentially  degraded  information 
can  be  used,  and  (c)  those  parts  of  the  space  for  which  you  need 
no  awareness  information. 
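
The three zones in (a) to (c) can be sketched as a simple distance test. This is only an illustration and not the model from any of the cited papers: the `Avatar` class, the two radii, and the use of plain Euclidean distance are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    """A hypothetical user position in a 2D virtual space."""
    name: str
    x: float
    y: float


def awareness_level(me, other, focus_radius=5.0, periphery_radius=20.0):
    """Classify how much information about `other` should be sent to `me`.

    The radii are illustrative values, not taken from the source.
    """
    distance = math.hypot(other.x - me.x, other.y - me.y)
    if distance <= focus_radius:
        return "full"       # (a) focus of attention: full-quality audio and video
    if distance <= periphery_radius:
        return "degraded"   # (b) peripheral awareness: degraded information
    return "none"           # (c) no awareness information needed


me = Avatar("me", 0.0, 0.0)
levels = {
    other.name: awareness_level(me, other)
    for other in (Avatar("near", 3.0, 4.0),
                  Avatar("mid", 0.0, 10.0),
                  Avatar("far", 30.0, 0.0))
}
```

A server could use the result for each pair of users to decide whether to forward a full stream, a degraded one, or nothing at all, which is where the saving in network traffic would come from.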

Media  Issues 

When describing the intrinsic network properties, issues for
continuous  media  were  mentioned  several  times.  This  is  be¬ 
cause,  with  the  possible  exception  of  close  hand-eye  coordina¬ 
tion  tasks,  continuous  media  put  some  of  the  tightest  require¬ 
ments  on  the  underlying  networks. 

Interactive  Conversation  and  Action 

Most  demanding  of  all  are  audio-visual  requirements  of  inter¬ 
active  conversation.  Anyone  who  has  had  a  transcontinental 
telephone  conversation  will  have  some  feeling  for  the  problems 
a delay of a second or two can cause. While actually speaking,
the  delays  are  less  significant;  however,  turn-taking  becomes 
very  problematic.  This  is  because  the  speaker  in  a  conversation 
periodically  (and  subconsciously)  leaves  short  (200-300  ms) 
gaps  in  the  flow  of  speech.  These  moments  of  silence  act  as  en¬ 
try  points  for  the  other  participant,  who  is  expected  to  either  ac¬ 
knowledge  with  a  “go  on”  sound,  such  as  “uhm”  or  perhaps  a 
small  nod  of  the  head,  or  break  in  with  his  own  conversation. 
Entries  at  other  points  would  be  seen  as  butting  in  and  rude, 
and  lack  of  feedback  responses  can  leave  the  speaker  uncertain 
as  to  the  listener’s  understanding.  The  200-300  ms  is  again  al¬ 
most  certainly  related  to  the  time  it  takes  for  the  listener’s  sen¬ 
sory  system  to  get  the  relevant  aural  information  to  the  brain 
and  for  it  to  signal  the  relevant  nod,  acknowledgement,  or  start 
to  speak.  Clearly  our  conversational  system  is  finely  tuned  to  the 
expected intrinsic delays of face-to-face conversation.

When network delays are added it is no longer possible to respond
within the expected 200-300 ms window. The speaker
therefore gets no responses at the appropriate points and it is
very  hard  for  the  listener  to  break  into  the  flow  of  speech  with¬ 
out  appearing  rude  (by  the  time  they  hear  the  gap  and  speak, 
the  speaker  has  already  restarted).  Some  telephone  systems  are 
half-duplex;  that  is,  they  only  allow  conversation  in  one  direc¬ 
tion  at  a  time,  which  means  that  the  various  vocalizations  (“uhu,” 
“hmm,”  and  so  on)  that  give  the  speaker  feedback  will  be  lost 
entirely  while  the  speaker  is  actually  talking.  It  is  not  uncommon 
for the speaker to have to resort to saying, “Are you there?” due
to  a  loss  of  sense  of  presence. 

These effects are similar whether one is dealing with pure audio
streams (as with the telephone), video streams (as with desktop
conferencing), or distributed virtual environments. One VR
project  in  the  United  Kingdom  conducted  all  of  its  meetings  us¬ 
ing  a  virtual  environment  in  which  the  participants  were  repre¬ 
sented  by  cuboid  robot-like  avatars  (called  “blockies”)  (Green- 
halgh,  1997).  The  project  ended  with  an  online  virtual  party.  As 
the music played the participants (and their avatars) danced. Although
the participants were clearly enjoying themselves, the video of the party showed
an interesting phenomenon: everyone danced alone. There are
various  reasons  for  this;  for  example,  it  was  hard  to  determine  the 
gender of a potential dancing partner. However, one relates directly
to the network delays. Although everyone heard the same
music, they were all hearing it at slightly different times; furthermore,
the avatars for other people were slightly delayed relative to
their actual movements. Given that popular music rhythms operate at
several beats per second, even modest delays mean that your
partner appears to dance completely out of time! In more recent
work,  predictive  algorithms  have  been  used  to  create  “ghost”  fig¬ 
ures  showing  the  “best  guess”  location  of  people  and  things  in 
collaborative  virtual  environments  (Gutwin  et  al.,  2004). 
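
One simple way to produce such a “best guess” is linear extrapolation from the last reported position and velocity (often called dead reckoning). The sketch below assumes that technique purely for illustration; it is not claimed to be the algorithm used in the cited work, and the function name and units are hypothetical.

```python
def predict_position(last_position, velocity, seconds_since_update):
    """Extrapolate linearly from the last position received over the network."""
    return tuple(p + v * seconds_since_update
                 for p, v in zip(last_position, velocity))


# Last update said: at (1.0, 2.0), moving 2.0 units/s in x and -4.0 units/s in y.
# A quarter of a second has passed with no new data, so draw the ghost here:
ghost = predict_position((1.0, 2.0), (2.0, -4.0), 0.25)
```

When the next real update arrives the ghost is corrected, so the quality of the guess depends on how steadily the person or object was moving.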

Reliability 

As  well  as  delays,  we  noted  previously  that  network  connec¬ 
tions  may  not  always  be  reliable — that  is,  information  may  be 
lost.  Video  and  audio  streams  behave  very  differently  in  the 
presence  of  dropped  information.  Imagine  you  were  watching  a 
film  on  a  long  air  flight.  The  break  in  the  sound  when  the  pilot 
makes  an  announcement  is  much  more  difficult  than  losing 
sight  of  the  screen  for  a  moment  or  two  as  the  passenger  in 
front  stands  up.  At  a  smaller  scale  a  fraction-of-a-second  loss  of  a 
few  frames  of  video  just  makes  the  movie  seem  a  little  jerky;  a 
smaller  loss  of  even  a  few  tens  of  milliseconds  of  audio  signal 
would  make  an  intrusive  click  or  distortion.  In  general,  reliabil¬ 
ity  is  more  important  for  audio  than  video  streams  and  where 
resources  are  limited  it  is  typically  most  important  to  reserve  the 
quality  of  service  for  the  audio  stream. 

Sound  and  Vision 

Why  is  it  that  audio  is  more  sensitive  than  video?  Vision 
works  (largely)  by  looking  at  a  single  snapshot.  Try  walking 
around  the  room  with  your  eyes  shut,  but  opening  them  for 
glances  once  or  twice  a  second.  Apart  from  the  moment  or  two 
as  your  eyes  refocus  you  can  cope  remarkably  well.  Now  turn  a 
radio  on  with  the  sound  turned  very  low  and  every  second  turn 
the sound up for a moment and back to silent: potentially an
interesting remixing sound, but not at all meaningful. Sound
more  than  vision  is  about  change  in  time.  Even  the  most  basic 
sounds,  pure  tones,  are  measured  in  frequencies,  how  long  be¬ 
tween  peaks  and  troughs  of  air  pressure.  For  more  complex 
sounds the shape of the sound through time (how its volume
and frequency mix change) is critical. With musical instruments
it is hard to hear the difference between instruments if
they are playing a continuous note, but they are instantly differentiable
by their attack, that is, how the note starts. (To get some idea of
the  complexity  of  sound  see  the  review  in  Mitsopoulos’  thesis 
(Mitsopoulos,  2000),  and  for  an  insight  into  the  way  different 
senses  affect  interaction  see  my  AVI’96  paper  (Dix,  1996).) 

Compression 

As  we  discussed  earlier,  it  is  possible  to  produce  reliable  net¬ 
work  connections,  but  this  introduces  additional  delays.  Com¬ 
pression  can  also  help  by  reducing  the  overall  amount  of  audio¬ 
visual  data  that  needs  to  be  transmitted,  but  again  may  introduce 
additional delays. Furthermore, simple compression algorithms
require reliable channels (both kinds of delays). Special algorithms
can be designed to cope with dropped data, making sure
that  the  most  important  parts  of  the  signal  are  replicated  or 
“spread  out”  so  that  dropped  data  leads  to  a  loss  in  quality  rather 
than  to  interruption. 

Jitter 

As noted previously, jitter is particularly problematic for continuous
media. Small variations in delay can lead to jerky video
playback, but are again even worse for audio streams. First of all,
a  longer  than  normal  gap  between  successive  bits  of  audio  data 
would  lead  to  a  gap  in  the  sound,  just  like  dropped  data.  And 
perhaps  even  more  problematic,  what  do  you  do  when  subse¬ 
quent  data  arrives  closer  together — play  it  faster?  Changing  the 
rate  of  playing  audio  data  doesn’t  just  make  it  jerky,  but  also 
changes  the  frequency  of  sound,  rendering  it  meaningless. 

(Aside:  Actually,  you  can  do  some  quite  clever  things  by  dig¬ 
itally  speeding  up  sound  but  not  changing  its  frequency.  These 
are  not  useful  for  dealing  with  jitter,  but  can  be  used  to  quickly 
overview  audio  recordings,  or  catch  up  on  missed  audio  streams 
(Arons,  1997;  Stifelman,  Arons,  &  Schmandt,  2001).) 

For real-time audio streams, such as video conferencing or
Voice  over  IP  (VoIP),  it  is  hard  to  do  anything  about  this  and  the 
best  one  can  do  is  drop  late  data  and  do  some  processing  of  the  au¬ 
dio  stream  to  smooth  out  the  clicks  this  would  otherwise  generate. 

Broadcast  and  Prerecorded  Media 

Where  media  is  prerecorded  or  being  broadcast,  but  where  a 
few seconds’ delay is acceptable, it is possible to do far better.
Recall  that  in  several  places  we  saw  that  better  quality  can  be  ob¬ 
tained  if  we  are  prepared  to  introduce  additional  delays. 

If  you  have  used  streaming  audio  or  video  broadcasts,  you 
will  know  that  the  quality  is  quite  acceptable  and  doesn’t  have 
many  of  the  problems  described  above.  This  is  partly  because  of 
efficient compression, meaning that video is compressed to a
fraction of a percent of its raw bandwidth and so can fit down
even  a  modem  line.  However,  this  would  not  solve  the  problems 
of  jitter.  To  deal  with  this,  the  player  at  the  receiving  end  buffers 
several  seconds  of  audio-visual  data  before  playing  it  back.  The 
buffering  irons  out  the  jitter,  giving  continuous  quality. 
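
The trade-off between delay and continuity can be seen in a tiny simulation. This is a sketch under simplifying assumptions: each packet is due for playback a fixed buffering delay after it was sent, the per-packet network delays are made-up numbers, and the function name is hypothetical.

```python
def late_packets(network_delays_ms, buffer_ms):
    """Return the indices of packets that arrive too late to be played.

    Packet i is played buffer_ms after it was sent, so it misses its slot
    exactly when its own network delay is longer than the buffering delay.
    """
    return [i for i, delay in enumerate(network_delays_ms) if delay > buffer_ms]


delays = [50, 120, 40, 300, 60]          # hypothetical per-packet delays (ms)
interactive = late_packets(delays, 100)  # small buffer, as needed for conversation
streamed = late_packets(delays, 3000)    # several seconds of buffering
```

With the small buffer, two of the five packets would have to be dropped or patched over; with a few seconds of buffering none are late, at the price of everything being heard a few seconds after it was sent.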

Try it out for yourself. Tune in to a radio channel and simultaneously
listen to the same broadcast over the Internet with
streamed  audio.  You’ll  clearly  hear  up  to  a  minute  delay  be¬ 
tween  the  two. 

Fast  and  free  Internet  connections  have  enabled  the  sharing 
and  distribution  of  high-quality  media  for  storing  and  playing 
later, initially through illegitimate file-sharing including the pre-lawsuit
Napster and many current peer-to-peer applications, but
also  increasingly  through  paid-for  services  such  as  the  current 
Napster  and  Apple  iTunes  (Wikipedia  contributors,  2006).  As 
well  as  forcing  existing  media  publishers  to  revisit  their  business 
models,  the  rise  of  web-distributed  MP3  and  Podcasting  has  en¬ 
abled  would-be  artists  and  broadcasters  to  bypass  the  traditional 
distribution  channels. 

Public  Perception:  Ownership,  Privacy,  and  Trust 

One  of  the  early  barriers  to  consumer  e-commerce  has  been  dis¬ 
trust  of  the  transaction  mechanisms,  especially  giving  credit 
card  details  over  the  web.  Arguments  that  web  transactions  are 
more  secure  than  phone-based  credit-card  transactions  or  even 
using  a  credit  card  at  your  local  restaurant  (which  gets  both  card 
number  and  signature)  did  little  to  alleviate  this  fear.  This  was 
never as major a barrier in the United States as it was, for example,
in Europe, but it has been a concern across the world, slowing
down the growth of e-shopping (or really e-buying, but that
is another story (Dix, 2001a)).

It  certainly  is  the  case  that  transactions  via  secure  channels  can 
be far more secure than physical transactions, in which various documents
can be stolen or copied en route and are in a format much
easier to exploit for fraud. However, knowing that a transaction
is secure is more than the mechanisms involved; it is about human
trust:  Do  I  understand  the  mechanisms  well  enough,  and  the  peo¬ 
ple  involved  well  enough,  to  trust  my  money  to  it? 

In  fact,  with  a  wider  perspective,  this  distrust  is  very  well 
founded.  Encryption  and  authentication  mechanisms  can  en¬ 
sure  that  I  am  talking  to  a  particular  person  or  company  and  that 
no  one  else  can  overhear.  But  how  do  I  know  to  trust  that  per¬ 
son?  Being  distant  means  I  have  few  of  the  normal  means  avail¬ 
able  to  assess  the  trustworthiness  of  my  virtual  contact.  In  the 
real  world  I  may  use  the  location  and  appearance  of  a  shop  to 
decide  whether  I  believe  it  will  give  good  service.  For  mail  order 
goods I may use the size and glossiness of an advertisement, and the publication
containing it, to assess its expense and again use this to
give  a  sense  of  quality  of  the  organization.  It  is  not  that  these 
physical  indicators  are  foolproof,  but  that  we  are  more  familiar 
with  them.  In  contrast,  virtual  space  offers  few  intrinsic  affor- 
dances.  It  is  easy  and  quite  cheap  to  produce  a  very  professional 
web  presence,  but  it  may  be  little  more  than  a  facade.  This  is 
problematic  in  all  kinds  of  electronic  materials,  but  is  perhaps 
most  obvious  when  money  is  involved. 

Even  if  I  trust  the  person  at  the  other  end,  how  do  I  know 
whether the network channel I am using is of a secure kind?
Again,  the  affordances  of  the  physical  world  are  clear.  In  a  closed 
office,  the  open  street,  or  in  a  bar  frequented  by  staff  of  a  rival 
firm,  we  will  say  different  things  depending  on  the  perceived 
privacy  of  the  location.  In  the  electronic  world  we  rely  on  “https” 
at  the  beginning  of  a  URL  (how  many  ordinary  consumers  know 
what  that  means?),  or  an  icon  inserted  by  the  e-mail  program 
to  say  a  message  has  been  encrypted  or  signed.  We  need  to  trust 
not  only  the  mechanisms  themselves,  but  also  the  indicators 
that  tell  us  what  mechanisms  are  being  used  (Millett,  Friedman, 
&  Felten,  2001;  Fogg  et  al.,  2001). 

In  non-financial  transactions,  issues  of  privacy  are  also  criti¬ 
cal.  We’ve  already  seen  several  examples  where  privacy  issues 
occur.  As  more  devices  become  networked,  especially  via  wire¬ 
less  links,  and  our  environment  and  even  our  own  bodies  be¬ 
come  filled  with  interlinked  sensors,  issues  about  who  can  ac¬ 
cess  information  about  you  become  more  significant.  This  poses 
problems at a technical level (ensuring security between devices),
at an interface level (being able to know and control what
or who can see specific information), and at a perception level
(believing in the privacy and security of the systems). There are
also  legal  implications.  For  example,  in  the  United  Kingdom  in 
2001  it  was  illegal  for  mobile  telecoms  operators  to  give  location 
information  to  third  parties  (Sangha,  2001). 

The  issue  of  perception  is  not  just  a  minor  point,  but  per¬ 
haps  the  dominant  one.  Networks,  and  indeed  computer  sys¬ 
tems  in  general,  are  by  their  nature  hidden.  We  do  not  see  the 
bits  traveling  down  the  wires  or  through  the  air  from  device  to 
device,  but  have  to  trust  the  system  at  even  the  most  basic  level. 
As  HCI  specialists  we  believe  ourselves  a  little  above  the  mun¬ 
dane  software  engineers  who  merely  construct  computer  sys¬ 
tems,  as  we  take  a  wider  view  and  understand  that  the  interac¬ 
tion  between  human  and  electronic  systems  has  additional 
emergent  properties  and  that  it  is  this  complete  socio-techni- 
cal  unit  which  achieves  real  goals.  For  networked  systems,  this 
view  is  still  far  too  parochial. 

Imagine  if  the  personal  e-mail  of  millions  of  people  was 
being  sucked  into  the  databanks  of  a  transnational  computer 
company,  and  only  being  released  when  accessed  through  the 
multinational’s  own  web  interface.  The  public  outcry!  Imagine 
Hotmail,  Yahoo  mail,  and  so  on.  How  is  it  that,  although  stored 
on  distant  computers,  perhaps  half  the  world  away,  millions  of 
people  feel  that  it  is  “their”  mailbox  and  trust  the  privacy  of  web 
mail more than, perhaps, their organization’s own mail system?
This  feeling  of  ownership  of  remote  resources  is  more  than  the 
technology  that  protects  the  security  of  such  systems,  it  is  a  cul¬ 
tural  phenomenon  and  a  marketing  phenomenon.  The  web 
mail  “product”  is  not  just  technology,  or  interface,  but  is  formed 
by every word that is written about the product in advertisements,
press releases, and media interviews (Dix, 2001a).


NETWORKS  AS  SUBJECTS 


Understanding  and  Managing  Networks 

When using a networked application, you don’t really care what
kind of network is being used, whether the data is sent over
copper wires, fiber optic, microwave, or satellite. All you care
about is that the two ends manage to communicate and the effects
any of the above have on the end-to-end network properties
such  as  bandwidth  discussed  in  the  previous  section.  However, 
there  are  times  when  the  network’s  internals  can’t  be  ignored. 

Those  involved  in  installing  or  managing  networks  need  to 
understand  the  internal  workings  of  the  network  in  order  to  op¬ 
timize  performance  and  find  faults.  For  ordinary  users,  when 
things  go  wrong  in  a  networked  application,  they  effectively  be¬ 
come  a  network  manager  and  so  understanding  something  of 
the  network  can  help  them  to  deal  with  the  problem.  Even 
when  things  are  working,  having  some  awareness  of  the  current 
state  of  the  network  may  help  one  predict  potential  delays, 
avoid  problems,  and  minimize  costs.  In  some  cases  this  can 
even be used as a positive part of the interactive experience.

We’ll start this section by looking at some of the technical issues
that are important in understanding networks. This parallels
earlier information in this chapter, but is focused on the internal
properties of the network. We’ll then look at the interface issues
for  those  managing  networks  and  the  ways  in  which  interfaces  can 
make  users  aware  of  critical  network  states.  Finally,  we’ll  look 
briefly  at  the  way  models  of  networks  can  be  used  as  a  metaphor 
for  some  of  the  motor  and  cognitive  behaviors  of  humans. 

Network Models

Layers

Networking  is  dominated  by  the  idea  of  layers;  lower  levels 
of  the  network  offer  standard  interfaces  to  higher  levels  so  that 
it is possible to change the details of the lower level without
changing  the  higher  levels.  For  example,  imagine  you  are  using 
your  PDA  to  access  the  Internet  whilst  in  a  train  (see  Fig.  14.6). 


The  web  browser  establishes  a  TCP/IP  connection  to  the  web 
server,  requests  a  web  page,  and  then  displays  it.  However, 
between  your  PDA  and  the  web  server  the  message  may  have 
traveled  through  an  infrared  link  to  your  mobile  phone,  which 
then  used  a  cell-based  radio  to  send  it  to  a  mobile-phone  sta¬ 
tion,  then  via  a  microwave  link  to  a  larger  base  station  onto  a 
fiber  optic  telephone  backbone,  and  via  various  copper  wires 
to  your  ISP’s  modem  bank.  Your  ISP  is  then  connected  into  an¬ 
other  fiber  optic  Internet  backbone  and  eventually  via  more 
fiber  optic,  copper,  and  microwave  links  to  the  web  server.  To 
complicate things even further, it may even be that the telephone
and Internet backbones share the same physical
cabling  at  various  points.  Imagine  if  your  poor  PDA  had  to 
know  about  all  of  this. 

In  fact,  even  your  PDA  will  know  about  at  least  five  layers: 

•  Infrared — how  the  PDA  talks  to  the  phone 

•  Modem — how  the  PDA  uses  the  phone  to  talk  to  your  ISP 

•  IP — how  your  PDA  talks  to  the  web-server  computer 

•  TCP — how  data  is  passed  as  a  reliable  connection-based 
channel  between  the  right  program  on  your  PDA  and  the 
web-server  computer 

•  HTTP — how  the  browser  talks  to  the  web  server 

Each  of  these  hides  most  of  the  lower  levels,  so  your  browser 
needs to know nothing about IP, the modem, or the infrared
connection  whilst  accessing  the  web  page. 
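
The way each layer hides those below it can be sketched by having every layer simply wrap whatever it is handed. This is a toy model only: real layers add binary headers (and sometimes trailers) rather than text tags, and the function names and tag format here are invented for the illustration.

```python
# Layers below HTTP, listed from highest to lowest as in the list above.
LOWER_LAYERS = ["TCP", "IP", "Modem", "Infrared"]


def wrap(data: bytes, layer: str) -> bytes:
    """A layer adds its own (toy) header and treats the rest as opaque data."""
    return f"<{layer}>".encode() + data


def unwrap(data: bytes, layer: str) -> bytes:
    """The matching layer at the receiving end strips that header again."""
    header = f"<{layer}>".encode()
    if not data.startswith(header):
        raise ValueError(f"not a {layer} frame")
    return data[len(header):]


def send(http_request: bytes) -> bytes:
    frame = http_request
    for layer in LOWER_LAYERS:            # pass down the stack
        frame = wrap(frame, layer)
    return frame


def receive(frame: bytes) -> bytes:
    for layer in reversed(LOWER_LAYERS):  # pass back up the stack
        frame = unwrap(frame, layer)
    return frame


on_the_wire = send(b"GET / HTTP/1.0")
```

The browser only ever sees the HTTP request and reply; swapping the infrared link for, say, a cable would change only the lowest entry in the list.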

The nature of these layers differs between different
types of network. For example, WAP (for sending data over mobile
phones and devices) has five defined layers (Arehart et al.,
2000), while the ISO OSI reference model has seven layers
(ISO/IEC 7498, 1994).


FIGURE 14.6. The long path from PDA to web.

FIGURE 14.7. OSI 7 layers and TCP/IP.


mail program says                 SMTP server replies

                                  220 mail.server.net ESMTP
HELO mypc.mydomain.com
                                  250 mail.server.net Hello mypc.mydomain.com [100.0.1.7]
                                  pleased to meet you
MAIL From:<myself@mydomain.com>
                                  250 <myself@mydomain.com>... Sender ok
RCPT To:<a_friend@theirdomain.com>
                                  250 <a_friend@theirdomain.com>... Recipient ok
DATA
                                  354 Enter mail, end with "." on a line by itself


At  lower  levels  data  is  usually  sent  in  small  packets,  which 
contain  a  small  amount  of  data  plus  header  information  saying 
where  the  data  is  coming  from,  where  it  is  going  to  and  other 
bookkeeping  information  such  as  sequence  numbers,  data 
length,  and  so  forth. 
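
As a rough illustration, a packet of this kind can be built by packing a fixed-size header in front of each chunk of data. The field sizes, the 8-byte chunk size, and the function names below are assumptions chosen for the example and do not correspond to any particular real protocol.

```python
import struct

# Hypothetical header: 4-byte "to" address, 4-byte "from" address,
# 16-bit sequence number, 16-bit data length (network byte order).
HEADER = struct.Struct("!4s4sHH")


def make_packet(to_addr: bytes, from_addr: bytes, seq: int, data: bytes) -> bytes:
    return HEADER.pack(to_addr, from_addr, seq, len(data)) + data


def parse_packet(packet: bytes):
    to_addr, from_addr, seq, length = HEADER.unpack_from(packet)
    data = packet[HEADER.size:HEADER.size + length]
    return to_addr, from_addr, seq, data


def packetize(to_addr: bytes, from_addr: bytes, message: bytes, chunk: int = 8):
    """Chop a message into small packets, numbering them in order."""
    return [make_packet(to_addr, from_addr, seq, message[start:start + chunk])
            for seq, start in enumerate(range(0, len(message), chunk))]


def reassemble(packets):
    """Put the data back together using the sequence numbers."""
    parsed = sorted((parse_packet(p) for p in packets), key=lambda fields: fields[2])
    return b"".join(fields[3] for fields in parsed)


packets = packetize(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02",
                    b"hello, networked world")
```

Because every packet carries its own addresses and sequence number, the receiver can rebuild the message even if the packets arrive out of order.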

Even  telephone  conversations,  except  those  using  pre-digital 
exchanges,  are  sent  by  chopping  up  your  speech  into  short  seg¬ 
ments  at  your  local  telephone  exchange,  sending  each  segment 
as  a  packet  and  then  reassembling  the  packets  back  into  a  con¬ 
tinuous  stream  at  the  other  end  (Stevens,  1998,  1999). 


Internetworking  and  Tunneling 

This  layering  doesn’t  just  operate  within  a  particular  network 
standard, but between different kinds of networks too. The Internet
is an example of an internet (note the lowercase “i”), that is,
a network that links together different kinds of low-level networks.
For example, many PCs are connected to the Internet via
an  Ethernet  cable.  Ethernet  sends  its  own  data  in  packets  like 
those  of  Fig.  14.9.  The  Internet  protocol  (IP)  also  has  packets 
of  a  form  like  Fig.  14.9.  When  you  make  an  Internet  connection 
via  Ethernet,  the  IP  packets  are  placed  in  the  data  portion  of 
the Ethernet packet, so you get something a bit like Fig. 14.10.

This  placing  of  one  kind  of  network  packet  inside  the  data 
portion  of  another  kind  of  network  is  also  used  in  virtual  private 
networks  (VPNs)  in  a  process  called  tunneling.  This  process 
is  used  to  allow  a  secure  network  to  be  implemented  using  a 
public  network  like  the  Internet.  Imagine  a  company  has  just 
two  offices,  one  in  Australia  and  the  other  in  Canada.  When  a 
computer  in  the  Sydney  office  sends  data  to  a  computer  in  the 
Toronto  office,  the  network  packet  is  encrypted,  put  in  the  data 
portion of an Internet packet, and sent via the Internet to a special
computer in the Toronto office. When it gets there, the computer
at the Toronto office detects it is VPN data, extracts the encrypted
data packet, decrypts it, and puts it onto its own local
network  where  the  target  computer  picks  it  up.  As  far  as  both 


first line of message
..dotty 2nd line
last line
.
                                  250 KAA2WS2 accepted for delivery
QUIT
                                  221 mail.server.net closing connection

FIGURE 14.8. Protocol to send e-mail via SMTP.

header: to address | from address | info | data length
body: data ...

FIGURE 14.9. Typical network packet format (simplified).




Protocols 

Systems  at  the  same  layer  typically  require  some  standard 
language to communicate. This is called a protocol; for higher
levels  this  may  be  quite  readable.  For  example,  to  send  an  In¬ 
ternet  e-mail  message  your  mail  program  connects  to  an  SMTP 
server  (a  system  that  relays  messages)  using  TCP/IP  and  has  the 
exchange  shown  in  Fig.  14.8. 
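
The client side of the exchange in Fig. 14.8 is regular enough to generate mechanically. The sketch below only builds the lines a mail program would send and checks reply codes; it does not open a network connection, and the function names are hypothetical. Body lines that begin with a dot are given an extra dot so they cannot be mistaken for the lone “.” that ends the message.

```python
def smtp_client_lines(client_host, sender, recipient, body_lines):
    """Return the lines a mail program would send, in order."""
    lines = [
        f"HELO {client_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    for line in body_lines:
        # Double a leading dot so the line is not read as end-of-message.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]
    return lines


def reply_ok(reply: str) -> bool:
    """Replies starting with 2 or 3 mean the server accepted the step."""
    return reply[:1] in ("2", "3")


conversation = smtp_client_lines(
    "mypc.mydomain.com", "myself@mydomain.com", "a_friend@theirdomain.com",
    ["first line of message", ".dotty 2nd line", "last line"],
)
```

A real mail program would wait for the server’s numbered reply after each step before sending the next; in Python the standard `smtplib` module handles this conversation for you.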


Ethernet header: Ethernet to addr. | Ethernet from addr. | data len.
Ethernet body: IP header (IP flag etc., IP to addr, IP from addr, other IP header info) | IP body (Internet data ...)

FIGURE 14.10. Internet IP packet inside Ethernet packet.

ends are concerned it looks as if both offices are on the same
LAN  and  any  data  on  the  Internet  is  fully  encrypted  and  secure. 
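
The tunneling steps can be sketched as wrapping and unwrapping. Everything here is a stand-in: the XOR “cipher” is emphatically not real encryption and is used only so the example runs without extra libraries, and the packet dictionaries, addresses, and function names are invented for the illustration.

```python
def toy_cipher(data: bytes, key: int) -> bytes:
    """Placeholder for encryption. XOR with one byte is NOT secure."""
    return bytes(b ^ key for b in data)


def tunnel_send(inner_packet: bytes, gateway_address: str, key: int) -> dict:
    """Sydney side: encrypt the LAN packet and put it in an Internet packet."""
    return {"to": gateway_address, "vpn": True,
            "data": toy_cipher(inner_packet, key)}


def tunnel_receive(outer_packet: dict, key: int) -> bytes:
    """Toronto side: detect VPN data, extract it, and decrypt it."""
    if not outer_packet.get("vpn"):
        raise ValueError("not VPN data")
    return toy_cipher(outer_packet["data"], key)


lan_packet = b"to=toronto-pc; payload=quarterly figures"
outer = tunnel_send(lan_packet, "203.0.113.1", key=0x5A)
delivered = tunnel_receive(outer, key=0x5A)
```

Anyone watching the public network sees only the outer packet addressed to the gateway; the original LAN packet reappears unchanged on the far side.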

Routing 

If  two  computers  are  on  the  same  piece  of  physical  network, 
each  can  simply  “listen”  for  packets  that  are  destined  for  them — 
so  sending  messages  between  them  is  easy.  If,  however,  mes¬ 
sages  need  to  be  sent  between  distant  machines — for  example, 
if  you  are  dialed  into  an  ISP  in  the  United  Kingdom  and  are  ac¬ 
cessing  a  web  server  in  the  United  States — the  message  cannot 
simply  be  broadcast  to  every  machine  on  the  Internet  (Fig. 
14.11).  Instead,  at  each  stage  it  needs  to  be  passed  in  the  right 
direction  between  different  parts  of  the  network.  Routers  per¬ 
form  this  task.  They  look  at  the  address  of  each  packet  and  de¬ 
cide  where  to  pass  it.  A  local  machine  might  need  to  simply  put 
it  onto  the  relevant  local  network,  but  if  not  it  may  need  to  pass 
it  on  to  another  intermediate  machine. 
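
A router’s decision can be pictured as a table lookup. The table entries and names below are made up, and choosing the most specific matching range (longest prefix) is an assumption reflecting common practice rather than something stated above.

```python
import ipaddress

# Hypothetical routing table: (range of addresses, where to pass the packet).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "local network"),
    (ipaddress.ip_network("10.0.0.0/8"), "office router"),
    (ipaddress.ip_network("0.0.0.0/0"), "ISP gateway"),  # default route
]


def next_hop(destination: str) -> str:
    """Pick the most specific route whose range contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in ROUTES if address in network]
    network, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


hops = [next_hop(a) for a in ("192.168.1.20", "10.2.3.4", "212.35.74.132")]
```

Each router along the path repeats the same kind of lookup with its own table, so no single machine needs to know the whole route.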

Routers  may  be  stand-alone  boxes  in  network  centers,  or 
may  be  a  normal  computer.  Often  a  file  server  acts  as  a  router 
between  a  LAN  and  the  global  network. 

As  well  as  routers,  networks  are  also  linked  by  hubs  and 
switches  that  make  several  different  pieces  of  physical  network 
behave  as  if  they  were  one  local  network,  and  gateways  that  link 
different  kinds  of  network.  The  details  of  these  are  not  impor¬ 
tant,  but  they  add  more  to  the  sheer  complexity  of  even  small 
networks. 


Addresses 

In  order  to  send  messages  on  the  Internet  or  any  other  net¬ 
work,  you  need  to  have  the  address  of  where  they  are  to  go  (or 
at  least  your  computer  needs  it).  In  a  phone  network  this  is  the 
telephone  number  and  on  the  Internet  it  is  an  IP  number.  The 
IP  number  is  a  32-bit  number,  normally  represented  as  a  group 
of  four  numbers  between  0  and  255  (e.g.  212.35.74.132),  which 
you  will  have  probably  seen  at  some  stage  when  using  a  web 
browser or other Internet tool. It is these IP numbers that
are used by routers to send Internet data to the right places.
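
The relationship between the single 32-bit number and the familiar four-number form is just base-256 arithmetic, as this small sketch shows (the function names are mine):

```python
def dotted_to_int(dotted: str) -> int:
    """Combine four numbers in the range 0-255 into one 32-bit number."""
    parts = [int(part) for part in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid IP number: {dotted!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part   # shift up one byte, add the next
    return value


def int_to_dotted(value: int) -> str:
    """Split a 32-bit number back into its four bytes."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


number = dotted_to_int("212.35.74.132")
address_space = 2 ** 32   # how many distinct 32-bit addresses exist
```

Python’s standard `ipaddress` module performs the same conversion, for example `int(ipaddress.ip_address("212.35.74.132"))`.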

However,  with  any  network  there  is  a  problem  of  getting  to 
know the address. With phone numbers you simply look up the
person’s name in a telephone directory or phone the operator.
Similarly, most networks have a naming scheme and
some  way  to  translate  these  into  addresses.  In  the  case  of  the 


The  32-bit  IP  number  space  allows  for  4  billion  addresses.  This 
sounds  like  quite  a  lot;  however,  these  have  been  running  out  due 
to the explosive growth in the number of Internet devices and
“wasted” IP numbers due to the way ranges of numbers get allocated
to sub-networks. The new version of TCP/IP, IPv6, which is
being deployed, has 128-bit addresses, which require 16 numbers
(IPng,  2001).  This  allows  sufficient  unique  IP  addresses  for  every 
phone,  PDA,  Internet  enabled  domestic  appliance,  or  even  elec¬ 
tronic  paperclip. 


FIGURE 14.11. Routers send messages in the right direction through complex networks.

Internet, domain names (e.g., acm.org, www.hcibook.com, or
magisoft.co.uk)  are  the  naming  system.  There  are  many  of  these 
and  they  are  changed  relatively  rapidly,  so  there  is  no  equivalent 
of  a  telephone  directory,  but  instead  special  computers  called 
“domain  name  servers  (DNS)”  act  as  the  equivalent  of  tele¬ 
phone  directory  inquiry  operators.  Every  time  an  application 
needs  to  access  a  network  resource  using  a  domain  name  (e.g., 
to  look  up  a  URL)  the  computer  has  to  ask  a  DNS  what  IP  ad¬ 
dress  corresponds  to  that  domain  name.  Only  then  can  it  use 
the  IP  address  to  contact  the  target  computer. 

The  DNS  system  is  an  example  of  a  “white  pages”  system. 
You  have  an  exact  name  and  want  to  find  the  address  for  that 
name.  In  addition  there  are  so-called  “yellow  pages”  systems 
where  you  request,  for  example,  a  color  postscript  printer  and 
are  told  of  addresses  of  systems  supplying  the  service.  Some¬ 
times  this  may  be  mediated  by  brokers  who  may  attempt  to  find 
the  closest  matching  resource  (e.g.,  a  nonpostscript  color 
printer)  or  even  perform  translations  (for  example,  the  Java  JINI 
framework;  Edwards  &  Rodden,  2001). 
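
As a rough illustration of the distinction, the sketch below contrasts
an exact-name (“white pages”) lookup with an attribute-based (“yellow
pages”) lookup in which a broker falls back to the closest match. The
directory contents, the addresses, and the scoring rule are all invented
for illustration; they are not taken from DNS, JINI, or any real
directory service.

```python
# Hypothetical in-memory directories; every name and address is illustrative.
WHITE_PAGES = {
    "printer1.example.org": "192.0.2.10",
    "printer2.example.org": "192.0.2.11",
}

YELLOW_PAGES = [
    {"address": "192.0.2.10", "kind": "printer", "color": True, "postscript": False},
    {"address": "192.0.2.11", "kind": "printer", "color": False, "postscript": False},
]


def white_pages_lookup(name):
    """Exact name in, address out (None if the name is unknown)."""
    return WHITE_PAGES.get(name)


def yellow_pages_lookup(wanted):
    """Return addresses of services that match every requested attribute."""
    return [s["address"] for s in YELLOW_PAGES
            if all(s.get(k) == v for k, v in wanted.items())]


def broker_lookup(wanted):
    """Prefer an exact match; otherwise pick the service matching most attributes."""
    exact = yellow_pages_lookup(wanted)
    if exact:
        return exact[0]

    def score(service):
        return sum(service.get(k) == v for k, v in wanted.items())

    return max(YELLOW_PAGES, key=score)["address"]


# Asking for a color postscript printer when only a non-postscript one exists.
request = {"kind": "printer", "color": True, "postscript": True}
closest = broker_lookup(request)
```

Here the yellow-pages lookup finds nothing, and the broker offers the
non-postscript color printer as the nearest alternative.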

This  latter  form  of  resource-discovery  system  is  most  impor¬ 
tant  in  mobile  systems  and  ubiquitous  computing  where  we  are 
particularly  interested  in  establishing  connections  with  other 
geographically  close  devices. 

A  final  piece  in  the  puzzle  is  how  one  gets  to  know  the  ad¬ 
dress  of  the  name  server,  directory  service,  or  brokering  service. 
In  some  types  of  network  this  may  be  managed  by  sending 
broadcast  requests  to  a  network  (“Is  there  a  name  server  out 
there?”).  In  the  case  of  the  Internet  this  is  normally  explicitly 
set  for  each  machine  as  part  of  its  network  settings. 

All  Together  .  .  . 

If  you  are  in  your  web  browser  and  you  try  to  access  the 
URL  http://www.meandeviation.com/qbb/,  the  following  stages 
take  place: 

1.  Send IP-level request to DNS asking for
www.meandeviation.com

2.  Wait  for  reply 

3.  DNS  sends  reply  64.39.13.108 

4.  Establish  TCP  level  connection  with  64.39.13.108 

5.  Send HTTP request (“GET /qbb/ HTTP/1.1”)

6.  Web  server  sends  the  page  back  in  reply 

7.  Close  TCP  connection 

Most  of  these  stages  are  themselves  simplified,  all  will  involve 
layering  on  top  of  lower-level  networks,  and  most  stages  involve 
several  sub-stages  (e.g.,  establishing  a  TCP  level  connection  re¬ 
quires  several  IP-level  messages). 
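
The stages listed above can be sketched with Python's standard socket
library. This is a minimal illustration, not what a real browser does:
it assumes plain HTTP on port 80, adds a Host header and a
"Connection: close" header so the server ends the reply, and ignores
redirects, errors, and caching. The function names are made up for this
sketch, and the example host may no longer resolve.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_bytes) for a simple HTTP/1.1 GET."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n\r\n")
    return host, port, request.encode("ascii")


def fetch(url, timeout=10.0):
    """Walk through the stages: DNS lookup, TCP connect, HTTP request, reply, close."""
    host, port, request = build_request(url)
    ip_address = socket.gethostbyname(host)       # stages 1-3: ask DNS, wait, reply
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        conn.sendall(request)                     # stage 5: send the HTTP request
        chunks = []
        while True:                               # stage 6: read the server's reply
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # stage 7: leaving the with-block closes the TCP connection
    return b"".join(chunks)
```

Even this small sketch hides the sub-stages mentioned above: the single
call that opens the connection stands in for several IP-level messages.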

The  basic  message  is  that  network  internals  are  multilevel, 
multistage,  and  pretty  complicated. 

Decentralizing:  Peer-Peer  and  Ad-Hoc  Networks 

Traditional Internet and web applications tended to work
through client-server paradigms in which a user’s client application
accesses a well-known application such as a web server.
However, there has been a growth of more decentralized models
at both higher and lower protocol levels (Androutsellis-Theotokis
& Spinellis, 2004).

At  an  application  level,  peer-peer  file  sharing  works  by  hav¬ 
ing  clients  running  on  users’  own  PCs  talk  directly  to  one  an¬ 
other  broadcasting  requests  for  particular  files  by  name  or  type 
to  all  connected  clients.  These  then  forward  the  requests  to 
their  connected  nodes  until  some  maximum  hop  count  is  hit. 
Often  semi-centralized  services  are  used  to  establish  initial  con¬ 
nections,  but  thereafter  everything  is  done  at  the  “edges”  of  the 
network.  These  protocols  and  applications  have  some  techni¬ 
cal  strengths  but  their  origins  are  rooted  in  legal  disputes  on  the 
distribution of copyright material. Recent research has also
shown that the anonymity of peer-peer file sharing allows
“deviant” subgroups to share illicit material (Hughes, Gibson,
Walkerdine, & Coulson, 2006).
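
The request-forwarding scheme described above can be sketched as a
flood over a graph of peers with a hop limit. This is a simplified
model under stated assumptions: the peer names, link table, and file
lists are invented, each peer handles a given request only once, and no
real file-sharing protocol is being reproduced.

```python
from collections import deque


def flood_query(links, files, start, wanted, max_hops):
    """Flood a request from `start`; return peers holding `wanted` within max_hops."""
    seen = {start}
    hits = []
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if peer != start and wanted in files.get(peer, set()):
            hits.append(peer)
        if hops == max_hops:
            continue                      # hop limit reached: do not forward further
        for neighbour in links.get(peer, []):
            if neighbour not in seen:     # each peer handles a request only once
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return hits


# Hypothetical chain of four peers: a - b - c - d
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
files = {"c": {"song.mp3"}, "d": {"song.mp3"}}
nearby = flood_query(links, files, "a", "song.mp3", max_hops=2)
further = flood_query(links, files, "a", "song.mp3", max_hops=3)
```

With a limit of two hops the request reaches only peer c; raising the
limit to three also finds the copy on peer d, at the cost of more
traffic.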

At  a  lower  level,  ad-hoc  networks  build  more  structured 
higher-level  networks  for  computers  that  have  point-to-point 
connections,  often  through  wireless  connectivity.  This  can  be 
used  to  set  up  a  network  in  a  meeting,  or  at  a  larger  level  could 
even  allow  everyone  in  a  pop  festival  to  have  Internet  access 
through  a  small  number  of  wireless  hot  spots.  In  an  ad  hoc  net¬ 
work,  the  low-level  software  works  out  what  machines  are  con¬ 
nected  where  and  routes  messages  from  end  to  end  even  if  the 
two  end-point  computers  do  not  have  direct  access.  The  diffi¬ 
cult  thing  in  an  ad  hoc  network  is  not  just  setting  up  this  routing 
information,  but  the  dynamics:  dealing  with  the  fact  that  ma¬ 
chines  constantly  move  around,  perhaps  going  out  of  range  of 
one  another,  may  be  turned  off,  or  may  join  the  network  late. 
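
As a hedged sketch of the two problems just described, the code below
finds a multi-hop route over point-to-point links and then recomputes
it when a machine disappears. Real ad hoc routing protocols are far
more involved; the breadth-first search, the device names, and the link
table here are assumptions made purely for illustration.

```python
from collections import deque


def find_route(links, source, destination):
    """Breadth-first search for a shortest multi-hop route; None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:       # walk back to the source
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def remove_node(links, gone):
    """Return the link table after a machine is turned off or moves out of range."""
    return {n: {m for m in ms if m != gone}
            for n, ms in links.items() if n != gone}


# Hypothetical devices in a meeting room; the laptop cannot see the hotspot directly.
links = {
    "laptop": {"phone", "tablet"},
    "phone": {"laptop", "hotspot"},
    "tablet": {"laptop", "pda"},
    "pda": {"tablet", "hotspot"},
    "hotspot": {"phone", "pda"},
}
route_before = find_route(links, "laptop", "hotspot")
route_after = find_route(remove_node(links, "phone"), "laptop", "hotspot")
```

When the phone leaves, the route has to be rebuilt through the tablet
and the PDA, which is the dynamic part that makes these networks hard.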

Network  Management 

Those  most  obviously  exposed  to  this  complexity  are  the  engi¬ 
neers  managing  large  national  and  international  networks,  both 
data  networks  such  as  the  main  Internet  backbones,  and  tele¬ 
coms  networks.  The  technical  issues  outlined  above  are  com¬ 
pounded  by  the  fact  that  the  different  levels  of  network  hard¬ 
ware  and  network  management  software  are  typically  supplied 
by  different  manufacturers.  Furthermore,  parts  of  the  network 
may  be  owned  and  managed  by  a  third  party  and  shared  with 
other  networks.  For  example,  a  transatlantic  fiber  optic  cable 
may carry telecoms and data traffic from many different carriers.

When a fault occurs in such a network it is hard to know
whether  it  is  a  software  fault  or  a  hardware  fault,  where  it  is  hap¬ 
pening,  or  who  is  responsible.  If  you  send  engineers  out  to  the 
wrong  location  it  will  cost  both  their  time  and  the  time  the  ser¬ 
vice  is  unavailable.  Typically  the  penalties  for  inoperative  or  re¬ 
duced  quality  services  are  high;  you  need  to  get  it  right,  fast. 

This  is  a  specialized  and  complex  area,  but  clearly  of  increasing 
importance.  It  poses  many  fascinating  UI  challenges:  how  to  vi¬ 
sualize  complex  multilayered  structures  and  how  to  help  opera¬ 
tors  trace  faults  in  these.  Although  I  know  that  it  is  a  topic  being 
addressed  by  individual  telecoms  companies,  published  material 
in the HCI literature is minimal. The exception is visualization of
networks, both physical and logical, on which the literature is quite
extensive; Dodge and Kitchin’s Atlas of Cyberspace and its associated
website (http://www.cybergeography.org/atlas/) is the comprehensive
text in this area (Dodge & Kitchin, 2001).

Ordinary systems administrators in organizations face similar
problems, albeit on a smaller scale. The near-universal experience
of misconfigured e-mail systems, continual
network failures, and performance problems certainly suggests
this is an area ripe for effective interface solutions, but again
there is very little in the current HCI literature.

Finally,  it  appears  that  everyone  is  now  a  network  adminis¬ 
trator;  even  a  first-time  home  PC  user  must  manage  modem  set¬ 
tings,  name  server  addresses,  SMTP  and  POP  servers,  and  more. 
It  is  interesting  that  the  interface  used  by  most  such  users  is 
identical  to  that  supplied  for  full  systems  administrators.  Ar¬ 
guably  this  may  ease  the  path  for  those  who  graduate  from 
single  machines  to  administering  an  office  or  organization  (per¬ 
haps  less  than  5%  of  users).  Unfortunately,  it  makes  life  intoler¬ 
able  for  the  other  95%!  The  only  thing  that  makes  this  possible 
at  all  is  that  the  “welcome”  disks  from  many  ISPs  and  the  Wiz¬ 
ards  shipping  with  the  OS  offer  step-by-step  instructions  or  may 
automatically  configure  the  system.  These  complications  are 
compounded  if  the  user  wishes  to  allow  access  through  more 
than  one  ISP,  or  connect  into  a  fixed  network.  As  many  home 
users  now  have  several  PCs  and  other  devices  that  need  to  be 
networked,  this  is  not  a  minor  issue. 

In  the  first  edition  of  this  handbook  I  wrote,  “If  we  look  at 
the  current  state  of  the  two  most  popular  PC  systems,  Microsoft 
Windows  and  Mac  OS,  the  picture  is  not  rosy”.  Sadly  things  have 
not  improved  dramatically;  you  only  have  to  watch  a  room  full 
of  computer  scientists  at  a  meeting  trying  to  sort  out  wireless 
connections. 

On  Windows,  if  you  do  want  to  change  your  network  settings 
you  need  to  set  several  IP  numbers,  web  proxies,  and  SMTP  set¬ 
tings  in  several  control  panels.  In  fact  as  I  sat  here  now  writing 
I  tried  and  failed  to  find  the  place  to  change  the  web  proxy  set¬ 
ting  on  my  PC.  It  is  possible  to  set  up  alternate  settings  for  net¬ 
works  if  you  move  between  two  locations,  but  more  than  that 
and  you  are  lost!  On  MacOS  the  situation  has  improved  in  the 
last  few  years  and  now  nearly  all  settings  are  in  one  place  under 
“Network  Settings.”  Changing  locations  with  a  laptop  is  simply  a 
matter  of  selecting  “Location  .  .  .”  from  the  main  menu.  Per¬ 
versely,  the  default  printer  that  used  to  be  attached  to  your  lo¬ 
cation  on  the  older  MacOS  9  interface  now  has  to  be  changed  by 
hand  each  time! 

The problems in both systems partly reflect the intrinsic complexity
of networking: yes, it does involve multiple logically distinct
settings, many of which relate to low-level details. However,
they also expose the apparent assumption that those involved in
network administration are experts who understand the meaning
of various internal networking terms. This is not the case
even for most office networks, and certainly not at home.

And  this  is  just  initially  setting  up  the  system  to  use.  For  the 
home  user,  debugging  faults  has  many  of  the  same  problems 
as  large  networks.  You  try  to  visit  a  website  and  get  an  error  box. 
Is  the  website  down?  Are  there  problems  with  the  wireless  LAN, 
the  broadband  modem,  the  phone  line,  or  the  ISP’s  hardware? 
Are all your configuration settings right? Has a thunderstorm
3,000 miles away knocked out a vital network connection? Trying
to understand a multilayered, non-localized and, when
things work, largely hidden system is intrinsically difficult, and
where diagnostic tools for this are provided, they assume an
even greater degree of expertise.


The  much-heralded  promise  of  devices  that  connect  to  one 
another  within  our  homes  and  about  our  bodies  is  going  to 
throw  up  many  of  the  same  problems.  Some  old  ones  may  ease 
as  explicit  configuration  becomes  automated  by  self-discovery 
between  components,  but  this  adds  further  to  the  “hiddenness” 
and  thus  difficulty  in  managing  faults,  security,  and  so  on.  You 
can  imagine  the  scenario:  The  sound  on  my  portable  DVD  stops 
working  and  produces  a  continuous  noise — why?  There  are  no 
cables  to  check  of  course  (wireless  networking),  but  hours  of 
checking  and  randomly  turning  devices  on  and  off  narrows  the 
problem  down  to  a  fault  in  the  washing  machine  which  is  send¬ 
ing  continuous  “I  finished  the  clothes”  alerts  to  all  devices  in  the 
vicinity. 

Network  Awareness 

One of the problems noted above is the hiddenness of networks.
This causes problems when things go wrong, because one lacks an
appropriate model of what is going on, but sometimes also
when things are working properly.

We discussed earlier some of the network properties that may
affect  usability:  bandwidth,  delay,  jitter,  and  so  forth.  These  are  all 
affected  to  some  extent  by  other  network  loads,  the  quality  of 
current  network  connections,  and  so  on.  So  predicting  perfor¬ 
mance  (and  knowing  whether  to  panic  if  things  appear  to  go 
slow)  needs  some  awareness  of  the  current  state  of  the  network. 

Personal  computers  using  wireless  networks  usually  offer 
some  indication  of  signal  strength  (if  one  knows  where  to  look 
and  what  it  means)  although  this  is  less  common  for  line  qual¬ 
ity  for  modems.  As  wireless  devices  and  sensors  become  smaller 
they  will  not  have  suitable  displays  for  this  and  explicitly  making 
users  aware  of  the  low-level  signal  strength  of  an  intelligent  pa¬ 
perclip  may  not  be  appropriate.  However,  as  interface  designers 
we  do  need  to  think  how  users  will  be  able  to  cope  and  problem- 
solve  in  such  networks. 

Only  more  sophisticated  network  management  software  al¬ 
lows  one  to  probe  the  current  load  on  a  network.  This  does 
matter.  Consider  a  small  home  network  with  several  PCs  con¬ 
nected  through  a  single  modem.  If  one  person  starts  a  large 
download  they  will  be  aware  that  this  will  affect  the  perfor¬ 
mance  of  the  rest  of  their  web  browsing.  Other  members  of  the 
household  will  just  experience  a  slowing  down  of  everything 
and  not  understand  why.  If  the  cause  is  a  file-sharing  utility  such 
as  Gnutella,  even  the  person  whose  computer  is  running  it  may 
not  realize  its  network  impact.  In  experiments  at  MIT  the  level 
of  network  traffic  has  been  used  to  “jiggle”  a  string  hanging  from 
the  ceiling  so  that  heavy  traffic  leads  to  a  lot  of  movement 
(Wisneski  et  al.,  1998).  The  movements  are  not  intrusive  so  give 
a  general  background  awareness  of  network  activity.  Although 
supplying  a  ceiling-mounted  string  may  not  be  the  ideal  solu¬ 
tion  for  every  home,  other  more  prosaic  interface  features  are 
possible. 

Cost  awareness  is  also  very  important.  In  the  United  King¬ 
dom,  first  generation  GSM  mobile  data  services  are  charged  by 
connection  time.  So  knowing  how  long  you  have  been  con¬ 
nected  and  how  long  things  are  likely  to  take  becomes  critical.  If 
these  charges  differ  at  peak  hours  of  the  day,  calculating  whether 
to read your e-mail now, or do a quick check and download the
big attachments later can become a complex decision. The move
to  data-volume-based  charging  means  the  user  needs  to  esti¬ 
mate  the  volume  of  data  that  is  likely  to  be  involved  in  initiat¬ 
ing  an  action  versus  the  value  of  that  data — do  you  want  to  click 
on  that  link  if  the  page  it  links  to  includes  large  graphics,  per¬ 
haps  an  applet  or  two?  Terrestrial  broadband  networks  have 
largely  shifted  to  fixed  monthly  fees  and  this  has  dramatically 
changed  the  way  people  perceive  the  Internet,  from  smash-and- 
grab  interactions  to  surf-and-play. 
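
To make the user's estimation problem concrete, the sketch below
compares the two charging models for a single download. All figures
(the attachment size, the line speed, and both tariffs) are hypothetical
and chosen only to show the arithmetic; they are not real prices. Sizes
are in kilobytes (1 KB taken as 1,000 bytes) and speeds in kilobits per
second.

```python
def time_based_cost(data_kb, rate_kbps, price_per_minute):
    """Cost when charged by connection time (counting transfer time only)."""
    seconds = data_kb * 8 / rate_kbps     # kilobytes -> kilobits, then divide by rate
    return seconds / 60 * price_per_minute


def volume_based_cost(data_kb, price_per_mb):
    """Cost when charged by the volume of data transferred."""
    return data_kb / 1000 * price_per_mb


# Hypothetical case: a 500 KB attachment over a 9.6 kbps link.
attachment_kb = 500
by_time = time_based_cost(attachment_kb, rate_kbps=9.6, price_per_minute=0.10)
by_volume = volume_based_cost(attachment_kb, price_per_mb=2.00)
```

Under time-based charging the cost depends on the line speed as well as
the size, which the user rarely knows; under volume-based charging it
depends on the size alone, which the user usually cannot see before
clicking.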

Network  Confusion 

If the preceding doesn’t sound confusing enough, the multilayered
nature of networked applications means that it is hard to
predict the possible patterns of interference between things
implemented at different levels or even at the same level. Again,
this is often most obvious when things go wrong, but unforeseen
interactions may also mean that two features that
work perfectly well in isolation fail when used together.

This  problem,  feature  interaction,  has  been  studied  particu¬ 
larly  in  standard  telecoms  (although  it  is  certainly  not  confined 
to  them).  Let’s  look  at  an  example  of  feature  interaction.  Tele¬ 
phone  systems  universally  apply  the  principle  that  the  caller, 
who  has  control  over  whether  and  when  the  call  is  made,  is  the 
person  who  pays.  In  the  exceptions  (free-phone  numbers,  re¬ 
verse  charges)  special  efforts  are  made  to  ensure  that  sub¬ 
scribers  understand  the  costs  involved.  Some  telephone  sys¬ 
tems  also  have  a  feature  whereby  a  caller  who  encounters  a  busy 
line  can  request  a  callback  when  the  line  becomes  free.  Unfor¬ 
tunately  at  least  one  company  implemented  this  callback  feature 
so  that  the  charging  system  saw  the  callback  as  originating  from 
the  person  who  had  originally  been  called.  Each  feature  seemed 
to  be  clear  on  its  own,  but  together  meant  you  could  be  charged 
for  calls  you  didn’t  want  to  make. 
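
The callback example can be reduced to a few lines. The sketch below is
a toy model, not a description of any real telephone exchange: the
class, the function names, and the subscriber names are hypothetical.
Each feature is reasonable on its own; the problem appears only when
the charging rule is applied to the call that the callback feature
creates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    originator: str   # the party the network sees as having placed the call
    recipient: str


def payer(call):
    """Charging feature: whoever originates a call pays for it."""
    return call.originator


def ring_back(busy_call):
    """Callback feature as implemented in the example: when the busy line
    becomes free, the network places a new call that originates from the
    person who was originally called."""
    return Call(originator=busy_call.recipient, recipient=busy_call.originator)


attempt = Call(originator="alice", recipient="bob")   # bob's line is busy
callback = ring_back(attempt)
```

Alice asked for the callback, but the charging rule now bills Bob,
because the new call appears to start from his line.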

With N features there are N(N - 1)/2 possible pairs of interactions
to consider, N(N - 1)(N - 2)/6 triples, and so on. This is
a  well-recognized  (but  not  solved)  problem  with  considerable 
efforts  being  made  using,  for  example,  formal  analysis  of  inter¬ 
actions  to  automatically  detect  potential  problems.  It  is  worth 
noting  that  this  is  not  simply  a  technical  issue;  the  charging  ex¬ 
ample  shows  that  it  is  not  just  who  pays  that  matters,  but  also 
the  perceptions  of  who  pays.  This  particular  interaction  would 
have  been  less  of  a  problem  if  the  interface  of  the  phone  sys¬ 
tem  had,  for  example,  said  (in  generated  speech),  “You  have 
had  a  call  from  XXX;  press  ‘callback’  and  you  will  be  charged 
for  this  call.”  The  hybrid-feature  interaction  research  group  at 
Glasgow  are  attempting  to  build  a  comprehensive  list  of  such 
problems  in  telecoms  (HFIG,  2001),  but  these  sorts  of  problems 
are  likely  to  be  found  increasingly  in  related  areas  such  as  ubiq¬ 
uitous  computing  and  resource  discovery. 
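
The growth in the number of combinations to check is easy to tabulate.
The short sketch below uses the standard binomial coefficient, which
gives N(N - 1)/2 for pairs and N(N - 1)(N - 2)/6 for triples; the
function name and the feature counts tried are arbitrary choices for
illustration.

```python
from math import comb


def interaction_counts(n_features, max_group=3):
    """Number of feature groups of each size from 2 up to max_group."""
    return {size: comb(n_features, size) for size in range(2, max_group + 1)}


ten = interaction_counts(10)
fifty = interaction_counts(50)
```

Going from 10 to 50 features multiplies the number of pairs by roughly
27 and the number of triples by more than 160, which is why exhaustive
manual checking does not scale.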

Exploiting  the  Limitations:  Seamfulness 
and  Virtual  Locality 

Some companies cynically exploit this user confusion over network
charging, and in the United Kingdom there have been
some high-profile news stories about teenagers running up
thousands of pounds of debt after innocently signing up to
ringtone delivery services.

Happily  there  are  also  more  positive  uses  of  the  limitations 
of  networks.  Given  suitable  awareness  mechanisms  for  network 
strength  and  connectivity,  people  are  very  resourceful  in  ex¬ 
ploitation.  You  will  have  experienced  the  way  mobile-phone 
users  get  to  know  the  sweet  spots  for  their  networks  when  in 
areas  of  poor  coverage,  learning  to  get  out  of  doors,  away  from 
big  buildings,  or  up  hills.  This  can  get  sophisticated.  One  mixed- 
reality  experiment,  part  of  the  Equator  project,  consisted  of  an 
outside  game  where  real  players  running  in  the  street  were  pit¬ 
ted  against  virtual  characters  manipulated  remotely.  The  “real” 
players  learned  to  hide  in  the  GPS  shadow  of  buildings  so  that 
their  location  could  not  be  detected  by  the  virtual  participants 
(Benford et al., 2003). Similar effects have been seen in WiFi-based
games where participants get to know the regions of
good  and  bad  coverage  and  may  use  these  to  seek  out  or  avoid 
remote  interactions. 

In  these  ways,  users  do  more  than  cope;  they  actively  exploit 
the  limitations  of  networks  and  sensing.  The  Pirates  game  did 
this  with  RF  technology.  RF  beacons  represent  islands  in  an 
ocean,  the  locality  of  the  RF  signals  corresponding  to  the  limited 
land  area  of  the  island  (Bjork,  Falk,  Hansson,  &  Ljungstrand, 
2001).  This  notion  of  exploiting  network  limitations  has  led  to 
the idea of seamful games, deliberately designed so that variations
in WiFi coverage and connectivity are part of the game play
(Chalmers  et  al.,  2005). 

Even stranger is a recent game, “Hitchers.” In this you
pick up and drop off hitchers with your phone, just as you might
pick up a hitchhiker on the road. You can only pick up a hitcher
in the same mobile cell as the last person who dropped the
hitcher off. The hitchers are actually stored and managed centrally,
but the effect is as if the hitchers were only accessible in
a small region. Here, even though there is no real limitation of
access, a limitation is constructed to make a more interesting
game: virtual locality. You can imagine virtual locality being
used in synchronous applications, for example, allowing a
phone user to broadcast to people in their vicinity apparently as
if it were a limited-range transmission, whereas in reality it is
centrally managed, based on GPS or phone cell location.
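
Virtual locality of this kind amounts to a central store that enforces
a location rule. The sketch below is a guess at the general shape, not
the actual Hitchers implementation: the class, its methods, and the
cell identifiers are all hypothetical.

```python
class HitcherServer:
    """Central store that only releases a hitcher to a phone in the right cell."""

    def __init__(self):
        self._waiting = {}   # hitcher name -> cell id where it was dropped off

    def drop_off(self, hitcher, cell_id):
        self._waiting[hitcher] = cell_id

    def visible_in(self, cell_id):
        """Hitchers that appear to be 'in' the given cell."""
        return sorted(h for h, c in self._waiting.items() if c == cell_id)

    def pick_up(self, hitcher, cell_id):
        """Succeed only if the requesting phone is in the drop-off cell."""
        if hitcher not in self._waiting or self._waiting[hitcher] != cell_id:
            return False
        del self._waiting[hitcher]
        return True


server = HitcherServer()
server.drop_off("Ziggy", cell_id="cell-17")
refused = server.pick_up("Ziggy", cell_id="cell-42")   # wrong cell
accepted = server.pick_up("Ziggy", cell_id="cell-17")  # same cell as the drop-off
```

All the data lives on the server; the impression of locality comes
entirely from the check in the pick-up method.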

The  Network  Within 

So far the story is pretty bleak from a user interface viewpoint:
a complex problem, of rapidly growing importance, with relatively
little published work in many areas. One good thing for an
HCI practitioner about understanding the complexity of networks
is that it helps us understand better the workings of the
human cognitive and motor system.

For  at  least  five  decades,  computational  models  have  been 
used  to  inspire  cognitive  models.  Also  of  course  cognitive  and 
neurological  models  have  been  used  to  inspire  computational 
models  in  artificial  intelligence  and  neural  networks.  However, 
our  bodies  are  not  like  a  single  computer,  but  in  various  ways 
more  like  a  networked  system. 

This is because, first, several things can happen at once. The
interacting cognitive subsystems (ICS) model from APU Cambridge
(Barnard & May, 1995) takes this into account by looking
at various parts of the cognitive system, the conversions of
representations between these parts, and the conflicts that arise
if the same part is used to perform different tasks simultaneously.
Similarly, the very successful PERT-style GOMS analysis
used on the NYNEX telephone operators’ interface used the fact
that the operator could be doing several things simultaneously
with noninterfering parts of their bodies and brains (John, 1990;
Gray, John, & Atwood, 1992).

We are also like a networked system in that signals take an
appreciable  time  to  get  from  our  senses  to  our  brains  and  from 
our  brains  to  our  muscles.  The  famous  homunculus  from  Card, 
Moran,  and  Newell’s  (1983)  Model  Human  Processor  makes  this 
very  clear  with  timings  attached  to  various  paths  and  types  of 
mental  processing.  In  fact,  the  sorts  of  delays  within  our  bodies 
(from  50-200  ms  on  different  paths)  are  very  similar  to  those 
found  on  international  networks. 

In  industrial  control  one  distinguishes  between  open-loop 
and  closed-loop  control  systems.  Open-loop  control  is  where 
you  give  the  machine  an  instruction  and  assume  it  does  it  cor¬ 
rectly  (like  a  treasure  map:  “Ten  steps  North,  turn  left,  three 
steps  forward,  and  dig”).  This  assumes  the  machine  is  well  cali¬ 
brated  and  predictable.  In  contrast,  closed-loop  control  uses 
sensors to constantly feed back and modify future actions based
on  the  outcomes  of  previous  ones  (e.g.,  “Follow  the  yellow 
brick  road  until  you  come  to  the  Emerald  City”).  Closed-loop 
control  systems  tend  to  be  far  more  robust,  especially  in  un¬ 
controlled  environments,  like  the  real  world. 
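
The difference in robustness can be seen in a small simulation. The
sketch below assumes a deliberately miscalibrated machine that moves
only 90% of the distance it is told to; the slip factor, the step
count, and the simple proportional correction are illustrative
assumptions, not a model of any particular control system.

```python
def actuator(position, command, slip=0.9):
    """A poorly calibrated machine: it moves only 90% of what it is told."""
    return position + slip * command


def open_loop(target, steps):
    """Follow a fixed plan of equal moves with no sensing at all."""
    position = 0.0
    for _ in range(steps):
        position = actuator(position, target / steps)
    return position


def closed_loop(target, steps):
    """Before each move, sense the current position and aim at the remaining gap."""
    position = 0.0
    for _ in range(steps):
        error = target - position
        position = actuator(position, error)
    return position


open_result = open_loop(target=10.0, steps=10)
closed_result = closed_loop(target=10.0, steps=10)
```

The open-loop plan ends up well short of the target because the
calibration error is never noticed, whereas the closed-loop version
corrects most of the remaining error at every step.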

Not  surprisingly  our  bodies  are  full  of  closed-loop  control 
systems  (e.g.,  the  level  of  carbon  dioxide  in  your  lungs  triggers 
the  breathing  reflex).  However,  closed-loop  control  can  become 
difficult  if  there  are  delays  (you  have  not  received  feedback  from 
the  previous  action  when  starting  the  next  one).  Delays  either 
mean one has to slow down the task or use some level of prediction
to work out what to do next, based on feedback of actions
before the last one. This breakdown of closed-loop control
in the face of (especially unexpected) delays is one of the reasons
hand-eye coordination tasks, such as mouse movement,
break down if delays exceed a couple of hundred milliseconds.
The feedback loops in our bodies for these tasks assume normal
delays of around 200 ms and are robust to variations around this
figure, but adding delays beyond this starts to cause breakdown.

FIGURE 14.12. (a) Open-loop control, (b) Closed-loop control.
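
A toy simulation shows why stale feedback is so damaging. The sketch
below assumes a simple proportional controller that corrects towards a
target using a position reading that is a fixed number of steps old;
the gain, the delay, and the step counts are arbitrary illustrative
values, and no prediction is used to compensate.

```python
def run_closed_loop(target, gain, delay, steps):
    """Proportional control where the sensed position is `delay` steps old."""
    positions = [0.0]
    for t in range(steps):
        observed = positions[max(0, t - delay)]   # stale reading of where we were
        positions.append(positions[-1] + gain * (target - observed))
    return positions


prompt_feedback = run_closed_loop(target=10.0, gain=0.8, delay=0, steps=30)
late_feedback = run_closed_loop(target=10.0, gain=0.8, delay=3, steps=30)
worst_overshoot = max(abs(10.0 - p) for p in late_feedback)
```

With immediate feedback the position settles on the target. With the
same gain and a three-step delay, the controller keeps correcting for
errors that no longer exist and swings ever further past the target.
Slowing down (a lower gain) or predicting the current position are the
two remedies mentioned above.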

The  delays  inside  our  bodies  cause  other  problems  too.  The 
path  from  our  visual  cortex  into  our  brain  is  far  faster  (by  100  ms 
or  so)  than  that  from  our  touch  and  muscle  tension  sensors 
around  our  bodies.  If  we  were  designing  a  computer  system 
to  use  this  information,  we  might  consider  having  a  short 
100  ms  tape  loop,  so  that  we  could  store  the  video  input  until 
we  had  the  appropriate  information  from  all  senses.  However, 
the  sheer  volume  of  visual  information  means  that  our  brains 
do  not  attempt  to  do  this.  Instead  there  is  a  part  of  your  brain 
that  predicts  where  it  “thinks”  your  body  is  and  what  it  is  feel¬ 
ing  based  on  previous  nerve  feedback  and  what  it  knows  the 
muscles  have  been  asked  to  do.  The  same  bit  of  the  brain  then 
monitors  what  actually  did  happen  (when  the  nerve  signals 
have  made  their  way  up  the  spinal  column  to  the  brain)  and 
gives  an  uncomfortable  or  shocked  sensation  when  a  mismatch 
occurs — for  example,  if  you  go  to  pick  something  up,  but  be¬ 
cause  of  poor  light  or  a  strange-shaped  object  you  touch  it  ear¬ 
lier  or  later  than  you  would  expect.  Tickling  is  also  connected 
with  this  lack  of  ability  to  predict  the  sensations  (this  is  why  it  is 
difficult  to  tickle  yourself). 

Race  conditions  also  occur  within  this  networked  system  of 
our  bodies — for  example,  getting  letter  inversions  whilst  typ¬ 
ing  where  signals  to  the  two  hands  get  processed  in  the  wrong 
order.  Brewster  and  Dix  (1994)  also  used  race  conditions  to  un¬ 
derstand  what  goes  wrong  in  certain  kinds  of  mis-hits  of  on¬ 
screen  buttons.  In  certain  circumstances  two  almost  simultane¬ 
ous  “commands”  from  our  brains  to  our  hands  to  release  the 
mouse  button  and  to  our  arms  to  move  to  a  new  mouse  loca¬ 
tion  can  get  out  of  order,  meaning  the  mouse  moves  out  of  the 
target  before  it  is  released.  This  analysis  allowed  us  to  design 
an  experiment  that  forced  this  very  infrequent  error  to  occur 
much  more  frequently  and  therefore  make  it  easier  to  assess 
potential  solutions. 
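
The mechanism can be pictured as two messages with different transit
times. The sketch below is only a schematic of that idea: the command
names, send times, and latencies are invented numbers, not measurements
from the study cited above.

```python
def arrival_order(commands):
    """commands: (name, time_sent_ms, latency_ms); return names in order of arrival."""
    return [name for name, sent, latency in
            sorted(commands, key=lambda c: c[1] + c[2])]


# Hypothetical timings: the release is sent first but travels a slower path.
commands = [
    ("release mouse button", 0, 120),
    ("move arm to next target", 20, 80),
]
order = arrival_order(commands)
```

Although the release command is issued first, the move command arrives
first, so the pointer has already left the button when the release
takes effect.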


NETWORKS  AS  PLATFORMS 

Algorithms  and  Architectures  for  Distributed  Interfaces 

User  interfaces  are  hard  enough  to  construct  on  a  single  ma¬ 
chine;  concurrent  access  by  users  on  networked  machines  is  a 
nightmare! 

Happily,  appropriate  algorithms,  architectures,  toolkits  and 
frameworks  can  help — a  bit. 

Accessing  Shared  Objects 

We saw earlier how race conditions within networked systems
can lead to inconsistencies within the user interface and within
the underlying data structures. Fortunately, there is a range of
techniques for dealing with this.

Locking

The  standard  technique,  used  in  databases  and  file  systems, 
for  dealing  with  multiple  accesses  to  the  same  data  object  is 
locking.  When  a  user’s  application  wants  to  update  a  particular 
database  record,  it  asks  the  database  manager  for  a  lock  on  the 
record.  It  can  then  get  a  copy  of  the  record  and  send  back  the 
update,  knowing  that  nothing  else  can  access  it  in  the  mean¬ 
time.  Users  are  typically  unaware  that  locking  is  being  per¬ 
formed — the  act  of  opening  a  file  or  opening  an  edit  form  for  a 
database  record  establishes  the  lock  and  later,  when  the  file  or 
the  edit  form  is  completed  and  closed,  the  lock  is  released. 

Although  this  is  acceptable  for  more  structured  domains 
there  are  problems  in  more  dynamic  domains  such  as  shared 
editing.  Locking  a  file  when  one  user  is  editing  it  is  no  good  as 
we  want  several  people  to  edit  the  same  file  at  the  same  time. 
In  these  cases  more  lightweight  forms  of  locking  can  be  used  at 
finer granularities: at paragraph, sentence, or even per-character
level. For example, the act of clicking over a paragraph to
set  a  text-entry  position  may  implicitly  request  a  paragraph 
lock,  which  is  released  when  you  go  on  to  edit  another  para¬ 
graph.  However,  implicit  and  informal  locks,  because  they  are 
not  apparent,  can  lead  to  new  problems.  For  example,  a  user 
may  click  on  a  paragraph  and  do  some  changes,  but  before 
moving  on  to  another  part  of  the  document,  get  interrupted. 
No  one  else  can  then  edit  the  paragraph.  To  avoid  this,  the 
more  informal  locks  are  often  time-limited  or  can  be  forcibly 
broken  by  the  server  if  another  user  requests  a  lock  on  the 
same  object. 
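
One way to realize time-limited locks is to treat each lock as a lease
with an expiry time. The sketch below is a minimal single-process
illustration under that assumption; the class name, the 30-second
lease, and the object identifiers are hypothetical, and the current
time is passed in explicitly to keep the example deterministic.

```python
class LeaseLockManager:
    """Paragraph locks that lapse after `lease_seconds` unless renewed."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self._locks = {}   # object id -> (holder, expiry time)

    def acquire(self, obj_id, user, now):
        """Grant or renew a lock; refuse if another user holds a live lease."""
        current = self._locks.get(obj_id)
        if current is not None and current[0] != user and current[1] > now:
            return False
        self._locks[obj_id] = (user, now + self.lease_seconds)
        return True

    def release(self, obj_id, user):
        """Explicit release, e.g. when the user moves to another paragraph."""
        current = self._locks.get(obj_id)
        if current is not None and current[0] == user:
            del self._locks[obj_id]


manager = LeaseLockManager(lease_seconds=30.0)
ann_gets_lock = manager.acquire("paragraph-3", "ann", now=0.0)
bob_too_early = manager.acquire("paragraph-3", "bob", now=10.0)
bob_after_expiry = manager.acquire("paragraph-3", "bob", now=31.0)
```

If Ann is interrupted and never releases the paragraph, Bob is blocked
only until her lease runs out rather than indefinitely.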

Replication 

In  collaboration  systems  such  as  Lotus  Notes/Domino  or 
source  code  systems  such  as  CVS,  users  do  not  lock  central 
copies  of  data,  but  instead  each  user  (or  possibly  each  site)  has 
their  own  replica  of  the  complete  Notes  database/CVS  tree.  Pe¬ 
riodically  these  replicas  are  synchronized  with  central  copies 
or  with  each  other.  Updates  can  happen  anywhere  by  anyone 
with  a  replica.  Conflicts  may  of  course  arise  if  two  people  edit 
the  same  note  between  synchronizations.  Instead  of  prevent¬ 
ing  such  conflicts,  the  system  (and  software  written  using  it) 
accepts that such conflicts will occur. When the replicas synchronize,
conflicts are detected and various (configurable) actions
may occur: flagging the conflicts to users, adding conflicting
copies as versions, and so on.
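
Conflict detection at synchronization time can be sketched as a
three-way comparison between each replica and the state at the last
synchronization. This is a generic illustration, not the mechanism used
by Notes or CVS: the function, the dictionary representation of notes,
and the sample contents are all assumptions.

```python
def synchronize(base, replica_a, replica_b):
    """Compare two replicas against the last synchronized state.

    Returns (merged, conflicts). A missing key means the note is absent."""
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(replica_a) | set(replica_b)):
        old, a, b = base.get(key), replica_a.get(key), replica_b.get(key)
        if a == b:
            value = a                 # both agree (unchanged or same edit)
        elif a == old:
            value = b                 # only replica B changed this note
        elif b == old:
            value = a                 # only replica A changed this note
        else:
            conflicts[key] = (a, b)   # both changed it: keep both versions
            continue
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"n1": "draft", "n2": "agenda"}
replica_a = {"n1": "draft v2", "n2": "agenda"}
replica_b = {"n1": "draft", "n2": "agenda (revised)", "n3": "new note"}
merged, conflicts = synchronize(base, replica_a, replica_b)
```

Independent edits merge silently; only notes changed differently in
both replicas are reported, and the caller then decides whether to flag
them to users or store both as versions.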

This  view  of  “replicate  and  worry  later”  is  essential  in  many 
mobile  applications,  as  attempts  to  lock  a  file  whilst  disconnected 
would  first  of  all  require  waiting  until  a  network  connection  could 
be  made,  and,  worse,  if  the  network  connection  is  lost  whilst  the 
lock  is  still  in  operation,  could  lead  to  files  being  locked  for  very 
long  periods.  Other  examples  of  replication  in  research  environ¬ 
ments  include  the  CODA  at  CMU,  which  allows  replication  of  a 
standard  UNIX  file  system  (Kistler  &  Satyanarayanan,  1992)  and 
Liveware,  a  contact-information  system  that  replicates  and  syn¬ 
chronizes  when  people  meet  in  a  manner  modeled  after  the 
spread of computer viruses (Witten, Thimbleby, Coulouris, &
Greenberg, 1991). Although Liveware is now quite a few years old
and was “spread” using synchronizing floppy disks, the same
principle is now being suggested for PDAs and other devices to
exchange information via IrDA or Bluetooth.

Optimistic Concurrency for Synchronous Editing

The  “do  it  now  and  see  if  there  are  conflicts  later”  approach 
is  called  “optimistic  concurrency,”  especially  in  more  synchro¬ 
nous  settings.  For  example,  in  a  shared  editing  system  the  like¬ 
lihood  of  two  users  editing  the  same  sentence  at  the  same  time 
is  very  low.  An  optimistic  algorithm  doesn’t  bother  to  lock  or 
otherwise  check  things  when  the  users  start  to  edit  in  an  area, 
but  in  the  midst  or  at  the  end  of  their  edits  checks  to  see  if  there 
are  any  conflicts,  and  attempts  to  fix  them. 

There  are  three  main  types  of  data  that  may  be  shared: 

•  Orthogonal  data — where  the  data  consists  of  attributes  of  in¬ 
dividual  objects/records  that  can  all  be  independently  updated 

•  Sequential  data — particularly  text,  but  any  form  of  list 
where  the  order  is  not  determined  by  an  attribute  property 

•  Complex  structural  data — such  as  directory  trees,  taxo¬ 
nomic  categories,  and  so  on. 

In  terms  of  complexity  for  shared  data,  these  are  in  increas¬ 
ing  difficulty. 

Orthogonal  data,  although  by  no  means  trivial,  is  the  sim¬ 
plest  case.  There  is  quite  a  literature  on  shared  graphical  edi¬ 
tors,  which  all  have  this  model:  independent  shapes  and  objects 
with  independent  attributes  such  as  color,  size,  or  position. 
When  merging  updates  from  two  users,  all  one  has  to  do  is  look 
at  each  attribute  in  turn  and  see  whether  it  has  been  changed  by 
only  one  user,  in  which  case  the  updated  value  is  used,  or  if  it 
has  been  changed  by  both,  in  which  case  either  the  last  update 
is  used  or  the  conflict  is  flagged. 
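
The attribute-by-attribute merge just described can be sketched as below. This is a minimal illustration, assuming each object is a flat dictionary of attributes and that the state both users started from is available; the function name `merge_attributes` and the `last_writer` parameter are hypothetical.

```python
def merge_attributes(base, a, b, last_writer=None):
    """Merge two users' updates to one object's independent attributes.

    base        -- attribute values both users started from
    a, b        -- attribute values after user A's and user B's edits
    last_writer -- "a" or "b" to resolve conflicts by last update,
                   or None to keep the old value and only flag them
    Returns (merged attributes, list of conflicting attribute names).
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(a) | set(b)):
        old, va, vb = base.get(key), a.get(key), b.get(key)
        if va == vb:            # unchanged, or both made the same change
            merged[key] = va
        elif va == old:         # only B changed it
            merged[key] = vb
        elif vb == old:         # only A changed it
            merged[key] = va
        else:                   # both changed it, differently
            conflicts.append(key)
            if last_writer == "a":
                merged[key] = va
            elif last_writer == "b":
                merged[key] = vb
            else:
                merged[key] = old
    return merged, conflicts


base = {"color": "red", "size": 10, "x": 0}
by_a = {"color": "blue", "size": 10, "x": 3}
by_b = {"color": "red", "size": 12, "x": 5}
merged, conflicts = merge_attributes(base, by_a, by_b, last_writer="b")
```

Here color and size merge cleanly because only one user touched each, while the position attribute is reported as a conflict and takes B's value because B is treated as the last writer.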

Structured  data  is  most  complicated.  What  do  you  do  if 
someone  has  created  a  new  file  in  directory  D,  but  at  the  same 
time  someone  else  has  deleted  the  directory?  I  know  of  no  op¬ 
timistic  algorithms  for  dealing  effectively  with  this  in  the  CSCW 
literature.  CODA  deals  with  directory  structures  (normal  UNIX 
file  system)  but  takes  a  very  simple  view  of  this  as  it  only  flags 
inconsistencies  and  doesn’t  attempt  to  fix  them. 

Algorithms  for  shared  text  editing  sit  somewhere  between  the 
two  and  have  two  slightly  different  problems,  both  relating  to  race 
conditions  when  two  or  more  users  are  updating  the  same  text: 

•  Dynamic  pointers — if  user  A  is  updating  an  area  of  text  in 
front  of  user  B,  then  the  text  user  B  is  editing  will  effectively 
move  in  the  document. 

•  Deep  conflict — what  happens  if  user  A  and  user  B’s  cursors 
are at the very same location and they perform
insertions/deletions?

Figure  14.13  shows  an  example  of  the  first  of  these  problems. 
The  deeper  conflict  occurs  when  both  cursors  are  at  the  same 
point, say after the “Y” in “XYZ.” Adonis types “A” and at the same
time  Beatrice  types  “B” — should  we  have  “XYABZ”  or  “XYBAZ,”  or 
perhaps even lose one or other character? Or if Adonis types “A”
and  Beatrice  presses  “delete,”  should  we  have  “XYZ”  or  “XAZ”? 

A number of algorithms exist for dealing with this (Sun & Ellis,
1998;  Mauve,  2000;  Vidot,  Cart,  Ferriz,  &  Suleiman,  2000),  includ¬ 
ing  a  retrofit  to  Microsoft  Word  called  CoWord  (Xia,  Sun,  Sun, 
Chen,  &  Shen,  2004).  Most  of  these  stem  from  the  dOPT  algorithm 
used  in  the  Grove  editor  (Ellis  &  Gibbs,  1989).  These  algorithms 
work  by  having  various  transformations  that  allow  you  to  reorder 
operations.  For  example,  if  we  have  two  insertions  (labeled  “a”  and 
“b”)  performed  at  the  same  time  at  different  locations 

a)  insert  texta  at  location  n 

b)  insert  textb  at  location  m 

but  decide  to  give  insert  a  preference,  then  we  have  to  trans¬ 
form  b  to  b'  as  follows: 

b')  if (m < n)  insert textb at location m                      (i)
     if (m = n)  insert textb at location m                      (ii)
     if (m > n)  insert textb at location m + length(texta)      (iii)


FIGURE 14.14. Multi-user transformations (panels: optimistic concurrency; group undo).


Case  (i)  says  that  if  the  location  of  insert  “b”  is  before  insert 
“a”  you  don’t  have  to  worry.  Case  (iii)  says  that  if  it  is  after  in¬ 
sert  “a”  you  have  to  shift  your  insert  along  accordingly.  Case  (ii) 
is  the  difficult  one  where  a  conflict  occurs  and  has  to  be  dealt 
with  carefully  to  ensure  that  the  algorithm  generates  the  same 
results  no  matter  where  it  is.  The  version  above  would  mean 
that  B’s  cursor  gets  left  behind  by  A’s  edit.  The  alternative  would 
be  to  make  case  (ii)  the  same  as  case  (iii)  which  would  mean 
B’s  cursor  would  be  pushed  ahead  of  A’s. 
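
The three cases can be written out as a short sketch. This is an illustration of the rule given above for two insertions, not the dOPT algorithm or any published variant; the names `Insert`, `apply_insert`, `transform`, and the `push_ahead_on_tie` flag are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply_insert(doc, op):
    """Apply an insertion to a document held as a plain string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(b, a, push_ahead_on_tie=False):
    """Rewrite insert b so it can be applied after insert a (a preferred)."""
    if b.pos < a.pos:
        return b  # case (i): b lies before a, nothing to adjust
    if b.pos == a.pos and not push_ahead_on_tie:
        return b  # case (ii) as written above: b stays at location m
    # case (iii), or case (ii) treated like (iii): shift past a's text
    return Insert(b.pos + len(a.text), b.text)


# The deep conflict: both cursors are after the "Y" in "XYZ".
a = Insert(2, "A")
b = Insert(2, "B")
after_a = apply_insert("XYZ", a)                                  # "XYAZ"
left_behind = apply_insert(after_a, transform(b, a))              # "XYBAZ"
pushed_ahead = apply_insert(after_a, transform(b, a, True))       # "XYABZ"
```

For all sites to end up with the same text, the site that applied b first has to transform a with the complementary tie-breaking choice; that part is not shown here.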

In  early  work  in  this  area  I  proposed  regarding  dynamic 
pointers  as  first-class  objects  and  using  these  in  all  representa¬ 
tions  of  actions  (Dix,  1991, 1995).  This  means  that  rules  like  case 
(i)  and  case  (iii)  happen  “for  free,”  but  the  deep  conflict  case  still 
needs  to  be  dealt  with  specially. 

Groupware  Undo 

The  reason  that  undo  is  complicated  in  groupware  is  similar 
to  the  problems  of  race  conditions  in  optimistic  concurrency. 

In  the  case  of  optimistic  concurrency  user  A  has  performed 
action  a  and  user  B  has  performed  action  b,  both  on  the  same 
initial  state.  The  problem  is  to  transform  user  B’s  action  into  one 
b'  that  can  be  applied  to  the  state  after  action  a  yet  still  mean 
the “same thing” as the original action b.

In  the  case  of  groupware  undo  we  may  have  the  situation 
where  user  A  has  performed  action  a,  followed  by  user  B  per¬ 
forming  action  b,  then  user  A  decides  to  undo  action  a.  How 
do  we  transform  action  b  so  that  the  transformed  b'  means  the 
same  before  action  a  as  b  did  after? 

Similar,  but  slightly  different  transformation  rules  can  be  pro¬ 
duced  for  the  case  of  undo  and  dynamic  pointers  can  be  used 
for  most  cases. 
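
As a hedged illustration of what such an undo rule might look like for two insertions, the sketch below re-expresses B's insertion as if A's earlier insertion had never happened. The function names are hypothetical, and the treatment of an insertion that falls inside A's text (collapsing it to the point where A's text began) is only one of several reasonable choices.

```python
def insert(doc, pos, text):
    """Insert text into a document held as a plain string."""
    return doc[:pos] + text + doc[pos:]


def transform_for_undo(m, n, len_a):
    """Map b's location m (in the state after a) to the state before a.

    a inserted len_a characters at location n.
    """
    if m <= n:
        return m            # b is before a's text: unaffected
    if m >= n + len_a:
        return m - len_a    # b is after a's text: shift back
    return n                # b is inside a's text: a judgment call


def undo_earlier_insert(original, a, b):
    """Return the document with a undone but b's effect preserved.

    a = (n, texta) applied to the original document
    b = (m, textb) applied to the state after a
    """
    n, text_a = a
    m, text_b = b
    return insert(original, transform_for_undo(m, n, len(text_a)), text_b)


# A inserts "AA" after "X"; B then inserts "B" after "Y"; A undoes.
with_both = insert(insert("XYZ", 1, "AA"), 4, "B")         # "XAAYBZ"
after_undo = undo_earlier_insert("XYZ", (1, "AA"), (4, "B"))  # "XYBZ"
```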

As  with  optimistic  concurrency  there  is  slightly  more  work 
on group undo in shared graphical editors, where the orthogonal
data makes conflicts easier to handle (Berlage & Spenke, 1992).

Real  Solutions? 

Although  these  various  algorithms  can  ensure  there  is  no  in¬ 
ternal  inconsistency  and  that  all  participants  see  the  same  thing, 
they  do  not  necessarily  solve  all  problems.  Look  again  at  the  case 
of  Alison  and  Brian’s  chat  earlier  in  the  chapter.  Certainly  in  the 
case of group undo, when Abowd and Dix (1992) published the
first paper on the topic, we proposed various solutions, but also
recommended  that  as  well  as  an  explicit  undo  button,  systems 
ought  to  provide  sufficient  history  to  allow  users  to  recreate  what 
they  want  to  without  using  the  undo  button.  This  is  not  because 
it  is  impossible  to  find  a  reasonable  meaning  for  the  undo  button, 
but  because  in  the  case  of  group  undo,  there  are  several  reason¬ 
able  meanings.  Choosing  the  meaning  a  user  intends  is  impossi¬ 
ble  and  so  it  may  sometimes  be  better  not  to  guess. 

Architectures  for  Networked  Systems 

Software  architecture  is  about  choosing  what,  in  terms  of  code 
and  functionality,  goes  where.  For  applications  on  a  single  ma¬ 
chine  these  are  often  logical  distinctions  between  parts  of  the 
code.  For  networked  systems,  “where”  includes  physical  loca¬ 
tion,  and  the  choice  of  location  makes  an  enormous  difference 
to  the  responsiveness  of  the  system. 

The  simplest  systems  are  usually  centralized  client-server 
architectures  in  which  the  majority  of  computation  and  data  stor¬ 
age  happens  in  the  central  server.  Many  of  the  problems  of  race 
conditions  and  potential  inconsistencies  disappear.  However,  this 
means that every interaction requires a network round trip to
the server, so feedback may be very slow. The opposite
extreme is replicated peer-peer architectures, where all the
code is running on users’ own PCs, and the PCs communicate
directly with one another. Feedback can now be instantaneous,
but the algorithms needed to maintain consistency, catch up
late joiners, and so on, can be very complex. Most systems
operate  somewhere  between  these  two  extremes  with  a  central 
“golden  copy”  of  shared  data,  but  with  some  portion  of  the  data 
on  individual  PCs,  PDAs,  or  other  devices  in  order  to  allow  rapid 
feedback.  A  notable  exception  to  this  is  peer-peer  sharing  net¬ 
works  such  as  Gnutella  where  the  only  central  resource  is  a  sort 
of  switchboard  to  allow  client  programs  to  contact  one  another. 
The  reason  for  this  is  largely  legal  as  these  are  often  used  to  share 
copyrighted media! One reason this works is that the data is static;
the  song  “Yesterday”  is  the  same  now  as  when  it  was  first 
recorded  in  1965 — although  given  it  has  the  greatest  number  of 
covers  of  any  song  this  is  perhaps  not  the  best  example! 

In  web  applications  options  are  constrained  by  the  features 
allowed  in  HTML  and  web  browsers.  Note  that  even  a  web  form 
is  allowing  some  local  interaction  (filling  in  the  form)  as  well  as 
some  centralized  interaction  (submitting  it).  Applets  allow  more 
interaction,  but  the  security  limitations  mean  that  they  can  only 
talk  back  to  the  server  where  they  originated.  Thus  true  peer- 
peer  architectures  are  impossible  on  the  web,  but  can  be  emu¬ 
lated  by  chat  servers  and  similar  programs  that  relay  messages 
between  clients.  For  Intranets  it  is  easier  to  configure  browsers 
so  that  they  accept  applets  as  trusted  and  thus  with  greater  net¬ 
work  privileges,  or  to  include  special  plug-in  components  to 
perform  more  complicated  actions. 

Mobile  systems  have  yet  more  issues  as  they  need  to  be  ca¬ 
pable  of  managing  disconnected  operation  (when  they  have  no 
physical  or  wireless  connection  to  the  network).  This  certainly 
means  keeping  cached  copies  of  central  data.  For  example,  the 
AvantGo  browser  for  PDAs  downloads  copies  of  web  pages  onto 
a  PDA  and  synchronizes  these  with  their  latest  versions  every 
time the PDA is docked into a desktop PC (AvantGo, 2001). Although
this is now beginning to change with higher-bandwidth
mobile-data  services,  it  is  still  the  case  that  access  whilst  mobile 
is  slower  and  more  expensive  than  whilst  using  a  fixed  connec¬ 
tion.  This  pushes  one  towards  replicated  architectures  with  ma¬ 
jor  resynchronization  when  connected  via  cheap/fast  connec¬ 
tions  and  more  on-demand  or  priority-driven  synchronization 
when  on  the  move. 
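
One way to picture this policy is the sketch below. It is an assumption-laden illustration rather than a description of any real product: the link labels, the numeric priorities, and the threshold are all hypothetical.

```python
def plan_sync(pending, link, min_priority_when_mobile=5):
    """Choose which pending items to synchronize over the current link.

    pending -- list of (priority, name) pairs, higher priority first
    link    -- "fixed" for a cheap/fast connection, anything else is
               treated as a slow or expensive mobile connection
    Returns the names to synchronize now, most important first.
    """
    if link == "fixed":
        chosen = list(pending)  # major resynchronization: take everything
    else:
        chosen = [item for item in pending
                  if item[0] >= min_priority_when_mobile]
    chosen.sort(key=lambda item: -item[0])
    return [name for _, name in chosen]


pending = [(1, "photo archive"), (9, "calendar"), (5, "mail headers")]
docked = plan_sync(pending, "fixed")
on_the_move = plan_sync(pending, "mobile")
```

When docked everything is synchronized; on the move only the calendar and mail headers are, and the photo archive waits for a cheap connection.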

Supporting  Infrastructure 

In  order  to  help  manage  networked  applications  various  types 
of  supporting  infrastructure  are  being  developed. 

Awareness  Servers 

These  keep  track  of  which  users  are  accessing  particular  re¬ 
sources  (e.g.,  visiting  a  particular  web  page)  so  that  you  can  be 
kept  informed  as  to  whether  others  are  “near”  you  in  virtual 
space, or whether friends are online and active (Palfreyman &
Rodden, 1996; ICQ, 2001; SUN Microsystems, 2001).

Notification  Servers 

These  serve  a  similar  role  for  data,  allowing  client  programs 
to register an interest in particular pieces of shared data. When
the data is modified or accessed the interested parties are
“notified.” For example, you may be told when a web page has
changed  or  when  a  new  item  has  been  added  to  a  bulletin  board. 
Some  notification  servers  also  manage  the  shared  data  (Patter¬ 
son,  Day,  &  Kucan,  1996)  whilst  others  are  “pure,”  just  managing 
the job of notification (Ramduny, Dix, & Rodden, 1998).
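
The register-then-notify pattern of a “pure” notification server can be sketched in a few lines. This is an in-process illustration only, with no networking; the class and method names are hypothetical and do not correspond to any of the cited systems.

```python
from collections import defaultdict


class NotificationServer:
    """Hypothetical "pure" notification server: it stores no shared data,
    only who is interested in which resource."""

    def __init__(self):
        self._interests = defaultdict(list)  # resource name -> callbacks

    def register(self, resource, callback):
        """Record that a client wants to hear about changes to resource."""
        self._interests[resource].append(callback)

    def publish(self, resource, change):
        """Tell every interested client about a change.

        Returns the number of clients notified.
        """
        callbacks = list(self._interests.get(resource, []))
        for callback in callbacks:
            callback(resource, change)
        return len(callbacks)


server = NotificationServer()
inbox = []
server.register("bulletin-board", lambda res, change: inbox.append((res, change)))
notified = server.publish("bulletin-board", "new item added")
ignored = server.publish("some-other-page", "page changed")
```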

Event/Messaging  Systems 

These  allow  different  objects  in  a  networked  environment 
to  send  messages  to  one  another  in  ways  which  are  more  con¬ 
venient  than  those  allowed  by  the  raw  network.  For  example, 
they  may  allow  messages  to  be  sent  to  objects  based  on  location- 
independent  names,  so  that  objects  don’t  have  to  know  where 
each  other  are. 

Resource  Discovery 

Systems  such  as  the  Java  JINI  framework  and  Universal  Plug- 
and-Play  allow  devices  to  find  out  about  other  devices  close  to 
them — for  example,  the  local  printer — and  configure  them¬ 
selves  to  work  with  one  another.  As  ubiquitous  and  mobile  de¬ 
vices  multiply,  this  will  become  increasingly  important. 

Web Service and Remote Procedure Call

Web  service  frameworks  such  as  SOAP  or  XML-RPC,  or  non¬ 
web  remote  procedure  mechanisms  such  as  Java’s  RMI  or  CORBA, 
allow  applications  on  different  machines  to  easily  connect  to 
one another even though they may be written in different
programming languages and run on different operating systems.
In all cases the parameters or arguments have to be serialized
into a binary or ASCII text stream to be sent. As Fig. 14.15 shows
these  are  designed  to  be  read  by  machines,  not  people! 
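
The serialization step can be seen directly with Python's standard `xmlrpc.client` module, used here as a stand-in for XML-RPC in general. The method name `sayHelloTo` and its argument are borrowed from Fig. 14.15; nothing is sent over a network in this sketch.

```python
import xmlrpc.client

# Marshal a call into the XML text that would travel over the wire.
payload = xmlrpc.client.dumps(("Hello World",), methodname="sayHelloTo")

# The receiving side turns the text back into a method name and arguments.
params, method = xmlrpc.client.loads(payload)

print(payload)
```

Printing `payload` shows a nest of XML elements around the single string argument, which makes the same point as the figure: the format is convenient for programs, not for people.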


HISTORY,  FUTURES,  AND  PARADIGM  SHIFTS 


History 

Given  the  anarchic  image  of  the  web,  it  is  strange  that  the  In¬ 
ternet  began  its  development  as  a  United  States  military  project. 
The  suitability  of  the  Internet  for  distributed  management  and 
independent  growth  stems  not  from  an  egalitarian  or  anti¬ 
centralist  political  agenda,  but  from  the  need  to  make  the  net¬ 
work  resilient  to  nuclear  attack,  with  no  single  point  of  failure. 

In  the  1970s  when  the  Internet  was  first  developing,  it  and 
other  networks  were  mainly  targeted  at  connecting  together 
large  computers.  It  was  during  the  1980s,  with  the  rise  of  per¬ 
sonal  computing,  that  local  networks  began  to  become  popular. 

However,  even  before  that  point,  very  local  networks  at  the 
lab-bench  level  had  been  developed  to  link  laboratory  equip¬ 
ment — for  example,  the  IEEE488  designed  originally  to  link 
Hewlett  Packard’s  proprietary  equipment  and  then  becoming 
an  international  standard.  Ethernet  too  began  life  in  commercial 
development  at  Xerox  before  becoming  the  de  facto  standard 
for  local  networking. 


Timeline: Key Events for the Internet

1968      First proposal for ARPANET
          - military and government research contracted to
            Bolt, Beranek, and Newman
1971      ARPANET enters regular use
1973/4    Redesign of lower level protocols leads to TCP/IP
1983      Berkeley TCP/IP implementation for 4.2BSD
          - public domain code
1980s     Rapid growth of NSFNET
          - broad academic use
1990s     WWW and widespread public access to the Internet
2000      WAP on mobile phones
          - Web transcends the Internet

Although  it  is  technical  features  of  the  Internet  (decentral¬ 
ized,  resilient  to  failures,  hardware  independent)  that  have 
made  it  possible  for  it  to  grow,  it  is  the  web  that  has  made  it  be¬ 
come  part  of  popular  consciousness.  Just  as  strange  as  the  In¬ 
ternet’s  metamorphosis  from  military  standard  to  anarchic  cult, 
is  the  web’s  development  from  medium  of  exchange  of  large 
high-energy  physics  datasets  to  e-commerce,  home  of  alterna¬ 
tive  web  pages  and  online  sex! 


<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Header>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <ns1:sayHelloTo
        xmlns:ns1="Lookup"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <name xsi:type="xsd:string">Hello World</name>
    </ns1:sayHelloTo>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>


FIGURE 14.15. SOAP XML encoding of Lookup ("Hello World").

Paradigm Shift

During  the  1970s  through  to  the  mid-1990s,  networks  were  a 
technical  phenomenon,  enabling  many  aspects  of  business  and 
academic  life,  but  with  very  little  public  impact.  However,  this 
has  changed  dramatically  over  the  last  five  years  and  now  we 
think  of  a  networked  society.  The  Internet  and  other  network 
technologies,  such  as  SMS  text  messages,  are  not  only  trans¬ 
forming  society  but,  at  a  popular  and  cultural  level,  defining 
an  era. 

International  transport,  telecommunications,  and  broadcast¬ 
ing  had  long  before  given  rise  to  the  term  global  village.  How¬ 
ever,  it  seems  this  was  more  a  phrase  waiting  for  a  meaning.  Un¬ 
til  recently  the  global  village  was  either  parochial  (telephoning 
those  you  already  know)  or  sanitized  (views  of  distant  cultures 
through the eyes of the travel agent or television camera). It is
only  now  that  we  see  chat  rooms  and  web  homepages  allowing 
new  contacts  and  friendship  around  the  world  or  at  least 
amongst  those  affluent  enough  to  have  Internet  access. 

Markets  too  have  changed  due  to  global  networks.  It  is  not 
only the transnationals who can trade across the world;
even my father-in-law runs a thriving business selling antiques
through  eBay. 

Marketing  has  also  had  to  face  a  different  form  of  cross- 
cultural  issue.  Although  selling  the  same  product,  a  hoarding 
in  Karachi  may  well  be  different  from  an  advert  in  a  magazine 
in  Kentucky,  reflecting  the  different  cultural  concerns.  Global 
availability  of  web  pages  changes  all  that.  You  have  to  create  a 
message  that  appeals  to  all  cultures,  a  tall  order.  Those  who  try 
to  replicate  the  targeting  of  traditional  media  by  having  several 
country-specific  websites  may  face  new  problems;  the  global 
access  to  even  these  country-specific  pages  means  that  the  res¬ 
idents  of  Kentucky  and  Karachi  can  compare  the  different  ad¬ 
verts  prepared  for  them,  and  in  so  doing  see  how  the  company 
views  them  and  their  cultures. 

Economics  drives  much  of  popular,  as  well  as  business,  de¬ 
velopment  of  networked  society.  One  of  the  most  significant 
changes  in  the  United  Kingdom  in  recent  years  was  due  to 
changes  in  charging  models.  In  the  United  States,  local  calls 
have  long  been  free  and  hence  so  were  Internet  connections  to 
local  points  of  presence.  The  costs  of  Internet  access  in  the 
United  Kingdom  (and  even  more  important,  the  perception  of 
the  cost)  held  back  widespread  use.  The  rise  of  free  or  fixed- 
charge  unmetered  access  changed  nearly  overnight  the  ac¬ 
ceptability  and  style  of  use.  Internet  access  used  to  be  like  a 
lightning  guerrilla  attack  into  the  web  territory,  quick  in 
and  out  before  the  costs  mounted  up,  but  is  now  a  full-scale 
occupation. 

The  need  for  telecoms  companies  across  the  world  to  re¬ 
cover  large  investments  in  wireless-band  franchises  combined 
with  use  rather  than  connection-based  charging  made  possible 
by  GPRS  and  third-generation  mobile  services,  make  it  likely 
that  we  will  see  a  similar  growth  in  mobile  access  to  global  net¬ 
worked  information  and  services. 

In an article in 1998, I used the term PopuNET to refer to a
change  in  society  that  was  not  yet  there,  but  would  come.  Pop¬ 
uNET  is  characterized  by  network  access 

everywhere,  everywhen  by  everyone 


This  pervasive,  permanent,  popular  access  is  similar  to  the 
so-called  “Martini  principle”  applied  more  recently  to  mobile 
networking — anytime,  anyplace,  anywhere.  Of  course,  Martini 
never  pretends  to  be  anything  but  exclusive,  so  not  surpris¬ 
ingly  these  differ  on  the  popular  dimension!  “Anyplace,  any¬ 
where”  does  correspond  to  the  pervasive  “everywhere”  and 
“anytime”  to  the  permanent  “everywhen.”  However,  there  is  a 
subtle difference, especially between “anytime” and
“everywhen.” “Anytime” means that at any time you choose you can
connect.  “Everywhen”  means  that  at  all  places  and  all  times 
you  are  connected.  When  this  happens,  one  ceases  to  think 
of  “connectedness”  and  it  simply  becomes  part  of  the  back¬ 
drop  of  life. 

The  changes  in  charging  models  have  brought  the  United 
Kingdom  closer  to  “everywhen”  and  the  United  States  and  many 
other  parts  of  the  world  are  already  well  down  the  path.  Always- 
on  mobile  connectivity  will  reinforce  these  changes. 

PopuNET  will  demand  new  interfaces  and  products,  not  just 
putting  web  pages  on  TV  screens  or  spreadsheets  on  fridge 
doors.  What  these  new  interfaces  will  be  is  still  uncertain. 


Futures  (Near) 

It  is  dangerous  to  predict  far  into  the  future  in  an  area  as  volatile 
as  this.  One  development  that  is  already  underway,  which  will 
make  a  major  impact  on  user  interfaces,  is  short-range  net¬ 
working  which  will  enable  various  forms  of  wearable  and  ubiq¬ 
uitous  networks.  Another  is  the  introduction  of  network  appli¬ 
ances,  which  will  make  the  home  “alive”  in  the  network. 

We  have  considered  network  aspects  of  continuous  media 
at  length.  The  fact  that  the  existing  Internet  TCP/IP  protocols  do 
not  enable  guaranteed  quality-of-service  will  put  severe  limits  on 
its  ability  to  act  as  an  infrastructure  for  services  such  as  video- 
on-demand  or  video-sharing  between  homes.  The  update  to 
TCP/IP  which  has  been  under  development  for  several  years, 
IPv6,  will  allow  prioritized  traffic,  but  it  falls  short  of  real  guar¬ 
anteed  QoS  (IPng,  2001). 

It seems that this is an impasse. IPv6 has taken so long not
because of technical difficulty, but because of backwards
compatibility and the problems of uptake on the existing
worldwide infrastructure, so evolutionary change is hard.
However, revolutionary change is also hard; one cannot easily
establish a new parallel international infrastructure overnight.
Or  perhaps  one  can. 

Mobile  phone  services  have  started  with  an  infrastructure 
designed  for  continuous  voice  and  are,  through  a  series  of 
quite  dramatic  changes,  moving  this  towards  a  fully  mixed 
media/data service, and it is a global network. Furthermore,
more  and  more  non-web-based  Internet  services  are  using 
HTTP,  the  web  protocol,  to  talk  to  one  another  in  order  to 
be  “firewall  friendly.”  This  means  that  mobile  phones  and 
PDAs  can  be  web-connected  without  being  Internet-con¬ 
nected.  So  perhaps  the  future  is  not  an  evolution  of  Internet 
protocols,  but  a  gradual  replacement  of  the  Internet  by  nth- 
generation  mobile  networks.  Instead  of  the  Internet  on 
phones,  we  may  even  see  mobile  phone  networking  stan¬ 
dards  being  used  over  wires. 


INTRODUCTION


Computers  have  become  a  primary  tool  for  office  workers,  al¬ 
lowing  them  to  access  the  information  they  need  to  perform 
their  jobs;  however,  accessing  information  is  more  difficult  for 
more  mobile  users.  With  current  computer  interfaces,  the  user 
must  focus  both  physically  and  mentally  on  the  computing  de¬ 
vice  instead  of  the  environs.  In  a  mobile  environment,  such  in¬ 
terfaces  may  interfere  with  the  user’s  primary  task.  Many  mobile 
tasks could, however, benefit from computer support. Our focus
is the design of wearable computers that augment, rather than
interfere with, the user’s tasks. Carnegie Mellon University’s
VuMan 3 project provides an example of how the introduction
of wearable computing to a task can reap many rewards.


VUMAN  3  MAINTENANCE  INSPECTION 
WEARABLE  COMPUTER 


Many  maintenance  activities  begin  with  an  inspection,  during 
which  problems  are  identified.  Job  orders  and  repair  instructions 
are  generated  from  the  results  of  the  inspection.  The  VuMan  3 
wearable  computer  was  designed  for  streamlining  Limited  Tech¬ 
nical  Inspections  (LTI)  of  amphibious  tractors  for  the  U.S. 
Marines  at  Camp  Pendleton,  California  (Smailagic,  Siewiorek, 
Martin,  &  Stivoric,  1998).  The  LTI  is  a  600-element,  50-page 


checklist  that  usually  takes  four  to  six  hours  to  complete.  The 
inspection  includes  an  item  for  each  part  of  the  vehicle  (e.g., 
front left track, rear axle, windshield wipers, etc.). VuMan 3
created an electronic version of this checklist. The system’s
interface was arranged as a menu hierarchy, and a physical dial and
selection buttons controlled navigation. The top level consisted of
a  menu  that  gave  a  choice  of  function.  Once  the  inspection  func¬ 
tion  was  chosen,  the  component  being  inspected  was  selected 
by  its  location  on  the  vehicle.  At  each  stage,  the  user  could  go 
up  one  level  of  the  hierarchy. 

The  inspector  selects  one  of  four  possible  options  about  the 
status  of  the  item:  (a)  Serviceable,  (b)  Unserviceable,  (c)  Miss¬ 
ing,  or  (d)  On  Equipment  Repair  Order  (ERO).  Further  ex¬ 
planatory  comments  about  the  item  can  be  selected  (e.g.,  the 
part  is  unserviceable  due  to  four  missing  bolts). 

The  LTI  checklist  consists  of  a  number  of  sections,  with  ap¬ 
proximately  one  hundred  items  in  each  section.  The  user  se¬ 
quences  through  each  item  by  using  the  dial  to  select  “Next  Item,” 
or  “Next  Field.”  A  “smart  cursor”  helps  automate  some  of  the  nav¬ 
igation  by  positioning  the  user  at  the  next  most  likely  action. 

As  part  of  the  design  process,  a  field  study  was  performed.  In 
typical troubleshooting tasks, one Marine would read the
maintenance manual to a second Marine, who performed the inspection.
With the VuMan 3, only one Marine is needed for the task, as that
Marine  has  the  electronic  maintenance  manual.  Thus,  the  physical 
manual  does  not  have  to  be  carried  into  hard-to-reach  places. 

FIGURE 15.1. VuMan 3 Savings Factors (current practice compared with VuMan 3 field trials; inspection time: 70% less).

The most unanticipated result was a 40% reduction in inspection
time. The bottom right image of Fig. 15.1 demonstrates the reason
for this result. Here, the Marine is on his or her side
looking  up  at  the  bottom  of  the  amphibious  tractor.  In  such 
places,  it  is  hard  to  read  or  write  on  the  clipboard  typically  used 
for  inspections.  The  Marine  constantly  gets  into  position,  crawls 
out  to  read  instructions,  crawls  back  into  position  for  the  in¬ 
spection,  and  then  crawls  out  again  to  record  the  results.  In  ad¬ 
dition,  the  Marine  tends  to  do  one  task  at  a  time  when  the  Ma¬ 
rine  might  have  five  things  to  inspect  in  one  place.  This  extra 
motion  has  a  major  impact  on  the  time  required  to  do  a  task. 
By  making  information  truly  portable,  wearable  computers  can 
improve  the  efficiency  of  this  application  and  many  other  simi¬ 
lar  ones. 

The second form of time savings with the VuMan 3 occurred
when the inspection was finished. The wearable computer
required a couple of minutes to upload its data to the logistics
computer.  The  manual  process,  however,  required  a  typist  to 
enter  the  Marine’s  handwritten  text  into  the  computer.  Given 
that  the  soldier  may  have  written  the  notes  in  cold  weather  while 
wearing  gloves,  the  writing  may  require  some  interpretation. 
This  manual  process  represents  another  30%  of  the  time. 

Such  redundant  data  entry  is  common  when  users  are  mobile 
(Starner,  Snoeck,  Wong,  &  McGuire,  2004).  There  are  numerous 
checklist-based  applications  including  plant  operations,  pre¬ 
flight  checkout  of  aircraft,  inventory,  and  so  forth  that  may  ben¬ 
efit  from  a  form-filling  application  run  on  a  wearable  computer. 
In the case of the VuMan 3 project, the results were striking.
From  the  time  the  inspection  was  started  until  the  data  was  en¬ 
tered  into  the  logistics  computer,  70%  of  the  time  was  saved  by 
using  the  wearable.  There  was  a  potential  savings  by  reducing 
maintenance  crews  from  two  to  one.  Finally,  there  was  also  a 
savings  in  weight  over  paper  manuals. 


THE  WEARABLE  COMPUTING  CAMP 


Designing  wearable  computer  interfaces  requires  attention  to 
many  different  factors  due  to  their  closeness  to  the  body  and 
their  use  while  performing  other  tasks.  For  the  purposes  of  dis¬ 
cussion,  we  have  created  the  “CAMP”  framework,  which  consists 
of  the  following  factors: 

Corporal:  Wearables  should  be  designed  to  interface  phys¬ 
ically  with  the  user  without  discomfort  or  distraction. 

Attention:  Interfaces  should  be  designed  for  the  user’s  di¬ 
vided  attention  between  the  physical  and  virtual  worlds. 

Manipulation: When mobile, users lose some of the dexterity
assumed by desktop interfaces. Controls should be
quick  to  find  and  simple  to  manipulate. 

Perception:  A  user’s  ability  to  perceive  displays,  both  visual 
and  audio,  is  also  reduced  while  mobile.  Displays  should 
be  simple,  distinct,  and  quick  to  navigate. 

Power,  heat,  on-body  and  off-body  networking,  privacy,  and 
many  other  factors  also  affect  on-body  computing  (Starner, 
2001).  Many  of  these  topics  are  the  subjects  of  current  research, 
and  much  work  will  be  required  to  examine  how  these  factors 
interrelate.  Due  to  space  constraints,  we  will  concentrate  mainly  on  CAMP 
principles  and  practice  in  the  remainder  of  this  chapter. 


CORPORAL:  DESIGN  GUIDES 
FOR  WEARABILITY 


The  term  wearable  implies  the  use  of  the  human  body  as  a  sup¬ 
port  environment  for  the  object  described.  Society  has  histori¬ 
cally  evolved  its  tools  and  products  into  more  portable,  mobile, 
and  wearable  form  factors.  Clocks,  radios,  and  telephones  are 
examples  of  this  trend.  Computers  are  undergoing  a  similar  evo¬ 
lution.  Simply  shrinking  computing  tools  from  the  desktop  par¬ 
adigm  to  a  more  portable  scale  does  not  take  advantage  of  a 
whole  new  context  of  use.  While  it  is  possible  to  miniaturize 
keyboards,  human  evolution  has  not  kept  pace  by  shrinking  our 
fingers.  There  is  no  Moore’s  Law  for  humans.  The  human  anat¬ 
omy  introduces  minimal  and  maximal  dimensions  that  define 
the  shape  of  wearable  objects,  and  the  mobile  context  also  de¬ 
fines  dynamic  interactions.  Conventional  methods  of  interac¬ 
tion,  including  keyboard,  mouse,  joystick,  and  monitor,  have 
mostly  assumed  a  fixed  physical  relationship  between  user  and 
device.  With  wearable  computers,  the  user’s  physical  context 
may  be  constantly  changing.  Symbol’s  development  of  a  wear¬ 
able  computer  for  shipping  hubs  provides  an  example  of  how 
computing  must  be  adapted  for  the  human  body. 

As  a  company,  Symbol  is  well  known  for  its  barcode  technology; 
however,  it  is  also  one  of  the  first  successful  wearable  computer 
companies,  having  sold  over  100,000  units  from  its  WSS  1000 
line  of  wearable  computers  (see  Fig.  15.2).  The  WSS  1000 
consists  of  a  wrist-mounted  wearable  computer  that  features  a 
laser  barcode  scanner  encapsulated  in  a  ring  worn  on  the  user’s 
finger.  This  configuration  allows  the  user  to  scan  barcodes  while 
keeping  both  hands  free  to  manipulate  the  item  being  scanned. 
Because  the  user  no  longer  has  to  fumble  with  a  desk-tethered 
scanner,  these  devices  increase  the  speed  at  which  the  user  can 
manipulate  packages  and  decrease  the  overall  strain  on  the 
user’s  body.  Such  features  are  important  in  shipping  hubs, 
where  millions  of  packages  are  scanned  by  hand  every  year. 
Symbol  spent  over  U.S.  $5  million  and  devoted  40,000  hours  of 
testing  to  develop  this  new  class  of  device,  and  one  of  the  major 
challenges  was  adapting  the  computer  technology  to  the 
needs  of  the  human  body  (Stein,  Ferrero,  Hetfield,  Quinn,  & 
Krichever,  1998). 


FIGURE  15.2.  Symbol's  WSS  1000  series  wrist-mounted  wearable 
computer  with  ring  scanner. 

One of the first observations made was that users vary widely
in shape and size. Specifically, Symbol’s scanner had to fit the
fingers of both large men and small women. Similarly, the wrist
unit had to be mounted on both large and small wrists. Even
though the system’s wires were designed to be unobtrusive, the
system had to be designed to break away if the wires became
entangled and subjected to strain. This policy provided a
safeguard for the user.

Initial  testing  discovered  other  needs  that  were  obvious  in 
hindsight.  For  example,  the  system  was  strapped  to  the  user’s 
forearm  while  the  user  exerted  him-  or  herself  moving  boxes. 
Soon,  the  “soft-good”  materials,  which  were  designed  for  the 
comfort  of  the  user,  became  sodden  with  sweat.  After  one  shift, 
the  user  was  expected  to  pass  the  computer  to  the  operator  on 
the  next  shift.  Not  only  was  the  sweat-laden  computer  mount 
considered  “gross,”  it  also  presented  a  possible  health  risk.  This 
problem  was  solved  by  separating  the  computer  mount  from 
the  computer  itself.  Each  user  received  his  or  her  own  mount, 
which  could  be  kept  adjusted  to  the  user’s  own  needs.  After 
each  shift,  the  computer  could  be  removed  from  the  user’s 
mount  and  placed  in  the  replacement  user’s  mount. 

Another  unexpected  discovery  was  that  the  users  tended  to 
use  the  computer  as  body  armor.  When  a  shipping  box  would 
begin  to  fall  on  the  user,  the  user  would  block  the  box  with  the 
computer  mounted  on  her  or  his  forearm,  as  that  was  the  least 
sensitive  part  of  the  user’s  body.  Symbol’s  designers  were  sur¬ 
prised  to  see  users  adapt  their  work  practices  to  use  the  rigid 
forearm  computer  to  force  boxes  into  position.  Accordingly,  the 
computer’s  case  was  designed  out  of  high  impact  materials; 
however,  another  surprise  came  with  longer  term  testing  of  the 
computer. 

Employees  in  the  test  company’s  shipping  hubs  constantly 
reached  into  wooden  crates  to  remove  boxes.  As  they  reached 
into  the  crates,  the  computer  would  grind  along  the  side.  After 
extended  use,  holes  would  appear  in  the  computer’s  casing, 
eventually  damaging  the  circuitry.  Changing  the  composition 
of  the  casing  to  be  resistant  to  both  abrasion  and  impact  finally 
fixed  the  problem. 

After  several  design  cycles,  Symbol  presented  the  finished 
system  to  new  employees  in  a  shipping  hub.  After  a  couple  of 
weeks’  work,  test  results  showed  that  the  new  employees  felt 
the  system  was  cumbersome,  whereas  established  employees 
who  had  participated  in  the  design  of  the  project  felt  that  the 
wearable  computer  provided  a  considerable  improvement  over 
the  old  system  of  package  scanning.  After  consideration,  Sym¬ 
bol’s  engineers  realized  that  these  new  employees  had  no  ex¬ 
periential  basis  for  comparing  the  new  system  to  the  past  re¬ 
quirements  of  the  job.  As  employees  in  shipping  hubs  are  often 
short  term,  a  new  group  of  employees  was  recruited.  For  two 
weeks,  these  employees  were  taught  their  job  using  the  old  sys¬ 
tem  of  package  scanning:  the  employee  would  reach  into  a 
crate,  grasp  a  package,  transfer  it  to  a  table,  grasp  a  handheld 
scanner,  scan  the  package,  replace  the  scanner,  grasp  the  package, 
and  transfer  it  to  its  appropriate  conveyor  belt.  The  employees 
were  then  introduced  to  the  forearm-mounted  WSS  1000. 
With  the  wearable  computer,  the  employee  would  squeeze  her 
or  his  index  and  middle  fingers  together  to  trigger  the  ring-mounted 
scanner  to  scan  the  package  while  reaching  for  it, 
grasp  the  package,  and  transfer  it  to  the  appropriate  conveyor  belt 
in  one  fluid  motion.  These  employees  returned  very  positive 
scores  for  the  wearable  computer. 

This  lesson,  that  the  perceived  value  and  comfort  of  a  wearable 
computer  are  relative,  was  also  investigated  by  Bodine  and  Gemperle 
(2003).  In  short  interviews,  users  were  fitted  with  a  backpack 
or  armband  wearable  and  told  that  the  system  was  either 
a  police  monitoring  device  (similar  to  those  used  for  house  ar¬ 
rest),  a  medical  device  for  monitoring  health,  or  a  device  for  use 
during  parties.  The  subjects  were  then  asked  to  rate  the  devices 
on  various  scales  of  desirability  and  comfort.  Not  surprisingly, 
the  police  wearable  was  considered  the  least  desirable;  how¬ 
ever,  the  police  function  elicited  more  negative  physical  com¬ 
fort  ratings,  and  the  medical  function  elicited  more  positive 
physical  comfort  ratings,  even  though  they  were  the  same  de¬ 
vice.  In  other  words,  perceived  comfort  can  be  affected  by  the 
supposed  function  of  the  device. 

Researchers  have  also  explored  wearability  in  more  general 
terms.  Wearability  is  defined  as  the  interaction  between  the 
human  body  and  the  wearable  object.  Dynamic  wearability  in¬ 
cludes  the  human  body  in  motion.  Design  for  wearability  con¬ 
siders  the  physical  shape  of  objects  and  their  active  relation¬ 
ship  with  the  human  form.  Gemperle,  Kasabach,  Stivoric,  Bauer, 
and  Martin  (1998)  explored  history  and  cultures  including  top¬ 
ics  such  as  clothing,  costumes,  protective  wearables,  and  carried 
devices  (Siewiorek,  2002).  They  studied  physiology,  biome¬ 
chanics,  and  the  movements  of  modern  dancers  and  athletes. 
Drawing  upon  the  experience  of  CMU’s  wearables  group,  over 
two  dozen  generations  of  machines  representing  over  a  hun¬ 
dred  person  years  of  research,  they  codified  the  results  into 
guidelines  for  designing  wearable  systems.  These  results  are 
summarized  in  Table  15.1. 

This  team  also  developed  a  set  of  wearable  forms  to 
demonstrate  how  wearable  computers  might  be  mounted  on 
the  body.  Each  of  the  forms  was  developed  by  applying  design 
guidelines  and  follows  a  simple  pattern  for  ensuring  weara¬ 
bility.  The  pods  were  designed  to  house  electronic  compo¬ 
nents.  All  of  the  forms  are  between  3/8"  and  1"  thick,  and  flex¬ 
ible  circuits  can  fit  comfortably  into  the  1/4"  thick  flex  zones. 
Beginning  with  acceptable  areas  and  the  humanistic  form  lan¬ 
guage,  the  team  considered  human  movement  in  each  indi¬ 
vidual  area.  Each  area  is  unique,  and  some  study  of  the  mus¬ 
cle  and  bone  structure  was  required  along  with  common 
movements.  Perception  of  size  was  studied  for  each  individual 
area.  For  testing,  minimal  amounts  of  spandex  were  stretched 
around  the  body  to  attach  the  forms.  The  results  are  shown  in 
Fig.  15.3. 

These  studies  and  guidelines  provide  a  starting  point  for 
wearable  systems  designers;  however,  there  is  much  work  to 
be  done  in  this  area.  Weight  was  not  considered  in  these  stud¬ 
ies,  nor  were  the  long-term  physiological  effects  such  systems 
might  have  on  the  wearer’s  body.  Similarly,  fashion  can  affect  the 
perception  of  comfort  and  desirability  of  a  wearable  compo¬ 
nent.  As  wearable  systems  become  more  common  and  are  used 
for  longer  periods  of  time,  it  will  be  important  to  test  these  components 
of  wearability. 


TABLE 15.1. Design for Wearability Attributes.

Attribute                   Comments

Placement                   Identify where the computer should be placed on
                            the body. Issues include identifying areas of
                            similar size across a population, areas of low
                            movement/flexibility, and areas large in
                            surface area.

Humanistic Form Language    The form of the object should work with the
                            dynamic human form to ensure a comfortable fit.
                            Principles include inside surface concave to
                            fit body, outside surface convex to deflect
                            objects, tapering sides to stabilize form on
                            body, and radiusing edges/corners to provide
                            soft form.

Human Movement              Many elements make up a single human movement:
                            mechanics of joints, shifting of flesh, and the
                            flexing and extending of muscles and tendons
                            beneath the skin. Allowing for freedom of
                            movement can be accomplished in one of two
                            ways: (a) by designing around the more active
                            areas of the joints, or (b) by creating spaces
                            on the wearable form into which the body can
                            move.

Human Perception of Size    The brain perceives an aura around the body.
                            Forms should stay within the wearer's intimate
                            space, so that perceptually they become a part
                            of the body. The intimate space is between zero
                            and five inches off the body and varies with
                            position on the body.

Size Variations             Wearables must be designed to fit many types of
                            users. Allowing for size variations is achieved
                            in two ways: (a) static anthropometric data,
                            which details point to point distances on
                            different sized bodies, and (b) consideration
                            of human muscle and fat growth in three
                            dimensions using solid rigid areas coupled with
                            flexible areas.

Attachment                  Comfortable attachment of forms can be created
                            by wrapping the form around the body, rather
                            than using single point fastening systems such
                            as clips or shoulder straps.

Contents                    The system must have sufficient volume to house
                            electronics, batteries, and so forth, which, in
                            turn, constrains the outer form.

Weight                      The weight of a wearable should not hinder the
                            body’s movement or balance. The bulk of the
                            wearable object weight should be close to the
                            center of gravity of the human body, minimizing
                            the weight that spreads to the extremities.

Accessibility               Before purchasing a wearable system, walk and
                            move with the wearable object to test its
                            comfort and accessibility.

Interaction                 Passive and active sensory interaction with the
                            wearable should be simple and intuitive.

Thermal                     The body needs to breathe and is very sensitive
                            to products that create, focus, or trap heat.

Aesthetics                  Culture and context will dictate shapes,
                            materials, textures, and colors that
                            perceptually fit the user and their
                            environment.


ATTENTION 


Humans have a finite and nonincreasing capacity that limits the
number of concurrent activities they can perform. Herb Simon
observed that human effectiveness is reduced as people try to
multiplex more activities. Frequent interruptions require a
refocusing of attention. After each refocus of attention, a period
of time is required to reestablish the context that existed prior
to the interruption. In addition, human short-term memory can hold
seven plus or minus two (i.e., five to nine) chunks of information.
With this limited capacity, today’s systems can overwhelm users
with data, leading to information overload. The challenge to
human-computer interaction design is to use advances in technology
to preserve human attention and to avoid information saturation.

In  the  mobile  context,  the  user’s  attention  is  divided  be¬ 
tween  the  computing  task  and  the  activities  in  the  physical  en¬ 
virons.  Some  interfaces,  like  some  augmented  realities  (Azuma, 
1997)  and  Dual  Purpose  Speech  (Lyons,  Skeels,  et  al.,  2004),  try 
to  integrate  the  computing  task  with  the  user’s  behavior  in  the 
physical  world.  The  VuMan  3  interface  did  not  tightly  couple  the 
virtual  and  real  worlds,  but  the  computer  interface  was  designed 
specifically  for  the  user’s  task  and  allowed  the  user  to  switch 
rapidly  between  a  virtual  interface  and  his  or  her  hands-on  ve¬ 
hicle  inspection. 

Many office productivity tasks such as e-mail or web searching,
however, have little relation to the user’s environment. The
mobile user must continually assess what attentional resources
she or he can commit to the interface and for how long before
switching attention back to the primary task. Oulasvirta,
Tamminen, Roto, and Kuorelahti (2005) specifically examined such
situations by fitting cameras to mobile phones and observing users
attempting web search tasks while following predescribed routes.
Subjects performed these tasks in a laboratory, in a subway car,
riding a bus, waiting at a subway station, walking on a quiet
street, riding an escalator, eating at a cafeteria and conversing,
and navigating a busy street. Web pages required an average of
16.2 seconds to load, and load times had considerable variance,
requiring the user to attend to the interface. How often the
subjects shifted their attention away from the phone interface
depended on the situation: during 35% of page loadings in the
laboratory versus 80% of the page loadings while walking a quiet
street. The duration of continuous attention on the mobile device
also varied depending on the physical environment: 8-16 seconds
for the laboratory and cafe versus below 6 seconds for riding the
escalator or navigating a busy street. Similarly, the number of
attention switches depended on the demands of the environment.

The authors noted that even riding an escalator places demands
on attention (e.g., choosing a correct standing position,
monitoring personal space for passers-by, and determining when
the end is near in order to step off). Accordingly, they are
working on a “Resource Competition Framework,” based on the
Multiple Resource Theory of attention (Wickens, 1984), to relate
mobile task demands to the user’s cognitive resources. This
framework helps predict when the mobile user will need to adopt
attentional strategies to cope with the demands of a mobile task.
The authors reported four such strategies observed in their study.
The first, “calibrating attention,” refers to the process by which
the mobile user first attends to the environment and determines
the amount of attention the user needs to devote to the
environment versus the interface. “Brief sampling over long
intervals” refers to the practice of only attending to the
environment in occasional, brief bursts to monitor for changes
that may require a deviation from plan, such as when reading while
walking an empty street. “Task finalization” refers to subjects’
preference to finish, when sufficiently close, a task or subtask
before switching attention back to the physical environment.
“Turntaking capture” occurs when the user is conversing with
another person. Attending and responding to another person
requires significant concentration, leading to minimal or no
attention to the mobile interface.

FIGURE 15.3. Forms studied for wearability.

The  third  author,  who  has  been  using  his  wearable  computer 
to  take  notes  on  his  everyday  life  since  1993,  has  remarked  on 
similar  strategies  in  his  interactions.  Describing  these  atten¬ 
tional  strategies  and  designing  interfaces  that  leverage  them  will 
be  important  in  future  mobile  interfaces.  Much  research  has 
been  performed  on  aircraft  and  automobile  cockpit  design  to 
create  interfaces  that  augment,  but  do  not  interfere  with,  the  pilot’s 
primary  task  of  navigating  the  vehicle;  however,  only  recently 
has  it  begun  to  be  possible  to  instrument  mobile  users 
and  examine  interface  use  (and  misuse)  “in  the  field”  for  the 
mobile  computer  user  (Lyons  &  Starner,  2001).  Now,  theories  of 
attention  can  be  applied  to  and  tested  in  everyday  life  situations. 

This  newfound  ability  to  monitor  mobile  workers  may  help 
us  determine  how  not  to  design  interfaces.  In  contrast  to  the 
VuMan  3  success  described  above,  Ockerman’s  (2000)  PhD  the¬ 
sis  “Task  Guidance  and  Procedure  Context:  Aiding  Workers  in 
Appropriate  Procedure  Following”  warns  that  mobile  interfaces, 
if  not  properly  designed,  may  hinder  the  user’s  primary  task. 


Ockerman  studied  experienced  pilots  inspecting  their  small  aircraft 
before  flying.  When  a  wearable  computer  was  introduced 
as  an  aid  to  completing  the  aircraft’s  safety  inspection  check¬ 
list,  the  expert  pilots  touched  the  aircraft  less  (a  way  many  pilots 
develop  an  intuition  as  to  the  aircraft’s  condition).  In  addition, 
the  pilots  relied  too  much  on  the  wearable  computer  system, 
which  was  purposely  designed  to  neglect  certain  safety  steps. 
The  pilots  trusted  the  wearable  computer  checklist  to  be  complete 
instead  of  relying  on  their  own  mental  checklists.  Ockerman 
showed  how  such  interfaces  might  be  improved  by  providing 
more  context  for  each  step  in  the  procedure.  Another  approach 
would  be  integrating  the  aircraft  itself  into  the  interface  (e.g., 
use  augmented  reality  to  overlay  graphics  on  the  aircraft  indi¬ 
cating  where  the  pilot  physically  inspects  the  plane). 

Most  recently,  DARPA’s  Augmented  Cognition  project  (Kollmorgen, 
Schmorrow,  Kruse,  &  Patrey,  2005)  aims  to  create  mobile 
systems  that  monitor  their  user’s  attentional  resources  and 
record  or  delay  incoming  information  in  order  to  present  it 
to  the  user  in  a  more  orderly  and  digestible  time  sequence. 
These  systems  exploit  mobile  electroencephalogram  (EEG) 
readings  or  functional  near  infrared  imaging  (fNIR)  to  monitor 
the  user’s  brain  activations  and  relate  these  results  to  the  user’s 
current  state.  Such  projects,  if  successful  on  a  larger  scale,  could 
reveal  much  about  the  mental  resources  required  for  truly  mo¬ 
bile  computing. 

The  Attention  Matrix,  shown  in  Fig.  15.4  (Anhalt  et  al.,  2001), 
categorizes  activities  by  the  amount  of  attention  they  require. 
The  activities  are  (a)  Information,  (b)  Communication,  and 
(c)  Creation.  Individual  activities  are  categorized  by  the  amount 
of  distraction  they  introduce  in  units  of  increasing  time:  (a)  Snap, 
(b)  Pause,  (c)  Tangent,  and  (d)  Extended.  The  Snap  duration  is 
an  activity  that  is  usually  completed  in  a  few  seconds,  such  as 
checking  a  watch  for  the  time.  The  user  should  not  have  to  in¬ 
terrupt  their  primary  activity  to  perform  this  activity.  The  Pause 
action  requires  the  user  to  stop  their  current  activity,  switch  to 
the  new  but  related  activity,  and  then  return  to  their  previous 
task  within  a  few  minutes.  Pulling  over  to  the  side  of  the  road  and 
checking  directions  is  an  example  of  a  pause.  A  Tangent  action 
is  a  medium  length  task  that  is  unrelated  to  the  action  in  which 
the  user  is  engaged.  Receiving  an  unrelated  phone  call  is  an  ex¬ 
ample  of  a  tangent  activity.  An  Extended  action  occurs  when  the 
user  deliberately  switches  tasks,  beginning  a  wholly  new,  long¬ 
term  activity.  For  the  car  driver,  stopping  at  a  motel  and  resting 
for  the  night  is  an  extended  activity. 

As  distractions  on  the  left  of  the  matrix  take  less  time  from 
the  user’s  primary  activity,  our  intent  is  to  move  activities  of  the 
matrix  towards  the  left  side  (Snap).  Our  goal  is  to  evaluate  how 
this  process  extends  to  a  larger  sample  of  applications. 
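
One way to make the Attention Matrix concrete is to treat it as a small data structure. The Python sketch below is illustrative only: the class and function names are hypothetical, and the activity type assigned to each example is an assumption made for illustration. Only the three activity types, the four distraction levels, and the example situations come from the description above.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Activity(Enum):
    INFORMATION = "Information"
    COMMUNICATION = "Communication"
    CREATION = "Creation"


class Distraction(IntEnum):
    # Ordered by increasing time taken from the primary activity.
    SNAP = 1      # a few seconds; the primary activity continues
    PAUSE = 2     # stop, do a related activity, return within minutes
    TANGENT = 3   # medium-length task unrelated to the primary activity
    EXTENDED = 4  # deliberate switch to a new, long-term activity


@dataclass(frozen=True)
class MatrixEntry:
    description: str
    activity: Activity        # assumed classification, for illustration
    distraction: Distraction


def interrupts_primary_activity(entry: MatrixEntry) -> bool:
    """Anything beyond a Snap takes the user away from the primary activity."""
    return entry.distraction > Distraction.SNAP


def leftmost_first(entries):
    """Sort entries from least to most distracting (left to right)."""
    return sorted(entries, key=lambda e: e.distraction)


EXAMPLES = [
    MatrixEntry("receive an unrelated phone call",
                Activity.COMMUNICATION, Distraction.TANGENT),
    MatrixEntry("check a watch for the time",
                Activity.INFORMATION, Distraction.SNAP),
    MatrixEntry("pull over and check directions",
                Activity.INFORMATION, Distraction.PAUSE),
]

for entry in leftmost_first(EXAMPLES):
    print(entry.distraction.name, "-", entry.description)
```

Using an ordered enumeration for the distraction levels keeps the left-to-right ordering of the matrix explicit, so a design change that moves an activity toward Snap shows up as a smaller value.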


MANIPULATION 


VuMan  3  Dial  Pointing 

VuMan 3 added a novel manipulation interface suitable for use
when physical attention is occupied. The VuMan 3 has a
low-resolution display and, consequently, a purely textual
interface. Figure 15.5 shows a sample screen from the user
interface. The user navigates through a geographically organized
hierarchy: top, bottom, front, rear; then left, right, and more
detail. Eventually, at the leaf nodes, individual components are
identified. There are over 600 of these components. Each component
is indicated to be “Serviceable” or “Unserviceable.” If it is
serviceable, then no further information is given. If it is
unserviceable, then the next screen presents a short list of
possible reasons.

FIGURE 15.4. Attention Matrix.

FIGURE 15.5. VuMan 3 Options Screen.

[Screen text recovered from the figure: Main Menu; Please choose
an application; System Options; LTI: AAVR7A1; About VuMan...;
Transfer Data...]

The  user  can  return  up  the  hierarchy  by  choosing  the  cate¬ 
gory  name  in  the  upper  right  corner,  or  sequence  to  the  next  se¬ 
lection  in  an  ordering  of  the  components.  Once  a  component 
is  marked  as  serviceable  or  unserviceable,  the  next  selection  in 
the  sequence  is  automatically  displayed  for  the  user.  Furthermore, 
each  component  has  a  probability  of  being  serviceable  associated 
with  it,  and  the  cursor  is  positioned  over  the  most 
likely  response  for  that  component. 
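
The navigation just described, a hierarchy whose leaves are visited in a fixed order with automatic advance after each marking, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VuMan 3 implementation: the tree contents, the `Inspection` class, and the function names are all hypothetical.

```python
def leaf_paths(tree, prefix=()):
    """Yield the path to every leaf component, in display order."""
    for name, child in tree.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path


class Inspection:
    """Walks the leaf components in sequence and records a status for each."""

    def __init__(self, tree):
        self.sequence = list(leaf_paths(tree))
        self.position = 0
        self.results = {}

    def current(self):
        """Return the component awaiting a status, or None when finished."""
        if self.position < len(self.sequence):
            return self.sequence[self.position]
        return None

    def mark(self, status):
        """Record a status, then advance automatically to the next component."""
        path = self.current()
        if path is None:
            raise IndexError("inspection already complete")
        self.results[path] = status
        self.position += 1
        return self.current()


# Hypothetical, heavily abbreviated hierarchy (the real one has 600+ leaves).
VEHICLE = {
    "Front": {"Left": {"Headlight": None, "Tow hook": None},
              "Right": {"Headlight": None}},
    "Rear": {"Left": {"Tail light": None}},
}

inspection = Inspection(VEHICLE)
print(inspection.current())
print(inspection.mark("Serviceable"))
```

Flattening the hierarchy into one ordered sequence is what lets a single button press both record an answer and bring up the next component.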

The  screen  contains  navigational  information.  Sometimes 
there  is  more  on  a  logical  screen  than  can  fit  on  a  physical 
screen.  Screen  navigation  icons  are  on  the  left-hand  side  of  the 
screen.  The  user  can  go  to  the  previous  physical  screen  or  next 
physical  screen  that  are  functional  parts  of  the  logical  screen. 
The  user  can  always  go  back  to  the  main  menu.  In  Fig.  15.6,  the 
options  include  the  vehicle  number,  number  of  hours  on  the  vehicle, 
vehicle  serial  number,  and  so  forth,  which  are  used  to  distinguish 
this  report  from  other  reports.  The  inspection  is 
divided  into  sections,  and  different  people  can  be  inspecting  dif¬ 
ferent  sections  in  parallel.  The  inspector  would  pick  a  section, 
highlight  it  by  rotating  the  dial,  and  then  select  the  highlighted 
item  by  pressing  a  button.  The  inspector  would  then  receive  a 
detailed set of instructions on how to proceed. In Fig. 15.7, the
inspector is instructed to check for damage and bare metal.
The “smart cursor” anticipates that the inspector will be filling
in the “Status Field,” whose current value is “none.” Clicking
displays a list of options, the first of which is “Serviceable.”
With the Marine LTI, the item is serviceable in 80% of the cases.
By ordering the most probable selection first, the interface
emulates a paper checklist where most of the items will be checked
as “OK.” The smart cursor then assumes the most likely step.
There is no need to even move the dial; the user merely clicks
on the highlighted option. For example, in Fig. 15.8,
“Serviceable” has been filled in and the box signifying the next
activity is “Next.” If all entries are “Serviceable,” one would
simply tap the button multiple times. If an item is
“Unserviceable,” the dial is turned and “Unserviceable” is
selected. Next, a list of reasons why that particular device was
unserviceable would appear. The dial is rotated and one or more of
the options are selected. Since more than one reason may be
selected, “Done” is selected to indicate completion. The selected
items would appear in the “Comment” field. When the checklist is
completed, the data is uploaded to the logistics computer, which
would then generate the job work orders.

FIGURE 15.6. VuMan 3 Information Screen.

FIGURE 15.7. VuMan 3 Hull Forward Screen.

FIGURE 15.8. VuMan 3 Status Screen.

[Screen text recovered from the figures: Status Options: Choose
one of the options below; Serviceable; Unserviceable; Missing;
On ERO; None of the above]
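
The benefit of ordering the most probable selection first can be estimated with a short calculation. The Python sketch below is a hedged illustration: only the 80% figure for “Serviceable” comes from the text, the remaining probabilities are made-up values, and the function names are hypothetical. It assumes the cursor starts on the first option and that each dial step moves the highlight down by one option.

```python
def order_options(probabilities):
    """Return option labels sorted from most to least probable."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


def expected_dial_steps(probabilities):
    """Average number of dial steps needed before clicking, assuming the
    cursor starts on the first (most probable) option."""
    ordered = order_options(probabilities)
    return sum(probabilities[label] * position
               for position, label in enumerate(ordered))


# Only the 0.80 value is taken from the text; the others are assumptions.
STATUS_PRIORS = {
    "Missing": 0.05,
    "Serviceable": 0.80,
    "On ERO": 0.03,
    "Unserviceable": 0.10,
    "None of the above": 0.02,
}

options = order_options(STATUS_PRIORS)
print(options[0])  # the option under the smart cursor
print(round(expected_dial_steps(STATUS_PRIORS), 2))
```

Under these assumed probabilities the cursor starts on “Serviceable” and the average number of dial steps per item is well below one, which matches the observation that most items need only a button click.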

Several  lessons  were  derived  from  building  the  system.  As 
part  of  the  design  cycle,  a  mouse  (essentially,  a  disk  with  but¬ 
tons)  was  tested;  however,  the  physical  configuration  of  the  de¬ 
vice  could  be  ambiguous.  Was  the  left  button  in  the  proper  po¬ 
sition  when  the  mouse’s  tail  was  towards  the  user  or  away  from 
the  user?  Were  the  buttons  supposed  to  be  at  the  top?  The  dial 
removed  this  orientation  ambiguity. 

Another  design  lesson  was  to  minimize  cables.  An  earlier 
system  had  a  cable  connecting  the  battery,  a  cable  for  the 
mouse,  and  a  cable  for  the  display.  These  wires  quickly  became 
knotted.  To  avoid  this  problem,  the  VuMan  3  design  used  inter¬ 
nal  batteries,  and  the  dial  was  built  into  the  housing.  The  only 
remaining  wire  connected  it  to  the  display. 

A  third  lesson  was  that  wearable  computers  have  a  minimum 
footprint  that  is  comfortable  for  a  user’s  hand.  While  the  key¬ 
boards  of  palmtop  computers  are  getting  smaller,  evolution  has 
not  correspondingly  shrunk  our  fingers.  The  electronics  will 
become  thinner.  Eventually,  they  will  be  as  thin  as  a 
sheet  of  plastic  or  incorporated  into  clothing;  however,  the 
interface  will  still  have  a  minimum  footprint.  Furthermore,  the  interface, 
no  matter  where  it  is  located  on  the  user’s  body,  is  operated 
in  the  same  way.  This  is  a  major  feature  of  the  dial.  It  can  be 
worn  on  a  user’s  hip  or  in  the  small  of  the  user’s  back.  In  airplane 
manufacturing,  where  workers  navigate  small  spaces,  the  hip  de¬ 
fines  the  smallest  diameter  through  which  the  person  can  enter. 
Here  a  shoulder  holster  is  preferred  for  the  wearable  computer. 
The  Marines’  oversized  coverall  pockets  were  an  advantage  for 
the  system. 

The soldiers could drop the computer into their coveralls and
operate it through the cloth of the pocket. In terms of
simplicity, as well as orientation independence, the dial
integrated well with the presentation of information on the
screen. Everything on the screen could be considered to be on a
circular list. In most cases, fewer than a dozen items on a screen
are selectable. This sparse screen is an advantage on a
head-mounted display, where the user may be reading while moving.
The font must be large enough to read while the screen is
bouncing. The dial should also be an intuitive interface for web
browsing. A typical page probably has fewer than half a dozen
items to select; the dial is rotated clockwise or counterclockwise
to highlight one, and a button is then used to select the
highlighted item. VuMan 3 had three types of buttons that all
performed the same function: buttons for left-hand and right-hand
thumb-dominant users, as well as a central button for
finger-dominant users.
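
Treating everything on the screen as a circular list maps naturally onto modular arithmetic. The following Python sketch shows the idea under stated assumptions: the `DialMenu` class, its method names, and the sample item labels are hypothetical and are not taken from the VuMan 3 software.

```python
class DialMenu:
    """Selectable screen items arranged on a circular list."""

    def __init__(self, items, start=0):
        if not items:
            raise ValueError("a menu needs at least one selectable item")
        self.items = list(items)
        self.index = start % len(self.items)

    def rotate(self, detents):
        """Move the highlight: positive is clockwise, negative is
        counterclockwise. The highlight wraps around at either end."""
        self.index = (self.index + detents) % len(self.items)
        return self.items[self.index]

    def press(self):
        """Any of the buttons selects the currently highlighted item."""
        return self.items[self.index]


# Hypothetical screen with five selectable items.
menu = DialMenu(["Main Menu", "Previous", "Next", "Status", "Comment"])
print(menu.rotate(1))    # one detent clockwise
print(menu.rotate(-2))   # two detents back, wrapping past the start
print(menu.press())
```

Because the list wraps, the user never reaches a dead end, and the same two actions (rotate, press) work regardless of where the device is worn or how it is oriented.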


Mobile  Keyboards 

The  VuMan  3  addressed  the  problem  of  menu  selection  in  the 
mobile  domain  and  effectively  used  a  1D  dial  to  create  a  pointing 
device  that  can  be  used  in  many  different  mobile  domains. 
For  tasks  such  as  wireless  messaging,  however,  more  free-form 
text  entry  is  needed.  While  speech  technology  has  made  great 
strides  in  the  last  decade,  speech  recognition  is  very  difficult  in 
the  mobile  environment  and  often  suffers  from  high  error  rates. 
In  addition,  speech  is  often  not  socially  acceptable  (e.g.,  in  hos¬ 
pitals,  meetings,  classrooms,  etc.).  Keyboard  interfaces  still  pro¬ 
vide  one  of  the  most  accurate  and  reliable  methods  of  text  entry. 

Since  2000,  wireless  messaging  has  been  creating  billions  of 
dollars  of  revenue  for  mobile  phone  service  providers,  and  over 
1  trillion  messages  are  currently  being  typed  per  year.  Until  re¬ 
cently,  many  of  these  messages  were  created  using  the  Multi¬ 
tap  or  T9  input  method  on  phone  keypads.  Yet  studies  have 
shown  that  users  average  a  slow  10-20  words  per  minute  (wpm) 
using  these  common  typing  methods  (for  comparison,  a  highly 
skilled  secretary  on  a  desktop  averages  70-90  wpm).  Given  the 
obvious  desire  for  mobile  text  input,  HCI  researchers  have  be¬ 
gun  re-examining  keyboards.  While  keyboard  entry  has  been 
well  studied  in  the  past,  mobility  suggests  intriguing  possibili¬ 
ties.  For  example,  if  an  adequate  method  of  typing  can  be  com¬ 
bined  with  a  sufficient  display  for  the  mobile  market,  computing 
may  move  “off-the-desktop”  permanently. 

Traditionally, text entry studies emphasize learnability, speed,
and  accuracy;  however,  a  mobile  user  may  not  be  able  to  de¬ 
vote  all  of  his  or  her  attention  to  the  text  entry  process.  For  ex¬ 
ample,  the  user  may  be  taking  notes  on  a  conversation  and  wish 
to  maintain  eye  contact  with  his  or  her  conversational  partner, 
or  the  user  may  be  in  a  meeting  and  may  hide  the  keyboard  un¬ 
der  the  desk  to  avoid  distracting  others  with  the  keyboard’s 
noise  and  his  or  her  finger  motions.  The  user  might  also  at¬ 
tempt  to  enter  text  while  walking  and  need  to  attend  to  the 
physical  environment  instead  of  looking  at  the  screen.  These 
conditions  all  describe  “blind”  typing,  where  the  user  enters  text 
with  only  occasional  glances  at  the  screen  to  ensure  that  the  text 
has  been  entered  correctly. 

Lyons,  Starner,  and  colleagues  (2004)  and  Clawson,  Lyons, 
Starner,  and  Clarkson  (2005)  have  performed  longitudinal  stud¬ 
ies  on  two  keyboards — (a)  Handykey’s  Twiddler  (Fig.  15.9)  and 
(b)  the  mini-QWERTY  “thumb”  keyboard  (Fig.  15.10) — to  de¬ 
termine  if  they  might  achieve  desktop-level  text  entry  in  the  mo¬ 
bile  domain.  As  the  “average”  desktop  entry  rate  was  considered 
to  be  30  wpm,  including  hunt-and-peck  typists,  this  benchmark 
was  chosen  as  the  minimum  for  speed.  Traditionally,  very  high 
accuracy  is  desired  for  desktop  typing.  As  a  culture  of  informal 
e-mail and SMS messaging has developed, however, less accurate
typing has become common. The community is debating
how to reconcile speed and accuracy measures; however, error
rates of approximately 5% per character are common in current
mobile keyboard studies.


FIGURE 15.9. Handykey’s Twiddler.


FIGURE 15.10. Mini-QWERTY Thumb Keyboard.

With  the  Twiddler,  novices  averaged  4  wpm  during  the  first 
20-minute  session  and  averaged  47  wpm  after  25  hours  of  prac¬ 
tice  (75  20-minute  sessions).  The  fastest  user  averaged  67  wpm, 
which  is  approximately  the  speed  of  one  of  the  authors  who  has 
been  using  the  Twiddler  for  twelve  years.  While  25  hours  of  prac¬ 
tice  seems  extreme,  a  normal  high  school  typing  class  involves  al¬ 
most  three  times  that  training  time  to  achieve  a  goal  of  40  wpm. 

Even  so,  mobile  computer  users  may  already  have  experi¬ 
ence  with  desktop  QWERTY  keyboards.  Due  to  their  familiarity 
with  the  key  layout,  these  users  might  more  readily  adopt  a 
mini-QWERTY  keyboard  for  mobile  use.  Can  a  mini-QWERTY 
keyboard  achieve  desktop  rates?  The  study  performed  by  Claw¬ 
son  and  colleagues  (2005)  examined  the  speed  and  accuracy  of 
experienced  desktop  typists  on  two  different  mini-QWERTY  de¬ 
signs.  These  subjects  averaged  30  wpm  during  the  first  twenty- 
minute  session  and  increased  to  60  wpm  by  the  end  of  four  hun¬ 
dred  minutes  of  practice! 
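
Speed and accuracy figures like those quoted above can be computed as in the following sketch. It assumes the common text-entry convention that one "word" is five characters, and it measures errors as the edit distance between the presented and transcribed text divided by the longer length. The cited studies may have used somewhat different formulas, so the function names and definitions here are illustrative assumptions.

```python
def words_per_minute(transcribed, seconds):
    """Typing speed, counting every five characters as one word."""
    return (len(transcribed) / 5.0) / (seconds / 60.0)


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def error_rate(presented, transcribed):
    """Per-character error rate as a fraction of the longer string."""
    longest = max(len(presented), len(transcribed))
    if longest == 0:
        return 0.0
    return edit_distance(presented, transcribed) / longest
```

Under these definitions, a participant who transcribes 150 characters in one minute is typing at 30 wpm, the benchmark the studies used for desktop-level entry.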

While  both  of  these  studies  easily  achieved  desktop  typing 
rates  and  had  error  rates  comparable  to  past  studies,  can  these 
keyboards  be  used  while  mobile?  While  neither  study  tested 
keyboard  use  while  the  user  was  walking  or  riding  in  a  car,  both 
experimented with blind text entry (in that, in at least one condition,
typists could see neither the keyboard nor the output of
their  typing).  When  there  was  a  statistically  significant  difference 
between blind and normal typing conditions, experienced Twiddler
typists slightly improved their speeds and decreased their
error  rates.  Experienced  mini-QWERTY  typists,  however,  were 
significantly  inhibited  by  the  blind  condition,  with  speeds  of 
46 wpm and approximately three times the error rate even after
100 minutes of practice. These results might be expected in
that  Twiddler  users  are  trained  to  type  without  visual  feedback 
from  the  keyboard,  whereas  the  mini-QWERTY  keyboard  design 
assumes  that  the  user  can  see  the  keyboard  to  help  disam¬ 
biguate  the  horizontal  rows  of  small  keys. 

The  results  of  these  studies  demonstrate  that  there  are  mul¬ 
tiple  ways  that  desktop  typing  rates  can  be  achieved  on  a  mobile 
device.  The  question  remains,  however,  whether  the  benefits 
of  typing  quickly  while  “blind”  or  moving  will  be  sufficient  to 
cause  users  to  learn  a  new  text  entry  method.  Other  benefits 
might  also  affect  the  adoption  of  keyboards  in  the  future.  For  ex¬ 
ample,  a  12-button  device  such  as  the  Twiddler  can  be  the  size  of 
a  small  mobile  phone  and  still  perform  well,  while  40-button 
mini-QWERTY  keyboards  may  have  already  shrunk  as  much  as  is 
possible  for  users’  hands.  Another  factor  may  be  adoption  of 
mobile computing in developing countries. According to Techweb,
almost 1 billion mobile phones were shipped in 2005. Many
new  mobile  phone  users  will  not  have  learned  to  type  on  a  Ro¬ 
man  alphabet  keyboard  and  may  be  more  concerned  with  quick 
learning  than  compatibility  with  desktop-input  skills. 

Speech  Interfaces 

Vocollect. Mobile keyboards are not suitable for applications
in which hands-free control is necessary, such as warehouse
applications. Pittsburgh-based Vocollect focuses on
package manipulation, in particular the warehouse-picking
problem. In this scenario, a customer places an order consisting
of several different items stored in a supplier’s warehouse.
The order is transmitted from the warehouse’s computer to an
employee’s wearable computer. In turn, each item and its location
are spoken to the employee through a pair of headphones. The
employee can control how this list is announced through feedback
via speech recognition, and can also report inventory errors
as they occur. The employee accumulates the customer’s
order from the warehouse’s shelves and ships it. This audio-only
interface also frees the employee to manipulate packages
with both hands, whereas a pen-based system would be considerably
more awkward. As of December 2000, Vocollect had
approximately 15,000 users and revenues between U.S. $10 and
$25 million.


FIGURE 15.11. Vocollect’s audio-based wearable computer.

Navigator  wearable  computer  with  speech  input. 

Boeing  has  been  pioneering  “augmented  reality”  using  a  head- 
mounted,  see-through  display.  As  the  user  looks  at  the  aircraft, 
the  next  manufacturing  step  is  superimposed  on  the  appropri¬ 
ate  portion  of  the  aircraft.  One  of  their  first  applications  is  fab¬ 
rication  of  wire  harnesses.  Every  aircraft  is  essentially  unique. 
They  may  be  from  different  airlines.  Even  if  they  are  from  the 
same  airline,  one  might  be  configured  for  a  long-haul  route  and 
another  for  a  short-haul  route.  The  airline  may  specify  different 
configurations.  For  example,  their  galleys  will  be  in  different 
places,  the  wire  harnesses  would  change,  and  so  forth.  Wire 
harnesses  are  fabricated  months  before  they  are  assembled  into 
the  aircraft.  The  assembly  worker  starts  with  a  Peg-board  mea¬ 
suring  about  three  feet  high  and  six  feet  long.  Mounted  on  the 
board  is  a  full-sized  diagram  of  the  final  wire  harness.  Pegs  pro¬ 
vide  support  for  the  bundles  of  wire  as  they  form.  The  worker 
(a)  selects  a  precut  wire,  (b)  reads  its  identification  number, 
(c) looks up the wire number on a paper list to find the starting
coordinates of the wire, (d) searches for the wire on the diagram,
and (e) threads the wire following the route on the diagram.
With augmented reality, the worker selects a wire and
reads  the  wire  identification  from  the  bar  code.  A  head  tracker 
provides  the  computer  with  information  on  where  the  worker  is 
looking  and  superimposes  the  route  for  that  particular  wire  on 
the  board.  Trial  evaluations  indicate  a  savings  of  25%  of  the  as¬ 
sembly  effort  primarily  due  to  elimination  of  cross-referencing 
the  wire  with  paper  lists. 

The  Navigator  2,  circa  1995,  was  designed  for  a  voice- 
controlled  aircraft  inspection  application  (Siewiorek,  Smailagic, 
&  Lee,  1994).  The  speech  recognition  system,  with  a  secondary, 
manually  controlled  cursor,  offers  complete  control  over  the  ap¬ 
plication  in  a  hands-free  manner,  allowing  the  operator  to  per¬ 
form  an  inspection  with  minimal  interference  from  the  wearable 
system. Entire aircraft manuals, or portions of them, can be
brought  on-site  as  needed,  using  wireless  communication.  The 
results  of  inspection  can  then  be  downloaded  to  a  maintenance 
logistic  computer. 

Consider  one  portion  of  Navigator  2’s  application,  three- 
dimensional  inspections.  The  application  was  developed  for 
McClellan  Air  Force  Base  in  Sacramento,  California  and  the  KC- 
135  aerial  refueling  tankers.  Every  five  years,  these  aircraft  are 
stripped  down  to  bare  metal.  The  inspectors  use  magnifying 
glasses  and  pocket  knives  to  hunt  for  corrosion  and  cracks. 

At startup (Fig. 15.12), the application prompts the user to
choose whether to activate the speech recognition system.
The user then proceeds to the Main Menu. From this location,
several options are available, including online documentation,
assistance, and the inspection task (Fig. 15.13). Once
the  user  chooses  to  begin  an  inspection,  information  about  the 
inspection  is  entered,  an  aircraft  type  to  examine  is  selected, 
and  the  field  of  interest  is  narrowed  from  major  features  (Left 
Wing,  Right  Tail,  etc.;  Fig.  15.14)  to  more  specific  details  (indi¬ 
vidual  panes  in  the  cockpit  window  glass;  Fig.  15.15).  A  coordi¬ 
nate  system  is  superimposed  on  the  inspection  region.  The  hor¬ 
izontal  coordinates  begin  from  the  nose  and  the  vertical 
coordinates  are  “water  lines”  derived  as  if  the  airplane  was  float¬ 
ing.  The  inspector  records  each  imperfection  in  the  skin  at  the 
corresponding  location  on  the  display.  The  area  covered  by  each 
defect  is  recorded,  as  well  as  the  type  of  defect,  such  as  Cor¬ 
roded,  Cracked,  or  Missing.  To  maximize  usability,  each  item  or 
control  may  be  selected  simply  by  speaking  its  name.  Figure 
15.16  shows  the  Navigator  2  systems  in  use. 

The  user  navigates  to  the  display  corresponding  to  the  por¬ 
tion  of  the  skin  currently  being  inspected.  This  navigation  is  par¬ 
tially  textual  based  on  buttons  (choose  aircraft  type  to  be  in¬ 
spected)  and  partially  graphical  based  on  side  perspectives  of 
the  aircraft  (choose  area  of  aircraft  currently  being  inspected). 
The  navigation  can  be  performed  either  through  a  joystick  input 
device  or  through  speech  input.  The  speech  input  is  exactly  the 
text  that  would  be  selected.  The  positioning  of  the  imperfection 
is  done  solely  through  the  joystick,  since  speech  is  not  well 
suited  for  the  pointing  necessary  to  indicate  the  position  of  the 
imperfection. As the cursor is moved by the joystick, the coordinates
and the type of material represented by the cursor are displayed
at the bottom of the screen. If a defect is at the current
position, a click produces a list of reasons why that material
would be defective, such as corrosion, scratch, and so forth. The
defect type can be selected by the joystick or by speaking its
name, and the information is then entered into the database. The
user can return to the main selection screen by selecting the
“Main menu” option, which appears on every screen. The user can also
move one level up in the hierarchy with a single selection.


FIGURE 15.13. Navigator 2 Main Menu.


FIGURE 15.14. Navigator 2 Region Selection.


FIGURE 15.15. Sheet metal.


FIGURE 15.16. Navigator 2.

The  relationship  between  the  user  interface  design  princi¬ 
ples  and  the  Navigator  2  user  interface  is: 

•  Simplicity  of  function.  The  only  functions  available  to  the 
user are to enter skin imperfections for one of four aircraft,
to  transfer  data  to  another  computer,  to  enter  identification 
information  both  for  the  vehicle  and  for  the  inspector,  and 
to  see  a  screen  that  describes  the  Navigator  2  project. 

•  No  textual  input.  The  identification  information  required  en¬ 
tering  numbers.  A  special  dialogue  was  developed  to  enable 
the  entering  of  numeric  information  using  the  joystick  as  an 
input  device.  This  was  cumbersome  for  the  users,  but  only 
needed  to  be  performed  once  per  inspection. 

•  Controlled  navigation.  The  interface  was  arranged  as  a  hier¬ 
archy.  The  top  level  consisted  of  a  menu  that  gave  a  choice  of 
function.  Once  the  inspection  function  and  then  the  vehicle 
were chosen, the user navigated to the area of skin to be inspected
by selecting an area of the aircraft to expand. Once an imperfection
was indicated, the user had to select one of the allowable
types of imperfections. At each stage, the user could go
up one level of the hierarchy or return to the main menu.
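
The controlled-navigation principle can be illustrated with a small sketch of a menu hierarchy in which every selection, spoken or joystick, is simply the name of a visible item. The classes below are hypothetical and are not taken from the Navigator 2 software; the "up" command name in particular is an assumption, while the "Main menu" option and the "Left Wing" region come from the description above.

```python
class Screen:
    """One node in the menu hierarchy; children are keyed by their name."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def add(self, name):
        child = Screen(name, parent=self)
        self.children[name.lower()] = child
        return child


class Navigator:
    """Tracks the current screen and applies one selection at a time."""

    def __init__(self, root):
        self.root = root
        self.current = root

    def select(self, spoken):
        """Handle a selection, whether spoken or chosen with the joystick."""
        command = spoken.strip().lower()
        if command == "main menu":
            self.current = self.root
        elif command == "up":
            self.current = self.current.parent or self.root
        elif command in self.current.children:
            self.current = self.current.children[command]
        # Anything else is not a legal choice here, so the state is unchanged.
        return self.current.name


root = Screen("Main Menu")
aircraft = root.add("Inspection").add("KC-135")
aircraft.add("Left Wing")
nav = Navigator(root)
```

Restricting input to the names shown on the current screen keeps the recognition vocabulary small at every step, and the two fixed commands give the user a way out from anywhere in the hierarchy.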

One  of  the  lessons  learned  with  Navigator  2  is  the  power  of 
forcing  the  use  of  a  common  vocabulary.  Since  the  average  age 
of the aircraft is 35 years, the types of defects encountered form a
very  stable  set.  Previously,  one  inspector  would  call  a  defect 
“gouged,”  while  another  inspector  would  call  the  same  defect  a 
“scratch.”  What  is  the  difference  between  a  gouge  and  a  scratch? 
How  much  material  does  it  take?  How  much  time  does  it  take  to 
repair?  What  skill  of  labor  is  needed?  The  logistics  problem  is 
much  more  difficult  without  a  standardized  vocabulary.  Thus 
there  is  a  serendipitous  advantage  in  injecting  more  technology. 

A  second  lesson  is  that  in  some  cases,  the  speech  recognition 
front  end  mistakenly  produces  the  wrong  output.  Speech  recog¬ 
nition  systems  typically  have  an  error  rate  of  2%  to  10%.  The 
unexpected  output  may  cause  the  application  to  produce  the 
wrong  result.  In  one  of  Navigator  2’s  early  demonstrations, 
the  user  was  attempting  to  exit  the  application;  however,  the 
speech  recognition  system  thought  a  number  was  spoken.  At 
that  point,  the  application  was  expecting  a  second  number,  but 
there  was  no  match  since  the  user  was  saying  “Exit,”  “Quit,” 
“Bye,”  and  so  forth.  The  system  appeared  to  be  frozen,  when 
in  actuality,  there  was  a  mismatch  between  what  the  applica¬ 
tion  software  was  expecting  and  what  the  user  thought  the 
application  state  was.  The  solution  was  to  give  the  user  more 
feedback  on  the  state  of  the  application  by  additional  on-screen 
clues.  Also  developed  was  a  novel  application  input  test  gener¬ 
ator  that  took  a  description  of  the  interface  screens  and  created 
a  list  of  all  possible  legal  exits  from  each  screen. 
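
The source gives no details of how the test generator worked, so the following is only a hypothetical reconstruction of the idea: given a description of each screen as a mapping from accepted inputs to destination screens, list the legal exits from every screen and flag screens a user could get stuck on. The screen names and vocabulary are invented for illustration.

```python
def legal_exits(screens):
    """Map each screen to the sorted list of inputs that it accepts."""
    return {name: sorted(transitions) for name, transitions in screens.items()}


def dead_ends(screens):
    """Find screens with no exit, or exits that lead to undefined screens."""
    problems = []
    for name, transitions in screens.items():
        if not transitions:
            problems.append((name, None, None))
        for word, target in transitions.items():
            if target not in screens:
                problems.append((name, word, target))
    return problems


# Hypothetical interface description: input word -> destination screen.
screens = {
    "main_menu": {"inspection": "enter_id", "exit": "goodbye"},
    "enter_id": {"digit": "enter_id", "done": "main_menu"},
    "goodbye": {},
}
```

In this description the "enter_id" screen accepts only a digit or "done", which mirrors the demonstration failure described above: a user saying "Exit" on such a screen produces no match, so the list of legal exits is exactly what the on-screen feedback needs to show.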

A  third  lesson  learned  was  the  criticality  of  response  time. 
When speech recognition was done in software on Navigator
(circa 1995), it took 12 times real time, which became very frustrating.
People are less patient when they are on the move than
they  are  when  they  are  at  a  desktop.  People  at  a  desk  are  will¬ 
ing  to  wait  three  minutes  for  the  operating  system  to  boot  up, 
but  when  a  user  is  on  the  move,  expectations  are  for  instant  re¬ 
sponse  like  that  of  portable  tools  such  as  a  flashlight.  For  exam¬ 
ple,  some  airplanes  have  a  digital  computer  to  control  the  pas¬ 
sengers’  overhead  lights.  It  is  disconcerting  that  when  the 
button  is  pushed,  two  or  three  seconds  may  pass  before  the 
light  turns  on.  Even  a  couple-second  delay  in  a  handheld  de¬ 
vice  is  disruptive.  Users  typically  continue  to  push  buttons  un¬ 
til  they  receive  a  response.  The  extra  inputs  cause  a  disconnect 
between  the  software  and  the  user.  The  software  receives  a 
stream  of  inputs,  but  the  user  sees  outputs  that  are  related  to  in¬ 
puts  given  a  long  time  before  the  screen  actually  appears.  The 
situation  is  similar  to  listening  to  yourself  talk  when  there  is  a 
second  or  two  delay  in  the  sound  played  back.  Such  delays  can 
easily  confuse  users. 

The field evaluation indicated that inspection is composed of
three phases, on which the inspectors spent equal amounts of
time: maneuvering their cherry picker to access a region of the
airplane, visually inspecting and feeling the airplane’s skin, and
recording the defect’s type and location. Navigator 2 reduced
the paperwork time by half, resulting in an overall time savings
of about 18%. Training time to familiarize inspectors with the
use of Navigator 2 was about five minutes, after which they
would proceed with actual inspections. A major goal of field
evaluations is that users perform productive work. They do not
want to redo something that was already done once.

The  typical  inspection,  which  discovers  approximately  100 
defects,  requires  about  36  hours.  Today,  the  inspector  takes 
notes  on  a  clipboard.  Upon  completion,  the  inspector  fills  out 
forms  on  a  computer.  The  inspector  spends  two  to  three  min¬ 
utes  entering  each  defect.  The  data  entry  is  thus  an  additional 
three-to-four-hour  task.  Navigator  2  transmits  the  results  of  the 
inspection  by  radio  in  less  than  two  minutes. 

In  summary,  evaluations  of  inspectors  before  and  after  the  in¬ 
troduction  of  Navigator  2  indicated  a  50%  reduction  in  the  time 
to  record  inspection  information  (for  an  overall  reduction  of 
18% in inspection time) and almost two orders of magnitude reduction
in time to enter inspection information into the logistics
computer (from over three hours to two minutes). In addition,
Navigator 2 weighs two pounds, compared to the cart the
inspectors  currently  use,  which  carries  25  pounds  of  manuals. 
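
The arithmetic behind these figures can be checked with a short calculation. The inputs are the rounded values quoted in the text, and the assumption that the three inspection phases take exactly equal time is a simplification, so only approximate agreement with the reported 18% should be expected.

```python
# Three phases of roughly equal duration: maneuvering, inspecting, recording.
recording_share = 1 / 3
recording_reduction = 0.5
overall_savings = recording_share * recording_reduction  # about 0.17

# Manual data entry after a typical inspection.
defects = 100
minutes_per_defect = (2, 3)
entry_hours = tuple(defects * m / 60 for m in minutes_per_defect)

# Radio upload versus manual entry ("over three hours" versus two minutes).
manual_minutes = 3 * 60
radio_minutes = 2
speedup = manual_minutes / radio_minutes  # a factor of 90
```

Halving one of three equal phases saves about one sixth of the total, close to the reported 18%, and a factor of 90 is just under two orders of magnitude. At 100 defects and two to three minutes each, data entry comes to a little over three hours at the low end, consistent with the "over three hours" quoted.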

Speech  translation.  The  SR/LT  application  (Speech 
Recognition/Language  Translation)  consists  of  three  phases: 
(a)  speech-to-text  language  recognition,  (b)  text-to-text  lan¬ 
guage  translation,  and  (c)  text-to-speech  synthesis.  The  applica¬ 
tion  running  on  TIA-P  (Tactical  Information  Assistant-Prototype, 
circa  1996)  is  the  Dragon  Multilingual  Interview  System  (MIS), 
jointly  developed  by  Dragon  Systems  and  the  Naval  Aerospace 
and Operational Medical Institute (NAOMI). It is a keyword-triggered,
multilingual playback system, which listens to a spoken
phrase  in  English,  proceeds  through  a  speech  recognition  front- 
end,  plays  back  the  recognized  phrase  in  English,  and  after 
some  delay  (~8-10  secs),  synthesizes  the  phrase  in  a  foreign 
language  (Croatian).  The  other,  local  person  can  answer  with 
“Yes,”  “No,”  and  some  pointing  gestures.  The  Dragon  MIS  has 
about  45,000  active  phrases,  in  the  following  domains:  medical 
examination,  mine  fields,  road  checkpoints,  and  interrogation. 
A key characteristic of this application, therefore, is that it deals
with a fixed set of phrases and supports only one-way communication.
A similar system is used in Iraq as a briefing aid to interrogate
former Iraqi intelligence officials and to speak with civilians
about  information  relevant  to  locating  individuals  (Chisholm, 
2004).  This  shows  the  viability  of  the  approach. 
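
A keyword-triggered, fixed-phrase playback system of this kind can be sketched as a simple lookup. The source does not describe how the Dragon MIS matches keywords internally, so the matching rule, the phrase table, and the audio file names below are all hypothetical.

```python
import re

# Hypothetical phrase table: keyword -> (English playback, recorded translation).
PHRASES = {
    "pain": ("Are you in pain?", "hr/pain.wav"),
    "mines": ("Are there mines in this area?", "hr/mines.wav"),
}


def interview_step(recognized_text, phrases=PHRASES):
    """Return (english_playback, translation_audio), or None if nothing matches."""
    words = set(re.findall(r"[a-z]+", recognized_text.lower()))
    for keyword, (english, audio) in phrases.items():
        if keyword in words:
            return english, audio
    return None
```

Because the output is always one of a fixed set of prerecorded phrases, the system never has to translate arbitrary text, which is also why the local person's replies are limited to "Yes," "No," and gestures.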

TIA-P is a commercially available system, developed by CMU,
incorporating a 133 MHz 586 processor, 32 MB DRAM, 2 GB IDE
disk, full-duplex sound chip, and spread spectrum radio (2 Mbps,
2.4  GHz)  in  a  ruggedized,  handheld,  pen-based  system  designed 
to  support  speech  translation  applications.  TIA-P  is  shown  in 
Fig.  15.17. 

Dragon  loads  into  memory  and  stays  memory  resident.  The 
translation uses uncompressed .WAV files of about 20 KB per phrase.
There  are  two  channels  of  output:  (a)  the  first  plays  in  English, 
and  (b)  the  second  plays  in  Croatian.  A  stereo  signal  can  be  split, 
with  one  channel  directed  to  an  earphone,  and  the  second  to  a 
speaker.  This  is  done  in  hardware  attached  to  the  external 
speaker.  An  Andrea  noise-canceling  microphone  is  used  with 
an  on-off  switch. 

Speech  translation  for  one  language  (Croatian)  requires  a  to¬ 
tal  of  60  MB  disk  space.  The  speech  recognition  requires  an  ad¬ 
ditional  20-30  MB  of  disk  space. 


FIGURE 15.17. TIA-P Wearable Computer.


TIA-P  has  been  tested  with  the  Dragon  speech  translation 
system  in  several  foreign  countries:  Bosnia  (Fig.  15.18),  Korea, 
and  Guantanamo  Bay,  Cuba.  TIA-P  has  also  been  used  in  human 
intelligence  data  collection  and  experimentation  with  the  use  of 
electronic  maintenance  manuals  for  F-16  maintenance. 

The  following  lessons  were  learned  during  the  TIA-P  field 
tests: 

•  Wires  should  be  kept  to  a  minimum. 

•  Handheld  display  was  convenient  for  checking  the  translated 
text. 

•  Standard,  external  electrical  power  should  be  available  for 
use  internationally. 

•  Battery  lifetime  should  be  extended. 

•  Ruggedness  is  important. 

The smart modules (circa 1997) are a family of wearable
computers dedicated to the speech processing application
(Smailagic, Siewiorek, & Reilly, 2001). A smart module provides
a service almost instantaneously and is configurable for different
applications. The design goals also included (a) reducing latency,
(b) removing context swaps, and (c) minimizing weight, volume,
and power consumption (Reilly, 1998; Martin, 1999). The functional
prototype consists of two functionally specialized modules:
(a) one performing speech recognition and (b) one performing
language translation. The speech recognition (SR) module
incorporates speech-to-text language recognition and text-to-speech
synthesis. The language translation (LT) module
performs text-to-text language translation. The LT module runs
the PANLITE language translation software (Frederking & Brown,
1996), and the SR module runs CMU’s Sphinx II continuous,
speaker-independent speech recognition software (Ravishankar,
1996; Li, Hon, Hwang, & Reddy, 1989) and Phonebox Speech
Synthesis software.


FIGURE 15.18. U.S. Soldier in Balkans Using TIA-P.

Figure  15.19  depicts  the  structure  of  the  speech  translator, 
from  English  to  a  foreign  language,  and  vice  versa.  The  speech  is 
input  into  the  system  through  the  Speech  Recognition  subsys¬ 
tem.  A  user  wears  a  microphone  as  an  input  device,  and  back¬ 
ground  noise  is  eliminated  using  filtering  procedures.  A  lan¬ 
guage  model,  generated  from  a  variety  of  audio  recordings  and 
data,  provides  guidance  for  the  speech  recognition  system  by 
acting  as  a  knowledge  source  about  the  language  properties. 
The  Language  Translation  engine  uses  an  Example-Based  Ma¬ 
chine  Translation  (EBMT)  system,  which  takes  individual  sen¬ 
tence  phrases  and  compares  them  to  a  corpus  of  examples  it 
has in memory to find phrases it knows how to translate. A lexical
MT (glossary) translates any unknown word that may be left.
The  EBMT  engine  translates  individual  “chunks”  of  the  sentence 
using  the  source  language  model  and  then  combines  them  with 
a  model  of  the  target  language  to  ensure  correct  syntax.  When 
reading  from  the  EBMT  corpus,  the  system  makes  several  ran¬ 
dom-access  reads  while  searching  for  the  appropriate  phrase. 
Since  random  reads  are  done  multiple  times,  instead  of  load¬ 
ing  large,  continuous  chunks  of  the  corpus  into  memory,  the 
disk  latency  times  will  be  far  more  important  than  the  disk  band¬ 
width.  The  Speech  Generation  subsystem  performs  text-to- 
speech  conversion  at  the  output  stage.  To  make  sure  that  mis- 
recognized  words  are  corrected,  a  Clarification  Dialog  takes 
place  on-screen.  It  includes  the  option  to  speak  the  word  again, 
or  to  write  it  in.  As  indicated  in  Fig.  15.19,  an  alternative  input 
modality  could  be  the  text  from  the  Optical  Character  Recogni¬ 
tion  subsystem  (such  as  scanned  documents  in  a  foreign  lan¬ 
guage),  which  is  fed  into  the  Language  Translation  subsystem. 
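
The chunk-matching step of example-based translation, with a glossary fallback for leftover words, can be illustrated with a toy sketch. This is not the PANLITE algorithm: it uses a greedy longest-match over a small in-memory phrase table, omits the target-language model that reorders the output, and every name and table entry is an assumption made for illustration.

```python
def translate(sentence, examples, glossary, max_chunk=4):
    """Greedy longest-match translation over a phrase table, then a glossary."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest chunk first so larger examples win over smaller ones.
        for size in range(min(max_chunk, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + size])
            if chunk in examples:
                output.append(examples[chunk])
                i += size
                break
        else:
            # No example covers this word: fall back to the glossary, and
            # pass the word through unchanged if the glossary lacks it too.
            output.append(glossary.get(words[i], words[i]))
            i += 1
    return " ".join(output)


# Tiny illustrative English-to-Croatian tables.
examples = {"good morning": "dobro jutro", "thank you": "hvala"}
glossary = {"water": "voda"}
```

Each chunk lookup here is a dictionary probe. On the real system the corresponding probes were random-access disk reads, which is why the text stresses disk latency over bandwidth.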

User  interface  design  went  through  several  iterations 
based  on  feedback  during  field  tests.  The  emphasis  was  on 
getting  completely  correct  two-way  speech  translation,  and 
having  an  easy  to  use,  straightforward  interface  for  the  clari¬ 
fication  dialogue. 


FIGURE 15.19. Speech Translator System Structure.


FIGURE 15.20. Speech Recognizer (SR) and Language Translator
(LT) Smart Module.


The  speech  recognition  code  was  profiled  and  tuned.  Profil¬ 
ing  was  performed  to  identify  “hot  spots”  for  hardware  and  soft¬ 
ware  acceleration,  as  well  as  to  reduce  the  required  computa¬ 
tional  and  storage  resources.  A  six-times  speedup  was  achieved 
over the original desktop PC system implementation of language
translation, along with a fivefold reduction in memory requirements
(Christakos, 1998). Reducing OS swapping and optimizing the code
had a major impact. Input to the module is audio and output
is ASCII text. The speech recognition module is augmented
with  speech  synthesis.  Figure  15.20  illustrates  a  combination  of 
the  language  translation  module  (LT),  and  speech  recognizer 
(SR)  module,  forming  a  complete  stand-alone,  audio-based,  in¬ 
teractive  dialogue  system  for  speech  translation. 

Target  languages  included  Serbo-Croatian,  Korean,  Creole 
French,  and  Arabic.  Average  language  translation  performance 
was  one  second  per  sentence. 

The  key  factors  that  determine  how  many  processes  can  be 
run  on  a  module  are  memory,  storage  space,  and  available  CPU 
cycles.  To  minimize  latency,  the  entirety  of  an  application’s 
working  dataset  should  be  able  to  stay  memory  resident. 

Figure  15.21  depicts  the  functional  prototype  of  the  Speech 
Translator  Smart  Module,  with  one  module  performing  language 
translation,  and  another  one  speech  recognition  and  synthesis. 


PERFORMANCE  EVALUATION 


Figure 15.22 illustrates the response time for speech recognition
applications running on TIA-P and the SR Smart Module. Because
the SR Smart Module uses a lightweight operating system (Linux),
versus Windows 95 on TIA-P, and its speech recognition code is more
customized, it has a shorter response time. An efficient mapping
of the speech recognition application onto the SR Smart Module
architecture provided a response time very close to real time.
To ensure system responsiveness, it was important to provide
feedback to the person in near real time.
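
Response time relative to real time is commonly expressed as a real-time factor: processing time divided by the duration of the utterance. The short sketch below shows the calculation; the function name and the example durations are illustrative assumptions, not measurements from the evaluation.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by utterance length; 1.0 means real time."""
    if audio_seconds <= 0:
        raise ValueError("utterance length must be positive")
    return processing_seconds / audio_seconds


# A hypothetical 2-second utterance that takes 24 seconds to recognize
# runs at 12 times real time.
slow = real_time_factor(24.0, 2.0)
```

A factor near 1.0 means the recognizer finishes at about the moment the speaker stops, which is the behavior users on the move expect.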

The lessons learned from tests and demonstrations include:
the manual intervention process to correct misrecognized words
incurs some delay; swapping can diminish the performance of the
language translation module; and the display can be as small
as a deck of cards.


FIGURE  15.21.  Speech  Translator  SM  Functional  Prototype. 


FIGURE 15.22. Response Times (lower is better).


The required system resources for speech translator software
are several times smaller than for the laptop/workstation
version,  as  shown  in  Table  15.2. 

Dual  Purpose  Speech 

In  industry,  most  speech  recognition  on  mobile  computers  con¬ 
centrates  on  the  tasks  of  form  filling  or  simple  interface  com¬ 
mands  and  navigation.  One  reason  is  that  speech  interfaces  are  of¬ 
ten  socially  interruptive  when  other  people  are  nearby.  Speech 
translation,  as  with  the  TIA  system  above,  is  a  different  class  of  in¬ 
terface.  The  computer  is  an  essential  enabler  of  the  conversation. 
Lyons,  Skeels,  and  colleagues  (2004)  introduced  a  different  type  of 
conversation  enabler  in  their  Dual  Purpose  Speech  work. 

Dual  Purpose  Speech  is  easiest  to  discuss  using  a  scenario. 
Tracy,  a  wearable  user  equipped  with  a  head-up  display  (HUD) 
and a Twiddler keyboard, is in conversation with a recently introduced
colleague. Pressing a button on the keyboard, the
wearable  user  enables  speech  recognition  and  says,  “Bob,  what 
is  your  phone  number  so  that  I  have  it  for  later?” 

The  wearable  recognizes  that  its  user  wants  to  record  a 
phone  number  and  starts  the  user’s  contact  list  application.  It 
attempts  to  recognize  the  name  spoken  and  enters  that  into  the 
application;  however,  it  also  saves  the  speech  so  that  the  user 
can  correct  the  text  later  if  there  is  an  error. 

Bob responds, “Area code 404.”

“404,”  repeats  Tracy. 

“555-1212,”  completes  Bob. 

“555-1212,”  continues  Tracy,  who  presses  another  button  on 
her  keyboard  indicating  the  interaction  is  over.  “Okay,  I 
have  it!” 

On Tracy’s head-up display a new contact has been made for
“Bob (404) 555-1212.” When Tracy finishes her conversation, she
clicks an “Accept” button on the application because the
information has been recognized correctly. Tracy could also edit
the information or play back the audio recorded during the
interaction with Bob. Note that Tracy verbally repeated the
information that Bob provided, which is a good conversational
practice. Tracy both confirmed that she understood the
information and provided Bob with an opportunity to correct her
if necessary. This practice is also good from a privacy
standpoint. Tracy wears a
noise-canceling  microphone,  which  is  thresholded  to  record 
only  her  own  voice  and  not  that  of  her  conversational  partners. 
In  this  way,  Tracy  respects  the  privacy  of  her  colleagues. 
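
The scenario above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the implementation of Lyons, Skeels, and colleagues: the `Contact` class, the `build_contact` function, and the simple keyword and digit rules are all hypothetical. The input is the list of the wearer's own recognized utterances captured between the two button presses, which matches the privacy point that only Tracy's voice is recorded.

```python
import re
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    phone: str


def build_contact(utterances):
    """Build a contact from the wearer's own recognized utterances.

    Hypothetical rule set: the first utterance must mention "phone number"
    and start with the partner's name followed by a comma; every later
    utterance contributes its digits to the phone number.
    """
    if not utterances or "phone number" not in utterances[0].lower():
        return None  # not a contact-entry interaction
    name = utterances[0].split(",", 1)[0].strip()
    digits = "".join(re.findall(r"\d", " ".join(utterances[1:])))
    if len(digits) != 10:
        return None  # leave it to the user to correct from the saved audio
    phone = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return Contact(name, phone)


# Speech captured between the two button presses in the scenario.
spoken = [
    "Bob, what is your phone number so that I have it for later?",
    "404",
    "555-1212",
]
contact = build_contact(spoken)
print(f"{contact.name} {contact.phone}")
```

Because the number is assembled only from what the wearer repeats aloud, the conversational habit of echoing information back is what makes the entry possible.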

Lyons, Skeels, and colleagues (2004) have designed Dual
Purpose Speech applications for scheduling appointments,
providing reminders for the user, and communicating important
information to close colleagues. The key point of this research
from the perspective of this section, however, is that these
applications allow users to manipulate information on their
wearables as part of the process of communicating (thus, the
“dual purpose” name). The users may actively format their speech
so the system can better understand them, and they may have to
correct the system afterwards; however, the interface is
manipulated and the information is entered as part of a social
process.


TABLE 15.2. Comparison of Required System Resources

               Laptop/        Functional Module   Optimized Module
               Workstation    SR/LT               SR/LT
Memory Size    195 MB         53 MB               41 MB
Disk Space     1 GB           350 MB              200 MB

This  style  of  interface  provides  a  contrast  to  the  traditional 
desktop  computer,  where  the  user’s  attention  is  assumed  to  be 
dedicated  to  the  interface.  Other  wearable-computing-related 
fields  also  attempt  to  create  interfaces  that  are  driven  by  the 
user’s  interactions  with  the  environment.  For  example,  Feiner, 
MacIntyre, and Seligmann’s (1993) early augmented reality systems
attempted  to  display  appropriate  repair  instructions  based  on  the 
user’s  actions  during  the  repair  process.  Such  awareness  of  the 
user’s  context  and  goals  may  allow  wearable  computers  to  be  uti¬ 
lized  where  a  user’s  lack  of  attentional  or  physical  resources 
would  normally  preclude  traditional  desktop  applications. 

Perception 

Just  as  dexterity  is  impaired  when  a  user  is  on  the  go,  the  user’s 
ability  to  perceive  a  wearable’s  interface  is  also  lessened.  The 
vibration and visual interference from a moving background
interfere with visual tasks. Background noise and the noise from
the  body  itself  affect  hearing.  The  moving  of  clothes  over  the 
body  and  the  coupling  of  mechanical  shock  through  the  body 
can  lessen  the  user’s  ability  to  perceive  tactile  displays.  Sears, 
Lin,  Jacko,  and  Xiao  (2003)  described  these  detriments  to  mo¬ 
bile  interaction  caused  by  environmental  and  situational  factors 
as  “Situationally-Induced  Impairments  and  Disabilities”.  These 
researchers  and  others  are  developing  procedures  to  test  hu¬ 
man  performance  in  mobile  computing  tasks  in  context  (in  this 
case,  walking  a  path;  Barnard,  2005,  in  press).  Such  research  is 
sorely  needed,  as  not  enough  is  known  about  how  to  ade¬ 
quately  simulate  mobile  computing  scenarios  in  testing.  For  ex¬ 
ample,  in  Barnard  and  colleagues’  (in  press)  work  on  perform¬ 
ing  reading  comprehension  tasks  on  PDAs  while  walking, 
lighting  levels  affected  workload  measures  more  when  walking 
a  path  than  when  walking  on  a  treadmill.  The  community  needs 
to  develop  understanding  about  the  interactions  between  mo¬ 
bility,  attention,  and  perception  in  common  mobile  computing 
scenarios  in  order  to  adequately  develop  testing  environments 
for  mobile  interfaces. 

In  the  past,  such  work  focused  on  cockpits,  both  for  aviation 
and automobiles (Wickens & Hollands, 1999; Melzer & Moffit,
1997; Velger, 1998); however, the
U.S.  military’s  Land  Warrior  project  has  highlighted  the  need  for 
such  research  for  dismounted  users  who  are  on  the  go  (Black¬ 
wood,  1997).  Some  researchers  have  begun  exploring  mobile 
output  devices  for  very  specific  tasks.  For  example,  Krum  (2004) 
described  experiments  with  a  head-up  display  that  focused  on 
determining  how  to  render  overhead  views  of  an  area  to  en¬ 
courage  learning  of  the  layout  of  the  surrounding  environment 
while the user is navigating to a goal on foot. As mobile
augmented reality is becoming practical from a technical
standpoint, researchers have begun to address perceptual issues.
Although their experiment was not mobile, Laramee and Ware (2002)
investigated head-mounted displays to determine the relative
effects of
rivalry  and  visual  interference  between  binocular  and  monocular 
displays  with  varying  levels  of  transparency.  As  the  market  de¬ 
termines  which  mobile  contexts  are  most  important  for  users, 
experiments  such  as  these  will  help  determine  how  to  design 
interfaces  to  least  interfere  with  the  user’s  primary  tasks  while 
providing  the  most  value  in  terms  of  augmentation. 


CONCLUSION  AND  FUTURE  CHALLENGES 


Wearable  computers  are  an  attractive  way  to  deliver  a  ubiquitous 
computing  system’s  interface  to  a  user,  especially  in  non-office¬ 
building  environments.  The  biggest  challenges  in  this  area  deal 
with  fitting  the  computer  to  the  human  in  terms  of  interface, 
cognitive  model,  contextual  awareness,  and  adaptation  to  tasks 
being  performed.  These  challenges  include 

•  User  interface  models.  What  is  the  appropriate  set  of  meta¬ 
phors  for  providing  mobile  access  to  information  (e.g.,  what  is 
the  next  “desktop”  or  “spreadsheet”)?  These  metaphors  typi¬ 
cally  take  over  a  decade  to  develop  (e.g.,  the  desktop  metaphor 
started in the early 1970s at Xerox PARC and required over a decade
before  it  was  widely  available  to  consumers).  Extensive  experi¬ 
mentation  working  with  end-user  applications  will  be  required. 
Furthermore,  a  set  of  metaphors  may  each  be  tailored  to  a  spe¬ 
cific  application  or  a  specific  information  type. 

• Input/output modalities. While several modalities mimicking
the input/output capabilities of the human brain have been
the subject of computer science research for decades, the
accuracy and ease of use (e.g., many current modalities
require extensive training periods) are not yet acceptable.
Inaccuracies produce user frustrations. In addition, most of
these  modalities  require  extensive  computing  resources  that 
will  not  be  available  in  low-weight,  low-energy,  wearable  com¬ 
puters.  There  is  room  for  new,  easy-to-use  input  devices  such 
as  the  dial  developed  at  Carnegie  Mellon  University  for  list- 
oriented  applications. 

•  Quick  interface  evaluation  methodology.  Current  approaches 
to evaluating a human-computer interface require elaborate
procedures  with  scores  of  subjects.  Such  an  evaluation  may 
take  months  and  is  not  appropriate  for  use  during  interface 
design.  These  evaluation  techniques  should  especially  focus 
on  decreasing  human  errors  and  frustration. 

•  Matched  capability  with  applications.  The  current  thought  is 
that  technology  should  provide  the  highest  performance  capa¬ 
bility;  however,  this  capability  is  often  unnecessary  to  complete 
an  application  and  enhancements  such  as  full-color  graphics  re¬ 
quire  substantial  resources  and  may  actually  decrease  ease  of 
use  by  generating  information  overload  for  the  user.  Interface 
design  and  evaluation  should  focus  on  the  most  effective  means 
for  information  access  and  resist  the  temptation  to  provide  ex¬ 
tra  capabilities  simply  because  they  are  available. 

•  Context-aware  applications.  How  do  we  develop  social  and 
cognitive  models  of  applications?  How  do  we  integrate  input 
from  multiple  sensors  and  map  them  into  user  social  and 
cognitive  states?  How  do  we  anticipate  user  needs?  How  do 
we  interact  with  the  user?  These,  plus  many  other  questions, 
have to be addressed before context-aware computing
becomes possible. Some initial results have been reported in
Krause, Smailagic, and Siewiorek (2006).


In our last version of this chapter, we focused on fixed-location
computing  workstations  and  situations,  but  acknowledged  the 
burgeoning  use  of  portable  information  devices  (PIDs)  and  pro¬ 
vided  a  few  ideas  to  consider  about  their  use.  We  indicated  that 
fixed  computer  environments,  while  still  a  significant  part  of 
workstation  design  considerations,  have  been  surpassed  by 
highly  mobile  information  technology  that  does  not  use  fixed 
workstations.  These  portable  technologies  are  now  in  use  in  al¬ 
most  every  venue  and  human  activity,  and  the  nature  of  their 
characteristics and activities of use do not lend themselves to
traditional fixed workstation considerations. This introduces a host
of  potential  ergonomic  concerns  related  to  the  design  of  work 
areas  (and  activities)  in  which  PIDs  and  other  forms  of  comput¬ 
ing  are  used.  Decades  of  research  and  applications  have  defined 
important  considerations  in  the  ergonomic  design  of  fixed  com¬ 
puter  work  areas  (Grandjean,  1979,  1984;  Stammerjohn,  Smith, 
&  Cohen,  1981;  Cakir,  Hart,  &  Stewart,  1979;  ANSI/HFES-100, 
1988;  Smith  &  Cohen,  1997;  Smith,  Carayon,  &  Cohen,  2003; 
BSR/HFES-100,  2005).  However,  very  little  has  been  done  to  de¬ 
fine  the  design  of  work  areas  for  PIDs  and  mobile  computing.  In 
this  chapter,  we  will  propose  some  ideas  and  considerations  for 
dealing  with  ergonomic  concerns  for  these  mobile  technologies 
in  addition  to  updating  information  on  fixed  computer  work¬ 
station  applications.  However,  we  admit  that  the  challenge  to 
define  good  ergonomic  practice  for  the  use  of  PIDs  is  large,  and 
our  advice  is  still  very  limited. 

Ergonomics  is  the  science  of  fitting  the  environment  and  ac¬ 
tivities  to  the  capabilities,  dimensions,  and  needs  of  people.  Er¬ 
gonomic  knowledge  and  principles  are  applied  to  adapt  work¬ 
ing  conditions  to  the  physical,  psychological,  and  social  nature 
of  the  person.  The  goal  of  ergonomics  is  to  improve  perfor¬ 
mance  while  enhancing  comfort,  health,  and  safety.  Computer 
workstation  design  is  more  than  just  making  the  computer  in¬ 
terfaces  easier  to  use,  or  making  furniture  adjustable  in  vari¬ 
ous  dimensions.  It  also  involves  integrating  design  considera¬ 
tions  with  the  work  environment,  the  task  requirements,  the 
social  aspects  of  work,  and  job  design.  Critical  considerations 
for  good  ergonomics  practice  are  to  limit  biomechanical  load¬ 
ing  on  the  back  and  joints  to  low  forces  of  short  duration;  to 
keep  the  back,  neck,  and  joints  in  good  postures  while  work¬ 
ing;  to  limit  the  amount  of  time  you  are  in  static  postures;  to 
limit  the  duration  and  extent  of  repetitive  motions  of  any  part 
of  your  body  including  your  voice;  to  work  in  environments 
where  you  can  easily  see  and  hear;  and  to  take  frequent  rest 
breaks  from  working. 

A  fundamental  perspective  in  this  chapter  is  that  work  area 
(workstation)  design  influences  employee  comfort,  health, 
motivation,  and  performance.  We  will  examine  basic  ergonomic 
considerations  (principles,  practices,  concerns)  that  can  be 
used  to  develop  guidance  for  the  design  of  work  areas  (work¬ 
stations)  for  the  use  of  computing  and  related  IT  such  as  PIDs. 
Some  of  today’s  technologies  have  the  capability  to  interact  di¬ 
rectly  with  each  other,  sometimes  without  human  intervention, 
and at other times requiring human action. The myriad
environments  of  use,  interaction  schemes,  and  activities  of  use 
make  specific  guidance  for  PIDs  complex  and  difficult.  Thus,  we 
will  provide  general  guidance  to  reduce  musculoskeletal  stress 
and  to  enhance  physical  comfort. 


ERGONOMIC  PERSPECTIVE 


Forty  years  ago,  a  person  would  interact  with  several  different 
types  of  information  sources  and  technologies  when  engaging 
in  activities,  such  as  work.  The  person  might  look  at  hardcopy 
documents,  take  notes  with  a  pen  and  a  paper  tablet,  use  a  fixed 
location  telephone,  talk  face-to-face  with  colleagues,  type  on  a 
typewriter,  and  perform  many  other  tasks  during  the  course  of 
the  working  day.  The  diversity  of  activities  led  people  to  move 
actively  around  during  the  day.  Then,  thirty  years  ago,  many 
people  started  to  spend  most  of  their  workdays  in  sedentary 
work  sitting  in  front  of  a  computer  terminal.  This  type  of  human- 
machine  system  led  to  restricted  physical  movement  and  em¬ 
ployee  attention  directed  toward  the  computer  monitor.  Today 
there  are  mixed  exposures  in  a  multidimensional  environment 
where  some  people  are  still  at  fixed  sedentary  workstations, 
while  others  are  interacting  with  multiple  technologies  that 
have  different  physical  characteristics.  In  some  instances,  people 
are  again  on  the  move  and  not  in  fixed,  sedentary  situations. 
Movement  is  good  for  the  muscles  while  sedentary  sitting  is  not 
if  it  lasts  for  too  long.  However,  the  interaction  with  technol¬ 
ogy  while  moving  may  create  dynamic  postural  loading  and  the 
awkward  application  of  perceptual-motor  skills  when  making  in¬ 
puts  to  technology  or  when  reading  and  listening  to  displays. 
For  example,  you  might  be  walking  through  the  airport  while 
checking  your  email  on  your  cell  phone. 

Since the initial work in Sweden of Hultgren and Knave
(1973),  Ostberg  (1975),  and  Gunnarsson  and  Ostberg  (1977), 
thousands  of  research  studies  from  every  corner  of  the  globe  have 
examined  the  working  conditions  of  computer  users  and  their  as¬ 
sociated  health  complaints.  There  have  been  several  international 
conferences  devoted  to  these  issues  starting  in  1980,  and  confer¬ 
ences  on  these  issues  are  already  programmed  through  2010.  The 
findings  from  this  research  and  the  meetings  have  generally  indi¬ 
cated  that  poor  ergonomic  and  task  design  conditions  are  associ¬ 
ated  with  large  numbers  of  computer  users  complaining  about 
visual  discomfort,  musculoskeletal  discomfort  and  pain,  and  psy¬ 
chological  distress.  In  fact,  if  these  adverse  ergonomic  conditions 
are  repeated  daily  over  a  long  period,  more  or  less  chronic  aches 
and  pains  of  the  upper  extremities  and  back  can  occur,  and  may 
involve  not  only  muscles  but  also  other  soft  tissues,  such  as  ten¬ 
dons  and  nerves.  Long  lasting,  adverse  ergonomic  conditions  may 
lead  to  a  deterioration  of  joints,  ligaments,  and  tendons.  Reviews 
of  field  studies  for  fixed  computer  workstations  (Bergquist,  1984; 
Bergquist  et  al.,  1992;  Smith,  1984,  1987,  1997;  Grandjean,  1979, 
1987;  Hagberg  et  al.,  1995;  Bernard,  1997;  Carayon,  Smith,  & 
Haims,  1999),  as  well  as  general  experience,  have  shown  that 
these  conditions  may  be  associated  with  a  higher  risk  of 

1.  Inflammation  of  the  joints 

2.  Inflammation  of  the  tendon-sheaths 

3.  Inflammation  of  the  attachment-points  of  tendons 

4.  Symptoms  of  degeneration  of  the  joints  in  the  form  of  chronic 
arthroses 

5.  Painful  muscles 

6.  Disc  troubles 

7. Peripheral nerve disorders

The relationship between the user and technology is reciprocal.
It is a system where the constraints of one element affect
the  performance  of  the  other.  Various  aspects  of  the  system, 
such  as  the  task  requirements,  the  work  demands,  the  environ¬ 
ment,  and  the  workstation,  all  influence  how  effectively  and 
comfortably  the  technology  can  be  used  by  the  person  (Smith  & 
Sainfort,  1989;  Smith  &  Carayon,  1995;  Carayon  &  Smith,  2000). 
One  of  the  consequences  of  this  is  that  the  design  of  the  work¬ 
space  limits  the  nature  and  effectiveness  of  the  interaction  be¬ 
tween  the  person  and  the  technology.  For  instance,  inadequate 
space  for  carrying  out  physical  activities  can  lead  to  constrained 
postures, which together with long-lasting static loading
produce discomfort in the back and shoulders. This leads to
symptoms of tiredness, muscle cramps, and pain. In addition, heavy
workload, chronic repetition, and other biomechanical strains
can cause similar problems. These adverse postural, repetition,
and workload exposures lead to reduced performance and
productivity, and in the long run, they may also affect employee
well-being and health.

Today,  people  carry  their  computing  and  communications 
with  them  and  engage  in  activities  in  any  available  work  area. 
The  potential  for  serious  ergonomic  risks  is  high  for  these  new 
modes  of  using  technology.  We  will  first  examine  problems  and 
design  considerations  for  fixed  workstations  where  PCs  and  lap¬ 
tops  are  used.  Then  we  will  look  at  PIDs. 


STUDIES  ON  THE  DESIGN 
OF  FIXED  WORKSTATIONS 


Workstation  design  is  a  major  element  in  ergonomic  strategies 
for  improving  user  comfort  and  particularly  for  reducing  mus¬ 
culoskeletal  problems.  Often,  the  task  requirements  will  have 
a  role  in  defining  the  layout  and  dimensional  characteristics 
of  the  workstation.  The  relative  importance  of  the  screen,  in¬ 
put  devices,  and  hardcopy  (e.g.,  source  documents)  depends 
primarily  on  the  task,  and  this  can  influence  the  design  con¬ 
siderations  necessary  to  improve  operator  performance,  com¬ 
fort,  and  health.  Many  studies  have  examined  fixed  computer 
workstation  design  issues  and  generally  have  found  that  the 
quality  of  the  fit  between  the  user  and  the  interactive  devices 
being  used  has  a  substantial  role  in  user  musculoskeletal  com¬ 
fort  and  musculoskeletal  pain  (Smith,  Carayon,  &  Cohen, 
2003).  For  instance,  Cohen,  James,  Taveira,  Karsh,  Scholz,  and 
Smith  (1995)  identified  the  following  working  conditions  that 
led  to  awkward  posture  and  undue  loads  on  the  musculoskele¬ 
tal  system: 

1. Static postures of the trunk, neck, and arms

2.  Awkward  twisting  and  reaching  motions 

3.  Poor  lighting  and  glare 

4.  The  placement  of  the  keyboard  on  uneven  working  surfaces 

5.  Insufficient  work  surface  space 

6.  Insufficient  knee  and  toe  space 

7. The inability of the chair armrests to fit under the working
surfaces


8.  Chairs  with  poor  back  and  shoulder  support,  inadequate 
padding  in  the  backrest  and  seat  pan,  arm  rests  that  did  not 
fit  under  working  surfaces,  and  a  lack  of  appropriate  seat  pan 
height  adjustment 

Based  on  this  and  several  other  studies,  ergonomics  experts 
have  proposed  guidance  for  designing  fixed  computing  work¬ 
stations.  Grandjean  (1984)  proposed  the  consideration  of  the 
following  features  of  fixed  computer  workstation  design: 

1.  The  furniture  should  be  as  flexible  as  possible  with  adjust¬ 
ment  ranges  to  accommodate  the  anthropometric  diversity  of 
the  users. 

2.  Controls  for  workstation  adjustment  should  be  easy  to  use. 

3.  There  should  be  sufficient  knee  space  for  seated  operators. 

4.  The  chair  should  have  an  elongated  backrest  with  an  ad¬ 
justable  inclination  and  a  lumbar  support. 

5.  The  keyboard  should  be  moveable  on  the  desk  surface.  (To¬ 
day  we  might  say  the  input  devices  should  be  able  to  be  po¬ 
sitioned  at  several  locations  on  the  working  surfaces.) 

Derjani-Bayeh  and  Smith  (1999;  Smith  &  Derjani-Bayeh,  2003) 
conducted  a  prospective  intervention  study  to  examine  the  ben¬ 
efits  of  ergonomic  redesign  for  computer  users  at  fixed  work¬ 
stations.  The  study  took  place  in  a  consumer  products  call  cen¬ 
ter  where  shoppers  could  order  products  from  a  catalog  using  a 
telephone  or  an  Internet  site.  There  were  three  ergonomic  in¬ 
terventions  studied.  In  the  first  condition,  ergonomics  experts 
provided  modifications  to  current  workstation  configurations  to 
maximize  their  fit  with  the  incumbent  employee.  In  the  second 
condition,  new  workstation  accessories  (keyboard  tray,  monitor 
holder,  document  holder,  wrist  rest,  footrest,  and  task  lighting) 
were  added  as  needed  to  improve  the  employees’  fit  with  their 
workstations  and  general  environment.  In  the  third  condition, 
the  same  factors  as  the  second  condition  were  added  but  in  ad¬ 
dition,  a  new  chair  with  multiple  adjustments  was  also  added. 

Eighty  volunteer  subjects  participated.  They  were  drawn 
from  a  larger  pool  of  volunteers.  The  participants  for  the  third 
condition  were  randomly  selected  from  the  larger  pool,  and 
then  subjects  for  conditions  one  and  two  were  matched  to 
these  selections  based  on  type  of  job,  age,  gender,  and  length  of 
experience  with  the  company.  Baseline  measurements  of  self- 
reported  health  status  were  collected  using  a  questionnaire  sur¬ 
vey.  Follow-up  measurements  were  taken  directly  after  imple¬ 
mentation  of  the  ergonomic  improvements  and  then  12  months 
later.  In  addition,  productivity  measurements  were  obtained  for 
each  participant  and  a  control  group  of  approximately  375  em¬ 
ployees  in  the  same  departments.  The  results  indicated  that 
subjects  working  under  conditions  two  and  three  showed  re¬ 
ductions  in  the  extent  and  intensity  of  musculoskeletal  health 
complaints,  but  not  the  subjects  in  condition  one.  However,  the 
subjects  in  condition  one  showed  greater  average  improvement 
in productivity than the subjects in conditions two and three and
the control group receiving no treatment. Of importance to
designers was the finding that not all subjects showed improved
productivity with ergonomic improvements. In fact, about one-half
of the subjects showed reduced productivity with the ergonomic
improvements, even though the overall average for the ergonomic
improvements showed a positive effect.


GUIDELINES FOR FIXED COMPUTER
WORKSTATION  DESIGN  FEATURES 


Based  on  the  findings  from  several  field  studies  and  standards 
dealing  with  computer  workstation  design  (ANSI/HFES-100, 1988; 
CSA,  1989;  BSR/HFES-100,  2005)  the  following  general  guidance 
is  proposed  for  the  design  of  fixed  computer  workstations. 

The  recommended  size  of  the  work  surface  is  dependent  upon 
the  tasks,  the  documents,  and  the  technologies  being  used.  The 
primary working surfaces (e.g., those supporting the keyboard, the
mouse,  the  displays,  and  documents)  should  be  sufficient  to: 

1.  Allow  the  display  screens  to  be  moved  forward  or  backward 
for  a  comfortable  viewing  distance  for  different  employees 
with  differing  visual  capabilities. 

2.  Allow  a  detachable  keyboard  and  a  detachable  mouse  (or 
other  pointing  device)  to  be  placed  in  several  locations  on 
the  working  surface. 

3.  Allow  source  documents  to  be  positioned  for  easy  viewing 
and  proper  musculoskeletal  alignment  of  the  upper  extrem¬ 
ities  and  the  back. 

4. Additional working surfaces (e.g., secondary working
surfaces) may be necessary to store, lay out, read, write on, or
manipulate  documents,  materials,  or  technologies  (input  de¬ 
vices,  displays,  computers,  PIDs). 

5.  Provide  adequate  knee  and  legroom  for  repositioning  move¬ 
ments  while  working. 

6.  The  tabletop  should  be  as  thin  as  possible  for  better  thigh 
and  knee  clearance. 

7.  Establish  a  comfortable  table  height  that  provides  the  nec¬ 
essary  thigh  and  knee  clearance,  and  which  allows  input 
devices  (keyboard,  mouse)  to  be  at  comfortable  heights.  Ad¬ 
justable  tables  provide  the  opportunity  to  better  fit  the 
users. 

Sometimes  computer  workstations  are  configured  so  that 
multiple  pieces  of  equipment  and  source  materials  can  be 
equally  accessible  to  the  user.  In  this  case,  additional  working 
surfaces  are  necessary  to  support  these  additional  tools  and 
should  be  arranged  to  allow  for  easy  movement  from  one  sur¬ 
face  to  another.  Proper  clearances  under  each  working  surface 
should  be  maintained,  as  well  as  a  comfortable  height. 

It  is  important  to  provide  unobstructed  room  under  the 
working  surface  for  the  feet  and  legs  so  that  operators  can  eas¬ 
ily  shift  their  posture.  Knee  space  height  and  width  and  toe 
depth  are  the  three  key  factors  for  the  design  of  clearance  space 
under  the  working  surfaces.  The  recommended  minimum  width 
for  leg  clearance  is  51  cm,  while  the  preferred  minimum  width 
is  61  cm  (ANSI/HFS-100,  1988).  The  BSR/HFES-100  (2005)  guid¬ 
ance  calls  for  minimum  width  of  leg  clearance  of  50  cm.  The  min¬ 
imum  depth  under  the  work  surface  from  the  operator  edge  of 
the  work  surface  should  be  38  cm  for  clearance  at  the  knee  level 
and  59  cm  at  the  toe  level  (ANSI/HFS-100, 1988).  The  BSR/HFES- 
100  (2005)  recommends  a  minimum  depth  at  the  extended  leg 
toe  level  of  60  cm.  A  good  workstation  design  accounts  for  in¬ 
dividual  body  sizes  and  often  exceeds  minimum  clearances  to 
allow  for  free  postural  movement. 
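
The minimum clearances quoted above lend themselves to a simple check. The sketch below is illustrative only: the dimension names, the function name, and the example desk are hypothetical, and only the minimums stated in this section are encoded (the preferred 61 cm width is left out).

```python
# Minimum clearances in centimeters, as quoted in the text above.
MINIMUMS_CM = {
    "ANSI/HFS-100 (1988)": {"leg_width": 51, "knee_depth": 38, "toe_depth": 59},
    "BSR/HFES-100 (2005)": {"leg_width": 50, "toe_depth": 60},
}


def clearance_shortfalls(measured_cm, minimums=MINIMUMS_CM):
    """List (guideline, dimension, shortfall_cm) for each unmet minimum."""
    shortfalls = []
    for guideline, limits in minimums.items():
        for dimension, minimum in limits.items():
            gap = minimum - measured_cm[dimension]
            if gap > 0:
                shortfalls.append((guideline, dimension, gap))
    return shortfalls


# Hypothetical desk: wide enough, but shallow at toe level.
desk = {"leg_width": 58, "knee_depth": 40, "toe_depth": 55}
problems = clearance_shortfalls(desk)
for guideline, dimension, gap in problems:
    print(f"{guideline}: {dimension} is {gap} cm short")
```

Meeting every minimum is only a starting point, because a good design often exceeds these values to allow free postural movement.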


Table  height  has  been  shown  to  be  an  important  contribu¬ 
tor  to  computer  user  musculoskeletal  problems  (Hunting, 
Laubli, & Grandjean, 1981; Grandjean, Nishiyama, Hunting, &
Pidermann, 1982; Grandjean, Hunting, & Nishiyama, 1982, 1984;
Grandjean,  Hunting,  &  Pidermann,  1983).  Normal  desk  height 
of  30  inches  (76  cm)  is  often  too  high  for  keyboard  and  mouse 
use  by  most  people.  It  is  desirable  for  table  heights  to  vary  with 
the  height  of  the  user,  particularly  if  the  chair  is  not  height  ad¬ 
justable.  Height  adjustable  working  surfaces  are  effective  for 
this.  Adjustable  multisurface  tables  encourage  good  posture  by 
allowing  the  keyboard,  mouse,  and  displays  to  be  independently 
adjusted  to  appropriate  keying,  pointing,  and  viewing  heights 
for  each  user  and  each  task.  Tables  that  cannot  be  adjusted  eas¬ 
ily  can  be  a  problem  when  used  by  multiple  users  of  differing 
sizes,  especially  if  the  chair  is  not  height  adjustable.  When  ad¬ 
justable  tables  are  used,  the  ease  of  making  the  adjustments  is 
essential.  Adjustments  should  be  easy  to  make,  and  operators 
should  be  instructed  on  how  to  adjust  the  workstation  to  be 
comfortable. 

Specifications  for  seated  working  surfaces’  heights  vary  with 
whether  the  table  is  adjustable  or  at  one  fixed  height,  and  with 
a  single  working  surface  or  multiple  working  surfaces.  The 
proper  height  for  a  nonadjustable  working  surface  is  about 
70  cm  (27.5  inches)  from  the  floor  to  top  of  the  working  sur¬ 
face.  However,  it  must  be  emphasized  that  fixed  height  table 
surfaces  fit  only  a  portion  of  the  user  population.  In  the  ab¬ 
sence  of  a  height  adjustable  chair,  such  fixed  height  table  sur¬ 
faces  will  produce  a  misfit  in  a  portion  of  the  users  that  could 
influence  musculoskeletal  comfort  and  pain. 

The  BSR/HFES-100  (2005)  guidance  requires  that  all  surfaces 
supporting  input  devices  must  be  height  adjustable  (or  a  com¬ 
bination  of  height  and  tilt  adjustable).  In  addition,  the  input- 
device  support  surfaces  should  provide  for  adjustment  fore  and 
aft  and  side-to-side  if  possible. 

Adjustable  tables  allow  vertical  adjustments  of  the  keyboard 
(and  mouse)  and  displays.  Some  allow  for  the  independent  ad¬ 
justment  of  the  keyboard  and  display.  For  a  single  adjustable 
seated  working  surface,  the  working-surface  height  adjustment 
range should be from 60 cm to 80 cm (23.5-31.5 inches).
The  BSR/HFES-100  (2005)  guidance  is  for  an  adjustment  range 
of  56  cm  to  72  cm  (22.0-28.5  inches).  In  the  case  of  indepen¬ 
dently  adjustable  working  surfaces  for  the  keyboard  and  the 
screen,  the  appropriate  height  ranges  are  59  cm  to  71  cm  (23.0- 
28.0  inches)  for  the  keyboard  surface  (ANSI/HFS-100,  1988), 
and  90  cm  to  115  cm  (35.5-45  inches)  for  the  screen  surface 
(Grandjean,  1987). 

The BSR/HFES-100 (2005) guidance indicates that the worktable
surfaces should provide for tilt from -15 degrees to +20
degrees  if  possible. 

For  standing  only  workstations,  the  BSR/HFES-100  (2005) 
recommends  a  table  surface  height  adjustment  range  of  95  cm 
to  118  cm  (37.5-46.5  inches).  However,  if  both  height  and  tilt 
surface  adjustments  are  included,  then  the  range  of  height 
adjustment  recommended  is  from  78  cm  to  118  cm  (30.5-46.5 
inches)  with  tilt  adjustment  ranging  from  -45  degrees  to  +20
degrees.  For  sit  and  stand  workstations,  the  BSR/HFES-100
recommendation  for  height  adjustment  of  the  input  device
surfaces  is  from  56  cm  to  118  cm  (22.0-46.5  inches).
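
The  BSR/HFES-100  height  ranges  quoted  above  can  be  collected  into  a
small  lookup  table.  The  following  is  a  minimal  sketch  in  Python,  not
part  of  any  standard;  the  dictionary  keys  and  function  names  are
hypothetical,  and  the  half-inch  rounding  is  only  an  assumption  about
how  the  inch  values  in  the  text  appear  to  have  been  derived.

```python
# Height adjustment ranges (cm) as quoted in the text from BSR/HFES-100 (2005).
# The keys are illustrative labels, not terms defined by the standard.
HEIGHT_RANGES_CM = {
    "seated": (56, 72),
    "standing": (95, 118),
    "standing_with_tilt": (78, 118),
    "sit_stand": (56, 118),
}


def covers_recommended_range(kind, table_min_cm, table_max_cm):
    """Return True if the table's travel spans the whole quoted range."""
    low, high = HEIGHT_RANGES_CM[kind]
    return table_min_cm <= low and table_max_cm >= high


def to_half_inch(cm):
    """Convert centimeters to inches, rounded to the nearest half inch."""
    return round(cm / 2.54 * 2) / 2


if __name__ == "__main__":
    # A table that travels from 55 cm to 75 cm covers the seated range.
    print(covers_recommended_range("seated", 55, 75))
    # The same table falls short of the sit and stand range.
    print(covers_recommended_range("sit_stand", 55, 75))
    print(to_half_inch(70))
```

A  check  of  this  kind  is  only  a  first  screen:  it  says  whether  a  table
can  reach  the  quoted  heights,  not  whether  a  particular  user  is  well
accommodated.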

THE  DESIGN  OF  WORKSTATIONS
FOR  USE  WITH  LAPTOP  COMPUTERS 


We  will  start  with  the  laptop  as  a  prime  example  of  the  influence 
of  portability  and  efficiency,  and  then  we  will  move  on  to  other 
portable  IT  devices.  Recently  the  Human-Computer  Interaction 
Committee  of  the  International  Ergonomics  Association  (IEA)
produced  a  guideline  for  the  use  of  laptop  computers  to  improve 
ergonomic  conditions  (Saito  et  al.,  2000).  This  was  prompted 
by  the  ever-increasing  sales  of  laptops  and  the  replacement  of 
fixed  PCs  by  portable  laptops  on  fixed  working  surfaces. 

The  primary  advantage  of  the  laptop  is  easy  portability  so  the 
user  can  take  the  computer  anywhere  to  do  work.  He  or  she  can 
use  it  at  the  office,  take  it  home  to  finish  work,  or  take  it  with  him 
or  her  to  make  a  presentation  at  a  meeting.  The  convenience 
and  effectiveness  of  easy,  lightweight  portability  are  very  high. 
In  addition,  all  of  the  files  are  with  the  laptop,  so  nothing  is  mis¬ 
takenly  left  behind  at  the  office  (or  home).  However,  the  com¬ 
fort  and  health  factors  can  be  very  low  if  the  person  uses  the  lap¬ 
top  in  all  manner  of  environments,  workstations,  and  tasks  that 
diminish  the  consistent  application  of  good  ergonomic  principles.
An  important  feature  of  the  IEA  laptop  guideline  (Saito  et  al.,
2000)  is  to  encourage  situations  of  use  that  mirror  the  best
practices  of  ergonomic  conditions  for  fixed  computer  workstations.

Work  Environment  and  Workstation  Layout,  “Create  an  environment 
that  fits  your  work” 

1.  Use  your  laptop  in  a  proper  environment  (lighting,  temperature, 
noise,  and  so  on).  In  particular,  make  sure  the  work  area  is  nei¬ 
ther  too  bright  nor  too  dark. 

2.  Allocate  enough  space  on  your  desk  when  placing  a  laptop.

Chair  and  Desk,  “Adjust  chair  height  to  match  your  physique”

1.  Adjust  your  chair  height  based  on  the  height  of  the  keyboard, 
such  that  your  forearm  is  parallel  to  the  surface  of  the  keyboard. 

2.  If  your  feet  do  not  lie  flat  on  the  floor,  provide  a  footrest. 

3.  Provide  enough  space  underneath  the  desk. 

Keyboard,  “Set  the  keyboard  to  a  desirable  angle,  and  use  a  palm  rest  if 
necessary” 

1.  Adjust  the  angle  of  the  keyboard  based  on  your  posture  and 
preferences. 

2.  Make  sure  there  is  space  in  front  of  the  keyboard  for  you  to  rest 
your  wrists  comfortably  (this  space  can  be  on  the  desktop  sur¬ 
face  itself  if  the  keyboard  is  thin). 

3.  If  the  keyboard  seems  difficult  to  use,  use  an  external  keyboard. 

Working  Posture,  “Avoid  unnatural  postures,  and  change  your  posture
occasionally” 

1.  Avoid  staying  in  postures  where  you  are  bent  too  far  forward  or 
backward,  or  twisted,  for  an  extended  duration.

2.  Laptop  users  tend  to  view  the  display  from  too  close,  so  make 
sure  you  maintain  a  distance  of  at  least  40-50  cm  between  the 
display  and  your  eyes. 

3.  Alternate  near  vision  with  far  vision  (i.e.,  observe  objects  located
at  least  6  m  away)  as  frequently  as  possible.

4.  Make  sure  your  wrists  are  not  at  an  unnatural  angle. 


Nonkeyboard  Input  Devices,  “Use  a  mouse  as  your  pointing  device  if  at 
all  possible” 

1.  If  a  mouse  can  be  connected  to  your  laptop,  then  do  so  as  often
as  possible.  Use  a  mouse  pad  whenever  you  use  a  mouse.

2.  When  you  cannot  connect  a  mouse,  make  sure  you  understand
the  built-in  pointing  device,  and  use  the  pointing  device
appropriately  (Saito  et  al.,  2000,  pp.  421-434).

These  laptop  guidelines  provided  by  Saito  et  al.  (2000)  are 
useful  when  the  laptop  is  used  as  a  fixed  PC  at  a  docking  station
or  at  a  desk  (worktable).  However,  they  do  not  provide  as
much  help  in  the  situations  where  there  is  no  fixed  workstation. 
We  will  describe  some  situations  below  where  this  can  occur. 

Imagine  sitting  at  the  airport,  and  your  flight  has  been  delayed
for  two  hours.  You  have  your  laptop  with  you,  so  you  decide
to  get  some  work  done  while  you  wait.  You  could  rent  a
cubicle  or  kiosk  at  the  airport  that  would  provide  you  with  a 
high-speed  Internet  connection,  a  stationary  telephone,  a  work¬ 
ing  surface  (desk  or  table),  a  height  adjustable  chair,  and  some 
privacy  (noise  control,  personal  space).  The  characteristics  of 
these  work  areas  do  not  often  conform  to  the  best  principles 
of  ergonomic  design.  It  is  likely  that  the  cubicle  will  provide 
some  improvement  over  sitting  with  the  laptop  on  your  lap,  but 
the  characteristics  may  not  meet  the  recommendations  pre¬ 
sented  in  this  chapter.  Such  situations  are  acceptable  for  short 
exposures  of  up  to  60  minutes,  but  longer  exposures  may  lead 
to  musculoskeletal  discomfort,  pain,  and  injury  (if  chronic). 
Now  imagine  you  have  been  told  to  stay  in  the  boarding  area  be¬ 
cause  the  departure  may  be  sooner  than  two  hours.  You  get  out 
your  laptop,  connect  it  to  your  cell  phone,  and  place  them  on 
your  lap.  (That  is  why  they  are  called  laptops.)  You  are  sitting 
in  a  nonadjustable  chair  with  poor  back  support. 

This  scenario  is  all  too  common.  You  can  walk  through 
O’Hare  International  Airport  on  any  given  day  and  see  hundreds 
of  people  sitting  at  the  boarding  gate  working  on  laptops  that  are 
sitting  on  their  laps.  Now  imagine  a  palm  held  device  that  allows 
you  to  access  your  e-mail  or  to  connect  to  the  Internet.  This  de¬ 
vice  can  be  operated  while  you  are  standing  in  line  at  the  air¬ 
port  to  check  in,  or  sitting  at  the  boarding  gate  like  the  laptop 
users.  You  can  stand  or  sit  punching  at  miniature  buttons  (some¬ 
times  with  a  stylus  because  they  are  so  small)  and  interact  with 
the  interconnected  world.  Again,  this  scene  is  all  too  familiar  in 
almost  any  venue  (e.g.,  airport,  restaurant,  street,  office). 

In  situations  where  there  is  not  a  fixed  workstation,  the  de¬ 
vice  is  typically  positioned  wherever  it  is  convenient.  Very  often 
such  positioning  creates  bad  postures  for  the  legs,  back,  shoul¬ 
ders,  arms,  wrists/hands,  or  neck.  In  addition,  the  smaller  di¬ 
mensions  of  the  manual  input  devices  (touch  pad,  buttons,  key¬ 
board,  joystick,  roller  ball)  make  motions  much  more  difficult, 
and  these  often  produce  constrained  postures  and/or  the  use  of 
too  much  force  to  operate  the  device.  If  the  devices  are  used 
continuously  for  a  prolonged  period  (such  as  one  hour  or  more), 
muscle  tension  builds  up  and  discomfort  in  joints,  muscles,  liga¬ 
ments,  tendons,  and  nerves  can  occur.  Some  devices  use  voice 
and  audio  interfaces  and  can  be  used  when  you  are  walking  (or 
even  running).  These  might  be  headsets  with  earplugs  and  a  microphone.
These  devices  put  additional  load  on  the  neck,  and
the  voice  and  audio  interfaces  can  strain  the  voice  or  the  ears. 
We  know  very  little  about  the  discomfort,  health  effects,  and
psychosocial  effects  of  these  new  strains.  However,  it  is  intu¬ 
itive  that  frequent  use  of  these  interfaces  will  lead  to  strain  and 
potential  adverse  consequences. 

To  reduce  the  undesirable  effects  of  the  poor  workstation
characteristics  that  lead  to  discomfort,  the  following
recommendations  are  given:

1.  If  you  are  using  a  laptop  on  your  lap,  find  a  work  area  where 
you  can  put  the  laptop  on  a  table  (rather  than  on  your  lap). 
Then  arrange  the  work  area  to  match  as  closely  as  possible  the
recommendations  presented  above  in  the  IEA  laptop  guidelines
(Saito  et  al.,  2000).

2.  If  you  are  using  a  handheld  PID,  you  should  position  yourself 
so  that  your  back  is  supported.  It  is  preferable  to  use  the  de¬ 
vice  sitting  down.  Of  course,  if  you  are  using  the  PID  as  you 
are  walking,  then  this  is  not  possible.  If  the  PID  has  a  voice 
interface,  then  use  an  earpiece  and  a  microphone  so  that 
you  do  not  have  to  hold  it  with  your  hand.  Be  sure  not  to 
overuse  the  device  such  that  your  voice  and  ears  are  strained. 
Take  frequent  breaks  from  use  to  provide  recovery  for  your 
voice  and  ears. 

3.  Never  work  in  poor  postural  conditions  for  more  than  30  min¬ 
utes  continuously.  Take  at  least  a  5-minute  break  (preferably 
10  minutes)  away  from  the  laptop/PID  use,  put  the  device 
down  (away),  get  up  and  stretch  for  1  minute  or  more,  and 
then  walk  for  2-3  minutes  (unless  you  are  already  walking 
in  which  case  you  should  sit  down  for  5  minutes).  If  you  are 
using  a  handheld  PID  in  a  standing  position,  then  during 
your  break  put  it  away,  do  1  minute  of  stretching,  and  then 
sit  down  for  4  minutes.  That  may  mean  sitting  on  the  floor, 
but  preferably,  you  will  sit  where  you  can  support  your  back 
(against  a  wall,  or  a  seat  back). 

4.  Buy  equipment  that  provides  the  best  possible  input  inter¬ 
faces  and  displays  (screens,  headphones,  typing  pads).  Since 
these  devices  are  small,  the  perceptual  motor  requirements 
for  their  use  are  much  more  difficult  (sensory  requirements, 
motion  patterns,  skill  requirements,  postural  demands,  and 
force  demands).  Therefore,  screens  should  provide  easily 
readable  characters  (large,  understandable),  and  input  but¬ 
tons  should  be  easy  to  operate  (large,  properly  spaced,  eas¬ 
ily  accessible,  low  force). 

5.  Only  use  these  devices  when  you  do  not  have  access  to  fixed 
workstations  that  have  better  ergonomic  characteristics.  Do 
not  use  these  devices  continuously  for  more  than  30  minutes. 
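
The  30-minute  limit  and  5-minute  minimum  break  recommended  above  can
be  turned  into  a  simple  work  and  break  plan.  The  following  Python
sketch  is  illustrative  only;  the  function  name  and  its  parameters
are  hypothetical,  and  it  assumes  that  the  total  is  counted  in  minutes
of  device  use,  with  breaks  added  in  between.

```python
def plan_session(total_work_minutes, max_work=30, break_minutes=5):
    """Split device use into blocks of at most max_work minutes,
    separated by breaks of break_minutes (defaults follow the text)."""
    plan = []
    remaining = total_work_minutes
    while remaining > 0:
        work = min(max_work, remaining)
        plan.append(("work", work))
        remaining -= work
        # Only schedule a break if more work follows.
        if remaining > 0:
            plan.append(("break", break_minutes))
    return plan


if __name__ == "__main__":
    # 70 minutes of laptop use while waiting for a delayed flight.
    for activity, minutes in plan_session(70):
        print(activity, minutes)
```

Setting  break_minutes  to  10  gives  the  preferred  break  length  mentioned
in  the  recommendations.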


PIDS,  CARRY  ALONG,  AND  WEARABLE  IT 


We  have  already  discussed  some  ergonomic  issues  of  using 
PIDs.  We  can  foresee  an  even  greater  potential  for  ergonomic 
concerns  with  the  use  of  PIDs  than  with  laptop  computers.  PIDs 
are  made  to  be  as  small  and  light  as  possible  for  portability  and 
convenience  of  carrying.  This  is  good  because  the  lower  weight
produces  smaller  loads  on  the  body.  However,  small  devices  are
more  difficult  to  manipulate  with  the  hands,  and  their  displays
are  harder  to  read.  This  has  led  many  designers  to  emphasize
verbal,  auditory,  and  haptic  interfaces  that  do  not  require
substantial  manipulation  by  the  hands  or  good  vision  (Hirose  &  Hirota,
2005;  HCII2005,  2005).  In  fact,  several  new  PIDs  have  the
capability  to  communicate  with  other  IT  devices  without  human
intervention  (HCII2005,  2005).

Now,  we  will  explore  some  current  applications  of  PIDs  and 
some  possible  future  applications  to  see  where  ergonomic  and 
workstation  issues  might  emerge.  Millions  and  millions  of  cell 
phones  are  in  use  worldwide,  and  cell  phones  are  a  good  repre¬ 
sentative  of  the  PID.  There  is  virtually  nowhere  in  the  world 
where  cell  phones  cannot  be  used  (with  a  few  exceptions),  and 
there  is  virtually  no  one  (with  adequate  finances)  who  cannot  ac¬ 
cess  a  cell  phone  and  connect  with  the  world.  Cell  phones  can 
have  many  capabilities  including  telephoning,  e-mailing,  texting, 
auditory  streaming,  internet  surfing,  walkie-talkie  communicat¬ 
ing,  photographing,  video  camera  picturing  and  recording,  and 
television/cable/satellite  broadcast  receiving.  Even  with  all  of 
these  built  in  features,  the  size  of  cell  phones  is  shrinking.  As  cell 
phones  shrink,  the  manual,  hand  interfaces  are  getting  smaller, 
as  are  the  visual  displays.  With  their  small  size,  cell  phones  can
be  carried  easily  and  can  be  used  when  a  person  is  walking,  sit¬ 
ting,  running,  lying  down,  or  hanging  upside  down.  What  are  the 
ergonomic  concerns  with  their  use,  and  in  particular  the  work¬ 
station  design  issues? 

Small  manual  interfaces  and  displays  make  the  accurate  and 
comfortable  application  of  perceptual-motor  skills  difficult  (Al¬ 
bers  &  Kim,  2002;  Hinckley,  2003;  Haggerty  &  Tarasewich,  2005; 
Myers  &  Wobbrock,  2005).  A  cell  phone  can  be  held  in  one  hand 
(this  hand  becomes  the  workstation)  and  then  be  manipulated 
with  the  other  hand.  In  some  instances,  the  hand  holding  the 
cell  phone  is  also  used  to  manipulate  the  manual  interface. 
Either  of  these  situations  leads  to  workstation  conditions  where 
the  users  cannot  apply  their  highest  level  of  perceptual-motor 
skills.  For  example,  they  cannot  use  both  hands  for  inputting 
into  the  interface,  the  device  is  held  at  an  awkward  angle  for 
manual  inputting,  or  the  posture  of  the  trunk  is  unstable  which 
limits  the  capacity  to  use  the  hands  effectively. 

Due  to  this,  a  preferred  form  of  input  to  control  the  action  of 
the  PID  is  speech  (Gamm  &  Haeb-Umback,  1995;  Karat,  Vergo, 
&  Nabamoo,  2003),  while  the  displays  are  typically  a  combina¬ 
tion  of  visual  and  auditory  information.  These  interfaces  may 
lead  to  ergonomic  problems  of  overuse  of  the  voice,  increased 
duration  of  mental  concentration,  increased  eyestrain,  visual 
discomfort,  and  increased  error  rates  of  the  communication 
with  the  devices.  No  body  of  research  data  tells  us  if  these  prob¬ 
lems  are  or  will  become  prevalent  among  PID  users  or  if  there 
will  be  long-term  comfort,  health,  and  performance  effects.  For 
now,  we  can  only  conjecture  about  the  possibilities  for  prob¬ 
lems.  It  is  clear  that  many  people  are  using  PIDs  and  other  com¬ 
puter-based  technologies  many  hours  per  day,  and  this  in¬ 
creased  extent  of  use  will  likely  lead  to  discomfort  and  some 
health  effects  for  the  voice,  ears,  upper  extremities,  and  mental 
stress  (Eklof,  Gustafsson,  Hagberg,  &  Thomee,  2005). 

Now  we  will  discuss  workstation  issues  with  PIDs,  using  the 
cell  phone  as  an  example.  We  will  go  back  to  the  airport  and  I 
will  use  my  cell  phone  to  communicate  with  my  office.  At  this 
time,  I  am  standing  in  a  long  line  at  the  airline  check  in  counter. 
My  cell  phone  has  voice  activation  and  control  capability.  I  ask 
my  cell  phone  for  my  e-mail  service  provider,  and  my  e-mails 
come  up  on  the  display  screen.  I  am  using  my  thumb  to  scroll 
through  the  e-mails  one  at  a  time.  I  hold  the  cell  phone  close
to  my  face  to  enhance  my  ability  to  read  the  display  screen.  I 
shuffle  forward  as  the  line  moves  toward  the  airline  counter 
(automated  check  in  station).  I  am  standing,  and  I  have  no  pos¬ 
tural  support  for  my  back,  buttocks,  and  legs.  It  is  true  that  I 
would  be  standing  in  line  feeling  the  postural  strain  even  if  I 
were  not  using  my  cell  phone.  However,  my  cell  phone  use  adds 
extra  postural  loading,  since  I  am  manually  (or  vocally)  manipulating
the  interface  to  operate  the  e-mail  processing.  The  small  visual
display  may  create  eyestrain,  and  if  I  talk  on  the  cell  phone
for  too  long,  it  may  create  voice  strain. 

One  way  to  reduce  the  load  would  be  if  I  had  a  worksta¬ 
tion  where  I  could  sit  and  support  my  back,  buttocks,  legs,  and 
even  my  arms  and  elbows  as  I  manually  manipulate  the  inter¬ 
face.  I  could  use  a  vocal  and  auditory  interface  with  a  head 
mounted  earplug  and  microphone  (headset),  and  thus  elimi¬ 
nate  the  manual  input  and  visual  display  viewing.  The  head¬ 
set  is  a  workstation  improvement  that  reduces  the  manual  ma¬ 
nipulation  and  some  of  the  postural  loading  on  the  upper 
extremities  and  back.  The  headset  adds  some  weight  to  the 
head,  and  this  increases  loading  on  the  neck,  shoulders,  and 
back.  One  difficulty  with  using  an  auditory  interface  is  the  high 
ambient  noise  in  the  airport  lounge  that  interferes  with  and 
masks  the  cell  phone’s  auditory  signals.  To  reduce  this  audi¬ 
tory  interference,  a  helmet  with  acoustical  privacy  could  serve 
as  a  workstation  improvement.  However,  a  helmet  adds  sub¬ 
stantial  weight  to  the  head,  which  puts  increased  loading  on 
the  neck,  shoulders,  and  back.  A  major  workstation  improve¬ 
ment  for  users  waiting  in  line,  that  would  reduce  the  postural 
loading  when  using  a  cell  phone,  is  to  provide  chairs  so  a  user 
can  sit  down.  A  chair  provides  postural  support  for  the  back, 
buttocks,  and  legs.  A  chair  on  casters  or  wheels  that  can  be 
scooted  along  as  the  line  moves  forward  could  be  beneficial. 
However,  this  could  increase  the  congestion  in  the  lounge 
area  as  people  are  moving  through  the  line.  This  example  il¬ 
lustrates  the  new  loads  that  may  be  added  to  technology  users’ 
lives  by  the  use  of  PIDs.  We  will  reiterate  some  of  the  advice 
from  the  laptop  section  as  this  is  also  applicable  to  the  use  of 
PIDs  and  other  portable  devices. 

To  reduce  the  undesirable  effects  of  lacking  a  workstation,
which  can  lead  to  user  discomfort,  the  following  recommendations
are  given:

1.  If  you  are  using  a  laptop  on  your  lap,  find  a  work  area  where 
you  can  put  the  laptop  on  a  table  (rather  than  on  your  lap).
Then  arrange  the  work  area  to  match  as  closely  as  possible  the
recommendations  presented  above  in  the  IEA  laptop  guidelines
(Saito  et  al.,  2000).

2.  Never  work  in  poor  postural  conditions  for  more  than  30  min¬ 
utes  continuously.  Take  at  least  a  5-minute  break  (preferably 
10  minutes)  away  from  the  laptop/PID  use,  put  the  device 
down  (away),  get  up  and  stretch  for  1  minute  or  more,  and 
then  walk  for  2-3  minutes  (unless  you  are  already  walking 
in  which  case  you  should  sit  down  for  5  minutes).  If  you  are 
using  a  handheld  PID  in  a  standing  position,  then  during 
your  break  put  it  away,  do  1  minute  of  stretching,  and  then 
sit  down  for  4  minutes.  That  may  mean  sitting  on  the  floor, 
but  preferably,  you  will  sit  where  you  can  support  your  back 
(against  a  wall,  or  a  seat  back). 


3.  Buy  equipment  that  provides  the  best  possible  input  inter¬ 
faces  and  displays  (screens,  headphones,  typing  pads).  Since 
these  devices  are  small,  the  perceptual  motor  requirements 
for  their  use  are  much  more  difficult  (sensory  requirements, 
motion  patterns,  skill  requirements,  postural  demands,  and 
force  demands).  Therefore,  screens  should  provide  easily 
readable  characters  (large,  understandable),  and  input  but¬ 
tons  should  be  easy  to  operate  (large,  properly  spaced,  eas¬ 
ily  accessible,  low  force). 

4.  Do  not  use  these  devices  continuously  for  more  than 
30  minutes. 


THE  CHAIR  AS  A  CRITICAL  ELEMENT 
OF  THE  WORKSTATION 


It  was  not  until  the  last  40  years  that  sitting  posture  and  chairs 
(seats)  became  topics  for  scientific  research,  especially  for  er¬ 
gonomics  and  orthopedics.  Studies  have  revealed  that  the  sitting 
position,  as  compared  to  the  standing  position,  reduces  static 
muscular  efforts  in  legs  and  hips,  but  increases  the  physical  load 
on  the  intervertebral  discs  in  the  lumbar  region  of  the  spine. 

The  debate  over  what  constitutes  proper  seated  posture  is
not  yet  fully  resolved.  Is  an  upright  seated  posture  healthiest,
or  is  a  relaxed  posture  with  a  backward-leaning  trunk  healthier?
Interesting  experiments  by  the  Swedish  surgeons  Nachemson
and  Elfstrom  (1970)  and  Andersson  and  Ortengreen  (1974)
offered  some  guidance  about  this.  These  authors  measured  the
pressure  inside  the  intervertebral  discs  as  well  as  the  electrical 
activity  of  the  back  muscles  in  relation  to  different  sitting  pos¬ 
tures.  When  the  backrest  angle  of  the  seat  was  increased  from 
90  to  120  degrees,  subjects  exhibited  an  important  decrease  of 
the  intervertebral  disc  pressure  and  of  the  electromyographic 
activity  of  the  back.  Since  heightened  pressure  inside  interver¬ 
tebral  discs  means  that  they  have  more  stress,  it  was  concluded 
that  a  sitting  posture  with  reduced  disc  pressure  is  healthier
and  more  desirable.

The  results  of  the  Swedish  studies  indicated  that  leaning  the 
back  against  an  inclined  backrest  transfers  some  of  the  weight  of 
the  upper  part  of  the  body  to  the  backrest.  This  reduces  con¬ 
siderably  the  physical  load  on  the  intervertebral  discs  and  the 
static  strain  of  the  back  and  shoulder  muscles.  Thus,  some
computer  users  seem  to  instinctively  get  into  the  proper  posture
when  they  lean  backward.

Most  ergonomic  standards  for  computer  workstations  are 
based  on  a  more  traditional  view  about  a  healthy  sitting  posture. 
Mandal  (1982)  reported  that  the  “correct  seated  position”  goes 
back  to  1884  when  the  German  surgeon  Staffel  recommended 
the  well-known  upright  position.  Mandal  stated: 

But  no  normal  person  has  ever  been  able  to  sit  in  this  peculiar  position 
(upright  trunk,  inward  curve  of  the  spine  in  the  lumbar  region  and 
thighs  in  a  right  angle  to  the  trunk)  for  more  than  1-2  minutes,  and 
one  can  hardly  do  any  work  as  the  axis  of  vision  is  horizontal.  Staffel 
never  gave  any  real  explanation  why  this  particular  posture  should  be 
better  than  any  other  posture.  Nevertheless,  this  posture  has  been  ac¬ 
cepted  ever  since  quite  uncritically  by  all  experts  all  over  the  world  as 
the  only  correct  one. 

The  sitting  posture  of  students  in  the  lecture  hall  or  of  any
other  audience  is  very  seldom  a  “correct  upright  position  of  the 
trunk.”  On  the  contrary,  most  people  lean  backward  (even  with 
unsuitable  chairs)  or,  in  some  cases,  lean  forward  with  elbows 
resting  on  the  desk.  These  two  preferred  trunk  positions  are 
probably  associated  with  a  substantial  decrease  of  intervertebral 
disc  pressure,  as  well  as  lessened  tension  of  muscles  and  other 
tissues  in  the  lumbar  and  thoracic  spine  (Nachemson  &  Elf- 
strom,  1970;  Andersson  &  Ortengreen,  1974).  Thus,  sitting  in 
general  and  particularly  when  using  a  computer  is  probably 
most  comfortable  and  healthy  when  leaning  back,  or  when  the 
arms  are  supported  while  leaning  forward. 

Poorly  designed  chairs  can  contribute  to  computer  user  dis¬ 
comfort.  Chair  adjustability  in  terms  of  height,  seat  angle,  and 
lumbar  support  helps  to  provide  trunk,  shoulder,  neck,  and  leg 
postures  that  reduce  strain  on  the  muscles,  tendons,  and  discs. 
The  postural  support  and  action  of  the  chair  help  maintain 
proper  seated  posture  and  encourage  good  movement  patterns.
A  chair  that  provides  swivel  action  encourages  movement,
while  backward  tilting  increases  the  number  of  postures  that 
can  be  assumed. 

The  chair  height  should  be  adjustable  so  that  the  computer 
operator’s  feet  can  rest  firmly  on  the  floor  with  minimal  pres¬ 
sure  beneath  the  thighs.  The  minimum  range  of  adjustment  for 
seat  pan  height  should  be  between  38  and  56  cm  (15-22  inches) 
to  accommodate  a  wide  range  of  stature  (BSR/HFES,  2005). 

To  enable  short  users  to  sit  with  their  feet  on  the  floor  with¬ 
out  compressing  their  thighs,  it  may  be  necessary  to  add  a 
footrest.  A  well-designed  footrest  has  the  following  features: 

1.  It  is  inclined  upwards  slightly  (about  5-15  degrees). 

2.  It  has  a  nonskid  surface. 

3.  It  is  heavy  enough  that  it  does  not  slide  easily  across  the  floor.

4.  It  is  large  enough  for  the  feet  to  be  firmly  planted. 

5.  It  accommodates  persons  of  different  stature. 

The  seat  pan  is  where  the  person  sits  on  the  chair.  This  part 
of  the  chair  directly  supports  the  weight  of  the  buttocks.  The 
seat  pan  should  be  wide  enough  to  permit  operators  to  make 
slight  shifts  in  posture  from  side  to  side.  This  not  only  helps  to 
avoid  static  postures,  but  also  accommodates  a  large  range  of 
individual  buttock  sizes.  The  seat  pan  should  not  be  overly 
U-shaped  because  this  can  lead  to  static  sitting  postures.  The 
minimum  seat  pan  width  should  be  46  cm  (18  inches),  and  the 
depth  between  38  and  43  cm  (15-17  inches)  (BSR/HFES,  2005; 
ANSI/HFS-100,  1988).  The  front  edge  of  the  seat  pan  should  be
well  rounded  downward  to  reduce  pressure  on  the  underside 
of  the  thighs  that  can  affect  blood  flow  to  the  legs  and  feet.  This 
feature  is  often  referred  to  as  a  “waterfall”  design.  The  seat 
needs  to  be  padded  to  the  proper  firmness  that  ensures  an  even 
distribution  of  pressure  on  the  thighs  and  buttocks.  A  properly 
padded  seat  should  compress  about  one-half  inch  to  one  inch 
when  a  person  sits  on  it. 

Some  experts  feel  that  the  seat  front  should  be  elevated
slightly  (up  to  seven  degrees),  while  others  feel  it  should  be
lowered  slightly  (about  five  degrees).  Because  of  this
disagreement,  many  chairs  allow  for  both  forward  and  backward
angling  of  the  front  edge  of  the  seat  pan.  The  operator  can  then
angle  the  chair’s  front  edge  to  a  comfortable  position.  The  BSR/
HFES  (2005)  guidelines  recommend  a  user  adjustable  range  of
at  least  six  degrees  of  forward  and  backward  adjustment,
including  a  reclined  position  of  three  degrees.  The  seat  pan  height  and
angle  adjustments  should  be  accessible  and  easy  to  use  from  a
seated  position.

The  tension  and  tilt  angle  of  the  chair’s  backrest  should  be 
adjustable.  Inclination  of  chair  backrest  is  important  for  opera¬ 
tors  to  be  able  to  lean  forward  or  back  in  a  comfortable  man¬ 
ner  while  maintaining  a  correct  relationship  between  the  seat 
pan  angle  and  the  backrest  inclination.  A  backrest  inclination 
of  about  110  degrees  is  considered  an  appropriate  posture  by 
many  experts.  However,  studies  have  shown  that  operators  may 
incline  backwards  as  much  as  125  degrees,  which  also  is  an  ap¬ 
propriate  posture.  Backrests  that  tilt  to  allow  an  inclination  of 
up  to  125  degrees  are  therefore  a  good  idea.  The  backrest  tilt 
adjustments  should  be  accessible  and  easy  to  use.  An  advan¬ 
tage  of  having  an  independent  tilt  angle  adjustment  is  that  the 
backrest  tilt  will  then  have  little  or  no  effect  on  the  front  seat 
height  or  angle.  This  also  allows  operators  to  shift  postures 
readily.  BSR/HFES  (2005)  recommends  that  a  backrest  angle  of
up  to  105  degrees  be  achievable,  with  up  to  120  degrees
preferred.

Chairs  with  high  backrests  are  preferred  since  they  provide 
support  to  both  lower  back  and  the  upper  back  (shoulder). 
This  allows  employees  to  lean  backward  or  forward,  adopting 
a  relaxed  posture  and  resting  the  back  and  shoulder  muscles. 
A  full  backrest  with  a  height  around  45-51  cm  (18-20  inches) 
is  recommended.  BSR/HFES  (2005)  recommends  a  backrest 
be  at  least  45  cm  (18  inches)  high.  To  prevent  back  strain  it  is 
also  recommended  that  chairs  have  lumbar  (midback)  sup¬ 
port,  since  the  lumbar  region  is  one  of  the  most  highly 
strained  parts  of  the  spine  when  sitting.  BSR/HFES  (2005)  rec¬ 
ommends  that  the  width  of  the  backrest  should  be  at  least  36  cm 
(14  inches). 
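
The  chair  dimensions  quoted  in  this  section  can  be  gathered  into  a
simple  checklist.  The  Python  sketch  below  is  illustrative  and  is  not
taken  from  BSR/HFES;  the  class  and  field  names  are  hypothetical,  and
it  assumes  one  possible  reading  of  the  seat  height  figure,  namely
that  the  chair's  travel  should  cover  the  full  38  to  56  cm  range.

```python
from dataclasses import dataclass


@dataclass
class ChairSpec:
    """Hypothetical description of a chair; all lengths in cm."""
    seat_height_min_cm: float
    seat_height_max_cm: float
    seat_width_cm: float
    seat_depth_cm: float
    backrest_height_cm: float
    backrest_width_cm: float
    backrest_max_angle_deg: float


def check_chair(spec):
    """Return a list of mismatches against the values quoted in the text."""
    problems = []
    # Assumption: the seat travel should span the whole 38-56 cm range.
    if spec.seat_height_min_cm > 38 or spec.seat_height_max_cm < 56:
        problems.append("seat height travel does not cover 38-56 cm")
    if spec.seat_width_cm < 46:
        problems.append("seat pan narrower than 46 cm")
    if not 38 <= spec.seat_depth_cm <= 43:
        problems.append("seat pan depth outside 38-43 cm")
    if spec.backrest_height_cm < 45:
        problems.append("backrest lower than 45 cm")
    if spec.backrest_width_cm < 36:
        problems.append("backrest narrower than 36 cm")
    if spec.backrest_max_angle_deg < 105:
        problems.append("backrest cannot reach 105 degrees")
    return problems


if __name__ == "__main__":
    chair = ChairSpec(38, 56, 48, 40, 50, 40, 120)
    print(check_chair(chair))
```

An  empty  list  means  only  that  the  chair  matches  the  quoted  dimensions;
padding,  armrests,  and  ease  of  adjustment  still  need  to  be  judged
separately.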

For  most  computer  workstations,  chairs  with  rolling  castors 
or  wheels  are  desirable:  They  are  easy  to  move  and  facilitate 
postural  adjustment,  particularly  when  the  operator  has  to 
reach  for  equipment  or  materials  that  are  on  the  secondary 
working  surfaces.  Chairs  should  have  five  supporting  legs. 

Another  important  chair  feature  is  armrests.  Both  pros  and 
cons  to  the  use  of  armrests  at  computer  workstations  have 
been  advanced.  On  the  one  hand,  some  chair  armrests  can 
present  problems  of  restricted  arm  movement,  interference 
with  the  operation  of  input  devices,  pinching  of  fingers  be¬ 
tween  the  armrest  and  table,  restriction  of  chair  movement, 
such  as  under  the  worktable,  irritation  of  the  arm  or  elbows 
due  to  tissue  compression  when  resting  on  the  armrest,  and 
adoption  of  awkward  postures.  On  the  other  hand,  properly  designed  armrests
can  overcome  the  problems  mentioned  above.  Armrests  can 
provide  support  for  resting  the  arms  to  prevent  or  reduce  arm, 
shoulder,  and  neck  fatigue.  Removable  armrests  are  an  advan¬ 
tage  because  they  provide  greater  flexibility  for  individual  op¬ 
erator  preference.  For  specific  tasks  such  as  using  a  numeric 
keypad,  a  full  armrest  can  be  beneficial  in  supporting  the  arms. 
Many  chairs  have  height  adjustable  armrests  that  are  helpful 
for  operator  comfort,  and  some  allow  for  adjusting  the  angle 
of  the  armrests  as  well. 

ADDITIONAL  WORKSTATION  CONSIDERATIONS


Providing  the  capability  for  the  screen  to  swivel  and  tilt  up  and 
down  gives  the  user  the  ability  to  better  position  the  screen  for 
easier  viewing.  Reorientation  of  the  screen  around  its  vertical 
and  horizontal  axes  can  help  to  position  a  screen  to  reduce 
screen  reflections  and  glare.  Reflections  can  be  reduced  by  sim¬ 
ply  tilting  the  display  slightly  back  or  down,  or  to  the  left  or  right 
away  from  the  source  of  glare.  The  perception  of  screen  reflec¬ 
tions  depends  not  only  upon  screen  tilt,  but  also  upon  the  op¬ 
erator’s  line  of  sight. 

An  important  component  of  the  workstation  that  can  help 
reduce  musculoskeletal  loading  is  a  document  holder.  When 
properly  designed,  proportioned,  and  placed,  document  hold¬ 
ers  reduce  awkward  inclinations  of  the  head  and  neck  and  fre¬ 
quent  movements  of  the  head  up  and  down  and  back  and  forth. 
They  permit  source  documents  to  be  placed  in  a  central  loca¬ 
tion  at  the  same  viewing  distance  as  the  computer  screen.  This 
eliminates  needless  head  and  neck  movements  and  reduces 
eyestrain.  In  practice,  some  flexibility  about  the  location,  ad¬ 
justment,  and  position  of  the  document  holder  should  be  main¬ 
tained  to  accommodate  both  task  requirements  and  operator 
preferences.  Dainoff  (1982)  showed  the  effectiveness  of  an  in¬ 
line  document  holder.  The  document  holder  should  have  a 
matte  finish  so  that  it  does  not  reflect  light. 

Privacy  requirements  include  both  visual  and  acoustical  con¬ 
trol  of  the  workplace.  Visual  control  prevents  physical  intru¬ 
sions,  contributes  to  confidential/private  conversations,  and  pre¬ 
vents  the  individual  from  feeling  constantly  watched.  Acoustical 
control  prevents  distracting  and  unwanted  noise  (from  machine 
or  conversation)  and  permits  speech  privacy.  While  certain 
acoustical  methods  and  materials,  such  as  freestanding  panels, 
are  used  to  control  general  office  noise  level,  they  can  also  be 
used  for  privacy.  Planning  for  privacy  should  not  be  made  at  the 
expense  of  visual  interest  or  spatial  clarity.  For  instance,  provid¬ 
ing  wide  visual  views  can  prevent  the  individual  from  feeling  iso¬ 
lated.  Thus,  a  balance  between  privacy  and  openness  enhances 
user  comfort,  work  effectiveness,  and  office  communications. 
Involving  the  employee  in  decisions  of  privacy  can  help  in  de¬ 
ciding  the  compromises  between  privacy  and  openness. 

The  use  of  a  wrist  rest  when  keying  can  help  to  minimize 
extension  (backward  bending)  of  the  hand/wrist,  but  the  use 
of  a  wrist  rest  for  operator  comfort  and  health  has  generated 
some  debate  because  there  are  trade-offs  for  comfort  and  health. 
When  the  hand  or  wrist  is  resting  on  the  wrist  rest,  there  is  com¬ 
pression  of  the  tissue,  which  may  create  increased  carpal  canal 
pressure  or  local  tissue  ischemia.  On  the  other  hand,  the  wrist 
rest  allows  the  hands  and  shoulders  to  be  supported  with  less 
muscular  tension,  which  is  beneficial  to  computer  operator 
comfort.  At  this  time,  there  is  no  scientific  evidence  that  the 
use  of  a  wrist  rest  either  causes  or  prevents  serious  muscu¬ 
loskeletal  disorders  of  the  hands,  wrists,  or  shoulders.  Thus,  the 
choice  to  use  a  wrist  rest  should  be  based  on  employee  com¬ 
fort  and  performance  considerations  until  scientific  evidence 
suggests  otherwise. 

If used, the wrist rest should have a broad surface (5 cm
minimum) with a rounded front edge to prevent cutting pressure
on the wrist and hand. Padding further minimizes skin
compression and irritation. Height adjustability is important so
that the wrist rest can be set to a preferred level in concert with
the keyboard height and slope.

Arm holders are also available to provide support for the
hands, wrists, and arms while keyboarding and have been shown
to be useful for shoulder comfort. The placement of the arm
holder should not induce awkward postures in its use. The device
should be placed within easy reach of the operator, especially
when it will be used frequently during work.

When  keyboard  trays  are  used,  they  should  allow  for  the 
placement  of  other  input  devices  directly  on  the  tray  instead  of 
on  other  working  surfaces. 


THE  VISUAL  ENVIRONMENT 


Visual  displays  tend  to  be  of  two  types,  the  cathode  ray  tube 
(CRT)  and  the  diode  matrix  (LED/LCD)  flat  panels.  Fixed  work¬ 
stations  with  personal  computers  tend  to  use  CRTs  because  they 
are  cheap,  while  laptop  computers  and  hand-held  devices  are  al¬ 
most  exclusively  flat  panel  due  to  size  and  weight  limitations. 
Both  types  of  displays  have  characteristics  that  lead  to  problems 
from  environmental  influences.  For  instance,  luminance  sources 
in  the  environment  that  fall  on  the  screen  wash  out  characters 
on  the  screen,  and  the  accumulation  of  dust  particles  on  the 
screen  may  distort  images.  These  conditions  affect  the  ability 
to  read  the  screen  and  can  lead  to  visual  fatigue  and  dysfunc¬ 
tion.  Specific  characteristics  of  the  environment,  such  as  illumi¬ 
nation  and  glare,  have  been  related  to  computer-operator  vision 
strain  problems.  The  main  visual  functions  involved  in  com¬ 
puter  work  are  accommodation,  convergence,  and  adaptation. 

The  alignment  of  lighting  in  relation  to  the  computer  work¬ 
station,  as  well  as  levels  of  illumination  in  the  area  surrounding 
a  computer  workstation,  have  been  shown  to  influence  the  abil¬ 
ity  of  the  computer  operator  to  read  hardcopy  and  the  com¬ 
puter  screen  (Cakir  et  al.,  1979;  Stammerjohn  et  al.,  1981;  Dain¬ 
off,  1983;  Grandjean,  1987).  Readability  is  also  affected  by  the 
differences  in  luminance  contrast  in  the  work  area.  The  level  of 
illumination  affects  the  extent  of  reflections  from  working  sur¬ 
faces  and  from  the  screen  surface.  Mismatches  in  these  charac¬ 
teristics  and  the  nature  of  the  job  tasks  have  been  postulated 
to  cause  the  visual  system  to  overwork  and  lead  to  visual  fatigue 
and  discomfort  (Cakir  et  al.,  1979;  NAS,  1983). 

Generally,  it  has  been  shown  that  excessive  illumination 
leads  to  increased  screen  and  environmental  glare,  and  poorer 
luminance  contrast  (Gunnarsson  &  Ostberg,  1977;  Laubli,  Hunt¬ 
ing,  &  Grandjean,  1981;  Ghiringhelli,  1980).  Several  studies  have 
shown  that  screen  and  working  surface  glare  are  problematic 
for  visual  disturbances  (Gunnarsson  &  Ostberg,  1977;  Laubli 
et  al.,  1981;  Cakir  et  al.,  1979;  Stammerjohn  et  al.,  1981).  Re¬ 
search  by  van  der  Heiden,  Braeuninger,  and  Grandjean  (1984) 
showed  that  computer  users  spend  a  considerable  amount  of 
their  viewing  time  looking  at  objects  other  than  the  screen. 
Bright  luminance  sources  in  the  environment  can  produce  re¬ 
flections  and  excessive  luminance  contrasts  that  may  create  ex¬ 
cessive  pupillary  response,  which  leads  to  visual  fatigue. 

All surfaces within the visual field of an operator should be of
a similar order of brightness, as far as is practical. The
temporal uniformity of the surface luminance is as important
as  the  static  spatial  uniformity.  Rhythmically  fluctuating  surface 
luminance  in  the  visual  field  is  distracting  and  reduces  visual 
performance.  Such  unfavorable  conditions  prevail  if  the  work 
requires  the  operator  to  glance  alternately  at  a  bright  and  then  a 
dark  surface,  or  if  the  light  source  generates  an  oscillating  light. 
Fluorescent lights can sometimes produce a noticeable
stroboscopic effect on moving reflective objects, and sometimes
on a computer CRT screen when their light is reflected from it.
When fluorescent tubes wear out or are defective, they develop a
slow, easily perceptible flicker, especially at the visual
periphery. Flickering light is extremely annoying and causes
visual discomfort.

Lighting  is  an  important  aspect  of  the  visual  environment 
that  influences  computer  screen  and  hardcopy  readability,  glare 
on  the  screen,  and  viewing  in  the  general  environment. 

The  intensity  of  illumination,  or  the  illuminance  being  mea¬ 
sured,  is  the  amount  of  light  falling  on  a  surface.  In  practice,  this 
level  depends  on  both  the  direction  of  flow  of  the  light  and  the 
spatial  position  of  the  surface  being  illuminated  in  relation  to  the 
light  flow.  Illuminance  is  measured  in  both  the  horizontal  and 
vertical  planes.  At  computer  workplaces,  both  the  horizontal  and 
vertical  illuminances  are  important.  A  document  lying  on  a  desk 
is illuminated by the horizontal illuminance, whereas the
computer screen is illuminated by the vertical illuminance. In an
office that is illuminated from overhead luminaires, the ratio of
vertical to horizontal illuminance is usually between 0.3 and
0.5. If the illuminance in a room is said to be 500 lux, this
implies that the horizontal illuminance is 500 lux while the
vertical illuminance is between 150 and 250 lux (0.3 to 0.5 of the
horizontal illuminance).
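
The arithmetic in the example above can be written out as a short Python sketch. It assumes only the 0.3 to 0.5 relationship between vertical and horizontal illuminance stated in the text; the function name and parameters are illustrative, not part of any standard.

```python
def vertical_illuminance_range(horizontal_lux, low_ratio=0.3, high_ratio=0.5):
    """Estimate the (min, max) vertical illuminance in lux.

    Assumes overhead lighting, for which the text gives a vertical
    illuminance of 0.3 to 0.5 times the horizontal illuminance.
    """
    if horizontal_lux < 0:
        raise ValueError("illuminance cannot be negative")
    return horizontal_lux * low_ratio, horizontal_lux * high_ratio


# A room described as "500 lux" (horizontal illuminance).
low, high = vertical_illuminance_range(500)
print(f"vertical illuminance: about {low:.0f} to {high:.0f} lux")
```

For a 500 lux room this gives the 150 to 250 lux range quoted above for the screen plane.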

The  illumination  required  for  a  particular  task  is  determined 
by  the  visual  requirements  of  the  task  and  the  visual  ability  of 
the  employees  concerned.  The  illuminance  in  workplaces  that 
use  computer  screens  should  not  be  as  high  as  in  workplaces 
that  exclusively  use  hardcopy.  Lower  levels  of  illumination  will 
provide  better  computer-screen  image  quality  and  reduced 
screen  glare.  Illuminance  in  the  range  of  300  to  700  lux  mea¬ 
sured  on  the  horizontal  working  surface  (not  the  computer 
screen)  is  normally  preferable.  The  lighting  level  should  be  set 
up  according  to  the  visual  demands  of  the  tasks  performed.  For 
instance,  higher  illumination  levels  are  necessary  to  read  hard¬ 
copy  and  lower  illumination  levels  are  better  for  work  that  just 
uses  the  computer  screen.  Thus,  a  job  in  which  both  a  hardcopy 
and  a  computer  screen  are  used  should  have  a  general  work- 
area  illumination  level  of  about  500  to  700  lux;  a  job  that  re¬ 
quires  reading  only  the  computer  screen  would  have  a  general 
work  area  illumination  of  300  to  500  lux. 
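
As a rough illustration, the two ranges above can be encoded as a lookup and compared with a light-meter reading. This is a minimal sketch: the task labels and the function name are hypothetical, and only the lux ranges come from the text.

```python
# Recommended general work-area illuminance (lux), measured on the
# horizontal working surface. The ranges are those given in the text;
# the task labels are illustrative.
RECOMMENDED_LUX = {
    "screen_only": (300, 500),
    "screen_and_hardcopy": (500, 700),
}


def illuminance_advice(task, measured_lux):
    """Compare a measured horizontal illuminance with the task's range."""
    low, high = RECOMMENDED_LUX[task]
    if measured_lux < low:
        return "too dim"
    if measured_lux > high:
        return "too bright"
    return "within range"


print(illuminance_advice("screen_only", 400))
print(illuminance_advice("screen_and_hardcopy", 400))
```

The same 400 lux reading is acceptable for screen-only work but falls below the range for mixed screen and hardcopy work, which is where supplementary task lighting becomes relevant.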

Conflicts can arise when hardcopy and computer screens are
used by different employees who have differing job task
requirements or differing visual capabilities and are working in
the same room. As a compromise, room lighting can be set at the
lower level (300 lux) or intermediate level (500 lux) and
additional task lighting for hardcopy tasks can be provided at
each workstation as needed. Such additional lighting must be
carefully shielded and properly placed to avoid glare and
reflections on the computer screens and adjacent working
surfaces of other employees. Furthermore, task lighting should
not be too bright in comparison to the general work area
lighting, since the contrast between these two different light
levels may produce eyestrain.

The  surface  of  the  computer  screen  reflects  light  and  images. 
The  luminance  of  the  reflections  decreases  character  contrast 
and  disturbs  legibility;  it  can  be  so  strong  that  it  produces  a 
glare.  Image  reflections  are  annoying,  especially  since  they  also 
interfere  with  focusing  mechanisms;  the  eye  is  induced  into  fo¬ 
cusing  between  the  text  and  the  reflected  image.  Thus,  reflec¬ 
tions  are  also  a  source  of  distraction.  Stammerjohn  et  al.  (1981), 
as  well  as  Elias  and  Cail  (1983),  observed  that  bright  reflections 
on  the  screen  are  often  the  principal  complaint  of  operators. 

Luminance is a measure of the brightness of a surface, the
amount of light leaving the surface of an object, either reflected
by the surface (as from a wall or ceiling), emitted by the surface
(as from the CRT characters), or transmitted (as light from the
sun that passes through translucent curtains). High intensity
luminance sources (such as windows) in the peripheral field of
view should be avoided. In addition, a balance among luminance
levels within the computer user’s field of view should be
maintained. To reduce environmental glare, the luminance ratio
within the user’s near field of vision should be approximately
1:3 and within the far field of vision should be approximately
1:10. For luminance on the screen itself, the contrast ratio
between the characters and the screen background should be at
least 7:1. To give the best readability for each operator, it is
important to provide screens with adjustments for character
contrast and brightness. These adjustments should have controls
that are obvious and easily accessible from the normal working
position (e.g., located at the front of the screen).
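
The ratios above lend themselves to a simple check against luminance-meter readings. The following Python sketch makes two assumptions that are not in the text: readings are taken in cd/m^2, and the "approximately 1:3" and "approximately 1:10" figures are treated as upper limits. The function names are illustrative.

```python
def luminance_ratio(a, b):
    """Return brighter/darker for two luminances (assumed in cd/m^2)."""
    if a <= 0 or b <= 0:
        raise ValueError("luminances must be positive")
    return max(a, b) / min(a, b)


def character_contrast_ok(character, background, minimum=7.0):
    """Check the 7:1 minimum character-to-background contrast."""
    return luminance_ratio(character, background) >= minimum


def surround_ok(task, surround, far_field=False):
    """Check the near-field (1:3) or far-field (1:10) luminance balance.

    Treating these ratios as upper limits is an assumption of this
    sketch; the text gives them as approximate targets.
    """
    limit = 10.0 if far_field else 3.0
    return luminance_ratio(task, surround) <= limit


print(character_contrast_ok(140, 20))             # ratio of 7
print(surround_ok(100, 250))                      # near field, ratio of 2.5
print(surround_ok(100, 1200, far_field=True))     # far field, ratio of 12
```

A bright window in the far field that is twelve times brighter than the task area fails the check, which matches the advice above to avoid high intensity sources in the peripheral field of view.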

Experts  have  traditionally  recommended  a  viewing  distance 
between  the  screen  and  the  operator’s  eye  of  45-50  cm,  but  no 
more  than  70  cm.  However,  experience  in  field  studies  has 
shown  that  users  may  adopt  a  viewing  distance  greater  than 
70  cm  and  still  be  able  to  work  efficiently  and  comfortably.  Thus, 
viewing  distance  should  be  determined  in  context  with  other 
considerations.  It  will  vary  depending  upon  the  task  require¬ 
ments,  computer  screen  characteristics,  and  individual  visual  ca¬ 
pabilities.  For  instance,  with  poor  screen  or  hardcopy  quality,  it 
may  be  necessary  to  reduce  viewing  distance  for  easier  charac¬ 
ter  recognition.  Typically  the  viewing  distance  should  be  50  cm 
or  less  because  of  the  small  size  of  the  characters  on  the  com¬ 
puter  or  IT  device  screen. 


THE  AUDITORY  ENVIRONMENT 


As  the  use  of  PIDs  in  a  wide  variety  of  environments  increases, 
the  need  for  a  quiet  auditory  environment  that  allows  the  user 
to  easily  hear  and  speak  at  a  normal  loudness  level  becomes 
more  important.  Crowded,  noisy  environments  detract  from 
users’  abilities  to  hear.  Many  PIDs  have  auditory  interfaces,  such 
as  headphones,  which  can  concentrate  the  primary  auditory  sig¬ 
nal  and  block  out  some  of  the  environmental  noise.  One  con¬ 
cern  is  that  users  may  increase  the  loudness  beyond  levels  that 
are safe for the auditory sensory system (CDC, 2003). High
ambient noise in the environment leads to the need for greater
mental concentration (attention) to the auditory signals,
increased intensity of the primary auditory signal (up to levels
that may cause temporary auditory threshold shifts), and mood
disturbances (irritation, anger, and discouragement; Crocker,
1997). These effects can result in increased mental fatigue and
psychosocial stress, and long-term exposure may lead to
discomfort and health consequences.

While headphones may help to concentrate the primary auditory
signal, they also have the drawback that they can deliver too
much auditory energy to the ears. Prolonged exposure to loud
primary auditory signals can cause the adverse effects described
above. To protect the auditory sensory system when using PIDs,
we propose the following:

1. Use your PID in environments where the ambient auditory
levels are 50 dBA or less.

2. Do not use your PID for more than 30 minutes without taking
a break where you can rest your eyes and ears. You should rest
for at least 10 minutes before you start using your PID again.
As your total use of the PID increases over the course of a day,
the rest breaks should become longer. While the literature
suggests that a break after 30 minutes of PID use should be
sufficient for your eyes to recover (given a proper visual
environment as defined above), it does not provide guidance on
the maximum amount of time of PID use before a break is
necessary to provide recovery for your ears and mental mood. Nor
does the literature indicate the minimum amount of break time
necessary to achieve auditory or mental recovery.

3. Keep the volume of the headphones of the PID at the lowest
level necessary to hear the primary auditory signal properly.
In no instance exceed 85 dBA, and do not exceed 30 minutes
of continuous exposure (see 2 above). It is best not to exceed
70 dBA, and if you adhere to 1 above, you will not have to
exceed 60-70 dBA if you have normal hearing.

4. Be courteous when using your microphone or cell phone.
Do not talk directly at other people such that your conversation
may produce masking noise that interferes with their
conversations. Keep your voice as low as possible while still
allowing your interface or listener to understand your signal.
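
The numeric limits in guidelines 1 to 3 can be collected into a simple self-check. This is a minimal Python sketch, assuming the user can obtain sound-level readings in dBA; the class, field names, and warning wording are hypothetical, and only the thresholds (50, 70, and 85 dBA, 30 minutes of use, 10 minutes of rest) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ListeningSession:
    ambient_dba: float         # ambient noise level where the PID is used
    headphone_dba: float       # level delivered by the headphones
    continuous_minutes: float  # time since the last break
    last_break_minutes: float  # length of the most recent break


def guideline_warnings(session):
    """Return a list of guideline violations for one listening session."""
    warnings = []
    if session.ambient_dba > 50:
        warnings.append("ambient level above 50 dBA")
    if session.headphone_dba > 85:
        warnings.append("headphone level above the 85 dBA limit")
    elif session.headphone_dba > 70:
        warnings.append("headphone level above the preferred 70 dBA")
    if session.continuous_minutes > 30:
        warnings.append("more than 30 minutes without a break")
    if session.last_break_minutes < 10:
        warnings.append("last break shorter than 10 minutes")
    return warnings


quiet = ListeningSession(45, 65, 20, 10)
noisy = ListeningSession(60, 90, 45, 5)
print(guideline_warnings(quiet))
print(len(guideline_warnings(noisy)))
```

Guideline 4 concerns courtesy toward others and is not something a threshold check can capture, so it is left out of the sketch.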


ERGONOMIC  IMPLEMENTATION  ISSUES 


Implementing  a  workplace  change  such  as  improving  worksta¬ 
tion  design  or  work  methods  is  a  complex  process  because  it 
impacts  many  elements  of  the  work  system  (Derjani  &  Smith, 
1999;  Smith  &  Carayon,  1995;  Hagberg  et  al.,  1995;  Smith  & 
Sainfort,  1989;  Carayon  &  Smith,  2000;  Smith  &  Derjani-Bayeh, 
2003).  Managers,  designers,  and  engineers  often  like  to  believe 
that  technological  enhancements  are  easy  to  make,  and  that 
performance  and  health  improvements  will  be  immediate  and 
substantial.  Proper  implementation  involves  changes  in  more 
than  the  workstation;  for  instance,  there  needs  to  be  consider¬ 
ation  of  the  work  organization,  job  content,  task  improvements, 
job  demands,  and  socialization  issues.  Planning  for  change  can 
help the success of implementation and reduce the stress
generated by the change. The success of implementing change
depends heavily on the involvement and commitment of the
concerned parties, in particular management, technical staff,
support staff, first line supervision, and the employees.

There  is  universal  agreement  among  change  management 
experts  that  the  most  successful  strategies  for  workplace  im¬ 
provements  involve  all  elements  (subsystems)  of  the  work  sys¬ 
tem  that  will  be  affected  by  the  change  (Hendrick,  1986;  Lawler, 
1986;  Smith  &  Carayon,  1995;  Carayon  &  Smith,  2000).  Involve¬ 
ment  assumes  that  there  is  an  active  role  in  the  change  process, 
not  just  providing  strategic  information.  Active  participation 
generates  greater  motivation  and  better  acceptance  of  solutions 
than  passively  providing  information  and  taking  orders.  Active 
participation  is  achieved  by  soliciting  opinions  and  sharing  au¬ 
thority  to  make  decisions  about  solutions.  However,  one  draw¬ 
back  of  active  participation  is  the  need  to  develop  consensus 
among  participants  who  have  differing  opinions  and  motives. 
This  usually  takes  more  time  than  traditional  decision  making, 
and  can  bring  about  conflict  among  subsystems.  Another  draw¬ 
back  is  that  line  employees  often  do  not  have  the  technical  ex¬ 
pertise  necessary  to  form  effective  solutions. 

Participative ergonomics can take various forms, such as design
decision groups, quality circles, and worker-management
committees. Some of the common characteristics of these various
programs are employee involvement in developing and
implementing ergonomic solutions, dissemination and exchange
of information, pushing ergonomics expertise down to lower
levels, and cooperation between experts and nonexperts. One of
the characteristics of participatory ergonomics is the
dissemination of information (Noro, 1991). Participative ergonomics
can  be  beneficial  to  reduce  or  prevent  resistance  to  change  be¬ 
cause  of  the  information  provided  to  the  various  members  of 
the  organization  concerned  with  the  new  technology.  Uncer¬ 
tainty  and  lack  of  information  are  two  major  causes  of  resistance 
to  change  and  have  been  linked  to  increased  employee  stress.  If 
employees  are  informed  about  potential  ergonomics  changes  in 
advance,  they  are  less  likely  to  actively  resist  the  change. 

Training computer users in how the new workstation functions
and operates is important, especially if the adjustment controls
are neither obvious nor intuitive. Hagberg et al. (1995)
indicated that employee training is a necessary component of
any ergonomic program for reducing work-related
musculoskeletal disorders. Green and Briggs (1989) found that
adjustable  workstations  are  not  always  effective  without  appro¬ 
priate  information  about  benefits  of  adjustments  and  training 
in  how  to  use  the  equipment.  Hagberg  et  al.  (1995)  suggested 
the  following  considerations  for  ergonomics  training  programs: 

1.  Have  employees  involved  in  the  development  and  process  of 
training.  Using  employee  work  experiences  can  be  helpful 
in  illustrating  principles  to  be  learned  during  training.  In  ad¬ 
dition,  using  employees  as  instructors  can  be  motivational 
for  the  instructors  and  learners. 

2.  Use  active  learning  processes  where  learners  participate  in 
the  process  and  apply  hands-on  methods  of  knowledge  and 
skill  acquisition.  This  approach  to  learning  enhances  acqui¬ 
sition  of  inputs  and  motivation  to  participate. 

3. Apply technology, such as audio-visual equipment and
computers, to illustrate principles. Much like active processes,
technology provides opportunities for learners to visualize
the course materials and to test their knowledge dynamically
and immediately.

4. Prefer on-the-job training over classroom training. Both can
be effective when used together.


SUMMARY  OF  RECOMMENDATIONS 


A  computer  workstation  (work  area)  is  comprised  of  the  com¬ 
puter,  input  and  output  interfaces,  the  furniture  where  the  com¬ 
puter  is  used,  and  the  physical  environment  in  which  the  com¬ 
puter  is  used.  The  design  of  these  elements  and  how  they  fit 
together  play  a  crucial  role  in  user  performance  and  in  minimiz¬ 
ing  potential  adverse  discomfort  and  health  consequences.  The 
recommendations  presented  in  this  chapter  address  the  physical 
environment  and  implementation  issues.  Important  considera¬ 
tion  should  also  be  given  to  organizational  factors  and  task  re¬ 
lated  factors  as  they  affect  and  depend  on  the  individual.  Unique 
situational  factors  need  to  be  considered  when  an  ergonomic 
intervention  is  being  implemented.  Generalizing  these  recom¬ 
mendations  to  all  situations  in  which  computers  or  PIDs  are  used 
is  a  mistake.  The  approach  presented  in  this  chapter  emphasizes 
the  adaptation  of  the  work  area  to  a  user’s  needs  so  that  sensory, 
musculoskeletal,  and  mental  loads  are  minimized. 

Positioning of computers in the workplace is important in order
to provide a more productive work environment. Computer
workstations should be placed at right angles to the windows,
and windows should not be behind or in front of the
operator. This will reduce the possibility of reflections on the
screen,  which  can  otherwise  reduce  legibility.  In  addition,  bright 
reflections  coming  from  light  sources  can  be  reduced  by  placing 
these  light  sources  on  either  side  of  or  parallel  to  the  line  of  vi¬ 
sion  of  the  operator. 

Moreover, illumination should be adapted to the quality of
the source documents and the task required. This is done by
having a high enough illumination to enhance legibility, yet low
enough to avoid excessive luminance contrasts. Recommended
levels range between 300 and 700 lux.

The  keyboard  and  other  input  devices  should  be  movable  on 
the  work  surface,  but  stable  when  in  use.  A  wrist  rest  or  sup¬ 
port  surface  of  15  cm  in  depth  is  recommended  to  rest  the  wrist 
and  forearms.  Computer  workstations  should  allow  for  adjust¬ 
ments  that  promote  good  postures.  A  computer  workstation 
without  adjustable  keyboard  (input  device)  height  and  without 
adjustable  height  and  distance  of  the  screen  is  not  reasonable 
for  continuous  work  of  more  than  30  minutes.  The  controls  for 
adjusting  the  dimensions  of  a  workstation  and  chair  should  be 
easy  to  use.  Such  adjustability  is  particularly  important  at  work¬ 
stations  and  chairs  used  by  more  than  one  employee.  Further¬ 
more,  sufficient  space  for  the  user’s  thighs,  knees,  and  legs 
should  be  provided  to  allow  for  comfort  and  to  avoid  unnatural 
or  constrained  postures. 

With regard to chair design, a backward-leaning posture allows
for relaxation of the back muscles and decreases the load
on the intervertebral discs. Chair seat pan height should be
easily adjustable and fit a wide range of statures. The chair
should have a full backrest that can incline backward up to
130 degrees. In addition, the backrest should have a lumbar
support and a slightly concave form at the thoracic level.

PIDs pose unique problems, and ergonomic sense indicates
that users need to take actions that will minimize the loads on
their musculoskeletal system, sensory systems, and mental
processes. PIDs have inherent ergonomic problems due to their
ubiquitous applications, small size, and potential for users’
overexposure through prolonged use. Using PIDs in the proper
environment, limiting the extent of use by taking frequent rest
breaks, and having the best possible interfaces all make good
ergonomic sense.

These  recommendations  are  not  exhaustively  inclusive  of  all 
aspects  of  computer  (and  PID)  workstation  (environment)  de¬ 
sign.  The  chapter  described  other  important  workstation  crite¬ 
ria  in  various  sections. 

attention. Can you manage to pay attention? For how long? Do
you  begin  to  shake  your  head  in  despair,  and  give  up?  Do  you 
wish  you  were  somewhere  else — NOW? 


MAKING  THINGS  EASIER  TO  USE 
AND  UNDERSTAND:  THINKING 
ABOUT  THE  USER'S  EXPERIENCE 


The  previous  description  is  unfortunately  an  accurate  analogy  of 
many  users’  experiences  as  they  attempt  to  learn,  work,  play,  and 
relax.  New  products,  new  services,  and  new  technology  with 
which  you  are  unfamiliar  can  create  confusion.  Users  of  these 
new  products,  services,  and  technology  are  customers,  electri¬ 
cians,  grandparents,  clerks,  pilots,  and  students — you  and  me. 
And  for  most  of  us,  it’s  a  jungle  out  there!  Las  Vegas  at  night  with 
fireworks, or monitors that are winking, blinking, distracting,
disturbing, overwhelming, and, after a short time, visually
deafening. Now add voices coming from boxes...! Although this
may seem like an exaggeration, for many this situation is exactly
their experience. User interface design focuses on designing
flexible environments that have a positive impact on a user’s
ability to experience and interact with a product, whether that
product is a mobile communication device, website, information
kiosk, or appliance. It involves creating environments that
include strong navigational devices that can be understood
intuitively and used effortlessly. Designers have a
responsibility to create user experiences that are simple and
transparent. To do their job well, they must advocate on behalf
of the user, ensuring that the interfaces they design are not
merely exercises in technology but that they truly assist and
guide the user from task to task, enabling work to be done, and
ultimately improving quality of life. When designers succeed,
their products can be used effortlessly and are even pleasurable
to use. Good design does not needlessly draw attention to
itself. It just works. This is the role of good design.


DEFINING  VISUAL  DESIGN 


The  nautilus  shell  is  an  example  of  the  synthesis  between  form 
and  function  found  in  nature  (Fig.  17.1).  Its  form  is  the  result  of 
evolution,  which  is  both  transparent  and  beautiful.  The  nautilus 
shell  is  a  perfect  analogy  for  design  and  the  design  process  be¬ 
cause  it  creates  valuable  user  experiences  and  usable  interfaces. 

The  word  design  functions  as  both  a  noun  and  a  verb.  Many 
people  use  it  to  refer  to  the  outward  appearance  or  style  of  a  prod¬ 
uct.  However,  design  also  refers  to  a  process — that  of  intentionally 
establishing  a  plan  or  system  by  which  a  task  can  be  accomplished 
or  a  goal  reached.  It  includes  tangible  and  intangible  systems  in 
which  objects  or  processes  are  coherently  organized  to  include 
the  environments  in  which  these  objects  or  processes  function. 
Design  affects  all  people  in  every  aspect  of  what  they  do.  Good  de¬ 
sign  performs  for  people.  It  is  concerned  with  economics  and  the 
transmission  of  ideas.  The  challenge  presented  to  a  design  team 
is  to  plan  a  prototype  with  a  clear  purpose  that  is  easy  to  use, 
meets  user  needs,  addresses  commercial  considerations,  and  can 
be mass-produced. Its visual form, whether two- or
three-dimensional, digital or analog, logically explains its
purpose and efficiently leads the user through its function.
Design is not a series of subjective choices based on personal
preference, at best a cosmetic afterthought considered if time
and money are left over. Good design is the tangible
representation of product goals. An iterative and interactive
process that requires active learning, design unifies a wide
range of disciplines. Good design is a significant activity that
reveals multiple solutions to each problem. Design equally
values different ways of thinking. It allows people with a
variety of skills and learning abilities to work cooperatively to
bring insights and expertise to problems and opportunities in
order to better develop new and innovative solutions. Problems
can be analyzed using a multitude of viewpoints and methods.
Writing, drawing, statistical analysis, graphing, discussion,
interviewing, personal observation, model-making, and
diagramming are all legitimate methods for examination as the
physical, social, and cultural contexts of possible answers are
considered (Davis, Hawley, McMullan, & Spilka, 1997).

FIGURE 17.1. Nautilus Shell. The nautilus shell is an example
of the synthesis between form and function found in nature. Its
form is the result of evolution. The work of evolution is
transparent as its beauty is admired. The nautilus shell is a
perfect analogy for design and the design process to create
valuable user experiences and usable interfaces.


THE  DESIGN  PROCESS 


Design  as  a  Catalyst  for  Learning,  a  publication  funded  in  part 
by  the  National  Endowment  for  the  Arts,  argued  that  effective 
design  that  responds  to  human  problems  uses  the  following 
steps  (Fig.  17.2): 

• Problem identification and definition: A need or problem
is identified, researched, and defined.

• Gathering and analyzing information: The focus is on
learning what is not known. Assumptions are questioned.
Wide and broad research is used to locate information and
generate ideas.

• Determining performance criteria for a successful solution:
Research continues as imagery is selected. Rules are declared
and what is known is specified.

• Generating alternative solutions and building prototypes:
Multiple solutions are generated. A variety of methods for
analysis, such as drawing, interviewing, modeling, or evaluating
statistics, are used.

• Implementing choices: Project content, scope, and intent are
formally established. Initial possibilities are represented and
presented as prototypes.

• Evaluating outcomes: Prototypes are assessed, tested, evaluated,
and judged. The knowledge gained is incorporated into
further studies and refinements.

• Production: A prototype, which is a synthesis of the initial
solutions made using this process, and specifications are
released to a manufacturer for making multiples.


FIGURE 17.2. Interacting Design Loop. The interacting design
loop developed by Richard Kimbell, founder of the Technology
Education Research Unit at Goldsmiths College, and presented
in Design as a Catalyst for Learning, captures the divergent,
iterative, and cyclical nature of the design process.


THE  ROLE  OF  THE  DESIGNER 


Visual  design  decisions  are  based  on  project  goals,  user  per¬ 
spective,  and  informed  decision  making.  While  many  aspects 
of  design  are  quantifiable,  there  are  visual  principles  that  are  less 
measurable  but  equally  important.  Even  though  the  necessary 
skills  to  become  visually  literate  and  make  competent  design  de¬ 
cisions  can  be  learned,  design  involves  a  highly  specialized 
knowledge  base.  A  unique  combination  of  creativity  and  skill  dif¬ 
ferentiates  and  makes  one  design  more  attractive  and  desirable 
than  another.  Both  education  and  talent  are  necessary  to  apply 
the principles required to present information in its most
accessible, useful, and pleasing form. The role of the designer in
the development of interfaces for interactive products is to
understand the product goals and ensure that information is
approachable, useful, and desirable. In an environment in which
the interface is the only tangible representation of a product
and user perception determines product success, appropriate
information presentation and visual design are key. Designers
understand visual principles in context, and know how to apply
them appropriately to create innovative, functional, and
aesthetically pleasing solutions.


THE PROCESS OF GOOD DESIGN:
HOW DO WE GET THERE FROM HERE?


Interface  designers  are  responsible  for  defining  what  the  expe¬ 
rience  will  be  like  when  a  product  is  used.  While  print  media 
dictates  that  users  encounter  content  in  a  largely  predeter¬ 
mined  sequence,  an  interface  offers  the  user  greater  flexibility 
over  how  content  can  be  accessed  based  on  users’  needs  and 
wants.  A  successful  interface  can  be  easily  navigated.  Interface 
designers  define,  decide,  and  then  create  the  experience  for 
users,  so  that  an  experience  with  a  product  is  useful,  meaning¬ 
ful,  even  pleasant  and  empowering.  The  designer  must  maintain 
an attitude of unbiased discovery and empathy for the user. To
create a good design, the designer must develop clearly defined
goals, an evaluation process that supports and enhances these
goals, and the flexibility to respond to changes as the process
continues and products evolve.


AN  INFORMATION-DESIGN  PROCESS  IS  AN 
INFORMED  DESIGN  PROCESS 


An  information-design  process  (IDP)  is  a  method  of  visually 
structuring  and  organizing  information  to  develop  effective 
communication.  Information  design  is  not  superficial  or  deco¬ 
rative,  but  is  rather  a  merging  of  functional,  performance-based 
requirements  with  the  most  appropriate  form  to  present  these 
requirements. A thoughtful, well-designed solution will:

•  Motivate  users:  It  psychologically  entices  an  audience,  con¬ 
vincing  members  that  information  and  tasks  can  be  success¬ 
fully  handled. 

•  Increase  ease  of  use  and  accessibility:  The  effort  needed  to 
comprehend  information  is  decreased.  A  clear  path  that 
aids  in  skimming  and  referencing  text  and  gives  easy  access  is 
provided. 

•  Increase  the  accuracy  and  retention  of  the  information: 
Users  learn  and  retain  information  better  when  it  is  visually 
mapped  and  structured  in  obvious  and  intuitive  ways. 

• Focus on the needs of its users: Multiple audiences have
different requirements and styles of learning. Solutions should be
developed that provide alternative means of accessing
information for different types of users.

An information-design approach is part of a process that
incorporates research, design, testing, and training to produce
useful, cost-effective solutions.

Phase 1: The Audit

The  goal  of  the  audit  is  to  create  a  blueprint  for  the  project, 
much  like  architectural  drawings  are  developed  before  con¬ 
structing  a  building  (Fig.  17.3).  The  audit  process  begins  by  ask¬ 
ing  and  answering  a  number  of  questions  and  acknowledging 
ongoing  change  and  an  ever-increasing  palette  of  products  and 
services.  Questions  are  asked  throughout  the  entire  product  life 
cycle,  since  the  answers/design  solutions  reflect  the  user/use  en¬ 
vironment  and  affect  the  ongoing  usefulness  and  value  of  the 
product.  To  create  an  eloquent  design,  continually  ask  and 
answer  the  following  questions: 

Audit  Questions  A 

•  Who  are  the  product  users? 

•  How  will  this  product  be  used? 

•  When  will  this  product  be  used? 

•  Why  will  this  product  be  used? 

•  Where  will  this  product  be  used? 

•  How  will  the  process  evolve  to  support  this  product  as  it 
evolves? 

After the first set of questions is asked and answered, a second
set of questions must be asked and answered:

Audit  Questions  B 

•  What  is  the  most  efficient,  effective  way  for  a  user  to  accom¬ 
plish  a  set  of  tasks  and  move  on  to  the  next  set  of  tasks? 

•  How  can  the  information  required  for  product  ease  of  use 
be  presented  most  efficiently  and  effectively? 

•  How  can  the  design  of  this  product  be  done  to  support  ease 
of  use  and  transition  from  task  to  task  as  a  seamless,  trans¬ 
parent,  and  even  pleasurable  experience? 

•  What  are  the  technical  and  organizational  limits  and  con¬ 
straints? 


FIGURE 17.3. Information Design Process (IDP), with its three
phases: Audit, Development, and Implementation. IDP is phased
to ensure user and organizational needs are met. It is ongoing
and iterative throughout the lifecycle of a product. Any change
can trigger a recycling of this process to ensure solutions remain
appropriate and useful.


An  audit  focuses  on  discovery.  Many  disciplines  and  organiza¬ 
tional  resources  must  be  consulted.  Change  is  a  given,  since  de¬ 
signers  begin  with  assumptions  and  don’t  know  all  they  need  to 
know  yet.  The  answers  to  their  questions  and  their  analysis  in 
the  context  of  organizational  objectives  provide  the  basis  for  the 
audit  report,  which  serves  as  the  guide  in  design  development. 
The  audit  report  can  be  as  simple  as  a  two-page  list  or  as  com¬ 
plex  as  a  comprehensive  hundred-page  report.  Since  the  goal  is 
discovery,  it  includes  every  aspect  of  the  organization  concerned 
with  the  product-development  cycle:  project  management,  us¬ 
ability  engineering,  technical  development,  user  support  and 
documentation,  visual  communication  and  design,  and  content 
management.  With  these  goals,  the  result  is  unbiased,  accurate, 
comprehensive  information  that  serves  as  the  basis  for  design. 

Phase  2:  Design  Development 

The  design-development  phase  uses  the  audit  report  as  a  guide¬ 
line.  This  is  an  ongoing,  iterative  process  with  each  iteration  in¬ 
corporating  user  test  results  to  make  the  product  appropriate  to 
the  particular  set  of  needs.  In  reality,  the  length  of  this  process  is 
often  defined  and  limited  by  real-world  deadlines  such  as  product 
release  dates.  The  design-development  phase  includes  design 
and  testing.  The  designer  or  design  team  creates  a  number  of  so¬ 
lutions  based  on  results  and  objectives  determined  by  the  audit 
report  as  well  as  other  project  specifics.  Initially,  design  ideas 
should  be  very  broad,  incorporating  many  ideas  and  options  no 
matter  how  unrealistic  or  unusual.  As  ideas  are  tested,  user  feed¬ 
back  incorporated,  and  other  parameters  defined,  solutions  nat¬ 
urally  become  more  defined.  Surviving  design  ideas  are  then 
based  on  solid  information  derived  from  user  feedback,  providing 
a strong basis for final design decisions. In the beginning, the
focus is on high-level concepts and navigation. How will the
product work? What will it feel like to use? As initial concepts are
refined, design details become more specific. When the conceptual
model  and  organizational  framework  are  approved,  the  design 
of  the  look  or  product  package  begins.  By  the  end  of  this  phase,  a 
prototype  design  to  be  carried  out  in  implementation  and  moni¬ 
toring  is  tested,  approved,  and  specified. 

Phase  3:  Implementation  and  Monitoring 

The  implementation  phase  focuses  on  delivering  what  has  been 
defined,  designed,  and  documented  in  the  preceding  phases. 
It  is  the  final  part  of  a  holistic  process  that  defines  everything 
necessary  to  make  a  product  succeed  on  an  ongoing  basis.  This 
includes  not  only  the  implementation  of  the  design  within  the 
technology,  but  also  any  additional  support  such  as  the  creation 
of  training  materials  and  other  reinforcements  that  enhance  use 
and  productivity.  Continuous  monitoring  is  key  to  sustained 
success,  because  a  successful  product  responds  to  evolving 
technology  and  user  needs.  This  last  phase  is  mostly  consulta¬ 
tive  and  ongoing  throughout  the  product  lifecycle  in  order  to 
ensure that changes such as new technology and product
developments are reflected in the product itself. These may in fact
trigger another audit/design/testing cycle, although usually less
extensive than the initial process. Though the implementation
phase is called “the last phase,” it reveals the evolutionary
process of design and development. The goal of ongoing
monitoring of solutions is to be aware of changes in user needs,
technology, and competition that impact user acceptance and
satisfaction. Changes here often result in the need to reevaluate
and redesign to incorporate the new knowledge gained.


VISUAL  DESIGN  PRINCIPLES 


Interaction design bridges many worlds: those of visual design,
information presentation, and usability with aesthetics. Donis A.
Dondis (1973a), in A Primer of Visual Literacy, argued that art and
its  meaning  have  dramatically  changed  in  contemporary  times 
from  one  that  involved  a  concern  with  function  to  one  that  views 
the  process  of  creating  art  as  that  of  making  emotional  maps  that 
spring  from  the  province  of  the  intuitive  and  subjective.  This  ar¬ 
gument  extends  to  design.  To  someone  unskilled  in  creating  ef¬ 
fective  communications,  visual  design  is  often  understood  as 
personal  preference  limited  to  style  or  appearance.  However, 
any  form  of  effective  design  is  a  result  of  rigorous  study,  a  con¬ 
cern  for  organization  and  usability  combined  with  knowledge  of 
the  basic  design  principles  of  harmony,  balance,  and  simplicity. 
Visual  design  is  in  fact  a  form  of  literacy. 


UNIVERSAL  PRINCIPLES  OF  VISUAL 
COMMUNICATION  AND  ORGANIZATION 


The  principles  of  harmony,  balance,  and  simplicity  are  related 
yet distinct in meaning and application. Harmony is the grouping
of related parts, so that all elements combine logically to
make  a  unified  whole.  In  interface  design,  as  with  other  cate¬ 
gories  of  design,  this  is  achieved  when  all  design  elements  work 
in  unity.  Transitions  from  place  to  place  are  effortless  and  the 
techniques  used  to  achieve  this  harmony  are  unnoticed  by  the 
user.  Visual  harmony  achieves  the  same  goal  as  musical  har¬ 
mony  in  which  notes  combine  to  create  a  chord.  The  golden 
section,  also  known  as  the  “golden  mean”  or  “golden  rectangle,” 
is  one  of  the  most  widely  used  methods  for  creating  harmony. 
Architects,  artists,  musicians,  mathematicians,  and  designers 
have  used  the  golden  section  extensively  for  centuries  to  create 
proportional  relationships  (Fig.  17.4). 
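
As a rough numerical check of the golden rectangle construction (Fig. 17.4), the short Python sketch below reproduces the steps: halve one side of a square, measure from that midpoint to a far corner, and swing that length down to extend the side. It assumes that point (a) is the midpoint of one side of the square and that (b) is a corner on the opposite side, which is the usual reading of this construction. The function name golden_rectangle is purely illustrative and does not come from the text.

```python
import math


def golden_rectangle(side):
    """Return (long_side, short_side) of the rectangle built on a square."""
    # Point (a): midpoint of the square's base (assumed reading of "center point").
    half = side / 2.0
    # Length (ab): straight-line distance from (a) to a far corner (b) of the square.
    ab = math.hypot(half, side)
    # Swinging (ab) as an arc down to the base line reaches point (c),
    # so the extended base measures half + ab.
    long_side = half + ab
    return long_side, side


long_side, short_side = golden_rectangle(100.0)
ratio = long_side / short_side
print(round(long_side, 1), round(ratio, 3))
```

For a square of 100 units the long side comes out at roughly 161.8 units, a ratio of about 1.618 to 1, which matches the golden section value (1 + √5) / 2.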

Balance offers equilibrium or rest. Dondis stated that
equilibrium is the strongest visual reference (Dondis, 1973b).
It  provides  the  equivalent  of  a  center  of  gravity  that  grounds 
the  page.  Without  balance,  the  page  collapses,  all  elements 
are  seen  as  dispersed,  and  content  is  lost.  Balance  requires 
continual  modification  from  page  to  page  because  while 
each  page  is  part  of  a  greater  system,  elements  can  vary  and 
all  have  visual  weight.  In  the  same  way  that  a  clown  balanc¬ 
ing  on  a  ball  while  juggling  objects  of  different  weights  must 
continually  make  adjustments  for  actions  that  are  occurring, 
visual  balance  requires  the  same  concerns  and  adjustments 
as  in  the  physical  world.  Regardless  of  how  a  design  is  or¬ 
ganized,  it  must  achieve  stability  and  unity  in  order  for  a 
user to feel comfortable with the solution. Balance can be
achieved a number of ways. One obvious method uses symmetry,
such as found on a page with text and image aligned
on a centered axis. Deceptively simple, symmetrical form is often
considered easy to make; however, unless handled carefully,
a symmetrical composition can be predictable, boring,
and static. Asymmetry employs nonaxial balance and uses
contrast between elements such as weight, form, and color
to create visual tension and drama. Both are valid approaches
and require skill and knowledge of complex visual interaction
to achieve.

FIGURE 17.4. The Golden Rectangle. Divide a square to find
the center point (a) from which length (ab) is found. From point
(a) the length (ab) is swung as an arc to point (c) to create a
rectangle that uses the proportions of the golden section.

Simplicity  is  the  embodiment  of  clarity,  elegance,  and 
economy.  A  solution  that  offers  simplicity  is  unambiguous  and 
easily understood. It offers clarity, working effortlessly, devoid
of unnecessary decoration. It appears deceptively easy, accessible,
and approachable, even though it may be conceptually
rich. Simplicity involves distillation: every element is
indispensable; if an element is removed, the composition falls
apart.  Achieving  simplicity  is  no  easy  task.  Two  guidelines  for 
creating  simple  design  solutions  are  (a)  “Less  is  more!”  (at¬ 
tributed  to  Mies  van  der  Rohe)  and  (b)  “When  in  doubt,  leave 
it  out!”  (Anonymous).  The  most  refined  design  is  direct  and 
includes  only  essential  elements.  Removing  any  element 
breaks the composition, rendering it unintelligible or radically
different. 


VISUAL DESIGN TOOLS AND TECHNIQUES


New  technologies  are  rapidly  being  created  that  extend  past  the 
simple  automation  of  tasks  and  communication;  they  are  revo¬ 
lutionizing  processes  and  the  resulting  products.  Before  the  rev¬ 
olution  brought  about  by  electronic  publishing  technology, 
many  disciplines  such  as  writing,  editing,  design,  publishing, 
and programming were discrete units that handled a defined
step  within  a  larger  process.  Today’s  new  publishing  environ¬ 
ments  encourage  the  possibility  of  a  renaissance  publisher — a 
person  who  can  create,  design,  publish,  and  distribute.  Yet  the 
process  used  to  arrive  at  successful  solutions  is  very  complex 
and extends past technical knowledge to include a mastery of
visual and verbal language in order to build effective
communication. Focus must be placed on the factors that determine
success, with constant evaluation and adjustment of these factors
in  light  of  new  developments. 

The  Five  Criteria  for  Good  Design 

Before  any  work  begins,  all  participants  in  the  process  should 
have  a  clear  understanding  of  the  criteria  for  good  design.  The 
following  questions  are  guidelines  for  evaluation  of  design  so¬ 
lutions  before,  during,  and  after  the  process  to  ensure  that  all 
solutions  remain  valid  as  products,  technology,  and  user  needs 
evolve. 

•  Is  it  appropriate?  Is  the  solution  appropriate  for  the  par¬ 
ticular  audience,  environment,  technology,  and/or  culture? 

•  Is  it  durable?  Will  the  solution  be  useful  over  time?  Can  it 
be  refined  and  transitioned  as  the  product  evolves  and  is 
redefined? 

•  Is  it  verifiable?  Has  the  design  been  tested  by  typical  users 
in  the  environment  that  it  will  be  used  in?  Has  feedback  been 
properly  evaluated  and  used  to  improve  the  product? 

•  Does  it  have  impact?  Does  the  design  solution  not  only 
solve  the  problem,  but  also  impact  the  look  and  feel,  so  that 
the  user  finds  the  product  experience  comfortable,  useful, 
and  desirable? 

•  Is  it  cost  effective?  Can  the  solution  be  implemented  and 
maintained?  Are  individuals  with  the  necessary  skills  and  un¬ 
derstanding  to  create,  refine,  and  maintain  the  design  available 
throughout  the  product’s  life?  The  cost  of  any  design  begins 
with the audit and design phases, but continues after the
implementation phase to ensure that it remains advantageous and
cost-effective.  The  hard  and  soft  costs  of  delivering  the  solu¬ 
tion  plus  ongoing  maintenance  add  up  to  the  real  design  costs. 

Visual  Design  Principles  at  Work 

The  following  sections  outline  the  disciplines  and  principles 
used  to  create  quality  design  solutions.  Each  topic  is  worthy  of 
extended  study,  because  there  is  much  to  understand  when 
evaluating how to effectively present information. As the
design process evolves, insights and information are discovered
that  impact  a  solution.  It  is  optimistic  to  base  solutions  on  an  ini¬ 
tial  exercise  because  the  very  nature  of  process  means  discov¬ 
ering  what  is  unknown  yet  critical.  Therefore,  all  members  in¬ 
volved  in  the  design  process  must  remain  open  and  ready  to 
incorporate  new  information,  which  may  change  or  delay  re¬ 
sults,  but  more  accurately  reflect  user  needs. 

For  example,  if  a  new  feature  is  developed  that  changes  a 
product’s  target  audience  from  mid-level  managerial  to  execu¬ 
tive  users,  most  methods  for  critical  interaction  and  content  de¬ 
livery  should  be  reconsidered.  Executives  have  less  time  and 
need  different  information.  The  result  might  be  a  simpler  inter¬ 
face  with  streamlined  content  that  uses  a  larger  typeface  and  a 
more  conservative  visual  language.  The  most  important  princi¬ 
ple  to  remember  when  thinking  about  design  is  that  there  are 
no  rules,  only  guidelines.  Everything  is  context  sensitive.  Always 
consider  and  respect  the  users. 


Typography 

In The Elements of Typographic Style, the poet and typographer
Robert Bringhurst (2005) described typography as frozen
language. Typography is the visual representation of spoken and
unspoken  thought  that  allows  an  idea  to  be  shared  across  time 
and  distance  independent  of  its  creator.  A  functional  and  ex¬ 
pressive  art  that  shares  many  of  the  same  concerns  as  writing 
and  editing,  typography  involves  organizing  text  so  that  its 
meaning  is  communicated  according  to  an  author’s  intent.  In 
design,  a  literacy  that  provides  an  understanding  of  typography 
and  how  text  can  be  structured  in  space  is  as  important  as  a  lit¬ 
eracy  that  understands  how  to  structure  grammar  in  order  to 
explicate  content. 

Typography  is  made  from  type,  individual  characters  orga¬ 
nized  by  visual  characteristics  into  typefaces.  Type  is  the  smallest 
definable  part  of  a  design,  much  like  a  pixel  is  to  a  screen  display. 
James Felici (2003), who has worked through evolutions in
typesetting technologies, defined a font in The Complete Manual of
Typography as the electronic file that contains the programming
code  needed  to  make  the  characters  found  in  a  typeface.  Histori¬ 
cally,  a  typeface  consists  of  all  the  individual  characters  or  glyphs  at 
a  given  size:  letterforms,  punctuation,  numbers,  mathematical 
symbols,  diacritical  marks,  and  other  accessory  characters  needed 
to  fully  compose  a  text.  This  definition  serves  as  a  reminder  to 
read  a  text  carefully  and  consider  all  needs  before  selecting  a  type¬ 
face  and  developing  a  presentation  form  (Fig.  17.5). 

FIGURE 17.5. Type Family. A type family is built around four
core members: roman, bold, italic, and bold italic. Additional
members may include typefaces whose weight and width are
variants of the core group. A family can also contain expert sets that
offer additional or alternate characters such as small caps,
fractions, and non-aligning numbers.

Effective typography is rational. It is concerned with clarity
and comprehension: the ease with which characters and word
shapes are recognized in reading environments and the ease
with which they can be used. It extends past the shapes of
individual letters and their potential combinations to include the
relationships found between word and interword shapes,
functional groupings that ultimately progress into issues of
type weight, slope, width, and scale, characteristics that act as
interpretative devices in order to create influential and
persuasive form. If the principles of good typography can be
understood and applied, then these same principles can be
extended to more complex issues that follow, such as page and
product design. Typographic choice affects legibility and
readability, the ability to easily see and understand what is on the
page, in all media. Walter Tracy (1986), in Letters of Credit: A
View of Type Design, offered the most useful definitions for
legibility and readability. Legibility, the speed at which letters
and the words built from them can be recognized, refers to
perception. Readability, the facility and ease with which text
can be read, refers to comprehension. Legibility and readability
are related. Regardless of media, legibility and readability
are determined by variables such as point size, letter pairing,
word spacing, line length and leading, resolution, color, and
organizational strategies such as text clustering. Together,
legibility and readability comprise typography’s functional
aspects. Good typography, like good design, is invisible to the
user: it just works (Fig. 17.6).

Selecting  an  appropriate  typeface  for  a  specific  purpose  and 
context  requires  experience  and  understanding  (Fig.  17.7).  With 
thousands  of  typefaces  to  choose  from  and  numerous  ways  to 
manipulate them, finding the typeface best suited for an
audience is not easy. With its lack of control, multiple media, and
varied viewing contexts, the current publishing environment makes
this  a  complex  task. 

Typeface  choice  impacts  whether  and  how  a  communication 
is  read.  Distinct  typefaces  and  typographic  styles  create  envi¬ 
ronments  that  influence  a  user’s  perception  of  text.  The  physi¬ 
cal  nature  of  the  presentation  itself  helps  determine  content 
and  acceptance.  A  typeface  with  extremely  thick  and  thin 
strokes  may  appear  sophisticated  and  readable  in  print  but  may 
look  naive  and  render  text  unreadable  in  a  digital  environment. 
Typefaces  are  frequently  designed  to  solve  issues  of  legibility 
and  readability  created  by  a  technology.  A  typeface  made  for 
online use can increase page legibility, as well as the perceived
approachability and quality of an interface, and ultimately
product acceptance (Figs. 17.8, 17.9, & 17.10).

An informed selection can make reading enjoyable and
effortless rather than frustrating and fatiguing. Though typography
might seem to be an insignificant issue to a non-designer,
it affects overall usability. A clear understanding of the concepts
and principles that affect legibility and readability is crucial to
determining effective typography.

FIGURE 17.6. Legibility and Readability. The letters, letter pairs,
and words shown above are examples of what can happen if the
designer is not sensitive to issues of legibility and readability.

FIGURE 17.7. Univers “U”. Univers, a type family designed by
Adrian Frutiger and released for commercial use in 1954, is
composed of twenty-one fonts that together offer a wide range of
weights, widths, and slopes, allowing a text to be organized so
that its form is visually coherent and easily read.

FIGURE 17.8. Bell Centennial: Technology-specific typefaces.
AT&T commissioned Bell Centennial, a typeface designed at a very
small size for telephone directory use, in order to solve an
industrial problem created by changing typesetting and printing
technologies. The resulting type family, designed for maximum
legibility, readability, and spatial efficiency, provided the user with
a clear information hierarchy and reduced paper use and directory
assistance calls. Here, Bell Centennial is shown at six point, the
size at which it was intended to function.



FIGURE 17.9. Bell Centennial: Technology-specific typefaces.
Bell Centennial's forms were opened to increase legibility and
readability. Curved strokes were straightened and the horizontal
and vertical junctures were notched so that they did not clog
with ink when printed. Select letterforms (acefgsv, 0123456789)
shown at 32 point for Bell Centennial (bottom) and Helvetica (top)
illustrate this.


FIGURE  17.10.  Pixel  Font.  Pixel  fonts  are  screen  fonts  designed 
specifically  for  use  with  or  as  on  screen  navigational  elements. 
Their  block-like  forms  are  the  result  of  working  with  the  actual 
pixels  themselves. 


HOW  THE  HUMAN  EYE  SEES, 
AND  THEN  READS 


Herbert Spencer (1969), in The Visible Word, a publication
with an objective of introducing and uniting those who research
legibility with those who work with typography, explained
that the eye uses both outline word shapes and their
internal patterns to move along a text line, stepping and
jumping as it groups text to form comprehensible phrases of
information. This motion of the eye during reading is known
as “saccadic movement.” Sometimes during this process, the
eye regresses and returns to what has been read. Optimal
typography allows for fewer backward movements. Spencer
argued that possessing a mastery of reading mechanics is
important to a mastery of content. Typeface selection directly
affects this skill, making it easier or more difficult for the eye
to group, read, and understand information (Figs. 17.11, 17.12,
& 17.13).

FIGURE 17.11. Now Read This. The phrase, Now Read This, is
shown in full, cropped at the bottom, and cropped at the top,
suggesting the importance of outline word shape and internal pattern.


TYPEFACE  SIZE  AND  SELECTION 


Type  size  is  given  in  points,  a  measuring  system  unique  to 
typography.  In  digital  typesetting  systems,  a  point  equals  1/72 
of  an  inch.  Type  size  cannot  be  determined  by  physically  mea¬ 
suring  a  letterform  because  when  type  existed  solely  in  metal, 
the  technology  in  which  it  was  first  used,  size  was  measured  by 
the  height  of  the  metal  block  on  which  a  letter  sat.  This  is  one 
of  the  reasons  why  the  same  letter  repeatedly  set  in  differing 
typefaces  at  the  same  point  size  appears  dissimilar  when  com¬ 
pared.  Lowercase  letters  set  in  the  same  point  size  with  differ¬ 
ing  typefaces  can  appear  larger  or  smaller  in  comparison  to 
each  other  due  to  variations  in  x-height.  Other  variables  such 
as  stroke  weight,  letter  width,  and  ascender  and  descender 
length  influence  size  perception  and  help  make  some  type¬ 
faces  more  or  less  readable  and  legible  than  others.  Type  size  is 
also  dependent  on  the  resolution  offered  by  output  and  view¬ 
ing  devices,  color  usage,  context,  and  other  design  issues.  It 
is  crucial  to  understand  not  only  the  characteristics  of  a  type¬ 
face  but  also  usage  context  and  application  environment  when 
selecting  a  typeface. 
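
The point arithmetic above can be made concrete with a small sketch. It assumes only the digital convention stated in the text (1 point = 1/72 inch); the function names and the sample display densities are illustrative and not drawn from any particular typesetting system. Note that the result is the nominal size of the type body, not the measured height of any letter, which is why two typefaces at the same point size can still look different.

```python
POINTS_PER_INCH = 72  # digital typesetting convention: 1 point = 1/72 inch


def points_to_inches(points):
    """Convert a nominal type size in points to inches."""
    return points / POINTS_PER_INCH


def points_to_pixels(points, pixels_per_inch):
    """Convert a nominal type size in points to device pixels.

    `pixels_per_inch` is the density of the output device (an input the
    caller must supply; the values used below are only examples).
    """
    return points * pixels_per_inch / POINTS_PER_INCH


if __name__ == "__main__":
    # The same 12-point type occupies a different number of pixels on
    # devices of different density.
    for density in (72, 96, 144):
        print(density, "ppi:", points_to_pixels(12, density), "px")
```

Because x-height, stroke weight, and letter width vary between typefaces, a conversion like this gives only the starting size; the final choice still has to be judged by eye on the target device.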

Serif and sans serif. Serif and sans serif are general categories
used for classifying type. Serif refers to a typeface with serifs,
the short strokes that project off the end of letter strokes, as
opposed to a sans serif, a typeface without serifs. It is debatable
whether serif typefaces, conventionally used for setting text in
print, are more readable than sans serif typefaces, which have an
even stroke weight and more open counters that have proven useful
for setting text for on-screen reading. And, while sans serif is
considered easier to read on screen, serifs can be made equally
legible if the appropriate typeface, size, and color are specified.
Some designers think that, in print, serifs aid in character
recognition and readability; they help differentiate individual
letters, creating horizontal lines for the eye to follow. This has
not been proven conclusively. Other designers hold the view that “we
read best what we read most.” It’s likely that this discussion will
continue. Recent technological developments that subdivide a pixel
into red, green, and blue elements on LCD screens have resulted in
new technologies that create a better immersive environment for
online reading. This, in turn, will spark new explorations in
typographic form and its presentation.

Serif. A type classification that refers to a typeface with serifs.
Also, the beginning or ending stroke drawn at a right angle or
obliquely across the arm, stem or tail of a letter.

Sans Serif. A type classification that refers to a typeface without
serifs.

Roman. [...] as opposed to type that is italic or slants to the
right. Roman can also be used to describe type weight. It refers to
a standard weight as opposed to a bold or light weight.

Ascender. The stem of a lowercase letter that extends past the mean
line.

Descender. [...] that extends below the baseline.

Cap Line. An optical line created by the eye moving across the top
of a set of uppercase letters.

Mean Line. An optical line created by the eye moving across the top
of a set of lowercase letters. The eye follows this line when
reading.

x-height. The height of the lowercase letters without ascenders or
descenders.

Baseline. The optical line on which type sits.

FIGURE 17.12. Anatomy of a letter. The typographic terms defined and
illustrated above are used by designers in discussing principles
that affect the legibility of type and overall quality of the
communication.


[Specimen: the letters "hqx" set at the same point size in Palatino,
Baskerville, Bodoni, Serifa, and Helvetica.]

FIGURE 17.13. Hqx Illustration. The x-height of a typeface (based on
the actual height of a lower-case x) is a key characteristic when
deciding the visual size of a typeface, particularly when
readability is critical. While the above typefaces are the same
point size, some seem larger (e.g., Helvetica) and easier to read
than others (e.g., Serifa).

Families of type. Many design students first learn to classify
typefaces into five chronological families or organizational
groupings popularized by James Craig et al. (2006) in Designing
with Type: (a) old style, (b) transitional, (c) modern, (d)
Egyptian, and (e) contemporary. Classifying typefaces into these
families makes it easier to understand the differences and
similarities in structure and fitness. Like anything else, type
design does not happen in isolation. These categories, as with those
of many other classification systems, are formed around historical
junctures in which the overall design of letterforms shifted
dramatically in response to technological, political, cultural,
aesthetic, and economic concerns. Typefaces within each set have
passed through the tenures of metal and photocomposition and been
adapted for digital technologies. Many of the problems solved
through these older forms have proven inspirational to contemporary
type designers who, as their predecessors did, continue to explore
new forms for new purposes.

Verdana, a sans serif screen font designed by Matthew Carter, whose
roots lie in the Industrial Revolution, was commissioned by
Microsoft and released in 1996. Its form, which uses a visually even
stroke with wide counters, helped signal a new software release.
Verdana’s members consist of Roman, italic, bold, and bold italic.
It has two peers that use non-Latin alphabets: Verdana Greek and
Verdana Cyrillic. Verdana is related to Tahoma, a condensed
variation of Verdana designed for use in situations that require
more information to fit in less space, such as dialog boxes and
menus; and Nina, a spatially efficient sans serif designed for
situations that require more information to fit in even less space,
such as small handheld devices. John Berry (2004), who writes and
consults extensively on typographic matters, said in Now Read This
that Meiryo, Verdana’s daughter, evolved partially as a response to
technologies that enhance photometric resolution, permitting more
complex writing systems such as scripts to be optimally read on
screen, and to a demand for a Japanese type that weaves the
character sets of Kanji, Kana, Katakana and Romaji together,
fashioning a favorable reading environment for screen and print.
Meiryo, too, has Greek and Cyrillic companions. That one family has
siblings and offspring, manufactured around a variety of alphabets
and writing systems, speaks in part to the economic and political
concerns of an international corporation that must respond to the
demands of different cultural markets as much as it does to the need
for multilingual communication (Fig. 17.14).
Classification   Year   Representative typefaces
Modern           1788   Bodoni, Didot, Walbaum
Egyptian         1894   Serifa, Rockwell, Georgia
Contemporary     1957   Helvetica, Frutiger, Verdana

FIGURE 17.14. Type Classification. These five A’s are representative
of typographic style from the 1600s to present day, reflecting
changes in tools, fashion and current events. Ultimately, choice of
output media should determine typeface selection, given details such
as stress, the thick and thin parts of letterforms, negative space,
viewing environment, output resolution, etc.

VARIATIONS  IN  LETTERFORMS 


Variations  in  Stress 

Early  type  designers  mimicked  scribal  letterforms  because  they 
knew  and  understood  these  forms.  Old-style  typefaces  have 
a  diagonal  stress,  a  backwards  slant — a  visual  remnant  of 
the  pen — that  distributes  weight  through  the  thin  part  of  the 
letterform.  Over  time,  the  stress  traveled  several  degrees  to  the 
right  as  seen  in  transitional  typefaces.  With  modern  typefaces, 
the  stress  becomes  rigidly  vertical.  Egyptians  such  as  Century 
Schoolbook  have  a  slight  diagonal  stress;  other  Egyptians  with 
a  stronger  slab  serif  such  as  Serifa  have  no  stress.  Contemporary 
typefaces  have  no  noticeable  stress. 

Variation  in  Thick  and  Thin 

The  degree  of  contrast  between  the  thick  and  thin  strokes  of  the 
letters  can  vary.  Old  style  typefaces  have  little  contrast  in  strokes. 
This  contrast  increases  in  transitional  faces.  Extreme  stroke  con¬ 
trast  is  a  dominant  characteristic  of  modern  typefaces.  Egypt¬ 
ian  typefaces  return  to  less  contrast.  Contemporary  typefaces 
have  no  perceptible  thick  and  thin  strokes. 


Variations  in  Serifs 

Serifs  differ  in  weight  and  bracket,  which  is  the  shape  created  by 
the  serif  joining  the  vertical  stroke  of  the  letter.  Old-style  type¬ 
faces  have  heavy  concave  serifs  with  thickset  brackets.  The 
meticulous  serifs  of  modern  typefaces  are  refined  and  thin  and 
without  brackets.  Many  Egyptians  have  heavy  straight  serifs  with 
little or no bracket. Sans serifs are considered contemporary
typefaces (Fig. 17.15).


TYPOGRAPHIC  GUIDELINES 


Combining  Typefaces 

Sans serif and serif typefaces can be effectively combined if
changes are limited to prevent visual chaos. The key is to ensure
that the result respects the content and reinforces the information
hierarchy and overall design goals. When combining typefaces, decide
whether harmony or contrast is important. Generally, do not use more
than two different type families in a document. Remember, at a
minimum, a type family offers a Roman, italic, bold, and bold
italic. Consider the pattern and texture that the x-heights and
stroke weights weave when combined. Excellent typography does not
come between the user and the information. Too many typefaces jar
and confuse the reader, create visual intrusions, and slow the
reading process.

Contrast  in  Weight  (Boldness) 

Combining  two  classic  typefaces  with  a  strong  differential  fac¬ 
tor  such  as  Helvetica  Extra  Bold  with  Times  New  Roman  can  add 
useful  contrast.  Be  wary  of  combining  intricate  typefaces  such  as 
Gill  Sans  Bold  and  Souvenir,  which  have  structures  that  may  not 
create  compatible  reading  environments.  Too  much  contrast 
and  visual  complexity  can  be  detrimental. 

Output  Device  and  Viewing  Environment 

The quality of publishing technologies and viewing environments
varies greatly: laser printer versus video versus electronic media,
etc. In choosing a typeface, its style, size, spacing, and leading,
think about the final output medium, and examine this technology’s
effect on legibility. Low-quality monitors and poor lighting have a
major impact: serifs sometimes disappear, letters in small bold type
fill in, and colored type may disappear altogether.

[Specimen: "alphabet ALPHABET" set in Century (above) and Univers
(below).]

FIGURE 17.15. Serif versus Sans serif. The serif typeface Century
above versus the sans serif typeface Univers, below. Understanding a
typeface's physical characteristics and how it performs in different
environments is important. Set a paragraph of text in both a serif
and sans serif with exactly the same line length, size and spacing
and compare the differences on screen and on paper.


Letter  Spacing  and  Word  Spacing 

While  the  spaces  within  and  between  letterforms  and  words  are 
determined  by  a  type  designer  in  order  to  set  a  rhythm  that 
reads  well,  this  spacing  can  be  altered  or  kerned.  However,  be 
careful!  When  letter  spacing  is  too  tight,  the  letters  are  hard  to 
distinguish  from  each  other  and  legibility  decreases.  When  let¬ 
ter  spacing  is  too  wide,  letter  groups  are  not  easily  recognized. 
Spencer argues that optimal letter spacing is inconspicuous: the
user can read quickly and easily and understand content. Tight
word  spacing  makes  distinguishing  individual  words  difficult. 
When  word  spacing  is  too  wide,  word  groups  fall  apart.  When 
there  is  greater  space  between  words  than  there  is  between 
lines,  the  reader’s  eye  naturally  falls  to  the  closest  word,  which 
may  be  below  instead  of  across  the  line.  This  frequently  occurs 
with  low-resolution  or  low-cost  products. 


Line  Spacing/Leading 

Leading  is  the  distance  measured  in  points  between  the  baseline 
of  one  line  of  text  and  the  baseline  of  the  text  line  below  it.  As¬ 
cender  and  descender  length  influences  how  closely  lines  of 
type  can  be  stacked.  The  space  between  lines  of  text,  or  leading, 
should  increase  in  relation  to  type  size.  This  adjustment  is  visual 
not  mathematical.  Overall  legibility  may  be  improved  by  in¬ 
creasing  the  leading  in  relation  to  column  width. 


[Specimen: the same paragraph on rivers in justified text, set once
as a justified column and once as a flush left, ragged right
column.]

FIGURE 17.16. With current technology, the difference between a
justified text column and ragged right text column can make a huge
difference in readability. In a poorly justified column, spaces
within a justified line connect vertically down the page,
distracting the eye from easily reading across a line of text.


Highlighting  with  Type 

Content  can  be  highlighted  by  modifying  type  weight,  slope,  or 
case.  Weight  can  be  shifted  from  Roman  to  bold  or  extra  bold. 
Slope  can  be  altered  from  Roman  to  italic.  Be  mindful  that  italics 
are  appropriate  for  short  phrases  and  not  long  text  passages. 
The  italic  appears  lighter  and  smaller  on  the  page  when  com¬ 
pared  with  its  companion  Roman  and  its  complex  forms  are 
more  difficult  to  read.  Case  can  change  from  upper  and  lower¬ 
case  to  all  capitals  or  small  caps.  Using  all  caps  for  extended  text 
passages  impedes  readability  since  word  outlines  are  rectangu¬ 
lar  and  harder  for  the  eye  to  differentiate.  Limited  shape  and 
size  cues  are  available  to  help  differentiate  between  letters, 
words,  and  sentences  to  create  meaning.  Use  only  one  high¬ 
lighting  technique  for  emphasis. 


Line  Length/Column  Width 

The  correct  line  length  is  just  long  enough  for  the  eye  to  sweep 
across  without  losing  its  place  and  easily  drop  down  to  continue 
reading the following lines. A good rule of thumb is that a line of
average length contains between 39 and 52 characters.
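
The rule of thumb above lends itself to a quick check. The sketch below uses Python's standard textwrap module to break a paragraph at a chosen measure and then reports any full lines that fall outside the 39 to 52 character range. The function names are illustrative, and the range is only the heuristic quoted in the text, not a fixed standard.

```python
import textwrap

MIN_CHARS, MAX_CHARS = 39, 52  # rule-of-thumb range quoted in the text


def wrap_to_measure(text, width=MAX_CHARS):
    """Break a paragraph into lines no longer than `width` characters."""
    return textwrap.wrap(text, width=width)


def lines_outside_range(lines, shortest=MIN_CHARS, longest=MAX_CHARS):
    """Return the full lines whose length falls outside the range.

    The last line of a paragraph is ignored because it is naturally
    shorter than the rest.
    """
    return [
        line for line in lines[:-1]
        if not shortest <= len(line) <= longest
    ]


if __name__ == "__main__":
    sample = (
        "The correct line length is just long enough for the eye to "
        "sweep across without losing its place and easily drop down "
        "to continue reading the following lines."
    )
    for measure in (30, 52, 90):
        lines = wrap_to_measure(sample, measure)
        flagged = lines_outside_range(lines)
        print(measure, "->", len(flagged), "of", len(lines), "lines flagged")
```

Character counts are a proxy: the physical line length also depends on type size and letter width, so a check like this is best used to flag candidates for visual review.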


Justified vs. Ragged Right (Flush Left)

A justified text column can leave uneven word spacing, creating
rivers, or vertical white spaces, within the paragraph.
Rivers  cause  the  eye  to  move  vertically  down  the  page,  natu¬ 
rally  connecting  with  what  is  closest  in  proximity,  instead  of 
moving  easily  across  the  line.  It  is  very  difficult  to  prevent  rivers 
in  justified  text  columns  without  spending  considerable  effort. 
Unless  the  type  is  manually  set  or  adjusted,  which  is  a  time- 
consuming  activity,  it’s  better  to  use  type  that  is  set  flush  left, 
ragged  right  (Fig.  17.16). 
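
How justification produces uneven word spacing can be seen in a monospaced sketch. The code below is a deliberately naive justifier, written for illustration only: it pads the gaps between words until each line reaches the column width, which is roughly what happens when type is justified without manual adjustment. Real composition engines also adjust letter spacing and hyphenation, so this is an assumption-laden simplification, and all names are hypothetical.

```python
import re
import textwrap


def justify_line(line, width):
    """Pad the gaps between words so the line is exactly `width` wide."""
    words = line.split()
    if len(words) < 2:
        return line  # a single word cannot be stretched
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    base, extra = divmod(spaces, gaps)
    # The leftmost `extra` gaps each receive one additional space.
    padded = [
        word + " " * (base + (1 if i < extra else 0))
        for i, word in enumerate(words[:-1])
    ]
    return "".join(padded) + words[-1]


def justify(text, width):
    """Justify every line except the last, which stays flush left."""
    lines = textwrap.wrap(text, width=width)
    return [justify_line(line, width) for line in lines[:-1]] + lines[-1:]


def widest_gap(line):
    """Length of the longest run of spaces inside a line."""
    runs = re.findall(r" +", line.strip())
    return max((len(run) for run in runs), default=0)


if __name__ == "__main__":
    sample = (
        "A justified text column can leave uneven word spacing, "
        "creating rivers, or vertical white spaces, within the paragraph."
    )
    for line in justify(sample, 28):
        print(f"|{line:<28}| widest gap: {widest_gap(line)}")
```

In a flush left, ragged right setting every gap stays one space wide; in the justified output the widest gap changes from line to line, and where wide gaps stack vertically they form the rivers described above.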


Decorative  Typefaces 

Decorative typefaces are of limited use for body text because
their irregular design lessens legibility, and they should be used
in headlines with caution. Because they are essentially typographic
fashion  statements,  decorative  typefaces  can  either  reinforce 
or  distract  from  the  overall  message  or  brand  of  a  particular 
product  or  organization. 

Black  on  White  vs.  White  on  Black  and  Dark  on  Light 
Background  vs.  Light  on  a  Dark  Background 

Positive and negative type. White on black (or light on a dark
background) is generally regarded as less legible and much more
difficult to read over large areas. To the eye, white letters on a
black background appear smaller than their reversed equivalent. The
amount of contrast between the color of type and the background is
an especially important factor for online communication. Color adds
exponential levels of complexity to these considerations since
displays are inconsistent from one situation to another (Fig. 17.17
& Fig. 17.18; Dair, 1967).
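
One way to put a number on the contrast between type and background is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG 2.x). That measure is not part of this chapter; it is brought in here only as an illustrative, widely used metric. The sketch below implements the published formula for 8-bit sRGB colors; the function names are hypothetical.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, per WCAG 2.x."""
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    black, white, gray = (0, 0, 0), (255, 255, 255), (119, 119, 119)
    print("black on white:", round(contrast_ratio(black, white), 2))
    print("white on black:", round(contrast_ratio(white, black), 2))
    print("gray on white: ", round(contrast_ratio(gray, white), 2))
```

The ratio is symmetric, so it gives the same value for black on white and white on black. It therefore cannot capture the point made above that reversed type is harder to read over large areas; that judgment still has to be made by viewing the text on the target display.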


DESIGN  PRINCIPLES:  PAGE  DESIGN 


Typography  deals  with  legibility  and  page  design  focuses  on  read¬ 
ability,  the  ability  to  read  and  comprehend  information.  Can  the 
user  find  what  is  needed  on  the  page?  The  two  important  functions 
of  page  design  are  motivation  and  accessibility.  A  well-designed 
page  is  inviting,  drawing  the  eye  into  the  information.  Users  are 
motivated  to  accept  the  invitation.  An  effective  page  design  ensures 
that  the  reader  continues  by  increasing  the  ease  of  understanding 
and  accessibility  of  the  information.  (For  purposes  of  simplicity,  the 
term  page  design  is  used  interchangeably  to  mean  page,  screen, 
and  document  design.)  Motivation  and  accessibility  are  accom¬ 
plished  by  providing  the  reader  with  ways  to  quickly  understand 
the  information  hierarchy.  At  a  glance,  the  page  design  should  re¬ 
veal  easy  navigation  and  clear,  intuitive  paths  to  discovering  addi¬ 
tional  details  and  information.  This  is  called  “visual  mapping.” 

A page, site, or product visually mapped for easy navigation has:

•  an  underlying  visual  structure  or  grid,  organizational  land¬ 
marks,  graphic  cues  and  other  reader  aids  (Fig.  17.19) 

•  distinctly  differentiated  information  types 

•  clearly  structured  examples,  procedures,  and  reference  tools 

•  well-captioned  and  annotated  diagrams,  matrices,  charts,  and 
graphics. 

This kind of visual structuring helps the reader: it provides an
obvious path through the materials, aids in skimming, gives a
conceptual framework, and prevents a feeling of information
overload.

A  table  of  contents  is  a  simple  visual  map.  It  quickly  provides 
a  general  overview  of  the  order  and  some  details  about  the 
structure  and  content.  What  it  does  not  reveal,  however,  are  pri¬ 
orities.  Site  maps  or  other  diagrams  provide  this  type  of  infor¬ 
mation  as  well. 


Building  the  Design  of  a  Page 

Effective  visual  mapping  is  apparent  in  the  sequence  shown  be¬ 
low  that  demonstrates  the  evolution  of  an  accessible  page  from 
plain  text.  As  design  elements  are  added,  the  page  becomes 
inviting  to  read  and  content  becomes  attainable.  The  final  ex¬ 
ample  organizes  the  content  into  units  of  information  using  line 
spacing and vertical thresholds or queues. Differentiation in
typeface, weight, and scale reinforces structure. Information
design techniques, drawn from cognitive science, can be used to
improve communication effectiveness and performance.

Gray  page  or  screen.  Raw  text  interests  few  readers.  When 
information  is  presented  as  a  uniform  undifferentiated  mass,  it  is 
difficult  and  irritating  to  use  and  easy  to  ignore  (Fig.  17.20). 

Chunking.  Structure  the  visual  field  by  breaking  like  kinds 
of  information  into  manageable  groups  according  to  subject 
matter.  Chunks  in  close  proximity  are  read  as  related.  Graphic 
devices  such  as  rules  and  line  spaces  are  used  to  reinforce  a 
grouping  and  separate  chunks  (Fig.  17.21). 

Queuing.  Order  information  chunks  visually  to  reflect  the 
content  hierarchy  by  addressing  the  user’s  requirements  of  sub¬ 
ject  matter,  order,  and  importance  (Fig.  17.22). 

Filtering.  Simplify  linguistic  and  visual  order  by  filtering 
out  unnecessary  background  noise,  which  interferes  with  the  in¬ 
formation  being  transmitted.  Filtering  builds  a  sense  of  layers 
of  information  by  using  color,  visual  cues  and  symbols,  and  bul¬ 
leted  lists  and  headers  to  make  a  page  effective  for  a  range  of 
users  and  uses  (Fig.  17.23). 

Mixing  modes.  People  learn  through  different  cognitive 
modes  or  styles.  Some  users  favor  text,  others  may  prefer  illus¬ 
trations,  photos,  diagrams,  or  formulas.  To  suit  these  varied 
learning  preferences,  information  must  be  translated  into  sev¬ 
eral  different  modes  that  are  then  carefully  presented  to  rein¬ 
force  content  and  organization  (Fig.  17.24). 


FIGURE 17.17. Black on White vs. White on Black. Letterforms often
appear as black/dark shapes on white/light backgrounds. The eye also
reads the reverse, or negative shape around a letterform, which can
create a shape that visually distracts and makes text difficult to
read. Try setting a paragraph of type black on a white background,
then set the same exact paragraph with white type on a black
background. You can also try this same exercise with a dark color on
a light background, and then try the reverse. You will notice that
the greater the contrast (e.g., white type on black) the harder it
is to read in large amounts.
FIGURE 17.18. Typographic Contrasts. This figure, which builds on
the relationships of contrasts discussed and illustrated in Design
with Type by Carl Dair (1967), shows typographic contrasts that can
be easily used to build information hierarchies.


FIGURE 17.21. Chunking.


FIGURE 17.19. Grid. The lines that appear in this rectangle show the
grid, or underlying structure, of this page. The grid is used as a
guide to create more pages that use similar relationships for
placement and alignment. It is a point of departure for one who
understands the system to create variations and be a bit more
playful, yet still provide a consistent “feel” to the user.
FIGURE 17.22. Queuing.


FIGURE 17.20. Gray page or screen.


FIGURE 17.23. Filtering.


FIGURE 17.24. Mixing Modes.


Abstracting.  The  individual  page  or  screen  is  a  microcosm 
of  the  complete  book,  site,  or  product.  The  result  is  a  complete 
codified  system  of  graphic  standards,  which  is  effective  for  both 
the  reader  and  the  producer.  Abstracting  builds  a  system  of  stan¬ 
dards  that  simplifies  text  organization,  creates  consistent  ap¬ 
proaches  to  preprocessing  information,  and  establishes  a  cus¬ 
tomized  look  for  an  organization’s  products  (Fig.  17.25). 


Other  Page  Design  Techniques 

White  space.  White  space  (or  empty  space)  is  an  under¬ 
utilized  but  extremely  effective  design  tool.  It  visually  opens  up 
a  page,  provides  focus,  helps  group  like  kinds  of  information, 
provides  resting  points  for  the  reader’s  eye,  and  creates  the  per¬ 
ception  of  simplicity  and  ease  of  use. 

The  grid.  A  grid  is  a  controlled  system  of  organization  that 
allows  for  the  distribution  of  visual  elements  in  an  intelligible  or¬ 
der.  A  grid,  as  part  of  an  overall  design  system,  provides  an  un¬ 
derlying  structure  that  determines  the  horizontal  placement  of 
columns  and  the  vertical  placement  of  headlines,  text,  graph¬ 
ics,  and  other  artwork. 

FIGURE 17.25. Abstracting.


A grid is built on a series of consistent relationships, alignments,
and spatial organizations. It acts as a blueprint that can be
repeatedly used to create sequential pages, which are related but
respond to different content. When the grid system is understood, it
forms the basis for consistent application and extension of the
design by others who also understand the intention of the system.
Every strong design uses an underlying structure or grid to create a
consistent look and feel to any form of visual communication. An
analogy can be created between the horizontal and vertical lines
that compose a grid used on a page and the metal beams that
systematically make up the overall supporting structure of a
high-rise building. While the structural supports in a building are
consistent from floor to floor, the configuration of the space
within each individual floor is based on each occupant’s needs. The
same holds true with the grid on a page. An important tool that
improves usability, a grid enables a user to navigate a page quickly
and easily. A grid specifies placement for all visual elements. The
user anticipates where a button will appear or how help is accessed.
Product or program usefulness and ultimately success are greatly
increased through the consistency offered by a grid.
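
The horizontal part of such a grid reduces to simple arithmetic. The sketch below assumes a page of a given width with equal outer margins, a fixed number of equal columns, and a constant gutter between them. These parameters and the function names are illustrative choices, not a prescription from the chapter. It returns the left and right edge of every column so that elements can be aligned to the same positions on every page.

```python
def column_edges(page_width, margin, columns, gutter):
    """Return (left, right) edges for each column of a simple grid.

    Assumes equal outer margins, equal column widths, and a constant
    gutter between adjacent columns, all in the same unit.
    """
    usable = page_width - 2 * margin - (columns - 1) * gutter
    if columns < 1 or usable <= 0:
        raise ValueError("margins and gutters leave no room for columns")
    column_width = usable / columns
    edges = []
    for index in range(columns):
        left = margin + index * (column_width + gutter)
        edges.append((left, left + column_width))
    return edges


def span(edges, first, last):
    """Edges of an element spanning columns `first` through `last`."""
    return edges[first][0], edges[last][1]


if __name__ == "__main__":
    grid = column_edges(page_width=960, margin=20, columns=4, gutter=20)
    for number, (left, right) in enumerate(grid, start=1):
        print(f"column {number}: {left:.0f} to {right:.0f}")
    # A headline spanning the first three columns shares its left edge
    # with column 1 and its right edge with column 3.
    print("headline:", span(grid, 0, 2))
```

Because every element takes its position from the same list of edges, a button or help link lands in the same place from page to page, which is the consistency the text attributes to a grid.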

Field  of  vision.  Field  of  vision  refers  to  what  a  user  can 
see  on  a  page  with  little  or  no  eye  movement;  it  is  the  main  area 
where  the  eye  rests  to  view  most  of  the  page.  A  good  design 
places  key  elements  in  the  primary  field  of  vision  reflecting  and 
reinforcing  the  information  hierarchy.  Size,  contrast,  grouping, 
relationships,  and  movement  are  tools  that  create  and  reinforce 
field  of  vision.  The  user  first  sees  what  is  visually  strongest,  not 
necessarily  what  is  largest  or  highest.  This  is  particularly  true  for 
online  information  due  to  the  limitations  of  page  real  estate  and 
dense  information  environments. 

These  concepts,  as  well  as  the  strength  of  peripheral  vision, 
can  be  experienced  when  viewing  a  page  that  has  a  banner  ad¬ 
vertisement  or  moving  graphic. 

It is virtually impossible to ignore the distraction, or to keep
attention on the primary field of vision, when there is winking and
blinking elsewhere. In fact, superfluous visual devices reduce
information’s value by distracting and disturbing the user’s desire
and ability to focus, read, and understand.

Proximity.  This  concept  applies  to  the  placement  of  vi¬ 
sual  elements  physically  close  to  each  other  so  that  it  is  under¬ 
stood  that  these  are  related  elements. 

The illusion of depth. Though the online world exists in
two dimensions, contrast can be used to create the illusion of
depth.  Contrasts  created  through  size,  weight,  structure,  direc¬ 
tion,  color,  texture,  and  layering  can  form  cues  that  reinforce  hi¬ 
erarchy  by  giving  the  illusion  that  an  element  is  on  top  of  or  in 
front  of  another. 


CHARTS,  DIAGRAMS,  GRAPHICS,  AND  ICONS 


The  goal  of  any  visual  device  is  to  provide  the  fastest,  most  effi¬ 
cient  path  to  understanding  ideas  and  to  make  these  ideas  clear 
and  compelling.  Useful,  effective  graphics  act  like  visual  short¬ 
hand,  particularly  important  when  the  real  estate  of  the  page  is 
limited  (Fig.  17.26).  A  good  visual  eliminates  the  need  for  text 
and  communicates  across  cultures.  A  bad  graphic  with  an  un¬ 
clear  meaning  that  must  be  reinforced  by  a  long  caption  can  be 
worse  than  none  at  all.  The  old  cliche,  a  picture  is  worth  a  thou¬ 
sand  words,  is  true  only  if  the  picture  is  efficient  and  effective.  In 
stressful  or  difficult  situations,  people  do  not  have  time  to  read 
or  the  ability  to  focus  on  text  and/or  complex  visuals.  Though 
more  difficult  to  achieve,  brevity  and  simplicity  in  such  cases 
have  greater  value.  Product  users  prefer  well-designed  charts, 
diagrams,  and  illustrations  that  quickly  and  clearly  communicate 
complex  ideas  and  information.  Studies  show  that  visual  images 
are  retained  long  after  the  reader  is  finished.  Designed  correctly, 
visual  images  can  make  information  memorable  and  effective.  At 
a  minimum,  a  powerful  illustration  or  graphic  can  often  improve 
performance  simply  because  it  increases  user  motivation.  Visuals 
are robust communications tools used to (a) visualize and analyze
data; (b) present new or abstract concepts; (c) make visible physical
and technical concepts that are invisible to the eye; and (d)
summarize information efficiently and effectively. Visuals explain and
reinforce concepts, relationships, and data, making them tangible.
Photographs, charts, illustrations, icons, or diagrams become thinking
tools. The information is clarified, made easier to evaluate, and has
greater impact. Visuals are a very effective way to communicate a
message, but choosing the appropriate presentation for a concept is
critical to the user’s ability to effectively comprehend a message.
Understanding the limitations of the display medium is crucial to
creating a successful visual.

FIGURE 17.26. Zen calligraphy is an example of the historically
close relationship between word and image. The Zen master
Hakuin (1768-1865) created this symbol to mean “dead,” with
additional notes saying “Whenever anyone understands this, then
he is out of danger.”


Tables,  Charts,  Diagrams 

These three types of graphics are discussed in order of complexity.
Tables are the least difficult to create, charts are more difficult,
and diagrams are the most difficult of the three. Illustrations,
graphics, and other images and visuals are the most complex of all,
require more conceptual and visual sophistication, and may require a
consultant to create. When is one more appropriate than the other?
Fig. 17.27 illustrates how to determine which format is the most
effective. In addition to this list, it is important to remember that
visual cues such as color, shading, texture, lines, and boxes should
be considered redundant cues and only used to provide additional
emphasis to support the concept.


Icons  and  Visual  Cues 

Icons  and  other  visual  cues  are  a  form  of  visual  shorthand, 
which  helps  users  locate  and  remember  information.  Develop¬ 
ing  an  easily  understood  style  that  is  consistent  with  the  overall 
program  style  is  not  easy.  Choose  a  style  that  is  simple  and  con¬ 
sistently  reinforced  throughout  a  product.  More  complex  and 
unique  symbols  and  icons  can  be  used  if  usage  takes  place  over 
a  longer  period,  allowing  product  familiarity  and  learning  to 
take  place.  The  MasterCard  logo  consists  of  two  intersecting 
circles.  After  many  years  of  reinforcement,  most  people  imme¬ 
diately  recognize  it  without  accompanying  text  or  other  expla¬ 
nation.  It  is  very  difficult  to  create  an  icon  that,  without  expla¬ 
nation,  communicates  a  concept  across  cultures.  For  example, 
the  use  of  a  freestanding  rectangular  box  with  an  open  door  flap 
indicates  mailbox  or  in-box.  This  kind  of  mailbox  is  rarely  used 
today  and  was  never  used  in  Europe  where  mail  is  placed  in 
slots  or  upright  boxes.  Even  the  concept  of  mail  delivery  can 
be  considered  strange.  This  is  a  case  where  meaning  had  to  be 
learned.  Although  simple  ideas  presented  as  icons  are  appro¬ 
priate,  a  program  with  many  complex  concepts  using  colloquial 
images  can  make  using  a  program  agonizing  for  users  from 
other  cultures.  There  is  an  important  difference  between  an 
icon  and  an  illustration,  though  the  two  concepts  are  often  con¬ 
fused.  If  an  icon  must  be  labeled,  it  is  really  an  illustration.  The 
icon’s  value  as  visual  shorthand  is  lost.  Better  to  use  a  word  or 
short  phrase  rather  than  word  and  image  when  screen  space  is 
at  a  minimum. 

A  successful  icon  is  memorable  with  minimal  reinforcement.  If 
after  viewing  an  icon  several  times  a  user  cannot  remember  its 
meaning,  then  the  icon  is  valueless  and  should  be  eliminated. 
Icon  sets  should  share  a  similarity  of  style  (business-like  or  play¬ 
ful)  and  possess  formal  presentation  properties  consistent  with 
the  overall  program  or  product  to  which  they  belong  (Fig.  17.28). 

When to Use What Graphic

If you want to show...                                   Use a...

Groups
  Group of related items, with a specific order          numbered list

Relationships
  Relationships and steps involved in a process          flow chart
  Relationships between categories of ideas              table
  Relationships of tasks taking place over time          project plan table

Evaluate/Compare
  Evaluate items against several criteria                rating table
  Evaluate items against one criterion                   comparison table
  Compare more than one item to more than one variable   matrix diagram
  Compare several things in relation to one variable     bar chart
  Compare the relative parts that make up a whole        pie chart

Hierarchy
  Hierarchical structure of an organization              organizational chart

Concepts
  Concept                                                illustration and/or text, icons, other graphics
  Abstract concept                                       complex images, interactive components

FIGURE 17.27.

FIGURE  17.28.  AIGA/DOT  Icons.  These  six  icons,  all  from  the 
same  set,  were  developed  by  the  American  Institute  of  Graphic 
Arts and the U.S. Department of Transportation for use in signage.
From upper left to lower right: restaurant, no smoking, trash,
stairway leading up, information, and taxi.

Illustrations  and  Photographs 

As technology improves, the only limit placed on the use of complex
images will be the one set by the designer. The most important
consideration is the appropriateness of the image for the intended
audience. Do not use cartoons for a company brochure, or a
low-resolution photograph of a control panel when a line illustration
is more effective.

While understanding the meaning and implications of illustrations and
photographs is no easy task, there are guidelines for making choices.
A photograph can easily represent an existing object, but issues
related to resolution and cross-media publishing can make it
unintelligible. If a photograph can be reproduced with proper
resolution, cropping, and contrast, and can emphasize a required
detail, then photography is a good choice. A photograph can provide
orientation and contextual cues that are more difficult to achieve in
an illustration. No matter how simplified or cropped to focus
attention, however, a photograph’s reproduction quality is often
unpredictable. In this situation, a technical illustration such as a
line drawing is more effective. An obvious advantage of illustration
is that it can present abstract concepts or objects that do not yet
exist, or that may never exist. Another benefit is the ability to
focus the viewer’s attention on detail. For example, a line drawing
can place attention on a specific machine part by changing line
weight. Achieving a similar result in a photograph adds time,
complicates the image, and may never simplify the explanation.

Regardless  of  the  method  used  to  create  a  visual  explanation, 
it  must  clarify  and  reinforce  content.  If  the  purpose  of  the  image 
is to explain where to locate a piece of equipment, then an
overview  of  the  equipment  in  the  environment  is  appropriate.  If 
the  goal  is  to  show  an  aspect,  such  as  a  button  location,  then  the 
illustration  should  only  focus  attention  on  that  aspect.  An  image 
can  be  cropped  to  focus  attention  on  what  is  being  explained; 
it  depends  on  the  goal  of  the  photograph  or  illustration. 

Situations exist in which it is more effective to use a combination
of photography and illustration than either alone. For example, a
photograph of an object in its usage environment conveys more
information than that of the object itself. If the objective is to
show the location of a part of that object, then a line drawing in
close proximity to or inset in the photograph is more useful than a
photograph or illustration alone.

Guidelines 

Visuals should reinforce the message. Don’t assume that the audience
understands how a visual reinforces the argument (Fig. 17.29). A clear
and concise argument must still be made that helps shorten the process
of comprehension and learning and causes the user to say, “Aha, that’s
how it works together!” Visuals should:

•  clarify  complex  ideas 

•  reinforce  concepts 

•  help  the  user  understand  relationships 

Create a consistent visual language. Create a consistent visual
language that works within the entire communication system. Graphics
attract attention. When the user sees a screen, the eye automatically
jumps to a visual, regardless of the fact that it may interrupt
reading. Graphics should conform to all elements on a page.
Unharmonious graphics impede comprehension by increasing the effort
needed to understand the relationship between the text and the visual.

FIGURE 17.29. Thirty centuries of development separate the
ancient Chinese characters on the left from the modern writing
on the right. The meaning of the characters is (from top to
bottom): sun, mountain, tree, middle, field, frontier, door.

Consider both function and style. It is important to weigh function
against decoration. Wonderful as it might be to see an artistically
illustrated tax form, is it appropriate to the content, or to the
image of both the message and the organization it represents? The best
graphic is appropriate to the context of the communication and
reinforces and validates the message (Fig. 17.30).

Focus  on  quality  vs.  quantity.  Graphics  are  only  effec¬ 
tive  if  they  are  carefully  planned,  executed,  and  used  sparingly. 
A  well-considered  diagram  with  a  concise  caption  is  more  ef¬ 
fective  than  several  poorly  thought-out  diagrams  that  require 
long  explanations. 

Work with a professional. Many individuals within an organization can
write an internal report, but a writer or public relations firm is
usually commissioned to find the most effective, relevant, and
interesting way to communicate a public message such as that presented
in an annual report or company brochure. Similarly, a designer or
visual communications firm should be retained to oversee the
development of user interfaces, graphics, and other visual elements
that impact the look and feel and ultimately the overall success of a
program.

FIGURE 17.30. It is obvious which of the above examples
communicates an important message most quickly. The goal for the
designer is to communicate the message in the most direct way,
so that the user can understand and make decisions based on
that information. Obviously some situations are more critical
than others, but it is no less important to begin design with
consideration of the needs of users.

Build  a  library  to  create  visual  consistency,  organi¬ 
zational  identity,  and  a  streamlined  process.  Because 
graphics  require  a  professional,  they  can  be  very  time  consuming 
and  expensive  to  create.  Once  a  visual  language  and  style  are  es¬ 
tablished,  start  building  a  graphics  library.  If  concepts  are  re¬ 
peatedly  illustrated,  streamline  the  development  process  by  col¬ 
lecting  the  supporting  illustrations  and  making  them  available 
for  reuse.  An  organizational  style  can  be  created  for  these  visual 
explanations.  Through  repetition,  users  can  learn  to  associate  a 
style  and  method  of  explanation  with  an  organization,  which  aids 
in  understanding  and  reinforces  product  brand  and  identity. 

Reinforce  shared  meaning  (common  visual  language). 

A  serious  issue  to  consider  when  creating  graphics,  particularly 
conceptual  diagrams,  is  shared  meaning,  whether  it  be  across  an 
organization  or  the  globe.  Individuals  can  interpret  the  same  dia¬ 
gram  in  a  variety  of  ways  based  on  backgrounds  and  experiences. 

Truly  effective  graphics  require  extra  time  and  effort,  but  the 
payoff  is  tremendous.  Graphics  are  invaluable  tools  for  promot¬ 
ing  additional  learning  and  action  because  they  (a)  reinforce  the 
message,  (b)  increase  information  retention,  and,  (c)  shorten 
comprehension  time. 


COLOR 


Though  color  should  be  considered  a  reinforcing,  or  redundant, 
visual  cue,  it  is  by  far  the  most  strongly  emotional  element  in 
visual  communication.  Color  evokes  immediate  and  forceful  re¬ 
sponses,  both  emotional  and  informational.  Because  color  is  a 
shared  human  experience,  it  is  symbolic.  And  like  fashion,  the 
perception  of  color  changes  over  time.  In  all  communication, 
color  can  be  used  to  trigger  certain  reactions  or  define  a  style. 
For  example,  in  Western  business  culture,  dark  colors  such  as 
navy  are  generally  considered  conservative,  while  paler  colors 
such  as  pink  are  regarded  as  feminine.  In  other  cultures,  these 
color  choices  have  an  entirely  different  meaning. 

The appropriate use of color can make it easier for users to absorb
large amounts of information and differentiate information types and
hierarchies. Research on the effects of color in advertising shows
that ads using one spot of color are noticed 200% more often than
black-and-white ads, while full-color ads produce a 500% increase in
interest. Color is often used to:

•  show  qualitative  differences 

•  act  as  a  guide  through  information 

•  attract  attention/highlight  key  data 

•  indicate  quantitative  changes 

•  depict  physical  objects  accurately 

All in all, color is an immensely powerful tool. Like the tools of
typography and page design, it can easily be misused. Research shows
that while one color, well used, can increase communication
effectiveness, speed, accuracy, and retention, multiple colors, when
poorly used, actually decrease effectiveness. Because it is readily
available, it is very tempting to apply color in superficial ways. For
color to be effective, it should be used as an integral part of the
design program, to reinforce meaning and not simply as decoration. The
choice of color, while ultimately based on individual choice, should
follow and reinforce content as well as function.


Basic  Principles  of  Color 

Additive  primaries.  The  entire  spectrum  of  light  is 
made  up  of  red,  green,  and  blue  light,  each  representing  a  third 
of  the  spectrum.  These  three  colors  are  known  as  “additive  pri¬ 
maries,”  and  all  colors  are  made  up  of  varying  amounts  of  them. 
When  all  three  are  combined,  they  produce  white  light. 

Subtractive  primaries.  If  you  add  and  subtract  the 
three  primaries,  cyan,  yellow,  and  magenta  are  produced.  These 
are  called  “subtractive  primaries.” 

Green + Blue - Red = Cyan
Red + Blue - Green = Magenta
Red + Green - Blue = Yellow
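
The relationships above can be checked numerically. The following is a minimal sketch that assumes 8-bit RGB triples (0 to 255 per channel) as a stand-in for light intensities; the helper names are illustrative and not part of any standard.

```python
# Additive primaries as 8-bit RGB triples (an assumed representation).
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
WHITE = (255, 255, 255)


def add_light(*colors):
    """Additive mixing: sum each channel, clipped to the 8-bit maximum."""
    return tuple(min(255, sum(channel)) for channel in zip(*colors))


def remove_light(base, removed):
    """Take one light away from another, channel by channel."""
    return tuple(max(0, b - r) for b, r in zip(base, removed))


cyan = add_light(GREEN, BLUE)
magenta = add_light(RED, BLUE)
yellow = add_light(RED, GREEN)

print("white  :", add_light(RED, GREEN, BLUE))
print("cyan   :", cyan, "= white without red:", remove_light(WHITE, RED))
print("magenta:", magenta)
print("yellow :", yellow)
```

Each subtractive primary is what remains of white light once one additive primary is removed, which is why the two sets mirror each other.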

Color  on  a  computer  display  is  created  by  using  different 
combinations  of  red,  green,  and  blue  light.  In  print,  colors  are 
created  with  pigments  rather  than  light.  All  pigments  are  made 
up of varying amounts of the subtractive primaries. The three
attributes of color are:

1.  hue — the  actual  color 

2.  saturation — the  intensity  of  the  color 

3.  value — includes  lightness  and  brightness: 

“Lightness”  refers  to  how  light  or  dark  a  color  appears.  “Bright¬ 
ness”  is  often  used  interchangeably  with  lightness;  however,  light¬ 
ness  depends  on  the  color  of  the  object  itself,  and  brightness  de¬ 
pends  on  the  amount  of  light  illuminating  the  object. 
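
To make the three attributes concrete, the sketch below converts a few RGB colors to hue, saturation, and value with Python's standard `colorsys` module. Treating the HSV model as equivalent to the attributes described above is an assumption; HSV's "value" is only a rough proxy for lightness and brightness, and the `describe` helper is illustrative.

```python
import colorsys


def describe(name, rgb):
    """Print and return the hue, saturation, and value of an RGB color.

    rgb holds floats in the range 0.0 to 1.0, as colorsys expects.
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    print(f"{name}: hue={h * 360:.0f} deg, saturation={s:.2f}, value={v:.2f}")
    return h, s, v


describe("pure red", (1.0, 0.0, 0.0))
describe("pale red", (1.0, 0.5, 0.5))   # same hue, lower saturation
describe("dark red", (0.5, 0.0, 0.0))   # same hue, lower value
```

All three colors share one hue; only the saturation or value changes, which is what distinguishes a pale or dark variant from the pure color.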

How  to  Use  Color 

Less is more . . . useful and understandable. Just as you can
overload a page or screen with too many typefaces, you can have too
many colors. Given the unpredictability of color displays, users, and
viewing situations, the choice can get very complicated. Color is
often best used to highlight key information. As a general rule, use
no more than three colors for primary information. An example is the
use of black, red, and gray: black and red for contrasting
information, gray for secondary. When thinking about color online, one
must remember that each display will output color in a different way.
Add to that the lighting situation and a variety of users. All these
factors affect color choice.

Create a color logic; use color coding. Use a color scheme that
reinforces the hierarchy of information. Don’t miscue the audience by
using different colors for the same elements. Whenever possible, try
to use colors that work with the project identity or established
visual language. Create a color code that is easily understood by the
user and reinforces the information.

Create  a  palette  of  compatible  colors.  Harmonious 
color  is  created  by  using  a  monochromatic  color  scheme  or  by 
using  differing  intensities  of  the  same  hue.  However,  make 
them  different  enough  to  be  easily  recognized  and  simple 
enough  to  be  easily  reproduced,  no  matter  what  medium  you 
are  using. 
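
One way to sketch such a palette is to hold the hue fixed and vary only the saturation. The example below uses Python's standard `colorsys` module; the particular hue, saturation steps, and value are arbitrary assumptions, and the function name is illustrative.

```python
import colorsys


def monochrome_palette(hue, saturations=(1.0, 0.6, 0.3), value=0.9):
    """Return hex colors that share one hue but differ in saturation.

    hue, saturations, and value are all in the range 0.0 to 1.0.
    """
    palette = []
    for s in saturations:
        r, g, b = colorsys.hsv_to_rgb(hue, s, value)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


# Three intensities of a single blue hue.
print(monochrome_palette(0.6))
```

Keeping the saturation steps well apart is what makes the colors easy to tell from one another; steps that are too close may collapse into the same color on a poorly calibrated display.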

Use  complementary  colors  with  extreme  caution. 

These  are  colors  that  lie  opposite  each  other  on  the  color 
wheel.  Let  one  dominate  and  use  the  other  for  accents.  Never 
place them next to each other because the edges where they meet will
vibrate.  Though  this  was  the  goal  of  pop  art  in  the  1960s,  it 
makes  pages  impossible  to  read.  One  must  check  each  particu¬ 
lar  display,  as  the  calibration  of  monitors  can  unexpectedly 
cause  this  to  happen. 
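
A complement can be approximated by rotating a color's hue halfway around the wheel. The sketch below assumes the HSV hue wheel of Python's standard `colorsys` module, on which red sits opposite cyan; a traditional painter's wheel pairs colors differently (red with green, for example), so the result is illustrative only.

```python
import colorsys


def complement(rgb):
    """Return the color opposite rgb on the HSV hue wheel.

    rgb holds floats in the range 0.0 to 1.0. Saturation and value are
    kept; only the hue is rotated by half a turn.
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + 0.5) % 1.0, s, v)


red = (1.0, 0.0, 0.0)
print("complement of red:", complement(red))
```

A tool like this can flag pairs that should not be set directly against each other, but the vibration itself still has to be judged on the actual display.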

Decisions  regarding  color  in  typography  are  critical. 

Colored  type  appears  smaller  to  the  human  eye  than  the  same 
type  in  black.  This  is  important  to  consider  when  designing  user 
interfaces.  One  must  also  consider  the  “smear”  effect  on  typog¬ 
raphy  in  displays,  based  on  the  color  chosen  and  interaction 
with  colors  around  it.  Additionally,  quality  and  calibration  of  dis¬ 
plays  impact  characteristics  of  color  online. 

Consider  the  viewing  medium.  The  same  color  looks 
different  when  viewed  on  different  viewing  media  such  as  a 
computer  display,  an  LCD  projector,  color  laser  printer  versus 
dot-matrix  output,  glossy  versus  dull  paper. 

Context is everything. Though printed color is very familiar and more
controllable, projected color is inconsistent and varies depending on
such things as lighting, size of the color area, size and quantity of
colored elements, and output device. One must check all
output/viewing possibilities to ensure that a color is readable as
well as legible, and not depend on cross-media specification to ensure
consistency. What might look good on a laptop may not be readable when
projected in a room for hundreds of people to view, and may look
completely different when printed in a corporate brochure. The amount
of color will affect how it is viewed as well as the best background
choice. A blue headline is very readable on a white background, but
if that background becomes a color, then readability can be reduced
dramatically, depending on how it gets presented on each particular
display.

Contrast  is  critical  when  making  color  choices. 

Contrast is the range of tones between the darkest and the lightest
elements, whether one is considering black and white or color. What is
being “read” (this includes graphics, photographs, etc.) must be
clearly and easily differentiated from the background it is presented
against. If there is not enough contrast (of color, size, resolution,
etc.), it will be difficult or impossible to read. This is
particularly a problem with online displays, as the designer has no
control over the quality of the output display.
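
The text does not name a contrast metric. As one possible way to quantify contrast between a foreground and a background, the sketch below uses the relative-luminance contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG 2.x); adopting that formula here is an assumption, and it measures tonal contrast only, not size or resolution.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color, per the WCAG 2.x formula."""
    def linear(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors: 1 (none) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


BLUE, WHITE, TEAL = (0, 0, 255), (255, 255, 255), (0, 128, 128)
print("blue on white:", round(contrast_ratio(BLUE, WHITE), 2))
print("blue on teal :", round(contrast_ratio(BLUE, TEAL), 2))
```

The same blue headline scores far lower against a colored background than against white. A computed ratio is still only a starting point, since the actual display, lighting, and viewer remain outside the designer's control.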


Quantity  affects  perception.  A  small  amount  of  color 
will  be  perceived  differently  than  the  same  color  used  in  a  large 
quantity.  In  the  smaller  area,  the  color  will  appear  darker;  in  the 
larger  area,  the  color  will  appear  lighter  and  brighter. 

Use  color  as  a  redundant  cue  when  possible.  At  least 
9%  of  the  population,  mostly  male,  is  color-deficient  to  some 
degree,  so  it  is  generally  not  a  good  idea  to  call  out  warning 
points  only  through  color.  With  a  combination  of  color  and  a  dif¬ 
ferent  typeface,  etc.,  you  won’t  leave  anyone  in  the  dark. 

We  live  in  a  global  world,  so  when  in  Rome.  .  .  Re¬ 
member  that  different  colors  have  different  connotations  within 
various  cultures,  religions,  professions,  etc.  For  example,  in  the 
United States on February 14th, red means love, but in Korea,
red  means  death,  and  in  China,  red  is  used  in  weddings  and 
symbolizes  good  luck  and  fortune.  In  many  other  countries,  red 
means  revolution.  To  a  competitor,  red  means  first  place,  and 
to  an  accountant,  red  means  a  negative  balance.  To  a  motorist, 
red  means  stop,  and  in  emergencies,  a  red  cross  means  med¬ 
ical  help  (Fig.  17.31). 


CREATING  A  SYSTEM:  GRAPHIC  STANDARDS 


With  the  explosion  of  new  publishing  media  in  a  global  market¬ 
place,  the  need  for  guidelines  for  developing  and  producing 
consistent,  quality  communication  has  taken  on  a  new  urgency. 
New  technologies  make  it  easy  to  generate  images,  offering  a 
wealth  of  options  for  experienced  and  inexperienced  publishers 
alike.  The  danger  lies  in  creating  visual  chaos,  with  every  element 
demanding  attention  beyond  the  point  of  sensory  overload. 
With  new  tools,  chaos  can  happen  faster,  at  a  lower  cost,  and 
with  greater  distribution.  A  graphic  standards  manual  prevents 
this  confusion. 

A graphic standards manual is the physical manifestation of
an  identity  system.  The  design  historian  and  educator,  Philip 
Meggs  (1998)  writes  in  A  History  of  Graphic  Design  that  identity 
systems  arose  in  the  1950s  with  the  rise  of  multinational  corpo¬ 
rations  that  began  to  recognize  the  value  and  power  found  in 
presenting  a  cohesive  visual  image  globally.  A  quality-control 
agent,  a  standards  manual  allows  an  organization  to  document 
guidelines  and  provide  tools  for  organizing  and  structuring 
communications  and  reinforcing  brand  identity  to  diverse  in¬ 
ternal  and  external  audiences.  It  explains  the  methodology  be¬ 
hind  the  design,  specifies  written  and  visual  language,  produc¬ 
tion  materials,  and  methods  and  gives  examples  of  how  to  and 
not  to  use  the  identity  system  so  that  standards  can  be  imple¬ 
mented  by  different  people,  in  different  places,  at  different 
times.  A  standards  manual  supports  expansion  by  explaining 
how  to  maintain  a  consistent  brand  and  organizational  look  and 
feel  as  new  products,  features,  and  technology  are  introduced. 
A graphic standards system provides:

•  Built-in quality: The system ensures that the correct
organization/product image is communicated to all audiences.
Standards promote consistency in handling information across product
lines, divisions, projects, etc.

•  Control over resources: A system provides dramatic managerial
control over resources that use time, money, and materials.
Well-developed standards build in flexibility. New communications are
easily developed without the original designer and in many instances
are developed within the organization itself.

•  Streamlined development process: A graphic standards system helps
structure thinking for content, design, and production by providing a
guideline of predetermined solutions for communication problems.
Typical problems are solved in advance or the first time they occur.
Most importantly, a graphic standards system encourages the
organization as a whole to progress to higher-level issues of
communication effectiveness.

FIGURE 17.31. Trying Examples in Your Context. Since color is not
available in this particular edition, try your own experiment. Take a
look at this illustration, and recreate a paragraph of text, with the
background graded from 100% to 0%, choosing one color for the
background. Then set lines of type in a variety of typefaces and
sizes, to see where it becomes legible or totally impossible to read.
The important thing to remember is to test out whatever choices you
make within the particular context and parameters, including
viewing/projecting devices. Such things as lighting, projection
distance, and users’ physiological constraints can make all the
difference as to whether something can be read or not.

What does a system cover? Graphic standards historically have been
applied to an organization’s logo, stationery, business cards, and
other printed materials. As the online portion of an organization’s
identity dominates, providing for cross-media guidelines is even more
critical. Graphic standards are generally communicated to the
organization in print and electronic form. Documentation often
includes:

•  Corporate  identity  manuals:  Style  guides  available  in  print 
and  online  illustrate  the  application  of  the  standards  across 
the  company’s  publications  and  provide  specifications  for 
production  and  expansion. 

•  Templates  and  guidelines:  Templates  and  guidelines  are  avail¬ 
able  in  paper  and  electronic  form  and  are  used  to  develop 
pages  for  both  environments. 

•  Editorial  style  guides:  Editorial  guides  determine  the  use  of 
product/service  names,  punctuation,  spelling,  and  writing 
styles. 

Developing  the  system.  When  developing  a  corporate 
graphics  standards  system,  consider  the  global  publishing  needs 
of  the  company,  the  resources  available  for  producing  docu¬ 
ments,  and  the  skill  level  of  those  in  charge  of  production.  To  re¬ 
sponsibly  determine  overall  needs,  a  team  effort  is  required. 
Personnel  from  areas  such  as  information  systems,  graphic  de¬ 
sign,  usability,  and  marketing  along  with  engineers,  writers,  and 
users  should  be  involved  in  the  process.  This  team  approach 
helps  build  support  for,  and  commitment  to,  the  corporate 
standards.  The  development  of  a  comprehensive  system  follows 
the  information  design  process  of  audit,  development  and 
implementation. 

Audit. The audit is a critical step in determining the scope and
parameters of an organization’s corporate graphic standards. Specific
questions for the audit phase include:

•  What  is  the  purpose? 

•  Who  are  the  audiences? 

•  What  are  the  differences  and  similarities  between  audiences? 

•  Who  will  do  the  work? 

•  How  long  will  it  take? 

•  What  tools  will  be  used? 

•  What  is  the  desired  company  or  product  image? 


Development.  Goals  for  the  development  phase  include: 
(a) the design of standards that are easy to read and use and that project
a consistent corporate image; and (b) the design of products that
fit within the production parameters of the company.

Implementation.  The  implementation  phase  must  en¬ 
sure  that  the  system  is  accepted  and  used  properly.  This  re¬ 
quires  training  and  support,  easy  procedures  for  distributing 
and  updating  materials,  and  a  manual  explaining  how  to  use 
the  system.  The  development  of  standards  is  in  itself  an  educa¬ 
tional  process.  It  requires  all  participants  to  be  aware  of  com¬ 
munication  objectives  and  what  is  needed  to  meet  them.  As  al¬ 
ternatives  are  developed  and  tested,  management  has  the 
opportunity  to  evaluate  its  company’s  purpose,  nature,  and  di¬ 
rection  as  well  as  its  working  methods  and  communication  pro¬ 
cedures.  The  process  requires  commitment  and  involvement 
across  many  departments  and  levels.  The  result  is  an  empow¬ 
ering  of  the  organization — planting  the  seeds  for  growth  and  in¬ 
creased  effectiveness. 


DESIGNING  THE  EXPERIENCE 


The  heart  of  interface  design  is  to  define  and  create  the  user’s 
experience: what it is really like for people facing a monitor or
using a cell phone or an ATM. Though presentation possibilities
are  expanding  day  by  day,  our  capacity  to  understand,  use,  and 
integrate  new  information  and  technology  has  not  grown  at  the 
same  rate.  Making  the  most  appropriate  media  choices, 
whether  image,  animation,  or  sound,  to  explain  complex  ideas 
to  widely  varied  audiences  is  no  easy  task.  The  most  important 
guideline  is  to  understand  that  there  are  no  rules,  only  guide¬ 
lines.  It  is  a  generalization  to  say  that  a  visual  principle  works  a 
certain way because any change in context changes the application
of the principle. For example, in the early days of the software
industry, research showed that a specific blue worked well
as  a  background  color.  Now,  however,  depending  on  monitor 
calibration,  as  well  as  environmental  lighting,  that  blue  could 
be  a  disaster.  In  fact,  that  blue  can  often  vibrate  if  type  in  par¬ 
ticular  colors  is  placed  on  it.  Of  course,  it  depends  on  the  type 
quantity,  size,  weight,  viewing  situation,  etc.  Sound  complex? 
For  this  reason,  it’s  important  to  understand  the  principles,  test 
the ideas, and then test results on every output device that will
be  used.  Putting  known  guidelines  together  with  experience 
continually  gathered  from  the  field  allows  the  designer  to  de¬ 
velop  a  clear  understanding  of  what  works  well  in  a  well-defined 
environment  and  user  situation.  The  next  key  guideline  is  to 
keep  it  simple.  Although  many  tools  are  available,  there  is  only 
one goal: to clearly communicate ideas. The designer must always
ask, “What is the most efficient and effective way to communicate
this idea?” A good illustration might work better (and
take  less  bandwidth)  than  an  animated  sequence.  Text  set  in  a 
simple  bold  headline  might  allow  the  user  to  read  the  page 
more  efficiently  than  text  placed  in  a  banner  moving  across  the 
top  of  a  page  that  constantly  draws  the  eye  upward.  Animated 
icons  are  entertaining,  but  are  they  appropriate  or  necessary  for 
serious  financial  information?  It’s  tempting  to  use  new  tools. 
The  best  tip  is  to  use  a  tool  only  if  it  can  explain  an  idea  better 
than any other tool, enhance an explanation, or illustrate a point
that  otherwise  could  not  be  made  as  effectively  or  efficiently. 
The  best  design  is  not  noticed;  it  just  works.  Products  are  used 
to  accomplish  tasks,  not  to  draw  attention  to  the  design.  The 
best  test  of  product  success  is  the  ease  with  which  a  user  can 
understand  and  complete  a  task  and  move  on  to  the  next  task. 
Real  estate  and  online  real  estate  are  alike  in  that  they  both 
stress  location,  location,  location!  With  such  a  premium  on 
space,  and  so  much  to  accomplish  in  so  little  time,  be  consid¬ 
erate  and  efficient  with  online  real  estate. 

Use  the  elements  found  in  a  product’s  graphic  standards  ap¬ 
propriately.  Constantly  consider  choices  and  context  and  review 
design  principles.  The  following  are  issues  and  considerations 
to  continually  evaluate  when  presenting  interactive  information. 

Effective  and  appropriate  use  of  the  medium.  Tran¬ 
sitioning  a  print  document  to  an  online  environment  requires 
rethinking  how  the  document  is  presented.  Viewing  and  navi¬ 
gating  through  online  information  requires  radically  different 
design  considerations  and  methods.  Users  do  not  necessarily 
view the information in a linear way, in a specific order, or in a specific
timeframe. Interactive media viewed on computer screens have
quite  different  characteristics  and  potential,  particularly  as  in¬ 
formation  crosses  platforms,  resolutions,  and  environments. 
The  rich  medium  of  print  allows  a  book — a  product — to  be 
held,  viewed,  and  read  in  a  sequence  determined  by  the  user. 
The  physicality  of  a  book  provides  sensory  cues  that  are  not 
present  on  a  two-dimensional  monitor.  Interface  designers  must 
find  ways  to  provide  equivalent  cues  that  encourage  people  to 
handle  products  comfortably  and  with  confidence. 

The  element  of  time.  The  element  of  time  is  the  critical 
difference  between  static  and  interactive  media.  The  sense  of  in¬ 
teraction  with  a  product  impacts  the  user’s  perception  of  useful¬ 
ness  and  quality.  Animated  cues  such  as  blinking  cursors  and  other 
implied  structural  elements  like  handles  around  selected  areas  be¬ 
come  powerful  navigational  tools  if  intuitively  understood  and 
predictably  applied.  Consider  how  the  product  will  be  used.  Will 
the  user  sit  down  and  calmly  use  the  product  or  will  he  or  she 
panic  and  fumble  with  a  keypad?  Will  the  task  be  completed  at  one 
time  or  at  intervals  over  hours,  days,  months,  and  years?  The  ele¬ 
ment  of  time  contributes  to  the  design  criteria  and  choices. 


CONSISTENT  AND  APPROPRIATE 
VISUAL  LANGUAGE 


A  major  issue  is  the  unpredictability  and  vastness  of  products. 

Way-finding devices that are easy to recognize, understand,
and remember include:

•  clear  and  obvious  metaphors 

•  interface  elements  consistent  with  the  visual  style  of  other 
program  parts,  including  consistent  style  for  illustrations, 
icons,  graphic  elements,  and  dingbats. 

•  guidelines  for  navigational  aids  such  as  color,  typography,  and 
page/screen  structure  that  are  consistent  with  other  parts  of 
product  support. 


Navigational  Aids 

Progress  through  a  book  is  seen  and  marked  in  many  ways. 
Bookmarks  and  turned  corners  serve  as  placeholders.  Pens  act 
as mnemonic devices highlighting or underlining text. Tables of
contents and indexes reveal content location. A finger marks a
passage  to  be  shared  with  a  colleague  as  a  book  is  cradled. 

Unlike  a  book,  a  digital  document  or  program  cannot  be 
seen  or  touched  in  its  entirety.  If  a  document  cannot  be  held, 
how  is  specific  location  known  in  relation  to  overall  location? 
How  do  users  return  to  or  move  forward  through  content?  How 
do  users  travel  through  unfamiliar  space? 

Navigational  aids  provide  users  with  highways,  maps,  road 
signs,  and  landmarks  as  they  move  through  the  online  landscape. 
They enhance discovery and communicate the underlying
structure, thereby providing a sense of place so that users know
where  they  are,  where  they  have  been,  and  how  to  move  else¬ 
where  or  return  to  the  beginning.  Using  or  building  on  already  fa¬ 
miliar  visual  elements,  such  as  those  found  in  other  products  and 
earlier  releases,  leverages  existing  knowledge.  Graphic  standards 
support  this  as  well.  Consistent  use  of  page  layout  and  grid  struc¬ 
ture  makes  it  easier  to  remember  how  information  is  organized 
and  where  it  is  placed  or  zoned.  This  ensures  that  whatever  visual 
cues  are  applied  can  take  advantage  of  the  user’s  experience,  ul¬ 
timately  saving  time  for  both  the  designer  and  user. 
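
As one illustration of the sense of place described above, the following Python sketch records where a user is, where the user has been, and how to return to the beginning. The class name, method names, and page names are hypothetical and chosen only for this example; a real product would typically rely on the navigation facilities of its platform.

```python
class NavigationTrail:
    """Tracks where a user is, where they have been, and the way home."""

    def __init__(self, home: str) -> None:
        self.home = home
        self.visited = [home]  # pages in the order they were opened

    def go_to(self, page: str) -> None:
        """Record a move to a new page."""
        self.visited.append(page)

    def current(self) -> str:
        """Return the page the user is on now."""
        return self.visited[-1]

    def back(self) -> str:
        """Step back one page, but never past the starting page."""
        if len(self.visited) > 1:
            self.visited.pop()
        return self.current()

    def return_home(self) -> str:
        """Discard the trail and return to the beginning."""
        self.visited = [self.home]
        return self.current()

    def breadcrumb(self) -> str:
        """Render the trail as a single line of text."""
        return " > ".join(self.visited)


trail = NavigationTrail("Home")
trail.go_to("Products")
trail.go_to("Printers")
print(trail.breadcrumb())
```

A breadcrumb line built this way is one familiar means of showing the user's location in relation to the whole.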

Graphics/Icons

Visual representations such as site maps, graphics, and icons are
effective  devices  for  orienting  users  within  a  program.  A  site 
map  offers  an  overall  product  view  and  shows  sections  or  units 
of  information  and  how  these  units  are  related.  In  the  digital  en¬ 
vironment,  graphics  and  icons  assume  the  role  of  contents,  in¬ 
dexes,  and  page  numbers  and  can  be  more  effective  guides 
through  and  around  a  program  than  their  counterparts  in  print, 
because  tools  such  as  roll-overs  highlight  functionality.  Creat¬ 
ing  effective  graphics  and  icons  requires  that  intent  and  action 
are  defined  and  designed. 

Metaphor 

Prior  knowledge  makes  it  easier  to  learn  because  it  provides  a 
conceptual  framework  on  which  information  can  be  associated 
and expanded. When it was first introduced, the metaphor of a
desktop  with  a  filing  system  for  a  software  interface  that  orga¬ 
nized  data  in  a  program  was  easy  to  grasp  because  it  built  on  a 
known  experience.  Using  familiar  visual  analogies  helps  users 
easily  understand  and  organize  new  information. 

Color 

Color  is  a  free  and  very  seductive  design  tool  once  a  monitor  is 
purchased.  Use  it  intelligently.  A  monitor  offers  limited  work¬ 
space.  Color  can  replace  or  reinforce  written  explanations  when 
meaning is assigned to it and it is applied methodically. For example, a blue
background might always be used in a testing section and a yellow
background in a section overview.
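
A minimal Python sketch of this kind of methodical assignment follows. The section types, the hex values, and the function name are assumptions made for illustration; the point is only that the meaning of each color is defined once and applied everywhere.

```python
# Hypothetical section types and colors, chosen only for illustration.
SECTION_BACKGROUNDS = {
    "testing": "#0000FF",   # blue for every testing section
    "overview": "#FFFF00",  # yellow for every section overview
}
DEFAULT_BACKGROUND = "#FFFFFF"  # used when no meaning has been assigned


def background_for(section_type: str) -> str:
    """Return the background color assigned to a type of section."""
    return SECTION_BACKGROUNDS.get(section_type, DEFAULT_BACKGROUND)


print(background_for("testing"))
print(background_for("glossary"))
```

Keeping the assignments in one table makes it harder for a color to acquire two meanings by accident.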

Legibility

Legibility  is  the  ability  to  read  information  on  the  page.  The 
page  can  be  a  screen,  and  as  such,  has  special  considerations. 
Color,  size,  background,  movement,  viewing  environment,  light¬ 
ing,  and  resolution  play  a  critical  part  in  legibility. 

Readability 

Readable  screens  demand  intelligent  visual  representations  and 
concise,  unambiguous  text.  Meaning  can  be  implied  or  inferred 
by  the  placement  of  elements  in  designated  areas  or  zones  re¬ 
served  for  distinct  information.  This  makes  optimal  use  of  a  lim¬ 
ited  space  and  increases  comprehension  and  accessibility. 

Guidelines 

Use  the  analogy  of  a  poster  as  a  guide  to  design. 

A  home  page  is  the  equivalent  of  an  attention-grabbing  poster 
unpredictably  placed  in  uncontrollable  locations.  Unlike  a  home 
page, where a mouse-click on a speaker’s name can give biographical
information and a click on location can find directions,
a  poster’s  static  format  restricts  the  amount  and  depth  of  infor¬ 
mation  that  it  can  offer. 

In  print,  information  is  presented  in  a  fixed  order.  On  the 
web,  information  is  organized  hierarchically  in  a  manner  that  is 
radically  different.  Individual  users  can  access  the  information 
offered  on  a  website  in  a  sequence  that  suits  their  intents  and 
purposes.  Online  environments  offer  little  regulation  over  how 
and  in  what  order  the  product  is  accessed.  While  designers  can 
make  suggestions  and  guesses,  this  lack  of  control  requires 
fundamental  differences  in  information  presentation.  A  well- 
designed  home  page,  like  a  well-designed  poster,  should  hint 
at  all  topics  contained  in  the  site,  provide  high-level  information 
about  these  topics,  and  suggest  easy  paths  to  access  this  infor¬ 
mation.  If  information  goes  beyond  a  single  screen,  its  design 
must  visually  communicate  location  through  strong  visual  hints, 
so  that  the  user  investigates  beyond  what  is  immediately  visible. 
Imagine  the  design  considerations  required  for  smaller,  hand¬ 
held,  voice-activated  devices. 

Design  for  the  most  difficult  common  denominator. 

Design  the  interface  in  anticipation  of  a  worst-case  scenario. 
If  a  manager  will  use  a  product  in  a  quiet  office  with  a  fast  con¬ 
nection,  perfect  lighting,  and  a  large  monitor,  then  the  prob¬ 
lem  is  different  from  that  of  a  contractor  accessing  critical 
information  on  a  laptop  in  the  field.  User  profile  is  often  un¬ 
known  because  new  technologies  define  new  categories  as 
new  opportunities  are  recognized.  Consider  the  breadth  of 
possibilities. Design from the user’s perspective. Testing, viewing,
and questioning can make the difference between product
acceptance and rejection.

Avoid  overuse  of  saturated  colors.  Saturated  colors 
such  as  red  tend  to  jump  out  at  the  viewer  and  can  be  distract¬ 
ing  and  irritating.  Red  is  usually  not  a  good  choice  for  large  areas 
of on-screen color. High impact is dependent on the contrast
between background and foreground colors. On a black background,
both yellow and white have a higher impact than red.
Consider  variations  in  every  viewing  situation  including  how 
these  variations  affect  contrast  among  page  elements  and  over¬ 
all  legibility  and  readability. 
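
The relationship between contrast and impact can be checked numerically. The sketch below uses the relative luminance and contrast ratio formulas from the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which are not among this chapter's sources and are brought in here only as one widely used measure. The function and constant names are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Weighted sum of the linearized red, green, and blue channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Ratio from 1 (no contrast) to 21 (black against white)."""
    lum_fg = relative_luminance(foreground)
    lum_bg = relative_luminance(background)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
YELLOW, RED = (255, 255, 0), (255, 0, 0)

for name, color in (("white", WHITE), ("yellow", YELLOW), ("red", RED)):
    print(name, "on black:", round(contrast_ratio(color, BLACK), 1))
```

Under this measure, white and yellow on black score roughly 21:1 and 19.6:1, while pure red on black scores roughly 5.3:1, which agrees with the observation above. A calculated ratio does not replace testing on the actual output devices and in the actual viewing environment.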

Consider  different  users’  levels  of  skill.  Navigational 
tools  should  be  simple  enough  for  a  novice  to  use,  but  should 
not  impede  an  expert.  Detailed  visual  maps  and  other  graphics 
should  be  available  for  those  who  need  them,  without  hindering 
an  experienced  user  who  wants  to  bypass  an  explanation. 

Be  aware  of  the  fatigue  factor.  Although  there  is  no 
definitive  answer  on  fatigue  caused  by  looking  at  a  computer 
screen  for  long  periods  of  time,  it  is  a  central  factor  to  consider. 
According  to  Color  and  the  Computer,  by  H.  John  Durrett,  look¬ 
ing  at  a  well-designed  computer  screen  should  not  cause  any 
more  fatigue  than  reading  a  book  or  writing  a  report.  Though 
some  would  disagree  with  this  statement,  many  people  spend 
more  time  with  their  computer  than  a  book  and  no  doubt  could 
offer  additional  opinions  on  this  subject.  As  interactive  media 
becomes  a  commodity,  the  focus  will  not  be  on  what  a  program 
does,  but  on  how  it  does  it.  This  will  make  the  difference  be¬ 
tween  product  acceptance  and  product  failure.  Success  or  fail¬ 
ure  will  be  judged  by  the  ease  with  which  a  product  is  used  and 
how easy users perceive its interface to be.

Other differences to consider. There are many differences
that impact how and why interfaces are designed, and
many  of  these  differences  are  discussed  in  more  detail  in  other 
chapters. A designer should never forget that physical and mental
impairments affect a user’s ability to read, comprehend, and use
interfaces.

Use  the  “squint  test”  to  check  the  design.  The  squint 
test is a very simple self-test that checks visual hierarchy. Simply
squint at the page so that details are out of focus. What is the
first, most dominant element on the page? Is this what should
be seen first? What has secondary importance? Cognitive psychology
calls this “visual queuing.” Successful interaction design
creates  a  visual  order  that  the  user  can  easily  follow. 


CHALLENGES  AND  OPPORTUNITIES 


Creating  Your  Own  Guidelines 

Interactive  communication  designers  face  great  challenges. 
How  can  products  that  are  seen,  read,  understood,  and  acted 
upon  be  created?  Given  increasing  variety  and  complexity,  how 
can the power of new technologies be harnessed? How can informed
visual choices be made? WARNING: No book, seminar,
or  technology  will  turn  someone  into  a  professional  designer! 
Design  is  not  a  craft  dependent  upon  aesthetic  ability.  Design  re¬ 
quires  education,  training,  and  experience  in  a  variety  of  related 
disciplines  equivalent  to  that  of  an  architect,  engineer,  surgeon, 
or  cabinetmaker.  The  following  guidelines  are  offered  as  starting 
points, first steps in understanding how to make informed design
decisions: design that provides the best, most thoughtful,
and appropriate integration of both form and function.

•  There  are  no  universal  rules,  only  guidelines.  If  there 
were  rules,  everything  would  look  the  same  and  work  per¬ 
fectly  according  to  these  rules.  Each  situation  is  different  with 
its  own  context  and  parameters. 

•  Remember  the  audience:  be  a  user  advocate.  Think 
about  audience  needs  first  throughout  the  development 
process.  Who  is  in  the  audience?  What  are  their  require¬ 
ments?  How  and  where  will  the  audience  use  the  product? 
The  evaluation  criteria  used  in  the  design-development 
process spring from the answers to these and other questions.
Designers must understand and advocate for the user.

•  Structure  the  messages.  Analyze  content  to  create  a  clear 
visual  hierarchy  of  major  and  minor  elements  that  reflects  the 
information  hierarchy.  This  visual  layering  of  information 
helps  the  user  focus  on  context  and  priorities. 

•  Test  the  reading  sequence.  Apply  the  squint  test.  How  does 
the  eye  travel  across  the  page,  screen,  or  publishing  medium? 
What  is  seen  first,  second,  and  third?  Does  this  sequence  sup¬ 
port  the  objectives  and  priorities  as  defined  in  the  audit? 

•  Form  follows  function.  Be  clear  about  the  user  and  use  en¬ 
vironment  first.  An  effective  interface  design  represents  and 
reinforces  these  goals. 

•  Keep  things  simple.  Remember  the  objective  is  to  com¬ 
municate  a  message  efficiently  and  effectively,  so  that  users 
can perform a task. Fewer words, type styles, and graphic elements
mean less visual noise and greater comprehension.
An  obvious  metaphor  enhances  intuitive  understanding  and 
use.  The  goal  is  to  transfer  information,  not  show  off  features 
or  graphics. 

• People don’t have time to read. Write clearly and concisely.
Design information in an economical, accessible, intuitive
format that is enhanced by a combination of graphics
and  typography.  Graphics,  if  well  thought  out  and  designed 
as  an  integral  part  of  the  page,  are  very  powerful  and  can  ef¬ 
ficiently  and  effectively  provide  explanations  while  saving 
space  on  a  page. 

• Be consistent. Consistent use of type, page structure, and
graphic and navigational elements creates a visual language
that  decreases  the  amount  of  effort  it  takes  to  read  and  un¬ 
derstand  a  communication  piece.  The  goal  is  to  create  a  user 
experience  that  seems  effortless  and  enjoyable  throughout. 

•  Start  the  design  process  early.  Don’t  wait.  Assemble  the 
development  team  of  designers,  usability  professionals,  en¬ 
gineers,  researchers,  writers,  and  user  advocates  at  the  be¬ 
ginning  of  the  process.  With  interactive  media,  the  traditional 
review  and  production  process  will  change.  The  process  is 
less of a handoff and more of a team effort; it’s more like
making  a  film  than  writing  a  book.  Successfully  applying  the 
principles of good design enables an organization to communicate
more effectively with its audiences and customers,
improving the worth of its products and services and adding
value  to  its  brand  and  identity. 

•  Good  design  is  not  about  good  luck.  Good  design  for  us¬ 
able  interfaces  appropriately  applies  the  fundamentals  of  vi¬ 
sual  design  to  interactive  products.  Creating  the  most  use¬ 
ful,  successful  design  for  an  interactive  product  is  difficult. 
The  design  process  is  iterative  and  experiential.  There  are 
usually  several  possible  ways  to  solve  a  problem,  and  the  final 
design  decision  is  dictated  by  the  best  choices  that  work 
within  the  parameters  at  any  particular  time.  Advocate  on  be¬ 
half  of  the  user.  Users  are  why  designers  are  here  and  have 
this  work  to  do.  Users  are  everywhere,  often  in  places  not 
yet  imagined.  As  the  world  grows  smaller  and  becomes  even 
more  connected,  the  opportunity  lies  in  where  and  what  has 
not  been  discovered. 

User interfaces (UIs) for successful products and services enable
users  around  the  world  to  access  complex  data  and  functions. 
Solutions  to  global  user  interface  design  consist  of  partially  uni¬ 
versal  and  partially  local  solutions  to  the  design  of  metaphors, 
mental  models,  navigation,  appearance,  and  interaction.  The 
UI  development  process  must  account  for  many  complex  local¬ 
ization  issues,  in  particular,  the  user  group’s  culture.  Culture 
dimensions, i.e., those proposed by anthropologists and other
analysts  of  culture,  can  provide  insight  into  how  designers 
can/should  adjust  UIs  to  better  serve  users.  By  managing  users’ 
experiences  of  familiar  structures  and  processes,  the  UI  de¬ 
signer  can  achieve  compelling  solutions  that  enable  the  UI  to  be 
more  useful  and  appealing.  Users  across  the  globe  will  be  more 
productive  and  engaged  with  the  products  and  services. 

The  concept  of  User  Interfaces  for  All,  as  set  forth  by 
Stephanidis  (2000),  implied  the  availability  of  and  easy  access 
to  computer-based  products  and  services  among  all  peoples  in 
all  countries  worldwide.  Successful  computer-based  products 
and  services  developed  for  users  in  different  countries  and 
among  different  cultures  (even  within  one  country)  consist  of 
partially  universal,  general  solutions  and  partially  unique,  local 
solutions  to  the  design  of  UIs.  Global  enterprises  seek  to  mass- 
distribute  products  and  services  with  minimal  changes  to 
achieve  cost-efficient  production,  maintenance,  distribution, 
and  user  support.  Nevertheless,  it  is  becoming  increasingly  im¬ 
portant,  technically  viable,  and  economically  necessary  to  pro¬ 
duce  localized  versions  for  certain  markets.  UIs  must  be  de¬ 
signed  for  specific  user  groups,  not  merely  translated  and  given 
a  superficial  “local”  appearance  for  quick  export  to  different 
markets. 

Insufficient  attention  to  localization  can  lead  to  embarrassing 
and  sometimes  critical  miscommunication.  For  example,  in  Chi¬ 
nese,  Coca-Cola  means  “bite  the  wax  tadpole”  or  “female  horse 
stuffed  with  wax,”  depending  on  the  dialect,  which  caused  the 
company  to  change  its  name  to  a  phonetic  equivalent,  which 
means  “happiness  in  the  mouth.”  Similarly,  Pepsi’s  slogan 
“Come alive with Pepsi” becomes “Pepsi brings your ancestors
back from the grave” (Hendrix, 2001). Differences of culture
can  lead  to  significant  business  implications.  For  example,  Saudi 
Arabia’s  Higher  Committee  for  Scientific  Research  and  Islamic 
Law  banned  Pokemon  video  games  because  they  “possessed  the 
minds”  of  Saudi  children,  thus  closing  off  one  of  the  Middle 
East’s largest markets to the Japanese Nintendo’s multibillion-dollar
enterprise (Associated Press).

By  contrast,  attention  to  localization  of  language  leads  to 
greater  comprehension,  which  can  lead  to  a  drop  in  customer- 
service  costs  when  instructions  are  displayed  in  a  user’s  native 
language.  Moreover,  localization  can  lead  to  greater  attention 
and retention on the part of viewers/customers. This implication
is especially significant for Web-based communication,
where  Forrester  (1998)  reported  that  visitors  remain  twice  as 
long  reviewing  local-language  sites  as  they  do  English-only  sites, 
and  business  users  are  three  times  more  likely  to  buy  when 
communication  is  in  their  own  language.  Developers  in  the  Eu¬ 
ropean  Union  are  faced  with  the  daunting  but  unavoidable  chal¬ 
lenge  of  providing  websites  in  more  than  a  dozen  languages  to 
succeed  in  national  markets. 

By  managing  the  user’s  experience  with  familiar  structures 
and processes, surprise at novel approaches, and preferences/expectations,
the UI designer can achieve compelling forms
that enable the UI to be more usable, useful, and appealing. Usable
is defined by the International Organization for Standardization (ISO) in
Switzerland as effective, efficient, and satisfying. Globalization
of product distribution requires a strategy and tactics
for  the  design  process  that  enables  efficient  product  develop¬ 
ment,  marketing,  distribution,  and  maintenance.  Globalization 
of UI design, whose content and form are so dependent
upon visible languages and effective communication, improves
the  likelihood  that  users  will  be  more  productive  and  engaged 
with  computer-based  products  and  services  in  many  different 
locations  globally. 

From the designer’s perspective, two primary objectives are to
(a)  provide  a  consistent  UI,  and,  more  generally,  user  experi¬ 
ence  across  all  appropriate  products  and  services,  and  (b)  de¬ 
sign  products  and  services  with  their  necessary  support  systems 
that  are  appropriately  internationalized  (prepared  for  localiza¬ 
tion)  and  localized,  i.e.,  designed  for  specific  markets.  Before 
discussing globalization, localization, and culture issues, we briefly
review the essential concepts of UI design.

Demographics,  experience,  education,  and  roles  in  organi¬ 
zations  of  work  or  leisure  characterize  users.  Their  individual 
needs  and  wants,  hence  their  goals,  as  well  as  their  group  roles 
define  their  tasks.  User-centered,  task-oriented  design  meth¬ 
ods  account  for  these  aspects  and  facilitate  the  attainment  of 
effective  UI  designs  that  acknowledge  and  respect  users. 

UIs  conceptually  consist  of  metaphors,  mental  models,  nav¬ 
igation,  interaction,  and  appearance,  which  may  be  defined  as 
follows  (Marcus,  1995,  1998): 

• Metaphors: essential concepts conveyed through words and
images, or through acoustic or tactile means. Metaphors
concern  both  over-arching  concepts  that  characterize  inter¬ 
action,  as  well  as  individual  items,  like  the  “trash  can”  stand¬ 
ing  for  “deletion”  within  the  “desktop”  metaphor. 

• Mental models: organization of data, functions, tasks, roles,
and  people  in  groups  at  work  or  play.  The  term,  similar  to, 
but  distinct  from  cognitive  models,  task  models,  user  models, 
etc.,  is  intended  to  convey  the  organization  observed  in  the 
UI  itself,  which  is  presumably  learned  and  understood  by 
users  and  which  reflects  the  content  to  be  conveyed,  as  well 
as  the  user  tasks. 

• Navigation: movement through mental models, afforded by
windows,  menus,  dialogue  areas,  control  panels,  etc.  The 
term  implies  dialogue  and  process,  as  opposed  to  structure, 
i.e.,  sequences  of  content  potentially  accessed  by  users,  as 
opposed  to  the  static  structure  of  that  content. 

• Interaction: the means by which users communicate input to
the  system  and  the  feedback  supplied  by  the  system.  The 
term  implies  all  aspects  of  command-control  devices  (e.g., 
keyboards,  mice,  joysticks,  microphones),  as  well  as  sensory 
feedback  (e.g.,  changes  of  state  of  virtual  graphical  buttons, 
auditory  displays,  and  tactile  surfaces). 

•  Appearance:  verbal,  visual,  acoustic,  and  tactile  perceptual 
characteristics  of  displays.  The  term  implies  all  aspects  of  vis¬ 
ible,  acoustic,  and  haptic  languages  (e.g.,  typography  or 
color;  musical  timbre  or  cultural  accent  within  a  spoken  lan¬ 
guage;  and  surface  texture  or  resistance  to  force). 

It is important to bear in mind that localization concerns go
well  beyond  only  language  translation.  They  may  affect  each 
component  of  a  UI:  from  choices  of  metaphorical  references, 
hierarchies  in  the  mental  model,  and  navigation  complexity,  to 
choices  of  input  techniques,  graphics,  colors,  sounds/voice/mu¬ 
sic,  and  use  of  vibration. 

This  chapter  discusses  the  development  of  UIs  intended  for 
users  in  many  different  countries  with  different  cultures  and  lan¬ 
guages.  The  text  presents  a  survey  of  important  issues,  as  well  as 
recommended  steps  in  the  development  of  UIs  for  an  interna¬ 
tional  and/or  intercultural  user  population  and  products/services 
intended  for  most  platforms,  including  desktop,  Web,  mobile  de¬ 
vices,  and  vehicles.  With  the  rise  of  the  Internet  and  application- 
oriented  websites,  the  challenge  of  designing  good  UIs  be¬ 
comes  an  immediate,  practical  matter,  not  only  a  theoretical 
issue. This topic is discussed from a user perspective, not a technology
or code perspective. The chapter will (a) introduce fundamental
definitions of globalization in UI design; (b) demonstrate
why globalization is vital to the success of computer-based communication
products; (c) introduce culture dimensions; (d) show
their effect on one particular platform, desktop Web designs;
(e) suggest alternate culture dimensions; and (f) recommend other
issues that might relate to culture and user experience.


GLOBALIZATION 


Definitions of Globalization, Internationalization, Localization

“Globalization”  refers  to  the  entire  process  of  preparing  products 
or  services  for  worldwide  production  and  consumption  and  in¬ 
cludes  issues  at  international,  intercultural,  and  local  scales.  In  an 
information-oriented  society,  globalization  affects  most  computer- 
mediated  communication,  which,  in  turn,  affects  UI  design.  The 
discussion  that  follows  refers  particularly  to  UI  design. 

“Internationalization”  refers  to  the  process  of  preparing  code 
that  separates  the  localizable  data  and  resources  (that  is,  items 
that  pertain  to  language  and  culture  needed  for  input  and  out¬ 
put)  from  the  primary  functionality  of  the  software.  This  sepa¬ 
ration  may  include  the  ability  for  the  UI  to  work  on  different 
platforms  in  one  or  more  geographic  regions  where  technical, 
financial, political, and legal matters may affect the designs. Software
created in this way does not need to be rewritten or recompiled
for each local market. International issues refer to geographic,
political, linguistic, and typographic issues of nations
or groups of nations. The International Organization for Standardization's
(ISO) draft human factors standards in Europe
for color legibility of cathode-ray tube (CRT) devices
(International Organization for Standardization, 1989) are an example
of an effort to establish international standards for some
parts of a UI. Other examples are the legal requirement for bilingual
English and French displays in Canada, and the quasi-legal
denominations for currency, time, and physical measurements,
which differ from country to country (Table 18.1).
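
The separation that internationalization calls for, with localizable resources kept apart from primary functionality, can be illustrated with a minimal sketch. The catalog contents, locale tags, and the lookup function below are hypothetical and chosen only for illustration; in a real product the catalogs would typically live in separate resource files handed to translators.

```python
# Hypothetical message catalogs keyed by locale tag. Keeping them apart from
# the program logic means a new market needs new data, not new code.
CATALOGS = {
    "en-US": {"greeting": "Welcome", "cancel": "Cancel"},
    "fr-CA": {"greeting": "Bienvenue", "cancel": "Annuler"},
    "de-DE": {"greeting": "Willkommen"},  # deliberately incomplete
}
DEFAULT_LOCALE = "en-US"


def lookup(key: str, locale: str) -> str:
    """Return the localized string, falling back to the default locale."""
    catalog = CATALOGS.get(locale, {})
    if key in catalog:
        return catalog[key]
    return CATALOGS[DEFAULT_LOCALE][key]


if __name__ == "__main__":
    for locale in ("fr-CA", "de-DE", "ja-JP"):
        print(locale, lookup("greeting", locale), lookup("cancel", locale))
```

The fallback to a default locale is one possible design choice; some teams prefer to fail loudly on a missing string so that untranslated text is caught before release.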

Intercultural  issues  refer  to  the  religious,  historical,  linguis¬ 
tic,  aesthetic,  and  other,  more  humanistic  issues  of  particular 
groups  or  peoples,  sometimes  crossing  national  boundaries.  Ex¬ 
amples  (Table  18.2)  include  calendars  that  acknowledge  vari¬ 
ous  religious  time  cycles;  terminology  for  color,  type,  and  signs 
reflecting  various  popular  cultures;  and  organization  of  content 
in  Web  search  criteria  reflecting  cultural  preferences.  Some¬ 
times  these  issues  reach  national  political  importance,  such  as 
the  debate  in  the  redesigned  society  of  Iraq  during  2005  over 
whether  Thursday  and  Friday  (the  Moslem  Sabbath)  should 
constitute  the  “weekend”  rather  than  Saturday  (the  Jewish  Sab¬ 
bath)  or  Sunday  (the  Christian  Sabbath)  (Kuhn,  2005). 

“Localization”  refers  to  the  process  of  customizing  (includ¬ 
ing  language  translation  but  potentially  other  changes,  also,  such 
as  content  hierarchies,  graphics,  colors,  and  icons/symbols)  the 
data  and  resources  of  code  needed  for  a  specific  market.  Trans¬ 
lation  is  accomplished  manually  by  in-house  staff  or  one  or  more 
outside  contractors  performing  that  service.  Translation  can  also 
be  accomplished  semi-automatically  using  software  provided  by 
third-party  firms,  such  as  Systran  (www.Systran.com).  Localiza¬ 
tion  can  take  place  for  specific,  small-scale  communities,  often 
with  unified  language  and  culture,  and  usually  at  a  scale  smaller 
than  countries,  or  significant  cross-national  ethnic  “regions.”  Ex¬ 
amples  include  affinity  groups  (e.g.,  French  “twenty-some¬ 
things,”  or  U.S.  Saturn  automobile  owners),  business  or  social  or¬ 
ganizations  (e.g.,  German  staff  of  DaimlerChrysler  or  Japanese 
golf club members), and specific intranational groups (e.g., India’s
untouchables, Swedish househusbands, or young Japanese
professional women who are rejecting marriage as a life pattern).
Used informally, localization may apply to “corporate
cultures”  or  other  groups  that  may  be  geographically  dispersed. 
With  the  spread  of  Web  access,  the  term  “localization”  may  be 
used  more  often  to  refer  to  groups  of  shared  interests  that  may 
be  geographically  dispersed.  Note:  this  broad  definition  of  “cul¬ 
ture”  is  not  accepted  by  all  theorists.  For  example,  see  Clausen 
(2000),  as  reported  by  Yardley  (2000).  However,  for  the  purposes 
of  this  chapter,  this  broad  definition  is  used. 


TABLE 18.1. Examples of Differing Displays for Currency, Time, and Physical Measurements

Item               USA Examples                  European Examples                             Asian Examples
Currency           $1,234.00 (US dollars)        DM1.234 (German marks)                        ¥1,234 (Japanese yen)
Time Measures      8:00 P.M., August 24, 1999    20:00, 24 August 1999 (England)               20:00, 1999.08.24, or Imperial Heisei 11,
                   8:00 P.M., 8/24/99            20:00, 24.08.99 (Germany, traditional)        or H11 (Japan)
                                                 20:00, 1999-08-24 (ISO 8601 Euro standard)
Physical Measures  3 lb, 14 oz                   3.54 kg, 8.32 m (England)                     3.54 kg, 8.32 m in Roman or Katakana chars.
                   3' 10", 3 feet and 10 inches  3,54 kg, 8,32 m (Euro standard)
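
As a rough illustration of the time rows in Table 18.1, the following sketch renders one moment under three of the listed conventions. The function name and the convention labels ("us", "de-traditional", "iso-8601") are assumptions made for this example; production software would normally rely on a maintained locale database rather than hand-written rules.

```python
from datetime import datetime


def format_timestamp(moment: datetime, convention: str) -> str:
    """Render a moment under a few of the conventions shown in Table 18.1."""
    if convention == "us":
        # 12-hour clock, no leading zeros, month before day
        hour12 = moment.hour % 12 or 12
        suffix = "P.M." if moment.hour >= 12 else "A.M."
        return (f"{hour12}:{moment:%M} {suffix}, "
                f"{moment.month}/{moment.day}/{moment:%y}")
    if convention == "de-traditional":
        # 24-hour clock, day.month.year with periods
        return f"{moment:%H:%M}, {moment:%d.%m.%y}"
    if convention == "iso-8601":
        # 24-hour clock, year-month-day
        return f"{moment:%H:%M}, {moment:%Y-%m-%d}"
    raise ValueError(f"unknown convention: {convention}")


if __name__ == "__main__":
    moment = datetime(1999, 8, 24, 20, 0)
    for name in ("us", "de-traditional", "iso-8601"):
        print(name, "->", format_timestamp(moment, name))
```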
TABLE 18.2. Examples of Differing Displays for Other Data Formats (Partly from Aykin)

Item               USA Examples                      European Examples                                 Asian Examples
Numerics           1,234.56 (also Can., China, UK)   1 234,56 (Finland, Fr., Lux., Portugal, Sweden)   1,234.56
                                                     1.234,56 (Albania, Arg., Denmark, Greece, Neth.)
                                                     1'234.56 (Switz: Ger., Ital.)
                                                     1'234,56 (Switz: French)
Telephone Numbers  1-234-567-8901, ext. 23           1234 56 78 90 (Austria)                           +81-53-478-1481 (Japan)
                   1.234.567.8901                    (123) 4 5 6 78 90 (Germany)                       82 2 3142 1100 (Korea)
                   (123) 456-7890                    (12) 3456 789 (Italy)                             +82-(0)2-535-3893 (Korea)
                                                     +46(0)12 345 67                                   86 12 34567890 (China)
                                                     +49 (1234) 5678-9 (Switz)
Address Formats    Title, First Name, MI, Last Name  Paternal Name, Maternal Name, First Name          Family Name, First Name
                   Department                        Company, Department                               Department
                   Company                           Street, Number                                    Company
                   Number, Street,                   City, District/Region                             Number, Street, Neighborhood, District
                   City, State, Zip-Code             Zip-Code, Country                                 Zip-Code, City
                   Country                           (Order may vary from country to country)          (Japan)
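
The numeric conventions in Table 18.2 differ only in the grouping and decimal separators, which suggests a small table-driven formatter. The sketch below is illustrative: the locale-style keys and the function name are labels chosen for this example, and the plain apostrophe stands in for the Swiss grouping mark.

```python
# Separator pairs (grouping, decimal) transcribed from Table 18.2.
# The keys are hypothetical labels, not identifiers from the chapter.
SEPARATORS = {
    "en-US": (",", "."),   # 1,234.56
    "fr-FR": (" ", ","),   # 1 234,56
    "da-DK": (".", ","),   # 1.234,56
    "de-CH": ("'", "."),   # 1'234.56
    "fr-CH": ("'", ","),   # 1'234,56
}


def format_number(value: float, convention: str) -> str:
    """Format a number with two decimals using the given separator pair."""
    group_sep, decimal_sep = SEPARATORS[convention]
    # Swap to neutral placeholders first so the two replacements cannot collide.
    neutral = f"{value:,.2f}".replace(",", "G").replace(".", "D")
    return neutral.replace("G", group_sep).replace("D", decimal_sep)


if __name__ == "__main__":
    for name in SEPARATORS:
        print(name, format_number(1234.56, name))
```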

TABLE 18.3. Examples of Differing Cultural References

Item               N. America/Europe Example          Middle-Eastern Example        Asian Example
Sacred colors      White, blue, gold, scarlet         Green, light blue             Saffron yellow
                   (Judeo-Christian)                  (Islam)                       (Buddhism)
Reading direction  Left to right                      Right to left                 Top to bottom

Item               USA                                France, Germany               Japan
Web search         “Culture” doesn’t imply political  “Culture” implies political   “Culture” implies tea ceremony
                   discussions                        discussions                   discussions
Sports references  Baseball, football, basketball     Soccer                        Sumo wrestling, baseball
                   golf is a sport                                                  golf is a religion

Localization changes may need to consider any or all of the
following:

Address formats
Alphabetic sequence and nomenclature
Arithmetic operations symbolism
Business standards (quotes, tariffs, contracts, agreement terms, etc.)
Calendar
Character handling
Colors
Content categories
Date and time formats
Documentation nomenclature and formats
Electrical and electronic plug formats and nomenclature
Energy formats
Environmental standards (“green” compliance, low energy, low pollution, etc.)
File formats
Font nomenclature, sizes, faces, and byte formats
Frequency (e.g., gigahertz)
Hyphenation and syllabification
Icons and symbols
Intellectual property (protection via patents, copyrights, trademarks)
Keyboard formats
Language differences
Legal processes
Licensing standards
Measurement units (length, volume, weight, electricity, energy, temperature, etc.)
Monetary or currency formats
Multilingual usage
Name formats
Negative formats
Numeric formats and number symbols
Packaging
Paper formats
Punctuation symbols and usage
Reading/writing direction
Sorting sequences
Style formats
Telephone/fax, temperature formats
Text length
Video recording and playback formats
Voltage/amperage units and formats
Weight formats
Table 18.4 demonstrates the complexities, even among
English-language users. Preparing texts in local languages may
require use of additional or different characters. The ASCII system,
which uses seven bits to represent characters (eight in its extended
variants), supports English, and the ISO 8859-1 character set supports
Western European languages, including Spanish, French, and
German. Other character encoding systems include EBCDIC,
Shift-JIS, UTF-8, and UTF-16. ISO has established specific character
sets for scripts and languages such as Cyrillic, Modern Greek, Hebrew,
and Japanese. The Unicode system (ISO/IEC 10646) (Graham,
2000) originally used 16 bits to represent 65,536 characters; it has
since been extended beyond that limit. It is sufficient to display
Asian languages, like Japanese and Korean, and
permits easier translation and presentation of character sets.
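
The practical difference between these encoding systems can be seen by asking how many bytes each needs for a given character, and whether it can represent the character at all. The sketch below uses the codec names from Python's standard library ("ascii", "latin-1" for ISO 8859-1, "shift_jis", "utf-8", "utf-16-be"); the sample characters and the helper name are choices made for this illustration.

```python
# Sample characters: Latin letter, accented letter, kanji, and an emoji
# that lies outside the original 16-bit Unicode range.
SAMPLES = ["A", "é", "日", "\U0001F600"]
ENCODINGS = ["ascii", "latin-1", "shift_jis", "utf-8", "utf-16-be"]


def encoded_length(text: str, encoding: str):
    """Return the number of bytes needed, or None if the text is unsupported."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for ch in SAMPLES:
        sizes = {enc: encoded_length(ch, enc) for enc in ENCODINGS}
        # ascii() escapes the character so the output prints on any terminal
        print(ascii(ch), sizes)
```

A result of None marks a character the encoding cannot represent, which is the situation a multilingual UI has to avoid by choosing a sufficiently broad encoding.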

Advantages  and  Disadvantages  of  Globalization 

The business justification for globalization of UIs is complex but
compelling. Clear business reasons can drive decisions to localize
content on websites or for Web/mobile applications: research
shows that Web-site visitors stay twice as long at local-language
sites as they do at English-only sites, business users are three
times more likely to buy when the site is in their own language,
and contacts with customer service decline when instructions are
shown in the users’ native languages (Forrester, 1998).

If  the  content  (functions  and  data)  is  likely  to  be  of  value  to 
target  populations  outside  of  the  original  market,  it  is  usually 
worthwhile  to  plan  for  international  and  intercultural  factors  in 
developing  a  product  or  service,  so  that  it  may  be  customized  ef¬ 
ficiently,  e.g.,  having  separate  text  files  that  can  be  translated 
more  easily.  Rarely  can  a  product  achieve  global  acceptance  with 
a  “one-size-fits-all”  solution.  Developing  a  product  for  inter¬ 
national,  intercultural  audiences  usually  involves  more  than 
merely  a  translation  of  verbal  language,  however.  Visible  (or  oth¬ 
erwise  perceptual,  e.g.,  auditory)  language  must  also  be  revised, 
and  other  UI  characteristics  may  need  to  be  altered. 

TABLE 18.4. Comparison of English-Language User Community Conventions

                   United States                  United Kingdom
Dates              Month/Day/Year:                Day/Month/Year:
                   March 17, 2001, 3/17/01        17 March 2001, 17/03/01
Time               12-hour clock, a.m./p.m.       24-hour clock
                   No leading zero (8:32 a.m.)    Leading zero (08:32)
Currency           $189.56, 56¢                   GB£189.56, £189.56, 56p
Spelling           Center                         Centre
                   Color                          Colour
Terminology        Truck                          Lorry
                   Bathroom                       Toilet
Book spine title   Top-down                       Bottom-up

While increasing initial development costs, developing products
or services ready for global use gives rise to potential for
increased international sales. However, for some countries,
monolithic domestic markets may inhibit awareness of, and incentives
for, globalization; e.g., because the United States has
in the past been such a large producer and consumer of software,
it is not surprising that some U.S. manufacturers have targeted
only domestic needs. However, as many U.S. industries,
such as movie making and games development, have discovered,
foreign sales may be a significant portion of total sales or
even larger than domestic sales. In order to penetrate some
markets, the local language may be a nearly absolute requirement,
as in France. Recent reports show that European Union
(EU) countries must provide local variations in order to gain
user acceptance. As an example, English-only portals are a barrier
to EU business success (Vickers, 2000).

Some software products are initiated with international versions
(e.g., typically five to seven languages for global products
originating in the United States), but they are usually released in
sequence because of limited development resources. Other
products are “retrofitted” to suit the needs of a particular country,
language, or culture, as needs or opportunities arise. In
some cases, the latter, ad hoc solution may suffer because of the
lack of original planning for globalization.

Globalization  Development  Process 

The “globalized” UI development process is a sequence of partially
overlapping steps, some of which are partially or completely
iterative:

•  Plan:  Define  the  strategy,  including  the  challenges  or  oppor¬ 
tunities  for  globalization;  establish  objectives  and  tactics;  de¬ 
termine  budget,  schedule,  tasks,  development  team,  and 
other  resources.  Globalization  must  be  specifically  accounted 
for  in  each  item  of  project  planning;  otherwise,  cost  over¬ 
runs,  delays  in  schedule,  and  lack  of  resources  are  likely  to 
occur.  In  most  cases,  business  managers  will  expect  to  see  a 
return-on-investment  (ROI)  analysis  of  the  expected  benefits, 
basis  for  benchmarking,  likely  tools  and  process,  and  met¬ 
rics  to  be  used  in  “proving”  the  results  are  better. 

•  Research:  Investigate  dimensions  of  global  variables  and 
techniques  for  all  subsequent  steps,  e.g.,  techniques  for 
analysis,  criteria  for  evaluation,  media  for  documentation,  etc. 
In  particular,  identify  items  among  data  and  functions  that 
should  be  targets  for  change  and  identify  sources  of  national/ 
cultural/local  reference.  Globalized  user-centered  design 
stresses  the  need  to  adequately  research  users’  wants  and 
needs according to a sufficiently varied spectrum of potential
users, across specific dimensions of differentiation. Recently,
Fortune, a major business publication, noted that Microsoft
hired a significant number of anthropologists to
undertake ethnographic analyses and evaluations in the
course of product development (Murphy, 2005). During the
past few years, many companies have had anthropologists
and ethnographers attend user interface conferences and
anthropology conferences to report on their findings. One
Internet-based discussion group, Anthropologists in Design
(see URL references), regularly tracks inquiries about the best
information resources, tools, educational institutions, and any
case studies related to current job tasks. Local informal gatherings
within some urban environments enable designers
and anthropologists/ethnographers to get together to exchange
experiences.
•  Analyze:  Examine  results  of  challenges  or  opportunities  in
the  prospective  markets,  refine  criteria  for  success  in  solving 
problems  or  exploiting  opportunities  (write  marketing  or 
technical  requirements),  determine  key  criteria  for  usability, 
usefulness,  and  appeal  (i.e.,  the  user  experience),  and  define 
the  design  brief,  or  primary  statement  of  the  design’s  goals. 
At  this  stage,  globalization  targets  should  be  itemized. 

• Design: Visualize alternative ways to satisfy criteria using alternative
prototypes; based on prior or current evaluations,
select  the  design  that  best  satisfies  criteria  for  both  general 
good  UI  design  and  globalization  requirements;  prepare 
documents  that  enable  consistent,  efficient,  precise,  and  ac¬ 
curate  implementation. 

•  Implement:  Build  the  design  to  complete  the  final  product, 
i.e.,  write  code  using  appropriate,  effective,  efficient  tools 
identified  in  planning  and  research  steps. 

• Evaluate: At any stage, review or test results in the marketplace
against defined criteria for success, e.g., conduct focus
groups, test usability on specific functions, and gather sales and
user feedback. Identify and evaluate matches and mismatches,
and then revise the designs. Test prototypes or final
products with international, intercultural, or specific localized
user groups to achieve globalized UI designs. As noted in
recent proceedings of UI conferences such as the Association
for Computing Machinery’s Special Interest Group on Human-
Computer Interaction (ACM/SIGCHI), Usability Professionals
Association (UPA), Human-Computer Interaction International
(HCII), and the International Conference on the
Internationalization of Products and Services (IWIPS), the
techniques of evaluation may need to be adjusted to be successful
in different cultures and geographic circumstances
(Chavan, 2005). Recent publications have focused specifically
on studies of mobile phone users (Lee, Ryu, Smith-Jackson,
Shin, Nussbaum, & Tamioka, 2005), as opposed to Web-site
visitors.

• Document: Record the development history, issues, and decisions
in specifications, guidelines, and recommendation
documents. Honold (1999), for example, notes that German
and Chinese cell-phone users require different strategies for
documentation, as well as training, which are related to cultural
differences predicted by classical models. Even designing
the “ideal” documentation for a particular culture of developers
may require user-centered design processes, e.g., to distinguish
those who prefer paper and text from those who prefer visual,
interactive means with minimal text reading.

•  Maintain:  Determine  which  documents,  customer-response 
services,  and  other  processes  will  require  specialized  multiple 
languages  and  changes  in  media  or  delivery  techniques.  Pre¬ 
pare  appropriate  guidelines  and  templates. 

•  Train:  Determine  which  documents  and  processes  will  require 
multiple  languages,  different  graphics,  pacing,  media,  or  distri¬ 
bution  methods.  Prepare  appropriate  guidelines  and  templates. 

Critical  Aspects  for  Globalization:  General  Guidelines 

Beyond the UI development process steps identified in the previous
section, the following guidelines can assist developers in
preparing a “checklist” for specific tasks. The following recommendations
are grouped under the UI design terms referred to
earlier:


User  Demographics 

•  Identify  national  and  cultural  target  user  populations  and  seg¬ 
ments  within  those  populations,  and  then  identify  possible 
needs  for  differentiation  of  UI  components  and  the  proba¬ 
ble  cost  of  delivering  them. 

•  Identify  potential  savings  in  development  time  through  the 
reuse  of  UI  components  based  on  common  attributes 
among user groups. For example, certain primary (or top-level)
controls in a Web-based, data-retrieval application
might be designed for specific user groups to aid comprehension
and to improve appeal. Lower-level controls, on the
other  hand,  might  be  designed  with  standardized,  unvarying 
form-like  elements. 

•  Consider  legal  issues  in  target  communities,  which  may  in¬ 
volve  issues  of  privacy,  intellectual  property,  spamming,  defa¬ 
mation,  pornography  and  obscenity,  vandalism  (e.g.,  viruses), 
hate  speech,  fraud,  theft,  exploitation  and  abuse  (children, 
environment,  elderly,  etc.),  legal  jurisdiction,  seller/buyer 
protection,  etc. 


Technology 

•  Determine  the  appropriate  media  for  the  appropriate  target 
user  categories,  e.g.,  emphasis  of  sound,  visual,  or  three- 
dimensional  (3D)  tactile  media;  verbal  versus  visual  content; 
etc. 

•  Account  for  international  differences  to  support  platform, 
population  and  software  needs,  including  languages,  scripts, 
fonts,  colors,  file  formats,  etc. 

•  Research  and  provide  appropriate  software  for  code  devel¬ 
opment  and  content  management  systems. 


Metaphors 

•  Determine  optimum  minimum  number  of  concepts,  terms, 
and  primary  images  to  meet  target  user  needs. 

•  Check  for  hidden  miscommunication  and  misunderstanding. 

•  Adjust  metaphorical  images  or  text  to  account  for  national  or 
cultural  differences.  Publications  and  projects  from/for  China 
and  India  have  suggested  metaphors  that  differ  considerably 
from Western stereotypes. For example, Chavan (1994)
stated that Indians relate more easily to bookshelves, books,
chapters, sections, and pages, rather than the desktop, file
folders, and files. The Wukong prototype PDA, developed by
Ericsson for Chinese users in 2002, used metaphors based
on the Chinese business-social concept of Guang-xi, or relationship
maintenance. This approach meant that people,
knowledge, and relationships were more fundamental and
pervasive than folders, files, and applications (Marcus, 2003).
Mental  Models

•  Determine  optimum  minimum  varieties  of  content  orga¬ 
nization. 

• Consider how hierarchies may need to change in detail and
overall in terms of breadth and depth. Choong and Salvendy
(1999) noted that Chinese and North American
users tended to organize the contents of a house in different
ways, and Carroll (1999) noted that if one
group was given the hierarchies of the other, the group had
more difficulty navigating the hierarchy.

Navigation 

•  Determine  need  for  navigation  variations  to  meet  target  user 
requirements,  determine  cost-benefit,  and  revise  as  feasible. 

Interaction 

•  Determine  optimum  minimum  variations  of  input  and  feed¬ 
back.  For  example,  because  of  Web-access  speed  differences 
for  users  in  countries  with  very  slow  access,  it  is  important 
to  provide  text-only  versions,  without  extensive  graphics,  as 
well  as  alternative  text  labels  to  avoid  graphics  that  take  con¬ 
siderable  time  to  appear. 

Appearance 

•  Determine  optimum  minimum  variations  of  visual  and  verbal 
attributes.  Visual  attributes  include  layout,  icons  and  symbols, 
choice  of  graphics,  typography,  color,  and  general  aesthet¬ 
ics.  Verbal  attributes  include  language,  formats,  and  order¬ 
ing  sequences.  For  example,  many  Asian  written  languages, 
such  as  Chinese  and  Japanese,  contain  symbols  with  many 
small  strokes.  This  factor  seems  to  lead  to  an  acceptance  of 
higher  visual  density  of  marks  in  complex  public-information 
displays  than  is  typical  for  Western  countries. 


An  Example  of  Specific  Guidelines:  Appearance 

Guidelines for visual and verbal appearance follow. Details can
be found in Aykin (2004), Del Galdo and Nielsen (1996), Fernandes
(1995), Marcus (1999), and Nielsen (1990).

Layout  and  Orientation 

• Adjust layout of menus, tables, dialogue boxes, and windows
to account for varying reading directions and size of text.
Languages written in Roman script read only left to right, but Asian
languages may read in several directions. For example, Japanese
text may be set horizontally, reading left to right and then down,
or vertically, reading down with columns running right to left,
and Arabic/Hebrew text may include left-to-right Roman text
within right-to-left lines of text.


•  If  areas  where  dialogue  will  appear  use  sentence-like  struc¬ 
ture  with  embedded  data  fields  and/or  controls,  these  areas 
will  need  special  restructuring  to  account  for  language 
changes  that  significantly  alter  sentence  format.  For  example, 
German  sentences  often  have  verbs  at  the  ends  of  sentences, 
while  in  English  and  French  verbs  are  placed  in  the  middle 
of  sentences. 

•  As  appropriate,  change  layout  or  imagery  that  implies  or  re¬ 
quires  a  specific  reading  direction.  Left-to-right  sequencing 
may  be  inappropriate  or  confusing  for  use  with  right-to-left 
reading  scripts  and  languages.  A  Web-site  design  for  Arabia 
On-Line  of  the  late  1990s  featured  left-to-right  English  text  on 
a  home  page  intended  for  Western  business  people  and 
tourists,  but  the  page  layout  was  still  right-to-left,  as  in  Arabic, 
with  primary  links  at  the  far  right,  secondary  and  tertiary  links 
in  the  center  and  left  sections  of  the  screen,  and  small  direc¬ 
tional  arrows  pointing  right  to  left.  The  designers  had  suc¬ 
cumbed  to  Arabic-language  influence  to  the  detriment  of 
English-language  readers. 

•  Check  for  misleading  arrangements  of  images  that  lead  the 
viewer’s  eye  in  directions  inconsistent  with  language  read¬ 
ing  directions. 

• For references to paper and printing, use appropriate printing
formats and sizes. For example, in the United States, standard
office letterhead size is 8.5 × 11 inches, but in Europe
it is 210 mm × 297 mm (A4).
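
The restructuring needed for sentence-like dialogue areas with embedded data fields can be sketched as follows. The idea is to give translators a complete sentence with named placeholders, so that each language can position the fields and the verb where its grammar requires. The templates, language keys, and function names are hypothetical, and the sketch ignores plural forms, which vary by language as well.

```python
# Hypothetical whole-sentence templates with named placeholders.
# The German version places the verb at the end of the sentence.
TEMPLATES = {
    "en": "Copy {count} files to {folder}",
    "de": "{count} Dateien nach {folder} kopieren",
}


def render(language: str, **fields) -> str:
    """Fill the embedded data fields of the sentence for one language."""
    return TEMPLATES[language].format(**fields)


def concatenated(count: int, folder: str) -> str:
    """Anti-pattern: the English word order is frozen into the code."""
    return "Copy " + str(count) + " files to " + folder


if __name__ == "__main__":
    print(render("en", count=3, folder="Archive"))
    print(render("de", count=3, folder="Archiv"))
```

With the concatenated version, no translation of the individual fragments can move the verb to the end, which is why the fragments approach tends to break down outside the source language.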


Icons,  Symbols,  and  Graphics 

•  Avoid  the  use  of  text  elements  and  punctuation  within  icons 
and  symbols  to  minimize  the  need  for  versions  to  account  for 
varying  languages  and  scripts. 

•  Adjust  the  appearance  and  orientation  to  account  for  national 
or  cultural  differences.  For  example,  using  a  mailbox  as  an 
icon  for  e-mail  may  require  different  images  for  different 
countries. 

• Consider using signs or derivatives from international signage
systems developed for safety, mass transit, and communication
(American Institute of Graphic Arts, 1981; Olgyay,
1995; Pierce, 1996). These signs require little or no
translation  and  may  require  minimal  culture-specific  support 
information. 

•  Avoid  puns  and  local,  unique  references  that  will  not  trans¬ 
fer  well.  Note:  many  “universal”  signs  may  be  covered  by  in¬ 
ternational  trademark  and  copyright  use,  e.g.,  Mickey  Mouse 
and  the  “Smiley”  smiling  face. 

• Check for appropriateness and use with caution the following:

Animals
Body parts and positions
Colors
Hand gestures
National emblems, signs
People
Puns, or plays on words
Religious, mythological signs

•  Consider  whether  selection  symbols,  such  as  the  X  or  check 
marks,  convey  correct  distinctions  of  selected  versus  not- 
selected  items.  Some  users  may  interpret  an  X  as  crossing  out 
what  is  not  desired,  not  selection. 
• Be aware that office equipment such as telephones, mailboxes,
folders, and storage devices differ significantly from
nation to nation.

Typography 

•  Consider  character-coding  schemes,  which  differ  dramatically 
for  different  languages.  ASCII  is  used  primarily  for  English, 
but  single-byte  schemes  are  used  for  European  languages, 
and  double-byte  schemes  are  used  for  Asian  languages.  These 
differences,  as  well  as  bidirectional  fonts  for  Hebrew  and  Ara¬ 
bic  display,  make  it  more  challenging  to  support  multilingual 
UIs.  Without  accounting  for  character-coding  schemes,  it  is 
more  difficult  for  users  to  access  content  easily. 

•  Use  fonts  available  for  the  range  of  languages  required. 

•  Consider  whether  special  font  characters  are  required  for 
currency,  physical  measurements,  etc. 

• Use appropriate alphabetic sequence and nomenclature
(e.g., U.S. “zee” versus Canadian/English “zed”).

•  Ensure  appropriate  decimal,  ordinal,  and  currency  number 
usage.  Formats  and  positioning  of  special  symbols  vary  from 
language  to  language. 

• Consider appropriate numeric formats for decimal numbers
and their separators (Aykin, 2000):

1,234.56  Canada, China, United Kingdom, United States
1.234,56  Albania, Argentina, Denmark, Greece, Netherlands
1 234,56  Finland, France, Luxembourg, Portugal, Sweden
1’234.56  Switzerland (German, Italian)
1’234,56  Switzerland (French)

Other  numeric  issues  include  the  following: 

Names  of  characters 

Standards  for  display  of  negative  and  positive  numbers 
Percent  indication 

Use  of  leading  zeros  for  decimal  values  (e.g.,  0.1  or  .1) 

List  separators 

Lucky  and  unlucky  numbers  (i.e.,  lucky  telephone  numbers  in 
Asian  countries  sometimes  sell  for  higher  prices.  Note  that  Chi¬ 
nese  prefer  even  numbers,  but  Japanese  prefer  odd  numbers.) 

• Use appropriate temperature formats, e.g., Fahrenheit, Celsius
(Centigrade), Kelvin.

•  Use  appropriate  typography  and  language  for  calendar,  time 
zone,  and  telephone/fax  references. 

• Consider these date and time issues, among others:

Calendars (Gregorian, Moslem, Jewish, Indian, Chinese,
Japanese, etc. Note that even in India there are three major
Hindu religious calendars, one used by northern Indians,
one by southern Indians, and one shared by both, separate
from a secular calendar.)
Character representation (Hindu-Arabic, Arabic, Chinese,
Roman, etc.)
Clock of 12 or 24 hours
Capitalization rules (e.g., main words in U.S. book titles versus
British first only, and the use of capitals for all nouns in German)
Days considered for start of week and for weekend
Format field separators
Maximum and minimum lengths of date and time
Names and abbreviations for days of week and months (two-,
three-, and multicharacter standards)
Short and long formats for dates and times
Time zone(s) appropriate for a country and their names
Use of a.m. and p.m. character strings
Use of daylight saving time
Use of leading zeros

•  Consider  monetary  format  issues: 

Conventions  for  use  with  Euros  (e.g.,  double  listings) 
Credit/debit-card  formats,  usage  conventions 
Currency  names,  denominations 
Currency  symbols  (local  versus  international  versions) 
Currency  conversion  rates 
Monetary  formats,  symbols,  and  names 
Rules  for  combining  different  monetary  formats:  e.g., 
required  multiple  currency  postings  of  Euro  plus  local  na¬ 
tional  currency 
Validating  monetary  input 

•  Consider  name  and  address  formats: 

Address elements and number of lines
Address line order
Address: street numbers first or last, punctuation (e.g., 4, route de
  Monastere; 1504 South 58th Street; Motza Illit 11)
Address: zip/postal codes: alphanumerics
Address: zip/postal codes: sequence with city, province/state, country
Character sets
Datafield labels (family name versus last name versus surname; first
  name versus given name versus Christian name)
Field labels (last name versus surname, city/town/district/province,
  etc.)
Location and location order (neighborhood, district, city/town,
  state/province)
Name formats (e.g., family name first for Asian names, family names
  in all caps in Asia): number of names, number of last names or
  surnames (maternal versus paternal), prefixes/titles (e.g., German
  double title: Dr. Eng.), suffixes (e.g., Jr.)
Name formats, double family names: Note, for example, that even
  within Spanish-speaking countries, some list double family names
  with maternal first, and others list family names with paternal
  first.
Number of last names/surnames (maternal, paternal)
Number of names (first, middle, initials, etc.)
Prefixes and suffixes
Zip and postal codes (numeric versus alphanumeric, typography, order
  in relation to city, state/province, or country)
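
The three street-line examples given above differ only in where the street number goes and how it is punctuated. A minimal Python sketch, assuming a hypothetical format_street_line helper with three named conventions, makes that explicit:

```python
def format_street_line(number, street, convention):
    """Build one address line; the convention names are made up for this sketch."""
    if convention == "number_first":
        return f"{number} {street}"
    if convention == "number_comma_first":
        return f"{number}, {street}"
    if convention == "number_last":
        return f"{street} {number}"
    raise ValueError(f"unknown convention: {convention}")
```

For example, format_street_line("1504", "South 58th Street", "number_first") yields the U.S.-style line, while the "number_last" convention yields "Motza Illit 11". A real address form would also need per-country rules for line order and postal-code position.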
• Consider telephone, fax, and mobile phone-number formats:

Grouping of digits varies from country to country.
Internal dialing (initial area-code zeros) versus external (without)
Numeric versus alphanumeric (e.g., +1-510-767-2676 versus
  510-POP-CORN)
Number grouping, separators (e.g., (, ), +, -, period, space, etc.)
Use of plus sign for country codes in telephone numbers
Use of parentheses for area codes
Format for multiple sequential numbers of businesses (slash,
  commas, etc.)
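
Two of the telephone items above lend themselves to a short sketch: converting an alphanumeric number to digits, and dropping an internal-dialing zero when a country code is added. The Python below uses the standard telephone keypad letter groups; the function names, and the German-style number in the usage note, are hypothetical.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_numeric(phone):
    """Replace keypad letters with digits; separators are left untouched."""
    return "".join(LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in phone)


def to_international(national_number, country_code, trunk_prefix="0"):
    """Strip separators and an optional leading trunk prefix, then add +code.

    Not every country uses a trunk prefix, so callers pass "" where none applies.
    """
    digits = "".join(ch for ch in national_number if ch.isdigit())
    if trunk_prefix and digits.startswith(trunk_prefix):
        digits = digits[len(trunk_prefix):]
    return f"+{country_code}{digits}"
```

Here to_numeric("510-POP-CORN") gives "510-767-2676", matching the numeric form listed above, and to_international("030 1234567", "49") gives "+49301234567" for a made-up number with an initial area-code zero.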


Color 

Follow  perceptual  guidelines  for  good  color  usage.  For  ex¬ 
ample,  use  warm  colors  for  advancing  elements  and  cool  col¬ 
ors  for  receding  elements;  avoid  requiring  users  to  recall  in 
short-term  memory  more  than  5  ±  2  different  coded  colors. 
Respect  national  and  cultural  variations  in  colors,  where  fea¬ 
sible,  for  the  target  users. 

Follow  appropriate  professional/popular  usage  of  colors, 
color  names,  denotation,  and  connotation. 


Aesthetics 

Respect, where feasible, different aesthetic values among target
users. For example, some cultures have a significant attachment to
wooded natural scenes, cartoon characters, textures, patterns, and
imagery that might be viewed as exotic or inappropriate by other
cultures. The Finnish and the Japanese enjoy natural-scenery imagery
more than some other cultures do. The Chinese, Korean, and Japanese
seem to enjoy cartoon characters appearing in their products and
services more than would seem appropriate for other cultures.

Consider specific culture-dependent attitudes. For example,
Japanese viewers find disembodied body parts, such as eyes
and mouths, unappealing in visual imagery.


Language  and  Verbal  Style 

Consider which languages are appropriate for the target users,
including the possibility of multiple national languages within one
country. For example, English and French within Canada; French,
German, and Italian within Switzerland; French or Dutch in Belgium;
and in Israel: Hebrew, Arabic, French (official), and English
(unofficial). India has more than 20; South Africa has seven, and
English is third among native languages. Note also that some
languages have different dialects, e.g., Mexican, Argentinian, and
Castilian (Spain) Spanish; Parisian, Swiss, and Canadian French.


•  Consider  which  dialects  are  appropriate  within  language 
groupings  and  check  vocabulary  carefully,  e.g.,  for  British  ver¬ 
sus  American  terms  in  English,  Mexican  versus  Spanish  terms 
in  Spanish,  or  Mainland  China  versus  Taiwanese  terms  in 
Mandarin  Chinese. 

• Consider the impact of varying languages on the length and
layout of text. For example, German, French, and English versions
of text generally have increasingly shorter lengths. Some Asian
texts are 50%-80% shorter than English; some non-English
Roman-character prose texts can be 50% to 200% longer. Some labels
can be even longer.

Example (Aykin, 2000):

English: Undo           Dutch: Ongedaan maken
English: Autoscroll     Swedish: Automatisk rullning
English: Preferences    German: Bildschirmeinstellungen
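
To put numbers on the label examples above, the following Python sketch computes how much longer each translated label is than its English source, measured in characters. The LABELS list simply restates the three pairs; the expansion_percent function is a hypothetical helper, and character count is only a proxy for rendered width, which also depends on the typeface.

```python
# The three label pairs from the example above.
LABELS = [
    ("Undo", "Dutch", "Ongedaan maken"),
    ("Autoscroll", "Swedish", "Automatisk rullning"),
    ("Preferences", "German", "Bildschirmeinstellungen"),
]


def expansion_percent(source, translation):
    """Percentage by which the translation is longer than the source."""
    return round((len(translation) - len(source)) / len(source) * 100)


def report(labels):
    """Return one summary line per label pair."""
    return [
        f"{source} -> {language}: +{expansion_percent(source, text)}%"
        for source, language, text in labels
    ]
```

A layout check of this kind, run over all translated strings, is one inexpensive way to find the labels that will not fit the space reserved for the English original.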

• Consider the different alphabetic sorting or ordering sequences
for the varied languages and scripts that may be necessary and
prepare variations that correspond to the alphabets. Note that
different languages may place the same letters in different
locations; for example, an accented A comes after A in French
but after Z in Finnish.
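
The effect of language-specific ordering can be sketched in Python with two sort keys. This is a simplified illustration under stated assumptions, not a full collation algorithm: fold_accents approximates an ordering in which accented letters sort with their base letter, and finnish_key assumes an alphabet that ends with the letters å, ä, and ö after z. The word list in the usage note is made up.

```python
import unicodedata

# Assumed alphabet order for the second key: accented vowels follow z.
FINNISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
FINNISH_RANK = {ch: i for i, ch in enumerate(FINNISH_ALPHABET)}


def fold_accents(word):
    """Sort key that ignores diacritics, so an accented a sorts with a."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def finnish_key(word):
    """Sort key that ranks letters by their position in FINNISH_ALPHABET."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after all listed letters.
    return [FINNISH_RANK.get(ch, len(FINNISH_RANK)) for ch in composed]
```

Sorting the words zeta, äiti, and aamu with key=fold_accents places äiti between aamu and zeta, whereas key=finnish_key places it last. Production software would normally rely on a tested collation library rather than keys like these.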

•  Consider  differences  of  hyphenation,  insertion-point  loca¬ 
tion,  and  emphasis,  e.g.,  use  of  bold,  italic,  quotes,  double 
quotes,  brackets,  etc. 

•  Use  appropriate  abbreviations  for  such  typical  items  as  dates, 
time,  and  physical  measurements. 

• Remember that different countries have different periods of
time for “weekends” and the day on which the week begins.


GLOBALIZATION  CASE  STUDY 


Planet  Sabre 

An early example of globalization is the UI design for Sabre’s
Planet Sabre™, one of the world’s largest private extranets, used
by one third of all travel agents. Sabre contained approximately
42 terabytes of data about airline flights, hotels, and automobile
rentals that enabled almost $2 billion of bookings annually,
receiving up to one billion “hits” per day. The author’s firm
worked closely with Sabre over a period of five years to develop
the Planet Sabre UI (Marcus, 2001).

The UI development process emphasized achieving global solutions
from the beginning of the project. For example, requirements
mentioned allowing for the space needs of multiple languages for
labels in windows and dialogue boxes. Besides supporting English,
Spanish, German, French, Italian, and Portuguese, the UI design
proposed switching icons for primary application modules, so they
would be more gender-, culture-, and nation-appropriate (see
Figs. 18.1 & 18.2).

Figure 18.1 shows the initial screen of Planet SABRE, with icons
representing the primary applications or modules within the system
conveyed through the metaphor of objects on the surface of a
planet. The mail box representing the electronic mail application
depicts an object that users in the USA would recognize
immediately; however, users in many other countries might not
recognize this object, because mail boxes come in very different
physical forms.

FIGURE 18.1. Example of Planet SABRE home screen showing typical
icons for Passenger, Airline Booking, Hotel Rental, Car Rental, and
Email (mail box).

FIGURE 18.2. Example of dialogue box in the Customizer application,
by which users can change the icons to become more locally relevant.

Figure 18.2 shows a prototype version of a Customizer dialogue
box, in which the user can change preferences, e.g., icons, so
that they appear throughout the UI with more recognizable images,
such as the depiction of the passenger.

At  every  major  stage  of  prototyping,  designs  developed  in 
the  United  States  were  taken  to  users  in  international  locations 
for  evaluation.  User  feedback  was  relayed  to  the  development 
team  and  affected  later  decisions  about  all  aspects  of  the  UI 
design. 


CULTURE  DIMENSIONS 


Introduction  to  Culture  Dimensions 

Localization includes considerations of the target-market cultures.
Although these considerations are relevant for client-server
personal-computer software applications and mobile devices, the Web,
in particular, enables global distribution of products and services
through Internet websites, intranets, and extranets. Professional
Web-oriented analysts and designers generally agree that
well-designed UIs improve the performance and appeal of websites,
helping to convert “tourists” or “browsers” to “residents” and
“customers.” In a global economy, user differences may reflect
worldwide culture differences. This section analyzes some of the
needs, wants, preferences, and expectations of different cultures.
This analysis, as an example, uses a cross-cultural theory developed
by Geert Hofstede; however, other competing theories exist, a
discussion of which follows.

As  an  example  of  culture  differences,  consider  the  order  in 
which  one  prefers  to  find  information.  If  one  is  planning  a  trip 
by  train,  is  it  preferable  to  see  the  schedule  information  first  or 
read  about  the  information  provider  and  assess  its  credibility? 
Different  cultures  look  for  different  data  to  make  decisions. 


Culture: An Additional Issue for Globally
Oriented UI Designers

Cultures,  even  within  some  individual  countries,  may  be  very 
different,  e.g.,  French-Canadians  within  Canada,  or  Moslem 
groups  within  European  countries.  As  previously  noted,  sacred 
colors  in  the  Judeo-Christian  West  (e.g.,  red,  blue,  white,  gold) 
are  different  from  Buddhist  saffron  yellow  or  Islamic  green.  Sub¬ 
dued  north-European,  Finnish  designs  for  background  screen 
patterns  (see  Fig.  18.3)  might  not  be  equally  suitable  in  south- 
European,  Mediterranean  climates.  These  differences  reflect 
strong  cultural  values.  Batchelor  (2000),  as  reported  by  Boxer 
(2001),  analyzes  the  history  of  Western  culture’s  “fear”  of  color 
in  a  book  appropriately  titled  Chromophobia.  How  might  these 
cultural  differences  be  understood? 

Many analysts in organizational communication have studied cultures
thoroughly and published classic theories; other authors have
applied these theories to analyze the impact of culture on business
relations and commerce (e.g., see Elashmawi & Harris, 1998; Harris
& Moran, 1993; Lewis, 1991). Culture-oriented works are becoming
better known within the UI design community. Anthropological
theorists include Geert Hofstede (1997), Edward T. Hall (1969),
David Victor (1992), Fons Trompenaars and C. H. Turner (1998), and
Shalom Schwartz (2004), each of whom offers valuable insights into
the challenges of cross-cultural communication. Three recent books
mention “cultural intelligence” or “cultural awareness” in their
titles (Earley, Soon, & Soon, 2003; Kohls, Robert, & Knight, 1994;
Thomas & Inkson, 2004).

Other recent publications not only add to the depth of this
research, but also generate controversy regarding the permanence
and identity of some characterizations of cultures. For example,
Nisbett (2003) asserted that essential thinking patterns seem to be
culturally influenced and are not universal. He described Western
culture’s devotion to objects, logical reasoning, categorization,
and desire to understand situations and events as linear
cause-and-effect sequences, while Eastern minds emphasize
relationships and seem to accept contradiction more readily.

Clausen (2000), as reported by Yardley (2000), argued that the
anthropological term “culture” refers to “the [essentially
inescapable] total structure of life of a particular society” while
many today use culture to refer to (optional) “shared values.” As
described by Yardley (2000, p. B3), Clausen argued that “. . . the
‘culture’ of the Internet has none of the characteristics of a real
culture. It is not a total way of life; it did not evolve among a
distinct people; nobody inherited it or was raised in it; it makes
no moral demands, has no religion at its center, and produces no
art.” Although complex, its rules are procedural. From the
perspective of Internet “dwellers,” some of his statements are
likely to stir debate, as will his assertion that, in terms of the
strict definition, culture no longer exists in the United States.

The  permanence  of  cultural  attributes  is  questioned  further 
in  articles  by  Ono  and  Spindle  (2000)  and  Herskovitz  (2000), 
who  note  the  rise  of  individualism  in  classically  collectivist  Japan 
and  the  acceptance,  as  well  as  the  influence,  of  Japanese  pop- 
cultural  artifacts  (i.e.,  music,  movies,  and  television)  in  Asian  na¬ 
tions  that  were  recently  mortal  enemies  of  Japan. 


FIGURE 18.3. TeamWare Finnish screen patterns.

Some theorists, including Hofstede (1997), have discussed the
relation of culture to economic success and have made differing
assertions. Hofstede felt it is a complex mixture of culture plus
geography and idiosyncratic drifts in technology, i.e., both
culture and creativity. Other theorists argue for solely cultural
determinism, as reported by Stille (2001). Still other students of
media, sociology, and politics link culture to political and social
constructs and discuss the ability of specific groups to manipulate
people’s cultural self images through media (Lamont, 1995; Brain,
2005).

Although many cultural anthropology theorists are potentially
valuable for UI designers, the application of Hofstede’s theories
will demonstrate the value of this body of research for the field
of UI design, using examples from website design. The following
sections introduce Hofstede’s concept of cultural dimensions and
apply these dimensions to analyzing website UIs.

Hofstede's  Dimensions  of  Culture 

During  1978-1983,  the  Dutch  cultural  anthropologist  Geert  Hof¬ 
stede  conducted  detailed  interviews  with  hundreds  of  IBM  em¬ 
ployees  in  53  countries.  Through  standard  statistical  analysis  of 
large  data  sets,  he  was  able  to  determine  patterns  of  similarities 
and  differences  among  their  replies.  From  this  analysis,  he  for¬ 
mulated  his  theory  that  world  cultures  vary  along  consistent,  fun¬ 
damental  dimensions.  Because  his  subjects  were  constrained  to 
one  multinational  corporation’s  worldwide  employees,  and  thus 
to  one  company  culture,  he  ascribed  their  differences  to  the  ef¬ 
fects  of  their  national  cultures.  (One  debated  characteristic  of  his 
approach  is  that  he  maintained  that  each  country  has  just  one 
dominant  culture.)  In  1997,  Hofstede  published  a  version  of  his 
research  in  Cultures  and  Organizations:  Software  of  the  Mind. 
His  focus  was  not  on  defining  culture  as  refinement  of  the  mind 
but  rather  on  essential  patterns  of  thinking,  feeling,  and  acting 
that  are  well  established  during  childhood.  These  cultural  dif¬ 
ferences  manifest  themselves  in  a  culture’s  choices  of  symbols, 
heroes/heroines,  rituals,  and  values. 

Hofstede rated 53 countries on indices including five dimensions
normalized to values (usually) of 0 to 100. His five dimensions
(indices) of culture are:

• Power-distance (PDI)

• Collectivism/individualism (IDV)

•  Femininity/masculinity  (MAS) 

•  Uncertainty  avoidance  (UAI) 

•  Long/short-term  time  orientation  (LTO) 

Each  of  Hofstede’s  dimensions  follows  with  an  explanation 
of  implications  for  UI  (especially  Web  UI)  design,  and  illustra¬ 
tions  of  characteristic  websites.  The  complete  data  for  all  coun¬ 
tries  appears  in  Table  18.5. 
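
For designers who want to work with these indices programmatically, the following Python sketch stores the scores for a handful of countries and compares them. The scores are copied from Table 18.5; the CultureIndices class and the index_gap and rank_by helpers are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CultureIndices:
    pdi: int
    idv: int
    mas: int
    uai: int
    lto: Optional[int] = None  # LTO is not reported for every country


# Scores copied from Table 18.5 for a few of the countries discussed.
INDICES = {
    "Malaysia": CultureIndices(pdi=104, idv=26, mas=50, uai=36),
    "Netherlands": CultureIndices(pdi=38, idv=80, mas=14, uai=53, lto=44),
    "USA": CultureIndices(pdi=40, idv=91, mas=62, uai=46, lto=29),
    "Costa Rica": CultureIndices(pdi=35, idv=15, mas=21, uai=86),
    "Japan": CultureIndices(pdi=54, idv=46, mas=95, uai=92, lto=80),
    "Sweden": CultureIndices(pdi=31, idv=71, mas=5, uai=29, lto=33),
}


def index_gap(country_a, country_b, dimension):
    """Score difference on one dimension, or None if either score is missing."""
    a = getattr(INDICES[country_a], dimension)
    b = getattr(INDICES[country_b], dimension)
    if a is None or b is None:
        return None
    return a - b


def rank_by(dimension):
    """Countries with a score on the dimension, highest score first."""
    scored = [(name, getattr(ix, dimension)) for name, ix in INDICES.items()]
    scored = [(name, score) for name, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For instance, index_gap("Malaysia", "Netherlands", "pdi") returns 66, the difference between the two power-distance scores. Such a gap is only a coarse signal; it suggests where to look for design differences rather than prescribing them.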

Power  Distance 

“Power  distance”  (PD)  refers  to  the  extent  to  which  less  power¬ 
ful  members  expect  and  accept  unequal  power  distribution 
within  a  culture. 


High  PD  countries  tend  to  have  centralized  political  power 
and  exhibit  tall  hierarchies  in  organizations  with  large  differ¬ 
ences  in  salary  and  status.  Subordinates  may  view  the  boss  as  a 
benevolent  dictator  who  expects  them  to  do  as  they  are  told. 
Parents  teach  children  to  be  obedient  and  expect  respect. 
Teachers  are  considered  wise  and  are  esteemed.  Inequalities  are 
expected,  and  even  may  be  desired. 

Low  PD  countries  tend  to  view  subordinates  and  supervi¬ 
sors  more  as  equals  and  more  interchangeable,  with  flatter  hi¬ 
erarchies  in  organizations  and  less  difference  in  salaries  and  sta¬ 
tus.  Parents  and  children,  as  well  as  teachers  and  students,  may 
view  themselves  more  as  equals.  Equality  is  expected  and  gen¬ 
erally  desired. 

Hofstede  noted  that  these  differences  are  hundreds  or  even 
thousands  of  years  old.  He  does  not  believe  they  will  disappear 
quickly  from  traditional  cultures,  even  with  powerful  global 
telecommunication  systems.  Based  on  this  definition,  (high  ver¬ 
sus  low)  power  distance  may  influence  the  following  aspects  of 
UI  design: 

•  Access  to  information:  highly  structured  versus  not  as  highly 
structured. 

•  Hierarchies  in  mental  models:  tall  versus  shallow. 

• Emphasis on the social and moral order (e.g., nationalism or
religion) and its symbols: significant/frequent versus
minor/infrequent use.

•  Focus  on  expertise,  authority,  certifications,  official  logos: 
strong  versus  weak. 

•  Prominence  given  to  leaders  versus  citizens,  customers,  or 
employees. 

•  Importance  of  security,  restrictions,  or  barriers  to  access:  ex¬ 
plicit,  enforced,  frequent  restrictions  on  users  versus  trans¬ 
parent,  integrated,  implicit  freedom  to  roam. 

• Social roles used to organize information (e.g., a managers’
section that is obvious to all but sealed off from nonmanagers):
frequent versus infrequent.

These PD differences are illustrated by university websites from
two countries with very different PD indices (Fig. 18.4 &
Fig. 18.5): the Universiti Utara Malaysia (www.uum.edu.my) in
Malaysia, with a PD index of 104, the highest in Hofstede’s
analysis; and the Technische Universiteit Eindhoven (www.tue.nl)
in the Netherlands, with a PD index of 38.

The  Malaysian  website  features  strong  axial  symmetry,  a  fo¬ 
cus  on  the  official  seal  of  the  university,  photographs  of  fac¬ 
ulty  or  administration  leaders  conferring  degrees,  and  monu¬ 
mental  buildings  in  which  people  play  a  small  role.  A  top-level 
menu  selection  provides  a  detailed  explanation  of  the  sym¬ 
bolism  of  the  official  seal  and  information  about  the  leaders 
of  the  university. 

The  Dutch  website,  on  the  other  hand,  features  an  empha¬ 
sis  on  students  (not  leaders),  an  asymmetric  layout,  and  pho¬ 
tos  of  both  genders  in  illustrations.  This  website  emphasizes  the 
power  of  students  as  consumers  and  equals.  In  similar  Dutch 
university websites, students even have the opportunity to operate
a WebCam to take their own tours.

TABLE 18.5. Indices from Hofstede, Cultures and Organizations: Software of the Mind (Hofstede, 1997).

                   PDI           IDV           MAS           UAI           LTO
Country            Rank   Score  Rank   Score  Rank   Score  Rank   Score  Rank   Score
Arab Countries     7      80     26/27  38     23     53     27     68     -      -
Argentina          35/36  49     22/23  46     20/21  56     10/15  86     -      -
Australia          41     36     2      90     16     61     37     51     15     31
Austria            53     11     18     55     2      79     24/25  70     -      -
Bangladesh         -      -      -      -      -      -      -      -      11     40
Belgium            20     65     8      75     22     54     5/6    94     -      -
Brazil             14     69     26/27  38     27     49     21/22  76     6      65
Canada             39     39     4/5    80     24     52     41/42  48     20     23
Chile              24/25  63     38     23     46     28     10/15  86     -      -
China              -      -      -      -      -      -      -      -      1      118
Colombia           17     67     49     13     11/12  64     20     80     -      -
Costa Rica         42/44  35     46     15     48/49  21     10/15  86     -      -
Denmark            51     18     9      74     50     16     51     23     -      -
East Africa        21/23  64     33/35  27     39     41     36     52     -      -
Ecuador            8/9    78     52     8      13/14  63     28     67     -      -
Finland            46     33     17     63     47     26     31/32  59     -      -
France             15/16  68     10/11  71     35/36  43     10/15  86     -      -
Germany FR         42/44  35     15     67     9/10   66     29     65     14     31
Great Britain      42/44  35     3      89     9/10   66     47/48  35     18     25
Greece             27/28  60     30     35     18/19  57     1      112    -      -
Guatemala          2/3    95     53     6      43     37     3      101    -      -
Hong Kong          15/16  68     37     25     18/19  57     49/50  29     2      96
India              10/11  77     21     48     20/21  56     45     40     7      61
Indonesia          8/9    78     47/48  14     30/31  46     41/42  48     -      -
Iran               29/30  58     24     41     35/36  43     31/32  59     -      -
Ireland (Rep of)   49     28     12     70     7/8    68     47/48  35     -      -
Israel             52     13     19     54     29     47     19     81     -      -
Italy              34     50     7      76     4/5    70     23     75     -      -
Jamaica            37     45     25     39     7/8    68     52     13     -      -
Japan              33     54     22/23  46     1      95     7      92     4      80
Malaysia           1      104    36     26     25/26  50     46     36     -      -
Mexico             5/6    81     32     30     6      69     18     82     -      -
Netherlands        40     38     4/5    80     51     14     35     53     10     44
New Zealand        50     22     6      79     17     58     39/40  49     16     30
Nigeria            -      -      -      -      -      -      -      -      22     16
Norway             47/48  31     13     69     52     8      38     50     -      -
Pakistan           32     55     47/48  14     25/26  50     24/25  70     23     0
Panama             2/3    95     51     11     34     44     10/15  86     -      -
Peru               21/23  64     45     16     37/38  42     9      87     -      -
Philippines        4      94     31     32     11/12  64     44     44     21     19
Poland             -      -      -      -      -      -      -      -      13     32
Portugal           24/25  63     33/35  27     45     31     2      104    -      -
Salvador           18/19  66     42     19     40     40     5/6    94     -      -
Singapore          13     74     39/41  20     28     48     53     8      9      48
South Africa       35/36  49     16     65     13/14  63     39/40  49     -      -
South Korea        27/28  60     43     18     41     39     16/17  85     5      75
Spain              31     57     20     51     37/38  42     10/15  86     -      -
Sweden             47/48  31     10/11  71     53     5      49/50  29     12     33
Switzerland        45     34     14     68     4/5    70     33     58     -      -
Taiwan             29/30  58     44     17     32/33  45     26     69     3      87
Thailand           21/23  64     39/41  20     44     34     30     64     8      56
Turkey             18/19  66     28     37     32/33  45     16/17  85     -      -
Uruguay            26     61     29     36     42     38     4      100    -      -
USA                38     40     1      91     15     62     43     46     17     29
Venezuela          5/6    81     50     12     3      73     21/22  76     -      -
West Africa        10/11  77     39/41  20     30/31  46     34     54     -      -
Yugoslavia         12     76     33/35  27     48/49  21     8      88     -      -
Zimbabwe           -      -      -      -      -      -      -      -      19     25

FIGURE 18.4. High-power distance: Malaysian University website.


Individualism  versus  Collectivism 

Individualism  in  cultures  implies  loose  ties;  everyone  is  ex¬ 
pected  to  look  after  one’s  self  or  immediate  family  but  no  one 
else.  Collectivism  implies  that  people  are  integrated  from  birth 
into  strong,  cohesive  groups  that  protect  them  in  exchange  for 
unquestioning  loyalty. 

Hofstede  found  individualistic  cultures  value  personal  time, 
freedom,  challenge,  and  such  extrinsic  motivators  as  material  re¬ 
wards  at  work.  In  family  relations,  they  value  being  honest,  talk¬ 
ing  things  out,  using  guilt  to  achieve  behavioral  goals,  and  main¬ 
taining  self-respect.  Their  societies  and  governments  place 
individual  social-economic  interests  over  the  group,  maintain 
strong  rights  to  privacy,  nurture  strong  private  opinions  (ex¬ 
pected  from  everyone),  restrain  the  power  of  the  state  in  the 
economy,  emphasize  the  political  power  of  voters,  maintain 
strong  freedom  of  the  press,  and  profess  the  ideologies  of  self- 
actualization,  self-realization,  self-government,  and  freedom. 

At  work,  collectivist  cultures  value  training,  physical  condi¬ 
tions,  skills,  and  the  intrinsic  rewards  of  mastery.  In  family  rela¬ 
tions,  they  value  harmony  more  than  honesty/truth  (and  silence 
more  than  speech),  use  shame  to  achieve  behavioral  goals,  and 
strive  to  maintain  face.  Their  societies  and  governments  place 
collective  social-economic  interests  over  the  individual,  may  in¬ 
vade  private  life  and  regulate  opinions,  favor  laws  and  rights  for 
groups  over  individuals,  dominate  the  economy,  control  the 
press,  and  profess  the  ideologies  of  harmony,  consensus,  and 
equality. 


Individualism and collectivism may influence, respectively, the
following Web UI aspects:

•  Motivation  based  on  personal  achievement:  maximized  (ex¬ 
pect  the  extra-ordinary)  for  individualist  cultures  versus  un¬ 
derplayed  (in  favor  of  group  achievement)  for  collectivist 
cultures. 

•  Images  of  success:  demonstrated  through  materialism  and 
consumerism  versus  achievement  of  socio-political  agendas. 

•  Rhetorical  style:  controversial/argumentative  speech  and  tol¬ 
erance  or  encouragement  of  extreme  claims  versus  official 
slogans  and  subdued  hyperbole  and  controversy. 

•  Prominence  given  to  youth  and  action  versus  to  aged,  expe¬ 
rienced,  wise  leaders  and  states  of  being. 

•  Importance  of  individuals  versus  products  shown  by  them¬ 
selves  or  with  groups. 

•  Underlying  sense  of  social  morality:  emphasis  on  truth  versus 
relationships. 

•  Emphasis  on  change:  what  is  new  and  unique  versus  tradition 
and  history. 

•  Willingness  to  provide  personal  information  versus  protec¬ 
tion  of  personal  data  differentiating  the  individual  from  the 
group. 

The effects of these differences can be illustrated on the Web by
examining national park websites from two countries with very
different individualism (IDV) indices (Fig. 18.6 & Fig. 18.7). The
Glacier Bay National Park website (www.nps.gov/glba/evc.htm) is
located in the United States, which has the highest IDV index
rating (91). The website from the National Parks of Costa Rica
(www.tourism-costarica.com) is located in a country with an IDV
index rating of 15. The third image (Fig. 18.8) shows a lower level
of the Costa Rican website.

FIGURE 18.5. Low-power distance: Dutch Educational website.

Note the differences in the two groups of websites. The U.S.
website features an emphasis on the visitor, his or her goals, and
possible actions in coming to the park. The Costa Rican website
features an emphasis on nature, downplays the individual tourist,
and uses a slogan to emphasize a national agenda. An even more
significant difference lies below the What’s Cool menu. Instead of
a typical Western display of new technology or a new experience,
the screen is filled with a major political announcement that the
Costa Rican government has signed an international agreement
against the exploitation of children and adolescents.

Masculinity  versus  Femininity  (MAS) 

“Masculinity” and “femininity” refer to gender roles, not physical
characteristics. Hofstede focused on the traditional assignment to
masculine roles of assertiveness, competition, and toughness, and
the assignment to feminine roles of home and children, people, and
tenderness. He acknowledged that, in different cultures, different
professions are dominated by different genders. (For example, women
dominate the medical profession in the Soviet Union, while men
dominate in the United States.) However, in masculine cultures, the
traditional distinctions are strongly maintained, while feminine
cultures tend to collapse the distinctions and overlap gender roles
(both men and women can exhibit modesty, tenderness, and a concern
with both quality of life and material success). Traditional
masculine work goals include earnings, recognition, advancement,
and challenge. Traditional feminine work goals include good
relations with supervisors, peers, and subordinates; good living
and working conditions; and employment security.

FIGURE 18.6. High individualist (low collectivist) value: US National Park website.

FIGURE 18.7. Low individualist (high collectivist) value: Costa Rican National Park website.

The following list shows some typical masculinity (MAS) index
values, where a high value implies a strongly masculine culture:

95  Japan
79  Austria
63  South Africa
62  United States
53  Arab countries
47  Israel
43  France
39  South Korea
 5  Sweden

FIGURE 18.8. What’s Cool: Political message about exploitation of children.

Since  Hofstede’s  definition  focused  on  the  balance  between 
roles  and  relationships,  masculinity  and  femininity  may  be  ex¬ 
pressed  on  the  Web  through  different  emphases.  High-masculinity 
cultures  would  focus  on  the  following  UI  design  elements: 

•  Traditional  gender/family/age  distinctions 

•  Work  tasks,  roles,  and  mastery,  with  quick  results  for  limited 
tasks 

•  Navigation  oriented  to  exploration  and  control 

•  Attention  gained  through  games  and  competitions 

•  Graphics,  sound,  and  animation  used  for  utilitarian  purposes 

Feminine cultures would emphasize the following UI design
elements:

• Blurring of gender roles

• Mutual cooperation, exchange, and support (versus mastery
and winning)

•  Attention  gained  through  poetry,  visual  aesthetics,  appeals 
to  unifying  values 

Examples of MAS differences on the Web can be illustrated
by examining websites from countries with very different MAS
indices (Fig. 18.9, Fig. 18.10, & Fig. 18.11). The Woman.Excite
website (woman.excite.co.jp) is located in Japan, which has the
highest MAS value (95). The website narrowly orients its search
portal toward a specific gender, which this company does not
do in other countries.

The ChickClick (www.chickclick.com) U.S. website (MAS =
62) consciously promotes the autonomy of young women (leaving
out later stages in a woman’s life).

The Excite website (www.excite.com.se) from Sweden, with
the lowest MAS value (5), makes no distinction in gender or age.
(With the exception of the Netherlands, another low-MAS country,
all other European websites provide more preselected
information.)

Uncertainty  Avoidance  (UA) 

People  vary  in  the  extent  to  which  they  feel  anxiety  about  un¬ 
certain  or  unknown  matters,  as  opposed  to  the  more  specific 
feeling  of  fear  caused  by  known  or  understood  threats.  Cultures 
vary in their avoidance of uncertainty, creating different rituals
and having different values regarding formality, punctuality,
legal-religious-social requirements, and tolerance for ambiguity.

Hofstede  noted  that  cultures  with  high  uncertainty  avoid¬ 
ance  tend  to  have  high  rates  of  suicide,  alcoholism,  and  acci¬ 
dental  deaths,  and  high  numbers  of  prisoners  per  capita.  Busi¬ 
nesses  may  have  more  formal  rules,  require  longer  career 
commitments,  and  focus  on  tactical  operations  rather  than  strat¬ 
egy.  These  cultures  tend  to  be  expressive;  people  talk  with  their 
hands, raise their voices, and show emotions. People seem active,
emotional, even aggressive. They shun ambiguous situations
and expect structure in organizations, institutions, and
relationships to help make events easy to interpret and
predictable. Teachers are expected to be experts who know the
answers and may speak in cryptic, academic language that
excludes novices. In high UA cultures, what is different may be
viewed as a threat, and what is “dirty” (unconventional) is often
equated with what is dangerous.

FIGURE 18.9. High masculinity website: Excite.com especially for women in Japan.

FIGURE 18.10. Medium masculinity website: ChickClick.com in the USA.

By contrast, low UA cultures tend to be less expressive and less
openly anxious; people behave quietly without showing aggression
or strong emotions (though their excessive caffeine consumption
may be to combat depression from their inability to express
their feelings). People seem easygoing, even relaxed. Teachers
may not know all the answers (or there may be more than one
correct answer), conduct more open-ended classes, and are expected
to speak in plain language. In these cultures, what is different may
be viewed as simply curious, or perhaps ridiculous.

Based  on  this  definition,  uncertainty  avoidance  may  influ¬ 
ence  contrary  aspects  of  UI  and  Web  design.  High-UA  cultures 
would  emphasize  the  following: 

• Simplicity, with clear metaphors, limited choices, and
restricted data

• Attempts to reveal or forecast results of actions before users act

• Navigation schemes intended to prevent users from becoming
lost

• Mental models and help systems that focus on reducing “user
errors”

• Redundant cues (color, typography, sound, etc.) to reduce
ambiguity

FIGURE 18.11. Low masculinity website: Swedish Excite.com

Low-UA  cultures  would  emphasize  the  reverse: 

•  Complexity  with  maximal  content  and  choices 

•  Acceptance  (even  encouragement)  of  wandering  and  risk, 
with  a  stigma  on  “over-protection” 

•  Less  control  of  navigation;  for  example,  links  might  open  new 
windows  leading  away  from  the  original  location 

• Mental models and help systems that focus on understanding
underlying concepts rather than narrow tasks

• Coding of color, typography, and sound to maximize information

Online examples of UA differences can be seen on airline
websites from two countries with very different UA indices (Fig.
18.12 & Fig. 18.13). The Sabena Airlines website (www.sabena.com)
is located in Belgium, a country with a UA index of 94, the
highest of the cultures studied. This website shows a home page
with very simple, clear imagery and limited choices.

The  British  Airways  website  (www.britishairways.com)  from 
the  United  Kingdom  (UA  =  35)  shows  more  complexity  of  con¬ 
tent  and  choices  with  pop-up  windows,  multiple  types  of  con¬ 
trols,  and  “hidden”  content  that  must  be  displayed  by  scrolling. 

Long-  versus  Short-Term  Time  Orientation 

In the early 1980s, shortly after Hofstede first formulated his
cultural dimensions, he became convinced that a fifth dimension
needed to be defined. Long-term time orientation (LT) seemed
to play an important role in Asian countries (influenced by
Confucian philosophy over thousands of years) that shared these
beliefs:

•  A  stable  society  requires  unequal  relations. 

•  The  family  is  the  prototype  of  all  social  organizations;  conse¬ 
quently,  older  people  (parents)  have  more  authority  than 
younger (and men more than women).

•  Virtuous  behavior  to  others  means  not  treating  them  as  one 
would  not  like  to  be  treated. 

•  Virtuous  behavior  in  work  means  trying  to  acquire  skills 
and  education,  working  hard,  and  being  frugal,  patient,  and 
persevering. 

Western  countries,  by  contrast,  were  more  likely  to  promote 
equal  relationships,  emphasize  individualism,  focus  on  treating 
others  as  you  would  like  to  be  treated,  and  find  fulfillment  through 
creativity  and  self-actualization.  When  Hofstede  and  Bond  devel¬ 
oped  a  survey  specifically  for  Asia  and  reevaluated  earlier  data,  they 
found  that  long-term  orientation  cancelled  out  some  of  the  effects 
of  masculinity/femininity  and  uncertainty  avoidance.  They  con¬ 
cluded  that  Asian  countries  are  oriented  to  practice  and  the  search 
for  virtuous  behavior  while  Western  countries  are  oriented  to  belief 
and  the  search  for  truth.  Of  the  23  countries  compared,  the  fol¬ 
lowing  showed  the  most  extreme  values  (ranked  in  parentheses): 

118  China  (ranked  1) 

80  Japan  (4) 

29  United  States  (17) 

0  Pakistan  (23) 

High-LT  countries  would  emphasize  the  following  aspects 
of  UI  design: 

•  Content  focused  on  practice  and  practical  value 

•  Relationships  as  a  source  of  information  and  credibility 

• Patience in achieving results and goals

FIGURE 18.12. High uncertainty avoidance: Sabena Airlines website from Belgium.

FIGURE 18.13. Low uncertainty avoidance: UK’s British Airways website.


Low-LT  (i.e.,  short-term  time-orientation)  countries  would 
emphasize  the  contrary: 

•  Content  focused  on  truth  and  certainty  of  beliefs 

•  Rules  as  a  source  of  information  and  credibility 

•  Desire  for  immediate  results  and  achievement  of  goals 

Online examples of LT differences can be seen in two versions
of the same company’s website from two countries with
different LT values (Fig. 18.14 & Fig. 18.15). The Siemens website
(www.siemens.co.de) from Germany (LT of 31) shows a typical
Western corporate layout that emphasizes crisp, clean functional
design aimed at achieving goals quickly.

The  Chinese  version  from  Beijing  requires  more  patience  to 
achieve  navigational  and  functional  goals.  Typically,  such  web¬ 
sites  also  have  more  emphasis  on  photos  of  people  and  more 
tolerance  of  informal,  even  cartoon-like  imagery. 

Design  Issues 

Hofstede noted that some cultural relativism is necessary: it is
difficult to establish absolute criteria for what is noble and what
is disgusting. There is no escaping bias; all people develop cultural
values based on their environment and early training as
children. Not everyone in a society fits the cultural pattern
precisely, but there is enough statistical regularity to identify trends
and tendencies. These trends and tendencies should not be
treated as defective or used to create negative stereotypes but
recognized as different patterns of values and thought. In a
multicultural world, it is necessary to cooperate to achieve practical
goals without requiring everyone to think, act, and believe
identically.

FIGURE 18.14. Low long-term orientation: website from Siemens Germany.

FIGURE 18.15. High long-term orientation: website from Siemens in China.

The cultural differences just identified seem to survive in
websites over periods of time. A study by Marcus and Baumgartner
(2003) showed that over a three-year period, the Sabena
and British Airways websites cited earlier changed their designs
dramatically. The main air-booking areas had approximately the
same number of links, but the areas around this central area
differed remarkably. The UK site had twice as many links,
indicating again that more complexity was tolerated.

Although the analysis just discussed used Hofstede’s dimensions,
many recent publications use other sets of dimensions,
often numbering 7±2 dimensions, and sometimes mixing
several  theorists.  Marcus  and  Baumgartner  (2004)  published  a 
summary  of  Baumgartner’s  thesis,  which  surveyed  approxi¬ 
mately  60  experts  worldwide  regarding  their  views  of  the  best 
dimensions  from  11  theories  through  which  to  evaluate  UIs. 
Twenty-nine  dimensions  emerged,  from  which  five  dimensions 
seemed  to  be  the  highest  esteemed:  (a)  high  versus  low  con¬ 
text,  (b)  technology  attitudes  and  status,  (c)  uncertainty  avoid¬ 
ance,  (d)  time  perception,  and  (e)  authority  perception.  While 
the  optimum  set  of  dimensions  continues  to  be  debated,  more 
publications  within  the  UI  community  extol  the  value  of  using 
culture  dimensions  to  analyze  and  design  UIs. 

This  review  of  cultural  dimensions  raises  many  issues  about 
UI  design,  especially  for  the  Web: 

•  How  formal  or  rewarding  should  interaction  be? 

•  What  will  motivate  different  people?  Money?  Fame?  Honor? 
Achievement? 

•  How  much  conflict  can  people  tolerate  in  content  or  style  of 
argumentation? 

•  Should  sincerity,  harmony,  or  honesty  be  used  to  make 
appeals? 

•  What  role  exists  for  personal  opinion  versus  group  opinion? 

•  How  well  are  ambiguity  and  uncertainty  avoidance  received? 

•  Will  shame  or  guilt  constrain  negative  behavior? 

•  What  role  do  community  values  play  in  individualist  versus 
collectivist  cultures? 

•  Does  the  objective  of  distance  learning  change  what  can  be 
learned  in  individualist  versus  collectivist  cultures?  Should 
these  sites  focus  on  tradition?  Skills?  Expertise?  Earning  power? 

•  How  should  online  teachers  or  trainers  act:  friends  or  gurus? 

•  Would  job  sites  differ  for  individualist  versus  collectivist  cultures? 

•  Should  there  be  different  sites  for  men  and  women  in  differ¬ 
ent  cultures? 

• Would personal Webcams be okay? Not okay?

•  How  much  advertising  hyperbole  is  likely  to  be  tolerated? 

•  Would  an  emphasis  on  truth  as  opposed  to  practice  and 
virtue  require  different  types  of  procedural  websites  for  West¬ 
ern  or  Asian  audiences? 

If  cross-cultural  theory  becomes  an  accepted  element  of  UI 
design,  then  we  may  need  to  change  our  current  practices  and 
tools  (e.g.,  use  appropriate  testing  and  focus-group  techniques 
or  design-team  collaboration  techniques  suitable  for  the  cul¬ 
tures  involved)  or  develop  new  ones.  We  need  to  make  it  feasi¬ 
ble  to  develop  multiple  culturally  distinguished  versions  of  web¬ 
sites  in  a  cost-effective  manner,  perhaps  through  templates  or 
through specific versioning tools. As personal computer, Web,
and mobile products/services continue to develop globally,
answering these questions, and exploring, then exploiting, these
dimensions of culture, will become a necessity, not an option,
for successful theory and practice.
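The idea of producing culturally distinguished versions through
templates can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the UA index values are the ones
quoted in this chapter, but the threshold, the LayoutTemplate fields,
and the function name are hypothetical and would need to be
replaced by values validated with users from the cultures involved.

```python
from dataclasses import dataclass

# UA index values quoted in this chapter.
UA_INDEX = {"Belgium": 94, "United Kingdom": 35}

# Hypothetical cut-off between "high" and "low" UA; not a value from Hofstede.
HIGH_UA_THRESHOLD = 65


@dataclass(frozen=True)
class LayoutTemplate:
    """Illustrative template parameters derived from the UA design lists."""
    max_menu_choices: int          # limited choices for high-UA audiences
    preview_action_results: bool   # forecast results before the user acts
    links_open_new_windows: bool   # looser navigation control for low-UA


def template_for(country: str) -> LayoutTemplate:
    """Pick a layout template from the country's UA index."""
    if UA_INDEX[country] >= HIGH_UA_THRESHOLD:
        return LayoutTemplate(max_menu_choices=5,
                              preview_action_results=True,
                              links_open_new_windows=False)
    return LayoutTemplate(max_menu_choices=20,
                          preview_action_results=False,
                          links_open_new_windows=True)


belgian_layout = template_for("Belgium")
uk_layout = template_for("United Kingdom")
```

A real versioning tool would likely combine several dimensions rather
than a single index, and would treat the scores as starting hypotheses
for user testing rather than as rules.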


CONCLUSIONS  AND 
FUTURE  RESEARCH  ISSUES 


Achieving culturally sensitive, successful global access to UIs
poses many design challenges in the UI-development
process. Progress in technology increases the number and kinds
of  functions,  data,  platforms,  and  users  of  computer-based  com¬ 
munication  media.  The  challenge  of  enabling  more  people  and 
more kinds of people to use this content and these tools effectively
will depend increasingly upon global solutions. By recognizing
the need for, and benefit to users of, UI designs intended
for international and intercultural markets, developers will
achieve  greater  success  and  increased  profitability  through  the 
global  distribution  and  increased  acceptance  of  their  products. 

The  recommendations  provided  in  this  chapter  are  an  ini¬ 
tial  set  of  heuristics  that  will  assist  developers  in  achieving  global 
solutions  to  their  product  development.  Design  methodologies 
must  support  globalization  throughout  the  development 
process.  In  addition,  it  is  likely  that  some  international  and  in¬ 
tercultural  references  will  change  rapidly,  requiring  frequent  up¬ 
dating  of  designs.  Future  work  on  global  UI  design  may  address 
the  following  issues: 

1.  How  might  global  UIs  be  designed  to  account  for  different 
kinds  of  intelligence?  Gardner  (1985)  identified  the  following 
dimensions  of  intelligence.  These  dimensions  suggest  users 
might  have  varying  strengths  of  conceptual  competence  with 
regard  to  using  UIs  on  an  individual  basis,  but  these  might 
also  vary  internationally,  or  interculturally,  due  to  influences 
of  language,  history,  or  other  factors: 

•  Verbal/image  comprehension 

•  Word/image  fluency 

•  Numerical/graphical  fluency 

•  Spatial  visualization 

•  Associative  memory 

•  Perceptual  speed 

•  Reasoning 

•  Interpersonal  awareness 

•  Self-awareness 

2. How might metaphors, mental models, and navigation be designed
precisely for different cultures that differ by such dimensions
as age, gender, national or regional group, or profession?
The author has posed this issue earlier as a question to
the UI analysis/design community (Marcus, 1993, April). The
topic is discussed broadly in del Galdo and Nielsen (1996). Further,
what means can be developed to enable these variations
to be produced in a cost-effective manner using templates?

The taxonomic analyses of global issues for UIs, the theoretical
basis for their component selection, the criteria for their
evaluation, and their design methodology have emerged in the
UI-development field. Articles about the impact of culture differences
on UI design and techniques of ethnographic analysis
appear  ever  more  frequently  in  primary  industry  publications 
and  in  professional  conferences  such  as  ACM/SIGCHI,  the 
Usability  Professionals  Association,  the  American  Anthropolo¬ 
gists’  Association,  IWIPS,  and  others.  The  lively  exchanges  to 
be  found  on  Anthropologists  in  Design’s  Internet  discussions 
likewise  attest  to  the  growing  numbers  of  professionals  in¬ 
volved  in  this  cross-disciplinary  practice  (see  URL  references  for 
resources).  Designers  should  be  aware  of  the  scope  of  the  ac¬ 
tivity,  know  sources  of  insight,  and  incorporate  professional 
techniques  in  their  development  process  in  order  to  improve 
the  value  and  success  of  their  international  and  intercultural 
computer-mediated products and services.

A conversational speech interface for a computer emulates
human-to-human  interaction  by  calling  on  our  inherent  ability 
as  humans  to  speak  and  listen.  While  human  speech  is  a  skill 
we  acquire  early  and  practice  frequently,  getting  computers  to 
map  sounds  to  actions  and  to  respond  appropriately  with  either 
synthesized  or  recorded  speech  is  a  massive  programming  un¬ 
dertaking.  Because  we  all  speak  a  little  differently  from  each 
other,  and  because  the  accuracy  of  the  recognition  is  dependent 
on  an  audio  signal  that  can  be  distorted  by  many  factors,  speech 
technology,  like  the  other  recognition  technologies,  lacks  100% 
accuracy.  When  designing  a  spoken  interface,  one  must  design 
to  the  strengths  and  weaknesses  of  the  technology  to  optimize 
the  overall  user  experience. 

The  goal  for  a  spoken  user  interface  is  to  emulate  a  conver¬ 
sation  convincingly  enough  that  the  person  interacting  with  the 
computer  can  use  what  he  or  she  has  learned  in  a  lifetime  of  con¬ 
versations.  Successful  communication  occurs  when  the  sender 
and  receiver  of  the  message  achieve  a  shared  understanding. 
Human-to-human  conversations  are  characterized  by  turn  tak¬ 
ing,  or  shifts  in  initiative,  as  well  as  verbal  and  nonverbal  feed¬ 
back  to  indicate  understanding.  Herb  Clark  (1993)  said,  “Speak¬ 
ing  and  listening  are  two  parts  of  a  collective  activity.”  Because 
language  use  is  deeply  ingrained  in  human  behavior,  successful 
speech  interfaces  should  be  based  on  an  understanding  of  the 
different  ways  that  people  use  language  to  communicate.  Speech 
applications  should  adopt  language  conventions  that  help  peo¬ 
ple  know  what  they  should  say  next  and  avoid  conversational 
patterns  that  violate  standards  of  polite,  cooperative  behavior. 

Many  excellent  examples  of  speech  user  interfaces  emulate 
effective  conversational  partners.  In  these  systems,  the  computer 
“speaker”  appears  to  remember  contextual  information  and 
gives  the  impression  of  understanding  what  the  user  is  saying. 
For  example,  several  airlines  use  speech  applications  to  handle 
telephone reservations or lost baggage tracking (Cohen, Giangola,
& Balogh, 2004). These systems are a substantial improvement
over previous interactive voice response (IVR) systems that
relied  solely  on  telephone  keypad  input.  Instead  of  pressing  “1” 
for  this  and  “2”  for  that,  users  can  speak  natural  language  phrases 
such  as  “I’d  like  to  travel  from  Boston  to  New  York.” 

A  crucial  factor  in  determining  the  success  of  a  spoken  ap¬ 
plication  is  whether  the  use  of  speech  technology  presents  a 
clear  benefit.  Speech  is  best  used  when  it  enables  something 
that  cannot  otherwise  be  done,  such  as  accessing  electronic 
mail  or  an  online  calendar  over  the  telephone  when  a  computer 
is  not  available.  In  general,  it  is  effective  to  use  speech  applica¬ 
tions  for  situations  when  speech  can  enable  a  task  to  be  done 
more  efficiently,  such  as  when  a  user’s  hands  and  eyes  are  busy 
doing  another  task.  Likewise,  speech  input  is  useful  when  no 
keyboard  is  available  for  text  entry,  or  when  a  user  has  a  physical 
disability  that  limits  the  use  of  his  or  her  hands,  or  if  a  user  is  just 
not  comfortable  typing.  Speech  output  is  particularly  liberating 
for  people  with  visual  impairments.  In  addition,  it  provides  a 
way  of  communicating  information  if  the  user  is  in  a  divided  at¬ 
tention  state  such  as  driving,  and  it  can  be  used  to  grab  users’  at¬ 
tention  or  embody  a  particular  personality  for  a  computer  sys¬ 
tem  or  character. 

While  speech  seems  like  it  might  be  the  ideal  way  to  com¬ 
municate  with  a  computer  anytime,  there  are  situations  when 
it  is  best  not  to  use  a  speech  user  interface.  For  example,  speech 


output  is  ineffective  for  delivering  large  amounts  of  information. 
Not  only  is  it  difficult  for  users  to  maintain  the  information  in 
short-term  memory,  but  people  can  read  much  faster  than  they 
can  listen.  Speech  input  can  also  be  problematic  when  the 
speaker  is  not  in  a  private  environment,  or  when  other  voices 
in  the  background  might  interfere  with  the  speech  recognition. 

The success of a spoken interaction with a computer depends
not only on the task and the motivation of the user, but also on
the physical device(s) being used. Speech input and output
capabilities of devices vary greatly. For example, personal computers
(PCs) provide good quality audio subsystems and speakers,
and a variety of microphones that perform very well under quiet
conditions can be used with them. The audio channels for other
devices are not on par with PCs. While most personal digital
assistants (PDAs) have started to offer speech input capabilities,
the quality of their microphones and audio subsystems degrades
the speech signal, resulting in poor recognition performance
for  many  applications.  In  addition,  PDAs  have  insufficient  com¬ 
puting  resources  to  enable  large  vocabulary  speech  recognition 
processing.  For  telephony  applications,  when  the  input  device 
is  either  a  cell  phone  or  a  landline,  the  speech  recognition  en¬ 
gines  are  usually  deployed  on  large  servers.  The  accuracy  of 
telephony  systems  for  a  given  task  is  normally  lower  than  for  a 
similar  PC  configuration  with  a  headset  microphone  in  a  quiet 
environment.  Background  noise,  signal  degradation,  poor  cellu¬ 
lar  connection  and  the  application  of  compression  techniques 
can  substantially  reduce  the  performance  of  speech  recognition 
systems.  Nevertheless,  careful  design  can  compensate  for  many 
of  these  problems  and  lead  to  successful  interaction  with  a 
telephone-based  speech  application. 

Speech  technology  can  be  used  for  the  following  types  of 
tasks: 

•  Composition.  Composition  tasks  have  as  their  primary  goal 
the  creation  of  a  document  such  as  word  processing  docu¬ 
ments,  e-mails,  or  instant-messaging  text.  Composition  in¬ 
cludes  dictating  the  text  and  fixing  any  recognition  errors. 

•  Transcription.  Transcription  is  similar  to  composition  in 
that  a  document  is  created  from  speech,  but  it  differs  by 
virtue  of  the  fact  that  the  primary  user  task  is  parallel  to  the 
creation  of  a  document.  Broadcast  news,  business  meetings, 
and  business  calls  are  examples  of  situations  where  having  a 
permanent,  textual  record  of  the  speech  is  valuable.  A  digital 
(textual)  record  of  the  conversation  is  searchable,  readable  by 
the  deaf,  and  supports  advanced  business  intelligence  appli¬ 
cations  such  as  data  mining.  These  are  all  examples  of  tran¬ 
scription  tasks. 

•  Transaction.  The  third  type  of  interaction,  and  a  major  fo¬ 
cus  of  this  chapter,  is  one  where  users  have  as  their  goal  the 
completion  of  one  or  more  transactions,  rather  than  the  cre¬ 
ation  of  a  document  or  a  permanent  record  of  a  conversa¬ 
tion.  Examples  of  transactional  applications  include  financial 
account  management  such  as  trading  stocks,  e-commerce 
applications  such  as  the  purchase  of  computer  equipment, 
searching  for  information  on  the  Internet,  and  controlling 
the  environment. 

• Collaboration. The final type of conversational task is
collaboration. Collaborative conversational applications are
characterized by tasks that result in human-to-human communication.
One example of this is when speech recognition is
used as an input modality to an instant messaging application.

Designing  a  speech  user  interface  is  similar  to  designing  any 
other interface for human-computer interaction (HCI). A good
design  relies  on  applying  principles  of  user-centered  design,  and 
many  of  these  same  principles  and  techniques  can  be  used  with 
speech  interfaces.  In  this  chapter,  we  discuss  the  lifecycle  of  a 
speech  application  from  the  starting  point  of  understanding  the 
requirements  for  speech,  through  knowing  which  technology 
to  use,  selecting  a  dialog  style,  and  designing  the  prompts.  All 
these  steps  culminate  with  an  iterative  testing  process. 


SPOKEN  INTERFACE  LIFECYCLE 


Once  a  designer  has  clearly  established  that  including  speech 
technology  for  input,  output,  or  both,  is  an  appropriate  design 
decision,  the  designer  must  begin  to  define  how  speech  best  fits 
into  the  accomplishment  of  the  task.  In  order  to  do  this,  the  de¬ 
signer  must  first  have  a  clear  understanding  of  the  task.  With  the 
task  well  understood,  the  designer  needs  to  overlay  speech  onto 
that  model  by  listening  to  how  people  speak  in  the  domain.  Ob¬ 
serving  humans  interacting  and  speaking  to  accomplish  the  task 
may  result  in  additional  refinements  to  the  model.  The  next  step 
is  to  select  the  appropriate  speech  technology,  or  combination  of 
technologies,  for  the  task,  in  the  contexts  of  use  and  user  group. 
The  heart  of  the  work  for  a  speech  interface  designer  is  to  craft 
the  prompts  that  the  system  will  “speak”  and  to  prepare  the  sys¬ 
tem  to  recognize  what  the  users  will  most  likely  say  in  response. 
Lastly,  the  designer  should  create  a  series  of  tests  for  the  system, 
initially  in  the  lab  and  ultimately  out  in  the  real  world  with  users 
accomplishing  real  tasks.  In  between  the  various  forms  of  test¬ 
ing,  a  designer  needs  to  allocate  time  to  refine  the  design  and 
make  changes  based  on  what  he  or  she  learns  from  the  results  of 
the  testing.  In  this  chapter,  we  will  cover  certain  steps  in  greater 
detail  than  others.  These  steps  include  (a)  listening  to  users 
speak  when  accomplishing  the  task,  (b)  selecting  the  technology, 
and  (c)  designing  the  system  prompts. 

Thus,  the  first  step  in  the  life  cycle  of  a  conversational  inter¬ 
face  is  to  model  the  task  that  users  will  be  accomplishing.  This 
requires  the  designer  to  sketch  out  the  logical  steps  that  make 
up  the  task,  along  with  the  pieces  of  information  that  must  be  ex¬ 
changed  in  order  to  complete  each  step.  With  a  speech  applica¬ 
tion,  each  interaction  between  the  system  and  a  user  is  often  re¬ 
ferred  to  as  a  “turn.”  Thus,  the  task  model  tries  to  capture  the 
expected  number  of  turns,  as  well  as  the  vocabulary  that  is  re¬ 
quired  to  support  the  information  exchange  during  those  turns. 
It  is  important  for  the  designer  to  consider  alternate  flows,  since 
not  all  users  will  necessarily  approach  the  task  in  the  same  way. 
For  example,  when  making  a  flight  reservation,  some  users  will 
select  a  flight  based  exclusively  on  time  schedules,  while  others 
will  opt  for  airline  loyalty  and  price  points,  accepting  any  flight 
time  that  meets  their  criteria  for  loyalty  and  cost. 
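A task model of this kind can be written down as a small data
structure. The following Python sketch is one possible representation,
not a prescribed format: the class names, field names, prompts, and
vocabulary items are all hypothetical. It records the turns for two
alternate flows of the flight-reservation example and reports the
expected number of turns and the vocabulary needed to support them.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One system/user exchange in the task model."""
    prompt: str                                 # what the system says
    slot: str                                   # information exchanged in this turn
    vocabulary: set = field(default_factory=set)  # words the user may say


@dataclass
class TaskModel:
    name: str
    flows: dict = field(default_factory=dict)   # flow name -> list of Turn

    def expected_turns(self, flow: str) -> int:
        """Number of turns a user needs to complete the given flow."""
        return len(self.flows[flow])

    def vocabulary(self) -> set:
        """Union of the vocabulary required across all flows."""
        words = set()
        for turns in self.flows.values():
            for turn in turns:
                words |= turn.vocabulary
        return words


# Hypothetical prompts and vocabulary for the flight-reservation example.
origin = Turn("Where are you leaving from?", "origin", {"boston", "new york"})
destination = Turn("Where are you going?", "destination", {"boston", "new york"})
departure_time = Turn("When would you like to leave?", "time", {"morning", "evening"})
airline = Turn("Which airline do you prefer?", "airline", {"any airline"})
max_price = Turn("What is your price limit?", "price", {"cheapest"})

reservation = TaskModel("flight reservation", {
    "schedule_first": [origin, destination, departure_time],
    "loyalty_and_price": [origin, destination, airline, max_price],
})
```

Listing the alternate flows side by side makes it easy to see which
turns they share and which vocabulary each one adds.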

The  model  must  operate  within  the  set  of  constraints  for  the 
application.  Constraints  usually  include  the  business  goals  for 
creating the speech application, as well as the requirements
resulting from the environment in which users may find themselves
(e.g., noisy environment) and the users themselves (e.g.,
the  majority  of  users  will  be  over  65  years  old  and  uncomfort¬ 
able  with  technology).  For  example,  if  a  business  goal  is  to  cross- 
sell  related  items  when  a  customer  makes  a  purchase  by  calling 
into  a  speech-enabled  call  center  application,  the  designer  will 
want  to  incorporate  a  turn  that  involves  the  system  suggesting 
related  items  that  are  on  sale  that  day. 


SELECTING  THE  SPEECH  TECHNOLOGY 


While  good  interaction  design  in  a  speech  application  can  com¬ 
pensate  for  some  shortcomings  in  speech  technology,  if  a  cer¬ 
tain  baseline  level  of  accuracy  is  not  achieved,  the  application 
probably  will  not  succeed.  Accuracy  depends  on  the  choice  of 
the  underlying  speech  technology,  and  making  the  best  match 
between  the  technology,  the  task,  the  users,  and  the  context  of 
use.  Automatic  Speech  Recognition  (ASR)  can  have  explicitly  de¬ 
fined,  rule-based  grammars,  or  use  statistical  grammars  such  as 
language  models.  Usually,  a  transactional  system  uses  explicitly 
defined  grammars,  while  dictation  systems  or  natural  language 
understanding  systems  use  statistical  models.  The  designer  will 
also  have  to  decide  whether  to  use  synthetic  speech  in  the  sys¬ 
tem  or  not.  Synthetic  speech,  also  known  as  text-to-speech 
(TTS),  is  speech  produced  by  a  computer.  Given  that  today’s 
synthesizers  still  do  not  sound  entirely  natural,  the  choice  of 
whether  to  use  synthesized  output,  recorded  output,  or  no 
speech  output  can  be  a  difficult  one.  The  next  section  discusses 
the  various  technologies  in  more  detail. 
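To make the idea of an explicitly defined, rule-based grammar
concrete, the following Python sketch matches already-recognized
text against a single hand-written rule for a travel request. It is a
toy illustration: commercial recognizers use dedicated grammar formats
and apply the grammar during recognition itself, and the city list,
rule wording, and function name here are assumptions made for the
example.

```python
import re

# Hypothetical in-grammar city names.
CITIES = ["boston", "new york", "chicago"]
CITY = "(" + "|".join(re.escape(city) for city in CITIES) + ")"

# One explicit rule: optional carrier phrase, a travel verb, origin, destination.
TRAVEL_RULE = re.compile(
    r"^(?:i'd like to|i want to)?\s*(?:travel|fly|go)\s+from\s+"
    + CITY + r"\s+to\s+" + CITY + r"$"
)


def parse_travel_request(utterance: str):
    """Return the origin and destination, or None if the text is out of grammar."""
    text = utterance.lower().replace("\u2019", "'").strip().rstrip(".")
    match = TRAVEL_RULE.match(text)
    if match is None:
        return None
    return {"origin": match.group(1), "destination": match.group(2)}


in_grammar = parse_travel_request("I'd like to travel from Boston to New York.")
out_of_grammar = parse_travel_request("Take me somewhere warm.")
```

The second request is rejected because nothing in the rule covers it,
which is the trade-off of explicit grammars: they are predictable and
compact, but they only accept what the designer anticipated.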

How  Does  ASR  Work? 

ASR  systems  work  by  analyzing  the  acoustic  signal  received 
through  a  microphone  connected  to  the  computer  (see  Figure 
19.1).  The  user  speaks  and  the  microphone  captures  the 
acoustic  signal  as  digital  data,  which  is  then  analyzed  by  an 
acoustic  model  and  a  language  model.  The  different  speech 
recognition  systems  on  the  market  differ  in  the  building  blocks 
(e.g.,  phonemes)  that  they  employ  to  analyze  the  acoustic  signal 
data. The analysis uses Hidden Markov Models, which apply
stochastic modeling to decode a sequence of symbols
(Rabiner, 1989; Roe & Wilpon, 1993). After the acoustic analysis
is complete, the system analyzes the resulting strings of
building-block data using a language model that contains a base
vocabulary and any specific domain topics (e.g., computer, medical)
that  may  have  been  added.  When  the  analysis  is  completed,  the 
text  appears  on  the  computer  screen. 
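To make the decoding step concrete, here is a minimal Python sketch of Viterbi decoding, the standard dynamic-programming search used with Hidden Markov Models. It is an illustration only and not the method of any product discussed in this chapter: the three phoneme states, the acoustic frame labels f1 to f3, and every probability are invented toy values.

```python
# Toy model (all values are assumptions made up for this example).
STATES = ("k", "ae", "t")                 # hidden phoneme states
OBSERVATIONS = ("f1", "f2", "f3")         # labels for successive acoustic frames
START_P = {"k": 0.8, "ae": 0.1, "t": 0.1}
TRANS_P = {
    "k": {"k": 0.1, "ae": 0.8, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.1, "t": 0.8},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
EMIT_P = {
    "k": {"f1": 0.8, "f2": 0.1, "f3": 0.1},
    "ae": {"f1": 0.1, "f2": 0.8, "f3": 0.1},
    "t": {"f1": 0.1, "f2": 0.1, "f3": 0.8},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # Each column maps state -> (probability of best path ending there, previous state).
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))
```

With these toy values, the three frames decode to the phoneme sequence k, ae, t. A production recognizer works in log probabilities and prunes the search, which this sketch omits.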

Speech  recognition  systems  require  computers  with  an 
approximate  minimum  of  200MHz  processor,  32MB  of  RAM, 
300  MB  of  available  hard  disk  space,  and  a  16-bit  sound  card  with  a
microphone  input  jack  and  good  recording  quality.  These  requirements
enable  local  decoding  of  the  recognized  speech.  It  is  possible  to 
have  a  system,  such  as  a  small,  pervasive  device,  capture  a  user’s 
speech  and  then  decode  it  on  a  remote  server  (e.g.,  Price  &
Sears,  2005)  and  return  the  decoded  text  to  the  user,  albeit  with 
a  short  time  delay.  Multiple  users  can  work  with  one  installation 
of  a  speech  recognition  system.  In  some  systems,  users  create 
personal  voice  models  and  log  on  with  their  individual  user 
names,  and  the  system  uses  their  personalized  speech  files  for 
the  recognition  processing.  Each  user  can  also  create  several  dif¬ 
ferent  user  voice  models  in  order  to  achieve  the  best  recognition 
rates  in  environments  with  different  levels  of  background  noise 
(e.g.,  home,  office,  and  mobile  work  locations).  For  results  on 
the  effectiveness  of  creating  and  using  different  user  voice  mod¬ 
els,  see  Price,  Ling,  Feng,  Goldman,  Sears,  &  Jacko  (2005). 

Karat,  Lai,  Danis,  and  Wolf  (1999)  reported  ASR  system  accu¬ 
racy  rates  in  the  mid-90s  for  “in  vocabulary”  words  (e.g.,  words  in 
the  20,000-word  vocabulary  included  with  the  software)  in  peo¬ 
ple’s  initial  use  of  continuous  speech  recognition  software.  Error 
rates  of  2-5%  are  common  with  more  frequent  use.  Karat,  Horn, 
Halverson,  and  J.  Karat  (1999)  tested  three  commercially  avail¬ 
able  speech  recognition  systems  in  1998  with  users  in  initial  and 
extended  use  and  found  initial  use  data  of  13.6  corrected  words 
per  minute  with  an  average  of  one  incorrect,  missing,  or  extra 
word  for  every  20  words  and  one  formatting  error  every  75 
words.  Improved  ASR  system  accuracy  and  higher  user  productivity
were  measured  when  users  employed  more  recent  versions
of  the  speech  recognition  software  (Sears,  Karat,  Oseitutu,
Karimullah,  &  Feng,  2000;  Feng,  Sears,  &  Karat,  2006),  since  users
were  able  to  correct  errors  more  easily  and  quickly.  The  product
also  provided  a  better-quality  microphone  to  reduce  the  number
of  recognition  errors  that  occurred  in  the  first  place.
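Counts of incorrect, missing, and extra words like those above are commonly summarized as a word error rate: the minimum number of substitutions, deletions, and insertions needed to turn the reference text into the recognized text, divided by the reference length. The Python sketch below is one conventional way to compute it. The function name and the sample sentences are illustrative assumptions, not the measurement procedure used in the cited studies.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # missing word
                dist[i][j - 1] + 1,                 # extra word
                dist[i - 1][j - 1] + substitution,  # incorrect (or matching) word
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

Under this measure, one incorrect, missing, or extra word in every 20 words corresponds to a word error rate of 0.05.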

In  general,  there  are  three  types  of  errors  that  occur  in  ASR 
systems  (Halverson,  Horn,  C.  Karat,  &  J.  Karat,  1999).  The  first  is 
that  users  can  make  direct  errors  where  they  misspeak,  stutter, 
or  press  the  wrong  key.  Second,  users  can  make  errors  of  intent, 
where  they  decide  to  restate  something.  The  third  type  of  er¬ 
ror  is  an  indirect  error,  where  the  speech  recognition  system 
mis-recognizes  what  the  user  says.  The  indirect  errors  are  diffi¬ 
cult  to  detect  during  proofreading.  All  three  types  of  errors  can 
lead  to  cascading  errors  where,  in  the  process  of  correcting  one 
error,  others  occur. 

Most  telephony  systems  are  speaker-independent  speech
recognition  systems  (i.e.,  no  personalized  training  of  the  voice  models
is  required).  They  are  also  usually  server  based  and
must  handle  the  signal  degradation  that  occurs  across  the  tele¬ 
phone  lines.  Telephony  systems  can  be  created  from  a  combi¬ 
nation  of  speech  recognition  and  natural  language  processing 
(NLP)  technologies.  In  telephony  systems,  a  dialog  manager 
component  works  with  the  speech  recognition  software  to  han¬ 
dle  the  course  of  the  conversation  with  the  user.  The  system 
provides  feedback  to  the  user  through  the  dialog  manager  using
recordings  of  either  human  voice  or  text-to-speech.  Telephony
systems  have  capabilities  such  as  “barge  in”  and  “talk  ahead” 
that  enable  the  user  to  redirect  the  action  of  the  system  and 
complete  multiple  requests  before  being  prompted  for  additional
information  necessary  to  complete  the  task.  Conversational
telephony  systems  include  interactive  voice  response
(IVR)  systems  and  new  systems  built  using  VoiceXML  (see  the  section
on  NLP  for  a  description  of  the  technology).

How  Does  Text-to-Speech  (TTS)  Work? 

Text-to-speech  (TTS)  synthesis  enables  computers  or  other  elec¬ 
tronic  systems  such  as  telephones  to  output  simulated  human 
speech.  Synthetic  speech  is  based  on  the  fields  of  (a)  text  analysis, 
(b)  phonetics,  (c)  phonology,  (d)  syntax,  (e)  acoustic  phonetics, 
and  (f)  signal  processing.  The  quality  and  effectiveness  of  speech
synthesis  is  based  on  a  hierarchy.  The  base  level  of  achievement  is 
to  produce  speech  synthesis  that  is  intelligible  by  human  beings. 
The  second  level  is  to  produce  speech  synthesis  that  simulates 
the  natural  qualities  of  human  speech.  The  third  level  of  speech 
synthesis  is  to  produce  synthesized  speech  that  is  personalized  to 
the  person  it  is  representing — that  is,  the  intonation  of  the  par¬ 
ticular  person’s  speech  is  represented.  The  fourth  and  highest 
level  of  achievement  in  synthesized  speech  is  to  produce  speech 
based  on  a  person’s  own  voice  recordings,  so  that  the  speech 
sounds  just  like  the  actual  person  being  represented.  Currently, 
speech  synthesis  technology  has  achieved  the  base  level  of  qual¬ 
ity  and  effectiveness,  and  concatenated  synthesis  can  simulate  the 
natural  quality  of  human  speech,  although  at  great  expense.  The 
third  and  fourth  levels  of  speech  synthesis  technology  are  the  fo¬ 
cus  of  research  in  laboratories  around  the  world. 

Two  types  of  speech  synthesis  are  commercially  available 
today:  (a)  concatenated  synthesis  and  (b)  formant  synthesis, 
which  is  the  most  prevalent  type.  Concatenated  synthesis  employs 
computers  to  assemble  recorded  voice  sounds  into  speech  out¬ 
put.  It  sounds  natural,  but  can  be  prohibitively  expensive  for 
many  applications,  as  it  requires  large  disk  storage  space  for  the 
units  of  recorded  speech  and  significant  computational  power 
to  assemble  the  speech  units  on  demand.  Concatenated  syn¬ 
thesizers  rely  on  databases  of  diphones  and  demi-syllables  to 
create  the  natural-sounding  synthesized  speech.  Diphones  are 
the  transitions  between  phonemes.  Demi-syllables  are  the  half¬ 
syllables  recorded  from  the  beginning  of  a  sound  to  the  center 
point,  or  from  the  center  point  to  the  end  of  a  sound  (Weinschenk
&  Barker,  2000).  After  the  voice  units  are  recorded,  then
the  database  of  units  is  coded  for  changes  in  frequency,  pitch, 
and  prosody  (intonation  and  duration).  The  coding  process  en¬ 
ables  the  database  of  voice  units  to  be  as  efficient  as  possible. 

Formant  synthesis  is  a  rule-based  process  that  creates  ma¬ 
chine-generated  speech  (see  Figure  19.2).  A  set  of  phonological 
rules  is  applied  to  an  audio  waveform  that  simulates  human 
speech.  Formant  synthesis  involves  two  complex  steps:  the  first 
covers  the  conversion  of  the  input  text  into  a  phonetic  repre¬ 
sentation,  while  the  second  encompasses  the  production  of 
sound  based  on  that  phonetic  representation.  In  the  first  step, 
the  text  is  input  from  a  database  or  file  and  is  normalized  so  that 
any  symbols  or  abbreviations  are  resolved  as  full  alphabetic 
words.  To  convert  the  words  into  phonemes,  a  pronunciation 
dictionary  is  used  for  most  words  and  a  set  of  letter-to-sound
rules  is  used  for  word  exceptions  not  available  in  the  dictionary.
In  the  second  step,  the  phonemes  are  analyzed  using  a  sound  inventory
and  intonation  rules  about  pitch  and  duration.  The
speech  synthesis  is  the  resulting  output  heard  by  users  through
a  speaker  or  headphone.

FIGURE  19.2.  Model  of  text  to  speech  synthesis.
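The first step of this pipeline (normalize the text, look each word up in a pronunciation dictionary, and fall back to letter-to-sound rules) can be sketched in a few lines of Python. Everything in the sketch is a hypothetical toy: the abbreviation table, the two dictionary entries, and the one-letter-one-sound rules are assumptions chosen for brevity, and real synthesizers use far richer resources.

```python
import re

# Hypothetical toy resources (assumptions for illustration only).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "&": "and"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
PRONUNCIATIONS = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "two": ["T", "UW"],
}
LETTER_TO_SOUND = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K", "y": "Y", "z": "Z",
}


def normalize(text):
    """Resolve abbreviations, symbols, and digits into plain alphabetic words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            words.extend(DIGITS[int(d)] for d in token)   # read digit by digit
        else:
            cleaned = re.sub(r"[^a-z]", "", token)         # drop punctuation
            if cleaned:
                words.append(cleaned)
    return words


def to_phonemes(word):
    """Use the dictionary when possible, otherwise crude letter-to-sound rules."""
    if word in PRONUNCIATIONS:
        return list(PRONUNCIATIONS[word])
    return [LETTER_TO_SOUND[ch] for ch in word if ch in LETTER_TO_SOUND]


def text_to_phonemes(text):
    return [to_phonemes(word) for word in normalize(text)]
```

For instance, `normalize("Dr. Smith has 2 cats.")` yields the words doctor, smith, has, two, cats. The second step, generating sound from the phonemes with intonation rules, is not covered by this sketch.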

The  quality  of  the  synthetic  speech  should  be  evaluated
along  the  lines  of  its  acceptability,  naturalness,  and  intelligibility.
It  is  important  to  ask  users  to  evaluate  different  speech  samples
against  each  other,  because  these  qualities  are  always  relative
(Francis  &  Nusbaum,  1999).  Although  these  subjective  differ¬ 
ences  in  opinion  will  probably  always  exist,  a  study  by  Lai,  Wood, 
and  Considine  (2000)  showed  that  there  were  no  significant  dif¬ 
ferences  in  comprehension  levels  for  longer  messages  (e.g., 
with  a  word  length  ranging  from  100  to  500  words)  among  five 
major  commercial  text-to-speech  engines. 

Current  Capabilities  and  Limitations 
of  Speech-Synthesis  Software 

Formant  synthesis  produces  speech  that  is  highly  intelligible, 
but  sounds  unnatural;  however,  it  has  the  power  to  inexpen¬ 
sively  produce  nearly  unlimited  speech,  from  a  resource  point 
of  view.  The  limitation  of  using  formant  synthesis  is  the  com¬ 
plexity  of  the  required  linguistic  rules  to  produce  accurate 
speech  output.  Utilizing  domain-specific  information  and  as¬ 
sumptions  produces  a  substantial  improvement  in  the  synthe¬ 
sizer’s  prosody.  (Prosody  refers  to  speech  elements  such  as  in¬ 
tonation,  duration,  pitch,  rate,  loudness,  or  rhythm.)  Users  will 
be  able  to  comprehend  the  speech  at  a  higher  rate  and  will  per¬ 
ceive  the  voice  to  be  more  natural. 

Some  applications  of  concatenated  synthesis  attempt  to  reduce
costs  by  basing  the  voice  recordings  on  whole  words.  Interactive
voice  response  (IVR)  systems  and  voice  mail  systems  that  use
concatenated  speech  synthesis  based  on  voice  recordings  of 
whole  words  sound  unnatural  and  unevenly  paced,  and  this 
makes  the  synthetic  voice  hard  to  understand  or  remember.  An 
application  of  concatenated  synthesis  needs  to  be  done  cor¬ 
rectly  or  not  at  all. 

How  Do  Natural  Language  Processing  (NLP)  and  Natural 
Language  Understanding  (NLU)  Work? 

“Natural  language  processing  (NLP)”  refers  to  a  wide  range  of 
processing  techniques  aimed  at  extracting,  representing,  re¬ 
sponding  to,  and  ultimately  understanding  the  semantics  of  text. 


Natural  Language  Understanding  (NLU)  is  an  area  of  NLP  focused 
on  the  understanding  of  natural  language  text.  It  is  the  process  of 
analyzing  text  and  taking  some  action  based  on  the  meaning  of 
the  text.  We  include  any  technology  that  allows  a  user  to  communicate
with  a  system  using  a  language  that  is  not  rigidly  structured
(i.e.,  not  a  “formal”  language).  The  focus  of  this  chapter  is  on
systems  in  which  communication  between  the  user  and  the  sys¬ 
tem  has  constructs  similar  in  grammar  and  dialog  to  the  lan¬ 
guage  of  everyday  human-to-human  communication. 

As  a  technology,  NLU  is  independent  from  speech  recognition, 
although  the  combination  of  the  two  yields  a  powerful  human- 
computer  interaction  (HCI)  paradigm  (see  Figure  19.3).  When 
combined  with  NLU,  speech  recognition  transcribes  an  acoustic 
signal  into  text,  which  is  then  interpreted  by  an  understanding 
component  to  extract  meaning.  In  a  conversational  system,  a  dia¬ 
log  manager  will  then  determine  the  appropriate  response  to  give 
the  user.  Communication  with  the  user  can  take  place  via  a  vari¬ 
ety  of  modalities,  including  (a)  speech  input  and  output,  (b)  text 
input  and  output,  (c)  handwriting,  or  (d)  some  combination  of 
these  modalities.  Figure  19.3  shows  a  block  diagram  of  a  proto¬ 
typical  multimodal  conversational  system  that  allows  speech  and 
keyboard  natural  language  input,  and  speech  and  GUI  text  output. 

NLU  has  been  an  active  area  of  research  for  many  decades. 
The  promise  of  NLU  lies  in  the  “naturalness”  of  the  interaction. 
Since  humans  have  deep  expertise  in  interacting  with  one  an¬ 
other  through  language,  the  idea  that  leveraging  a  user’s  ability 
to  interact  using  language  will  result  in  systems  with  greater 
usability  has  been  an  implicit  and  explicit  hypothesis  in  a  wide 
array  of  research  studies  and  technology  development  efforts. 
Designing  systems  that  use  natural  interaction  techniques  means 
the  user  is  freed  from  learning  the  formal  language  of  a  system. 
Thus,  instead  of  using  formalisms  (e.g.,  UNIX  commands),  script¬ 
ing  languages,  or  graphical  menus  and  buttons,  users  can  en¬ 
gage  in  a  dialog  with  the  system.  Less  user  training,  more  rapid 
development  of  expertise  and  better  error  recovery  are  all 
promising  aspects  of  systems  that  use  NLU. 

Dialog  can  be  described  as  a  series  of  related  conversational 
interactions.  It  adds  the  richness  of  context  and  the  knowledge  of 
multiple  interactions  over  time  to  the  user  interface,  transforming 
natural  language  interfaces  into  conversational  interfaces.  Such 
interactions  require  the  system  to  maintain  a  history  of  the  inter¬ 
action  as  well  as  the  state  of  the  interaction  at  all  times  (Chai, 
2001).  Two  important  components,  which  must  be  represented 
in  any  dialog  system,  are  user  goals  and  the  current  context  of  the 
interaction.  The  history  of  the  interaction  must  be  evaluated 
against  the  user  goal,  with  prompts  to  the  user  designed  to  acquire
the  necessary  information  required  to  satisfy  a  goal.  Dialog
technology  can  be  embedded  in  an  application  to  enable  either
user-initiated,  system-initiated,  or  mixed-initiative  conversations
(see  descriptions  of  these  different  dialog  styles  following).  The
conversations  are  guided  by  dialog  management  technology.

FIGURE  19.3.  Diagram  of  a  prototypical  multimodal  conversational  system.
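A small Python sketch may help show what maintaining history, state, and a user goal can look like in practice. It assumes a simple slot-filling design, which is only one of several ways to build a dialog manager. The goal name, slot names, and prompt wording are hypothetical, and the step that extracts slot values from the user's words is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field

# Hypothetical goal definition: the information needed before a transfer can run.
REQUIRED_SLOTS = {"transfer": ["amount", "from_fund", "to_fund"]}
PROMPTS = {
    "amount": "How much would you like to transfer?",
    "from_fund": "From which fund?",
    "to_fund": "To which fund would you like to transfer?",
}


@dataclass
class DialogState:
    goal: str                                     # what the user is trying to do
    slots: dict = field(default_factory=dict)     # current context of the interaction
    history: list = field(default_factory=list)   # every user utterance so far

    def update(self, utterance, new_slots):
        """Record the utterance and merge whatever information it supplied."""
        self.history.append(utterance)
        self.slots.update(new_slots)

    def next_prompt(self):
        """Prompt for the first missing piece of information, or None if complete."""
        for slot in REQUIRED_SLOTS[self.goal]:
            if slot not in self.slots:
                return PROMPTS[slot]
        return None
```

After the utterance "I'd like to transfer $1,000 from my index fund" fills the amount and source fund, `next_prompt()` asks only for the destination fund. This matches the "prompt for missing information" behavior in the mixed-initiative example later in this chapter.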

Current  Capabilities  and  Limitations  of  NLU  Systems 

Most  NLU  applications  will  easily  run  on  the  desktop  and  note¬ 
book  computers  commonly  available  in  the  year  2007.  In  addi¬ 
tion,  research  prototype  systems  that  move  NLU  functionality 
onto  handheld  systems  are  beginning  to  appear.  Cars  that  ac¬ 
cept  natural  voice  commands  to  control  a  limited  range  of  fea¬ 
tures  are  now  available.  NLU  on  the  web  is  becoming  more  com¬ 
mon  (e.g.,  www.askjeeves.com, www.neuromedia.com),  and 
illustrates  client-server  architecture  for  NLU.  As  computing 
power  increases  and  digital  technology  pervades  our  everyday 
life,  we  can  expect  to  see  a  dramatic  increase  in  the  use  of  NLU 
in  new  environments,  supporting  more  everyday  tasks. 

Capturing  and  representing  knowledge  of  a  domain  is  a  com¬ 
plex  and  labor-intensive  process.  As  the  scope  of  the  application 
domain  increases,  it  becomes  increasingly  difficult  to  build  and 
maintain  NLU  applications.  As  a  result,  all  successful  examples  of 
rich  NLU  interfaces  have  relatively  narrow  application  domains. 
This  has  direct  impact  on  the  definition  of  user  profiles  for  any 
given  application.  Simply  stated,  the  narrower  the  conceptual, 
functional,  syntactic,  and  lexical  domains  of  the  target  user  pop¬ 
ulation,  the  greater  the  chance  of  building  an  NLU  application 
that  satisfies  its  users.  There  is  currently  no  way  to  quantify  and 
measure  these  characteristics,  and  determining  whether  a  par¬ 
ticular  application  represents  a  tractable  problem  in  NLU  is  largely 
an  issue  of  experience  and  instinct  on  the  part  of  designers  and
engineers.  An  NLU  engine  and  speech  input  and  output  technologies
are  the  building  blocks  for  a  conversational  application.
Now,  let’s  turn  our  attention  to  the  design  of  the  application  itself.


CRAFTING  THE  SPOKEN  INTERACTION 


The  first  step  in  designing  a  conversational  interface  is  to  edu¬ 
cate  yourself  about  the  ways  people  speak  in  the  domain  of  the 
task.  The  best  approach  is  to  find  users  who  are  doing  an  activ¬ 
ity  that  is  as  close  as  possible  to  the  target  task.  Keep  in  mind 
that  the  vocabulary  and  sentence  structure  used  in  graphical 
computer  applications  or  in  printed  material  may  be  quite  dif¬ 
ferent  from  the  way  people  actually  speak  about  a  task.  For  ex¬ 
ample,  imagine  that  you  want  to  provide  a  speech  interface  to 
a  calendar.  Printed  calendars  and  online  calendars  typically 
show  day,  week,  and  month  views  of  calendar  appointments. 
They  always  show  days  of  the  week  and  numbered  dates.  As 
soon  as  you  listen  to  two  people  talk  about  appointments  and 
scheduling,  however,  you  learn  that  they  only  use  numbered 
dates  occasionally.  The  more  common  way  to  discuss  dates  with 
other  people  is  to  use  relative  descriptions:  “What  do  you  have 
a  week  from  Monday?  Are  you  busy  this  Thursday?  What  about 
the  next  day?”  The  concept  of  relative  dates  is  completely  absent 
from  paper-based  and  online  calendars;  however,  users  will  ex¬ 
pect  to  be  able  to  use  them  when  speaking. 
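Supporting relative dates means the application must resolve each phrase against the current date and, for follow-ups such as "the next day," against the date mentioned last. The Python sketch below handles only the three phrasings quoted above. The function names are hypothetical, and the reading of "a week from Monday" as the coming Monday plus seven days is an assumption that a real design would need to confirm with users.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(today, weekday_name):
    """Return the first date strictly after `today` that falls on the named weekday."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_ahead = (target - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def resolve_relative_date(phrase, today, last_date=None):
    """Map a spoken date phrase to a calendar date (three patterns only)."""
    words = phrase.lower().rstrip("?.!").split()
    if len(words) == 4 and words[:3] == ["a", "week", "from"]:
        # Assumption: "a week from Monday" = the coming Monday plus one week.
        return next_weekday(today, words[3]) + timedelta(weeks=1)
    if len(words) == 2 and words[0] == "this":
        # Assumption: "this Thursday" = the nearest Thursday, today included.
        target = WEEKDAYS.index(words[1])
        return today + timedelta(days=(target - today.weekday()) % 7)
    if words == ["the", "next", "day"] and last_date is not None:
        # Relative to the date most recently mentioned in the dialog.
        return last_date + timedelta(days=1)
    raise ValueError(f"unsupported date phrase: {phrase!r}")
```

Note that "the next day" cannot be resolved from the phrase alone. It depends on the dialog context, which is one reason conversational systems must keep track of what was said earlier.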

With  a  little  creativity,  you  can  find  or  create  situations  in  which
two  people  can  talk  about  the  target  task.  Although  you  might 
want  to  plan  out  the  interactions  a  little  bit,  care  must  be  taken 
not  to  use  scripts  or  memorized  phrases.  The  idea  with  this  type 
of  listening  is  to  allow  people’s  natural  social  instincts  to  drive  the
dialog.  If  possible,  record  the  natural  dialogs.  They  will  serve  as
the  basis  for  the  speech  interface  design.  Listening  in  this  way  is 
called  a  “natural  dialogue  study”  (Yankelovich,  in  press).  Using 
natural  dialogue  studies,  you  can  learn  (a)  about  concepts  used 
when  talking  about  the  task  that  are  absent  in  other  media,  (b) 
common  vocabulary  used  when  talking  about  the  task,  (c)  tone  of 
voice  that  is  considered  polite  in  the  context  of  the  task,  (d)  sentence
structure  typical  for  the  domain,  (e)  patterns  of  interaction,  and
(f)  methods  people  use  to  give  one  another  feedback.

Dialog  Styles 

Once  you  have  an  understanding  of  your  task  and  the  way  peo¬ 
ple  speak  in  this  domain,  you  will  need  to  decide  if  your  appli¬ 
cation  would  benefit  most  from  a  directed,  user-initiated,  or 
conversational  dialogue  style. 

Directed  dialogue  (system  initiated).  This  is  the  most 
commonly  used  style  of  interaction  in  speech-based  telephony 
systems  on  the  market  today.  With  a  directed  dialogue,  the  user 
is  instructed  or  “directed”  what  to  say  at  each  prompt  (Kamm, 
1994).  Systems  that  use  this  style  can  often  be  recognized  by  the 
use  of  the  word  “say”  in  the  prompts.  “Welcome  to  ABC  Bank. 
You  can  check  an  account  balance,  transfer  funds,  or  pay  a  bill. 
Say,  ‘Balance,’  ‘Transfer,’  or  ‘Pay.’”  The  reason  most  systems  use 
this  dialogue  style  is  to  increase  the  accuracy  of  the  speech 
recognition.  By  telling  the  user  what  to  say,  the  chances  of  the 
user  speaking  an  out-of-vocabulary  utterance  are  much  lower. 
This  style  is  sometimes  referred  to  as  “system  initiated”  because 
the  user  plays  a  more  passive  role,  waiting  for  the  system  to 
prompt  the  user  for  what  is  needed. 

User-initiated.  Unlike  a  directed  dialogue,  which  works 
well  for  novice  users,  user-initiated  systems  are  intended  for  use 
by  expert  users.  With  this  type  of  interaction  style,  the  system 
is  passive  and  the  user  is  not  prompted  for  specific  informa¬ 
tion.  For  example, 

Computer:  How  can  I  help  you?

User:  I  would  like  to  buy  100  shares  of  Amazon.com
at  the  market  price  with  funds  from  my  current
account  number  345198.

These  types  of  dialogues  do  not  work  well  with  new  users 
who  do  not  know  what  to  say,  and  tend  to  have  lower  recogni¬ 
tion  rates. 

Conversational  systems  (mixed  initiative).  The  type 
of  interaction  that  is  easiest  for  users  and  feels  the  most  natural 
is  a  mixed  initiative  system.  In  this  style  of  interaction,  the  sys¬ 
tem  and  the  user  take  turns  initiating  the  communication  de¬ 
pending  on  the  flow  of  the  conversation  and  the  status  of  the 
task.  With  a  mixed  initiative  system,  the  prompts  can  fluctuate 
between  open-ended  prompts  and  more  directed  prompts,  de¬ 
pending  on  the  context.  Here’s  a  portion  of  dialogue: 

Computer:  What  would  you  like  to  do?  (Open-ended)

User:  What  is  the  price  of  Fidelity  Magellan?  (User-initiated)

Computer:  Price  of  Fidelity  Magellan  is  $36.05  per  share.
What  else?

User:  I  want  to  buy  500  shares.  (Which  shares  to  buy
is  implied  in  the  previous  interaction.)

Computer:  Confirming  purchase  of  500  shares  of  Fidelity
Magellan  fund,  please  say,  “Yes”  or  “No.”  (Directed)

User:  No,  skip  the  purchase.  I’d  like  to  transfer
$1,000  from  my  index  fund.  (User-initiated)

Computer:  Purchase  canceled.  To  which  fund  would
you  like  to  transfer?  (Prompt  for  missing  information)


In  this  example,  we  see  the  system  functioning  as  a  cooper¬ 
ative  partner  in  the  conversation. 


Inherent  Challenges  of  Speech 

Designing  conversational  systems  poses  substantial  challenges. 
Understanding  these  challenges  and  assessing  the  various  trade¬ 
offs  that  must  be  made  during  the  design  process  will  help  to 
produce  the  most  effective  interface. 

By  its  nature,  speech  is  transient.  Once  you  hear  it  or  say  it, 
it’s  gone.  By  contrast,  graphics  are  persistent.  A  graphical  inter¬ 
face  typically  stays  on  the  screen  until  the  user  performs  some 
action.  Listening  to  speech  taxes  users’  short-term  memory.  Be¬ 
cause  speech  is  transient,  users  can  remember  only  a  limited 
number  of  items  in  a  list,  and  they  may  forget  important  infor¬ 
mation  provided  at  the  beginning  of  a  long  sentence.  Users’  lim¬ 
ited  ability  to  remember  transient  information  has  substantial 
implications  for  the  speech  interface  design.  In  general,  tran¬ 
sience  means  that  speech  is  not  a  good  medium  for  delivering 
large  amounts  of  information.  The  transient  nature  of  speech 
can  also  provide  benefits.  Because  people  can  look  and  listen 
at  the  same  time,  speech  is  ideal  for  grabbing  attention  or  for 
providing  an  alternate  mechanism  for  feedback.  Imagine  re¬ 
ceiving  a  notification  about  the  arrival  of  an  e-mail  message 
while  working  on  a  spreadsheet.  Speech  might  give  the  user  the 
opportunity  to  ask  for  the  sender  or  the  subject  of  the  message. 
The  information  can  be  delivered  without  forcing  the  user  to 
switch  contexts. 

Speech  is  also  invisible.  The  lack  of  visibility  makes  it  chal¬ 
lenging  to  communicate  the  functional  boundaries  of  an  appli¬ 
cation  to  the  user  (Yankelovich,  1996).  In  a  graphical  applica¬ 
tion,  menus  and  other  screen  elements  make  most  or  all  of  the 
functionality  of  an  application  visible  to  a  user.  By  contrast,  in  a 
speech  application,  it  is  much  more  difficult  to  indicate  to  the 
user  what  actions  they  may  perform,  and  what  words  and 
phrases  they  must  say  to  perform  those  actions. 


Prompt  Design 

The  problems  of  transience  and  invisibility  can  often  be  mitigated 
by  well-designed  prompts.  Many  factors  must  be  considered 
when  designing  prompts,  but  the  most  important  is  assessing  the 
trade-off  between  flexibility  and  performance.  The  more  you
constrain  what  the  user  can  say  to  an  application,  the  less  likely
they  are  to  encounter  recognition  errors.  On  the  other  hand, 
allowing  users  to  speak  information  flexibly  can  often  speed  the 
interaction  (if  recognition  succeeds),  feel  more  natural,  and 
avoid  forcing  users  to  memorize  commands.  Here  are  some  tips 
for  creating  useful  prompts. 

Use  explicit  prompts  when  the  user  input  must  be  tightly 
constrained.  For  example,  after  recording  a  message,  the 
prompt  might  be  “Say,  ‘Cancel,’  ‘Send,’  or  ‘Review.’”  This  sort
of  prompt  directs  the  user  to  say  just  one  of  those  three  keywords.
Even  conversational  systems  should  fall  back  on  this  type
of  prompt  when  it  is  critical  that  the  input  be  correctly  recognized,
as  with  transactions  that  involve  transferring  money  or
booking  travel.  “Are  you  sure  you  want  to  book  this  flight  from
JFK  to  LAX?  Please  say,  ‘Yes’  or  ‘No.’”

Use  implicit  prompts  when  the  application  is  able  to  accept 
input  that  is  more  flexible.  These  prompts  rely  on  conversational 
conventions  to  constrain  the  user  input.  For  example,  if  the  user 
says,  “Send  mail  to  Bill,”  and  “Bill”  is  ambiguous,  the  system 
prompt  might  be  “Did  you  mean  Bill  Smith  or  Bill  Jones?”  Users 
are  likely  to  respond  with  input  such  as  “Smith”  or  “I  meant  Bill 
Jones.”  While  possible,  conversational  convention  makes  it  less 
likely  that  they  would  say,  “Bill  Jones  is  the  one  I  want.” 

Using  variable  prompts  is  a  good  way  to  try  to  simulate  a 
human-human  conversation.  Given  a  certain  condition  or  state 
of  the  system  (e.g.,  the  ready  state,  or  a  system  response  to  si¬ 
lence),  it  is  preferable  not  to  play  the  exact  same  system 
prompt  every  time.  Subtle  variations  in  the  wording  impart  a 
much  more  natural  feel  to  the  interaction.  Note  the  following 
possibilities  for  the  ready  state  in  a  natural  language  understanding
system:  “What  now?”,  “I’m  ready  to  help,”  and  “What’s
next?”
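One simple way to implement variable prompts is to pick at random among the variants for a state while never repeating the one just played. The Python sketch below assumes that approach. The class name is hypothetical, and the variants are the three ready-state prompts listed above.

```python
import random

READY_PROMPTS = ["What now?", "I'm ready to help.", "What's next?"]


class VariablePrompt:
    """Cycle unpredictably through prompt variants without immediate repeats."""

    def __init__(self, variants, rng=None):
        if len(variants) < 2:
            raise ValueError("need at least two variants to vary the prompt")
        self.variants = list(variants)
        self.rng = rng or random.Random()   # injectable for repeatable tests
        self.last = None

    def next(self):
        # Exclude the prompt played last time, then choose among the rest.
        choices = [v for v in self.variants if v != self.last]
        self.last = self.rng.choice(choices)
        return self.last
```

Calling `VariablePrompt(READY_PROMPTS).next()` each time the system returns to the ready state gives wording that changes from turn to turn.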

Another  interaction  that  we  can  model  on  human  speech  is 
the  use  of  tapered  prompts.  Tapering  can  be  accomplished  in 
one  of  two  ways.  If  an  application  is  presenting  a  set  of  data  such 
as  current  quotes  for  a  stock  portfolio,  drop  out  unnecessary 
words  once  a  pattern  is  established.  For  example,  “As  of  15  minutes
ago,  Acme  Industries  was  trading  at  45  up  1/2,  BioStartup
was  at  83  up  1/2,  and  TechGiant  was  at  106  down  1/4.”  Tapering
can  also  happen  over  time — that  is,  if  you  need  to  tell  the  user 
the  same  information  more  than  once,  make  it  shorter  each 
time  the  function  is  used. 
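The first kind of tapering, dropping words once a pattern is established, is easy to express in code. The Python sketch below generates the stock-quote sentence from the example. The function name and the tuple layout for a quote are assumptions made for illustration.

```python
def tapered_quotes(quotes, age="15 minutes"):
    """Speak the first quote in full, then shorten the wording for the rest.

    `quotes` is a list of (name, price, change) tuples, all given as strings.
    """
    if not quotes:
        return ""
    parts = []
    for index, (name, price, change) in enumerate(quotes):
        if index == 0:
            # Full wording establishes the pattern.
            parts.append(f"As of {age} ago, {name} was trading at {price} {change}")
        else:
            # Later items drop the time stamp and the word "trading".
            parts.append(f"{name} was at {price} {change}")
    if len(parts) > 1:
        parts[-1] = "and " + parts[-1]
    return ", ".join(parts) + "."
```

Given the three quotes from the example, the function returns the tapered sentence shown in the text.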

One  way  to  speed  interaction  for  expert  users  and  provide 
help  for  less  experienced  users  is  to  use  incremental  prompts. 
This  technique  involves  starting  with  a  short  prompt.  If  the 
user  does  not  respond  within  a  time-out  period,  the  application 
prompts  again  with  instructions  that  are  more  detailed.  For  example,
the  initial  prompt  might  be,  “Which  service?”  If  the  user
says  nothing,  then  the  prompt  could  be  expanded  to  “Say 
‘Banking,’  ‘Address  book,’  or  ‘Yellow  pages.’”  Incremental 
prompts  have  been  used  successfully  in  a  number  of  systems, 
but  they  suffer  from  several  problems.  A  first-time  user  is  just  as 
likely  to  say  something  to  the  first  prompt  as  they  are  to  say 
nothing.  This  often  results  in  a  recognition  error.  Another  com¬ 
mon  pitfall  with  incremental  prompts  is  that  they  tend  to  cause 
collision  errors  where  both  the  system  and  the  user  speak  at 
the  same  time.  It  is  common  for  timing  problems  to  happen  re¬ 
peatedly  once  started,  increasing  the  user’s  frustration  along 
the  way. 
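The control flow of incremental prompting is a short loop: play the brief prompt, wait, and escalate to the detailed prompt only on a time-out. The Python sketch below assumes two hypothetical callbacks supplied by the speech platform: `speak(prompt)` plays a prompt, and `listen()` returns the recognized text or `None` if the time-out period passes in silence.

```python
INCREMENTAL_PROMPTS = [
    "Which service?",
    "Say 'Banking,' 'Address book,' or 'Yellow pages.'",
]


def run_incremental_prompt(prompts, listen, speak):
    """Play prompts from shortest to most detailed until the user responds."""
    for prompt in prompts:
        speak(prompt)
        response = listen()          # assumed to return None on time-out
        if response is not None:
            return response          # expert users exit after the short prompt
    return None                      # still silent after the detailed prompt
```

The sketch does nothing about the pitfalls noted above. A first-time user who answers the short prompt with an out-of-vocabulary phrase still produces a recognition error, and collision errors depend on timing that the platform controls.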


Providing  Help 

There  are  two  basic  types  of  help.  In  the  first  case,  the  user 
initiates  by  requesting  help.  Common  queries  for  help  in 
speech  systems  are,  “Help”,  “What  can  I  say?”,  or  “What  are 
my  choices?”  It  is  always  a  good  idea  to  support  several  varia¬ 
tions  of  this  request  in  the  grammar.  In  the  second  type  of 
help,  the  system  detects  that  the  user  is  in  trouble  and  auto¬ 
matically  presents  help.  Although  not  all  forms  of  user  prob¬ 
lems  are  detectable  by  the  system,  one  can  presume  if  a  series 
of  errors  or  long  silences  occur  that  the  user  probably  needs 
some  assistance. 
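Both kinds of help can be driven by one small monitor that recognizes explicit requests and counts consecutive problems. The Python sketch below is a minimal version under stated assumptions: the class name is hypothetical, the threshold of two consecutive errors or silences is an arbitrary choice, and a real grammar would accept more variations of the help request.

```python
HELP_REQUESTS = {"help", "what can i say", "what are my choices"}


class HelpMonitor:
    """Decide when to present help: on request, or after repeated trouble."""

    def __init__(self, max_problems=2):
        self.max_problems = max_problems   # assumed threshold, tune per application
        self.problems = 0                  # consecutive errors or silences so far

    def needs_help(self, utterance, recognized):
        """`utterance` is None for silence; `recognized` is False on a recognition error."""
        if utterance is not None and utterance.lower().strip(" ?.!") in HELP_REQUESTS:
            self.problems = 0
            return True                    # user-initiated help
        if utterance is None or not recognized:
            self.problems += 1
        else:
            self.problems = 0              # a successful turn resets the count
        if self.problems >= self.max_problems:
            self.problems = 0
            return True                    # system-initiated help
        return False
```

The monitor only signals that help is needed. What the help says should still depend on the user's current task.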

Help,  when  given,  needs  to  be  specific  to  the  user’s  current 
task.  Few  things  are  as  frustrating  as  being  stuck  and  asking  for 
help  only  to  find  that  the  information  presented,  while  long  and 
detailed,  has  no  bearing  on  the  current  situation.  A  good  ap¬ 
proach  for  help  is  to  provide  the  user  with  an  example  of  a  suc¬ 
cessful  utterance  just  as  the  user  starts  the  downward  spiral  of 
repeated  errors.  If  several  errors  have  occurred  in  a  row,  the  sys¬ 
tem  should  present  the  user  with  sample  phrasing  that  would  be 
expected  to  work,  based  on  the  user’s  current  context.  For  example,
a  user  could  say,  “I  want  to  sell  300  shares  of  Ultrasoft.”  If  the  sys¬ 
tem  cannot  determine  the  exact  context,  another  option  is  to  list 
the  functions  that  are  available  at  the  current  branch  and  have  the 
user  point  the  system  in  the  right  direction.  Here’s  an  example: 


Computer:  Let’s  start  over.  You  can  forward,  reply,  delete,
or  create  a  new  message.  What  do  you  want
to  do?

User:  I  want  to  forward  a  message.

Computer:  To  forward  a  message,  say  the  first  and  last
name  of  another  subscriber.  For  example,  say,
“Forward  this  message  to  John  Smith.”

User:  Forward  this  message  to  Alishia  Grant.


If  the  system  returns  a  recognition  error  now,  it  is  easy  to  es¬ 
tablish  that  the  name  is  at  fault  because  we  already  know  the 
function.  This  dialogue  might  continue: 


Computer:  Oh  dear.  I  didn’t  catch  that  name.  Just  say  the
first  name.

User:  Alishia.

Computer:  I  think  you  said,  “Alice.”  Is  that  correct?

User:  No,  that  is  not  correct.

Computer:  Thanks  for  your  patience  with  this.  Please  spell
the  first  name.

User:  A-L-I-S-H-I-A.


Now  the  system  can  repeat  a  similar  process  for  the  last  name, 
or  can  look  up  all  subscribers  with  a  first  name  of  Alishia  and  offer 
the  user  a  set  of  possible  last  names  from  which  to  choose. 
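
The second option, looking up every subscriber named Alishia and offering the matching last names, can be sketched as follows. This is only an illustration under stated assumptions: the `Subscriber` record, the `DIRECTORY` contents, and the prompt wording are hypothetical and do not come from any particular speech platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    first_name: str
    last_name: str


def last_names_for(directory, first_name):
    """Return the sorted, de-duplicated last names of subscribers with this first name."""
    wanted = first_name.strip().lower()
    return sorted({s.last_name for s in directory if s.first_name.lower() == wanted})


def last_name_prompt(first_name, choices):
    """Build a directed prompt from the candidate last names."""
    if not choices:
        return f"I couldn't find a subscriber named {first_name}. Please spell the last name."
    if len(choices) == 1:
        return f"Did you mean {first_name} {choices[0]}? Please say 'Yes' or 'No.'"
    return f"Which {first_name}? You can say: " + ", ".join(choices) + "."


# Hypothetical directory used only for illustration.
DIRECTORY = [
    Subscriber("Alishia", "Grant"),
    Subscriber("Alishia", "Moore"),
    Subscriber("Alice", "Grant"),
]
```

Offering a short list of known last names turns an open-ended recognition problem into a choice among a few candidates, which is usually easier for the recognizer.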


Handling  Errors 

How  a  system  handles  recognition  errors  can  dramatically  affect 
the  quality  of  a  user’s  experience.  If  either  the  application  or  the 
user detects an error, an effective speech user interface should
provide  one  or  more  mechanisms  for  correcting  the  error.  Al¬ 
though  this  seems  obvious,  correcting  a  speech  input  error  is 
not  always  easy!  If  the  user  speaks  a  word  or  phrase  again,  the 
same  error  could  reoccur,  depending  on  the  cause  of  the  error 
(Yankelovich,  Levow,  &  Marx,  1995). 

Recognition  errors  can  be  divided  into  three  categories: 
(a)  rejection,  (b)  substitution,  and  (c)  insertion  (Schmandt, 
1994;  Ballentine  &  Morgan,  1999).  A  rejection  error  occurs 
when  the  recognizer  has  no  hypothesis  about  what  the  user 
said.  A  substitution  error  involves  the  recognizer  mistaking  the 
user’s  utterance  for  a  different  valid  utterance,  as  when  “send  a 
message”  is  interpreted  as  “seventh  message.”  With  an  insertion 
error,  the  recognizer  either  interprets  noise  as  a  valid  utterance, 
or  decodes  multiple  words  when  only  one  was  spoken.  This  can 
be  caused  by  other  people  talking  nearby  or  by  the  user  inad¬ 
vertently  tapping  the  telephone  or  microphone. 

Rejection errors. In handling rejection errors, designers want to
avoid the “brick wall” effect, in which every rejection is met with
the same “I didn’t understand” response. Users get frustrated very
quickly when faced with repetitive error messages. Instead, give
progressive assistance: a short error message the first couple of
times and, if errors persist, more detailed assistance. For example,
here is one progression of error messages that a user might
encounter: “Sorry?”, “What did you say?”, “Sorry. Please rephrase.”,
“I didn’t understand. Speak clearly, but do not overemphasize.”,
“Still no luck. Wait for the prompt tone before speaking.”
Progressive assistance does more than bring the error to the user’s
attention; it guides the user toward speaking a valid utterance with
successively more informative error messages that consider possible
causes of the error.
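
Progressive assistance can be sketched as a counter of consecutive rejections that selects an increasingly detailed message. The prompt texts are the ones quoted above; the `RejectionHandler` class and its method names are hypothetical, chosen only for this illustration.

```python
PROGRESSIVE_PROMPTS = [
    "Sorry?",
    "What did you say?",
    "Sorry. Please rephrase.",
    "I didn't understand. Speak clearly, but do not overemphasize.",
    "Still no luck. Wait for the prompt tone before speaking.",
]


class RejectionHandler:
    """Pick a more detailed error message each time a rejection repeats."""

    def __init__(self, prompts=PROGRESSIVE_PROMPTS):
        self.prompts = list(prompts)
        self.consecutive_rejections = 0

    def on_rejection(self):
        # Stay on the most detailed message once the list is exhausted.
        index = min(self.consecutive_rejections, len(self.prompts) - 1)
        self.consecutive_rejections += 1
        return self.prompts[index]

    def on_success(self):
        # A recognized utterance ends the error spiral, so start over.
        self.consecutive_rejections = 0
```

Resetting the counter on success matters: a user who recovers and later hits a new rejection should hear the short message again, not the most detailed one.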

Substitution errors. Although rejection errors are frustrating,
substitution errors can be damaging. If the user asks a weather
application for “Kauai,” but the recognizer hears “Goodbye” and hangs
up, the interaction could be completely terminated. In situations
like this, the system should explicitly verify that the user’s
utterance was correctly understood. Verification should be
commensurate with the cost of the action that would be effected by
the recognized utterance. Reading the wrong stock quote or calendar
entry will make the user wait a few seconds, but hanging up or
sending a confidential message to the wrong person by mistake could
have serious consequences.
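
One way to make verification commensurate with cost is to tag each action with a cost level and confirm only the costly ones. The sketch below assumes a two-level scale and hypothetical action names; the cost assignments follow the examples in the paragraph above.

```python
from enum import Enum


class Cost(Enum):
    LOW = "low"    # a mistake costs the user a few seconds
    HIGH = "high"  # a mistake is hard or impossible to undo


# Hypothetical action names; the cost assignments follow the examples in the text.
ACTION_COST = {
    "read_stock_quote": Cost.LOW,
    "read_calendar_entry": Cost.LOW,
    "hang_up": Cost.HIGH,
    "send_message": Cost.HIGH,
}


def needs_explicit_confirmation(action):
    """Confirm costly actions; treat unknown actions as costly to stay safe."""
    return ACTION_COST.get(action, Cost.HIGH) is Cost.HIGH


def respond(action, argument=""):
    """Return either a confirmation question or a marker that the action may proceed."""
    if needs_explicit_confirmation(action):
        detail = f" {argument}" if argument else ""
        return f"I think you want to {action.replace('_', ' ')}{detail}. Is that correct?"
    return f"OK: {action}"
```

Defaulting unknown actions to the high-cost path is a design choice made for this sketch: an unnecessary confirmation is a minor annoyance, whereas an unconfirmed destructive action may not be recoverable.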

Insertion errors. Spurious recognition typically occurs because of
background noise. The illusory utterance will either be rejected or
mistaken for an actual command; in either case, the previous methods
can be applied. The real challenge is to prevent insertion errors.
One option is to provide users with a keypad command that turns off
the speech recognizer so that they can talk to someone, sneeze, or
simply gather their thoughts. Pressing the keypad command again can
restart the recognizer with a simple prompt, such as “What now?”, to
indicate that the recognizer is listening again.

Whatever the type of error, a general technique for avoiding errors
in the first place is to filter recognition results for unlikely user
input. For example, a scheduling application might assume that an
error has occurred if the user appears to want to schedule a meeting
for three o’clock in the morning.
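
The scheduling example can be sketched as a plausibility filter applied to the recognition result before the application acts on it. The working-day bounds and function names below are assumptions made for illustration; a real application would choose its own limits.

```python
from datetime import time

# Assumed bounds for a plausible meeting; a real application would make these configurable.
EARLIEST = time(7, 0)
LATEST = time(20, 0)


def is_plausible_meeting_time(start, earliest=EARLIEST, latest=LATEST):
    """Return True if the recognized start time falls inside the expected window."""
    return earliest <= start <= latest


def handle_recognized_time(start):
    """Act on plausible times; ask for confirmation when the result looks like an error."""
    spoken = start.strftime("%H:%M")
    if is_plausible_meeting_time(start):
        return f"Scheduling the meeting for {spoken}."
    return f"Did you say {spoken}? Please say 'Yes' or 'No.'"
```

The filter asks rather than rejects, because an unlikely request is not necessarily a recognition error.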

Correction  Strategies 

If errors do occur, it is important to provide a means for the user
to correct them (assuming the user notices them). Flexible correction
mechanisms that allow a user to correct a portion of the input are
helpful. For example, if the user asks for a weather forecast for
Boston for Tuesday, the system might respond, “Thursday’s weather for
Boston is ...” A flexible correction mechanism would allow the user
to correct just the day: “No, I said Tuesday.”

When possible, using an alternate form of input, such as the
telephone keypad, can alleviate the user’s frustration. If the user
is at a prompt where three choices are available and has encountered
several rejection errors, the user could be instructed: “Press any
key when you hear the option you want.” The telephone keypad also
works well when the requested input is numeric (e.g., telephone
numbers, account or social security numbers). Getting users to type
alphabetic text using a telephone keypad is not a good idea; speech
systems are usually recommended precisely to avoid this type of
input.
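
The flexible correction in the weather example (“No, I said Tuesday”) can be sketched as a slot update: the request is held as a set of named slots, and a correction overrides only the slots it mentions. The slot names and the `apply_correction` helper are hypothetical.

```python
def apply_correction(previous, correction):
    """Return a new request in which only the corrected slots are overridden.

    Slots that the correction leaves out (or sets to None) keep their earlier values,
    so the user does not have to repeat the whole request.
    """
    updated = dict(previous)
    updated.update({slot: value for slot, value in correction.items() if value is not None})
    return updated


# The misrecognized request from the example: Boston was right, the day was wrong.
misheard = {"city": "Boston", "day": "Thursday"}
fixed = apply_correction(misheard, {"day": "Tuesday"})
```

Returning a new dictionary rather than mutating the old one keeps the misrecognized request available, for example for logging recognition errors.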

Another  strategy  is  to  have  the  system  take  its  best  guess  at 
the  requested  function.  This  is  a  good  tactic  to  take  when  the 
number  of  functions  enabled  at  a  particular  branch  in  the  dia¬ 
logue  is  too  large  to  list  for  the  user.  A  reasonable  prompt  is,  “I 
think  you  are  trying  to  create  a  message,  is  that  correct?”  If  the 
user  answers  in  the  affirmative,  the  conversation  moves  forward 
and  the  system  can  present  the  user  with  a  sample  valid  utter¬ 
ance  for  that  function.  If  the  best  guess  is  wrong,  however,  it  is 
not  a  good  idea  to  keep  iterating  through  the  N-best  choices  be¬ 
cause  this  only  leads  to  user  frustration.  If  the  response  to  the 
best  guess  is  negative,  a  better  solution  is  to  reprompt  with  a  re¬ 
stricted  set  of  choices.  Be  sure  to  eliminate  the  choice  that  is  def¬ 
initely  wrong  (e.g.,  creating  a  message  in  the  previous  example). 
The  goal  is  to  move  away  from  a  very  general  prompt  such  as, 
“I’m  sorry,  I  do  not  understand.  Please  try  again”  towards  a  di¬ 
rected  prompt  that  will  increase  the  likelihood  of  success.  A  se¬ 
ries  of  errors  in  a  row  is  a  clear  indication  that  simply  having  the 
user  repeat  the  utterance,  or  rephrase  it,  is  not  working. 

The best-guess tactic can be combined with another correction
strategy, switching to a more constrained grammar, to increase its
likelihood of success. For example, in the prior prompt, a directive
stating which utterances are available to the user can be added: “I
think you are trying to create a message. Is that correct? Please
say, ‘Yes’ or ‘No.’”
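
The recovery path after a wrong best guess can be sketched as follows: rather than walking down the N-best list, the system drops the rejected guess and reprompts with a short, restricted set of choices. The function names, the `max_choices` limit, and the prompt wording are assumptions made for this illustration.

```python
def join_choices(choices):
    """Join choices for a spoken prompt: 'a', 'a or b', 'a, b, or c'."""
    if len(choices) <= 1:
        return "".join(choices)
    if len(choices) == 2:
        return f"{choices[0]} or {choices[1]}"
    return ", ".join(choices[:-1]) + f", or {choices[-1]}"


def reprompt_after_wrong_guess(ranked_functions, rejected_guess, max_choices=3):
    """Build a directed reprompt that excludes the guess the user just rejected."""
    remaining = [f for f in ranked_functions if f != rejected_guess][:max_choices]
    if not remaining:
        return remaining, "What do you want to do?"
    return remaining, f"You can {join_choices(remaining)}. What do you want to do?"
```

Limiting the list to a few items keeps the prompt short enough to hold in memory, and the returned `remaining` list could also serve as the constrained grammar for the next recognition attempt.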


TESTING  AND  ITERATING 


Once the preliminary application design is complete, a Wizard of Oz
study can help test and refine the interface. The speech data
collected from this study can be used to help refine the grammar
(Rudnicky, 1995) or, in the case of an NLU system, to statistically
train the engine on potential utterances. In these studies, a human
wizard, often using software tools, simulates the speech interface.
Major usability problems are often uncovered with these types of
simulations.

A Wizard of Oz study usually involves bringing participants into a
lab and telling them they will be interacting with a computer. If it
is a telephony application, they can be asked to call a telephone
number, at which point a human “wizard” picks up the phone and
manipulates software so that recordings of synthesized speech are
played to the participant. As the participant makes requests to the
computer, the wizard carries out the operations and has the computer
speak the responses. Often, none of the participants suspect that
they are not interacting with a real speech system (Dahlback, 1993).
Because computer tools are usually necessary to carry out a
convincing simulation, Wizard of Oz studies are more time-consuming
and complicated to run than natural dialogue studies. If a prototype
of the final application can be built quickly, it may be more
cost-effective to move directly to a usability study.

With speech applications, usability studies are particularly
important for uncovering problems caused by recognition errors, which
are difficult to simulate effectively in a Wizard of Oz study but are
a leading cause of usability problems. The effectiveness of an
application’s error recovery functionality must be tested in the
environments in which real users will use the application.
Conducting  usability  tests  of  speech  applications  can  be  a  bit 
tricky.  Two  standard  techniques  used  in  tests  of  graphical  appli¬ 
cations — (a)  facilitated  discussions  and  (b)  speak-aloud  proto¬ 
cols — cannot  be  used  effectively  with  speech  applications.  It  is 
best  to  have  study  participants  work  in  isolation,  speaking  only 
into  a  telephone  or  microphone.  A  tester  should  not  intervene 
unless  the  participant  becomes  completely  stuck.  A  follow-up 
interview  can  be  used  to  collect  the  participant’s  comments  and 
reactions.  Ultimately,  the  system  needs  to  be  tested  with  real 
users,  accomplishing  real  tasks  in  a  realistic  environment.  This 
type  of  testing  will  provide  data  that  will  allow  the  designer  to 
modify  the  grammar  and  prompts  as  necessary. 


CONCLUSION 


The design and development life cycle for an effective speech
application differs from that of a traditional GUI application. An
effective speech application is one that uses speech to enhance a
user’s performance of a task or to enable an activity that cannot be
done without it; however, the design process does share common
elements with the design of a successful GUI application, such as the
need to first understand the task that the users are trying to
accomplish. In the case of speech, modeling the task can usually be
accomplished by listening to how humans accomplish the
task  today.  Sometimes  this  involves  listening  to  the  interactions 
between  a  call  center  employee  and  a  caller;  sometimes  it  in¬ 
volves  constructing  a  situation  between  two  humans  and  asking 
them  to  converse  to  accomplish  the  task.  Observing  users  during 
the  task-modeling  phase  helps  a  designer  to  understand  who  the 
users  are  and  what  their  goals  are.  Listening  carefully  to  users 
while  conducting  natural  dialogue  studies  shows  how  they  speak 
in  the  context  of  the  task.  A  natural  dialogue  study  also  ensures 
that  prompts  and  feedback  follow  the  conversational  conven¬ 
tions  that  users  expect  in  a  cooperative  interaction. 

Once  the  task  is  modeled  and  well  understood,  and  the  busi¬ 
ness  goals  have  been  defined,  the  designer/developer  must  se¬ 
lect  the  appropriate  technology  for  the  task,  users  and  context 
of  use.  Conversational  interfaces  that  use  NLU  technologies 
capitalize  on  human  expertise  in  interacting  with  each  other 
through  language.  Not  all  tasks  will  be  able  to  use  NLU,  given 
the  resource  requirements  and  the  need  for  a  constrained  do¬ 
main;  however,  even  speech  applications  that  are  built  with 
grammars  can  carry  many  of  the  usability  advantages  of  NLU  and 
have  a  conversational  feel  to  them,  if  the  prompts  are  carefully 
crafted.  Successful  dialogs  will  move  between  user-initiated, 
system-initiated,  and  mixed-initiative  styles  depending  on  the 
state  of  the  task  and  the  history  of  the  interaction. 

Another  key  to  the  success  of  the  design  of  an  effective 
speech  user  interface  is  to  follow  an  iterative  process.  Not  even 
the  most  experienced  designer  will  craft  a  perfect  dialogue  in 
the first iteration. Once an application is designed, Wizard of
Oz  and  usability  studies  provide  opportunities  to  test  interac¬ 
tion  techniques  and  refine  application  behavior  based  on  feed¬ 
back  from  prototypical  users.  To  ensure  that  the  system  will 
be  used  over  time,  the  designer  must  (a)  focus  on  users  and 
(b)  modify  the  design  as  new  data  is  collected. 

Finally,  testing  the  design  with  target  users  ensures  that  the 
prompts  are  clear,  that  feedback  is  appropriate,  and  that  errors 
are  caught  and  corrected.  In  addition,  as  part  of  the  testing,  ver¬ 
ifying  that  the  design  accomplishes  the  business  goals  that  were 
set  out  to  be  achieved  helps  the  application  gain  approval  from 
the  sponsors.  If  problems  are  uncovered  during  testing,  the  de¬ 
sign  should  be  revised  and  tested  again.  By  focusing  on  users 
and  iterating  on  the  design,  one  can  produce  an  effective,  pol¬ 
ished  speech  interface  design. 

Design of multimedia interfaces currently leaves a lot to be desired.
As with many emerging technologies, it is the fascination with new
devices, functions, and forms of interaction that has motivated
design rather than ease of use, or even utility of practical
applications. Poor usability limits the effectiveness of multimedia
products, which might look good but do not deliver effective use
(Scaife, Rogers, Aldrich, & Davies, 1997). The multimedia market has
progressed beyond the initial hype, and customers are looking for
well-designed, effective, and mature products.

The  distinguishing  characteristics  of  multimedia  are  infor¬ 
mation-intensive  applications  that  have  complex  design  spaces 
for  presenting  information  to  people.  Designers,  therefore, 
must  start  by  modeling  information  requirements.  This  chap¬ 
ter  describes  a  design  process  that  starts  with  an  information 
analysis  and  then  progresses  to  deal  with  issues  of  media  selec¬ 
tion  and  integration.  The  background  to  the  method  and  its 
evolution  with  experience  can  be  found  in  several  publications 
(Sutcliffe  &  de  Angeli,  2005;  Faraday  &  Sutcliffe,  1996,  1997b, 
1998b;  Sutcliffe  &  Faraday,  1994).  A  more  detailed  description 
is  given  in  Sutcliffe  (2003).  The  time-to-market  pressure  gives  lit¬ 
tle  incentive  for  design,  so  at  first  reading,  a  systematic  approach 
may  seem  to  be  counter  to  the  commercial  drivers  of  develop¬ 
ment;  however,  I  would  argue  that  if  multimedia  design  does 
not  adopt  a  usability  engineering  approach,  it  will  fail  to  deliver 
effective  and  usable  products. 

Multimedia  applications  have  significant  markets  in  educa¬ 
tion  and  training,  although  dialogue  in  many  systems  is  re¬ 
stricted  to  drill  and  quiz  interaction  and  simple  navigation.  This 
approach,  however,  is  oversimplified:  for  training  and  educa¬ 
tion,  interactive  simulations  and  microworlds  are  more  effective 
(Rogers  &  Scaife,  1998).  Multimedia  has  been  used  extensively 
in  task-based  applications  in  process  control  and  safety  critical 
systems  (Alty,  1991;  Hollan,  Hutchins,  &  Weitzman,  1984);  how¬ 
ever,  most  transaction  processing  applications  are  currently 
treated  as  standard  interfaces  rather  than  multimedia-based  de¬ 
signs.  With  the  advent  of  the  web  and  e-commerce,  this  view 
may  change. 

Design issues for multimedia user interfaces (UIs) expand
conventional definitions of usability (e.g., International Standards
Organization, 1997) into five components:

1.  Operational  usability  is  the  conventional  sense  of  usability 
that  concerns  design  of  graphical  user  interface  features  such 
as  menus,  icons,  metaphors,  and  navigation  in  hypermedia. 

2.  Information  delivery  is  a  prime  concern  for  multimedia  or 
any  information-intensive  application,  and  raises  issues  of 
media  selection,  integration,  and  design  for  attention. 

3.  Learning:  Training  and  education  are  both  important  mar¬ 
kets  for  multimedia;  hence,  learnability  of  the  product  and 
its  content  are  key  quality  attributes.  Design  of  educational 
technology,  however,  is  a  complex  subject  in  its  own  right, 
and  multimedia  is  only  one  part  of  the  design  problem. 

4.  Utility:  In  some  applications,  utility  will  be  the  functionality 
that  supports  the  user’s  task;  in  others,  information  delivery 
and  learning  will  represent  the  value  perceived  by  the  user. 

5. Engagement and attractiveness: The attractiveness of multimedia
is now a key factor, especially for websites. Multimedia interfaces
have to attract users and deliver a stimulating user experience, as
well as being easy to use and learn.

Multimedia  design  involves  several  specialisms  that  are  tech¬ 
nical  subjects  in  their  own  right.  For  instance,  design  of  text  is 
the  science  (or  art)  of  calligraphy  that  has  developed  new  fonts 
over  many  years;  visualization  design  encompasses  the  creation 
of  images,  either  drawn  or  captured  as  photographs.  Design  of 
moving  images,  cartoons,  video,  and  film  are  further  specializa¬ 
tions,  as  are  musical  composition  and  design  of  sound  effects. 
Multimedia design lies on an interesting cultural boundary between
the creative artistic community and science-based engineering. One
implication of this cultural collision is that space precludes
dealing in depth in this chapter with “within media” design, that is,
guidelines for the design of one particular medium.
Successful  multimedia  design  often  requires  teams  of  specialists 
who  contribute  from  their  own  skill  sets  (Kristof  &  Satran,  1995; 
Mullet  &  Sano,  1995). 


DEFINITIONS  AND  TERMINOLOGY 


Multimedia  essentially  extends  the  graphical  user  interface  (GUI) 
paradigm  by  providing  a  richer  means  of  representing  informa¬ 
tion  for  the  user  by  use  of  image,  video,  sound,  and  speech. 
Some  views  of  what  constitutes  multimedia  can  be  found  in 
Bernsen  (1994),  who  proposed  a  taxonomy  of  analog  versus  dis¬ 
crete  media,  which  he  called  modalities,  as  well  as  visual,  audio, 
and  tactile  dimensions.  Heller  and  Martin  (1995)  took  a  more 
conventional  view  of  classifying  image,  text,  video,  and  graphics 
for  educational  purposes.  The  following  definitions  broadly  fol¬ 
low  those  in  the  ISO  standard  14915  on  Multimedia  User  Inter¬ 
face  Design  (International  Standards  Organization,  1998).  The 
starting  point  is  to  ask  about  the  difference  between  what  is  per¬ 
ceived  by  someone  and  what  is  stored  on  a  machine. 

Communication  concepts  in  multimedia  can  be  separated  into 

•  Message:  The  content  of  communication  between  a  sender 
and  receiver. 

•  Medium  (plural,  media):  The  means  by  which  that  content  is 
delivered.  Note  that  this  is  how  the  message  is  represented 
rather  than  the  technology  for  storing  or  delivering  a  mes¬ 
sage.  There  is  a  distinction  between  perceived  media  and 
physical  media  such  as  CD-ROM,  hard  disk,  and  so  forth. 

•  Modality:  The  sense  by  which  a  message  is  sent  or  received 
by  people  or  machines.  This  refers  to  the  senses  of  vision, 
hearing,  touch,  smell,  and  taste. 

A  message  is  conveyed  by  a  medium  and  received  through  a 
modality.  A  modality  is  the  sensory  channel  that  we  use  to  send 
and  receive  messages  to  and  from  the  world — essentially,  our 
senses.  Two  principal  modalities  are  used  in  human-computer 
communication: 

1.  Vision:  All  information  received  through  our  eyes,  including 
text  and  image-based  media. 

2.  Hearing:  All  information  received  through  our  ears,  as 
sound,  music,  and  speech. 

In the future, as multimedia converges with Virtual Reality (VR), we
will use other modalities more frequently: haptic (sense of touch),
kinaesthetic (sense of body posture and balance), gustation (taste),
and olfaction (smell). These issues are dealt with in chapters 21,
Multimodal interfaces (Oviatt, Darves, & Coulston, 2004), and 31,
Virtual environments (Stanney).

Defining  a  medium  is  not  simple  because  it  depends  on  how 
it  was  captured  in  the  first  place,  how  it  was  designed,  and  how 
it  has  been  stored.  For  example,  a  photograph  can  be  taken  on 
film,  developed,  and  then  scanned  into  a  computer  as  a  digi¬ 
tized  image.  The  same  image  may  have  been  captured  directly 
by  a  digital  camera  and  sent  to  a  computer  as  an  e-mail  file.  At 
the  physical  level,  media  may  be  stored  by  different  techniques. 

Physical media storage has usability implications for the quality of
image and response time in networked multimedia. A screen image with
640 × 480 VGA resolution using 24 bits per pixel for good color
coding gives 921,600 bytes, so at 30 frames per second, one second
needs around 27 megabytes of memory or disk space. Compression
algorithms, for instance, MPEG (Moving Picture Experts Group), reduce
this by a factor of 10. Even so, storing more than a few minutes of
moving image consumes hundreds of megabytes. The usability trade-off
is between the size of
the  display  footprint  (e.g.,  window  size),  the  resolution  mea¬ 
sured  in  dots  per  inch,  and  the  frame  rate.  The  ideal  might  be 
full-screen  high  resolution  (600  dpi)  at  30  frames  per  second; 
with  current  technology,  a  10-cm  window  at  300  dpi  and 
15  frames  per  second  is  more  realistic.  Physical  image  media 
constraints  become  more  important  on  networks,  when  band¬ 
width  will  limit  the  desired  display  quality.  Sound,  in  compari¬ 
son,  is  less  of  a  problem.  Storage  demands  depend  on  the  fi¬ 
delity  required  for  replay.  Full  stereo  with  a  complete  range  of 
harmonic  frequencies  consumes  only  100  kilobytes  for  five  min¬ 
utes,  so  there  are  few  technology  constraints  on  delivery  of 
high-quality  audio. 
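
The video storage arithmetic earlier in this section can be checked with a short calculation. For a 640 × 480 frame at 24 bits per pixel and 30 frames per second, it gives 27,648,000 bytes (about 26 MiB) for one uncompressed second. The function names are hypothetical, and the fixed compression ratio of 10 is the rough factor quoted in the text, not a property of any particular codec.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def video_bytes(width, height, bits_per_pixel, fps, seconds, compression_ratio=1):
    """Approximate storage for a clip, optionally divided by a fixed compression ratio."""
    raw = frame_bytes(width, height, bits_per_pixel) * fps * seconds
    return raw // compression_ratio


one_frame = frame_bytes(640, 480, 24)                     # 921,600 bytes
one_second_raw = video_bytes(640, 480, 24, 30, 1)         # 27,648,000 bytes
three_minutes = video_bytes(640, 480, 24, 30, 180, 10)    # 497,664,000 bytes
```

Under these assumptions, three minutes of video still occupies roughly half a gigabyte after tenfold compression, which is why window size, resolution, and frame rate have to be traded off against one another.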


COGNITIVE  BACKGROUND 


The  purpose  of  this  section  is  to  give  a  brief  overview  of  cogni¬ 
tive  psychology  as  it  affects  multimedia  design.  More  details  can 
be  found  in  section  II,  Humans  in  HCI. 

Perception  and  Comprehension 

Generally,  our  eyes  are  drawn  to  moving  shapes,  then  complex, 
different,  and  colorful  objects.  Visual  comprehension  can  be 
summarized  as  “what  you  see  depends  on  what  you  look  at  and 
what  you  know.”  Multimedia  designers  can  influence  what  users 
look  at  by  controlling  attention  with  display  techniques  such  as 
use of movement, highlighting, and salient icons; however, designers
should be aware that the information people assimilate from an image
also depends on their internal motivations, what they want to find,
and how well they know the domain (Treisman, 1988). A novice will not
see interesting plant species in a tropical jungle, whereas a trained
botanist will. Selection of visual content therefore has to take the
user’s knowledge and task into account.


Because  the  visual  sense  receives  information  continuously, 
it  is  overwritten  in  working  memory  (Baddeley,  1986).  This 
means  that  memorization  of  visually  transmitted  information  is 
not  always  effective  unless  users  are  given  time  to  view  and 
comprehend  images.  Furthermore,  users  only  extract  very  high- 
level  or  “gist”  (general  sense)  information  from  moving  images. 
Interpreting  visual  information  relies  on  memory.  In  realistic 
images,  this  process  is  automatic;  however,  with  nonrealistic 
images,  we  have  to  think  carefully  about  the  meaning,  for  ex¬ 
ample,  to  interpret  a  diagram.  While  extraction  of  information 
from  images  is  rapid,  it  does  vary  according  to  the  complexity  of 
the  image  and  how  much  we  know  about  the  domain.  Sound 
is  a  transient  medium,  so  unless  it  is  processed  quickly,  the  mes¬ 
sage  can  be  lost.  Even  though  people  are  remarkably  effective  at 
comprehending  spoken  language  and  can  interpret  other  sounds 
quickly,  the  audio  medium  is  prone  to  interference  because 
other  sounds  can  compete  with  the  principal  message.  Because 
sound  is  transient,  information  in  speech  will  not  be  assimilated 
in  detail,  and  so  only  the  gist  will  be  memorized  (Gardiner  & 
Christie,  1987). 

Selective  Attention 

We  can  only  attend  to  a  limited  number  of  inputs  at  once.  While 
people  are  remarkably  good  at  integrating  information  received 
by different senses (e.g., watching a film and listening to the
sound track), limits are determined by the psychology of human
information processing (Wickens, Sandry, & Vidulich, 1983). Our
attention is selective and closely related to perception; for
instance, we can overhear a conversation in a room with many people
speaking (the cocktail party effect). Furthermore, selective
attention differs between individuals and can be improved by
learning. For example, a conductor can distinguish the different
instruments in an orchestra, whereas a typical listener cannot.
However, all users have cognitive resource limitations, which means
that information delivered on different modalities (e.g., by vision
and sound) has to compete for the same resource. For instance, speech
and printed text both require a
language-understanding  resource,  while  video  and  a  still  image 
use  image-interpretation  resources.  Cognitive  models  of  infor¬ 
mation  processing  architectures  (e.g.,  Interacting  Cognitive 
Subsystems;  Barnard,  1985)  can  show  that  certain  media  com¬ 
binations  will  not  result  in  effective  comprehension  because 
they  compete  for  the  same  cognitive  resources,  thus  creating  a 
processing  bottleneck.  We  have  two  main  perceptual  channels 
for  receiving  information:  (a)  vision  and  (b)  hearing.  Informa¬ 
tion  going  into  these  channels  has  to  be  comprehended  before 
it  can  be  used.  Figure  20.1,  based  on  Card,  Moran,  and  Newell’s 
(1983)  Model  Human  Processor,  shows  the  cognitive  architec¬ 
ture  of  human  information  processing  and  resource  limitations 
that  lead  to  multimedia  usability  problems. 

Capacity overflow may happen when too much information is presented
in a short period, swamping the user’s limited working memory and the
cognitive processor’s capability to comprehend, chunk, and then
memorize or use the information. The implication is to give users
control over the pace of information delivery. Integration problems
arise when the message on two media is different, making integration
into working memory difficult; this leads to the thematic congruence
principle. Contention problems are caused by conflicting attention
between dynamic media, and when two inputs compete for the same
cognitive resources, e.g., speech and text both requiring language
understanding at once. Comprehension is related to congruence; we
understand the world by making sense of it with our existing
long-term memory. Consequently, if multimedia content is unfamiliar,
we cannot make sense of it. Finally, multitasking makes further
demands on our cognitive processing, so we will experience difficulty
in attending to multimedia input while performing output tasks.

FIGURE 20.1. Approximate model of human information processing using
a “human as computer system” analogy, based on the Model Human
Processor (Card, Moran, & Newell, 1983; for more on cognitive models,
see Proctor & Vu, chap. 2, this Handbook; Byrne, chap. 5, this
Handbook). The figure marks five bottlenecks: (1) capacity overflow
(information overload); (2) integration (common message?);
(3) contention (conflicting channels); (4) comprehension; and
(5) multitasking input/output. STM = short-term memory, similar to
working memory: limited capacity, continuously overwritten with new
input.
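
The contention bottleneck can be sketched as a simple check on which concurrently presented media draw on the same cognitive resource. The mapping below encodes only the two pairings named in this section (speech and printed text share language understanding; video and still image share image interpretation). The media labels and the `contending_pairs` function are hypothetical, and the sketch is far coarser than a cognitive model such as Interacting Cognitive Subsystems.

```python
from itertools import combinations

# Resource each medium draws on, following the pairings given in the text.
RESOURCE = {
    "speech": "language understanding",
    "text": "language understanding",
    "video": "image interpretation",
    "still_image": "image interpretation",
}


def contending_pairs(concurrent_media):
    """Return the pairs of concurrently presented media that share a cognitive resource."""
    media = sorted(set(concurrent_media))
    return [(a, b) for a, b in combinations(media, 2) if RESOURCE[a] == RESOURCE[b]]
```

For example, presenting speech alongside a still image yields no contending pair, whereas presenting speech alongside different printed text does.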

Making clear a theme in a multimedia presentation involves directing
the user’s reading and viewing sequence across different media
segments. Video and speech are processed in sequence, while text
enforces a serial reading order by the syntactic convention of
language. In contrast, viewing image media is less predictable since
it depends on the size and complexity of the image, the user’s
knowledge of the contents, task, and motivation (Norman & Shallice,
1986), and the designed effects for salience. Attention-directing
effects can increase the probability that the user will attend to an
image component, although no guarantee can be given that a component
will be perceived or understood.

Emotion  and  Arousal 

The  content  of  image  media  in  particular  can  evoke  an  emo¬ 
tional  response,  which  can  be  used  to  promote  a  more  exciting 
and  engaging  user  experience  (e.g.,  the  use  of  human  image 
and  speech  to  persuade  users;  see  chap.  7,  this  Handbook). 
These  issues  are  dealt  with  more  extensively  in  other  chapters. 


People  treat  human  photographs,  video  and  even  animated 
characters  with  similar  social  responses  as  they  give  to  real  peo¬ 
ple,  so  human  image  content  can  be  used  to  increase  interest 
and  draw  attention.  Emotional  responses  (see  chap.  4)  can  be 
evoked not only by content, but also by surprising interactive effects
(e.g., when a character suddenly appears to challenge the
users).  Surprise  effects,  moving  image,  and  unexpected  stimu¬ 
lating  images  all  affect  the  arousal  system,  which  broadly 
equates  with  our  feeling  of  excitement.  Designs  that  stimulate 
our  arousal  are  more  likely  to  be  memorable  and  engaging. 

Learning  and  Memorization 

Learning  is  the  prime  objective  in  tutorial  multimedia.  In  these 
applications,  the  objective  is  to  create  a  rich  memory  schema, 
which  can  be  accessed  easily  in  the  future.  We  learn  more  ef¬ 
fectively  by  active  problem  solving,  or  learning  by  doing.  This 
approach is at the heart of constructivist learning theory (Papert,
1980), which has implications for tutorial multimedia. Interactive
microworlds where users learn by interacting with simulations,
or constructing and testing the simulation, give a more
vivid  experience  that  forms  better  memories  (Rogers  &  Scaife, 
1998).  Multiple  viewpoints  help  to  develop  rich  schemata  by 
presenting  different  aspects  of  the  same  problem,  so  the  whole 
concept  can  be  integrated  from  its  parts.  An  example  might  be 
to  explain  the  structure  of  an  engine,  then  how  it  operates,  and 
finally,  display  a  causal  model  of  why  it  works.  Schema  integra¬ 
tion  during  memorization  fits  the  separate  viewpoints  together. 

The  implications  from  psychology  are  summarized  in  the 
form of multimedia design principles that amplify and extend
those proposed for general user interface design (e.g., ISO 9241
part  10;  International  Standards  Organization,  1997).  The  prin¬ 
ciples  are  high-level  concepts  that  are  useful  for  general  guid¬ 
ance,  but  they  have  to  be  interpreted  in  a  context  to  give  more 
specific  advice. 

•  Thematic  congruence:  Messages  presented  in  different  me¬ 
dia  should  be  linked  together  to  form  a  coherent  whole.  This 
helps  comprehension  as  the  different  parts  of  the  message 
make  sense  when  fitted  together.  Congruence  is  partly  a 
matter  of  designing  the  content  so  it  follows  a  logical  theme — 
for  example,  the  script  or  story  line  makes  sense  and  does 
not  assume  too  much  about  the  user’s  domain  knowledge — 
and partly a matter of attentional design to help the user follow
the message thread across different media (Ayres &
Sweller, 2005).

•  Manageable  information  loading:  Messages  presented  in 
multimedia  should  be  delivered  either  at  a  pace  that  is  un¬ 
der  the  user’s  control,  or  at  a  rate  that  allows  for  effective  as¬ 
similation  of  information  without  causing  fatigue.  The  rate 
of  information  delivery  depends  on  the  quantity  and  com¬ 
plexity  of  information  in  the  message,  the  effectiveness  of  the 
design  in  helping  the  user  extract  the  message  from  the  me¬ 
dia,  as  well  as  the  user’s  domain  knowledge  and  motivation. 
Two  ways  to  reduce  information  overload  are  to  (a)  avoid  ex¬ 
cessive  use  of  concurrent  dynamic  media  and  (b)  give  the 
user  time  to  assimilate  complex  messages. 

•  Ensure  compatibility  with  the  user’s  understanding:  Media 
should  be  selected  that  convey  the  content  in  a  manner  com¬ 
patible  with  the  user’s  existing  knowledge  (e.g.,  the  radiation 
symbol  and  road  sign  icons  are  used  to  convey  hazards  and 
dangers  to  users  who  have  the  appropriate  knowledge  and 
cultural  background).  The  user’s  ability  to  understand  the 
message  is  important  for  designed  image  media  (e.g.,  diagrams, 
graphs)  when  interpretation  is  dependent  on  the  user’s  knowl¬ 
edge  and  background. 

•  Complementary  viewpoints:  Similar  aspects  of  the  same  sub¬ 
ject  matter  should  be  presented  on  different  media  to  create 
an  integrated  whole.  Showing  different  aspects  of  the  same 
object — for  example,  picture  and  design  diagram  of  a  ship — 
can  help  memorization  by  developing  richer  schema  and  bet¬ 
ter  memory  cues. 

•  Consistency:  Consistency  helps  users  learn  an  interface  by 
making  the  controls,  command  names,  and  layout  follow  a  fa¬ 
miliar  pattern.  People  recognize  patterns  automatically,  so 
operating  the  interface  becomes  an  automatic  skill.  Consis¬ 
tent  use  of  media  to  deliver  messages  of  a  specific  type  can 
help  by  cueing  users  with  what  to  expect. 

•  Interaction  and  engagement:  Interaction  and  engagement 
help  understanding  and  learning  by  encouraging  the  user  to 
problem solve. Memory is an active process. Interaction increases
arousal, and this makes the user’s experience more
vivid, exciting, and memorable.

•  Reinforce  messages:  Redundant  communication  of  the  same 
message  on  different  media  can  help  learning.  Presentation 
of  the  same  or  similar  aspects  of  a  message  helps  memoriza¬ 
tion  by  the  frequency  effect.  Exposing  users  to  the  same  thing 
in  a  different  modality  also  promotes  rich  memory  cues. 


DESIGN  PROCESS 


Multimedia  design  has  to  address  the  problems  inherent  in  the 
design  of  any  user  interface,  viz.  defining  user  requirements, 
tasks,  dialogue  design;  however,  three  issues  specifically  con¬ 
cern  multimedia: 

1.  Matching  the  media  to  the  message  by  selecting  and  inte¬ 
grating  media  so  the  user  comprehends  the  information 
content  effectively. 

2.  Managing  users’  attention  so  key  items  in  the  content  are 
noticed  and  understood,  and  the  user  follows  the  message 
thread  across  several  media. 

3.  Interaction  and  navigation  so  the  user  can  access,  play,  and 
interact  with  media  in  an  engaging  and  predictable  manner. 

Figure  20.2  gives  an  overview  of  the  design  process  that  ad¬ 
dresses  these  issues. 

The method shown in the figure starts with requirements and
information  analysis  to  establish  the  necessary  content  and  com¬ 
munication  goals  of  the  application.  It  then  progresses  to  do¬ 
main  and  user  characteristic  analysis  to  establish  a  profile  of  the 
user  and  the  system  environment.  The  output  from  these  stages 
feeds  into  media  selection  and  integration,  which  match  the  log¬ 
ical  specification  of  the  content  to  available  media  resources. 
This  is  interleaved  with  interaction  design,  unless  the  application 
is  restricted  to  information  presentation.  Design  then  progresses 
to  thematic  integration  of  the  user’s  reading/viewing  sequence 
and  design  to  direct  the  users’  attention.  The  method  can  be 
tailored to fit within different development approaches. For
instance, in rapid application development, storyboards, prototypes,
and iterative build-and-evaluate cycles would be used. On
the other hand, in a more systematic, software engineering approach,
more detailed specifications and scripts will be produced
before design commences. Even though the process is described
as a sequence, in practice, the stages are interleaved and iterated;
however, requirements, information modeling, and media selection
should be carried out, even if they are not complete, before
subsequent design stages commence.

FIGURE 20.2. Overview of the multimedia design process expressed
as a data flow diagram.

Design  approaches  in  multimedia  tend  to  be  iterative  and 
user-centered.  Storyboards  are  a  well-known  means  of  infor¬ 
mal  modeling  in  multimedia  design  (Nielsen,  1995;  Sutcliffe, 
1999).  Originating  from  animation  and  cartoon  design,  story¬ 
boards  are  a  set  of  images  which  represent  key  steps  in  a  design. 
Translated  to  software,  storyboards  depict  key  stages  in  interac¬ 
tion  and  are  used  for  conducting  walkthroughs  to  explain  what 
happens  at  each  stage.  Allowing  the  users  to  edit  storyboards 
and  giving  them  a  construction  kit  to  build  their  own  encour¬ 
ages  active  participation.  Storyboards  are  followed  by  building 
concept  demonstrators  using  multimedia  authoring  tools  (e.g., 
Macromedia  Director,  Toolbook)  to  rapidly  develop  early  proto¬ 
types.  Concept  demonstrators  are  active  simulations  that  fol¬ 
low  a  scenario  script  of  interaction;  departure  from  the  preset 
sequence  is  not  allowed.  Several  variations  can  be  run  to  sup¬ 
port  comparison;  however,  the  user  experience  is  passive.  In 
contrast,  users  can  test  interactive  prototypes  by  running  dif¬ 
ferent  commands  or  functions.  The  degree  of  interactivity  de¬ 
pends  on  the  implementation  cost  which  increases  as  proto¬ 
types  converge  with  a  fully  functional  product. 

Users,  Requirements,  and  Domains 

The  starting  point  for  multimedia,  as  in  all  applications,  is  re¬ 
quirements  analysis.  The  difference  in  multimedia  lies  in  the 
greater  emphasis  on  information  requirements.  A  variety  of 
analytic  approaches  can  be  adopted,  such  as  task-analysis  (see 
Redish,  chap.  48,  this  Handbook),  contextual  inquiry  (Holtzblatt, 
chap.  49,  this  Handbook)  or  scenario  analysis  (Rosson  &  Carroll, 
chap.  53,  this  Handbook).  Requirements  are  listed  and  catego¬ 
rized  into  information,  task-related,  and  nonfunctional  classes. 
These  will  be  expanded  in  subsequent  analyses. 

It  is  important  to  get  a  profile  of  the  target  user  population 
to  guide  media  selection.  There  are  three  motivations  for  user 
analysis: 

1. Choice of modalities, which is important for people with disabilities,
but also for user preferences. Some people prefer
verbal-linguistic material over image.

2. Tuning the content presented to the level of users’ existing
knowledge, which is particularly important for training and
educational applications.

3. Capturing users’ expectations so the experience can be geared
toward their backgrounds (e.g., different styles for younger,
older people, cultures, and socio-economic audiences).

Acquiring  information  about  the  level  of  experience  possessed 
by  the  potential  user  population  is  important  for  customization. 
User profiles are used to design training applications to ensure
that the right level of tutorial support is provided, and to assess
the  users’  domain  knowledge  so  that  appropriate  media  can  be 
selected.  This  is  particularly  important  when  symbols,  designed 
images,  and  diagrammatic  notations  may  be  involved.  The  roles 
and  backgrounds  of  users  will  have  an  important  bearing  on 
design.  For  example,  marketing  applications  will  need  simple, 
focused  content  and  more  aesthetic  design,  whereas  tutorial  sys¬ 
tems  need  to  deliver  detailed  content.  Information  kiosk  appli¬ 
cations  need  to  provide  information,  as  do  task-based  applica¬ 
tions,  but  decision  support  and  persuasive  systems  (Fogg,  1998; 
see  also  chap.  7,  this  Handbook)  also  need  to  ensure  users  com¬ 
prehend  and  are  convinced  by  messages.  Domain  knowledge, 
including  use  of  conventions,  symbols,  and  terminology  in  the 
domain,  is  important  because  less  experienced  users  will  require 
that  more  complete  information  is  presented. 

The  context  and  environment  of  a  system  will  also  have  an 
important  bearing  on  design.  For  example,  tourist  information 
systems  in  outdoor  public  areas  will  experience  a  wide  range  of 
lighting  conditions,  which  can  make  image  and  text  hard  to 
read.  High  levels  of  ambient  noise  in  public  places  or  factory 
floors  can  make  audio  and  speech  useless.  Hence,  it  is  impor¬ 
tant  to  gather  information  on  (a)  the  location  of  use  (e.g.,  office, 
factory  floor,  public/private  space,  and  hazardous  locations), 
(b)  pertinent  environmental  variables  (e.g.,  ambient  light,  noise 
levels,  and  temperature),  (c)  usage  conditions  (e.g.,  single  user, 
shared  use,  and  broadcast)  and  (d)  expected  range  of  locations 
(e.g.,  countries,  languages,  and  cultures).  Choice  of  language, 
icon  conventions,  interpretation  of  diagrams,  and  choice  of  con¬ 
tent  all  have  a  bearing  on  design  of  international  user  interfaces. 
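
The context-of-use information listed above can be recorded in a simple structure and checked against candidate media. The following Python fragment is a minimal sketch under stated assumptions: the class name `ContextOfUse`, its field names, and the function `media_at_risk` are hypothetical, and the two environmental variables are reduced to yes/no flags because the text gives no numeric thresholds.

```python
from dataclasses import dataclass, field


@dataclass
class ContextOfUse:
    """Hypothetical record of the four categories (a) to (d) named in the text."""
    location: str              # (a) e.g. "office", "factory floor", "public space"
    high_ambient_noise: bool   # (b) environmental variable, simplified to a flag
    variable_lighting: bool    # (b) environmental variable, simplified to a flag
    usage: str                 # (c) "single user", "shared use", or "broadcast"
    locales: list[str] = field(default_factory=list)  # (d) countries, languages, cultures


def media_at_risk(ctx: ContextOfUse) -> set[str]:
    """Return the media that the environment may make ineffective."""
    at_risk: set[str] = set()
    if ctx.high_ambient_noise:
        # Ambient noise can make audio and speech useless.
        at_risk |= {"audio", "speech"}
    if ctx.variable_lighting:
        # A wide range of lighting can make image and text hard to read.
        at_risk |= {"image", "text"}
    return at_risk


# Illustrative example: an outdoor tourist information kiosk.
kiosk = ContextOfUse(
    location="outdoor public area",
    high_ambient_noise=True,
    variable_lighting=True,
    usage="shared use",
    locales=["en", "fr"],
)
office = ContextOfUse("office", False, False, "single user")

print(sorted(media_at_risk(kiosk)))
print(sorted(media_at_risk(office)))
```

A real analysis would record measured light and noise levels rather than flags; the sketch only shows how the gathered context can feed later media selection.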

As  well  as  gathering  general  information  about  the  system’s 
context  of  use,  domain  modeling  can  prove  useful  for  creating 
the  system  metaphor.  Domain  models  are  recorded  as  sketches 
of  the  work  environment,  showing  the  layout  and  location  of 
significant  objects  and  artifacts,  accompanied  by  lists  of  envi¬ 
ronmental  factors.  Structural  metaphors  for  organizing  infor¬ 
mation  and  operational  metaphors  for  controls  and  devices 
have  their  origins  in  domain  analysis. 

Information  Architecture 

This stage consists of several activities that will differ according
to  the  type  of  application.  Some  applications  might  have  a 
strong  task  model  (e.g.,  a  multimedia  process  control  applica¬ 
tion  where  the  tasks  are  monitoring  a  chemical  plant,  diagnos¬ 
ing  problems,  and  supporting  the  operator  in  controlling  plant 
operation).  In  task-driven  applications,  information  require¬ 
ments  are  derived  from  the  task  model.  In  information  provi¬ 
sion  applications,  such  as  websites  with  informative  roles,  in¬ 
formation  analysis  involves  categorization,  and  the  architecture 
generally  follows  a  hierarchical  model.  In  the  third  class  of  ex¬ 
planatory  or  thematic  applications,  analysis  is  concerned  with 
the  story  or  argument  (e.g.,  how  the  information  should  be  ex¬ 
plained  or  delivered).  Educational  multimedia  and  websites 
with  persuasive  missions  fall  into  the  last  category. 

In  task-driven  applications,  information  needs  are  annotated 
on  the  task  model  following  a  walkthrough  that  asks  users  what 
information  they  need  to  (a)  complete  the  task  subgoal,  (b)  take 
a decision at this step, or (c) provide as input (see Sutcliffe,
1997). In information provision applications, classification of the
content according to one or more user views defines the information
architecture; for example, most university departments
have  an  information  structure  with  upper-level  categories  for  re¬ 
search,  undergraduate  courses,  postgraduate  courses,  staff  in¬ 
terests,  departmental  organization,  mission  and  objectives,  and 
so  forth.  A  theme  or  story  line  should  be  developed  for  ex¬ 
planatory  applications.  This  will  depend  on  the  application’s 
objectives  and  the  message  the  owner  wishes  to  deliver.  An  ex¬ 
ample  thematic  map  from  a  hypothetical  health  awareness  ap¬ 
plication  is  illustrated  in  Fig.  20.3. 

The  requirement  is  to  convince  people  of  the  dangers  of 
heart  disease.  The  theme  is  a  persuasive  argument  that  first  tries 
to convince people of the dangers (e.g., smoking, poor diet,
stressed  lifestyles,  etc.),  and  then  convince  them  of  ways  to  im¬ 
prove  their  lifestyles  to  prevent  heart  disease,  followed  by  rein¬ 
forcing  the  message  with  the  benefits  of  a  healthy  lifestyle  (e.g., 
lower  health  insurance,  saving  money,  longer  life,  etc.).  Sub¬ 
themes  are  embedded  at  different  points  so  users  can  explore 
the  facts  behind  heart  disease,  the  statistics  and  their  exposure, 
how  to  get  help,  and  so  forth.  Information  is  then  gathered  for 
each  node  in  the  thematic  map.  How  this  architecture  will  be  de¬ 
livered  depends  on  interaction  design  decisions:  it  could  be¬ 
come  an  interactive  story  to  explore  different  lifestyle  choices, 
combined  with  a  quiz.  The  outcome  of  information  architecture 
analysis  will  be  an  information-enhanced  task  model,  a  thematic 
map,  or  a  hierarchy/network  to  show  the  structure  and  relation¬ 
ships  of  information  categories.  The  next  step  is  to  analyze  the 
information  content  by  classifying  it  by  types. 

Information  types  are  amodal,  conceptual  descriptions  of 
information  components  that  elaborate  the  content  definition. 
Information  types  specify  the  message  to  be  delivered  in  a  mul¬ 
timedia  application  and  are  operated  on  by  mapping  rules  to  se¬ 
lect  appropriate  media  resources.  The  following  definitions  are 
based  on  the  Task-Based  Information  Analysis  Method  (Sutcliffe, 
1997)  and  ISO  14915,  part  3  (International  Standards  Organiza¬ 
tion,  2000). 

The  information  types  are  used  in  “walkthroughs”,  in  which 
the  analyst  progresses  through  the  task/scenario/use  case  asking 
questions about information needs. This can be integrated with
data modeling (or object/class modeling), so that the information
in objects and their attributes can be categorized by the
following  types,  using  the  decision  tree  shown  in  Fig.  20.4.  The 
first  question  is  whether  information  represents  concrete  facts 
about  the  real  world  or  more  abstract,  conceptual  information; 
this  is  followed  by  questions  about  the  information  that  relates 
to  change  in  the  world  or  describes  permanent  states.  Finally, 
the  decision  tree  gives  a  set  of  ontological  categories  to  classify 
information  that  expand  on  type  definitions  commonly  found  in 
software  engineering  specifications.  More  complex  ontologies 
are  available  (Arens,  Hovy  &  Van  Mulken,  1993;  Mann  &  Thomp¬ 
son,  1988),  so  the  classification  presented  in  Fig.  20.4  is  a  com¬ 
promise  between  complexity  and  ease  of  use.  A  finer-grained 
classification  enables  more  finely  tuned  media  selection  deci¬ 
sions,  but  at  the  cost  of  more  analysis  effort. 

Components  are  classified  by  “walking  through”  the  decision 
tree  using  the  definitions  and  the  following  questions: 

•  Is  the  information  contained  in  the  component  physical  or 
conceptual? 

• Is the information static or dynamic (that is, does it
relate to change or not)?

•  Which  type  in  the  terminal  branch  of  the  tree  does  the  infor¬ 
mation  component  belong  to? 

It  is  important  to  note  that  one  component  may  be  classi¬ 
fied  with  more  than  one  type;  for  instance,  instructions  on  how 
to  get  to  the  railway  station  may  contain  procedural  information 
(the  instructions  “turn  left,”  “straight  ahead,”  etc.),  and  spatial 
or  descriptive  information  (“the  station  is  in  the  corner  of  the 
square,  painted  blue”).  The  information  types  are  “tools  for 
thought”  and  can  be  used  either  to  classify  specifications  of  con¬ 
tent  or  to  consider  what  content  may  be  necessary.  To  illustrate, 
for  the  task  “navigate  to  the  railway  station,”  the  content  may  be 
minimally  specified  as  “instructions  how  to  get  there,”  in  which 
case  the  information  types  prompt  questions  in  the  form  “what 
sort  of  information  does  the  user  need  to  fulfil  the  task/user 
goal?”  Alternatively,  the  content  may  be  specified  as  a  scenario 
narrative  of  directions,  waymarks  to  recognize  and  description 
of the target. In this case, the types classify components in the
narrative to elucidate the deeper structure of the content. The
granularity of components is a matter of the designer’s choice
and will depend on the level of detail demanded by the application.

FIGURE 20.3. Thematic map for a healthcare promotion application.

FIGURE 20.4. Decision tree for classifying information types. The first decision point reflects
abstraction from the real world, the second points to change in time, and the third categorizes
content.

To illustrate the analysis:

Communication goal: Explain how to assemble a bookshelf
from ready-made parts.

Information component 1:
Parts of the bookshelf: sides, back, shelves, connecting screws

Mapping to information types:
Physical-Static-Descriptive; parts of the bookshelf are tangible, don’t change, and need to be described
Physical-Static-Spatial; dimensions of the parts, how they are organized
Physical-Static-Relationship; could also be added to describe which parts fit together

Information component 2:
Instructions on how to assemble the parts

Mapping to information types:
Physical-Dynamic-Discrete action
Physical-Dynamic-Procedure
Physical-Static-State; to show final assembled bookshelf
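
The three walkthrough questions and the bookshelf example can be written down as data. The Python fragment below is an illustrative sketch only: the class names `InfoType` and `Component` are hypothetical, and the terminal categories are limited to those named in the example rather than the full set in Fig. 20.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoType:
    """One path through the decision tree (hypothetical representation)."""
    physical: bool   # question 1: physical (True) or conceptual (False)
    dynamic: bool    # question 2: relates to change (True) or not (False)
    category: str    # question 3: terminal branch, e.g. "Descriptive"

    def label(self) -> str:
        return "-".join([
            "Physical" if self.physical else "Conceptual",
            "Dynamic" if self.dynamic else "Static",
            self.category,
        ])


@dataclass
class Component:
    """An information component, which may carry more than one type."""
    name: str
    types: list[InfoType]


parts = Component(
    "Parts of the bookshelf",
    [
        InfoType(True, False, "Descriptive"),
        InfoType(True, False, "Spatial"),
        InfoType(True, False, "Relationship"),
    ],
)
assembly = Component(
    "Assembly instructions",
    [
        InfoType(True, True, "Discrete action"),
        InfoType(True, True, "Procedure"),
        InfoType(True, False, "State"),
    ],
)

for component in (parts, assembly):
    print(component.name, "->", [t.label() for t in component.types])
```

Keeping a list of types per component reflects the point made earlier that one component may be classified with more than one type.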

Media  Selection  and  Combination 

The  information  types  are  used  to  select  the  appropriate  media 
resource(s).  Media  classifications  have  had  many  interpretations 
(Alty,  1997;  Bernsen,  1994;  Heller  &  Martin,  1995).  The  follow¬ 
ing  classification  focuses  on  the  psychological  properties  of  the 
representations,  rather  than  the  physical  nature  of  the  medium 
(e.g.,  digital  or  analog  encoding  in  video).  Note  that  these  defi¬ 
nitions  are  combined  to  describe  any  specific  medium,  so  speech 
is  classified  as  an  audio,  linguistic  medium,  while  a  cartoon  is  clas¬ 
sified  as  a  nonrealistic  (designed)  moving  image. 

The definitions may be usefully considered in two dimensions
of abstraction: (a) the designer’s involvement in creating
the medium, and (b) the rate of change. Media resources are
classified using the decision tree illustrated in Fig. 20.5. More
fine-grained taxonomic distinctions can be made, for instance,
between different signs and symbolic languages (see Bernsen,
1994), but as with information types, richer taxonomies increase
specification effort.

FIGURE 20.5. Decision tree for classifying media resources.

The  approach  to  classifying  media  uses  a  walkthrough  of  the 
decision  tree  with  the  following  questions  that  reflect  the  facets 
of  the  classification: 

•  Is  the  medium  perceived  to  be  realistic  or  not?  Media  re¬ 
sources  captured  directly  from  the  real  world  will  usually  be 
realistic  (e.g.,  photographs  of  landscapes,  bird  song  sound 
recordings). In contrast, nonrealistic media are created by
human action; a boundary case that illustrates
the dimension is a realistic painting of a landscape.

•  Does  the  medium  change  over  time  or  not?  The  boundary 
case  here  is  the  rate  of  change,  particularly  in  animations 
where  some  people  might  judge  ten  frames/sec  to  be  a  video, 
but  a  PowerPoint  presentation  of  five  slides  in  one  minute  to 
be  a  sequence  of  static  images. 

•  Which  modality  does  the  resource  belong  to?  In  this  case,  the 
categories  are  orthogonal,  although  one  resource  may  ex¬ 
hibit  two  modalities  (e.g.,  a  film  with  a  sound  track  commu¬ 
nicates  in  both  visual  and  audio  modalities). 

Classification  of  media  resources  facilitates  mapping  of  in¬ 
formation  types  to  media  resources;  however,  the  process  of  se¬ 
lection  may  also  guide  the  acquisition  or  creation  of  appropriate 
media  resources.  So  if  the  selection  process  indicates  the  need 
for  a  resource  that  is  not  in  the  designer’s  media  resource 
library,  the  classification  guides  the  necessary  acquisition/ 
creation.  Cost  trade-offs  will  naturally  be  considered  in  this 
process.  Finally,  the  classification  provides  a  mechanism  for  in¬ 
dexing  media  resource  libraries. 
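
The use of the classification as an index for a media resource library can be sketched as follows. This Python fragment is a hedged illustration, not part of the method as published: `MediaResource` and `build_index` are hypothetical names, the modality facet is reduced to "visual" and "audio", and the linguistic facet is left out for brevity. The example resources are those mentioned in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaResource:
    name: str
    realistic: bool              # facet 1: perceived as realistic, or designed
    dynamic: bool                # facet 2: changes over time, or static
    modalities: frozenset[str]   # facet 3: {"visual"}, {"audio"}, or both


def build_index(resources):
    """Index a media library by (realistic, dynamic, modality)."""
    index = defaultdict(list)
    for res in resources:
        # A resource with two modalities is filed under both.
        for modality in res.modalities:
            index[(res.realistic, res.dynamic, modality)].append(res.name)
    return index


library = [
    MediaResource("landscape photograph", True, False, frozenset({"visual"})),
    MediaResource("bird song recording", True, True, frozenset({"audio"})),
    MediaResource("cartoon", False, True, frozenset({"visual"})),
    MediaResource("film with sound track", True, True, frozenset({"visual", "audio"})),
]
index = build_index(library)

# Realistic, dynamic, audio resources available in the library.
print(index[(True, True, "audio")])

# A key that is absent signals a resource that would have to be acquired or created.
print((False, False, "visual") in index)
```

An absent key corresponds to the case described above, in which selection indicates the need for a resource that is not yet in the designer's library.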

Media selection. Recommendations for selecting media and influencing
the user’s attention have to be interpreted according
to the user’s task and design goal. If information provision is the
main  design  goal  (e.g.,  a  tourist  kiosk  information  system),  then 
persistence  of  information  and  drawing  attention  to  specific 
items  is  not  necessarily  as  critical  as  in  tutorial  applications.  Task 
and  user  characteristics  influence  media  choice;  for  instance, 
verbal  media  are  more  appropriate  to  language-based  and  logi¬ 
cal  reasoning  tasks,  while  visual  media  are  suitable  for  spatial 
tasks  involving  moving,  positioning,  and  orienting  objects. 
Some  users  may  prefer  visual  media,  while  image  is  of  little  use 
for  blind  users.  Media  resources  may  be  available  for  selection, 
or  have  to  be  purchased  from  elsewhere.  If  existing  media  can 
be  edited  and  reused,  this  is  usually  preferable  to  creating  new 
media  from  scratch.  Graphical  images  can  be  particularly  ex¬ 
pensive  to  draw,  whereas  capture  of  images  by  scanning  is  usu¬ 
ally  quick  and  cheap. 

The suggested mappings from information types to media
resources are given in Table 20.1. Information types are mapped
to a simple categorization of media as (a) static, nonstreaming
media (still image, text, diagrams, and graphs) and (b) dynamic
media, comprising audio (natural sounds, speech, music, and designed
sounds), video, and animation. Several media map to
each type, although the preferred choice is shown in italics.
Since most components in the information architecture will
have multiple information types and each information type
maps to several media, the selection process encourages multimedia
integration. For example, when a procedure for explaining
a physical task is required, first a series of realistic images will
be selected, followed by video and speech to integrate the
steps, then text to summarize the key points.

TABLE 20.1. Media Selection Example
Note: The italics denote the preferred mappings for media and information types, while ordinary text shows other potential media uses for the information type.

The mappings are supplemented by the following heuristics:

• To convey detail, use static media (e.g., text for language-based
content, diagrams for models, or still image for physical
detail of objects; Booher, 1975; Faraday & Sutcliffe, 1998b).

• To engage the user and draw attention, use dynamic media
(e.g., video for physical information, animation, or speech).

• For spatial information, use diagrams and maps, with photographic
images to illustrate detail and animations to indicate
pathways (Bieger & Glock, 1984; May & Barnard, 1995).

•  For  values  and  quantitative  information,  use  charts  and  graphs 
for  overviews  and  trends,  supplemented  by  tables  for  detail 
(Bertin, 1983; Tufte, 1997).

•  Abstract  concepts,  relationships,  and  models  should  be  illus¬ 
trated  with  diagrams  explained  by  text  captions  and  speech 
to  give  supplementary  information. 

•  Complex  actions  and  procedures  should  be  illustrated  as  a 
slideshow  of  images  for  each  step,  followed  by  a  video  of  the 
whole  sequence  to  integrate  the  steps.  Text  captions  on  the 
still  images  and  speech  commentary  provide  supplementary 
information (Hegarty & Just, 1993). Text and bullet points
summarize the steps at the end. The choice among these media may be
constrained by cost and quality considerations.

•  To  explain  causality,  still  and  moving  image  media  need  to  be 
combined  with  text  (e.g.,  the  cause  of  a  flood  is  explained  by 
text  describing  excessive  rainfall  with  an  animation  of  the 
river  level  rising  and  overflowing  its  banks;  Narayanan  & 
Hegarty,  1998,  2002).  Causal  explanations  of  physical  phe¬ 
nomena  may  be  given  by  introducing  the  topic  using  linguis¬ 
tic  media,  showing  the  cause  and  effect  by  a  combination  of 
still  image  and  text  with  speech  captions  for  commentary; 
integrating  the  message  by  moving  image  with  voice  com¬ 
mentary  and  providing  a  bullet  point  text  summary. 
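
The heuristics above can be approximated as a lookup from information need to candidate media. The Python fragment below is a rough sketch under stated assumptions: the key names, the dictionary `HEURISTICS`, and the function `select_media` are hypothetical, the entries paraphrase the bullet points only, and the fragment does not reproduce the contents of Table 20.1.

```python
# Candidate media per information need, in the order the heuristics mention them.
HEURISTICS = {
    "detail": ["text", "diagram", "still image"],
    "engagement": ["video", "animation", "speech"],
    "spatial": ["diagram", "map", "photograph", "animation"],
    "quantitative": ["chart", "graph", "table"],
    "abstract": ["diagram", "text caption", "speech"],
    "procedure": ["still image", "video", "text caption", "speech", "text"],
    "causality": ["still image", "moving image", "text", "speech"],
}


def select_media(needs):
    """Return the de-duplicated union of candidate media, preserving order."""
    selected = []
    for need in needs:
        for medium in HEURISTICS[need]:
            if medium not in selected:
                selected.append(medium)
    return selected


# A component with both procedural and spatial content draws on several media,
# which is how the selection process leads toward multimedia integration.
print(select_media(["procedure", "spatial"]))
```

Because most components carry several information types, the union usually contains more than one medium; cost, quality, and the user profile would then narrow the choice.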

The  end  point  of  media  selection  is  media  integration:  one 
or  more  media  will  be  selected  for  each  information  group  to 
present  complementary  aspects  of  the  topic.  Some  examples 
of  media  combination  that  amplify  the  basic  selection  guidelines 
are  given  in  Table  20.1.  The  table  summarizes  the  media  selec¬ 
tion  and  combinations  for  each  information  type;  the  first  pref¬ 
erence  media  choice  for  each  is  shown  in  italics. 

Engagement  and  attractiveness.  The  previous  guide¬ 
lines  were  oriented  to  a  task-driven  view  of  media.  Media  selec¬ 
tion,  however,  can  also  be  motivated  by  aesthetic  choice.  These 
considerations  may  contradict  some  of  the  earlier  guidelines  be¬ 
cause  the  design  objective  is  to  please  the  user  and  capture  the 
user’s  attention,  rather  than  deliver  information  effectively.  First, 
a  health  warning  should  be  noted:  the  old  saying  “beauty  is  in 
the  eye  of  the  beholder”  has  good  foundation.  Judgments  of  aes¬ 
thetic  quality  suffer  from  considerable  individual  differences.  A 
person’s  reaction  to  a  design  is  a  function  of  their  motivation 
(see  Brave  &  Nass,  chap.  4,  this  Handbook),  individual  prefer¬ 
ences,  knowledge  of  the  domain,  and  exposure  to  similar  exam¬ 
ples,  to  say  nothing  of  peer  opinion  and  “fashion.”  Furthermore, 
attractiveness  is  often  influenced  more  by  content  than  the 
choice  of  media  or  presentation  format.  The  following  heuris¬ 
tics  should  therefore  be  interpreted  with  care  and  their  design 
manifestations  tested  with  users: 

•  Dynamic  media,  especially  video,  have  an  arousing  effect 
and  attract  attention;  hence,  video  and  animation  are  useful 
in  improving  the  attractiveness  of  presentations.  Animation 
must  be  used  with  care,  however,  as  gratuitous  video  that 
cannot  be  turned  off  quickly  offends  (Spool,  Scanlon,  Snyder, 
Schroeder,  &  de  Angelo,  1999). 


•  Speech  engages  attention  because  we  naturally  listen  to  con¬ 
versation.  Choice  of  voice  depends  on  the  application:  female 
voices for more restful and informative effects, male voices to
suggest  authority  and  respect  (Reeves  &  Nass,  1996). 

•  Image  must  be  selected  with  careful  consideration  of  content. 
Images  may  be  selected  for  aesthetic  motivations,  to  provide 
a  restful  setting  for  more  important  foreground  information 
(Mullet  &  Sano,  1995).  Backgrounds  in  half  shades  and  low  sat¬ 
uration  color  provide  more  depth  and  interest  in  an  image. 

•  Music  has  an  important  emotive  appeal,  but  it  must  be  used 
with  care.  Classical  music  may  be  counterproductive  for  a 
younger  audience,  while  older  listeners  will  not  find  heavy 
metal  pop  attractive. 

•  Natural  sounds  such  as  running  water,  wind  in  trees,  bird 
songs,  and  waves  on  a  seashore  have  restful  properties  and 
hence  decrease  arousal. 

Media  integration  rules  may  also  be  broken  for  aesthetic  rea¬ 
sons.  If  information  transfer  is  not  at  a  premium,  use  of  two  con¬ 
current  video  streams  might  be  arousing  for  a  younger  audi¬ 
ence,  as  MTV  (music  television)  and  pop  videos  indicate. 
Multiple  audio  and  speech  tracks  can  give  the  impression  of 
complex,  busy,  and  interesting  environments. 

Aesthetic  design.  If  the  user  profile  and  domain  analysis 
indicate  that  having  a  pleasing  and  attractive  design  is  important 
for  the  user’s  perception,  then  aesthetics  need  to  be  considered 
in  depth;  however,  aesthetics  should  be  considered  as  a  design 
criterion  for  all  applications,  since  poor  appearance  and  inter¬ 
action  design  may  provoke  adverse  reaction  (Norman,  2004). 
Some  studies  suggest  that  aesthetic  design  is  an  important 
component  of  usability  and  overall  preference  (Tractinsky,  1997;
Tractinsky,  Shoval-Katz,  &  Ikar,  2000);  however,  others  have
shown  that  aesthetic  preferences  are  open  to  contextual  ef¬ 
fects  on  users’  judgment  (Sutcliffe  &  de  Angeli,  2005).  Judging 
when  aesthetics  may  be  important  is  not  easy.  For  example,  in
e-commerce  applications  with  high-value,  designer-label  products,
aesthetic  presentation  is  advisable,  as  it  is  when  selling  to  a
design-oriented  audience;  however,  the  decision  is  not  clear-cut
in  many  applications.

Aesthetic  design  primarily  concerns  graphics  and  visual  me¬ 
dia.  Evaluation  questionnaires  assess  design  on  classic  aesthet¬ 
ics,  which  broadly  equate  with  conventional  usability  guidelines 
on  structured  and  consistent  layout,  and  expressive  aesthetics 
that  capture  the  more  creative  aspects  of  visual  design  (Lavie  & 
Tractinsky,  2004);  however,  these  measure  user  reaction  to  gen¬ 
eral  design  aspects  such  as  “original,  fascinating,  clear,  and  pleas¬ 
ant.”  The  following  heuristics  provide  more  design-directed 
guidance,  but  they  may  also  be  employed  for  evaluation  (Sut¬ 
cliffe,  2002;  Sutcliffe  &  de  Angeli,  2005): 

•  Judicious  use  of  color:  Color  use  should  be  balanced,  and 
low  saturation  pastel  colors  should  be  used  for  backgrounds. 
Designs  should  not  use  more  than  two  to  three  fully  satu¬ 
rated  intense  colors. 

•  Symmetry:  Symmetrical  layouts  (e.g.,  bilateral,  radial  organiza¬ 
tion)  that  can  be  folded  over  to  show  the  symmetrical  match. 

•  Simplicity  and  space:  Uncluttered,  simple  layouts  that  use
space  to  separate  and  emphasize  key  components. 

•  Shape:  Curved  shapes  convey  an  attractive  visual  style  when 
contrasted  with  rectangles. 

•  Structured  and  consistent  layout:  Use  of  grids  to  structure 
image  components  and  promote  consistency  between  pages. 

•  Depth  of  field:  Use  of  layers  in  an  image  and  washed  out  back¬ 
ground  images  stimulates  curiosity  and  can  be  attractive  by 
promoting  a  peaceful  effect. 

•  Design  of  unusual  or  challenging  images  that  stimulate  the 
users’  imagination  and  increase  attraction:  Unusual  images 
often  disobey  normal  laws  of  form  and  perspective. 
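
The color heuristic in the list above lends itself to a simple automated check. The following Python sketch counts the fully saturated, intense colors in a palette so that a design can be flagged when it exceeds the suggested two to three. The function names and the 0.9 saturation and brightness thresholds are illustrative assumptions; the guidelines themselves give no numeric definition of "fully saturated."

```python
import colorsys


def count_saturated(hex_colors, sat_threshold=0.9, val_threshold=0.9):
    """Count colors that are both highly saturated and bright in HSV space.

    The thresholds are assumptions chosen for illustration only.
    """
    count = 0
    for hex_color in hex_colors:
        digits = hex_color.lstrip("#")
        # Convert "RRGGBB" into three floats in the range 0..1.
        r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
        _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
        if saturation >= sat_threshold and value >= val_threshold:
            count += 1
    return count


def palette_ok(hex_colors, max_saturated=3):
    """Apply the 'no more than two to three saturated colors' heuristic."""
    return count_saturated(hex_colors) <= max_saturated


busy = ["#FF0000", "#00FF00", "#0000FF", "#FFFF00", "#E8EEF4"]
calm = ["#E8EEF4", "#F5E6E8", "#FF3300"]  # pastel backgrounds, one accent
print(palette_ok(busy), palette_ok(calm))
```

A check of this kind can only screen for one heuristic; it does not replace testing the design with users.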

Several  sources  provide  more  detailed  advice  on  aesthetics 
and  visual  design  (Kristof  &  Satran,  1995;  Mullet  &  Sano,  1995); 
however,  advice  is  usually  given  as  examples  of  good  design 
rather  than  specific  guidelines. 

Image  and  identity.  Design  of  media  for  motivation  is  a 
complex  area  in  its  own  right,  and  this  topic  is  dealt  with  in  more 
depth  in  Fogg  (chap.  7,  this  Handbook),  so  the  treatment  here 
will  focus  on  media  selection  issues. 

Simple  photographs  or  more  complex  interactive  animations 
(talking  heads  or  full  body  mannequins)  have  an  attractive  effect. 
We  appear  to  ascribe  human  properties  to  computers  when  in¬ 
terfaces  give  humanlike  visual  cues  (Reeves  &  Nass,  1996);  how¬ 
ever,  the  effectiveness  of  media  representing  people  depends  on 
the  characters’  appearance  and  voice  (see  Fig.  20.6,  courtesy  of 
Sapient).  In  human-human  conversation,  we  modify  our  reactions
according  to  our  knowledge,  or  assumptions  about,  the  other
person’s  role,  group  identification,  culture,  and  intention  (Clark,
1996).  For  example,  reactions  to  a  military  mannequin  will  be  very 
different  from  those  to  the  representation  of  a  parson.  Male  voices 
tend  to  be  treated  as  more  authoritative  than  female  voices. 

Use  of  humanlike  forms  is  feasible  with  prerecorded  video  and 
photographs;  however,  the  need  depends  on  the  application. 
Video  representation  of  the  lecturer  can  augment  presentations, 
and  video  communication  helps  interactive  dialogue.  A  good 
speaker  holds  our  attention  by  a  variety  of  tricks,  such  as  main¬ 
taining  eye  contact,  varying  the  voice  tone,  and  using  simple  and 
concise  language,  as  well  as  delivering  an  interesting  message. 
These  general  effects  can  be  reinforced  by  projected  personality. 
Friendly  people  are  preferred  over  colder,  more  hostile  individu¬ 
als.  TV  announcers,  who  tend  to  be  middle-aged  and  confident, 
but  avuncular,  characters,  have  attention  drawing  power  of  a 
dominant  yet  friendly  personality.  Both  sexes  pay  attention  to  ex¬ 
trovert,  young  personalities,  while  the  male  preference  for  beau¬ 
tiful  young  women  is  a  particularly  strong  effect.  These  traits  have 
been  exploited  by  advertisers  for  a  long  time.  There  are  lessons 
here  for  multimedia  designers  as  the  Web  and  interactive  TV  con¬ 
verge.  Multimedia  will  increasingly  convey  persuasive  messages 
(Reeves  &  Nass,  1996;  Fogg,  chap.  17,  this  Handbook). 

FIGURE  20.6.  Effective  use  of  human  image  for  attraction.  The  picture  attracts  by  the  direction
of  gaze  to  the  user  as  well  as  by  the  appearance  of  the  individual.  Courtesy  of  Sapient.

Affective  effects.  Media  design  for  affect  (emotional  response
and  arousal)  involves  both  choice  of  context  and  interaction.
Arousal  is  increased  by  (a)  interactive  applications,  (b)  surprising
events  during  interaction,  (c)  use  of  dynamic  media,  and
(d)  challenging  images.  In  contrast,  if  the  objective  is  to  calm
the  users,  arousal  can  be  decreased  by  choice  of  natural  images
and  sounds,  and  soothing  music.  The  most  common  emotional
responses  that  designers  may  want  to  evoke  are  pleasure,
anxiety,  fear,  and  surprise.  Pleasure,  anxiety,  and  fear  usually
depend  on  our  memory  of  the  agents,  objects,  and  events  that
will  be  presented  (Ortony,  Clore,  &  Collins,  1988),  so  content
selection  is  the  important  determinant;  however,  anxiety  can
be  evoked  by  uncertainty  in  interaction  and  cues  to  hidden
effects.  The  emotional  response  of  fear  or  pleasure  will  depend
on  the  user’s  previous  experience.

Although  the  bounds  of  media  selection  are  only  set  by  the 
creative  imagination  of  the  designer,  some  fundamentals  don’t 
change:  design  still  needs  to  be  motivated  by  users’  goals,  which 
may  be  related  to  work,  learning,  or  having  fun;  nevertheless, 
testing  with  users  is  still  essential. 

The  outcome  of  media  selection  is  a  first-cut  script.  The  pre¬ 
sentation  sequence  can  be  planned  using  bar  charts  to  illustrate 
the  presentation  order,  organized  according  to  the  task,  or  ap¬ 
plication  requirements  (e.g.,  lesson  plan);  see  Fig.  20.7.  The 
script  gives  an  outline  plan  of  the  presentation,  which  can  be  an¬ 
notated  with  navigational  and  dialogue  controls.  The  informa¬ 
tion  model  defines  the  high-level  presentation  order.  Key  infor¬ 
mation  items  are  annotated  onto  presentation  bar  charts  when 
planning  the  sequence  and  duration  of  media  delivery.  The  in¬ 
formation  types  are  ordered  in  a  first-cut  sequence.  The  se¬ 
lected  media  are  added  to  the  bar  chart  to  show  which  media 
stream  will  be  played  over  time.  Decisions  on  timing  depend  on 
the  content  of  the  media  resource  (e.g.,  length  of  a  video  clip, 
frame  display  rate).  Timeline  diagrams  can  be  augmented  by
hierarchy  diagrams  to  show  the  classification  of  information
groups  that  can  be  mapped  to  menu-style  access  paths,  or  by
network  diagrams  to  help  plan  hypermedia  links.
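
The first-cut script described above is essentially a set of timed bars, one per media resource. The Python sketch below shows one possible way to record such a script and print it as a rough text bar chart. The class name `MediaBar`, the field names, and the sample timings are hypothetical; they are not taken from Fig. 20.7.

```python
import math
from dataclasses import dataclass


@dataclass
class MediaBar:
    info_type: str   # row label, e.g. the information type being conveyed
    medium: str      # selected medium for that information
    start: float     # start time in seconds
    duration: float  # playing time in seconds

    @property
    def end(self):
        return self.start + self.duration


def total_length(script):
    """Overall running time of the script in seconds."""
    return max(bar.end for bar in script)


def playing_at(script, t):
    """Media that are being delivered at time t."""
    return [bar.medium for bar in script if bar.start <= t < bar.end]


def render(script, step=10.0):
    """Text bar chart: '#' marks a bar that is playing at the slot's start."""
    slots = math.ceil(total_length(script) / step)
    rows = []
    for bar in script:
        cells = "".join(
            "#" if bar.start <= i * step < bar.end else "."
            for i in range(slots)
        )
        rows.append(f"{bar.info_type:<14}{bar.medium:<8}{cells}")
    return "\n".join(rows)


script = [
    MediaBar("descriptive", "text", 0, 20),
    MediaBar("procedural", "video", 10, 30),
    MediaBar("spatial", "image", 40, 10),
]
print(render(script))
```

Listing what plays concurrently at a given moment, as `playing_at` does, makes it easier to spot points where several streams compete for the user's attention.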

Interaction  and  Navigation 

Although  discussion  of  interactive  multimedia  has  been  delayed 
until  now,  in  practice,  dialogue  and  presentation  design  proceed 
hand  in  hand.  Task  analysis  provides  the  basis  for  dialogue  de¬ 
sign,  as  well  as  specification  of  navigation  controls.  Dialogue 
controls  will  be  included  in  early  storyboards.  Navigational  and 
control  dialogues  allow  flexible  access  to  the  multimedia  con¬ 
tent,  and  give  users  ability  to  control  how  media  are  played.  Di¬ 
alogue  design  may  also  involve  specifying  how  users  interact 
with  tools,  agents,  and  objects  in  interactive  microworlds. 

Metaphors  and  interaction  design.  While  task  and 
domain  analysis  can  provide  ideas  for  interaction  design,  this  is 
also  a  creative  process.  Interaction  design  is  essentially  a  set  of 
choices  along  a  dimension,  from  simple  controls  such  as  menus 
and  buttons,  where  the  user  is  aware  of  the  interface,  to  em¬ 
bodiment,  in  which  the  user  becomes  involved  as  part  of  the  ac¬ 
tion  by  controlling  an  avatar  or  other  representation  of  their 
presence.  At  this  end  of  the  dimension  multimedia  interaction 
converges  with  virtual  reality  (see  chap.  31,  this  Handbook).  In¬ 
teractive  metaphors  occupy  the  middle  ground. 

FIGURE  20.7.  Bar  chart  representation  of  the  first-cut  presentation  script  after  media  selection.
Information  types  are  placed  on  the  y  axis,  the  task  sub-goals  that  determine  the  script
order  are  placed  on  the  x  axis,  and  media  resources  are  represented  by  bars.  For  this  browsing
task,  three  separate  sequences  are  planned  for  the  view  schedule,  promotions,  and  so  forth.

Some  interactive  metaphors  are  generally  applicable,  such  as
(a)  timelines  to  move  through  historical  information,  (b)  the
use  of  a  compass  to  control  direction  of  movement  in  an
interactive  space,  and  (c)  controls  based  on  automobiles  (steering
wheels)  or  ships  (rudders).  Others  will  be  more  specific,  such  as
selecting  and  interacting  with  different  characters  (young,  old,
male,  female,  overweight,  fit,  etc.)  in  a  health  promotion
application.  Design  of  interaction  also  involves  creating  the
microworld  within  which  the  user  moves,  as  well  as  interactive
objects  that  can  be  selected  and  manipulated.

Interaction  via  characters  and  avatars  can  increase  the  user’s
sense  of  engagement,  first  when  the  user  selects  or  even  constructs
the  character,  although  some  users  may  not  have  the  patience  to
build  their  own  avatar  using  a  graphical  paint  program.  In 
character-based  interaction,  the  user  can  either  see  the  world 
from  an  egocentric  viewpoint  (e.g.,  from  their  character’s  posi¬ 
tion)  or  exocentric  viewpoint  (e.g.,  when  they  see  their  character 
in  the  graphical  world).  The  sophistication  in  control  of  move¬ 
ment  and  interaction  will  depend  on  the  hardware  available 
(e.g.,  joystick,  wand,  or  standard  mouse  and  keyboard).  While
mimicking  physical  interaction  via  datagloves  and  tracking  re¬ 
quires  virtual  reality  technology,  relatively  complex  interaction 
(e.g.,  actions  in  a  football  game:  pass,  head  ball  in  direction  north/ 
south/east/west)  can  be  programmed  into  button  and  function 
keys.  Engagement  is  also  promoted  by  surprise  and  unexpected 
effects,  so  as  the  user  moves  into  a  particular  area  a  new  subworld 
opens  up,  or  system-controlled  avatars  appear.  These  techniques 
are  well  known  to  games  programmers;  however,  they  are  also 
applicable  to  other  genres  of  multimedia  applications.  The  de¬ 
sign  concepts  for  engagement  can  be  summarized  as  follows: 

•  Character-driven  interaction:  Provides  the  user  with  a  choice 
of  avatars  or  personae  they  can  adopt  as  representations  of 
themselves  within  the  interactive  virtual  world. 


•  Tool-based  interaction:  Places  tools  in  the  world  that  users 
can  pick  up,  and  the  tool  becomes  the  interface  (e.g.,  a  virtual 
mirror  magnifies,  and  a  virtual  helicopter  flies;  Tan,  Robert¬ 
son,  &  Czerwinski,  2001). 

•  Collaborative  characters:  In  computer  mediated  communi¬ 
cation,  these  characters  may  represent  other  users;  in  other 
applications,  system-controlled  avatars  appear  to  explain, 
guide,  or  warn  the  user. 

•  Surprise  effects:  Although  conventional  HCI  guidelines
encourage  making  the  affordances  and  presence  of
interactive  objects  explicit,  when  designing  for  engagement, 
hiding  and  surprise  are  important. 

Interaction  design  for  an  explanatory/tutorial  application  is 
illustrated  in  Fig.  20.8  (www.bbc.co.uk/sn/).  This  is  an  interac¬ 
tive  microworld  in  which  the  user  plays  the  role  of  a  dinosaur 
character,  illustrating  use  of  the  engagement  concepts.  A  com¬ 
pass  navigation  metaphor  allows  the  user  to  act  as  the  dinosaur, 
moving  around  the  landscape  illustrated  in  the  photographs. 
The  user  is  given  feedback  on  the  characteristics  of  other  preda¬ 
tors  and  prey  in  the  vicinity  and  has  to  decide  whether  to  attack 
or  avoid  them.  Other  controls  that  might  be  added  to  such  in¬ 
teractive  microworlds  could  be  settings  to  change  the  environ¬ 
ment  (e.g.,  add  more  predators,  change  the  weather,  etc.).  En¬ 
gagement  can  be  taken  even  further  by  giving  the  user  facilities 
to  actually  design  the  microworld  so  the  application  becomes  a
domain-oriented  design  environment  (Fischer,  Giaccardi,  Ye,
Sutcliffe,  &  Mehandjiev,  2004).

FIGURE  20.8.  Interactive  microworld:  Big  Al  game  (www.bbc.co.uk/sn/).  The  user  plays  the
dinosaur  role  by  navigating  with  the  compass  metaphor.  The  photograph  updates  with  each  move
and  the  user  is  given  a  choice  of  attacking  or  avoiding  other  dinosaurs  in  the  virtual  world.

Navigation.  In  information-intensive  multimedia  where
access  to  content  is  the  main  design  goal,  hypermedia  dialogues
that  link  content  segments  will  be  appropriate.  Good  hypertext
design  is  based  on  a  sound  information  analysis  that  specifies 
the  pathways  between  related  items,  and  the  use  of  cues  to 
show  the  structure  of  the  information  space  to  the  user.  In 
document-based  hypermedia  (e.g.,  HTML  and  the  Web),  links 
can  only  access  the  whole  media  resource  rather  than  point  to 
components  within  it.  The  design  issues  are  to  (a)  plan  the  over¬ 
all  structure,  (b)  segment  complex  structures  into  a  hierarchy  of 
subnetworks,  and  then  (c)  plan  the  implementation  of  each 
subnetwork.  The  access  structure  of  most  hypermedia  will  be 
hierarchical,  organized  according  to  the  information  model  and 
categorization  of  content  (e.g.,  information  grouped  by  func¬ 
tion,  organization,  task  usage,  or  user  preference).  Navigation 
design  transforms  the  user’s  conceptual  model  of  an  informa¬ 
tion  space  into  a  hypermedia  structure.  Unfortunately,  individ¬ 
ual  users  have  different  models,  so  this  may  not  be  an  easy  task. 
Implementing  too  many  links  in  order  to  satisfy  each  user’s  view 
will  make  the  system  too  complex  and  increase  the  chance  of 
the  user  getting  lost.  Too  few  links  will  frustrate  users  who  can¬ 
not  find  the  associations  they  want.  Unfortunately,  hypermedia 
systems  assume  a  fixed  link  structure  so  the  user  is  limited  to 
the  pathways  provided  by  the  designer.  More  open-ended 
hypermedia  environments  (e.g.,  Microcosm;  Lowe  &  Hall,  1998)
provide  more  flexibility  via  links  with  query  facilities  which  allow 
access  to  databases.  Dynamic  links  attached  to  hotspots  in  im¬ 
ages  or  nodes  in  text  documents  provide  access  paths  to  a  wider 
variety  of  data. 

One  problem  with  large  hypermedia  systems  is  that  users 
get  lost  in  them.  Navigation  cues,  waymarks,  and  mini-map 
overviews  can  help  to  counter  the  effects  of  disorientation. 
Mini-maps  give  an  overview  of  the  hypertext  area  and  a  refer¬ 
ence  context  for  where  users  are  in  large  networks.  Filters  help 
to  reduce  complexity  by  showing  only  a  subset  of  nodes  and 
links  that  the  user  is  interested  in.  Having  typed  links  helps  fil¬ 
tering  views  because  the  user  can  guess  the  information  con¬ 
tent  from  the  link  type  (e.g.,  reference,  example,  source,  related 
work,  etc.).  Other  navigation  facilities  are  visit  lists  containing 
a  history  of  nodes  traversed  in  a  session,  and  bookmarks,  so 
users  can  tailor  a  hypermedia  application  with  their  own  navi¬ 
gation  aide-memoires  (Nielsen,  1995).  Once  the  structure  has 
been  designed,  link  cues  need  to  be  located  within  media  re¬ 
sources,  so  the  appropriate  cues  need  to  be  considered  for 
each  medium.  For  example, 

•  Text  media:  The  Web  convention  is  to  underline  and  highlight
text  in  a  consistent  color,  such  as  blue  or  purple.

•  Images:  Link  cues  can  be  set  as  standalone  icons  or  as  active 
components  in  images.  Icons  need  to  be  tested  with  users 
because  the  designer’s  assumed  meaning  can  be  ambiguous. 
Active  components  should  signal  the  link’s  presence  by  cap¬ 
tions  or  pop-up  hover  text  so  the  user  can  inspect  a  link  be¬ 
fore  deciding  whether  to  follow  it. 

•  Moving  images:  Links  from  animation  and  film  are  difficult 
to  design  because  the  medium  is  dynamic;  however,  link 
buttons  can  be  placed  below  the  video  window.  Active
components,  such  as  overlaid  buttons  within  a  moving  image,
are  technically  more  challenging  to  program.  Buttons  may
also  be  timed  to  pop  up  at  appropriate  times  during  the 
video. 


•  Audio  links:  Sound  and  speech  links  present  the  same  diffi¬ 
culties  as  moving  images.  One  solution  is  to  use  visual  cues, 
possibly  synchronized  with  the  sound  or  speech  track.  If 
speech  recognition  is  available,  then  voice  commands  can  act 
as  links,  but  these  commands  need  to  be  explained  to  the  user. 
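
The navigation facilities discussed in this section (typed links, filtered views, visit lists, and bookmarks) can be pictured as a small data structure. The following Python sketch is one hypothetical arrangement, assuming a fixed, designer-provided link set; the class names, node names, and link types are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    link_type: str  # e.g. "reference", "example", "source", "related"


@dataclass
class Session:
    links: list
    current: str
    visited: list = field(default_factory=list)   # visit list (history)
    bookmarks: set = field(default_factory=set)   # user-chosen waymarks

    def __post_init__(self):
        self.visited.append(self.current)

    def outgoing(self, allowed_types=None):
        """Links leaving the current node, optionally filtered by type."""
        return [
            link for link in self.links
            if link.source == self.current
            and (allowed_types is None or link.link_type in allowed_types)
        ]

    def follow(self, link):
        """Move along a link and record the destination in the visit list."""
        if link.source != self.current:
            raise ValueError("link does not start at the current node")
        self.current = link.target
        self.visited.append(link.target)

    def bookmark(self):
        self.bookmarks.add(self.current)


links = [
    Link("cell", "nucleus", "example"),
    Link("cell", "textbook", "reference"),
    Link("nucleus", "dna", "related"),
]
session = Session(links, "cell")
examples = session.outgoing({"example"})  # filter: show only example links
session.follow(examples[0])
session.bookmark()
```

Filtering by link type reduces the number of choices shown at each node, which is the complexity-reducing effect attributed to typed links above.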

Navigation  controls  provide  access  to  the  logical  content  of 
multimedia  resources;  however,  access  to  logical  components 
may  be  constrained  by  limitations  of  physical  media  resources. 
For  example,  a  movie  may  be  logically  composed  of  several 
scenes,  but  it  can  only  be  accessed  by  an  approximate  timer  set 
to  the  beginning  of  the  whole  film.  Worse  still,  a  video  clip  may
only  be  playable  as  a  single  segment,  making  implementation 
of  navigation  requirements  impossible.  It  is  advisable  to  give  the 
user  control  whenever  possible. 

In  many  cases,  the  media-rendering  device,  such  as  a  video 
player  for  .avi  files  or  QuickTime  movies,  will  provide  controls. 
If  controls  have  to  be  implemented  from  scratch,  the  following 
list  should  be  considered  for  each  media  type: 

•  Static  media:  Size  and  scale  controls  to  zoom  and  pan;  page 
access,  if  the  medium  has  page  segmentation,  as  in  text  and 
diagrams;  the  ability  to  change  attributes  such  as  color  and
display  resolution;  and  font  type  and  size  in  text.

•  Dynamic  media:  The  familiar  video  controls  of  stop,  start,  play, 
pause,  fast  forward  and  rewind,  as  well  as  the  ability  to  address 
a  particular  point  or  event  in  the  media  stream  by  a  time 
marker  or  an  index  (e.g.,  “go  to”  component/marker,  etc.). 
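
Where controls for dynamic media have to be built from scratch, the list above suggests a small set of operations plus "go to" access by marker. The Python sketch below models only the control logic, not actual playback; the class name `MediaControls`, the marker names, and the ten-second skip interval are assumptions made for illustration.

```python
class MediaControls:
    """Control state for one time-varying media resource (no rendering)."""

    def __init__(self, duration, markers=None):
        self.duration = float(duration)
        self.markers = dict(markers or {})  # marker name -> time in seconds
        self.position = 0.0
        self.playing = False

    def play(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def stop(self):
        # Stop returns to the beginning; pause keeps the position.
        self.playing = False
        self.position = 0.0

    def seek(self, seconds):
        # Clamp so the position never leaves the media stream.
        self.position = min(max(float(seconds), 0.0), self.duration)

    def fast_forward(self, seconds=10.0):
        self.seek(self.position + seconds)

    def rewind(self, seconds=10.0):
        self.seek(self.position - seconds)

    def go_to(self, marker):
        """Jump to a named point or event in the stream."""
        self.seek(self.markers[marker])

    def current_marker(self):
        """Name of the latest marker at or before the current position."""
        passed = [(t, name) for name, t in self.markers.items()
                  if t <= self.position]
        return max(passed)[1] if passed else None


controls = MediaControls(120, {"intro": 0, "assembly": 45, "summary": 100})
controls.play()
controls.go_to("assembly")
controls.fast_forward()
```

Marker-based access of this kind is only possible when the physical media resource can be addressed at points other than its beginning.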

Navigation  controls  employ  standard  user  interface  compo¬ 
nents  (e.g.,  buttons,  dialogue  boxes,  menus,  icons,  and  sliders) 
and  techniques  (e.g.,  form  filling,  dialogue  boxes,  and  selection 
menus).  For  more  guidance,  see  ISO  9241,  parts  12,  14  &  17 
(International  Standards  Organization,  1997)  and  ISO  14915,  part  2 
(International  Standards  Organization,  1998). 

Design  for  Attention 

Having  selected  the  media  resources,  the  designer  must  now 
ensure  that  the  user  will  extract  the  appropriate  information.  An 
important  consideration  of  multimedia  design  is  to  link  the 
thread  of  a  message  across  several  different  media.  This  section 
gives  recommendations  on  planning  the  user’s  reading/view¬ 
ing  sequence,  and  guidelines  for  realizing  these  recommenda¬ 
tions  in  presentation  sequences,  hypermedia  dialogues,  and 
navigation  controls.  The  essential  differences  are  timing  and 
user  control.  In  a  presentation  design,  the  reading/viewing  se¬ 
quence  and  timing  are  set  by  the  designer,  whereas  the  read¬ 
ing/viewing  sequence  in  hypertext  implementation  and  inter¬ 
active  dialogues  is  under  user  control. 

Presentation  techniques  help  to  direct  the  user’s  attention  to 
important  information  and  specify  the  desired  order  of  reading 
or  viewing.  The  need  for  thematic  links  between  information 
components  is  specified,  and  attention-directing  techniques  are 
selected  to  implement  the  desired  effect. 

The  design  issues  are: 

•  To  plan  the  overall  thematic  thread  of  the  message. 

•  To  draw  the  user’s  attention  to  important  information. 

•  To  establish  a  clear  reading/viewing  sequence.

•  To  provide  clear  links  when  the  theme  crosses  from  one
medium  to  another. 

Design  for  attention  is  particularly  important  for  images. 
User  attention  to  time-varying  media  is  determined  by  the 
medium  itself  (e.g.,  we  have  little  choice  but  to  listen  to  speech 
or  to  view  animations  in  the  order  in  which  they  are  presented).
The  reading  sequence  is  directed  by  the  layout  of  text,  although 
this  is  culturally  dependent  (e.g.,  western  languages  read  left 
to  right,  Arabic  languages  in  the  opposite  direction).  Viewing  or¬ 
der  in  images,  however,  is  unpredictable  unless  the  design 
specifically  directs  the  user’s  attention.

In  some  cases,  a  common  topic  may  be  sufficient  for  direct¬ 
ing  the  user’s  reading/viewing  sequence;  however,  when  the 
thread  is  important  or  hard  to  follow,  designed  effects  for  at¬ 
tention,  or  contact  points,  are  necessary  to  aid  the  user’s  per¬ 
ception.  The  term  contact  point  refers  to  a  reference  from  one 
medium  to  another  and  comes  from  the  research  of  Hegarty 
and  Just  (1993)  and  Narayanan  and  Hegarty  (1998),  which 
demonstrated  that  comprehension  is  improved  by  reinforcing 
the  links  between  information  in  different  media.  Two  types  of 
contact  point  are  distinguished: 

1.  Direct  contact  points:  Attention-directing  effects  are  implemented
in  both  the  source  and  destination  medium  (e.g.,  in  the  text,  an
instruction  such  as,  “Look  at  the  oblong  component  in  Figure  1”
is  given  while  the  component  is  highlighted).  Direct  contact
points  create  a  strong  cross  reference
between  two  media,  but  can  become  intrusive  if  overused.  An 
example  of  a  direct  contact  point  is  shown  in  Fig.  20.9  (Fara¬ 
day  &  Sutcliffe,  1997a). 

2.  Indirect  contact  points  implement  an  attention-directing  ef¬ 
fect  only  in  the  source,  or  less  frequently  the  destination, 
medium  (e.g.,  “In  Figure  1,  the  assembly  is  shown”  is  spoken,
with  no  highlighting  being  used  in  the  image).  Indirect
contact  points  have  less  attention-directing  force  and  work
by  temporal  sequencing  or  spatial  juxtaposition.  Indirect
contact  points  are  less  intrusive,  so  they  may  be  used  more 
frequently  without  becoming  disruptive. 

In  most  cases,  the  attentional  effect  in  a  direct  contact  point 
will  be  actuated  in  sequence,  although  occasionally,  both  effects 
may  be  presented  concurrently  if  the  order  of  the  association 
is  not  important.  In  hypermedia  implementations,  direct  con¬ 
tact  points  become  a  link  cue  in  the  source  medium  and  a  high¬ 
light  anchor  in  the  destination  medium. 

FIGURE  20.9.  Direct  contact  point  between  text  and  video  segments,  reinforced  by  speech.
The  arrows  play  video  clips  linked  to  the  text  description.  The  user  can  either  play  the  whole
explanation  or  step  through  it  using  the  contact  point  links  (Faraday  &  Sutcliffe,  1997a).

Multiple  contact  points  may  be  organized  in  a  logical  order
to  follow  the  theme  and  connect  a  thread  of  topics  (Faraday  &
Sutcliffe,  1998a,  1999).  For  instance,  in  a  biology  tutorial,  the
explanation  of  parts  of  a  cell  is  organized  with  interleaved
speech  segments  and  a  diagram  describing  the  cell’s  components
top  to  bottom,  left  to  right.  Highlighting  techniques  locate
each  component  in  turn,  following  the  order  of  the  spoken
explanation.  The  attention-directing  techniques  described
in  the  following  section  are  used  to  implement  contact  points.
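
The distinction between direct and indirect contact points, and their ordering into a thread, can be made concrete with a short sketch. The Python code below is an illustrative assumption rather than a published notation: a contact point with a highlight effect in the destination medium is treated as direct, one without is treated as indirect, and the presentation steps are actuated in sequence. The cell-tutorial content is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactPoint:
    source_cue: str                   # cue in the source medium (here, speech)
    target_component: str             # component in the destination medium
    highlight: Optional[str] = None   # destination effect; None means indirect

    @property
    def is_direct(self):
        return self.highlight is not None


def presentation_steps(contact_points):
    """Expand a thread of contact points into sequenced presentation steps."""
    steps = []
    for point in contact_points:
        steps.append(("speak", point.source_cue))
        if point.is_direct:
            # Direct contact point: also actuate an effect in the destination.
            steps.append((point.highlight, point.target_component))
    return steps


def direct_share(contact_points):
    """Fraction of direct contact points; a crude guard against overuse."""
    return sum(p.is_direct for p in contact_points) / len(contact_points)


thread = [
    ContactPoint("Look at the nucleus in the diagram", "nucleus", "outline"),
    ContactPoint("The diagram also shows the membrane", "membrane"),
    ContactPoint("Look at the mitochondria", "mitochondria", "color change"),
]
for step in presentation_steps(thread):
    print(step)
```

Because direct contact points can become intrusive if overused, a measure such as `direct_share` might be reviewed before the design is tested with users.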

The  design  problem  is  how  to  direct  the  user’s  attention  to 
the  appropriate  information  at  the  correct  level  of  detail.  Ini¬ 
tially,  users  will  tend  to  extract  information  from  images  at  the 
scene  level  (e.g.,  major  objects  will  be  identified  but  with  very
little  descriptive  detail;  Treisman,  1988).  Regular  layout  grids 
help  design  composite  images  (Mullet  &  Sano,  1995)  and  en¬ 
courage  viewing  sequences  in  image  sets.  Alternatively,  the  win¬ 
dow  frame  can  be  set  to  control  which  parts  of  an  image  are 
viewed.  Larger  window  frames  will  be  attended  to  before 
smaller  areas.  A  list  of  the  key  components  that  the  user  needs 
to  focus  on  and  the  facts  that  should  be  extracted  is  checked 
against  the  initial  presentation  design  to  see  if  the  key  compo¬ 
nents  will  attract  sufficient  attention,  or  whether  the  user  is 
likely  to  be  confused  by  extraneous  detail. 

Still  image  media.  Highlighting  techniques  for  designed 
and  natural  images,  organized  in  approximate  power  of  their 
effect,  are  summarized  in  Table  20.2.  A  common  highlighting 
technique  will  pick  out  spatially  distributed  objects,  such  as 
when  all  related  objects  are  changed  to  the  same  color;  co¬ 
located  objects  can  be  grouped  by  using  a  common  color  or  tex¬ 
ture  for  their  background,  or  by  drawing  a  box  around  them. 
The  highlighted  area  will  set  the  granularity  of  the  user’s  atten¬ 
tion.  Captions  linked  to  objects  in  an  image  are  another  useful 
means  of  drawing  attention  and  providing  supplementary  in¬ 
formation  (e.g.,  identity).  Dynamic  revealing  of  captions  is  par¬ 
ticularly  effective  for  directing  the  user’s  viewing  sequence.  Se¬ 
quential  highlighting  is  also  useful  for  showing  the  pathways  or 
navigational  instructions. 

Moving  image  media.  Directing  attention  to  compo¬ 
nents  within  moving  images  is  difficult  because  of  the  dynamic 
nature  of  the  medium.  Design  of  film  and  video  is  an  extensive
subject  in  its  own  right,  so  treatment  here  will  necessarily  be
brief.  The  following  design  advice  is  based  on  Hochberg  (1986).
The  design  objectives,  as  for  other  media,  are  to  draw  the  user’s 
attention  to  key  components  within  the  video  or  animation. 

First,  the  content  should  be  structured  into  scenes  that  cor¬ 
respond  to  the  information  script.  To  structure  animation  se¬ 
quences  and  make  scene  boundaries  obvious,  use  a  cut,  wipe, 
or  dissolve  to  emphasize  that  a  change  in  the  content  structure 
has  taken  place;  however,  cuts  should  be  used  with  care  and 
continuity  maintained  between  the  two  sequences  if  they  are 
to  be  integrated.  Continuity  is  manifest  as  the  same  viewpoint 
and  subject  matter  in  two  contiguous  shots.  Change  in  back¬ 
ground  or  action  (e.g.,  an  individual  walking  left  in  one  clip  and 
walking  right  in  the  next)  is  quickly  noticed  as  a  change.  An  es¬ 
tablishing  shot  that  shows  the  whole  scene  helps  to  introduce 
a  new  sequence  and  provide  context.  To  provide  detail  of  a 
newly  introduced  object  or  context,  the  object  is  shown  filling 
the  frame  with  a  small  amount  of  surrounding  scene,  while  to 
imply  a  relationship  or  compare  two  objects,  a  tight  two-shot 
with  both  objects  together  in  the  same  frame  is  advisable. 

Linguistic  media  (text  and  speech).  As  with  moving 
image,  the  literature  is  extensive,  so  the  following  heuristics  are 
a  brief  summary;  see  Levie  and  Lentz  (1982)  for  more  detail. 
Text  may  be  structured  to  indicate  subsections  by  indentation, 
formatting  into  paragraphs  or  columns,  or  segmented  by  back¬ 
ground  color.  Bullet  points  or  numbered  sections  indicate  order 
more  formally,  such  as  for  procedures.  Different  voices  help  to 
structure  speech  while  also  attracting  attention.  If  language  is 
being  used  to  set  the  context  for  accompanying  media,  it  is  im¬ 
portant  that  the  correct  level  of  identification  is  set.  For  in¬ 
stance,  a  higher  level  concept,  or  the  whole  scene  in  an  accom¬ 
panying  image,  is  described  at  the  beginning  of  a  script,  and 
then  lower  level  topics  reset  the  user’s  focus.  Discourse  markers 
can  make  phrases  and  sentences  more  salient. 

Adding contact points and attention-directing effects completes
the design process; however, as with all user interfaces,
there is no substitute for usability testing. Designs are constructed
incrementally by iterations of design and evaluation
that checks for usability using standard methods, with additional
memory and comprehension tests for multimedia. Therefore,
when testing a design, ask the user to tell you what they
understood the message to be. This can be done during the presentation
with a think-aloud protocol to check that users did
attend to key items, and afterwards, by a memory test. If a user
cannot remember key components in the message, then the design
may need to be improved.


TABLE 20.2. Attention-Directing Techniques for Different Media

Still image (designed and natural)
  Attention-orienting techniques, in approximate order of power:
  Movement of or change in the shape/size/color of an object.
  Use of bold outline. Object marked with a symbol (e.g., arrow)
  or icon. Draw boundary, use color, shape, size, or texture to
  distinguish important objects.
  Notes: Some effects may compromise natural images because
  they overlay the background image with new components
  (e.g., arrows, arcs, icons). Group objects by a common technique.

Moving image
  Attention-orienting techniques, in approximate order of power:
  Freeze frame followed by applying a still image highlight.
  Zoom, close-up shot of the object. Cuts, wipes, and
  dissolve effects.
  Notes: Change in topographic motion, in which an object
  moves across the ground of an image, is more effective than
  internal movement of an object's components. Size and shape
  may be less effective for highlighting a moving object.

Text
  Attention-orienting techniques, in approximate order of power:
  Bold, font size, type, color, or underlining. To direct attention
  to larger segments of text, use formatting, bullet points,
  subsections, and/or indentation.
  Notes: Formatting techniques are paragraphs; headings/titles
  as entry points; and indents to show hierarchical nesting,
  with bullet points and lists.

Speech/sound
  Attention-orienting techniques, in approximate order of power:
  Familiar voice. Silence followed by onset of sound. Different
  voices, or a change in voice prosody (tonality), amplitude
  (loudness), change and variations in pitch (frequency), voice
  rate, change source direction, and alarm sounds (police sirens).
  Notes: Voices familiar to the user (e.g., close relatives) attract
  attention over nonfamiliar speech. Discourse markers
  such as "Next," "Because," "So," and so forth draw
  attention to subsequent phrases.


CONCLUSIONS 


Multimedia  still  poses  many  issues  for  further  research.  The 
design  method  described  in  this  chapter,  coupled  with  user- 
centred  design,  can  improve  quality;  however,  there  is  still  a 
need for experts to create specific media resources (e.g., film/video
and audio experts). Furthermore, considerable research
is  still  necessary  before  we  fully  understand  the  psychology  of 
multimedia  interaction.  Design  for  motivation  and  attractiveness 
is  still  poorly  understood,  and  personality  effects  in  media  may 
not  be  robust  when  usability  errors  impede  communication.  The 
process  by  which  people  extract  information  from  complex  im¬ 
ages  still  requires  extensive  research,  although  the  increasing 
number  of  eyetracking  studies  is  beginning  to  throw  some  light 
on  this  topic.  In  the  future,  language  and  multimodal  communi¬ 
cation  will  change  our  conception  of  multimedia  from  its  current 
CD-ROM or Web-based form into conversational and multisensory
interfaces. Multimedia will become part of wearable and
ubiquitous  user  interfaces,  where  the  media  are  part  of  our 
everyday  environment.  Design  for  multisensory  communication 
will  treat  media  and  artifacts  (e.g.,  our  desks,  clothing,  walls  in 
our  homes)  as  a  continuum,  while  managing  the  diverse  inputs 
to  multimedia  from  creative  design,  technology,  and  usability  en¬ 
gineering  will  be  one  of  the  many  interesting  future  challenges. 
WHAT  ARE  MULTIMODAL  SYSTEMS,
AND  WHY  ARE  WE  BUILDING  THEM? 


Multimodal  systems  process  two  or  more  combined  user  input 
modes — such  as  speech,  pen,  touch,  manual  gestures,  gaze,  and 
head  and  body  movements — in  a  coordinated  manner  with 
multimedia  system  output.  This  class  of  systems  represents  a 
new direction for computing, and a paradigm shift away from
conventional WIMP (windows, icons, menus, pointer) interfaces.
Since the appearance of Bolt’s (1980) “Put That There”
demonstration system, which processed speech in parallel with
touch-pad pointing, a variety of new multimodal systems has
emerged. This new class of interfaces aims to recognize naturally
occurring forms of human language and behavior, and incorporates
at least one recognition-based technology (e.g., speech, pen,
vision). The development of novel
multimodal  systems  has  been  enabled  by  the  myriad  input  and 
output  technologies  currently  becoming  available,  including  new 
devices  and  improvements  in  recognition-based  technologies. 
This  chapter  will  review  the  main  types  of  multimodal  interfaces, 
their  advantages  and  cognitive  science  underpinnings,  primary 
features  and  architectural  characteristics,  and  general  research  in 
the  field  of  multimodal  interaction  and  interface  design. 

The  growing  interest  in  multimodal  interface  design  is 
inspired  largely  by  the  goal  of  supporting  more  transparent, 
flexible,  efficient,  and  powerfully  expressive  means  of  human- 
computer  interaction.  Multimodal  interfaces  are  expected  to 
be  easier  to  learn  and  use,  and  are  preferred  by  users  for  many 
applications.  They  have  the  potential  to  expand  computing  to 
more  challenging  applications,  to  be  used  by  a  broader  spec¬ 
trum  of  everyday  people,  and  to  accommodate  more  adverse 
usage  conditions  than  in  the  past.  Such  systems  also  have  the 
potential  to  function  in  a  more  robust  and  stable  manner  than 
unimodal  recognition  systems  involving  a  single  recognition- 
based  technology,  such  as  speech,  pen,  or  vision. 

The  advent  of  multimodal  interfaces  based  on  recognition  of 
human  speech,  gazes,  gestures,  and  other  natural  behaviors  rep¬ 
resents  only  the  beginning  of  a  progression  toward  computa¬ 
tional  interfaces  capable  of  relatively  humanlike  sensory  percep¬ 
tion.  Such  interfaces  eventually  will  interpret  continuous  input 
from  a  large  number  of  different  visual,  auditory,  and  tactile  in¬ 
put  modes,  which  will  be  recognized  as  users  engage  in  everyday 
activities.  The  same  system  will  track  and  incorporate  information 
from  multiple  sensors  on  the  user’s  interface  and  surrounding 
physical  environment  in  order  to  support  intelligent  adaptation  to 
the user, task, and usage environment. Future adaptive multimodal-multisensor
interfaces have the potential to support new
functionality,  to  achieve  unparalleled  robustness,  and  to  perform 
flexibly  as  a  multifunctional  and  personalized  mobile  system. 


WHAT  TYPES  OF  MULTIMODAL  INTERFACES 
EXIST,  AND  WHAT  IS  THEIR  HISTORY 
AND  CURRENT  STATUS? 


Multimodal  systems  have  developed  rapidly  during  the  past 
decade,  with  steady  progress  toward  building  more  general  and 
robust systems, as well as more transparent human interfaces
than ever before (Benoit, Martin, Pelachaud, Schomaker, &
Suhm,  2000;  Oviatt  et  al.,  2000).  Major  developments  have  oc¬ 
curred  in  the  hardware  and  software  needed  to  support  key 
component  technologies  incorporated  within  multimodal  sys¬ 
tems,  as  well  as  in  techniques  for  integrating  parallel  input 
streams.  Multimodal  systems  also  have  diversified  to  include 
new  modality  combinations,  including  speech  and  pen  input, 
speech  and  lip  movements,  speech  and  manual  gesturing,  and 
gaze  tracking  and  manual  input  (Benoit  &  Le  Goff,  1998;  Cohen 
et  al.,  1997;  Stork  &  Hennecke,  1995;  Turk  &  Robertson,  2000; 
Zhai,  Morimoto,  &  Ihde,  1999).  In  addition,  the  array  of  multi¬ 
modal  applications  has  expanded  extremely  rapidly  in  recent 
years.  Among  other  areas,  it  presently  includes  multimodal 
map-based  systems  for  mobile  and  in-vehicle  use;  multimodal 
browsers;  multimodal  interfaces  to  virtual  reality  systems  for 
simulation  and  training;  multimodal  person-identification/veri¬ 
fication  systems  for  security  purposes;  multimodal  medical,  ed¬ 
ucational,  military,  and  web-based  transaction  systems;  and 
multimodal  access  and  management  of  personal  information  on 
handhelds  and  cell  phones  (Cohen  &  McGee,  2004;  Iyengar, 
Nock,  &  Neti,  2003;  McGee,  2003;  Neti,  Iyengar,  Potamianos,  & 
Senior,  2000;  Oviatt,  2003;  Oviatt,  Flickner,  &  Darrell,  2004;  Ovi¬ 
att  &  Lunsford,  2005;  Oviatt  et  al.,  2000;  Pankanti,  Bolle,  &  Jain, 
2000;  Reithinger  et  al.,  2003). 

In one of the earliest multimodal concept demonstrations,
Bolt had users sit in front of a projection of “Dataland” in “the
Media  Room”  (Negroponte,  1978).  Using  the  “Put  That  There” 
interface  (Bolt,  1980),  they  could  use  speech  and  pointing  on  an 
armrest-mounted  touchpad  to  create  and  move  objects  on  a 
2D  large-screen  display.  For  example,  the  user  could  issue  a 
command,  such  as  “Create  a  blue  square  there,”  with  the  in¬ 
tended  location  of  “there”  indicated  by  a  2D  cursor  mark  on 
the  screen.  Semantic  processing  was  based  on  the  user’s  spo¬ 
ken  input,  and  the  meaning  of  the  deictic  “there”  was  resolved 
by  processing  the  x/y  coordinate  indicated  by  the  cursor  at  the 
time  “there”  was  uttered.  Since  Bolt’s  early  prototype,  consid¬ 
erable  strides  have  been  made  in  developing  a  wide  variety  of 
different  types  of  multimodal  systems. 
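
The deictic resolution described above can be illustrated with a minimal sketch. It is not a reconstruction of Bolt's implementation: it simply assumes a speech recognizer that reports word onset times and a pointing device that reports timestamped x/y samples, and it looks up the cursor position at the moment a deictic word was uttered. The names `CursorSample`, `cursor_at`, and `resolve_deictics` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CursorSample:
    t: float  # seconds since the start of the utterance (assumed clock)
    x: int
    y: int


def cursor_at(samples, t):
    """Return the most recent cursor sample at or before time t.

    `samples` must be sorted by time.
    """
    times = [s.t for s in samples]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no cursor sample precedes the requested time")
    return samples[i]


def resolve_deictics(words, samples,
                     deictics=("this", "that", "here", "there")):
    """Attach an (x, y) location to every deictic word.

    `words` is a list of (word, onset_time) pairs, as a hypothetical
    recognizer might report them. Non-deictic words map to None.
    """
    resolved = []
    for word, onset in words:
        if word.lower() in deictics:
            s = cursor_at(samples, onset)
            resolved.append((word, (s.x, s.y)))
        else:
            resolved.append((word, None))
    return resolved


samples = [CursorSample(0.0, 10, 10),
           CursorSample(0.8, 120, 45),
           CursorSample(1.6, 300, 210)]
words = [("create", 0.1), ("a", 0.4), ("blue", 0.6),
         ("square", 0.9), ("there", 1.7)]

print(resolve_deictics(words, samples)[-1])  # ('there', (300, 210))
```

The spoken words carry the command semantics, while the pointing stream contributes only a coordinate, which is why this style of integration is far simpler than fusing two semantically rich input streams.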

Among the earliest and most rudimentary multimodal systems
were ones that supported speech input along with a standard
keyboard and mouse interface. Conceptually, these multimodal
interfaces represented the least departure from traditional
graphical user interfaces (GUIs). Their initial focus was on
providing richer natural language processing to support greater
expressive power for the user when manipulating complex visuals
and engaging in information extraction. As speech recognition
technology matured during the late 1980s and 1990s, these
systems added spoken input as an alternative to text entry via the
keyboard. As such, they represent early involvement of the natural
language and speech communities in developing the technologies
needed to support new multimodal interfaces. Among
the many examples of this type of multimodal interface are
CUBRICON, Georal, Galaxy, XTRA, Shoptalk, and Miltalk (Cohen
et al., 1989; Kobsa et al., 1986; Neal & Shapiro, 1991; Seneff,
Goddeau, Pao, & Polifroni, 1996; Siroux, Guyomard, Multon, &
Remondeau, 1995; Wahlster, 1991).

Several  of  these  early  systems  were  multimodal-multimedia 
map  systems  to  which  a  user  could  speak  or  type  and  point  with 
a mouse to extract tourist information or engage in military
situation assessment (Cohen et al., 1989; Neal & Shapiro, 1991;
Seneff et al., 1996; Siroux et al., 1995). For example, using the
CUBRICON system a user could point to an object on a map and
ask, “Is this <point> an airbase?” CUBRICON was an expert
system  with  extensive  domain  knowledge,  as  well  as  natural  lan¬ 
guage  processing  capabilities  that  included  referent  identifica¬ 
tion  and  dialogue  tracking  (Neal  &  Shapiro,  1991).  With  the 
Georal  system,  a  user  could  query  a  tourist-information  system 
to  plan  travel  routes  using  spoken  input  and  pointing  via  a 
touch-sensitive  screen  (Siroux  et  al.,  1995).  In  contrast,  the 
Shoptalk  system  permitted  users  to  interact  with  complex  graph¬ 
ics  representing  factory  production  flow  for  chip  manufacturing 
(Cohen et al., 1989). Using Shoptalk, a user could point to a specific
machine in the production layout and issue the command:
“Show  me  all  the  times  when  this  machine  was  down.”  After  the 
system  delivered  its  answer  as  a  list  of  time  ranges,  the  user 
could  click  on  one  to  ask  the  follow-up  question:  “What  chips 
were  waiting  in  its  queue  then,  and  were  any  of  them  hot  lots?” 
Multimedia  system  feedback  was  available  in  the  form  of  a  text 
answer,  or  the  user  could  click  on  the  machine  in  question  to 
view  an  exploded  diagram  of  the  machine  queue’s  contents  dur¬ 
ing  that  time  interval. 

More  recent  multimodal  systems  have  moved  away  from 
processing  simple  mouse  or  touchpad  pointing,  and  have  be¬ 
gun  designing  systems  based  on  two  parallel  input  streams  that 
each  are  capable  of  conveying  rich  semantic  information.  These 
multimodal  systems  recognize  two  natural  forms  of  human  lan¬ 
guage  and  behavior,  for  which  two  recognition-based  technolo¬ 
gies  are  incorporated  within  a  more  powerful  bimodal  user  in¬ 
terface.  To  date,  systems  that  combine  either  speech  and  pen 
input  (Oviatt  &  Cohen,  2000)  or  speech  and  lip  movements 
(Benoit et al., 2000; Stork & Hennecke, 1995; Rubin, Vatikiotis-Bateson,
& Benoit, 1998; Potamianos, Neti, Gravier, & Garg,
2003)  constitute  the  two  most  mature  areas  within  the  field  of 
multimodal  research.  In  these  cases,  the  keyboard  and  mouse 
have  been  abandoned.  For  speech  and  pen  systems,  spoken  lan¬ 
guage  sometimes  is  processed  along  with  complex  pen-based 
gestural  input  involving  hundreds  of  different  symbolic  inter¬ 
pretations  beyond  pointing^  (Oviatt  et  al.,  2000).  For  speech 
and  lip  movement  systems,  spoken  language  is  processed  along 
with  corresponding  human  lip  movements  during  the  natural 
audio-visual  experience  of  spoken  interaction.  In  both  of  these 
sub-literatures,  considerable  work  has  been  directed  toward 
quantitative  modeling  of  the  integration  and  synchronization 
characteristics  of  the  two  rich  input  modes  being  processed, 
and  innovative  time-sensitive  architectures  have  been  devel¬ 
oped  to  process  these  patterns  in  a  robust  manner. 

Multimodal systems that recognize speech and pen-based
gestures first were designed and studied in the early 1990s (Oviatt,
Cohen, Fong, & Frank, 1992), with the original QuickSet system
prototype built in 1994. The QuickSet system is an agent-based
collaborative multimodal system that runs on a hand-held
PC (Cohen et al., 1997). With QuickSet, for example, a user can
issue a multimodal command such as “Airstrips . . . facing this
way <draws arrow>, and facing this way <draws arrow>,”
using combined speech and pen input to place the correct
number, length, and orientation (e.g., SW, NE) of aircraft landing
strips on a map. Other research-level systems of this type were
built in the late 1990s. Examples include the Human-centric
Word Processor, Portable Voice Assistant, QuickDoc, and MVIEWS
(Bers, Miller, & Makhoul, 1998; Cheyer, 1998; Oviatt et al., 2000;
Waibel, Suhm, Vo, & Yang, 1997). These systems represent a variety
of different system features, applications, information fusion,
and linguistic processing techniques. For illustration
purposes, a comparison of five different speech and gesture systems
is summarized in Fig. 21.1. In most cases, these multimodal
systems jointly interpreted speech and pen input based
on a frame-based method of information fusion and a late semantic
fusion approach, although QuickSet used a statistically-ranked
unification process and a hybrid symbolic/statistical architecture
(Wu, Oviatt, & Cohen, 1999). Other recent systems
also have adopted unification-based multimodal fusion and a
hybrid architectural approach for processing multimodal input
(Bangalore & Johnston, 2000; Denecke & Yang, 2000; Pfleger,
2004; Wahlster, 2001) and even multimodal output (Kopp, Tepper,
& Cassell, 2004).
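
To make the idea of late semantic fusion by unification more concrete, the following is a minimal sketch, not the algorithm of QuickSet or any other cited system. It assumes that each recognizer emits an n-best list of (probability, frame) pairs, where a frame is a nested dictionary of partial meaning, and that the joint score is the product of the two probabilities. The function names `unify` and `fuse`, the frame fields, and the scoring rule are all illustrative assumptions.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts).

    Returns the merged structure, or None if any shared feature
    carries conflicting values.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            merged = unify(a_val, b_val)
            if merged is None:
                return None
            result[key] = merged
        elif a_val != b_val:
            return None
    return result


def fuse(speech_nbest, gesture_nbest):
    """Return all unifiable (score, frame) pairs, best first.

    The joint score is the product of the unimodal probabilities,
    which is a simplifying assumption for illustration.
    """
    fused = []
    for p_s, s_frame in speech_nbest:
        for p_g, g_frame in gesture_nbest:
            frame = unify(s_frame, g_frame)
            if frame is not None:
                fused.append((p_s * p_g, frame))
    fused.sort(key=lambda item: item[0], reverse=True)
    return fused


speech_nbest = [
    (0.6, {"act": "create", "object": {"kind": "airstrip"},
           "location": {"shape": "line"}}),
    (0.4, {"act": "create", "object": {"kind": "airport"},
           "location": {"shape": "point"}}),
]
gesture_nbest = [
    (0.7, {"location": {"shape": "point", "coords": [(3, 4)]}}),
    (0.3, {"location": {"shape": "line", "coords": [(1, 1), (5, 4)]}}),
]

best_score, best_frame = fuse(speech_nbest, gesture_nbest)[0]
print(best_frame["object"]["kind"])  # airport
```

In this toy example the top speech hypothesis and the top gesture hypothesis cannot be unified, so neither unimodal first choice survives unchanged. The highest-ranked joint interpretation pairs the second speech hypothesis with the first gesture hypothesis, which illustrates how information from one mode can overturn a recognition result in the other.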

Multimodal systems that process speech and continuous 3D
manual gesturing are emerging rapidly, although these systems
remain less mature than ones that process 2D pen input (Encarnacao
& Hettinger, 2003; Flanagan & Huang, 2003; Sharma,
Pavlovic, & Huang, 1998; Pavlovic, Sharma, & Huang, 1997).
This primarily is because of the significant challenges associated
with segmenting and interpreting continuous manual movements,
compared with a stream of x/y ink coordinates. Because
of this difference, multimodal speech and pen systems have advanced
more rapidly in their architectures, and have progressed
further toward commercialization of applications. However, a
significant cognitive science literature is available for guiding the
design of emerging speech and 3D-gesture prototypes (Condon,
1988; Kendon, 1980; McNeill, 1992), which will be discussed
further later in this chapter. Among the earlier systems
to begin processing manual pointing or 3D gestures combined
with speech were those developed by Koons and colleagues (Koons,
Sparrell, & Thorisson, 1993), Sharma and colleagues (Sharma
et al., 1996), Poddar and colleagues (Poddar, Sethi, Ozyildiz, &
Sharma, 1998), and Duncan and colleagues (Duncan, Brown,
Esposito, Holmback, & Xue, 1999).

Historically, multimodal speech and lip movement research
has been driven by cognitive science interest in intersensory
audio-visual perception and the coordination of speech output
with lip and facial movements (Benoit & Le Goff, 1998; Bernstein
& Benoit, 1996; Cohen & Massaro, 1993; Massaro & Stork, 1998;
McGrath & Summerfield, 1985; McGurk & MacDonald, 1976;
McLeod & Summerfield, 1987; Robert-Ribes, Schwartz, Lallouache,
& Escudier, 1998; Sumby & Pollack, 1954; Summerfield,
1992; Vatikiotis-Bateson, Munhall, Hirayama, Lee, & Terzopoulos,
1996). Among the many contributions of this literature has
been a detailed classification of human lip movements (visemes)
and the viseme-phoneme mappings that occur during articulated
speech. Existing systems that have processed combined
speech and lip movements include the classic work by Petajan
(1984), Brooke and Petajan (1986), and others (Adjoudani &
Benoit, 1995; Bregler & Konig, 1994; Silsbee & Su, 1996; Tomlinson,
Russell, & Brooke, 1996). Additional examples of speech
and lip movement systems, applications, and relevant cognitive
science research have been detailed elsewhere (Benoit et al.,
2000). Researchers in this area have been actively exploring
adaptive techniques for improving system robustness, especially
in noisy environmental contexts (Dupont & Luettin, 2000;
Meier, Hürst, & Duchnowski, 1996; Potamianos, Neti, Gravier, &
Garg, 2003; Rogozan & Deglise, 1998), which is an important future
research direction. Although this literature has not emphasized
the development of applications, nonetheless its quantitative
modeling of synchronized phoneme/viseme patterns has
been used to build animated characters that generate text-to-speech
output and coordinated lip movements. These new animated
characters are being used as an interface design vehicle
for facilitating users’ multimodal interaction with next-generation
conversational interfaces (Cassell, Sullivan, Prevost, & Churchill,
2000; Cohen & Massaro, 1993).

^However, other recent pen/voice multimodal systems that emphasize mobile processing, such as MiPad and the Field Medic Information System
(Holzman, 1999; Huang et al., 2000), still limit pen input to pointing.

FIGURE 21.1. Examples of functionality, architectural features, and general classification of different speech and gesture multimodal applications.

While the main multimodal literatures to date have focused
on  either  speech  and  pen  input  or  speech  and  lip  movements, 
recognition  of  other  modes  also  is  maturing  and  beginning  to 
be  integrated  into  new  kinds  of  multimodal  systems.  In  partic¬ 
ular,  there  is  growing  interest  in  designing  multimodal  inter¬ 
faces  that  incorporate  vision-based  technologies,  such  as  inter¬ 
pretation  of  gaze,  facial  expressions,  head  nodding,  gesturing, 
and  large  body  movements  (Flanagan  &  Huang,  2003;  Morency, 
Sidner, Lee, & Darrell, 2005; Morimoto, Koons, Amir, Flickner, &
Zhai,  1999;  Pavlovic,  Berry,  &  Huang,  1997;  Turk  &  Robertson, 
2000;  Zhai  et  al.,  1999).  These  technologies  unobtrusively  or 
passively  monitor  user  behavior  and  need  not  require  explicit 
user  commands  to  a  “computer.”  That  contrasts  with  active 
input  modes,  such  as  speech  or  pen,  which  the  user  deploys 
intentionally  as  a  command  issued  to  the  system  (see  Fig.  21.2). 
While  passive  modes  may  be  “attentive”  and  less  obtrusive,  ac¬ 
tive  modes  generally  are  more  reliable  indicators  of  user  intent. 

As  vision-based  technologies  mature,  one  important  future 
direction  will  be  the  development  of  blended  multimodal  in¬ 
terfaces  that  combine  both  passive  and  active  modes.  These  in¬ 
terfaces  typically  will  be  temporally  cascaded,  so  one  goal  in  de¬ 
signing  new  prototypes  will  be  to  determine  optimal  processing 
strategies  for  using  advance  information  from  the  first  mode 
(e.g.,  gaze)  to  constrain  accurate  interpretation  of  the  follow¬ 
ing  modes  (e.g.,  gesture,  speech).  This  kind  of  blended  multi¬ 
modal  interface  potentially  can  provide  users  with  greater  trans¬ 
parency  and  control,  while  also  supporting  improved  robustness 
and  broader  application  functionality  (Oviatt  &  Cohen,  2000; 
Zhai  et  al.,  1999).  As  this  collection  of  technologies  matures, 
there  also  is  strong  interest  in  designing  new  types  of  pervasive 
and  mobile  interfaces,  including  ones  capable  of  adaptive  pro¬ 
cessing  to  the  user  and  environmental  context. 
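
One way to picture a temporally cascaded interface is as a re-ranking step in which evidence from the earlier mode weights the hypotheses of the later one. The sketch below is an illustrative assumption rather than a description of any cited system: gaze dwell time before the utterance is converted into a prior over on-screen referents, and that prior rescales the speech recognizer's n-best scores. The function name `rescore_with_gaze`, the `floor` parameter, and the multiplicative scoring rule are hypothetical.

```python
def rescore_with_gaze(speech_nbest, gaze_dwell, floor=0.05):
    """Re-rank speech hypotheses using earlier gaze evidence.

    speech_nbest: list of (referent, recognizer_score) pairs.
    gaze_dwell:   dict mapping an on-screen referent to the seconds
                  of gaze dwell observed before the utterance began.
    floor:        minimum prior for referents that were not looked
                  at, so that gaze constrains the interpretation
                  without completely vetoing the speech recognizer.
    """
    total = sum(gaze_dwell.values())
    rescored = []
    for referent, score in speech_nbest:
        if total > 0:
            prior = gaze_dwell.get(referent, 0.0) / total
        else:
            prior = 0.0
        rescored.append((referent, score * max(prior, floor)))
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored


gaze = {"hospital": 1.2, "hotel": 0.3}
speech = [("hotel", 0.55), ("hospital", 0.45)]

print(rescore_with_gaze(speech, gaze)[0][0])  # hospital
print(rescore_with_gaze(speech, {})[0][0])    # hotel
```

With no gaze evidence every hypothesis receives the same floor prior, so the recognizer's original ranking is preserved. The floor reflects the point made above that passive modes are less reliable indicators of intent than active ones, so they should bias interpretation rather than override it.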

As  multimodal  interfaces  gradually  evolve  toward  supporting 
more  advanced  recognition  of  users’  natural  activities  in  con¬ 
text,  they  will  expand  beyond  rudimentary  bimodal  systems  to 
ones  that  incorporate  three  or  more  input  modes,  qualitatively 
different  modes,  and  more  sophisticated  models  of  multimodal 
interaction.  This  trend  already  has  been  initiated  within  bio¬ 
metrics  research,  which  has  combined  recognition  of  multiple 


behavioral  input  modes  (e.g.,  voice,  handwriting)  with  physio¬ 
logical  ones  (e.g.,  retinal  scans,  fingerprints)  to  achieve  reliable 
person  identification  and  verification  in  challenging  field  con¬ 
ditions  (Choudhury,  Clarkson,  Jebara,  &  Pentland,  1999;  Jain 
et  al.,  1999;  Jain  &  Ross,  2002;  Pankanti  et  al.,  2000). 

Apart  from  these  developments  within  research-level  sys¬ 
tems,  multimodal  interfaces  also  are  being  commercialized  as 
products,  especially  in  areas  like  personal  information  access 
and  management  on  handhelds  and  cell  phones.  Microsoft’s 
handheld MiPad for personal information management, and
Kirusa’s  cell  phone  interface  for  directory  assistance,  messaging, 
and  so  on,  are  just  two  examples  of  the  many  mobile  commer¬ 
cial  products  that  are  being  developed  with  multimodal  inter¬ 
faces.  Both  include  spoken  language  processing  and  a  stylus  for 
tapping  on  fields  to  constrain  and  guide  the  natural  language 
processing.  In  some  cases,  keyboard  input  is  supported  as  a 
third  option,  as  well  as  multimedia  output  in  the  form  of  visual¬ 
izations  and  text-to-speech.  Another  visible  growth  area  for  mul¬ 
timodal  interfaces  involves  in-vehicle  control  of  navigation, 
communication,  and  entertainment  systems,  which  has  emerged 
in  both  domestic  and  import  cars.  Mobile  map-based  systems 
and  systems  for  safety-critical  medical  and  military  applications 
also are being commercialized by companies like Natural Interaction
Systems (e.g., Rasa, NISMap, and NISChart applications),
which  places  an  emphasis  on  developing  tangible  multimodal 
interfaces  that  preserve  users’  existing  work  practice,  minimize 
cognitive  load,  and  provide  backups  in  case  of  system  failure 
(Cohen  &  McGee,  2004;  McGee,  2003). 


WHAT  ARE  THE  GOALS  AND  ADVANTAGES 
OF  MULTIMODAL  INTERFACE  DESIGN? 


Over the past decade, numerous advantages of multimodal interface
design have been documented. Unlike a traditional keyboard-and-mouse
interface or a unimodal recognition-based
interface,  multimodal  interfaces  permit  flexible  use  of  input 
modes.  This  includes  the  choice  of  which  modality  to  use  for 
conveying  different  types  of  information,  to  use  combined  input 
modes,  or  to  alternate  between  modes  at  any  time.  Since  indi¬ 
vidual  input  modalities  are  well  suited  in  some  situations,  and 
less  ideal  or  even  inappropriate  in  others,  modality  choice  is  an 
important  design  issue  in  a  multimodal  system.  As  systems  be¬ 
come  more  complex  and  multifunctional,  a  single  modality  sim¬ 
ply  does  not  permit  all  users  to  interact  effectively  across  all 
tasks  and  environments. 

Since  there  are  large  individual  differences  in  ability  and  pref¬ 
erence  to  use  different  modes  of  communication,  a  multimodal 
interface  permits  diverse  user  groups  to  exercise  selection  and 
control  over  how  they  interact  with  the  computer  (Fell  et  al., 
1994;  Karshmer  &  Blattner,  1998).  In  this  respect,  multimodal 
interfaces  have  the  potential  to  accommodate  a  broader  range 
of  users  than  traditional  interfaces,  including  users  of  different 
ages,  skill  levels,  native  language  status,  cognitive  styles,  sensory 
impairments,  and  other  temporary  illnesses  or  permanent 
handicaps.  For  example,  a  visually  impaired  user  or  one  with 
repetitive  stress  injury  may  prefer  speech  input  and  text-to- 
speech  output.  In  contrast,  a  user  with  a  hearing  impairment 
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Multimodal  interfaces  process  two  or  more  combined  user  input  modes —  such  as  speech,  pen, 
touch,  manual  gestures,  gaze,  and  head  and  body  movements —  in  a  coordinated  manner  with  multimedia 
system  output.  They  are  a  new  class  of  interfaces  that  aim  to  recognize  naturally  occurring  forms  of 
human  language  and  behavior,  and  which  incorporate  one  or  more  recognition-based  technologies  (e.g., 
speech,  pen,  vision). 

Active  input  modes  are  ones  that  are  deployed  by  the  user  intentionally  as  an  explicit  command  to  a 
computer  system  (e.g.,  speech). 

Passive  input  modes  refer  to  naturally  occurring  user  behavior  or  actions  that  are  recognized  by  a 
computer  (e.g.,  facial  expressions,  manual  gestures).  They  involve  user  input  that  is  unobtrusively  and 
passively  monitored,  without  requiring  any  explicit  command  to  a  computer. 

Blended  multimodal  interfaces  are  ones  that  incorporate  system  recognition  of  at  least  one 
passive  and  one  active  input  mode,  (e.g.,  speech  and  lip  movement  systems). 

Temporally-cascaded  multimodal  interfaces  are  ones  that  process  two  or  more  user  modalities 
that  tend  to  be  sequenced  in  a  particular  temporal  order  (e.g.,  gaze,  gesture,  speech),  such  that  partial 
information  supplied  by  recognition  of  an  earlier  mode  (e.g.,  gaze)  is  available  to  constrain  interpretation 
of  a  later  mode  (e.g.,  speech).  Such  interfaces  may  combine  only  active  input  modes,  only  passive  ones, 
or  they  may  be  blended. 

Mutual  disambiguation  involves  disambiguation  of  signal  or  semantic-level  information  in  one 
error-prone  input  mode  from  partial  information  supplied  by  another.  Mutual  disambiguation  can  occur  in 
a  multimodal  architecture  with  two  or  more  semantically  rich  recognition-based  input  modes.  It  leads  to 
recovery  from  unimodal  recognition  errors  within  a  multimodal  architecture,  with  the  net  effect  of 
suppressing  errors  experienced  by  the  user. 

Simultaneous  integrator  refers  to  a  user  who  habitually  presents  two  input  signals  (e.g.,  speech, 
pen)  in  a  temporally  overlapped  manner  when  communicating  multimodal  commands  to  a  system 

Sequential  integrator  refers  to  a  user  who  habitually  separates  their  multimodal  signals,  presenting 
one  before  the  other  with  a  brief  pause  intervening 

Multimodal  hypertiming  refers  to  the  fact  that  both  sequential  and  simultaneous  integrators  will 
further  accentuate  their  basic  multimodal  integration  pattern  when  under  duress  (e.g.,  as  task  difficulty  or 
system  recognition  errors  increase) 

Visemes  refers  to  the  detailed  classification  of  visible  lip  movements  that  correspond  with  consonants 
and  vowels  during  articulated  speech.  A  viseme-phoneme  mapping  refers  to  the  correspondence  between 
visible  lip  movements  and  audible  phonemes  during  continuous  speech. 

Feature-level  fusion  is  a  method  for  fusing  low-level  feature  information  from  parallel  input  signals 
within  a  multimodal  architecture,  which  has  been  applied  to  processing  closely  synchronized  input  such 
as  speech  and  lip  movements. 

Semantic-level  fusion  is  a  method  for  integrating  semantic  information  derived  from  parallel  input 
modes  in  a  multimodal  architecture,  which  has  been  used  for  processing  speech  and  gesture  input. 


FIGURE  21.2.  Multimodal  Interface  terminology. 


or  accented  speech  may  prefer  touch,  gesture,  or  pen  input. 
The  natural  alternation  between  modes  that  is  permitted  by  a 
multimodal  interface  also  can  be  effective  in  preventing  overuse 
and  physical  damage  to  any  single  modality,  especially  during 
extended  periods  of  computer  use  (Markinson,  personal
communication,  1993).

Multimodal  interfaces  also  provide  the  adaptability  that  is 
needed  to  accommodate  the  continuously  changing  conditions 
of  mobile  use.  In  particular,  systems  involving  speech,  pen,  or 
touch  input  are  suitable  for  mobile  tasks  and,  when  combined, 
users  can  shift  among  these  modalities  from  moment  to  moment
as  environmental  conditions  change  (Holzman,  1999;  Oviatt,
2000b,  2000c).  There  is  a  sense  in  which  mobility  can  induce
a  state  of  temporary  disability,  such  that  a  person  is  unable  to 
use  a  particular  input  mode  for  some  period.  For  example,  the 
user  of  an  in-vehicle  application  may  frequently  be  unable  to  use 
manual  or  gaze  input,  although  speech  is  relatively  more  avail¬ 
able.  In  this  respect,  a  multimodal  interface  permits  the  modal¬ 
ity  choice  and  switching  that  is  needed  during  the  changing  en¬ 
vironmental  circumstances  of  actual  field  and  mobile  use. 

A  large  body  of  data  documents  that  multimodal  interfaces 
satisfy  higher  levels  of  user  preference  when  interacting  with 
simulated  or  real  computer  systems.  Users  have  a  strong  preference
toward  interacting  multimodally,  rather  than  unimodally,
across  a  wide  variety  of  different  application  domains,  although 
this  preference  is  most  pronounced  in  spatial  domains  (Haupt¬ 
mann,  1989;  Oviatt,  1997).  For  example,  95%  to  100%  of  users 
preferred  to  interact  multimodally  when  they  were  free  to  use 
either  speech  or  pen  input  in  a  map-based  spatial  domain  (Ovi¬ 
att,  1997).  During  pen/voice  multimodal  interaction,  users  pre¬ 
ferred  speech  input  for  describing  objects  and  events,  sets  and 
subsets  of  objects,  out-of-view  objects,  conjoined  information, 
and  past  and  future  temporal  states,  and  for  issuing  commands 
for  actions  or  iterative  actions  (Cohen  &  Oviatt,  1995;  Oviatt  & 
Cohen,  1991).  However,  their  preference  for  pen  input  in¬ 
creased  when  conveying  digits,  symbols,  graphic  content,  and 
especially  when  conveying  the  location  and  form  of  spatially  ori¬ 
ented  information  on  a  dense  graphic  display  such  as  a  map 
(Oviatt  &  Olsen,  1994;  Oviatt,  1997;  Suhm,  1998).  Likewise,  71% 
of  users  combined  speech  and  manual  gestures  multimodally, 
rather  than  using  one  input  mode,  when  manipulating  graphic 
objects  on  a  CRT  screen  (Hauptmann,  1989). 

During  the  early  design  of  multimodal  systems,  it  was  as¬ 
sumed  that  efficiency  gains  would  be  the  main  advantage  of  de¬ 
signing  an  interface  multimodally,  and  that  this  advantage  would 
derive  from  the  ability  to  process  input  modes  in  parallel.  It  is 
true  that  multimodal  interfaces  sometimes  support  improved 
efficiency,  especially  when  manipulating  graphical  information. 
In  simulation  research  comparing  speech-only  with  multimodal 
pen/voice  interaction,  empirical  work  demonstrated  that  multi¬ 
modal  interaction  yielded  10%  faster  task-completion  time  dur¬ 
ing  visual-spatial  tasks,  but  no  significant  efficiency  advantage 
in  verbal  or  quantitative  task  domains  (Oviatt,  1997;  Oviatt,  Co¬ 
hen,  &  Wang,  1994).  Likewise,  users’  efficiency  improved  when 
they  combined  speech  and  gestures  multimodally  to  manipu¬ 
late  3D  objects,  compared  with  unimodal  input  (Hauptmann, 
1989).  In  another  early  study,  multimodal  speech-and-mouse  in¬ 
put  improved  efficiency  in  a  line-art  drawing  task  (Leatherby  & 
Pausch,  1992).  Finally,  in  a  study  that  compared  task-completion 
times  for  a  graphical  interface  versus  a  multimodal  pen/voice  in¬ 
terface,  military  domain  experts  averaged  four  times  faster  at 
setting  up  complex  simulation  scenarios  on  a  map  when  they 
were  able  to  interact  multimodally  (Cohen,  McGee,  &  Clow, 
2000).  This  latter  study  was  based  on  testing  of  a  fully  functional
multimodal  system,  and  it  included  time  required  to  correct 
recognition  errors. 

One  particularly  advantageous  feature  of  multimodal  inter¬ 
face  design  is  its  superior  error  handling,  both  in  terms  of  er¬ 
ror  avoidance  and  graceful  recovery  from  errors  (Oviatt  &  van 
Gent,  1996;  Oviatt,  Bernard,  &  Levow,  1999;  Oviatt,  1999a;  Rud- 
nicky  &  Hauptmann,  1992;  Suhm,  1998;  Tomlinson  et  al.,  1996). 
There  are  user-centered  and  system-centered  reasons  why  mul¬ 
timodal  systems  facilitate  error  recovery,  when  compared  with 
unimodal  recognition-based  interfaces.  For  example,  in  a  multi¬ 
modal  speech  and  pen-based  gesture  interface  users  will  select 
the  input  mode  that  they  judge  to  be  less  error  prone  for  par¬ 
ticular  lexical  content,  which  tends  to  lead  to  error  avoidance 
(Oviatt  &  van  Gent,  1996).  They  may  prefer  speedy  speech  input, 
but  will  switch  to  pen  input  to  communicate  a  foreign  surname. 
Secondly,  users’  language  often  is  simplified  when  interacting 
multimodally,  which  can  substantially  reduce  the  complexity  of
natural  language  processing  and  thereby  reduce  recognition
errors,  as  described  later  in  this  chapter  (Oviatt  &  Kuhn,  1998). 
In  one  study,  users’  multimodal  utterances  were  documented  to 
be  briefer,  to  contain  fewer  complex  locative  descriptions,  and 
50%  fewer  spoken  disfluencies,  when  compared  with  a  speech- 
only  interface.  Thirdly,  users  have  a  strong  tendency  to  switch 
modes  after  system  recognition  errors,  which  facilitates  error  re¬ 
covery.  This  error  resolution  occurs  because  the  confusion  ma¬ 
trices  differ  for  any  given  lexical  content  for  the  different  recog¬ 
nition  technologies  involved  in  processing  (Oviatt  et  al.,  1999; 
Oviatt,  2002). 

In  addition  to  these  user-centered  reasons  for  better  error 
avoidance  and  resolution,  there  also  are  system-centered  rea¬ 
sons  for  superior  error  handling.  A  well-designed  multimodal 
architecture  with  two  semantically  rich  input  modes  can  support 
mutual  disambiguation  of  input  signals.  For  example,  if  a  user 
says  “ditches”  but  the  speech  recognizer  confirms  the  singular 
“ditch”  as  its  best  guess,  then  parallel  recognition  of  several 
graphic  marks  can  result  in  recovery  of  the  correct  plural  inter¬ 
pretation.  This  recovery  can  occur  in  a  multimodal  architecture 
even  though  the  speech  recognizer  initially  ranks  the  plural  in¬ 
terpretation  “ditches”  as  a  less  preferred  choice  on  its  n-best  list. 
Mutual  disambiguation  involves  recovery  from  unimodal  recog¬ 
nition  errors  within  a  multimodal  architecture,  because  semantic 
information  from  each  input  mode  supplies  partial  disambigua¬ 
tion  of  the  other  mode,  thereby  leading  to  more  stable  and  ro¬ 
bust  overall  system  performance  (Oviatt,  1999a,  2000a,  2002). 
Another  example  of  mutual  disambiguation  is  shown  in  Fig.  21.3. 
To  achieve  optimal  error  handling,  a  multimodal  interface  ideally 
should  be  designed  to  include  complementary  input  modes,  and 


[Figure  21.3  panels:  Speech  Reco  N-Best  (1.  CIVILIAN,  2.  EIGHT,  3.  CIVILIANS,  4.  PAN);  Gesture  Reco  N-Best  (1.  line,  2.  arrow);  Multimodal  N-Best  (1.  pan).  Map  display  not  reproduced.]


FIGURE  21.3.  Multimodal  command  to  “pan”  the  map,  which  illustrates
mutual  disambiguation  occurring  between  incoming
speech  and  gesture  information,  such  that  lexical  hypotheses
were  pulled  up  on  both  n-best  lists  to  produce  a  correct  final
multimodal  interpretation.
the  alternative  input  modes  should  provide  duplicate  functionality
such  that  users  can  accomplish  their  goals  using  either  mode.
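
As  a  rough  illustration  of  the  “ditches”  example  above,  mutual  disambiguation  can  be  sketched  as  re-ranking  the  joint  hypotheses  formed  from  two  n-best  lists.  The  scores,  the  gesture  labels,  and  the  compatibility  rule  below  are  hypothetical  and  are  not  taken  from  any  system  described  in  this  chapter;  multiplying  scores  is  only  one  simple  way  to  combine  them.

```python
from itertools import product


def fuse_nbest(speech_nbest, gesture_nbest, compatible):
    """Rank semantically compatible speech/gesture pairs by combined score.

    Each n-best list holds (label, score) pairs; higher scores are better.
    Combining scores by multiplication is an assumption of this sketch.
    """
    joint = [
        (s_score * g_score, s_label, g_label)
        for (s_label, s_score), (g_label, g_score) in product(speech_nbest, gesture_nbest)
        if compatible(s_label, g_label)
    ]
    return sorted(joint, reverse=True)


def number_agrees(word, marks):
    """Hypothetical rule: a plural noun must co-occur with several marks."""
    is_plural = word.endswith("s")
    return is_plural == (marks == "several_marks")


# Hypothetical recognizer output: speech ranks the singular first,
# while the gesture recognizer is fairly sure several marks were drawn.
speech_nbest = [("ditch", 0.6), ("ditches", 0.4)]
gesture_nbest = [("several_marks", 0.9), ("single_mark", 0.1)]

ranked = fuse_nbest(speech_nbest, gesture_nbest, number_agrees)
best_score, best_word, best_gesture = ranked[0]
print(best_word)  # the plural reading wins after fusion
```

In  this  sketch  the  speech  recognizer  alone  would  have  returned  the  singular  form,  but  the  incompatible  pairings  are  filtered  out  and  the  lower-ranked  plural  hypothesis  is  pulled  to  the  top  of  the  joint  list.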

In  two  recent  studies  involving  over  4,600  multimodal  com¬ 
mands,  a  multimodal  architecture  was  found  to  support  mu¬ 
tual  disambiguation  and  error  suppression  ranging  between 
19%  and  41%  (Oviatt,  1999a,  2000a,  2002).  Improved  robustness 
also  was  greater  for  “challenging”  user  groups  (accented  vs.  na¬ 
tive  speakers)  and  usage  contexts  (mobile  vs.  stationary  use). 
These  results  indicate  that  a  well-designed  multimodal  system 
not  only  can  perform  more  robustly  than  a  unimodal  system, 
but  also  in  a  more  stable  way  across  varied  real-world  users  and 
usage  contexts.  Finally,  during  audio-visual  perception  of  speech 
and  lip  movements,  improved  speech  recognition  also  has  been 
demonstrated  for  both  human  listeners  (McLeod  &  Summer- 
field,  1987)  and  multimodal  systems  (Adjoudani  &  Benoit,  1995; 
Tomlinson  et  al.,  1996). 

Another  recent  focus  has  been  on  the  advantages  of  multi¬ 
modal  interface  design  for  minimizing  users’  cognitive  load.  As 
task  complexity  increases,  there  is  evidence  that  users  self- 
manage  their  working  memory  limits  by  distributing  informa¬ 
tion  across  multiple  modalities,  which  in  turn  enhances  their 
task  performance  during  both  perception  and  production 
(Calvert,  Spence,  &  Stein,  2004;  Mousavi,  Low,  &  Sweller,  1995; 
Oviatt,  1997;  Oviatt,  Coulston,  &  Lunsford,  2004;  Tang,  McLach- 
lan,  Lowe,  Saka,  &  MacLean,  2005).  These  predictions  and  find¬ 
ings  are  based  on  Wickens  and  colleagues’  cognitive  resource 
theory  and  Baddeley’s  theory  of  working  memory  (Baddeley,
1992;  Wickens,  Sandry,  &  Vidulich,  1983).  The  latter  maintains
that  short-term  or  working  memory  consists  of  multiple  inde¬ 
pendent  processors  associated  with  different  modes.  This  in¬ 
cludes  a  visual-spatial  “sketch  pad”  that  maintains  visual  materi¬ 
als  such  as  pictures  and  diagrams  in  one  area  of  working 
memory,  and  a  separate  phonological  loop  that  stores  auditory- 
verbal  information.  Although  these  two  processors  are  believed 
to  be  coordinated  by  a  central  executive,  in  terms  of  lower-level 
modality  processing  they  are  viewed  as  functioning  largely  in¬ 
dependently,  which  is  what  enables  the  effective  size  of  working 
memory  to  expand  when  people  use  multiple  modalities  during 
tasks  (Baddeley,  1992).  So  with  respect  to  management  of  cog¬ 
nitive  load,  the  inherent  flexibility  of  multimodal  interfaces  is 
well  suited  to  accommodating  the  high  and  changing  load  conditions
typical  of  realistic  mobile  use.


WHAT  METHODS  AND  INFORMATION 
HAVE  BEEN  USED  TO  DESIGN  NOVEL 
MULTIMODAL  INTERFACES? 


The  design  of  new  multimodal  systems  has  been  inspired  and 
organized  largely  by  two  things.  First,  the  cognitive  science  lit¬ 
erature  on  intersensory  perception  and  intermodal  coordina¬ 
tion  during  production  is  beginning  to  provide  a  foundation  of 
information  for  user  modeling,  as  well  as  information  on  what 
systems  must  recognize  and  how  multimodal  architectures 
should  be  organized.  For  example,  the  cognitive  science  litera¬ 
ture  has  provided  knowledge  of  the  natural  integration  patterns 
that  typify  people’s  lip  and  facial  movements  with  speech  output 
(Benoit,  Guiard-Marigny,  Le  Goff,  &  Adjoudani,  1996;  Ekman,
1992;  Ekman  &  Friesen,  1978;  Fridlund,  1994;  Hadar,  Steiner,
Grant,  &  Rose,  1983;  Massaro  &  Cohen,  1990;  Stork  &  Hen- 
necke,  1995;  Vatikiotis-Bateson  et  al.,  1996),  and  their  coordi¬ 
nated  use  of  manual  or  pen-based  gestures  with  speech 
(Kendon,  1980;  McNeill,  1992;  Oviatt,  DeAngeli,  &  Kuhn,  1997). 
Given  the  complex  nature  of  users’  multimodal  interaction,  cog¬ 
nitive  science  has  played  and  will  continue  to  play  an  essential 
role  in  guiding  the  design  of  robust  multimodal  systems.  In  this 
respect,  a  multidisciplinary  perspective  will  be  more  central  to 
successful  multimodal  system  design  than  it  has  been  for  tradi¬ 
tional  GUI  design.  The  cognitive  science  underpinnings  of  mul¬ 
timodal  system  design  are  described  later  in  this  chapter. 

Secondly,  high-fidelity  automatic  simulations  also  have 
played  a  critical  role  in  prototyping  new  types  of  multimodal  systems
(Dahlbäck,  Jönsson,  &  Ahrenberg,  1992;  Oviatt  et  al.,
1992).  When  a  new  multimodal  system  is  in  the  planning  stages,
design  sketches  and  low-fidelity  mock-ups  may  initially  be  used 
to  visualize  the  new  system  and  plan  the  sequential  flow  of 
human-computer  interaction.  These  tentative  design  plans  then 
are  rapidly  transitioned  into  a  higher-fidelity  simulation  of  the 
multimodal  system,  which  is  available  for  proactive  and  situ¬ 
ated  data  collection  with  the  intended  user  population.  High- 
fidelity  simulations  have  been  the  preferred  method  for  design¬ 
ing  and  evaluating  new  multimodal  systems,  and  extensive  data 
collection  with  such  tools  preferably  is  completed  before  a  fully 
functional  system  ever  is  built. 

During  high-fidelity  simulation  testing,  a  user  interacts  with 
what  she  believes  is  a  fully  functional  multimodal  system  al¬ 
though  the  interface  is  actually  a  simulated  front-end  designed  to 
appear  and  respond  as  the  fully  functional  system  would.  During 
the  interaction,  a  programmer  assistant  at  a  remote  location  pro¬ 
vides  the  simulated  system  responses.  As  the  user  interacts  with 
the  front  end,  the  programmer  tracks  her  multimodal  input  and 
provides  system  responses  as  quickly  and  accurately  as  possi¬ 
ble.  To  support  this  role,  the  programmer  makes  use  of  auto¬ 
mated  simulation  software  that  is  designed  to  support  interactive 
speed,  realism  with  respect  to  the  targeted  system,  and  other  im¬ 
portant  characteristics.  For  example,  with  these  automated  tools, 
the  programmer  may  be  able  to  make  a  single  selection  on  a 
workstation  field  to  rapidly  send  simulated  system  responses  to 
the  user  during  a  data-collection  session. 

High-fidelity  simulations  have  been  the  preferred  method  for 
prototyping  multimodal  systems  for  several  reasons.  Simulations 
are  relatively  easy  and  inexpensive  to  adapt,  compared  with 
building  and  iterating  a  complete  system.  They  also  permit  re¬ 
searchers  to  alter  a  planned  system’s  characteristics  in  major 
ways  (e.g.,  input  and  output  modes  available),  and  to  study  the 
impact  of  different  interface  features  in  a  systematic  and  scientific 
manner  (e.g.,  type  and  base-rate  of  system  errors).  In  compari¬ 
son,  a  particular  system  with  its  fixed  characteristics  is  a  less  flex¬ 
ible  and  suitable  research  tool,  and  the  assessment  of  any  single 
system  basically  amounts  to  an  individual  case  study.  Using  sim¬ 
ulation  techniques,  rapid  adaptation  and  investigation  of  planned 
system  features  permit  researchers  to  gain  a  broader  and  more 
principled  perspective  on  the  potential  of  newly  emerging  tech¬ 
nologies.  In  a  practical  sense,  simulation  research  can  assist  in  the 
evaluation  of  critical  performance  tradeoffs  and  in  making  deci¬ 
sions  about  alternative  system  designs,  which  designers  must  do 
as  they  strive  to  create  more  usable  multimodal  systems. 

To  support  the  further  development  and  commercialization  of
multimodal  systems,  additional  infrastructure  that  will  be  needed 
in  the  future  includes  (a)  simulation  tools  for  rapidly  building  and 
reconfiguring  multimodal  interfaces,  (b)  automated  tools  for  col¬ 
lecting  and  analyzing  multimodal  corpora,  and  (c)  automated 
tools  for  iterating  new  multimodal  systems  to  improve  their  per¬ 
formance  (see  Oviatt  et  al.,  2000,  for  further  discussion). 


WHAT  ARE  THE  COGNITIVE  SCIENCE 
UNDERPINNINGS  OF  MULTIMODAL 
INTERFACE  DESIGN? 


This  section  discusses  the  growing  cognitive  science  literature 
that  provides  the  empirical  underpinnings  needed  to  design 
next-generation  multimodal  interfaces.  The  ability  to  develop 
multimodal  systems  depends  on  knowledge  of  the  natural  inte¬ 
gration  patterns  that  typify  people’s  combined  use  of  different 
input  modes.  In  particular,  the  design  of  new  multimodal  sys¬ 
tems  depends  on  intimate  knowledge  of  the  properties  of  dif¬ 
ferent  modes  and  the  information  content  they  carry,  the  unique 
characteristics  of  multimodal  language  and  its  processability,  and 
the  integration  and  synchronization  characteristics  of  users’  multi¬ 
modal  interaction.  It  also  relies  on  accurate  prediction  of  when 
users  are  likely  to  interact  multimodally  and  how  alike  different 
users  are  in  their  specific  integration  patterns.  The  relevant  cog¬ 
nitive  science  literature  on  these  topics  is  very  extensive,  espe¬ 
cially  when  consideration  is  given  to  all  of  the  underlying  sensory 
perception  and  production  capabilities  involved  in  different  in¬ 
put  modes  currently  being  incorporated  in  new  multimodal  in¬ 
terfaces.  As  a  result,  this  section  will  be  limited  to  introducing  the 
main  cognitive  science  themes  and  findings  that  are  relevant  to 
the  more  common  types  of  multimodal  system. 

This  cognitive  science  foundation  also  has  played  a  key  role 
in  identifying  computational  “myths”  about  multimodal  inter¬ 
action,  and  replacing  these  misconceptions  with  contrary  em¬ 
pirical  evidence.  Figure  21.4  summarizes  10  common  myths 


Ten  Myths  of  Multimodal  Interaction

Myth  #1:  If  you  build  a  multimodal  system,  users  will  interact  multimodally
Myth  #2:  Speech  &  pointing  is  the  dominant  multimodal  integration  pattern
Myth  #3:  Multimodal  input  involves  simultaneous  signals
Myth  #4:  Speech  is  the  primary  input  mode  in  any  multimodal  system  that  includes  it
Myth  #5:  Multimodal  language  does  not  differ  linguistically  from  unimodal  language
Myth  #6:  Multimodal  integration  involves  redundancy  of  content  between  modes
Myth  #7:  Individual  error-prone  recognition  technologies  combine  multimodally  to  produce  even
greater  unreliability
Myth  #8:  All  users’  multimodal  commands  are  integrated  in  a  uniform  way
Myth  #9:  Different  input  modes  are  capable  of  transmitting  comparable  content
Myth  #10:  Enhanced  efficiency  is  the  main  advantage  of  multimodal  systems

(taken  from  Oviatt,  1999b)


FIGURE  21.4.  Ten  myths  of  multimodal  interaction:  Separating
myth  from  empirical  reality.


about  multimodal  interaction,  which  are  addressed  and  dis¬ 
cussed  in  more  detail  elsewhere  (Oviatt,  1999b).  As  such,  the  lit¬ 
erature  summarized  in  this  section  aims  to  provide  a  more  ac¬ 
curate  foundation  for  guiding  the  design  of  next-generation 
multimodal  systems. 

When  Do  Users  Interact  Multimodally? 

During  natural  interpersonal  communication,  people  are  always 
interacting  multimodally.  Of  course,  in  this  case  the  number  of 
information  sources  or  modalities  that  an  interlocutor  has  avail¬ 
able  to  monitor  is  essentially  unlimited.  However,  all  multi¬ 
modal  systems  are  constrained  in  the  number  and  type  of  in¬ 
put  modes  they  can  recognize.  Also,  a  user  can  compose  active 
input  during  human-computer  interaction  that  either  is  deliv¬ 
ered  multimodally  or  that  is  delivered  entirely  using  just  one 
mode.  That  is,  although  users  in  general  may  have  a  strong  preference
to  interact  multimodally  rather  than  unimodally,  this  is
no  guarantee  that  they  will  issue  every  command  to  a  system 
multimodally,  given  the  particular  type  of  multimodal  interface 
available.  Therefore,  the  first  nontrivial  question  that  arises  dur¬ 
ing  system  processing  is  whether  a  user  is  communicating  uni¬ 
modally  or  multimodally. 

In  the  case  of  speech-  and  pen-based  multimodal  systems, 
users  typically  mix  unimodal  and  multimodal  expressions.  In 
one  study  involving  a  visual-spatial  domain,  users’  commands 
were  expressed  multimodally  20%  of  the  time,  with  others  just
spoken  or  written  (Oviatt  et  al.,  1997).  In  contrast,  in  other  spa¬ 
tial  domains,  the  ratio  of  users’  multimodal  interaction  often  is 
65%  to  70%  (Oviatt,  1999b;  Oviatt  et  al.,  2004).  Predicting 
whether  a  user  will  express  a  command  multimodally  also  de¬ 
pends  on  the  type  of  action  she  is  performing.  In  particular, 
users  usually  express  commands  multimodally  when  describ¬ 
ing  spatial  information  about  the  location,  number,  size,  orien¬ 
tation,  or  shape  of  an  object.  In  one  study,  users  issued  multi¬ 
modal  commands  86%  of  the  time  when  they  had  to  add,  move, 
modify,  or  calculate  the  distance  between  objects  on  a  map  in  a 
way  that  required  specifying  spatial  locations  (Oviatt  et  al., 
1997).  They  also  were  moderately  likely  to  interact  multimodally 
when  selecting  an  object  from  a  larger  array,  for  example,  when 
deleting  a  particular  object  from  the  map.  However,  when  per¬ 
forming  general  actions  without  any  spatial  component,  such  as 
printing  a  map,  users  expressed  themselves  multimodally  less 
than  1%  of  the  time.  These  data  emphasize  that  future  multi¬ 
modal  systems  will  need  to  distinguish  between  instances  in 
which  users  are  and  are  not  communicating  multimodally,  so 
that  accurate  decisions  can  be  made  about  when  parallel  input 
streams  should  be  interpreted  jointly  versus  individually.  They 
also  suggest  that  knowledge  of  the  type  of  actions  to  be  in¬ 
cluded  in  an  application,  such  as  whether  the  application  entails 
manipulating  spatial  information,  should  influence  the  basic  de¬ 
cision  of  whether  to  build  a  multimodal  interface  at  all. 

Findings  from  a  more  recent  study  reveal  that  multimodal  in¬ 
terface  users  spontaneously  respond  to  dynamic  changes  in 
their  own  cognitive  load  by  shifting  to  multimodal  communica¬ 
tion  as  load  increases  with  task  difficulty  and  communicative 
complexity  (Oviatt,  Coulston,  &  Lunsford,  2004).  Given  a  flexi¬ 
ble  multimodal  interface,  users’  ratio  of  multimodal  (versus 
unimodal)  interaction  increased  substantially  from  18.6%  when
referring  to  established  dialogue  context  to  77.1%  when  required
to  establish  a  new  context,  a  +315%  relative  increase.
Likewise,  the  ratio  of  users’  multimodal  interaction  increased
significantly  as  the  tasks  became  more  difficult,  from  59.2%  during
low  difficulty  tasks,  to  65.5%  at  moderate  difficulty,  68.2%  at
high,  and  75.0%  at  very  high  difficulty,  an  overall  relative  increase
of  +27%.  These  adaptations  in  multimodal  interaction
levels  reflect  users’  efforts  to  self-manage  limitations  in  their 
working  memory  as  discourse-level  demands  and  task  com¬ 
plexity  increased.  As  discussed  earlier,  users  accomplished  this 
by  distributing  communicative  information  across  multiple 
modalities  in  a  manner  compatible  with  a  cognitive  load  theory 
of  multimodal  interaction.  This  interpretation  is  consistent  with 
Baddeley’s  theory  of  working  memory  (Baddeley,  1992),  as  well
as  the  growing  literatures  within  education  (Mousavi,  Low,  & 
Sweller,  1995;  Sweller,  1988),  linguistics  (Almor,  1999),  and 
multisensory  perception  (Calvert,  Spence,  &  Stein,  2004;  Ernst 
&  Bülthoff,  2004).  Recent  work  on  visual  and  haptic  processing
under  workload  also  indicates  that  presentation  of  haptic  feed¬ 
back  during  a  complex  task  can  augment  users’  ability  to  han¬ 
dle  visual  information  overload  (Tang,  McLachlan,  Lowe,  Saka,  & 
MacLean,  2005). 
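
The  relative  increases  for  these  ratios  can  be  checked  directly  from  the  reported  percentages.  The  helper  name  below  is  illustrative  only.

```python
def relative_increase(before, after):
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100.0


# Established versus new dialogue context (18.6% -> 77.1%).
print(round(relative_increase(18.6, 77.1)))  # 315

# Low versus very high task difficulty (59.2% -> 75.0%).
print(round(relative_increase(59.2, 75.0)))  # 27
```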

In  a  multimodal  interface  that  processes  passive  or  blended 
input  modes,  there  always  is  at  least  one  passively  tracked  in¬ 
put  source  providing  continuous  information  (e.g.,  gaze  track¬ 
ing,  head  position).  In  these  cases,  all  user  input  would  by  defi¬ 
nition  be  classified  as  multimodal,  and  the  primary  problem 
would  become  segmentation  and  interpretation  of  each  con¬ 
tinuous  input  stream  into  meaningful  actions  of  significance  to 
the  application.  In  the  case  of  blended  multimodal  interfaces 
(e.g.,  gaze  tracking  and  mouse  input),  it  still  may  be  opportune 
to  distinguish  active  forms  of  user  input  that  might  be  more  ac¬ 
curately  or  expeditiously  handled  as  unimodal  events. 

What  Are  the  Integration  and  Synchronization 
Characteristics  of  Users'  Multimodal  Input? 

The  past  literature  on  multimodal  systems  has  focused  largely 
on  simple  selection  of  objects  or  locations  in  a  display,  rather 
than  considering  the  broader  range  of  multimodal  integration 
patterns.  Since  the  development  of  Bolt’s  (1980)  “Put  That 
There”  system,  speak-and-point  has  been  viewed  as  the  proto¬ 
typical  form  of  multimodal  integration.  In  Bolt’s  system,  seman¬ 
tic  processing  was  based  on  spoken  input,  but  the  meaning  of 
a  deictic  term  such  as  “that”  was  resolved  by  processing  the  x/y
coordinate  indicated  by  pointing  at  an  object.  Since  that  time, 
other  multimodal  systems  also  have  attempted  to  resolve  deic¬ 
tic  expressions  using  a  similar  approach — for  example,  using 
gaze  location  instead  of  manual  pointing  (Koons  et  al.,  1993). 

Unfortunately,  this  concept  of  multimodal  interaction  as 
point-and-speak  makes  only  limited  use  of  new  input  modes 
for  selection  of  objects — just  as  the  mouse  does.  In  this  respect, 
it  represents  the  persistence  of  an  old  mouse-oriented  metaphor.
In  contrast,  modes  that  transmit  written  input,  manual
gesturing,  and  facial  expressions  are  capable  of  generating  symbolic
information  that  is  much  more  richly  expressive  than  simple
pointing  or  selection.  In  fact,  studies  of  users’  integrated
pen/voice  input  indicate  that  a  speak-and-point  pattern  accounts
for  only  14%  of  all  spontaneous  multimodal  utterances  (Oviatt
et  al.,  1997).  Instead,  pen  input  more  often  is  used  to  create
graphics,  symbols  and  signs,  gestural  marks,  digits,  and  lexical 
content.  During  interpersonal  multimodal  communication,  lin¬ 
guistic  analysis  of  spontaneous  manual  gesturing  also  indicates 
that  simple  pointing  accounts  for  less  than  20%  of  all  gestures 
(McNeill,  1992).  Together,  these  cognitive  science  and  user¬ 
modeling  data  highlight  the  fact  that  any  multimodal  system  de¬ 
signed  exclusively  to  process  speak-and-point  will  fail  to  provide 
users  with  much  useful  functionality.  For  this  reason,  specialized 
algorithms  for  processing  deictic-point  relations  will  have  only 
limited  practical  use  in  the  design  of  future  multimodal  systems. 
It  is  clear  that  a  broader  set  of  multimodal  integration  issues 
needs  to  be  addressed  in  future  work.  Future  research  also 
should  explore  typical  integration  patterns  between  other 
promising  modality  combinations,  such  as  speech  and  gaze. 

It  also  is  commonly  assumed  that  any  signals  involved  in  a 
multimodal  construction  will  co-occur  temporally.  The  pre¬ 
sumption  is  that  this  temporal  overlap  then  determines  which 
signals  to  combine  during  system  processing.  In  the  case  of 
speech  and  manual  gestures,  successful  processing  of  the  deic¬ 
tic  term  “that  square”  in  Bolt’s  original  system  relied  on  inter¬ 
pretation  of  pointing  when  the  word  “that”  was  spoken  in  or¬ 
der  to  extract  the  intended  referent.  However,  one  empirical 
study  indicated  that  users  often  do  not  speak  deictic  terms  at  all, 
and  when  they  do,  the  deictic  frequently  is  not  overlapped  in 
time  with  their  pointing.  In  fact,  it  has  been  estimated  that  as 
few  as  25%  of  users’  commands  actually  contain  a  spoken  deic¬ 
tic  that  overlaps  with  the  pointing  needed  to  disambiguate  its 
meaning  (Oviatt  et  al.,  1997). 

Beyond  the  issue  of  deixis,  a  series  of  studies  has  shown  that 
users’  input  frequently  does  not  overlap  at  all  during  multi¬ 
modal  commands  to  a  computer  (Oviatt,  1999b;  Oviatt  et  al., 
2003;  Oviatt  et  al.,  2005;  Xiao,  Girand,  &  Oviatt,  2002;  Xiao, 
Lunsford,  Coulston,  Wesson,  &  Oviatt,  2003).  In  fact,  there  are
two  distinct  types  of  user  with  respect  to  integration  patterns: 
simultaneous  integrators  and  sequential  ones.  A  user  who  ha¬ 
bitually  integrates  her  speech  and  pen  input  in  a  simultaneous 
manner  overlaps  them  temporally,  whereas  a  sequential  inte¬ 
grator  finishes  one  mode  before  beginning  the  second,  as  sum¬ 
marized  in  Fig.  21.2.  These  two  types  of  user  integration  pat¬ 
terns  occur  across  the  lifespan  from  children  through  the 
elderly  (Oviatt  et  al.,  2005;  Xiao  et  al.,  2002;  Xiao  et  al.,  2003). 
They  also  can  be  detected  almost  immediately  during  multi¬ 
modal  interaction,  usually  on  the  very  first  input.  Users’  habitual 
integration  pattern  remains  strikingly  consistent  during  a
session,  as  well  as  resistant  to  change  following  explicit  instruc¬ 
tions  or  attempts  at  training  (Oviatt  et  al.,  2003;  Oviatt  et  al., 
2005).  This  bimodal  distribution  of  user  integration  patterns  has 
been  observed  in  different  task  domains  (e.g.,  map-based  real 
estate  selection,  crisis  management,  educational  applications 
with  animated  characters),  and  also  when  using  different  types 
of  interface  (e.g.,  conversational,  command  style)  (Oviatt, 
1999b;  Xiao  et  al.,  2002;  Xiao  et  al.,  2003).  In  short,  empirical 
studies  have  demonstrated  that  this  bimodal  distinction  be¬ 
tween  users  in  their  fundamental  integration  pattern  general¬ 
izes  widely  across  different  age  groups,  task  domains,  and  types 
of interface.

One interesting discovery in recent work is the phenomenon
of multimodal hypertiming, which refers to the fact that
both sequential and simultaneous integrators will entrench further
or accentuate their habitual multimodal integration pattern
(e.g., increasing their intermodal lag during sequential integrations,
or overlap during simultaneous integrations, as
summarized in Fig. 21.2) during system error handling or when
completing  increasingly  difficult  tasks.  In  fact,  users  will  pro¬ 
gressively  increase  their  degree  of  entrenchment  by  18%  as  sys¬ 
tem  errors  increase,  and  by  59%  as  task  difficulty  increases 
(Oviatt  et  al.,  2003).  As  such,  changes  in  the  degree  of  users’ 
multimodal  hypertiming  provide  a  potentially  sensitive  means 
of  evaluating  their  cognitive  load  during  real-time  interactive 
exchanges.  In  the  context  of  system  error  handling,  the  phe¬ 
nomenon  of  multimodal  hypertiming  basically  replaces  the 
hyperarticulation  that  is  typically  observed  in  users  during  error- 
prone  speech-only  interactions. 

Given  the  bimodal  distribution  of  user  integration  patterns, 
adaptive  temporal  thresholds  potentially  could  support  more 
tailored  and  flexible  approaches  to  fusion.  Ideally,  an  adaptive 
multimodal  system  would  detect,  automatically  learn,  and  adapt 
to  a  user’s  dominant  multimodal  integration  pattern,  which 
could  result  in  substantial  improvements  in  system  processing 
speed,  accuracy  of  interpretation,  and  synchronous  interchange 
with  the  user.  For  example,  it  has  been  estimated  that  system 
delays  could  be  reduced  to  approximately  40%  to  50%  of  what 
they  currently  are  by  adopting  user-defined  thresholds  (Oviatt 
et  al.,  2005).  Recent  research  has  begun  comparing  different 
learning-based  models  for  adapting  a  multimodal  system’s  tem¬ 
poral  thresholds  to  an  individual  user  in  real  time  (Huang  &  Ovi¬ 
att,  in  press). 
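As a rough illustration of what adapting a temporal threshold to a user's dominant integration pattern could look like, the following minimal sketch labels a user as a simultaneous or sequential integrator from logged commands and derives a wait time from that label. It is not one of the learning-based models cited above. The record layout, the function names, and the `default` and `margin` values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    """Time stamps (seconds) for one speech-and-pen command (hypothetical layout)."""
    speech_start: float
    speech_end: float
    pen_start: float
    pen_end: float


def intermodal_lag(cmd: Command) -> float:
    """Gap between the end of the first signal and the start of the second.

    A negative value means the two signals overlap in time.
    """
    if cmd.speech_start <= cmd.pen_start:
        return cmd.pen_start - cmd.speech_end
    return cmd.speech_start - cmd.pen_end


def dominant_pattern(history: List[Command]) -> str:
    """Label the user by the majority pattern in the observed commands."""
    overlapped = sum(1 for c in history if intermodal_lag(c) < 0)
    return "simultaneous" if overlapped * 2 > len(history) else "sequential"


def fusion_wait(history: List[Command], default: float = 4.0, margin: float = 0.5) -> float:
    """Seconds to wait for a second signal after the first one ends.

    default and margin are illustrative values, not empirical ones.
    """
    if not history:
        return default
    if dominant_pattern(history) == "simultaneous":
        # The second signal has normally begun before the first one ends.
        return 0.0
    lags = [intermodal_lag(c) for c in history if intermodal_lag(c) >= 0]
    return max(lags) + margin
```

In this sketch a simultaneous integrator incurs no added wait, while a sequential integrator's wait tracks the longest lag that user has actually produced instead of a fixed system-wide value.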

Unfortunately,  users’  multimodal  integration  patterns  have 
not  been  studied  as  extensively  or  systematically  for  other  in¬ 
put  modes,  such  as  speech  and  manual  gesturing.  Linguistics  re¬ 
search  on  interpersonal  communication  patterns  has  revealed 
that both spontaneous gesturing and signed language often precede
their spoken lexical analogues during human communication
(Kendon, 1980; Naughton, 1996), when considering word-level
integration patterns. In fact, the degree to which gesturing
precedes  speech  is  greater  in  topic-prominent  languages  such 
as  Chinese  than  it  is  in  subject-prominent  ones  like  Spanish  or 
English  (McNeill,  1992).  Even  in  the  speech  and  lip-movement 
literature,  close  but  not  perfect  temporal  synchrony  is  typical, 
with  lip  movements  occurring  a  fraction  of  a  second  before  the 
corresponding  auditory  signal  (Abry,  Lallouache,  &  Cathiard, 
1996;  Benoit,  2000).  However,  when  considering  the  whole  user 
utterance  as  the  unit  of  analysis,  some  other  studies  of  speech 
and  manual  gesturing  have  found  a  higher  rate  of  simultaneity 
for  these  modes  (Epps,  Oviatt,  &  Chen,  2004).  Learning-based 
approaches  that  are  capable  of  accurately  identifying  and  adapt¬ 
ing  to  different  multimodal  integration  patterns,  whether  due  to 
differences  among  users,  modality  combinations,  or  applica¬ 
tions  and  usage  contexts,  will  be  required  in  order  to  general¬ 
ize  and  speed  up  multimodal  system  development  in  the  future. 

In  short,  although  two  input  modes  may  be  highly  inter¬ 
dependent  and  synchronized  during  multimodal  interaction, 
synchrony  does  not  imply  simultaneity.  The  empirical  evidence 
reveals  that  multimodal  signals  often  do  not  co-occur  temporally 
at all during human-computer or natural human communication.
Therefore, multimodal system designers cannot necessarily
count on conveniently overlapped signals in order to achieve
successful  processing  in  the  multimodal  architectures  they 
build.  Future  research  needs  to  explore  the  integration  patterns 
and  temporal  cascading  that  can  occur  among  three  or  more 
input  modes — such  as  gaze,  gesture,  and  speech — so  that  more 
advanced  multimodal  systems  can  be  designed  and  prototyped. 

In  the  design  of  new  multimodal  architectures,  it  is  important 
to  note  that  data  on  the  order  of  input  modes  and  average  time 
lags  between  input  modes  has  been  used  to  determine  the  like¬ 
lihood  that  an  utterance  is  multimodal  versus  unimodal,  and  to 
establish  temporal  thresholds  for  fusion  of  input.  In  the  future, 
weighted likelihoods associated with different utterance segmentations
(for example, that an input stream containing
speech, writing, speech should be segmented into [S/W S] rather
than [S W/S]), and with intermodal time lag distributions, will be
used  to  optimize  correct  recognition  of  multimodal  user  input 
(Oviatt,  1999b).  In  the  design  of  future  time-critical  multimodal 
architectures,  data  on  users’  integration  and  synchronization  pat¬ 
terns  will  need  to  be  collected  for  other  mode  combinations  dur¬ 
ing  realistic  interactive  tasks,  so  that  temporal  thresholds  can  be 
established  for  performing  multimodal  fusion. 
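The segmentation question above can be made concrete with a small sketch. It scores the two candidate groupings of a speech, writing, speech stream using only the lag between the two fused signals. The normal lag distribution and its `mean` and `std` values are assumptions for illustration, not figures from the studies cited, and a fuller model would also weight the segmentations themselves.

```python
import math
from typing import List, Tuple

Signal = Tuple[str, float, float]  # (mode, start, end) in seconds


def lag_likelihood(lag: float, mean: float = 1.0, std: float = 0.7) -> float:
    """Density of an intermodal lag under an assumed normal distribution."""
    z = (lag - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def score_segmentations(signals: List[Signal]) -> List[Tuple[str, float]]:
    """Score each way of fusing one adjacent pair of different-mode signals.

    Every other signal is treated as a unimodal utterance.
    Returns (label, likelihood) pairs, best first.
    """
    scored = []
    for i in range(len(signals) - 1):
        (m1, _, end1), (m2, start2, _) = signals[i], signals[i + 1]
        if m1 == m2:
            continue  # fusion needs two different modes
        parts = [s[0] for s in signals]
        parts[i:i + 2] = [f"{m1}/{m2}"]
        scored.append((" ".join(parts), lag_likelihood(start2 - end1)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Speech, then writing 0.8 s later, then speech again after a 3 s pause.
stream = [("S", 0.0, 1.0), ("W", 1.8, 3.0), ("S", 6.0, 7.0)]
ranked = score_segmentations(stream)
```

Under these assumed parameters the short lag between the first speech signal and the writing makes "S/W S" the preferred segmentation.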

What  Individual  Differences  Exist  in  Multimodal 
Interaction,  and  What  Are  the  Implications 
for  Designing  Systems  for  Universal  Access? 

There  are  large  individual  differences  in  users’  multimodal  in¬ 
teraction  patterns,  beginning  with  their  overall  preference  to  in¬ 
teract  unimodally  versus  multimodally,  and  which  mode  they 
generally  prefer  (e.g.,  speaking  versus  writing)  (Oviatt  et  al., 
2004).  As  outlined  above,  there  likewise  are  striking  differences 
among  users  in  adopting  either  a  sequential  or  simultaneous 
multimodal  integration  pattern.  Recent  research  has  revealed 
that  these  two  patterns  are  associated  with  behavioral  and  lin¬ 
guistic  differences  between  the  groups  (Oviatt  et  al.,  2005). 
Whereas in an interactive task context their performance speed
was  comparable,  sequential  integrators  were  far  less  error-prone 
and  excelled  during  new  or  complex  tasks.  Although  their 
speech  rate  was  no  slower,  sequential  integrators  also  had  more 
precise  articulation  (e.g.,  less  disfluency).  Finally,  sequential  in¬ 
tegrators  were  more  likely  to  adopt  terse  and  direct  command- 
style  language,  with  a  smaller  and  less  varied  vocabulary,  which 
appeared  focused  on  achieving  error-free  communication. 
These  user  differences  in  interaction  patterns  have  been  inter¬ 
preted  as  deriving  from  fundamental  differences  among  users  in 
their  reflective-impulsive  cognitive  style  (Oviatt  et  al.,  2005). 
Based  on  this  work,  one  goal  of  future  multimodal  interface  de¬ 
sign  will  be  to  support  the  poorer  attention  span  and  higher  er¬ 
ror  rate  of  impulsive  users — especially  for  mobile  in-vehicle,  mil¬ 
itary,  and  similar  application  contexts  in  which  the  cost  of 
committing  errors  is  unacceptably  high. 

Apart  from  these  individual  differences,  cultural  differences 
also  have  been  documented  between  users  in  modality  integra¬ 
tion  patterns.  For  example,  substantial  individual  differences  have 
been  reported  in  the  temporal  synchrony  between  speech  and 
lip  movements  (Kricos,  1996)  and,  in  addition,  lip  movements 
during speech production are known to be less exaggerated
among Japanese speakers than Americans (Sekiyama & Tohkura,
1991).  In  fact,  extensive  inter-language  differences  have  been 
observed  in  the  information  available  from  lip  movements  dur¬ 
ing  audio-visual  speech  (Fuster-Duran,  1996).  These  findings 
have  implications  for  the  degree  to  which  disambiguation  of 
speech  can  be  achieved  through  lip  movement  information  in 
noisy  environments  or  for  different  user  populations.  Finally, 
nonnative  speakers,  the  hearing  impaired,  and  elderly  listeners 
all  are  more  influenced  by  visual  lip  movement  than  auditory 
cues  when  processing  speech  (Fuster-Duran,  1996;  Massaro, 
1996).  These  results  have  implications  for  the  design  and  ex¬ 
pected  value  of  audio-visual  multimedia  output  for  different 
user  groups  in  animated  character  interfaces.  With  respect  to 
support  for  universal  access,  recent  work  also  has  shown  the  ad¬ 
vantage  of  combined  audio-visual  processing  for  recognition  of 
impaired  speech  (Potamianos  &  Neti,  2001). 

Finally,  gender,  age,  and  other  individual  differences  are 
common  in  gaze  patterns,  as  well  as  speech  and  gaze  integra¬ 
tion  (Argyle,  1972).  As  multimodal  interfaces  incorporating  gaze 
become  more  mature,  further  research  will  need  to  explore 
these  gender  and  age-specific  patterns,  and  to  build  appropri¬ 
ately  adapted  processing  strategies.  In  summary,  considerably 
more  research  is  needed  on  multimodal  integration  and  syn¬ 
chronization  patterns  for  new  mode  combinations,  as  well  as  for 
diverse  and  disabled  users  for  whom  multimodal  interfaces  may 
be  especially  suitable  for  ensuring  universal  access. 


Is  Complementarity  or  Redundancy 
the  Main  Organizational  Theme 
That  Guides  Multimodal  Integration? 

It  frequently  is  claimed  that  the  propositional  content  conveyed 
by  different  modes  during  multimodal  communication  contains 
a  high  degree  of  redundancy.  However,  the  dominant  theme  in 
users’  natural  organization  of  multimodal  input  actually  is  com¬ 
plementarity  of  content,  not  redundancy.  For  example,  speech 
and  pen  input  consistently  contribute  different  and  comple¬ 
mentary  semantic  information,  with  the  subject,  verb,  and  object 
of  a  sentence  typically  spoken,  and  locative  information  written 
(Oviatt  et  al.,  1997).  In  fact,  a  major  complementarity  between 
speech  and  manually  oriented  pen  input  involves  visual-spatial 
semantic  content,  which  is  one  reason  these  modes  are  an  op¬ 
portune  combination  for  visual-spatial  applications.  Whereas 
spatial  information  is  uniquely  and  clearly  indicated  via  pen  in¬ 
put,  the  strong  descriptive  capabilities  of  speech  are  better 
suited  for  specifying  temporal  and  other  nonspatial  information. 
Even  during  multimodal  correction  of  system  errors,  when  users 
are  highly  motivated  to  clarify  and  reinforce  their  information  de¬ 
livery,  speech  and  pen  input  express  redundant  information  less 
than  1%  of  the  time.  Finally,  during  interpersonal  communication 
linguists  also  have  documented  that  spontaneous  speech  and 
manual  gesturing  involve  complementary  rather  than  duplicate 
information  between  modes  (McNeill,  1992). 

Other  examples  of  primary  multimodal  complementarities 
during  interpersonal  and  human-computer  communication 
have been described in past research (McGurk & MacDonald,
1976; Oviatt & Olsen, 1994; Wickens et al., 1983). For example,
in the literature on multimodal speech and lip movements, natural
feature-level complementarities have been identified between
visemes and phonemes for vowel articulation, with vowel
rounding better conveyed visually, and vowel height and backness
better revealed auditorily (Massaro & Stork, 1998; Robert-Ribes
et al., 1998).

In  short,  actual  data  highlight  the  importance  of  comple¬ 
mentarity  as  a  major  organizational  theme  during  multimodal 
communication.  The  designers  of  next-generation  multimodal 
systems  therefore  should  not  expect  to  rely  on  duplicated  in¬ 
formation  when  processing  multimodal  language,  although  in 
certain  contexts  (such  as  teaching)  a  greater  percentage  of  du¬ 
plicate  content  than  usual  may  be  expected  to  exist.  In  multi¬ 
modal  systems  involving  both  speech  and  pen-based  gestures 
and  speech  and  lip  movements,  one  explicit  goal  has  been  to  in¬ 
tegrate  complementary  modalities  in  a  manner  that  yields  a  syn¬ 
ergistic  blend,  such  that  each  mode  can  be  capitalized  upon  and 
used  to  overcome  weaknesses  in  the  other  mode  (Cohen  et  al., 
1989).  This  approach  to  system  design  has  promoted  the  phi¬ 
losophy  of  using  modes  and  component  technologies  to  their 
natural  advantage,  and  of  combining  them  in  a  manner  that  per¬ 
mits  mutual  disambiguation.  One  advantage  of  achieving  such 
a  blend  is  that  the  resulting  multimodal  architecture  can  func¬ 
tion  more  robustly  than  an  individual  recognition-based  tech¬ 
nology  or  a  multimodal  system  based  on  input  modes  lacking 
natural  complementarities. 


What  Are  the  Primary  Features  of  Multimodal  Language? 

Communication  channels  can  be  tremendously  influential  in 
shaping  the  language  transmitted  within  them.  From  past  re¬ 
search,  there  now  is  cumulative  evidence  that  many  linguistic 
features of multimodal language are qualitatively very different
from those of spoken or formal textual language. In fact, it can differ
in features as basic as brevity, semantic content, syntactic
complexity, word order, disfluency rate, degree of ambiguity,
referring expressions, specification of determiners, anaphora,
deixis, and linguistic indirectness. In many respects, multimodal
language  is  simpler  linguistically  than  spoken  language.  In  par¬ 
ticular,  comparisons  have  revealed  that  the  same  user  complet¬ 
ing  the  same  map-based  task  communicates  significantly  fewer 
words,  briefer  sentences,  and  fewer  complex  spatial  descrip¬ 
tions  and  disfluencies  when  interacting  multimodally  compared 
with  using  speech  alone  (Oviatt,  1997).  One  implication  of  these 
findings  is  that  multimodal  interface  design  has  the  potential 
to  support  more  robust  future  systems  than  a  unimodal  design 
approach.  The  following  is  an  example  of  a  typical  user’s  spoken 
input  while  attempting  to  designate  an  open  space  using  a  map 
system: “Add an open space on the north lake to b — include
the  north  lake  part  of  the  road  and  north.”  In  contrast,  the  same 
user  accomplished  the  same  task  multimodally  by  encircling  a 
specific  area  and  saying,  “Open  space.” 

In  previous  research,  hard-to-process,  disfluent  language  has 
been  observed  to  decrease  by  50%  during  multimodal  interaction 
with a map, compared with a more restricted speech-only interaction
(Oviatt, 1997). This drop occurs mainly because people
have  difficulty  speaking  spatial  information,  which  precipitates 
disfluencies.  In  a  flexible  multimodal  interface,  they  instead  use 
pen  input  to  convey  spatial  information,  thereby  avoiding  the 
need  to  speak  it.  Further  research  is  needed  to  establish 
whether  other  forms  of  flexible  multimodal  communication  also 
generally  ease  users’  cognitive  load,  which  may  be  reflected  in 
a  reduced  rate  of  disfluencies. 

During  multimodal  pen/voice  communication,  the  linguistic 
indirection  that  is  typical  of  spoken  language  frequently  is  re¬ 
placed  with  more  direct  commands  (Oviatt  &  Kuhn,  1998).  In 
the  following  example,  a  study  participant  made  a  disfluent  in¬ 
direct  request  using  speech  input  while  requesting  a  map-based 
distance calculation: “What is the distance between the Victorian
Museum and the, uh, the house on the east side of Woodpecker
Lane?” When requesting distance information multimodally,
the same user encircled the house and museum while
speaking  the  following  brief  direct  command:  “Show  distance 
between  here  and  here.”  In  this  research,  the  briefer  and  more 
direct  multimodal  pen/voice  language  also  contained  substan¬ 
tially  fewer  referring  expressions,  with  a  selective  reduction  in 
co-referring  expressions  that  instead  were  transformed  into  de¬ 
ictic  expressions.  This  latter  reduction  in  coreference  would 
simplify  natural  language  processing  by  easing  the  need  for 
anaphoric  tracking  and  resolution  in  a  multimodal  interface. 
Also  consistent  with  fewer  referring  expressions,  explicit  speci¬ 
fication  of  definite  and  indefinite  reference  is  less  common  in 
multimodal  language  (Oviatt  &  Kuhn,  1998).  Current  natural 
language  processing  algorithms  typically  rely  heavily  on  the 
specification  of  determiners  in  definite  and  indefinite  references 
in  order  to  represent  and  resolve  noun-phrase  reference.  One 
unfortunate  byproduct  of  the  lack  of  such  specifications  is  that 
current  language  processing  algorithms  are  unprepared  for  the 
frequent occurrence of elision and deixis in multimodal
human-computer interaction.

In  other  respects,  multimodal  language  clearly  is  different 
from  spoken  language,  although  not  necessarily  simpler.  For  ex¬ 
ample,  users’  multimodal  pen/voice  language  departs  from  the 
canonical English word order of S-V-O-LOC (i.e., Subject-Verb-Object-Locative
constituent), which is observed in spoken language
and formal textual language. Instead, users’ multimodal
constituents  shift  to  a  LOC-S-V-O  word  order.  A  recent  study  re¬ 
ported  that  95%  of  locative  constituents  were  in  sentence-initial 
position  during  multimodal  interaction.  However,  for  the  same 
users  completing  the  same  tasks  while  speaking,  96%  of  loca¬ 
tives  were  in  sentence-final  position  (Oviatt  et  al.,  1997).  It  is 
likely  that  broader  analysis  of  multimodal  communication  pat¬ 
terns,  which  could  involve  gaze  and  manual  gesturing  to  indi¬ 
cate  location  rather  than  pen-based  pointing,  would  reveal  a 
similar  reversal  in  word  order. 

One  implication  of  these  many  differences  is  that  new  mul¬ 
timodal  corpora,  statistical  language  models,  and  natural  lan¬ 
guage-processing  algorithms  will  need  to  be  established  before 
multimodal  language  can  be  processed  optimally.  Future  re¬ 
search  and  corpus-collection  efforts  also  will  be  needed  on  dif¬ 
ferent  types  of  multimodal  communication,  and  in  other  appli¬ 
cation  domains,  so  that  the  generality  of  previously  identified 
multimodal  language  differences  can  be  explored. 


WHAT  ARE  THE  BASIC  WAYS  IN  WHICH 
MULTIMODAL  INTERFACES  DIFFER  FROM 
GRAPHICAL  USER  INTERFACES? 


Multimodal  research  groups  currently  are  rethinking  and  re¬ 
designing  basic  user  interface  architectures  because  a  whole 
new  range  of  architectural  requirements  has  been  posed.  First, 
graphical  user  interfaces  typically  assume  that  a  single  event 
stream  controls  the  underlying  event  loop,  with  any  processing 
sequential  in  nature.  For  example,  most  GUIs  ignore  typed  in¬ 
put  when  a  mouse  button  is  depressed.  In  contrast,  multimodal 
interfaces  typically  can  process  continuous  and  simultaneous  in¬ 
put  from  parallel  incoming  streams.  Secondly,  GUIs  assume  that 
the  basic  interface  actions,  such  as  selection  of  an  item,  are 
atomic  and  unambiguous  events.  In  contrast,  multimodal  sys¬ 
tems  process  input  modes  using  recognition-based  technolo¬ 
gies,  which  are  designed  to  handle  uncertainty  and  entail  prob¬ 
abilistic  methods  of  processing.  Thirdly,  GUIs  often  are  built  to 
be  separable  from  the  application  software  that  they  control, 
although  the  interface  components  usually  reside  centrally  on 
one  machine.  In  contrast,  recognition-based  user  interfaces  typ¬ 
ically  have  larger  computational  and  memory  requirements, 
which  often  makes  it  desirable  to  distribute  the  interface  over  a 
network  so  that  separate  machines  can  handle  different  recog¬ 
nizers  or  databases.  For  example,  cell  phones  and  networked 
PDAs  may  extract  features  from  speech  input,  but  transmit  them 
to  a  recognizer  that  resides  on  a  server.  Finally,  multimodal  in¬ 
terfaces  that  process  two  or  more  recognition-based  input 
streams  require  time  stamping  of  input,  and  the  development  of 
temporal  constraints  on  mode  fusion  operations.  In  this  regard, 
they  involve  uniquely  time-sensitive  architectures. 
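The time-stamping requirement can be pictured with a minimal sketch in which each recognizer emits time-stamped events, the per-mode streams are interleaved chronologically, and a simple temporal constraint decides whether two events are candidates for fusion. The `InputEvent` fields, the function names, and the `max_gap` parameter are hypothetical and chosen only for this example.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class InputEvent(NamedTuple):
    """One time-stamped recognizer result (hypothetical layout)."""
    start: float       # seconds
    end: float
    mode: str          # e.g. "speech" or "pen"
    hypothesis: str    # top recognition hypothesis
    confidence: float  # recognizer's probability estimate


def merge_streams(*streams: Iterable[InputEvent]) -> Iterator[InputEvent]:
    """Interleave per-mode streams, each already sorted by start time."""
    return heapq.merge(*streams, key=lambda e: e.start)


def within_window(a: InputEvent, b: InputEvent, max_gap: float) -> bool:
    """Temporal constraint: b may be fused with the earlier event a if it
    starts no later than max_gap seconds after a ends (overlap is allowed)."""
    return b.start - a.end <= max_gap


speech = [
    InputEvent(0.0, 1.0, "speech", "open space", 0.9),
    InputEvent(5.0, 6.0, "speech", "zoom in", 0.8),
]
pen = [InputEvent(0.4, 1.6, "pen", "area", 0.7)]
merged = list(merge_streams(speech, pen))
```

Unlike a single GUI event loop, nothing here assumes that one stream is idle while the other is active; ordering and fusion eligibility are decided from the time stamps alone.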


WHAT  BASIC  ARCHITECTURES  AND 
PROCESSING  TECHNIQUES  HAVE  BEEN 
USED  TO  DESIGN  MULTIMODAL  SYSTEMS? 


Many  early  multimodal  interfaces  that  handled  combined  speech 
and  gesture,  such  as  Bolt’s  “Put  That  There”  system  (Bolt,  1980), 
have  been  based  on  a  control  structure  in  which  multimodal 
integration occurs during the process of parsing spoken language.
As discussed earlier, when the user speaks a deictic expression
such as “here” or “this,” the system searches for a synchronized
gestural act that designates the spoken referent.
While such an approach is viable for processing a point-and-speak
multimodal integration pattern, as discussed earlier, multimodal
systems must be able to process richer input than just
pointing,  including  gestures,  symbols,  graphic  marks,  lip  move¬ 
ments,  meaningful  facial  expressions,  and  so  forth.  To  support 
more  broadly  functional  multimodal  systems,  general  process¬ 
ing  architectures  have  been  developed  since  Bolt’s  time.  Some 
of these recent architectures handle a variety of multimodal integration
patterns, as well as the interpretation of both unimodal
and combined multimodal input. This kind of architecture
can support the development of multimodal systems in
which modalities are processed individually as input alternatives
to one another, or those in which two or more modes are processed
as combined multimodal input.

For  multimodal  systems  designed  to  handle  joint  process¬ 
ing  of  input  signals,  there  are  two  main  subtypes  of  multimodal 
architecture. One subtype integrates signals at the feature level
(i.e., “early fusion”). The other integrates information at a semantic
level (i.e., “late fusion”). Examples of systems based on
an  early  feature-fusion  processing  approach  include  those  de¬ 
veloped  by  Bregler  and  colleagues  (Bregler,  Manke,  Hild,  & 
Waibel,  1993),  Vo  and  colleagues  (Vo  et  al.,  1995),  and  Pavlovic 
and  colleagues  (Pavlovic,  Sharma,  &  Huang,  1997;  Pavlovic  & 
Huang,  1998).  In  feature-fusion  architecture,  the  signal-level 
recognition  process  in  one  mode  influences  the  course  of 
recognition  in  the  other.  Feature  fusion  is  considered  more  ap¬ 
propriate  for  closely  temporally  synchronized  input  modalities, 
such  as  speech  and  lip  movements  (Stork  &  Hennecke,  1995; 
Rubin  et  al.,  1998). 
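One simple way to picture feature-level fusion is frame-by-frame concatenation of the two feature streams before a single recognizer sees them. The sketch below assumes that approach; it is not necessarily the method used in the systems cited above, and the function name and the assumption that both streams share a frame rate are illustrative.

```python
from typing import List, Sequence

Frame = Sequence[float]


def fuse_features(audio: List[Frame], visual: List[Frame]) -> List[List[float]]:
    """Concatenate per-frame audio and visual feature vectors.

    Assumes both streams were already resampled to the same frame rate,
    so frame i of one stream lines up in time with frame i of the other.
    """
    if len(audio) != len(visual):
        raise ValueError("streams must be frame-aligned before fusion")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


# Two frames of 2-dimensional audio features and 1-dimensional lip features.
fused = fuse_features([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]])
```

The alignment check reflects why this style of fusion suits closely synchronized modes such as speech and lip movements: the joint vector is only meaningful when both halves describe the same instant.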

In  contrast,  multimodal  systems  using  the  late  semantic  fu¬ 
sion  approach  have  been  applied  to  processing  multimodal 
speech  and  pen  input  or  manual  gesturing,  for  which  the  input 
modes  are  less  coupled  temporally.  These  input  modes  provide 
different but complementary information that typically is integrated
at the utterance level. Late semantic integration systems
use  individual  recognizers  that  can  be  trained  using  unimodal 
data,  which  are  easier  to  collect  and  are  already  publicly  avail¬ 
able  for  speech  and  handwriting.  In  this  respect,  systems  based 
on  semantic  fusion  can  be  scaled  up  more  easily  in  number  of 
input  modes  or  vocabulary  size.  Examples  of  systems  based  on 
semantic  fusion  include  Put  That  There  (Bolt,  1980),  ShopTalk 
(Cohen  et  al.,  1989),  QuickSet  (Cohen  et  al.,  1997),  CUBRICON 
(Neal & Shapiro, 1991), Virtual World (Codella et al., 1992), Finger-Pointer
(Fukumoto, Suenaga, & Mase, 1994), VisualMan (Wang,
1995),  Human-Centric  Word  Processor,  Portable  Voice  Assistant 
(Bers  et  al.,  1998),  the  VR  Aircraft  Maintenance  Training  System 
(Duncan  et  al.,  1999)  and  Jeanie  (Vo  &  Wood,  1996). 

As  an  example  of  multimodal  information  processing  flow  in 
a late-stage semantic architecture, Fig. 21.5 illustrates two input
modes  (e.g.,  speech  and  manual  or  pen-based  gestures)  recog¬ 
nized  in  parallel  and  processed  by  an  understanding  component. 
The  results  involve  partial  meaning  representations  that  are 
fused  by  the  multimodal  integration  component,  which  also  is 
influenced by the system’s dialogue management and interpretation
of current context. During the integration process, alternative
lexical candidates for the final multimodal interpretation
are ranked according to their probability estimates on an n-best
list. The best-ranked multimodal interpretation then is sent to
the application invocation and control component, which transforms
this information into a series of commands to one or more
back-end application systems. System feedback typically includes
multimedia output, which may incorporate text-to-speech and
non-speech audio; graphics and animation; and so forth. For examples
of feature-based multimodal processing flow and architectures,
especially as applied to multimodal speech and lip
movement systems, see Benoit et al. (2000).

FIGURE 21.5. Typical information processing flow in a multimodal architecture designed for speech and gesture.
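The ranking step in this processing flow can be sketched as follows: every speech hypothesis is paired with every gesture hypothesis, incompatible pairs are discarded, and the remainder are ordered by a joint score. Multiplying the two probability estimates assumes the recognizers err independently, which is a simplification; the function name, the `compatible` callback, and the sample hypotheses are all hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (interpretation, probability estimate)


def rank_joint(
    speech_nbest: List[Hypothesis],
    gesture_nbest: List[Hypothesis],
    compatible: Callable[[str, str], bool],
) -> List[Tuple[str, str, float]]:
    """Pair every speech hypothesis with every gesture hypothesis, drop
    incompatible pairs, and sort the rest by joint probability."""
    joint = [
        (s, g, ps * pg)
        for (s, ps), (g, pg) in product(speech_nbest, gesture_nbest)
        if compatible(s, g)
    ]
    return sorted(joint, key=lambda item: item[2], reverse=True)


# Illustrative constraint: each command accepts only one kind of gesture.
allowed = {("open space", "area"), ("zoom out", "point")}
ranked = rank_joint(
    [("zoom out", 0.55), ("open space", 0.45)],
    [("area", 0.7), ("point", 0.3)],
    lambda s, g: (s, g) in allowed,
)
```

In this example the top joint interpretation pairs "open space" with the area gesture even though "zoom out" was the speech recognizer's first choice, which is a small instance of one mode correcting the other.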

There  are  many  ways  to  realize  this  information  processing 
flow  as  architecture.  One  common  infrastructure  that  has  been 
adopted by the multimodal research community involves multi-agent
architectures, such as the Open Agent Architecture (Cohen,
Cheyer, Wang, & Baeg, 1994; Martin, Cheyer, & Moran,
1999) and Adaptive Agent Architecture (Kumar & Cohen, 2000).
In  a  multi-agent  architecture,  the  many  components  needed  to 
support  the  multimodal  system  (e.g.,  speech  recognition,  ges¬ 
ture  recognition,  natural  language  processing,  multimodal  in¬ 
tegration)  may  be  written  in  different  programming  languages, 
on  different  machines,  and  with  different  operating  systems. 
Agent  communication  languages  are  being  developed  that  can 
handle  asynchronous  delivery,  triggered  responses,  multi-casting, 
and  other  concepts  from  distributed  systems,  and  that  are  fault- 
tolerant  (Kumar  &  Cohen,  2000).  Using  a  multi-agent  architec¬ 
ture,  for  example,  speech  and  gestures  can  arrive  in  parallel  or 
asynchronously  via  individual  modality  agents,  with  the  results 
recognized  and  passed  to  a  facilitator.  These  results,  typically 
an n-best list of conjectured lexical items and related time-stamp
information,  then  are  routed  to  appropriate  agents  for  further 
language  processing.  Next,  sets  of  meaning  fragments  derived 
from  the  speech  and  pen  signals  arrive  at  the  multimodal  inte¬ 
grator.  This  agent  decides  whether  and  how  long  to  wait  for 
recognition  results  from  other  modalities,  based  on  the  system’s 
temporal  thresholds.  It  fuses  the  meaning  fragments  into  a 
semantically and temporally compatible whole interpretation
before  passing  the  results  back  to  the  facilitator.  At  this  point, 
the  system’s  final  multimodal  interpretation  is  confirmed  by  the 
interface,  delivered  as  multimedia  feedback  to  the  user,  and  ex¬ 
ecuted  by  any  relevant  applications.  In  summary,  multi-agent  ar¬ 
chitectures  provide  essential  infrastructure  for  coordinating  the 
many  complex  modules  needed  to  implement  multimodal  sys¬ 
tem  processing,  and  they  permit  doing  so  in  a  distributed  man¬ 
ner  that  is  compatible  with  the  trend  toward  mobile  computing. 

The  core  of  multimodal  systems  based  on  semantic  fusion  in¬ 
volves  algorithms  that  integrate  common  meaning  representa¬ 
tions  derived  from  speech,  gesture,  and  other  modalities  into  a 
combined  final  interpretation.  The  semantic  fusion  operation 
requires  a  common  meaning-representation  framework  for  all 
modalities,  and  a  well-defined  operation  for  combining  partial 
meanings  that  arrive  from  different  signals.  To  fuse  information 
from  different  modalities,  various  research  groups  have  inde¬ 
pendently  converged  on  a  strategy  of  recursively  matching  and 
merging  attribute/value  data  structures,  although  using  a  variety 
of  different  algorithms  (Vo  &  Wood,  1996;  Cheyer  &  Julia,  1995; 
Pavlovic  &  Huang,  1998;  Shaikh  et  al.,  1997).  This  approach  is 
considered a frame-based integration technique. An alternative
logic-based  approach  derived  from  computational  linguistics 
(Carpenter, 1990, 1992; Calder, 1987) involves the use of typed
feature structures and unification-based integration, which is
a  more  general  and  well-understood  approach.  Unification- 
based  integration  techniques  also  have  been  applied  to  multi¬ 
modal  system  design  (Cohen  et  al.,  1997;  Johnston  et  al.,  1997; 
Wu  et  al.,  1999).  Feature-structure  unification  is  considered  well 
suited  to  multimodal  integration,  because  unification  can  com¬ 
bine  complementary  or  redundant  input  from  both  modes,  but 
it  rules  out  contradictory  input.  Given  this  foundation  for  multi¬ 
modal  integration,  more  research  still  is  needed  on  the  devel¬ 
opment  of  canonical  meaning  representations  that  are  common 
among  different  input  modes  which  will  need  to  be  represented 
in  new  types  of  multimodal  systems. 
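
The matching-and-merging operation described above can be illustrated with a minimal sketch. It assumes that partial meanings are plain nested dictionaries of attributes and values; it does not implement the type hierarchy of typed feature structures, and the attribute names and example inputs are hypothetical rather than taken from any of the cited systems.

```python
class UnificationFailure(Exception):
    """Raised when two partial meanings contradict each other."""


def unify(a, b):
    """Recursively merge two attribute/value structures (nested dicts).

    Complementary attributes are combined, redundant (equal) values are
    kept once, and contradictory values raise UnificationFailure.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, b_value in b.items():
            merged[key] = unify(a[key], b_value) if key in a else b_value
        return merged
    if a == b:
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


# Hypothetical partial meanings from two input modes.
speech = {"action": "create", "object": {"type": "unit", "name": "medical company"}}
gesture = {"object": {"type": "unit", "location": {"x": 120, "y": 45}}}

combined = unify(speech, gesture)
```

Here the redundant `type` value is kept once, the complementary `name` and `location` attributes are merged, and a gesture whose `type` disagreed with the speech input would be rejected.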

When  statistical  processing  techniques  are  combined  with  a
symbolic  unification-based  approach  that  merges  feature  structures,
then  the  multimodal  architecture  that  results  is  a  hybrid
symbolic/statistical  one.  Hybrid  architectures  represent  one
major  new  direction  for  multimodal  system  development.  Mul¬ 
timodal  architectures  also  can  be  hybrids  in  the  sense  of  com¬ 
bining  Hidden  Markov  Models  (HMMs)  and  Neural  Networks 
(NNs).  New  hybrid  architectures  potentially  are  capable  of 
achieving  very  robust  functioning,  compared  with  either  an 
early-  or  late-fusion  approach  alone.  For  example,  the  Members- 
Teams-Committee  (MTC)  hierarchical  recognition  technique, 
which  is  a  hybrid  symbolic/statistical  multimodal  integration 
framework  trained  over  a  labeled  multimodal  corpus,  recently 
achieved  95.26%  correct  recognition  performance,  or  within 
1.4%  of  the  theoretical  system  upper  bound  (Wu  et  al.,  1999). 
Other  architectural  approaches  and  contributions  to  processing 
multimodal  information  have  been  summarized  elsewhere 
(Oliver  &  Horvitz,  2005;  Potamianos  et  al.,  2003). 


WHAT  ARE  THE  MAIN  FUTURE  DIRECTIONS 
FOR  MULTIMODAL  INTERFACE  DESIGN? 


The  computer  science  community  is  just  beginning  to  under¬ 
stand  how  to  design  innovative,  well  integrated,  and  robust 
multimodal  systems.  To  date,  most  multimodal  systems  remain 
bimodal,  and  recognition  technologies  related  to  several  human 
senses  (e.g.,  haptics,  smell,  taste)  have  yet  to  be  well  repre¬ 
sented  within  multimodal  interfaces.  The  design  and  develop¬ 
ment  of  new  types  of  systems  that  include  such  modes  will  not 
be  achievable  through  intuition.  Rather,  it  will  depend  on 
knowledge  of  the  natural  integration  patterns  that  typify  peo¬ 
ple’s  combined  use  of  various  input  modes.  This  means  that  the 
successful  design  of  multimodal  systems  will  continue  to  re¬ 
quire  guidance  from  cognitive  science  on  the  coordinated  hu¬ 
man  perception  and  production  of  natural  modalities.  In  this 
respect,  multimodal  systems  only  can  flourish  through  multi¬ 
disciplinary  cooperation,  as  well  as  teamwork  among  those  rep¬ 
resenting  expertise  in  the  component  technologies. 

Most  of  the  systems  outlined  in  this  chapter  have  been  built 
during  the  past  15  years,  and  they  are  research-level  systems. 
However,  in  some  cases  they  have  developed  well  beyond  the 
prototype  stage,  and  are  being  integrated  with  other  software  at 
academic  and  federal  sites,  or  appearing  as  newly  shipped  prod¬ 
ucts.  To  achieve  wider  commercialization  of  multimodal  inter¬ 
faces,  such  systems  will  need  to  develop  more  powerful  and 
general  methods  of  natural  language  and  dialogue  processing,
and  temporal  modeling  and  processing  of  incoming  signals.  In 
addition,  multimodal  datasets  and  tools  are  very  much  needed 
to  build  applications  more  rapidly  in  a  wide  range  of  domains,  in¬ 
cluding  for  newly  emerging  collaborative  multimodal  applica¬ 
tions  such  as  meeting  support  and  education  (Barthelmess, 
Kaiser,  Huang,  &  Demirdjian,  2005;  Cohen  &  McGee,  2004;  Dan- 
ninger  et  al.,  2005;  Gatica-Perez,  Lathoud,  Odobez,  &  McGowan, 
2005;  McGee,  2003;  Pentland,  2005).  The  many  mobile  multi¬ 
modal  interfaces  currently  being  built  also  will  require  active 
adaptation  to  the  user,  task,  ongoing  dialogue,  and  environmen¬ 
tal  context,  which  is  another  very  active  area  of  recent  work 
(Gorniak  &  Roy,  2005;  Gupta,  2004;  Huang  &  Oviatt,  2005;  Jain  & 
Ross,  2002;  Potamianos  et  al.,  2003;  Xiao  et  al.,  2003).  To  facilitate 
the  speed  and  generality  of  multimodal  interface  adaptation  to 
these  important  variables,  future  work  will  need  to  integrate  new 
machine  learning  techniques  that  are  now  being  developed  to 
handle  asynchronous  and  heterogeneous  data  (Bengio,  2004; 
McGowan  et  al.,  2005).  Finally,  in  the  future  a  coherent  theoreti¬ 
cal  framework  needs  to  be  developed  to  account  for  multimodal 
interaction  patterns.  This  will  be  invaluable  for  proactively  guid¬ 
ing  the  design  of  new  multimodal  interfaces  to  be  compatible 
with  human  capabilities  and  limitations.  Current  work  in  cognitive
neuroscience  and  multisensory  perception  is  beginning
to  provide  an  empirical  and  theoretical  basis  for  this  future  interface
design  (Calvert  et  al.,  2004;  Ernst  &  Bulthoff,  2004).

In  conclusion,  multimodal  interfaces  are  just  beginning  to 
model  human-like  sensory  perception.  They  are  recognizing 
and  identifying  actions,  language,  and  people  that  have  been 
seen,  heard,  or  in  other  ways  experienced  in  the  past.  They  lit¬ 
erally  reflect  and  acknowledge  the  existence  of  human  users, 
empower  them  in  new  ways,  and  create  for  them  a  “voice.”  They
also  can  be  playful  and  self-reflective  interfaces  that  suggest  new
forms  of  human  identity  as  we  interact  face  to  face  with  ani¬ 
mated  personas  representing  our  own  kind.  In  all  of  these  ways 
novel  multimodal  interfaces,  as  primitive  as  their  early  bimodal 
instantiations  may  be,  represent  a  new  multidisciplinary  science, 
a  new  art  form,  and  a  socio-political  statement  about  our  col¬ 
lective  desire  to  humanize  the  technology  we  create. 


INTRODUCTION


This  chapter  covers  a  broad  range  of  interactive  systems.  They 
all  have  one  idea  in  common:  It  can  be  worthwhile  for  a  system 
to  learn  something  about  each  individual  user  and  adapt  its  be¬ 
havior  to  them  in  some  nontrivial  way. 

An  example  familiar  to  most  readers  is  shown  in  Fig.  22.1.  A 
visitor  to  Amazon.com  has  just  explicitly  requested  recommen¬ 
dations,  without  having  specified  a  particular  type  of  product. 
During  the  user’s  past  visits,  Amazon.com  has  learned  some¬ 
thing  about  his  or  her  interests,  on  the  basis  of  items  he  or  she 
has  purchased  and  ratings  he  or  she  has  made.  Therefore,  the 
system  can  make  recommendations  that  are  especially  likely  to 
appeal  to  this  particular  user. 


Concepts 

The  key  idea  embodied  in  Amazon.com’s  recommendations  and
the  other  systems  discussed  in  this  chapter  is  that  of  adaptation 
to  the  individual  user.  Depending  on  their  functions  and  forms, 
systems  that  adapt  to  their  users  have  been  given  labels  ranging 
from  adaptive  interfaces  through  user  modeling  systems  to  soft¬ 
ware  agents  or  intelligent  agents.  Starting  in  the  late  1990s,  the 
broader  term  personalization  became  popular,  especially  in 
connection  with  commercially  deployed  systems.  In  order  to  discuss
the  common  issues  that  all  of  these  systems  raise,  we  will
refer  to  them  with  the  term  user-adaptive  systems,  which  de¬ 
scribes  their  common  properties  explicitly.  Fig.  22.2  introduces 
some  concepts  that  can  be  applied  to  any  user-adaptive  system;
Fig.  22.3  shows  the  form  that  they  take  in  Amazon.com’s
recommendations. 

A  user-adaptive  system  makes  use  of  some  type  of  informa¬ 
tion  about  the  current  individual  user,  such  as  the  products  that 
the  user  has  bought.  In  the  process  of  user  model  acquisition, 
the  system  performs  some  type  of  learning  and  inference  on  the 
basis  of  the  information  about  the  user  in  order  to  arrive  at 
some  sort  of  user  model,  which  in  general  concerns  only  limited 
aspects  of  the  user  (such  as  his  or  her  interest  in  particular  types 
of  product).  In  the  process  of  user  model  application,  the  sys¬ 
tem  applies  the  user  model  to  the  relevant  features  of  the  cur¬ 
rent  situation  in  order  to  determine  how  to  adapt  its  behavior  to 
the  user. 

A  user-adaptive  system  can  be  defined  as  an  interactive  sys¬ 
tem  that  adapts  its  behavior  to  individual  users  on  the  basis  of 
processes  of  user  model  acquisition  and  application  that  involve 
some  form  of  learning,  inference,  or  decision  making. 
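
The two processes in this definition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how Amazon.com or any other deployed system works: the scoring rule, the category labels, and the function names `acquire_user_model` and `apply_user_model` are all hypothetical.

```python
from collections import Counter


def acquire_user_model(purchases, ratings):
    """User model acquisition: infer interest per category from evidence.

    purchases: list of category names, one per purchased item.
    ratings:   list of (category, stars) pairs, stars in 1..5.
    Returns a Counter mapping category -> interest score.
    """
    interest = Counter()
    for category in purchases:
        interest[category] += 1.0          # a purchase counts as mild interest
    for category, stars in ratings:
        interest[category] += stars - 3.0  # above 3 stars raises, below lowers
    return interest


def apply_user_model(interest, candidates, limit=3):
    """User model application: rank the items available in the current situation.

    candidates: list of (title, category) pairs.
    """
    ranked = sorted(candidates, key=lambda item: interest[item[1]], reverse=True)
    return [title for title, _ in ranked[:limit]]


model = acquire_user_model(["thriller", "thriller"], [("thriller", 5), ("cookbook", 1)])
suggestions = apply_user_model(
    model,
    [("Cookbook A", "cookbook"), ("Thriller B", "thriller"), ("Atlas C", "travel")],
    limit=2,
)
```

The user model here covers only one limited aspect of the user (interest per category), which matches the point made above that user models are generally partial.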

This  definition  distinguishes  user-adaptive  systems  from 
adaptable  systems:  ones  that  the  individual  user  can  explicitly 
tailor  to  his  or  her  own  preferences  (for  example,  by  choosing 
options  that  determine  the  appearance  of  the  user  interface). 
The  relationship  between  adaptivity  and  adaptability  will  be  dis¬ 
cussed  at  several  places  in  this  chapter. 


[Screen  content  of  Figure  22.1:  a  list  headed  “These  recommendations  are  based  on  items  you  own  and  more,”  with  three  entries,
each  showing  an  average  customer  review,  a  price,  “I  Own  It”  /  “Not  interested”  /  “Rate  it”  controls,  and  an  explanation:
(1)  No  Comebacks  by  Frederick  Forsyth,  “Recommended  because  you  rated  The  Deceiver  and  more”;
(2)  The  Phantom  of  Paris  by  Gwenith  M.  Vehlow,  “Recommended  because  you  said  you  owned  The  Phantom  of  Manhattan”;
(3)  Jackal  (DVD,  Edward  Fox),  “Recommended  because  you  rated  The  Day  of  the  Jackal  and  more.”]

FIGURE  22.1.  Part  of  a  screen  showing  a  list  of  recommendations
generated  on  request  by  Amazon.com.  (Screen  shot  made
from  http://amazon.com/  in  December  2005.)


FIGURE  22.2.  General  schema  for  the  processing  in  a  user- 
adaptive  system.  (Dotted  arrows:  use  of  information;  solid 
arrows:  production  of  results.) 


FIGURE  22.3.  Overview  of  adaptation  in  Amazon.com. 
The  next  two  sections  of  this  chapter  address  the  following
question:  What  can  user-adaptivity  be  good  for?  They  examine  in 
turn  10  different  functions  that  can  be  served  by  user-adaptivity, 
giving  examples  ranging  from  familiar  commercially  deployed  sys¬ 
tems  to  research  prototypes.  The  following  section  discusses 
some  usability  challenges  that  are  especially  important  in  con¬ 
nection  with  user-adaptive  systems  and  that  have  stimulated  most 
of  the  controversy  that  has  surrounded  these  systems.  The  next 
section  considers  a  key  design  decision:  What  types  of  informa¬ 
tion  about  each  user  should  be  collected?  The  final  major  sec¬ 
tion  looks  at  several  distinctive  aspects  of  the  empirical  study  of 
user-adaptive  systems.  The  chapter  concludes  with  comments  on 
the  reasons  why  their  importance  is  likely  to  continue  to  grow.¹
FIGURE  22.4.  Partial  screen  shot  from  the  intelligent  e-mail
sorting  system  i-ems.  (Screen  shot  courtesy  of  Eric  McCreath.) 


FUNCTIONS:  SUPPORTING  SYSTEM  USE 


Some  of  the  ways  in  which  user-adaptivity  can  be  helpful  involve 
support  for  a  user’s  efforts  to  operate  a  system  successfully  and 
effectively.  This  section  considers  five  types  of  support. 


Taking  Over  Parts  of  Routine  Tasks 

The  first  function  of  adaptation  involves  taking  over  some  of  the 
work  that  the  user  would  normally  have  to  perform  himself  or 
herself — routine  tasks  that  may  place  heavy  demands  on  a 
user’s  time,  though  typically  not  on  his  or  her  intelligence  and 
knowledge.  Maybe  the  most  obvious  task  of  this  sort  is  organizing
e-mail,  which  takes  up  a  significant  proportion  of  the  time  of
many  office  workers.  The  classic  early  work  of  Maes’s  (1994) 
group  on  “agents  that  reduce  work  and  information  overload” 
addressed  this  task. 

A  more  recent  effort  (Fig.  22.4)  is  found  in  the  prototype 
“intelligent  electronic  mail  sorter”  (i-ems;  McCreath,  Kay,  &
Crawford,  2006;  see  also  Crawford,  Kay,  &  McCreath,  2002; 
McCreath  &  Kay,  2003),  which  is  designed  to  expedite  the  te¬ 
dious  task  of  filing  incoming  e-mail  messages  into  folders.  By 
observing  and  analyzing  the  way  an  individual  user  files  mes¬ 
sages,  the  system  learns  to  predict  the  most  likely  folder  for  any 
new  message.  In  the  overview  of  messages  in  the  user’s  inbox 
(shown  in  the  top  part  of  the  screen  shot),  i-ems  tentatively 
sorts  the  new  messages  into  categories  that  correspond  to  the 
most  likely  folders.  When  an  individual  message  is  being  dis¬ 
played,  the  one-line  field  in  the  middle  of  the  screen  shows  an 
explanation  of  the  folder  prediction.  If  the  user  agrees  with  the 
prediction,  he  or  she  can  click  on  the  “Archive”  button  at  the  top
of  the  screen  to  have  the  message  moved  into  the  predicted 
folder.  To  file  it  away  in  another  folder,  he  or  she  drags  it  to  the 
icon  for  the  folder  in  the  left-hand  panel,  just  as  he  or  she  would 
with  a  system  that  did  not  make  any  predictions. 
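
The idea of learning folder predictions from observed filing behavior, together with a one-line explanation of each prediction, can be sketched as follows. This is a deliberately simple illustration that uses only the sender as evidence; it is an assumption made for the example and not one of the learning methods that i-ems actually provides. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class SenderFolderPredictor:
    """Toy learner: predict a folder from how mail from the same sender was filed."""

    def __init__(self):
        self._counts = defaultdict(Counter)  # sender -> Counter of folders

    def observe(self, sender, folder):
        """Record that the user filed a message from `sender` into `folder`."""
        self._counts[sender][folder] += 1

    def predict(self, sender):
        """Return (folder, explanation), or (None, explanation) without evidence."""
        if sender not in self._counts:
            return None, f"no messages from {sender} have been filed yet"
        folder, hits = self._counts[sender].most_common(1)[0]
        total = sum(self._counts[sender].values())
        return folder, f"{hits} of {total} messages from {sender} were filed in '{folder}'"


predictor = SenderFolderPredictor()
predictor.observe("a@example.org", "Projects")
predictor.observe("a@example.org", "Projects")
predictor.observe("a@example.org", "Admin")
folder, explanation = predictor.predict("a@example.org")
```

Because a wrong prediction only pre-selects a folder and the user still confirms or overrides it, an error of this kind costs the user little, which is the design property emphasized in the text.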


FIGURE  22.5.  Overview  of  adaptation  in  i-ems. 


One  reason  why  research  on  systems  like  i-ems  has  continued 
for  so  long  is  that  the  problem  raises  a  number  of  challenges. 
For  example,  since  different  users  apply  radically  different  prin¬ 
ciples  for  creating  categories  of  e-mail,  i-ems  supplies  several  dif¬ 
ferent  methods  for  learning  a  user’s  implicit  rules,  each  of  which 
may  show  a  different  degree  of  success  with  different  users.  It 
also  allows  the  learned  rules  to  operate  alongside  any  hand¬ 
crafted  rules  that  the  user  may  have  defined  so  that  the  strengths 
of  both  types  of  rules  can  be  exploited  (an  evaluation  is  discussed 
below  in  the  empirical  methods  section).  Since  even  the  best 
set  of  learned  rules  will  sometimes  incorrectly  predict  how  the 
user  would  classify  a  message,  the  interface  must  be  designed  in 
such  a  way  that  incorrect  predictions  will  have  minimal  conse¬ 
quences.  New  approaches  to  the  general  problem  continue  to 
appear  (e.g.,  Surendran,  Platt,  &  Renshaw,  2005).  Two  systems 
that  have  been  fairly  widely  deployed  have  been  SwiftFile  (Segal 
&  Kephart,  1999,  2000),  which  was  incorporated  into  Lotus 
Notes,  and  POPFile  (available  in  early  2006  from
http://popfile.sourceforge.net/cgi-bin/wiki.pl),  a  public  domain  program  that
is  used  mainly  for  spam  filtering  but  that  can  also  learn  to  sort
messages  into  a  limited  number  of  user-specific  folders.²


¹The  version  of  this  chapter  in  the  first  edition  of  this  handbook  included  a  section  about  some  of  the  machine  learning  and  artificial  intelligence
techniques  that  are  most  commonly  used  for  user  model  acquisition  and  application.  There  is  no  such  section  in  this  second  edition,  because  (a)  it
seemed  more  important  to  expand  the  material  in  the  other  sections  and  (b)  the  range  of  techniques  used  has  grown  to  the  point  where  a  brief
summary  would  have  limited  value.  Discussions  of  the  relevant  methods  will  be  found  in  many  of  the  works  cited  in  the  chapter.

²For  examples  of  approaches  to  support  for  e-mail  management  that  do  not  involve  adaptation  to  individual  users,  see  Gruen  et  al.  (2004)  and
Balter  and  Sidner  (2002).


436  •  JAMESON 


Another  traditional  task  in  this  category  is  the  scheduling  of 
meetings  and  appointments  (Mitchell,  Caruana,  Freitag,  McDer¬ 
mott,  &  Zabowski,  1994;  Maes,  1994;  Horvitz,  1999;  Gervasio, 
Moffitt,  Pollack,  Taylor,  &  Uribe,  2005).  By  learning  the  user’s 
preferences  for  particular  meeting  types,  locations,  and  times  of 
day,  a  system  can  tentatively  perform  part  of  the  task  of  entering 
appointments  in  the  user’s  calendar. 

The  primary  benefits  of  this  form  of  adaptation  are  saving 
time  and  effort  for  the  user.  The  potential  benefits  are  greatest 
where  the  system  can  perform  the  entire  task  without  input 
from  the  user.  In  most  cases,  however,  the  user  is  kept  in  the 
loop  (as  with  i-ems),  because  the  system’s  ability  to  predict  what 
the  user  would  want  done  is  limited  (cf.  the  section  on  usability
challenges  below).

Adapting  the  Interface 

A  different  way  of  helping  a  person  to  use  a  system  more  ef¬ 
fectively  is  to  adapt  the  user  interface  so  that  it  fits  better  with 
the  user’s  way  of  working  with  the  system.  Interface  elements 
that  have  been  adapted  in  this  way  include  menus,  icons,  and 
the  system’s  processing  of  signals  from  input  devices  such  as 
keyboards. 

An  example  familiar  to  most  readers  is  provided  by  the  Smart 
Menus  feature  that  has  been  found  in  Microsoft  operating  systems
since  Windows  2000.  Fig.  22.6  illustrates  the  basic  mechanism:
An  infrequently  used  menu  option  is  initially  hidden  from
view;  it  appears  in  the  main  part  of  a  menu  only  after  the  user 
has  selected  it  for  the  first  time.  (It  will  be  removed  later  if  the 
user  does  not  select  it  often  enough.)  The  idea  is  that,  in  the 
long  run,  the  menus  should  contain  just  the  items  that  the  user 
accesses  frequently  (at  least  recently),  so  that  the  user  needs  to 
spend  less  time  searching  within  menus. 
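
The promote-on-use and remove-when-unused behavior described above can be sketched as a small class. This is only an illustration under stated assumptions: the actual rules and thresholds used by Smart Menus are not given in the text, so the `max_idle` parameter and the aging rule are hypothetical.

```python
class AdaptiveMenu:
    """Toy model of a menu with a short main section and a hidden extension."""

    def __init__(self, items, visible, max_idle=3):
        self.items = list(items)        # all options, in their fixed order
        self.visible = set(visible)     # options shown in the main section
        self.max_idle = max_idle        # selections tolerated without use
        self._idle = {item: 0 for item in self.visible}

    def main_section(self):
        """Return the main-section options, in the menu's fixed order."""
        return [item for item in self.items if item in self.visible]

    def select(self, item):
        """The user picks `item`, possibly from the extension: promote it, age the rest."""
        if item not in self.items:
            raise ValueError(f"unknown menu item: {item}")
        for other in list(self.visible):
            if other != item:
                self._idle[other] += 1
                if self._idle[other] > self.max_idle:
                    self.visible.discard(other)   # demote a rarely used option
                    del self._idle[other]
        self.visible.add(item)
        self._idle[item] = 0


menu = AdaptiveMenu(["Break", "Field", "Picture", "Symbol"],
                    visible=["Break", "Picture"], max_idle=2)
menu.select("Field")                 # chosen from the extension, now promoted
after_first_use = menu.main_section()
```

Keeping the items in a fixed order means that promotion and demotion change which options are shown but not their relative positions.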

FIGURE  22.6.  Example  of  adaptation  in  Smart  Menus.  (The  user
accesses  the  “Insert”  menu.  Not  finding  the  desired  option,  the
user  clicks  on  the  extension  arrows  and  selects  the  “Field”
option.  When  the  user  later  accesses  the  same  menu,  “Field”  now
appears  in  the  main  section.)


FIGURE  22.7.  Overview  of  adaptation  in  Smart  Menus.


Some  informative  studies  related  to  Smart  Menus  have  been
conducted  by  McGrenere,  Baecker,  and  Booth  (2002).  In  a  field
study  with  experienced  users  of  Microsoft  Word  2000,  McGrenere
et  al.  compared  the  Smart  Menus  of  Word  2000  with  (a)  traditional
static  menus  and  (b)  an  alternative  approach  to  reducing
the  number  of  functions  that  confront  users.  Their  variant
MSWord  Personal  is  an  adaptable  system:  It  provides  a  reasonably
intuitive  and  convenient  way  for  users  to  add  and  remove
menu  functions.  After  working  with  MSWord  Personal  for  several
weeks,  most  of  the  users  in  the  study  preferred  this  adaptable
system  to  the  normal  Word  2000  with  Smart  Menus.  The
users  who  had  been  classified  as  feature-shy  appeared  to  benefit
most,  but  as  is  typical  in  studies  like  this  (as  will  be  discussed),
quite  a  variety  of  attitudes  about  the  relative  merits  of
the  three  approaches  to  adapting  menu  content  were  shown
by  the  subjects.  As  the  authors  pointed  out,  it  seems  worthwhile
to  consider  design  solutions  that  combine  some  degree  of  adaptivity
and  adaptability.  For  example,  instead  of  automatically
adapting  the  menus,  the  system  might  recommend  possible
adaptations  on  the  basis  of  its  analysis  of  the  user’s  menu  selections
(cf.  Bunt,  Conati,  &  McGrenere,  2004).

A  more  direct  experimental  comparison  by  Findlater  and 
McGrenere  (2004)  involving  adaptive  menus  like  Smart  Menus 
is  discussed  in  the  section  on  empirical  methods  below. 

One  promising  application  of  both  adaptable  and  adaptive 
methods  involves  taking  into  account  special  perceptual  or 
physical  impairments  of  individual  users  to  allow  them  to  use  a 
system  more  efficiently,  with  minimal  errors  and  frustration.  A 
system  in  which  the  two  approaches  are  combined  is  the  Web
Adaptation  Technology  of  IBM  Research  (Hanson  &  Crayne,
2005),  which  aims  to  facilitate  web  browsing  by  older  adults.
With  regard  to  most  of  the  adaptations,  such  as  the  reformatting
of  multicolumn  text  in  a  single  column,  the  system  is  adaptable:
It  provides  convenient  ways  for  the  user  to  request  the  changes.
(It  would  in  fact  be  difficult  for  a  system  to  determine  automatically
whether  a  given  user  would  benefit  from  one-column  formatting.)
Several  changes  in  the  keyboard  settings  are  achieved
via  automatic  adaptation  (for  a  more  detailed  discussion,  see
Trewin,  2004).  For  example,  the  key  repeat  delay  interval  is  a  parameter
that  determines  how  long  a  key  (e.g.,  the  left-arrow
key)  has  to  be  held  down  before  the  system  starts  repeating  the 
associated  action  (e.g.,  moving  the  cursor  to  the  left).  Some 
users  require  a  relatively  long  key  repeat  delay  because  of  a  ten¬ 
dency  to  hold  keys  down  relatively  long  even  when  they  do  not 
want  repetition.  But  asking  the  user  to  specify  the  key  repeat 
delay  is  not  an  attractive  option:  It  can  be  time-consuming  to 
explain  what  the  parameter  means;  the  user  him-  or  herself  may
have  no  idea  what  the  best  setting  is  for  him  or  her;  trial  and 
error  with  different  settings  can  be  time-consuming  and  frus¬ 
trating;  and  for  some  users,  the  optimal  setting  can  change  from 
day  to  day.  The  Dynamic  Keyboard  component  of  the  Web  Adap¬ 
tation  Technology  therefore  includes  an  algorithm  that  analyzes 
a  user’s  typing  behavior  to  determine  an  optimal  key  repeat  de¬ 
lay  (as  well  as  other  parameters);  the  system  then  adjusts  the  pa¬ 
rameter  in  a  relatively  conservative  fashion.  Although,  under 
some  circumstances,  automatic  adjustment  of  keyboard  para¬ 
meters  could  make  the  keyboard  unpredictable  and  hard  to  use, 
results  obtained  in  the  context  of  Web  Adaptation  Technology 
(Trewin,  2004)  revealed  no  problems  of  this  sort. 
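
The combination of estimating a suitable key repeat delay from observed typing and then changing the setting conservatively can be sketched as follows. The estimation rule (a high percentile of observed key-hold times plus a margin) and all parameter values are assumptions made for illustration; the text does not specify the algorithm used by the Dynamic Keyboard.

```python
def target_repeat_delay(hold_times_ms, margin_ms=50, floor_ms=250):
    """Estimate a delay longer than nearly all of the user's ordinary key holds."""
    if not hold_times_ms:
        return floor_ms
    ordered = sorted(hold_times_ms)
    index = min(len(ordered) - 1, (95 * len(ordered)) // 100)  # about the 95th percentile
    return max(floor_ms, ordered[index] + margin_ms)


def adjust_conservatively(current_ms, target_ms, step_fraction=0.25):
    """Move only part of the way toward the target so the keyboard changes gradually."""
    return round(current_ms + step_fraction * (target_ms - current_ms))


# Hypothetical hold times (ms) for key presses where no repetition was wanted.
holds = [100, 110, 120, 130, 140, 150, 160, 170, 180, 400]
target = target_repeat_delay(holds)
new_delay = adjust_conservatively(250, target)
```

Taking only a fraction of the step on each update is one way to keep the keyboard predictable while still tracking a user whose best setting changes from day to day.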


Helping  With  System  Use 

Instead  of  suggesting  (or  executing)  changes  to  the  interface 
of  a  given  application,  a  user-adaptive  system  can  adaptively  of¬ 
fer  information  and  advice  about  how  to  use  that  application, 
and  perhaps  also  perform  some  of  the  necessary  actions  itself.
Various  tendencies  make  it  increasingly  difficult  for  users  to  at¬ 
tain  the  desired  degree  of  mastery  of  the  applications  they  use. 
A  good  deal  of  research  into  the  development  of  systems  that 
can  take  the  role  of  a  knowledgeable  helper  was  conducted  in 
the  1980s,  especially  in  connection  with  the  complex  operating 
system  Unix.³  During  the  1990s,  such  work  became  less  frequent,
perhaps  partly  because  of  a  recognition  of  the  fundamental
difficulties  involved.  In  particular,  it  is  often  difficult  to
recognize  a  user’s  goal  when  the  user  is  not  performing  actions 
that  tend  to  lead  toward  that  goal.  The  Office  Assistant,  an  am¬ 
bitious  attempt  at  adaptive  help  introduced  in  Microsoft  Office 
97,  was  given  a  mixed  reception,  partly  because  of  the  inherent 
difficulty  of  its  task  but  especially  because  of  its  widely  perceived 
obtrusiveness  (cf.  the  section  on  usability  challenges  below).

Most  adaptive  help  systems  to  date  have  been  based  on  the 
paradigm  called  keyhole  recognition:  (passively)  observing  the
user  and  attempting  to  make  useful  inferences  about  her  goals
and  tasks.  By  contrast,  Fig.  22.8  shows  an  example  of  an  alternative
approach  to  intelligent  help  that  has  been  developed  by
researchers  at  Mitsubishi  Electric  Research  Laboratory,  which  is
based  on  a  collaborative  dialogue  paradigm  (Rich  et  al.,  2005;
for  a  presentation  of  the  theoretical  and  technical  background,
see  Rich  &  Sidner,  1998;  Rich,  Sidner,  &  Lesh,  2001).  In  this
demonstration  scenario,  DiamondHelp  is  collaborating  with  the
user  of  a  feature-rich  programmable  washer-dryer.⁴  Instead  of
working  independently  on  the  problem,  the  user  conducts  a 
dialogue  with  the  help  system,  the  goal  of  the  dialogue  being 
the  execution  of  the  user’s  task.  The  dialogue  contributions  of 
the  help  system  and  the  user  are  shown  in  the  chat  balloons  at 
the  left-  and  right-hand  sides,  respectively,  of  the  screen.  The 
user’s  possible  dialogue  contributions  are  automatically  generated
from  the  current  collaborative  dialogue  state  and  offered  in
a  menu  inside  of  his  or  her  balloon.  The  user  can  choose  what
he  or  she  wants  to  say  by  either  touching  the  appropriate
phrase  or  saying  it  using  speech  recognition.⁵  For  example,  in
the  top  part  of  the  figure,  the  user  is  offered  a  choice  among
three  possible  top-level  tasks,  the  most  complex  of  which  is
defining  a  new  cycle.  In  a  typical  exchange,  the  user  specifies  a
goal  or  subgoal  that  he  or  she  would  like  to  achieve,  and  the  system
responds  by  giving  instructions  and  perhaps  offering  further
possible  utterances  for  the  user.


FIGURE  22.8.  Example  of  collaborative  assistance  offered  by
DiamondHelp.  (Explanation  in  text.  Screen  shots  courtesy  of
Charles  Rich.)


FIGURE  22.9.  Overview  of  adaptation  in  DiamondHelp.


³A  collection  of  papers  from  this  period  appeared  in  a  volume  edited  by  Hegner,  McKevitt,  Norvig,  and  Wilensky  (2001).

⁴The  interface  shown  in  the  figures  may  be  displayed  on  the  washer-dryer  itself  or  remotely  accessed  via  a  home  network.

⁵DiamondHelp  does  not  support  unrestricted  natural  language  or  speech  understanding.  In  a  Wizard  of  Oz  study  (a  type  of  study  that  will  be
discussed  in  the  section  on  empirical  methods)  involving  a  prototype  help  system  of  this  general  sort,  DeKoven  (2004)  found  that  users  who  were
able  to  employ  unrestricted  speech  would  have  preferred  to  have  more  guidance  about  what  they  could  say  to  the  system.




This  dialogue  in  some  ways  resembles  the  interaction  with 
the  more  familiar  type  of  wizard  that  is  often  employed  for  com¬ 
plex,  unfamiliar  tasks  such  as  software  installation.  The  main  dif¬ 
ference  is  that  the  dialogues  with  DiamondHelp  can  be  more 
flexible,  because  the  system  has  explicit  models  of  the  tasks  that 
the  user  can  perform  and  is  capable  of  making  use  of  these 
models  in  various  ways  during  the  dialogue.  For  example,  after 
pressing  the  Fabric  Load  picture,  the  user  can  continue  manip¬ 
ulating  the  GUI  in  the  lower  half  of  the  screen  by  himself  or 
herself  until  he  or  she  requests  guidance  again  (e.g.,  by  asking 
“What  next?”).  Because  the  user’s  actions  with  the  interface  are 
reported  to  the  help  system,  the  help  system  can  keep  track  of 
how  far  the  user  has  progressed  in  the  performance  of  his  or 
her  task.  In  other  words,  the  help  system  incorporates  a  re¬ 
stricted  form  of  the  sort  of  goal  and  plan  recognition  that  fea¬ 
tured  prominently  in  earlier  intelligent  help  systems.  In  Dia¬ 
mondHelp,  recognition  of  the  user’s  actions  is  relatively  likely  to 
be  accurate,  because  of  the  information  that  the  user  has  sup¬ 
plied  about  his  or  her  goals  (e.g.,  Lesh,  Rich,  &  Sidner,  1999). 
Depending  on  the  experience  and  the  preferences  of  the  user, 
therefore,  the  user  can  rely  on  the  help  system  to  various  de¬ 
grees,  ranging  from  ignoring  it,  occasionally  asking  for  a  hint, 
or  being  led  step  by  step  through  the  entire  task. 
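
The way an explicit task model lets a help system follow the user's progress and answer a request such as “What next?” can be sketched as follows. This is a simplified illustration, not DiamondHelp's implementation: tasks are assumed to be flat, ordered lists of steps, and the task names, step names, and the `TaskGuide` class are hypothetical.

```python
class TaskGuide:
    """Toy help system that tracks progress through a known sequence of steps."""

    def __init__(self, task_models):
        self.task_models = task_models   # task name -> ordered list of steps
        self.task = None
        self.done = []

    def choices(self):
        """Utterances offered to the user in the current dialogue state."""
        if self.task is None:
            return sorted(self.task_models)
        return ["What next?"]

    def choose_task(self, task):
        """The user states a goal explicitly, so the system need not guess it."""
        self.task = task
        self.done = []

    def report_action(self, step):
        """The GUI reports a user action; record it if it belongs to the task."""
        if self.task and step in self.task_models[self.task] and step not in self.done:
            self.done.append(step)

    def what_next(self):
        """Return the first step of the chosen task that is not yet done."""
        if self.task is None:
            return None
        for step in self.task_models[self.task]:
            if step not in self.done:
                return step
        return None


guide = TaskGuide({
    "Define a new cycle": ["choose fabric load", "set temperature", "set spin speed"],
    "Run a cycle": ["load laundry", "press start"],
})
guide.choose_task("Define a new cycle")
guide.report_action("choose fabric load")   # done by the user in the GUI, unprompted
guide.report_action("set temperature")
hint = guide.what_next()
```

Because the goal was stated by the user rather than inferred, the remaining recognition problem is reduced to matching reported actions against the steps of one known task.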


Mediating  Interaction  With  the  Real  World

Whereas  an  intelligent  help  system  aids  the  user  as  he  or  she 
uses  a  complex  interactive  system,  some  recently  developed  systems
help  the  user  to  cope  with  the  world  itself.  They  do  so  by
acquiring  and  processing  evidence  concerning  the  user’s  cog¬ 
nitive  and  emotional  state  and  taking  actions  designed  to  miti¬ 
gate  any  conflict  between  this  state  and  the  demands  of  the 
environment. 

One  common  function  of  systems  in  this  category  is  to  pro¬ 
tect  people  from  the  flood  of  incoming  messages  (via  cell 
phone,  instant  messaging,  e-mail,  and  other  channels)  whose 
number  and  diversity  are  increasing  with  advances  in  commu¬ 
nication  technology.  When  a  potential  recipient  is  focusing  on 
some  particular  task  or  activity,  an  adaptive  assistant  causes  mes¬ 
sages  to  be  discouraged,  delayed,  or  otherwise  buffered  until 
some  more  appropriate  time.  One  strategy  is  to  provide  to  the 
potential  initiators  of  communication  information  about  the 
state  of  the  recipient.  The  experimental  prototype  Lilsys  (cf.  Fig.
22.10)  illustrates  this  strategy.  The  system  continuously  updates 
a  user  model  that  contains  assessments  of  its  user’s  availability 
for  communication.  The  assessments  are  based  on  a  number  of 
cues  that  have  been  found  in  previous  research  to  be  useful  pre¬ 
dictors  of  a  person’s  physical  presence  and  availability:  whether 
the  user  (or  someone  else  in  the  room)  is  moving  or  speaking; 
whether  the  door  is  open;  whether  the  user  is  using  the  phone 
or  the  computer  keyboard  and  mouse;  and  what  events  are 
scheduled  in  the  user’s  calendar.  A  handcrafted  model  uses  this 
information to arrive at a global assessment of the user’s
availability; this assessment is in turn displayed to potential
communicators.⁶ A field study with a small number of users
indicated that other persons do in fact adapt their behaviors to take
into account a Lilsys user’s availability, often by changing the
nature of their communications rather than by postponing them.
Lilsys users appreciated the possibility of having their availability
sensed automatically, as opposed to having to specify it
explicitly themselves. For example, they virtually never
used the timer switch (visible in Fig. 22.10) that allowed them to
specify that they were going to be unavailable for a particular
period. More generally, the modeling of a user’s changing cognitive
or emotional state appears to be a task for which automatic
adaptation is an especially promising approach, simply because
people are typically not willing or able to update an explicit
self-assessment continually.

FIGURE 22.10. Above: Lilsys’s sensor and data acquisition
module; below: Lilsys’s data flow and a screen shot of the user
interface. (Adapted from Figures 1 and 2 of “Lilsys: Sensing
unavailability,” by J. Begole, N. E. Matsakis, & J. C. Tang, 2004, in
J. Herbsleb & G. Olson (Eds.), Proceedings of the 2004 Conference on
Computer-Supported Cooperative Work, pp. 511-514, New York: ACM.
Copyright 2004 by ACM. Adapted with permission.)

FIGURE 22.11. Overview of adaptation in Lilsys.

⁶In many systems in this category, such as the ones mentioned in the following paragraph, the model for the interpretation of evidence is acquired
via machine learning methods on the basis of relevant training data.
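
To make the idea of a handcrafted availability model concrete, the
following sketch combines cues of the kind listed above into a single
assessment. It is only an illustration: the field names, weights,
thresholds, and output labels are assumptions made for this example
and are not taken from Lilsys.

```python
from dataclasses import dataclass


@dataclass
class Cues:
    """Sensor readings of the kind described in the text (names are illustrative)."""
    motion: bool             # someone is moving in the room
    speech: bool             # someone is speaking in the room
    door_open: bool
    phone_in_use: bool
    keyboard_active: bool    # keyboard or mouse activity
    meeting_scheduled: bool  # an event is scheduled in the calendar


def assess_availability(cues: Cues) -> str:
    """Combine cues with hand-set rules (hypothetical weights and thresholds)."""
    present = (cues.motion or cues.speech
               or cues.keyboard_active or cues.phone_in_use)
    if not present:
        return "absent"  # no sign of physical presence

    score = 0  # a higher score means the user seems less available
    score += 2 if cues.phone_in_use else 0
    score += 2 if cues.speech else 0
    score += 1 if not cues.door_open else 0
    score += 1 if cues.meeting_scheduled else 0

    if score >= 3:
        return "unavailable"
    if score >= 1:
        return "possibly unavailable"
    return "available"
```

A learned model, as mentioned in the footnote, would replace the
hand-set weights with parameters estimated from training data, while
the interface to the rest of the system could stay the same.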

In  some  other  availability  management  systems,  decisions 
about  when  and  how  to  present  messages  are  made  by  the  sys¬ 
tem  itself  on  the  basis  of  the  user  model  (e.g.,  Horvitz,  Koch, 
Sarin,  Apacible,  &  Subramani,  2005).  A  good  deal  of  research  has 
examined  effective  cues  for  the  recognition  of  availability  and 
interruptibility (e.g., Fogarty et al., 2005; Ho & Intille, 2005;
Iqbal,  Adamczyk,  Zheng,  &  Bailey,  2005). 

A  related  line  of  research  has  focused  on  the  recognition  of 
the  mental  states  of  drivers,  which  is  especially  important  be¬ 
cause  of  safety  issues.  The  modeling  of  interruptibility  is  impor¬ 
tant  here  as  well  (e.g.,  Schneider  &  Kiesler,  2005).  But  even 
when  no  other  persons  are  involved,  there  are  reasons  to  try  to 
recognize  safety-relevant  states,  like  drowsiness  and  stress,  so 
that  the  system  can  intervene,  for  example,  by  waking  the  driver 
up  or  by  playing  soothing  music.  Stress  and  emotion  are  mani¬ 
fested  in  physiological  indicators  (e.g.,  Healey  &  Picard,  2000; 
Lisetti  &  Nasoz,  2004)  and  in  speech  (e.g.,  Fernandez  &  Picard, 
2000;  Jones  &  Jonsson,  2005).  Products  along  these  lines  have 
begun  to  appear  in  cars,  beginning  with  relatively  simple  detec¬ 
tion  methods  such  as  the  recognition  of  long  or  frequent  eye 
closures.  An  example  of  a  more  complex  and  comprehensive 
approach  to  the  modeling  of  drivers’  affective  state  is  found  in 
the work of Li and Ji (2005).
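
The simple detection method mentioned above, the recognition of long
or frequent eye closures, can be sketched as follows. The input format
(one boolean per video frame), the frame rate, and both thresholds are
assumptions chosen for illustration; they do not describe any
particular product.

```python
def drowsiness_alert(eye_closed, fps=10.0,
                     max_closure_s=1.0, max_closures_per_min=30.0):
    """Return True if any eye closure is long or closures are frequent.

    eye_closed: list of booleans, one per video frame (True = eyes closed).
    The window should be long enough for a per-minute rate to be meaningful.
    """
    if not eye_closed:
        return False

    closures = []  # length in frames of each run of closed-eye frames
    run = 0
    for closed in eye_closed:
        if closed:
            run += 1
        elif run:
            closures.append(run)
            run = 0
    if run:  # the window ended while the eyes were still closed
        closures.append(run)

    longest_s = max(closures, default=0) / fps
    minutes = len(eye_closed) / fps / 60.0
    closures_per_min = len(closures) / minutes
    return longest_s > max_closure_s or closures_per_min > max_closures_per_min
```

Because such a detector is error-prone, a deployed system would also
have to decide how to act on an alert, which is the issue taken up in
the next paragraph.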

A  general  problem  with  adaptation  for  the  purpose  of  safety 
is  that  the  user  may  come  to  rely  on  the  adaptation,  reducing  his 
or her own attention to safety.⁷ For example, a driver may make
less  effort  to  avoid  distraction  or  to  remain  alert,  expecting  that 
the  assistant  will  recognize  any  dangerous  situation  and  warn 
him  or  her  in  time.  Especially  since  the  recognition  of  a  person’s 
mental  states  is  almost  always  error-prone,  this  tendency  can 
eliminate  the  potential  safety  benefits  of  monitoring  systems  un¬ 
less  appropriate  measures  are  taken  (e.g.,  making  warning 
sounds  so  unpleasant  that  the  driver  will  want  to  avoid  relying 
on  them  more  than  is  necessary). 

Controlling  a  Dialogue 

Much  of  the  early  research  on  user-adaptive  systems  concerned 
systems  that  conducted  natural  language  dialogues  with  their 
users (see, e.g., Kobsa & Wahlster, 1989). During the 1990s,
attention shifted to interaction modalities that were more widely
available and that made it possible in many cases to implement
adaptation straightforwardly. Toward the year 2000, advances in
the technology of natural language and speech processing led
to a recent reawakening of interest in user-adaptive dialogue
systems (see, e.g., Haller, McRoy & Kobsa, 1999; Zukerman &
Litman, 2001; Litman & Pan, 2002).

Natural  language  dialogue  has  served  as  an  interaction  modal¬ 
ity  in  connection  with  most  of  the  functions  of  user-adaptivity 
discussed  in  this  and  the  following  sections,  such  as  the  provi¬ 
sion  of  help  and  the  recommendation  of  products.  One  type  of 
adaptivity  is  largely  characteristic  of  natural  language  dialogue: 
adaptation  of  the  system’s  dialogue  strategy,  which  is  a  policy  for 
determining  when  and  how  the  system  should  provide  infor¬ 
mation,  acquire  information  from  the  user,  and  perform  other 
dialogue  acts. 

Adaptation  is  especially  important  in  spoken  dialogue  sys¬ 
tems,  such  as  those  that  offer  information  about  train  departures 
or  flight  arrivals  via  the  phone.  Novice  users  may  require  exten¬ 
sive  explanations  and  frequent  confirmations,  but  these  ele¬ 
ments  can  be  unnecessarily  time  consuming  and  frustrating  to 
experienced  users.  Many  deployed  systems  apply  simple  adap¬ 
tation  principles  that  distinguish  between  new  and  more  expe¬ 
rienced  users.  For  example,  if  a  phone-based  mail-order  system 
knows  that  the  current  caller  has  previously  ordered  a  product, 
it  may  adopt  a  dialogue  style  that  presupposes  familiarity  with 
the  system.  Since  it  is  not  always  this  easy  to  classify  the  user,  and 
since  asking  the  user  for  a  self-assessment  can  be  awkward  and 
time  consuming,  researchers  have  looked  for  ways  of  adapting  to 
the user on the basis of his or her behavior during the current
dialogue (e.g., Litman & Pan, 2002). A recent example appears in
Fig. 22.12, a translated excerpt from a dialogue
conducted with the Kyoto City Bus Information System (Komatani,
Ueno,  Kawahara,  &  Okuno,  2005).  The  system  adjusts  its  assess¬ 
ments  of  three  properties  of  the  caller  on  the  basis  of  each  ut¬ 
terance  of  the  caller:  level  of  skill  at  conducting  dialogues  with 
this  system;  level  of  knowledge  concerning  the  domain  (e.g., 
Kyoto  and  its  buses);  and  desire  to  complete  the  dialogue 
quickly  (“urgency”).  The  evidence  used  is  a  set  of  properties 
such  as  the  amount  of  information  specified  in  each  utterance; 
whether  the  user  refers  to  bus  stops  by  their  exact  names;  and 
whether  the  user  barges  in,  interrupting  the  system  before  the 
system  has  completed  an  utterance.  The  dialogue  excerpt  in¬ 
cludes  several  examples  that  illustrate  how  the  system’s  dialogue 
behavior  is  adapted  on  the  basis  of  the  user  model.  In  a  con¬ 
trolled  experiment,  each  of  20  subjects  conducted  dialogues 
with  the  system  either  with  the  adaptive  version  or  with  a  version 
that  assumed  that  all  callers  had  the  same  properties.  Dialogues 
were completed more quickly with the adaptive variant, and
subjects expressed greater satisfaction.⁸
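
The interplay between the three assessed properties and the system’s
dialogue behavior can be illustrated with a small sketch. The update
rules below are hand-written stand-ins invented for this example; the
text does not specify how the actual system derives its assessments
from the evidence. The prompt wording follows the excerpt in
Fig. 22.12.

```python
from dataclasses import dataclass


@dataclass
class CallerModel:
    skill: str = "low"      # skill at conducting dialogues with the system
    knowledge: str = "low"  # knowledge of the domain (e.g., Kyoto and its buses)
    urgency: str = "low"    # desire to complete the dialogue quickly


def update_model(model, slots_filled, used_exact_stop_name, barged_in):
    """Revise the assessments after one caller utterance (toy rules, assumed)."""
    if slots_filled >= 2 or barged_in:
        model.skill = "high"
    if used_exact_stop_name:
        model.knowledge = "high"
    if barged_in:
        model.urgency = "high"
    return model


def confirm_boarding_stop(model, stop):
    """Build a confirmation turn, adding guidance only where the model calls for it."""
    parts = []
    if model.skill == "low":
        parts.append("Let me confirm these in turn.")  # explanation
    if model.urgency == "low":
        parts.append(f"Will you get on at {stop}?")    # content is confirmed
        if model.skill == "low":
            parts.append("Reply with yes or no.")      # instruction
    return " ".join(parts)
```

For a caller assessed as skilled and in a hurry, the function returns
an empty string, which corresponds to skipping the confirmation turn
altogether.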

Another  popular  idea  about  how  to  adapt  dialogues  to  the 
user  concerns  the  recognition  of  negative  emotions  like  anger 
and  frustration,  the  goal  being  to  transfer  callers  who  express 
these  emotions  to  human  call  agents  before  they  are  lost  as 
customers.  Although  there  has  been  a  lot  of  research  on  the 
recognition  of  mental  states  on  the  basis  of  speech  in  dialogues 


⁷A brief overview of the theory of risk homeostasis is given by Wilde (1998), while a recent alternative perspective is offered by Fuller (2005).

⁸See the section on empirical methods for a discussion of the difficulties involved in the interpretation of the results of experiments like this one.
Although the Kyoto City Bus Information System is accessible to the public, the adaptive features described here are normally turned off, because they
lead to slower processing given the currently available infrastructure.

case: [skill level: low] [knowledge level: low] [urgency: low]

Sys:  What is your current bus stop, your destination, or
      specific bus route number?
User: Maruyama Park.
Sys:  Let me confirm these in turn.
      (← Explanation is added by [skill level: low].)
      Will you get on at Maruyama Park?
      (← Content is confirmed by [urgency: low].)
      Reply with yes or no.
      (← Instruction is added by [skill level: low].)
User: Yes.
Sys:  Where will you get off the bus?
      Specify by bus stop name or a well-known landmark.
      (← Explanation is added by [skill level: low].)
User: Tofukuji Temple.
Sys:  Will you get off the bus at Tofukuji Temple?
      Reply with yes or no.
User: Yes.
Sys:  Following buses are located.
      (← Explanation is added by [skill level: low].)
      Nearest bus stop to Maruyama Park is Gion.
      (← Domain information is added by [knowledge level: low].)
      Bus number 202 bound for Nishioji-Kujo has departed
      Higashiyama-Sanjo, which is two stops away.

FIGURE 22.12. Excerpt from a dialog with the Kyoto City Bus
Information System. (Figure 2 of “User modeling in spoken
dialogue systems to generate flexible guidance,” by K. Komatani,
S. Ueno, T. Kawahara, & H. G. Okuno, 2005, User Modeling and
User-Adapted Interaction, 15, pp. 169-183. Copyright 2005 by Springer
Verlag. Reproduced with permission.)


FIGURE  22.13.  Overview  of  adaptation  in  the  Kyoto  City  Bus 
Information  System. 


(e.g., Yacoub, Simske, Lin, & Burns, 2003; Liscombe, Riccardi,
& Hakkani-Tür, 2005), it remains to be seen how widespread
this  particular  application  will  become.  One  possible  drawback 
is  that  with  some  systems  callers  might  find  it  worthwhile  to 
adapt  to  the  adaptation  (as  with  safety-relevant  adaptations), 
feigning  emotion  in  order  to  get  quicker  attention. 

Other  dialogue  adaptations  that  are  being  explored  concern 
stable  personal  characteristics  like  gender  and  age.  Since  it  is 
possible  to  recognize  these  characteristics  reasonably  well  on 
the basis of speech, a system might adopt a voice or dialogue
style that designers thought appropriate for the age and
gender in question (e.g., Müller, Wittig, & Baus, 2003).


FUNCTIONS: 

SUPPORTING INFORMATION ACQUISITION


We  are  constantly  hearing  that  information  overload  is  a  typical 
problem  of  our  age,  especially  because  of  the  explosive  growth 
of  the  Internet  and  in  particular  the  World  Wide  Web.  In  addi¬ 
tion  to  the  vast  number  of  electronic  documents  of  various 
sorts,  users  now  have  access  to  a  vast  number  of  products  avail¬ 
able  for  sale,  people  that  they  can  get  in  touch  with,  and  systems 
that  can  teach  them  about  some  topic.  The  second  major  type 
of  function  of  user-adaptive  systems  is  to  help  people  to  find 
what  they  need  in  a  form  that  they  can  deal  with. 


Helping  Users  to  Find  Information 

We  will  first  look  at  the  broad  class  of  systems  that  help  the  user 
to  find  relevant  electronic  documents,  which  may  range  from 
brief  news  stories  to  complex  multimedia  objects. 

As  an  especially  clear  example,  consider  the  situation  of  a 
user  who,  in  the  year  2006,  heard  that  a  lot  of  interesting  facts 
and  opinions  can  be  found  in  weblogs  (blogs),  of  which  millions 
are  accessible.  The  user  would  like  to  be  able  to  read,  each  day, 
the  articles  in  blogs  that  are  of  particular  interest.  But  how  is 
he  or  she  to  find  these?  The  user  does  not  know  in  advance 
which  blogs  are  especially  likely  to  produce  material  of  interest 
(as,  for  example,  he  or  she  could  specify  a  well-known  newspa¬ 
per  as  a  promising  source  of  online  stories  if  he  or  she  were  in¬ 
terested  in  news).  The  user  could  submit  queries  to  a  search  en¬ 
gine  that  indexes  blogs,  but  in  general,  he  or  she  cannot  know 
in  advance  what  topics  of  interest  will  be  covered  by  the  latest 
blog  articles;  and  given  the  low  quality  and  lack  of  authority  of 
many  blogs,  the  user  will  not  be  confident  of  receiving  good  re¬ 
sults  on  any  given  topic. 

An approach to this problem that relies heavily on adaptation
to the individual user (called “personalization” in this
context) is realized at the time of this writing in the website
Findory (http://findory.com), which offers access to both blogs and
news articles. To the new user visiting the blog section of the
website, Findory offers a page that shows the first few lines of
a number of blog articles on different topics (cf. Fig. 22.14). The
user  can  then  click  to  read  the  articles  that  interest  him  or  her 
most.  Each  selection  causes  the  system  to  update  its  model  of 
the  user’s  interests  and  adapt  the  selection  of  blog  articles  ac¬ 
cordingly.  For  example,  if  the  user  chooses  an  article  discussing 
a  copyright  infringement  suit  against  a  search  engine  company, 
further  articles  concerning  copyright  issues  and  search  engines 
are likely to appear, marked with the sunburst icon that is
visible in Fig. 22.14. If the user clicks on the icon for a
recommended article, a page is displayed that explains the
recommendation in a style similar to that of Amazon.com (cf. Fig. 22.1
and  the  next  subsection)  with  a  list  of  articles  that  the  user  has 
read  in  the  past  which  are  similar  to  the  recommended  item 
in terms of their content (which the system can characterize on

jamesona  40 Articles Read  15 Searches  1 Favorite  Sign out
Home  News  Blogs  Search  go

Findory Blogs Top Stories

Even Senator Kyl, Bush sycophant, is now distancing
himself from Bush
AMERICAblog (14 hours ago)

Ok, this is telling. NYT: Senator Jon Kyl, a staunch
supporter of President Bush who faces a potentially difficult
re-election fight this year, is hearing a lot from constituents in
Arizona about the plan to allow a Dubai company to operate
shipping terminals at Eastern ports. Most think the deal
should be stopped. “It is almost all critical to dubious,” Mr.
Kyl said, referring to public opinion ... (read more)

Harvard Stanford and a Man named Summers
LawPundit (19 hours ago)

Just a couple of months ago we were in Stratford upon Avon,
location of Harvard House, home of John Harvard, the
founder of what is now Harvard University. We see that
Lawrence H. Summers has just resigned as President of
Harvard, in part because he was opposed by organizations
such as NOW who claimed (read more)

State Blogs
MyDD (12 hours ago)

FIGURE  22.14.  A  small  part  of  a  personalized  display  of  Findory  (http://findory.com,  March  2006). 
(The  icon  that  appears  after  a  title  indicates  that  the  entry  has  been  recommended  on  the  basis 
of  the  articles  that  the  user  has  read  previously.) 


FIGURE 22.15. Overview of adaptation in Findory.


the  basis  of  the  words  in  the  text)  or  in  terms  of  the  users  who 
have  previously  read  them  (G.  Linden,  personal  communica¬ 
tion,  February,  2006).  If  the  user  sees  in  this  way  an  article  that 
he  or  she  does  not  want  to  be  used  for  recommendations  in 
the future, he or she can delete it from the reading history.

Findory’s approach relies on the user being able to identify,
early in his or her use of the system, some articles that interest
him  or  her  and  that  can  therefore  serve  as  examples  for  the  sys¬ 
tem’s  learning.  This  process  is  facilitated  by  the  opportunities 
that  the  user  has  to  issue  explicit  queries  with  keywords  and  to 
consult  trusted  sources  in  the  News  section  of  the  website.  The 
main  advantage  of  this  approach  is  that  the  user  need  not  make 
any  effort  to  specify  explicitly  in  what  types  of  content  he  or 
she  is  interested.  Any  such  effort  would  be  problematic  anyway 
in that (a) it can be difficult and tedious to describe a large
number of general interests accurately, for example by specifying
relevant keywords; and (b) since interests change over time and as
a  function  of  current  developments,  the  user  would  have  to 
keep  updating  the  descriptions. 
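
The learning cycle described above, in which each selected article
updates a model of the user’s interests and the model in turn ranks
new articles, can be sketched with a simple word-based profile. This
is a minimal content-based illustration only; the class and method
names are invented here, and the sketch makes no claim about how
Findory itself is implemented (Findory also draws on the reading
behavior of other users, which is omitted).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep words longer than three characters."""
    return [w for w in text.lower().split() if len(w) > 3]


class InterestModel:
    """Word-frequency profile built implicitly from the articles a user reads."""

    def __init__(self):
        self.term_weights = Counter()
        self.history = []

    def record_click(self, article_text):
        self.history.append(article_text)
        self.term_weights.update(tokenize(article_text))

    def forget(self, article_text):
        """Remove an article from the reading history, as the user may request."""
        self.history.remove(article_text)
        self.term_weights.subtract(tokenize(article_text))
        self.term_weights = +self.term_weights  # drop terms whose count fell to zero

    def score(self, article_text):
        """Cosine similarity between an article and the accumulated profile."""
        doc = Counter(tokenize(article_text))
        dot = sum(self.term_weights[t] * c for t, c in doc.items())
        norm = (math.sqrt(sum(v * v for v in self.term_weights.values()))
                * math.sqrt(sum(c * c for c in doc.values())))
        return dot / norm if norm else 0.0

    def rank(self, candidates):
        return sorted(candidates, key=self.score, reverse=True)
```

Note that the user never states his or her interests explicitly: the
profile changes only through reading and through deletions from the
reading history.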

More  generally  speaking,  user-adaptive  systems  that  help 
users find information⁹ typically draw from the vast repertoire
of techniques for analyzing textual information (and to a lesser
extent, information presented in other media) that have been
developed in the field of information retrieval. The forms of
adaptive support differ somewhat across three situations,
the first two of which can arise with Findory:

Support for  browsing.  In  the  World  Wide  Web  and  other 
hypermedia  systems,  users  often  actively  search  for  desired  in¬ 
formation  by  examining  information  items  and  pursuing  cross- 
references  among  them.  A  user-adaptive  hypermedia  system 
can  help  focus  the  user’s  browsing  activity  by  recommending  or 
selecting  promising  items  or  directions  of  search  on  the  basis 
of  what  the  system  has  been  able  to  infer  about  the  user’s  in¬ 
formation  needs.  An  especially  attractive  application  scenario  is 
that  of  mobile  information  access,  where  browsing  through  ir¬ 
relevant  pages  can  be  especially  time-consuming  and  expensive. 
In  this  context,  the  best  approach  may  be  for  the  system  to  omit 
entirely  links  that  it  expects  to  be  less  interesting  to  the  indi¬ 
vidual  user.  Billsus  and  Pazzani  (2007)  describe  a  case  study  of 
an  adaptive  news  server  that  operates  in  this  way.  Stationary  sys¬ 
tems  with  greater  communication  bandwidth  tend  to  include  all 
of  the  same  links  that  would  be  presented  by  a  nonadaptive  sys¬ 
tem,  highlighting  the  ones  that  are  most  likely  to  be  of  interest 
or  presenting  separate  lists  of  recommended  links.  As  is  argued 
and  illustrated  by  Tsandilas  and  schraefel  (2004),  this  approach 
makes  it  easier  for  the  user  to  remedy  incorrect  assessments  of 
the  user’s  interests  on  the  part  of  the  system. 


⁹Surveys of parts of this large area are provided by, among others, Kelly and Teevan (2003) and several chapters in the collection edited by Brusilovsky,
Kobsa, and Nejdl (2007).

Support for query-based search or filtering. Web
search engines have been enormously successful and popular in
this  context,  but  they  have  usually  exhibited  one  limitation:  The 
results  presented  for  a  given  query  have  not  depended  on  the 
interests or previous behavior of the individual user. By
contrast, with personalized search, the search engine builds up
some sort of model of the user’s interests (either by keeping
track of and analyzing the user’s search history or by asking
for an explicit description of interests) and biases the results
presented accordingly by reordering or filtering them. A good
deal of research (e.g.,
Teevan,  Dumais,  &  Horvitz,  2005;  Micarelli,  Gasparetti,  Sciar- 
rone,  &  Gauch,  2007)  has  demonstrated  the  potential  benefits 
of this strategy. During the year before the writing of this
chapter, a personalized variant of the search engine Google was
introduced that sometimes reranked search results on the basis
of its record of the user’s previous web searching behavior.
However, it is unclear how widespread this approach will become.
The  added  value  of  personalization  is  less  obvious  when  the 
user  has  given  an  explicit  query  than  when  the  user  is  simply 
looking for something interesting, as is often the case with
Findory. With an explicit query, it is feasible and worthwhile for the
user  to  think  about  an  informative  description  of  his  or  her  in¬ 
terests  and  to  modify  his  or  her  query  (perhaps  repeatedly)  on 
the  basis  of  the  results  obtained. 
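
The reordering step of personalized search can be illustrated with a
short sketch that blends the search engine’s own score with the
overlap between a result and terms from the user’s search history.
The scoring formula, the `weight` parameter, and the example data are
assumptions made for illustration, not a description of any deployed
search engine.

```python
def rerank(results, history_terms, weight=0.5):
    """Reorder search results using a simple model of the user's interests.

    results: list of (title, engine_score) pairs, engine_score in [0, 1].
    history_terms: set of lowercase words drawn from earlier queries and clicks.
    weight: how strongly the user model biases the ranking (0 = not at all).
    """
    def personalized_score(item):
        title, engine_score = item
        words = set(title.lower().split())
        overlap = len(words & history_terms) / len(words) if words else 0.0
        return (1 - weight) * engine_score + weight * overlap

    return sorted(results, key=personalized_score, reverse=True)
```

With `weight=0` the original order is returned unchanged, which makes
it easy to compare the personalized and nonpersonalized rankings for
the same query.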

An  interesting  variant  on  personalized  search  is  found  in  the 
system I-Spy (e.g., Smyth et al., 2005): This community-oriented
search engine tailors the results of web search queries to an
entire community of users, such as the employees of a particular
company. Within the list of search results, it promotes those
results that have been clicked on by previous users in the
community who had issued the same or similar queries.
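
A community-oriented reranking of this kind can be sketched by
recording which results were clicked for which queries and promoting
results that were clicked for the same or similar queries. The class
below is an illustrative assumption: the use of Jaccard similarity
between query terms, the similarity threshold, and all names are
chosen for this example rather than taken from the system described
above.

```python
from collections import defaultdict


class CommunitySearch:
    """Promotes results that community members clicked for similar queries."""

    def __init__(self, min_similarity=0.5):
        # query terms -> url -> number of clicks by community members
        self.hits = defaultdict(lambda: defaultdict(int))
        self.min_similarity = min_similarity

    @staticmethod
    def _terms(query):
        return frozenset(query.lower().split())

    def record_click(self, query, url):
        self.hits[self._terms(query)][url] += 1

    def relevance(self, query, url):
        """Clicks on url for similar past queries, weighted by query similarity."""
        q = self._terms(query)
        total = 0.0
        for past, clicks in self.hits.items():
            union = q | past
            similarity = len(q & past) / len(union) if union else 0.0
            if similarity >= self.min_similarity:
                total += similarity * clicks.get(url, 0)
        return total

    def rerank(self, query, urls):
        # sorted() is stable, so results without community clicks
        # keep the order given by the underlying search engine.
        return sorted(urls, key=lambda u: self.relevance(query, u), reverse=True)
```

Because the click record belongs to the community rather than to an
individual, a new member benefits immediately from the earlier
selections of colleagues.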

Spontaneous  Provision  of  Information 

A  number  of  systems  present  information  that  may  be  useful 
to  the  user  even  while  the  user  is  simply  working  on  some  task 
and  is  making  no  effort  to  find  information.  A  recent  prototype 
that  has  been  deployed  at  a  research  laboratory  is  the  FXPAL 
Bar  (Billsus,  Hilbert,  &  Maynes-Aminzade,  2005).  While  an  em¬ 
ployee  visits  websites  in  the  course  of  normal  work,  the  sys¬ 
tem  searches  in  the  background  for  potentially  relevant  infor¬ 
mation  (e.g.,  about  company  visitors  and  internal  publications). 
A central design issue for this and similar systems concerns the
methods for making the retrieved information available to the
user. Presentation of results via means like pop-up windows
risks being obtrusive (cf. the section on usability challenges
below),  but  if  the  presentation  is  too  subtle,  users  will  often 
ignore  the  recommendations  and  derive  little  or  no  benefit 
from  the  system.  Moreover,  the  optimal  solution  in  general  dif¬ 
fers  from  one  user  to  the  next.  Billsus  et  al.  (2005)  reported 
studies  with  a  variety  of  interface  solutions  for  the  FXPAL  Bar, 
some  of  which  are  adaptable  by  the  user  (e.g.,  the  size  of  a 
translucent pop-up window that describes a potentially
relevant document).¹⁰


Recommending  Products 

One  of  the  most  practically  important  categories  of  user-adap¬ 
tive  systems  today  comprises  the  product  recommenders  that  are 
found  in  many  commercial  websites.  The  best-known  system,  the 
recommender  of  Amazon.com,  was  discussed  briefly  in  the  intro¬ 
duction  to  this  chapter.  Looking  more  closely  at  Fig.  22.1,  we  can 
see  some  distinguishing  features  of  this  approach  to  recommen¬ 
dation.  As  can  be  seen  from  the  brief  explanations  that  accom¬ 
pany  the  recommendations,  the  system  takes  as  a  starting  point 
the  information  it  has  about  the  user’s  ownership  or  evaluation  of 
particular  products.  It  then  recommends  products  that  are  simi¬ 
lar  in  the  sense  that  there  is  large  overlap  in  the  sets  of  customers 
that buy them (hence the familiar explanations of the form
“Customers who bought this title also bought . . .”). That is, the
recommendations are based on a statistical analysis of purchases
made  by  many  users,  an  approach  known  as  collaborative  filter¬ 
ing  (see,  e.g.,  Schafer,  Frankowski,  Herlocker,  &  Sen,  2007,  for 
an  overview).  The  products  recommended  in  this  way  may  also 
be  similar  in  the  sense  of  having  the  same  author  or  a  similar  ti¬ 
tle  (as  in  the  examples  in  the  figure),  but  similarities  of  this  sort 
can  also  be  conspicuously  absent:  In  the  category  “Coming  soon,” 
the  user  of  Fig.  22.1  was  recommended  the  DVD  The  Island  be¬ 
cause  of  having  positively  rated  a  Sony  VAIO  notebook  PC.  As  is 
explained  by  Linden,  Smith,  and  York  (2003),  the  details  of  this 
particular  variant  of  collaborative  filtering  are  due  largely  to  the 
constraint  that  it  has  to  be  able  to  cope  with  Amazon.com’s  mil¬ 
lions  of  products  and  customers.  Although  it  is  generally  ac¬ 
knowledged  that  the  recommendations  are  not  always  accurate, 
they  can  yield  notable  benefits  simply  by  being  better  than  the 
generic  recommendations  (e.g.,  of  top-selling  items)  that  would 
be  presented  without  personalization. 
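
The notion of similarity used here, a large overlap in the sets of
customers who bought two products, can be made concrete with a toy
item-to-item computation. The sketch below is a simplified
illustration under assumed data and names; it computes all item pairs
directly and therefore does not reflect the scalability measures that
Linden, Smith, and York (2003) describe for Amazon.com.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """purchases: dict mapping customer -> set of items bought.

    Returns a dict mapping (item_a, item_b) to the cosine similarity
    of the two items' buyer sets, for pairs with at least one common buyer.
    """
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                common = len(buyers[a] & buyers[b])
                if common:
                    sims[(a, b)] = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(purchases, customer, top_n=3):
    """Recommend items similar to those the customer already owns."""
    sims = item_similarities(purchases)
    owned = purchases[customer]
    scores = defaultdict(float)
    for (a, b), similarity in sims.items():
        if a in owned and b not in owned:
            scores[b] += similarity
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

Nothing in the computation refers to the content of the products,
which is why a recommendation can link items as dissimilar as a
notebook PC and a DVD.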

Some  product  recommenders  allow  and  require  the  user  to 
specify  his  or  her  evaluation  criteria  explicitly,  instead  of  simply 
rating  or  purchasing  individual  items.  For  example,  with  the  Ac¬ 
tive  Buyers  Guide  in  Fig.  22.16,  the  user  has  specified  how  he  or 
she  intends  to  use  the  digital  camera  that  he  or  she  would  like 
to  buy,  and  the  system  has  recommended  three  cameras,  ex¬ 
plaining  why  each  one  is  suitable  in  terms  of  the  user’s  require¬ 
ments.  This  needs-based  approach  to  recommendation  offers  a 
natural  alternative  to  the  purely  statistical  approach  of  systems 
like  Amazon.com  when  relatively  complex  and  important  deci¬ 
sions  are  involved  for  which  it  is  worthwhile  for  the  user  to 
think  carefully  about  the  attributes  of  the  products  in  question. 
However,  it  does  require  that  a  good  deal  of  knowledge  about 
the  features  of  products  and  their  relationships  to  user  require¬ 
ments  be  incorporated  in  the  system. 
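
The knowledge that a needs-based recommender must incorporate can be
pictured as a mapping from requirements stated in everyday terms to
tests on technical product attributes. In the following sketch, the
catalog, the attribute thresholds, and the wording of the needs are
all hypothetical; they are not taken from the Active Buyers Guide.

```python
# Hypothetical catalog of products with technical attributes.
CAMERAS = {
    "Camera A": {"megapixels": 5, "weight_g": 130, "zoom": 3},
    "Camera B": {"megapixels": 4, "weight_g": 120, "zoom": 3},
    "Camera C": {"megapixels": 6, "weight_g": 300, "zoom": 6},
}

# Everyday-language needs mapped to tests on attributes (assumed thresholds).
NEEDS = {
    "fits in a shirt pocket": lambda c: c["weight_g"] <= 150,
    "large prints": lambda c: c["megapixels"] >= 5,
    "distant subjects": lambda c: c["zoom"] >= 4,
}


def recommend_by_needs(needs, catalog=CAMERAS):
    """Rank products by the number of stated needs they meet.

    Returns a list of (product name, list of needs met), best first,
    so that each recommendation can be explained in the user's own terms.
    """
    ranked = []
    for name, attrs in catalog.items():
        met = [need for need in needs if NEEDS[need](attrs)]
        ranked.append((len(met), name, met))
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return [(name, met) for _, name, met in ranked]
```

The list of needs met by each product is what allows explanations of
the kind shown in Fig. 22.16 to be generated.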

An  intermediate  approach  between  these  two  extremes  is 
the  critiquing  paradigm  (for  an  early  exposition,  see  Burke, 
Hammond,  &  Young,  1997;  for  an  evaluation  of  some  recent  ad¬ 
vances,  see  McCarthy,  Reilly,  McGinty  &  Smyth,  2005).  The  dis¬ 
tinguishing  feature  is  an  iterative  cycle  in  which  the  system  pro¬ 
poses  a  product  (e.g.,  a  restaurant  in  a  given  city),  the  user 
criticizes  the  proposal  (e.g.,  asking  for  a  more  casual  restaurant), 
and  the  system  proceeds  to  propose  a  similar  product  that  takes 
the  critique  into  account. 
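
One step of the critiquing cycle can be sketched as a search for the
item most similar to the current proposal that also satisfies the
user’s critique. The restaurant data, the 1-to-5 attribute scales,
and the distance measure are assumptions invented for this example.

```python
# Hypothetical catalog; each attribute is on a scale from 1 (low) to 5 (high).
RESTAURANTS = {
    "Chez Nous": {"formality": 5, "price": 5},
    "Bistro 22": {"formality": 3, "price": 4},
    "Taco Stop": {"formality": 1, "price": 1},
}


def apply_critique(current, attribute, direction, catalog=RESTAURANTS):
    """Propose the item closest to `current` that satisfies the critique.

    direction: "less" or "more" of the given attribute.
    Returns None if no item in the catalog satisfies the critique.
    """
    reference = catalog[current]

    def satisfies(item):
        difference = item[attribute] - reference[attribute]
        return difference < 0 if direction == "less" else difference > 0

    def distance(item):
        return sum(abs(item[key] - reference[key]) for key in reference)

    candidates = [name for name, item in catalog.items()
                  if name != current and satisfies(item)]
    return min(candidates, key=lambda name: distance(catalog[name]), default=None)
```

Calling the function repeatedly, each time with the latest proposal
and a new critique, reproduces the iterative cycle: a request for a
more casual restaurant corresponds to
`apply_critique(current, "formality", "less")`.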


¹⁰Influential earlier systems in this category include those of Rhodes (2000) and Budzik, Hammond, and Birnbaum (2001).

The PowerShot SD400 gets our #1 ranking, based on your needs. It has a large LCD screen, and [...]

The PowerShot SD450 is our #2 pick. Although it is [...] inside your shirt pocket, and meets your photo quality requirements.

The PowerShot SD200 gets our #3 ranking, based on your zoom requirements. It also fits inside your shirt pocket, and has mostly automatic controls.


FIGURE  22.16.  Partial  screen  shot  from  the  Active  Buyers  Guide  recommender  for  digital 
cameras.  (Screen  shot  made  from  http://www.activebuyersguide.com  in  December  2005.) 


Since  some  products  are  often  used  by  groups  of  users 
(e.g.,  movies,  vacations),  a  number  of  systems  have  been  de¬ 
veloped  that  explicitly  address  groups  (for  an  overview,  see 
Jameson  &  Smyth,  2007).  The  need  to  address  a  group  rather 
than  an  individual  has  an  impact  on  several  aspects  of  the  rec¬ 
ommendation  process.  Users  may  want  to  specify  their  prefer¬ 
ences  in  a  collaborative  way;  there  must  be  some  appropriate 
and  fair  way  of  combining  the  information  about  the  various 
users’  preferences;  and  the  explanations  of  the  recommenda¬ 
tions  may  have  to  refer  to  the  preferences  of  the  several  individ¬ 
ual  group  members. 
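
The question of how to combine the preferences of group members can
be illustrated by comparing two simple aggregation strategies:
averaging the members’ ratings, and a “least misery” rule that scores
each item by its lowest rating. Both strategies, the function names,
and the example ratings are illustrative assumptions; they are not a
summary of the systems surveyed by Jameson and Smyth (2007).

```python
def group_scores(ratings, strategy="average"):
    """Combine individual ratings into one score per item.

    ratings: dict mapping member -> dict mapping item -> rating.
    strategy: "average" or "least_misery" (score = lowest individual rating).
    Only items rated by every member are considered.
    """
    items = set.intersection(*(set(r) for r in ratings.values()))
    aggregate = {
        "average": lambda values: sum(values) / len(values),
        "least_misery": min,
    }[strategy]
    return {item: aggregate([ratings[m][item] for m in ratings]) for item in items}


def recommend_for_group(ratings, strategy="average"):
    """Return the item with the highest group score (ties broken alphabetically)."""
    scores = group_scores(ratings, strategy)
    return max(sorted(scores), key=scores.get)
```

The two strategies can disagree: an item that two members love and
one member dislikes may win on average yet lose under the least
misery rule, which is one reason why explanations may need to refer
to individual members’ preferences.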

Product  recommenders  of  these  various  types  address  sev¬ 
eral  problems  that  computer  users  typically  experience  when 
they  search  for  products: 

1.  The  user  may  not  know  what  aspects  of  the  products  to  at¬ 
tend  to  or  what  criteria  should  determine  his  or  her  deci¬ 
sion.  Some  recommenders  either  (a)  make  it  less  necessary 
for  the  user  to  be  explicitly  aware  of  his  or  her  evaluation 
criteria  (as  when  collaborative  filtering  is  used)  or  (b)  help 
the  user  to  learn  about  his  or  her  own  criteria  during  the 
course  of  the  interaction  with  the  system. 

2.  If  the  user  is  unfamiliar  with  the  concepts  used  to  character¬ 
ize  the  products,  he  or  she  may  be  unable  to  make  effective 
use  of  any  search  or  selection  mechanisms  that  may  be  pro¬ 
vided.  Product  recommenders  generally  reduce  this  com¬ 
munication  gap  by  allowing  the  user  to  specify  his  or  her  cri¬ 
teria  (if  this  is  necessary  at  all)  in  terms  that  are  more  natural 
to  him  or  her.  For  example,  in  Fig.  22.16,  the  user  does  not 
need  to  know  in  advance  how  many  megapixels  his  or  her 
camera  requires,  since  the  question  about  photo  quality  is 
formulated  in  everyday  terms. 

3.  The  user  may  have  to  read  numerous  product  descriptions 
in  various  parts  of  the  website,  integrating  the  information 


found  in  order  to  arrive  at  a  decision.  Once  a  product  rec¬ 
ommender  has  acquired  an  adequate  user  model,  the  system 
can  take  over  a  large  part  of  this  work,  often  examining  the 
internal  descriptions  of  a  much  larger  number  of  products 
than  the  user  could  deal  with. 

From  the  point  of  view  of  the  vendors  of  the  products  con¬ 
cerned,  the  most  obvious  potential  benefit  is  that  users  will  find 
one  or  more  products  that  they  consider  worth  buying,  instead 
of  joining  the  notoriously  large  percentage  of  browsers  who 
never  become  buyers.  A  related  benefit  is  the  prospect  of  cross¬ 
selling:  the  system’s  model  of  the  user  can  be  employed  for  the 
recommendation  of  further  products  that  the  user  might  not 
have considered him- or herself. Finally, some vendors aim to
build  up  customer  loyalty  with  recommenders  that  acquire 
long-term  models  of  individual  customers:  If  the  user  believes 
that  the  system  has  acquired  an  adequate  model  of  him  or  her, 
the  user  will  tend  to  prefer  to  use  the  system  again  rather  than 
starting  from  scratch  with  some  other  system. 

Tailoring  Information  Presentation 

Even  after  a  system  has  decided  which  documents  or  products 
to  present  to  a  user,  the  question  may  remain  of  exactly  how  to 
present  them.  In  some  cases,  this  question  should  be  answered 
differently  for  different  users.  In  Fig.  22.16,  the  verbal  descrip¬ 
tions  of  the  recommended  products  refer  explicitly  to  the  pref¬ 
erences  that  the  user  has  expressed;  if  they  did  not,  the  user 
would  have  to  invest  more  effort  to  judge  how  well  each  prod¬ 
uct  met  his  or  her  requirements. 

An example from a research prototype is found in the system
RIA (Fig. 22.17; Zhou & Aggarwal, 2004), a multimodal system
that helps users search for real estate, often presenting
information about houses on a map. As Fig. 22.17 shows, the amount
of space available for describing a house is limited, so it is
important to select the information that is most likely to help the
user to decide how to proceed; otherwise, the user may have
to request additional information explicitly, which would slow
down the interaction. The process of user model acquisition
(the three left-most nodes in Fig. 22.18) is quite straightforward
in RIA: Before the interaction begins, the user is asked a small
number of questions about his or her interest and knowledge
concerning houses. The sophisticated part of the system is the
way in which it uses this information, along with information
about the houses that satisfy the query, to decide which
information to select for presentation and how it is to be ordered.
The problem is one of optimizing the presentation with respect
to a large set of constraints.^^ In an evaluation, the displays
generated by this method were similar to those generated by
human designers, who found the task of selecting the
appropriate information items to be quite time-consuming (cf. the
section on empirical methods for comments on this evaluation
method).

FIGURE 22.17. Two cropped screen shots from the RIA multimedia
conversation system. (Dialogue shown: "User: Show ranches under
$800K in Armonk. RIA: I found 4 ranches under $800K in Armonk."
One panel is labeled "Emphasis on financial, exterior, and interior
aspects"; the other, "Emphasis on size and amenities." Screen
shots courtesy of Vikram Aggarwal.)

FIGURE 22.18. Overview of adaptation in RIA.

Another  class  of  systems  in  which  the  tailoring  of  informa¬ 
tion  to  individual  users  has  promise  comprises  systems  that  pre¬ 
sent  medical  information  to  patients  (for  an  overview,  see 
Cawsey, Grasso, & Paris, 2007).

Properties  of  users  that  may  be  taken  into  account  in  the  tai¬ 
loring  of  documents  include  the  user’s  degree  of  interest  in  par¬ 
ticular  topics;  the  user’s  knowledge  about  particular  concepts 
or  topics;  the  user’s  preference  or  need  for  particular  forms  of 
information  presentation;  and  the  display  capabilities  of  the 
user’s  computing  device  (e.g.,  web  browser  vs.  cell  phone).  One 
strong  point  of  the  optimization  approach  taken  with  RIA  is  that 
all  of  these  factors  can  be  represented  and  taken  into  account 
within  a  uniform  framework. 
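
The idea of weighing several user properties within one framework can be illustrated with a small sketch. It is not RIA's actual algorithm, which the text describes only as constraint-based optimization; it is a simplified greedy selection in which the names InfoItem, select_items, the interest weights, and the space budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoItem:
    name: str          # e.g. "price" (hypothetical item)
    topic: str         # aspect of the house the item belongs to
    importance: float  # assumed baseline importance of the item
    space: int         # display space needed, in arbitrary units


def select_items(items, interest, budget):
    """Greedily pick items with the best value per unit of space.

    `interest` maps a topic to the user's degree of interest (0..1);
    `budget` is the display space available for one house.
    """
    def value(item):
        return item.importance * interest.get(item.topic, 0.0)

    ranked = sorted(items, key=lambda it: value(it) / it.space, reverse=True)
    chosen, used = [], 0
    for item in ranked:
        # Skip items the user does not care about or that no longer fit.
        if value(item) > 0 and used + item.space <= budget:
            chosen.append(item)
            used += item.space
    # Present the most valuable information first.
    return sorted(chosen, key=value, reverse=True)
```

With the same set of items, a user interested mainly in financial aspects and a user interested mainly in size and amenities would receive different descriptions of the same house. A smaller budget could stand in for a device with a smaller display.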

Even  in  cases  where  it  is  straightforward  to  determine  the 
relevant  properties  of  the  user,  the  automatic  creation  of 
adapted  presentations  can  require  sophisticated  techniques 
of natural language generation (e.g., Bontcheva & Wilks,
2005) and multimedia presentation generation. Various less
complex  ways  of  adapting  hypermedia  documents  to  individ¬ 
ual  users  have  also  been  developed  (see  Bunt,  Carenini,  & 
Conati,  2007). 


Supporting  Collaboration 

The  increasing  tendency  for  computer  users  to  be  linked  via 
networks  has  made  it  increasingly  feasible  for  users  to  collabo¬ 
rate,  even  in  a  spontaneous  way  and  without  prior  acquain¬ 
tance.  A  system  that  has  models  of  a  large  number  of  users  can 
facilitate  such  collaboration  by  taking  into  account  the  ways  in 
which  users  match  or  complement  each  other. 

A  striking,  though  not  typical,  example  is  the  system  AgentSalon 
(Sumi  &  Mase,  2001,  2002),  shown  in  Fig.  22.19.  The  system  is 
used  at  conferences,  in  conjunction  with  a  handheld  guide 
(PalmGuide)  that  collects  information  about  exhibits  that  the 
user  has  visited  and  ratings  that  he  or  she  has  given  of  them  (the 
purpose  within  PalmGuide  being  to  make  recommendations  to 
the  user  about  other  exhibits).  When  two  visitors  agree  to  work 
with  AgentSalon,  the  information  about  them  is  transferred 
from  their  handhelds  to  AgentSalon.  Like  a  host  at  a  party, 
AgentSalon  then  looks  for  topics  about  which  the  two  visitors 
might  hold  an  interesting  conversation,  such  as  an  exhibit  about 
which  they  gave  different  ratings.  The  system  tries  to  get  a  con¬ 
versation  going  by  having  two  animated  agents  simulate  a  con¬ 
versation  between  these  two  visitors. 
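
The matching step described above, looking for exhibits that two visitors rated differently, can be sketched as follows. This is only an illustration under assumed data structures (a dictionary of ratings per visitor and a hypothetical min_gap threshold); the source does not describe how AgentSalon actually represents or compares visitor records.

```python
def conversation_topics(ratings_a, ratings_b, min_gap=2):
    """Return exhibits both visitors rated, largest disagreement first.

    Each argument maps an exhibit name to a numeric rating. Only exhibits
    whose ratings differ by at least `min_gap` are returned.
    """
    shared = ratings_a.keys() & ratings_b.keys()
    gaps = [(abs(ratings_a[e] - ratings_b[e]), e) for e in shared]
    # Sort by descending gap; break ties alphabetically for determinism.
    gaps.sort(key=lambda pair: (-pair[0], pair[1]))
    return [exhibit for gap, exhibit in gaps if gap >= min_gap]
```

A system that wanted to match users who complement each other in other ways (for example, by expertise) could replace the rating difference with a different comparison function.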

User  modeling  has  been  applied  in  connection  with  several 
(partially  overlapping)  types  of  collaboration: 

•  In  computer-supported  learning  environments,  in  which  the 
idea  of  collaborative  learning  has  gained  popularity  in  re¬ 
cent  years  (e.g.,  Soller,  2007) 


^^Decisions about what modalities to use for presentation (for example, text or speech output) are made in a similar way (cf. Zhou, Wen, &
Aggarwal, 2005).
FIGURE 22.21. Screen shots from the SQL-Tutor. (Above: the
main  interface;  below:  display  of  the  learner  model.  Screen  shots 
courtesy  of  Antonija  Mitrovic.) 


FIGURE  22.19.  Attempt  by  AgentSalon  to  stimulate  discussion 
between  two  conference  visitors.  (The  system  has  identified  an 
interesting  topic  by  comparing  the  records  of  their  conference 
experiences that have been stored on their PDAs. Figure 1 of
“AgentSalon: Facilitating face-to-face knowledge exchange through
conversations among personal agents,” by Y. Sumi & K. Mase,
2001, in Proceedings of the Fifth International Conference on Autonomous
Agents,  pp.  393-400,  New  York:  ACM.  Research  conducted  at  ATR 
Media  Information  Science  Laboratories,  Kyoto.  Copyright  2001 
by  the  Association  for  Computing  Machinery,  Inc.  Reproduced 
with  permission.) 
FIGURE 22.20. Overview of adaptation in AgentSalon.


•  As  a  way  of  providing  intelligent  help  for  complex  tasks  (e.g., 
Vivacqua  &  Lieberman,  2000;  Aberg  &  Shahmehri,  2001); 
putting  a  human  expert  into  the  loop  is  a  way  of  avoiding 
some  of  the  difficulties  associated  with  fully  automatic  adap¬ 
tive  help  systems  that  were  discussed  above 

•  In  environments  for  computer-supported  cooperative  work 
within  organizations  (e.g.,  Terveen  &  McDonald,  2005) 


FIGURE  22.22.  Overview  of  adaptation  in  the  SQL  Tutor. 


Supporting  Learning 

Research on student modeling (or learner modeling, as it has
been called more often in recent years) aims to add
user-adaptivity to computer-based tutoring systems and learning
environments (cf. A. T. Corbett, Koedinger, & Anderson, 1997).^^

Increasingly,  learning  environments  are  being  made  avail¬ 
able  on  the  World  Wide  Web.  An  example  is  the  SQL-Tutor 
(e.g.,  Mitrovic,  Suraweera,  Martin,  &  Weerasinghe,  2004),  which 
teaches  the  database  query  language  SQL.^^  The  top  part  of 
Fig.  22.21  illustrates  how  the  tutor  presents  a  database  querying 
problem  and  gives  feedback  on  the  learner’s  solution.  The 
lower  part  of  the  figure  shows  a  simple  visualization  of  the 
learner  model,  which  indicates  the  learner’s  degree  of  mastery 
of  each  of  the  six  clauses  of  the  SQL  SELECT  statement.  It  also 
shows  a  suggestion  by  the  system  about  what  type  of  problem 
the  learner  should  attempt  next — one  of  several  ways  in  which 
the  system  helps  the  learner  to  pick  a  problem  to  work  on  next. 
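
A learner model of the kind visualized in Fig. 22.21 can be sketched in a few lines. The sketch below is an assumption-laden simplification: it estimates mastery as the fraction of correct attempts per clause and suggests the weakest clause next, whereas the source does not specify how the SQL-Tutor computes mastery or chooses problems. The class and method names are hypothetical.

```python
# The six clauses of the SQL SELECT statement tracked by the model.
CLAUSES = ("SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY")


class LearnerModel:
    """Tracks per-clause mastery as the fraction of correct attempts."""

    def __init__(self):
        self.attempts = {clause: 0 for clause in CLAUSES}
        self.correct = {clause: 0 for clause in CLAUSES}

    def record(self, clause, was_correct):
        """Update the model after the learner's solution is checked."""
        self.attempts[clause] += 1
        if was_correct:
            self.correct[clause] += 1

    def mastery(self, clause):
        """Return a value in 0..1; untried clauses count as 0."""
        tried = self.attempts[clause]
        return self.correct[clause] / tried if tried else 0.0

    def suggest_next(self):
        """Suggest the clause with the lowest estimated mastery."""
        return min(CLAUSES, key=self.mastery)
```

Showing the values returned by mastery() to the learner corresponds to the visualization in the lower part of the figure; suggest_next() corresponds to the system's suggestion about what to attempt next.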


^^Good sources of literature include the International Journal of Artificial Intelligence in Education and the proceedings of the biennial
Conferences on Artificial Intelligence in Education (e.g., Looi, McCalla, Bredeweg, & Breuker, 2005).

^^The tutor is available to registered students via Addison-Wesley’s Website Database Place (http://www.aw-bc.com/databaseplace/).
A number of different aspects of the SQL-Tutor were
evaluated in 10 studies (e.g., Mitrovic et al., 2004; Mitrovic & Ohlsson,
1999), including 2 studies that showed the value of helping
learners to choose the next problem and showing learners their
learner model.

Interaction  in  intelligent  tutoring  systems  and  intelligent 
learning  environments  can  take  many  forms,  ranging  from 
tightly  system-controlled  tutoring  to  largely  free  exploration  by 
the  learner.  Aspects  of  the  system  that  can  be  adapted  to  the 
individual  user  include  (a)  the  selection  and  the  form  of  the  in¬ 
structional  information  presented,  (b)  the  content  of  problems 
and  tests,  and  (c)  the  content  and  timing  of  hints  and  feedback. 

Learner-modeling  systems  may  adapt  their  behaviors  to  any 
of  a  broad  variety  of  aspects  of  the  user,  such  as  (a)  the  user’s 
knowledge  of  the  domain  of  instruction,  including  knowledge 
acquired  prior  to  and  during  the  use  of  the  system,  (b)  the 
user’s  learning  style,  motivation,  and  general  way  of  looking  at 
the  domain  in  question,  and  (c)  the  details  of  the  user’s  cur¬ 
rent  processing  of  a  problem. 

The underlying assumption is that the adaptation of the
system’s behavior to some of these properties of the learner can
lead to more effective and more enjoyable learning. One series
of studies that directly demonstrated the added value of
learner-adaptive tutoring is described by A. Corbett (2001). Many
evaluations, however, do not focus on measuring the benefits of
adaptivity but rather on comparing alternative variants of the
same adaptive system. In some cases, it has been found that the
modeling of the learner, however well realized, is not necessary
for the effective functioning of the learning environment (see,
e.g., VanLehn et al., 2005).


USABILITY  CHALLENGES 


Some of the typical properties of user-adaptive systems can lead
to usability problems that may outweigh the benefits of
adaptation to the individual user. Discussions of these problems have
been presented by a number of authors (e.g., Norman, 1994;
Wexelblat & Maes, 1997; Hook, 2000; Tsandilas & Schraefel,
2004; and the references given below in this section). Fig. 22.23
gives a high-level summary of many of the relevant ideas, using
the metaphor of signs that give warnings and advice to persons
who enter a dangerous area.

FIGURE 22.23. Overview of usability challenges for user-adaptive
systems and of ways of dealing with them. (Dashed arrows denote
threats and solid arrows mitigation of threats, respectively;
further explanation is given in the text.)

The  Usability  Threats,  shown  in  the  third  column,  concern 
several  generally  desirable  properties  of  interactive  systems. 
Those  referred  to  by  the  top  three  signs  (concerning  predictabil¬ 
ity  and  comprehensibility,  controllability,  and  unobtrusiveness) 
correspond  to  general  usability  principles.  The  remaining  two 
threats, to privacy and to breadth of experience, are especially
relevant  to  user-adaptive  systems. 

The  Typical  Properties  column  lists  some  frequently  encoun¬ 
tered  (though  not  always  necessary)  properties  of  user-adaptive 
systems,  each  of  which  has  the  potential  of  creating  particular 
usability  threats. 

Each  of  the  remaining  two  columns  shows  a  different  strat¬ 
egy  for  avoiding  or  mitigating  one  or  more  usability  threats: 
Each  of  the  Preventive  Measures  aims  to  ensure  that  one  of  the 
Typical  Properties  is  not  present  in  such  a  way  that  it  would 
cause  problems.  Each  of  the  Compensatory  Measures  aims  to 
ward off one or more threats once they have arisen.

A  discussion  of  all  of  the  relationships  indicated  in  Fig.  22.23 
would  exceed  the  scope  of  this  chapter,  but  some  remarks  will 
clarify  the  main  ideas. 

Threats  to  Predictability  and  Comprehensibility 

The  concept  of  predictability  refers  to  the  extent  to  which  a 
user  can  predict  the  effects  of  his  or  her  actions.  Comprehensi¬ 
bility  is  the  extent  to  which  the  user  can  understand  system  ac¬ 
tions  and/or  has  a  clear  picture  of  how  the  system  works. 
These  goals  are  grouped  together  here  because  they  are  asso¬ 
ciated  with  largely  the  same  set  of  other  variables. 

Users  can  try  to  predict  and  understand  a  system  on  several 
different  levels  of  detail. 

1. Exact layout and responses. Especially detailed predictability
is important when interface elements are involved that are
accessed frequently by skilled users, for example, icons in
control panels or options in menus (cf. the discussion of interface
adaptations above). If the layout and behavior of the system are
highly predictable (in fact, essentially identical) over time,
skilled users can engage in automatic processing (see, e.g.,
Hammond, 1987): They can use the parts of the interface
quickly, accurately, and with little or no attention. In this
situation, even minor deviations from complete predictability on
a fine-grained level can have the serious consequence of
making automatic processing impossible or error-prone.

2.  Success  at  specific  subtasks.  Users  may  desire  only  more 
global  predictability  and  comprehensibility  when  the  system 
is  performing  some  more  or  less  complex  task  on  the  user’s 
behalf  (e.g.,  searching  for  suitable  products  on  the  web):  In 
the  extreme  case,  the  user  may  want  only  to  predict  (or  eval¬ 
uate)  the  quality  of  the  result  of  a  complex  system  action. 

3. Overall competence. The most global form of predictability
and comprehensibility concerns the user’s ability to assess
the system’s overall level of competence: the degree to
which the system tends in general to perform its tasks
successfully. With many types of system, high overall
competence can be taken for granted; but as we have seen, the
processes of acquiring and applying user models do not in
general ensure a high degree of accuracy. If the user
seriously overestimates the system’s competence, he or she may
rely on the system excessively; if the user underestimates the
system, he or she will not derive the potential benefits that
the system can provide. A factor that is especially important
with regard to this global level is the way in which the
adaptive part of the system is presented to the user. Some
user-adaptive systems, such as AgentSalon (which was discussed
above) and the well-known Microsoft Office Assistant, have
employed lifelike characters, for various reasons. As has
often been pointed out, such anthropomorphic
representations can invoke unrealistically high expectations concerning
system competence, not only with regard to capabilities
like natural language understanding but also with regard to
the system’s ability to understand and adapt to the user.

In  general,  the  levels  and  degrees  of  predictability  and  com¬ 
prehensibility  that  are  necessary  or  desirable  in  a  given  case  can 
depend  on  many  factors,  including  the  function  that  is  being 
served  by  the  adaptation  and  the  user’s  level  of  skill  and  expe¬ 
rience.  The  same  is  true  of  the  choice  of  the  strategies  that 
are  most  appropriate  for  the  achievement  of  predictability  and 
comprehensibility. 

Threats  to  Controllability 

Controllability  refers  to  the  extent  to  which  the  user  can  bring 
about  or  prevent  particular  actions  or  states  of  the  system  if  he 
or  she  has  the  goal  of  doing  so.  Although  controllability  tends  to 
be  enhanced  by  comprehensibility  and  predictability,  these 
properties  are  not  perfectly  correlated.  For  example,  when  the 
user  clicks  on  a  previously  unused  option  in  Smart  Menus,  he  or 
she  can  predict  with  certainty  that  it  will  be  moved  to  the  main 
part  of  its  menu;  but  the  user  has  no  control  over  whether  this 
change  will  be  made. 

A typical measure for ensuring some degree of control is to
have the system submit any action with significant consequences
to the user for approval. This measure causes a threat of
obtrusiveness (see below), so it is an important interface design
challenge to find ways of making recommendations in an
unobtrusive fashion that still makes it easy for the user to notice and
follow up on them (cf. the earlier discussion of FXPAL Bar).

Like  predictability  and  comprehensibility,  controllability  can 
be  achieved  on  various  levels  of  granularity.  Especially  since  the 
enhancement  of  controllability  can  come  at  a  price,  it  is  impor¬ 
tant  to  consider  what  kinds  of  control  will  really  be  desired.  For 
example,  there  may  be  little  point  in  submitting  individual  ac¬ 
tions  to  the  user  for  approval  if  the  user  lacks  the  knowledge 
or interest required to make the decisions. Wexelblat and Maes
(1997)  recommended  making  available  several  alternative  types 
of  control  from  which  users  can  choose. 
The term transparency is sometimes used for this concept, but it can be confusing, because it also has different, incompatible meanings.

Obtrusiveness

We  will  use  the  term  obtrusiveness  to  refer  to  the  extent  to 
which  the  system  places  demands  on  the  user’s  attention  which 
reduce  the  user’s  ability  to  concentrate  on  her  primary  tasks. 
This  term — and  the  related  words  distracting  and  irritating — 
are  often  heard  in  connection  with  user-adaptive  systems.  Fig. 
22.23  shows  that  (a)  there  are  several  different  reasons  why 
user-adaptive  systems  can  easily  turn  out  to  be  obtrusive  and 
(b)  there  are  equally  many  corresponding  strategies  for  mini¬ 
mizing  obtrusiveness.  Some  of  these  measures  can  lead  straight¬ 
forwardly  to  significant  improvements — for  example,  when  it  is 
recognized  that  distracting  lifelike  behaviors  of  an  animated 
character  are  not  really  a  necessary  part  of  the  system. 

Threats to Privacy

User-adaptive systems typically (a) gather data about individual
users and (b) use these data to make decisions that may have
more or less serious consequences. Users may accordingly
become concerned about the possibility that their data will be put
to inappropriate use. Privacy concerns tend to be especially
acute in e-commerce contexts (e.g., Cranor, 2004) and with some
forms of support for collaboration (e.g., Terveen & McDonald,
2004), because in these cases (a) data about the user are
typically stored on computers other than the user’s own, (b) the
data often include personally identifying information, and
(c) there may be strong incentives to use the data in ways that
are not dictated by the user’s own interests. As will be discussed
in the next section, different means of acquiring information
about users can have different consequences with regard to
privacy. On the other hand, many of the measures that can be
taken to protect privacy (for example, a policy of storing as
little personally identifying data as possible) are not specific to
user-adaptive systems.


Breadth of Experience

When a user-adaptive system helps the user with some form
of information acquisition (cf. the second major section of this
chapter), much of the work of examining the individual
documents, products, and people involved is typically taken over
by the system. A consequence can be that the user ends up
learning less about the domain in question than she would with
a nonadaptive system (cf. Lanier, 1995). For example, if the
Amazon.com visitor for whom recommendations are shown in
Fig. 22.1 relies heavily on such recommendations (as opposed
to browsing freely), the user is likely to learn a lot about the
books of Frederick Forsyth and about closely related products
but little about the full range of books and other media that
are available. One point of view here (see, e.g., the remarks of
Maes in Shneiderman & Maes, 1997, p. 53) is that it should be up
to the user to decide whether he or she prefers to learn about
a given domain or to save time by delegating work to a system.
It may be worthwhile to give the user a continuous spectrum of
possibilities between complete control over a task and
complete delegation of it. For example, many product
recommendation systems, such as Amazon.com’s, allow users to alternate
freely between pursuing the system’s recommendations and
browsing through product descriptions in the normal way.

Reduction of breadth of experience is especially likely if the
system relies more heavily than is necessary on an incomplete
user model. The user of Fig. 22.1 had (understandably)
informed the system about only a tiny proportion of the books
that he had ever read and liked; and since the system tends to
recommend (and offer a chance to rate) similar items, the
system may never obtain much evidence about all of the user’s
other literary interests. Some systems mitigate this problem by
systematically proposing solutions that are not dictated by the
current user model (for a method that is directly applicable
to recommendation lists such as Amazon.com’s, see Ziegler,
McNee, Konstan, & Lausen, 2005; for methods realized in
different types of recommenders, see Linden, Hanks, & Lesh,
1997; Shearin & Lieberman, 2001).
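
One simple way of proposing items that are not dictated by the current user model is to reserve a few positions in a recommendation list for categories that the top-ranked items do not cover. The sketch below illustrates that idea only; it is not the method of any of the systems cited in this section, and the function name, the tuple layout, and the explore_slots parameter are assumptions.

```python
def diversified_list(candidates, k, explore_slots):
    """Build a k-item list whose last slots favor unseen categories.

    `candidates` is a list of (name, category, predicted_score) tuples.
    The first k - explore_slots positions go to the highest scores; the
    remaining positions go to the best items from other categories.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    head = ranked[: k - explore_slots]
    rest = ranked[k - explore_slots:]
    seen = {category for _, category, _ in head}

    tail = []
    for cand in rest:
        if len(tail) == explore_slots:
            break
        if cand[1] not in seen:
            tail.append(cand)
            seen.add(cand[1])

    # If too few novel categories exist, fall back to the next-best items.
    for cand in rest:
        if len(head) + len(tail) >= k:
            break
        if cand not in tail:
            tail.append(cand)
    return [name for name, _, _ in head + tail]
```

Ratings that the user gives to the items in the reserved positions provide evidence about interests that a list based purely on predicted scores might never elicit.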


Dealing With Tradeoffs

As  can  be  seen  in  Fig.  22.23,  the  designer  who  attempts  to  com¬ 
bat  a  particular  usability  threat  will  often  have  to  accept  a  greater 
threat  to  some  other  usability  goal.  The  most  obvious  tradeoffs 
involve unobtrusiveness. In particular, steps taken to enhance
control  or  to  protect  privacy  often  require  the  user  to  perform 
additional  actions,  input  additional  information,  and  pay  atten¬ 
tion  to  additional  system  messages.  Dealing  with  tradeoffs  of 
this  sort  is  complicated  by  the  fact  that  users  often  differ 
markedly  in  the  relative  priority  that  they  assign  to  each  of  the 
conflicting  goals. 

Fig. 22.24 illustrates some general points, referring for
concreteness to a recently developed prototype Office Control
System (Cheverst et al., 2005). This system first observes how
the occupant of an office tends to operate various devices such
as the fan and the window shades; it then tries to help the user
by performing some actions autonomously (e.g., opening the
window when certain weather conditions prevail and there is no
visitor in the office). An early version offered two ways of
dividing the work between the user and the system for each type of
action: The system could either perform actions of that type
autonomously or request the user’s permission with a
preemptory dialogue box on the user’s normal computer screen. Users
chose the latter option for different proportions of the
available actions, reflecting different priorities for the goals of
unobtrusiveness and controllability, respectively. In terms of
the tradeoff graph shown in Fig. 22.24, users chose different
points on the straight diagonal line. Despite this freedom of
choice, users were often not satisfied with the overall usability
of the prototype. A significant improvement in acceptance was
achieved when the designers expanded the design space: They
introduced a separate small screen for the Office Control
System, in which information and requests for confirmation could
be offered in several ways that are not very familiar in everyday
graphical user interfaces (though they are familiar in industrial
and traffic contexts; see, e.g., Wickens & Hollands, 2000). As
Fig. 22.24 indicates, these additional forms of interaction
represented a more favorable combination of degrees of
unobtrusiveness and controllability, at least for some of the users some
of the time.

FIGURE 22.24. Illustration of strategies for dealing with
tradeoffs among usability goals in user-adaptive systems. (Explanation
in text.)
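
The per-action-type choice offered by the early version can be sketched as a small dispatcher. This is a hypothetical illustration rather than the Office Control System's implementation: the names Mode, OfficeController, and propose, as well as the action identifiers, are assumptions, and the confirm callback stands in for whatever dialogue or separate display is used to ask the user.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"  # act without asking (less obtrusive)
    ASK_FIRST = "ask_first"    # request permission (more controllable)


class OfficeController:
    """Performs proposed actions according to a per-action-type mode."""

    def __init__(self, modes, confirm):
        self.modes = modes      # maps action type -> Mode chosen by the user
        self.confirm = confirm  # callable(action) -> bool, asks the user
        self.performed = []

    def propose(self, action):
        """Perform `action` if allowed; return whether it was performed."""
        # Unknown action types default to the more controllable option.
        mode = self.modes.get(action, Mode.ASK_FIRST)
        if mode is Mode.AUTONOMOUS or self.confirm(action):
            self.performed.append(action)
            return True
        return False
```

Each user's choice of modes corresponds to a point on the diagonal of the tradeoff graph. Supplying a less intrusive confirm callback, such as one that uses a separate small screen, changes the design space itself rather than the user's position within it.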

Consistent  with  more  complex  examples  (see  Jameson  & 
Schwarzkopf,  2002;  Billsus  &  Pazzani,  2007),  this  small  case 
study illustrates two general points: (a) When dealing with
tradeoffs  among  the  usability  goals  discussed  here,  it  is  impor¬ 
tant  to  offer  alternative  solutions  for  users  with  different  priori¬ 
ties.  (b)  It  is  equally  important  to  consider  relatively  novel  inter¬ 
face  design  solutions  that  may  spare  users  the  need  to  choose 
among  unsatisfactory  alternatives. 


OBTAINING  INFORMATION  ABOUT  USERS 


Some  of  the  usability  challenges  discussed  in  the  previous  sec¬ 
tion  are  closely  connected  with  the  ways  in  which  information 
about  individual  users  is  acquired — a  consideration  which  also 
largely  determines  the  success  of  a  system’s  adaptation.  The 
next  two  subsections  will  look,  respectively,  at  (a)  information 
that  the  user  supplies  to  the  system  explicitly  for  the  purpose 
of  allowing  the  system  to  adapt  and  (b)  information  that  the  sys¬ 
tem  obtains  in  some  other  way. 

Explicit  Self-Reports  and  -Assessments 

Self-reports about objective personal characteristics.
Information about objective properties of the user (such
as age, profession, and place of residence) sometimes has
implications that are relevant for system adaptation, for
example, concerning the topics in which the user is likely to be
knowledgeable or interested. This type of information also has
the advantage of changing relatively infrequently. Some
user-adaptive systems request information of this type from users,
but the following caveats apply:

1.  Specifying  information  such  as  profession  and  place  of  resi¬ 
dence  may  require  a  fair  amount  of  tedious  menu  selection 
and/or  typing. 

2.  Since  information  of  this  sort  can  often  be  used  to  determine 
the  user’s  identity,  the  user  may  justifiably  be  concerned 
about  privacy  issues.  Even  in  cases  where  such  concerns  are 
unfounded,  they  may  discourage  the  user  from  entering  the 
requested  information. 

A  general  approach  is  (a)  to  restrict  requests  for  personal  data 
to  the  few  pieces  of  information  (if  any)  that  the  system  really  re¬ 
quires  and  (b)  to  explain  the  uses  to  which  the  data  will  be  put. 
Cranor  (2004)  gave  a  number  of  suggestions  about  how  the  use 
of  personally  identifying  data  can  be  minimized.  An  especially 
creative  approach  was  tried  in  the  web-based  Lifestyle  Finder 
prototype  (Fig.  22.25;  Krulwich,  1997),  which  was  characterized 
by  a  playful  style  and  an  absence  of  requests  for  personally  iden¬ 
tifying  information.  Of  the  users  surveyed,  93%  agreed  that  the 
Lifestyle  Finder’s  questions  did  not  invade  their  privacy. 

It  is  sometimes  possible  to  avoid  requests  for  explicit  input 
about  personal  characteristics  by  accessing  sources  where  simi¬ 
lar  information  has  already  been  stored  (a  strategy  that  will  be 
discussed  later  in  this  section). 

Self-assessments of interests and knowledge. It is
sometimes helpful for a user-adaptive system to have an
assessment of a property of the user that can be expressed naturally as
a position on a particular general dimension: the level of the
user’s interest in a particular topic, the level of his or her
knowledge about it, or the importance that the user attaches to a
particular evaluation criterion. Often an assessment is arrived at
through inference on the basis of indirect evidence, as with the
assessments of a learner’s knowledge in the SQL-Tutor (Fig.
22.21). But it may be necessary or more efficient to ask the user
for an explicit assessment. For example, it would be difficult for
the Active Buyers Guide recommender shown in Fig. 22.16 to
estimate the importance that the user attaches to photo quality
without asking the user directly. The scales in this figure
illustrate good practice in that they make clear the meaning of the
various possible answers, instead of asking, “How important is
photo quality to you (on a scale from 1 to 5)?”

FIGURE 22.25. Example of a screen with which the Lifestyle
Finder elicits demographic information. (Figure 3 of "Lifestyle
Finder: Intelligent user profiling using large-scale demographic
data," by B. Krulwich, 1997, AI Magazine, 18(2), pp. 37-45.
Research conducted at the Center for Strategic Technology
Research of Andersen Consulting (now Accenture Technology
Labs). Copyright 1997 by the American Association for Artificial
Intelligence. Adapted with permission.)

Because  of  the  effort  involved  in  this  type  of  self-assessment, 
it  is  generally  worthwhile  to  consider  ways  of  minimizing  such 
requests,  making  responses  optional,  ensuring  that  the  purpose 
is  clear,  and  integrating  the  self-assessment  process  into  the 
user’s  main  task  (for  some  innovative  ideas  about  how  to 
achieve  these  goals,  see  Tsandilas  &  schraefel,  2004). 

Self-reports  on  specific  evaluations.  Instead  of  asking 
the  user  to  describe  his  or  her  interests  explicitly,  some  systems 
try  to  infer  the  user’s  position  on  the  basis  of  his  or  her  explicitly 
evaluative  responses  to  specific  items.  Amazon.com’s  rating
scales  (Fig.  22.1)  illustrate  one  form  that  this  type  of  input  can
take;  other  forms  include  checkboxes  and  icons  (e.g.,  “thumbs-up”
and  “thumbs-down”).  The  items  that  the  user  evaluates  can
be  (a)  items  that  the  user  is  currently  experiencing  directly  (e.g., 
the  current  website);  (b)  actions  that  the  system  has  just  per¬ 
formed,  which  the  user  may  want  to  encourage  or  discourage 
(see,  e.g.,  Wolfman,  Lau,  Domingos,  &  Weld,  2001);  (c)  items
that  the  user  must  judge  on  the  basis  of  a  description  (e.g.,  the 
abstract  of  a  talk;  a  table  listing  the  attributes  of  a  physical  prod¬ 
uct);  or  (d)  the  mere  name  of  an  item  (e.g.,  a  movie)  that  the 
user  may  have  had  some  experience  with  in  the  past  (see,  e.g., 
Fig.  22.1).  The  cognitive  effort  required  depends  in  part  on  how 
directly  available  the  item  is:  In  the  third  and  fourth  cases  just 
listed,  the  user  may  need  to  perform  memory  retrieval  and  in¬ 
ference  in  order  to  arrive  at  an  evaluation. 

Even  when  the  effort  is  minimal,  users  often  do  not  like  to 
bother  with  explicit  evaluations  that  do  not  constitute  a  neces¬ 
sary  part  of  the  task  they  are  performing.  For  this  reason,  many 
designers  try  to  get  by  with  the  type  of  nonexplicit  input  dis¬ 
cussed  in  the  following  section.  For  example,  Findory  (Fig. 
22.14)  could  allow  the  user  to  rate  the  various  news  stories  and 
blogs  presented,  but  instead  it  just  interprets  the  user’s  behavior
in  selecting  items  to  read.

Responses  to  test  items.  In  systems  that  support  learn¬ 
ing,  it  is  often  natural  to  administer  tests  of  knowledge  or  skill. 
In  addition  to  serving  their  normal  educational  functions,  these 
tests  can  yield  valuable  information  for  the  system’s  adaptation 
to  the  user.  An  advantage  of  tests  is  that  they  can  be  con¬ 
structed,  administered,  and  interpreted  with  the  help  of  a  large 
body  of  theory,  methodology,  and  practical  experience  (e.g.,
Wainer,  2000).

Outside  of  a  learning  context,  users  are  likely  to  hesitate  to 
invest  time  in  tests  of  knowledge  or  skill,  unless  these  can  be 
presented  in  an  enjoyable  form  (e.g.,  the  color  discrimination 
test  used  by  Gutkauf,  Thies,  &  Domik,  1997,  to  identify  perceptual
limitations  relevant  to  the  automatic  generation  of  graphs).
Trewin  (2004,  p.  76)  reported  experience  with  a  brief  typing  test 
that  was  designed  to  identify  helpful  keyboard  adaptations: 
Some  users  who  turned  out  to  require  no  adaptations  were  disappointed
that  their  investment  in  the  test  had  yielded  no  benefit.
As  a  result,  Trewin  decided  that  adaptations  should  be
based  on  the  users’  naturally  occurring  typing  behavior.

Nonexplicit  Input 

The  previous  subsection  has  given  some  examples  of  why  designers
often  look  for  ways  of  obtaining  information  about  the
user  that  do  not  require  any  explicit  input  by  the  user.

Naturally  Occurring  Actions 

The  broadest  and  most  important  category  of  information  of 
this  type  includes  all  of  the  actions  that  the  user  performs  with 
the  system  that  do  not  have  the  express  purpose  of  revealing  in¬ 
formation  about  the  user  to  the  system.  These  actions  may 
range  from  major  actions  like  purchasing  an  expensive  product 
to  minor  ones  like  scrolling  down  a  web  page.  The  more  signif¬ 
icant  actions  tend  to  be  specific  to  the  particular  type  of  system 
that  is  involved  (e.g.,  e-commerce  websites  vs.  learning  environments).
Within  some  domains,  there  has  been  considerable
research  on  ways  of  interpreting  particular  types  of  naturally 
occurring  user  actions.  For  example,  researchers  interested  in 
adaptive  hypertext  navigation  support  have  developed  a  vari¬ 
ety  of  ways  of  analyzing  the  user’s  navigation  actions  to  infer  the 
user’s  interests  and  to  propose  navigation  shortcuts  (e.g., 
Mobasher,  2007). 

In  their  purest  forms,  naturally  occurring  actions  require  no 
additional  investment  by  the  user,  because  they  are  actions  that 
the  user  would  perform  anyway.  The  main  limitation  is  that  they 
are  hard  to  interpret;  for  example,  the  fact  that  a  given  web  page 
has  been  displayed  in  the  user’s  browser  for  four  minutes  does 
not  reveal  with  certainty  which  (if  any)  of  the  text  displayed  on 
that  page  the  user  has  actually  read.  Some  designers  have  tried 
to  deal  with  this  trade-off  by  designing  the  user  interface  in  such 
a  way  that  the  naturally  occurring  actions  are  especially  easy  to 
interpret.  For  example,  a  web-based  system  might  display  just 
one  news  story  on  each  page,  even  if  displaying  several  stories 
on  each  page  would  normally  be  more  desirable. 
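The interpretability argument can be made concrete with a small sketch. It assumes, purely for illustration, that a system credits a page's display time evenly to every story shown on that page; the function name and the even split are hypothetical and are not taken from any system discussed in this chapter.

```python
def attribute_dwell_time(stories_on_page, dwell_seconds):
    """Credit a page's display time evenly to the stories shown on it.

    The even split is an illustrative assumption: the system cannot tell
    which of the displayed stories (if any) the user actually read.
    """
    if not stories_on_page:
        return {}
    share = dwell_seconds / len(stories_on_page)
    return {story: share for story in stories_on_page}


# One story per page: all four minutes are evidence about that story.
single = attribute_dwell_time(["election"], 240)

# Four stories per page: the same four minutes are spread thinly.
multi = attribute_dwell_time(["election", "sports", "weather", "markets"], 240)
```

With one story per page, the whole display time counts as evidence about a single item; with four stories per page, the same observation supports only a weak inference about each of them.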

The  interpretation  of  naturally  occurring  actions  by  the  sys¬ 
tem  can  raise  privacy  and  comprehensibility  issues  (cf.  Fig. 
22.23)  that  do  not  arise  in  the  same  way  with  explicit  self-reports 
and  self-assessments  of  the  types  discussed  earlier  in  this  section:
Whereas  the  latter  way  of  obtaining  information  about  the
user  can  be  compared  with  interviewing,  the  former  way  is 
more  like  eavesdropping — unless  the  user  is  informed  about 
the  nature  of  the  data  that  are  being  collected  and  the  ways  in 
which  they  will  be  used  (cf.  Cranor,  2004).

Previously  Stored  Information 

Sometimes  a  system  can  access  relevant  information  about  the 
user  that  has  been  acquired  and  stored  independently  of  the 
system’s  interaction  with  the  user: 

1.  If  the  user  has  some  relationship  (e.g.,  patient,  customer)
with  the  organization  that  operates  the  system,  this  organization
may  have  information  about  the  user  that  it  has  stored
for  reasons  unrelated  to  any  adaptation,  such  as  the  user’s
medical  record  (see  Cawsey  et  al.,  2007,  for  examples)  or
address.

2.  Relevant  information  about  the  user  may  be  stored  in  pub¬ 
licly  available  sources  such  as  electronic  directories  or  web 
home  pages.  For  example,  Pazzani  (1999)  explored  the  idea 
of  using  a  user’s  web  home  page  as  a  source  of  information 
for  a  restaurant  recommending  system. 

3.  If  there  is  some  other  system  that  has  already  built  up  a 
model  of  the  user,  the  system  may  be  able  to  access  the  re¬ 
sults  of  that  modeling  effort  and  try  to  apply  them  to  its  own 
modeling  task.  There  is  a  line  of  research  that  deals  with  user 
modeling  servers  (e.g.,  Kobsa,  2007):  systems  that  store  in¬ 
formation  about  users  centrally  and  supply  such  information 
to  a  number  of  different  applications.  Some  of  the  major  com¬ 
mercial  personalization  software  is  based  on  this  conception. 
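The idea of a user modeling server mentioned in the third item can be sketched as a central store that several applications write to and read from. The class below is a minimal illustration under stated assumptions: the class name, method names, and property keys are hypothetical and do not describe the interface of any actual user modeling server or commercial product.

```python
from collections import defaultdict


class UserModelServer:
    """Hypothetical central store of user properties shared by applications."""

    def __init__(self):
        # user_id -> {property name -> (value, application that reported it)}
        self._models = defaultdict(dict)

    def report(self, user_id, source_app, prop, value):
        """Record a property that some application has inferred or been told."""
        self._models[user_id][prop] = (value, source_app)

    def query(self, user_id, prop, default=None):
        """Return the stored value, or a default if nothing is known yet."""
        entry = self._models.get(user_id, {}).get(prop)
        return default if entry is None else entry[0]

    def provenance(self, user_id, prop):
        """Return which application supplied the value (None if unknown)."""
        entry = self._models.get(user_id, {}).get(prop)
        return None if entry is None else entry[1]


server = UserModelServer()
# A bookshop application reports an interest it has inferred.
server.report("u1", "bookshop", "interest:cooking", 0.8)
# A restaurant recommender reuses it in its very first interaction with u1.
cooking_interest = server.query("u1", "interest:cooking", default=0.0)
```

Keeping track of which application supplied each value is one simple way to support the comprehensibility and privacy concerns raised in this section, since the origin of a stored assumption can then be shown to the user.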

Relative  to  all  of  the  other  types  of  information  about  users, 
previously  stored  information  has  the  advantage  that  it  can  be 
applied  in  principle  right  from  the  start  of  the  first  interaction  of 
a  given  user  with  a  given  system.  To  be  sure,  the  interpretability
and  usefulness  of  the  information  in  the  context  of  the  current
application  may  be  limited.  Moreover,  questions  concerning
privacy  and  comprehensibility  may  be  even  more  important
than  with  the  interpretation  of  naturally  occurring  actions.

Low-Level  Indices  of  Psychological  States 

The  next  two  categories  of  information  about  the  user  have  be¬ 
come  practically  feasible  only  in  recent  years,  with  advances  in 
the  miniaturization  of  sensing  devices. 

The  first  category  of  sensor-based  information  (discussed 
at  length  in  the  classic  book  of  Picard,  1997)  comprises  data 
that  reflect  aspects  of  a  user’s  psychological  state.  Some  of  the 
application  scenarios  in  which  this  type  of  information  can  be 
useful  were  discussed  in  the  section  on  systems  that  mediate 
interaction  with  the  real  world. 

Two  categories  of  sensing  devices  have  been  employed: 
(a)  devices  attached  to  the  user’s  body  (or  to  the  computing 
device  itself)  that  transmit  physiological  data,  such  as  elec¬ 
tromyogram  signals,  the  galvanic  skin  response,  blood  volume 
pressure,  and  the  pattern  of  respiration  (for  an  overview,  see 
Lisetti  &  Nasoz,  2004);  and  (b)  video  cameras  and  microphones 
that  transmit  psychologically  relevant  information  about  the 
user,  such  as  features  of  his  or  her  facial  expressions  (e.g., 
Bartlett,  Littlewort,  Fasel,  &  Movellan,  2003)  or  his  or  her  speech 
(e.g.,  Liscombe  et  al.,  2005).

With  both  categories  of  sensors,  the  extraction  of  meaningful
features  from  the  low-level  data  stream  requires  the  application 
of  pattern  recognition  techniques.  These  typically  make  use  of 
the  results  of  machine  learning  studies  in  which  the  relation¬ 
ships  between  low-level  data  and  meaningful  features  have  been 
learned. 


One  advantage  of  sensors  is  that  they  supply  a  continuous 
stream  of  data,  the  cost  to  the  user  being  limited  to  the  physical 
and  social  discomfort  that  may  be  associated  with  the  carrying 
or  wearing  of  the  devices.  These  factors  are  still  significant  now, 
but  further  advances  in  miniaturization — and  perhaps  chang¬ 
ing  attitudes  as  well — seem  likely  to  reduce  their  importance. 

Signals  Concerning  the  Current  Surroundings 

As  computing  devices  become  more  portable,  it  is  becoming 
increasingly  important  for  a  user-adaptive  system  to  have  infor¬ 
mation  about  the  user’s  current  surroundings.  Here  again,  two 
broad  categories  of  input  devices  can  be  distinguished  (for  a  dis¬ 
cussion  of  a  number  of  specific  types  of  devices,  see  Kruger, 
Baus,  Heckmann,  Kruppa,  &  Wasinger,  2007). 

1.  Devices  that  receive  explicit  signals  about  the  user’s  sur¬ 
roundings  from  specialized  transmitters.  Some  mobile  sys¬ 
tems  that  are  used  outdoors  employ  GPS  (Global  Positioning 
System)  technology.  More  specialized  transmitters  and  re¬ 
ceivers  are  required,  for  example,  if  a  portable  museum 
guide  system  is  to  be  able  to  determine  which  exhibit  the 
user  is  looking  at. 

2.  More  general  sensing  or  input  devices.  For  example,  Schiele, 
Starner,  Rhodes,  Clarkson,  and  Pentland  (2000)  described 
the  use  of  a  miniature  video  camera  and  microphone  (each 
roughly  the  size  of  a  coin)  that  enable  a  wearable  computer 
to  discriminate  among  different  types  of  surroundings  (e.g., 
a  supermarket  vs.  a  street).  The  use  of  general-purpose  sen¬ 
sors  eliminates  the  dependence  on  specialized  transmitters. 
On  the  other  hand,  the  interpretation  of  the  signals  requires 
the  use  of  sophisticated  machine  learning  and  pattern  recog¬ 
nition  techniques. 


SPECIAL  CONSIDERATIONS  CONCERNING 
EMPIRICAL  METHODS 


The  full  repertoire  of  empirical  methods  in  HCI  is  in  principle 
applicable  to  user-adaptive  systems.  This  section  will  focus  on 
some  methods  that  are  more  important  for  user-adaptive  sys¬ 
tems  than  for  other  types  and  on  some  typical  problems  that 
need  to  be  dealt  with.  This  focused  discussion  should  not  ob¬ 
scure  the  fact  that  a  lot  of  empirical  work  with  user-adaptive  sys¬ 
tems  looks  the  same  as  with  other  systems. 

Use  of  Data  Collected  With  a  Nonadaptive  System 

The  key  difference  between  user-adaptive  systems  and  other 
interactive  systems  is  the  inclusion  of  some  method  for  acquir¬ 
ing  and  exploiting  a  user  model.  This  feature  gives  rise  to  a  type 
of  empirical  study  that  is  largely  unique  to  user-adaptive
systems:  studies  in  which  the  accuracy  of  the  modeling  methods
is  evaluated.

Footnote:  More  extended  discussions  of  empirical  methods  for  user-adaptive  systems  are  provided  by  Gena  and  Weibelzahl  (2007),  Hook  (2000),  and  Langley
and  Fehling  (1998).

This  type  of  evaluation  can  often  be  performed  even  if  no 
user-adaptive  systems  that  employ  the  user  modeling  method  in 
question  exist.  What  is  needed  are  (a)  some  implementation  of 
the  adaptation  algorithm,  not  necessarily  embedded  in  any  in¬ 
teractive  system,  and  (b)  a  database  of  behavioral  data  on  how 
a  number  of  users  have  used  a  relevant  nonadaptive  system. 
The  researcher  can  then  apply  the  modeling  method  to  the  data 
in  order  to  determine  how  well  the  system  would  adapt  to  the 
users  in  question. 

A  number  of  studies  of  this  type  were  conducted  with  the 
i-ems  system  (see  McCreath  et  al.,  2006,  and  the  discussion  ear¬ 
lier  in  this  chapter).  In  one  case,  the  researchers  wanted  to  find 
out  whether  a  user  who  had  defined  a  number  of  handcrafted 
e-mail  sorting  rules  could  benefit  from  having  automatically 
learned  rules  applied  to  messages  that  were  not  covered  by  the 
handcrafted  rules.  One  simulation  was  performed  on  5,100 
e-mail  messages  that  had  been  sorted  by  a  single  user  within  a 
nonadaptive  e-mail  client  over  three  months.  In  the  order  in 
which  the  messages  had  been  received,  they  were  presented 
to  one  of  the  system’s  learning  algorithms  in  batches  of  100, 
along  with  information  about  how  the  user  had  sorted  them, 
so  that  the  system  could  continually  refine  its  set  of  learned 
rules.  After  each  batch  of  100  messages,  the  accuracy  of  the  up¬ 
dated  set  of  rules  was  evaluated:  The  system  was  asked  to  make 
predictions  about  the  next  batch  of  messages  before  being  told 
how  the  user  had  in  fact  sorted  them,  and  these  predictions 
were  compared  with  the  user’s  actual  behavior.  Several  indices 
of  the  system’s  performance  were  computed,  one  of  which  is 
shown  in  Fig.  22.26:  the  percentage  of  messages  for  which  the 
learned  rules  made  no  prediction  (e.g.,  where  the  appropriate 
folder  was  unknown  to  the  system).  The  middle  curve  in  the 
graph  shows  how  this  percentage  gradually  decreased  over 
time.  The  uppermost  curve  shows  the  corresponding  results  for 
the  case  where  only  the  user’s  handcrafted  rules  were  used  for 
prediction;  the  lowest  curve  shows  the  results  where  both  types 
of  rules  were  applied,  with  the  handcrafted  rules  taking  precedence
in  the  case  of  disagreement.  It  can  be  seen  that  the  joint
use  of  both  sets  of  rules  gave  the  best  results  in  terms  of  avoiding
“unknown”  predictions.  Since  according  to  other  indices  the
accuracy  of  the  combined  set  of  rules  was  as  good  as  that  of  the
handcrafted  rules  alone,  the  system’s  performance  was  best
overall  with  the  combined  set.

FIGURE  22.26.  Results  of  one  of  the  simulation  experiments
performed  with  i-ems's  algorithms.  (Explanation  in  text.  Figure
courtesy  of  Eric  McCreath.)

Note  that  it  was  not  necessary,  for  this  evaluation,  to  create 
three  different  versions  of  i-ems  and  have  them  used  for  months 
by  different  users.  In  addition  to  being  extremely  time-consuming, 
this  procedure  would  allow  less  direct  accuracy  comparisons.  By 
contrast,  using  existing  e-mail  corpora,  McCreath  et  al.  (2006) 
performed  simulations  that  shed  light  on  many  properties  of 
the  algorithms  used. 
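The batch-wise simulation procedure described above can be sketched in a few lines. This is a minimal illustration, not the i-ems algorithms: the learner is a deliberately simple stand-in (it predicts the folder most often used so far for a given sender), messages are reduced to hypothetical (sender, folder) pairs, and the function and variable names are assumptions made for this example. What the sketch does reproduce is the evaluation protocol: predict each batch before learning from it, and report the percentage of "unknown" predictions for learned rules, handcrafted rules, and both combined with handcrafted rules taking precedence.

```python
from collections import Counter, defaultdict


def simulate(messages, handcrafted, batch_size=100):
    """Replay logged (sender, folder) pairs in the order they were received.

    Returns, per batch, the percentage of messages for which each rule set
    made no prediction. Predictions are made before the learner sees how
    the user actually sorted the batch.
    """
    counts = defaultdict(Counter)  # sender -> folders the user chose so far
    results = []
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        unknown = {"learned": 0, "handcrafted": 0, "combined": 0}
        for sender, _actual_folder in batch:
            seen = counts[sender]
            learned = seen.most_common(1)[0][0] if seen else None
            manual = handcrafted.get(sender)
            # Handcrafted rules take precedence when both make a prediction.
            combined = manual if manual is not None else learned
            unknown["learned"] += learned is None
            unknown["handcrafted"] += manual is None
            unknown["combined"] += combined is None
        results.append({k: 100.0 * v / len(batch) for k, v in unknown.items()})
        # Only now is the learner told how the user sorted this batch.
        for sender, folder in batch:
            counts[sender][folder] += 1
    return results


log = [("alice", "work"), ("bob", "friends"), ("alice", "work"),
       ("carol", "news"), ("bob", "friends"), ("carol", "news")]
rates = simulate(log, handcrafted={"alice": "work"}, batch_size=2)
```

In this toy log, the learned rules start with no predictions at all and improve as batches accumulate, while the combined rule set never leaves more messages unpredicted than either rule set alone. A fuller simulation would also compare each prediction with the folder the user actually chose, in order to compute accuracy indices.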

The  appeal  of  this  type  of  evaluation,  in  terms  of  being  able 
to  yield  numerous  interpretable  results  with  minimal  involvement 
of  actual  users,  is  so  great  that  researchers  often  seem  to  lose 
sight  of  the  fact  that  studies  with  real  users  are  likewise  essential. 
No  simulation  study,  for  example,  can  reveal  how  well  the  design 
of  the  i-ems  interface  shown  in  Fig.  22.4  corresponds  to  the  way 
users  like  to  deal  with  incoming  e-mail,  or  how  helpful  users  find 
the  explanations  that  the  system  offers  for  its  predictions. 

Early  Studies  of  Usage  Scenarios  and  User  Requirements 

In  the  field  of  HCI  as  a  whole,  it  is  expected  that  user-centered 
design  should  begin  with  a  study  of  contexts  of  use,  usage  sce¬ 
narios,  properties  of  users,  and  user  requirements.  To  date,  this 
strategy  has  been  applied  less  frequently  in  the  design  of  user- 
adaptive  systems — perhaps  because  the  designers  less  fre¬ 
quently  come  from  an  HCI  background,  often  specializing  in¬ 
stead  in  the  development  of  the  necessary  adaptive  algorithms. 
Early  user  studies  are  actually  at  least  as  important  with  novel 
user-adaptive  systems  as  with  other  types  of  system:  It  is  often 
not  clear  in  advance  whether  adaptation  will  yield  added  value 
and  achieve  acceptance  in  a  particular  context.  Careful  attention 
to  the  requirements  and  contexts  of  users  may  greatly  increase 
the  likelihood  of  success,  or  at  least  warn  the  designers  at
an  early  stage  if  a  particular  usage  scenario  is  not  promising
for  the  sort  of  adaptive  interaction  that  they  envision. 

A  positive  example  of  early  attention  to  user  requirements  is 
found  in  the  development  of  the  museum  guide  HyperAudio, 
which  was  developed  as  a  prototype  in  the  1990s  (for  a  retro¬ 
spective  discussion,  see  Petrelli  &  Not,  2005).  Studying  the  atti¬ 
tudes  and  behavior  of  museum  visitors  at  an  early  stage  in  the 
system’s  design,  the  researchers  found  that  many  visitors  enjoy 
guided  tours  but  that  few  visitors  want  to  spend  time  interact¬ 
ing  with  technical  devices.  These  two  findings,  along  with  others, 
led  to  a  modification  of  the  original  conception  of  HyperAudio, 
in  that  they  suggested  the  appropriateness  of  a  museum  guide 
that  selects  information  for  presentation  on  the  basis  of  the 
user’s  behavior  and  location,  requiring  little  or  no  explicit  input. 

Wizard  of  Oz  Studies 

Systems  that  adapt  to  their  users  are  in  one  methodological  re¬ 
spect  similar  to  systems  that  make  use  of  speech:  They  attempt 
to  realize  a  capability  that  is  so  far,  at  least  in  many  contexts,
possessed  to  the  highest  degree  by  humans.  Consequently,  as
with  speech  interfaces,  valuable  information  can  sometimes  be 
obtained  from  a  Wizard  of  Oz  study:  In  a  specially  created  set¬ 
ting,  a  human  takes  over  a  part  of  the  processing  of  the  to-be- 
developed  system  for  which  humans  are  especially  well  suited. 

One  example  is  the  Wizard  of  Oz  study  that  was  conducted 
early  in  the  development  of  the  Lumiere  intelligent  help  sys¬ 
tem  (Horvitz,  Breese,  Heckerman,  Hovel,  &  Rommelse,  1998), 
which  formed  the  basis  for  the  Office  Assistant,  which  was  in¬ 
troduced  in  Microsoft  Office  97.  In  this  study,  subjects  working 
with  a  spreadsheet  were  told  that  an  experimental  help  system 
would  track  their  activities  and  make  guesses  about  how  to  help 
them.  They  received  the  advice  via  computer  monitors.  The  ad¬ 
vice  was  actually  provided  by  usability  experts  who,  working  in 
a  separate  room,  viewed  the  subjects’  activity  via  monitors  and 
conveyed  their  advice  by  typing. 

This  type  of  study  can  yield  an  upper-bound  estimate  of  the 
highest  level  of  modeling  accuracy  that  might  be  attainable 
given  the  available  information — as  long  as  one  can  assume  that 
the  human  wizards  are  more  competent  at  the  type  of  assess¬ 
ment  in  question  than  a  fully  automatic  system  is  likely  to  be  in 
the  foreseeable  future.  In  this  example  study,  the  expert  advisers 
showed  some  ability  to  identify  the  users’  goals  and  needs;  if 
they  had  not  done  so,  perhaps  the  entire  project  would  have 
been  reconsidered. 

Aside  from  accuracy,  a  Wizard  of  Oz  study  can  also  shed  light 
on  problems  with  the  acceptability  and  usability  of  the  new  sys¬ 
tem,  as  long  as  they  concern  the  content  and  basic  nature  of  the 
adaptations  performed,  as  opposed  to  interface  details  that  are 
not  faithfully  reproduced  in  the  study.  In  our  example  study,  it 
turned  out  that  even  incorrect  advice  was  often  taken  seriously  by 
the  users,  who  wasted  time  following  up  on  irrelevant  sugges¬ 
tions.  One  design  implication  is  that  users  should  be  made  aware 
of  the  fact  that  the  system’s  advice  is  not  necessarily  relevant. 

Comparisons  With  the  Work  of  Human  Designers 

Just  as  humans  can  sometimes  be  employed  as  a  surrogate  for 
a  user-adaptive  system  in  the  early  stages  of  design,  humans  can 
sometimes  also  serve  as  a  standard  of  comparison  for  the  eval¬ 
uation  of  an  implemented  system.  This  method  makes  most 
sense  when  the  system  is  performing  a  task  at  which  human 
authors  or  designers  are  likely  to  be  experienced  and  skilled, 
such  as  the  tailoring  of  the  content  of  a  presentation  to  the  in¬ 
dividual  user.  An  example  of  such  a  study  was  mentioned  ear¬ 
lier:  a  comparison  of  the  displays  generated  by  the  real  estate 
recommender  system  RIA  with  those  generated  by  two  experi¬ 
enced  designers.  Instead  of  performing  the  same  task  as  the  sys¬ 
tem,  the  human  designers  may  simply  act  as  judges  of  the  appro¬ 
priateness  of  the  system’s  output.  In  either  case,  the  comments 
of  the  designers  can  yield  valuable  qualitative  information  to 
complement  the  objective  results. 

Experimental  Comparisons  of  Adaptive 
and  Nonadaptive  Systems 

Many  experimental  studies  involving  user-adaptive  systems 
compare  an  adaptive  variant  with  some  nonadaptive  one.  This
strategy  is  understandable  given  the  doubt  that  often  exists  as  to
whether  the  additional  overhead  required  for  user-adaptivity  is
justified  by  any  sort  of  improvement  in  the  interaction.  However,
studies  like  this  are  trickier  to  conduct  and  interpret  than
they  may  seem  at  first  glance.

FIGURE  22.27.  The  adaptive  (left)  and  adaptable  (right)  split
menus  used  in  an  experimental  comparison.  (Adapted  from  Figure
1  of  “A  comparison  of  static,  adaptive,  and  adaptable  menus,”
by  L.  Findlater  &  J.  McGrenere,  2004,  in  E.  Dykstra-Erickson  &
M.  Tscheligi  (Eds.),  Human  factors  in  computing  systems:  CHI  2004
conference  proceedings,  pp.  89-96,  New  York:  ACM,  Copyright  2004
by  the  Association  for  Computing  Machinery,  Inc.  Adapted  with
permission.)

As  an  example,  consider  the  experiment  by  Findlater  and 
McGrenere  (2004),  which  examined  whether  subjects  could 
work  faster  using  (a)  an  adaptive  menu  system  somewhat  like 
Smart  Menus  (cf.  the  discussion  of  adaptive  menus  above);
(b)  an  adaptable  system  in  which  the  users  explicitly  deter¬ 
mined  the  content  of  the  menus  themselves;  or  (c)  a  conven¬ 
tional  static  menu  system.  For  each  of  these  three  types  of 
menu,  a  realization  was  chosen  that  seemed  optimal  in  the 
context  of  the  experiment.  All  three  menus  were  a  special  type 
of  split  menu  (Sears  &  Shneiderman,  1994):  The  four  most  fre¬ 
quent  items  in  each  menu  were  placed  in  a  special  section  at 
the  top  of  the  menu  for  quick  access  (instead  of  being  tem¬ 
porarily  hidden,  as  in  Smart  Menus).  The  items  selected  for 
the  upper  part  of  the  static  menu  were  optimally  chosen  in 
that  they  reflected  the  actual  frequency  of  the  items  in  the  ex¬ 
perimental  tasks.  The  adaptive  menu  was  initially  identical  to 
the  static  menu,  but  the  arrangement  of  the  items  changed  as 
a  function  of  the  user’s  behavior,  favoring  the  most  frequently 
and  recently  used  menu  items.  For  the  adaptable  menu,  the 
upper  part  was  initially  empty,  so  that  users  would  be  encour¬ 
aged  to  perform  some  adaptations.  Though  the  overall  pattern 
of  results  is  complex,  it  tends  to  speak  in  favor  of  the  adaptable 
menu.  Note  that  the  conditions  did  not  give  the  adaptive  vari¬ 
ant  much  of  a  chance  to  provide  any  benefit.  Since  the  initial 
menu  was  already  the  best  possible  single  menu  for  the  ex¬ 
perimental  tasks,  adaptation  could  improve  performance  only 
by  taking  advantage  of  any  local  concentrations  of  commands 
within  a  particular  period  of  time  (e.g.,  the  need  to  execute 
the  same  command  several  times  in  succession).  By  contrast, 
in  normal  usage  situations,  an  adaptive  menu  can  also  improve
performance  by  increasingly  reflecting  the  user’s  longer-term
patterns  of  use.
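The behavior of an adaptive split menu of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used by Findlater and McGrenere (2004), whose details are not given here: the class name, the promotion rule (a selected item replaces the weakest item in the top section, where "weakest" means lowest use count, with ties broken by least recent use), and the item names are all hypothetical.

```python
class AdaptiveSplitMenu:
    """Split menu whose top section favors frequently and recently used items."""

    def __init__(self, items, initial_top, top_size=4):
        self.items = list(items)
        self.top = list(initial_top)[:top_size]  # e.g., the static-menu choice
        self.counts = {item: 0 for item in self.items}
        self.last_used = {item: -1 for item in self.items}
        self.clock = 0

    def _strength(self, item):
        # Frequency first, recency as the tie-breaker.
        return (self.counts[item], self.last_used[item])

    def select(self, item):
        """Record a selection and promote the item if it outranks a top item."""
        self.clock += 1
        self.counts[item] += 1
        self.last_used[item] = self.clock
        if item not in self.top:
            weakest = min(self.top, key=self._strength)
            if self._strength(item) > self._strength(weakest):
                self.top[self.top.index(weakest)] = item


menu = AdaptiveSplitMenu(
    items=["A", "B", "C", "D", "E", "F"],
    initial_top=["A", "B", "C", "D"],
)
for choice in ["E", "B", "F"]:
    menu.select(choice)
top_section = menu.top
```

Because the top section starts out as the best single menu for the experimental tasks, a rule like this can only help when selections cluster locally in time, which is the limitation noted above; over longer periods of normal use, the accumulated counts would come to reflect the user's longer-term patterns.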

The  difficulties  in  interpreting  the  results  of  this  experiment 
could  not  easily  have  been  avoided  with  a  different  design:  Any 
other  way  of  realizing  the  three  conditions  would  have  left  some 
different  set  of  questions  open.  The  lesson  of  this  and  many 
other  examples  is  that  comparisons  between  adaptive  and  nonadaptive
variants  of  a  system  should  not  be  viewed  as  empirical
tests  whose  results  can  be  interpreted  straightforwardly.  Instead, 
they  should  be  seen  as  shedding  light  on  various  aspects  of  the 
ways  in  which  people  use  adaptive  and  nonadaptive  systems  and 
on  the  effectiveness  of  these  methods  in  certain  conditions. 

Taking  Into  Account  Individual  Differences 

Individual  differences  among  users  show  up  in  just  about  every 
user  study  in  the  field  of  HCI;  but  with  user-adaptive  systems, 
they  are  especially  important  because  of  the  wide  range  of  subjective
reactions  that  user-adaptivity  tends  to  evoke  (illustrated
in  the  experiment  discussed  in  the  previous  subsection).  As  a  re¬ 
sult,  asking  whether  people  like  a  particular  type  of  user-adaptive 
system  is  in  many  cases  like  asking  whether  the  voters  in  a  given 
country  prefer  progressive  or  conservative  policies.  Even  if,  in 
a  given  sample,  a  statistically  significant  tendency  in  one  direc¬ 
tion  or  the  other  can  be  found,  important  minority  points  of 
view  should  be  understood  and  reported.  As  in  politics,  the  goal 
should  be  to  take  into  account  the  range  of  different  prefer¬ 
ences  in  a  way  that  is  satisfactory  to  at  least  a  large  proportion  of 
the  potential  user  group. 

When  individual  differences  are  present,  it  may  be  tempting 
to  try  to  find  correlations  with  demographic  characteristics  or 
with  general  personality  variables.  Although  some  relationships 
of  this  sort  can  be  found  (see,  for  example,  Graziola,  Pianesi, 
Zancanaro,  &  Goren-Bar,  2005,  with  regard  to  personality  vari¬ 
ables),  it  is  not  always  worthwhile  to  look  for  them.  The  rela¬ 
tionships  tend  to  be  weak,  since  different  responses  can  also 
be  due  to  more  specific  causes,  such  as  degree  of  familiarity 
with  the  particular  type  of  system  and  particular  conditions  un¬ 
der  which  a  user  performs  a  task. 

Checking  Usability  Under  Realistic  Conditions 

With  just  about  any  type  of  interactive  system,  new  lessons  are 
likely  to  be  learned  when  a  working  prototype  (or  the  finished 
system  itself)  is  tested  in  realistic  situations,  even  if  the  sys¬ 
tem  has  been  studied  thoroughly  in  earlier  stages.  With  user- 
adaptive  systems,  realistic  testing  is  especially  advisable  because 
of  the  issues  discussed  in  the  section  on  usability  challenges, 
whose  importance  for  a  given  system  can  often  be  assessed  only 
in  real  use.  For  example,  an  obtrusive  proactive  recommender 
might  be  considered  quite  acceptable,  or  even  amusing,  when 
the  user  is  performing  some  artificial  assigned  task  in  a  labora¬ 
tory;  in  the  real  world,  when  he  or  she  is  under  pressure  to 
complete  an  important  task  quickly,  the  interruptions  may  be 
evaluated  quite  differently.  Similarly,  privacy  issues  are  serious 
mainly  when  real  data  about  the  user  are  involved  and  when 
misuse  could  have  real  consequences. 


THE  FUTURE  OF  USER-ADAPTIVE  SYSTEMS 


This  chapter  has  shown  that  adaptive  interfaces,  agents,  and 
other  user-adaptive  systems  do  not  represent  a  smooth  and  easy 
shortcut  to  more  successful  HCI:  They  present  a  complex  set 
of  usability  challenges,  and  they  require  carefully  designed 
methods  of  acquiring  information  about  users,  as  well  as  rela¬ 
tively  sophisticated  computational  techniques  that  are  not 
needed  in  other  types  of  interactive  system.  Even  when  all  of 
these  requirements  have  been  dealt  with,  it  is  often  tricky  to 
prove  empirically  that  user-adaptivity  has  actually  added  any 
value.  It  is  no  wonder  that  some  experts  believe  that  the  inter¬ 
ests  of  computer  users  are  better  served  by  continued  progress 
within  more  familiar  paradigms  of  user-centered  system  design. 

On  the  other  hand,  our  understanding  of  the  complex  chal¬ 
lenges  raised  by  user-adaptive  systems  has  been  growing  steadily, 
and  they  are  now  familiar  and  valued  elements  in  a  number  of 
types  of  system,  as  the  survey  in  the  first  two  major  sections  of 
this  chapter  showed. 

Growing  Need  for  User-Adaptivity 

Increases  in  the  following  variables  suggest  that  the  functions 
served  by  user-adaptivity  will  continue  to  grow  in  importance: 

Diversity  of  users  and  contexts  of  use.  As  several 
chapters  in  this  handbook  make  clear,  computing  devices  are 
being  used  by  an  ever-increasing  variety  of  users  in  an  increasing 
variety  of  contexts.  It  is  therefore  becoming  harder  to  design  a 
system  that  will  be  suitable  for  all  users  and  contexts  without 
some  sort  of  user-adaptivity  or  user-controlled  adaptability;  and 
as has been discussed at several points in this chapter,
user-controlled adaptability has its limitations.

Number and complexity of interactive systems. The
functions of user-adaptivity discussed in the first major section
of  this  chapter  involve  helping  users  to  deal  effectively  with  in¬ 
teractive  systems  and  tasks  even  when  they  are  not  able  or  will¬ 
ing  to  gain  complete  understanding  and  control  in  each  indi¬ 
vidual  case.  This  goal  becomes  increasingly  important  as  the 
number — and  in  some  cases  the  complexity — of  the  systems 
that  people  must  deal  with  continues  to  increase — because  of 
factors  ranging  from  the  growth  of  the  World  Wide  Web  to  the 
proliferation  of  miniature  interactive  computing  devices. 

Scope  of  information  to  be  dealt  with.  Even  when  us¬ 
ing  a  single,  relatively  simple  system,  users  today  can  often  ac¬ 
cess  a  much  larger  and  more  diverse  set  of  objects  of  interest 
than  they  could  a  few  years  ago — be  they  documents,  products, 
or  potential  collaborators.  It  is  therefore  becoming  relatively 
more  attractive  to  delegate  some  of  the  work  of  dealing  with 
these  objects — even  to  a  system  that  has  an  imperfect  model 
of  the  user’s  requirements.  In  the  early  1990s,  the  idea  that  an 
e-mail  sorting  agent,  such  as  the  one  described  by  Maes  (1994), 
might  delete  an  incoming  message  without  consulting  the  user 
seemed  preposterous  to  many  people.  After  the  huge  increase 
in  the  amount  of  (largely  unwanted)  e-mail  that  has  occurred 
since then, many people now regularly allow dozens of their
incoming messages to be deleted unseen.

Increasing  Feasibility  of  Successful  Adaptation 

As  the  need  for  user-adaptivity  increases,  fortunately  so  does 
its  feasibility,  largely  because  of  advances  in  the  following  areas: 

Ways  of  acquiring  information  about  users.  Most  of 
the  methods  discussed  in  the  section  about  acquiring  informa¬ 
tion  about  users  are  becoming  more  powerful  with  advances  in 
technology  and  research.  They  therefore  offer  the  prospect  of 
substantial  increases  in  the  quality  of  adaptation — although 
methods  for  ensuring  users’  privacy  call  for  equal  attention. 

Advances  in  techniques  for  learning,  inference,  and 
decision.  In  addition  to  the  more  general  progress  in  the 
fields of machine learning and artificial intelligence, communities
of researchers have been focusing on the specific requirements
of computational techniques that support user-adaptivity.
Consequently,  noticeable  progress  is  being  made  every  year. 

Attention  to  empirical  methods.  The  special  empirical 
issues and methods that are involved in the design and evaluation
of user-adaptive systems have been receiving increasing attention
from researchers, as emphasis has shifted from high
technical sophistication to ensuring that the systems enhance
the  users’  experience. 

Despite  these  tendencies,  it  is  actually  unlikely  that  the  num¬ 
ber  of  deployed  systems  associated  with  labels  like  “user-adap¬ 
tive”  will  increase.  Once  an  adaptation  technique  has  left  the 
research  laboratory  and  started  playing  some  genuinely  useful 
role  in  people’s  lives,  it  tends  to  be  described  in  terms  of  the 
function  that  it  serves  rather  than  in  terms  of  the  techniques 
that  it  uses.  Awareness  of  the  commonalities  discussed  in  this 
chapter  should  help  both  to  increase  the  number  of  systems 
that  succeed  in  this  way  and  to  recognize  them  despite  the  new 
labels  that  are  placed  on  them. 


THE MOBILE EXPERIENCE


Interacting  With  Networked  Ecosystems 

The  parallel  evolution  of  network  technologies  and  social  be¬ 
haviors  is  transforming  the  character  of  interactive  systems. 

The conventional model of the individual working in front of
a personal computer is challenged by innovative ways of accessing
content and communication: networked mobile wireless devices
are growing in functionality, handheld interfaces are becoming
richer, mobile distributed delivery of content will be fully
integrated with conventional media, and new social forms of
communication are emerging.

The mobile component of this system of interactions, centered
on the “device formerly known as the cell phone,” has
been among the most impressive techno-cultural phenomena
of the past 10 years, and its evolution is still drastically changing
the way human interaction is conceived. Mobility is only one
aspect of this revolution; mobile devices are also much more
closely tied to the individuals who use them, and constantly
follow them through both their private and work lives. Thus, the
interaction with these tools is characterized by a combination of
functional and emotional aspects.

Present  interaction  models,  even  the  ones  developed  re¬ 
specting  the  rules  of  usability,  often  lack  fundamental  emotional 
qualities  that  are  needed  to  support  and  increase  existing  social 
behaviors  and  rituals. 

The  paradigm  of  pervasive  and  ubiquitous  networking,  in¬ 
stead  of  gravitating  around  the  personal  computer,  is  taking  the 
shape  of  an  ecosystem  of  mobile  and  static  platforms,  for  indi¬ 
vidual  and  collective  use,  all  connected  to  the  network,  and  their 
coordination  is  taking  the  shape  of  a  choreography  of  interactions 
to  provide  access  to  integrated  content  and  communication. 

“Convergence”  seems  finally  to  take  shape,  centered  on 
three  platforms  that  dominate  this  experience  ecosystem: 

•  Virtually  all  personal  computers  are  on  the  net,  and  most  of 
them  can  access  the  net  via  mobile  wireless  technologies. 

•  Two billion cell phones, most of them capable of both voice
and net connectivity, are in the hands of people of all cultures
and statuses.

•  The predictable transition of cable TV to IP-networked TV is
about to transform the paradigm of interacting with video,
movies, and TV content. Around the TV, other distributed
devices (the evolution of the “device formerly known as the
remote control”) will increase mobility within the home.

•  Embedded  networking  capabilities  are  growing  in  other 
“appliances,”  such  as  cars,  or  in  any  “thing”  (via  the  RFID 
technology),  and  in  active  environments. 

•  All  of  the  above  is  going  to  be  connected  with  a  multiplicity  of 
technologies;  the  integration  of  wired  and  wireless,  and  the 
integration  of  different  technologies  within  each  one  of  these 
domains  (e.g.,  cellular  networks  talking  to  local  wireless  net¬ 
works  at  different  scales,  from  Bluetooth  to  wifi  to  RFID), 
transforms  the  picture  of  a  network  into  a  flexible,  capable, 
seamless,  “opportunistic”  infrastructure  that  can  take  advan¬ 
tage  of  any  available  connectivity. 


Similar  social  and  cultural  disruptions  are  happening  in  par¬ 
allel  with  this  fully  developed  technology  disruption: 

•  Formats and genres of communication integrate and hybridize;

•  Content  becomes  deconstructed  and  pervasive; 

•  Based  on  the  intersection  of  the  two  aspects  above,  new 
types  of  communities  are  emerging,  and  new  social  models 
growing  around  the  generation  and  distribution  of  content 
explore  the  space  between  one-to-one  communication  (the 
evolution  of  the  telephone)  and  broadcast  (the  evolution  of 
mass media).

Devices of different types become multiple points of access to
the network, and the Net is much more embedded in
everyday reality, both because it is more integrated with the spaces
and artifacts of everyday life, and because it influences more and
more human activities and social systems.

Content,  Communication,  Control 

The original interaction with personal computers focused on
doing (writing a document, making calculations on a spreadsheet,
and so on), and its physical context was a single individual
user sitting in front of his computer, isolated from the
surrounding physical space. Today, the merging of computer
technologies with telecommunication technologies and wireless
network access calls for a completely different paradigm:
computers, mobile devices, and networks are not only a complex set
of tools to “do” things, to perform tasks. Instead, they are above
all gates to communication media that connect people, hybrid
physical and digital collaborative spaces to meet others, and
tools to access published, dynamic, and ubiquitous content.

The interfaces of these devices are no longer exclusively
mediators between a machine and an individual; much
more often they are interfaces that structure the user’s mental
model of communication, or interfaces that organize and
categorize the vast space of networked content in an accessible way.

Simplicity, the key attribute that in the past allowed easy
access to the functionality of a “machine,” becomes today a much
more sophisticated concept, because “simple” access to the
user’s sphere of communication and content cannot come only
from a functional, efficient organization of the interaction, but is
much more rooted in subjective and cultural aspects, such as
the mental construct of the user’s social universe.

In this chapter we’ll investigate the specifics of interaction
with a multiplicity of mobile networked devices, and we’ll see
how the whole paradigm of networked reality changes the way
we think about HCI. We’ll look at any networked device,
whether mono-functional or multi-functional, such as cell phones,
networked photo-cameras, networked digital-music players,
digital TV, and so on, as points of access to this vast ecosystem of
networked content, communication, and control.

From  Interaction  to  Experience 

In  a  networked  universe  that  has  no  center,  we’ll  use  the  human 
user  as  a  subjective  viewer  of  the  whole  experience  ecosystem. 
Around this hypothetical user, we’ll describe different concentric
“spheres” of interaction. The first “sphere,” closest to the
user, corresponds to what is usually taken into consideration
when designing a computer system: the physical interaction
with a device. The other “spheres,” growing progressively to
include multiple elements of this experience ecosystem, will
include topics related to human-to-human communication,
access to content, or interaction with objects in a space.

When  mobile  devices  are  enriched  by  multiple  functions  and 
applications,  this  whole  ecosystem  is  what  actually  influences 
the  overall  experience  of  the  user.  This  is  the  new  territory  for 
designers  to  act  and  define  the  character  of  a  product,  the  fea¬ 
tures  of  a  service,  or  the  identity  of  a  medium. 


SPHERES  OF  INTERACTION 


Sphere 1: Of Humans and Devices


The first sphere of interaction, the most intimate connection
between a human and a device, is physical manipulation.
The very nature of handheld devices makes this manipulation a
key component of the relationship between user and system.
Although handheld devices like mobile phones have conventional
keypads and a screen that sometimes mimics the navigation in
the GUI (Graphical User Interface) of a PC, the overall experience is
extremely different from using a PC. The manipulation and
relation with the hand, and in some respects with the rest of the
body (such as through placing the device in a bag or in a
pocket), roots these objects in the history of tools that are real
extensions of the hand, such as pens. Also, the handheld
nature of mobile devices highlights social aspects, such as the
proxemic role of the object: handing over a phone to show
a picture to a friend, for example. In this context, rich and
sophisticated socio-cultural aspects complement the basic
ergonomic components of the interaction.

Cars  and  Motorcycles 

If you talk to a passionate biker, you’ll most often hear the
emotional aspects of riding a motorcycle contrasted with
the “dullness” of driving a car; riding a bike is a
richer sensorial experience that involves the whole body. While
one drives a car by “codified” controls, proper interfaces that act
on the machine, riding a bike involves an extension of the body.
Actually, riding a bike is a hybrid between some coded controls
(such as clutch and brakes) and “natural” proprioceptive
controls (such as moving the body while making a turn).

Consequently,  a  car  may  be  perceived  as  “passive,”  while  rid¬ 
ing  a  bike  is  perceived  as  an  active  and  engaging  experience. 
Furthermore,  riding  a  bike  puts  you  in  direct,  immediate  con¬ 
nection  with  the  surrounding  environment.  From  a  certain 
point  of  view,  the  same  can  be  said  of  personal  computers  and 
cell  phones.  Computers’  only  relationship  with  the  body  is  the 
flat  space  of  the  desk,  the  movements  of  fingers  on  the  key¬ 
board  and  the  hand  on  the  mouse.  We  use  computers  while  sit¬ 
ting  alone  at  a  desk,  our  bodies  still,  and  our  minds  isolated 
from  the  surrounding  environment.  Mobile  devices  such  as  cell 
phones  are  handheld,  and  the  relationship  with  the  hands  is  a 
fundamental  character  of  the  interaction.  A  PC  provides  an  ex¬ 
perience  that  atrophies  our  body  to  minuscule  movements  of 
the  fingers  and  the  wrist,  while  handling  a  cell  phone  is  an  ex¬ 
perience  that  may  involve  larger  gestures  and  has  a  richer  rela¬ 
tionship  with  the  body.  We  use  mobile  phones  while  walking, 
sitting,  standing,  and  driving.  We  use  them  in  isolation  or  in  a 
crowd.  The  experience  of  communicating  via  a  mobile  phone 
is  only  partially  immersive;  the  environmental  noise  and  the  dis¬ 
tractions  of  the  environment  balance  this  immersion,  and  pro¬ 
vide  meaningful  context  to  our  activity.  Cell  phones  have  trans¬ 
formed  the  way  we  approach  others  and  the  power  of  the 
context  enriches  the  communication;  rather  than  asking,  “How 
are  you?”  when  we  start  a  conversation,  we  ask,  “Where  are 
you?”  as  if  the  environment  was  an  integral  part  of  the  conver¬ 
sation.  The  limited  size  of  the  device,  which  is  in  some  respects 
a  limit  to  the  interaction  because  the  screen  is  small  and 
crowded,  actually  helps  provide  a  further  element  of  context; 
with  such  a  small  screen,  we’re  forced  to  see  what’s  around  it, 
and  the  interaction  is  always  peripheral,  never  immersive,  and 
always  hybridized  by  the  “real”  experience  of  the  world  around. 

The  Ritual  Power  of  Hands 

Consider  the  way  people  take  pictures  with  a  cell  phone. 
Sometimes  the  camera  phone  is  pointed  toward  the  subject  of 
the  photo  like  a  camera,  and  sometimes  the  phone  is  used  in  a 
very  different  way  to  take  pictures  that  are  much  more  sponta¬ 
neous  and  transient.  To  be  fair,  the  way  camera  phones  are  used 
had  been  anticipated  by  a  phenomenon  grown  around  very 
conventional  film-based  cameras.  In  the  early  1990s,  users  of  a 
camera  called  Lomo  started  to  take  very  spontaneous  pictures 
by  not  aiming  in  the  viewfinder.  The  Lomo  phenomenon  trans¬ 
formed  the  genre  of  photography  with  a  disruption  that  was 
purely  “behavioral”:  a  new  gesture,  such  as  handling  the  camera 
without  bringing  it  near  the  eye,  revolutionized  photography 
and  made  the  images  much  more  “instantaneous”  and  genuine. 
This gave birth to a new photographic genre that was called
Lomography. In Lomography the gesture is the photo. The way
people manipulate camera phones is very similar. In this case,
the instantaneous photo genre is reinforced by the fact that
photos are taken to be shared, not to be kept. While photography
was originally conceived as a way of “conserving” reality by
documenting it, networked photos are transient, not perennial.
And the informality of the gesture of “catching” them is a
fundamental part of both the interaction and the narratives of this
genre of photography. Designing a form that facilitated or
denied this spontaneity would be, for a designer, a way of
confirming or denying this behavior, and as a consequence a way of
influencing the genre of photographs taken with such a device.
In  this  case,  the  whole  form  and  shape  of  the  device,  rather  than 
an  interface  or  a  control,  is  the  fundamental  component  of  the 
interaction.  Better,  the  shape  of  the  device  is  the  interface. 
While this is relatively new in the history of human-computer
interaction, it is not new in the history of objects.

Form  Follows  Gesture 


In  the  history  of  material  culture,  a  history  that  includes, 
for  example,  ancient  cooking  utensils  and  working  tools,  the  re¬ 
lationship  between  objects  and  hands  is  fundamental,  and  ends 
up  defining  the  ritual  relationship  between  humans  and  their 
tools.  We  are  facing  a  kind  of  “calligraphic”  use  of  an  object. 
The  idea  of  calligraphy,  etymologically  a  “beautiful  way  of  draw¬ 
ing  or  writing,”  is  very  much  related  to  the  idea  of  dignity  of  the 
gesture. 

In addition to being a fundamental component of the use of
the device, the overall shape, proportions, and mechanics of the
object also establish the “character” of the object, and
with that they influence the social dynamics around its usage.
The way John Travolta pulls the antenna of his flip
phone with his teeth in Pulp Fiction (in the topical moment
when his friend has just shot a guy by mistake in a car) is an integral
part of his character, and of the violence in that scene. How
different this is from the way Anna Magnani “hugs” the handset in a
cinema rendition of Jean Cocteau’s theatrical piece “La Voix
Humaine,” a love monologue “designed” around the relationship
between the woman (a desperate lover who has been
abandoned by her partner) and the handset of her phone, the only
thread left between her and her lover. Or consider how different
a large phone hanging from the holster of a worker is from a tiny
phone held constantly in the hands of a minute Japanese
teenager.

The  manipulation  of  handheld  devices  also  has  a  social  role: 
sharing a photo sometimes also means grabbing the digital camera
or the phone and handing it to somebody to look at the
screen.  Even  if  the  screen  is  small,  sharing  it  with  a  friend  is  a 
rich  social  act.  Handheld  devices  are  part  of  the  proxemic  rela¬ 
tionship  between  humans,  other  humans,  and  the  social  space 
that  surrounds  them. 


When devices such as cell phones become so miniaturized
that their shape is no longer influenced by their technical
components, their form will instead be defined by the
way people manipulate them. And, as we described previously,
the manipulation is only in part delimited by the mere rules
of usability.


The Sense of Touch


The diffusion of keyboards and keypads has led to a kind
of “buttonification” of the human being. This is definitely a
reduction of the power of the sense of touch. Our senses are far
more sophisticated than the basic on/off controls that they
operate. In the case of handheld devices, this sensorial atrophy is
more evident for at least two reasons. The first is that
while we hold the device with the full hand, five fingers plus
the palm, we use only one finger to actually operate the keypad
and receive tactile feedback. The second is that, because of
the “peripheral” nature of the interaction with handhelds that
we mentioned before, our interaction with a keypad ignores
other fundamental spatial and proprioceptive cues, such as
the horizon, our wrist position, and gravity. Some recent
advances in sensing and feedback technologies have finally
fought back against this kind of sensorial deprivation that keypads
have provoked. With the introduction of haptics, the primitive
vibrators originally used only to replace the ringer tone
of phones are now used to provide broader tactile feedback.
Haptic actuators amplified by the resonance of the whole mass of
the handheld can stimulate the whole palm of the hand, or
they could be used in a denser “resolution” to provide localized
feedback and a more meaningful haptic “message” to
parts of the hand. Similarly, the introduction of accelerometers
and gravity sensors allows the device to be “conscious” of its
absolute position in space and its position relative to the hand
of the user. This allows users to take full advantage of the
proprioceptive character of handheld manipulation, and provides
fuller tactile stimulation as feedback to this manipulation.
When combined with a proper reflection of the proprioceptive
character of the GUI, the overall combination of enriched
tactile and visual contextualized experience finally takes full
advantage of the handheld nature of mobile devices. Localized
buttons will remain for a long time, but combined with these
additional sensory messages they will provide a much richer
sensorial experience.


23.  Mobile  Interaction  Design  in  the  Age  of  Experience  Ecosystems  •  463 


Product  Species 


[Figure labels: MUSIC, PHOTO, VIDEO]


Even  when  the  interaction  with  devices  is  properly  de¬ 
signed  and  based  on  the  principles  of  usability,  sometimes 
there  seems  to  be  an  additional  difficulty  in  understanding  the 
functions  of  an  object,  a  subtler  barrier  to  its  adoption.  An  ob¬ 
ject  may  fail  to  deliver  a  clear  message  of  its  functionality  to  the 
user.  The  issue  in  this  case  is  not  “how  easy  is  it  to  perform  a 
task  with  this  device?”  but  “what  does  this  device  do?”  or,  even 
more,  “what  is  this?”  In  the  past,  a  good  rule  to  introduce  a 
new  object  has  been  to  leverage  an  analogy  with  a  known,  fa¬ 
miliar  object. 

The  reference  to  familiar  devices  and  interactions  could  still 
apply  today,  but  can  also  become  too  big  of  a  limitation,  a  re¬ 
striction  of  new  functionalities  or  richer  paradigms  that  sim¬ 
ply  do  not  have  any  parallel  with  anything  familiar.  Too  many 
references  are  challenged:  networked  TV  has  no  channels;  mo¬ 
bile  phones  and  house  phones  hybridize;  e-mail  and  IM  inte¬ 
grate;  camera,  telephone,  and  video  functionalities  converge 
in  the  same  device.  Communication  paradigms  such  as  blog¬ 
ging,  or  content  distribution  models  such  as  podcasting,  sim¬ 
ply  do  not  have  any  reference  to  past,  established,  familiar 
models. 

Inventing new object archetypes, as much as it is an
oxymoron, may be the only way out of this impasse: developing
new product types and new interaction paradigms when
analogy with existing archetypes is just not applicable. The same
applies to mono-functional devices in comparison to
multifunctional ones; the history of interaction design is full of successes
of simple, mono-functional devices that leveraged this analogy.
From the Palm PDA to the iPod, all fall in the category of
devices that “do one thing and do it right.” In the future, however,
reducing the functionality in order to achieve simplicity of use
may no longer be possible. Multifunctionality may offer too
much of an advantage to be compromised, and new generations
of users, younger hyper-taskers, wouldn’t necessarily appreciate
oversimplification.

So, the challenge for a designer is to ensure that an object
is “understood” while preserving multifunctionality
and richness of experience. On the one side, designers need to
develop a richer perspective on visual affordances, one that
can accommodate multiple and partially ambiguous
affordances related to different usages. On the other side, beyond
affordances, designers need to develop a better understanding
of what constitutes a recognizable, understandable “species”
of objects. That is to say, rather than seeking familiarity
with a known object or function “as-is,” they need to develop a more
sophisticated understanding of how familiarity, or just
components of it, can be carried along the evolution
of an object, or carried across a discontinuity in the evolution
of such an object or functionality. The result will be the
transformation of the simple concept of affordance into a richer
notion of “objecthood,” defined as the ability of humans to
recognize and perceive correctly the world of objects, and the
mutual ability of objects to be understandable and to overtly
disclose their functionality. That is, for both humans and
objects to develop their relationship around a sixth sense, the
“sense of meaning.”


Sphere 2: Mental Models (and Their Representation)


Human/device  interaction: 
digital  experience,  interface,  mental  models 


Mental  Models 


The  dominance  of  the  former  model  (centered  on  commu¬ 
nication)  or  the  latter  (centered  on  content  or  applications)  is 
not  yet  clear.  For  a  long  time  we’ll  probably  continue  to  have  hy¬ 
brids  with  the  co-presence  of  multiple  categorizations. 

What  is  common  to  both  these  models  and  different  from 
the  old  desktop  metaphor  is  that  they  have  a  less  direct  anal¬ 
ogy  with  an  interaction  space — such  as  the  physical  space  of  the 
desktop — and  they  have  more  reference  to  mental  constructs. 
As  such,  these  models  could  be  referred  to  as  “cosmographies.” 
While  a  metaphor  like  the  desktop  implies  having  a  reference 
with  something  that  exists,  a  cosmography  is  a  “general  de¬ 
scription  of  the  world  or  universe,”  and  this  could  embrace  de¬ 
scriptions  of  both  the  material  world  and  of  abstract  concepts — 
a  mental  model  of  something  that  eventually  does  not  have  any 
relation  with  the  material  reality. 


In  personal  computers,  the  metaphor  of  the  “desktop”  and 
the  “point  and  click”  in  that  digital  space  starts  to  feel  the  “com¬ 
petition”  of  other  models  of  interaction,  namely  the  information 
architectures  organized  around  e-mail  and  IM,  that  today  are 
often  dominant  compared  to  the  desktop  access  to  files  and 
folders. 

The  mental  construct  of  the  interaction  with  mobile  wire¬ 
less  media  is  intrinsically  centered  on  communication;  the 
Phonebook,  and  its  extension  to  include  presence,  status,  and, 
in  perspective,  location,  is  often  the  master  mental  model  that 
represents  our  communication  universe.  The  phonebook  acts 
as  a  point  of  departure  for  a  broad  series  of  actions:  starting  a 
call,  generating  a  message,  initiating  a  textual  chat,  checking 
the  online  status  of  our  “buddies,”  or  sending  a  picture  to 
somebody.  However,  with  the  diffusion  of  other  ways  of  con¬ 
textualizing  the  interaction,  such  as  location  detection,  a  user 
may  decide  to  switch  to  a  different  model  and,  for  example, 
“view  by  location”  and  reorganize  the  communication,  map¬ 
ping  it  over  the  actual  physical  surroundings.  In  this  model, 
a  user  may  reorganize  the  phonebook  to  check  first  “who’s 
around  me”  in  that  moment.  Even  if  the  idea  of  communicat¬ 
ing  in  physical  proximity  may  seem  paradoxical,  cell  phones 
are  often  used  to  create  an  additional  communication  chan¬ 
nel  (texting  or  sharing  images,  for  example)  with  people  who 
are  not  remote,  or  to  manage  meetings  with  people  in  physical 
proximity  (e.g.,  a  group  of  friends  in  the  same  city  arranging  a 
dinner  together). 

Other  mental  models  use  incoming  events  as  an  organiza¬ 
tional  context;  an  incoming  call  may  trigger  a  reorganization  of 
the interface around the person who calls (bringing to the foreground
previous messages or pictures exchanged with that person);
or walking in front of a place may trigger some location-based
content and contextual actions around that content; or an
approaching  event,  such  as  a  meeting,  can  trigger  other  infor¬ 
mation  such  as  the  names  of  the  participants,  their  status,  and 
their  location. 

While the taxonomies arranged around the sphere of communication
may work better for phone-first devices, the proliferation
of functions (imaging, video, music management, e-mail,
etc.)  could  suggest  taxonomies  based  around  a  clear  distinction 
of  applications,  or  around  categories  of  content  (my  photos,  my 
music,  my  phonebook,  etc.). 


Sphere 3: Context


Interacting  with/in  the  physical 
space 


Physical  Context 

Mobile interaction cannot be isolated from its context. While
the  interaction  between  a  user  and  a  personal  computer  tends 
to  be  immersive  and  central,  and  is  agnostic  of  the  physical  con¬ 
text  of  the  user,  the  interaction  with  mobile  devices  is  typically 
contextual and peripheral.

Primitive graphic menus and icons arranged in the small
screen,  a  kind  of  translation  of  the  conventional  PC,  are  the 
dominant  GUI  paradigm  in  wireless  mobile  media,  but  they  are 
only  the  most  superficial  aspect  of  the  interaction  with  devices 
like  cell  phones.  Multiple  other  aspects  of  interaction  with  cell 
phones  are  less  tangible,  yet  more  important,  than  the  keypad- 
and-screen  UI.  The  whole  experience  ecosystem  of  the  interac¬ 
tion  with  wireless  media  includes  the  combination  of  the  inter¬ 
action  with  the  physical  device,  the  interaction  with  its  services, 
and  with  the  physical  environment. 

Context  is  what  makes  mobile  interaction  much  richer  than 
computer interaction. Mobility means being more connected
to the physical place, not more disconnected from it;
communication  relates  to  where  we  are  and  who  is  around  us. 
Location-based  services,  mobile  imaging,  and  mobile  blogging 
extend  this  concept  to  content  itself:  content  accessed  and  gen¬ 
erated  from  a  mobile  platform  has  the  potential  to  be  heavily 
contextualized  over  the  physical  environment. 


Smart  Objects  and  Active  Ambient 


also  take  note  of  their  environment,  recording  changes  in  tem¬ 
perature  and  so  on.  Their  broad  adoption  is  changing  the  way 
logistics  are  organized  around  raw  materials  and  products  trav¬ 
eling  to  transformation  and  to  markets. 

Smart  tags  also  enable  a  minute  level  of  interaction  with  the 
physical  space.  Cell  phones  equipped  with  RFID  tags  are  already 
enabling  financial  transactions  such  as  paying  a  ticket  on  the  un¬ 
derground  or  replacing  a  credit  card.  The  extension  of  this  level 
of  interaction  may  support  the  paradigm  of  “point  and  click  at 
real  environments.”  Handheld  devices  could  be  used  to  trigger 
contextual  information  when  pointed  at  other  objects  or  parts 
of  the  environment. 

Finally, the connection of these tags to an integrated information
system able to tell, at any moment, about the life of any
item  of  produce,  for  example,  may  allow  the  quality,  the  origin, 
or  the  value,  to  be  tracked  in  real-time.  Produce  that  is  fresh  or 
older,  in-season  or  off-season,  near  or  far  from  its  origin,  with  or 
without  a  track  record  of  its  growth,  will  have  a  different  value.  In 
this  scenario,  smart  tags  may  enable  the  creation  of  digital  word- 
of-mouth,  similar  to  the  spreading  of  information  and  knowledge 
by  human  word-of-mouth,  connected  to  the  system  of  goods.  At¬ 
tached  to  items  will  be  information  and  knowledge.  It  is  possi¬ 
ble  to  describe  this  as  a  knowledge  aura,  and  this  is  the  most 
interesting aspect of future material-knowledge systems. Experiments
on connecting collective wisdom with digital systems in
the field have been undertaken in India since the 1990s. The fully
fledged scenario will allow the replication, in a digital environment, of
the wisdom of word-of-mouth know-how.

The  whole  idea  of  using  advertisement  before  the  purchase, 
and  separate  from  the  purchase  experience,  could  be  revolution¬ 
ized  by  the  presence  of  an  aura  of  information  that  follows  the 
product  until  the  point  of  purchase.  Through  this  aura  of  digital  in¬ 
formation,  the  product  would  “advertise  itself”  from  the  shelves  of 
a  shop.  Even  more  intriguing  is  the  idea  that  deeper  information 
about  a  product,  connected  to  similar  experiences  of  purchase 
and  use — what  is  known  as  a  recommendation  on  a  website — 
will be readily available at any moment that a product is encountered.
The lure and temptations that attract a buyer will come directly
from the product, from the background voices of people
who bought and used it, and from the life history of the product, a
kind  of  very  detailed,  dynamic,  digital  certificate  of  authenticity. 


Sphere 4: Communities


One to one communication formats;
audio,  text,  multimedia;  real/non-real  time 


One  to  one  communication  narratives: 
i.e.  SMS,  email,  chat 


Smart  tags  are  very  tiny,  low-cost,  wireless  microcomputers. 
They  only  cost  a  few  cents  and  they  can  be  attached  to  any¬ 
thing — items  of  clothing,  food  cans,  spare  parts  for  cars,  electric 
irons,  and  so  on.  The  tags  can  be  programmed  to  provide  in¬ 
formation  about  the  items  to  which  they  are  attached.  They  are 
able  to  identify  a  product  and  network  this  information,  and  can 


Networked  media  are  characterized  by  the  hybridization  of  a 
multiplicity  of  communication  genres.  While  20  years  ago  the 
evolution of telephone technologies seemed to go toward hyperrealism
(technologies focused on replicating, at best, face-to-face
communication), the actual evolution of telecommunication
has denied this trend and pointed toward mediating communication
in a form different from face-to-face. Text messaging via
SMS created a communication channel that was simply impossible
to achieve face-to-face, often more intimate and more
enigmatic than face-to-face, and complementary to it. Instant
messaging has added the management of multiple one-to-one
communications, also not possible in real face-to-face, and, later,
blogging created a hybrid between one-to-one expression and
broadcast. Finally, the multimedia capabilities of devices like
camera phones allow the integration between communication
and content sharing, making it possible to share experiences in
an almost permanent way.

Mobile  wireless  media  also  are  the  ideal  platform  to  sup¬ 
port  tribal  social  structures;  mobile  phones  have  favored  the 
birth  of  tribal  forms  of  few-to-few  and  few-to-many  commu¬ 
nication,  and  they  revitalize  the  scale  of  the  small  circle  of 
friends,  of  groups  of  people  who  are  in  constant  touch,  ex¬ 
changing  information  and  passions,  and  sharing  images  and 
music. 

Different  from  both  an  individual  that  communicates  one- 
to-one  with  his  interlocutor  and  from  a  broadcaster  with  a  pas¬ 
sive  mass  audience,  tribal  models  of  communication  define  a 
dense  flow  of  exchanges  within  a  restricted  circle  of  friends.  The 
space for tribal relations, a superimposition of the physical space
and of the “digital territory of belonging,” also amplifies information
and messages while building a shared memory. The permanence
of content in a shared space creates not a simple exchange
of messages but the sedimentation, the memory, and
the construction of a shared permanent experience.


Sphere  5:  Content 


Information generation and publishing


Mediated  social  behaviors 


Deconstructed  Content 


Mobile  imaging  is  an  already-established  form  of  hybridiza¬ 
tion  between  communication  and  content  capture.  Camera 
phones  as  an  imaging  platform  are  better  integrated  with  both 
the PC and the TV. Video content will soon follow, and this, combined
with the integration of web content with TV productions,
will  create  yet  another  “triple  point”  (mobile,  PC,  and  TV  and 
home  systems)  interaction  with  content.  As  soon  as  digital  mu¬ 
sic  players  become  networked  themselves  and  mobile  phones 
download  music  over-the-air,  the  music-content  ecosystem  will 
have  continuity  between  environments  and  platforms. 

The  first  consequence  this  will  have  from  an  interaction 
point  of  view  will  be  the  separation  of  content  creation  from 
content delivery and the consequent “deconstruction” of content.
The possibilities of capturing images with different devices
(camera phones or digital cameras) and of accessing them in different
modalities (in the device itself, shared with others or published
on the web, or printed or accessed from a PC or a TV)
are already evident signs of separation between creation and access,
and examples of truly networked access to content. Mobile
TV  will  not  be  an  exact  replica  of  TV  delivered  to  home.  The  lat¬ 
ter  has  already  been  transformed  by  Digital  Video  Recorders, 
and  broadcasts  have  been  separated  from  real  time.  Mobile  TV 
will  be  even  more  disconnected  from  the  real  time  of  broad¬ 
cast — often  buffered  and  stored  locally — but  also  re-associated 
with  the  “relative”  time  of  the  user;  the  possibility  of  caching 
content  on  a  mobile  device,  combined  with  the  need  for  “snack¬ 
ing”  shorter  versions  of  the  same  content  during  leftover  time 
while  commuting,  for  example,  will  make  the  idea  of  a  personal, 
adapted palimpsest more realistic. Content will thus be deconstructed
into multiple formats (filling up many lengths,
between  the  short  trailer  and  the  full-length  movie)  and  con¬ 
sumed  from  different  platforms. 
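
As a rough illustration of "snacking" on deconstructed content, the sketch below picks the longest version of a program that fits the viewer's leftover time. The `Version` class, the `pick_version` function, and the example durations are hypothetical and introduced here only for illustration; the text does not describe an actual selection rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    label: str      # e.g. "trailer" or "full movie" (hypothetical labels)
    minutes: float  # running time of this cut


def pick_version(versions: List[Version], free_minutes: float) -> Optional[Version]:
    """Return the longest version that fits in the available time, if any."""
    fitting = [v for v in versions if v.minutes <= free_minutes]
    return max(fitting, key=lambda v: v.minutes, default=None)


# The same content, deconstructed into several lengths.
versions = [
    Version("trailer", 2),
    Version("highlights", 12),
    Version("episode", 45),
    Version("full movie", 110),
]

choice = pick_version(versions, free_minutes=20)  # e.g. a short commute
print(choice.label if choice else "nothing fits")
```

A real service would presumably weigh other factors as well (what is already cached on the device, network conditions, viewing history), but the core idea is that the user's "relative" time, not the broadcast schedule, drives the choice.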

Narrowcast 


“Convergence” between media, long just a buzzword, is happening
for real, and this means a much richer interaction for the
user.  Richer  communication — messaging,  voice,  and  so  on — is 
more  and  more  integrated  with  content  access,  such  as  sharing 
music,  videos,  and  images. 


Podcasting  and  blogging  are  the  signal  of  a  larger  trend  that 
opens  the  possibility  for  everybody  to  become  a  publisher.  In 
both  cases  the  broadcast  paradigm  evolves  from  one-to-many 
(where  the  one  is  an  author,  an  institutional  publisher  such  as  a 
TV station) to any-to-many (where anybody, with a limited investment
and a much leaner technology, can become a publisher).
But there is more. Both blogging and podcasting are creating
communities of adepts that do not behave like a conventional
passive audience of broadcast mass media. They feed back
and  create  a  closer  circle  of  communication  that  constitutes  an 
actual  “narrowcast”  environment,  a  scale  of  publishing  unheard 
of before. There is no break in continuity between the one-to-one
paradigm of telephony that extends to the larger scale of
a  tribe  of  friends  and  the  blogging  and  podcasting  narrowcast 
that  takes  broadcast  to  a  smaller  scale.  Also,  communication  and 
content  are  so  hybridized  that  they  become  indistinguishable. 


Seamless  Experience  Ecosystems 

Today no interaction can be isolated in the confined space
of human-computer interaction; any consideration of the interaction
between a user and a mobile device cannot ignore the
complexity  of  the  relational  “ecosystems”  that  exist  around  this 
interaction. 

The components of this complex networked experience are:

1. A social ecosystem, as mobile devices go beyond one-to-one
communication to mediate a very rich and complex
system of social relationships;

2.  A  content  ecosystem,  since  mobile  devices  allow  access  to 
images,  music,  and  videos;  and  peer-to-peer  content  gener¬ 
ation  and  sharing  introduce  a  completely  new  perspective 
of  content  authoring; 


3.  A  spatial  ecosystem,  because  location-based  services  and 
communication  strongly  “ground”  the  interaction  via  mo¬ 
bile  devices  in  the  physical  space; 

4.  A  business  ecosystem,  because  no  single  company  or  de¬ 
signer  can  control  the  overall  experience,  but  can  only  shape 
the  interaction  from  a  partial  point  of  control;  and  last,  but 
not  least 

5.  A  platform  ecosystem,  since  mobile  interaction  is  integrated 
with interaction with other platforms, such as personal computers
and digital TV.

The complexity of these ecosystems poses additional challenges
to designers who want to shape the interaction.

Among  these  challenges  are  the  following: 

Adapting experiences to evolving behaviors. Convergence
will drastically change the way users perceive their
sphere of content and communication, and sometimes giving
up familiar mental models to embrace new paradigms may actually
facilitate ease of use. This is in apparent contrast with
one  of  the  rules  of  user-centered  design,  which  is  to  refer  to  fa¬ 
miliar  models  and  avoid  disruptions.  For  example,  the  poten¬ 
tial  elimination  of  “channels”  when  TV  content  moves  over  IP 
may  help  embrace  a  new  paradigm  that  has  no  reference  to  the 
old  “zapping.”  In  the  same  way,  embracing  the  paradigm  of  one- 
phone-number-per-person  may  confuse  users  at  first,  but  will 
actually  facilitate  the  transition  to  a  communication  paradigm 
where the user, not the device, is the one to be reached. A well-crafted,
user-oriented approach may distinguish between respecting
familiarities that are structurally embedded in human
perception and pushing back against the resistance of old mental habits
that may actually hinder innovation and cause disruptions.

Shaping  an  experience  without  controlling  the  full 
ecosystem. The design culture has long insisted on the protection
of the brand and on establishing a “signature” experience
for a device or a service. In the future it will be very rare,
however, that a single brand, thus a single designer, would control
the full experience ecosystem. Two approaches to this are
possible:  one  is  a  resistance  to  sharing  components  of  the 
experience  with  other  brands;  another  is  embracing  the  com¬ 
plexity  of  the  system  and  finding  lean,  smart  ways  to  shape  the 
experience  without  actually  owning  and  controlling  all  the  ele¬ 
ments.  The  latter  seems  to  be  a  more  open-minded,  relaxed  way 
to accept changes in the environment in which a designer operates,
while  the  former  may  end  up  in  a  stubborn,  desperate  attempt 
to  protect  a  territory  that  simply  cannot  be  protected  anymore. 
In  other  words,  it  is  better  to  adapt  and  shape  an  experience  by 
seeding signature elements in a few key areas rather than working
on a more rigid model and risking losing all control.

Embracing the idea of human-to-human (and human-to-content)
mediation rather than human-machine
interaction.  The  HCI  discipline  is  rooted  in  the  interaction 
between  a  human  and  a  machine  (or  a  system)  and  focused  on 
the  shift  from  a  mechanical  interaction  to  a  logical  one.  While 
this  competence  remains  true  for  the  broad  majority  of  devices 
we  are  designing  today,  it  is  also  true  that  another  aspect,  the 
one of designing interfaces to access other humans and content
via a mediating interface, is becoming more and more dominant.
A  deeper  understanding  of  human  patterns  of  communication, 
and  of  the  interactions  between  interfaces  of  devices  and  gen¬ 
res  of  communication,  is  needed.  In  the  same  way,  the  under¬ 
standing  of  how  innovative  ways  of  interacting  with  content  may 
shape  new  types  and  formats  of  content  is  also  needed.  Media 
studies  were  born  in  a  condition  in  which  the  media  genre  (e.g., 
the  TV  programs),  the  transport  technology  (via  aerial  or  cable 
TV),  the  device  of  access  (the  TV  set),  the  method  of  interaction 
(zapping  through  channels),  the  method  of  delivery  (broadcast) 
and  the  interaction  environment  (a  couch  potato  in  a  living 
room)  of  a  given  medium  were  linked  and  homogeneous.  Today, 
a certain type of content (e.g., a TV program) may be delivered in
a multiplicity of formats (real time or cached, in snippets or in a
linear  fashion)  to  extremely  different  platforms  and  environ¬ 
ments  (at  home  or  while  mobile,  to  a  passive  individual  or  an¬ 
notated  and  redistributed  within  a  community).  This  poses  new 
challenges to designers who need to shape at the same moment
an  innovative  interaction  and  an  innovative  medium  genre. 

Experience  ecosystems  are  reshaping  the  way  human  knowl¬ 
edge  is  grown  and  diffused.  Interaction  designers  have  the  re¬ 
sponsibility  of  both  shaping  a  pleasurable,  meaningful  experi¬ 
ence  and  of  facilitating  the  chain  of  innovations  that  flows 
through  new  behaviors,  new  business  models,  and  new  tech¬ 
nologies.  Designers  need  to  be  the  first  to  drive  innovation  in 
their  own  disciplines. 


Images from interaction design projects developed by the Advanced Concepts Group in Motorola between 2000 and 2005.


INTRODUCTION


Where the sea meets the land, life has blossomed into a myriad
of  unique  forms  in  the  turbulence  of  water,  sand,  and  wind.  At 
another  seashore,  between  the  land  of  atoms  and  the  sea  of  bits, 
we  are  now  facing  the  challenge  of  reconciling  our  dual  citizen¬ 
ships  in  the  physical  and  digital  worlds.  Our  visual  and  auditory 
sense  organs  are  steeped  in  the  sea  of  digital  information,  but 
our  bodies  remain  imprisoned  in  the  physical  world.  Windows 
to the digital world are confined to flat square screens and pixels,
or “painted bits.” Unfortunately, one cannot feel and confirm
the virtual existence of this digital information through
one’s  hands  and  body. 

Imagine  an  iceberg,  a  floating  mass  of  ice  in  the  ocean.  That 
is  the  metaphor  of  Tangible  User  Interfaces.  A  Tangible  User 
Interface  gives  physical  form  to  digital  information  and  com¬ 
putation,  salvaging  the  bits  from  the  bottom  of  the  water,  set¬ 
ting  them  afloat,  and  making  them  directly  manipulatable  by 
human  hands. 


FROM  GUI  TO  TUI 


People  have  developed  sophisticated  skills  for  sensing  and  ma¬ 
nipulating  their  physical  environments.  However,  most  of  these 
skills  are  not  employed  in  interaction  with  the  digital  world  to¬ 
day.  A  Tangible  User  Interface  (TUI)  is  built  upon  those  skills  and 
situates  the  physically  embodied  digital  information  in  a  physi¬ 
cal  space.  Its  design  challenge  is  a  seamless  extension  of  the 
physical affordance of the objects into the digital domain (Ishii &
Ullmer,  1997,  2000). 

Interactions  with  digital  information  are  now  largely  con¬ 
fined  to  Graphical  User  Interfaces  (GUIs).  We  are  surrounded 
by  a  variety  of  ubiquitous  GUI  devices  such  as  personal  com¬ 
puters,  handheld  computers,  and  cellular  phones.  The  Graph¬ 
ical  User  Interface  (GUI)  has  been  in  existence  since  the  1970s 
and first appeared commercially in the Xerox 8010 Star System
in 1981 (Smith, 1982). With the commercial success of the
Apple  Macintosh  and  Microsoft  Windows,  the  GUI  has  become 
the  standard  paradigm  for  Human  Computer  Interaction  (HCI) 
today. 

GUIs  represent  information  (bits)  with  pixels  on  a  bitmapped 
display.  Those  graphical  representations  can  be  manipulated 
with  generic  remote  controllers  such  as  mice  and  keyboards.  By 
decoupling  representation  (pixels)  from  control  (input  devices) 
in  this  way,  GUIs  provide  the  malleability  to  emulate  a  variety 
of  media  graphically.  By  utilizing  graphical  representation  and 
“see,  point,  and  click”  interaction,  the  GUI  made  a  significant 
improvement  over  its  predecessor,  the  CUI  (Command  User 
Interface)  which  required  the  user  to  “remember  and  type” 
characters. 

However,  interactions  with  pixels  on  these  GUI  screens  are 
inconsistent  with  our  interactions  with  the  rest  of  the  physical 
environment  within  which  we  live.  The  GUI,  tied  down  as  it  is  to 
the  screen,  windows,  mouse,  and  keyboard,  is  utterly  divorced 
from  the  way  interaction  takes  place  in  the  physical  world. 
When we interact with the GUI world, we cannot take advantage
of our dexterity or utilize our skills for manipulating various
physical  objects,  such  as  building  blocks,  or  our  ability  to  shape 
models  out  of  clay. 

Tangible  User  Interfaces  (TUIs)  aim  to  take  advantage  of 
these  haptic  interaction  skills,  which  is  a  significantly  different 
approach  from  GUI.  The  key  idea  of  TUIs  is  to  give  physical 
forms  to  digital  information.  The  physical  forms  serve  as  both 
representations  and  controls  for  their  digital  counterparts.  TUI 
makes  digital  information  directly  manipulatable  with  our  hands, 
and  perceptible  through  our  peripheral  senses,  by  physically 
embodying  it. 

Tangible  User  Interface  serves  as  a  special-purpose  interface 
for  a  specific  application  using  explicit  physical  forms,  while  GUI 
serves  as  a  general-purpose  interface  by  emulating  various  tools 
using  pixels  on  a  screen. 

TUI  is  an  alternative  to  the  current  GUI  paradigm,  demon¬ 
strating  a  new  way  to  materialize  Mark  Weiser’s  (1991)  vision  of 
Ubiquitous  Computing  of  weaving  digital  technology  into  the 
fabric of a physical environment and making it invisible. Instead of
making pixels melt into an assortment of different interfaces,
TUI uses tangible physical forms that can fit seamlessly into a
user’s physical environment.

This  chapter  introduces  the  basic  concept  of  TUI  in  com¬ 
parison  with  GUI,  early  prototypes  of  TUI  that  highlight  the 
basic  design  principles,  and  design  challenges  that  TUI  needs 
to  overcome. 


URP:  AN  EXAMPLE  OF  TUI 


To  illustrate  basic  TUI  concepts,  we  introduce  “Urp”  (Urban 
Planning  Workbench)  as  an  example  of  TUI  (Underkoffler  & 
Ishii,  1999).  Urp  uses  scaled  physical  models  of  architectural 
buildings to configure and control an underlying urban simulation
of shadow, light reflection, wind flow, and so on (Photo
24.1). In addition to a set of building models, Urp also provides
a  variety  of  interactive  tools  for  querying  and  controlling  the 
parameters  of  the  urban  simulation.  These  tools  include  a  clock 
tool to change the position of the sun, a material wand to change
the  building  surface  between  bricks  and  glass  (with  light  reflec¬ 
tion),  a  wind  tool  to  change  the  wind  direction,  and  an 
anemometer  to  measure  wind  speed. 

The  physical  building  models  in  Urp  cast  digital  shadows 
onto  the  workbench  surface  (via  video  projection),  corre¬ 
sponding  to  solar  shadows  at  a  particular  time  of  day.  The  time 
of  day,  representing  the  position  of  the  sun,  can  be  controlled 
by turning the physical hands of a “clock tool” (Photo 24.2). The
building  models  can  be  moved  and  rotated,  with  the  angle  of 
their  corresponding  shadows  transforming  according  to  their 
position  and  the  time  of  day. 
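
The dependence of a shadow on building height and sun position can be illustrated with basic trigonometry. The sketch below is a simplified geometric assumption, not Urp's actual implementation (which is not described here): it treats a building as a vertical pole on flat ground, and the function name `shadow_vector` and its parameters are hypothetical.

```python
import math


def shadow_vector(height, sun_elevation_deg, sun_azimuth_deg):
    """Ground-plane offset (east, north) from a pole's base to its shadow tip.

    height            -- height of the object, in any length unit
    sun_elevation_deg -- angle of the sun above the horizon, in degrees
    sun_azimuth_deg   -- compass bearing of the sun, clockwise from north
    """
    if sun_elevation_deg <= 0:
        raise ValueError("sun is at or below the horizon; no shadow is cast")
    # A lower sun gives a longer shadow: length = height / tan(elevation).
    length = height / math.tan(math.radians(sun_elevation_deg))
    az = math.radians(sun_azimuth_deg)
    # The shadow points directly away from the sun.
    return (-length * math.sin(az), -length * math.cos(az))


# Sun due south at 45 degrees: a 10-unit pole casts a 10-unit shadow northward.
east, north = shadow_vector(10.0, 45.0, 180.0)
print(round(east, 6), round(north, 6))
```

In a workbench like the one described, turning the clock tool would change the two sun angles, and moving a building would change the point from which this offset is drawn; both inputs feed the same calculation.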

Correspondingly,  moving  the  hands  of  the  clock  tool  can 
cause  Urp  to  simulate  a  day  of  shadow  movement  between  the 
situated  buildings.  Urban  planners  can  identify  and  isolate  inter¬ 
shadowing  problems  (shadows  cast  on  adjacent  buildings),  and 
reposition  buildings  to  avoid  areas  that  are  needlessly  dark, or 
maximize  light  between  buildings. 

PHOTO 24.1. Urp and shadow simulation.


PHOTO 24.2. Urp and wind simulation.


A “material wand” alters the material surface properties of a
building model. By touching the material wand to a building
model, the building surface material is switched from bricks to
glass,  and  a  projected  reflection  of  sunlight  appears  to  bounce 
off  the  walls  of  the  building.  Moving  the  building  allows  urban 
designers  to  be  aware  of  the  relationship  between  the  building 
reflection  and  other  infrastructure.  For  example,  the  reflection 
off  the  building  at  sundown  might  result  in  distraction  to  drivers 
on  a  nearby  highway.  The  designer  can  then  experiment  with 
altering  the  angles  of  the  building  to  oncoming  traffic  or  mov¬ 
ing  the  building  further  away  from  the  roadway.  Tapping  again 
with  the  material  wand  changes  the  material  back  to  brick,  and 
the  sunlight  reflection  disappears,  leaving  only  the  projected 
shadow. 


PHOTO  24.3.  inTouch. 


By  placing  the  “wind  tool”  on  the  workbench  surface,  a  wind 
flow  simulation  is  activated  based  on  a  computational  fluid  dy¬ 
namics  simulation,  with  field  lines  graphically  flowing  around 
the  buildings.  Changing  the  wind  tool’s  physical  orientation 
correspondingly  alters  the  orientation  of  the  computationally 
simulated  wind.  Urban  planners  can  identify  any  potential  wind 
problems,  such  as  areas  of  high  pressure  that  may  result  in 
hard-to-open doors or unpleasant walking environments. An
“anemometer”  object  allows  point  monitoring  of  the  wind  speed 
(Photo  24.3).  By  placing  the  anemometer  onto  the  workspace, 
the  wind  speed  of  that  point  is  shown.  After  a  few  seconds,  the 
point  moves  along  the  flow  lines,  to  show  the  wind  speed  along 
that  particular  flow  line.  The  interaction  between  the  buildings 
and  their  environment  allows  urban  planners  to  visualize  and 
discuss  inter-shadowing,  wind,  and  placement  problems. 

In  Urp,  physical  models  of  buildings  are  used  as  tangible  rep¬ 
resentations  of  digital  models  of  the  buildings.  To  change  the  lo¬ 
cation  and  orientation  of  buildings,  users  simply  grab  and  move 
the  physical  model  as  opposed  to  pointing  and  dragging  a 
graphical  representation  on  a  screen  with  a  mouse.  The  physical 
forms  of  Urp’s  building  models,  and  the  information  associated 
with  their  position  and  orientation  upon  the  workbench,  rep¬ 
resent  and  control  the  state  of  the  urban  simulation. 

Although  standard  interface  devices  for  GUIs,  such  as  key¬ 
boards,  mice,  and  screens,  are  also  physical  in  form,  the  role  of 
the  physical  representation  in  TUI  provides  an  important  dis¬ 
tinction.  The  physical  embodiment  of  the  buildings  to  represent 
the  computation  involving  building  dimensions  and  location 
allows  a  tight  coupling  of  control  of  the  object  and  manipulation 
of  its  parameters  in  the  underlying  digital  simulation. 

In  Urp,  the  building  models  and  interactive  tools  are  both 
physical representations of digital information (shadow dimensions
and wind speed) and computational functions (shadow interplay).
The physical artifacts also serve as controls of the underlying
computational simulation (specifying the locations of
objects).  The  specific  physical  embodiment  allows  a  dual  use  in 
representing the digital model and allowing control of the digital
representation. In the next section, the model of TUI is introduced
in comparison with GUI to illustrate this mechanism.


BASIC  MODEL  OF  TANGIBLE  USER  INTERFACE 


The  interface  between  people  and  digital  information  requires 
two  key  components:  input  and  output,  or  control  and  repre¬ 
sentation.  Controls  enable  users  to  manipulate  the  information, 
while  external  representations  are  perceived  with  the  human 
senses.  Figure  24.1  illustrates  this  simple  model  of  a  user  inter¬ 
face  consisting  of  control,  representation,  and  information. 

In  the  Smalltalk-80  programming  language  (Burbeck,  1992; 
Goldberg,  1984),  the  relationship  between  these  components  is 
illustrated  by  the  “model-view-controller”  or  “MVC”  archetype, 
which  has  become  a  basic  interaction  model  for  GUIs. 

Drawing  from  the  MVC  approach,  we  have  developed  an  in¬ 
teraction  model  for  both  GUI  and  TUI.  We  carry  over  the  “con¬ 
trol”  element  from  MVC,  while  dividing  the  “view”  element  into 
two  subcomponents:  tangible  and  intangible  representations, 
and  renaming  “model”  as  “digital  information”  to  generalize  this 
framework  to  illustrate  the  difference  between  GUI  and  TUI. 
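
One way to make this decomposition concrete is the short sketch below. It is only an assumed illustration of the four elements named above (control, tangible representation, intangible representation, and digital information); the class and function names are hypothetical and do not come from Smalltalk-80 or from any TUI system.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalInformation:
    """Plays the role of MVC's "model": the underlying digital state."""
    building_positions: dict = field(default_factory=dict)


class TangibleBuilding:
    """A physical token acting as both control and tangible representation."""

    def __init__(self, name, info):
        self.name = name
        self.info = info

    def move_to(self, x, y):
        # Moving the object is the control action; its new position is
        # also the representation the user sees and touches.
        self.info.building_positions[self.name] = (x, y)


def intangible_representation(info):
    """Derived output (for example projected shadows) that cannot be grasped."""
    return [
        f"shadow of {name} at {pos}"
        for name, pos in sorted(info.building_positions.items())
    ]


info = DigitalInformation()
tower = TangibleBuilding("tower", info)
tower.move_to(2, 3)
print(intangible_representation(info))
```

In a GUI, by contrast, the control (a mouse) and the representation (pixels) would be two separate objects; here a single object carries both roles, with the intangible output computed from the shared digital state.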

In  computer  science,  the  term  representation  often  relates 
to  the  programs  and  data  structures  serving  as  the  computer’s 
internal  representation  (or  model)  of  information.  In  this  arti¬ 
cle,  the  meaning  of  “representation”  centers  upon  external  rep¬ 
resentations — the  external  manifestations  of  information  in 
fashions directly perceivable by the human senses, including
sight, hearing, and touch.


GUI 


In  1981,  the  Xerox  Star  workstation  set  the  stage  for  the  first 
generation  of  GUI  (Johnson  et  al.,  1989;  Smith,  1982),  estab¬ 
lishing  the  “desktop  metaphor”  which  simulates  a  desktop  on 
a  bitmapped  screen.  The  Star  workstation  was  the  first  com¬ 
mercial  system  that  demonstrated  the  power  of  a  mouse,  win¬ 
dows,  icons,  property  sheets,  and  modeless  interaction.  The 
Star  also  set  several  important  HCI  design  principles,  such  as 
“seeing and pointing vs. remembering and typing,” and “what
you see is what you get (WYSIWYG).” The Apple Macintosh
brought  this  new  style  of  HCI  to  the  public’s  attention  in  1984, 
creating  a  new  trend  in  the  personal  computer  industry.  Now, 
the  GUI  is  widespread,  largely  through  the  pervasiveness  of  Mi¬ 
crosoft  Windows,  PDAs,  and  cellular  phones. 

GUI  uses  windows,  icons,  and  menus  made  of  pixels  on 
bitmapped  displays  to  visualize  information.  This  is  an  intangi¬ 
ble  representation.  GUI  pixels  are  made  interactive  through 
general  “remote  controllers”  such  as  mice,  tablets,  or  keyboards. 
In  the  pursuit  of  generality,  GUI  introduced  a  deep  separation 
between  the  digital  (intangible)  representation  provided  by  the 
bitmapped  display,  and  the  controls  provided  by  the  mouse  and 
keyboard. 

Figure 24.2 illustrates the current GUI paradigm in which
generic input devices allow users to remotely interact with digital
information. Using the metaphor of a seashore that separates
a sea of bits from the land of atoms, the digital information is
illustrated at the bottom of the water, and the mouse and screen are
above sea level in the physical domain. Users interact with the
remote control, and ultimately experience an intangible external
representation of digital information (display pixels and sound).


TUI 


Tangible User Interface (TUI) takes a different direction from
GUI by using tangible representations of information that also
serve as the direct control mechanism of the digital information.
Because information is represented in both tangible and intangible
forms, users can more directly control the underlying digital
representation using their hands.

Tangible  Representation  as  Control 

Figure 24.3 illustrates this key idea of TUI: giving a tangible
(physical and graspable) external representation to the digital
information. The tangible representation helps bridge the boundary
between the physical and digital worlds. Also notice that the
tangible representation is computationally coupled to the control
of the underlying digital information and computational
models. Urp illustrates examples of such couplings, including
the binding of graphical geometries (digital data) to the physical
building models, and computational simulations (operations) to
the physical wind tool. Instead of using a GUI mouse to change
the location and angle of the graphical representation of a building
model by pointing, selecting handles, and keying in control
parameters, an Urp user can grab and move the building model
to change both location and angle.

FIGURE 24.3. TUI model.

The  tangible  representation  functions  as  an  interactive  phys¬ 
ical  control.  TUI  attempts  to  embody  the  digital  information  in 
physical  form,  maximizing  the  directness  of  information  by 
coupling  manipulation  to  the  underlying  computation.  Through 
physically  manipulating  the  tangible  representations,  the  digi¬ 
tal  representation  is  altered.  In  Urp,  changing  the  position  and 
orientation  of  the  building  models  influences  the  shadow  sim¬ 
ulation,  and  the  orientation  of  the  “wind  tool”  adjusts  the  sim¬ 
ulated  wind  direction. 
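
The coupling described above can be sketched in a few lines of code. The following is a minimal illustration, not Urp's actual implementation: the class names, the table coordinate system, the sun parameters, and the flat-roof shadow geometry are all assumptions made for this example. It shows only the pattern in which a sensed change in a physical model's pose updates the digital model, which in turn regenerates the intangible representation (here, the outline of a projected shadow).

```python
import math
from dataclasses import dataclass


@dataclass
class Building:
    """Digital model bound to one tracked physical building model (hypothetical)."""
    x: float            # position on the table, arbitrary units
    y: float
    heading_deg: float  # rotation of the physical model on the table
    width: float
    depth: float
    height: float


def footprint(b):
    """Corners of the building's base, rotated by its sensed heading."""
    h = math.radians(b.heading_deg)
    c, s = math.cos(h), math.sin(h)
    half_w, half_d = b.width / 2, b.depth / 2
    local = [(-half_w, -half_d), (half_w, -half_d),
             (half_w, half_d), (-half_w, half_d)]
    return [(b.x + c * u - s * v, b.y + s * u + c * v) for u, v in local]


def roof_shadow(b, sun_azimuth_deg, sun_elevation_deg):
    """Outline of the roof's shadow on the table for a given sun position.

    Assumed convention: azimuth is measured clockwise from +y ("north").
    The shadow points away from the sun, and its length is
    height / tan(elevation).
    """
    length = b.height / math.tan(math.radians(sun_elevation_deg))
    az = math.radians(sun_azimuth_deg)
    dx, dy = -length * math.sin(az), -length * math.cos(az)
    return [(x + dx, y + dy) for x, y in footprint(b)]


class Workbench:
    """Couples sensed poses of physical models to the projected graphics."""

    def __init__(self, sun_azimuth_deg=180.0, sun_elevation_deg=45.0):
        self.sun_azimuth_deg = sun_azimuth_deg
        self.sun_elevation_deg = sun_elevation_deg
        self.buildings = {}

    def on_pose_sensed(self, tag, x, y, heading_deg):
        # Moving the physical model is the control: update the digital model,
        # then recompute the intangible representation right away.
        b = self.buildings[tag]
        b.x, b.y, b.heading_deg = x, y, heading_deg
        return self.render()

    def render(self):
        # Shadow outlines that a projector would draw onto the table.
        return {tag: roof_shadow(b, self.sun_azimuth_deg, self.sun_elevation_deg)
                for tag, b in self.buildings.items()}
```

A tracking system would call `on_pose_sensed` whenever a model moves; a tool such as the wind tool could be handled the same way by letting its sensed orientation set a simulation parameter instead of a building pose.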


Intangible  Representation 

Although  the  tangible  representation  allows  the  physical  embod¬ 
iment  to  be  directly  coupled  to  digital  information,  it  has  limited 
ability  to  represent  or  change  many  material  or  physical  proper¬ 
ties.  Unlike  malleable  pixels  on  the  computer  screen,  it  is  very 
hard  to  change  a  physical  object  in  its  form,  position,  or  proper¬ 
ties  (e.g.,  color  or  size)  in  real  time.  In  comparison  with  malleable 
“bits,”  “atoms”  are  extremely  rigid,  taking  up  mass  and  space. 

To  complement  this  limitation  of  rigid  “atoms,”  TUI  also  uti¬ 
lizes  malleable  representations,  such  as  video  projections  and 
sounds,  to  accompany  the  tangible  representations  in  the  same 
space  to  give  dynamic  expression  of  the  underlying  digital  in¬ 
formation  and  computation.  In  the  Urp,  the  digital  shadow  that 
accompanies  the  physical  building  models  is  such  an  example. 

The success of a TUI often relies on a balance and strong
perceptual coupling between the tangible and intangible
representations. It is critical that both tangible and intangible
representations be perceptually coupled to achieve a seamless interface
that actively mediates interaction with the underlying digital
information, and appropriately blurs the boundary between
physical and digital. Coincidence of input and output spaces and
real-time response are important requirements to accomplish
this goal.

Note: There exist certain types of TUIs which have actuation of the
tangible representation (physical objects) as the central means of feedback.
Examples are inTouch (Brave, Ishii, & Dahley, 1998), curlybot (Frei, Su,
Mikhak, & Ishii, 2000), and topobo (Raffle, Parkes, & Ishii, 2004). This
type of force-feedback TUI does not depend on “intangible”
representation since active feedback through the tangible representation serves
as the main display channel.


Key  Properties  of  TUI 

While  Fig.  24.2  illustrates  the  GUI’s  clear  distinction  between 
graphical  representation  and  remote  controls,  the  model  of  TUI 
illustrated  in  Fig.  24.3  highlights  TUI’s  integration  of  physical 
representation  and  control.  This  model  provides  a  tool  for  ex¬ 
amining  the  following  important  properties  and  design  require¬ 
ments  of  tangible  interfaces  (Ullmer  &  Ishii,  2000). 

Computational  coupling  of  tangible  representations 
to  underlying  digital  information  and  computation. 

The central characteristic of tangible interfaces is the coupling
of tangible representations to underlying digital information
and computational models. One of the challenges of TUI design
is how to map physical objects and their manipulation to digital
computation and feedback in a meaningful and comprehensible
manner.

As  illustrated  by  the  Urp  example,  a  range  of  digital  couplings 
and  interpretations  are  possible,  such  as  the  coupling  of  data 
to  the  building  models,  operations  to  the  wind  tool,  and  prop¬ 
erty  modifiers  to  the  material  wand. 

Deciding  the  embodiment  and  mapping  of  the  controller  is 
dictated  by  the  type  of  application  envisioned.  We  give  exam¬ 
ple  cases  in  which  a  range  of  specificity  of  embodiment  is  used. 
In  some  applications,  more  abstract  forms  of  physical  objects 
(such  as  round  pucks)  are  used  as  generic  controllers  that  are 
reusable  to  control  a  variety  of  parameters  by  rotating  and  push¬ 
ing  a  button  (Patten,  Ishii,  Hines,  &  Pangaro,  2001).  When  a 
puck  is  used  as  a  dial  to  control  a  simulation  parameter,  graphi¬ 
cal  feedback  is  given  to  complement  the  information,  such  as 
scale  of  the  dial. 

Embodiment  of  mechanisms  for  interactive  control 
with  tangible  representations.  The  tangible  representa¬ 
tions  of  TUIs  serve  simultaneously  as  interactive  physical  con¬ 
trols.  Tangibles  may  be  physically  inert,  moving  only  as  directly 
manipulated  by  a  user’s  hands.  Tangibles  may  also  be  physically 
actuated,  whether  through  motor-driven  force-feedback  ap¬ 
proaches  (e.g.,  inTouch,  Curlybot)  or  magnet-driven  approaches 
such  as  Actuated  Workbench  (Pangaro,  Maynes-Aminzade,  & 
Ishii,  2002). 

Tangibles  may  be  unconstrained  and  manipulated  in  free 
space  with  six  degrees  of  freedom.  They  may  also  be  weakly 
constrained  through  manipulation  on  a  planar  surface,  or  tightly 
constrained,  as  in  the  movement  of  the  abacus  beads  with  one 
degree  of  freedom. 

In order to make interaction simple and easy to learn, TUI
designers  need  to  utilize  the  physical  constraints  of  the  chosen 
physical  embodiment.  Because  the  physical  embodiment,  to 
some  extent,  limits  the  interaction  choices,  a  designer  must  de¬ 
sign  the  interaction  so  that  the  actions  supported  by  the  object 
are  based  on  well-understood  actions  related  to  the  physical  ob¬ 
ject.  For  example,  if  a  bottle  shape  is  chosen,  then  opening  the 
bottle  by  pulling  out  a  cork  is  a  well-understood  mechanism 
(Ishii,  Mazalek,  &  Lee,  2001).  This  understanding  of  the  cultur¬ 
ally  common  manipulation  techniques  helps  disambiguate  the 
users’  interpretation  of  how  to  interact  with  the  object. 

Perceptual  coupling  of  tangible  representations  to 
dynamic  intangible  representations.  Tangible  interfaces 
rely  on  a  balance  between  tangible  and  intangible  representa¬ 
tions.  Although  embodied  tangible  elements  play  a  central, 
defining  role  in  the  representation  and  control  of  a  TUI,  there 
is a supporting role for the TUI’s intangible representation. A
TUI’s intangible representation, usually graphics and audio,
often mediates much of the dynamic information provided by
the underlying computation.

The real-time feedback of the intangible representation
corresponding to the manipulation of the tangible representation
is critical to ensure perceptual coupling. The coincidence of
input and output spaces (spatial continuity of tangible and
intangible representations) is also an essential requirement to
enhance perceptual coupling. For example, in Urp, the building
models (tangible representation) are always accompanied by a
“digital shadow” (intangible representation) without noticeable
temporal or spatial gaps. This creates the illusion that
the shadows are cast by the building models (rather than by the
video projector).

Genres  of  TUI  Applications 

By giving physical form to digital information to enhance an
experience, TUIs have a wide variety of application domains. This
section gives an overview of seven genres for promising TUI
applications. For a more exhaustive survey of TUIs in a historical
context, I encourage readers to refer to Ullmer and
Ishii (2000), Holmquist, Redström, and Ljungstrand (1999), and
Fishkin (2004). Zuckerman, Arida, and Resnick (2005) also
provided a useful taxonomy and frameworks to analyze the design
space of TUIs.

Tangible  Telepresence 

One such genre is interpersonal communication that takes
advantage of haptic interactions using mediated tangible
representation and control. This genre relies on mapping haptic input
to haptic representations over a distance. Also called “tangible
telepresence,” the underlying mechanism is the synchronization
of distributed objects and the gestural simulation of “presence”
artifacts, such as movement or vibration, which allow remote
participants to convey their haptic manipulations of distributed
physical objects. The effect is to give a remote user the sense
of ghostly presence, as if an invisible person were manipulating
a shared object. InTouch (Brave & Dahley, 1997), Handjive
(Fogg, Cutler, Arnold, & Eisbach, 1998), and ComTouch (Chang,
O’Modhrain, Jacob, Gunther, & Ishii, 2002) are examples of this.

Tangibles  with  Kinetic  Memory 

The use of kinesthetic gestures and movement to promote
learning concepts is another promising domain. Educational
toys to materialize, record, and play concepts have also been
explored using actuation technology and taking advantage of the i/o
coincidence of TUI. Gestures in physical space illuminate the
symmetric mathematical relationships in nature, and the kinetic
motions can be used to teach children concepts relevant to
programming and differential geometry as well as storytelling.
Curlybot (Frei et al., 2000) and topobo (Raffle et al., 2004) are
examples of toys which distill ideas relating gestures and form
to dynamic movement, physics, and storytelling.

Constructive  Assembly 

Another  domain  is  a  constructive  assembly  approach  that 
draws  inspiration  from  LEGO™  and  building  blocks,  building 
upon  the  interconnection  of  modular  physical  elements.  This 
domain  is  mainly  concerned  with  the  physical  fit  between  ob¬ 
jects,  and  the  kinetic  relationships  between  these  pieces  that 
enable  larger  constructions  and  varieties  of  movement. 

Constructive  assembly  was  pioneered  by  Aish  and  Frazer  in 
the  late  1970s.  Aish  developed  BBS  (Aish,  1979;  Aish  &  Noakes, 
1984)  for  thermal  performance  analysis,  and  Frazer  developed  a 
series  of  intelligent  modeling  kits  such  as  “Universal  Construc¬ 
tor”  (Frazer,  1994;  Frazer,  Frazer,  &  Frazer,  1980)  for  modeling 
and  simulation.  Recent  examples  include  GDP  (Anagnostou, 
Dewey,  &  Patera,  1989),  AlgoBlock  (Suzuki  &  Kato,  1993),  Trian¬ 
gles  (Gorbet,  Orth,  &  Ishii,  1998),  Blocks  (Anderson  et  al.,  2000), 
ActiveCube  (Kitamura,  Itoh,  &  Kishino,  2001),  and  System  Blocks 
(Zuckerman & Resnick, 2004). Topobo (Raffle et al., 2004) is a
unique instance that inherits the properties of both “constructive
assembly” and “tangibles with kinetic memory.”

Tokens  and  Constraints 

“Tokens and constraints” is another TUI approach, in which
abstract digital information is operated on using mechanical constraints
(Ullmer, Ishii, & Jacob, 2005). Tokens are discrete, spatially
reconfigurable physical objects that represent digital information
or  operations.  Constraints  are  confining  regions  within  which 
tokens  can  be  placed.  Constraints  are  mapped  to  digital  opera¬ 
tions  or  properties  that  are  applied  to  tokens  placed  within 
their  confines.  Constraints  are  often  embodied  as  physical 
structures  that  mechanically  channel  how  tokens  can  be  ma¬ 
nipulated,  often  limiting  their  movement  to  a  single  physical 
dimension. 
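
The relationship between tokens and constraints can be made concrete with a small data model. The sketch below is illustrative only and is not taken from any of the cited systems: the `Token` and `Constraint` classes, the `concatenate` operation, and the "sequence rack" example are hypothetical names chosen for this example. It assumes a constraint that limits tokens to a single dimension (their order along a rack) and maps that arrangement to one digital operation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Token:
    """A discrete physical object bound to some digital information."""
    name: str
    items: List[str]


@dataclass
class Constraint:
    """A confining region that applies one digital operation to its tokens.

    The only freedom left to the user is the order of tokens along the
    region, which models a one-dimensional mechanical constraint.
    """
    name: str
    operation: Callable[[List[Token]], object]
    capacity: int = 4
    tokens: List[Token] = field(default_factory=list)

    def place(self, token: Token, index: Optional[int] = None) -> None:
        if len(self.tokens) >= self.capacity:
            raise ValueError(f"{self.name} is full")
        if index is None:
            index = len(self.tokens)  # default: next free position
        self.tokens.insert(index, token)

    def remove(self, token: Token) -> None:
        self.tokens.remove(token)

    def result(self):
        # The digital interpretation of the current physical arrangement.
        return self.operation(self.tokens)


def concatenate(tokens: List[Token]) -> List[str]:
    """Example operation: join the media lists in the order of the tokens."""
    return [item for token in tokens for item in token.items]


if __name__ == "__main__":
    rack = Constraint("sequence rack", concatenate)
    rack.place(Token("block A", ["clip1", "clip2"]))
    rack.place(Token("block B", ["photo1"]))
    print(rack.result())
```

Rearranging the tokens in the rack changes the result without any other input, which is the sense in which the physical structure itself channels the interaction.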

The Marble Answering Machine (Crampton Smith, 1995) is
a classic example that influenced much subsequent research.
MediaBlocks (Ullmer, Ishii, & Glas, 1998), Logjam (Cohen, Withgott,
& Piernot, 1999), DataTile (Rekimoto, Ullmer, & Oba, 2001),
and Tangible Query Interface (Ullmer, Ishii, & Jacob, 2003) are
other recent examples of this genre of development.

Interactive Surfaces: Tabletop TUI

Interactive surfaces are another promising approach to
supporting collaborative design and simulation. Many researchers
have explored them in recent years for a variety
of spatial applications (e.g., Urp). On an augmented workbench,
discrete tangible objects are manipulated and their movements
are sensed by the workbench. The visual feedback is provided
on the surface of the workbench, keeping input/output space
coincidence. This genre of TUI is also called “tabletop TUI” or
“tangible workbench.”

Digital Desk (Wellner, 1993) is the pioneering work in this
genre, and a variety of tabletop TUIs were developed using
multiple tangible artifacts within the common frame of a horizontal work
surface. Examples are metaDesk (Ullmer & Ishii, 1997), InterSim
(Arias, Eden, & Fisher, 1997), Illuminating Light (Underkoffler &
Ishii, 1998), Urp (Underkoffler & Ishii, 1999), Build-It
(Rauterberg et al., 1998), Sensetable (Patten et al., 2001), AudioPad
(Patten, Recht, & Ishii, 2002), and IP Network Design Workbench
(Kobayashi, Hirano, Narita, & Ishii, 2003).

One limitation of the above systems is the computer’s inability
to move objects on the interactive surfaces. To address this
problem, the Actuated Workbench was designed to provide a
hardware and software infrastructure for a computer to smoothly
move objects on a table surface in two dimensions (Pangaro
et al., 2002), providing an additional feedback loop for computer
output, and helping to resolve inconsistencies that otherwise
arise from the computer’s inability to move objects on the table.

Continuous  Plastic  TUI 

A fundamental limitation of previous TUIs was the lack of
capability to change the forms of tangible representations during
the interactions. Users had to use predefined finite sets of
fixed-form objects, changing only the spatial relationship among
them but not the form of individual objects themselves.

Instead of using predefined discrete objects with fixed forms,
a new type of TUI system that utilizes continuous tangible material
such as clay and sand was developed for rapid form-giving
and sculpting in landscape design. Examples are Illuminating
Clay (Piper, Ratti, & Ishii, 2002) and SandScape (Ishii, Ratti, Piper,
Wang, Biderman, & Ben-Joseph, 2004). Later this interface was
applied to the browsing of 3D volumetric data in the
Phoxel-Space project (Ratti, Wang, Piper, Ishii, & Biderman, 2004).

Augmented  Everyday  Objects 

Augmentation of familiar everyday objects is an important
design approach of TUI to lower the floor and to make it easy to
understand the basic concepts. Examples are the Audio Notebook
(Stifelman, 1996), musicBottles (Ishii et al., 1999), HandScape
(Lee, Su, Ren, & Ishii, 2000), LumiTouch (Chang, Resner,
Koerner, Wang, & Ishii, 2001), Designers’ Outpost (Klemmer,
Thomsen, Phelps-Goodman, Lee, & Landay, 2002), and I/O
Brush (Ryokai, Marti, & Ishii, 2004). It is a challenge for
industrial designers to improve upon a product by adding some
digital augmentation to an existing digital object. This genre is open
to much eager interpretation by artists and designers, to have
our everyday physical artifacts evolve with technology.

Ambient  Media 

In  the  early  stages  of  TUI  research,  we  were  exploring  ways 
of  improving  the  quality  of  interaction  between  people  and  dig¬ 
ital  information.  We  employed  two  approaches  to  extending 
interaction  techniques  to  the  physical  world: 

•  Allowing  users  to  “grasp  and  manipulate”  foreground  infor¬ 
mation  by  coupling  bits  with  physical  objects; 

•  Enabling  users  to  be  aware  of  background  information  at  the 
periphery  using  ambient  media  in  an  augmented  space. 

At that time, HCI research had been focusing primarily on
foreground activity on the screen and neglecting the rest of the
user’s computing environment (Buxton, 1995). However, in
most situations, people are subconsciously receiving ambient
information from their peripheral senses without attending to
it explicitly. If anything unusual is noticed, it immediately
comes to their attention, and they can decide to bring it to
the foreground. For example, people are subconsciously aware
of the weather outside their window. If they hear thunder or a
sudden rush of wind, they can sense through their peripheral
attention that a storm is on its way. If it is convenient,
they can then look outside, or continue working without
distraction.

Ambient media describes the class of interfaces that is
designed to smooth the transition of the users’ focus of attention
between background and foreground. Natalie Jeremijenko’s
Live Wire in 1995, at Xerox PARC, was a spinning wire that moved
to indicate network traffic. Designing simple and adequate
representations for ambient media using tangible objects is a key
part of the challenge of Tangible Bits (Ishii & Ullmer, 1997).

The ambientROOM is a project that explores the ideas of
ambient media by constructing a special room equipped with
embedded sensors and ambient displays (Ishii et al., 1998). This work
was a preliminary investigation into background/peripheral
interfaces, and led to the design of standalone ambient fixtures
such as Pinwheels and Water Lamps that make users aware of
“digital wind” and “bits of rain” at their peripheral senses
(Dahley, Wisneski, & Ishii, 1998).

Strictly  speaking,  ambient  media  is  not  a  kind  of  TUI  since 
in  many  cases  there  are  no  direct  interactions.  Rather,  ambient 
media  serve  as  background  information  displays  that  comple¬ 
ment  tangible/graspable  media  that  users  manipulate  in  the 
foreground.  TUI’s  approach  to  ambient  media  is  concerned 
with  the  design  of  simple  mappings  that  give  easy-to-under- 
stand  form  to  cyberspace  information  and  represent  change  in 
a  subtle  manner.  We  started  experimenting  with  a  variety  of  am¬ 
bient  media  such  as  sound,  light,  airflow,  and  water  movement 
for  background  interfaces  for  awareness  of  cyberspace  at  the  pe¬ 
riphery  of  human  perception. 

This concept of “ambient media” is now widely studied in
the HCI community as a way to turn architectural/physical
spaces into an ambient and calm information environment.
Another design space is low-attention interfaces for interpersonal
communication through ambient media (Chang et al., 2001).
Ambient Devices further commercialized the domain of
low-attention ambient media interfaces by developing the Ambient
Orb and Weather Beacon, exploring the new genre of
“glanceable interfaces” (http://www.ambientdevices.com/).

TUI Instances

In this section, 10 TUI examples are presented to illustrate the
potential application domains described in the previous section,
and to highlight unique features of TUIs. However, given the
limited space and the rapid growth of TUI research in the HCI
community in recent years, the collection of examples introduced here
can only cover a relatively small portion of the representative
works of TUIs.

InTouch: Tangible Telepresence Through Distributed
Synchronized Physical Objects

InTouch  is  a  project  to  explore  new  forms  of  interpersonal 
communication  over  distance  through  touch  by  preserving  the 
physical  analog  movement  of  synchronized  distributed  rollers 
(Brave & Dahley, 1997; Brave et al., 1998). Force-feedback is
employed to create the illusion that people, separated by distance,
are  interacting  with  a  shared  physical  object.  The  “shared”  object 
provides  a  haptic  link  between  geographically  distributed  users, 
opening  up  a  channel  for  physical  expression  over  distance. 

Two  identical  mechanisms  were  built  with  three  freely  rotat¬ 
ing  rollers  (Photo  24.3).  Each  roller  is  synchronized  to  the  corre¬ 
sponding  roller  on  the  distant  mechanism  using  force-feedback, 
so  that  when  one  roller  is  moved  the  other  corresponding  roller 
also  moves.  If  the  movement  of  one  roller  is  held,  then  the 
roller  transmits  that  resistance  to  the  other  roller.  They  are  in  a 
sense  connected  by  a  stiff  computational  spring.  Two  users  sep¬ 
arated  by  distance  can  then  play,  moving  or  tapping  the  rollers 
or  more  passively  feeling  the  other  person’s  manipulation  of  the 
object.  The  presence  of  the  other  person  is  represented  tangibly 
through  physical  interaction  with  the  inTouch  device. 
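
The “stiff computational spring” can be pictured with a short simulation. The sketch below is not the inTouch controller: it assumes a single pair of identical rollers, a virtual torsional spring and damper between them, and made-up values for stiffness, damping, inertia, and time step. It only illustrates how such a coupling makes one roller follow the other, and how holding one roller is felt as resistance at the other.

```python
from dataclasses import dataclass


@dataclass
class Roller:
    angle: float = 0.0     # radians
    velocity: float = 0.0  # radians per second


def step(a, b, torque_a=0.0, torque_b=0.0, hold_b=False,
         stiffness=40.0, damping=2.0, inertia=0.01, dt=0.001):
    """Advance both rollers by one time step (semi-implicit Euler).

    All physical constants are illustrative assumptions.
    """
    # Torque of the virtual spring and damper, pulling roller a toward b.
    coupling = (stiffness * (b.angle - a.angle)
                + damping * (b.velocity - a.velocity))
    a.velocity += (torque_a + coupling) / inertia * dt
    if hold_b:
        b.velocity = 0.0  # a user's hand keeps roller b from turning
    else:
        b.velocity += (torque_b - coupling) / inertia * dt
    a.angle += a.velocity * dt
    b.angle += b.velocity * dt
    return coupling


def simulate(push_torque=0.05, push_time=0.2, total_time=1.0,
             hold_b=False, dt=0.001):
    """A user pushes roller a for push_time seconds; return both rollers."""
    a, b = Roller(), Roller()
    for i in range(round(total_time / dt)):
        torque_a = push_torque if i * dt < push_time else 0.0
        step(a, b, torque_a=torque_a, hold_b=hold_b, dt=dt)
    return a, b
```

With both rollers free, `simulate()` leaves them turning together at nearly the same angle. With `hold_b=True` and a sustained push, roller `a` deflects only slightly and then stops, because the spring passes the held roller's resistance back to the hand that is pushing.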

Force-feedback is conventionally used to allow a user to
“touch” virtual objects on the computer screen through a single
point. InTouch applies this technology to realize a link for
interpersonal haptic communication, instead of just touching
virtual objects. InTouch allows people to feel as if they are
connected to another person through touching the rollers. Instead
of touching inanimate objects, each person is touching a
dynamic, moving object that is shared.

Important features of inTouch from an HCI point of view can
be summarized as follows:

1. No boundary between “input” and “output” (i/o coincidence:
the wooden rollers are force displays as well as input devices);

2. Principal human input/output organs are hands, not eyes or
ears (with the sense of touch being the primary mode);

3. Information can be sent and received simultaneously through
one’s hand.

Past  communication  media  such  as  video-telephony  set 
themselves  the  ultimate  goal  of  reproducing  the  voice  or  the  im¬ 
age  of  the  human  face  and  body  as  realistically  as  possible  in 
order  to  create  the  illusion  of  “being  there”  for  each  interlocu¬ 
tor.  InTouch  takes  the  opposite  approach  by  making  users 
aware  of  the  other  person  without  ever  rendering  him  or  her 
in  bodily  terms,  and  creating  what  we  call  a  “tangible  presence” 
or  “ghostly  presence.”  By  seeing  and  feeling  an  object  being 
moved  in  a  human  fashion  on  its  own,  we  imagine  a  ghostly 
body.  The  concept  of  the  ghostly  presence  provides  us  with  a 
different  approach  to  the  conventional  notion  of  telepresence. 

Curlybot:  A  Toy  to  Record  and  Play 

Curlybot  is  a  toy  that  can  record  and  play  back  physical  motion 
(Photo  24.4).  As  one  plays  with  it,  it  remembers  how  it  has  been 
moved  and  can  replay  that  movement  with  all  the  intricacies  of 
the  original  gesture;  every  pause,  acceleration,  and  even  the 
shaking  in  the  user’s  hand,  is  recorded.  Curlybot  then  repeats 
that  gesture  indefinitely,  creating  beautiful  and  expressive  pat¬ 
terns.  Children  can  use  curlybot  to  gain  strong  intuition  for  ad¬ 
vanced  mathematical  and  computational  concepts,  like  differ¬ 
ential  geometry,  through  play  outside  of  traditional  computers 
(Frei  et  al.,  2000). 

The force-feedback technology used for real-time
simultaneous communication in inTouch was employed in curlybot for
the recording and playback of gestures. Two motors equipped
with an optical encoder enable free rotation in addition to
forward and backward movement.

When the user presses the button, a red LED is illuminated to
indicate the recording mode. The user then moves the curlybot
around; meanwhile an encoder is recording this gesture
information. Pushing the button a second time terminates
recording and a green LED lights up to indicate the playback mode. The
microprocessor  compares  the  current  position  with  the  stored 
positions  and  instructs  the  motors  to  retrace  the  steps  recorded 
in  the  curlybot’s  memory. 
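
The record and playback cycle just described can be sketched as a small state machine. This is a hypothetical illustration, not curlybot's firmware: the class and method names, the proportional gain, the representation of positions as a pair of wheel encoder counts, and the behavior of a third button press are all assumptions. Recorded positions are stored relative to where recording began, so that each repetition of the gesture continues from where the previous one ended.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    RECORD = "record"      # red LED
    PLAYBACK = "playback"  # green LED


class CurlybotSketch:
    def __init__(self, gain=0.5):
        self.mode = Mode.IDLE
        self.gain = gain     # assumed proportional gain for the motors
        self.offsets = []    # recorded (left, right) counts relative to start
        self.origin = None   # encoder counts when recording began
        self.base = None     # encoder counts where the current replay began
        self.cursor = 0

    def press_button(self):
        if self.mode is Mode.RECORD:
            self.mode = Mode.PLAYBACK
            self.base, self.cursor = None, 0
        else:
            # Assumption: pressing again during playback starts a new recording.
            self.mode = Mode.RECORD
            self.offsets, self.origin = [], None

    def led(self):
        return {Mode.IDLE: "off", Mode.RECORD: "red",
                Mode.PLAYBACK: "green"}[self.mode]

    def tick(self, left, right):
        """Run once per sampling period with the current encoder counts.

        Returns the (left, right) motor commands for this period.
        """
        if self.mode is Mode.RECORD:
            if self.origin is None:
                self.origin = (left, right)
            self.offsets.append((left - self.origin[0], right - self.origin[1]))
            return (0.0, 0.0)  # motors stay passive while the user moves the toy
        if self.mode is Mode.PLAYBACK and self.offsets:
            if self.base is None:
                self.base = (left, right)
            off_l, off_r = self.offsets[self.cursor]
            target = (self.base[0] + off_l, self.base[1] + off_r)
            self.cursor += 1
            if self.cursor == len(self.offsets):
                # End of the gesture: repeat it, starting from this end point.
                self.cursor, self.base = 0, target
            # Compare the current position with the stored one and correct.
            return (self.gain * (target[0] - left),
                    self.gain * (target[1] - right))
        return (0.0, 0.0)
```

A real device would also need timing, motor limits, and sensor filtering; the sketch keeps only the compare-and-correct loop and the two LED-indicated modes.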


PHOTO  24.4.  Curlybot. 

This project contributes to both interface design and
education. As a tangible interface it blurs the boundary between
input and output, as inTouch does. Curlybot itself is both an
input device to record gestures and a physical display device to
reenact them. By allowing the user to teach it gestures with his
or her hand and body, and then reenacting those gestures in a
physical space around the body, curlybot enables a strong
connection between body and mind not obtainable from anything
expressed on a computer screen.

From  an  educational  standpoint,  curlybot  allows  very  young 
children  to  explore  “advanced”  mathematical  and  computa¬ 
tional  concepts.  Curlybot  supports  new  ways  of  thinking  about 
geometric  shapes  and  patterns.  Children  can  also  use  curlybot 
to  explore  some  of  the  basic  ideas  behind  computational  pro¬ 
cedures,  like  how  complexity  can  be  built  from  simple  parts. 
This  is  similar  to  what  is  possible  with  the  Logo  programming 
language,  but  does  not  require  children  to  read  or  write  and 
thus  makes  advanced  ideas  accessible  to  younger  children. 
Curlybot  also  draws  strongly  on  children’s  intuition  about  their 
own  physical  actions  in  the  world  to  learn — what  Seymour  Pa- 
pert  called  “body  syntonic  learning”  (Papert,  1980).  In  addition, 
the  direct  input  and  beautifully  expressive  patterns  that  result 
through  curlybot’s  repetition  of  the  gestures  keep  children  play¬ 
ing  and  engaged. 

Topobo: 3D Constructive Assembly With Kinetic Memory

Topobo,  a  combination  of  “topology”  and  “robotics,”  is  a  3D 
constructive  assembly  system  with  kinetic  memory,  and  the  abil¬ 
ity  to  record  and  play  back  physical  motion  (Raffle  et  al.,  2004). 
By  snapping  together  a  combination  of  passive  (static)  and  ac¬ 
tive  (motorized)  components,  people  can  quickly  assemble  dy¬ 
namic  biomorphic  forms  like  animals  and  skeletons  with 
Topobo.  Topobo  allows  users  to  animate  those  forms  by  record¬ 
ing  the  movement  of  pushing,  pulling,  and  twisting  them,  and 
later  observe  the  system  play  back  those  motions  repeatedly. 
This “record and play” function was inherited from the prior
curlybot project, and the constructive assembly function was
inherited from the commercial toy, Zoob™.

For  example,  a  dog  can  be  constructed  and  then  taught  to 
gesture  and  walk  by  twisting  its  body  and  legs.  The  dog  will  then 
repeat  those  movements  and  walk  repeatedly.  The  same  way 
people  can  learn  about  static  structures  playing  with  regular 
building  blocks,  they  can  learn  about  dynamic  structures  by 
playing  with  Topobo.  Topobo  works  like  an  extension  of  the 
body,  giving  one  gestural  fluency.  Topobo  embeds  computation 
within  a  dynamic  building  system  so  that  gestural  manipulation 
of  the  material  becomes  a  programming  language  (Photo  24.5). 

Topobo  was  inspired  by  current  trends  in  computational 
media  design  and  by  artists  and  empiricists  using  visual  explo¬ 
rations  and  models  of  natural  phenomena  to  more  deeply  ap¬ 
preciate  patterns  found  in  the  natural  world.  In  this  spirit, 
Topobo  is  designed  to  allow  people  to  use  experimentation, 
play,  and  self-expression  to  discover  and  explore  common  nat¬ 
ural  relationships  between  natural  forms  and  dynamic  motion. 
Building toys and educational manipulatives have been used
for years by children to learn about the world through model
making.


PHOTO  24.5.  Topobo. 


Unique  among  modeling  systems  is  Topobo’s  coincident 
physical  input  and  output  behaviors  (which  is  also  common  to 
inTouch  and  curlybot).  The  system  is  comprised  of  10  differ¬ 
ent  primitives  that  can  be  snapped  together  in  a  variety  of  ways. 
Nine  of  these  primitives  are  called  “passive”  because  they  form 
static  connections.  These  static  connections  constrain  the  form 
and  the  range  of  motion  available  to  the  structure.  One  “active” 
primitive  is  built  with  an  embedded  encoder  and  motor  which 
is  programmed  by  demonstration.  These  motorized  compo¬ 
nents  are  the  only  ones  that  move,  so  the  system  is  able  to 
faithfully  record  and  replay  every  dynamic  manipulation  to  the 
structure. 
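
The record-and-play behavior of an active primitive can be sketched in a few lines. This is only an illustrative model, not Topobo's firmware: the `ActiveJoint` class, its method names, and the idea of storing sampled encoder angles in a list are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import cycle, islice
from typing import List


@dataclass
class ActiveJoint:
    """Hypothetical model of one motorized primitive (encoder plus motor)."""

    recording: List[float] = field(default_factory=list)

    def record(self, encoder_angles: List[float]) -> None:
        # Store the angles sampled while the user twists the joint by hand.
        self.recording = list(encoder_angles)

    def playback(self, n_samples: int) -> List[float]:
        # Replay the stored motion in a loop as a series of motor targets.
        if not self.recording:
            return []
        return list(islice(cycle(self.recording), n_samples))


joint = ActiveJoint()
joint.record([0.0, 15.0, 30.0, 15.0])  # angles in degrees (assumed unit)
targets = joint.playback(6)            # the gesture repeats once it ends
```

Because the same joint is both the sensor and the actuator, the recorded list is the whole "program"; no separate authoring step is needed.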

mediaBlocks:  Token  and  Constraint  Approach 

The  mediaBlocks  system  is  a  tangible  interface  for  manipu¬ 
lating  lists  of  online  digital  media  such  as  video  clips  and  images 
(Ullmer  et  al.,  1998).  Whereas  Urp  provides  a  spatial  interface  for 
leveraging  object  arrangements  consistent  with  real-world  build¬ 
ing  configurations,  the  mediaBlocks  system  provides  a  relational 
interface  for  manipulating  more  abstract  digital  information. 

The  mediaBlocks  are  small,  digitally  tagged  blocks,  dynami¬ 
cally  bound  to  lists  of  online  media  elements.  The  mediaBlocks 
support  two  major  modes  of  use.  First,  they  function  as  capture, 
transport,  and  playback  mechanisms  for  moving  online  media 
between  different  media  devices.  In  this  mode,  conference 
room  cameras,  digital  whiteboards,  wall  displays,  printers,  and 
other  devices  are  outfitted  with  mediaBlock  slots.  Inserting  one 
of the mediaBlocks into the slot of a recording device (e.g., a
camera) activates the recording of media into online space, and
the  dynamic  binding  of  the  media  to  the  physical  block. 

Similarly,  inserting  one  of  the  bound  mediaBlocks  into  a  play¬ 
back  device  (e.g.,  video  display)  activates  playback  of  the  asso¬ 
ciated  online  media.  Inserting  mediaBlocks  into  slots  mounted 
on  computer  monitors  provides  an  intermediate  case,  allowing 
mediaBlock  contents  to  be  exchanged  bi-directionally  with  tra¬ 
ditional  computer  applications  using  the  GUI  drag-and-drop 
operation. 

The  second  functionality  of  mediaBlocks  uses  the  blocks  as 
physical  controls  on  a  media-sequencing  device  (Photo  24.6). 
A  mediaBlock  “sequence  rack”  (partially  modeled  after  the  tile 
racks  of  the  Scrabble  game)  allows  the  media  contents  of  mul¬ 
tiple  adjacent  mediaBlocks  to  be  dynamically  bound  to  a  new 
mediaBlock  carrier.  Similarly,  a  second  “position  rack”  maps  the 
physical  position  of  a  block  to  an  indexing  operation  upon  its 
contents.  When  mediaBlocks  are  positioned  on  the  left  edge  of 
the  position  rack,  the  first  media  element  of  the  block  is  se¬ 
lected.  Intermediate  physical  positions  on  the  rack  provide  ac¬ 
cess  to  later  elements  in  the  associated  media  list  of  the  block. 
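
The binding, sequencing, and indexing operations described above can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the `bindings` registry, the function names, and the linear mapping from rack position to list index are hypothetical, since the text only says that the left edge selects the first element and later positions select later elements.

```python
from typing import Dict, List

# Hypothetical registry: block ID -> list of online media references.
bindings: Dict[str, List[str]] = {}


def bind(block_id: str, media: List[str]) -> None:
    """Dynamically bind a list of media elements to a physical block."""
    bindings[block_id] = list(media)


def sequence(rack_blocks: List[str], carrier_id: str) -> None:
    """Sequence rack: concatenate adjacent blocks, left to right, into a carrier."""
    combined: List[str] = []
    for block_id in rack_blocks:
        combined.extend(bindings.get(block_id, []))
    bindings[carrier_id] = combined


def select_by_position(block_id: str, position: float) -> str:
    """Position rack: 0.0 is the left edge, 1.0 the right edge (assumed scale)."""
    media = bindings[block_id]
    if not media:
        raise ValueError("block has no bound media")
    clamped = min(max(position, 0.0), 1.0)
    index = min(int(clamped * len(media)), len(media) - 1)
    return media[index]


bind("A", ["clip1", "clip2"])
bind("B", ["image1"])
sequence(["A", "B"], "C")
```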


PHOTO 24.6. mediaBlocks: media sequencing device. Labeled parts:
sequence rack, sequence stack vis, delete chute, target stack vis,
perspective wall, blank stack, position wheel, target pad, position
rack.


PHOTO  24.7.  AudioPad  running  on  Sensetable  platform. 


Digital  Desk:  Pioneer  of  Tabletop  TUI 

Digital  Desk  (Wellner,  1993)  is  a  pioneering  work  to  demon¬ 
strate  a  way  to  integrate  physical  and  digital  document  processing 
on  a  table.  Wellner  brought  some  of  the  functionality  we  typically 
associate  with  GUIs  onto  the  physical  desktop.  This  table  used  a 
camera  and  a  microphone  to  detect  finger  presses  on  a  graphi¬ 
cal  interface  displayed  on  a  desk  with  a  video  projector.  Wellner 
used  this  desk  for  tasks  such  as  graphic  design  and  spreadsheet 
computations  on  physical  paper.  This  system  also  employed 
some  physical  props,  such  as  a  scanner  that  would  scan  items  and 
place  them  directly  on  the  tabletop  interaction  surface. 

Wellner’s  research  pointed  the  way  toward  enabling  the 
computer  to  perform  some  of  the  operations  we  traditionally 
associate  with  GUIs  in  a  tabletop  environment.  The  Digital  Desk 
also  illustrates  some  of  the  compelling  reasons  for  considering 
computer  interfaces  based  on  horizontal  interactive  surfaces. 
Because  many  work  surfaces  in  our  environment  are  already 
planar,  horizontal  or  nearly  horizontal  surfaces,  integrating  com¬ 
puter  interfaces  into  these  surfaces  may  provide  an  opportunity 
for  new  types  of  relationships  between  computation  and  physi¬ 
cal  objects,  and  may  help  create  computer  systems  that  are 
more  relevant  to  problem  domains  with  established  work  prac¬ 
tices  based  on  tabletops. 

The  Digital  Desk  inspired  many  tabletop  tangible  interfaces 
including  the  Luminous  Room  project  (Underkoffler,  Ullmer,  & 
Ishii,  1999)  from  which  Urp  (Underkoffler  &  Ishii,  1999)  was  cre¬ 
ated.  Sensetable  (Patten  et  al.,  2001)  is  another  example. 


Sensetable  and  AudioPad:  Tabletop  TUI 
for  Real-Time  Music  Performance 

Sensetable (Patten et al., 2001) is a system that wirelessly
tracks the positions of multiple objects on a flat display surface.
The Sensetable serves as a common platform for a variety of
tabletop TUI applications such as Audiopad and the IP Network
Design Workbench.

Audiopad  (Patten  et  al.,  2002)  is  a  composition  and  perfor¬ 
mance  instrument  for  electronic  music  which  tracks  the  posi¬ 
tions  of  objects  on  a  tabletop  surface  and  converts  their  motion 
into  music.  One  can  pull  sounds  from  a  giant  set  of  samples,  jux¬ 
tapose  archived  recordings  against  warm  synthetic  melodies, 
cut  between  drum  loops  to  create  new  beats,  and  apply  digital 
processing  all  at  the  same  time  on  the  same  table.  Audiopad  not 
only  allows  for  spontaneous  reinterpretation  of  musical  com¬ 
positions,  but  also  creates  a  visual  and  tactile  dialogue  between 
itself,  the  performer,  and  the  audience. 

Audiopad  is  based  on  the  Sensetable  platform  that  has  a  ma¬ 
trix  of  antenna  elements  which  track  the  positions  of  electroni¬ 
cally  tagged  objects  on  a  tabletop  surface.  Software  translates 
the  position  information  into  music  and  graphical  feedback  on 
the tabletop. Each object represents either a musical track or a
microphone (Photo 24.7).

Experience with Audiopad through a series of live performances
suggests that interacting with electromagnetically tracked objects
on a tabletop surface with graphical feedback can be a powerful and
satisfying tool for musical expression. The integration of input and
output spaces gives the performer a great deal of flexibility in
terms of the music that can be produced. At the same time, this
seamless integration allows the performer to focus on making music,
rather than on using the interface. The spatially multiplexed inputs
of the TUI also allowed two performers to play music simultaneously
and collaboratively (Photo 24.7).

PHOTO 24.8. IP network design workbench running on Sensetable
platform.

IP Network Design Workbench:
Event Driven Simulation on Sensetable

The IP Network Design Workbench (IPNWDWB) is a collaborative
project between NTT Comware and the Tangible Media Group. It
supports collaborative network design and simulation by a group of
experts and customers (Kobayashi et al., 2003). This system is
also based on the Sensetable platform, which can wirelessly detect
the location and orientation of physical pucks. The simulation
engine is based on the event-driven simulation model. Using
the Sensetable system, users can directly manipulate network
topologies for modeling, control simulation parameters of
nodes and links using physical pucks on the sensing table, and
simultaneously see the simulation results projected onto the
table in real time (Photo 24.8).
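
For readers unfamiliar with the event-driven simulation model, the following minimal sketch shows the core idea: a time-ordered event queue drives all state changes. It is not the IPNWDWB engine; the single-link FIFO model, the `simulate_link` function, and its parameters are assumptions chosen for illustration.

```python
import heapq
from collections import deque
from typing import Dict, List

ARRIVAL, DEPARTURE = 0, 1


def simulate_link(arrival_times: List[float], service_time: float) -> List[float]:
    """Return each packet's departure time on one FIFO link."""
    # Event queue ordered by (time, kind, packet id).
    events = [(t, ARRIVAL, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(events)
    waiting: deque = deque()
    busy = False
    departures: Dict[int, float] = {}

    while events:
        now, kind, pkt = heapq.heappop(events)
        if kind == ARRIVAL:
            if busy:
                waiting.append(pkt)  # link occupied: queue the packet
            else:
                busy = True
                heapq.heappush(events, (now + service_time, DEPARTURE, pkt))
        else:
            departures[pkt] = now
            if waiting:
                nxt = waiting.popleft()
                heapq.heappush(events, (now + service_time, DEPARTURE, nxt))
            else:
                busy = False

    return [departures[i] for i in range(len(arrival_times))]


result = simulate_link([0.0, 1.0, 5.0], service_time=2.0)
```

In a tabletop setting, a parameter such as `service_time` is the kind of value a puck rotation might adjust, with the simulation rerun and reprojected after each change.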

The goal of IPNWDWB is to make simulation tools more accessible
to non-experts, so that they can join the network design process
and interact with experts more easily than with a
traditional GUI computer. This system was commercialized and
has  been  used  for  collaborative  network  design  with  customers 
to  ensure  their  understanding  of  the  performance  and  cost  of 
network  enhancements  dealing  with  the  increases  of  network 
traffic  caused  by  Voice  over  IP  and/or  streaming  video,  for  ex¬ 
ample.  Because  of  the  large  tiling  horizontal  work  surface  and 
TUI  interaction  that  invites  all  the  participants  to  touch  and  ma¬ 
nipulate  pucks  simultaneously,  the  process  of  decision  making 
becomes  much  more  democratic  and  more  convincing  than  or¬ 
dinary  PowerPoint  presentations  through  conventional  GUI. 

If  we  compare  IPNWDWB  with  Urp,  we  notice  a  big  differ¬ 
ence  in  the  nature  of  applications.  In  Urp,  we  used  physical 
scale  models  of  buildings,  which  humans  have  used  for  thou¬ 
sands  of  years  to  design  cities,  as  tangible  representations  of  ur¬ 
ban  models.  Therefore,  it  is  very  natural  to  apply  TUIs  to  such 
domains  (e.g.,  urban  planning,  landscape  design)  in  which 
physical  models  have  been  used  long  before  the  birth  of  digital 
computers. 

In contrast, IP Network Design is based on event-driven
simulation models, which are quite abstract and new. This modeling
technique requires digital computers. IPNWDWB is important
since it demonstrated that TUI can empower the design process
even in abstract, computational application domains that do not
have straightforward mappings from abstract concepts to physical
objects. There is a wide range of modeling and simulation
techniques, such as System Dynamics and Event-Driven Simulation,
that use 2D graph representations. We learned that many of these
abstract computational applications can be supported by
Sensetable-like TUI platforms in collaborative design sessions.
For example, simultaneously changing parameters, transferring
control between different people or different hands, and
distributing the adjustment of simulations dynamically are
interactions enabled by TUI.

PHOTO 24.9. Actuated workbench used for distributed collaboration.

Actuated  Workbench:  Closing  a  Loop 
of  Computational  Actuation  and  Sensing 

The  aforementioned  tabletop  TUI  systems  share  a  common 
weakness.  While  input  occurs  through  the  physical  manipula¬ 
tion  of  tangible  objects,  output  is  displayed  only  through  sound 
or  graphical  projection  on  and  around  the  objects.  As  a  result, 
the objects can feel like loosely coupled handles to digital
information rather than physical manifestations of the information
itself.

In  addition,  the  user  must  sometimes  compensate  for  in¬ 
consistencies  when  links  between  the  digital  data  and  the  phys¬ 
ical  objects  are  broken.  Such  broken  links  can  arise  when  a 
change  occurs  in  the  computer  model  that  is  not  reflected  in  a 
physical  change  of  its  associated  object.  With  the  computer  sys¬ 
tem  unable  to  move  the  objects  on  the  table  surface,  it  cannot 
undo  physical  input,  correct  physical  inconsistencies  in  the  lay¬ 
outs  of  the  objects,  or  guide  the  user  in  the  physical  manipula¬ 
tion  of  the  objects.  As  long  as  this  is  so,  the  physical  interaction 
between  human  and  computer  remains  one-sided. 

To  address  this  problem,  the  Actuated  Workbench  was  de¬ 
signed  to  provide  a  hardware  and  software  infrastructure  for  a 
computer  to  smoothly  move  objects  on  a  table  surface  in  two  di¬ 
mensions  (Pangaro  et  al.,  2002). 

The  Actuated  Workbench  is  a  new  technology  that  uses  mag¬ 
netic  forces  to  move  objects  on  a  table  in  two  dimensions.  It  is 
intended  for  use  with  existing  tabletop  tangible  interfaces,  pro¬ 
viding  an  additional  feedback  loop  for  computer  output,  and 
helping  to  resolve  inconsistencies  that  otherwise  arise  from  the 
computer’s  inability  to  move  objects  on  the  table. 

Actuation  enables  a  variety  of  new  functions  and  applica¬ 
tions.  For  example,  a  search  and  retrieve  function  could  respond 
to  a  user  query  by  finding  matching  data  items  and  either  mov¬ 
ing  them  to  another  place  on  the  tabletop  or  wiggling  them  to 
get  the  user’s  attention.  A  more  powerful  function  would  be 
one  in  which  the  computer  could  physically  sort  and  arrange 
pucks  on  the  table  according  to  user-specified  parameters.  This 
could  help  the  user  organize  a  large  number  of  data  items  be¬ 
fore  manually  interacting  with  them.  As  a  user  makes  changes 
to  data  through  physical  input,  he  or  she  may  wish  to  undo 
some  changes.  A  physical  undo  in  this  system  could  move  the 
pucks  back  to  their  positions  before  the  last  change.  It  could 
also show the user the exact sequence of movements he or she had
performed.  In  this  sense,  both  “undo”  and  “rewind”  commands 
are  possible. 
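
The "undo" and "rewind" commands can be pictured as operations on a history of puck layouts. The sketch below is an assumption-laden illustration, not the Actuated Workbench software: the `PuckHistory` class and its methods are hypothetical, and the returned layouts stand in for the target positions that an actuation layer would move the pucks to.

```python
from typing import Dict, List, Tuple

Layout = Dict[str, Tuple[float, float]]  # puck ID -> (x, y) on the table


class PuckHistory:
    """Hypothetical log of table layouts supporting undo and rewind."""

    def __init__(self, initial: Layout) -> None:
        self._states: List[Layout] = [dict(initial)]

    def commit(self, layout: Layout) -> None:
        # Called after each physical change made by the user.
        self._states.append(dict(layout))

    def undo(self) -> Layout:
        # Drop the latest change; the result is where the pucks should go.
        if len(self._states) > 1:
            self._states.pop()
        return dict(self._states[-1])

    def rewind(self) -> List[Layout]:
        # Layouts from newest to oldest, for replaying the session backward.
        return [dict(state) for state in reversed(self._states)]


history = PuckHistory({"a": (0.0, 0.0)})
history.commit({"a": (1.0, 0.0)})
history.commit({"a": (1.0, 2.0)})
steps_back = history.rewind()
previous = history.undo()
```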


One  advantage  that  tabletop  tangible  user  interfaces  offer  is 
the  ease  with  which  multiple  users  can  make  simultaneous 
changes  to  the  system.  Users  can  observe  each  other’s  changes, 
and  any  user  can  reach  out  and  physically  change  the  shared  lay¬ 
out  without  having  to  grab  a  mouse  or  other  pointing  device. 
This  is  not  the  case,  however,  when  users  are  collaborating  re¬ 
motely.  In  this  scenario,  a  mechanism  for  physical  actuation  of 
the  pucks  becomes  valuable  for  synchronizing  multiple,  physi¬ 
cally  separated  workbench  stations  (Photo  24.9).  Without  such 
a  mechanism,  real-time  physical  synchronization  of  the  two  ta¬ 
bles  would  not  be  possible,  and  inconsistencies  could  arise  be¬ 
tween  the  graphical  projection  and  the  physical  state  of  the 
pucks  on  the  table. 

In  addition  to  facilitating  the  simple  synchronization  of  these 
models,  the  Actuated  Workbench  can  recreate  remote  users’  ac¬ 
tual  gestures  with  objects  on  the  table,  adding  greatly  to  the 
“Ghostly  Presence”  (Brave  et  al.,  1998)  sought  in  remote-col¬ 
laboration  interfaces. 

Actuated  Workbench  is  helpful  in  teaching  students  about 
physics  by  demonstrating  the  attraction  and  repulsion  of 
charged  particles  represented  by  pucks  on  the  table.  As  a  stu¬ 
dent  moves  the  pucks  around  on  the  table,  the  system  could 
make  them  rush  together  or  fly  apart  to  illustrate  forces  be¬ 
tween  the  objects. 
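
As an illustration of the physics such a demonstration would rely on, the force between two charged pucks follows Coulomb's law. The function below is a sketch in arbitrary units; the name `coulomb_force`, the default constant `k=1.0`, and the 2D point representation are assumptions, and the text does not describe how the workbench itself computes puck motion.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def coulomb_force(q1: float, q2: float, p1: Point, p2: Point,
                  k: float = 1.0) -> Tuple[float, float]:
    """Force vector on puck 1 due to puck 2 (arbitrary units, 2D)."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    r = math.hypot(dx, dy)
    if r == 0:
        raise ValueError("pucks cannot occupy the same point")
    # Like charges give a positive magnitude, pushing puck 1 away from puck 2.
    magnitude = k * q1 * q2 / r ** 2
    return (magnitude * dx / r, magnitude * dy / r)


repel = coulomb_force(1.0, 1.0, (0.0, 0.0), (2.0, 0.0))
attract = coulomb_force(1.0, -1.0, (0.0, 0.0), (2.0, 0.0))
```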

SandScape:  Continuous  TUI  for  Landscape  Design 

SandScape  (Ishii  et  al.,  2004)  is  a  tangible  interface  for  de¬ 
signing  and  understanding  landscapes  through  a  variety  of  com¬ 
putational  simulations  using  sand.  Users  view  these  simulations 
as  they  are  projected  on  the  surface  of  a  sand  model  that  repre¬ 
sents  the  terrain.  The  users  can  choose  from  a  variety  of  differ¬ 
ent  simulations  that  highlight  the  height,  slope,  contours,  shad¬ 
ows,  drainage,  or  aspect  of  the  landscape  model  (Photo  24.10). 

The  users  can  alter  the  form  of  the  landscape  model  by  ma¬ 
nipulating  sand  while  seeing  the  resultant  effects  of  computa¬ 
tional  analysis  generated  and  projected  on  the  surface  of  sand  in 
real  time.  The  project  demonstrates  how  TUI  takes  advantage  of 
our  natural  ability  to  understand  and  manipulate  physical  forms 
while  still  harnessing  the  power  of  computational  simulation  to 
help  in  our  understanding  of  a  model  representation. 

PHOTO 24.10. SandScape.

The SandScape configuration is based on a box containing
1 mm diameter glass beads lit from beneath with an array of 600
high-power infrared LEDs. Four IR mirrors are placed around
the LED array to compensate for the uneven radiance distribution
on the boundary. A monochrome infrared camera is mounted
2 m above the surface of the beads and captures the intensity
of light passing through the volume. The intensity of transmitted
light is a function of the depth of the beads, and a look-up table
can be used to convert surface radiance values into surface
elevation values. The system has been calibrated to work with a
specific bead size, and the optical properties of the material used
(absorption and scattering coefficients) are critical to its
successful functioning. Owing to the exponential decay of the IR
light passing through the glass beads (or any other material), the
intensity at the top surface can vary greatly and sometimes
exceed the dynamic range of the video camera. This problem can
be solved by taking several images with different exposure times
and combining them to recover the effective radiance of the
scene. SandScape is less accurate than its predecessor
Illuminating Clay, which used laser rangefinders to capture the
geometry of a clay model (Piper et al., 2002).
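
The two processing steps described here, merging exposures and converting radiance to depth, can be sketched as follows. This is a simplified illustration, not the SandScape pipeline: it assumes a linear camera response, an 8-bit saturation value of 255, and a closed-form exponential decay I = I0 * exp(-mu * d) in place of the calibrated look-up table. The function names and parameters are hypothetical.

```python
import math
from typing import List, Optional


def merge_exposures(pixel_values: List[int], exposure_times: List[float],
                    saturated: int = 255) -> Optional[float]:
    """Estimate one pixel's radiance from several exposures.

    Saturated and zero readings are skipped; the rest are normalized
    by exposure time and averaged.
    """
    estimates = [value / time
                 for value, time in zip(pixel_values, exposure_times)
                 if 0 < value < saturated]
    if not estimates:
        return None
    return sum(estimates) / len(estimates)


def depth_from_radiance(radiance: float, source_radiance: float,
                        attenuation: float) -> float:
    """Invert I = I0 * exp(-mu * d) to recover the bead depth d."""
    return -math.log(radiance / source_radiance) / attenuation


radiance = merge_exposures([255, 100, 50], [4.0, 2.0, 1.0])
depth = depth_from_radiance(10.0 * math.exp(-2.0), 10.0, attenuation=0.5)
```

In practice a measured look-up table absorbs effects such as scattering that the closed form ignores, which is presumably why calibration for a specific bead size matters.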

SandScape  and  Illuminating  Clay  show  the  potential  advan¬ 
tages  of  combining  physical  and  digital  representations  for  land¬ 
scape  modeling  and  analysis.  The  physical  clay  and  sand  models 
convey  spatial  relationships  that  can  be  intuitively  and  directly 
manipulated  by  hand.  Users  can  also  insert  any  found  physical  ob¬ 
jects  directly  under  the  camera.  This  approach  allows  users  to 
quickly  create  and  understand  highly  complex  topographies  that 
would  be  difficult  and  time-consuming  to  produce  with  conven¬ 
tional  CAD  tools.  We  believe  that  this  “Continuous  TUI”  approach 
makes  better  use  of  our  natural  abilities  to  discover  solutions 
through  the  manipulation  of  physical  objects  and  materials. 

At  the  same  time,  the  projected  graphics  give  the  user  real¬ 
time  feedback.  While  tracked  physical  models  interfaced  with  a 
computer  are  not  a  novelty,  we  believe  that  SandScape  and  Illu¬ 
minating  Clay  offer  a  new  contribution,  by  using  the  continuous 
surface  geometry  of  the  model  itself  to  act  as  the  input/output 
mechanism.  In  so  doing  we  hope  to  give  the  projected  infor¬ 
mation  the  same  tangible  immediacy  as  the  clay/sand  material  it¬ 
self  and  allow  quantitative  data  to  support  the  intuitive  under¬ 
standing  of  the  landscape. 

Landscape  architecture,  as  well  as  urban  and  architectural 
design,  requires  the  collaboration  of  a  number  of  specialists. 
These  include  earth  engineers,  water  engineers,  agrarian  man¬ 
agers,  land  economists,  and  transport  engineers — to  name  just 
a  few.  In  the  current  process  of  design,  the  collaboration  hap¬ 
pens  at  different  stages,  and  sometimes  without  much  direct 
and  synchronous  interaction.  SandScape  and  Illuminating  Clay 
provide  a  common  platform  for  collaboration,  centered  on  the 
table  workspace.  Numerous  representations  and  analyses  can 
be  combined  in  a  single  design  environment,  potentially  offer¬ 
ing  a  greater  cohesion  between  different  specialists  and  stream¬ 
lining  the  process  of  design. 


musicBottles:  Transparent  Interface  Based 
on  Augmented  Glass  Bottles 

musicBottles  introduces  a  tangible  interface  that  deploys 
bottles as containers and controls for digital information (Photo
24.11). The system consists of a specially designed table and
three  corked  bottles  that  “contain”  the  sounds  of  the  violin,  the 
cello,  and  the  piano  in  Edouard  Lalo’s  Piano  Trio  in  C  Minor,  Op. 
7.  Custom-designed  electromagnetic  tags  embedded  in  the  bot¬ 
tles  enable  each  one  to  be  wirelessly  identified. 

When a bottle is placed onto the stage area of the table,
the system identifies it and lights up the stage to
show that the bottle has been recognized. The opening and
closing of a bottle is also detected, and as the cork is removed,
the corresponding instrument becomes audible. A pattern of
colored light is rear-projected onto the table’s translucent
surface to reflect changes in pitch and volume for each instrument.
The  interface  allows  users  to  structure  the  experience  of  the 
musical  composition  by  physically  manipulating  the  different 
soundtracks. 

Humans  have  used  glass  bottles  for  thousands  of  years. 
Through  the  seamless  extension  of  physical  affordances  and 
metaphors  of  the  bottles  into  the  digital  world,  the  bottles  pro¬ 
ject  explores  the  transparency  of  the  interface  (Ishii,  2004). 

A  wide  variety  of  contents,  including  music,  weather  reports, 
poems,  and  stories  have  been  designed  to  test  the  concept  (Ishii 
et  al.,  1999).  The  bottles  lined  up  on  a  specially  designed  table, 
the  feel  of  the  glass  as  we  open  them,  and  the  music  and  light 
from  the  LED  lamps  that  come  out  of  them  together  create  a 
unique  aesthetic  experience.  This  is  a  pleasure  not  to  be  had 
from  the  mere  click  of  a  mouse. 

Potential  applications  are  not  limited  to  music  alone.  One 
might  imagine  perfume  bottles  filled  with  poetry  or  wine  bottles 
that  decant  stories  (Mazalek,  Wood,  &  Ishii,  2001).  More  practi¬ 
cal  applications  might  include  a  medicine  chest  full  of  bottles 
that tell the user how and when to take them and let the hospital
know when they do. As an intimate part of our daily lives, glass
bottles offer a simple and transparent interface.

PHOTO 24.11. musicBottles.

Pinwheels: Ambient Interface to Information

Pinwheels  are  an  example  of  ambient  media  that  demon¬ 
strate  a  new  approach  to  interfacing  people  with  online  digital 
information  through  subtle  changes  in  sound  and  movement, 
which can be processed in the background of awareness.
Pinwheels spin in a “bit wind” and represent an invisible flow of
digital information, such as network traffic, as physical movement
within an architectural space (Photo 24.12).

Nature  is  filled  with  subtle,  beautiful,  and  expressive  ambient 
media  that  engage  each  of  our  senses.  The  sounds  of  rain  and 
the  feeling  of  warm  wind  on  our  cheeks  help  us  understand  and 
enjoy  the  weather  even  as  we  engage  in  other  activities.  Simi¬ 
larly,  we  are  aware  of  the  activity  of  neighbors  through  passing 
sounds  and  shadows  at  the  periphery  of  our  attention.  Cues  like 
an  open  door  or  lights  in  an  office  help  us  subconsciously  un¬ 
derstand  the  activities  of  other  people,  and  communicate  our 
own  activity  and  availability. 

Current  personal  computing  interfaces,  however,  largely  ig¬ 
nore  these  rich  ambient  spaces,  and  squeeze  vast  amounts  of 
digital  information  into  small  rectangular  screens.  Information  is 
presented  as  “painted  bits”  (pixels)  on  flat  screens  that  must  be 
in the center (foreground) of a user’s focus to be processed. To
broaden the concept of “display” so that it makes use of the
entire physical environment as an interface, ambient media can
manifest information as subtle changes in form,
movement, sound, color, smell, temperature, or light. We call
these “ambient displays.”

The Pinwheels evolved from the idea of using airflow as ambient
media. However, we found that the flow of air itself was
difficult to control and to use for conveying information. As an
alternative, we envisioned that a visual/physical representation of
airflow based on “spinning pinwheels” could be legible and poetic.


PHOTO  24.12.  Pinwheels. 


The  Pinwheels  spin  in  the  “bit  wind”  at  different  speeds  based 
upon  their  input  information  source. 

Ambient displays are envisioned as being all around and
suited to the display of:

1. People’s presence (awareness of remote people’s status/
activities);

2. Atmospheric and astronomical phenomena;

3. General states of large and complex systems (e.g., an atomic
power plant).

For  instance,  an  atmospheric  scientist  might  map  patterns 
of  solar  wind  into  patterns  of  Pinwheel  spins  in  a  room. 

There are many design challenges surrounding ambient displays.
One of them is the mapping of information to physical
motion and other ambient media. A designer of ambient displays
must transform the digital data into a meaningful pattern
of physical motion that successfully communicates the
information. The threshold between foreground and background is
another key issue. Ambient displays are expected to go largely
unnoticed until some change in the display or in the user’s state
of attention brings them into the foreground. How
to keep the level of display at the threshold of a user’s
attention is an open design issue.
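
One simple way to approach the mapping problem is a clamped linear map from a data value to a spin speed, followed by smoothing so that changes stay gradual and peripheral. This is a sketch of one possible design, not the Pinwheels implementation; the function names, the 60 rpm ceiling, and the smoothing factor are all assumed values.

```python
def target_speed(value: float, lo: float, hi: float,
                 max_rpm: float = 60.0) -> float:
    """Linearly map a data value in [lo, hi] to a speed in [0, max_rpm]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    fraction = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return fraction * max_rpm


def smooth(current_rpm: float, target_rpm: float, alpha: float = 0.1) -> float:
    """Move a fraction alpha of the way toward the target each update."""
    return current_rpm + alpha * (target_rpm - current_rpm)


# Example: network traffic of 50 units on an assumed 0-100 scale.
goal = target_speed(50.0, 0.0, 100.0)
next_rpm = smooth(0.0, goal)
```

A smaller `alpha` keeps the display closer to the background of attention, while a larger one makes sudden changes in the data more noticeable.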


Contributions of TUIs

TUIs are generally built from systems of physical artifacts that
are digitally coupled with computation. Taken together as ensembles,
TUIs have several important advantages over traditional GUIs, as
well as limitations. This section summarizes those contributions
of TUIs and the design considerations they require.


Double Interaction Loop: Immediate Tactile Feedback

One important advantage of TUI is that users receive passive
haptic feedback from the physical objects as they grasp and
manipulate them. Without waiting for the digital feedback
(mainly visual), users can complete their input actions (e.g.,
moving a building model to see the interrelation of shadows).

Typically  there  are  two  feedback  loops  in  TUI,  as  shown 
in  Fig.  24.4: 

1.  The  passive  haptic  feedback  loop  provides  the  user  with  an 
immediate  confirmation  that  he  or  she  has  grasped  and 
moved  the  object.  This  loop  exists  within  a  physical  domain, 
and  it  does  not  require  any  sensing  and  processing  by  a  com¬ 
puter.  Thus,  there  is  no  computational  delay.  The  user  can 
begin  manipulating  the  object  as  desired  without  having  to 
wait  for  the  second  feedback  loop,  the  visual  confirmation 
from the interface. In contrast, when a user uses a mouse with
a GUI computer, he or she has to wait for the visual feedback
(second loop) to complete an action.

2. The second loop is a digital feedback loop that requires
sensing of physical objects moved by users, computation based
on the sensed data, and displaying the results as visual (and
auditory) feedback. Therefore, this second loop takes longer
than the first loop.

Many  of  the  frustrations  of  using  current  computers  come 
from  the  noticeable  delay  of  digital  feedback  as  well  as  a  lack  of 
tactile  confirmation  of  actions  taken  by  computers.  We  believe  the 
double  loops  of  TUI  give  users  a  way  to  ease  those  frustrations. 

Note: The actuation technology introduced in the Actuated Workbench
adds another loop, that of physical actuation. Figure 24.5
illustrates the third loop introduced into the TUI model by
computer-controlled actuation and sensing. The third loop allows the
computer to give feedback on the status of the digital information
as the model changes or responds to internal computation.

Persistency  of  Tangibles 

As  physical  artifacts,  TUI  tangibles  are  persistent.  Tangibles 
also  carry  physical  state,  with  their  physical  configurations 
tightly  coupled  to  the  digital  state  of  the  systems  they  represent. 
The  physical  state  of  tangibles  embodies  key  aspects  of  the  dig¬ 
ital  state  of  an  underlying  computation. 

For  example,  the  physical  forms  of  the  Urp  building  models, 
as  well  as  their  position  and  orientation  on  the  workbench  of 
the  system,  serve  central  roles  in  representing  and  controlling 
the  state  of  the  underlying  digital  simulation  system.  Even  if 
the  mediating  computers,  cameras,  and  projectors  of  Urp  are 
turned  off,  many  aspects  of  the  state  of  the  system  are  still  con¬ 
cretely  expressed  by  the  configuration  of  its  physical  elements. 

In  contrast,  the  physical  form  of  the  mouse  holds  little  rep¬ 
resentational  significance  because  GUIs  represent  information 
almost  entirely  in  visual  form. 

Coincidence  of  Input  and  Output  Spaces 

Another important feature (and design principle) of TUI is
the coincidence of input and output spaces to provide seamless
information representation that spans both tangible (physical)
and intangible (digital) domains.

FIGURE 24.4. TUI's double feedback loops.


FIGURE 24.5. TUI with actuation (Actuated Workbench).

GUI utilizes the mouse and keyboard as generic “remote”
controllers (input), and the screen serves as the main output
medium. Thus, there is spatial discontinuity between those two
spaces. There is also multimodal inconsistency, as touch is the
main input while vision is the only output.

TUI tries to make the input space and the output space coincide as
much as possible to realize seamless coupling of the physical and
digital worlds (Ishii & Ullmer, 1997). An example of this seamless
coupling is Underkoffler’s Urp (Underkoffler & Ishii, 1999). A series of
architectural models serve as the input devices, and output in the form
of a wind-and-shadow simulation is projected down onto the same
tabletop surface, on top of and around the building models.
Illuminating Clay (Piper et al., 2002) and SandScape (Ishii et al., 2004)
demonstrate another example of I/O coincidence using continuous
flexible material: sand. Curlybot and Topobo demonstrate
the same concept using the contact surface of the tangibles as
input and output to digitize the person’s physical motion.

Special  Purpose  vs.  General  Purpose 

GUIs  are  fundamentally  general-purpose  interfaces  that 
are  supposed  to  emulate  a  variety  of  applications  visually  us¬ 
ing  dynamic  pixels  on  a  screen  and  generic  remote  controllers 
such as the mouse and keyboard. On the other hand, TUIs are
relatively specific interfaces tailored to certain types of
applications in order to increase the directness and intuitiveness of
interactions.

The selection of the correct and specific application domain
is critical to applying TUI successfully and taking advantage of
existing skills and work practices (e.g., use of physical models in
urban planning).

One  notable  aspect  of  Urp  is  its  use  of  objects  with  very  ap¬ 
plication-specific  physical  forms  (scaled  building  models)  as  a 
fundamental  part  of  the  interface.  Physical  building  models  rep¬ 
resent  the  buildings  themselves  in  the  interactive  simulation. 
Thus  they  give  the  user  important  visual  and  tactile  informa¬ 
tion  about  the  computational  object  they  represent.  Indicators 
such  as  a  clock  and  weather  vane  work  in  reverse  in  the  Urp 
system.  Instead  of  the  clock  hands  moving  to  indicate  the  pas¬ 
sage  of  time,  the  user  can  move  the  clock  hands  to  change  the 
time of day for the shadow study (Photo 24.1). Likewise, he or
she can change the orientation of the weather vane to control
the direction of the wind (Photo 24.2).

FIGURE 24.6. Center and periphery of user’s attention within
physical space.

In  the  design  of  TUI,  it  is  important  to  give  an  appropriate 
form  to  each  tangible  tool  and  object  so  that  the  form  will  give 
an  indication  of  the  function  available  to  the  users.  For  example, 
the  clock  hands  allow  people  to  automatically  make  the  as¬ 
sumption  that  they  are  controlling  time. 

Of course, this special-purpose nature of TUIs can be a big
disadvantage if users would like to apply them to a wide variety of
applications, since customized physical objects tailored to a certain
application cannot be reused for most other applications.
By making the form of objects more abstract (e.g., a round puck),
you lose the legibility of the tangible representation, and the object
becomes a generic handle rather than a representation of the
underlying digital information. It is important to strike a balance
between the specific/concrete and the generic/abstract when giving
form to digital information and computational function.

Space-Multiplexed  Input 

Another distinct feature of TUI is space-multiplexed input
(Fitzmaurice, Ishii, & Buxton, 1995a). Each tangible representation
serves as a dedicated controller occupying its own space,
and encourages two-handed and multi-user simultaneous
interaction with the underlying computational model. Thus, TUI is
suitable for collocated collaboration, allowing concurrent
manipulation of information by multiple users.

GUI,  in  contrast,  provides  time-multiplexed  input  that  allows 
users  to  use  one  generic  device  to  control  different  computa¬ 
tional  functions  at  different  points  in  time.  For  instance,  the 
mouse  is  used  for  menu  selection,  scrolling  windows,  pointing, 
and  clicking  buttons  in  a  time-sequential  manner. 
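
The contrast between the two input styles can be sketched in a few lines of Python. This is only an illustration under assumed names: the classes `SpaceMultiplexedInput` and `TimeMultiplexedInput` and the controller and function labels are hypothetical and do not come from any TUI or GUI toolkit. The sketch shows that dedicated controllers can be manipulated at the same moment, whereas a single generic device must be re-attached to one function at a time.

```python
class SpaceMultiplexedInput:
    """Each physical controller is permanently bound to one function."""

    def __init__(self, bindings):
        # bindings: controller name -> function name (hypothetical labels)
        self.bindings = dict(bindings)

    def manipulate(self, controllers):
        # Several controllers can be manipulated in the same instant,
        # e.g., by two hands or by several users.
        return sorted(self.bindings[c] for c in controllers)


class TimeMultiplexedInput:
    """One generic device is re-bound to different functions over time."""

    def __init__(self):
        self.current_function = None

    def acquire(self, function):
        # The single device must first be attached to a function ...
        self.current_function = function

    def manipulate(self):
        # ... so only one function can be controlled at any moment.
        return [self.current_function]


tui = SpaceMultiplexedInput({"clock": "time of day", "vane": "wind direction"})
simultaneous = tui.manipulate(["clock", "vane"])

gui = TimeMultiplexedInput()
sequential = []
for function in ["menu selection", "scrolling"]:
    gui.acquire(function)
    sequential.append(gui.manipulate())
```

Here `simultaneous` holds both functions from a single manipulation, while `sequential` records one function per step.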

TUI can support not only collocated collaboration, but also
remote collaboration using an actuation mechanism to synchronize
the physical states of tangibles over distance. Actuated
Workbench is an example of such a technology that extends TUI
for remote collaboration (Pangaro et al., 2002).

In the Urp scenario, applying the Actuated Workbench
technology, it is possible to have two distributed Urp tables in
different locations, connected and synchronized over the Internet.
One Urp can be in Tokyo, while the other Urp can be in Boston,
and the shadows are synchronized as the urban planning team
moves the buildings around the Urp space. The movement of
buildings can also be synchronized by the actuation mechanism.
When the building planner moves a building, both the
local and the remote shadows update simultaneously; the position
and orientation of moved buildings are also synchronized.
This  synchronization  of  a  distributed  workbench  allows  both 
teams  to  discuss  changes  to  the  situation  in  real  time,  and  pro¬ 
vides  a  common  reference  for  otherwise  ethereal  qualities  such 
as  wind,  time,  and  shadow. 
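
The synchronization idea described above can be illustrated with a small Python sketch. It is not the Actuated Workbench implementation: the class `UrpTable`, its methods, and the pose format `(x, y, angle)` are assumptions made for illustration, and a direct method call stands in for the network link and the physical actuation.

```python
class UrpTable:
    """Hypothetical model of one workbench: building id -> (x, y, angle)."""

    def __init__(self, name):
        self.name = name
        self.poses = {}
        self.peers = []

    def connect(self, other):
        # Link two tables symmetrically (stands in for a network connection).
        self.peers.append(other)
        other.peers.append(self)

    def move_building(self, building_id, x, y, angle):
        # A user physically moves a building on this table.
        self._apply(building_id, (x, y, angle))
        for peer in self.peers:
            # On the remote table, actuation would move the physical model.
            peer._apply(building_id, (x, y, angle))

    def _apply(self, building_id, pose):
        self.poses[building_id] = pose


tokyo = UrpTable("Tokyo")
boston = UrpTable("Boston")
tokyo.connect(boston)
tokyo.move_building("tower", 0.25, 0.5, 90.0)
```

After the move, both tables hold the same pose for the building, which is the shared reference that the shadow simulation on each side would then be computed from.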


CONCLUSION 


The author met a highly successful computational device called
the “abacus” when he was two years old (Photo 24.13). He could
enjoy the touch and feel of the “digits” physically represented as
arrays of beads. This simple abacus was not merely a digital
computational device. Because of its physical affordance, the
abacus also became a musical instrument, an imaginary toy train,
and a backscratcher. He was captivated by the sound and tactile
interaction with this simple artifact.

PHOTO 24.13. Abacus.

His  childhood  abacus  became  a  medium  of  awareness  too. 
When  his  mother  kept  household  accounts,  he  was  aware  of  her 
activities  by  the  sound  of  her  abacus,  knowing  he  could  not  ask 
her  to  play  with  him  while  her  abacus  made  its  music. 

This  abacus  suggests  to  us  a  new  direction  of  Human- 
Computer  Interaction  (HCI)  that  we  call  Tangible  User  Interfaces 
(TUI).  First,  it  is  important  to  note  that  the  abacus  makes  no  dis¬ 
tinction  between  “input”  and  “output.”  Instead,  the  beads,  rods, 
and  frame  serve  as  physical  representations  of  numerical  infor¬ 
mation  and  computational  mechanism.  They  also  serve  as  di¬ 
rectly  manipulatable  physical  controls  to  compute  on  numbers. 

Second, the simple and transparent mechanical structure of
the abacus (without any digital black boxes) provides rich physical
affordances (Norman, 1999) so that even children can
immediately understand what they can do with this artifact without
reading a manual.

TUI  pursues  these  features  further  into  the  digital  domain  by 
giving physical form to digital information and computation,
employing  physical  artifacts  both  as  representations  and  con¬ 
trols  for  computational  media.  Its  design  challenge  is  a  seamless 
extension  of  the  physical  affordances  of  the  objects  into  the 
digital  domain. 

This chapter introduced the basic concept of TUI and a variety
of examples of TUI applications to address the key properties
of TUI and its design challenges. TUI is still in its infancy, and
extensive research is required to identify the killer applications,
scalable TUI toolkits, and a set of strong design principles.

The research of TUI, which gives physical forms to digital
information/computation, naturally crosses paths with
industrial/product design as well as environmental/architectural
design. It has also made an impact on the media-arts/interactive-arts
community. The author hopes that TUI design will help
promote those interdisciplinary design research initiatives
in the HCI community and bring a strong design culture as
well as a media-arts perspective to the scientific/academic world.

Mark  Weiser’s  (1991)  seminal  paper  on  ubiquitous  comput¬ 
ing  started  with  the  following  paragraph:  “The  most  profound 
technologies  are  those  that  disappear.  They  weave  themselves 
into  the  fabric  of  everyday  life  until  they  are  indistinguishable 
from it.”

I do believe that TUI is one of the promising paths to his vision
of the invisible interface.


SCOPE AND STRUCTURE OF THIS CHAPTER


In  this  chapter,  we  explain  why  psychological  complexity  is  (or 
should be) of interest to the designers of human-computer
systems. We then distinguish between intrinsic complexity and
undue complexity. We presume that undue complexity is
generally (but not universally) counterproductive in that it leads to
more  errors,  frustration,  and  greater  task  completion  times. 
Generally,  these  are  all  things  to  be  avoided  in  a  work-oriented 
context.  However,  in  a  more  aesthetic,  recreational,  or  pleasure- 
oriented  context,  increased  psychological  complexity  can  often 
be  desirable.  We  examine  the  sources  of  undue  complexity.  We 
hypothesize that undue complexity can arise intentionally, through
incompetence,  or  (most  commonly)  as  an  unintended  side  ef¬ 
fect  of  normal  socialization  processes. 

We  differentiate  complexity  from  many  related  concepts 
such  as  uncertainty,  obscurity,  and  difficulty.  Then,  we  review 
various  approaches  to  measuring  complexity.  We  suggest  where 
and  how  complexity  may  be  introduced  during  the  overall  de¬ 
velopment  of  human-computer  systems  and  several  approaches 
that  may  help  minimize  undue  complexity.  Finally,  we  specu¬ 
late  on  some  possible  future  developments  in  the  field  of  psy¬ 
chological  complexity  and  briefly  discuss  two  case  studies. 

Why  Study  Psychological  Complexity? 

Although  mathematicians  have  treated  complexity  as  a  topic  for 
a  long  time,  more  recently  this  interest  has  spread  into  many 
fields  (e.g.,  Holland,  1995;  Bar-Yam,  1997,  2000).  The  attempts 
to  understand  fields  as  diverse  as  economics,  ecology,  biology, 
and  machine  learning,  among  others,  relate  to  similar  mathe¬ 
matical  treatments,  often  under  the  general  rubric  of  Complex 
Adaptive  Systems.  Although  there  are  still  many  unsolved  prob¬ 
lems  in  this  field,  one  might  raise  the  issue  of  why  there  needs 
to  be  a  separate  inquiry  into  the  nature  of  psychological  com¬ 
plexity.  There  are  two  basic  reasons  why  psychological  com¬ 
plexity  deserves  a  separate  treatment.  First,  what  may  be 
thought  of  as  objectively  complex  may  or  may  not  be  psycho¬ 
logically  complex.  Indeed,  an  exploration  of  these  differences 
forms  a  major  part  of  this  chapter.  Second,  when  we  consider 
complexity  in  the  more  specific  context  of  human-computer  in¬ 
teraction  (HCI),  it  is  useful  to  differentiate  intrinsic  complexity 
from  undue  or  gratuitous  complexity.  Some  tasks  are  inher¬ 
ently  complex.  We  may  help  people  perform  these  tasks  via 
work  aids,  education,  documentation,  rule-based  systems,  or 
the  clever  design  of  work  groups,  but  some  considerable  in¬ 
trinsic  complexity  may  remain.  In  contrast,  although  regret¬ 
table,  it  still  seems  to  be  a  fact  that  many  systems,  applications, 
and  artifacts  are  unnecessarily  complex.  Poor  design,  for  in¬ 
stance,  burdens  users  with  complexity  beyond  what  is  required 
by  the  nature  of  the  task.  In  this  chapter,  we  will  explore 
sources  of  this  undue  complexity,  as  well  as  ways  to  prevent  or 
mitigate  it.  While  this  chapter  will  reference  some  of  the  rele¬ 
vant  literature  on  the  more  general  topics  of  complexity  and 
psychological  complexity,  the  focus  of  the  chapter  is  on  how  we 
can  use  these  concepts  to  improve  HCIs,  in  most  cases  by  re¬ 
ducing  undue  complexity. 


In  terms  of  the  underlying  cause,  there  seem  to  be  three 
main  reasons  for  undue  complexity.  First,  undue  complexity 
sneaks  into  systems  even  though  people  consciously  try  to  pre¬ 
vent  it.  This  can  happen  for  numerous  reasons,  which  we  will 
explore  in  detail  below  (e.g.,  lack  of  appropriate  methodology). 
Second,  seemingly  undue  complexity  can  be  injected  into  sys¬ 
tems  intentionally.  For  instance,  systems  commonly  enforce 
rules  governing  both  password  structure  and  password  lifetime. 
These  are  often  perceived  as  an  annoyance  by  end  users  but 
are  intentional  policy  decisions  aimed  at  improving  overall  sys¬ 
tem  security.  A  third  and  more  subtle  reason  for  undue  com¬ 
plexity  relates  to  ordinary  social  processes.  As  a  group  of  people 
work  together  or  live  together,  they  develop  and  evolve  ways 
of referring to things that become convenient shorthands for
the  in-group  but  become  increasingly  obscure  or  difficult  for  the 
out-group.  In  the  extreme,  this  results  in  different  accents  and 
dialects  and  eventually  distinctly  different  natural  languages, 
customs,  and  assumptions.  Disparate  groups  of  people  end  up, 
not  simply  using  different  terms,  but  thinking  about  the  world 
differently (Abley, 2005). This process is typically, though not
exclusively, unconscious. This type of socialization begins early;
experiments  indicate  that  children  as  young  as  several  weeks 
old  are  already  less  capable  than  they  were  at  birth  of  making 
auditory  distinctions  within  the  phonemic  categories  deemed 
equivalent  by  their  linguistic  community.  In  one  experiment,  as 
children  living  in  an  English-speaking  environment  aged  from 
6  months  to  12  months,  their  percentage  of  discriminations  of 
a  non-English  distinction  decreased  from  80%  correct  to  10% 
(Werker  &  Tees,  1984).  Categorical  speech  perception  is  only 
one  dramatic  form  of  a  process  that  is  happening  all  the  time 
and  at  many  levels  in  normal  social  interaction. 

The  Nature  of  Psychological  Complexity  as  a  Variable 

Why  are  people  interested  in  psychological  complexity?  It  is  an 
important  intervening  variable  that  simplifies  analysis  and  pre¬ 
diction.  On  one  hand,  a  number  of  factors,  explored  in  detail  in 
this  chapter,  impact  psychological  complexity,  and  on  the  other 
hand,  a  number  of  dependent  variables  depend  on  psychologi¬ 
cal  complexity. 

One  might  yet  question  the  utility  of  complexity  as  a  unify¬ 
ing  variable.  Although  Card,  Moran,  and  Newell  (1983)  talked 
about  complex  behavior  (and  the  principle  of  Rationality  to 
help  explain  how  to  build  models  of  complex  behaviors  from 
simple  ones),  they  do  not  treat  complexity  as  a  variable,  per  se. 
Despite  this,  under  a  wide  variety  of  conditions,  accurate  pre¬ 
dictions  of  complex  human  behavior  are  possible  (see  also, 
John,  1990;  Gray,  John,  Stuart,  Lawrence,  &  Atwood,  1990). 
However,  modeling  at  this  level  of  detail  requires  a  substan¬ 
tial  amount  of  work.  The  hope  for  complexity  is  that  one  might 
develop  predictive  measures  of  complexity  that  could  be  ap¬ 
plied  easily  without  special  expertise  that  would  still  be  highly 
predictive  of  human  behavior.  In  the  ideal  case,  for  instance, 
a  development  team  might  calculate  the  complexity  of  several 
alternative  design  ideas  for  a  human-computer  system  and 
make  at  least  some  of  the  many  necessary  design  decisions 
without  the  need  for  assessing  each  one  with  costly  behavioral 
observations. 
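
As a rough illustration of what such a calculation might look like, the Python sketch below ranks design alternatives by a simple score. Everything in it is an assumption made for illustration: the `DesignAlternative` fields, the weights, and the example numbers are invented, and the score is not a validated measure. It borrows only the general idea, mentioned elsewhere in this chapter, of counting the steps in a process and the items a user must remember.

```python
from dataclasses import dataclass


@dataclass
class DesignAlternative:
    name: str
    steps: int               # actions the user must perform
    items_to_remember: int   # values the user must hold in memory


def complexity_score(design, step_weight=1.0, memory_weight=2.0):
    # Weighted sum; the weights are arbitrary placeholders, not empirical.
    return step_weight * design.steps + memory_weight * design.items_to_remember


def rank_by_complexity(designs):
    # Lowest score (presumed least complex) first.
    return sorted(designs, key=complexity_score)


alternatives = [
    DesignAlternative("wizard", steps=9, items_to_remember=1),
    DesignAlternative("single form", steps=5, items_to_remember=4),
    DesignAlternative("command line", steps=3, items_to_remember=7),
]
ranking = [d.name for d in rank_by_complexity(alternatives)]
```

A score of this kind would still need to be checked against behavioral observations before a team relied on it for design decisions.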

We expect intuitively that increased psychological complexity
is positively correlated with increased errors, increased time
to  complete  a  task,  and  decreased  productivity,  as  well  as  in¬ 
creased  frustration  in  a  results-oriented  work  environment. 
(Even  in  such  work  contexts,  there  may  be  exceptions;  e.g., 
when  lengthy  vigilance  at  an  overly  simple  task  may  actually 
produce  more  errors  than  a  somewhat  more  complex  one.) 
Nonetheless, the overall presumption, both in the public mind
and in the scientific literature, is that unnecessary complexity is
generally to be avoided on the grounds of reducing errors and
improving productivity. The cost of complexity can be considerable.
For  example,  half  of  the  consumer  products  returned  as  mal¬ 
functioning  are  actually  functioning  as  designed;  they  are  just  too 
complex  for  the  customer  to  use  (den  Ouden,  2006).  A  well- 
meaning  attempt  to  improve  a  homeland  security  system  re¬ 
sulted  in  more  errors,  longer  decision  times,  and  a  decreased 
probability that people would even use the system (Coskun &
Grabowski, 2005). Although details vary, Fig. 25.1 shows the
general relationship of complexity to performance and frustration.

However,  the  relationship  of  psychological  complexity  to 
more  ludic  (pleasure-oriented)  and  aesthetic  variables  is  more 
complex.  In  general,  the  prevailing  wisdom  is  that  a  moderate 
amount  of  complexity  is  aesthetically  pleasing.  Stimuli,  re¬ 
sponses,  mappings  between  stimuli  and  responses  or  contexts 
that  are  either  too  complex  or  too  simple  are  not  as  interest¬ 
ing,  pleasurable,  or  engaging  as  those  that  are  moderately  so. 
This  seems  to  be  a  robust  phenomenon  applying  to  infants  as 
well  as  adult  humans  and  to  various  species.  Of  course,  it  is  still 
the  case  that  what  constitutes  moderate  complexity  for  an  indi¬ 
vidual  depends  upon  previous  knowledge  as  well  as  personal¬ 
ity  predilections  for  more  or  less  complexity.  In  any  case,  psy¬ 
chological  complexity  is  potentially  a  powerful  intervening 
variable  that,  on  the  one  hand,  promises  to  collapse  the  impact 
of  numerous  separate  independent  variables  into  one  (e.g., 
complexity)  and,  on  the  other  hand,  to  expand  the  implications 
of  complexity  onto  a  number  of  dependent  variables.  In  the  case 
of  performance-related  variables,  the  relationship  is  thought  to 
be  monotonic;  in  the  case  of  subjective  variables,  the  relation¬ 
ship  is  thought  generally  to  be  an  inverted  U-function,  although 
it  may  vary  according  to  the  sense  and  the  individual. 
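
The two shapes described above can be made concrete with a short Python sketch. The functional forms are assumptions chosen only to show a monotonic curve and an inverted U; they are not fitted to any data, and the `preferred` parameter is a hypothetical stand-in for the individual differences just mentioned.

```python
def performance(complexity):
    # Monotonically decreasing in complexity (assumed form).
    return 1.0 / (1.0 + complexity)


def pleasure(complexity, preferred=5.0):
    # Inverted U: highest at the individual's preferred complexity level.
    return max(0.0, 1.0 - ((complexity - preferred) / preferred) ** 2)


levels = [0.0, 2.5, 5.0, 7.5, 10.0]
performance_curve = [performance(c) for c in levels]
pleasure_curve = [pleasure(c) for c in levels]
```

Changing `preferred` shifts the peak of the pleasure curve, which is one simple way to represent the idea that moderate complexity differs from person to person.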

In the last few decades, most HCI work has examined work
contexts in which productivity, in the broadest sense, has been
a primary focus. As computing technology has become more
intertwined with home life and entertainment, the importance of
also using psychological complexity as a predictor of pleasure
and preference is increasing.

FIGURE 25.1. The general relationship between complexity and
performance, and complexity and frustration.

FIGURE 25.2. The typical relationship between complexity of a
stimulus and the associated aesthetic pleasure in ludic situations.

In  the  domain  of  things  that  are  meant  to  taste  good,  for 
instance,  increasing  complexity  often  seems  to  be  presumed  a 
good  thing.  For  example,  Vintage  Cocktails  (2007)  created  a 
subjective  five-point  scale  for  cocktails:  simple,  delicious  but 
not  daunting,  sophisticated  and  smart,  provocative  and  pro¬ 
found,  and  unusual  and  unforgettable.  A  blog  of  commentary 
on  Macallan  Elegancia  scotch  also  indicates  that  complexity  is 
presumed  to  be  a  good  thing.  For  instance: 

The  nose  is  very  good  balances,  with  pleasant  sweet  nutty  and  vanilla 
hints.  The  elegant  balance  does  not  mean  any  lack  at  complexity,  with 
discrete  wood  and  cut  flower  fragrances.  In  the  palate,  a  clear  nut  taste 
with  some  slightly  acid  hints  (citrus  fruits)  on  a  background  of  slightly 
bitter malt. A spicy and relatively long finish. (The Macallan: Speyside
distillery, 2007)

On  the  other  hand,  it  is  generally  thought  that  visual  and 
sonic  patterns  of  moderate  complexity  are  more  often  preferred. 
In  the  domains  of  art  and  architecture,  some  attempts  have  been 
made  to  quantify  what  makes  for  interesting  or  aesthetically 
pleasing  patterns.  For  example,  Christopher  Alexander’s  (2002) 
work  included  an  examination  of  and  ordering  of  visual  binary 
patterns,  and  Klinger  and  Salingaros  (2000)  suggested  a  metric 
for  measuring  the  visual  complexity  of  simple  arrays.  Other  re¬ 
cent  approaches  attempt  to  relate  preferences  to  fractal  numbers 
(Taylor,  Spehar,  Wise,  Clifford,  Newell,  Hagerhall,  et  al.,  2005). 
Other  investigators  have  tried  to  predict  musical  complexity 
based  on  objective  factors  (e.g.,  Shmulevich  &  Povel,  2000; 
Shmulevich,  Yli-Harja,  Coyle,  Povel,  &  Lemstrom,  2001). 

In  more  work-oriented  contexts,  psychological  complexity 
is  clearly  related  to  other  intervening  variables  such  as  stress, 
difficulty,  and  workload.  However,  there  are  distinctions  among 
these.  Stress  depends  not  just  on  the  complexity  of  the  problem 
situation itself but also on external factors such as the payoff
matrix and internal factors like neuroticism. For instance, a
specific puzzle might be thought to have a certain level of
complexity. The amount of stress that one experiences in attempting
to  solve  it,  however,  would  depend  upon  the  perceived  re¬ 
wards,  punishments,  and  time  pressure  as  well  as  the  individ¬ 
ual’s  internalized  habits  for  viewing  how  various  outcomes  re¬ 
late  to  overall  life  goals;  for  example,  proclivities  for  awfulizing 
(Ellis,  2001)  tend  to  increase  stress  far  more  than  what  might 
be  objectively  justified. 

The more complex the task, the more difficult we would
expect it to be, other things being equal. However, difficulty can
also  accrue  to  a  task  for  a  number  of  other  reasons  including  en¬ 
vironmental  stressors  such  as  extreme  heat  or  cold,  vibration, 
noise,  or  the  necessity  of  applying  large  forces. 

The  term  workload  is  more  commonly  applied  to  an  indi¬ 
vidual  or  team  across  tasks.  One  can  imagine  an  individual  with 
a  large  number  of  tasks,  each  of  which  is  simple,  but  whose  job 
includes  many  interruptions  and  context  switches  from  one 
simple  task  to  another.  In  such  a  case,  the  workload  may  be  high 
even  though  any  particular  task  may  be  simple. 

There  is  another  sense  in  which  the  term  psychological 
complexity  is  sometimes  used.  Social  critics  may  use  the  term  to 
refer  not  to  a  single  task  or  system  but  to  the  totality  of  life  ex¬ 
periences.  Modern  life  in  the  information  age  is  sometimes 
deemed  too  complex  in  its  totality.  This  increased  complexity 
may  apply  to  politics,  personal  relationships,  child  rearing, 
transportation,  healthcare,  food  choices,  and  so  on,  extending 
potentially  to  every  aspect  of  modern  life.  Even  providing  for 
our  entertainment  needs  has  become  a  complex  endeavor 
(Grinter  &  Edwards,  2005).  In  addition  to  the  notion  that  each 
separate  aspect  of  life  is  becoming  more  complex,  people  of¬ 
ten  find  themselves  multitasking  and  task  switching  (e.g.,  Gon¬ 
zalez  &  Mark,  2005).  While  this  is  a  potentially  interesting  av¬ 
enue  to  explore  in  its  own  right,  it  is  beyond  the  scope  of  this 
chapter,  which  instead  will  focus  on  psychological  complexity  as 
it  applies  to  individual  systems  or  tasks  that  have  information 
technology  aspects. 

FIGURE 25.3. A number of variables, including complexity, can
increase stress.

FIGURE 25.4. A number of variables, including complexity, can
add to task difficulty.

[Figure: workload diagram with the labels NUMBER OF TASKS, TASK 1
DIFFICULTY, TASK N DIFFICULTY, MULTITASKING SCHEDULE, INTERRUPTIONS,
OTHER, and WORKLOAD. Surviving caption fragment: "... various tasks,
the number of tasks, the scheduling of the tasks and other factors."]

Early attempts to measure task difficulty with objective
metrics include Attneave’s (1957) figure complexity. There have
been attempts to provide measures for the meaningfulness of
trigram nonsense stimuli, responses, and the mapping between
the stimulus and response in the verbal learning literature.
Others  have  attempted  to  extend  this  work  to  real  words;  En¬ 
glish  words  have  been  rated  for  frequency  of  occurrence  (Kucera 
&  Francis,  1967;  Brown,  G.  D.  A.,  1984),  concreteness  (Paivio, 
Yuille, & Madigan, 1968), and so on. In one case (Rubin,
1980), 51 dimensions of words were measured. Such measures
do  provide  predictability  in  a  wide  range  of  laboratory  tasks. 
For  instance,  the  log  of  word  frequency  is  closely  related  to  de¬ 
creased  latency  in  a  simple  naming  task  where  subjects  are 
shown  pictures  and  asked  to  name  the  pictured  object  as 
quickly  as  possible  (Thomas,  Fozard,  &  Waugh,  1977).  Such  em¬ 
pirical  relationships  as  these,  however,  are  problematic  in  terms 
of  applicability  to  HCI.  For  one  thing,  when  measures  are  taken 
on  real  words  in  English,  many  dimensions  tend  to  be  corre¬ 
lated.  Carroll  and  White  (1973)  used  a  linear  regression  model 
and  showed  that  age  of  acquisition  is  actually  a  more  powerful 
predictor  than  frequency  of  occurrence  (though  the  two  are 
highly  correlated).  Neural  net  models  of  learning  also  demon¬ 
strate  this  effect  (Smith,  Cottrell,  &  Anderson,  2001).  How  might 
one  apply  such  findings  to  HCI?  For  example,  given  a  choice 
between two otherwise equally appropriate words, it is generally
better to use the word that is more frequent in the language,
or  even  better,  one  that  is  learned  early  in  life.  However,  the 
relevance  of  this  heuristic  to  the  design  of  instructions,  web 
pages,  error  messages,  and  so  on,  is  very  limited  because  in  real 
design  situations,  we  seldom  have  a  choice  between  two  other¬ 
wise  equally  appropriate  words.  More  frequent  words  also  often 
have  the  property  of  being  more  ambiguous,  both  semantically 
and  syntactically  (e.g.,  can,  run),  than  longer,  less  frequent 
words.  Words  learned  early  in  life  often  have  little  relevance  to 
most  tasks  that  adult  users  engage  in.  Knowledge  of  the  partic¬ 
ular  users,  their  vocabularies,  the  task  demands,  conventions, 
contexts,  and  so  on,  dominates  more  general  properties  of  the 
language  in  suggesting  the  best  wording. 
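
The logarithmic relationship between word frequency and naming latency mentioned above can be written as a one-line model. The Python sketch below assumes the form latency = intercept - slope * log10(frequency); the function name and the coefficient values are invented placeholders for illustration and are not taken from Thomas, Fozard, and Waugh (1977) or any other study.

```python
import math


def predicted_naming_latency_ms(frequency_per_million,
                                intercept_ms=900.0, slope_ms=80.0):
    # Latency falls linearly with the log of word frequency.
    # intercept_ms and slope_ms are invented placeholder values.
    return intercept_ms - slope_ms * math.log10(frequency_per_million)


common = predicted_naming_latency_ms(1000.0)
rare = predicted_naming_latency_ms(1.0)
```

Under this assumed form, each tenfold increase in frequency shortens the predicted latency by the same fixed amount, which is what a log relationship implies. As the text notes, such a general property of the language is usually outweighed by knowledge of the particular users and tasks.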


COMPLEXITY  AND  RELATED  CONCEPTS 


In  this  section,  we  will  attempt  to  further  define  psychologi¬ 
cal  complexity  by  distinguishing  it  from  a  number  of  similar 
concepts. 


marked  handle  for  hot  water  and  another  for  cold  water,  com¬ 
bined  with  a  two-position  lever  for  shower  and  bath  is  more 
complex  visually  than  a  single,  unlabeled  knob  with  several  de¬ 
grees  of  freedom.  At  least  initially,  however,  the  former  will  be 
easier  to  use.  Similarly,  a  normal  bottle  opener  is  not  particularly 
elegant  or  simple  in  shape.  However,  its  asymmetries,  as  well 
as  its  commonality,  make  it  easy  for  people  (in  our  culture)  to 
use.  An  alternative  bottle  opener  (See  Fig.  25.6)  consists  of  a 
metal  cone  with  a  thin  protruding  knob.  This  device  is  physically 
simpler,  but  harder  to  use,  at  least  initially. 

Relationship  of  Simplicity  and  Complexity 

Simplicity  and  complexity  seem  to  be  opposites.  However,  there 
is  an  interesting  asymmetry  in  their  opposition.  It  is  as  though 
absolute  simplicity  is  the  center  point  of  an  n-dimensional 
sphere,  while  absolute  complexity  is  the  surface.  In  other  words, 
there  are  many  ways  of  moving  away  from  simplicity  toward 
complexity.  Figure  25.7  illustrates  just  a  few  of  the  many  ways 
that  a  system  can  become  more  complicated.  For  simplicity,  we 
illustrate  a  two-dimensional  rather  than  n-dimensional  space. 


Relationship  of  Complexity  and  Ease  of  Use 


It  would  seem  that  ease  of  use  and  simplicity  constitute  closely 
related  concepts.  However,  they  do  differ  in  several  ways.  First, 
ease  of  use  implies  an  empirical  orientation.  There  are  a  number 
of  interesting  issues  involved  in  measuring  ease  of  use,  such  as 
choosing  the  sets  of  reference  users,  choosing  representative 
tasks,  and  choosing  the  dependent  variables  to  be  measured 
(time  to  complete,  errors,  quality  of  result,  etc.).  In  principle, 
ease  of  use  can  be  measured  objectively  in  terms  of  human  be¬ 
havior.  Complexity,  on  the  other  hand,  does  not  have  the  same 
degree  of  conceptual  consensus.  There  are  approaches  that 
focus  on  the  formal,  intrinsic  properties  of  a  system  or  stimu¬ 
lus  (e.g.,  Halstead,  1977;  Alexander,  2002).  Other  approaches 
focus  on  the  reactions  of  a  human  being  to  the  system  or  stim¬ 
ulus.  These  can  include  passive  measures,  such  as  pupil  dilation, 
staring  time,  or  heart  rate  deceleration.  They  can  also  include 
more  subjective  judgments  or  attempts  to  measure  various  as¬ 
pects  of  task-oriented  behavior;  for  example,  the  number  of 
steps  in  a  process  or  the  number  of  variables  that  must  be  re¬ 
membered  (e.g..  Brown,  A.  B.  &  Hellerstein,  2004;  Brown,  A.  B., 
Keller,  &  Hellerstein,  2005). 

Nonetheless,  other  things  being  equal,  one  would  expect 
something  that  is  less  complex  to  be  easier  to  use.  However, 
other  things  are  typically  not  equal.  Ease  of  use  often  depends 
heavily on people’s expectations, which may result from cultural
conventions and often vary widely with time and place. Today, a
new  application  that  uses  the  typical  GUI  widgets,  such  as  pull¬ 
down  menus,  may  be  easy  to  use  for  experienced  users,  pri¬ 
marily  because  people  have  a  fair  amount  of  positive  transfer 
from  earlier,  similar  experiences.  A  unique  and  unfamiliar  but 
less  complex  interface  may  not  prove  as  easy  to  use. 

In  addition,  a  design  that  maximizes  visual  simplicity  through 
symmetry  and  minimalism  may  actually  increase  the  difficulty 
of  use.  For  example,  a  shower  arrangement  with  one,  clearly 


FIGURE  25.6.  A  "simple”  but  highly  non-intuitive  bottle-opener. 
A: x**2 + y**2 = 1
B: (x-1)**2 + y**2 = 1
C: (x-1)**2 + (y-1)**2 = 1
D: x**2 + (y-1)**2 = 1


FIGURE  25.7.  A  "simple”  system  can  be  complicated  in  many 
different  ways.  (Here  are  just  a  few.) 
This is more than an idle philosophical observation. In the
real-world iterative cycle of design and behavioral observation,
it is relatively difficult to move inward toward greater simplicity
while improving ease of use. It is relatively easy to move along
the surface of a sphere, resolving some types of complexity
while simultaneously introducing others and therefore not
increasing either overall simplicity or ease of use. For example, in
designing a word-processing system, you might find that people
are having difficulty finding desired items in long pull-down
menus. You decide to make each menu simpler by changing the
pull-down options depending on whether one clicks or clicks
with the shift key depressed. This makes each menu simpler,
but introduces another kind of complexity. Much of the art and
the importance of experience in HCI can be conceived of as
being able to move in from the surface rather than around the
surface. (For a related discussion with respect to simplicity and
accessibility, see Lewis, in press.)

Relationship  of  Complexity  and  Uncertainty 

Increased  complexity  is  associated  with  increased  uncertainty, 
ceteris  paribus.  The  term  complexity  can  be  reasonably  applied 
to  stimuli,  responses,  or  to  the  mapping  between  them.  The 
concept of uncertainty, however, seems to apply only to
mapping. One can have uncertainty about what a stimulus is, or
uncertainty about what response to make, but fundamentally, it
has  to  do  with  what  action  is  appropriate,  given  the  current 
circumstances. 

Complexity,  as  mentioned  earlier,  can  be  thought  of  either  as 
something  objectively  measurable  or  as  something  subjective. 
The  term  uncertainty  also  has  an  objective  meaning.  If  one 
turns  over  a  randomly  chosen  card  from  an  ordinary  deck,  there 
is  more  objective  uncertainty  about  the  outcome  than  if  one 
flips  a  coin.  The  term  uncertainty  may  also  be  applied  in  a 
purely  subjective  way.  In  this  subjective  sense,  a  given  person 
may  be  completely  certain  that  a  randomly  chosen  card  will  be 
the  Ace  of  Spades  (because  of  nothing  more  than  a  compelling 
intuition)  and  completely  uncertain  about  the  outcome  of  a 
coin toss. The concept of subjective uncertainty has even
further shades of application. One may watch the movie Apollo 13
on  numerous  occasions.  At  one  level,  the  outcome  is  already 
predefined  and  completely  known.  There  is  neither  objective 
nor  subjective  uncertainty.  Yet,  within  the  inner  context  of 
watching  the  story  play  out,  the  viewers  allow  themselves  (each 
time)  to  feel  the  subjective  uncertainty  of  outcome. 
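
One common way to put a number on the objective sense of uncertainty is Shannon entropy. The chapter does not name a measure, so treating entropy as the yardstick is an assumption of this minimal sketch, as is the function name. For equally likely outcomes it gives 1 bit for a fair coin and roughly 5.7 bits for a card drawn from a 52-card deck.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

coin = [1 / 2] * 2      # fair coin: two equally likely outcomes
deck = [1 / 52] * 52    # ordinary deck: 52 equally likely cards

coin_bits = entropy_bits(coin)
deck_bits = entropy_bits(deck)
print(f"coin: {coin_bits:.2f} bits, card: {deck_bits:.2f} bits")
```

A measure of this kind captures only the objective sense; the subjective cases described above (the compelling intuition, the repeated viewing of a film) fall outside it.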

Subjective  uncertainty  in  HCI  can  apply  at  numerous  levels. 
For  example,  in  order  to  make  touch  typing  a  felicitous  interac¬ 
tion,  one  subjectively  presumes  that  each  keystroke  will  be  cor¬ 
rectly  transmitted  to  the  computer  and  that  what  appears  on  the 
screen  is  an  accurate  mapping  of  what  is  actually  being  stored 
inside  the  computer.  Objective  certainty  about  the  states  of 
electromechanical systems probably never reaches 100%. However,
as a strategy for partitioning effort, it is often useful to
ignore very small error probabilities in subjective judgments. The
first time the word-processing software changed a “the” typed by
Thomas (this chapter) to “The,” certainty changed. Thomas began
to wonder what else the application might be doing. Later, a much
worse problem surfaced in the form of an automatic update to
styles. In response to something typed on page 1, something
might  change  appearance  on  page  15.  Subjective  uncertainty 
also  bears  some  resemblance  to  the  concept  of  trust.  You  might 
have  1,000  interactions  with  someone,  each  of  which  provides 
evidence  of  trustworthiness,  yet  even  a  single  interaction  that 
shows  a  person  to  be  untrustworthy  may  color  a  relationship  for 
years  to  come.  Similarly,  if  an  interaction  between  user  and  the 
computer  system  violates  an  assumed  behavioral  norm  by  doing 
something  completely  unexpected,  the  level  of  subjective  un¬ 
certainty  created  may  be  much  greater  than  warranted  in  any 
objective  sense.  Moreover,  if  the  user’s  experience  includes 
such  a  violation,  the  scope  of  subjective  uncertainty  may  be  dif¬ 
ficult  to  predict.  One  user  may  simply  generalize  uncertainty  to 
a  specific  function;  another  may  come  to  distrust  a  specific  ap¬ 
plication  (have  a  high  degree  of  uncertainty)  while  a  third  may 
come  to  distrust  computers  in  general. 

Relationship  of  Complexity  to  Number 

One  would  expect  that  if  systems  A  and  B  are  equivalent  in 
other  respects,  but  system  A  is  more  extensive  than  system  B 
in  terms  of  the  number  of  items,  that  system  A  is  more  com¬ 
plex  than  system  B.  The  degree  to  which  an  increase  in  num¬ 
ber  produces  an  increase  in  complexity,  however,  depends 
greatly  on  the  structure  of  the  systems.  Suppose  that  systems 
A  and  B  consist  of  5  and  10  buttons  respectively,  combined 
with  an  OK  light.  The  user  has  to  use  trial  and  error  to  discover 
the  correct  button  to  push  to  cause  the  OK  light  to  light.  On 
average,  it  will  take  2.5  trials  to  find  the  right  button  for  system 
A  and  5  trials  to  find  the  right  button  for  system  B.  Here,  dou¬ 
bling  the  number  of  choices  doubles  the  average  number  of 
trials.  On  the  other  hand,  imagine  that  the  user  must  find  a 
sequence  of  5  button  pushes  for  system  A  and  10  button 
pushes  for  system  B.  When  the  user  hits  each  correct  button, 
the OK light comes on. On average, it will take the user 2.5 +
2 + 1.5 + 1 = 7 attempts to find the right sequence for
system A and 4.5 + 4 + 3.5 + 3 + 2.5 + 2 + 1.5 + 1 = 21 attempts
to  find  the  right  sequence  for  system  B.  Here  a  doubling  in  the 
number  of  buttons  results  in  three  times  the  mean  number  of 
trials.  For  a  final  case,  imagine  that  the  OK  light  comes  on  only 
after  all  buttons  are  pushed  in  the  correct  sequence.  There  are 
5!  possible  sequences  for  system  A  and  the  user  will  find  the  cor¬ 
rect  sequence  on  average  in  5!/2  trials  or  60  trials.  There  are  10! 
possible  sequences  for  system  B  and  the  user  will  find  the  cor¬ 
rect  sequence,  on  average  in  1,814,400  trials.  Here  a  doubling  in 
number  has  led  to  an  increase  in  trials  by  a  factor  of  30,240.  In 
practical  terms,  system  A,  though  cumbersome,  could  be  solved 
by  hand  in  a  few  minutes.  System  B  would  probably  extend  be¬ 
yond  anyone’s  patience.  A  quantitative  difference  in  complex¬ 
ity  actually  would  result  in  the  qualitative  difference  between 
something  doable  and  something  not  doable.  These  examples 
illustrate that the impact of number on complexity greatly
depends on the nature of the system. In particular, hidden states
are  usually  a  bad  thing  in  terms  of  complexity,  but  just  how  bad 
they  become  rises  quickly  with  increasing  number. 
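
The first and last cases above can be checked with a few lines of Python. This is a minimal sketch that follows the text's convention that a blind search covers, on average, half of the possibilities; the function names are illustrative. (Counting the successful press as a trial and never repeating a failed one would give (n + 1)/2 instead, which does not change the comparison.)

```python
import math

def mean_trials_single(buttons):
    """Average trials to find one correct button, using the text's n/2 convention."""
    return buttons / 2

def mean_trials_hidden_sequence(buttons):
    """Average trials when only the complete, correct ordering lights the OK light."""
    return math.factorial(buttons) / 2

for n in (5, 10):
    print(n, mean_trials_single(n), mean_trials_hidden_sequence(n))

# How much the hidden-sequence search grows when the button count doubles.
growth = mean_trials_hidden_sequence(10) / mean_trials_hidden_sequence(5)
print(growth)
```

The same doubling from 5 to 10 buttons multiplies the single-button search by 2 and the hidden-sequence search by 30,240.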

Another example has to do with working memory and
relates to George Miller’s (1956) famous paper “The magical number
seven, plus or minus two.” Though the size of working memory

FIGURE  25.8.  How  memory  load  can  have  a  minor  effect  on 
performance,  up  to  a  point,  after  which  increasing  memory  load 
still  more  yields  a  sharp  drop  in  performance. 

varies  somewhat  depending  on  the  number  of  dimensions  and 
the  type  of  material,  it  is  not  large.  A  system  that  allows  a  person 
to  operate  within  that  span  can  be  nearly  error  free,  while  a  sys¬ 
tem  that  forces  a  person  to  operate  beyond  that  span  will  be 
extremely  error  prone.  A  modest  increase  in  memory  load  can 
thus  result  in  a  very  large  increase  in  time  and  errors. 

Relationship  of  Complexity  to  Nonlinearity 

One  of  the  defining  characteristics  of  complex  adaptive  systems 
is  that  they  exhibit  nonlinearity.  From  the  perspective  of  sur¬ 
vival,  this  nonlinearity  is  a  good  thing.  From  the  perspective  of 
someone  trying  to  interact  with  a  computer  system,  nonlinear¬ 
ity  can  be  a  major  source  of  difficulty  in  understanding,  predict¬ 
ing,  and  controlling  the  behavior  of  the  system.  For  example, 
Thomas  (this  chapter)  once  had  to  debug  a  PDP-8  program  that 
was  intended  to  collect  reaction  time  data.  There  was  only  a 
single  bit  wrong  in  the  entire  program  (which  probably  had 
roughly  5,000  bits  total).  In  a  linear  system,  you  would  expect 
that  a  .02%  error  would  result  in  a  small  behavioral  problem; 
for  example,  the  reaction  time  would  be  off  by  a  constant  .02%. 
Of  course,  nothing  of  the  kind  happened.  The  program  did  not 
collect  reaction  times  at  all.  Instead,  it  eventually  caused  the  in¬ 
terpreter  to  crash.  The  bit  that  was  wrongly  set  was  the  indirect 
bit  so  that  instead  of  storing  the  reaction  time,  the  program 
used  the  reaction  time  as  an  address  in  which  to  store  the  con¬ 
tents  of  a  register.  Furthermore,  since  every  time  the  author 
tried  to  test  the  program,  the  reaction  time  varied  slightly,  the 
precise  behavioral  path  of  the  program  was  different  every  sin¬ 
gle  time.  This  example  is  not  unusual,  but  it  illustrates  the  prop¬ 
erty  that  computer  systems  are  typically  highly  nonlinear  in 
their behaviors. Generally, higher layers of software are
designed to make the system, as seen by the user, more
nearly linear (and therefore more predictable). However, even
in  commonplace  high-level  applications,  many  nonlinear  as¬ 
pects  typically  remain. 
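
The effect of that single bit can be illustrated with a toy model. The sketch below is not the PDP-8 instruction set; the memory size, the slot number, and the function name are hypothetical and chosen only to mirror the description above: with the bit clear, the reaction time is stored in a fixed cell, and with the bit set, the reaction time is treated as an address and a register value is written there.

```python
MEMORY_SIZE = 64
RESULT_SLOT = 10  # hypothetical cell where the reaction time should be stored

def run_trial(memory, reaction_time, register, indirect_bit):
    """Perform the store step of one trial and return the address written."""
    if indirect_bit:
        # Faulty case: the reaction time is misused as an address, and the
        # register contents are written there (wrapped to stay in range).
        target = reaction_time % MEMORY_SIZE
        memory[target] = register
    else:
        # Intended case: the reaction time is stored in its fixed slot.
        target = RESULT_SLOT
        memory[target] = reaction_time
    return target

reaction_times = [412, 415, 398]
correct = [run_trial([0] * MEMORY_SIZE, rt, 7, False) for rt in reaction_times]
faulty = [run_trial([0] * MEMORY_SIZE, rt, 7, True) for rt in reaction_times]
print(correct)
print(faulty)
```

With the bit clear, every run writes to the same cell. With it set, the overwritten cell changes whenever the measured time changes, which is why each test run took a different path.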


Relationship  of  Complexity  to  Distribution 

Halstead  (1977)  claimed  that  complexity  C  is  equal  to  a  constant 
k  times  the  sum  of  the  number  of  unique  operators  U  times  the 
natural  log  of  the  total  number  of  operators  u  plus  the  number 
of  unique  operands  O  multiplied  by  the  natural  log  of  the  total 
number  of  operands  o. 

C = k * (U ln u + O ln o)

This  is  not  an  unreasonable  approximation.  We  have  already 
seen  how  number  can  increase  complexity.  We  could  also  con¬ 
sider,  however,  how  these  operators  (or  operands)  are  distributed 
throughout a program. Unlike Halstead (1977), we speculate that
(a)  AAAAAAAABBBBBBBCCCCCCCC  is  easier  to  comprehend, 
use,  edit,  and  so  on,  than  (b)  AABBCAABCBBCCAABCAACCCB. 
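
To make the contrast concrete, the following minimal sketch treats each letter as one operator occurrence and evaluates only the operator half of the formula, k U ln u. It then counts the places where the symbol changes. The switch count is a hypothetical proxy introduced here for illustration; it is not Halstead's measure.

```python
import math

def count_based_score(sequence, k=1.0):
    """k * U * ln(u): unique symbols times the log of total occurrences."""
    return k * len(set(sequence)) * math.log(len(sequence))

def switches(sequence):
    """Number of places where a symbol differs from the one before it."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

seq_a = "AAAAAAAABBBBBBBCCCCCCCC"
seq_b = "AABBCAABCBBCCAABCAACCCB"

print(count_based_score(seq_a) == count_based_score(seq_b))
print(switches(seq_a), switches(seq_b))
```

Both sequences contain the same symbols in the same quantities, so a purely count-based score cannot tell them apart, while the number of switches differs sharply (2 versus 13).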

It  has  been  known  for  some  time  that  repeated  operations 
are  typically  faster  than  alternating  ones  (e.g.,  Fozard,  Thomas, 
&  Waugh,  1976).  There  can  be  exceptions  to  this  general  rule 
based  on  rhythm,  fatigue,  or  satiation,  especially  for  extremely 
simple  tasks.  However,  for  complex  tasks,  we  generally  expect 
a  distribution  such  as  (a)  above  to  be  much  easier  than  (b).  For 
example,  in  the  Stroop  task,  one  can  be  asked  either  to  name 
the  colors  of  ink  (while  ignoring  the  word)  or  to  read  the  names 
of  colors  (while  ignoring  the  color  of  ink).  Alternating  be¬ 
tween  the  two  every  line  is  much  slower  than  doing  either  task 
en masse (e.g., Philips, Bull, Adams, & Fraser, 2002). In the
Brown  et  al.  (2005)  model,  this  source  of  complexity  is  modeled 
as  context  shifts. 

Relationship  of  Complexity  to  Nature  of  Elements 

Given  the  same  number  and  structure  of  elements,  two  systems 
that  are  formally  isomorphic  may  be  quite  different  in  terms  of 
psychological  complexity  based  on  differences  in  human  biol¬ 
ogy,  learning,  or  both.  For  example,  it  seems  clear  that  the  hu¬ 
man  visual  system  is  especially  well  tuned  to  handle  the  com¬ 
plexity  of  human  faces.  For  example,  Johnson  and  Morton 
(1991)  presented  evidence  that  babies  as  young  as  three  hours 
after  birth  can  recognize  human  faces.  Chernoff  (1973)  sug¬ 
gested  that  this  ability  might  be  used  to  represent  underlying 
complexity  in  a  way  that  is  easier  for  humans  to  deal  with.  In 
the  auditory  domain,  it  is  clear  that  the  difficulty  of  dealing  with 
(recalling,  rearranging,  etc.)  a  string  of  phonemes  that  follow  the 
phonological  rules  of  one’s  native  language  is  much  less  than 
dealing  with  an  equal  length  string  that  violates  those  rules. 

Relationship  of  Complexity  to  Naming  Scheme 

Consider two programs that manifest identical structures. Each
program asks the user to input an integer from 1 to 100. For
each  integer,  the  program  then  displays  a  corresponding 
unique  horoscope.  Structurally,  this  is  a  very  simple  program. 
However,  imagine  that  in  one  version,  the  user’s  input  num¬ 
ber  is  called,  “User’s  Input  Number”  and  in  another  it  is  called 
“NXBNM.” Further, suppose the horoscopes in the first
program are labeled “horoscope1,” “horoscope2,” “horoscope3,”
and so on for each corresponding integer. In the second
program, they are each labeled with a random two-letter string
such as “HA,” “IB,” “JK,” and so on. It is hopefully clear that
these two programs, though formally identical, would be vastly
different in how easy they would be to understand, modify,
debug, and so on.
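
A short sketch of the two versions may help. The code is illustrative only: the placeholder horoscope text and all identifiers other than “NXBNM” are assumptions, and the point is simply that the two functions are structurally the same and return the same results.

```python
HOROSCOPE_COUNT = 100

# Version 1: names that describe what each thing is.
horoscopes = {number: f"horoscope{number}" for number in range(1, HOROSCOPE_COUNT + 1)}

def horoscope_for(users_input_number):
    return horoscopes[users_input_number]

# Version 2: the same structure with arbitrary names.
q = {x: f"horoscope{x}" for x in range(1, HOROSCOPE_COUNT + 1)}

def zz(nxbnm):
    return q[nxbnm]

# The two versions agree on every valid input.
same = all(horoscope_for(n) == zz(n) for n in range(1, HOROSCOPE_COUNT + 1))
print(same)
```

A reader can guess what the first version does from its names alone; the second has to be traced line by line.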

While  the  above  example  is  somewhat  contrived,  developing 
a  consistent  and  intuitive  naming  scheme  is  a  real-world  prob¬ 
lem.  Often,  when  a  programmer  (or  any  other  kind  of  designer) 
begins  to  name  things,  they  do  not  have  a  complete  under¬ 
standing  of  the  space  of  things  they  will  have  to  name.  For  in¬ 
stance,  when  Thomas  (this  chapter)  began  working  on  a  recent 
project,  he  named  a  file  folder  “ELFL”  for  “Electronic  Learning 
Flow  Language.”  As  the  project  grew,  one  folder  was  much  too 
generic for all the nuances that emerged. Furthermore, the
focus of the project changed, and no one even talked about
Electronic Learning Flow Language any more. Furnas, Gomez,
Landauer, and Dumais (1982) found that not only are two different
people  likely  to  come  up  with  different  names  for  things,  but 
that  even  the  same  person  over  time  is  unlikely  to  sponta¬ 
neously  come  up  with  the  same  name.  They  suggest  tables  of 
synonyms  to  help  alleviate  this  problem.  Real  computer  systems 
are  often  rife  with  names  that  are  difficult  to  comprehend  or  to 
recall  precisely. 


Relationship  of  Complexity  to  Obscurity 

Consider  the  following  two  sequences  of  binary  digits. 

A:  0000  0001  0010  0011  0100  0101  0110  0111  1000 

B:  1000  0101  0100  0001  0111  0110  0011  0010  0000 

These  two  sequences  each  contain  the  same  number  of  bi¬ 
nary  digits  and  the  same  number  of  groupings.  The  first  se¬ 
quence  is  essentially  a  counting  sequence  from  zero  to  eight. 
This is an obvious sequence if one knows how to count in
binary. The second sequence would presumably be much harder
to memorize for most people. It also represents the
numbers zero to eight, but arranged in alphabetical order
according to their English names. Once one realizes
this  rule,  memorizing  or  reconstructing  the  second  sequence 
becomes  much  easier.  There  is  a  sense  in  which  the  first  se¬ 
quence  strikes  us  as  a  natural  ordering  because  it  is  based  on 
what  we  can  see  right  before  us.  Numbers  are  likely  to  be  highly 
associated  with  counting,  after  all.  The  rule  that  orders  the  sec¬ 
ond  sequence  strikes  us  as  more  obscure,  however,  since  it  de¬ 
pends  on  some  other  representation  (the  English  names  of  the 
numbers)  that  is  not  naturally  and  strongly  associated  with  the 
numbers  qua  numbers.  This  notion  of  obscurity  is  closely  asso¬ 
ciated  with  the  contrary  concept  of  affordance  (Norman,  1988). 
If  the  perceptible  properties  of  an  object  or  system  immediately 
make  it  clear  what  it  can  do  and  how  to  do  it  we  can  say  the  ob¬ 
ject  or  system  has  good  affordances.  Of  course,  whether  some¬ 
thing  appears  obscure  or  not  depends  heavily  on  the  user’s  pre¬ 
vious  experience,  both  cultural  and  personal  and  may  include 
inborn  factors  as  well.  In  Fig.  25.2,  the  equations  version  of  the 
simple  diagram  in  the  middle  illustrates  obscurity. 
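
The rule behind sequence B is easy to state in code once it is known. The sketch below (names are illustrative) rebuilds both sequences, producing B by sorting the numbers zero to eight on their English names.

```python
NAMES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight"]

def as_nibbles(numbers):
    """Render each number as a four-digit binary group, separated by spaces."""
    return " ".join(format(n, "04b") for n in numbers)

seq_a = as_nibbles(range(9))                                   # counting order
seq_b = as_nibbles(sorted(range(9), key=lambda n: NAMES[n]))   # alphabetical order

print("A:", seq_a)
print("B:", seq_b)
```

Nothing in the binary digits themselves hints at the second ordering; it only becomes reconstructible once the external representation (the English names) is brought in.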


Relationship  of  Complexity  to  Structural  Framework 

One  of  the  interesting  aspects  of  perceived  complexity  is  that 
when  it  comes  to  human  beings,  sometimes  more  is  less.  For 
example,  most  English  readers  are  capable  of  distinguishing 
WORD  from  CORD  at  a  briefer  presentation  time  than  they  can 
distinguish W from C (Reicher, 1969; Hildebrandt, Caplan, Sokol,
&  Torreano,  1995).  Learning  a  set  of  paired  associates,  pre¬ 
sented  along  with  mnemonic  visual  images,  is  much  easier  than 
learning  the  set  of  paired  associates  alone  (Thomas  &  Ruben, 
1973).  Subjects  presented  with  a  set  of  sentences  and  a  con¬ 
text-setting  theme  or  picture  have  a  much  easier  time  learning 
and  remembering  a  story  than  those  who  do  not  have  the  pic¬ 
ture  (Bransford  &  Johnson,  1972).  Recalling  a  story  that  fits  with 
our  cultural  patterns  is  much  easier  than  recalling  one  that  does 
not  (Bartlett,  1932).  In  all  of  the  above  cases,  the  advantage  of 
the  structural  framework  is  that  it  relates  new  items  to  some¬ 
thing  that  already  exists  in  memory. 

In  other  cases,  however,  the  advantage  of  structure  seems 
to  have  more  to  do  with  the  underlying  nature  of  the  nervous 
system.  For  example,  in  attempts  to  develop  useful  and  novel 
representations  of  speech  signals  in  order  to  improve  synthesis, 
one  technique  developed  by  Pickover  (1985)  mapped  an  auto¬ 
correlation  function  into  polar  coordinates  scaled  to  fit  within 
a  30  degree  angle.  The  resulting  dot  patterns  (see  Figure  25.9) 
proved  singularly  difficult  to  interpret  in  any  useful  way.  How¬ 
ever,  when  the  pattern  was  reflected  to  include  a  mirror  image 
and  the  resulting  60  degree  angle  repeated  six  times  around  a 
central point to produce a snowflake-like pattern, the
representation easily differentiated various vowel sounds as well as some
subtleties  like  anticipatory  nasalization  which  were  virtually  in¬ 
visible  in  the  traditional  sonogram.  Of  interest  in  this  context  is 
the  fact  that  the  useful  representation  contained  no  more  in¬ 
formation  than  did  the  useless  one.  Arguably,  one  could  say  the 
speech-flake  pattern  was  more  complex.  However,  the  com¬ 
plexity  apparently  allowed  visual  symmetry  detectors  to  come 
into play and produced far more distinguishable and
memorable patterns. These symmetrized dot patterns have since
been  applied  to  numerous  other  domains  as  diverse  as  cardiac 
signals,  mechanical  stress,  and  even  the  crunchiness  of  cereals. 
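
The reflect-and-repeat step described above can be sketched as follows. How the autocorrelation values were turned into a radius and an angle is not given here, so the input is simply assumed to be (radius, angle) pairs with angles between 0 and 30 degrees; the function name and parameters are hypothetical.

```python
def symmetrize(points, sectors=6, sector_angle=60.0):
    """Mirror each (radius, angle) point and repeat it around the center.

    `points` holds (radius, angle_in_degrees) pairs whose angles lie within
    half a sector (0 to 30 degrees for the defaults).
    """
    pattern = []
    for radius, angle in points:
        for k in range(sectors):
            base = k * sector_angle
            pattern.append((radius, (base + angle) % 360.0))  # original wedge
            pattern.append((radius, (base - angle) % 360.0))  # mirror image
    return pattern

dots = symmetrize([(1.0, 10.0)])
print(len(dots))
print(sorted(angle for _, angle in dots))
```

Each input dot yields twelve output dots, so the symmetric pattern carries no more information than the original wedge, in line with the observation above.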

Complexity  of  System  vs.  Task  Complexity 

It  is  easy  to  fall  into  the  trap  of  conflating  the  complexity  of  an 
interactive  computer  system  with  the  complexity  of  the  task  that 
one  is  attempting  to  perform  while  using  that  system.  In  fact, 
many  studies  in  the  HCI  literature  compare  two  or  more  ver¬ 
sions  of  systems  by  having  subjects  perform  a  small  number  of 
tasks  in  the  two  systems.  Often  these  tasks  are  treated  as  fixed 
factors  when  they  are  actually  a  small  sample  (probably  not  ran¬ 
domly  chosen)  of  all  possible  tasks.  In  principle,  it  is  very  diffi¬ 
cult  to  generalize  about  the  superiority  of  one  system  over  an¬ 
other  based  on  task  performance.  One  cannot  really  know  that 
a  different  set  of  tasks  may  not  have  completely  obliterated  or 
even  reversed  any  observed  trend. 

As  a  general  heuristic,  one  might  suppose  that  the  com¬ 
plexity  of  the  system  should  reflect  the  complexity  of  the  nec¬ 
essary  tasks  to  be  supported.  That  is,  if  the  user  only  needs  to 
FIGURE 25.9. A symmetrized dot pattern.


perform  very  simple  tasks,  it  may  be  enough  to  provide  a  sim¬ 
ple  system.  However,  if  the  user  needs  to  perform  very  com¬ 
plex  tasks,  then  a  more  complex  tool  may  be  required.  In  gen¬ 
eral,  there  may  be  some  truth  to  this  heuristic,  but  it  must  be 
applied  with  care.  One  caveat  is  that  the  function  associated 
with  this  increased  system  complexity  must  actually  be  useful 
and  allow  the  user  to  focus  attention  on  the  task.  Otherwise, 
additional  complexity  in  a  system  may  actually  be  more  disrup¬ 
tive  when  the  task  is  complex  than  when  it  is  simple.  For  ex¬ 
ample,  consider  the  tasks  of  writing  a  two-page  trip  report  and 
writing  a  full-length  novel.  Writing  the  novel  (we  assume)  is  a 
much  more  complex  task.  It  may  be  that  the  various  functions, 
fonts,  formatting,  and  options  in  a  complex  word  processor  are 
unnecessary  and  even  more  distracting  for  someone  trying  to 
keep  in  mind  a  complex  set  of  subplots  and  characters  than  for 
someone  writing  a  short  trip  report. 

Another  important  distinction  is  between  the  underlying 
complexity  of  a  system  and  the  complexity  that  is  surfaced  to 
the  user.  A  huge  amount  of  research  and  technological  sophis¬ 
tication  may  be  involved  in  the  development  of  an  automatic 
transmission  but  from  the  driver’s  perspective,  the  transmis¬ 
sion  is  simple.  From  the  perspective  of  the  user,  modern  search 
engines provide a simple, commonplace interface. Most users have
no notion of the complexity of the underlying processes. Of
course,  attempts  to  do  the  user  favors  by  providing  hidden  in¬ 
telligence  may  sometimes  backfire. 

The  ideal  case  is  probably  to  provide  a  layered  interface  in 
which  the  system  reveals  only  the  complexity  needed  for  the 
task.  Early  examples  of  this  approach  include  the  concept  of 
“training  wheels”  (Carroll  &  Carrithers,  1984).  The  interfaces  for 
another  related  set  of  projects,  the  Speech  Filing  System,  the 
Audio  Distribution  System,  and  the  Olympic  Message  System 
(Gould & Boies, 1984), also provided layered interfaces.
However, in many real systems, it often happens that the user
inadvertently falls through the user interface into a deeper layer or
must do so to accomplish their real task. In addition, undue
complexity  in  the  underlying  system  can  indirectly  hurt  the  user 
experience  by  making  code  harder  to  test,  debug,  document, 
and  maintain.  Therefore,  we  consider  undue  complexity  as 
something  to  be  generally  avoided  both  internally  and  in  terms 
of  what  is  to  be  made  visible  to  the  end  user. 

Complexity  of  System  vs.  Contextual  Complexity 

A person may be using a system that in and of itself is simple,
such as a bicycle, and their task may be simple, such as riding
in  a  straight  line  without  falling  or  crashing.  However,  riding  that 
bicycle  in  a  straight  line  on  an  empty,  dedicated  bike  path 
through  a  park  might  be  quite  different  from  riding  on  a  busy 
city  street.  It  is  somewhat  of  a  judgment  call  to  draw  a  line  be¬ 
tween  task  and  context.  In  this  example,  one  might  just  as  well 
say  that  the  two  tasks  are  different.  However,  collapsing  all  such 
variations  into  the  task  complexity  is  probably  counterproduc¬ 
tive  because  the  types  of  actions  that  can  be  taken  and  the 
power  to  take  those  actions  are  typically  quite  different  in  the 
case  of  simplifying  the  task  versus  simplifying  the  context. 

Contextual  complexity  could  increase  overall  complexity  by 
providing  secondary  tasks  or  by  bombarding  the  person  with 
extraneous  stimuli.  Contextual  complexity  might  also  produce 
internal  distracting  material.  Imagine  two  students  using  a  word 
processor  to  complete  an  essay  take-home  exam.  One  of  them 
is  also  worried  about  the  outcome  of  some  diagnostic  medical 
tests.  Consider  two  executives  who  are  using  presentation  soft¬ 
ware  to  construct  a  sales  pitch.  One  of  them  is  operating  in  an 
organizational  context  that  makes  it  necessary  to  pass  the  pre¬ 
sentation  through  four  layers  of  management,  each  with  very 
different  ideas  about  what  makes  a  good  sales  presentation.  It 
seems  much  more  natural  to  consider  such  differences  to  be  dif¬ 
ferences  in  contextual  complexity  rather  than  task  complexity. 
We  might  expect  the  design  of  the  presentation  software  to  help 
the  user  deal  with  the  task  complexity  in  this  example  but  not 
the  contextual  complexity. 

Complexity,  Feedback,  and  Interactivity 

Finally,  the  extent  to  which  we  can  easily  understand,  control, 
and  predict  the  behavior  of  a  system  has  much  to  do  not  only 
with  how  complex  it  is  but  also  how  readily  we  can  interact  with 
it  and  receive  timely,  unambiguous  feedback.  For  example,  the 
“dynamic  query”  system  (Ahlberg,  Williamson,  &  Shneiderman, 
1992)  does  not  decrease  the  complexity  of  the  underlying  data 
that  one  is  attempting  to  understand,  nor  does  it  decrease  the 
complexity  of  the  interface  by  which  one  browses  that  data.  In 
fact,  the  interface  is  actually  more  complex  than  a  static  browsing 
interface  might  be.  However,  it  does  allow  fast,  continuous,  un¬ 
ambiguous  feedback.  Conversely,  it  is  well  known  that  delayed 
auditory  feedback  (or  delayed  visual  feedback  possible  with  tele¬ 
vision  circuits)  is  highly  disruptive  to  performance.  In  some  set¬ 
tings,  feedback  on  performance  can  still  present  complexity  in 
the  form  of  credit  assignment.  If  you  lose  a  game  of  chess,  it  is 
unambiguous  that  you  lost,  but  determining  what  caused  you 
to lose is difficult. This is not just a human problem; the “credit
assignment” issue is crucial to any complex adaptive system
(Holland,  1995).  In  golf,  putting  can  be  a  very  difficult  skill  to 
improve.  The  main  reason  is  that  it  is  difficult  to  determine 
which  of  many  possible  factors  is  responsible  for  success  or  fail¬ 
ure  in  putting  in  a  real  game  situation.  For  example,  if  you  miss 
a  putt  to  the  left  it  might  mean  that  you  misread  the  slope  of  the 
green;  misread  the  grain  of  the  green;  hit  the  ball  off  center  of 
the  putter;  hit  the  ball  with  a  curved  arc  swing;  hit  the  ball  with 
a blade not normal to the path; or any combination of these
factors. In the domain of HCI (and elsewhere), this lack of
unambiguous feedback about which subset of actions is causing which
difficulties  has  sometimes  been  referred  to  as  “tangled  prob¬ 
lems”  (e.g.,  Carroll  &  Mack,  1984).  In  the  domain  of  golf,  Dave 
Pelz (2000), a former NASA physicist, has invented a
series  of  devices  to  give  the  learner  unambiguous  and  differen¬ 
tiated  feedback  about  these  various  possible  sources  of  error. 
Studying  these  devices  might  be  a  valuable  exercise  for  system 
designers  hoping  to  provide  a  similar  untangling  for  users. 


FOUR  SOURCES  OF  DIFFICULTY  FOR  HCI 


One  goal  related  to  psychological  complexity  is  to  provide  a  sys¬ 
tem  that  will  allow  non-programmers  to  do  what  is  essentially 
programming;  that  is,  allow  them  to  get  a  computer  system  to 
do  what  they  want  it  to,  not  just  in  terms  of  choosing  from  pre¬ 
existing  options  but  to  allow  them  to  have  more  open-ended  pro¬ 
ductive  control.  A  whole  succession  of  projects  at  Carnegie- 
Mellon  University  (e.g.,  Myers,  McDaniel,  Kosbie,  &  Marquise, 
1993),  the  LOGO  and  related  projects  at  MIT  (e.g.,  Papert,  1993), 
and  the  long  series  of  projects  by  Alan  Kay  and  others,  leading 
most  recently  to  Squeak  and  Etoys  (Kay,  2007),  have  attempted  to 
reach  this  goal.  Much  of  the  focus  of  these  projects  was  based  on 
trying  either  to  simplify  the  syntax  and  semantics  of  communi¬ 
cating  with  the  computer  or  allowing  it  to  take  place  in  a  manner 
more  nearly  like  the  way  people  communicate  with  each  other. 
It  is  certainly  true  that  the  detailed  and  often  obscure  syntax  of 
ordinary  computer  languages  can  provide  a  barrier  to  use.  Wein¬ 
berg  (1971)  pointed  out,  for  example,  that  FORTRAN  then  had 
one  set  of  rules  for  what  constituted  a  valid  arithmetic  expres¬ 
sion  when  used  in  an  assignment  statement  and  another,  more 
restrictive  set  of  rules  for  what  constituted  a  valid  arithmetic  ex¬ 
pression  in  an  array  index.  Typically,  programming  languages  are 
full  of  these  kinds  of  details  and  no  doubt,  they  provide  unnec¬ 
essary  complexity  for  someone  who  does  not  spend  a  large 
amount  of  time  using  such  a  language.  However,  it  is  important 
to  note  that  such  details  are  only  one  of  four  potential  sources  of 
difficulty  people  may  have  in  communicating  their  desires  to  a 
computing  system.  For  concreteness,  consider  a  scenario  in 
which  a  chess  grandmaster  without  programming  experience 
desires  to  write  a  computer  program  that  plays  excellent  chess. 


Understanding  the  Syntax  and  Semantics 
of  Communicating  with  the  Computer 

As  stated,  the  chess  grandmaster  will  probably  have  to  learn  the 
somewhat  arbitrary  syntax  and  semantics  of  a  programming  language.
Although  there  is  still  promise  in  techniques  aimed  at
programming  by  example  and  natural  language  communication,
at  this  point,  such  systems  are  insufficient  for  writing  something 
as  complex  as  a  chess  program. 

Making  Tacit  Knowledge  Explicit 

Assuming  that  the  expert  can  master  the  rules  of  some  pro¬ 
gramming  language  and  its  associated  development  environ¬ 
ment,  there  are  other  difficulties.  One  of  these  is  that  the  chess 
expert,  like  experts  in  many  fields,  is  unlikely  to  be  consciously 
aware  of  all  the  knowledge  that  he  or  she  brings  to  bear  on  the 
game  of  chess.  While  some  of  this  knowledge  is  somewhat  ex¬ 
plicit,  such  as  develop  the  center,  protect  your  king,  and  look 
for  double  attacks,  much  more  of  it  exists  in  the  form  of  pat¬ 
terns  that  have  developed  over  the  course  of  many  experiences 
(deGroot,  1978;  Simon  &  Barenfeld,  1969;  Simon  &  Gilmartin, 
1973).  Making  this  tacit  or  implicit  knowledge  explicit,  for  fields 
of  any  depth  and  breadth,  will  probably  prove  a  far  more  diffi¬ 
cult  and  time-consuming  task  than  learning  the  details  of  a  spe¬ 
cific  programming  language.  The  methods  and  techniques  of 
knowledge  programming  and  expert  systems  may  be  of  some 
help  here  along  with  storytelling.  Typically,  one  may  elicit  addi¬ 
tional  tacit  knowledge  (that  is,  transform  it  from  tacit  to  explicit) 
by  encouraging  the  expert  to  recall  specific  instances,  asking 
them  to  generalize  and  then  asking  for  counter-examples.  It  can 
also  be  useful  to  involve  pairs  or  small  groups  in  exchanging 
stories  about  their  experiences. 

Making  the  Computation  Efficient  and  Effective 

Another  set  of  issues  revolves  around  the  efficiency  of  what  is  go¬ 
ing  on  under  the  covers.  Although  computers  are  becoming  ever 
more  powerful  and  cheaper,  it  is  still  easy  for  nonexperts  to  write 
programs  that  are  so  inefficient  as  to  be  unworkable.  A  chess 
program  that  theoretically  makes  good  moves  but  takes  years  for 
each  move  is  unworkable  and  unusable.  In  fact,  a  naive  approach 
to  writing  a  chess  program  is  simply  to  do  an  exhaustive  search 
to  all  finishes  and  work  backwards.  It  may  or  may  not  be  obvi¬ 
ous  to  someone  untrained  in  mathematics  or  computer  science 
that  such  an  approach  would  be  completely  unworkable. 
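
A rough back-of-the-envelope sketch can make the scale concrete. The branching factor (about 35 legal moves per position), the game length (about 80 half-moves), and the evaluation rate (one billion positions per second) are illustrative assumptions, not figures taken from this chapter.

```python
import math

# Illustrative assumptions (not from the chapter):
BRANCHING_FACTOR = 35         # rough number of legal moves per position
GAME_LENGTH_PLIES = 80        # rough length of a game in half-moves
POSITIONS_PER_SECOND = 1e9    # hypothetical evaluation speed
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_leaf_count(branching, plies):
    """Base-10 logarithm of the number of leaves in a full game tree."""
    return plies * math.log10(branching)


def log10_years_to_search(branching, plies, rate):
    """Base-10 logarithm of the years needed to visit every leaf once."""
    return log10_leaf_count(branching, plies) - math.log10(rate * SECONDS_PER_YEAR)


if __name__ == "__main__":
    leaves = log10_leaf_count(BRANCHING_FACTOR, GAME_LENGTH_PLIES)
    years = log10_years_to_search(
        BRANCHING_FACTOR, GAME_LENGTH_PLIES, POSITIONS_PER_SECOND
    )
    print(f"lines of play: roughly 10^{leaves:.0f}")
    print(f"years to enumerate them: roughly 10^{years:.0f}")
```

Under these assumptions the exponent for the search time exceeds one hundred, so no plausible increase in hardware speed rescues the naive approach.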

While  the  chess  case  may  be  extreme,  far  less  extreme  cases 
can  still  be  quite  problematic.  If  one  is  writing  an  interactive  program,
timing  issues  are  important  and  how  to  achieve  good  timing
may  require  a  great  deal  of  expertise  and  knowledge.  If
systems  involve  multiple  people  and/or  computers,  it  is  quite  easy
to  introduce  inconsistencies,  deadlocks,  thrashing,  and  so  on, 
even  if  a  nonprogrammer  can  master  syntax  and  make  implicit 
knowledge  explicit.  For  a  deeper  examination  of  the  relation¬ 
ship  of  usability  and  choices  in  the  underlying  architecture,  see 
Bass  and  John  (2003). 

Making  the  System  Understandable  and  Maintainable 

In  the  ongoing  stream  of  behavior,  we  make  what  seem  to  be 
obvious  choices,  such  as  where  to  put  the  car  keys.  If,  a  day  later, 
we  are  prone  to  forget  this  obvious  choice,  imagine  how  much 
more  difficult  it  can  be  to  understand  and  recall  the  numerous 
decisions  that  must  be  made  in  designing  and  implementing  a
computer  system.  A  nonprofessional  programmer  may  have 
very  little  knowledge  of  documentation,  help  systems,  updates, 
security,  compatibility  issues,  the  process  for  reporting  and  fix¬ 
ing  bugs,  and  so  on.  Consequently,  even  if  the  first  three  hurdles 
are  overcome  and  a  workable  program  is  produced,  its  half-life 
may  be  very  short  indeed. 

To  summarize,  the  goal  of  having  nonprogrammers  directly 
instruct  computers  in  a  generative  way  has  often  focused  on  de¬ 
signing  a  very  simple  and  consistent  syntax  for  a  programming 
language.  In  the  above  section,  we  indicated  that  such  a  focus 
only  addresses  the  first  of  four  major  obstacles  end-users  face: 
to  wit,  learning  a  complex  syntax. 

While  the  possibility  of  programming  by  nonprogrammers 
is  intriguing  and  probably  quite  a  bit  more  difficult  than  it  at  first 
appears,  there  is  a  slight  variant  which  is  becoming  more  com¬ 
monplace.  There  are  instances  where  a  community  of  practice 
includes  some  individuals  who,  although  perhaps  not  profes¬ 
sional  programmers,  are  quite  proficient  at  finding  and  modify¬ 
ing  code  in  order  to  fulfill  certain  functions.  In  some  cases,  they 
may  be  using  programming-like  functions  in  an  application  pro¬ 
gram  such  as  a  spreadsheet.  In  other  cases,  they  may  actually 
be  using  full-fledged  programming  languages.  By  modifying  ex¬ 
isting  code  incrementally,  these  semi-expert  users  are  often  able 
to  address  all  four  of  the  issues  mentioned  above.  An  interesting 
early  example  of  such  a  community  was  “Moose  Crossing” 
(Bruckman,  1997),  which  provided  a  shared  environment  for 
kids  to  teach  each  other  object-oriented  programming  in  a  spe¬ 
cial  simplified  language.  Nonetheless,  there  remain  many  appli¬ 
cations  and  systems  that  users  interact  with  that  are  developed 
by  professional  programmers  working  in  software  companies. 
An  examination  of  the  way  in  which  products  are  developed,  in 
turn,  can  provide  some  insight  into  various  places  where  un¬ 
due  complexity  may  be  injected  into  systems  (and  therefore, 
how  it  might  be  reduced). 


SOURCES  OF  COMPLEXITY 
IN  THE  DEVELOPMENT  PROCESS 


Radical  Iteration  in  the  Field 

While  this  topic  is  dealt  with  in  more  detail  elsewhere  in  this 
handbook  (see  also  Greene,  Jones,  Matchen,  &  Thomas,  2003), 
it  is  worth  at  least  a  brief  mention  here  that  radical  iteration  in 
the  field  can  greatly  help  avoid,  prevent,  and  address  all  of  the 
sources  of  complexity  that  we  are  about  to  enumerate.  We  will 
discuss  two  examples  of  this  approach  at  the  end  of  this  chapter. 
By  working  as  closely  as  possible  with  the  potential  users  of  a 
system  doing  their  real  work  in  their  real  context,  one  can  avoid 
unnecessary  complexity  that  might  otherwise  be  injected  dur¬ 
ing  problem  finding,  problem  formulation,  design,  develop¬ 
ment,  deployment,  and  maintenance. 

Problem  Finding 

Most  of  our  typical  education  focuses  on  solving  problems  that 
other  people  have  already  found  and  formulated.  Great  leverage
can  arise  from  finding  and  formulating  problems.  Conversely,
unnecessary  complexity  in  a  system  can  be  injected  from  the 
very  beginning  by  finding  a  problem  that  is  not  really  a  problem 
(for  the  users).  A  computer  scientist,  for  instance,  might  dis¬ 
cover  that  users  intuitively  choose  a  sequence  of  tasks  to  com¬ 
plete  that  is  not  quite  as  efficient  as  the  theoretically  optimal  se¬ 
quence.  The  users  choose  routes  that  are  “good  enough”  and 
“satisficing”  (Simon,  1962).  However,  the  computer  scientist  (or 
accountant)  might  see  the  inefficiency  as  a  problem  that  needs 
to  be  solved.  The  result  may  be  a  system  that  requires  users  to 
key  in  a  long  list  of  tasks  to  be  done,  resources  to  be  used,  the 
data  required  by  the  tasks  and  so  on.  The  system  that  solves  the 
(imaginary)  problem  could  easily  take  far  longer  than  the  labor 
supposedly  saved.  Of  course,  there  is  the  even  more  insidious 
problem  that  in  order  to  model  the  efficiency  of  a  process  in 
the  first  place,  certain  simplifying  assumptions  must  be  made 
and  these  simplifications  may  well  lead  to  a  solution  that  is  less 
nearly  optimal  in  the  real  world  than  the  original  behavior.  For 
example,  in  a  large  telecommunications  company,  a  route  opti¬ 
mization  program  was  developed  to  dictate  schedules  to  repair 
people.  The  program  failed  because  it  did  not  take  into  account 
many  specific  details  that  the  repair  people  knew  about  the 
times  and  schedules  and  constraints  of  others  with  whom  they 
had  to  coordinate. 

Inadequate  observation  or  starting  with  untested  assump¬ 
tions  about  what  problems  exist  may  also  prevent  people  from 
even  noticing  easily  solvable  problems.  On  one  factory  tour  in 
the  early  1980s,  Thomas  (this  chapter)  was  shown  someone
whose  job  was  to  precisely  align  two  silver  needles.  The  person
was  sitting  in  a  very  awkward  position  and  behind  the  silver  needles
was  a  silver  background.  It  was  explained  to  him  later  that
since  the  error  rate  was  so  high,  they  were  working  on  a  pro¬ 
ject  to  use  machine  vision  to  replace  this  position.  The  author 
pointed  out  that  they  might  first  try  providing  the  person  with 
ergonomic  seating  and  a  different  background  against  which  to 
align  the  needles.  Automation  proved  unnecessary. 

Problem  Formulation  and  Requirements 

One  popular  anecdote  that  illustrates  the  importance  of  proper 
problem  formulation  recounts  a  modern  high-rise  office  building 
in  which  the  office  workers  kept  complaining  about  the  slowness 
of  the  elevators.  Computer  programmers  were  called  in  and  they 
reprogrammed  the  algorithms.  The  complaints  increased.  Engi¬ 
neers  put  in  heavier  duty  cables  and  motors  so  that  the  eleva¬ 
tors  could  move  faster.  Complaints  remained  as  strong  as  ever, 
and  several  important  multifloor  tenants  threatened  to  move  out. 
In  desperation,  the  building  owner  was  considering  sacrificing 
some  of  the  floor  space  and  adding  additional  elevator  shafts 
at  what  would  obviously  be  a  high  price.  Someone  suggested 
putting  mirrors  on  all  the  floors  near  the  elevators  and  the  com¬ 
plaints  ceased.  In  this  case,  people  initially  assumed  that  the 
problem  to  be  solved  was  that  the  elevators  ran  too  slowly  and 
focused  on  various  ways  to  increase  the  speed.  Eventually,  some¬ 
one  came  along  who  realized  that  the  real  problem  was  that  peo¬ 
ple  were  unhappy  about  the  elevator  speed.  Mirrors  gave  them 
something  to  do  and  the  time  waiting  did  not  seem  so  onerous. 

In  many  real-world  cases,  this  step  is  made  more  complex 
because  typically  there  are  a  number  of  stakeholders,  each  of
whom  may  have  very  different  perceptions  of  the  problems  to
be  solved.  A  sociotechnical  pattern,  “Who  Speaks  for  Wolf?” 
(Thomas,  Danis,  &  Lee,  2002),  based  on  a  Native  American  story, 
transcribed  by  Paula  Underwood  (1991),  suggested  both  the  im¬ 
portance  of  finding  all  the  relevant  perspectives  and  stakehold¬ 
ers  early  in  development  and  various  techniques  to  try  to  ac¬ 
complish  that.  Briefly,  one  of  the  members  of  a  tribe  was  called 
“Wolf”  because  he  made  a  life  study  of  wolves.  Once,  while  Wolf 
and  a  few  other  braves  were  on  an  extended  hunting  expedi¬ 
tion,  the  tribe  held  council  and  decided  they  needed  to  move.  A 
location  was  chosen,  and  the  tribe  moved;  however,  a  few 
months  later,  it  became  obvious  that  the  tribe  had  moved  into
the  midst  of  the  spring  breeding  ground  of  the  wolves.  They  had
to  decide  whether  to  move  again,  post  guards,  or  destroy  the 
wolves.  They  finally  decided  to  move  again,  but  asked  them¬ 
selves,  “What  did  we  learn  from  this,  and  how  can  we  avoid  this 
kind  of  error  in  the  future?”  Someone  pointed  out  that  if  Wolf 
had  been  present  at  the  first  council,  he  would  have  advised 
against  the  location.  From  then  on,  they  decided  that  whenever 
they  made  a  major  decision,  they  would  ask  themselves,  “Who 
speaks  for  Wolf?”  to  see  whether  there  were  missing  stakehold¬ 
ers  or  perspectives  that  needed  to  be  taken  into  account.  Many 
projects  could  profit  from  such  a  process. 

It  is  typically  well  understood  that  the  earlier  in  the  process 
of  development  an  error  is  caught,  the  less  expensive  it  is.  An  er¬ 
ror  in  problem  formulation  can  be  extremely  long-lived  and  ex¬ 
pensive  because  it  sets  the  context  for  measuring  success.  A 
product  or  system  may  be  developed  which  appears  to  be  suc¬ 
cessful  at  every  step  because  the  wrong  thing  is  being  measured 
so  that  ultimate  failure  goes  unnoticed  until  too  late.  A  classic 
example  is  the  replacement  of  Coca-Cola  with  New  Coke.  Ac¬ 
cording  to  Gladwell  (2005),  executives  at  The  Coca-Cola  Com¬ 
pany  were  worried  because  they  were  losing  some  market  share 
to  Pepsi  and  blind  taste  tests  indicated  that  Pepsi  was  more  of¬ 
ten  preferred.  As  a  result,  a  sweeter  version  of  Coke  that  tasted 
more  like  Pepsi  was  developed  and  put  on  the  market  as  a  re¬ 
placement  for  Coke.  The  reaction  was  surprising,  immediate,  pas¬ 
sionate,  and  nearly  disastrous  for  Coke.  People  wanted  the  old 
Coke  back.  The  entire  story  probably  involves  brand  loyalty, 
memory,  and  cognitive  dissonance,  but  one  fundamental  prob¬ 
lem  was  that  the  Coca-Cola  executives  were  assuming  that  the 
goal  was  to  develop  a  new  product  that  was  sweeter  so  that  it 
would  be  preferred  in  taste  tests  over  Pepsi.  And,  they  suc¬ 
ceeded — at  solving  the  wrong  problem.  What  tastes  best  when 
you  take  a  decontextualized  sip  is  not  necessarily  what  you  prefer 
(for  a  variety  of  reasons,  not  all  directly  related  to  taste)  day  after 
day,  month  after  month.  In  contrast  to  the  taste  test,  the  case  test 
shows  what  people  actually  buy  over  a  long  period  of  time,  and  of 
course,  it  is  the  latter  measure  which  is  actually  important  to  prof¬ 
itability.  In  this  case,  the  use  of  too  simple  a  measure  resulted  in 
the  wrong  problem  being  solved.  This  is  probably  a  common  sit¬ 
uation.  Another  example  of  this  type  occurred  when  the  manu¬ 
facturer  for  a  new  terminal  for  telephone  operators  did  usability 
and  productivity  tests  on  operators  using  the  terminal  but  failed 
to  take  into  account  the  phone  company  customer  who  was  also 
on  the  line  and  whose  behavior  actually  turned  out  to  be  on  the 
critical  path  most  of  the  time.  The  entire  design,  development, 
testing,  and  so  on  was  predicated  on  optimizing  a  system  that 
consisted  of  the  computer  and  the  operator  when  the  system 
that  really  mattered  was  the  operator,  the  computer,  and  the  customer.
Again,  an  initial  oversimplification  resulted  in  a  much
more  complex  total  solution  than  a  more  inclusive  (and  some¬ 
what  more  complex)  initial  formulation  would  have  produced. 
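
The critical-path point in the operator example can be illustrated with a toy calculation. The step timings below are invented purely for illustration, and the model (operator and customer working in parallel within each step, with the slower party setting the pace) is a simplifying assumption rather than a description of the actual study.

```python
def call_duration(steps):
    """Total call time when each step ends only after both parties finish.

    Each step is a pair (operator_seconds, customer_seconds).
    """
    return sum(max(operator, customer) for operator, customer in steps)


def with_faster_operator(steps, speedup):
    """Return the same steps with operator time reduced by `speedup` (0 to 1)."""
    return [(operator * (1.0 - speedup), customer) for operator, customer in steps]


if __name__ == "__main__":
    # Hypothetical timings in which the customer is slower at every step.
    steps = [(2.0, 5.0), (3.0, 4.0), (1.0, 6.0)]
    before = call_duration(steps)
    after = call_duration(with_faster_operator(steps, 0.5))
    print(f"before: {before} s, after halving operator time: {after} s")
```

In this sketch, halving the operator's time leaves the call length unchanged, because the customer's behavior sets the pace of every step. Measuring only the operator and the terminal would miss this.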

However,  undue  complexity  can  also  be  introduced  by  for¬ 
mulating  the  problem  in  overly  complex  terms.  The  idea  of  (and 
failure  of)  detailed  centralized  economic  planning  may  be  the 
quintessential  example. 

Finally,  undue  complexity  can  be  introduced  by  beginning 
with  the  wrong  problem  formulation.  Even  if  a  better  formula¬ 
tion  is  discovered  later,  unless  the  development  team  is  willing  to 
throw  out  everything  and  start  over,  remnants  of  the  original  for¬ 
mulation  will  tend  to  persist  into  design,  development,  deploy¬ 
ment,  testing,  and  so  on,  making  both  the  resulting  system  and 
its  associated  elements  (e.g.,  sales  and  marketing  materials,  doc¬ 
umentation,  education  packages,  problem  determination  aids, 
etc.)  more  complex  than  they  need  have  been. 

Design 

In  an  attempt  to  break  down  a  complex  problem  into  manage¬ 
able  subproblems,  development  teams,  quite  reasonably,  divide 
into  subteams  to  deal  with  various  subsystems.  Unfortunately, 
this  can  result  in  inconsistencies  in  basic  functionality  as  well  as 
in  the  user  interface.  For  example,  in  one  word  processor,  under 
certain  conditions,  the  user  would  be  faced  with  a  message  that 
said,  in  essence,  “You  cannot  delete  that  file  because  it  does  not 
exist.”  An  attempt  to  create  another  file  by  the  same  name,
however,  resulted  in  a  message  that  said,  in  essence,  “You  cannot
create  that  file  because  it  already  exists.”  It  seems  clear  that  no 
commonly  agreed  upon  definition  of  what  it  meant  for  a  file  to 
exist  held  sway  through  the  whole  of  the  development  team. 
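
One way such a contradiction can arise, and one way to avoid it, is sketched below. The class names and the two internal tables are hypothetical; nothing here is taken from the word processor in question. The first class routes both operations through a single shared definition of existence, while the second shows a subteam's variant that quietly uses a different one.

```python
class FileStore:
    """Toy store in which a name can be reserved before content is saved."""

    def __init__(self):
        self.reserved_names = set()   # names the user has created
        self.saved_contents = {}      # names that have content on disk

    def exists(self, name):
        """The one shared definition: a file exists once its name is reserved."""
        return name in self.reserved_names

    def create(self, name):
        if self.exists(name):
            return "Cannot create: file already exists."
        self.reserved_names.add(name)
        return "Created."

    def delete(self, name):
        if not self.exists(name):
            return "Cannot delete: file does not exist."
        self.reserved_names.discard(name)
        self.saved_contents.pop(name, None)
        return "Deleted."


class InconsistentFileStore(FileStore):
    """Variant in which deletion uses its own private notion of existence."""

    def delete(self, name):
        # Checks saved content rather than the shared exists() predicate.
        if name not in self.saved_contents:
            return "Cannot delete: file does not exist."
        return super().delete(name)
```

With the inconsistent variant, a file that has been created but never saved can be neither deleted nor created again, which reproduces the pair of messages described above.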

At  a  more  superficial  level,  one  approach  to  reduce  unneces¬ 
sary  discrepancies  in  the  way  that  the  user  interface  functions  is 
to  provide  a  style  guide  so  that  diverse  developers  or  business 
partners  working  on  various  aspects  or  functions  of  an  applica¬ 
tion  suite  will  tend  to  provide  a  similar  look  and  feel.  There  is 
much  to  be  said  for  this  approach  if  it  is  applied  with  perspec¬ 
tive  and  intelligence.  In  the  worst  case,  development  teams  may 
blindly  follow  what  was  meant  as  a  guideline  and  interpret  it  as  an 
ironclad  rule.  For  instance,  Thomas  (this  chapter)  became  in¬ 
volved  in  the  development  of  an  application  for  the  service  rep¬ 
resentatives  for  a  large  telecommunications  company.  The  cor¬ 
porate  development  team  insisted  that  we  must  follow  some 
guidelines  that  claimed  users  should  choose  an  object  before 
choosing  an  action.  For  users  who  must  move  between  multiple 
applications,  there  is  a  relative  advantage  of  having  consistency 
among  applications.  In  this  specific  context,  however,  the  users 
did  not  use  multiple  applications,  and  it  was  clear  that,  given  the  way
they  naturally  thought  of  and  interacted  with  the  task,  choosing
an  action  first  was  far  more  intuitive,  quicker,  and  less  error
prone  than  choosing  an  object  first.  Nonetheless,  the  management
of  the  development  team  believed  that  the  provided  guidelines
were  received  truth,  and  therefore  they  must  be  followed.

Development 

Perhaps  the  greatest  contributor  to  undue  complexity  is  that  the 
development  team  (and  to  a  lesser  extent,  management,  associated
marketing  and  sales,  documentation  specialists,  etc.)  becomes
so  familiar  with  a  system  that  almost  everything  about  it
becomes  obvious  and  easy,  regardless  of  how  it  might  be  per¬ 
ceived  by  an  end  user.  Once  one  sees  a  hidden  figure  (such  as 
a  pig  in  the  clouds),  it  is  nearly  impossible  not  to  see  it.  Over  the 
lifecycle  of  a  product,  tens,  hundreds,  or  even  thousands  of  lit¬ 
tle  conventions  and  assumptions  become  second  nature  to 
those  associated  with  the  development. 

The  cure  for  this  malady  is  to  continually  test  the  scenarios 
of  use,  the  designs,  paper  mock-ups,  screen  shots,  prototypes, 
and  beta  versions  with  naive  users;  that  is,  folks  who  are  repre¬ 
sentative  of  potential  users  but  not  part  of  the  design  team.  If, 
for  any  reason,  this  is  not  feasible  (e.g.,  security,  the  users  do  not 
yet  exist)  other  techniques  can  provide  some  amelioration.  For 
instance,  heuristic  evaluation  (Nielsen  &  Landauer,  1993)  is  likely 
to  catch  a  fair  proportion  of  actual  problems.  This  is  best  done 
with  HCI  experts  who  are  already  familiar  with  both  the  technol¬ 
ogy  and  the  application  area.  Typically,  five  to  eight  experts  pro¬ 
vide  the  most  value.  A  variation  on  this  technique  involves  having 
people  successively  take  on  different  personae  while  interacting 
with  a  system  and  this  may  increase  the  number  of  errors  found 
in  the  same  period  of  time  (DeSurvire  &  Thomas,  1993). 
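
The diminishing return from adding evaluators can be sketched with a simple independent-detection model, in which each evaluator finds any given problem with the same fixed probability. The detection rate of 0.3 used below is an assumed illustrative value, not a figure from this chapter, and real rates vary by project and by evaluator.

```python
def proportion_found(evaluators, detection_rate=0.3):
    """Expected share of problems found by at least one evaluator.

    Assumes each evaluator independently finds a given problem with
    probability `detection_rate` (an illustrative assumption).
    """
    if evaluators < 0:
        raise ValueError("evaluators must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    return 1.0 - (1.0 - detection_rate) ** evaluators


if __name__ == "__main__":
    for n in (1, 3, 5, 8, 12):
        print(f"{n:2d} evaluators: {proportion_found(n):.0%} of problems")
```

Under this assumption, most of the benefit is obtained from the first handful of evaluators and each additional one contributes less, which is consistent with the advice to use a small panel.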

Testing 

All  too  often,  testing  schedules  become  compressed.  Partly  as  a 
result  of  time  pressure  and  partly  as  a  result  of  the  fact  that  the 
development  team  has  become  accustomed  to  the  high  level 
and  low  level  design,  the  error  codes  (or  error  messages)  are  of¬ 
ten  cryptic  and  designed  for  the  productivity  and  convenience 
of  the  testing  team.  Unfortunately,  in  many  cases,  these  error 
messages  persist  into  the  product  that  the  ultimate  end  user 
experiences.  Testers  are  often  testing  functions  in  a  very  well- 
specified  context  so  that  an  error  message  is  quite  interpretable 
to  them.  However,  to  an  end  user,  these  same  error  messages 
are  completely  incomprehensible  and  often  the  output  comes 
from  dropping  down  several  layers  in  the  software  stack.  Thus, 
an  end  user  attempting  to  hit  a  button  in  a  high-level  application 
may  see  an  error  message  that  not  only  does  not  specify  what 
corrective  action  to  take,  but  it  mentions  software  elements  that 
the  end  user  is  not  even  aware  existed. 
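
One common remedy is to translate low-level failures at the boundary of the high-level action, keeping the technical detail in a log for developers. The following is a minimal sketch under stated assumptions: the names `UserFacingError`, `write_to_disk`, and `save_document`, and the wording of the messages, are all hypothetical.

```python
import logging

logger = logging.getLogger("editor")  # hypothetical application logger


class UserFacingError(Exception):
    """An error phrased in the user's terms, with a suggested next step."""

    def __init__(self, what_happened, what_to_do):
        super().__init__(f"{what_happened} {what_to_do}")
        self.what_happened = what_happened
        self.what_to_do = what_to_do


def write_to_disk(path, text):
    """Hypothetical lower layer: lets raw operating-system errors escape."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def save_document(path, text, writer=write_to_disk):
    """High-level action behind a 'Save' button."""
    try:
        writer(path, text)
    except PermissionError as err:
        logger.debug("save failed: %r", err)  # detail kept for developers
        raise UserFacingError(
            "The document could not be saved because this location is read-only.",
            "Choose a different folder and save again.",
        ) from err
    except OSError as err:
        logger.debug("save failed: %r", err)
        raise UserFacingError(
            "The document could not be saved.",
            "Check that the folder still exists and that the disk is not full, "
            "then try again.",
        ) from err
```

The user sees what happened and what to do next, while the original exception remains attached for anyone diagnosing the lower layers.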

Another  issue  with  testing,  of  course,  is  that  testers  who 
work  in  the  development  organization  may  only  test  reason¬ 
able  combinations  of  function.  Thomas  (this  chapter)  designed 
the  user  experience  for  a  “Dynamic  Learning  Environment” 
(Farrell,  Thomas,  Rubin,  Gordin,  Katriel,  O’Donnell,  et  al., 
2004).  In  testing  the  code,  he  tried  altering  the  URL  returned 
by  the  system  in  order  to  find  what  he  was  looking  for.  The  de¬ 
velopers  all  knew  that  you  could  not  do  this  and  it  caused  the 
system  to  crash — but  it  is  an  often-used  strategy.  For  instance, 
if  you  know  that  http://www.umich.edu/~person1  is  the  URL  of
person  1,  and  you  are  trying  to  find  the  website  of  their  col¬ 
league  at  Michigan,  person  2,  you  might  reasonably  suspect  that 
their  URL  might  be  http://www.umich.edu/~person2,  and  in¬ 
deed  this  often  works.  Happily,  he  was  able  to  convince  the  de¬ 
velopers  to  prevent  this  ploy  from  crashing  the  system.  More 
generally,  it  is  important  to  have  potential  users  test  the  system 
as  well  as  those  steeped  in  the  cultural  assumptions  of  IT  gen¬ 
erally  and  a  particular  company  or  product  specifically. 
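
A defensive handler treats every address as something a user may have typed or edited by hand. The sketch below is illustrative only: the `PAGES` table, the `/~name` path convention, and the `handle_request` function are assumptions, not the design of the system described above.

```python
# Hypothetical content store keyed by user name.
PAGES = {"person1": "Home page of person 1"}


def handle_request(path):
    """Return (status_code, body) and never raise on an unexpected path."""
    prefix = "/~"
    if not isinstance(path, str) or not path.startswith(prefix):
        return 404, "Page not found."
    user = path[len(prefix):]
    body = PAGES.get(user)
    if body is None:
        # An edited or guessed address is ordinary behavior, not an attack.
        return 404, "No page exists at this address. Try the site search."
    return 200, body


if __name__ == "__main__":
    for path in ("/~person1", "/~person2", "", "/admin"):
        print(repr(path), handle_request(path))
```

The guessed address for person 2 yields an ordinary "not found" reply rather than a crash, which is the behavior a user who edits URLs would expect.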


Deployment 

Today,  many  applications  include  “Wizards”  to  help  unzip,  install, 
and  even  use  a  product.  Often,  the  distribution  of  applications 
is  via  websites,  and  therefore,  finding  the  right  application,  find¬ 
ing  the  right  version  of  the  application,  finding  out  whether  one 
has  all  the  prerequisites  and  necessary  patches,  and  deciding 
which  features  and  options  to  turn  on  or  off  can  prove  as  com¬ 
plex,  or  even  more  complex,  than  actually  using  the  application. 
Often,  the  instructions  and  interfaces  associated  with  deploy¬ 
ment  suffer  obscurities  from  the  same  root  cause  as  those  intro¬ 
duced  by  the  development  team — being  overly  familiar  with  the 
application  or  system.  For  example,  in  recently  attempting  to 
download  an  upgrade  to  a  system,  the  instructions  cautioned  the 
user  to  be  sure  to  use  one  specific  URL  and  not  another.  This 
URL,  however,  did  not  refer  to  the  actual  URL  for  finding  the  desired
download.  The  required  URL  lay  three  layers  down  a  menu
structure  in  a  different  application.  To  the  folks  who  deployed 
this  update,  the  context  in  which  this  specific  URL  was  to  be  used 
was  so  obvious  as  to  not  bother  making  that  context  explicit. 

Service 

With  the  spread  of  an  ever  cheaper,  higher  bandwidth  telecom¬ 
munications  infrastructure,  service  has  experienced  the  dual 
trends  of  centralization  of  function  and  service  being  geograph¬ 
ically  distant  from  the  users  they  are  trying  to  support.  While 
centralization  offers  some  benefits  in  economy  of  scale  and 
knowledge  sharing,  having  service  personnel  geographically  dis¬ 
tant  can  mean  a  decrease  in  shared  context  between  service 
people  and  the  users  they  support.  For  example,  local  tele¬ 
phone  operators  used  to  be  able  to  answer  problems  such  as, 
“I  need  the  number  for  that  gas  station  across  from  the  theater 
down  town.”  Such  problems  are  currently  unanswerable  from 
distant,  centralized  locations.  In  some  cases,  service  personnel 
and  users  or  customers  may  even  have  different  native  lan¬ 
guages  and  come  from  different  cultures.  There  is  a  possibility 
that  some  of  this  shared  context  may  soon  be  reinstated,  such  as 
from  geographical  information  systems  and  from  systems  that 
allow  service  people  to  view  screens  and  actions  remotely.  It 
may  be  that  a  shared  view  of  the  real  visual  world  is  unneces¬ 
sary;  a  shared  representation  of  the  salient  features,  assuming 
they  can  be  identified,  may  be  enough  (O’Neill,  Castellani, 
Grasso,  Roulland,  &  Tolmie,  2005). 

From  a  user’s  perspective,  undue  complexity  is  often  injected 
into  the  process  of  joint  problem  solving  with  service  personnel 
because  the  service  personnel  are  often  organized  according  to 
the  underlying  system  that  they  are  trying  to  support  rather  than 
the  symptom  experienced  by  a  user.  For  instance,  an  end  user 
may  attempt  to  print  something  on  a  new  printer  and  get  an  er¬ 
ror  message  indicating  that  the  print  job  failed.  The  user  calls  the 
help  desk,  and  the  top-level  menu  asks  whether  they  are  having 
a  problem  with  connectivity,  the  operating  system,  or  an  appli¬ 
cation  program,  any  of  which  may  be  the  appropriate  answer. 

An  apparently  successful  innovation  in  online  help  is  the
“Answer  Garden”  (Ackerman  &  Malone,  1990),  which  allows  expert
users  over  time  with  minimal  disruption  to  grow  a  more  de¬ 
tailed  tree  of  FAQs.  Other  trends  making  the  end  user’s  job  of 
getting  help  simpler  include  the  use  of  search  engines  as  well 
as  Wikis  and  weblogs  to  support  communities  of  practice  and
communities  of  interest. 


Maintenance 

Consider  the  maintenance  of  an  automobile  circa  1950.  While 
such  machines  were  complex,  the  parts  were  all  large  enough 
to  be  visible,  and  when  a  part  broke,  it  could  be  replaced  by  a  sim¬ 
ilar  part.  Such  repairs  might  be  simple  or  they  might  be  complex 
in  that  replacing  a  part  might  require  moving  other  parts  in  or¬ 
der  to  reach  the  part  to  be  replaced.  Modern  software  systems 
(and  such  systems  are  becoming  ubiquitously  embedded  in  all 
other  technologies)  introduce  a  set  of  new  and  often  complex 
maintenance  issues.  Before  installing  a  new  upgraded  version  of 
one  piece  of  software,  for  instance,  it  is  often  necessary  to  check 
for  several  prerequisite  pieces  of  software.  Each  of  these  may  in 
turn  require  still  other  prerequisites.  Even  if  you  already  have  one 
of  the  prerequisites  installed,  say  version  X  of  program  Y,  you  may 
also  need  to  install  a  fair  number  of  patches  to  X.  In  some  cases,
you  need  to  uninstall  software,  taking  care  throughout  to  reboot 
as  needed  in  order  to  clear  out  persistent  memory.  Even  if  this
entire  tree  is  followed,  it  may  still  be  the  case  that  installing  some
new  piece  of  software  causes  something  else  not  to  work  any 
longer.  While  the  use  of  Wizards  has  made  maintenance  much 
simpler  in  many  cases,  if  something  does  fail,  diagnosing  and  fix¬ 
ing  the  failure  may  require  delving  into  several  layers  of  software 
beneath  the  GUI  that  is  supposed  to  provide  the  simple  inter¬ 
face  for  the  end  user. 
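
The prerequisite tree described above can be sketched as a depth-first walk that installs each prerequisite before the thing that needs it. The package names are invented, and the `install_order` function is a hypothetical illustration rather than the behavior of any particular installer.

```python
def install_order(target, prerequisites):
    """Return an order in which every prerequisite precedes its dependents.

    `prerequisites` maps a package name to the list of packages it needs.
    Raises ValueError if the prerequisites form a cycle.
    """
    order = []
    done = set()
    in_progress = set()

    def visit(package):
        if package in done:
            return
        if package in in_progress:
            raise ValueError(f"circular prerequisite involving {package!r}")
        in_progress.add(package)
        for needed in prerequisites.get(package, []):
            visit(needed)
        in_progress.discard(package)
        done.add(package)
        order.append(package)

    visit(target)
    return order


if __name__ == "__main__":
    # Hypothetical example: an editor needing a runtime, which needs a patch.
    needs = {"editor": ["runtime", "fonts"], "runtime": ["patch-1"]}
    print(install_order("editor", needs))
```

Even this small sketch has to handle circular requirements, which hints at why the real task is difficult for end users. It also says nothing about version conflicts or about an installation breaking software that previously worked.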


WAYS  TO  MEASURE  COMPLEXITY 


A  Priori  Mathematical  Models 

We  have  already  mentioned  an  attempt  to  measure  the  com¬ 
plexity  of  programs  by  Halstead  (1977).  Other  measures  have 
been  proposed;  probably  the  best  known  are  those  by  McCabe 
and  Butler  (1986;  McCabe,  1976)  and  Jones  (1996).  These  met¬ 
rics  aim  to  capture  the  inherent  psychological  complexity  of 
the  program  and  not  just  the  behavior  that  the  program  sur¬ 
faces  to  the  end  user.  We  might  think  to  find  useful  analogues 
here  to  measures  for  the  complexity  of  the  user’s  interaction 
with  a  program  as  well.  Unfortunately,  the  measures  do  not  cur¬ 
rently  deal  with  several  major  factors  of  psychological  complex¬ 
ity.  First,  the  models  do  not  deal  with  the  complexity  of  the 
visual  (or  other  sensory)  stimuli  presented  to  the  user.  Second, 
the  models  do  not  deal  with  the  complexity  of  the  response  re¬ 
quired  of  the  user.  Third,  the  models  do  not  take  into  account 
the  many  complicating  factors  mentioned  above  that  impact  the 
correspondence  between  what  is  required  of  the  user’s  behav¬ 
ior  and  what  the  user  already  knows. 
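
As an illustration of what such a program metric computes, the sketch below assumes that the McCabe measure referred to is the cyclomatic number of a control-flow graph, M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. The node names are hypothetical.

```python
def cyclomatic_complexity(edges, components=1):
    """Cyclomatic number M = E - N + 2P of a control-flow graph.

    `edges` is a list of (from_node, to_node) pairs; nodes are inferred
    from the edges, so isolated nodes are not counted.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


if __name__ == "__main__":
    straight_line = [("entry", "step"), ("step", "exit")]
    if_else = [
        ("entry", "test"),
        ("test", "then"),
        ("test", "else"),
        ("then", "exit"),
        ("else", "exit"),
    ]
    print(cyclomatic_complexity(straight_line))
    print(cyclomatic_complexity(if_else))
```

The calculation depends only on the structure of the code. Nothing in it refers to what the user sees, must do, or already knows, which is the limitation noted above.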

One  of  the  first  attempts  to  quantitatively  predict  ease  of  use 
a  priori  tried  to  measure  the  complexity  of  the  internal  struc¬ 
ture  necessary  to  generate  behavior  (Reisner,  1984).  A  similar 
approach  has  been  used  to  try  to  predict  learning  time  (Bovair, 
Kieras, & Polson, 1990). Basically, both approaches consider the
complexity  of  the  underlying  rule  systems.  Reisner’s  claim  is 
that, other things being equal, it takes longer to execute a more
complex rule set. Bovair, Kieras, and Polson claimed that it takes
longer  to  learn  a  more  complex  rule  set  as  well  as  to  execute  it 
once  learned.  Both  of  these  claims  seem  justified,  though  only 
with  the  caution  of  ceteris  paribus  (see  Rautenberg,  1996  for  a 
meta-analysis  of  quantification  attempts  of  user  interfaces;  see 
Ivory  &  Hearst,  2001  for  a  broader  review  of  attempts  to  auto¬ 
mate  all  or  part  of  the  process  of  usability  evaluation). 

Linear  Regression 

Psychological  tests  have  often  been  constructed  by  taking  a  very 
large  number  of  diverse  items  and  then  seeing  which  ones  cor¬ 
relate  with  a  desired  criterion.  Of  course,  one  needs  to  revali¬ 
date  these  items  with  a  new  sample,  but  it  is  interesting  that 
somewhat  predictive  tests  have  been  constructed  with  this 
method  for  many  domains.  In  fact,  linear  predictive  models 
have  been  successfully  applied  to  a  large  number  of  domains 
and  are  generally  better  than  human  experts  (see  Dawes,  1982; 
Dawes, Faust, & Meehl, 1992). Walston & Felix (1977) used such
a  technique  to  predict  the  effort  needed  to  complete  a  wide  se¬ 
lection  of  software  development  projects.  Interestingly,  the 
most  important  factors  had  to  do  with  the  sociopolitical  aspects 
of  projects  rather  than  the  technical  ones.  A  widely  used,  flexi¬ 
ble  method  to  predict  various  aspects  of  software  development 
efforts  is  COCOMO  (Boehm,  1981;  2000).  Though  not  strictly 
linear,  the  calculations  are  based  on  an  empirical  analysis  of  ac¬ 
tual  software  development  projects. 

Perhaps  such  approaches  can  be  profitably  extended  to  psy¬ 
chological  complexity  as  well.  In  fact,  at  least  one  attempt  to 
measure  the  “goodness”  (related  at  least  in  part  to  complexity) 
of  websites  used  a  similar  approach  (Ivory  &  Hearst,  2001)  and 
showed  some  evidence  of  success.  However,  in  their  study, 
somewhat  different  factors  were  predictive  for  different  topic  ar¬ 
eas.  Further,  because  conventions  and  underlying  technology 
keep  changing,  it  is  likely  that  linear  predictive  equations  might 
have  to  be  updated  on  a  frequent  basis.  In  general,  this  is  a  po¬ 
tential  limitation  of  linear  predictive  models.  They  can  be  quite 
accurate  and  quite  robust,  but  only  provided  the  underlying 
conditions  remain  relatively  constant.  The  absence  of  a  theo¬ 
retical  underpinning  can  lead  to  predictions  that  can  become 
inaccurate  without  warning. 

Subjective  Measures 

An  entirely  different  approach  to  measuring  cognitive  complex¬ 
ity  is  to  use  subjective  measures;  in  effect,  to  ask  people  to  rate 
or  rank  items  in  terms  of  cognitive  complexity.  In  judgments  of 
simple  stimulus  sequences,  it  appears  that  complexity  and  ran¬ 
domness  are  highly  related  (Falk  &  Konold,  1997).  An  interest¬ 
ing  and  sophisticated  attempt  to  provide  a  more  differentiated 
subjective  view  of  complexity  applicable  to  real-world  systems  is 
that  of  Cognitive  Dimensions  of  Complexity  (Green,  1989; 
Green  &  Petre,  1996).  This  approach  identifies  14  different 
dimensions  of  complexity.  It  can  be  used  as  a  practical  tool  to 
focus  the  attention  of  developers  successively  on  various  areas 
of  design  that  can  lead  to  undue  complexity.  More  recently,  re¬ 
lated  work  has  focused  on  mismatches  between  the  concepts 
inherent  in  the  design  of  an  artifact  and  the  way  users  concep¬ 
tualize  (Connell,  Blandford,  &  Green,  2004). 

Textual Analysis of Documentation

In  general,  we  expect  the  complexity  of  a  description  of  some¬ 
thing  to  correlate  highly  with  the  complexity  of  that  thing.  Of 
course,  it  is  possible  to  describe  a  simple  thing  with  undue  com¬ 
plexity,  but  in  most  circumstances,  both  extrinsic  reward  struc¬ 
tures  and  the  rules  of  conversation  tend  to  push  description  to 
be  sufficiently  complex  to  describe  something  accurately  but 
sufficiently  simple  to  be  understood.  Therefore,  one  promising 
ersatz  measure  of  the  complexity  of  something,  such  as  a  com¬ 
puter  system  or  a  procedure  using  that  system,  is  to  measure 
the complexity of the description. One commonly used measure
is the Flesch Formula (Flesch, 1948), which basically uses
the average length of words in syllables and the average length of
sentences in words in a document to produce a readability score that
can be mapped to the educational reading level required to understand
that document. Another metric is the Gunning Fog index (Gunning, 1952), which
seeks  to  measure  the  clarity  of  writing.  A  still  more  sophisticated 
automated  approach  is  possible  with  the  DICTION  program 
(Hart,  2001)  which  gives  31  primary  and  several  derived  dimen¬ 
sions  to  describe  the  rhetorical  style  of  a  document.  One  of  the 
primary  dimensions  is  labeled  complexity  and  several  of  the  oth¬ 
ers  might  relate.  Although  this  approach  offers  promise  as  a 
metric  to  be  applied  after  the  fact,  caution  must  be  applied  to 
using  it  as  an  in-process  metric.  Otherwise,  writers  could  use 
various  tricks  to  make  the  description  of  an  artifact  appear  sim¬ 
ple  while  in  reality  making  it  incomprehensible.  (In  the  absurd 
limiting case, one could provide a manual for a complex product
that simply said, “Use it.”) Clearly, documents must also be
constrained to be complete if complexity metrics are to be applied
to descriptions fairly and usefully.
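
The two readability measures named above can be sketched in a few lines of Python. The Flesch Reading Ease score is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), and the Gunning Fog index is 0.4 × (words per sentence + 100 × the proportion of words with three or more syllables). The syllable counter below is a crude vowel-group heuristic and the tokenizing rules are assumptions of this sketch; production tools use dictionaries and additional exclusions, and the functions assume non-empty text.

```python
import re


def count_syllables(word):
    """Rough estimate: one syllable per run of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_stats(text):
    """Return (sentence count, word count, per-word syllable estimates)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(sentences), len(words), [count_syllables(w) for w in words]


def flesch_reading_ease(text):
    """Higher scores indicate text that is easier to read."""
    n_sentences, n_words, syllables = text_stats(text)
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (sum(syllables) / n_words))


def gunning_fog(text):
    """Lower scores indicate text that is easier to read."""
    n_sentences, n_words, syllables = text_stats(text)
    complex_words = sum(1 for s in syllables if s >= 3)
    return 0.4 * ((n_words / n_sentences) + 100.0 * complex_words / n_words)
```

Applied to the two-word manual “Use it.”, `gunning_fog` returns 0.8, an extremely "simple" score for a description that explains nothing, which is exactly why completeness must be checked alongside any such metric.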

Iterative  Design  and  Testing 

Clearly,  much  of  the  rest  of  this  handbook  expands  on  this  topic. 
While  considerations  of  complexity  may  prove  useful,  at  this 
point,  there  is  no  substitute  for  interacting  with  real  users  doing 
real  work  in  their  real  context  as  a  general  method  for  design¬ 
ing,  developing,  and  deploying  truly  useful  and  usable  systems. 
Thinking  about  and  measuring  complexity  may  well  help  in  this 
process,  but  it  will  probably  never  substitute  for  it. 


POSSIBLE  FUTURE  APPROACHES 


One  can  imagine  that  ever  more  sophisticated  techniques  for 
brain  activity  imaging  and  other  physiological  measures  may 
someday  render  a  reliable  and  easy-to-use  objective  index  of 
psychological  complexity.  Somewhat  more  likely  is  the  contin¬ 
ued  evolution  of  modeling  approaches,  such  as  SOAR  (Laird, 
Newell,  &  Rosenbloom,  1987)  and  EPIC  (Kieras  &  Meyer,  1997). 
The  follow-on  programs  might  someday  conceivably  and  auto¬ 
matically  calculate  a  set  of  complexity  numbers  for  combinations 
of user, task, system, and context. In the nearer term, it seems
likely that continued improvement of heuristic approaches that
do require some human intervention (e.g., Dimensions of Cognitive
Complexity; Green, 1989; and the Brown, Keller, and Hellerstein,
2005, model) will prove useful in helping to design better user
experiences. The latter model assumes that an expert path is known
and that complexity depends upon the number of actions, the memory
load associated with retaining needed parameters, and the number of
context shifts.
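
To make the three ingredients just listed concrete, the following Python sketch tallies them for a known expert path. It is not the Brown, Keller, and Hellerstein (2005) formulation itself: the `Step` structure, the way memory load is counted, and the example procedure are all hypothetical, and the three counts are deliberately reported separately rather than combined with invented weights.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str           # what the expert does at this point
    context: str          # tool or window in which the step takes place
    params_held: int = 0  # parameters the user must keep in mind for this step


def complexity_profile(path):
    """Count actions, summed memory load, and context shifts along a path."""
    context_shifts = sum(
        1 for previous, current in zip(path, path[1:])
        if previous.context != current.context
    )
    return {
        "actions": len(path),
        "memory_load": sum(step.params_held for step in path),
        "context_shifts": context_shifts,
    }


# Hypothetical expert path for an installation procedure.
INSTALL_PATH = [
    Step("open installer", "desktop"),
    Step("enter license key", "installer", params_held=1),
    Step("choose install directory", "installer", params_held=1),
    Step("edit configuration file", "editor", params_held=2),
    Step("restart service", "terminal", params_held=1),
]
```

Comparing the profiles of two candidate designs for the same task shows which ingredient a redesign actually reduces.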

In  our  own  work,  we  are  attempting  to  extend  this  model  in 
order  to  account  more  completely  for  points  of  uncertainty  and 
the  difficulties  of  decision  making.  In  addition,  we  are  building 
tools  that  enable  complexity  metrics  to  be  produced  as  a  side  ef¬ 
fect  of  normal  development  processes.  In  this  way,  developers 
may  gain  feedback  that  is  both  timelier  and  more  differentiated 
with  respect  to  their  design  decisions.  In  the  short  term,  such 
metrics  should  provide  useful  feedback.  As  the  tools  come  to  be 
used  over  time,  we  expect  the  development  teams  to  internalize 
the  implications  so  that  better  initial  design  decisions  are  made. 


COMPLEXITY  REDUCTION  IN  PRACTICE: 
CASE  STUDIES 


Clearly, complexity is itself a complex topic. However, it is not
intractable. Numerous examples of good design and development
leading  to  suitable  levels  of  complexity  may  be  found.  We  con¬ 
clude  this  chapter  with  two  case  studies  from  our  own  labora¬ 
tory  that  illustrate  how  complexity  may  be  considerably  reduced 
in  a  real  world  setting. 

Case Study 1: Query By Example

Query  By  Example,  invented  by  Moshe  Zloof  (1975)  in  the  early 
1970s,  was  an  attempt  to  provide  a  query  language  easy  enough 
for  computer-naive  novices  to  learn  to  use.  Early  studies  indicated 
that  nonprogramming  high-school  and  college  students  were 
fairly  successful  in  translating  English  questions  into  the  syntax 
of  this  formal  query  language  (Thomas  &  Gould,  1975).  One 
probable  reason  for  the  relatively  fast  times  and  low  error  rates 
was  that,  in  many  cases,  users  only  had  to  select  the  right  place 
to  write  information  rather  than  having  to  write  all  information 
from  scratch  (see  Fig.  25.10).  Another  likely  reason  was  that,  in 
Halstead’s (1977) terms, Query By Example was quite “dense.”
That  is,  almost  all  of  the  symbols  required  for  a  given  query  were 
symbols  that  would  be  required  in  any  conceivable  query  lan¬ 
guage.  There  was  little  overhead.  In  other  words,  the  intrinsic 
complexity of any given query remained, but there was no gratuitous
or undue complexity introduced by the system itself.


Employees | Name | Age | Salary | Year of Hire
----------+------+-----+--------+-------------
          | P.   | >65 |        |

“Print the names of everyone who is more than 65 years of
age.” Only the “P.” and the “>65” is entered by the user.

FIGURE 25.10. A simple query in “Query By Example.”
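
The query in Fig. 25.10 can be mimicked in a few lines of Python to show how little the user has to supply. This is a minimal sketch, not Zloof's implementation: the template encoding (a dictionary from column name to cell contents), the restriction to numeric comparisons, and the sample employee rows are all assumptions made for illustration.

```python
import operator

# Comparison prefixes a condition cell may start with (longest first).
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
    ("=", operator.eq),
]


def parse_condition(cell):
    """Turn a cell such as '>65' into (comparison function, numeric limit)."""
    for symbol, func in _OPERATORS:
        if cell.startswith(symbol):
            return func, float(cell[len(symbol):])
    raise ValueError(f"unsupported condition: {cell!r}")


def run_query(rows, template):
    """Print columns marked 'P.' for rows satisfying every condition cell."""
    printed = [col for col, cell in template.items() if cell == "P."]
    conditions = {
        col: parse_condition(cell)
        for col, cell in template.items()
        if cell and cell != "P."
    }
    return [
        tuple(row[col] for col in printed)
        for row in rows
        if all(test(row[col], limit) for col, (test, limit) in conditions.items())
    ]


# Hypothetical table contents.
EMPLOYEES = [
    {"Name": "Avery", "Age": 70, "Salary": 52000, "Year of Hire": 1961},
    {"Name": "Blake", "Age": 41, "Salary": 61000, "Year of Hire": 1990},
    {"Name": "Casey", "Age": 66, "Salary": 48000, "Year of Hire": 1968},
]

# The user fills in only two cells of the skeleton table.
TEMPLATE = {"Name": "P.", "Age": ">65", "Salary": "", "Year of Hire": ""}
```

Almost every symbol in `TEMPLATE` carries part of the question itself, which is the sense in which the language is "dense."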

In some cases, students continued to have some difficulties,
but  further  studies  showed  that  these  difficulties  did  not  spring 
from  the  query  language  per  se,  but  from  general  difficulties 
with logic (Thomas, 1976). For example, students had some difficulties
with quantifiers (all and some), logical connectives
(and, or, and not), and operational distinctions (sum,
count, and count unique). These same difficulties occurred in
the  absence  of  any  requirements  of  the  query  language.  Addi¬ 
tional  difficulties  were  encountered  when  the  structure  of  the 
database  did  not  match  well  with  the  structure  of  the  question. 
For  instance,  if  the  database  contained  a  column  labeled  “Year  of 
Hire,”  a  question  asking  for  the  names  of  people  “hired  after 
1970”  was  easy.  A  question  asking  for  the  names  of  people  “who 
have  worked  for  more  than  35  years,”  however,  was  much  more 
problematic. 

In  a  follow-on  study  (Thomas,  1983),  students  were  not 
given  English  questions  to  translate  but  instead  were  shown  a 
database structure and given problem statements. They were
then  to  generate  their  own  English  questions  relevant  to  the 
problem  and  then  translate  their  own  questions  into  Query  By 
Example.  These  results  were  much  less  encouraging.  These  stu¬ 
dents  had  very  little  concept  of  what  types  of  questions  were 
reasonable  to  put  to  the  computer.  One  problem  given  was, 
“Some  of  the  younger  faculty  members  feel  that  they  are  not 
paid  enough  relative  to  the  older  faculty.  Write  a  question  that 
you  think  might  shed  light  on  this  issue.”  A  typical  question  was, 
“Are the younger faculty paid enough?” In other words, many
of  the  students  expected  the  computer  to  simply  “formulate 
and  solve  the  problem”  for  them. 

In  the  1970s,  novice  computer  users  not  only  might  have 
trouble  writing  a  correct  query;  they  might  also  write  an  incor¬ 
rect  query  that  would  consume  a  huge  amount  of  resource.  For 
this  reason,  it  was  considered  worthwhile  to  try  to  predict  the 
chances  that  a  query  was  correct  before  submitting  it  to  the 
search  engine.  Based  on  the  formal  properties  of  the  query 
alone  (number  of  operators,  etc.),  we  were  able  to  predict  54% 
of  the  variance  in  query  accuracy.  If  we  added  the  time  taken  to 
write  the  query  and  the  student’s  own  confidence  rating,  we 
achieved  an  accuracy  of  75%.  Finally,  if  we  added  a  term  for 
whether  the  mapping  between  the  English  and  query  was 
straightforward,  more  than  90%  of  the  variance  was  predictable. 
All  these  measures  may  be  thought  of  as  factors  related  to  psy¬ 
chological  complexity.  While  the  interface  and  language  were 
developed  and  improved  iteratively,  our  studies  also  showed  that 
remaining  user  difficulties  were  due  to  intrinsic  task  complexity. 


Case  Study  2:  Web  Accessibility  Technology 

With  the  web  growing  in  importance  to  both  recreation  and 
work,  it  became  clear  to  us  some  time  ago  that  barriers  to  web 
use  needed  to  be  lowered.  It  was  no  longer  acceptable  for  peo¬ 
ple  with  visual  and  motor  disabilities  to  be  denied  effective  web 
access. 

We  began  our  explorations  in  this  area  by  working  directly 
with  older  adults,  a  population  needing  both  web  content  mod¬ 
ifications  (text  and  image  enlargement,  navigational  simplifica¬ 
tion,  visual  contrast  modifications,  etc.)  and  input  adaptations 
(to  filter  out  hand  tremors,  remove  extraneous  key  presses  and 


mouse  clicks,  etc.).  Of  course,  modern  browsers  and  operating 
systems  provide  a  number  of  these  modifications  and  adapta¬ 
tions,  but  access  to  them  is  distributed  throughout  the  system, 
inconsistently  surfaced,  and  often  inaccessible  to  those  most  in 
need.  Our  challenge  lay  in  finding  a  way  to  reduce  the  com¬ 
plexity  of  so  much  functionality  simultaneously  while  adding 
still  more  accessibility  features  not  otherwise  available. 

Our  approach  was  iterative  and  user  centered.  Early  on,  it 
became  clear  that  our  users  did  not  want  a  simplified  browser. 
Nor  did  they  want  to  be  restricted  to  a  subset  of  the  web.  After 
multiple  attempts  (Richards  and  Hanson,  2004;  Hanson  and 
Richards,  2005),  we  converged  on  a  design  that  added  a  set  of 
very  simple  control  panels  to  the  bottom  of  an  otherwise  nor¬ 
mal  browser,  each  control  panel  being  dedicated  to  one  class 
of adaptation or transformation. Figure 25.11 shows one such
panel  in  the  context  of  the  full  browser. 

The  only  change  to  the  browser’s  normal  interface  is  the  ad¬ 
dition  of  a  single  “Settings”  button.  Clicking  on  this  button 
brings  up  the  first  of  the  series  of  control  panels.  Clicking  on 
one  of  the  buttons  in  a  control  panel  applies  an  immediate 
change  to  the  web  content.  If  the  change  is  desired  it  is  kept  by 
doing  nothing  more  with  the  panel.  Otherwise,  it  is  cancelled  by 
clicking  on  a  “None”  or  “Standard”  button  depending  on  the 
panel.  The  panels  are  organized  in  a  conceptual  ring.  Clicking 
on  the  “>”  button  brings  up  the  next  panel.  Clicking  on  the 
“<”  button  brings  up  the  previous  panel.  Clicking  on  the  “?” 
button  brings  up  an  interactive  help  page  keyed  to  the  current 
panel,  which  allows  the  user  to  easily  explore  the  full  range  of 
modification  the  panel  exposes. 
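
The "conceptual ring" of panels described above amounts to modular arithmetic on a panel index. The following Python sketch shows the idea; the class name and all panel names other than the "colors" panel of Figure 25.11 are hypothetical, and the real software's internals are not described in enough detail here to reproduce.

```python
class PanelRing:
    """Cycle through control panels with wraparound in both directions."""

    def __init__(self, panels):
        if not panels:
            raise ValueError("at least one panel is required")
        self._panels = list(panels)
        self._index = 0

    @property
    def current(self):
        return self._panels[self._index]

    def next(self):
        """Behaviour of the '>' button."""
        self._index = (self._index + 1) % len(self._panels)
        return self.current

    def previous(self):
        """Behaviour of the '<' button."""
        self._index = (self._index - 1) % len(self._panels)
        return self.current


# Only "colors" is named in the text; the other panel names are assumed.
ring = PanelRing(["colors", "text size", "images"])
```

Because the ring has no first or last panel, the user never reaches a dead end and needs to learn only two navigation controls.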

A  number  of  mechanisms  lie  beneath  this  simple  surface. 
Some  modifications  are  applied  by  altering  browser  settings 
stored  in  the  system’s  global  registry.  Others  are  applied  by  au¬ 
tomatically  creating  and  installing  a  user  style  sheet.  Others  are 
applied  by  adjusting  parameters  in  the  underlying  accessibility 
settings  of  the  system.  Others  are  applied  by  modifying  the  web 
page’s  in-memory  Document  Object  Model.  Still  others  are  ap¬ 
plied  by  means  of  always-running  agents  examining  input  event 
streams.  All  of  this  complexity  is  hidden  from  the  user. 

FIGURE 25.11. The "colors" panel for web accessibility software.


25.  Achieving  Psychological  Simplicity:  Measures  and  Methods  To  Reduce  Cognitive  Complexity  •  505 


For  a  concrete  example  of  the  resulting  simplification,  con¬ 
sider  the  case  of  changing  the  colors  of  web  pages.  This  is  a 
modification  often  found  useful  for  people  with  visual  disabili¬ 
ties  and  for  people  with  various  forms  of  dyslexia.  To  change  just 
the  text  foreground  and  background  colors  on  a  popular 
browser  requires  16  separate  steps.  To  ensure  that  link,  visited 
link,  and  hover  colors  are  suitably  contrasting  requires  24  addi¬ 
tional  steps.  In  contrast,  in  our  design,  setting  two  of  the  most 
popular  color  combinations — black  text  on  white  backgrounds 
or  white  text  on  black  backgrounds — requires  only  a  single  click 
on  a  button  showing  the  desired  color  combination.  Optimally 
contrasting  link,  visited  link,  and  hover  colors  are  automatically 
set  as  a  side  effect  of  this  choice.  For  those  needing  a  broader 
range  of  choice,  another  simple  panel  allows  the  direct  setting 
of  the  foreground  and  background  RGB  values.  Again,  the  link, 
visited  link,  and  hover  colors  are  set  automatically. 
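
The text does not say how "optimally contrasting" link colors were chosen. As one plausible stand-in, the sketch below uses the contrast-ratio definition from the W3C Web Content Accessibility Guidelines (relative luminance of each color, then (L_lighter + 0.05) / (L_darker + 0.05)) and picks, from a candidate list, the color that contrasts most with the chosen background. The function names and the candidate colors are assumptions of this example, not the authors' method.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Return a ratio between 1 (identical) and 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def pick_link_color(background, candidates):
    """Choose the candidate with the highest contrast against the background."""
    return max(candidates, key=lambda color: contrast_ratio(color, background))


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
BLUE, YELLOW = (0, 0, 255), (255, 255, 0)  # hypothetical link-color candidates
```

With these candidates, a white background selects blue and a black background selects yellow, so a single click on a foreground/background pair can set the dependent colors without further input from the user.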

This  case  study  illustrates  a  number  of  approaches  to  reduc¬ 
ing  software  complexity.  First,  iterative  user-centered  design  al¬ 
lowed  for  the  rapid  exploration  of  a  range  of  alternative  inter¬ 
faces.  Second,  the  addition  of  a  new  interface  layer  allowed  for 
the  unification  of  a  number  of  disparate  mechanisms.  Third, 
careful analysis of the needs of users allowed for some choices
to be surfaced prominently and others to be removed. Fourth,
the  provision  of  immediate  feedback  allowed  users  to  directly 
experience  the  effects  of  any  choice,  trying  them  alone  or  in 
combination  until  their  overall  web  experience  was  optimal. 


CONCLUSIONS 


Not  surprisingly,  psychological  complexity  is  itself  a  complex 
topic.  It  should  be  of  interest  to  designers  of  HCI  because  they 
typically  need  to  reduce  undue  complexity.  There  are  occa¬ 
sional  exceptions,  for  instance,  in  entertainment  and  learning 
applications  where  the  designer  may  want  to  increase  psycho¬ 
logical  complexity.  In  this  chapter,  we  have  distinguished  com¬ 
plexity  from  many  related  but  different  concepts  such  as  un¬ 
certainty.  Also,  we  have  explored  how  undue  complexity  may  be 
introduced  into  the  development  process  and  have  suggested 
approaches  to  measure  and  reduce  undue  complexity.  As  the 
intrinsic  complexity  of  the  world  seems  to  be  on  the  increase, 
the  importance  of  reducing  undue  complexity  will  continue  to 
increase. 


INTRODUCTION


The  working  mind  is  greatly  leveraged  by  interaction  with  the 
world  outside  it.  A  conversation  to  share  information,  a  grocery 
list  to  aid  memory,  a  pocket  calculator  to  compute  square 
roots — all  effectively  augment  a  cognitive  ability  otherwise  se¬ 
verely  constrained  by  what  is  in  its  limited  knowledge,  by  lim¬ 
ited  attention,  and  by  limitations  on  reasoning.  But  the  most 
profound  leverage  on  cognitive  ability  is  the  ability  to  invent 
new  representations,  procedures,  or  devices  that  augment  cog¬ 
nition  far  beyond  its  unaided  biological  endowment — and  boot¬ 
strap  these  into  even  more  potent  inventions. 

This  chapter  is  about  one  class  of  inventions  for  augmenting 
cognition,  collectively  called  “information  visualization.”  Other 
senses  could  be  employed  in  this  pursuit — audition,  for  exam¬ 
ple,  or  a  multi-modal  combination  of  senses — the  broader  topic 
is really information perceptualization; however, in this chapter,
we restrict ourselves to visualization. Visualization employs
the  sense  with  the  most  information  capacity;  recent  advances  in 
graphically  agile  computers  have  opened  opportunities  to  ex¬ 
ploit  this  capacity,  and  many  visualization  techniques  have  now 
been  developed.  A  few  examples  suggest  the  possibilities. 

Example 1: Finding Videos with the FilmFinder

The use of information visualization for finding things is illustrated
by the FilmFinder (Ahlberg & Shneiderman, 1994a, 1994b). Unlike
typical movie-finder systems, the FilmFinder is organized not around
searching with keywords, but rather around rapid browsing and
reacting to collections of films in the database. Figure 26.1 shows
a scattergraph of 2000 movies, plotting rated quality of the movie
as a function of year when it was
released.  Color  differentiates  type  of  movies — comedy  from 
drama  and  the  like.  The  display  provides  an  overview,  the  en¬ 
tire  universe  of  all  the  movies,  and  some  general  features  of  the 
collection.  It  is  visually  apparent,  for  example,  that  a  good  share 
of  the  movies  in  the  collection  were  released  after  1965,  but  also 
that  there  are  movies  going  back  as  far  as  the  1920s.  Now  the 
viewer  “drills  down”  into  the  collection  by  using  the  sliders  in 
the  interface  to  show  only  movies  with  Sean  Connery  that  are 
between 1 and 4½ hours in length (Fig. 26.2). As the sliders are
moved, the display zooms in to show about 20 movies. It can
be seen that these movies were made between 1960 and 1995,
and  all  have  a  quality  rating  higher  than  4.  Since  there  is  now 
room  on  the  display,  titles  of  the  movies  appear.  Experimenta¬ 
tion  with  the  slider  shows  that  restricting  maximum  length  to 
2  hours  cuts  out  few  interesting  movies.  The  viewer  chooses  the 
highly rated movie, “Murder on the Orient Express” by double-clicking
on its marker. Up pop details in a box (Fig. 26.3) giving
names  of  other  actors  in  the  movie  and  more  information.  The 
viewer  is  interested  in  whether  two  of  these  actors,  Anthony 
Perkins  and  Ingrid  Bergman,  have  appeared  together  in  any 
other  movies.  The  viewer  selects  their  names  in  the  box,  and 
then  requests  another  search  (Fig.  26.4).  The  result  is  a  new  dis¬ 
play  of  two  movies.  In  addition  to  the  movie  the  viewer  knew 
about,  there  is  one  other  movie,  a  drama  entitled  “Goodbye, 


FIGURE 26.1. FilmFinder overview scattergraph. Courtesy University of Maryland.

FIGURE 26.3. FilmFinder details on demand. Courtesy University of Maryland.

FIGURE 26.4. FilmFinder retrieval by example. Courtesy University of Maryland.


Again,” made around 1960. The viewer is curious about this
movie and decides to watch it.

Information visualization has allowed a movie viewer in a
matter of seconds to find a movie he or she could not have specified
at the outset. To do this, the FilmFinder employed several
techniques from information visualization: (a) an overview of
the collection showing its structure; (b) dynamic queries, in
which the visualization seems to change instantaneously with
control manipulations; (c) zooming in by adding restrictions
to the set of interest; (d) details on demand, in which the user
can display temporarily detail about an individual object, and
(e) retrieval by example, in which selected attributes of an
individual object are used to specify a new retrieval set.


Example 2: Monitoring Stocks with TreeMaps

Another example of information visualization is the TreeMap
visualization on the SmartMoney.com website,¹ which is shown in
Fig. 26.5(a). Using this visualization, an investor can monitor
more than 500 stocks at once, with data updated every 15 minutes.
Each colored rectangle in the figure is a company. The size
of the rectangle is proportional to its market capitalization.
Color of the rectangle shows movement in the stock price.
Bright yellow corresponds to about a 6% increase in price,
bright blue to about a 6% decrease in price. Each business sector
is identified with a label like “Communications.” Those items
marked with a letter N have an associated news item.

In this example, the investor’s task is to monitor the day’s
market and notice interesting developments. In Fig. 26.5(a), the
investor has moved the mouse over one of the bright yellow
rectangles, and a box identifying it as Erickson, with a +9.28%
gain for the day, has popped up together with other information.
Clicking on a box gives the investor a popup menu for selecting
even more detail. The investor can either click to go to
World Wide Web links on news or financials, or drill down, for
example, to the sector (Fig. 26.5[b]), or down further to individual
companies in the software part of the technology sector
(Fig. 26.5[c]). The investor is now able to immediately note
interesting relationships. The software industry is now larger than


¹www.smartmoney.com

FIGURE 26.5. TreeMap of daily stock prices. Courtesy SmartMoney.com.


the  hardware  industry,  for  example,  and  despite  a  recent  bat¬ 
tering  at  the  time  of  this  figure,  the  Internet  industry  is  also  rel¬ 
atively  large.  Microsoft  is  larger  than  all  the  other  companies  in 
its  industry  combined.  Selecting  a  menu  item  to  look  at  year- 
to-date  gains  (Fig.  26.6),  the  investor  immediately  notes  inter¬ 
esting  patterns:  Microsoft  stock  shows  substantial  gains, 
whereas Oracle is down; Dell is up, but Compaq is down; tiny
Advanced Micro is up, whereas giant Intel is neutral. Having noticed
these relationships, the investor drills down to put up
charts or analysts' positions for companies whose gains in themselves,
or in relation to a competitor, are interesting. For example,
the investor is preparing a report on the computer industry
for colleagues and notices how AMD is making gains
against Intel, or how competition for the Internet is turning into
a battle between Microsoft and AOL/Time Warner.
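
The core property of a treemap, that each rectangle's area is proportional to a quantity such as market capitalization, can be sketched with the simple "slice-and-dice" layout, which splits a rectangle among a node's children and alternates the split direction at each level of the hierarchy. The Python below is a minimal sketch under stated assumptions: the nested-dictionary input format, the sector and company names, and the sizes are hypothetical, sizes are assumed positive, and this is not necessarily the layout algorithm used on SmartMoney.com.

```python
def total_size(node):
    """Sum leaf sizes beneath a node (a leaf has 'size', a group 'children')."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


def slice_and_dice(node, x, y, width, height, depth=0):
    """Return (name, x, y, width, height) for every leaf under `node`."""
    if "children" not in node:
        return [(node["name"], x, y, width, height)]
    rectangles = []
    total = total_size(node)
    offset = 0.0  # fraction of the parent already allocated
    for child in node["children"]:
        share = total_size(child) / total
        if depth % 2 == 0:
            # Even depth: divide the parent's width among the children.
            rectangles += slice_and_dice(
                child, x + offset * width, y, share * width, height, depth + 1)
        else:
            # Odd depth: divide the parent's height among the children.
            rectangles += slice_and_dice(
                child, x, y + offset * height, width, share * height, depth + 1)
        offset += share
    return rectangles


# Hypothetical market: sizes stand in for market capitalization.
MARKET = {"name": "market", "children": [
    {"name": "Software", "children": [
        {"name": "A", "size": 60}, {"name": "B", "size": 20}]},
    {"name": "Hardware", "children": [
        {"name": "C", "size": 20}]},
]}

LAYOUT = slice_and_dice(MARKET, 0.0, 0.0, 100.0, 100.0)
```

Coloring each returned rectangle by price change would then give the second visual variable described above; observations such as one company being larger than all others in its sector follow directly from comparing areas.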


Example  3:  Sensemaking  with  Permutation  Matrices 

As  a  final  information  visualization  example,  consider  the  case 
proposed by Bertin (1977/1981) of a hotel manager who wants
to  analyze  hotel  occupancy  data  (Table  26.1)  to  increase  her  re¬ 
turn.  In  order  to  search  for  meaningful  patterns  in  her  data,  she 
represents it as a permutation matrix (Fig. 26.7[a]). A permutation
matrix is a graphic rendition of a cases × variables display. In
Fig. 26.7(a), each cell of Table 26.1 is a small bar of a bar chart.
The bars for cells below the mean are white; those above the mean
are black. By permuting rows and columns, patterns emerge
that  lead  to  making  sense  of  the  data. 

In  Fig.  26.7(a),  the  set  of  months,  which  form  the  cases,  are 
repeated  to  reveal  periodic  patterns  across  the  end  of  the  cy¬ 
cle.  By  visually  comparing  the  pairs  of  rows,  one  can  find  rows 


FIGURE  26.6.  TreeMap  of  year-to-date  stock  prices.  Courtesy  SmartMoney.com. 


TABLE  26.1.  Data  for  Hotel  Occupancy  (Based  on  Bertin  (1977/1981)) 


ID  VARIABLE                 JAN   FEB   MAR   APR   MAY  JUNE  JULY   AUG  SEPT   OCT   NOV   DEC
 1  % Female                  26    21    26    28    20    20    20    20    20    40    15    40
 2  % Local                   69    70    77    71    37    36    39    39    55    60    68    72
 3  % USA                      7     6     3     6    23    14    19    14     9     6     8     8
 4  % South America            0     0     0     0     8     6     6     4     2    12     0     0
 5  % Europe                  20    15    14    15    23    27    22    30    27    19    19    17
 6  % M. East/Africa           1     0     0     8     6     4     6     4     2     1     0     1
 7  % Asia                     3    10     6     0     3    13     8     9     5     2     5     2
 8  % Businessmen             78    80    85    86    85    87    70    76    87    85    87    80
 9  % Tourists                22    20    15    14    15    13    30    24    13    15    13    20
10  % Direct Reservations     70    70    75    74    69    68    74    75    68    68    64    75
11  % Agency Reservations     20    18    19    17    27    27    19    19    26    27    21    15
12  % Air Crews               10    12     6     9     4     5     7     6     6     5    15    10
13  % Under 20                 2     2     4     2     2     1     1     2     2     4     2     5
14  % 20-35                   25    27    37    35    25    25    27    28    24    30    24    30
15  % 35-55                   48    49    42    48    54    55    53    51    55    46    55    43
16  % Over 55                 25    22    17    15    19    19    19    19    19    20    19    22
17  Price of rooms           163   167   166   174   152   155   145   170   157   174   165   156
18  Length of stay           1.7   1.7   1.7  1.91   1.9     2  1.54   1.6  1.73  1.82  1.66  1.44
19  % Occupancy               67    82    70    83    74    77    56    62    90    92    78    55
20  Conventions                0     0     0     1     1     1     0     0     1     1     1     1
that are similar. These are reordered and grouped (Fig. 26.7[b]).
By this means, it is discovered that there seem to be two patterns
of yearly variation. One pattern in Fig. 26.7(b) is semiannual,
dividing the year into the cold months of October through April
and the warm months of May through September. The other pattern
breaks the year into four distinct regions. We have thus found the
beginnings of a schema, that is, a framework in terms of which we
can encode the raw data and describe it in a more compact
language. Instead of talking about the events of the year in terms
of individual months, we can now talk in terms of two series of
periods: the semiannual one and the four distinct periods. As we
do so, there is a residue of information not included as part of
our descriptive language. Sensemaking proceeds by the omission and
recoding of information into more compact form (see Resnikoff,
1989). This residue of information may be reduced by finding a
better or more articulated schema, or it may be left as noise.
Beyond finding the basic patterns in the data, the hotel manager
wants to make sense of the data relative to a purpose: she wants
to increase the occupancy of the hotel. Therefore, she has also
permuted general indicators of activity in Fig. 26.7(b), such as
% Occupancy and Length of Stay, to the top of the diagram and put
the rows that correlate with these below them. This reveals that
Conventions, Businessmen, and Agency Reservations, all of which
generally have to do with convention business, are associated with
higher occupancy. This insight comes from the match in patterns
internal to the visualization; it also comes from noting why these
variables might correlate as a consequence of factors external to
the visualization. She also discovers that marked differences
exist between the winter and summer guests during the slow
periods. In winter, there are more local guests, more women, and
greater age differences. In summer, there are more foreign
tourists and less variation in age.
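
The row permutation described above can be approximated in a few lines of code. The following is only a sketch, assuming that "rows that correlate" is measured with the Pearson correlation coefficient (the text does not name a measure) and using a handful of rows copied from the hotel table; the function names are illustrative.

```python
from math import sqrt

# Twelve monthly values per row, copied from the hotel data table.
hotel = {
    "% Occupancy":           [67, 82, 70, 83, 74, 77, 56, 62, 90, 92, 78, 55],
    "Conventions":           [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    "% Businessmen":         [78, 80, 85, 86, 85, 87, 70, 76, 87, 85, 87, 80],
    "% Tourists":            [22, 20, 15, 14, 15, 13, 30, 24, 13, 15, 13, 20],
    "% Agency Reservations": [20, 18, 19, 17, 27, 27, 19, 19, 26, 27, 21, 15],
    "% Direct Reservations": [70, 70, 75, 74, 69, 68, 74, 75, 68, 68, 64, 75],
}

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def permute_rows(table, key):
    """Order rows by decreasing correlation with the row named `key`."""
    target = table[key]
    scores = {name: pearson(values, target) for name, values in table.items()}
    order = sorted(table, key=lambda name: scores[name], reverse=True)
    return order, scores

order, scores = permute_rows(hotel, "% Occupancy")
for name in order:
    print(f"{name:24s} r = {scores[name]:+.2f}")
```

Rows that rise and fall with occupancy move toward the top, and rows that move against it sink to the bottom. Bertin's manual permutation also groups rows by visual similarity of pattern, which a single correlation score captures only in part.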

This visualization was useful for sensemaking on hotel occupancy
data, but it is too complicated to communicate the high points.
The hotel manager therefore creates a simplified diagram,
Fig. 26.7(c). By graying some of the bars, the main points are
more readily graspable, while still preserving the data relations.
A December convention, for example, does not seem to have the
effect of the other conventions in bringing in guests. It is shown
in gray as residue in the pattern. The hotel manager suggests
moving the convention to another month, where it might have more
effect on increasing the occupancy of the hotel.

What  Is  Information  Visualization? 

The FilmFinder, the TreeMap, and the permutation matrix hotel
analysis are all examples of the use of information visualization.
We can define information visualization as “the use of
computer-supported, interactive, visual representations of
abstract data in order to amplify cognition” (Card, Mackinlay, &
Shneiderman, 1999).

Information visualization needs to be distinguished from related
areas: scientific visualization is like information visualization,
but it is applied to scientific data and typically is physically
based. The starting point of a natural geometrical substrate for
the data, whether the human body or earth geography, tends to
emphasize finding a way to make visible the invisible (say,
velocity of air flow) within an existing spatial framework. The
chief problem for information visualization, in contrast, is often
finding an effective mapping between abstract entities and a
spatial representation. Both information visualization and
scientific visualization belong to the broader field of data
graphics, which is the use of abstract, nonrepresentational visual
representations to amplify cognition. Data graphics, in turn, is
part of information design, which concerns itself with external
representations for amplifying cognition. At the highest level, we
could consider information design a part of external cognition,
the uses of the external world to accomplish some cognitive
process. Characterizing the purpose of information visualization
as amplifying cognition is purposely broad. Cognition can be the
process of writing a scientific paper or shopping on the Internet
for a cell phone. Generally, it refers to the intellectual
processes in which information is obtained, transformed, stored,
retrieved, and used. All of these can be advanced generally by
means of external cognition, and in particular by means of
information visualization.

Why  Does  Visualization  Work? 

Visualization aids cognition not because of some mystical
superiority of pictures over other forms of thought and
communication, but rather because visualization helps the user by
making the world outside the mind a resource for thought in
specific ways. We list six groups of these in Table 26.2 (Card
et al., 1999): Visualization amplifies cognition by (a) increasing
the memory and processing resources available to the users,
(b) reducing search for information, (c) using visual
representations to enhance the detection of patterns, (d) enabling
perceptual inference operations, (e) using perceptual attention
mechanisms for monitoring, and (f) encoding information in a
manipulable medium. The FilmFinder, for example, allows the
representation of a large amount of data in a small space in a way
that allows patterns to be perceived visually in the data. Most
important, the method of instantly responding in the display to
the dynamic movement of the sliders allows users to rapidly
explore the multidimensional space of films. The TreeMap of the
stock market allows monitoring and exploration of many equities.
Again, much data is represented in little space. In this case, the
display manages the user’s attention, drawing it to those equities
with unusually large changes, and supplying the means to drill
down into the data to understand why these movements may be
happening. In the hotel management case, the visual representation
makes it easier to notice similarities of behavior in a
multidimensional attribute space, then to cluster and re-represent
these. The final product is a compact (and simplified)
representation of the original data that supports a set of forward
decisions. In all of these cases, visualization allows the user to
(a) examine a large amount of information, (b) keep an overview of
the whole while pursuing details, (c) keep track of many things
(by using the display as an external working memory), and
(d) produce an abstract representation of a situation through the
omission and recoding of information.
FIGURE 26.7. Permutation matrix representation of hotel data (from Bertin, 1977/1981). (a) Initial
matrix of variables. (b) Permuted matrix to group like patterns together.


Historical  Origins 

Drawn visual representations have a long history. Maps go back
millennia. Diagrams were an important part of Euclid’s books
on geometry. Science, from earliest times, used diagrams to
(a) record observations, (b) induce relationships, (c) explicate
methodology of experiments, and (d) classify and conceptualize
phenomena (for a discussion, see Robin, 1992). For example,
Fig. 26.8 is a hand-drawn illustration, in Newton’s first scientific
publication, illustrating how white light is really composed of
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TABLE  26.2.  How  Information  Visualization  Amplifies  Cognition 


1. Increased Resources

   High-bandwidth hierarchical interaction:
      Human moving gaze system partitions limited channel capacity so that
      it combines high spatial resolution and wide aperture in sensing the
      visual environments (Larkin & Simon, 1987).
   Parallel perceptual processing:
      Some attributes of visualizations can be processed in parallel
      compared to text, which is serial.
   Offload work from cognitive to perceptual system:
      Some cognitive inferences done symbolically can be recoded into
      inferences done with simple perceptual operations (Larkin & Simon,
      1987).
   Expanded working memory:
      Visualizations can expand the working memory available for solving a
      problem (Norman, 1993).
   Expanded storage of information:
      Visualizations can be used to store massive amounts of information in
      a quickly accessible form (e.g., maps).

2. Reduced Search

   Locality of processing:
      Visualizations group information used together, reducing search
      (Larkin & Simon, 1987).
   High data density:
      Visualizations can often represent a large amount of data in a small
      space (Tufte, 1983).
   Spatially-indexed addressing:
      By grouping data about an object, visualizations can avoid symbolic
      labels (Larkin & Simon, 1987).

3. Enhanced Recognition of Patterns

   Recognition instead of recall:
      Recognizing information generated by a visualization is easier than
      recalling that information by the user.
   Abstraction and aggregation:
      Visualizations simplify and organize information, supplying higher
      centers with aggregated forms of information through abstraction and
      selective omission (Card, Robertson, & Mackinlay, 1991; Resnikoff,
      1989).
   Visual schemata for organization:
      Visually organizing data by structural relationships (e.g., by time)
      enhances patterns.
   Value, relationship, trend:
      Visualizations can be constructed to enhance patterns at all three
      levels (Bertin, 1967/1983).

4. Perceptual Inference

   Visual representations make some problems obvious:
      Visualizations can support a large number of perceptual inferences
      that are very easy for humans (Larkin & Simon, 1987).
   Graphical computations:
      Visualizations can enable complex specialized graphical computations
      (Hutchins, 1996).

5. Perceptual Monitoring

      Visualizations can allow for the monitoring of a large number of
      potential events if the display is organized so that these stand out
      by appearance or motion.

6. Manipulable Medium

      Unlike static diagrams, visualizations can allow exploration of a
      space of parameter values and can amplify user operations.

Source: Card, Mackinlay, & Shneiderman, 1999.



many colors. Sunlight enters from the window at right and is
refracted into many colors by a prism. One of these colors can
be selected (by an aperture in a screen) and further refracted by
another prism, but the light stays the same color, showing that
it has already been reduced to its elementary components. As in
Newton’s illustration, early scientific and mathematical diagrams
generally had a spatial, physical basis and were used to reveal
the hidden, underlying order in that world.

Surprisingly, diagrams of abstract, nonphysical information
are apparently rather recent. Tufte (1983) dates abstract
diagrams to Playfair (1786) in the 18th century. Figure 26.9 is one
of Playfair’s earliest diagrams. The purpose was to convince
readers that English imports were catching up with exports.
Starting with Playfair, the classical methods of plotting data were
developed: graphs, bar charts, and the rest.

Recent advances in the visual representation of abstract
information derive from several strands that became intertwined.
In 1967, Bertin (1967/1983, 1977/1981), a French cartographer,
published his theory of The Semiology of Graphics. This theory
identified the basic elements of diagrams and their combination.
Tufte (1983, 1990, 1997), from the fields of visual design
and data graphics, published a series of seminal books that set
forth principles for the design of data graphics and emphasized
maximizing the density of useful information. Both Bertin’s and
Tufte’s theories became well known and influential. Meanwhile,
within statistics, Tukey (1977) began a movement on exploratory
data analysis. His emphasis was not on the quality of graphical
presentation, but on the use of pictures to give rapid, statistical
insight into data relations. For example, “box and whisker plots”
allowed an analyst to get a rapid characterization of data
distributions. Cleveland and McGill (1988) wrote an influential book,
Dynamic Graphics for Statistics, explicating new visualizations
of data with particular emphasis on the visualization of
multidimensional data.

In 1985, NSF launched an initiative on scientific visualization
(McCormick & DeFanti, 1987). The purpose of this initiative
was to use advances in computer graphics to create a new
class of analytical instruments for scientific analysis, especially as
a tool for comprehending large, newly produced datasets in the
geophysical and biological sciences. Meanwhile, the computer
graphics and artificial intelligence communities were interested
in the automatic design of visual presentations of data. Mackinlay’s
(1986a, 1986b) thesis APT formalized Bertin’s design theory,
added psychophysical data, and used these to build a system for
automatically generating diagrams of data, tailored for some
purpose. Roth and Mattis (1990) built a system to do more complex
visualizations, such as some of those from Tufte. Casner
(1991) added a representation of tasks. This community was
interested not so much in the quality of the graphics as in the
automation of the match between data characteristics,
presentational purpose, and graphical presentation. Finally, the user
interface community saw advances in graphics hardware opening
the possibility of a new generation of user interfaces. The
first use of the term “information visualization” was probably in


FIGURE 26.8. Newton's optics illustration (from Robin, 1992).

FIGURE 26.9. Playfair's charts of English imports and exports (from Tufte, 1983).


Robertson, Card, and Mackinlay (1989). Early studies in this
community focused on user interaction with large amounts of
information: Feiner and Beshers (1990) presented a method,
worlds within worlds, for showing six-dimensional financial data
in an immersive virtual reality. Shneiderman (1992) developed a
technique called “dynamic queries” for interactively selecting
subsets of data items and TreeMaps, a space-filling representation
for trees. Robertson, Card, and Mackinlay (1993) presented
ways of using animation and distortion to interact with large
data sets in a system called the Information Visualizer, which
used focus + context displays to nonuniformly present large
amounts of information. The emphasis for these studies was
on the means for cognitive amplification, rather than on the
quality of the graphics presentations.
The remainder of this chapter will concentrate on the techniques
that have been developed for mapping abstract information
to interactive visual form to aid some intellectual task.
The perceptual foundations of this effort are beyond the scope
of this chapter, but are covered in Ware (2000). Further details
on information visualization techniques are addressed in a text
by Spence (2000). The classic papers in information visualization
are collected in Card et al. (1999).


THE  VISUALIZATION  REFERENCE  MODEL 


Mapping  Data  to  Visual  Form 

Despite their seeming variability, information visualizations can
be systematically analyzed. Visualizations can be thought of as
adjustable mappings from data to visual form to the human
perceiver. In fact, we can draw a simple Visualization Reference
Model of these mappings (Fig. 26.10). Arrows follow from Raw
Data (data in some idiosyncratic format) on the left, through a
set of Data Transformations into Data Tables (canonical
descriptions of data in a variables × cases format extended to
include metadata). The most important mapping is the arrow
from Data Tables to Visual Structures (structures that combine
values with an available vocabulary of visual elements: spatial
substrates, marks, and graphical properties). Visual Structures can
be further transformed by View Transformations, such as
visual distortion or 3D viewing angle, until they finally form a View
that can be perceived by human users. Thus, Raw Data might
start out as text represented as indexed strings or arrays. These
might be transformed into document vectors, normalized vectors
in a space with dimensionality as large as the number of
words. Document vectors, in turn, might be reduced by
multidimensional scaling to create the analytic abstraction to be
visualized, expressed as a Data Table of x, y, z coordinates that
could be displayed. These coordinates might be transformed
into a Visual Structure (that is, a surface on an information
landscape), which is then viewed at a certain angle.

Similar final effects can be achieved by transformations at
different places in the model: When a point is deleted from the
visualization, has the point been deleted from the dataset? Or is it
still in the data, merely not displayed? Chi and Riedl (1998) called
this the view-value distinction, and it is an example of just one
issue where identifying the locus of a transformation using the
Visualization Reference Model helps to avoid confusion.
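
The stages of the reference model, and the view-value distinction in particular, can be made concrete with a small sketch. Everything here is an illustrative assumption rather than part of the model itself: the semicolon-separated raw format, the function names, and the `View` class with its `hidden` set are all hypothetical.

```python
from dataclasses import dataclass, field

# Raw Data: an idiosyncratic format (here, hypothetical semicolon-separated strings).
RAW = ["230;Goldfinger;1964;7.7", "105;Ben Hur;1959;8.2", "540;Ben Hur;1926;7.4"]

def to_data_table(raw):
    """Data Transformation: Raw Data -> Data Table (cases by variables)."""
    table = []
    for line in raw:
        film_id, title, year, popularity = line.split(";")
        table.append({"FilmID": int(film_id), "Title": title,
                      "Year": int(year), "Popularity": float(popularity)})
    return table

def to_visual_structure(table):
    """Visual Mapping: one point mark per case, Year -> x, Popularity -> y."""
    return [{"id": row["FilmID"], "x": row["Year"], "y": row["Popularity"]}
            for row in table]

@dataclass
class View:
    """View Transformation: scale marks to pixels; `hidden` affects display only."""
    width: int = 400
    height: int = 200
    hidden: set = field(default_factory=set)

    def render(self, marks):
        xs = [m["x"] for m in marks]
        ys = [m["y"] for m in marks]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        pixels = []
        for m in marks:
            if m["id"] in self.hidden:
                continue  # skipped at the view stage, still present in the data
            px = round((m["x"] - x0) / ((x1 - x0) or 1) * self.width)
            py = round((m["y"] - y0) / ((y1 - y0) or 1) * self.height)
            pixels.append((m["id"], px, py))
        return pixels

table = to_data_table(RAW)
view = View()

# Deleting at the view: the point leaves the picture but stays in the Data Table.
view.hidden.add(540)
shown = view.render(to_visual_structure(table))

# Deleting at the value: the case is removed from the Data Table itself.
table_without_540 = [row for row in table if row["FilmID"] != 540]
```

In this sketch the hidden point still contributes to the axis extents, whereas removing the case from the Data Table rescales the remaining points. The two deletions therefore produce different pictures, which is one reason the locus of a transformation matters.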

Information visualization is not just about the creation of
visual images, but also about the interaction with those images in
the service of some problem. In the Visualization Reference Model,
another set of arrows flows back from the human at the right into
the transformations themselves, indicating the adjustment of
these transformations by user-operated controls. It is the rapid
reciprocal reaction between the generation of images by machine
and the selection and parametric adjustment of those images,
giving rise to new images, that accounts for the attractive
power of interactive information visualization.

Data  Structures 

It is convenient to express Data Tables as tables of objects and
their attributes, as in Table 26.3. For example, in the FilmFinder,
the basic objects (or “cases”) are films. Each film is associated
with a number of attributes or variables, such as title, stars, year
of release, genre type, and so forth. The vertical double black
line in the table separates data in the table to the right of the line


Human  Interaction 


Raw  Data:  idiosyncratic  formats 

Data  Tables:  relations  (cases  by  variables)  +  meta-data 

Visual  Structures:  spatial  substrates  +  marks  +  graphical  properties 

Views:  graphical  parameters  (position,  scaling,  clipping,  . . . ) 


FIGURE  26.10.  Reference  model  for  visualization  (Card  et  al.,  1999).  Visualization  can  be 
described  as  the  mapping  of  data  to  visual  form  that  supports  human  interaction  in  a  workplace 
for  visual  sense  making. 
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TABLE  26.3.  A  Data  Table  About  Films 


FilmID      ||  230         105         540
------------++------------------------------------
Title       ||  Goldfinger  Ben Hur     Ben Hur
Director    ||  Hamilton    Wyler       Niblo
Actor       ||  Connery     Heston      Novarro
Actress     ||  Blackman    Harareet    McAvoy
Year        ||  1964        1959        1926
Length      ||  112         212         133
Popularity  ||  7.7         8.2         7.4
Rating      ||  PG          G           G
FilmType    ||  Action      Action      Drama

Source: Card et al., 1999.


from the metadata, expressed as variable names, to the left of
the line. The horizontal black line across the table separates
input variables from output variables; that is, the table can be
thought of as a function,

f(input variables) = output variables.

So,

Year(FilmID = 105) = 1959.
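
Reading the table as a function can be sketched in code as a keyed lookup. The representation below (a dictionary keyed by FilmID) and the helper name `value` are assumptions chosen for illustration; the data are those of Table 26.3.

```python
# Data Table from Table 26.3: the input variable (FilmID) indexes each case,
# and the output variables are looked up from it.
FILMS = {
    230: {"Title": "Goldfinger", "Director": "Hamilton", "Actor": "Connery",
          "Actress": "Blackman", "Year": 1964, "Length": 112,
          "Popularity": 7.7, "Rating": "PG", "FilmType": "Action"},
    105: {"Title": "Ben Hur", "Director": "Wyler", "Actor": "Heston",
          "Actress": "Harareet", "Year": 1959, "Length": 212,
          "Popularity": 8.2, "Rating": "G", "FilmType": "Action"},
    540: {"Title": "Ben Hur", "Director": "Niblo", "Actor": "Novarro",
          "Actress": "McAvoy", "Year": 1926, "Length": 133,
          "Popularity": 7.4, "Rating": "G", "FilmType": "Drama"},
}

def value(variable, film_id):
    """Evaluate variable(FilmID = film_id), for example Year(FilmID = 105)."""
    return FILMS[film_id][variable]

print(value("Year", 105))  # 1959
```

Note that Title alone could not serve as the input variable here, since two different films share the title Ben Hur.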

Variables imply a scale of measurement, and it is important to
keep these straight. The most important to distinguish are

N = Nominal (are only = or ≠ to other values)

O = Ordinal (obeys a < relation)

Q = Quantitative (can do arithmetic on them)

A nominal variable N is an unordered set, such as film titles
{Goldfinger, Ben Hur, Star Wars}. An ordinal variable O is a
tuple (ordered set), such as film ratings (G, PG, PG-13, R). A
quantitative variable Q is a numeric range, such as film length [0, 360].

In addition to the three basic types of variables, subtypes
represent important properties of the world associated with
specialized visual conventions. We sometimes distinguish the
subtype Quantitative Spatial (Qs) for intrinsically spatial
variables common in scientific visualization and the subtype
Quantitative Geographical (Qg) for spatial variables that are
specifically geophysical coordinates. Other important subtypes are
similarity metrics, Quantitative Similarity (Qm), and the temporal
variables Quantitative Time (Qt) and Ordinal Time (Ot). We
can also distinguish Interval Scales (I) (like Quantitative Scales,
but since there is no natural zero point, it is not meaningful to
take ratios). An example would be dates. It is meaningful to
subtract two dates (June 5, 2002 - June 3, 2002 = 2 days), but it
does not make sense to divide them (June 5, 2002 ÷ June 23,
2002 = Undefined). Finally, we can define an Unstructured
Scale (U), whose only value is present or absent (e.g., an error
flag). The scales are summarized in Table 26.4.
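
The interval-scale example with dates happens to be mirrored by Python's standard `datetime.date` type, which supports subtraction but not division. The sketch below shows this, together with a simplified table of which operations each scale type permits; the table `SCALE_OPERATIONS` and the helper `supports` are illustrative assumptions, and treating a subtype such as Qt as its base type Q is a simplification.

```python
from datetime import date

# Operations that are meaningful on each scale type (simplified).
SCALE_OPERATIONS = {
    "U": set(),                  # only presence or absence
    "N": {"=="},                 # equality only
    "O": {"==", "<"},            # equality and order
    "I": {"==", "<", "-"},       # differences, but no ratios
    "Q": {"==", "<", "-", "/"},  # full arithmetic
}

def supports(scale, op):
    """True if `op` is meaningful on `scale`; subtypes (Qs, Qt, Ot, ...) use their base type."""
    return op in SCALE_OPERATIONS[scale[0]]

a, b = date(2002, 6, 5), date(2002, 6, 3)
difference = a - b          # a timedelta of 2 days: subtraction is defined

try:
    a / b                   # dividing two dates is not defined
    ratio_defined = True
except TypeError:
    ratio_defined = False

print(difference.days, ratio_defined)
```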

Scale types can be altered by transformations, and this practice
is sometimes convenient. Quantitative variables can be mapped by
data transformations into ordinal variables by dividing them into
ranges. For example, film lengths [0, 360] minutes (type Q) can be
broken into the ranges (type O),

[0, 360] minutes → (Short, Medium, Long).

This common transformation is called “classing,” because it
maps values onto classes of values. It creates an accessible
summary of the data, although it loses information. In the other
direction, nominal variables can be transformed to ordinal values

N → O

based on their name. For example, film titles {Goldfinger, Ben
Hur, Star Wars} can be sorted lexicographically

{Goldfinger, Ben Hur, Star Wars} →

(Ben Hur, Goldfinger, Star Wars).

Strictly speaking, we have not transformed their values, but in
many uses (e.g., building alphabetically arranged dictionaries of
words or sliders in the FilmFinder), we can act as if we had.
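
Both transformations are short to express in code. In the sketch below the class boundaries of 90 and 150 minutes are hypothetical, since the text does not say where Short, Medium, and Long divide; the function name is likewise illustrative.

```python
from bisect import bisect_right

# Hypothetical class boundaries in minutes (not specified in the text).
BOUNDS = [90, 150]
LABELS = ("Short", "Medium", "Long")   # an ordered tuple: type O

def classify_length(minutes):
    """Classing: map a quantitative length in [0, 360] onto an ordinal class."""
    if not 0 <= minutes <= 360:
        raise ValueError("length outside the range [0, 360]")
    return LABELS[bisect_right(BOUNDS, minutes)]

# N -> O: an unordered set of titles becomes an ordered tuple by sorting on name.
titles = {"Goldfinger", "Ben Hur", "Star Wars"}
ordered_titles = tuple(sorted(titles))

print(classify_length(112), classify_length(212))
print(ordered_titles)
```

Classing is lossy in the sense the text describes: once 112 and 133 minutes both become Medium, the difference between them cannot be recovered from the class labels.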

Variable scale types form an important class of metadata that,
as we shall see, is important for proper information visualization.
We can add scale type to our Data Table in Table 26.3, together
with cardinality or range of the data, to give us essentially
a codebook of variables as in Table 26.5.

Visual  Structures 

Information visualization maps data relations into visual form. At
first, it might seem that a hopelessly open set of visual forms can
result. Careful reflection, however, reveals what every artist
knows: that visual form is subject to strong constraints. Visual
form that reflects the systematic mapping of data relations onto
visual form, as in information visualization or data graphics, is
subject to even more constraints. It is a genuinely surprising
fact, therefore, that most information visualization involves
mapping data relations onto only a half dozen components of
visual encoding:

1. Spatial substrate

2. Marks

3. Connection

4. Enclosure

5. Retinal properties, and

6. Temporal encoding

Of these mappings, the most powerful is how data are
mapped onto the spatial substrate, that is, how data are
mapped into spatial position. In fact, one might say that the
design of an information visualization consists first of deciding
which variables are going to get the spatial mappings, and then
how the rest of the variables are going to make do with the
coding mappings that are left.

Spatial substrate. As we have just said, the most important
choice in designing an information visualization is which
variables are going to map onto spatial position. This decision
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TABLE  26.4.  Classes  of  Data  and  Visual  Elements 


Class  Data class and example / Visual class and example

U      Data:   Unstructured (can only distinguish presence or absence).
               Example: ErrorFlag
       Visual: Unstructured (no axis, indicated merely whether something
               is present or absent). Example: Dot

N      Data:   Nominal (can only distinguish whether two values are equal).
               Example: {Goldfinger, Ben Hur, Star Wars}
       Visual: Nominal Grid (a region is divided into subregions, in which
               something can be present or absent). Example: Colored circle

O      Data:   Ordinal (can distinguish whether one value is less or greater
               but not difference or ratio). Example: (Small, Medium, Large)
       Visual: Ordinal Grid (order of the subregions is meaningful).
               Example: Alpha slider

I      Data:   Interval (can do subtraction on values, but no natural zero
               and can't compute ratios).
               Example: [10 Dec. 1978 - 4 Jun. 1982]
       Visual: Interval Grid (region has a metric but no distinguished
               origin). Example: Year axis

Q      Data:   Quantitative (can do arithmetic on values).
               Example: [0-100] kg
       Visual: Quantitative Grid (a region has a metric).
               Example: Time slider

  Qs   Data:   Spatial variables. Example: [0-20] m
       Visual: Spatial grid

  Qm   Data:   Similarity. Example: [0-1]
       Visual: Similarity space

  Qg   Data:   Geographical coord. Example: [30°N-50°N] Lat.
       Visual: Geographical coord.

  Qt   Data:   Time variable. Example: [10-20] μsec
       Visual: Time grid


TABLE 26.5. Data Table with Meta-Data
Describing the Types of the Variables

FilmID      N   ||  230         105
----------------++----------------------
Title       N   ||  Goldfinger  Ben Hur
Director    N   ||  Hamilton    Wyler
Actor       N   ||  Connery     Heston
Actress     N   ||  Blackman    Harareet
Year        Qt  ||  1964        1959
Length      Q   ||  112         212
Popularity  Q   ||  7.7         8.2
Rating      O   ||  PG          G
FilmType    N   ||  Action      Action

Source: Card et al., 1999.


gives importance to spatially encoded variables at the expense
of variables encoded using other mappings. Space is perceptually
dominant (MacEachren, 1995); it is good for discriminating
values and picking out patterns. It is easier, for example, to
identify the difference between a sine and a tangent curve when
encoded as a sequence of spatial positions than as a sequence of
color hues.

Empty space itself, as a container, can be treated as if it had
metric structure. Just as we classified variables according to their
scale type, we can think of the properties of space in terms of
the scale type of an axis of space (cf. Engelhardt, Bruin, Janssen,
& Scha, 1996). Axis scale types correspond to the variable scale
types (see Table 26.4). The most important axes are

U = Unstructured (no axis, indicated merely whether something
is present or absent),

N = Nominal Grid (a region is divided into subregions, in
which something can be present or absent),

O = Ordinal Grid (the ordering of these subregions is
meaningful), and

Q = Quantitative Grid (a region has a metric).


Besides these, it is convenient to make additional distinctions
for frequently used subtypes, such as Spatial axes (Qs).

Axes can be linear or radial; essentially, they can involve any
of the various coordinate systems for describing space. Axes are
an important building block for developing Visual Structures.
Based on the Data Table for the FilmFinder in Table 26.5, we
represent the scatterplot as composed of two orthogonal
quantitative axes:

Year → Qx,

Popularity → Qy.

The notation states that the Year variable is mapped to a
quantitative X-axis and the Popularity variable is mapped to a
quantitative Y-axis. Other axes are used for the FilmFinder query
widgets. For example, an ordinal axis is used in the radio
buttons for film ratings,

Ratings → Oy,

and a nominal axis is used in the radio buttons for film type,

FilmType → Nx.
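
The axis notation can be read as a small declarative specification. The following sketch encodes the scatterplot mapping as data and checks that each variable's scale type agrees with the axis it is assigned to; the names `VARIABLE_TYPES`, `SPATIAL_MAPPING`, `check_mapping`, and `scatter_position` are hypothetical and not part of the FilmFinder.

```python
# Scale type of each variable, following the codebook in Table 26.5.
VARIABLE_TYPES = {"Year": "Qt", "Popularity": "Q", "Rating": "O", "FilmType": "N"}

# Spatial mapping for the scatterplot: variable -> (axis scale type, axis name).
SPATIAL_MAPPING = {
    "Year": ("Q", "x"),        # Year -> Qx
    "Popularity": ("Q", "y"),  # Popularity -> Qy
}

def check_mapping(mapping, variable_types):
    """Reject a mapping whose axis type differs from the variable's base scale type."""
    for variable, (axis_type, _axis) in mapping.items():
        base_type = variable_types[variable][0]   # "Qt" counts as "Q"
        if base_type != axis_type:
            raise ValueError(f"{variable} is type {base_type}, not {axis_type}")

def scatter_position(case, mapping):
    """Apply the mapping to one case, giving its coordinate on each axis."""
    return {axis: case[variable] for variable, (_type, axis) in mapping.items()}

check_mapping(SPATIAL_MAPPING, VARIABLE_TYPES)
print(scatter_position({"Year": 1959, "Popularity": 8.2}, SPATIAL_MAPPING))
```

A stricter check than the one shown is a design choice: a quantitative variable can legitimately be placed on an ordinal axis after classing, so a fuller version might allow downgrades from Q to O to N.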

Marks. Marks are the visible things that occur in space.
There are four elementary types of marks (Fig. 26.11):

1. P = Points (0D),

2. L = Lines (1D),

3. A = Areas (2D), and

4. V = Volumes (3D).

Area marks include surfaces in three dimensions, as well as
2D-bounded regions.

Unlike their mathematical counterparts, point and line marks
actually take up space (otherwise, they would be invisible) and
may have properties such as shape.
[Figure 26.11 panels: Points, Lines, Areas, Volumes.]

FIGURE 26.11. Types of marks.

[Figure 26.12 labels: columns Spatial and Object; rows Extent and Differential; properties (Position), Size, Gray Scale, Orientation, Color, Texture, Shape.]

Connection and enclosure. Point marks and line marks
can be used to signify other sorts of topological structure: graphs
and trees. These allow showing relations among objects without
the geometrical constraints implicit in mapping variables
onto spatial axes. Instead, we draw explicit lines. Hierarchies and
other relationships can also be encoded using enclosure.
Enclosing lines can be drawn around subsets of items. Enclosure
can be used for trees, contour maps, and Venn Diagrams.


FIGURE 26.12. Retinal properties (Card et al., 1999). The six
retinal properties can be grouped by whether they form a scale
with a natural zero point (extent) and whether they deal with
spatial distance or orientation (spatial).


Retinal properties. Other graphical properties were
called retinal properties by Bertin (1967/1983), because the
retina of the eye is sensitive to them independent of position.
For example, the FilmFinder in Fig. 26.1 uses color to encode
information in the scatterplot:

FilmID(FilmType) → P(Color)

This notation says that the FilmType attribute for any FilmID
case is visually mapped onto the color of a point.

Figure 26.12 shows Bertin’s six “retinal variables” separated
into spatial properties and object properties according to the
area of the brain in which they are believed to be processed
(Kosslyn, 1994). They are sorted according to whether the property
is good for expressing the extent of a scale (has a natural zero
point), or whether its principal use is for differentiating marks
(Bertin, 1977/1981). Spatial position, discussed earlier as basic
visual substrate, is shown in the position it would occupy in this
classification.

Other graphical properties have also been proposed for
encoding information. MacEachren (1995) has proposed (a)
crispness (the inverse of the amount of distance used to blend two
areas or a line into an area), (b) resolution (grain with which
raster or vector data will be displayed), (c) transparency, and (d)
arrangement (e.g., different ways of configuring dots). He further
proposed dividing color into (a) value (essentially, the gray level
of Fig. 26.12), (b) hue, and (c) saturation. Graphical properties
from the perception literature that can support preattentive
processing have been suggested as candidates for coding variables,
such as curvature, lighting direction, or direction of motion (see
Healey, Booth, and Enns, 1995). All of these suggestions require
further research.

Temporal encoding. Visual Structures can also temporally
encode information; human perception is very sensitive
to changes in mark position and the mark’s retinal properties.
We need to distinguish between temporal data variables to be
visualized,

Q_t → some visual representation,

and animation, that is, mapping a variable into time,

some variable → Time.

Time as animation could encode any type of data (whether it
would be an effective encoding is another matter). Time as
animation, of course, can be used to visualize time as data:

Q_t → Time.

This is natural, but not always the most effective encoding.
Mapping time data into space allows comparisons between two points
in time. For example, if we map time and a function of time into
space (e.g., time and accumulated rainfall),

Q_t → Q_x [make time be the X-axis],
f(Q_t) → Q_y [make accumulated rainfall be the Y-axis],

then we can directly experience rates as visual linear slope, and
we can experience changes in rates as curves. This encoding of
time into space for display allows us to make much more precise
judgments about rates than would be possible from encoding
time as time. Another use of time as animation is similar to
the unstructured axes of space. Animation can be used to
enhance the ability of the user to keep track of changes of view or
visualization. If the user clicks on some structure, causing it to
enlarge and other structures to become smaller, animation can
effectively convey the change and the identity of objects across
the change, whereas simply viewing the two end states is
confusing. Another use is to enhance a visual effect. Rotating a
complicated object, for example, will induce 3D effects (hence, allow
better reading of some visual mappings).
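
As a rough illustration of why mapping time into space exposes rates, the sketch below computes the slope of each segment of a time-versus-accumulated-rainfall line, which is the quantity the eye reads off the plot. The rainfall readings and the function name `slopes` are assumptions made up for this example.

```python
def slopes(times, values):
    """Return the slope of each segment of the polyline (times[i], values[i])."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]

# Hypothetical data: hours since start, accumulated rainfall in millimetres.
hours = [0, 1, 2, 3, 4]
rain_mm = [0.0, 2.0, 4.0, 9.0, 9.0]

# Rate of rainfall per hour: seen as the steepness of each line segment.
rates = slopes(hours, rain_mm)

# Change in rate between segments: seen as a bend in the line.
rate_changes = [later - earlier for earlier, later in zip(rates, rates[1:])]
```

In an animation, the viewer would have to remember earlier frames to judge these rates; in the spatial plot, both lists correspond to features visible at once.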

Expressiveness and Effectiveness

Visual  mappings  transform  Data  Tables  into  Visual  Structure  and 
then  into  a  visual  image.  This  image  is  not  just  an  arbitrary  im¬ 
age.  It  is  an  image  that  has  a  particular  meaning  it  must  express. 
That  meaning  is  the  data  relation  of  which  it  is  the  visual  trans¬ 
formation.  We  can  think  of  the  image  as  a  sentence  in  a  visual 
language  (Mackinlay,  1986b)  that  expresses  the  relations  in  the 
Data  Table.  To  be  a  good  information  visualization,  the  map¬ 
pings  must  satisfy  some  constraints.  The  first  constraint  is  that 
the  mapping  must  be  expressive.  A  visualization  is  said  to  be 
expressive  if  and  only  if  it  encodes  all  the  data  relations  intended 
and  no  other  data  relations.  The  first  part  of  expressiveness 
turns out to be easier than the second. Suppose we plot
FilmType against Year using the data-to-visual mapping in Fig. 26.13.
The problem with this mapping is that the nominal film type
data are expressed by a quantitative axis. That is, we have tried
to map

FilmType(N) → Position(Q).

In so doing, we have visually expressed all the data relations,
but the visualization also implies relationships that do not
exist. For example, the 1959 version of Ben Hur does not have
a film type that is five times greater than the 1926 version of
Ben Hur, as implied in the figure. Wisely, the authors of the
FilmFinder chose the mapping

FilmType(N) → Color(N).

Of  course,  there  are  circumstances  in  which  color  could  be 
read  as  ordinal,  or  even  possibly  quantitative,  but  the  miscella¬ 
neous  order  of  the  buttons  in  Fig.  26.1  discourages  such  an  in¬ 
terpretation  and  the  relatively  low  effectiveness  of  color  for  this 
purpose  in  Table  26.7  also  discourages  this  interpretation. 
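
The expressiveness criterion can be sketched as a comparison of scale types. The code below is a deliberately simplified reading, assuming that each data variable and each visual encoding is tagged with one scale type (N, O, or Q) and that those types are ordered by how much structure they carry. The names `SCALE_RANK` and `check_mapping` are hypothetical.

```python
# Scale types ordered by structure: Nominal < Ordinal < Quantitative.
SCALE_RANK = {"N": 0, "O": 1, "Q": 2}

def check_mapping(data_type, visual_type):
    """Classify a data-to-visual mapping under the simplified criterion."""
    data_rank = SCALE_RANK[data_type]
    visual_rank = SCALE_RANK[visual_type]
    if visual_rank > data_rank:
        # The picture suggests order or magnitude the data do not have.
        return "implies relations not in the data"
    if visual_rank < data_rank:
        # The picture cannot show order or magnitude the data do have.
        return "loses relations in the data"
    return "expressive"

# FilmType(N) -> Position(Q): the mapping criticized in the text.
film_type_on_axis = check_mapping("N", "Q")
# FilmType(N) -> Color(N): the mapping the FilmFinder uses.
film_type_as_color = check_mapping("N", "N")
```

This ignores the caveat in the text that the same property (color, for instance) may be read as nominal or ordinal depending on how it is presented.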

FIGURE 26.13. Mapping from data to visual form that violates
expressiveness criterion.

Table 26.6 shows the mappings chosen by authors of the
FilmFinder. The table shows the Data Table’s metadata and data
and how they are mapped onto the Visual Structure. Note that
the  nominal  data  of  the  PG  ratings  is  mapped  onto  a  nominal 
visualization  technique  (colors).  Note  also,  that  names  of  direc¬ 
tors  and  stars  (nominal  variables)  are  raised  to  ordinal  variables 
(through  alphabetization),  and  then  mapped  onto  an  ordinal 
axis.  This  is,  of  course,  a  common  way  to  handle  searching 
among  a  large  number  of  nominal  items. 

Some  properties  are  more  effective  than  others  for  encoding 
information.  Position  is  by  far  the  most  effective  all-around  rep¬ 
resentation.  Many  properties  are  more  effective  for  some  types 
of data than for others. Table 26.7 gives an approximate
evaluation for the relative effectiveness of some encoding techniques
based on MacEachren (1995). We note that spatial position is
effective for all scale types of data. Shape, on the other hand, is
only  effective  for  nominal  data.  Gray  scale  is  most  effective  for 
ordinal  data.  Such  a  chart  can  suggest  representations  to  a  vi¬ 
sualization  designer. 
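
One way a designer might use such a chart is as a lookup table. The sketch below encodes one reading of Table 26.7, assuming its filled, half-filled, and open symbols mean high, medium, and low effectiveness (scored 2, 1, 0 here). The scores, the dictionary layout, and the function name `candidates` are assumptions for illustration, not values given numerically in the source.

```python
# Effectiveness per data scale type (Q, O, N); 2 = high, 1 = medium, 0 = low.
# Scores are an interpretation of the symbols in Table 26.7.
EFFECTIVENESS = {
    "position":    {"Q": 2, "O": 2, "N": 2},
    "size":        {"Q": 2, "O": 2, "N": 2},
    "gray scale":  {"Q": 1, "O": 2, "N": 0},
    "orientation": {"Q": 1, "O": 1, "N": 2},
    "color":       {"Q": 1, "O": 1, "N": 2},
    "texture":     {"Q": 1, "O": 1, "N": 2},
    "shape":       {"Q": 0, "O": 0, "N": 2},
}

def candidates(data_type, minimum=2):
    """List encodings whose score for this data type is at least `minimum`."""
    return [
        name
        for name, scores in EFFECTIVENESS.items()
        if scores[data_type] >= minimum
    ]

quantitative_choices = candidates("Q")
nominal_choices = candidates("N")
```

Lowering `minimum` to 1 widens the list to encodings that are usable but weaker for the given scale type.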


Taxonomy  of  Information  Visualizations 

We  have  shown  that  the  properties  of  data  and  visual  repre¬ 
sentation  generally  constrain  the  set  of  mappings  that  form  the 
basis  for  information  visualizations.  Taken  together,  these  con¬ 
straints  form  the  basis  of  a  taxonomy  of  information  visualiza¬ 
tions.  Such  a  taxonomy  is  given  in  Table  26.8.  Visualizations  are 
grouped  into  four  categories.  First  are  Simple  Visual  Struc¬ 
tures,  the  static  mapping  of  data  onto  multiple  spatial  dimen¬ 
sions,  trees,  or  networks  plus  retinal  variables,  depicted  in  Fig. 
26.10.  Here  it  is  worth  distinguishing  two  cases.  There  is  a  per¬ 
ceptual  barrier  at  three  (or,  in  special  cases,  four)  variables,  a 
limit  of  the  amount  of  data  that  can  be  perceived  as  an  imme¬ 
diate  whole.  Bertin  (1977,  1981)  called  this  elementary  unit  of 
visual  data  perception  the  “image”.  Although  this  limit  has  not 
been  definitively  established  in  information  visualization  by  em¬ 
pirical  research,  there  must  be  a  limit  somewhere  or  else  peo¬ 
ple  could  simultaneously  comprehend  a  thousand  variables. 
We  therefore  divide  visualizations  into  those  that  can  be  com¬ 
prehended  in  an  elementary  perceptual  grasp  (three,  or  in  spe¬ 
cial  cases,  four  variables) — let  us  call  these  direct  reading 
visualizations — and  those  more  complex  than  that  barrier — 
which  we  call  articulated  reading  visualizations,  in  which 
multiple  actions  are  required. 

Beyond  the  perceptual  barrier,  direct  composition  of  data  re¬ 
lationships  in  terms  of  1,  2,  or  3  spatial  dimensions  plus  re¬ 
maining  retinal  variables  is  still  possible,  but  rapidly  diminishes 
in effectiveness. In fact, the main problem of information
visualization as a discipline can be seen as devising techniques for
accelerating the comprehension of these more complex n-variable
data relations. Several classes of techniques for n-variable
visualization, which we call Composed Visual Structures, are based
on  composing  Simple  Visual  Structures  together  by  reusing 
their  spatial  axes.  A  third  class  of  Visual  Structures — Interac¬ 
tive  Visual  Structures — comes  from  using  the  rapid  interaction 
capabilities  of  the  computer.  These  visualizations  invoke  the 
parameter-controlling arrows of Fig. 26.10. Finally, a fourth class
of visualizations, Attention-Reactive Visual Structures, comes
from interactive displays where the system reacts to user actions
by changing the display, even anticipating new displays, to lower
the cost of information access and sensemaking to the user. To
summarize,

I. Simple Visual Structures
   Direct Reading
   Articulated Reading

II. Composed Visual Structures
   Single-Axis Composition
   Double-Axis Composition
   Recursive Composition

III. Interactive Visual Structure

IV. Attention-Reactive Visual Structure


TABLE 26.6. Meta-Data and Mappings of Data onto Visual Structure in the FilmFinder

Data columns: Variable, Type, Range, Case_i, Case_j, Case_k.
Visual Form columns: Type, Visual Structure, Control, Transformation Affected.

Variable | Type | Range | Case_i | Case_j | Case_k | → | Type | Visual Structure | Control | Transformation Affected
FilmID | N | All IDs | 230 | 105 | 540 | → | N | Points | Button | All (details)
Title | N | All titles | Goldfinger | Ben Hur | Ben Hur | → sort | O | - | Alphaslider | Select cases
Director | N | All directors | Hamilton | Wyler | Niblo | → sort | O | - | Alphaslider | Select cases
Actor | N | All actors | Connery | Heston | Novarro | → sort | O | - | Alphaslider | Select cases
Actress | N | All actresses | Blackman | Harareet | McAvoy | → sort | O | - | Alphaslider | Select cases
Year | Q | [1926, 1989] | 1964 | 1959 | 1926 | → | Q | X-axis | Axis | Clip range
Length | Q | [0, 450] | 112 | 212 | 133 | → | Q | - | Two-sided slider | Clip range
Popularity | Q | [1, 9] | 7.7 | 8.2 | 7.4 | → | Q | Y-axis | Axis | Clip range
Rating | O | {G, PG, PG-13, R} | PG | G | G | → | O | - | Radio buttons | Select cases
Film Type | N | {Drama, Mystery, Comedy, Music, Action, War, SF, Western, Horror} | Action | Action | Drama | → | N | Color | Radio buttons | Select cases

Source: (Card et al., 1999).


TABLE 26.7. Relative Effectiveness of Position and Retinal Encodings

                Spatial       Q  O  N     Object       Q  O  N
Extent          (Position)    ●  ●  ●     Gray Scale   ◐  ●  ○
                Size          ●  ●  ●
Differential    Orientation   ◐  ◐  ●     Color        ◐  ◐  ●
                                          Texture      ◐  ◐  ●
                                          Shape        ○  ○  ●

Source: (Card et al., 1999).

These classes of techniques may be combined to produce
visualizations that are more complex. To help us keep track of the
variable mapping into visual structure, we will use a simple
shorthand notation for listing the elements of the Visual Structure
that the Data Table has mapped into. We will write, for example,
[XYR^2] to note that variables map onto the X-axis, the Y-axis, and
two retinal encodings. [OX] will indicate that the variables map
onto one spatial axis used to arrange the objects (that is, the
cases), while another was used to encode the objects’ values.
Examples of this notation appear in Table 26.8 and Fig. 26.21.


SIMPLE  VISUAL  STRUCTURES 


The  design  of  information  visualizations  begins  with  mappings 
from  variables  of  the  Data  Table  into  the  Visual  Structure.  The 
basic  strategy  for  the  visualization  designer  could  be  described 
as  follows: 

1.  Determine  which  variables  of  the  Analytic  Abstraction  to 
map  into  spatial  position  in  the  Visual  Structure. 

2.  Combine  these  mappings  to  increase  dimensionality  (e.g., 
by  folding). 

3. Use retinal variables as an overlay to add more dimensions.

4.  Add  controls  for  interaction. 

5.  Consider  attention-reactive  features  to  expand  space  and 
manage  attention. 

We  start  by  considering  some  of  the  ways  in  which  variables  can 
be  mapped  into  space. 

TABLE 26.8. Taxonomy of Information Visualization Techniques

I. Simple Visual Structures
   Direct Reading
      1-Variable [X]
         Lists
         1D object charts
         1D scatterplots
         Pie charts
         Folded Dimensions
         Distributions
         Box Plots
      2-Variable [XY]
         2D object charts
         2D scatterplots
      3-Variable
         [XYR]
            Retinal scatterplot
            Kohonen diagrams
            Retinal topographies
         [(XY)Z]
            Information landscapes
            Information surfaces
         [XYZ]
            3D scatterplots
      4-Variable
         [XYZR]
            3D retinal scatterplots
            3D topographies
   --- Barrier of Perception ---
   Articulated Reading
      n-Variable
         [XYR^(n-2)]
            2D Retinal scatterplots
         [XYZR^(n-3)]
            3D Retinal scatterplots
   Trees
      Node and link trees
      Enclosure trees
         TreeMaps
      Cone trees
   Networks
   Time

II. Composed Visual Structures
   Single-Axis Composition [XY^n]
      Permutation matrices
      Parallel coordinates
   Double-Axis Composition [XY]
      Graphs
   Recursive Composition
      2D in 2D [(XY)^XY]
         Scatterplot matrices
         Prosection matrices
         Hierarchical axes
      Marks in 2D [(XY)^R]
         Stick figures
         Color icons
         Shape coding
         Keim spirals
      3D in 3D [(XYZ)^XYZ]
         Worlds within worlds

III. Interactive Visual Structures
   Dynamic queries
   Magic lens
   Overview+detail
   Linking and brushing
   Extraction & comparison
   Attribute explorer

IV. Focus+Context Attention-Reactive Visual Abstraction
   Data-based Methods
      Filtering
      Selective aggregation
   View-based methods
      Micro-macro readings
      Highlighting
      Visual transfer functions
      Perspective distortion
      Alternate geometries


1-Variable


One-variable visual displays may actually use more than one
visual dimension. This is because the data variable or attribute is
displayed against some set of objects using some mark and
because the mark itself takes space. Or, more subtly, it may be
because one of the dimensions is used for arranging the objects
and another for encoding the variable via position. A simple
example would be when the data are just visually mapped into a
simple text list as in Fig. 26.14(a). The objects form a sequence
on the Y-dimension, and the width of the marks (the text
descriptor) takes space in the X-dimension. By contrast, a
one-dimensional scattergraph (Fig. 26.14[b]) does not use a
dimension for the objects. Here, the Y-axis is used to display the
attribute variable (suppose these are distances from home of
gas stations); the objects are encoded in the mark (which takes
a little bit of the X-dimension).


More generally, many single-variable visualizations are in the
form y = f(o), where y is a variable attribute and o is the object.
Figure 26.14(c) is of this form and uses the Y-axis to encode the
variable and the X-axis for the objects. Note that if the objects
are, as usual, nominal, then they are reorderable: sorting the
objects on the variable produces easily perceivable visual patterns.
For convenience, we have used rectangular coordinates, but any
other orthogonal coordinates could be used as the basis of
decomposing space. Figure 26.14(d) uses θ from polar coordinates
to encode, say, percentage voting for different presidential
candidates. In Fig. 26.14(e), a transformation on the data side has
transformed variable o into a variable representing the
distribution, then mapped that onto points on the Y-axis. In Fig.
26.14(f), another transformation on the data side has mapped
this distribution into 2nd quartiles, 3rd quartiles, and outlier
points, which is then mapped on the visual side into a box plot
on the Y-axis. Simple as they are, these techniques can be very
useful, especially in combination with other techniques.
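
The two data-side transformations mentioned for panels (e) and (f) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the distances are made-up values, the bin width is arbitrary, and outliers are flagged with the common 1.5 × interquartile-range convention, which the source does not specify. The function names are hypothetical.

```python
from collections import Counter
from statistics import quantiles

def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def box_plot_summary(values):
    """Reduce a distribution to quartiles plus outlier points."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # assumed outlier convention
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical data: distances from home of ten gas stations, in km.
distances_km = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

bins = histogram(distances_km, bin_width=5)
summary = box_plot_summary(distances_km)
```

Either result would then be mapped onto the Y-axis on the visual side: bin counts as a histogram, the summary as a box with separately drawn outlier points.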

FIGURE 26.14. 1-variable visual abstractions. [Panels: (a) List
(Goldfinger; Ben Hur; From Russia with Love; Murder on the Orient
Express; Never Say Never); (b1) 1D scattergraph; (b2) 1D object
chart; (c) 1D object chart; (d) Pie chart; (e) Histogram of 1D
distribution; (f) Box plot of 1D distribution.]


One special, but common, problem is how to visualize very
large dimensions. This problem occurs for single-variable
visualizations, but may also occur for one dimension of a
multi-variable visualization. Figure 26.15 shows several techniques for
handling the problem. In Fig. 26.15(a) (Freeman & Fertig, 1995),
the  visual  dimension  is  laid  out  in  perspective.  Even  though 
each  object  may  take  only  one  or  a  few  pixels  on  the  axis,  the 
objects  are  actually  fairly  large  and  selectable  in  the  diagram.  In 
Fig. 26.15(b) (Eick, Steffen, & Sumner, 1992), the objects
(representing lines of code) are laid out on a folded Y-axis. When the
Y-axis  reaches  the  bottom  of  the  page,  it  continues  offset  at  the 
top.  In  Fig.  26.15(c)  (Keim  &  Kriegel,  1994),  the  axis  is  wrapped 
in  a  square  spiral.  Each  object  is  a  single  pixel,  and  its  value  is 
coded  as  the  retinal  variable  color  hue.  The  objects  have  been 
sorted  on  another  variable;  hence,  the  rings  show  the  correla¬ 
tion  of  this  attribute  with  that  of  the  sorting  attribute. 
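
The folded axis and the square spiral both amount to a function from an object's index along a long axis to a cell on the display. The sketch below shows one plausible form of each. The function names, the choice of spiral direction, and starting at the center are assumptions, not details taken from the cited systems.

```python
def folded_position(index, rows_per_column):
    """Folded axis: run down a column, then continue at the top of the next."""
    column = index // rows_per_column
    row = index % rows_per_column
    return column, row

def square_spiral(count):
    """Yield `count` grid cells spiraling outward from (0, 0)."""
    x = y = 0
    dx, dy = 1, 0      # current direction of travel
    leg_length = 1     # cells to emit before the next turn
    produced = 0
    while True:
        for _ in range(2):             # two legs share each length
            for _ in range(leg_length):
                if produced == count:
                    return
                yield x, y
                produced += 1
                x, y = x + dx, y + dy
            dx, dy = -dy, dx           # turn 90 degrees
        leg_length += 1

# Object 250 on an axis folded every 100 rows lands in column 2, row 50.
folded = folded_position(250, rows_per_column=100)

# The first nine objects fill the 3 x 3 block around the center.
cells = list(square_spiral(9))
```

In the spiral, objects with nearby indices stay close on screen, which is why sorting on a second variable makes correlations show up as rings.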

One-variable visualizations are also good parts of controls.
Controls, in the form of sliders, also consume considerable space
on the display (for example, the controls in Fig. 26.1) that could
be used for additional information communication. Figure
26.15(d) shows a slider on whose surface is a distribution
representation of the number of objects for each value of the
input variable, thereby communicating information about the
slider’s sensitivity in different data ranges. The slider on the left
of Fig. 26.15(b) has a one-variable visualization that serves as a
legend for the main visualization: it associates color hues with
dates and allows the selection of date ranges.


2-Variables 

As  we  increase  the  number  of  variables,  it  is  apparent  that  their 
mappings  form  a  combinatorial  design  space.  Figure  26.16 
schematically  plots  the  structure  of  this  space,  leaving  out  the 
use  of  multiple  lower  variable  diagrams  to  plot  higher  variable 
combinations.  Two-variable  visualizations  can  be  thought  of 
as  a  composition  of  two  elementary  axes  (Bertin,  1977,  1981; 
Mackinlay, 1986b), which use a single mark to encode the
position on both those axes. Mackinlay called this mark
composition, and it results in a 2D scattergraph (Fig. 26.16[g]). Note that
instead  of  mapping  onto  two  positional  visual  encodings,  one 
positional  axis  could  be  used  for  the  objects,  and  the  data  vari¬ 
ables  could  be  mapped  onto  a  position  encoding  and  a  retinal 
encoding  (size),  as  in  Fig.  26.16(f). 


3-Variables  and  Information  Landscapes 

By the time we get to three data variables, a visualization can be
produced in several ways. We can use three separate visual
dimensions to encode the three data variables in a 3D
scattergraph (Fig. 26.16[j]). We could also use two spatial dimensions
and one retinal variable in a 2D retinal scattergraph (Fig.
26.16[k]). Or we could use one spatial dimension as an object
dimension, one as a data attribute dimension, and two
retinal encodings for the other variables, as in an object chart such
as in Fig. 26.16(i). Because Fig. 26.16(i) uses multiple retinal
encodings, however, it may not be as effective as other techniques.
Notice that because they all encode three data variables, we
have classified 2D and 3D displays together. In fact, one popular
3-variable information visualization that lies between 2D and 3D
is the information landscape (Fig. 26.16[m]). This is essentially
a  2D  scattergraph  with  one  data  variable  extruded  into  the  third 
spatial  dimension.  Its  essence  is  that  two  of  the  spatial  dimen¬ 
sions  are  more  tightly  coupled  and  often  relate  to  a  2D  visual¬ 
ization.  For  example,  the  two  dimensions  might  form  a  map 
with  the  bars  showing  the  GDP  of  each  region. 

FIGURE 26.15. Uses of 1-variable visual abstractions. [Panels: (a)
Off-axis 1-variable visual abstraction: LifeLines (Freeman & Fertig,
1995); (b) Folded long 1-variable visual abstraction: SeeSoft (Eick,
Steffen, & Sumner, 1992); (c) Spiraled long 1-variable visual
abstraction: VisDB (Keim & Kriegel, 1994); (d) 1-variable visual
abstraction used as a control (Eick, 1993).]

Another special type of 3-variable information visualization is
a 2D information topography. In an information topography,
space is partly defined by reference to external structure. For
example, the topography of Fig. 26.17(a) is a map of San Francisco,
requiring two spatial variables. The size of blue dots indexes
the number of domain names registered to San Francisco street
addresses. Looking at the patterns in the visualization shows
that Internet addresses have especially concentrated in the
Mission and South of Mission districts. Figure 26.17(a) uses a
topography derived from real geographical space. Various techniques,
such as multidimensional scaling, factor analysis, or
connectionist self-organizing algorithms, can create abstract spaces
based on the similarities among collections of documents or
other objects. These abstract similarity spaces can function like
a topography. An example can be seen in Fig. 26.17(b), where
the pages in a website are depicted as regions in a similarity
space. To create this diagram¹, a web crawler crawls the site and
indexes all the words and pages on the site. Each page is then
turned into a document vector to represent the semantic
content of that page. The regions are created using a neural
network learning algorithm (see Lin, Soergel, & Marchionini, 1991).
This algorithm organizes the set of web pages into regions. A
visualization algorithm then draws boundaries around the
regions, colors them, and names them. The result, called a Kohonen
diagram after its original inventor, is a type of retinal similarity
topography.
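
To make the "document vector" step concrete, the sketch below turns pages into word-count vectors and compares them with cosine similarity. This only illustrates the similarity input to such a layout. It does not implement the neural network algorithm cited above, and the page texts, tokenization, and function names are assumptions.

```python
from collections import Counter
from math import sqrt

def document_vector(text):
    """Represent a page as word counts (a very crude semantic vector)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical page contents.
page_trees = document_vector("tree layout algorithms")
page_graphs = document_vector("graph layout algorithms")
page_ozone = document_vector("ozone map data")

similar = cosine_similarity(page_trees, page_graphs)
unrelated = cosine_similarity(page_trees, page_ozone)
```

A self-organizing method would place pages with high similarity in the same or neighboring regions of the 2D similarity space.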

Information  landscapes  can  also  use  marks  that  are  surfaces. 
In  Fig.  26.18(a),  topics  are  clustered  on  a  similarity  surface,  and 
the  strength  of  each  topic  is  indicated  by  a  3D  contour.  A  more 
extreme  case  is  Fig.  26.18(b),  where  an  information  landscape  is 
established  in  spherical  coordinates,  and  the  amount  of  ozone 
is  plotted  as  a  semitransparent  overlay  on  the  p-axis. 

n-Variables 

Beyond three variables, direct extensions of the methods we
have discussed become less effective. It is possible, of course,
to make plots using two spatial variables and n-2 retinal
variables, and the possibilities for four variables are shown in Fig.
26.16. These diagrams can be understood, but at the cost of
progressively more effort as the number of variables increases. It
would be very difficult to understand an [XYR^20] retinal
scattergraph, for example.

Trees 

An  interesting  alternative  to  showing  variable  values  by  spatial 
positioning  is  to  use  explicitly  drawn  linkages  of  some  kind. 
Trees  are  the  simplest  form  of  these.  Trees  map  cases  into  sub¬ 
cases.  One  of  the  data  variables  in  a  Data  Table  (for  example,  the 
variable  ReportsTo  in  an  organization  chart)  is  used  to  define 
the  tree.  There  are  two  basic  methods  for  visualizing  a  tree:  (a) 
Connection  and  (b)  Enclosures. 

FIGURE 26.17. Retinal information topographies. [Panels: (a) XYR
retinal topography; (b) XYR retinal similarity topography.]

¹This figure is produced by a program called SiteMap by Xia Lin and associates. See http://faculty.cis.drexel.edu/sitemap/index.html.

FIGURE 26.18. 3D information surface topographies. [Panels: (a) News
stories based on ThemeScapes (Wise et al., 1995). Courtesy
NewsMaps.com. (b) Ozone layer surrounding earth. L. Treinish.
Courtesy IBM.]

Connection. Connection uses lines to connect marks
signifying the nodes of the tree. Logically, a tree could be drawn
merely by drawing lines between objects randomly
positioned on the plane, but such a tree would be visually
unreadable. Positioning in space is important. Figure 26.20(a) is a tree
from Charles Darwin’s notebook (Robin, 1992) drawn to help
him work out the theory of evolution. Lines proceed from
ancestor species to new species. Note that even in this informal
setting intended for personal use, the tree uses space
systematically (and opportunistically). There are no crossed lines.
A common way of laying out trees is to have the depth in the
tree map onto one ordinal axis as in Fig. 26.20(b), while the
other axis is nominal and used to separate nodes. Of course,
trees could also be mapped into other coordinate systems: for
example, there can be circular trees in which the r-axis
represents depth and the θ-axis is used to separate nodes as in the
representation of the evolution of species in Fig. 26.20(c).² It is
because trees have no cycles that one of the spatial dimensions
can be used to encode tree depth. This partial correlation of
tree structure and space makes trees relatively easy to lay out
and interpret, compared to generalized networks. Hierarchical
displays are important not only because many interesting
collections of information, such as organization charts or
taxonomies, are hierarchical data, but also because important
collections of information, such as websites, are approximately
hierarchical. Whereas practical methods exist for displaying
trees up to several thousand nodes, no good methods exist for
displaying general graphs of this size. If a visualization problem
involves the displaying of network data, a practical design
heuristic is to see whether the data might not be forced into a
display as a modified tree, such as a tree with a few non-tree
links. A significant disadvantage of trees is that as they get large,
they acquire an extreme aspect ratio, because the nodes expand
exponentially as a function of depth. Consequently, any
sufficiently large tree (say, >1000 nodes) resembles a straight line.
Circular trees such as Fig. 26.20(c) are one way of trying to buy
more space to mitigate this problem. Another disadvantage of
trees is the significant empty space needed between nodes to make
their organization easily readable. Various tricks can be used to
wrap parts of the tree into this empty space, but at the expense
of the tree’s virtues of readability.

²This figure is from David Hillis, University of Texas.
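
The common connection layout described above, with depth on one ordinal axis and the other axis used only to separate nodes, can be sketched in a few lines. This is a minimal illustration, not a production tree-drawing algorithm: the `children` dictionary, the organization-chart names, and the rule of centering a parent over its children are assumptions.

```python
def layout_tree(children, root):
    """Return {node: (x, depth)}; depth is the ordinal axis, x separates nodes."""
    positions = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        kids = children.get(node, [])
        if not kids:
            # Leaves take the next free slot on the separating axis.
            x = float(next_leaf_x)
            next_leaf_x += 1
        else:
            # A parent is centered over its first and last child.
            child_xs = [place(kid, depth + 1) for kid in kids]
            x = (child_xs[0] + child_xs[-1]) / 2
        positions[node] = (x, depth)
        return x

    place(root, 0)
    return positions

# Hypothetical organization chart: manager -> direct reports.
org_chart = {"CEO": ["CTO", "CFO"], "CTO": ["Dev1", "Dev2"]}
pos = layout_tree(org_chart, "CEO")
```

Because every leaf consumes a slot on the separating axis while depth grows slowly, the layout's width grows with the leaf count, which is the aspect-ratio problem noted in the text.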

Enclosure.  Enclosure  uses  lines  to  hierarchically  enclose 
nested  subsets  of  the  tree.  Figure  26.20(d)  is  an  enclosure  tree 
encoding  of  Darwin’s  tree  in  Fig.  26.20(a).  We  have  already  seen 
one  attempt  to  use  tree  enclosure,  TreeMaps  (Fig.  26.5). 
TreeMaps make use of all the space and stay within prescribed
space  boundaries,  but  they  do  not  represent  the  nonterminal 
nodes  of  the  tree  very  well  and  similar  leaves  can  have  wildly  dif¬ 
ferent  aspect  ratios.  Recent  variations  on  TreeMaps  found  ways 
to  “squarify”  nodes  (Shneiderman  &  Wattenberg,  2001),  miti¬ 
gating  this  problem. 
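
An enclosure layout of this kind can be sketched with the simple "slice-and-dice" scheme, in which each node's rectangle is divided among its children in proportion to their sizes, alternating the split direction at each level. This is offered as an assumption about the basic idea, not as the squarified variant cited above. The tuple-based tree format and the function names are hypothetical, and leaf sizes are assumed to be positive.

```python
def size_of(node):
    """Total size of a node: a leaf's own size or the sum of its children."""
    _name, payload = node
    if isinstance(payload, list):
        return sum(size_of(child) for child in payload)
    return payload

def slice_and_dice(node, x, y, w, h, split_x=True, out=None):
    """Return {name: (x, y, width, height)} for every node in the tree."""
    if out is None:
        out = {}
    name, payload = node
    out[name] = (x, y, w, h)
    if isinstance(payload, list):
        total = size_of(node)
        offset = 0.0                      # fraction of the parent already used
        for child in payload:
            share = size_of(child) / total
            if split_x:
                slice_and_dice(child, x + offset * w, y, share * w, h, False, out)
            else:
                slice_and_dice(child, x, y + offset * h, w, share * h, True, out)
            offset += share
    return out

# A node is (name, size) for a leaf or (name, [children]) otherwise.
tree = ("root", [("a", 2), ("b", [("c", 1), ("d", 1)])])
rects = slice_and_dice(tree, 0.0, 0.0, 4.0, 2.0)
```

In this small example, leaf "a" is a 2 × 2 square while "c" and "d" are 2 × 1 strips. On deeper or more uneven trees the strips become much thinner, which is the aspect-ratio problem that squarified layouts try to mitigate.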

Networks 

Networks  are  more  general  than  trees  and  may  contain  cycles. 
Networks  may  have  directional  links.  They  are  useful  for  de¬ 
scribing  communication  relationships  among  people,  traffic  in  a 
telephone  network,  and  the  organization  of  the  Internet.  Con¬ 
tainment  is  difficult  to  use  as  a  visual  encoding  for  network  re¬ 
lationships,  so  most  networks  are  laid  out  as  node  and  link  dia¬ 
grams.  Unfortunately,  straightforward  layouts  of  large  node  and 
link  diagrams  tend  to  resemble  a  large  wad  of  tangled  string. 

FIGURE 26.20. Network methods. (a) Telephone traffic after California
earthquake (Becker, Eick, & Wilks, 1995). (b) Thresholding. (c) Line
shortening. (d) Visualization to detect telephone fraud (Cox, Eick, &
Wills, 1997).

We can distinguish the same types of nodes and links in network
Visual Structures that we did for spatial axes: (a) Unstructured
(unlabeled), (b) Nominal (labeled), (c) Ordinal (labeled
with an ordinal quantity), or (d) Quantitative (weighted links).
Retinal properties, such as size or color, can be used to encode
information about links and nodes. As in the case of trees, spatial
positioning of the nodes is extremely important. Network
visualizations escape from the strong spatial constraints of simple
Visual Structures only to encounter another set of strong
spatial constraints: link crossing and routing. Networks
and trees are not so much an alternative to the direct
graphical mappings we have discussed so far as they are another
set of techniques that can be overlaid on these mappings. Small
node and link diagrams can be laid out opportunistically by
hand or by using graph drawing algorithms that have been developed
(Battista, Eades, Tamassia, & Tollis, 1994; Cruz & Tamassia,
1998; Tamassia, 1996) to minimize link crossings and to optimize
symmetry and other aesthetic principles.

For  very  large  node  and  link  diagrams,  additional  organizing 
principles  are  needed.  If  there  is  an  external  topographic  struc¬ 
ture,  it  is  sometimes  possible  to  use  the  spatial  variables  associ¬ 
ated  with  the  nodes.  Figure  26.20(a)  shows  a  network  based  on 
call  traffic  between  cities  in  the  United  States  (Becker,  Eick,  & 
Wilks,  1995).  The  geographical  location  of  the  cities  is  used  to 
lay out the nodes of the network. Another way to position
nodes is by associating nodes with positions in a similarity
space, such that the nodes that have the strongest linkages to each
other are closest together. There are several methods for computing
node nearness in this way. One is to use multidimensional
scaling (MDS) (Fairchild, Poltrock, & Furnas, 1988). Another
is to use a “spring” technique, in which each link is
associated with a Hooke’s Law spring weighted by strength of
association and the system of springs is solved to obtain node
positions. Eick and Wills (1993) have argued that the MDS technique
places too much emphasis on smaller links. They have derived
an alternative that gives clumpier (and hence, more visually
structured) clusters of nodes. If positioning of nodes
corresponds perfectly with linkage information, then the links
do not add more visual information. If positioning does not
correspond at all with linkage information, then the diagram is
random and obscure. In large graphs, node positions must have a
partially correlated relationship to linkage in order to allow the
emergence of visual structure. Note that this is what happens
in the telephone traffic diagram in Fig. 26.20(a). Cities are
positioned by geographical location. Communication might be
expected to be higher among closer cities, so the fact that
communication is heavy between coasts stands out.
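
The “spring” technique can be illustrated with a small relaxation loop. The following is only a sketch under stated assumptions, not the method of any of the cited systems: each link is a zero-rest-length Hooke's Law spring whose stiffness is the link strength, and an inverse-square repulsion between all node pairs (an addition of this sketch) keeps the layout from collapsing to a point. The function names, step size, and sample links are hypothetical.

```python
import math
import random


def spring_layout(n_nodes, links, iterations=2000, step=0.01, max_move=0.1, seed=0):
    """Place nodes in 2D by relaxing a system of springs.

    links: list of (i, j, strength). Each link pulls its endpoints together
    with force strength * distance (Hooke's law, zero rest length).
    The all-pairs inverse-square repulsion is an assumption of this sketch.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_nodes)]
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n_nodes)]
        # Repulsion between every pair of nodes.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = max(math.hypot(dx, dy), 1e-6)
                push = 1.0 / (d * d)
                force[i][0] -= push * dx / d
                force[i][1] -= push * dy / d
                force[j][0] += push * dx / d
                force[j][1] += push * dy / d
        # Spring attraction along each link, weighted by link strength.
        for i, j, strength in links:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            force[i][0] += strength * dx
            force[i][1] += strength * dy
            force[j][0] -= strength * dx
            force[j][1] -= strength * dy
        # Take a small step along the net force, capped for stability.
        for p, f in zip(pos, force):
            mx, my = step * f[0], step * f[1]
            size = math.hypot(mx, my)
            if size > max_move:
                mx, my = mx * max_move / size, my * max_move / size
            p[0] += mx
            p[1] += my
    return pos


def distance(pos, i, j):
    """Euclidean distance between two laid-out nodes."""
    return math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])


# Hypothetical chain of four nodes: two strong links joined by a weak one.
links = [(0, 1, 10.0), (1, 2, 1.0), (2, 3, 10.0)]
layout = spring_layout(4, links)
```

In this layout the strongly linked pairs (0, 1) and (2, 3) settle closer together than the weakly linked pair (1, 2), which is the partial correlation between position and linkage that the text describes.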

A  major  problem  in  a  network  such  as  Fig.  26.20(a)  is  that 
links  may  obscure  the  structure  of  the  graph.  One  solution  is 
to  route  the  links  so  that  they  do  not  obscure  each  other.  The 
links  could  even  be  drawn  outside  the  plane  in  the  third  di¬ 
mension;  however,  there  are  limits  to  the  effectiveness  of  this 
technique. Another solution is to use thresholding, as in Fig.
26.20(b). Only those links representing traffic greater than a certain
threshold are included; the others are elided, allowing us
to see the most important structure. Another technique is line
shortening, as in Fig. 26.20(c). Only the portion of the line near
the nodes is drawn. At the cost of giving up the precise linkage,
it is possible to read the density of linkages for the different
nodes. Figure 26.20(d) shows a technique used to find patterns in an
extremely large network. Telephone subscribers are represented
as nodes on a hexagonal array. Frequent pairs are located
near each other on the array. Suspicious patterns are visible
because of the sparseness of the network.
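
Thresholding and line shortening are both simple transformations of the link list. The sketch below assumes links are stored as (source, target, traffic) tuples and nodes as 2D points; the function names, the 20% stub fraction, and the traffic figures are hypothetical and chosen only for illustration.

```python
def threshold_links(links, min_traffic):
    """Keep only the links whose traffic exceeds the threshold."""
    return [(a, b, traffic) for a, b, traffic in links if traffic > min_traffic]


def shorten_link(p, q, fraction=0.2):
    """Return two short stubs, one near each endpoint, instead of the full line."""
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    stub_a = ((x1, y1), (x1 + fraction * dx, y1 + fraction * dy))
    stub_b = ((x2, y2), (x2 - fraction * dx, y2 - fraction * dy))
    return stub_a, stub_b


# Hypothetical call volumes between cities.
calls = [("NYC", "LA", 900), ("NYC", "Boston", 120), ("LA", "SF", 450), ("Boston", "SF", 30)]
strong = threshold_links(calls, 400)
stubs = shorten_link((0, 0), (10, 0))
```

Thresholding discards links entirely, whereas line shortening keeps every link but draws only its ends, so link density at each node stays readable while the exact endpoints pairing is lost.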

The  insightful  display  of  large  networks  is  difficult  enough 
that  many  information  visualization  techniques  depend  on  in¬ 
teractivity.  One  important  technique,  for  example,  is  node  ag¬ 
gregation.  Nodes  can  be  aggregated  to  reduce  the  number  of 
links  that  have  to  be  drawn  on  the  screen.  Which  nodes  are  ag¬ 
gregated  can  depend  on  the  portion  of  the  network  on  which 
the  user  is  drilling  down.  Similarly,  the  sets  of  nodes  can  be  in¬ 
teractively  restricted  (e.g.,  telephone  calls  greater  than  a  cer¬ 
tain  volume)  to  reduce  the  visualization  problem  to  one  within 
the  capability  of  current  techniques. 
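
Node aggregation can be sketched as a regrouping of the link list. This is a minimal illustration, assuming links are (source, target, weight) tuples and that a hypothetical `group_of` mapping says which nodes are currently collapsed into which aggregate; nodes the user is drilling into are simply left out of the mapping.

```python
from collections import defaultdict


def aggregate_nodes(links, group_of):
    """Merge nodes into aggregates and sum the weights of links between them.

    group_of maps a node to its aggregate; unmapped nodes stay as themselves.
    Links that fall entirely inside one aggregate are not drawn.
    """
    totals = defaultdict(float)
    for a, b, weight in links:
        ga, gb = group_of.get(a, a), group_of.get(b, b)
        if ga == gb:
            continue  # internal to a single aggregate
        totals[tuple(sorted((ga, gb)))] += weight
    return dict(totals)


# Hypothetical example: the user is drilling down on the West Coast,
# so only the East Coast cities are aggregated.
calls = [("NYC", "LA", 900), ("Boston", "SF", 30), ("NYC", "Boston", 120), ("LA", "SF", 450)]
merged = aggregate_nodes(calls, {"NYC": "East", "Boston": "East"})
```

Four links become three, and the aggregate "East" stands in for two nodes; on a large network the reduction in drawn links is correspondingly larger.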


COMPOSED  VISUAL  STRUCTURES 


So far, we have discussed simple mappings from data into spatial
position axes, connections and enclosures, and retinal variables.
These methods begin to run into a barrier around three
variables, as the spatial dimensions are used up and as multiple
of the less efficient retinal variables are needed. Most interesting
problems involve many variables. We shall therefore look at a
class  of  methods  that  reuse  precious  spatial  axes  to  encode  vari¬ 
ables.  This  is  done  by  composing  a  compound  Visual  Structure 
out  of  several  simple  Visual  Structures.  We  will  consider  five  sub¬ 
classes  of  such  composition:  (a)  mark  composition,  (b)  case 
composition,  (c)  single-axis  composition,  (d)  double-axis  com¬ 
position,  and  (e)  recursive  composition.  Schematically,  we  il¬ 
lustrate  these  possibilities  in  Fig.  26.21. 

Single-axis  composition.  In  single-axis  composition, 
multiple  variables  that  share  a  single  axis  are  aligned  using  that 
axis,  as  illustrated  in  Fig.  26.21(a).  An  example  of  single-axis 
composition  is  a  method  due  to  Bertin  called  permutation 
matrices  (Bertin,  1977/1981).  In  a  permutation  matrix  (Fig. 
26.l6[o],  for  example),  one  of  the  spatial  axes  is  used  to  repre¬ 
sent  the  cases  and  the  other  a  series  of  bar  charts  (or  rows  of  cir¬ 
cles  of  different  size  or  some  other  depiction  of  the  value  of 
each  variable)  to  represent  the  values.  In  addition,  bars  for  val¬ 
ues  below  average  may  be  given  a  different  color,  as  in  Fig.  26.7, 
in  order  to  enhance  the  visual  patterns.  The  order  of  the  objects 
and  the  order  of  the  variables  may  both  be  permuted  until  pat¬ 
terns  come  into  play.  Permutation  matrices  were  used  in  our  ho¬ 
tel  analysis  example.  They  give  up  direct  reading  of  the  data 
space  in  order  to  handle  a  larger  number  of  variables.  Of  course, 
as  the  number  of  variables  (or  objects)  increases,  manipulation 
of  the  matrices  becomes  more  time-consuming  and  visual  in¬ 
terpretation  more  complex.  Still,  permutation  matrices  or  their 
variants  are  one  of  the  most  practical  ways  of  representing 
multi-variable  data. 

FIGURE 26.21. Composition types.

FIGURE 26.22. Single-axis composition: parallel coordinates.

If we superimpose the bar charts of the permutation matrix
atop one another, and then replace the bar chart with a line linking
together the tops of the bars, we get another method for
handling multiple variables by single-axis composition: parallel
coordinates (Inselberg, 1997; Inselberg & Dimsdale, 1990),
as shown in Fig. 26.22. A problem is analyzed in parallel coordinates
by interactively restricting the objects displayed (the lines)
in order to look at cases with common characteristics. In Fig.
26.22, parallel coordinates are used to analyze the problem of
yield from a certain processor chip. X1 is chip yield, X2 is quality,
X3 through X12 are defects, and the rest of the variables are
physical parameters. The analysis, looking at those subsets of
data with high yield and noticing the distribution of lines on
the other parameters, was able to solve a significant problem in
chip processing.

Both  permutation  matrices  and  parallel  coordinates  allow 
analyses  in  multi-dimensional  space,  because  they  are  efficient 
in  the  use  (and  reuse)  of  spatial  position  and  the  plane.  Actually, 
they  also  derive  part  of  their  power  from  being  interactive.  In 
the  case  of  permutation  matrices,  interactivity  comes  in  re¬ 
ordering  the  matrices.  In  the  case  of  parallel  coordinates,  inter¬ 
activity  comes  in  selecting  subsets  of  cases  to  display. 

Double-axis composition. In double-axis composition,
two visual axes must be in correspondence, in which case the
cases are plotted on the same axes as a multivariable graph (Fig.
26.21[b]). Care must be taken that the variables are plotted on a
comparable scale. For this reason, the separate scales of the variables
are often transformed to a common proportion change
scale. An example would be change in price for various stocks.
The cases would be the years, and the variables would be the
different stocks.
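
A common proportion change scale can be obtained by dividing each series by its first value. The sketch below is one simple way to do this, assuming the first year is taken as the base; the stock names and prices are hypothetical.

```python
def proportional_change(series):
    """Express each value as its fractional change from the first value."""
    base = series[0]
    return [value / base - 1.0 for value in series]


# Hypothetical yearly prices for two stocks with very different price levels.
prices = {
    "StockA": [20.0, 22.0, 25.0],
    "StockB": [400.0, 380.0, 500.0],
}
changes = {name: proportional_change(values) for name, values in prices.items()}
```

After the transformation both stocks end at +25% and can share one pair of axes, even though their raw prices differ by a factor of twenty.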

Mark  composition  and  case  composition.  Composi¬ 
tion  can  also  fuse  diagrams.  We  discussed  that  each  dimension 
of  visual  space  can  be  said  to  have  properties  as  summarized  in 
Table 26.4. The visual space of a diagram is composed from the
properties of its axes. In mark composition (Fig. 26.21[c]), the
mark on one axis can fuse with the corresponding mark on another
axis to form a single mark in the space formed by the two
axes. Similarly, two object charts can be fused into a single diagram
by having a single mark for each case. We call this latter
form case composition (Fig. 26.21[d]).

Recursive composition. Recursive composition divides
the plane (or 3D space) into regions, placing a subvisualization
in each region (Fig. 26.21[e]). We use the term somewhat
loosely, since regions have different types of subvisualizations.
The FilmFinder in Fig. 26.1 is a good example of a recursive
visualization. The screen breaks down into a series of simple Visual
Structures and controls: (a) a 3-variable retinal scattergraph
(Year, Rating, FilmType) + (b) a 1-variable slider (Title) + (c) a
1-variable slider (Actors) + (d) a 1-variable slider (Actresses) +
(e) a 1-variable slider (Director) + (f) a 1-variable slider
(FilmLength) + (g) a 1-variable radio button control (Rating) +
(h) a 1-variable button-set (FilmType).

FIGURE 26.23. Recursive composition. (a) 2D-in-2D: Attribute Explorer
(Tweedie, Spence, Dawkes, & Su, 1996). (b) Marks-in-2D: composition of
a stick figure mark (Pickett & Grinstein, 1988). (c) Visualization of
stick figures showing weather around Lake Ontario. (d) 3D-in-3D:
Worlds-within-worlds (Feiner & Beshers, 1990).

Three types of recursive composition deserve special mention:
(a) 2D-in-2D, (b) marks-in-2D, and (c) 3D-in-3D. An example
of 2D-in-2D composition is the “prosection matrix”
(Tweedie, Spence, Dawkes, & Su, 1996) shown in Fig. 26.23(a).
Each smaller square in the prosection matrix represents a pair of
parameters plotted against each other. The coloring shows
which values of the plotted pair give excellent (red region) or
partly good (gray regions) performance for the design of some
device. The arrangement of the individual matrices into a
supermatrix redefines the spatial dimensions (that is, associates
them with different variables) within each of the cells, and the cells
themselves are arranged in an overall scheme that systematically
uses space. In this way, the precious spatial dimension is effectively
expanded to where all the variables can reuse it. An important
property of techniques similar to this one is that space
is defined at more than one grain size, and these levels of grain
become the basis for a macro-micro reading.

An example of marks-in-2D composition is the use of “stick
figure” displays. This is an unusual type of visualization in which
the  recursion  is  within  the  mark  instead  of  within  the  use  of 
space.  Figure  26.23(b)  shows  a  mark  that  is  itself  composed  of 
submarks.  The  mark  is  a  line  segment  with  four  smaller  line  seg¬ 
ments  protruding  from  the  ends.  Four  variables  are  mapped 
onto  angle  of  these  smaller  line  segments  and  a  fifth  onto  the 
angle  of  the  main  line  segment.  Two  additional  variables  are 
mapped  onto  the  position  of  this  mark  in  a  2D  display.  A  typi¬ 
cal  result  is  the  visualization  in  Fig.  26.23(c),  which  shows  five 
weather  variables  around  Lake  Ontario,  the  outline  of  which 
clearly  appears  in  the  figure. 

Feiner and Beshers (1990) provided an example of the third
recursive composition technique, 3D-in-3D composition. Suppose
a dependent variable is a function of six continuous variables,
y = f(x, y, z, w, r, s). Three of these variables are mapped
onto a 3D coordinate system. A position is chosen in that space,
say, x1, y1, z1. At that position, a new 3D coordinate system is
presented with a surface defined by the other three variables
(Fig. 26.23[d]). The user can thus view y = f(x1, y1, z1, w, r, s).
The user can slide the second-order coordinate system to any
location in the first, causing the surface to change appropriately.
Note that this technique combines a composed visual interaction
with interactivity on the composition. Multiple second-order
coordinate systems can be displayed in the space simultaneously,
as long as they do not overlap by much.


INTERACTIVE  VISUAL  STRUCTURES 


In  the  examples  we  have  considered  so  far,  we  have  often  seen 
that  information  visualization  techniques  were  enhanced  by  be¬ 
ing  interactive.  Interactivity  is  what  makes  visualization  a  new 
medium,  separating  it  from  generations  of  excellent  work  on 
scientific  diagrams  and  data  graphics.  Interactivity  means  con¬ 
trolling  the  parameters  in  the  visualization  reference  model  (Fig. 
26.10).  This  naturally  means  that  there  are  different  types  of  in¬ 
teractivity,  because  the  user  could  control  the  parameters  to  data 
transformations,  to  visual  mappings,  or  to  view  transformations. 
It  also  means  that  there  are  different  forms  of  interactivity  based 
on  the  response  cycle  of  the  interaction.  As  an  approximation, 
we  can  think  of  there  being  three  time  constants  that  govern  in¬ 
teractivity,  which  we  take  to  be  0.1  sec,  1  sec,  and  10  sec  (Card, 
Moran,  &  Newell,  1986)  (although  the  ideal  value  of  these  may 
be  somewhat  less,  say,  0.07  sec,  0.7  sec,  and  7  sec).  The  first  time 
constant is the time in which a system response must be made if
the user is to feel that there is a direct physical manipulation of
the visualization. If the user clicks on a button or moves a slider,
the system needs to update the display in less than 0.1 sec.
Animation frames need to take less than 0.1 sec. The second time
constant, 1 sec, is the time to complete an immediate action, for
example, an animated sequence such as zooming in to the data
or rotating a tree branch. The third time constant, 10 sec (meaning
somewhere in the 5 to 30 sec interval), is the time for completing
some cognitive action, for example, deleting an element
from the display. Let us consider a few well-known techniques for
interactive information visualizations.

Dynamic  queries.  A  general  paradigm  for  visualization 
interaction  is  dynamic  queries,  the  interaction  technique  used 
by  the  FilmFinder  in  Fig.  26.1.  The  user  has  a  visualization  of  the 
data  and  a  set  of  controls,  such  as  sliders,  by  which  subsets  of 
the Data Table can be selected. For example, Table 26.9 shows
the mappings of the Data Table and controls for the FilmFinder.
The sliders and other controls select which subset of the
data is going to be displayed. In the FilmFinder, the control for
Length is a two-sided slider. Setting one end to 90 minutes and
the other end to 120 minutes will select for display only those
cases of the Data Table whose Length variable lies between these
limits. The display needs to change within 0.1 sec of changing
the slider.
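
A dynamic query amounts to re-filtering the Data Table each time a control moves. The following sketch assumes each case is a dictionary and each two-sided slider is a (low, high) range; the function name and the film records are hypothetical placeholders, not FilmFinder data.

```python
def dynamic_query(cases, ranges):
    """Return the cases whose variables all fall inside the slider ranges."""
    def visible(case):
        return all(low <= case[var] <= high for var, (low, high) in ranges.items())
    return [case for case in cases if visible(case)]


# Hypothetical Data Table rows.
films = [
    {"Title": "Film A", "Year": 1964, "Length": 110},
    {"Title": "Film B", "Year": 1974, "Length": 128},
    {"Title": "Film C", "Year": 1961, "Length": 94},
    {"Title": "Film D", "Year": 1990, "Length": 85},
]

# Two-sided Length slider set to 90 and 120 minutes.
by_length = dynamic_query(films, {"Length": (90, 120)})
# Adding a second slider narrows the selection further.
by_both = dynamic_query(films, {"Length": (90, 120), "Year": (1962, 1995)})
```

For the interaction to feel like direct manipulation, a filter of this kind has to run and redraw inside the 0.1 sec budget, which for large tables may call for indexing rather than the linear scan shown here.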

Magic lens (movable filter). Dynamic queries are one
type  of  interactive  filter.  Another  type  is  a  movable  filter  that  can 
be  moved  across  the  display  as  in  Fig.  26.24(a).  These  magic 
lenses  are  useful  when  it  is  desired  to  filter  only  some  of  the 
display.  For  example,  a  magic  lens  could  be  used  with  a  map  that 
showed  the  population  of  any  city  it  was  moved  over.  Multiple 
magic  lenses  can  be  used  to  cascade  filters. 

Overview + detail. We can think of an overview + detail
display (Fig. 26.24[b]) as a particular type of magic lens, one that
magnifies the display and has the magnified region off to the
side so as not to occlude the region. Displays have information
at different grain sizes. A GIS map may have information at the
level of a continent as well as at the level of a city. If the shape
of the continent can be seen, the display is too coarse to see
the roadways of a city. Overview + detail displays show the data
at more than one level, but they also show where the finer grain
display fits into the larger grain display. In Fig. 26.24(b), from
SeeSoft (Eick et al., 1992), a system for visualizing large software
systems, the amount of magnification in the detail view is large
enough that two concatenated overview + detail displays are
required. Overview + detail displays are thus very helpful for data
navigation. Their main disadvantage is that they require
coordination of two visual domains.

TABLE 26.9. Visual Marks and Controls for FilmFinder

FIGURE 26.24. Interaction techniques. (a) Magic Lens (Bier, Stone,
Pier, Buxton, & DeRose, 1993): detail of map. Courtesy Xerox Corp.
(b) Cascading overview + detail: SeeSoft (Eick et al., 1992).
(c) Extract and compare: SDM (Roth, Chuah, & Mattis, 1995).
(d) Attribute Explorer (Tweedie et al., 1996). Courtesy Robert Spence.

Linking and brushing. Overview + detail is an example
of coordinating dual representations of the same data.
These can be coordinated interactively with linking and brushing.
Suppose, for example, we wish to show power consumption
on an airplane, both in terms of the physical representation
of the airplane and a logical circuit diagram. The two views
could be shown and linked by using the same color for the
same component types. Interactivity itself can be used for a
dynamic form of linking called brushing. In brushing, running the
cursor over a part of one of the views causes highlighting both
in that view and in the other view.

Extraction  and  comparison.  We  can  also  use  interac¬ 
tion  to  extract  a  subset  of  the  data  to  compare  with  another  sub¬ 
set. An example of this is in the SDM system (Chuah, Roth, Mattis,
& Kolojejchick, 1995) in Fig. 26.24(c). The data are displayed
in  a  3D  information  landscape,  but  the  perspective  interferes 
with  the  ability  to  compare  it.  Information  is  therefore  extracted 
from  the  display  (leaving  ghosts  behind)  and  placed  in  an  or¬ 
thogonal  viewing  position  where  it  can  be  compared  using  2D. 
It  could  also  be  dropped  into  another  display.  Interactivity 
makes  possible  these  manipulations,  while  keeping  them  coor¬ 
dinated  with  the  original  representations. 

Attribute  explorer.  Several  of  these  interactive  tech¬ 
niques  are  combined  in  the  Attribute  Explorer  (Tweedie  et  al., 
1996).  Figure  26.24(d)  shows  information  on  four  attributes  of 
houses.  Each  attribute  is  displayed  by  a  histogram,  where  each 
square  making  up  the  histogram  represents  an  individual 
house. The user selects a range of some attribute, say, price.
The pixels making up the selected part of the price histogram are
linked to the corresponding pixels, representing the same houses,
on the other attributes, and these are highlighted. Those houses
meeting all the criteria are highlighted in one color; those houses
meeting, say, all but one are highlighted in another color. In this
way, the user can see the “near misses.” If the user were to relax
one of the criteria only a little (say, reducing price by $100), then
the user might be able to gain more on another criterion (say,
reducing a commute by 20 miles).
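
The near-miss coloring can be sketched by counting, for each house, how many of the selected ranges it fails. This is an illustration of the idea only, not the Attribute Explorer implementation; the function names, attributes, and house records are hypothetical.

```python
def criteria_missed(house, criteria):
    """Count the attribute ranges that this house falls outside of."""
    return sum(
        1 for attr, (low, high) in criteria.items()
        if not low <= house[attr] <= high
    )


def hits_and_near_misses(houses, criteria):
    """Split houses into those meeting all criteria and those missing exactly one."""
    hits, near = [], []
    for house in houses:
        missed = criteria_missed(house, criteria)
        if missed == 0:
            hits.append(house["id"])
        elif missed == 1:
            near.append(house["id"])
    return hits, near


# Hypothetical houses and selected ranges.
houses = [
    {"id": "h1", "price": 200_000, "commute": 10, "bedrooms": 3},
    {"id": "h2", "price": 250_100, "commute": 5, "bedrooms": 3},
    {"id": "h3", "price": 400_000, "commute": 40, "bedrooms": 1},
]
criteria = {"price": (150_000, 250_000), "commute": (0, 20), "bedrooms": (2, 4)}
hits, near = hits_and_near_misses(houses, criteria)
```

House h2 misses only the price limit, and only by $100, so it would be drawn in the near-miss color; widening the price range slightly would turn it into a hit with a shorter commute than h1.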


FOCUS  +  CONTEXT  ATTENTION-REACTIVE 
ABSTRACTIONS 


So far, we have considered visualizations that are static mappings
from Data Table to Visual Structure and those where the
mappings from Data Table to Visual Structure are interactively
controlled by the user. We now consider visualizations in which the
machine is no longer passive, but its mappings from Visual
Structure to View are altered by the computer according to
its model of the user’s degree of interest. We can, in principle,
associate  a  cost  of  access  with  every  element  in  the  Data  Table. 
Take  the  FilmFinder  in  Figure  26.3.  Details  about  the  movie 
“Murder  on  the  Orient  Express”  are  accessible  at  low  cost  in 
terms  of  time  because  they  are  presently  visible  on  the  screen. 
Details  of  “Goldfinger,”  a  movie  with  only  a  mark  on  the  dis¬ 
play,  take  more  time  to  find.  Details  of  “Last  Year  at  Marienbad,” 
a  movie  with  no  mark  on  the  display,  would  take  much  more 
time.  The  idea  is  that  with  a  model  for  predicting  users’  changes 
in  interest,  the  system  can  adjust  its  displays  to  make  costs  lower 
for  information  access.  For  example,  if  the  user  wants  some  de¬ 
tail  about  a  movie,  such  as  the  director,  the  system  can  antici¬ 
pate  that  the  user  is  more  likely  to  want  other  details  about  the 
movie as well and therefore display them all at the same time.
The user does not have to execute a separate command; the cost
is therefore reduced.


Focus + context views are based on several premises: First,
the user needs both overview (context) and detail information
(focus) during information access, and providing these in separate
screens or separate displays is likely to cost more in user
time. Second, information needed in the overview may be different
from that needed in the detail. The information of the
overview needs to provide enough information for the user to
decide where to examine next or to give a context to the detailed
information rather than the detailed information itself. As Furnas
(1981) has argued, the user’s interest in detail seems to fall
away in a systematic way with distance as information objects
become farther from current interest. Third, these two types of
information can be combined within a single dynamic display,
much as human vision uses a two-level focus and context strategy.
Information broken into multiple displays (separate legends
for a graph, for example) seems to degrade performance
for reasons of visual search and working memory.

Furnas (1981) was the first to articulate these ideas systematically
in his theory of fisheye views. The essence of focus + context
displays is that the average cost of accessing information is
reduced by placing the most likely needed information for
navigation and detail where it is fastest to access. This can be
accomplished by working on either the data side or the visual side
of the visual reference model (Fig. 26.10). We now consider these
techniques in more detail.


Data-Based  Methods 

Filtering. On the data side, focus + context effects can be
achieved by filtering which items from the Data Table are actually
displayed on the screen. Suppose we have a tree of categories
taken from Roget’s Thesaurus, and we are interacting
with one of these, “Hardness.”

Matter
    ORGANIC
        Vitality
            Vitality in general
            Specific vitality
        Sensation
            Sensation in general
            Specific sensation
    INORGANIC
        Solid
            Hardness
            Softness
        Fluid
            Fluids in general
            Specific fluids

Of  course,  this  is  a  small  example  for  illustration.  A  tree  rep¬ 
resenting  a  program  listing  or  a  computer  directory  or  a  taxon¬ 
omy  could  easily  have  thousands  of  lines,  a  number  that  would 
vastly  exceed  what  could  fit  on  the  display  and  hence  would 
have  a  high  cost  of  accessing.  We  calculate  a  degree-of-interest 
(DOI)  for  each  item  of  the  tree,  given  that  the  focus  is  on  the 
node Hardness. To do this, we split the DOI into an intrinsic
part and a part that varies with distance from the current center
of interest and use a formula from Furnas (1981).

DOI  =  Intrinsic  DOI  +  Distance  DOI 

Figure 26.25 shows schematically how to perform this computation
for our example. We assume that the intrinsic DOI of
a node is just its distance from the root (Fig. 26.25[a]). The distance
part of the DOI is just the traversal distance to a node
from the current focus node (Fig. 26.25[b]; it turns out to be
convenient to use negative numbers for this computation, so
that the maximum amount of interest is bounded, but not the
minimum amount of interest). We add these two numbers together
(Fig. 26.25[c]) to get the DOI of each node in the tree.
For example, Softness lies three links from the root and two links
from Hardness, giving a DOI of -3 + -2 = -5.
Then we apply a minimum threshold of interest (-5 in this
case) and only show nodes at least as interesting as that threshold.
The result is the reduced tree:
Matter
    ORGANIC
    INORGANIC
        Solid
            Hardness
            Softness
        Fluid

The  reduced  tree  gives  local  context  around  the  focus  node  and 
progressively  less  detail  farther  away.  But  it  does  seem  to  give 
the  important  context. 
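
The degree-of-interest calculation for this example can be sketched in a few lines. The code below assumes the tree is stored as a parent-to-children dictionary, takes the intrinsic DOI as the negated depth and the distance DOI as the negated path length to the focus, and keeps nodes whose DOI is at or above the threshold; the function names are hypothetical.

```python
TREE = {
    "Matter": ["ORGANIC", "INORGANIC"],
    "ORGANIC": ["Vitality", "Sensation"],
    "Vitality": ["Vitality in general", "Specific vitality"],
    "Sensation": ["Sensation in general", "Specific sensation"],
    "INORGANIC": ["Solid", "Fluid"],
    "Solid": ["Hardness", "Softness"],
    "Fluid": ["Fluids in general", "Specific fluids"],
}


def parent_map(tree):
    """Invert the parent-to-children mapping."""
    return {child: parent for parent, children in tree.items() for child in children}


def path_to_root(node, parent):
    """List the node and its ancestors, ending at the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def degree_of_interest(tree, root, focus):
    """DOI = intrinsic DOI (negated depth) + distance DOI (negated path length to focus)."""
    parent = parent_map(tree)
    focus_path = path_to_root(focus, parent)
    doi = {}
    for node in [root] + list(parent):
        path = path_to_root(node, parent)
        depth = len(path) - 1
        # The first shared ancestor is where the two paths to the root meet.
        common = next(a for a in path if a in focus_path)
        distance = path.index(common) + focus_path.index(common)
        doi[node] = -depth - distance
    return doi


doi = degree_of_interest(TREE, "Matter", "Hardness")
shown = [node for node, value in doi.items() if value >= -5]
```

With the focus on Hardness and a threshold of -5, `shown` contains exactly the seven nodes of the reduced tree; moving the focus to another node and recomputing gives a different reduced tree from the same data.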

FIGURE 26.25. Degree-of-Interest calculation for fish-eye visualization.

Selective aggregation. Another focus + context technique
from the data side is selective aggregation. Selective aggregation
creates new cases in the Data Table that are aggregates
of other cases. For example, in a visualization of voting
behavior in a presidential election, voters could be broken
down by sex, precinct, income, and party affiliation. As the user
drills down on, say, male Democrats earning between $25,000
and $50,000, other categories could be aggregated, providing
screen space and contextual reference for the categories of
immediate interest.
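
One way to picture selective aggregation is as a grouping step that keeps detail only for the cases in focus. The sketch below is an assumption-laden illustration: the case fields, the single "aggregate" bucket, and the function name are all hypothetical, and a real system would probably keep several aggregate categories rather than one.

```python
from collections import Counter


def selective_aggregate(cases, focus, detail_key):
    """Count cases, keeping detail for those in focus and lumping the rest together."""
    counts = Counter()
    for case in cases:
        in_focus = all(case[key] == value for key, value in focus.items())
        bucket = ("focus", case[detail_key]) if in_focus else ("aggregate",)
        counts[bucket] += 1
    return dict(counts)


# Hypothetical voter records.
voters = [
    {"sex": "M", "party": "Democrat", "income": "25-50K", "precinct": 1},
    {"sex": "M", "party": "Democrat", "income": "25-50K", "precinct": 2},
    {"sex": "F", "party": "Democrat", "income": "25-50K", "precinct": 1},
    {"sex": "M", "party": "Republican", "income": "50-75K", "precinct": 2},
]
focus = {"sex": "M", "party": "Democrat", "income": "25-50K"}
summary = selective_aggregate(voters, focus, detail_key="precinct")
```

The focus group is still broken down by precinct, while everyone else collapses into a single new case that takes one mark on the screen.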


View-Based  Methods 

Micro-macro  readings.  Micro-macro  readings  are  dia¬ 
grams  in  which  “detail  cumulates  into  larger  coherent  struc¬ 
tures”  (Tufte,  1990).  The  diagram  can  be  graphically  read  at  the 
level  of  larger  contextual  structure  or  at  the  detail  level.  An  ex¬ 
ample  is  Fig.  26.26.  The  micro  reading  of  this  diagram  shows 
three  million  observations  of  the  sleep  (lines),  wake  (spaces), 
and feeding (dots) activity of a newborn infant. Each day’s activity
is repeated three times on a line to make the cyclical aspect
of the activity more clearly visible. The macro reading of
the diagram, emphasized by the thick lines, shows the infant
transitioning from the natural human 25-hour cycle at birth to the
24-hour  solar  day.  The  macro  reading  serves  as  context  and  in¬ 
dex  into  the  micro  reading. 

Highlighting. Highlighting is a special form of micro-macro
reading in which focal items are made visually distinctive
in some way. The overall set of items provides a context for
the  changing  focal  elements. 

FIGURE 26.26. Micro-macro reading (Winfree, 1987). Courtesy Scientific
American Library.

FIGURE 26.27. Bifocal + transfer function. [Figure not reproduced;
panels (a)-(c), axis label “Original Display X”.]

Visual transfer functions. We can also warp the view with viewing
transformations. An example is a visualization called the bifocal lens
(Spence & Apperley, 1982). Fig. 26.27(a) shows a set of documents the
user would like to view, but which is too large to fit on the screen.
In a bifocal lens, documents not in a central focal region are
compressed down to a smaller size. This could be a strict visual
compression. It could also involve a change in representation. We can
talk about the visual compression in terms of a visual transfer
function (Fig. 26.27[b]), sometimes conveniently represented in terms
of its first derivative (Fig. 26.27[c]). This function shows how many
units of an axis in the original display are mapped into how many units
in the resultant display. The result could be compression or
enlargement of a section of the display. As a result of applying this
visual transfer function to Fig. 26.27(a), the display is compressed to
Fig. 26.27(d). Actually, the documents in the compressed region have
been further altered by using a semantic zooming function to give them
a simplified visual form. The form of Fig. 26.27(c) shows that the
derivative is essentially a step function, so the transfer function
itself is piecewise linear with two different slopes. An example of a
two-dimensional step function is the Table Lens (Fig. 26.28[a]). The
Table Lens is a spreadsheet in which the columns of selected cells are
expanded to full size in X and the rows of selected cells are expanded
to full size in Y. All other cells are compressed, and their content
represented only by a graphic. As a consequence, spreadsheets up to a
couple of orders of magnitude larger can be represented.
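
As a rough illustration of the one-dimensional case, the following
Python sketch implements a piecewise-linear transfer function of the
kind a bifocal display might use. The function names, the parameters,
and the choice to keep the focal region at full size (slope 1) are
assumptions made for this example, not details taken from Spence and
Apperley.

```python
def context_slope(focus_start, focus_end, source_width, display_width):
    """Slope used outside the focal region (assumption: the focus keeps slope 1)."""
    if not 0 <= focus_start < focus_end <= source_width:
        raise ValueError("focal region must lie inside the original display")
    focus_len = focus_end - focus_start
    context_len = source_width - focus_len
    if context_len == 0:
        return 1.0  # everything is in focus, so nothing is compressed
    if display_width <= focus_len:
        raise ValueError("display too narrow to show the focus at full size")
    return (display_width - focus_len) / context_len


def bifocal_transfer(x, focus_start, focus_end, source_width, display_width):
    """Map position x on the original axis to a position on the displayed axis."""
    if not 0 <= x <= source_width:
        raise ValueError("x must lie inside the original display")
    slope = context_slope(focus_start, focus_end, source_width, display_width)
    if x <= focus_start:                      # left context: compressed
        return slope * x
    if x <= focus_end:                        # focal region: full size
        return slope * focus_start + (x - focus_start)
    return (slope * focus_start               # right context: compressed
            + (focus_end - focus_start)
            + slope * (x - focus_end))


def bifocal_derivative(x, focus_start, focus_end, source_width, display_width):
    """First derivative of the transfer function: a step function with two levels."""
    slope = context_slope(focus_start, focus_end, source_width, display_width)
    return 1.0 if focus_start <= x <= focus_end else slope
```

With a 100-unit original axis, a 40-unit display, and a focal region
from 40 to 60, the context slope is 0.25, so the positions 0, 40, 60,
and 100 map to 0, 10, 30, and 40.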

By varying the visual transfer function (see, for example, the review
by Leung & Apperley, 1994), a wide variety of distorted views can be
generated. Figure 26.28(b) shows an application in which a visual
transfer function is used to expand a bubble around a local region on a
map. The expanded space in the region is used to show additional
information about that region.


Distorted views must be designed carefully so as not to damage
important visual relationships. Bubble distortions of maps may change
whether roads appear parallel to each other. However, distorted views
can be designed with “flat” and “transition” regions to address this
problem. Figure 26.27(a) does not have curvilinear distortions.
Focus+context visualizations can be used as part of compact user
controls. Keahey (2001) has created an interactive scheme in which the
bubble is used to “preview” a region. When the user releases a button
over the region, the system zooms in far enough to flatten out the
bubble. Bederson has developed a focus+context pull-down menu
(Bederson, 2000) that allows the viewing and selection of large lists
of typefaces in a text editor (Fig. 26.27[c]).

Perspective distortion. One interesting form of distorting visual
transfer functions is 3D perspective. Although it can be described with
a 2D distorting visual transfer function, it is usually not experienced
as distorting by users due to the special perceptual mechanisms humans
have for processing 3D. Figure 26.28(d) shows the Perspective Wall
(Mackinlay, Robertson, & Card, 1991). Touching any place on the walls
animates its transition into the central focal area. The user perceives
the context area of the wall as an undistorted 2D image in a 3D space,
rather than as a distorted 2D image; however, the same sort of
compression is still achieved in the nonfocus area.

(a) Table Lens. Courtesy of Inxight Software.
(b) Nonlinear distortion of UK. Courtesy Alan Keahey.
(d) Perspective Wall (Mackinlay, Robertson, & Card, 1991).

FIGURE 26.28. Attention-Reactive Visualizations.

Alternate geometries. Instead of altering the size of components,
focus+context effects can also be achieved by changing the geometry of
the spatial substrate itself. One example is the hyperbolic tree
(Lamping & Rao, 1994). A visualization such as a tree is laid out in
hyperbolic space (which itself expands exponentially, just like the
tree does), and then projected onto the Euclidean plane. The result is
that the tree seems to expand around the focal nodes and to be
compressed elsewhere. Selecting another node in the tree animates that
portion to the focal area. Munzner (Munzner & Burchard, 1995) has
extended this notion to 3D hyperbolic trees and used them to visualize
portions of the Internet.
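
One way to picture the change of focus is with the Poincaré disk model
of the hyperbolic plane, in which the whole plane is drawn inside the
unit circle. The sketch below assumes that model and represents points
as complex numbers. The function names are invented for this example,
and it is not a description of how the systems cited above are
implemented.

```python
import math


def disk_radius(hyperbolic_distance):
    """Euclidean radius at which a point this far from the focus is drawn."""
    return math.tanh(hyperbolic_distance / 2.0)


def refocus(z, new_focus):
    """Translate the disk so that new_focus lands at the centre.

    This is a Moebius transformation that preserves hyperbolic distances,
    so the layout itself is unchanged; only the part shown large changes.
    """
    z = complex(z)
    new_focus = complex(new_focus)
    if abs(z) >= 1 or abs(new_focus) >= 1:
        raise ValueError("points must lie strictly inside the unit disk")
    return (z - new_focus) / (1 - new_focus.conjugate() * z)
```

Equal hyperbolic steps of 1, 2, 3, and 4 away from the focus are drawn
at radii of roughly 0.46, 0.76, 0.91, and 0.96, which is the crowding
toward the rim that makes distant parts of the tree appear compressed.
Animating a selection then amounts to applying `refocus` with a
sequence of intermediate focus points.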


SENSEMAKING  WITH  VISUALIZATION 


Knowledge  Crystallization 

The purpose of information visualization is to amplify cognitive
performance, not just to create interesting pictures. Information
visualizations should do for the mind what automobiles do for the feet.
So here, we return to the higher level cognitive operations of which
information visualization is a means and a component. A recurrent
pattern of cognitive activity for which information visualization would
be useful (though not the only one!) is “knowledge crystallization.” In
knowledge crystallization tasks, there is a goal (sometimes
ill-structured) that requires the acquisition and making sense of a
body of information, as well as the creative formulation of a knowledge
product, decision, or action. Examples would be writing a scientific
paper, business or military intelligence, weather forecasting, or
buying a laptop computer. For these tasks, there is usually a concrete
outcome of the task: the submitted manuscript of a paper, a delivered
briefing, or a purchase. Knowledge crystallization does have
characteristic processes, however, and it is by amplifying these that
information visualization seeks to intervene and amplify the user’s
cognitive powers. Understanding of this process is still tentative, but
the basic parts can be outlined:

Acquire  information.  Make  sense  of  it.  Create  something  new.  Act  on  it. 

In  Table  26.10,  we  have  listed  some  of  the  more  detailed  ac¬ 
tivities  these  entail.  We  can  see  examples  of  these  in  our  initial 
examples. 

Acquire  information.  The  FilmFinder  is  concentrated 
largely  on  acquiring  information  about  films.  Search  is  one  of 
the  methods  of  acquiring  information  in  Table  26.10,  and  the 
FilmFinder  is  an  instance  of  the  use  of  information  visualization 
in  search.  In  fact,  Shneiderman  (Card  et  al.,  1999)  has  identi¬ 
fied  a  heuristic  for  designing  such  systems: 

Overview  first,  zoom  and  filter,  then  details-on-demand 

The user starts with an overview of the films, and then uses sliders to
filter the movies, causing the overview to zoom in on the remaining
films. Popping up a box gives details on the particular films. The user
could use this system as part of a knowledge crystallization process,
but the other activities would take place outside the system. The
SmartMoney system also uses the TreeMap visualization for acquiring
information, but this time the system is oriented toward monitoring,
another of the methods in Table 26.10. A glance at the sort of chart in
Fig. 26.5 allows an experienced user to notice interesting trends among
the hundreds of stocks and industries monitored. Another method of
acquiring information, capture, refers to acquiring information that is
tacit or implicit. For example, when users browse the World Wide Web,
their paths contain information about their goals. This information can
be captured in logs, analyzed, and visualized (Chi & Card, 1999). It is
worth making the point that acquiring information is not something that
the user must necessarily do explicitly. Search, monitoring, and
capture can be implicitly triggered by the system.

TABLE 26.10. Knowledge Crystallization Operators

Acquire information
    Monitor
    Search
    Capture (make implicit knowledge explicit)
Make sense of it
    Extract information
    Fuse different sources
    Find schema
    Recode information into schema
Create something new
    Organize for creation
    Author
Act on it
    Distribute
    Apply
    Act
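
The “overview first, zoom and filter, then details-on-demand” heuristic
described above can be sketched as three small functions. This is only
an illustration in Python: the `Film` record, its fields, the sample
data, and the function names are all invented here and are not taken
from the FilmFinder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Film:
    title: str
    year: int
    rating: float


# Invented records, used only to exercise the functions below.
FILMS = [
    Film("Film A", 1955, 6.1),
    Film("Film B", 1972, 8.4),
    Film("Film C", 1988, 7.2),
    Film("Film D", 1994, 8.9),
]


def overview(films):
    """Overview first: one (year, rating) point per film, as in a scatterplot."""
    return [(f.year, f.rating) for f in films]


def zoom_and_filter(films, year_range, min_rating):
    """Apply the slider settings and report the year span to zoom the axis to."""
    first, last = year_range
    kept = [f for f in films if first <= f.year <= last and f.rating >= min_rating]
    if not kept:
        return kept, None
    return kept, (min(f.year for f in kept), max(f.year for f in kept))


def details_on_demand(films, title):
    """Return the full record for one selected film, or None if it is absent."""
    for f in films:
        if f.title == title:
            return f
    return None
```

Filtering the sample data to the years 1970 through 2000 with a minimum
rating of 8.0 keeps “Film B” and “Film D”, and the overview would then
zoom to the span 1972 to 1994.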

Make  sense  of  it.  The  heart  of  knowledge  crystallization 
is  sensemaking.  This  process  is  by  no  means  as  mysterious  as  it 
might  appear.  Because  sensemaking  involving  large  amounts 
of  information  must  be  externalized,  the  costs  of  finding,  orga¬ 
nizing,  and  moving  information  around  have  a  major  impact  on 
its  effectiveness.  The  actions  of  sensemaking  itself  can  be  ana¬ 
lyzed.  One  process  is  extraction.  Information  must  be  got  out 
of  its  sources.  In  our  hotel  example,  the  hotel  manager  ex¬ 
tracted  information  from  hotel  records.  A  more  subtle  issue  is 
that  information  from  different  sources  must  be  fused — that  is, 
registered  in  some  common  correspondence.  If  there  are  six 
called-in  reports  of  traffic  accidents,  does  this  mean  six  different 
accidents,  one  accident  called  in  six  times,  or  two  accidents  re¬ 
ported  by  multiple  callers?  If  one  report  merely  gives  the  county, 
while  another  just  gives  the  highway,  it  may  not  be  easy  to  tell. 
Sensemaking  involves  finding  some  schema — that  is,  some  de¬ 
scriptive  language — in  terms  of  which  information  can  be  com¬ 
pactly  expressed  (Russell,  Stefik,  Pirolli,  &  Card,  1993).  In  our 
hotel  example,  permuting  the  matrices  brought  patterns  to  the 
attention  of  the  manager.  These  patterns  formed  a  schema  she 
used  to  organize  and  represent  hotel  stays  compactly.  In  the 
case  of  buying  a  laptop  computer,  the  schema  may  be  a  table 
of  features  by  models.  Having  a  common  schema  then  permits 
compact  description.  Instances  are  recoded  into  the  schema. 
Residual  information  that  does  not  fit  the  schema  is  noted  and 
can  be  used  to  adjust  the  schema. 
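
The recoding step and the use of residue to adjust the schema can be
sketched for the laptop case. The following Python is a minimal
illustration under stated assumptions: the field names, the sample
models, and the rule that a leftover field joins the schema once at
least two instances mention it are all invented for this example.

```python
from collections import Counter

# Invented laptop descriptions; each source mentions different properties.
LAPTOPS = {
    "Model X": {"price": 900, "weight_kg": 1.4, "battery_h": 8},
    "Model Y": {"price": 1200, "weight_kg": 2.1, "battery_h": 5, "ports": 4},
    "Model Z": {"price": 700, "ports": 2},
}


def recode(instances, schema):
    """Recode instances into a features-by-models table and collect the residue."""
    table, residue = {}, {}
    for name, attrs in instances.items():
        # Fields missing from an instance are recorded as None.
        table[name] = {field: attrs.get(field) for field in schema}
        leftover = {k: v for k, v in attrs.items() if k not in schema}
        if leftover:
            residue[name] = leftover
    return table, residue


def adjust_schema(schema, residue, min_count=2):
    """Add any leftover field that at least min_count instances mention."""
    counts = Counter(field for leftover in residue.values() for field in leftover)
    extra = sorted(field for field, n in counts.items() if n >= min_count)
    return list(schema) + extra
```

Starting from the schema `["price", "weight_kg"]`, the residue contains
`battery_h` and `ports` for two models each, so `adjust_schema` extends
the schema with both fields and a second call to `recode` leaves no
residue.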

Create  something  new.  Using  the  schema,  information 
can  be  reorganized  to  create  something  new.  It  must  be  orga¬ 
nized  into  a  form  suitable  for  the  output  product  and  that  prod¬ 
uct  must  be  authored.  In  the  case  of  the  hotel  example,  the 
manager  created  the  presentation  of  Fig.  26.7(c). 

Act on it. Finally, there is some consequential output of the knowledge
crystallization task. That action may be to distribute a report or give
a briefing, to act directly in some way (such as setting up a new
promotion program for the hotel or buying a laptop on the basis of the
analysis), or to give directives to an organization.

Levels  for  Applying  Information  Visualization 

Information  visualization  can  be  applied  to  facilitate  the  vari¬ 
ous  subprocesses  of  knowledge  crystallization  just  described. 
It  can  also  be  applied  at  different  architectural  levels  in  a  system. 
These have been depicted in Fig. 26.29. At one level is the use of
visualization to help users access information outside the immediate
environment (the infosphere), such as information on the Internet or
from corporate digital libraries. Figure 26.30(a) shows such a
visualization of the Internet (Bray, 1996). Websites are laid out in a
space such that sites closer to each other in the visualization tend to
have more traffic. The size of the disk represents the number of pages
in the site. The globe size represents the number of out-links. The
globe height shows the number of in-links.

FIGURE 26.29. Levels of use for information visualization.

The  second  level  is  the  information  workspace.  The  infor¬ 
mation  workspace  is  like  a  desk  or  workbench.  It  is  a  staging 
area  for  the  integration  of  information  from  different  sources. 
An  information  workspace  might  contain  several  visualizations 
related  to  one  or  several  tasks.  Part  of  the  purpose  of  an  infor¬ 
mation  workspace  is  to  make  the  cost  of  access  low  for  infor¬ 
mation  in  active  use.  Figure  26.30(b)  shows  a  3D  workspace  for 
the  Internet,  the  Web  Forager  (Card,  Robertson,  &  York,  1996). 
Pages  from  the  World  Wide  Web,  accessed  by  users  through 
clicking  on  URLs  or  searches,  appear  in  the  space.  These  can 
be organized into piles or books related to different topics. Figure
26.30(c) shows another document workspace, STARLIGHT
(Risch  et  al.,  1997).  Documents  are  represented  as  galaxies  of 
points  in  space  such  that  similar  documents  are  near  each  other. 
In  the  workspace,  various  tools  allow  linking  the  documents  to 
maps  and  other  information  and  analytical  resources. 


(a) Infosphere (Bray, 1996).
(b) Workspace: Web Forager (Card, Robertson, & York, 1996).
(c) Workspace: STARLIGHT (Risch et al., 1997).
(d) Visually enhanced object: Voxel-Man. Courtesy of University of
Hamburg.

FIGURE 26.30. Information visualization applications.

The third level is visual knowledge tools. These are tools that allow
schema forming and re-representation of information. The permutation
matrices in Fig. 26.7, the SeeSoft system for analyzing software in
Fig. 26.15(b), and the Table Lens in Fig. 26.28(a) are examples of
visual knowledge tools. The focus is on determining and extracting the
relationships.

The  final  level  is  visually  enhanced  objects,  coherent  infor¬ 
mation  objects  enhanced  by  the  addition  of  information  visual¬ 
ization  techniques.  An  example  is  Fig.  26.30(d),  in  which  voxel 
data  of  the  brain  have  been  enhanced  through  automatic  sur¬ 
face  rendition,  coloring,  slicing,  and  labeling.  Abstract  data 
structures  representing  neural  projects  and  anatomical  labels 
have  been  integrated  into  a  display  of  the  data.  Visually  en¬ 
hanced  objects  focus  on  revealing  more  information  from  some 
object  of  intrinsic  visual  form. 


Information visualization is a set of technologies that use visual
computing to amplify human cognition with abstract information. The
future of this field will depend on the uses to which it is put and how
much advantage it gives to these. Information visualization promises to
help us speed our understanding and action in a world of increasing
information volumes. It is a core part of a new technology of human
interfaces to networks of devices, data, and documents.


INTRODUCTION


Computing  and  communication  technologies  have  provided  us 
with  useful  and  powerful  information  resources,  remote  instru¬ 
ments,  and  tools  for  interacting  with  each  other.  These  possibili¬ 
ties  have  also  led  to  numerous  social  and  organizational  effects. 
These  tools  are  of  course  just  the  latest  in  a  long  line  of  modern 
technologies  that  have  changed  human  experience.  Television 
and  radio  long  ago  broadened  our  awareness  of  and  interest  in 
activities all over the world. The telegraph and telephone enabled new
forms of organization to emerge. The new technologies of
Computer-Supported Cooperative Work (CSCW)¹ are giving us greater
geographical and temporal flexibility in carrying out our activities.
They have also given us new modes of socializing.

Groupware  is  software  designed  to  run  over  a  network  in 
support  of  the  activities  of  a  group  or  organization.  These  ac¬ 
tivities  can  occupy  any  of  several  combinations  of  same/different 
places  and  same/different  times.  Groupware  has  been  designed 
for  all  four  of  these  combinations.  Early  groupware  applications 
tended  to  focus  on  only  one  of  these  cells,  but  more  recently 
groupware  that  supports  several  cells  and  the  transitions  among 
them  has  emerged.  We  also  do  not  think  of  groupware  as  only 
dealing  with  groups.  Both  the  individual  members  of  groups 
and  the  organizations  in  which  they  are  embedded  affect  and 
are  affected  by  groupware. 

CSCW emerged as a formal field of study in the mid-1980s, with
conferences, journals, books, and university courses appearing that
used this name. There were a number of important antecedents. The
earliest efforts to create groupware used time-shared systems but were
closely linked to the development of key ideas that propelled the
personal computer revolution. Vannevar Bush (1945) described a vision
of something similar to today’s World Wide Web in an influential essay
published shortly after the end of World War II. Doug Engelbart’s
famous demonstration at the 1968 AFIPS Fall Joint Computer Conference
in San Francisco included a number of key groupware components (see
Engelbart & English, 1968). These components included support for
real-time face-to-face meetings, audio and video conferencing,
discussion databases, information repositories, and workflow support.
Group decision-support systems and computer-supported meeting rooms
were explored in a number of business schools (see McLeod, 1992;
Kraemer & Pinsonneault, 1990). Work on office automation included many
groupware elements, such as group workflow management, calendaring,
e-mail, and document sharing (Ellis & Nutt, 1980). A good summary of
early historical trends, as well as reprints of key early articles,
appears in Greif’s (1988) important anthology of readings.

Today  there  are  a  large  number  of  commercial  groupware 
products.  In  addition,  groupware  functions  are  increasingly  ap¬ 
pearing  as  options  in  operating  systems  or  specific  applications 
(e.g.,  access  to  meeting-support  tools  within  Microsoft  Office 
products).  Groupware  functionality  has  become  widespread 


and  familiar.  However,  there  are  still  many  research  issues  about 
how  to  design  such  systems  and  what  effects  they  have  on  the 
individuals,  groups,  and  organizations  that  use  them. 


ADOPTING  GROUPWARE  IN  CONTEXT 


Groupware  systems  are  intended  to  support  groups,  which  are 
usually  embedded  in  an  organization.  As  a  result,  there  are  a 
number  of  issues  that  bear  on  groupware  success.  In  a  justly  fa¬ 
mous  set  of  papers,  Grudin  (1988,  1994)  pointed  out  a  number 
of  problems  that  groupware  systems  have  (see  also  Markus  & 
Connolly,  1990).  In  brief,  he  pointed  out  that  developers  of 
groupware  systems  need  to  be  concerned  about  the  following 
issues  (Grudin,  1994,  p.  97): 

1.  Disparity  in  work  and  benefit.  Groupware  applications  of¬ 
ten  require  additional  work  from  individuals  who  do  not  per¬ 
ceive  a  direct  benefit  from  the  use  of  the  application. 

2.  Critical  mass  and  Prisoner’s  dilemma  problems.  Groupware 
may  not  enlist  the  “critical  mass”  of  users  required  to  be  use¬ 
ful,  or  can  fail  because  it  is  never  in  any  one  individual’s  ad¬ 
vantage  to  use  it. 

3.  Disruption  of  social  processes.  Groupware  can  lead  to  activity 
that  violates  social  taboos,  threatens  existing  political  struc¬ 
tures,  or  otherwise  de-motivates  users  crucial  to  its  success. 

4.  Exception  handling.  Groupware  may  not  accommodate  the 
wide  range  of  exception  handling  and  improvisation  that 
characterizes  much  group  activity. 

5.  Unobtrusive  accessibility.  Features  that  support  group  pro¬ 
cesses  are  used  relatively  infrequently,  requiring  unobtrusive 
accessibility  and  integration  with  more  heavily  used  features. 

6.  Difficulty  of  evaluation.  The  almost  insurmountable  obsta¬ 
cles  to  meaningful,  generalizable  analysis  and  evaluation  of 
groupware  prevent  us  from  learning  from  experience. 

7.  Failure  of  intuition.  Intuitions  in  product  development  envi¬ 
ronments  are  especially  poor  for  multi-user  applications,  re¬ 
sulting  in  bad  management  decisions  and  error-prone  design 
processes. 

8.  The  adoption  process.  Groupware  requires  more  careful  im¬ 
plementation  (introduction)  in  the  workplace  than  product 
developers  have  confronted. 

However,  there  are  reasons  for  optimism.  In  a  recent  survey 
of  the  successful  adoption  of  group  calendaring  in  several  or¬ 
ganizations,  Palen  and  Grudin  (2002)  observed  that  organiza¬ 
tional  conditions  in  the  1990s  were  much  more  favorable  for  the 
adoption  of  group  tools  than  they  were  in  the  1980s.  Further, 
the  tools  themselves  had  improved  in  reliability,  functionality, 
and  usability.  There  is  increased  “collaboration  readiness”  and 
“collaboration  technology  readiness.”  But  there  are  still  signifi¬ 
cant  challenges  in  supporting  group  work  at  a  distance  (Olson 
&  Olson,  2000). 


¹ The name CSCW contains several anachronisms. “Computer” is no longer
confined to the familiar desktop device. “Cooperative” does not mean we
ignore competitive uses of technology. “Work” is only one of a number
of venues that CSCW researchers study.


TECHNICAL INFRASTRUCTURE


Groupware  requires  networks,  and  network  infrastructure  is  a 
key  enabler  as  well  as  a  constraint  on  groupware.  A  number  of 
advanced networks are exploring the issues of supporting high-end
users (e.g., Abilene, abilene.internet2.edu; National LambdaRail,
www.nlr.net; HOPI, networks.internet2.edu/hopi). Wireless networking
technology gives users more flexibility. Good
access  to  the  Internet  is  now  common  in  homes,  hotels,  coffee 
shops,  airports,  and  many  other  places.  Networking  infrastruc¬ 
ture  is  spreading  throughout  the  world.  However,  heterogene¬ 
ity  in  network  conditions  across  both  space  and  time  still  re¬ 
mains  a  major  technical  challenge.  For  instance,  doing  web 
conferencing  when  some  participants  are  on  slow  dial-up  lines 
and  others  are  on  fast  advanced  networks  requires  special  co¬ 
ordination.  Tanenbaum  (2003)  is  a  good  resource  on  the  latest 
developments  in  networking. 

The  World  Wide  Web  and  its  associated  tools  and  standards 
have  had  a  major  impact  on  the  possibilities  for  groupware 
(Schatz  &  Hardin,  1994;  Berners-Lee,  1999).  Early  groupware 
mostly  consisted  of  stand-alone  applications  that  had  to  be 
downloaded  and  run  on  each  client  machine.  Increasingly, 
group  tools  are  written  for  the  web,  requiring  only  a  web 
browser  and  perhaps  some  plug-ins.  This  makes  it  much  easier 
for  the  user,  and  also  helps  with  matters  such  as  version  control. 
It  also  enables  better  interoperability  across  hardware  and  op¬ 
erating  systems. 

Security on the Internet is a major challenge for groupware. In some
sense the design of Internet protocols is to blame, since the Internet
grew up in a culture of openness and sharing (Longstaff et al., 1997;
Abbate, 1999; Tanenbaum, 2003). E-commerce and sensitive application
domains like medicine have been a driver for advances in security, but
there is still much progress to be made (Longstaff et al., 1997; Camp,
2000). The need to cope with firewalls that block access to certain
kinds of organizations can limit the flexibility of web conferencing.

Personal  computing  was  a  great  enabler  of  collaborative  ap¬ 
plications.  Today  we  are  liberated  from  the  desktop.  Laptops, 
personal  digital  assistants,  wearables,  and  cell  phones  provide 
access  to  information  and  people  from  almost  anywhere.  More 
and  more  applications  are  being  written  to  operate  across  these 
diverse  environments  (e.g.,  Tang  et  al.,  2001;  Starner  &  Rhodes, 
2004).  These  devices  vary  in  computational  power,  display  size 
and  characteristics,  network  bandwidth,  and  connection  relia¬ 
bility,  providing  interesting  technical  challenges  to  make  them 
all interoperate smoothly. For instance, accessing websites from a cell
phone requires special user interface methods to make the tiny displays
usable.

Additional  flexibility  is  being  provided  by  the  development  of 
infrastructure  that  lies  between  the  network  itself  and  the  ap¬ 
plications  that  run  on  client  workstations,  called  “middleware.” 
This  infrastructure  makes  it  easier  to  link  together  diverse  re¬ 
sources  to  accomplish  collaborative  goals.  For  instance,  the 
emerging  Grid  technologies  allow  the  marshalling  of  powerful, 
scattered  computational  resources  (Foster  &  Kesselman,  2004). 
Middleware  provides  such  services  as  identification,  authenti¬ 
cation,  authorization,  directories,  and  security  in  uniform  ways 
that  facilitate  the  interoperability  of  diverse  applications.  All  of 


these  technical  elements  are  components  of  cyberinfrastructure 
(Atkins  et  al.,  2003).  There  is  considerable  interest  in  the  devel¬ 
opment  of  this  infrastructure  because  of  its  large  impact  on  re¬ 
search,  education,  and  commerce. 


COMMUNICATION  TOOLS 


We  now  turn  to  a  review  of  specific  kinds  of  groupware,  high¬ 
lighting  their  various  properties  and  uses.  We  have  grouped  this 
review  under  several  broad  headings.  We  do  not  aim  to  be  ex¬ 
haustive,  but  rather  seek  to  illustrate  the  variety  of  kinds  of  tools 
that  have  emerged  to  support  human  collaborative  activities 
over  networked  systems.  We  also  highlight  various  research 
issues  pertaining  to  these  tools. 

E-mail 

E-mail has become a ubiquitous communication tool. The early adoption
of standards made it possible for messages to be exchanged across
networks and different base machines and software applications. E-mail
is now also sent and read from cell phones, PDAs, television sets, and
kiosks in public sites. Documents of many types can be easily
exchanged. Because of its widespread use, it is often called the first
successful groupware application (Sproull & Kiesler, 1991; Satzinger &
Olfman, 1992; O’Hara-Devereaux & Johnson, 1994; Anderson, Bikson, Law,
& Mitchell, 1995). Indeed, it has become so successful that e-mail
overload has become a major problem (Whittaker & Sidner, 1996). And of
course it has become a vector for viruses, worms, and other invasive
software.

Researchers have shown that this widespread use has had a number of
effects on how people behave. It has had large effects on communication
in organizations: it changes the social network of who talks to whom
(Sproull & Kiesler, 1991; DeSanctis, Jackson, Poole, & Dickson, 1996),
the power of people who formerly had little voice in decisions
(Finholt, Sproull, & Kiesler, 1990), and the tone of what is said and
how it is interpreted (Sproull & Kiesler, 1991). For example, with
e-mail, people who were shy found a voice; they could overcome their
reluctance to speak to other people by composing text rather than
speaking face to face. This invisibility, however, also has a more
general effect: without the social cues in the recipient’s face being
visible to the sender, people will “flame,” that is, send harsh or
extremely emotive (usually negative) messages (Arrow et al., 1996;
Hollingshead, McGrath, & O’Connor, 1993).

As with a number of other “designed” technologies, people use e-mail
for things other than the original intent. People use it for managing
time, reminding them of things to do, and keeping track of steps in a
workflow (Mackay, 1989; Carley & Wendt, 1991; Whittaker & Sidner,
1996). But because e-mail was not designed to support these tasks, it
does not support them very well; people struggle with reading signals
about whether they have replied or not (and to whom a message was
cc’d); they manage folders poorly for reminding them to do things, and
so forth.

In addition, because e-mail is so widespread, and it is easy and free
to distribute a single message to many people, people experience
information overload. Many people get hundreds of e-mail messages each
day, many of them mere broadcasts of things for sale or events about to
happen, much like “classifieds” in the newspaper. Several early efforts
to use artificial intelligence techniques to block and/or sort incoming
e-mail were tried, and this has continued to be a very active area of
work (Malone, Grant, Lai, Rao, & Rosenblitt, 1987; Winograd, 1988).
There are two broad classes of uses of e-mail filters. One use is to
automatically sort incoming mail into useful categories. This is
relatively easy for mail that has simple properties, such as a person’s
name. It is more difficult for subtle properties. The other major use
is to weed out unwanted mail, such as spam. The state of the art in
spam filtering was in the range of 80-90% effectiveness in 2005 (e.g.,
Federal Trade Commission, 2005). Such filters are so good that many
institutions automatically filter mail as it comes in to the
organization’s gateway, sparing users the need to do it in their own
clients. Similarly, many clients now come with built-in spam filters
that can be tuned by the user (e.g., Google’s Gmail).
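
The “simple properties” case of sorting mail can be sketched in a few
lines of Python. The message format (a dictionary with a `from` field),
the rule format, and the function name are assumptions made for this
illustration. Real mail clients, and the spam filters mentioned above,
use considerably more elaborate methods.

```python
def sort_mail(messages, rules, default_folder="inbox"):
    """Put each message in the folder of the first rule matching its sender.

    rules is a list of (sender_text, folder) pairs; a rule matches when
    sender_text occurs anywhere in the message's "from" field, ignoring case.
    """
    folders = {}
    for message in messages:
        sender = message["from"].lower()
        folder = default_folder
        for sender_text, rule_folder in rules:
            if sender_text.lower() in sender:
                folder = rule_folder
                break
        folders.setdefault(folder, []).append(message)
    return folders
```

A rule such as `("lee", "project")` files everything from a sender
whose name or address contains “lee” under “project”. Subtler
properties, such as whether a message needs a reply, cannot be captured
by this kind of string match, which is the difficulty the text points
to.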

These  problems  have  led  to  the  “reinvention”  of  e-mail 
(Whittaker, Bellotti, & Moody, 2005). For example, given that e-mail
is often used in the context of managing projects, systems
have  been  explored  that  have  a  more  explicit  scheme  for  task 
management  (Whittaker,  2005;  Bellotti,  Ducheneaut,  Howard, 
Smith,  &  Grinter,  2005).  To  deal  with  problems  of  e-mail  over¬ 
load,  new  schemes  for  filtering  e-mail  have  been  explored,  such 
as  routing  messages  differently  to  different  kinds  of  clients 
(e.g.,  cell  phone  vs.  desktop  machine;  see  Schmandt  &  Marti, 
2005).  Another  approach  has  been  to  explore  pricing  mecha¬ 
nisms  for  e-mail  that  are  analogous  to  pricing  for  regular  mail 
(Kraut,  Sunder,  Telang,  &  Morris,  2005).  In  such  schemes  one 
would  pay  to  send  e-mail,  with  higher  prices  presumably  indi¬ 
cating  higher  priority,  analogous  to  the  difference  between 
first  class  postage  and  bulk  rates.  These  schemes  are  currently 
exploratory  but  are  likely  to  result  in  new  options  in  future 
e-mail  clients. 

Kraut  et  al.  (1998)  reported  that  greater  Internet  use,  which 
in  their  sample  was  mostly  e-mail,  led  to  declines  in  social  in¬ 
teractions  with  family  members  and  an  increase  in  depression 
and  loneliness.  Not  surprisingly,  these  results  triggered  widespread
discussion  and  debate,  both  over  the  substance  of  the  results
and  the  methods  used  to  obtain  them.  Kraut,  Gergle,  and
Fussell  (2002)  reported  new  results  that  suggest  these  initial 
negative  effects  may  not  persist.  Interpersonal  communication 
is  one  of  the  principal  uses  of  the  Internet,  and  the  possible  im¬ 
plications  of  this  kind  of  communication  for  social  life  are  im¬ 
portant  to  understand.  Indeed,  Putnam  (2000)  has  wondered 
whether  the  Internet  can  be  a  source  of  social  cohesiveness. 
These  kinds  of  questions  need  to  be  addressed  by  additional 
large-scale  studies  of  the  kind  carried  out  by  Kraut  and  his  col¬ 
leagues  (see  Resnick,  2002). 

Conferencing  Tools:  Voice,  Video,  Text 

There  are  many  options  available  today  for  on-line  conferencing 
among  geographically  dispersed  members  of  a  group.  So-called 
computer-mediated  communication  (CMC)  has  become  widespread.
There  are  three  principal  modes  of  interaction,  but  each
has  numerous  subtypes:

Video  +  Audio
    Full-scale  video  conferencing  room;  many  options  for  specific  design
    Individual  desktop  video;  many  options  for  quality,  interface

Audio
    Phone  conference
    Voice  over  IP  (see  Federal  Communications  Commission,  2005)

Text
    Instant  messaging,  chat,  SMS  on  mobile  phones

There  are  many  studies  that  compare  face-to-face  (FTF)  interaction
with  various  forms  of  CMC.  There  are  some  clear  generalizations  from
such  work.  The  main  one  is  that  CMC  is  more  difficult  to  do  than
FTF,  and  requires  more  preparation  and  care  (Hollingshead
et  al.,  1993;  McLeod,  1992;  Olson,  Olson,  &  Meader,  1995;  Siegel,
Dubrovsky,  Kiesler,  &  McGuire,  1986;  Straus,  1996,  1997;  Straus 
&  McGrath,  1994).  A  variety  of  things  that  come  for  free  in  FTF 
are  either  difficult  to  support  or  outright  missing  in  CMC  (Kiesler 
&  Cummings,  2002).  Backchannel  communication,  which  is  im¬ 
portant  for  modulating  conversation,  is  either  weak  or  nonexis¬ 
tent  in  CMC.  Paralinguistic  cues  that  can  soften  communication 
are  often  missing.  Participants  in  CMC  tend  to  have  an  informa¬ 
tional  focus,  which  means  there  is  usually  less  socializing,  less 
small  talk.  Over  time  this  can  lead  to  poorer  social  integration 
and  organizational  effectiveness  (Nohria  &  Eccles,  1992). 

CMC  often  introduces  delay.  This  is  well-known  to  be  very 
disruptive  to  communication  (Egido,  1988;  Krauss  &  Bricker, 
1966;  O’Conaill,  Whittaker,  &  Wilbur,  1993;  Ruhleder  &  Jordan,
2001;  Tang  &  Isaacs,  1993).  Participants  will  communicate  less 
information,  be  more  frustrated  with  the  communication,  and 
actually  terminate  communication  sessions  sooner.  Delay  can 
be  managed,  but  it  takes  special  care  among  the  participants 
and  turn-taking  widgets  in  the  interface  of  the  tools  being  used. 
For  instance,  if  there  is  delay,  then  full-duplex  open  communi¬ 
cation  will  not  work,  since  participants  will  step  all  over  each 
other’s  communication.  Either  the  participants  must  use  a  so¬ 
cial  protocol  (e.g.,  like  that  used  in  radio  communications  with 
spacecraft),  or  they  must  employ  a  mike-passing  procedure  with 
interface  indications  of  who  wants  to  talk  next. 
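
The mike-passing procedure mentioned above can be sketched as a small floor-control queue. This is a minimal illustration, assuming a single shared microphone and first-come, first-served ordering; the class and method names are hypothetical and do not come from any of the systems cited.

```python
from collections import deque
from typing import Optional


class FloorControl:
    """One speaker at a time; others queue up to talk next."""

    def __init__(self) -> None:
        self.speaker: Optional[str] = None
        self.waiting: deque = deque()

    def request(self, who: str) -> None:
        """Ask for the microphone; take it at once if it is free."""
        if self.speaker is None:
            self.speaker = who
        elif who != self.speaker and who not in self.waiting:
            self.waiting.append(who)

    def release(self, who: str) -> None:
        """Give up the microphone and pass it to the next in line."""
        if who == self.speaker:
            self.speaker = self.waiting.popleft() if self.waiting else None

    def status(self) -> str:
        """Text an interface could show to every participant."""
        queue = ", ".join(self.waiting) or "nobody"
        return f"speaking: {self.speaker or 'nobody'}; waiting: {queue}"
```

The `status` string plays the role of the interface indication: every participant can see who holds the floor and who wants to talk next, so nobody needs to interrupt over a delayed channel.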

"While  it  might  seem  desirable  to  always  have  the  maximum 
communication  and  tool  support  possible,  it  is  not  always  pos¬ 
sible  or  even  necessary  to  do  so.  Research  shows  that  effective 
real-time  collaboration  can  take  place  under  a  number  of  differ¬ 
ent  arrangements,  depending  on  the  task,  the  characteristics  of 
the  participants,  the  specific  geographical  dispersion  of  the 
participants,  and  the  processes  employed  to  manage  the  inter¬ 
actions.  There  are  also  organizational  effects,  especially  when 
the  real-time  collaborations  are  embedded  in  ongoing  activities, 
as  they  almost  always  are. 

For  instance,  early  work  (Williams,  1977)  showed  that,  in 
referential  communication  tasks,  full-duplex  audio  is  just  as  effective
as  FTF.  Subsequent  research  comparing  audio  and  video
conferencing  (see  Finn,  Sellen,  &  Wilbur,  1997;  Cadiz  et  al.,  2000,
found  similar  results  for  a  tutored  video-instruction  task)  showed
that  for  many  tasks  audio  is  sufficient:  video  adds  nothing  to
task  effectiveness,  though  participants  usually  report  that  they
are  more  satisfied  with  video.  There
are  important  exceptions,  however.  Negotiation  tasks  are 
more  effective  with  video  (Short,  Williams,  &  Christie,  1976). 
This  is  probably  because  the  more  subtle  visual  cues  to  the 
participants’  intentions  are  important  in  this  kind  of  task.  Fur¬ 
ther,  Veinott,  Olson,  Olson,  and  Fu  (1999)  found  that  when 
participants  have  less  common  ground  video  helps.  In  their 
case,  participants  were  non-native  speakers  of  English  who 
were  doing  the  task  in  English.  For  native  speakers,  video  was 
no  better  than  audio,  but  non-native  speakers  did  better  when 
they  had  video.  Again,  visual  cues  to  comprehension  and  mean¬ 
ing  likely  played  an  important  role.  Recently,  an  experimental 
study  by  Daly-Jones,  Monk,  and  Watts  (1998)  showed  that 
high-quality  video  resulted  in  greater  conversational  fluency 
over  just  high-quality  audio,  especially  as  group  size  increased. 
There  was  also  a  higher  rated  sense  of  presence  in  the  video 
conditions. 

An  important  lesson  to  draw  from  this  literature  is  that  there 
are  two  broad  classes  by  which  we  might  assess  whether  video 
is  important  in  real-time  collaboration.  On  the  one  hand,  except 
for  tasks  like  negotiation  or  achieving  common  ground,  groups 
are  able  to  get  their  work  done  effectively  with  just  high-quality 
audio.  However,  for  things  like  satisfaction,  conversational  flu¬ 
ency,  and  a  sense  of  presence,  video  adds  value.  These  kinds  of 
factors  might  be  very  important  for  long-term  organizational 
consequences  like  employee  satisfaction.  As  of  yet,  no  long¬ 
term  studies  have  been  done  to  examine  this  conjecture. 

Audio  quality  is  critical.  Ever  since  an  early  literature  review
(Egido,  1988),  it’s  been  reported  over  and  over  again  that  if  the 
audio  is  of  poor  quality  participants  will  develop  a  workaround. 
For  instance,  if  the  audio  in  a  video  conferencing  system  or  in  a 
web  conferencing  system  is  poor  quality,  participants  will  turn 
to  a  phone  conference. 

The  social  ergonomics  of  audio  and  video  are  also  keys  to 
their  success.  Many  of  the  failures  of  audio  conferencing,  espe¬ 
cially  over  the  Internet,  result  from  poor-quality  microphones, 
poor  microphone  placement,  poor  speakers,  and  interfering 
noises  like  air  conditioning.  Getting  these  details  right  is  essen¬ 
tial.  Similarly,  for  video,  camera  placement  can  matter  a  lot.  For 
instance,  Huang,  Olson,  and  Olson  (2002)  found  that  a  camera 
angle  that  makes  a  person  seem  tall  (as  opposed  to  actually  be¬ 
ing  tall)  affects  how  influential  a  person  is  in  a  negotiation  task. 
Apparent  height  matters  a  lot.  Other  aspects  of  camera  place¬ 
ment  or  arrangement  of  video  displays  make  a  big  difference  as 
well,  but  are  not  well  known. 

An  exception  is  eye  contact,  where  studies  of  FTF  communi¬ 
cation  show  that  eye  contact  is  a  key  linguistic  and  social  media¬ 
tor  of  communication  (Argyle  &  Cook,  1976;  Kendon,  1967).  It  is 
very  difficult  to  achieve  eye  contact  in  CMC  systems.  Many  at¬ 
tempts  have  been  made  (Gale  &  Monk,  2000;  Grayson  &  Monk, 
2003;  Monk  &  Gale,  2002;  Okada,  Maeda,  Ichicawaa,  &  Matsushita,
1994;  Vertegaal,  1999;  Vertegaal,  Slagter,  van  der  Veer,  &
Nijholt,  2001),  and  at  least  the  subjective  reports  are  that  these
can  be  effective.  But  these  all  require  special  equipment  or  setups.
And  they  don’t  scale  very  well  to  multiparty  sessions.

While  for  most  situations  having  at  least  high-quality  audio
is  essential,  there  are  some  special  cases  where  a  text-based 
channel,  like  chat  or  instant  messaging,  can  work  fine.  For  in¬ 
stance,  in  the  Upper  Atmospheric  Research  Collaboratory 
(UARC,  later  known  as  the  “Space  Physics  and  Aeronomy  Re¬ 
search  Collaboratory”  or  SPARC),  a  chat  system  worked  very 
well  for  carrying  out  geographically  distributed  observational 
campaigns,  since  the  flow  of  events  in  these  campaigns  was
relatively  slow  (campaigns  went  on  for  several  days,  key  events 
would  take  many  minutes  to  unfold).  McDaniel,  Olson,  and 
Magee  (1996)  compared  chat  logs  with  earlier  FTF  conversa¬ 
tions  at  a  remote  site  and  found  many  elements  of  them  very 
similar,  including  informal  socializing.  But  this  kind  of  ongoing 
scientific  campaign  is  very  unlike  the  interactions  that  take  place 
in  a  typical  meeting. 

Instant  messaging  is  a  new  communication  modality  that  is 
making  substantial  inroads  into  organizations.  Muller,  Raven, 
Kogan,  Millen,  and  Carey  (2003)  found  in  a  survey  study  of  three 
organizations  that  the  introduction  of  instant  messaging  led  to 
significantly  less  use  of  such  communication  channels  as  e-mail, 
voice-mail,  telephone,  teleconference,  pager,  and  face-to-face. 
They  also  found  that  instant  messaging  was  used  for  “substan¬ 
tive  business  purposes.”  Furthermore,  in  one  of  the  organiza¬ 
tions  where  they  surveyed  users  after  24  months  of  usage  they 
found  that  the  substantive  reasons  for  using  IM  increased.  In  a 
study  of  IM  logs  in  an  organization,  Isaacs,  Walendowski,  Whittaker,
Schiano,  and  Kamm  (2002)  found  that  a  large  proportion
of  IM  conversations  involved  “complex  work  discussions.”  They
found  that  IM  users  seldom  switched  to  another  communication
channel  once  they  were  engaged  in  IM.  Nardi,  Whittaker,
and  Bradner  (2000)  observed  in  a  field  study  that  workers  used 
IM  for  a  variety  of  purposes,  not  just  for  information  exchange. 
Such  matters  as  quick  questions,  scheduling,  organizing  social 
interactions,  and  keeping  in  touch  with  others  were  common 
uses  of  IM.  Thus,  IM  has  emerged  as  a  significant  communica¬ 
tion  medium  in  the  workplace,  and  is  used  even  when  other, 
richer  communication  channels  are  available.

While  IM  is  a  relatively  new  phenomenon  in  the  workplace,
it  is  clearly  established  as  a  useful  and  widely  used  tool  outside 
the  workplace.  This  will  undoubtedly  assist  in  the  development 
of  more  sophisticated  versions  of  the  tool,  as  well  as  its  integra¬ 
tion  into  on-line  conferencing  systems.  There  is  clearly  much 
promise  here.  We  have  noticed,  for  example,  that  during  online 
conferences  IM  or  chat  serves  as  a  backchannel  for  side  conver¬ 
sations  or  debugging,  an  extremely  useful  adjunct  to  the  core  au¬ 
dio  or  video  communication  taking  place  in  such  conferences. 

The  other  key  feature  of  successful  remote  meetings  is  the 
ability  to  share  the  objects  they  are  talking  about,  such  as  the 
agenda,  the  to-do  list,  the  latest  draft  of  a  proposal,  a  view  of  an 
object  to  be  repaired,  and  so  on.  Many  researchers  (Fussell, 
Kraut,  &  Siegel,  2000;  Karsenty,  1999;  Kraut,  Fussell,  &  Siegel, 
2003;  Kraut  et  al.,  2002;  Luff,  Heath,  Kuzuoka,  Hindmarsh,  & 
Oyama,  2003;  Nardi  et  al.,  1993;  Whittaker,  2003;  Whittaker, 
Geelhoed,  &  Robinson,  1993)  have  provided  experimental  evi¬ 
dence  of  the  value  of  a  shared  workspace  for  synchronous  audio- 
supported  collaboration.  More  traditional  video  conferencing
technologies  often  offer  an  “object  camera,”  onto  which  the  participants
can  put  a  paper  agenda,  PowerPoint  slides,  or  a  manufactured
part.  More  generally,  any  form  of  video  can  also  be  used
to  share  work  objects  (Fussell  et  al.,  2000;  Nardi  et  al.,  1993).  For 
digital  objects,  there  are  now  a  number  of  products  that  will 
allow  meeting  participants  to  share  the  screen  or,  in  some  cases, 
the  remote  operation  of  an  application.  Some  companies  are 
using  electronic  whiteboards,  both  in  a  collocated  meeting  and 
in  remote  meetings  to  mimic  the  choreography  of  people  using 
a  physical  whiteboard.  In  some  “collaboratories,”  scientists  can 
even  operate  remote  physical  instruments  from  a  distance  and 
jointly  discuss  the  results. 

Blogs 

Weblogs,  more  commonly  called  “blogs,”  have  burst  upon
the  Internet  scene  in  recent  years.  Blogging  software  that  makes 
it  easy  to  put  up  multimedia  content  has  led  people  to  set  up 
sites  for  all  manner  of  purposes.  A  site  can  contain  text,  pictures, 
movies,  and  audio  clips.  A  common  social  purpose  is  to  keep 
an  on-line  diary.  Another  is  to  provide  commentary  on  a  topic  of 
interest.  For  instance,  blogs  played  a  major  role  in  the  2004  U.S.
election  (Adamic  &  Glance,  2005).  Nardi,  Schiano,  Gumbrecht,  and
Swartz  (2004)  studied  why  people  blog,  as  it’s  sometimes  puz¬ 
zling  that  people  would  essentially  share  personal  or  private  in¬ 
formation  about  themselves  through  the  web. 


COORDINATION  SUPPORT 


Meeting  Support 

An  early  and  popular  topic  in  CSCW  was  the  support  of  face-to- 
face  meetings.  A  number  of  systems  were  developed  and  tested. 
While  of  late  the  focus  has  shifted  to  the  support  of  geographi¬ 
cally  distributed  meetings,  the  early  work  on  meeting  support 
led  to  some  important  and  useful  conclusions. 

Some  meeting-support  software  imposed  structure  on  the 
process  of  the  meeting,  embodying  various  brainstorming  and 
voting  procedures.  Group  Decision  Support  Systems  (GDSSs) 
arose  from  a  number  of  business  schools,  focusing  on  large 
meetings  of  stakeholders  intent  on  going  through  a  set  series 
of  decisions,  such  as  prioritizing  projects  for  future  funding 
(Nunamaker,  Dennis,  Valacich,  Vogel,  &  George,  1991).  With  the 
help  of  a  facilitator  and  some  technical  support,  the  group  was 
led  through  a  series  of  stages:  brainstorming  without  evaluating, 
evaluating  alternatives  from  a  variety  of  positions,  prioritizing  al¬ 
ternatives,  and  so  on.  These  meetings  were  held  in  specialized 
rooms  in  which  individual  computers  were  embedded  in  the 
tables,  networked  to  central  services,  and  summary  displays 
shown  “center  stage.”  A  typical  scenario  involved  individuals 
silently  entering  ideas  into  a  central  repository,  and  after  a  cer¬ 
tain  amount  of  time,  they  were  shown  ideas  one  at  a  time  from 
others  and  asked  to  respond  with  a  new  idea  triggered  by  that 
one.  Later,  these  same  ideas  were  presented  to  the  individuals 
who  were  then  asked  to  rank  or  rate  them  according  to  some
fixed  criterion,  like  cost.  Aggregates  of  individuals’  opinions
were  computed,  discussed  further  and  presented  for  vote.  The 
system  applied  computational  power  (for  voting  and  rating 
mechanisms),  and  networking  control  (for  parallel  input)  to 
support  typically  weak  aspects  of  meetings.  These  systems  were 
intended  to  gather  more  ideas  from  participants,  since  one  didn’t 
have  to  wait  for  another  to  stop  speaking  in  order  to  get  a  turn. 
In  addition,  anonymous  voting  and  rating  were  intended  to  ensure  equal
participation  that  was  not  dominated  by  those  in  power.
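
The aggregation step described above can be sketched as follows. This is an illustration only, assuming numeric ratings on a hypothetical 1-5 scale and a simple average; the function name is invented here, and the actual GDSS products may have aggregated opinions quite differently.

```python
from statistics import mean


def aggregate_ratings(ballots):
    """Combine anonymous ballots into one ranking.

    Each ballot maps an idea to a rating (here 1-5, higher is better).
    Ballots carry no participant names, so the summary cannot reveal
    who rated what.
    """
    ideas = {idea for ballot in ballots for idea in ballot}
    averages = {
        idea: mean(b[idea] for b in ballots if idea in b) for idea in ideas
    }
    # Highest average first; ties broken alphabetically for stable output.
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because every ballot is entered in parallel and stripped of identity before aggregation, the summary reflects the group rather than its most senior or most talkative member.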

Evaluations  of  these  GDSSs  have  been  reviewed,  producing
some  generalizations  about  their  value  (McLeod,  1992;  Kraemer 
&  Pinsoneault,  1990;  Hollingshead  et  al.,  1993).  The  systems  in¬ 
deed  fulfill  their  intentions  of  producing  more  ideas  in  brain¬ 
storming  and  having  more  evaluative  comments  because  of 
anonymity.  Decisions  are  rated  as  higher  in  quality,  but  the 
meetings  take  longer  and  the  participants  are  less  satisfied  than 
those  in  traditional  meetings. 

A  second  class  of  technologies  to  support  real-time  meetings 
is  less  structured,  more  similar  to  individual  workstation  support.
In  these  systems,  group  members  share  access  to  a  single
document  or  drawing,  and  can  enter  and  edit  content  in  it  simultaneously
at  will.  Different  systems  enforce  different  “locking”
mechanisms  (e.g.,  paragraph  or  selection  locking)  so  that  one 
person  does  not  enter  while  another  deletes  the  same  thing 
(Ellis,  Gibbs,  &  Rein,  1991).  Some  also  allow  parallel  individual 
work,  where  participants  view  and  edit  different  parts  of  the 
same  document,  but  can  also  view  and  discuss  the  same  part  as 
well.  This  kind  of  unstructured  shared  editor  has  been  shown  to 
be  very  effective  for  certain  kinds  of  free-flowing  meetings,  like 
design  or  requirements  meetings  (Olson,  Olson,  Storrosten,  & 
Carter,  1993).  The  rated  quality  of  the  meeting  products  (e.g.,  a 
requirements  document  or  plan)  was  higher  when  using  these 
technologies  than  with  traditional  whiteboard  or  paper-and- 
pencil  support,  but  like  working  in  GDSSs,  people  were  slightly 
less  satisfied.  The  lower  satisfaction  here  and  with  GDSSs  may 
reflect  the  newness  of  the  technologies;  people  may  not  have 
yet  learned  how  to  persuade,  negotiate,  or  influence  each  other 
in  comfortable  ways,  to  harness  the  powers  inherent  in  the  new 
technologies. 
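
Paragraph locking, one of the mechanisms mentioned above, can be sketched in a few lines. This is a minimal illustration, assuming paragraphs are identified by integer ids and that all participants talk to one in-memory lock table; the names are hypothetical and are not taken from the systems cited.

```python
class ParagraphLocks:
    """Tracks which participant, if any, is editing each paragraph."""

    def __init__(self) -> None:
        self._owner = {}  # paragraph id -> participant holding the lock

    def acquire(self, paragraph: int, who: str) -> bool:
        """Lock a paragraph for editing; fail if someone else holds it."""
        holder = self._owner.setdefault(paragraph, who)
        return holder == who

    def release(self, paragraph: int, who: str) -> None:
        """Unlock a paragraph, but only for the participant holding it."""
        if self._owner.get(paragraph) == who:
            del self._owner[paragraph]
```

Locking at the paragraph level lets two people edit different parts of the document in parallel while still preventing one from typing into text that another is deleting.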

These  new  technologies  did  indeed  change  the  way  in  which 
people  worked.  They  talked  less  and  wrote  more,  building  on 
each  other’s  ideas  instead  of  generating  far-reaching  other 
ideas.  The  tool  seemed  to  focus  the  groups  on  the  core  ideas, 
and  keep  them  from  going  off  on  tangents.  Many  participants 
reported  really  liking  doing  work  in  the  meetings  rather  than 
spending  time  only  talking  about  the  work. 

A  third  class  of  meeting  room  support  appears  in  electronic
whiteboards.  For  example,  the  LiveBoard  (Elrod  et  al.,
1992),  SoftBoard,  and  SmartBoard  are  approximately  4'  ×  6'  rear-projection
surfaces  that  allow  pen  input,  much  the  way  a  whiteboard
or  flipchart  does.  People  at  Xerox  PARC  and  Boeing  have
evaluated  the  use  of  these  boards  in  meetings  in  extended  case 
studies.  In  both  cases,  the  board  was  highly  valued  because  of 
its  computational  power  and  the  fact  that  all  could  see  the 
changes  as  they  were  made.  At  both  sites,  successful  use  required 
a  facilitator  who  was  familiar  with  the  applications  running  to 
support  the  meeting.  At  Xerox,  suggestions  made  in  the  meet¬ 
ing  about  additional  functionality  were  built  into  the  system  so
that  it  eventually  was  finely  tuned  support  for  their  particular
needs  (Moran  et  al.,  1996).  For  example,  they  did  a  lot  of  list 
making  of  freehand  text  items.  Eventually,  the  board  software 
recognized  the  nature  of  a  list  and  an  outline,  with  simple  ges¬ 
tures  changing  things  sensibly.  For  example,  if  a  freehand  text 
item  was  moved  higher  in  a  list,  the  other  items  adjusted  their 
positions  to  make  room  for  it.  The  end  product  was  not  only  a 
set  of  useful  meeting  tools,  but  also  a  toolkit  to  allow  people  to 
build  new  meeting  widgets  to  support  their  particular  tasks. 

Meetings  are  important,  though  often  despised,  organiza¬ 
tional  activities.  Laboratory  research  of  the  kind  just  reviewed 
has  shown  quite  clearly  that  well-designed  tools  can  improve 
both  work  outcomes  and  participant  satisfaction.  However, 
meetings  in  organizations  seldom  use  such  tools.  Inexpensive 
mobile  computing  and  projection  equipment  combined  with 
many  commercial  products  mean  that  such  tools  are  within 
reach  of  most  organizations.  But  not  having  these  elements 
readily  available  in  an  integrated  way  probably  inhibits  their 
widespread  adoption. 

While  traditional  meetings  are  often  viewed  as  wasteful  and 
frustrating,  there  can  be  huge  benefits  to  working  together  in 
collocated  environments.  Kiesler  and  Cummings  (2002)  reviewed 
a  number  of  the  characteristics  of  physical  collocation  that  can 
benefit  performance.  In  a  detailed  study  of  one  such  situation, 
Teasley,  Covi,  Krishnan,  and  Olson  (2002)  found  that  “radical  col¬ 
location,”  in  which  software  development  teams  worked  to¬ 
gether  in  a  dedicated  project  room  for  many  weeks,  dramatically 
improved  their  productivity.  Reasons  for  this  included  the  con¬ 
stant  awareness  of  each  other’s  work  status,  the  associated  abil¬ 
ity  to  instantly  work  on  an  impasse  as  a  group,  and  the  availability 
of  rich  shared  artifacts  generated  by  the  project. 

Workflow 

Workflow  systems  lend  technology  support  to  coordinated 
asynchronous  (usually  sequential)  steps  of  activities  among 
team  members  working  on  a  particular  task.  For  example,  a 
workflow  system  might  route  a  travel  reimbursement  voucher 
from  the  traveler  to  the  approving  party  to  the  accounts  payable 
to  the  bank.  The  electronic  form  would  be  edited  and  sent  to 
the  various  parties,  their  individual  to-do-lists  updated  as  they 
received  and/or  completed  the  tasks,  and  permissions  and  ap¬ 
proval  granted  automatically  as  appropriate  (e.g.,  allowing  small 
charges  to  an  account  if  the  charges  had  been  budgeted  previ¬ 
ously  or  simply  if  there  was  enough  money  in  the  account).  Not 
only  is  the  transaction  flow  supported,  but  also  records  are  often
kept  about  who  did  what  and  when  they  did  it.  It  is  this  latter
feature  that  has  potentially  large  consequences  for  the  people
involved,  discussed  later.
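
The travel-voucher example above can be sketched as a small routing model. This is an illustration under stated assumptions: the route, the auto-approval threshold of 100.00, and all names are hypothetical. The log records who did what; a real system would presumably also record when, which is omitted here to keep the sketch deterministic.

```python
from dataclasses import dataclass, field

# Hypothetical route and threshold, chosen only for illustration.
ROUTE = ["traveler", "approver", "accounts payable", "bank"]
AUTO_APPROVE_LIMIT = 100.00


@dataclass
class Voucher:
    amount: float
    step: int = 0                             # index into ROUTE
    log: list = field(default_factory=list)   # who did what, in order

    @property
    def holder(self) -> str:
        return ROUTE[self.step]

    def advance(self, actor: str) -> None:
        """Record the handoff and move the voucher to the next party."""
        if actor != self.holder:
            raise PermissionError(f"{actor} does not hold the voucher")
        if self.step == len(ROUTE) - 1:
            raise ValueError("voucher already at the end of the route")
        self.log.append((actor, "forwarded"))
        self.step += 1
        # Small charges skip manual approval.
        if self.holder == "approver" and self.amount <= AUTO_APPROVE_LIMIT:
            self.log.append(("system", "auto-approved"))
            self.step += 1
```

Note that the same `log` that documents the transaction also makes every handoff attributable to a person, which is the monitoring capability at issue.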

These  workflow  systems  were  often  the  result  of  work  reengineering
efforts,  focusing  on  making  the  task  take  less  time  and
eliminating  the  work  that  could  be  automated.  Workflow  systems
therefore  have  a  bad  reputation,  not  only  because  they  often
are  part  of  workforce  reduction  plans,  but  also  because  the  work
of  those  who  remain  can  be  monitored  much  more  closely.  The  systems
are  often  very  rigid,  requiring,  for  example,  all  of  a  form  to
be  filled  in  before  it  can  be  handed  off  to  the  next  in  the  chain.
They  often  require  a  great  deal  of  rework  because  of  this  inflexibility.
It  is  because  of  the  inflexibility  and  the  potential  monitoring
that  the  systems  fall  into  disuse  (e.g.,  Abbott  &  Sarin,  1994).

The  fact  that  workflow  can  be  monitored  is  a  major  source 
of  user  resistance.  In  Europe,  such  monitoring  is  illegal,  and 
powerful  groups  of  organized  workers  have  made  sure  that  such 
capabilities  are  not  in  workflow  systems  (Prinz  &  Kovenbach, 
1996).  In  the  United  States,  it  is  not  illegal,  but  many  employ¬ 
ees  complain  about  its  inappropriate  use.  For  example,  in  one 
software  engineering  team  where  workflow  had  just  been  intro¬ 
duced  to  track  bug  reports  and  fixes,  people  in  the  chain  were 
sloppy  about  noting  who  they  had  handed  a  piece  of  work  off  to. 
When  it  was  discovered  that  the  manager  had  been  monitoring 
the  timing  of  the  handoffs  to  assign  praise  or  blame,  the  team 
members  were  justifiably  upset  (Olson  &  Teasley,  1996).  In  general,
managerial  monitoring  is  a  feature  that  is  not  well  received
by  people  being  monitored  (Markus,  1983).  If  such  monitoring 
is  mandated,  workers’  behavior  will  conform  to  the  specifics  of 
what  is  being  monitored  (e.g.,  time  to  pass  an  item  off  to  the 
next  in  the  chain)  rather  than  perhaps  to  what  the  real  goal  is 
(e.g.,  quality  as  well  as  timely  completion  of  the  whole  process). 

Group  Calendars 

A  number  of  organizations  have  now  adopted  online  calendars, 
mainly  in  order  to  view  people’s  schedules  to  arrange  meetings. 
The  calendars  also  allow  a  form  of  awareness,  allowing  people 
to  see  if  a  person  who  is  not  present  is  expected  back  soon.  Individuals
benefit  only  insofar  as  they  offload  scheduling  meetings
to  others,  such  as  an  administrative  assistant,  who  can  write
as  well  as  read  the  calendar.  And,  in  some  systems  the  individual 
can  schedule  private  time,  blocking  the  time  but  not  revealing 
to  others  his  or  her  whereabouts.  By  this  description,  on-line 
calendaring  is  a  classic  case  of  what  Grudin  (1988)  warned 
against,  a  misalignment  of  costs  and  benefits;  the  individual  puts 
in  the  effort  to  record  his/her  appointments  so  that  another,  in 
this  case  a  manager  or  coworker,  can  benefit  from  ease  of  sched¬ 
uling.  However,  since  the  early  introduction  of  electronic  cal¬ 
endaring  systems,  many  organizations  have  found  successful 
adoption  (Mosier  &  Tammaro,  1997;  Grudin  &  Palen,  1995;  Palen 
&  Grudin,  2002).  Apparently  such  success  requires  a  culture  of 
sharing  and  accessibility,  something  that  exists  in  some  organi¬ 
zations  and  not  others  (Lange,  1992;  Ehrlich,  1987). 
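
Two of the calendar features described above, private time that blocks a slot without revealing details and viewing schedules to arrange a meeting, can be sketched as follows. This is a minimal illustration, assuming whole-hour entries within a single day; the `Entry` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start: int          # hour of day, 24-hour clock
    end: int            # exclusive
    title: str
    private: bool = False


def shared_view(entries):
    """What a coworker sees: private entries show as busy, no details."""
    return [
        (e.start, e.end, "Busy" if e.private else e.title) for e in entries
    ]


def common_free_hours(calendars, day_start=9, day_end=17):
    """Hours in the working day when nobody has an entry."""
    busy = {
        hour
        for entries in calendars
        for e in entries
        for hour in range(e.start, e.end)
    }
    return [h for h in range(day_start, day_end) if h not in busy]
```

The scheduler's benefit comes entirely from `common_free_hours`, while the cost of keeping each `Entry` current falls on the calendar's owner, which is the misalignment of costs and benefits noted above.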

Awareness 

In  normal  work,  there  are  numerous  occasions  in  which  peo¬ 
ple  find  out  casually  whether  others  are  in  and,  in  some  cases, 
what  they  are  doing.  A  simple  walk  down  the  hall  to  a  printer  of¬ 
fers  numerous  glances  into  people’s  offices,  noting  where  their 
coats  are,  whether  others  are  talking,  whether  there  is  intense 
work  at  a  computer,  and  so  on.  This  kind  of  awareness  is  un¬ 
available  to  workers  who  are  remote.  Some  researchers  have 
offered  various  technology  solutions;  some  have  allowed  one  to 
visually  walk  down  the  hall  at  the  remote  location,  taking  a  five- 
second  glance  into  each  passing  office  (Bellotti  &  Dourish,
1997;  Fish,  Kraut,  Root,  &  Rice,  1993).  Another  similar  system,
called  “Portholes,”  provides  periodic  snapshots  instead  of  full-motion
video  (Dourish  &  Bly,  1992).  Because  of  privacy  implications,
these  systems  have  had  mixed  success.  The  places  in
which  this  succeeds  are  those  in  which  the  individuals  seem  to 
have  a  reciprocal  need  to  be  aware  of  each  other’s  presence, 
and  a  sense  of  cooperation  and  coordination.  A  contrasting  case 
is  the  instant  messaging  (IM)  system,  in  which  users  control
what  availability  state  they  advertise  to  their  partners.
The  video  systems  are  much  more  lightweight
for  the  user  but  more  intrusive;  the  IM  ones  give  the  user
more  control  but  require  deliberate  action.  Another  approach
investigated  by  Ackerman,  Starr,  Hindus,  and  Mainwaring  (1997) 
looked  at  shared  audio  as  an  awareness  tool,  though  this  too 
has  privacy  implications. 

As  mentioned  earlier,  instant  messaging  systems  provide  an 
awareness  capability.  Most  systems  display  a  list  of  “buddies” 
and  whether  they  are  currently  on-line  or  not.  Nardi  et  al.  (2000) 
found  that  people  liked  this  aspect  of  IM  (see  also  Muller  et  al., 
2003;  Isaacs  et  al.,  2002).  And,  since  wireless  has  allowed  con¬ 
stant  connectivity  of  mobile  devices  like  PDAs,  this  use  of  track¬ 
ing  others  is  likely  to  grow.  But  again,  there  are  issues  of  moni¬ 
toring  for  useful  or  insidious  purposes,  and  the  issues  of  trust 
and  privacy  loom  large  (see  Godefroid,  Herbsleb,  Jagadeesan, 
&  Li,  2000). 
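
The buddy-list form of awareness described above, in which users decide what state to advertise, can be sketched as follows. This is an illustration only: the three status values, including an "invisible" state, and all names are assumptions rather than features of any particular IM product.

```python
from enum import Enum


class Status(Enum):
    AVAILABLE = "available"
    BUSY = "busy"
    INVISIBLE = "invisible"   # online but choosing not to advertise it


class Presence:
    def __init__(self) -> None:
        self._status = {}  # user -> Status they chose

    def set_status(self, user: str, status: Status) -> None:
        self._status[user] = status

    def sign_out(self, user: str) -> None:
        self._status.pop(user, None)

    def buddy_list(self, buddies):
        """What a viewer sees for each buddy."""
        shown = {}
        for name in buddies:
            status = self._status.get(name)
            if status is None or status is Status.INVISIBLE:
                shown[name] = "offline"
            else:
                shown[name] = status.value
        return shown
```

Every change of state requires an explicit `set_status` call by the user, which illustrates the trade-off noted earlier: more control than a video glance, at the price of deliberate action.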

Another  approach  to  signaling  what  one  is  doing  occurs  at 
the  more  micro  level.  And  again,  one  captures  what  is  easy  to
capture.  When  people  are  closely  aligned  in  their  work,  there
are  applications  that  allow  each  to  see  exactly  where  in  the 
shared  document  the  other  is  working  and  what  they  are  doing 
(Gutwin  &  Greenberg,  1999).  If  one  is  working  nearby  the 
other,  this  signals  perhaps  a  need  to  converse  about  the  direc¬ 
tions  each  is  taking.  Empirical  evaluations  have  shown  that  such 
workspace  awareness  can  facilitate  task  performance  (Gutwin  & 
Greenberg,  1999). 

Studies  of  attempts  to  carry  out  difficult  intellectual  work 
within  geographically  distributed  organizations  show  that  one 
of  the  larger  costs  of  geographical  distribution  is  the  lack  of 
awareness  of  what  others  are  doing  or  whether  they  are  even 
around  (Herbsleb,  Mockus,  Finholt,  &  Grinter,  2000).  Thus,  use¬ 
ful  and  usable  awareness  tools  that  mesh  well  with  trust  and  pri¬ 
vacy  concerns  could  be  of  enormous  organizational  importance. 
This  is  a  rich  research  area  for  CSCW.


INFORMATION  REPOSITORIES 


Repositories  of  Shared  Knowledge 

In  addition  to  sharing  information  generally  on  the  web,  in  both 
public  and  intranet  settings,  there  are  applications  that  are  ex¬ 
plicitly  built  for  knowledge  sharing.  The  goal  in  most  systems  is  to 
capture  knowledge  that  can  be  reused  by  others,  like  instruction 
manuals,  office  procedures,  training,  and  “boilerplates,”  or  tem¬ 
plates  of  commonly  constructed  genres,  like  proposals  or  bids. 
Experience  shows,  however,  that  these  systems  are  not  easy  wins. 
Again, similar to the case of the on-line calendaring systems
described above, the person entering information into the system
is not necessarily the one benefiting from it. In a large consulting
firm,  where  consultants  were  quite  competitive  in  their  bid  for  ad¬ 
vancement,  there  was  indeed  negative  incentive  for  giving  away 
one’s  best  secrets  and  insights  (Orlikowski  &  Gash,  1994). 

Sometimes  subtle  design  features  are  at  work  in  the  incen¬ 
tive  structure.  In  another  adoption  of  Lotus  Notes,  in  this  case 
to  track  open  issues  in  software  engineering,  the  engineers 
slowly  lost  interest  in  the  system  because  they  assumed  that 
their  manager  was  not  paying  attention  to  their  contributions 
and  use  of  the  system.  The  system  design,  unfortunately,  made 
the  manager’s  actual  use  invisible  to  the  team.  Had  they  known 
that  he  was  reading  daily  what  they  wrote  (though  he  never 
wrote  anything  himself),  they  would  likely  have  continued  to 
use the system (Olson & Teasley, 1996). A simple design change
that  would  make  the  manager’s  reading  activity  visible  to  the 
team  would  likely  have  significantly  altered  their  adoption. 

The  web  of  course  provides  marvelous  infrastructure  for  the 
creation  and  sharing  of  information  repositories.  A  variety  of 
tools  are  appearing  to  support  this.  Of  particular  interest  are 
open source tools that allow for a wider, more flexible infrastructure
for supporting information sharing (see www.sakai.org). One
major type of collaboratory (see below) provides shared data
repositories for a community of scientists. Systematic
research  on  the  use  of  such  tools  is  needed. 

Wikis 

A wiki is a shared web space that can be edited by anyone who
has access to it. Wikis were first introduced by Ward Cunningham
in 1995, but have recently become very popular. They can
be used in a variety of ways, both for work and for fun. The most
famous wiki is Wikipedia (www.wikipedia.org), an online encyclopedia
where anyone can generate and edit content. It has
grown to have millions of entries, and has versions in at least ten
languages. A recent study carried out by Nature found that for
science articles Wikipedia and the Encyclopedia Britannica were
about equally accurate (Giles, 2005). Bryant, Forte, and Bruckman
(2005) studied the contributors to Wikipedia, and suggested
that a new publishing paradigm was emerging. Viegas,
Wattenberg, and Dave (2004) developed imaginative visualizations
of Wikipedia authoring and editing behavior over time.

Capture  and  Replay 

Tools  that  support  collaborative  activity  can  create  traces  of  that 
activity  that  later  can  be  replayed  and  reflected  upon.  The  Up¬ 
per  Atmospheric  Research  Collaboratory  (UARC)  explored  the 
replay  of  earlier  scientific  campaign  sessions  (Olson  et  al.,  2001), 
so  that  scientists  could  reflect  upon  their  reactions  to  real-time 
observations of earlier phenomena. Using a VCR metaphor, they
could  pause  where  needed,  and  fast  forward  past  uninteresting 
parts.  This  reflective  activity  could  also  engage  new  players  who 
had  not  been  part  of  the  original  session.  Abowd  (1999)  has  ex¬ 
plored  such  capture  phenomena  in  an  educational  experiment 
called  Classroom  2000.  Initial  experiments  focused  on  reusing 
educational sessions during the term in college courses. We do
not yet fully understand the impact of such promising ideas.
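
The VCR-style replay described above can be sketched as a log of timestamped events with a movable playback position. This is only an illustrative sketch, not the UARC or Classroom 2000 implementation; the class name `SessionReplay`, its methods, and the sample events are assumptions made for the example.

```python
from bisect import bisect_left


class SessionReplay:
    """Replays a captured session: a list of (timestamp_seconds, description)."""

    def __init__(self, events):
        self.events = sorted(events)            # order events by timestamp
        self.times = [t for t, _ in self.events]
        self.position = 0.0                     # current playback time in seconds

    def play(self, seconds):
        """Advance playback and return the events that fall in that window."""
        start, end = self.position, self.position + seconds
        lo = bisect_left(self.times, start)
        hi = bisect_left(self.times, end)       # events at exactly `end` wait for the next call
        self.position = end
        return [description for _, description in self.events[lo:hi]]

    def fast_forward(self, seconds):
        """Skip an uninteresting stretch without returning its events."""
        self.position += seconds


# Hypothetical captured session, for illustration only.
session = SessionReplay([
    (0, "radar online"),
    (30, "aurora observed"),
    (600, "chat: compare with model"),
])
print(session.play(60))      # first minute of the session
session.fast_forward(500)    # skip the quiet period
print(session.play(60))      # resume where the discussion starts
```

Pausing needs no extra method in this sketch: playback only advances when `play` or `fast_forward` is called, so a viewer who stops calling them stays at the current position.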


SOCIAL COMPUTING


Social  Filtering,  Recommender  Systems 

We  often  find  the  information  we  want  by  contacting  others.  So¬ 
cial  networks  embody  rich  repositories  of  useful  information  on 
a  variety  of  topics.  A  number  of  investigators  have  looked  at 
whether  the  process  of  finding  information  through  others  can 
be  automated.  The  kinds  of  recommender  systems  that  we  find 
on  websites  like  Amazon.com  are  examples  of  the  result  of  such 
research.  The  basic  principle  of  such  systems  is  that  an  individ¬ 
ual  will  tend  to  like  or  prefer  the  kinds  of  things  (e.g.,  movies, 
books)  that  someone  who  is  similar  to  him/her  likes.  They  find 
similar  people  by  matching  their  previous  choices.  Such  systems 
use  a  variety  of  algorithms  to  match  preferences  with  those  of 
others,  and  then  recommend  new  items.  Resnick  and  Varian 
(1997) edited a special issue of the Communications of the ACM
on  recommender  systems  that  included  a  representative  set  of  ex¬ 
amples.  Herlocker,  Konstan,  and  Riedl  (2001)  used  empirical 
methods  to  explicate  the  factors  that  led  users  to  accept  the  ad¬ 
vice  of  recommender  systems.  In  short,  providing  access  to  ex¬ 
planations  for  why  items  were  recommended  seems  to  be  the 
key. Cosley, Frankowski, Kiesler, Terveen, and Riedl (2005) studied
factors  that  influence  people  to  contribute  data  to  recommender 
systems.  Recommender  systems  are  emerging  as  a  key  element  of 
e-commerce  (Schafer,  Konstan,  &  Riedl,  2001).  Accepting  the  out¬ 
put  of  recommender  systems  is  an  example  of  how  people  come 
to  trust  technical  systems.  This  is  a  complex  topic,  and  relates  to 
issues  like  security  that  we  briefly  described  earlier. 
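
The basic principle described above (find people with similar previous choices, then recommend what they liked) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm of any particular site: cosine similarity is only one of the "variety of algorithms" in use, and the function names and sample ratings are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two {item: rating} dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target, ratings, top_n=3):
    """Rank items the target has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        similarity = cosine_similarity(ratings[target], other_ratings)
        if similarity <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item in ratings[target]:
                continue  # only recommend new items
            scores[item] = scores.get(item, 0.0) + similarity * rating
            weights[item] = weights.get(item, 0.0) + similarity
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"book_a": 5, "book_b": 4, "film_x": 1},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 5, "film_y": 1},
    "cat": {"film_x": 5, "film_y": 5, "book_c": 1},
}
print(recommend("ann", ratings))  # -> ['book_c', 'film_y']
```

Because Ann's choices resemble Bob's far more than Cat's, Bob's high rating of `book_c` dominates the prediction. Showing such a rationale to the user is the kind of explanation that the studies cited above found important for acceptance.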

Trust  of  People  via  the  Technology 

It  has  been  said  that  “trust  needs  touch,”  and  indeed  in  survey 
studies,  coworkers  report  that  they  trust  those  who  are  collo¬ 
cated  more  than  those  who  are  remote  (Rocco,  Finholt,  Hofer, 
&  Herbsleb,  2000).  Interestingly,  those  who  spend  the  most  time 
on  the  phone  chatting  about  non-work  related  topics  with  their 
remote  coworkers  show  higher  trust  than  those  they  communi¬ 
cate  with  using  only  fax  and  e-mail.  But  lab  studies  show  that  tele¬ 
phone  interaction  is  not  as  good  as  face-to-face.  People  using  just 
the  telephone  behave  in  more  self-serving,  less-trusting  ways  than 
they  do  when  they  meet  face  to  face  (Drolet  &  Morris,  2000). 

What  can  be  done  to  counteract  the  mistrust  that  comes 
from  the  impoverished  media?  Rocco  (1998)  had  people  meet 
and  do  a  team-building  exercise  the  day  before  they  engaged  in 
the  social  dilemma  game  with  only  e-mail  to  communicate 
with.  These  people,  happily,  showed  as  much  cooperation  and 
trust  as  those  who  discussed  things  face  to  face  during  the 
game.  This  is  important.  It  suggests  that  if  remote  teams  can  do 
some  face-to-face  teambuilding  before  launching  on  their  pro¬ 
ject,  they  will  act  in  a  trusting/trustworthy  manner. 

Since  it’s  not  always  possible  to  have  everyone  on  a  project 
meet  face  to  face  before  they  launch  into  the  work,  what  else 
will  work?  Researchers  have  tried  some  options,  but  with  mixed 
success.  Zheng,  Bos,  Olson,  Gergle,  and  Olson  (2001)  found 
that  using  chat  for  socializing  and  sharing  pictures  of  each  other 
also led to trustful relations. Merely sharing a resume did not.
When the text is translated into voice, it has no effect on trust,
and when it is translated into voice and presented in a moving
human-like face, it is even worse than text-chat (Jensen, Farnham,
Drucker, & Kollock, 2000; Kiesler, Sproull, & Waters,
1996). However, Bos, Gergle, Olson, and Olson (2001) found
that  interactions  over  video  and  audio  led  to  trust,  albeit  of  a 
seemingly  more  fragile  form. 

If  we  can  find  a  way  to  establish  trust  without  expensive 
travel,  we  are  likely  to  see  important  productivity  gains.  Clearly 
the  story  is  not  over.  However,  we  must  not  be  too  optimistic. 
In  other  tasks,  video  does  not  produce  “being  there.”  There  is  an 
overhead  to  the  conversation  through  video;  it  requires  more  ef¬ 
fort  than  working  face  to  face  (Olson  et  al.,  1995).  And,  today’s 
videos  over  the  Internet  are  both  delayed  and  choppy,  producing 
cues  that  people  often  associate  with  lying.  One  doesn’t  trust 
someone who appears to be lying. Trust is a delicate emotion;
today’s video might just not support it in a robust enough fashion.


COMPUTER SUPPORTED COOPERATIVE LEARNING


Obviously  the  range  of  CSCW  systems  that  we’ve  been  describ¬ 
ing  can  be  used  for  a  variety  of  purposes.  One  special  area  that 
has  emerged  as  a  subfield  with  a  distinctive  identity  is  educa¬ 
tion,  using  the  name  Computer  Supported  Cooperative  Learn¬ 
ing  (CSCL).  This  field  first  emerged  in  Europe,  but  has  become 
quite  widespread.  This  area  has  its  own  journals  and  national 
meetings  (see  Koschmann,  1996;  Koschmann,  Hall,  &  Miyake, 
2002;  Koschmann,  Suthers,  &  Chan,  2005). 

The  emergence  of  this  field  has  coincided  with  the  emer¬ 
gence  of  a  general  trend  in  education  toward  collaborative 
learning, that is, of using work with peers in groups as an effective
tool  for  education  (Slavin,  1994).  The  core  idea  is  that  by  work¬ 
ing  together  with  peers  the  interactions  over  problem  solving 
and  other  learning  activities  will  occur  at  a  level  that  is  most 
easily  understood  and  engaged  by  the  learners.  What  CSCL  adds 
to  this  is  a  focus  on  software  that  can  facilitate  collaboration, 
particularly  across  distance. 


INTEGRATED  SYSTEMS 


Media  Spaces 

As  an  extension  of  video  conferencing  and  awareness  systems, 
some  people  have  experimented  with  open,  continuous  audio 
and  video  connections  between  remote  locations.  In  a  number  of 
cases,  these  experiments  have  been  called  “Media  Spaces.”  For 
example,  at  Xerox,  two  labs  were  linked  with  an  open  video  link 
between two commons areas (Olson & Bly, 1991), the two locations
being Palo Alto, California, and Portland, Oregon. Evaluation
of  these  experiments  showed  that  maintaining  organizational  co¬ 
hesiveness  at  a  distance  was  much  more  difficult  than  when 
members are collocated (Finn et al., 1997). However, some connectedness
was maintained. Where many of these early systems
were plagued with technical difficulties, human factors limitations,
or very large communication costs, in today’s situation it might
actually be possible to overcome these difficulties, making media
a possibility for connecting global organizations. A new round
of experimental deployments with new tools is needed.

Collaborative  Virtual  Environments 

Collaborative  virtual  environments  are  3D  embodiments  of 
multi-user  domains  (MUDs).  The  space  in  which  people  inter¬ 
act  is  an  analog  of  physical  space,  with  dimensions,  directions, 
rooms,  and  objects  of  various  kinds.  People  are  represented  as 
avatars — simplified,  geometric,  digital  representations  of  people, 
who  move  about  in  the  3D  space  (Singhal  &  Zyda,  1999).  Simi¬ 
lar  to  MUDs,  the  users  in  a  meeting  situation  might  interact  over 
some  object  that  is  digitally  represented,  like  a  mock  up  of  a  real 
thing  (e.g.,  an  automobile  engine,  an  airplane  hinge,  a  piece  of 
industrial  equipment)  or  with  visualizations  of  abstract  data  (e.g., 
a  3D  visualization  of  atmospheric  data).  In  these  spaces,  one  can 
have  a  sense  as  to  where  others  are  and  what  they  are  doing,  sim¬ 
ilar  to  the  simplified  awareness  systems  described  above.  In  use, 
it  is  difficult  to  establish  mutual  awareness  or  orientation  in  such 
spaces  (Hindmarsh,  Fraser,  Heath,  Benford,  &  Greenhalgh,  1998; 
Park,  Kapoor,  &  Leigh,  2000;  Yang  &  Olson,  2002).  There  have 
even  been  some  attempts  to  merge  collaborative  virtual  envi¬ 
ronments  with  real  ones,  though  with  limited  success  so  far 
(Benford, Greenhalgh, Reynard, Brown, & Koleva, 1998).

What people seem to want is more like the Holodeck in “Star
Trek.”  These  environments  are  complicated  technically,  and  per¬ 
haps  even  more  complicated  socially.  In  real  life,  we  have  de¬ 
veloped  interesting  schemes  that  trigger  behavior  and  interpre¬ 
tation  of  others’  behavior  as  a  function  of  real  distance,  a  field 
called  “Proxemics”  (Hall,  1982).  Only  when  these  subtle  behav¬ 
iors  are  incorporated  into  the  virtual  environment  will  we  have 
a  chance  of  simulating  appropriate  interhuman  behavior  in  the 
virtual  3D  world. 

Collaboratories 

A  collaboratory  is  a  laboratory  without  walls  (Finholt  &  Olson, 
1997).  From  a  National  Research  Council  report,  a  collaboratory 
is  supposed  to  allow  “the  nation’s  researchers  [to]  perform 
their  research  without  regard  to  geographical  location — inter¬ 
acting  with  colleagues,  accessing  instrumentation,  sharing  data 
and  computational  resources  [and]  accessing  information  in 
digital  libraries”  (National  Research  Council,  1993,  p.  7).  Starting 
in the early 1990s, these capabilities have been configured into
support packages for a number of specific sciences (see Finholt,
2002). The Science of Collaboratories project
(www.scienceofcollaboratories.org) has identified more than 200
existing collaboratories and is drawing lessons about why some succeed
and  others  don’t  (Olson  et  al.,  2004). 

A  number  of  companies  have  also  experimented  with  simi¬ 
lar  concepts,  calling  them  “virtual  collocation.”  The  goal  there  is 
to  support  geographically  dispersed  teams  as  they  carry  out 
product  design,  software  engineering,  financial  reporting,  and 
almost  any  business  function.  In  these  cases,  suites  of  off-the- 
shelf  groupware  tools  have  been  particularly  important  and 
have  been  used  to  support  round-the-clock  software  develop¬ 
ment  among  overlapping  teams  of  engineers  in  time  zones 
around the world (Carmel, 1999). There have been a number of
such efforts, and it is still unclear how successful they are or what
features make success more likely (Olson & Olson, 2000).


CONCLUSIONS 


Groupware  functionality  is  steadily  becoming  more  routine  in 
commercial  applications.  Similarly,  suites  of  groupware  func¬ 
tions  are  being  written  into  operating  systems.  Many  of  the  func¬ 
tions  we  have  described  in  this  article  are  becoming  ordinary 
elements  of  infrastructure  in  networked  computing  systems. 

Prognosticators  looking  at  the  emergence  of  groupware  and 
the  convergence  of  computing  and  communication  media  have 
forecast  that  distance  will  diminish  as  a  factor  in  human  interac¬ 
tions  (e.g.,  Cairncross,  1997).  However,  to  paraphrase  Mark  Twain, 
the  reports  of  distance’s  death  are  greatly  exaggerated.  Even  with 
all  our  emerging  information  and  communications  technologies, 
distance  and  its  associated  attributes  of  culture,  time  zones,  ge¬ 
ography,  and  language  will  continue  to  affect  how  humans  inter¬ 
act  with  each  other.  Emerging  distance  technologies  will  allow 
greater  flexibility  for  those  whose  work  must  be  done  at  a  dis¬ 
tance,  but  we  believe  (see  Olson  &  Olson,  2000)  that  distance  will 
continue to be a factor in understanding these work relationships.


INTRODUCTION


Pioneering  research  work  in  the  early  1980s  introduced  and  de¬ 
fined  the  concept  of  usability  as  critical  to  the  success  of  inter¬ 
active  products  and  systems  (Eason,  1984).  From  the  1990s 
there  has  been  a  developing  relationship  between  usability  and 
the  web,  such  that  for  many  people  the  two  terms  are  inextri¬ 
cably  linked.  Specifically,  not  only  does  the  web  demand  good 
usability,  it  can  be  argued  that  usability  necessitates  the  web. 

There  are  five  key  reasons  why  usability  and  the  web  have 
become  so  closely  associated  with  each  other: 

1. Web users are ubiquitous. For many countries, the people
who access the web are approaching the population as a
whole. For instance, within Great Britain 55% of households
can now access the Internet from home.¹ For some countries
(e.g., United States, Denmark, Australia), it has been estimated
that over 70% of the population are regularly using
the web.² Such diversity raises a large number of issues related
to interdependent variables such as age, gender, culture,
disability, language abilities, computer skills/knowledge,
domain skills/knowledge, and so on. In particular, considerable
ongoing research considers the accessibility of the
web; that is, how websites can be designed to account for the
needs of as many people as possible (e.g., those with visual
impairments) (Abascal & Nicolle, 2001).

2.  Web  users  are  largely  discretionary  users.  Apart  from  the  spe¬ 
cific  case  of  the  work  context  and  company  intranets,  web 
users  generally  do  not  have  to  use  a  particular  site  (or  even 
the  web  at  all)  to  achieve  their  goals.  They  have  alternatives 
available to them (e.g., visiting another website, making a
phone  call,  visiting  a  shop),  and  if  they  experience  usability 
problems,  they  do  not  necessarily  have  to  struggle  with  or 
adapt  to  the  poor  interface.  They  are  empowered  to  explore 
existing  options. 

3.  Web  usability  problems  have  a  clear  relationship  with  sales. 
For  websites  aiming  to  sell  products  or  services,  poor  usabil¬ 
ity  directly  impacts  sales.  If  a  user  cannot  find  the  product 
or  relevant  information,  he  or  she  is  unlikely  to  continue  in 
the  transaction.  In  this  case,  it  has  been  noted  that  usability 
is  affecting  the  experience  of  a  product  prior  to  purchase 
(Nielsen, 2000). Consequently, the website information
space must consider many of the issues (e.g., navigation, layout
of items, labelling) that are relevant to the traditional
design  of  physical  space  for  shops. 

4.  The  web  is  evolving  at  a  rapid  pace.  Technical  characteristics 
are  continually  changing  in  response  to  new  application/task 
areas,  facilitated  by  computer  processing  power  and  com¬ 
munication  speeds.  This  has  an  immediate  impact  on  the 
functionality available to web users (e.g., animations, videos),
developments in user interface design (e.g., clickable items,
mouse-over navigation), as well as tools (e.g., cookies, plug-ins).
In contrast, users’ skills, knowledge, and expectations
are considerably slower to evolve, leading to an inevitable
gulf  between  users  (particularly  infrequent  users)  and  the 
web. 

5.  Website  technical  development  is  easy.  It  is  simple  to  have  a 
presence  on  the  web  when  compared  with  the  resources 
necessary  for  traditional  product-development  processes.  To 
generate  a  website  requires  few  technical  skills,  and  one  can 
do so with no programming experience. This has contributed
to the vast number of websites,³ ultimately adding to
the complications of navigation, whilst emphasising the importance
of usability as a differentiating factor.

In this chapter we first consider what it is that makes the web
hard  to  use.  We  look  specifically  at  browsing  and  linking,  finding 
things  with  search  and  query,  increasing  the  relevance  of  things 
we  find  with  personalization,  and  user  interface  issues.  The  sub¬ 
sequent  sections  look  at  some  solutions  to  these  problems. 


WHAT  MAKES  THE  WEB  HARD  TO  USE? 


As  successful  as  the  web  is  for  delivering  information  globally 
and  rapidly,  many  problems  remain  that  make  it  an  unpleasant 
or  unproductive  experience  for  some  users,  or  even  impossible 
for  other  users. 

Much of the difficulty in using the web lies in the vast quantity
of information available: browsing and searching become
increasingly difficult and imprecise in their results. Then, having
located what one wants, technical flaws, such as broken links or
browser  incompatibilities,  often  render  the  information  un¬ 
reachable  or  unreadable.  Most  importantly,  web  pages  or  the 
browser  interface  itself  can  give  a  poor  presentation,  and  for 
some  classes  of  users,  make  the  information  inaccessible. 

Browsing  and  Linking:  "What’s  Wrong  With 
the  World  Wide  Web?"  Revisited 

Almost a decade ago, Bieber and colleagues explored the difficulties
in using the web in depth and presented a vision of “fourth-generation
hypermedia” (Bieber, Vitali, Ashman, Balasubramanian, & Oinas-Kukkonen,
1997). In particular, the importance of hypermedia
in  the  structuring  of  information  was  described: 

hypermedia  provides  contextual,  navigational  access  for  viewing  infor¬ 
mation  and  .  .  .  represents  knowledge  in  a  form  relatively  close  to  the 
cognitive  organizational  structures  that  people  use.  Thus  hypermedia 
supports  understanding. 

The  judicious  use  of  hypermedia  enhances  a  user’s  compre¬ 
hension.  When  that  use  is  tailored  specifically  to  the  user’s  re¬ 
quirements,  i.e.,  it  is  personalized,  the  user’s  comprehension 
and  productivity  are  maximized. 


¹ Figures for July 2005 from web.statistics.gov.uk
² Figures for 2005 from www.internetworldstats.com
³ Estimated to be over 9 million in 2002 (www.oclc.org)

The authors went on to define a table of desired hypermedia
features  (Bieber  et  al.,  1997).  Of  particular  relevance  to  HCI  and 
the  web  are  the  personalization  of  links  by  annotation,  compu¬ 
tation,  and  overviews. 

Surprisingly,  nearly  a  decade  later,  many  of  these  desired  fea¬ 
tures  are  still  lacking  from  the  mainstream  web  applications.  In 
particular  the  personalization  of  links  and  content  is  still  largely 
unachievable  without  specialist  tools.  In  fact,  some  backsliding 
on this is evident over the past decade, as the mid-1990s Mosaic
browser supported the creation of annotations at both personal
and group level and, more importantly, offered this service
by default in a standard browser.⁴ Mosaic development,
however, was discontinued when Netscape became widespread.
W3C’s Annotea project⁵ offers the same essential functionality
but not within a standard browser and not without setup
effort from a technically literate user; annotations are by no
means a standard or easily accessible web service.

Creating  personalized  annotations  is  challenging  enough, 
but  the  creation  of  one’s  own  links  over  third-party  data  sources 
is  essentially  unknown.  This  is  not  due  to  any  shortcomings  in 
the technology; in fact, many solutions have been available
within  the  research  community  for  some  time  (see  “Browsing 
and  Linking”).  The  earliest  plans  for  web  browsers,  dating  back 
to  1991  (Cailliau  &  Ashman,  1999),  anticipated  a  “writeable  web” 
where  personalized  links  and  annotations  would  be  easily  sup¬ 
ported,  yet  this  has  not  become  mainstream  web  technology. 

Another  key  aspect  of  personalized  links  which  is  inevitably 
overlooked  is  that  personal  links  are  private  links.  If  one  creates 
a  web  page,  complete  with  links,  then  those  links  are  necessar¬ 
ily  as  available  as  the  content.  This  is  not  always  desirable,  es¬ 
pecially  if  links  record  one’s  associations  and  collected  thoughts, 
and  the  intellectual  property  represented  by  those  associations 
has  value.  The  technology  for  personalized  links  gives  the  user 
private  links,  over  public  documents,  in  much  the  same  way  that 
users’  bookmarks  are  private.  For  privacy  reasons  alone,  the 
technology  for  personalized  links  should  be  requisite  for  all 
mainstream  browsers. 
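
One way to picture private links over public documents is a "linkbase" kept separately from the pages it refers to, keyed by its owner. The sketch below is an assumption-laden illustration rather than any browser's actual mechanism; the `Link` and `Linkbase` names and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A link stored outside the document it starts from."""
    source_url: str    # public page the link is attached to
    anchor_text: str   # phrase in that page acting as the link anchor
    target_url: str    # where the link leads
    note: str = ""     # optional personal annotation


class Linkbase:
    """Keeps each user's links apart from the documents and from other users."""

    def __init__(self):
        self._links = {}  # owner -> list of Link

    def add(self, owner, link):
        self._links.setdefault(owner, []).append(link)

    def links_for(self, owner, source_url):
        """Return only the owner's links over the given page."""
        return [l for l in self._links.get(owner, []) if l.source_url == source_url]


linkbase = Linkbase()
linkbase.add("alice", Link(
    source_url="http://example.org/report.html",
    anchor_text="usability",
    target_url="http://example.org/alice/notes.html",
    note="compare with my own findings",
))

page = "http://example.org/report.html"
print(len(linkbase.links_for("alice", page)))  # Alice sees her link
print(linkbase.links_for("bob", page))         # Bob sees none of Alice's links
```

The public page is never modified: the links are merged in only when their owner views it, which is what keeps them as private as a bookmark file.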

Other  desired  hypermedia  features,  such  as  local  overviews, 
are  provided  not  as  a  standard  web  service  but  as  a  site-specific 
courtesy  page  optionally  supplied  by  site  managers.  Global 
overviews  seem  an  increasingly  distant  feature,  as  the  vast  com¬ 
plexity  of  the  web  makes  the  usual,  graph-based  representations 
completely  unworkable,  both  computationally  and  visually.  The 
earlier  paper  noted  that: 

Web  browsers  have  no  inherent  way  of  presenting  the  structure  and 
interrelationships  of  data  of  any  sort.  For  example,  there  is  no  way  to 
visualize even the simple interrelationships of web documents, such
as  “Where  can  I  go  from  here?”  or  “Which  documents  point  to  this  doc¬ 
ument?”  The  reader  has  no  idea  of  the  position  of  a  given  document 
within  the  corpora  unless  an  author  explicitly  embeds  such  details. 
(Bieber  et  al.,  1997) 
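
The two questions in the quotation map onto forward links and backlinks in a link graph. As a small sketch, assuming the outgoing links of a handful of pages are already known (the page names below are hypothetical), the backlinks can be derived by inverting that mapping:

```python
from collections import defaultdict


def build_backlinks(outlinks):
    """Invert a {page: set of linked pages} map into {page: set of referring pages}."""
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks


# Hypothetical three-page site.
outlinks = {
    "index.html": {"about.html", "papers.html"},
    "about.html": {"index.html"},
    "papers.html": {"index.html", "about.html"},
}
backlinks = build_backlinks(outlinks)

# "Where can I go from here?"
print(sorted(outlinks["papers.html"]))   # -> ['about.html', 'index.html']
# "Which documents point to this document?"
print(sorted(backlinks["about.html"]))   # -> ['index.html', 'papers.html']
```

The forward question can be answered from the page itself, whereas the backward question needs knowledge of every other page. That is trivial for one site but not for the whole web, which is one reason a browser cannot answer it unaided.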

So  why  is  it  that  so  little  has  seemingly  been  done  to  address 
these outstanding problems? It could be because the technology
is too inaccessible to the nonspecialist user and that the
obvious  upfront  costs  of  assimilating  the  technology  outweigh 
the  perceived  benefits.  It  still  seems  to  be  true  that: 

experienced  readers  learn  to  get  by  with  the  available  functionality. 
However,  the  use  of  web  technology  is  in  part  determined  by  its  capa¬ 
bilities.  Readers  learn  to  make  do  with  the  available  tools  instead  of 
demanding  better  tools,  perhaps  because  they  are  not  aware  of  how 
better  tools  might  help  them.  (Bieber  et  al.,  1997) 

Only when we make the web more usable to the nonspecialist
user will the use of all but the most basic tools become
commonplace, resulting in higher productivity and better
assimilation of information by the user.

So  far,  we  have  seen  the  web  does  not  offer  widespread  sup¬ 
port  for  personalization  of  links  and  other  browsing  activities. 
We now look at the other major means of web information access:
searching and querying.

Finding  Things  (i):  Search  and  Query  on  the  Web 

One recent estimate has placed the size of the World Wide Web
at around 11.5 billion pages (Gulli & Signorini, 2005). This figure,
representing nearly 1.8 web pages per person alive today,⁶ relates
only to static or fixed web pages. It has been suggested that if this
survey were extended to include more ephemeral information
(e.g., individual auction items or ongoing news stories), then the
correct figure would in fact be 500 times larger (Bergman, 2001).

With  this  wealth  of  information,  the  web  would  be  untenable 
without  mechanisms  to  assist  navigation  and  file  location.  The 
most  common  web  tool  in  use  today  is  the  search  engine.  To 
use  a  search  engine  the  user  must  submit  a  series  of  terms 
known  as  a  query.  This  query  in  some  way  formulates  and  em¬ 
bodies  what  the  user  of  the  search  system  would  like  to  retrieve 
information  about.  The  usual  output  produced  by  the  search 
engine  is  a  small  set  of  links  to  a  selection  of  web  documents  ex¬ 
tracted  from  the  billions  available.  These  web  pages  represent 
the  search  engine’s  response  to  the  user’s  information  need, 
each  document  having  been  determined  by  the  system  as  prob¬ 
ably  relevant  to  the  query. 
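
The query-to-results cycle described above can be illustrated with a toy inverted index. This is a deliberately crude sketch: counting matching terms stands in for the far richer relevance functions that real search engines use, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of document URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query, limit=10):
    """Return up to `limit` URLs, ordered by how many query terms each one matches."""
    hits = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, ()):
            hits[url] += 1
    # Most matching terms first; ties broken alphabetically for a stable order.
    ranked = sorted(hits, key=lambda url: (-hits[url], url))
    return ranked[:limit]


# Hypothetical document collection.
documents = {
    "a.html": "usability of web search engines",
    "b.html": "history of the printing press",
    "c.html": "web usability guidelines",
}
index = build_index(documents)
print(search(index, "web usability"))  # -> ['a.html', 'c.html']
```

Even in this tiny example the user has to guess which terms the relevant pages contain, and the result list says nothing about why a page was returned.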

Presented  with  a  set  of  results,  the  user  selects  a  web  page. 
If  the  document  fails  to  meet  the  user’s  requirement,  they  re¬ 
turn  to  the  search  engine  for  another  result,  or  modify  the 
query  in  light  of  previous  results.  This  continues  until  the  user’s 
need  is  satisfied  or  their  commitment  to  the  search  wanes  (Har¬ 
man,  1992). 

From  the  perspective  of  a  typical  user,  this  has  two  main 
problem  areas: 

1. the confusion and uncertainty surrounding query formulation

2.  the  impenetrability  of  seemingly  endless  results 

In  “Searching  and  Querying”  we  discuss  these  obstacles  and 
review  a  range  of  solutions. 


⁴ In 1994 the Mosaic browser was the “killer application” for the Internet, being the first major web browser, freely available from NCSA. Its developers
later spun off the Netscape company.

⁵ http://www.w3.org/2001/Annotea/

⁶ http://www.geohive.com approximates the population of the Earth at 6 483 981 671 (5th January 2005).

Finding Things (ii): Relevance

While  browsing  and  search-and-query  are  excellent  tools  for 
finding  information  on  the  web,  they  remain  generic  functions 
whose  behavior  is  the  same,  regardless  of  the  context.  What  re¬ 
mains  to  be  considered  now  is  how  to  further  filter  the  mass  of 
information  on  the  web  according  to  its  relevance,  not  to  a 
query,  but  to  the  individual  user  or  to  a  community  of  users. 

Most  information  systems  are  designed  for  a  hypothetical 
“average”  user.  This  “one-size-fits-all”  approach  ignores  diversity 
in  cultural  and  educational  backgrounds,  abilities,  objectives 
and aspirations. An information system with a single user interface
for all users is conceptually the same as a car manufacturer
selling a car in only a single color: “any color so long as it’s
black”.⁷

One  solution  is  to  build  personalizable  information  systems, 
delivering content specific to the requirements of different users.
Without such systems, effective universal accessibility to information
cannot be achieved. With them, every computer user and
special interest group will have personally tailored access to
information sources.

Personalization  can  take  many  forms,  and  can  involve  the  tai¬ 
loring  of  hypertext  links  (as  discussed  in  “What  Makes  the  Web 
Hard  to  Use”),  the  tailoring  of  presentation  (often  an  accessibil¬ 
ity  issue,  see  “User  Interface  Issues”)  and  the  tailoring  of  content. 

Content personalization has many benefits, not just for
users, either as individuals or within a community, but also for
the providers of information.

For  individuals,  the  tailoring  of  content  can  reflect  a  wide  va¬ 
riety  of  requirements,  but  is  always  aimed  at  providing  informa¬ 
tion  most  pertinent  to  that  user  in  their  current  context.  In 
e-learning  applications  for  example,  more  challenging  lessons 
are  not  served  to  the  user  until  mastery  of  prerequisite  material 
is achieved. In e-commerce, a simple form of user consensus
(see “Document Clustering”) underlies a recommender system
that personalizes suggestions for further purchases based on the
similarity of the current user’s purchasing history to that of other users.
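
The recommender idea can be made concrete with a minimal sketch. It is not the method of any particular e-commerce site: it assumes purchase histories are plain sets of item names and uses Jaccard overlap as the similarity measure, and the function names and sample data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two purchase histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Suggest items bought by similar users that the target user lacks."""
    own = histories[target]
    scores = Counter()
    for user, items in histories.items():
        if user == target:
            continue
        weight = jaccard(own, items)
        # Each unseen item is credited with the similarity of its buyer.
        for item in items - own:
            scores[item] += weight
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical purchase histories.
histories = {
    "ann": {"tent", "stove", "boots"},
    "bob": {"tent", "stove", "lantern"},
    "cat": {"novel", "boots", "bookmark"},
}
print(recommend("ann", histories))
```

Here the suggestion from "bob" ranks first because his history overlaps most with "ann"; a production system would typically also weigh ratings, recency, and item popularity.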

For  groups,  personalization  of  content  could  help  create 
communities  of  common  interest  amongst  otherwise  disparate 
users.  Personalization  can  also  support  existing  communities 
of  users,  by  creating  portals  for  accessing  materials  which  are 
“personalized”  to  the  interests  of  the  group  or  community. 

Personalization  of  information  can  also  improve  the  whole 
computer  use  experience  for  the  two  major  groups  of  underrep¬ 
resented  users:  (a)  non-American/English  cultural  and  language 
groups,  and  (b)  people  with  special  information  needs.  The  web 
may have been “invented” by Europeans (Cailliau et al., 1999), but
its  subsequent  development  from  1993  was  U.S.-based  and  has 
a  distinctly  American  culture,  including  a  predominance  of  the 
English  language,  linguistic  form  and  alphabets.® 

For information providers and publishers, personalization
could create a delivery of information as suitable as possible to
each individual user or special-interest group. In the normal publishing
model, it is not economically feasible for publishers or
information  owners  to  publish  information  in  a  multitude  of  dif¬ 
ferent  forms,  but  with  personalization  and  adaptation  (see  “Per¬ 
sonalization”),  it  would  be  possible  to  do  the  personalization  “on 
the  fly”  at  no  additional  cost  to  the  publishers,  bringing  enor¬ 
mous  economic  advantage  to  publishers  using  the  technology. 

This is essentially the same principle encountered in teaching:
lectures present the information in a fairly general fashion,
while  tutorials  or  other  small-group  teaching  give  the  students  a 
chance  to  specify  exactly  what  they  need  to  know  and  to  be 
given  examples  most  helpful  to  them.  In  fact,  the  pressure  on 
teaching  resources  has  partly  motivated  adaptation  and  person¬ 
alization  in  e-learning,  and  similarly  it  is  the  lack  of  resources 
preventing  publishers  from  delivering  their  materials  tailored  to 
special-interest  groups  and  to  the  individual.  With  e-learning, 
adaptation  and  personalization  have  arisen  because  teachers 
must  offer  personalized  tutoring  but  are  increasingly  short  of  re¬ 
sources.  With  e-commerce,  multicultural  groups  and  special 
needs  groups,  adaptation  and  personalization  can  also  be  ap¬ 
plied  to  offer  an  equally  valuable  specialisation  of  information 
that  is  just  not  otherwise  available. 

User  Interface  Issues 

We have just seen how awareness of context played an important
part in the personalization of content and links. However,
context  also  needs  to  be  considered  in  any  environment,  not 
just  for  personalization  purposes. 

Context  of  Use 

When investigating the usability of web user interfaces, one
must  first  consider  the  overriding  issue  of  context.  Context  of 
use  is  seen  as  a  critical  constituent  of  usability  defined  by  ISO 
(1998)  to  “consist  of  the  users,  tasks,  and  equipment  (hardware, 
software, and materials) and the physical and social environments
in which a product is used.” With respect to the web,
user, task, and environment issues are constantly evolving,
leading to new challenges for HCI researchers and practitioners.
For instance, the traditional environment for web users
would  have  been  the  workplace,  but  statistics  show  that  this  is 
no  longer  the  single  situation  for  web  use.  Eighty-eight  percent 
of  UK  web  users  access  the  Internet  primarily  from  home.^  Also, 
an  increasing  number  of  people  now  access  the  Internet 
through  mobile  devices  (i.e.,  laptops,  phones,  PDAs).  In  these 
situations,  designers  must  consider  a  much  wider  range  of  phys¬ 
ical  environment  factors,  for  instance,  varying  lighting,  noise 
and  thermal  conditions,  as  well  as  other  tasks  that  users  may  si¬ 
multaneously  carry  out. 


^Henry  Ford  is  credited  with  saying  “You  can  paint  it  any  color,  so  long  as  it’s  black,”  but  the  Model  T  eventually  appeared  in  many  colors.  Customer 
demand  can  motivate  personalization. 

®The  American  culture  and  set  of  assumptions  is  reflected  in  not  only  issues  such  as  inadvertently  rendering  Arabic  script  backwards  (left  to  right) 
because  browsers  render  left  to  right  by  default,  but  also  in  American  spelling  of  HTML  tags,  such  as  “color”  and  “gray.” 

^Figures  for  July  2005  from  www.statistics.gov.uk 
For example, there are widespread concerns regarding the
use  of  Internet  services  within  road-based  vehicles  (Lai,  Cheng, 
Green,  &  Tsimhoni,  2002;  Burnett,  Summerskill,  &  Porter, 
2004).  Access  to  the  web  while  driving  may  provide  a  range  of 
tangible  benefits  to  drivers,  some  driving-related  (i.e.,  pre-book¬ 
ing  a  parking  space,  accessing  real-time  traffic  information),  and 
others  oriented  toward  productivity  or  entertainment  needs 
(i.e.,  viewing  the  latest  information  on  stocks,  downloading  MP3 
files).  Traditional  access  to  the  web  through  desktop  comput¬ 
ers  has  utilized  user  interface  paradigms  that  are  both  highly 
visual  and  manual  (i.e.,  scanning  a  page  for  a  link  and  then  using 
a  mouse  to  point  and  then  click  on  the  link).  Such  activities  are 
in  clear  conflict  with  the  safety/time-critical  task  of  driving, 
which places heavy demands on the visual modality, while requiring
continuous manual responses (i.e., turning the steering
wheel). Clearly, designers need to establish fundamentally
new interaction styles for use in a driving context (i.e.,
speech recognition, voice output, haptic interfaces; see for example
Iwata et al. in this book). As part of this process, designers
of interfaces for in-car Internet services must consider
whether  drivers  should  be  given  access  to  functionality  while 
the  vehicle  is  in  motion.  Considerable  ongoing  research  is  being 
conducted  to  determine  what  constitutes  an  overly  distracting 
interface  (Lai  et  al.,  2002). 

Navigation  Issues 

There is a common assertion that the two biggest difficulties
facing web users are download times and navigation (Nielsen,
2000; McCracken & Wolfe, 2004). Download times are largely
governed  by  the  technical  capabilities  of  Internet  connections 
(which  have  increased  markedly  over  recent  years),  together 
with  clear  design  variables  (i.e.,  the  size  of  individual  website 
pages).  Navigation  issues  are  considerably  more  complex.  Ac¬ 
cording  to  one  diary-based  study  (Lazar,  Bessiere,  Ceaparu, 
Robinson,  &  Shneiderman,  2003),  between  one  third  and  one 
half  of  time  spent  using  a  computer  is  unproductive,  a  situation 
predominately  attributed  to  problems  in  web  navigation. 

In  analyzing  the  navigation  problem,  we  must  first  consider 
what  is  meant  by  a  navigation  task.  A  CHI  workshop  on  this 
topic  from  1997  (Jul  &  Furnas,  1997)  commented  that  a  diffi¬ 
culty  with  research  in  the  area  was  that  authors  tended  to  define 
navigation  in  varying  ways.  A  broad  view  (adapted  from  the  CHI 
workshop)  is  taken  here,  in  which  navigation  can  be  said  to  in¬ 
volve  the  following: 

•  Planning  “routes.”  When  people  navigate  through  space 
(whether  real  or  electronic),  they  must  first  consider  their 
overall  strategy;  that  is,  what  methods  will  be  appropriate  in 
the  current  situation.  For  the  web,  a  range  of  methods  exist 
that  aim  to  assist  the  user  when  deciding  how  to  navigate 
across  the  web  (i.e.,  search  engines,  directories,  URLs)  and/or 
within  specific  sites  (i.e.,  site  maps,  navigation  menus,  links). 
Many people find it difficult to generate a suitable plan for a
range of reasons: basic cognitive limitations
(such as remembering URLs), a lack of knowledge (choosing
appropriate search terms, misunderstanding Boolean logic),
or poorly implemented methods (i.e., confusing
layouts for site maps).

•  Following  “routes.”  Once  a  high-level  plan  exists,  people 
need  to  execute  the  subsequent  point-by-point  decisions, 
necessitating  a  virtual  form  of  locomotion  through  the  infor¬ 
mation  space.  In  this  stage,  typical  problems  facing  the  web 
user  often  relate  to  the  design  of  linking  mechanisms  be¬ 
tween  pages,  for  example,  ambiguous  link  labelling,  unclear 
graphics  or  icons,  relevant  information  appearing  off-screen, 
the  need  to  visually  scan  large  numbers  of  links,  and  so  on. 

•  Orienting  within  the  “space.”  The  “where  am  I?”  problem  of 
the  web  is  perhaps  the  one  most  discussed  and  researched 
(Otter  &  Johnson,  2000).  For  optimum  navigation  perfor¬ 
mance  and  confidence,  people  need  to  have  a  sense  of  their 
current  location  in  relation  to  their  surroundings  (i.e.,  their 
final  destination,  their  start  point,  and  other  key  “landmarks,” 
such  as  a  home  page).  Orientation  difficulties  are  often  com¬ 
pounded  on  the  web  because  users  are  “dropped”  into  a  spe¬ 
cific  location  when  they  use  search  engines  and  bookmarks. 

•  Learning  the  “space.”  Repeated  exposure  to  any  large-scale 
environment  (whether  it  be  real  or  electronic)  will  lead  to  a 
deepening  knowledge  of  objects  within  the  space  (i.e.,  par¬ 
ticular  pages),  as  well  as  an  understanding  of  the  various  re¬ 
lationships  between  the  objects  (i.e.,  how  pages  follow  each 
other, the overall structure of a site). These specific cases of
mental models are commonly known as cognitive maps, and
facilitate fast and accurate navigation performance. Websites
with poor differentiation (i.e., all pages appearing to be similar),
low visual access (i.e., difficult to see where one can go
next) and high path complexity (i.e., many links on a page)
will all contribute to a poorly formed cognitive map (Kim &
Hirtle, 1995).

Summary 

In  the  subsequent  sections,  we  turn  to  some  solutions  for  the 
problems  outlined  here,  looking  first  at  browsing  and  linking 
(“Browsing  and  Linking”),  then  turning  to  search  (“Searching 
and  Querying”)  and  personalization  (“Personalization,  Portals, 
and  Communities”). 


BROWSING  AND  LINKING 


The fundamental mechanism for viewing information on the
web is browsing, implemented by hypermedia links. However,
this core capability still encounters technical problems that
interrupt  the  user’s  browsing,  such  as  difficulties  in  finding  rel¬ 
evant  material  and  limitations  in  the  way  relevant  materials  can 
be  presented. 


^“The  very  names  of  the  two  most  important  elements  of  the  web  infrastructure,  hypertext  transfer  protocol  (http)  and  HyperText  Markup  Language 
(HTML)  indicate  this  clearly. 
In this section, we look at some solutions, including automatic
management  of  broken  links,  the  easy  personalization  of  links, 
and  how  links  can  enable  different  perspectives  on  the  same  data. 

Broken  and  Misdirected  Links 

Broken links have frustrated users not only of the web, but also
of earlier hypertext systems. Some of the first work on link
integrity was motivated by the open hypermedia systems’ principle of
externalizing links, rather than embedding them into the data
(Davis, 1995).

Externalizing links implies that links are stored separately
from the data being linked, so that the link must be reconciled with
its referent, usually immediately before use, as a
“late binding” of links (Brailsford, 1999). However, any changes
in the data mean that links can be either displaced, pointing
to the wrong place in a document (a part-of-file error; Davis,
1998), or completely invalidated, with no document now known
for the link to apply to (a whole-file error; Davis, 1998).
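
The two failure modes can be illustrated with a small sketch of late binding. It is an illustration under stated assumptions rather than the design of any open hypermedia system: the `ExternalLink` record, the `bind` function, and the choice to detect displacement by comparing the anchor text at a stored character offset are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExternalLink:
    doc: str      # name of the source document
    offset: int   # character offset where the anchor was recorded
    anchor: str   # anchor text expected at that offset
    target: str   # destination of the link


def bind(link, documents):
    """Reconcile a stored link with the current data just before use."""
    text = documents.get(link.doc)
    if text is None:
        return "whole-file error"    # no document for the link to apply to
    end = link.offset + len(link.anchor)
    if text[link.offset:end] != link.anchor:
        return "part-of-file error"  # the link now points at the wrong place
    return "ok"


# The link is stored apart from the document it annotates.
link = ExternalLink("note.txt", 8, "Jo Smith", "/people/jo")

original = {"note.txt": "Contact Jo Smith for details."}
edited = {"note.txt": "Please contact Jo Smith for details."}
renamed = {"memo.txt": "Contact Jo Smith for details."}

print(bind(link, original))  # anchor still at the recorded offset
print(bind(link, edited))    # text inserted before the anchor
print(bind(link, renamed))   # the document is no longer known
```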

HTML  links  suffer  from  a  similar  problem,  despite  being  em¬ 
bedded  in  the  data,  since  it  is  not  only  the  source  of  the  link 
(the  “from”  part  of  it)  that  must  be  reconciled  to  its  referent,  but 
also the destination (the “to” part). It is in the latter that HTML
links  are  frequently  wrong,  with  an  estimated  23%  of  web  links 
being  broken  within  a  year  (Lawrence,  Pennock,  Flake,  Krovetz, 
Coetzee,  Glover,  Nielsen,  Kruger,  &  Giles,  2001). 

Broken  links  can  be  frustrating  for  the  user,  resulting  in  loss 
of  interest  in  the  website.  Fortunately,  this  is  one  area  that  has 
seen  ongoing  improvement. 

The  solutions  to  broken  links  can  be  characterised  as  being 
preventative  (creating  infrastructure  or  procedures  that  avoid 
broken links), corrective (correcting broken links where they
are  discovered)  or  adaptive  (never  storing  actual  links,  only  in¬ 
structions  for  making  them  as  required)  (Ashman,  2000). 

From  the  user’s  point  of  view,  it  might  initially  seem  that  pre¬ 
ventative  solutions  are  ideal,  since  the  irritation  of  broken  or 
misdirected  links  will  never  happen.  However,  many  of  the  pre¬ 
ventative  measures  can  only  guarantee  accurate  links  within  a 
limited  scope  and  changes  outside  that  scope  (such  as  an  en¬ 
tire  domain  name  change)  can  still  result  in  broken  links.  Also, 
they  can  be  functionally  limited;  for  example,  it  may  be  impos¬ 
sible  to  guarantee  link  integrity  in  information  that  is  not  part 
of  the  same  preventative  scheme. 

Corrective  solutions  tend  to  be  more  robust,  as  they  assume 
breakage  will  occur  and  have  procedures  in  place  to  correct 
links,  where  possible,  or  to  otherwise  deal  with  them.  These 
procedures  are  sometimes  computations  that  aim  to  discover 
the  new  location  for  the  linked  document.  These  often  function 
as  mass  correction  procedures,  taking  place  at  intervals,  which 
detect  broken  links  and  attempt  to  correct  them,  discard  them, 
or  notify  the  user  of  the  problem.  From  the  everyday  user’s 
point of view, this is a reasonable form of solution, requiring little
or no effort on their part, with breakages often not encountered
by the user. However, it is still possible that the user will
discover a broken link, especially if it is some time since the
most recent correction.
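
A periodic corrective pass of this kind might look like the sketch below. To keep it self-contained it makes two simplifying assumptions: the set `live_pages` stands in for an actual availability check over the network, and the `moved` table stands in for whatever computation discovers a document's new location. All names are hypothetical.

```python
def corrective_pass(links, live_pages, moved):
    """Check every stored link and repair it, keep it, or discard it.

    links      -- dict mapping link id to URL
    live_pages -- set of URLs that currently resolve (stand-in for a real check)
    moved      -- dict mapping old URL to new URL (known relocations)
    Returns the surviving links and a report for notifying the user.
    """
    kept = {}
    report = {"ok": [], "corrected": [], "discarded": []}
    for link_id, url in links.items():
        if url in live_pages:
            kept[link_id] = url
            report["ok"].append(link_id)
        elif moved.get(url) in live_pages:
            kept[link_id] = moved[url]          # repoint to the new location
            report["corrected"].append(link_id)
        else:
            report["discarded"].append(link_id)  # unfixable: drop and report
    return kept, report


links = {"a": "/old/paper.html", "b": "/index.html", "c": "/gone.html"}
live_pages = {"/index.html", "/new/paper.html"}
moved = {"/old/paper.html": "/new/paper.html"}

kept, report = corrective_pass(links, live_pages, moved)
print(kept)
print(report)
```

Any link that breaks after one pass stays broken until the next, which is why a user can still meet a broken link between corrections.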

Also,  the  corrective  approach  may  discard  unfixable  links  that 
the  user  has  previously  required.  This  leaves  the  user  with  the 
knowledge  that  a  link  that  was  once  present  is  now  gone  and 
seemingly  unrecoverable.  Perhaps  a  more  user-friendly  solution 
to  irretrievably  broken  links  is  the  so-called  “soft  404s”  (Bar-Yossef, 
Broder,  Kumar,  &  Tomkins,  2004) — when  pages  go  missing,  those 
pages  are  replaced  by  human-readable  error  messages,  which  es¬ 
sentially  assume  the  identity  of  the  missing  page.  They  frequently 
offer  the  user  the  option  of  a  search  of  the  site,  or  perhaps  redi¬ 
rect  the  user  to  a  new  location.  It  is  estimated  that  25%  or  more 
of  all  dead  links  are  these  soft  404s  (Bar-Yossef  et  al.,  2004). 

The correction of links is very much still a research problem,
with much effort expended on second-guessing the original
link creator’s intentions in making the link. However, if the link
was  originally  created  by  a  computation,  then  the  correction  of¬ 
ten  amounts  to  nothing  more  than  executing  the  computation 
again.  If,  for  example,  the  link  computation  was  to  “link  every 
instance  of  a  person’s  name  to  his  or  her  home  page,”  and  then 
the  name  is  moved  within  various  pages,  or  appears  or  disap¬ 
pears,  the  links  can  easily  be  reinstated  to  their  correct  positions. 

The  adaptive  approach  assumes  that  all  data  is  subject  to 
change,  as  do  the  corrective  solutions.  Instead  of  having  proce¬ 
dures  for  correction  in  place,  it  guarantees  correct  links  by 
never  having  links  in  storage.  The  instructions  to  create  links  are 
stored,  and  links  are  created  as  required  from  them.  This  dy¬ 
namic  approach  was  trialled  in  experimental  systems  that  cre¬ 
ated  links  either  on  user  demand  (Davis,  Knight,  &  Hall,  1994) 
or  as  a  background  process  (Verbyla  &  Ashman,  1994). 
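
As a rough sketch of the adaptive approach, the code below stores only an instruction (a table of names and home pages, following the name-to-home-page example given earlier) and computes the links from the current text whenever they are needed. The names, URLs, and the `compute_links` function are hypothetical and are not taken from the experimental systems cited.

```python
import re

# Stored instruction: link every instance of a person's name to a home page.
# The names and URLs are made up for illustration.
HOME_PAGES = {"Ada Park": "/people/ada", "Li Wei": "/people/li"}


def compute_links(text, home_pages):
    """Return (start, end, target) for each name occurrence in the text."""
    links = []
    for name, url in home_pages.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            links.append((match.start(), match.end(), url))
    return sorted(links)


v1 = "Ada Park met Li Wei."
v2 = "Yesterday, Li Wei met Ada Park and Ada Park agreed."

# No link is ever stored, so an edit cannot leave one pointing at the
# wrong place: the links are simply recomputed from the new text.
for version in (v1, v2):
    for start, end, target in compute_links(version, HOME_PAGES):
        print(version[start:end], "->", target)
```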

The adaptive approach is now evident in everyday web usage;
for  example,  scripts  are  used  to  create  and  serve  web  pages  dy¬ 
namically,  and  often  result  from  a  database  query,  where  links 
are  calculated  and  inserted  at  the  time  of  serving.  However,  it 
cannot  assist  with  links  created  and  maintained  by  other  users 
and  neither  can  the  preventative  or  corrective  approaches. 

In  the  end,  the  major  obstacle  to  link  integrity  in  the  web  is 
its  anarchic  nature.  There  is  no  central  authority  that  can  im¬ 
pose  robust  linking  practice  on  the  mass  of  users;  indeed  the 
failure  of  the  error-tolerant  naming  solutions  such  as  URNs  to 
capture  the  imagination  shows  that  it  is  not  even  possible  to 
tempt  users  into  good  linking  practices,  let  alone  force  them.^^ 
Even  a  direct  plea  from  one  of  the  joint  creators  of  the  web  has 
been  unable  to  achieve  link  robustness  (Berners-Lee,  1998). 
As  long  as  users  continue  to  create  links  without  troubling  to 
maintain  them,  other  users  will  eventually  encounter  them  as 
broken  links.  There  are,  however,  numerous  solutions  that 
when  combined  will  help  keep  the  problem  at  bay,  so  that  users 
may  experience  almost  404-free  browsing. 

^^As evidenced by the lack of recent activity on URNs at the World Wide Web Consortium’s Naming and Addressing page at http://www.w3.org/Addressing/

Personalizing Links

In the previous section, we saw the utility of computation of
links for reducing the number of broken links a user encounters.
In this section, we will see how computed links, with
other  technologies,  allow  the  user  to  personalize  the  links  and 
types  of  links,  regardless  of  the  source,  type,  or  ownership  of 
the  data. 

All  users  at  some  stage  want  to  create  their  own  links  or  an¬ 
notations,  generally  to  record  their  own  associations  between 
data  or  to  ease  everyday  information  access.  In  fact,  easing  in¬ 
formation  access  is  an  obvious  but  frequently  overlooked  pur¬ 
pose  for  links — links  function  as  a  form  of  “user  pull”  of  infor¬ 
mation,  hiding  the  information  while  still  making  its  presence 
known,  but  providing  that  information  on  request  with  the  min¬ 
imum  of  user  effort. 

Users  have  different  needs,  and  an  author  of  web  pages  can¬ 
not  anticipate  all  such  requirements,  let  alone  provide  them. 
Even if all the potentially useful links were provided, not only
would users disagree on the usefulness of the links, but the interface
and performance of the web would suffer.

For  example,  not  every  user  wants  a  dictionary  link,  which 
could  give  a  basic  definition  of  any  word  selected  by  the  user. 
However, nonnative speakers of a language could find such a
link  invaluable.  Glossary  links  are  essential  to  a  reader  not  fa¬ 
miliar  with  technical  terms,  but  become  intrusive  to  seasoned 
readers.  In  each  case,  the  users  want  to  be  able  to  “switch  on 
or  off”  links  to  reflect  their  own  needs. 
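
One way to picture such a switch is as a per-user filter over candidate links. The sketch below is an assumption-laden illustration: the `LinkPreferences` class, the link types, and the sample links are invented, and no existing browser is claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class LinkPreferences:
    """Per-user record of which link types are switched on."""
    enabled: set = field(default_factory=set)

    def switch(self, link_type, on):
        if on:
            self.enabled.add(link_type)
        else:
            self.enabled.discard(link_type)


def visible_links(candidates, prefs):
    """Keep only the candidate links whose type the user has enabled."""
    return [link for link in candidates if link["type"] in prefs.enabled]


# Hypothetical candidate links over one page.
candidates = [
    {"type": "dictionary", "anchor": "ontology", "target": "dict:ontology"},
    {"type": "glossary", "anchor": "URN", "target": "glossary:urn"},
    {"type": "author", "anchor": "home", "target": "/index.html"},
]

learner = LinkPreferences()
learner.switch("dictionary", True)
learner.switch("glossary", True)

expert = LinkPreferences({"author"})

print([link["anchor"] for link in visible_links(candidates, learner)])
print([link["anchor"] for link in visible_links(candidates, expert)])
```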

This  is  easy  with  current  research  technology.  Even  as  far 
back as 1992 (Davis, Hall, Heath, Hill, & Wilkins, 1992), the technology
to provide links over data not owned by the user was
available.  It  was  even  quite  simple  to  create  a  form  of  computed 
link,  for  example,  to  link  every  occurrence  of  a  name  with  a  bib¬ 
liography.  Within  a  short  time,  it  even  became  feasible  not  just 
to  exploit  link  computation  to  automatically  create  one’s  own 
links,  but  to  even  create  one’s  own  link-computation  specifica¬ 
tions,  so  that  exactly  the  right  computation,  pointing  into  the 
right  data  collections,  could  be  easily  linked  at  the  user’s  wish 
(Verbyla & Ashman, 1994).

The  technology  that  supports  personalization  of  links  has 
not  yet  propagated  into  mainstream  web  browsers.  Yet  the  dif¬ 
ferent  solutions  have  been  trialled  in  a  web  context;  for  exam¬ 
ple,  the  Distributed  Link  Service  enabled  individuals  to  make 
their  own  private  link  sets  or  to  contribute  to  their  group’s  col¬ 
lective  link  sets  (Carr,  De  Roure,  Hall,  &  Hill,  1995).  Even  the 
creation  of  one’s  own  link-computation  specifications  was  tri¬ 
alled  in  a  web  environment  (Cawley,  Ashman,  Chase,  Dalwood, 
Davis, & Verbyla, 1995). Technically, it should be a simple matter
to  incorporate  this  feature  into  all  browsers.  Whether  this  hap¬ 
pens  depends  largely  on  whether  users  insist  on  it. 

Surprisingly,  even  the  growing  awareness  of  the  need  for  pri¬ 
vacy  in  online  actions  has  not  yet  motivated  users  to  demand 
an  easy-to-use  personalized  link  facility.  Even  personalized  an¬ 
notations  are  a  minority  facility,  only  feasible  for  the  reasonably 
technically  literate  user.  Being  able  to  record  one’s  personalized 
links  enables  users  to  record  their  private  associations  and  col¬ 
lections,  often  representing  original  intellectual  ideas  or  com¬ 
mercially  valuable  information  without  advertising  it  to  all.  As 
more  of  each  user’s  online  interactions  and  transactions  are  ex¬ 
posed  to  outside  scrutiny,  authorized  or  otherwise,  the  provi¬ 
sion  of  a  feature  with  the  fortunate  side  effect  of  offering  private 
means  for  recording  ideas  ought  to  become  a  priority  for  future 
web  browser  development. 


SEARCHING  AND  QUERYING 


Browsing,  enabled  by  hypermedia  links,  was  the  first  technology 
for  accessing  information  on  the  web.  However,  the  enormous 
success of the web and its rapid uptake by millions of users
quickly rendered browsing alone a manifestly inadequate tool
for  information  access.  Searching  has  now  become  a  major,  per¬ 
haps  the  major,  means  of  locating  information  on  the  web. 

However,  accessing  information  via  search  engines  presents 
difficulties.  First,  users  face  the  problem  of  creating  an  accurate 
description  of  their  requirement.  Second,  users  must  make 
sense  of  the  results  that  the  search  engine  produces.  Solutions 
to  these  difficulties  are  discussed  in  this  section. 


Query  Formulation 

Experts  agree  that  the  typical  user  will  experience  difficulties 
when  attempting  to  formulate  an  effective  query.  Three  signifi¬ 
cant  factors  stand  out: 

1.  Low  user  commitment.  Users  are  reluctant  to  provide  infor¬ 
mation  beyond  the  bare  minimum.  In  fact,  what  emerges  is 
that  search-engine  use  is  characterized  by  short  queries  and 
limited  interaction  coupled  with  unreasonably  high  expec¬ 
tations  (Jansen,  Spink,  &  Saracevic,  2000;  Rolker  &  Kramer, 
1999;  Kobayashi  &  Takeda,  2000): 

Real  people  in  the  real  world,  doing  real  information  seeking  and  in 
a  hurry,  use  web  search  engines  and  give  two-word  queries  to  be  run 
against  billions  of  web  pages.  We  expect,  and  get,  sub-second  re¬ 
sponse  time  and  we  complain  when  there  are  no  relevant  web  pages 
in  the  top  10  presented  to  us  (Browne  &  Smeaton,  2004). 

Users,  it  would  appear,  are  wary  of  commitment.  They  expect 
very  high-quality  results  from  every  search  instance  rather  than 
subscribing  to  the  notion  of  search  as  an  iterative  process. 

2.  Uncertain  information  needs.  Users  often  have  an  incom¬ 
plete  understanding  of  their  information  need,  and  their  ini¬ 
tial  need  will  frequently  mature  during,  and  in  direct  re¬ 
sponse  to,  the  process  of  searching  (Lancaster,  1968): 

Searchers  normally  start  out  with  an  unrefined  or  vague  information 
need which becomes more sharply focused as their search continues
and exposure to information changes their information need
(Browne & Smeaton, 2004).

This  initial  lack  of  clarity  with  regard  to  the  object  of  the 
search readily translates into imprecise query terms, which in
turn beget irrelevant and disappointing results. Some users
quit  the  searching  process  at  a  very  early  stage,  confounded  by 
this  apparent  “failure.” 

3.  Difficulties  in  expression.  Users  may  not  know  the  correct 
syntax  to  frame  their  query  or  the  commands  to  interact  with 
the  search  engine.  They  may  know  in  general  terms  what 
web pages they wish to retrieve, but struggle to find the
query  terms  most  likely  to  identify  them: 

Except  in  special  circumstances,  it  is  difficult  for  a  user  to  ask  an  in¬ 
formation  retrieval  system  for  what  they  want,  because  the  user 
does  not,  in  general,  know  what  is  available  and  does  not  know 
from what it has to be differentiated. (Card, Robertson, & Mackinlay,
1991)

This  is  not  surprising,  as  the  formulation  of  a  “successful” 
query  requires  some  awareness  of  information-retrieval  theory, 
relatively  uncommon  among  typical  users.  Some  of  this  theory 
can  be  guessed  by  observant  users  (who  might  notice,  for  in¬ 
stance,  that  query  terms  that  frequently  occur  in  web  pages 
produce  poor  results),  but  the  remaining  users  must  persevere 
in  an  unfamiliar  environment,  never  knowing  why  their  search 
failed,  or  indeed  why  it  succeeded. 

Relevance  Feedback 

To  counter  some  of  the  problems  just  mentioned,  researchers 
working  with  search  engines  have  developed  strategies  de¬ 
signed  to  improve  the  quality  of  a  submitted  query.  One  of  these 
strategies is relevance feedback (RF):

1.  The  user  submits  an  initial  query  and  the  search  engine 
serves  results. 

2. The user then identifies relevant and nonrelevant web pages
using associated checkboxes, clickable links, radio buttons,
etc. This action supplies the search engine with feedback.

3.  The  search  engine  then  automatically  modifies  the  original 
query  in  response  to  the  feedback.  This  may  involve  adding 
search  terms  to  the  query,  known  as  query  expansion.  It 
may  also  involve  reweighting  of  the  query,  where  informa¬ 
tion  in  the  relevant  and  nonrelevant  set  of  documents  is 
used  to  modify  the  importance  of  various  query  terms. 

4.  The  modified  query  is  run  by  the  search  engine  and  a  new 
set  of  search  results  is  shown  to  the  user. 

5.  This  process  continues  until  the  user’s  information  need  is 
satisfied. 
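
Steps 2 to 4 can be sketched with a Rocchio-style update, a classic relevance feedback formulation from information retrieval that is used here as a stand-in; the text does not say which method any given search engine uses. The weights `alpha`, `beta`, and `gamma` and the sample term vectors are assumed values chosen for illustration.

```python
from collections import Counter


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight and expand a query from user feedback (Rocchio-style).

    query       -- dict mapping term to weight
    relevant    -- list of term-weight dicts the user marked relevant
    nonrelevant -- list of term-weight dicts the user marked nonrelevant
    """
    new_query = Counter({term: alpha * w for term, w in query.items()})
    for docs, sign, coeff in ((relevant, 1, beta), (nonrelevant, -1, gamma)):
        for doc in docs:
            for term, w in doc.items():
                # Move toward the average relevant document and away
                # from the average nonrelevant one.
                new_query[term] += sign * coeff * w / len(docs)
    # Terms pushed to zero or below are dropped from the modified query.
    return {t: round(w, 3) for t, w in new_query.items() if w > 0}


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1, "cat": 1}, {"jaguar": 1, "habitat": 1}]
nonrelevant = [{"jaguar": 1, "car": 1}]

modified = rocchio(query, relevant, nonrelevant)
print(modified)
```

The terms "cat" and "habitat" enter the query (query expansion), the weight of "jaguar" changes (reweighting), and "car" is excluded because it appears only in the nonrelevant set.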

This  technique  is  particularly  successful,  and  it  has  been  re¬ 
peatedly  demonstrated  that  relevance  feedback  can  improve 
the  performance  of  a  search  engine  at  comparatively  little  cost 
to the user or the system (Ruthven & Lalmas, 2003; Harman,
1992b; although see Spink, Jansen, & Ozmutlu, 2000, for contrast).
For example, Koenemann (1996) noted that the:

.  .  .  availability  and  use  of  relevance  feedback  increased  retrieval  effec¬ 
tiveness;  and  increased  opportunity  for  user  interaction  with  and  con¬ 
trol  of  relevance  feedback  made  the  interactions  more  efficient  and 
usable  while  maintaining  or  increasing  effectiveness. 

However,  it  has  also  been  found  that  in  order  for  relevance 
feedback  to  be  effective,  the  user  must  be  offered  a  meaningful 
dialogue  for  participation.  Ruthven  recently  conducted  a  series 
of  RF  experiments  examining  the  performance  of  user-supplied 
feedback measured against an automated equivalent. Commenting
on the relatively low performance of the users, he identified
a failure in the infrastructure supporting the user’s search:

.  .  .  simple  term  presentation  interfaces  are  not  sufficient  in  providing 
sufficient  support  and  context  to  allow  good  query  expansion  decisions. 
Interfaces  must  support  the  identification  of  relationships  between  rel¬ 
evant  material  and  suggested  expansion  terms  and  should  support  the 
development of good expansion strategies by the searcher. (Ruthven,
2003) 

In  summary,  many  search-engine  users  find  the  formulation 
of  an  effective  query  difficult.  Relevance  feedback  offers  an  ef¬ 
fective  solution,  providing  that: 

•  the  users  are  willing  to  commit  to  several  search  iterations 

• a suitable and usable feedback interface is implemented

The  Results  List 

It  is  well  accepted  that  the  results  generated  by  a  web  search  en¬ 
gine  will  frequently  not  satisfy  the  user.  Sometimes  this  negative 
result  will  occur  because  the  search  engine  cannot  find  any 
documents  that  match  the  user’s  information  need.  However, 
it  is  much  more  likely  to  happen  because  the  user  has  submitted 
a  very  broad  query,  resulting  in  too  many  documents.  This  com¬ 
mon  difficulty  has  been  identified  by  one  author  as  the  “abun¬ 
dance  problem,”  occurring  when  “the  number  of  pages  that 
could  reasonably  be  returned  as  relevant  is  far  too  large  for  a  hu¬ 
man  user  to  digest”  (Kleinberg,  1999). 

Many  solutions  to  this  challenge  have  been  suggested,  but 
three  interesting  approaches  are  (a)  community-based  ranking 
algorithms,  (b)  improved  visual  interfaces,  and  (c)  document 
clustering.  We  now  examine  each  approach  in  turn. 

Community-Based Ranking Algorithms

A  search  engine  normally  ranks  a  set  of  web  pages  in  order  of 
the  likelihood  that  they  will  be  relevant  to  the  user’s  informa¬ 
tion  need,  the  document  most  likely  to  be  relevant  appearing 
first.  This  likelihood  of  relevance  is  usually  calculated  using  a  sta¬ 
tistical  measure  related  to  the  occurrence  of  the  query  terms  in 
the  documents  concerned.  Known  as  “term  frequency,”  this 
measurement  is  traditionally  normalized  with  respect  to  docu¬ 
ment  length  and  multiplied  by  a  measure  reflecting  the  speci¬ 
ficity  of  each  term  within  the  document  collection  (Sparck 
Jones,  1972;  Aizawa,  2003;  Salton,  Yang,  &  Yu,  1975). 
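
The weighting just described can be illustrated with a minimal sketch. It assumes whitespace tokenization, a length-normalized term frequency, and a logarithmic inverse document frequency as the specificity measure; the function names `tokenize` and `rank` are hypothetical, and production engines use considerably more refined variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on whitespace (a simplifying assumption)."""
    return text.lower().split()


def rank(query, documents):
    """Return document indices ordered from highest to lowest score."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)

    # Document frequency: the number of documents in which each term occurs.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    scores = []
    for tokens in doc_tokens:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)    # normalized by document length
            idf = math.log(n_docs / df[term])  # specificity within the collection
            score += tf * idf
        scores.append(score)

    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
```

A term that occurs in every document receives a specificity of zero in this sketch, so it contributes nothing to the ranking, which is the intended effect of the specificity factor.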

Document  rankings  dependent  upon  term  frequency  repre¬ 
sent  a  purely  arithmetical  evaluation  of  the  web  pages  con¬ 
cerned.  This  evaluation  can  (and  does)  provide  a  useful  ap¬ 
proximation  of  the  likelihood  of  relevance  to  the  user,  but  is  by 
no means an authoritative measure. The underlying assumption,
that term frequency translates directly into relevance, is
exactly that: an assumption. There is no guarantee that a web
page  containing  many  query  terms  will  be  any  more  relevant 
than  a  second  page  containing  a  lower  number  of  the  same 
terms.  The  latter  may  have  fewer  of  the  important  keywords, 
but  might  be  far  superior  in  other,  less-quantifiable  respects 
(e.g., it may be more concise, better illustrated, superior in style,
easier  to  read,  more  complete  in  its  references,  etc.). 

One  solution  to  this  problem  has  been  to  supplement  the 
rudimentary  rankings  that  can  be  constructed  through  statisti¬ 
cal  observations  with  more  sophisticated  sources  of  informa¬ 
tion.  This  has  led  to  the  development  of  a  general  class  of  rank¬ 
ing  algorithms  that  implement  citation  based  metrics  for 
relevance  scoring  (Garfield,  1972;  Pinski  &  Narin,  1976).  In 
these  algorithms,  the  relative  importance  of  each  web  page  is  a 
function  of  the  number  of  other  web  pages  that  link  to  it  (Brin 
&  Page,  1998).  As  Kleinberg  observed  (1999),  these  “in-links  .  .  . 
encode  a  considerable  amount  of  latent  human  judgment.”  Ac¬ 
cordingly,  a  web  page  that  is  referenced  by  a  good  proportion  of 
its  peers  is  a  natural  candidate  for  high  ranking. 
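
As a rough illustration of the idea, the sketch below ranks pages by a plain count of distinct in-links. This is a deliberate simplification: it is not the weighted scheme of any particular search engine, and the link graph format and the function names `inlink_counts` and `rank_by_inlinks` are assumptions made for the example.

```python
from collections import Counter


def inlink_counts(links):
    """Count in-links per page.

    `links` maps each page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts


def rank_by_inlinks(links):
    """Order pages by descending in-link count, breaking ties alphabetically."""
    counts = inlink_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))
```

For a graph such as `{"a": ["c"], "b": ["c", "a"], "c": ["a"], "d": ["c"]}`, page `c` is referenced by three of its peers and is therefore ranked first.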

Improved  Visual  Interfaces 

Hearst  observed  that  “long  lists  littered  with  unwanted  irrel¬ 
evant  material”  represent  an  unwieldy  and  nonintuitive  method 
for  delivering  search  results  to  the  user  (Hearst,  1997).  Rather 
than  expending  resources  developing  more  sophisticated  rank¬ 
ing  algorithms,  Hearst  asserted  that  the  answer  to  this  problem 
lay  in  shifting: 

.  .  .  the  user’s  mental  load  from  slower,  thought  intensive  processes 
such  as  reading  to  faster,  perceptual  processes  such  as  pattern  recogni¬ 
tion.  It  is  easier,  for  example,  to  compare  bars  in  a  graph  than  numbers 
in  a  list.  Color  is  very  useful  for  helping  people  quickly  select  one  par¬ 
ticular  word  or  object  from  a  sea  of  others. 

The  author  went  on  to  speculate  that  in  the  near  future  ad¬ 
vanced  search  interfaces  may  abandon  the  two-dimensional 
(2D)  page  metaphor  altogether,  adopting  “alternatives  that  al¬ 
low  users  to  see  information  on  the  web  from  several  perspec¬ 
tives  simultaneously.”  A  software  implementation  based  on  this 
very  idea,  known  as  the  Information  Visualizer  (Card,  Robert¬ 
son,  &  Mackinlay,  1991)  has  proved  surprisingly  powerful;  in 
one  experiment  an  organizational  hierarchy  requiring  80  printed 
pages  was  displayed  on  just  one  three-dimensional  (3D)  screen 
(Robertson,  Mackinlay,  &  Card,  1991).  It  seems  clear  that  a  visual 
tool  for  searching  the  web  that  helps  the  user  to  “see”  a  set  of 
search  results  rather  than  just  “read”  them  would  have  consid¬ 
erable  utility. 


Document  Clustering 

Another way of simplifying a set of search results is to generalize.
The process of generalizing a set of results begins with
identifying particular commonalities shared by some members
of that set (e.g., use or nonuse of certain terms, subject matter,
file type, etc.). Web pages determined to be “similar” in some
way are then grouped together under a single category or heading,
and the user is subsequently presented with several coherent
clusters of web pages rather than the traditional list. Provided
the number of clusters is relatively low, this technique
quickly reduces the cognitive load of studying the results,
allowing a user to “skim” rather than read.
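
The grouping step can be sketched as follows. The example assumes that similarity is measured by the overlap of terms in result titles (the Jaccard coefficient) and uses a greedy single-pass strategy; the threshold value and the function names are hypothetical choices for illustration, not a description of any deployed system.

```python
def jaccard(a, b):
    """Proportion of shared terms between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(results, threshold=0.3):
    """Greedy single-pass clustering of result titles.

    Each result joins the first cluster whose seed title it resembles
    closely enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_terms, members) pairs
    for title in results:
        terms = set(title.lower().split())
        for seed_terms, members in clusters:
            if jaccard(terms, seed_terms) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((terms, [title]))
    return [members for _, members in clusters]
```

Given four results for the query "jaguar", two about cars and two about cats, this sketch returns two clusters rather than one undifferentiated list.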


In  a  good  example  of  this  technique,  Truran,  Goulding,  and 
Ashman  (2005)  described  an  attempt  to  solve  the  problem  of 
ambiguity  in  query-based  search.  Lexical  ambiguity  causes  con¬ 
siderable  problems,  complicating  the  already  difficult  task  of 
establishing  what  a  user  is  actually  looking  for.  An  analysis  of  the 
query  log  of  a  very  large  online  encyclopedia  recently  found  that 
1 in 20 queries was largely wasted simply because the user chose
ambiguous  query  terms  (Wen,  Nie,  &  Zhang,  2001). 

Discussing  this  problem,  the  authors  developed  a  sense  cat¬ 
egorization  service  known  as  SENSAI.  This  service  was  remark¬ 
able  in  that  it  was  capable  of  auto-categorizing  a  set  of  search  re¬ 
sults  into  believable  sense  categories  without  a  semantic  analysis 
of  the  web  pages  concerned.  Neither  was  an  external  knowl¬ 
edge  source,  such  as  a  dictionary  or  an  expert  system,  con¬ 
sulted.  Instead,  sense  discrimination  was  the  quiet  background 
product  of  user  consensus;  the  system  studied  and  learned 
from its clientele, utilizing what Fitzpatrick and Dent termed the
“information-consuming habits” of its users (Fitzpatrick & Dent,
1997) to create a human-intuitive overview of the distinguishable
semantics of the query, like documents grouped with like
(Michalski, Stepp, & Diday, 1983).

Summary 

Users  of  a  search  engine  are  frequently  overawed  by  the  num¬ 
ber  of  results  returned  when  a  query  is  submitted,  and  rarely 
proceed  beyond  the  first  page  of  suggestions  (Harman,  1992). 
To  ensure  that  the  most  relevant  documents  appear  on  the 
first  page,  more  sophisticated  ranking  algorithms  have  been 
developed,  which  exploit  the  inherently  self-referential  nature 
of  the  web. 

However,  even  with  these  improvements,  search  engines  are 
still  serving  large  chunks  of  unrefined  information  to  their 
users.  This  seems  likely  to  change  soon.  Search  engines  will  al¬ 
most  certainly  begin  to  abstract  categories  and  significant 
groupings  from  any  given  set  of  results  automatically,  and  vi¬ 
sual  interfaces  that  simplify  the  user’s  task  seem  probable. 


PERSONALIZATION,  PORTALS 
AND  COMMUNITIES 


In  most  circumstances,  one  of  the  major  impediments  to  ef¬ 
fective  use  of  the  web  is  the  disorientation  that  results  from 
its sheer size. Statistics are commonly quoted as to just how
vast the web is, with one recent estimate being 11.5 billion
pages (Gulli et al., 2005). This, no doubt, will rapidly become
out of date, but the actual figures are not important from a
user’s  perspective. 

From  the  perspective  of  end  users  (i.e.,  content  consumers), 
it  is  increasingly  difficult  to  find  what  they  are  looking  for  amid 
the  sea  of  inappropriate  material.  From  the  perspective  of  au¬ 
thors  (i.e.,  content  providers),  it  is  becoming  increasingly  diffi¬ 
cult  to  prevent  their  pages  from  being  lost  in  the  crowd.  In  this 
section  we  will  examine  approaches  of  personalization,  portals, 
and  community  support  that  are  becoming  widely  used  to  ad¬ 
dress  these  issues. 

Although personalization, portals, and communities are
quite  distinct  approaches  to  web  design,  they  are  all  aimed  at 
producing  an  end-user  experience  that  is  uniquely  appropriate 
for  each  individual.  This  is  what  might  be  called  “my  web.” 
Personalization  is  where  the  contents  of  websites  are  changed 
(either  transparently  or  under  user  control),  so  that  the  mater¬ 
ial  is  appropriate  for  the  needs  and  requirements  of  the  indi¬ 
vidual.  This  involves  a  wide  range  of  techniques  of  varying  de¬ 
grees  of  sophistication,  from  a  user-selectable  color  scheme  at 
its  simplest  through  to  an  adaptive  content-management  system 
based  upon  a  comprehensive  user  model.  Such  systems  are 
commonly  used  for  e-commerce  websites,  and  have  been  used 
experimentally  for  e-learning.  Portals  are  gateways  to  the  web 
that  aim  to  make  its  vastness  more  accessible  by  providing  an 
appropriate  set  of  links  and  default  paths  (usually  they  are  con¬ 
tinually  or  at  least  regularly  updated).  These  may  be  personal¬ 
ized  at  the  level  of  individuals,  an  organization,  a  special  inter¬ 
est,  or  some  combination  of  these.  Where  the  personalization  is 
operating  at  the  level  of  organizations,  special  interests,  or  other 
groups,  there  is  an  opportunity  for  the  personalized  experience 
to  be  shared  by  a  community.  A  major  change  in  web  design  in 
recent  years  has  been  that  there  are  now  a  variety  of  tools  and 
techniques  to  both  support  community  interaction  on  the  web 
and  leverage  the  power  of  collaborative  effort. 

Personalization 

Until  quite  recently,  the  vast  majority  of  web  content  has  been 
designed  for  the  needs  of  content  providers  rather  than  end 
users.  In  terms  of  a  communication  model,  a  traditional  static 
HTML  site  has  a  lot  in  common  with  a  poster  on  a  billboard.  It 
offers  the  same  message  to  all  passersby  regardless  of  whether 
or  not  it  is  appropriate  to  them.  The  only  reason  that  this  model 
is  effective,  on  either  a  billboard  or  the  web,  is  that  there  are 
(or at least there can be) many passersby. Modern websites frequently
consist of a lot more than static HTML; they are often
dynamic  front-ends  for  a  wide  range  of  resources  and  services. 
However,  their  communication  model  is  usually  declarative — 
with  a  “one-size-fits-all”  approach  to  content  that  treats  the  user 
as  a  hypothetical  “average.” 

This  state  of  affairs  has  recently  begun  to  change,  with  the  ad¬ 
vent  of  websites  that  personalize  content  for  each  end  user.  One 
of  the  best  known  examples  of  this  is  Amazon.com,  which  stores 
information  about  the  customers’  interests — gleaned  from  vari¬ 
ous  sources — to  generate  a  personalized  home  page  and  suggest 
items  that  are  likely  to  be  of  interest.  Adaptive  hypermedia  (AH) 
(Brusilovsky,  2001)  is  an  academic  discipline  that  is  dedicated  to 
bringing  personalization  to  the  web.  Most  applications  of  AH 
techniques are either simple e-commerce applications, such as
that just described, or are in e-learning. Adaptive educational
hypermedia (AEH) systems (Brusilovsky, 2001b) have arisen as a logical
progression of using the web for teaching and learning. In traditional
education, the experience is very adaptive. A student who
doesn’t  understand  a  question  will  ask  the  teacher  for  an  expla¬ 
nation.  If  the  student  doesn’t  understand  the  answer,  then  the 
teacher explains it in different terms, maybe by simplifying,
using an example, or drawing a picture. Exactly how the teacher
explains it depends upon the needs of the student, so that the
content is adapted for the individual student. Static web pages
generally give the same experience to all students regardless of how
effective this may be; good human teachers provide a customised
experience for individuals. The ultimate goal of AEH systems
is to provide this capability to web-based learning. So far,
most  such  systems  are  experimental,  but  they  are  important  be¬ 
cause  the  majority  of  AH  research  has  been  done  in  this  area.  See 
Jameson  (in  this  book)  for  more  detail  on  adaptive  interfaces. 

Adaptive  and  Adaptable  Systems 

Two quite distinct approaches to the personalization of hypermedia
are the adaptive and the adaptable (Patel & Kinshuk, 1997;
Wadge & schraefel, 2001). Adaptive systems dynamically
organize  their  contents  to  meet  the  perceived  needs  of  the  user. 
This  process  is  transparent,  and  does  not  require  any  direct  in¬ 
tervention  by  the  user.  Adaptable  systems  make  changes  to  the 
content  (or  more  commonly  the  user  interface)  only  as  a  result 
of  the  explicit  intervention  of  the  user. 

Adaptable systems, such as iWeaver (Wolf, 2002), gather information
explicitly from the user. This is typically either in the form
of  some  sort  of  “control-panel”  setting,  or  a  questionnaire  (com¬ 
monly  completed  as  part  of  a  user  registration  process)  to  elicit 
a  set  of  preferences  from  the  user.  These  preferences  may  be 
used to determine various aspects of the user interface (e.g.,
color schemes, screen layout, etc.), or they may be about the
background or perceived needs of the user (e.g., knowledge or
areas of interest).

Adaptive systems gather information implicitly by monitoring
the behavior of the user (Moore, Stewart, Zakaria, & Brailsford,
2003). The navigation of the user through the system may
be monitored (e.g., the links followed) or a quiz may be used
to attempt to deduce information about users (e.g.,
their abilities). The distinction between this and an adaptable
system  is  that  the  adaptable  system  is  asking  users  what  they 
want;  whereas  the  adaptive  system  attempts  to  deduce  what  the 
user  needs  (not  necessarily  the  same  thing)  and  provide  it  trans¬ 
parently.  Adaptive  hypertext  is  really  a  type  of  “attentive  system” 
(Maglio, Barrett, Campbell, & Selker, 2000), which is a system
that pays attention to the user’s behavior, so that it can
respond  appropriately.  Because  there  is  always  a  risk  of  user  be¬ 
havior  being  misinterpreted  by  the  system,  attentive  systems 
should  not  be  used  for  critical  information  (Maglio  et  al.,  2000). 
The  adaptation  may  well  improve  the  usability  and  appropri¬ 
ateness  of  a  website,  but  it  is  important  that  any  essential  infor¬ 
mation  is  available  independently  of  any  adaptation. 

Although  these  two  approaches  to  personalization  are  quite 
distinct,  it  is  possible  for  a  system  to  be  both  adaptive  and 
adaptable.  Amazon.com  is  a  good  example  of  this  because  it 
acquires  information  from  users  both  implicitly  and  explicitly. 
It uses information based upon purchases and follow-ups of
search results or suggestions (i.e., click-through), all of which
is implicitly derived from user behavior (i.e., from this point of
view it is adaptive). However, it also allows users to explicitly
rate items, and it uses these ratings as a basis for future suggestions
(i.e., from this point of view it is adaptable). AEH systems
also can use both methods, for example WHURLE (Brailsford,
Ashman, Stewart, Zakaria, & Moore, 2002).
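
A system that is both adaptive and adaptable can be sketched as a profile holding two kinds of evidence. The class below is a hypothetical illustration only: the names, the five-point scale, and the rule that explicit ratings override inferred interest are all assumptions, and nothing here describes how any real e-commerce site is implemented.

```python
from collections import Counter


class UserProfile:
    """Holds explicit (adaptable) and implicit (adaptive) evidence about a user."""

    def __init__(self):
        self.ratings = {}              # explicit: item -> rating given by the user
        self.click_counts = Counter()  # implicit: category -> observed clicks

    def rate(self, item, stars):
        """Adaptable side: the user states a preference directly."""
        self.ratings[item] = stars

    def record_click(self, category):
        """Adaptive side: the system observes behavior without asking."""
        self.click_counts[category] += 1

    def score(self, item, category):
        """Estimate interest on a 0-5 scale; an explicit rating takes precedence."""
        if item in self.ratings:
            return float(self.ratings[item])
        total = sum(self.click_counts.values())
        if total == 0:
            return 0.0
        return 5.0 * self.click_counts[category] / total
```

Giving explicit ratings precedence is one possible design choice; it reflects the view that a stated preference should not be silently overridden by an inference from behavior.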

Need or Desire?

Adaptive hypertext systems base their content delivery upon
what the system perceives to be a user’s need. Adaptable systems,
on the other hand, use information obtained directly from
the user, which in most cases will represent that user’s desire.
Since  need  and  desire  are  not  necessarily  the  same  thing,  there 
is  potential  for  a  conflict  of  interests.  When  considering  adaptive 
systems,  the  underlying  assumption  is  that  the  system  knows 
better  than  the  user  what  is  most  appropriate.  Although  this 
sounds  disempowering  (and  if  implemented  badly  it  can  be), 
there  are  many  circumstances  where  this  view  is  justified,  par¬ 
ticularly  if  the  user  doesn’t  understand  the  ramifications  of  a 
particular  choice  at  a  decision  point.  For  example,  in  educa¬ 
tional  systems  learners  may  not  always  have  all  of  the  facts  or 
the  pedagogic  background  necessary  to  take  control  of  their 
own  teaching  regime.  In  “real-life”  education,  this  is  why  we 
have  teachers  rather  than  just  resources.  Consider,  for  instance, 
a  student  who  has  problems  with  the  math  in  a  subject  area  that 
makes  extensive  use  of  math,  although  it  is  not  the  primary 
topic  of  study  (engineering  or  computer  science  would  be  good 
examples  here).  Given  the  choice,  many  such  students  would 
prefer  fewer  math  problems;  however,  what  they  probably  need 
are  more  (and  better  explained)  math  problems.  An  adaptive 
system that “knows best” would thus be capable of transparently
providing remedial exercises (as would a real-life teacher
who spotted the problem). However, an adaptable system runs
the  risk  that  the  student  might  choose  to  avoid  the  parts  that 
they  find  the  most  onerous. 

Implementing  this  “system-knows-best”  paradigm  without 
due  care  creates  the  potential  for  serious  problems  in  an  adap¬ 
tive  system.  It  is  quite  likely  to  result  in  frustration  or  anger  if 
users  realize  that  the  system  is  going  against  their  desires  or  if 
the  system’s  assumption  of  need  is  incorrect — particularly  a 
problem  with  a  poorly  thought-out  or  incomplete  user  model 
(see  “User  Modeling”).  Well-designed  adaptive  systems  do  not 
take  control  away  from  the  user;  rather  they  provide  sugges¬ 
tions  (albeit  sometimes  strong  ones)  as  to  the  most  appropri¬ 
ate  path.  This  is  then  a  user  interface  issue,  and  it  may  take  the 
form of reordering content or suggesting what content is considered
most appropriate (a common mechanism is to
change link color).

Methods  and  Models  of  Adaptation 

There  are  two  main  approaches  to  implementing  adaptation 
on  the  web,  at  the  level  of  either  content  (i.e.,  adaptive  presen¬ 
tation)  or  linking  (i.e.,  adaptive  navigation  support).  Within 
each  of  these,  there  are  a  number  of  techniques  used  that  are 
described  briefly  here  but  are  reviewed  in  detail  elsewhere 
(Brusilovsky,  1996). 

The  majority  of  adaptive  presentation  operates  on  text, 
although  in  principle  any  media  type  may  be  adapted.  Adapta¬ 
tion  of  modality  (Brusilovsky,  1996)  is  where  the  adaptation  op¬ 
erates  at  the  level  of  choices  between  different  media  types 
(e.g., text, speech, animation, etc.). Although this will likely become
important in the future, multimedia adaptation is currently
very rarely implemented (one of the few examples is
Fagerjord, 2005). There are a number of ways in which the
adaptive  presentation  of  text  has  been  implemented,  usually 
by  manipulating  fragments  of  predefined  text.  Some  systems 
add  fragments  to  a  standard  minimum  body  of  text;  others  re¬ 
move  fragments  that  the  system  deems  inappropriate.  When 
fragments  are  removed  they  may  be  completely  hidden,  or  they 
may  be  merely  dimmed.  Another  variant  of  fragment  removal 
is  to  use  “stretchtext”  (a  body  of  text  is  hidden  behind  a  single 
word  or  phrase  and  expanded  upon  request).  Instead  of  adding 
or  removing  fragments  of  text,  some  systems  simply  reorder  the 
fragments.  A  major  problem  with  all  forms  of  adaptive-text  pre¬ 
sentation  is  the  impact  that  it  has  upon  narrative  flow.  Differ¬ 
ent  users  will  necessarily  have  different  narratives,  which  isn’t 
a  problem  per  se  other  than  the  fact  that  it  is  possible  (or  even 
likely)  that  a  web  page  will  no  longer  form  a  coherent  whole  if 
its  text  is  manipulated.  This  problem  may  be  ameliorated  by 
writing text in the form of conceptually discrete “atoms” (Brailsford,
Stewart, Zakaria, & Moore, 2002), but that causes quite serious
problems with authoring (most people find it difficult to
write  following  this  sort  of  constraint)  and  makes  legacy  con¬ 
tent  difficult  to  adapt. 
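
The fragment-removal variants described above (hiding, dimming, and stretchtext) can be sketched in a few lines. The example assumes that each fragment carries a single numeric suitability level, which is a hypothetical simplification; the `[dim]` and `[more...]` markers merely stand in for whatever the user interface would actually render.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    min_level: int = 0  # hypothetical: lowest knowledge level the fragment suits


def present(fragments, user_level, mode="hide"):
    """Render predefined text fragments for one user.

    mode="hide":    unsuitable fragments are removed completely.
    mode="dim":     unsuitable fragments are kept but marked as dimmed.
    mode="stretch": unsuitable fragments collapse to an expandable placeholder.
    """
    if mode not in ("hide", "dim", "stretch"):
        raise ValueError(f"unknown mode: {mode}")

    output = []
    for fragment in fragments:
        if user_level >= fragment.min_level:
            output.append(fragment.text)
        elif mode == "dim":
            output.append(f"[dim]{fragment.text}[/dim]")
        elif mode == "stretch":
            output.append("[more...]")
        # mode == "hide": the fragment is simply omitted
    return output
```

Because every fragment is selected independently, nothing in this sketch guarantees that the surviving fragments read as a coherent whole, which is the narrative-flow problem noted above.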

Adaptive  navigation  support  has  fewer  problems  with  narra¬ 
tive  flow,  because  links  are  (usually)  less  fundamental  to  the 
sense  of  a  page  than  is  text.  Adaptive  linking  has  been  divided 
into  five  main  categories  (Brusilovsky,  1996):  direct  guidance, 
link  sorting,  link  hiding,  link  annotation,  and  map  adaptation. 
With direct guidance, the system makes a recommendation as
to  the  next  node  to  visit,  based  upon  the  adaptation  criteria. 
This  is  often  used  for  a  “guided-tour”  type  of  direction  for 
novice  users  (i.e.,  it  is  tantamount  to  a  linear  sequence  of  “Next” 
buttons).  With  link  sorting,  the  ordering  of  links  is  changed  so 
that  the  most  appropriate  one  is  at  the  head  of  the  list  or  in  a 
particularly  prominent  place.  Inappropriate  links  are  not  dis¬ 
played  in  link  hiding.  This,  of  course,  can  make  parts  of  the  sys¬ 
tem  inaccessible  to  some  users — which  may  or  may  not  be  ac¬ 
ceptable.  A  variant  of  link  hiding  is  link  dimming,  where  it  is  still 
possible  to  follow  the  link,  but  it  is  less  prominent  in  the  user  in¬ 
terface  than  more  relevant  links.  In  a  link-annotation  system, 
visual  clues  are  attached  to  links  to  give  users  some  idea  of  their 
status  with  respect  to  the  system;  for  example,  font  sizes  or  col¬ 
ors  can  be  used  to  provide  a  gradation  of  relevance.  Lastly,  in 
map-adaptation  systems  a  navigational  map  is  generated  ac¬ 
cording  to  the  adaptation  rules. 
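
Four of these five categories can be illustrated with a short sketch (map adaptation is omitted because it depends on a navigational map structure not described here). The relevance scores, the threshold, and the function name `adapt_links` are assumptions made for the example; annotation is represented by a trailing asterisk rather than by font size or color.

```python
def adapt_links(links, relevance, technique, threshold=0.5):
    """Adapt a list of link labels for one user.

    `relevance` maps a label to a score between 0 and 1; labels that are
    missing from the mapping are treated as having a score of 0.
    """
    def score(label):
        return relevance.get(label, 0.0)

    if technique == "direct_guidance":
        # Recommend the single most relevant next node.
        return [max(links, key=score)] if links else []
    if technique == "sorting":
        # Most appropriate link at the head of the list.
        return sorted(links, key=score, reverse=True)
    if technique == "hiding":
        # Inappropriate links are not displayed at all.
        return [label for label in links if score(label) >= threshold]
    if technique == "annotation":
        # Attach a visual cue to relevant links; all links remain usable.
        return [f"{label} *" if score(label) >= threshold else label
                for label in links]
    raise ValueError(f"unknown technique: {technique}")
```

Note that only the hiding technique removes links from the result, which corresponds to the accessibility concern raised in the text.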

User  Modeling 

In  order  to  be  able  to  do  any  adaptation,  adaptive  systems 
need  to  know  who  is  using  them  and  store  information  about 
that  person.  This  information  is  stored  in  a  “user  profile”  either 
on  the  local  machine  or,  far  more  commonly,  in  a  centralized 
database.  In  the  user  profile,  as  well  as  personal  information 
(name,  ID  etc.),  the  criteria  that  the  system  will  use  for  adapta¬ 
tion  are  stored  and  constitute  the  user  model.  Although  a  de¬ 
tailed  discussion  of  user  modelling  is  beyond  the  scope  of  this 
chapter,  the  basic  principles  are  very  simple.  A  user  model  is  sim¬ 
ply  a  set  of  rules  that  adaptive  systems  use  to  make  decisions 
about  relevance.  One  approach  to  user  modelling  is  that  of 
stereotype  models  (Kaplan,  Fenwick,  &  Chen,  1993).  With  these, 
users are grouped together with other like-minded individuals.
For  example,  “second-year  chemistry  undergraduates”  or  “peo¬ 
ple  who  like  cats”  are  both  stereotypes  and,  as  such,  it  is  possible 
to  make  reasonable  assumptions  about  members  of  these 
groups (e.g., “will know the structure of butane” or “will appreciate
photos of cute kittens” respectively). The other main user-modeling
methodology is the overlay model (Fischer, Mastaglio,
Reeves,  &  Rieman,  1990),  where  the  individual’s  knowledge, 
background,  and  any  other  relevant  data  are  overlaid  onto  the 
sum  of  all  knowledge  in  the  system.  Usually,  adaptive-hyperme¬ 
dia  systems  adopt  one  of  these  two  models.  However,  they  do 
not  have  to  be  mutually  exclusive,  as  it  is  possible  to  combine  the 
two  approaches  so  that  any  given  user  is  initially  stereotyped. 
Then,  as  the  system  learns  more  about  them,  it  uses  an  overlay 
model  to  record  their  progress  (Zakaria  &  Brailsford,  2002). 
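
The combined approach can be sketched as follows: a new user is initialized from a stereotype, and an overlay on the domain's concepts is then updated as the system learns more. The stereotype table, the concept names, and the class `UserModel` are hypothetical and serve only to make the two models concrete.

```python
# Hypothetical stereotype table: group name -> concepts assumed to be known.
STEREOTYPES = {
    "second-year chemistry undergraduate": {"structure of butane"},
}


class UserModel:
    """Overlay model seeded from a stereotype."""

    def __init__(self, domain_concepts, stereotype=None):
        self.domain = set(domain_concepts)
        # Initial stereotyping: assume what members of the group usually know,
        # restricted to concepts that actually exist in this domain.
        self.known = self.domain & STEREOTYPES.get(stereotype, set())

    def learn(self, concept):
        """Overlay update: record that the user now knows a domain concept."""
        if concept not in self.domain:
            raise KeyError(concept)
        self.known.add(concept)

    def unknown(self):
        """Concepts in the domain that the user is not yet believed to know."""
        return self.domain - self.known
```

An adaptive system could consult `unknown()` when deciding which content to recommend next, although how that decision is made is outside the scope of this sketch.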

User  modeling  is  critical  for  a  usable  adaptive  system.  If  the 
user  model  is  in  some  way  flawed  (i.e.,  if  the  information  stored 
does  not  match  the  reality  of  users’  background  and  behavior), 
then  content  is  likely  to  be  maladapted  for  a  user  and  the  system 
is  often  extremely  difficult  to  comprehend.  This  is  compounded 
by  the  fact  that  there  are  no  standards  for  user  modelling,  and 
most  adaptive  systems  use  their  own  proprietary  user  model. 

Authoring 

To  date  there  are  few,  if  any,  mature  authoring  systems  for 
personalized  web  systems  (AHA!  (De  Bra,  Stash,  &  Smits,  2004) 
is  one  of  the  more  mature  authoring  systems).  This  is,  no  doubt, 
because  most  adaptive  systems  are  either  experimental  or  pro¬ 
prietary,  and  in  both  cases  the  authors  are  either  members  of,  or 
have  direct  access  to,  the  development  team.  However,  there 
are  a  number  of  potentially  difficult  issues  that  need  to  be  con¬ 
sidered  when  attempting  to  design  authoring  interfaces  for 
adaptive  web-based  systems  (Cristea,  Carro,  &  Garzotto,  2005). 
First,  there  is  a  need  for  a  new  design  paradigm,  together  with 
associated metaphors, because personalization breaks the fundamental
design paradigm of the web, that of the “page.” When
authoring  a  conventional  web  page,  the  author  concentrates 
on  content,  and  can  reasonably  assume  that  users  will  see  the 
page  as  it  is  created.  Hence,  most  authoring  systems  for  static 
web pages use a conventional WYSIWYG paradigm, which is
quite  similar  to  that  of  a  word  processor.  In  the  case  of  adap¬ 
tive  systems,  however,  the  authoring  process  requires  a  separa¬ 
tion  of  content  authoring  from  the  authoring  of  the  parameters 
for  adaptation.  Depending  upon  the  adaptation  model  that  is 
used,  what  the  author  sees  might  be  somewhat  or  completely 
different  from  what  any  given  end  user  sees. 

Recently,  research  has  started  to  address  these  issues,  and 
frameworks  (such  as  LAOS,  Cristea,  2005)  have  been  developed 
for  adaptive  educational  systems  that  allow  the  separation  of 
different  aspects  of  the  authoring  process.  The  LAOS  frame¬ 
work  has  five  layers,  which  result  in  different  authors  being  able 
to  create  different  sections  or  aspects  of  adaptive  content.  For 
example, there is a “goals & constraints” model where a pedagogic
expert can use the content from the “domain model” to
structure  a  lesson  that  makes  use  of  a  “user  model,”  “adaptation 
model,”  and  a  “presentation  model.”  This  layered  structure  not 
only  decreases  individual  author  load,  but  also  encourages  the 
reuse  of  existing  materials  (whether  they  are  domain  content, 
adaptation  rules,  or  presentation  settings).  The  great  benefit 
of  this  approach  is  that  it  can  combine  content  created  by  dif¬ 
ferent  authors  working  at  different  levels.  However,  that  is  also 
its  weakness  in  that  care  must  be  taken  to  avoid  inappropriate 
use  of  content  originally  intended  for  a  different  context.  Adap¬ 
tive  authoring  systems  are  very  much  in  their  infancy,  but  al¬ 
though  there  are  no  mature  systems,  the  current  research  is  at 
least  starting  to  discover  the  issues  and  problems  involved  with 
their  design. 

Portals 

A  common  way  to  gain  a  personalized  experience  of  the  web  is 
by  the  use  of  portals.  These  are  gateways  to  online  resources 
that are, at their most basic, simply a set of links that provide
a starting point for web use. Usually, though, they are highly
sophisticated systems that are continually updated and highly
personalizable. Portals are hugely popular (at the time of writing,
four of the top 10 most browsed English-language websites are
portals), and they are often presented as a way for users to create
“their space” on the web. They are usually adaptable, allowing
for explicit selections of user interface options or content
appropriate for specific hardware platforms (e.g., low-graphics
content for PDAs), although some portals also use techniques of
adaptivity (q.v.).

Many  portals  have  their  roots  in  either  Internet  directories  or 
search  engines,  and  therefore  a  common  model  of  a  portal  user 
interface  is  that  of  a  categorized  front  page  (i.e.,  a  “list  of  lists”). 
Some  portals  are  essentially  starting  points  for  a  collection  of  re¬ 
sources — typically  these  are  intranet  systems  where  the  facilities 
determined for a particular user are presented on login (e.g.,
different staff or student resources in a university system). Some
portals use semantic web technology to categorize their contents
(e.g., Ontoportal).

Communities 

The  Internet  has  its  roots  in  communities,  in  creating  links  and 
bridging  geographic,  cultural,  or  temporal  barriers.  Communi¬ 
ties  on  the  web  are  often  supported  by,  or  generated  through, 
portals,  which  provide  starting  points  for  people  of  similar  in¬ 
terests.  Not  only  do  these  provide  channels  through  which 
users  can  navigate  to  find  resources,  but  in  doing  so  they  can 
bring  about  a  shared  cultural  space.  Portals,  especially  those  that 
are  intranet  based,  are  often  designed  to  support  communities. 

Part of the original vision of computer pioneers such as Nelson
(1982) was that people could easily publish, edit, and comment
on, as well as read, publicly available materials, and this is
now starting to become a reality on the modern web. Weblogs
(more commonly referred to as blogs) allow a user to post material,
which can then be cited, commented upon, and redistributed
throughout the Internet. These blogs require no special
knowledge of authoring and are usually accessed with a
standard web browser.

Another technology that is important in the development of
communities on the web is the Wiki, a community website
designed to facilitate collaborative authoring. The most
spectacular example of this is Wikipedia, a vast encyclopaedia
written collaboratively by volunteers using a Wiki. This project
was founded in 2001, and in 2004 it contained over a million
articles in more than 100 languages. The system provides
facilities to discuss changes made to articles, and authors may
vote on controversial issues, the editorial assumption being
that the exposure of material to many users will result in the
addition of depth and elimination of errors.


CONCLUSIONS 


In  this  chapter  we  have  considered  HCI  in  a  specific  web  context, 
looking  at  the  issues  impeding  smooth  user  interaction  with  web 
tools and documents. While some problems remain unsolved, the
usability of the web has generally improved greatly since its
early days, and there are at least clear research directions
indicating potential solutions to the remaining challenges.


INTRODUCTION


Computers  can  assist  decision  makers  in  a  variety  of  different 
ways.  They  can,  for  instance,  provide  improved  access  to  infor¬ 
mation  or  more  informative  displays  of  this  information,  or 
support  more  effective  forms  of  communication.  They  can  also 
use  algorithms  to  actively  monitor  situations  and  to  generate  in¬ 
ferences  in  order  to  assist  with  tasks  such  as  planning,  diagnosis, 
and  process  control.  This  chapter  focuses  on  interaction  design 
issues  (Preece,  Rogers,  &  Sharp,  2002;  Scott,  Roth,  &  Deutsch, 
2005)  associated  with  this  latter  role,  in  which  the  software  uses 
numerical  computations  and/or  symbolic  reasoning  to  serve  as 
an  active  decision-support  system  or  DSS  (Turban,  Aronson,  & 
Liang,  2004). 

The  chapter  begins  with  a  focus  on  design  methods,  empha¬ 
sizing  the  value  of  concrete  scenarios  and  use  cases  (Bittner, 
2002;  Cockburn,  2000;  Kulak  &  Guiney,  2003)  to  define  the  de¬ 
sign  problem,  and  the  use  of  cognitive  task  analyses  and  work- 
domain  analyses  to  guide  both  the  development  of  the  under¬ 
lying  functions  in  the  DSS  and  the  design  of  the  interface  that 
mediates  the  interactions  of  the  user  with  these  functions. 

The  chapter  then  proceeds  to  discuss  aspects  of  human  per¬ 
formance  relevant  to  the  design  of  DSSs.  This  includes  a  dis¬ 
cussion  of  factors  that  limit  human  performance,  as  these  fac¬ 
tors  suggest  opportunities  for  improving  performance  through 
the  introduction  of  a  DSS.  This  discussion  also  includes  consid¬ 
eration  of  the  influence  of  a  DSS  on  the  user’s  cognitive 
processes,  emphasizing  the  need  to  think  in  terms  of  the  resul¬ 
tant  joint  cognitive  system  that  is  not  a  simple  combination  of 
the  capabilities  of  the  two  agents  (user  and  DSS)  alone  (Hoc, 
2000; Hollnagel & Woods, 2005; Jones & Jacobs, 2000).

The  discussion  further  emphasizes  the  need  to  consider 
more  than  just  the  surface  interactions  of  the  user  with  the  in¬ 
terface  to  the  DSS,  and  to  consider  the  impact  of  the  broader 
task  context  on  performance  (Amalberti,  1999;  C.  Miller,  Pelican, 
& Goldman, 2000; Parasuraman, 2000; Rasmussen, Pejtersen, &
Goldstein, 1994). This latter emphasis suggests that the design
of a DSS needs to be viewed from the perspective of cooperative
problem solving, where the computer and the person interact
with and influence each other (Beach & Connolly, 2005;
Hoc, 2000; Jones & Mitchell, 1995; Larson & C. Hayes, 2005;
Mital & Pennathur, 2004; Parasuraman, Sheridan, & Wickens, 2000;
P. J. Smith, McCoy, & Layton, 1997). From this viewpoint, the
underlying  functionality  is  just  as  important  as  the  surface-level 
representation  or  interface,  as  the  overall  interaction  deter¬ 
mines  ultimate  performance. 

The  chapter  also  frames  the  design  and  use  of  a  DSS  as  a 
form  of  cooperative  work  between  several  individuals  (Bowers, 
Salas,  &  Jentsch,  2006;  Grabowski  &  Sanborn,  2003;  Hutchins, 
1990,  1995;  Orasanu  &  Salas,  1991;  G.  M.  Olson  &  J.  S.  Olson, 
1997; Rasmussen, Brehmer, & Leplat, 1991; P. J. Smith, Billings,
et  al.,  2000),  with  the  computer  as  the  medium  through  which 
they  cooperate.  This  teamwork  may  involve  system  users  who 
are  physically  or  temporally  distributed.  In  addition,  the  discus¬ 
sion  further  suggests  that  it  is  useful  to  think  of  the  design  team 
as  working  cooperatively  with  the  users,  trying  to  communicate 
with  them  and  extend  or  augment  their  capabilities,  and  doing 
so  through  the  software  and  other  artifacts  (such  as  paper  doc¬ 
uments)  that  they  have  developed.  This  perspective  is  useful  as 
a  reminder  that  we  need  to  consider  the  psychology  of  the  de¬ 
signers  as  well  as  the  psychology  of  the  users  of  systems,  apply¬ 
ing  that  understanding  to  help  us  take  advantage  of  the  strengths 
of  both  groups  as  they  work  cooperatively. 

Thus,  this  chapter  approaches  the  design  of  DSSs  from  four 
perspectives.  The  first  emphasizes  the  need  to  identify  effec¬ 
tive  approaches  to  support  the  design  process,  including  the  ap¬ 
plication  of  use  cases,  cognitive  task  analyses,  and  work-domain 
analyses.  The  second  is  to  provide  a  discussion  of  the  human 
factors  considerations  relevant  to  the  design  of  a  DSS  (He- 
lander,  Landauer,  &  Prabhu,  1997;  Meister  &  Enderwick,  2002; 
Wickens,  J.  Lee,  Liu,  Becker,  &  Gordon,  2004).  The  third  is  to 
provide  very  practical  lists  of  questions  that  need  to  be  asked 
as  part  of  the  design  process.  The  fourth  is  to  use  case  studies  to 
provide  concrete,  detailed  illustrations  of  how  these  considera¬ 
tions  can  be  addressed  in  specific  application  contexts. 


APPROACHES TO DESIGN


A  number  of  complementary  and  overlapping  approaches  have 
been  articulated  to  help  ensure  adequate  consideration  of  user 
needs,  preferences,  and  capabilities  during  the  design  process, 
and  to  gain  insights  into  possible  functionality  to  incorporate  in 
a  DSS.  These  include  the  use  of  participatory  design  (Bodker, 
Kensing,  &  Simonsen,  2004;  Schuler  &  Namioka,  1993),  scenario- 
based  design  (Carroll,  1995),  and  the  completion  of  various 
forms  of  needs  assessments  (Tobey,  2005;  Witkin  &  Altshuld, 
1995),  cognitive  task  analyses  (CTAs)  and  cognitive  walkthroughs 
(Crandall,  Klein,  &  Hoffman,  2006;  Jonassen,  Tessmer,  &  Han- 
num,  1999;  Kirwan  &  Ainsworth,  1992;  Rubin,  Jones,  &  Mitchell, 
1988; Schraagen, Chipman, & Shalin, 2000). They also include
various forms of domain analyses aimed at identifying the critical
features and constraints that characterize the domain of interest.

Previous  chapters  have  discussed  the  use  of  participatory 
and  scenario-based  design  as  approaches  to  the  design  process. 
The  focus  of  this  section  will  therefore  be  on  various  forms  of 
cognitive  task  and  domain  analyses.  However,  it  should  be 
noted  that  the  development  and  results  of  a  cognitive  task 
analysis  can  in  general  be  very  useful  in  helping  to  support  par¬ 
ticipatory  design,  as  can  the  development  and  use  of  scenarios. 
As  Mack  (1995)  noted:  “Developing  and  sharing  a  set  of  touch¬ 
stone  stories  is  an  important  cohesive  element  in  any  social  sys¬ 
tem.  .  .  .  Gathering,  discussing,  and  sharing  these  stories  as  a 
group  can  be  an  effective  means  to  team  building”  (p.  368). 

Cognitive  Task  Analyses  and  Cognitive  Walkthroughs 

Scenarios  and  use  cases  are  also  important  in  providing  the  fram¬ 
ing  for  specific  cognitive  task  analyses,  as  descriptions  or  pre¬ 
dictions  of  performance  need  to  be  considered  in  the  context 
of  specific  tasks  and  task  settings.  The  starting  point  for  a  specific 
cognitive  task  analysis  is  the  high-level  context  provided  by  a  sce¬ 
nario  or  use  case,  including  the  social  setting,  resources,  and 
goals  of  users.  Thus,  to  begin  a  cognitive  task  analysis,  the  start¬ 
ing  point  is  specification  of  the  person  for  whom  the  product  is 
to  be  designed  (a  persona),  what  goal  that  person  is  trying  to 
achieve,  and  the  immediate  and  broader  contexts  in  which  that 
goal  will  be  pursued. 

A  number  of  variations  for  conducting  a  cognitive  task  analy¬ 
sis  (Gordon  &  Gill,  1997;  Hollnagel,  2003)  have  their  roots  in  hi¬ 
erarchical  task  analysis  techniques,  decomposing  some  high- 
level  goal  or  task  into  a  hierarchy  of  subgoals  (Annett,  Duncan, 
Stammers, & Gray, 1971; Annett & Stanton, 2000; Diaper & Stanton,
2004; G. A. Miller, Galanter, & Pribram, 1960; Preece, Rogers,
Sharp,  &  Benyon,  1994).  This  approach,  originally  applied  to 
pure  task  analyses  that  were  not  concerned  with  representing 
the  underlying  cognitive  processes,  has  been  extended  in  a 
number  of  ways  that  maintain  a  central  role  for  describing  goal- 
subgoal  relationships,  but  that  also  introduce  constructs  or 
methods  for  describing  associated  cognitive  processes. 

One  influential  example  of  this  evolution  is  the  GOMS  (Goals, 
Operators,  Methods  and  Selection  rules)  model  (Card,  Moran,  & 
Newell,  1983;  Kieras,  1991,  2004;  St.  Amant,  Freed,  Ritter,  & 
Schunn, 2005; Williams, 2005), which makes use of a hierarchical
decomposition, but provides a control structure that introduces
operators  and  rules  for  selecting  which  operator  to  apply  in  a 
given  goal-driven  context.  As  Preece  et  al.  (1994)  noted: 

A  GOMS  analysis  of  human-system  interaction  can  be  applied  at  vari¬ 
ous  levels  of  abstraction  in  much  the  same  way  that  the  hierarchical  task 
analysis  splits  tasks  into  subtasks.  .  .  .  Three  broad  levels  of  granularity 
determine  the  GOMS  family  of  models: 

•  The  GOMS  model,  which  describes  the  general  methods  for  accom¬ 
plishing  a  set  of  tasks. 

•  The  unit  task  level,  which  breaks  users’  tasks  into  unit  tasks,  and  then 
estimates  the  time  that  it  takes  for  the  user  to  perform  these. 

•  The  keystroke  level,  which  describes  and  predicts  the  time  it  takes 
to  perform  a  task  by  specifying  the  keystrokes  needed  (pp.  419-420). 
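
As a rough illustration of the keystroke level described in the quotation above, the following minimal sketch sums per-operator times over a sequence of operators. The operator durations are commonly cited approximations and are assumptions here, not values given in this chapter; the two methods being compared are hypothetical.

```python
# Keystroke-level sketch: predicted time = sum of per-operator durations.
# Durations (seconds) are commonly cited approximations, assumed for illustration.
OPERATOR_SECONDS = {
    "K": 0.2,   # keystroke or button press (skilled typist)
    "P": 1.1,   # point with a mouse to a target on the display
    "H": 0.4,   # move the hands between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}


def estimate_task_time(operators):
    """Return the predicted execution time for an operator string such as 'MHPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


def faster_method(methods):
    """Return the name of the method with the lowest predicted time."""
    return min(methods, key=lambda name: estimate_task_time(methods[name]))


# Two hypothetical methods for the same unit task (deleting a word).
METHODS = {
    "menu": "MHPK" + "MPK",       # select the word, then point at and click a menu item
    "shortcut": "MHPK" + "HMKK",  # select the word, return to keyboard, two-key shortcut
}

for name, ops in METHODS.items():
    print(name, round(estimate_task_time(ops), 2))
```

A comparison of this kind only ranks error-free expert execution; it says nothing about whether the user will discover either method.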

This  move  toward  explicit  computational  representation  of 
internal  processes  has  been  extended  to  the  point  where  broad, 
task-independent  cognitive  architectures  have  been  developed 
that  can  be  used  to  model  performance  in  a  given  task/system 
context  and  generate  performance  predictions  (Byrne  &  Kirlik, 
2005).  Three  such  systems  are  ACT-R  (Anderson,  1993;  Ander¬ 
son  et  al.,  2004),  SOAR  (Laird,  Newell,  &  Rosenbloom,  1987; 
Nason,  Laird,  &  Schunn,  2005),  and  EPIC  (Kieras  &  Meyer,  1997; 
St.  Amant  et  al.,  2005).  All  three  incorporate  procedural  knowl¬ 
edge  that  must  operate  within  the  constraints  of  certain  con¬ 
structs,  such  as  working  memory  (Baddeley,  1998,  1999;  Cary 
& Carlson, 2001; Radvansky, 2005; Seamon, 2005), that are used
to  model  human  capabilities  and  limitations  (Meyers,  2004; 
Wickens  et  al.,  2004). 

This  work  on  the  development  of  computational  models  to 
describe  or  predict  task  performance  has  been  complemented 
by  more  qualitative,  judgment-based  approaches  using  expert 
reviews  that  are  similarly  rooted  in  the  hierarchical  decomposi¬ 
tion  of  a  task  based  on  goal-subgoal  relationships.  These  latter 
techniques  are  motivated  by  the  assumption  that  expert  judg¬ 
ments  about  the  adequacy  of  a  design  will  be  more  accurate  and 
complete  if  they  are  made  while  explicitly  considering  the  con¬ 
text  that  the  user  will  be  dealing  with  (the  user’s  current  goal 
and  subgoal  and  the  display  the  user  is  viewing  while  trying  to 
achieve  that  goal). 

There  are  a  number  of  variations  on  how  to  conduct  such 
qualitative  cognitive  walkthroughs  or  predictive  cognitive  task 
analyses  (Annett  &  Stanton,  2000;  Bisantz,  Roth,  &  Brickman, 
2003;  Diaper  &  Stanton,  2004;  Gordon  &  Gill,  1997;  Hollnagel, 
2003;  Jonassen  et  al.,  1999;  Kirwan  &  Ainsworth,  1992;  Klein, 
2000; Schraagen et al., 2000; Shepherd, 2000; Vicente, 1999).
As an example, one such method was outlined by Lewis and
Wharton (1997) as follows:

Step  1:  Select  the  context  for  a  use  case  to  be  used  for  an  eval¬ 
uation.  Each  such  use  case  specifies  three  primary  considerations: 

•  What are the characteristics of the user of the product that
might affect its use? (the persona in the use case)

•  What is the broader physical, organizational, social and legal
context in which the product will be used?

•  What is the high-level task or goal for which the product is
being used?

Note that this specification is solution-independent. It
essentially defines the design problem for this use case.

Step  2:  Specify  the  normative  (correct)  paths  for  completing 
the  goal  of  this  use  case  (represented  as  a  goal  hierarchy)  us¬ 
ing  the  current  product  design,  indicating  the  alternative  se¬ 
quences  of  steps  that  the  user  could  take  to  successfully  achieve 
the  specified  goal.  Note  that  there  could  be  more  than  one  cor¬ 
rect  path  for  completing  a  given  task. 

Step  3:  Identify  the  state  of  the  product  and  the  associated 
“world”  at  each  node  in  the  goal  hierarchy.  In  the  case  of  a  soft¬ 
ware  product,  the  state  of  the  product  would  be  the  current  ap¬ 
pearance  of  the  interface  and  any  associated  internal  states 
(such  as  the  queue  of  recently  completed  actions  that  would 
be  used  should  an  undo  function  be  applied).  The  state  of  the 
“world”  applies  if  the  product  or  the  user  actually  changes 
something  in  the  world  as  part  of  an  action,  such  as  changing 
the  temperature  of  a  glass-manufacturing  system  through  a 
process-control  interface. 

Step  4:  Generate  predictions.  For  each  correct  action  (node 
in  hierarchy),  attempt  to: 

•  Predict  all  the  relevant  success  stories; 

•  Predict  all  the  relevant  failure  stories; 

•  Record  the  reasons  and  assumptions  made  in  generating 
these  stories; 

•  Identify  potential  fixes  to  avoid  or  assist  in  recovery  from  fail¬ 
ure  stories. 

This final step involves walking through the hierarchy along
each path that leads to success and, playing psychologist,
generating predicted behaviors at each node (for each associated
screen display or product state). This walkthrough could be done
by an expert in human-computer interaction or by a domain expert.
Because of the differences in their backgrounds relevant to the
design and the domain of interest, individuals from each group are
likely to generate somewhat different predictions.

Lewis  and  Wharton  (1997)  suggested  asking  four  questions 
to  help  guide  this  prediction  process: 

•  “Will  the  user  be  trying  to  achieve  the  right  effect?” 

•  “Will  the  user  notice  that  the  correct  action  is  available?” 

•  “Will  the  user  associate  the  correct  action  with  the  desired 
effect?” 

•  “If  the  correct  action  is  performed,  will  the  user  see  that 
progress  is  being  made?” 

P. J. Smith, Stone, and Spencer (2006) suggested some additional,
more detailed questions that can be asked to help the analyst
consider how different cognitive processes influence performance
(Card et al., 1983; Wickens et al., 2004):

•  Selective  Attention:  What  are  the  determinants  of  attention? 
What  is  most  salient  in  the  display?  Where  will  the  user’s  fo¬ 
cus  of  attention  be  drawn?  (See  Bennett  &  Flach,  1992;  Eck¬ 
stein,  2004;  Johnston  &  Dark,  1986;  Lowe,  2003;  Pashler, 
Johnston,  &  Ruthruff,  2001;  Yantis,  1998.) 


•  Automaticity and Controlled Attentional Processes: Does the
design support skill-based performance for routine scenarios
and controlled, knowledge-based processes for novel situations?
(See Logan, 2005; Rasmussen, 1983; Rasmussen et al., 1994.)

•  Perception:  How  will  perceptual  processes  influence  the 
user’s  interpretation?  How,  for  instance,  will  the  proximity  of 
various  items  on  the  screen  influence  judgments  of  “related¬ 
ness”  as  predicted  by  the  Gestalt  Law  of  Proximity?  (See  Ben¬ 
nett  &  Flach,  1992;  Kohler,  1992;  Tufte,  1983,  1990,  1997; 
Vicente,  2002;  Watzman,  2003.) 

•  Learning and Memory: How will the user’s prior knowledge
influence selective attention and interpretation? Does the
knowledge necessary to perform tasks reside in the world or in the
user’s mind? (See Baddeley, 1998, 1999; Grondin, 2005; Hester
& Garavan, 2005; Hutchins, 1995; Norman, 2002; Ormrod,
2003; Radvansky, 2005; Scaife & Rogers, 1996; Seamon, 2005.)

•  Information Processing, Mental Models, and Situation Awareness:
What inferences/assumptions will the user make? What
internal representations will support such processing? (See
Endsley, 2003; Endsley & Garland, 2000; Gentner & Stevens,
1983; Johnson et al., 1981; Plous, 1993; Schaeken, 2005.)

•  Design-Induced  Error:  How  could  the  product  design  and 
the  context  of  use  influence  performance  and  induce  errors? 
(See  Bainbridge,  1983;  Beach  &  Connolly,  2005;  Johnson 
et  al.,  1981;  Larson  &  C.  Hayes,  2005;  Parasuraman  &  Riley, 
1997;  Reason,  1991,  1997;  Roth,  Bennett,  &  Woods,  1987; 
Sheridan, 2002; Skitka, Mosier, & Burdick, 1999; P. J. Smith &
Geddes, 2003; P. J. Smith, McCoy, & Layton, 1997.)

•  Motor  Performance:  Can  the  controls  be  used  efficiently  and 
without  error?  (See  Jagacinski  &  Flach,  2003.) 

•  Group  Dynamics:  How  will  the  system  influence  patterns  of 
interaction  among  people?  (See  Baecker,  Grudin,  Buxton,  & 
Greenberg, 1995; Brehm, Kassin, & Fein, 1999; Forsyth, 1998;
Levi,  2001;  G.  M.  Olson  &  J.  S.  Olson,  2003.) 

Thus,  steps  1-3  as  outlined  above  involve  describing  how  a 
product  (such  as  a  DSS)  can  be  used  to  achieve  a  specific  goal. 
Step  4  involves  predicting  how  people  might  perform  when  us¬ 
ing  this  product,  either  successfully  or  unsuccessfully,  as  they  try 
to  achieve  this  goal.  Thus,  the  “cognitive”  part  of  this  analysis  is 
embedded  only  in  step  4.  Note  also  that  the  goal-subgoal  hier¬ 
archy  does  not  attempt  to  represent  the  further  sequences  of 
actions  that  could  arise  once  the  user  starts  down  a  failure  path. 
This  helps  to  keep  the  size  of  the  hierarchy  smaller,  keeping 
the  analysis  task  more  manageable.  (For  some  applications,  it 
may  be  desirable  to  actually  expand  the  goal-subgoal  hierarchy 
to  represent  the  complete  path  of  a  failure  story.) 
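
The four steps above can be sketched as a small data structure plus a traversal. This is only an illustrative sketch: the `Node` class, the `judge` callback, and the text-editor use case are hypothetical names introduced here, and the analyst's judgments still have to come from a person. Only the four guiding questions are taken from the text.

```python
from dataclasses import dataclass, field

# The four guiding questions quoted from Lewis and Wharton (1997).
QUESTIONS = (
    "Will the user be trying to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the desired effect?",
    "If the correct action is performed, will the user see that progress is being made?",
)


@dataclass
class Node:
    """One correct action in the goal hierarchy and the product state it starts from."""
    action: str
    product_state: str
    next_nodes: list = field(default_factory=list)  # alternative correct continuations


def success_paths(node, prefix=()):
    """Step 2: yield every normative path from this node down to a leaf."""
    path = prefix + (node,)
    if not node.next_nodes:
        yield path
    for child in node.next_nodes:
        yield from success_paths(child, path)


def walkthrough(root, judge):
    """Step 4: ask each question at each node once and collect predicted failure stories.

    judge(node, question) returns True for a success story, or a string
    describing the predicted failure.
    """
    failures = []
    visited = set()
    for path in success_paths(root):
        for node in path:
            if id(node) in visited:
                continue
            visited.add(id(node))
            for question in QUESTIONS:
                verdict = judge(node, question)
                if verdict is not True:
                    failures.append((node.action, question, verdict))
    return failures


# Hypothetical use case: remove a selected word in a text editor (two correct paths).
cut = Node("choose Cut", "Edit menu open")
root = Node("select the word", "document shown", [
    Node("press Delete", "word highlighted"),
    Node("open Edit menu", "word highlighted", [cut]),
])


def analyst_judgment(node, question):
    # Hypothetical prediction recorded by the analyst for one node and one question.
    if node is cut and "notice" in question:
        return "Cut is hidden in a collapsed submenu"
    return True


failure_stories = walkthrough(root, analyst_judgment)
```

As in the method described above, the hierarchy holds only success paths; failure stories are recorded against the node where they would begin rather than expanded into further branches.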

Finally,  going  beyond  just  the  use  of  hierarchical  decompo¬ 
sition  techniques,  Gordon  and  Gill  (1997)  noted  that: 

CTA  [Cognitive  Task  Analysis]  methodologies  differ  along  several  di¬ 
mensions.  These  include  the  type  of  knowledge  representation  or  for¬ 
malism  used  to  encode  the  information  obtained  from  the  analysis,  the 
methods  used  for  eliciting  or  identifying  expert  (or  novice)  knowledge, 
and  the  type  of  task  and  materials  used  in  the  analysis.  Typical  repre¬ 
sentation  formats  include  lists  and  outlines,  matrices  and  cross-tabula¬ 
tion  tables,  networks,  flowcharts,  and  problem  spaces  with  alternative 
correct  solution  paths  (pp.  133-134) 


29.  Human-Centered  Design  of  Decision-Support  Systems  *  577 


Thus,  although  the  use  of  hierarchical  goal-subgoal  repre¬ 
sentations  is  one  of  the  more  popular  approaches,  a  variety  of 
other  complementary  approaches  are  labeled  “cognitive  task 
analyses”  (Crandall,  Klein,  &  Hoffman,  2006).  These  alternatives 
include  the  use  of  concept  maps  and  conceptual  graph  analy¬ 
ses  (Gordon  &  Gill,  1992)  and  different  approaches  to  extract¬ 
ing  and  representing  the  knowledge  of  experts  about  critical 
incidents (Crandall et al., 2006; Klein, Calderwood, & MacGregor,
1989).  The  ultimate  goal  of  all  of  these  approaches,  however,  is 
to  better  represent  and  predict  how  users’  knowledge  and  con¬ 
trol  processes  or  cognitive  processes  will  be  applied  in  a  given 
task  environment.  This  task  environment  could  be  the  existing 
world  (a  descriptive  analysis)  that  is  being  studied  in  order  to 
design  improved  support  tools,  or  an  envisioned  world  that  in¬ 
corporates  some  specific  design  proposal  intended  to  improve 
performance  (a  predictive  analysis). 

Work-Domain  Analyses 

Cognitive  task  analyses  tend  to  focus  on  how  to  support  per¬ 
formance  for  an  already  identified  task  or  goal.  However,  in 
many  cases  DSSs  are  designed  to  support  performance  in  very 
complex  environments  (Bar-Yam,  2003,  2005)  where  it  is  un¬ 
likely  that  all  of  the  possible  scenarios  that  could  arise  will  be 
predicted  during  the  design  process.  Thus,  there  is  also  a  need 
for  design  methodologies  that  support  the  development  of  ro¬ 
bust  designs  that  will  be  effective  even  when  novel  situations 
arise  (Vicente,  1999). 

To deal with this need, a complementary set of design
methodologies has been developed under the label of “work-domain
analysis” (Burns, Bisantz, & Roth, 2004; Burns & Hajdukiewicz,
2004;  Burns  &  Vicente,  2001;  Reising  &  Sanderson,  2002;  Vicente, 
2002;  Vicente  &  Rasmussen,  1990,  1992).  In  general  terms,  the 
emphasis  is  on  understanding  and  modeling  the  work  domain  or 
application  area  rather  than  simply  modeling  the  performance  of 
people  on  a  prespecified  set  of  tasks  within  this  domain.  The  ar¬ 
gument  is  that  the  domain  analysis  can  identify  the  critical  para¬ 
meters  or  constraints  relevant  to  successful  performance  in  that 
domain  and  thus  serve  as  the  basis  for  a  design  that  can  handle 
both  routine  and  novel  (unanticipated)  situations. 

To  make  this  point  more  concrete,  as  a  very  simple  but  illus¬ 
trative  example,  Vicente  (2000)  contrasted  verbal  directions  on 
how  to  travel  from  one  point  to  another  with  the  use  of  a  map. 
In  this  analogy,  the  directions  provide  a  very  efficient  but  brittle 
description  of  how  to  accomplish  the  desired  goal.  The  map  re¬ 
quires  more  work  (interpretation),  but  is  more  flexible  in  the 
face  of  obstacles  that  might  render  the  directions  unusable. 
Building  on  this  analogy,  Vicente  noted: 

This  decrease  in  efficiency  [of  maps  vs.  directions]  is  compensated  for  by 
an  increase  in  both  flexibility  and  generality.  Like  maps,  work  domain  rep¬ 
resentations  are  more  flexible  because  they  provide  workers  with  the 
information  they  need  to  generate  an  appropriate  response  online  in  real 
time  to  events  that  have  not  been  anticipated  by  designers.  This  is  par¬ 
ticularly  useful  when  an  unforeseen  event  occurs  because  task  repre¬ 
sentations,  by  definition,  cannot  cope  with  the  unanticipated  (p.  115) 
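
The contrast between directions and a map can be made concrete with a minimal sketch. The street layout, the fixed route, and the function names below are hypothetical; breadth-first search stands in for the interpretive work that a person does when reading a map.

```python
from collections import deque

# Hypothetical street map: each intersection lists its directly reachable neighbors.
STREET_MAP = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

# Directions: one fixed route worked out in advance.
DIRECTIONS = ["A", "B", "D"]


def follow_directions(route, blocked):
    """Task-style representation: usable only if no stop on the fixed route is blocked."""
    if any(stop in blocked for stop in route):
        return None
    return list(route)


def plan_with_map(street_map, start, goal, blocked):
    """Domain-style representation: search the map for any route that avoids obstacles."""
    if start in blocked or goal in blocked:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in street_map[path[-1]]:
            if neighbor not in visited and neighbor not in blocked:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

With no obstacles both representations give the same route. If intersection B is closed, the directions simply fail, while the map still yields a route through C at the cost of the extra search.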

The literature on work-domain analyses discusses the use of
abstraction (means-end) hierarchies and aggregation (part-whole)
hierarchies to complete a domain analysis, and then using
the results of this analysis to guide the design of the DSS
for  supporting  operator  performance  (Burns  &  Hajdukiewicz, 
2004;  Vicente,  1999;  Vicente  &  Rasmussen,  1990).  The  basic 
logic  of  this  approach  is  that,  in  order  to  design  appropriate  rep¬ 
resentations  to  assist  an  operator  in  completing  tasks  using 
some  system,  the  complete  and  correct  semantics  of  the  do¬ 
main  must  first  be  appropriately  defined.  Since  such  a  design  is 
concerned  with  supporting  goal-oriented  behaviors,  one  way  to 
represent  the  semantics  is  with  an  abstraction  (means-end)  hi¬ 
erarchy  that  captures  the  “goal-relevant  properties  of  the  work 
domain”  (Vicente  &  Rasmussen,  1990,  p.  210).  In  addition,  be¬ 
cause  the  domains  of  interest  are  often  quite  complex,  aggre¬ 
gation  hierarchies  may  be  useful  to  decompose  the  overall  do¬ 
main  into  a  set  of  interrelated  subparts. 
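
One way to picture the two kinds of hierarchy is as two sets of links over the same domain elements. The sketch below is an assumption-laden illustration: the glass-furnace content is invented for the example, and a real analysis would involve far more nodes and carefully defined levels. Only the link structure (means-end versus part-whole) reflects the text.

```python
# Means-end links: each end maps to the means that achieve it (one level down).
# Content is hypothetical, loosely based on a glass-furnace setting.
MEANS = {
    "produce glass within specification": ["maintain melt temperature"],
    "maintain melt temperature": ["heat transfer from burners", "furnace insulation"],
    "heat transfer from burners": ["burner assembly"],
    "furnace insulation": ["refractory lining"],
}

# Part-whole links: each whole maps to its parts (aggregation hierarchy).
PARTS = {
    "furnace": ["burner assembly", "refractory lining"],
}


def how(node):
    """Move down the means-end hierarchy: by what means is this end achieved?"""
    return MEANS.get(node, [])


def why(node):
    """Move up the means-end hierarchy: which ends does this node serve?"""
    return [end for end, means in MEANS.items() if node in means]


def ends_served(node):
    """Collect every higher-level end reachable by repeatedly asking why."""
    result = []
    frontier = why(node)
    while frontier:
        end = frontier.pop(0)
        if end not in result:
            result.append(end)
            frontier.extend(why(end))
    return result
```

Tracing `ends_served` from a physical component shows which goals are affected if that component degrades, which is the kind of goal-relevant information such a hierarchy is meant to expose.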

Approaches  to  Design — Summary 

Much  of  the  work  on  methods  for  completing  cognitive  task 
analyses  and  work  domain  analyses  has  been  stimulated  by  the 
challenges  of  developing  DSSs  to  support  performance  on 
complex  tasks.  The  description  above  provides  a  sense  of  the 
range  of  techniques  that  have  been  developed,  and  emphasizes 
the  need  to  understand  both  the  constraints  of  the  application 
context  and  the  nature  of  the  agent(s)  (person  or  people)  in¬ 
volved.  The  insights  provided  by  such  analyses  can  be  used  to 
guide  the  design  of  the  underlying  functionality  embedded 
within  a  DSS,  and  to  guide  the  design  of  the  interaction  be¬ 
tween  a  DSS  and  its  user. 


HUMAN  PERFORMANCE  ON 
DECISION-MAKING  TASKS 


There  are  several  reasons  why  an  understanding  of  human  per¬ 
formance  is  important  to  the  designer  of  a  DSS.  The  first  is  that 
the  motivation  for  developing  such  systems  is  to  increase  effi¬ 
ciency  (reduce  production  costs  or  time)  and/or  to  improve  the 
quality  of  performance.  Thus,  it  is  important  for  the  design  team 
to  be  able  to  efficiently  and  effectively  complete  the  initial  prob¬ 
lem  definition  and  knowledge-engineering  stages  of  a  project, 
identifying  areas  in  which  improvements  are  needed.  An  un¬ 
derstanding  of  human  performance  makes  it  possible  to,  in 
part,  take  a  top-down  approach  to  this,  looking  to  see  whether 
certain  classic  human  performance  phenomena  (such  as  hy¬ 
pothesis  fixation)  are  influencing  outcomes  in  a  particular  ap¬ 
plication.  A  second  motivation  for  understanding  human  per¬ 
formance  is  that  a  human-centered  approach  to  the  design  of  a 
DSS  requires  consideration  and  support  of  the  user’s  skills 
(Garb,  2005).  To  do  this  effectively,  knowledge  of  human  per¬ 
formance  (perception,  learning  and  memory,  problem  solving, 
decision  making,  etc.)  is  essential.  A  number  of  these  aspects 
of  human  performance  are  covered  in  previous  chapters,  so  this 
chapter  will  just  highlight  some  of  the  factors  most  relevant  to 
the  design  of  a  DSS,  and  discuss  how  such  factors  are  relevant  to 
this  design  task. 

Errors and Cognitive Biases

In  very  broad  terms,  human  errors  (Strauch,  2004;  Wiegmann  & 
Shappell,  2003)  can  be  classified  as  slips  and  as  mistakes  (Nor¬ 
man,  1981).  Slips  arise  through  a  variety  of  cognitive  processes, 
but  are  defined  as  behaviors  in  which  the  person’s  actions  do 
not  match  his  intentions.  Generally,  this  refers  to  cases  where 
the  person  has  the  correct  knowledge  to  achieve  some  goal  but, 
due  to  some  underlying  perceptual,  cognitive,  or  motor  process, 
fails  to  correctly  apply  this  knowledge.  As  Norman  described  it: 
“Form  an  appropriate  goal  but  mess  up  in  the  performance,  and 
you’ve  made  a  slip”  (1981,  p.  106).  Mistakes,  on  the  other  hand, 
refer  to  errors  resulting  from  the  accurate  application  of  a  per¬ 
son’s  knowledge  to  achieve  some  goal,  but  where  that  knowl¬ 
edge  is  incomplete  or  wrong. 

DSSs  are  potentially  useful  for  dealing  with  either  of  these 
sources  of  errors.  If  slips  or  mistakes  can  be  predicted  by  the  de¬ 
sign  team,  then  tools  can  be  developed  to  either  help  prevent 
them,  recover  from  them,  or  reduce  their  impacts. 


Slips 

Norman (2002) discussed six categories of slips. Knowing
something about these different causes can help the designer to
look for possible manifestations in a particular application area.
The six categories, as defined in Norman, included:

•  Capture  errors,  “in  which  a  frequently  done  activity  suddenly 
takes  charge  instead  of  (captures)  the  one  intended” 
(p.  107); 

•  Description errors, where “the intended action has much in
common with others that are possible” and “the internal
description of the intention was not sufficient . . .” often
resulting in “performing the correct action on the wrong object”
(pp. 107-108);

•  Data-driven errors, where some external stimulus triggers
an automatic response at an inappropriate time;

•  Associative  activation  errors  where,  similar  to  a  data-driven 
error,  something  triggers  a  behavior  at  an  inappropriate  time, 
but  in  this  case  the  trigger  is  some  internal  thought  or 
process; 

•  Loss-of-activation  errors,  or  “forgetting  to  do  something” 
(p.  109); 

•  Mode  errors,  or  performing  an  action  that  would  have  been 
appropriate  for  one  mode  of  operation  for  a  system,  but 
is  inappropriate  for  the  actual  mode  or  state  that  the  system 
is  in. 


Mistakes 

As  defined  above,  mistakes  are  due  to  incorrect  knowledge 
(the  rule,  fact,  or  procedure  that  the  person  believes  to  be  true 
is  incorrect,  resulting  in  an  error)  or  incomplete  (missing) 
knowledge. 


Cognitive  Biases 

The  literature  on  human  error  also  provides  other  useful 
ways  to  classify  errors  in  terms  of  surface  level  behavior  or  the 
underlying cognitive process. Many of these are discussed under
the label of “cognitive biases” (Brachman & Levesque, 2004;
Bradfield & Wells, 2005; Fraser, P. J. Smith, & J. W. Smith, 1992;
Gilovich, Griffin, & Kahneman, 2002; Haselton, Nettle, & Andrews,
2005; Kahneman & Tversky, 2000; Plous, 1993; Poulton,
1989), including:

• Gambler’s fallacy (Croson & Sundali, 2005). (Concluding
there is a pattern in a series of events that is in reality a
random sequence.) Gilovich, Vallone, and Tversky (1985), for
instance, found that individuals believe they see streaks in
basketball shooting even when the data show that the sequences
are essentially random.

• Insensitivity to sample size (Bjork, 1999). (Failing to
understand that the law of large numbers implies that the
probability of observing an extreme result in an average decreases
as the size of the sample increases.) Tversky and Kahneman
(1974), for example, found that “the probability of obtaining
an average height greater than 6 feet was assigned the same
value for samples of 1000, 100 and 10 men” (p. 1127).

•  Incorrect  revision  of  probabilities.  (Failure  to  revise  proba¬ 
bilities  sufficiently  when  data  are  processed  simultaneously, 
or,  conversely,  revising  probabilities  too  much  when  process¬ 
ing  the  data  sequentially.) 

•  Ignoring  base  rates.  (Failure  to  adequately  consider  prior 
probabilities  when  revising  beliefs  based  on  new  data.) 

•  Use  of  the  availability  heuristic  (Blount  &  Larrick,  2000). 
(Tversky  (1982)  suggested  that  the  probability  of  some  type 
of  event  is  in  part  judged  based  on  the  ability  of  the  person  to 
recall  events  of  that  type  from  memory,  thus  suggesting  that 
factors  like  recency  may  incorrectly  influence  judgments  of 
probability.) 

•  Attribution  errors.  Jones  and  Nisbett  (1971)  described  this 
by  noting  that  “there  is  a  pervasive  tendency  for  actors  to  at¬ 
tribute  their  actions  to  situational  requirements,  whereas  ob¬ 
servers  tend  to  attribute  the  same  actions  to  stable  personal 
dispositions.”  (p.  83) 

•  Memory  distortions  due  to  the  reconstructive  nature  of 
memory  (Blank,  1998;  Handberg,  1995).  (Loftus  (1975)  de¬ 
scribed  processes  that  distort  memories  based  on  the  activa¬ 
tion  of  a  schema  as  part  of  the  perception  of  an  event,  and 
the  use  of  that  schema  to  reconstruct  the  memory  of  the 
event based on what the schema indicates should have happened
rather than what was actually perceived originally. P. J.
Smith, Giffin, Rockwell, and Thomas (1986) and Pennington
and Hastie (1992) provided descriptions of similar phenomena
in decision-making tasks.)
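
The sample-size effect described in the list above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that heights are normally distributed with a mean of 70 inches and a standard deviation of 3 inches; these values and the function name are hypothetical and are not taken from Tversky and Kahneman (1974). It computes the probability that the average height of a sample exceeds 6 feet (72 inches) for samples of 10, 100, and 1000 men.

```python
import math


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(mean of n independent draws > threshold) under a normal model."""
    # The standard error of the mean shrinks with the square root of n.
    standard_error = sigma / math.sqrt(n)
    z = (threshold - mu) / standard_error
    # Upper-tail probability of a standard normal variable.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical population: mean 70 in, SD 3 in; threshold 72 in (6 feet).
for n in (10, 100, 1000):
    p = prob_sample_mean_exceeds(72.0, mu=70.0, sigma=3.0, n=n)
    print(f"n={n:5d}  P(average height > 6 ft) = {p:.3g}")
```

Under these assumptions the probability falls steeply as the sample grows, which is the relationship that subjects who assign the same value to all three sample sizes fail to take into account.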

In  recent  years,  there  has  been  considerable  controversy 
regarding the nature and validity of many of these explanations
as “cognitive biases” (Fraser et al., 1992; Haselton et al., 2005;
Koehler, 1996), suggesting that it is important to carefully
understand the specific context in order to generate predictions as
to whether a particular behavior will be exhibited. Using the
incorrect revision of probabilities as an illustration, Navon (1978)
suggested  that  in  many  real-world  settings  the  data  aren’t  inde¬ 
pendent,  and  that  what  appears  to  be  conservatism  in  revising 
probabilities  in  a  laboratory  setting  may  be  the  result  of  the  sub¬ 
jects  applying  a  heuristic  or  cognitive  process  that  is  effective 
in  real-world  situations  where  the  data  are  correlated. 

A  more  detailed  example  is  provided  by  the  popular  use  of 
the  label  “hypothesis  fixation.”  This  behavior  refers  to  some 
process  that  leads  a  person  to  form  an  incorrect  hypothesis 
and  to  stick  with  that  hypothesis,  failing  to  collect  critical  data 
to  assess  its  validity  or  to  revise  it  in  the  face  of  conflicting  data. 
A  variety  of  cognitive  processes  have  been  hypothesized  to 
cause  this  behavior.  One  example  is  called  “biased  assimila¬ 
tion,”  in  which  a  person  judges  a  new  piece  of  data  as  sup¬ 
portive  of  his  hypothesis  (increasing  the  level  of  confidence 
in  the  hypothesis)  based  simply  on  the  consideration  of 
whether  that  outcome  could  have  been  produced  under  his 
hypothesis.  This  contrasts  with  inferential  processes  based  on 
a  normative  model  that  suggest  that  beliefs  should  be  revised 
based  on  the  relative  likelihood  of  an  outcome  under  the  pos¬ 
sible  competing  hypotheses,  and  that  would,  in  the  same  cir¬ 
cumstance,  lead  to  a  reduction  in  the  level  of  confidence  in  his 
hypothesis. 
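
The contrast between the two updating processes can be sketched numerically. In the example below, the normative update is Bayes' rule for two competing hypotheses. The biased-assimilation update is a deliberately crude stand-in (a fixed boost in confidence whenever the outcome could have occurred under the favored hypothesis); it is an assumption made for illustration, not a model proposed in the literature cited above. All function names and probabilities are hypothetical.

```python
def bayes_update(prior_h1, p_data_given_h1, p_data_given_h2):
    """Posterior probability of H1 after one datum, with H2 as the only rival."""
    numerator = prior_h1 * p_data_given_h1
    denominator = numerator + (1.0 - prior_h1) * p_data_given_h2
    return numerator / denominator


def biased_assimilation_update(prior_h1, p_data_given_h1, boost=0.1):
    """Illustrative rule: any outcome that H1 could have produced counts as support."""
    if p_data_given_h1 > 0.0:
        return min(1.0, prior_h1 + boost)
    return prior_h1


# The outcome is possible under H1 (0.2) but far more likely under H2 (0.8).
prior = 0.5
print("normative posterior:", bayes_update(prior, 0.2, 0.8))
print("biased-assimilation confidence:", biased_assimilation_update(prior, 0.2))
```

With these numbers the normative model lowers confidence in the favored hypothesis, because the outcome is four times as likely under the rival, while the biased rule raises it.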

Another example of a cognitive process discussed in the
literature as relevant to hypothesis fixation is the so-called
“confirmation bias.” This phenomenon (described in Wason (1960)
and in Mynatt, Doherty, and Tweney (1978)) is concerned with
the  person’s  data-collection  strategy.  For  example,  in  a  study 
asking  subjects  to  discover  the  rules  of  particle  motion  in  a  sim¬ 
ulated  world,  Mynatt  et  al.  described  performance  by  conclud¬ 
ing  that  there  was  “almost  no  indication  whatsoever  that  they  in¬ 
tentionally  sought  disconfirmation.”  Later  studies,  however, 
have  suggested  that  this  “bias”  is  really  an  adaptive  “positive 
test”  strategy  that  is  effective  in  many  real-world  settings,  and 
that  the  unusual  nature  of  the  task  selected  by  Mynatt  et  al. 
makes  it  look  like  an  undesirable  bias  (Klayman  &  Ha,  1987). 
P. J. Smith et al. (1986) further suggested that, in real-world task
settings where such a strategy might lead to confirmation of
the wrong conclusion, experts often have domain-specific
knowledge or rules that help them to avoid this strategy
altogether, thus ensuring that they collect critical, potentially
disconfirming evidence.
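
The point that a positive test strategy is effective in some task structures and ineffective in others can be illustrated with a small simulation. The sketch below uses number triples and two invented rules; the rules, the candidate set, and the function names are assumptions chosen for illustration and are not the materials used by Mynatt et al. (1978). A positive test examines only cases that the hypothesis predicts will fit the true rule; a negative test examines only cases that it predicts will not fit.

```python
from itertools import product


def increasing_by_two(triple):
    a, b, c = triple
    return b - a == 2 and c - b == 2


def ascending(triple):
    a, b, c = triple
    return a < b < c


def disconfirming_tests(hypothesis, true_rule, candidates, positive=True):
    """Return the tested cases whose outcome contradicts the hypothesis.

    positive=True  tests only cases the hypothesis says fit the rule.
    positive=False tests only cases the hypothesis says do not fit.
    """
    chosen = [t for t in candidates if hypothesis(t) == positive]
    return [t for t in chosen if true_rule(t) != hypothesis(t)]


candidates = list(product(range(1, 9), repeat=3))

# Case 1: the hypothesis is narrower than the true rule.
narrow_pos = disconfirming_tests(increasing_by_two, ascending, candidates, True)
narrow_neg = disconfirming_tests(increasing_by_two, ascending, candidates, False)
print("narrow hypothesis:", len(narrow_pos), "positive,", len(narrow_neg), "negative")

# Case 2: the hypothesis is broader than the true rule.
broad_pos = disconfirming_tests(ascending, increasing_by_two, candidates, True)
print("broad hypothesis:", len(broad_pos), "positive")
```

When the hypothesis is narrower than the true rule, no positive test can ever contradict it, so only negative tests reveal the error. When the hypothesis is broader than the true rule, positive tests alone expose the error.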

Designer  Error 

Many  of  the  error-producing  processes  discussed  above  ap¬ 
pear  to  focus  on  system  operators.  It  is  important  to  recognize, 
however,  that  the  introduction  of  a  DSS  into  a  system  is  a  form 
of  cooperative  work  between  the  users  and  the  design-and- 
implementation  team  and  that,  like  the  users,  the  design  team  is 
susceptible to errors (Petroski, 1994). These errors may be due
to slips or mistakes. In the case of mistakes, they may be due to
inadequate knowledge of the application area (the designers
may fail to anticipate all of the important scenarios) or incorrect
knowledge. They may also be due to a failure to adequately
understand or predict how the users will actually apply the DSS, or
how its introduction will influence their cognitive processes and
performances. In addition, sources of errors associated with
group dynamics may be introduced. Whether such errors are
due to a lack of sufficient coordination, where one team member
assumes that another is handling a particular issue, or due
to the influence of group processes (Brehm et al., 1999; Forsyth,
1998; Janis, 1982; Levi, 2001; McCauley, 1989; Tetlock, 1998;
Tetlock, Peterson, McGuire, Chang, & Feld, 1992) on design
decisions, the result can be some inadequacy in the design of the
DSS,  or  in  the  way  the  user  and  the  DSS  work  together.  (Note 
that  such  group  processes  are  also  important  potential  sources 
of  errors  when  system  operators  work  together  as  part  of  teams, 
with  or  without  technological  support.) 

Systems  Approaches  to  Error 

Reason (1991, 1997) and Wiegmann, Zhang, Von Thaden,
Sharma, and Gibbons (2004) cautioned designers against fixating
on the immediately preceding “causes” of an accident when
trying  to  prevent  future  occurrences.  They  remind  us  that,  in 
many  system  failures,  a  number  of  conditions  must  coincide  in 
order  for  the  failure  to  occur.  This  presents  a  variety  of  lever¬ 
age  points  for  preventing  future  occurrences,  many  of  which  are 
preemptive  and  thus  remote  from  the  actual  accident  or  sys¬ 
tem  failure.  Many  of  these  changes  focus  on  preventing  the  con¬ 
ditions  that  could  precipitate  the  system  failure,  rather  than  im¬ 
proving  performance  once  the  hazardous  situation  has  been 
encountered. 

Errors  and  Cognitive  Biases — Implications  for  Design 

As  described  at  the  beginning  of  this  section,  the  value  of  this 
literature  to  the  designer  is  the  guidance  it  provides  in  con¬ 
ducting  the  initial  knowledge-engineering  studies  that  need  to 
be  completed  in  order  to  identify  opportunities  for  improve¬ 
ment  in  some  existing  decision-making  process  that  is  to  be 
supported  by  a  new  DSS.  Familiarity  with  this  literature  is  also 
critical  to  the  designer  to  ensure  that  the  introduction  of  the 
DSS  doesn’t  result  in  new  design-induced  errors.  This  litera¬ 
ture  also  serves  to  emphasize  two  additional  considerations: 

• The occurrence of errors is often due to the co-occurrence
of a particular problem-solving strategy with a given task
environment that is “unfriendly” to that strategy (e.g., situations
in  which  the  strategy  is  not  sufficiently  robust).  Strategies  that 
were  adaptive  in  one  setting  may  no  longer  be  adaptive  in  the 
newly  designed  environment.  The  potential  for  such  a  nega¬ 
tive  transfer  of  learning  needs  to  be  considered  as  part  of  the 
design. 

•  For  routine  situations,  people  tend  to  develop  expertise  that 
lets  them  employ  knowledge-rich  problem-solving  strategies 
(Crandall  et  al.,  2006;  Klein,  1993;  Kolodner,  1993;  Newell, 
1990)  that  avoid  the  errors  that  could  be  introduced  by  cer¬ 
tain  general  problem-solving  strategies.  However,  in  many 
system  designs,  people  are  expected  to  act  as  the  critical 
safety  net  during  rare,  idiosyncratic  events  that  the  design 
team  has  failed  to  anticipate.  These  are  exactly  the  situations 
in which people have to fall back on their general
problem-solving strategies, and are thus susceptible to the potential
errors associated with these weak problem-solving methods.

In  addition,  this  review  emphasizes  the  need  to  take  a  broad 
systems perspective, recognizing that the design and
implementation team is a potential source of error, and
understanding that, if the goal is to prevent or reduce the impact of
errors,  then  the  solution  is  not  always  to  change  performance  at 
the  immediate  point  where  an  error  was  made.  It  may  be  more 
effective  to  change  other  aspects  of  the  system  so  that  the  po¬ 
tentially  hazardous  situation  never  even  arises. 

Finally, the emphasis on designer error implies that the design
process for a DSS needs to be viewed as iterative and
evolutionary. Even after a tool has been fielded, it is not safe to
assume that no further changes will be needed. As Horton and
Lewis  (1991)  discussed,  this  has  organizational  implications 
(e.g.,  ensuring  adequate  communications  regarding  the  actual 
use  of  the  DSS,  budgeting  resources  to  make  future  revisions), 
as  well  as  architectural  implications  (developing  a  system  archi¬ 
tecture  that  enables  revisions  in  a  cost-effective  manner). 

Human  Expertise 

As  discussed  in  previous  chapters,  the  nature  of  human  infor¬ 
mation  processing  imposes  a  variety  of  constraints  that  influence 
how  effectively  a  person  processes  certain  kinds  of  information. 
These  include  memory,  perceptual,  and  information-processing 
constraints.  As  a  result,  there  are  certain  decision  tasks  in  which 
the  computational  complexity  or  knowledge  requirements  limit 
the  effectiveness  of  unaided  individual  human  performance. 

From a design perspective, these information-processing
constraints offer an opportunity. If important aspects of the application
are  amenable  to  computational  modeling,  then  a  DSS  may  provide 
a  significant  enhancement  of  performance,  either  in  terms  of  effi¬ 
ciency  or  the  quality  of  the  solution.  (Note  that,  even  if  develop¬ 
ment  of  an  adequate  computational  model  is  not  feasible,  there 
may  be  other  technological  improvements,  such  as  more  effective 
communications  environments,  that  could  enhance  performance. 
However,  this  chapter  is  focusing  on  DSSs  that  incorporate  active 
information-processing  functions  by  the  software.) 

A  good  example  of  this  is  flight  planning  for  commercial  air¬ 
craft.  Models  of  aircraft  performance  considering  payload, 
winds,  distance,  and  aircraft  performance  characteristics  (for 
the  specific  aircraft  as  well  as  the  general  type  of  aircraft)  are  suf¬ 
ficiently  accurate  to  merit  the  application  of  DSSs  that  use  opti¬ 
mization  techniques  to  generate  alternative  routes  and  altitude 
profiles to meet different goals in terms of time and fuel
consumption (P. J. Smith, McCoy, & Layton, 1997). This example
also provides a reminder, however, that it is not just human
limitations that are relevant to design. It is equally important to
consider  human  strengths,  and  to  design  systems  that  comple¬ 
ment  and  are  compatible  with  users’  capabilities.  In  flight  plan¬ 
ning,  for  instance,  this  includes  designing  a  system  that  allows 
the  person  to  incorporate  his  judgments  into  the  generation 
and  evaluation  of  alternative  flight  plans,  considering  the  impli¬ 
cations  of  uncertainty  in  the  weather,  air  traffic  congestion,  and 
other  factors  that  may  not  be  incorporated  into  the  model  un¬ 
derlying  the  DSS. 


This  cooperative  systems  perspective  has  several  implica¬ 
tions.  First,  the  designer  needs  to  have  some  understanding  of 
when  and  how  the  person  should  be  involved  in  the  alternative 
generation  and  selection  processes.  This  requires  insights  into 
how  the  person  makes  decisions,  in  terms  of  problem-solving 
strategies  as  well  as  in  terms  of  access  to  the  relevant  knowl¬ 
edge  and  data,  and  how  the  introduction  of  a  DSS  can  influ¬ 
ence  these  problem-solving  processes.  It  also  implies  that  the 
strengths  underlying  human  perceptual  processes  need  to  be 
considered  through  display  and  representation-aiding  strate¬ 
gies  (Burns  &  Hajdukiewicz,  2004;  Jones  &  Schkade,  1995; 
Kleinmuntz  &  Schkade,  1993;  Larkin,  1989;  Tufte,  1997)  in  or¬ 
der  to  enhance  the  person’s  contributions  to  the  decision¬ 
making  process  by  making  important  relationships  more  per¬ 
spicuous  or  salient. 

Descriptive  Models 

The  literature  on  human  problem  solving  and  decision  mak¬ 
ing  provides  some  very  useful  general  considerations  for  mod¬ 
eling  human  performance  within  a  specific  application.  This  lit¬ 
erature,  which  is  reviewed  in  more  detail  in  previous  chapters, 
covers  a  variety  of  descriptive  models  of  problem  solving.  This 
includes  the  use  of  heuristic  search  methods  (Clancey,  1985; 
Dasgupta,  Chakrabarti,  &  Desarkar,  2004;  Michalewicz  &  Fogel, 
2004;  Michie,  1986;  Rayward-Smith,  Osman,  Reeves,  &  G.  Smith, 
1996;  Russell  &  Norvig,  1995).  This  modeling  approach  (Newell 
&  Simon,  1972)  conceptualizes  problem  solving  as  a  search 
through  a  space  of  problem  states  and  problem-solving  opera¬ 
tions  to  modify  and  evaluate  the  new  states  produced  by  apply¬ 
ing  these  operations.  The  crucial  insight  from  modeling  prob¬ 
lem  solving  as  search  is  an  emphasis  on  the  enormous  size  of 
the  search  space  resulting  from  the  application  of  all  possible 
operations  in  all  possible  orders.  In  complex  problems,  the  size 
of  the  space  precludes  exhaustive  searching  of  the  possibilities 
to  select  an  optimal  solution.  Thus,  some  heuristic  approach 
that satisfices, such as elimination by aspects (Tversky, 1972), is
needed  to  guide  a  selective  search  of  this  space,  resulting  in  the 
identification  of  an  acceptable  (if  not  optimal)  solution. 
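
A satisficing heuristic of this kind can be sketched in a few lines. The version below is a simplified, deterministic variant of elimination by aspects: aspects are applied in a fixed order of importance, whereas Tversky's (1972) model selects aspects probabilistically. The flight-plan options, attribute names, and cutoffs are hypothetical and serve only to show the mechanics.

```python
def eliminate_by_aspects(options, aspects):
    """Screen options one aspect at a time until a single option remains.

    options: dict mapping option name -> dict of attribute values.
    aspects: list of (description, predicate) pairs, most important first.
    """
    remaining = dict(options)
    for _description, passes in aspects:
        if len(remaining) <= 1:
            break
        survivors = {name: attrs for name, attrs in remaining.items()
                     if passes(attrs)}
        if survivors:  # skip an aspect that would eliminate every option
            remaining = survivors
    return list(remaining)


# Hypothetical alternative routes for a single flight.
routes = {
    "A": {"fuel_kg": 5200, "minutes": 190, "turbulence": "light"},
    "B": {"fuel_kg": 5000, "minutes": 205, "turbulence": "moderate"},
    "C": {"fuel_kg": 5600, "minutes": 180, "turbulence": "light"},
}
aspects = [
    ("no more than light turbulence", lambda r: r["turbulence"] == "light"),
    ("fuel burn under 5500 kg", lambda r: r["fuel_kg"] < 5500),
    ("flight time under 200 minutes", lambda r: r["minutes"] < 200),
]
print(eliminate_by_aspects(routes, aspects))
```

The search stops as soon as one option survives, so later aspects are never examined. The result is acceptable on the aspects considered but is not guaranteed to be optimal overall: route B, which burns the least fuel, is discarded at the first step.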

Another  aspect  of  problem  solving  that  Newell  and  Simon 
(1972)  noted  when  they  formulated  the  search  paradigm  was 
the  importance  of  problem  representation.  Problem  represen¬ 
tation  issues  emphasize  that  a  task  environment  is  not  inher¬ 
ently  objectively  meaningful,  but  requires  interpretation  for 
problem  solving  to  proceed.  Some  interpretations  or  represen¬ 
tations  are  more  likely  to  lead  to  successful  solutions  than  oth¬ 
ers  (for  certain  agents).  In  fact,  an  important  component  of  ex¬ 
pertise  is  the  set  of  features  or  the  representation  used  to 
characterize  a  domain  (Lesgold  et  al.,  1988),  sometimes  referred 
to  as  “domain  ontology.” 

Task-specific  problem  spaces  allow  problem  solvers  to  incor¬ 
porate  task-  or  environment-induced  constraints,  focusing  at¬ 
tention  on  just  the  operations  of  relevance  to  preselected  goals 
(Sewell  &  Geddes,  1990;  Waltz,  1975).  For  such  task-specific 
problem  solving,  domain-specific  knowledge  (represented  in 
computational  models  as  production  rules,  frames,  or  some 
other  knowledge  representation)  may  also  be  incorporated  to 
increase search efficiency or effectiveness (Dechter, 2003;
Ghallib, Nau, & Traverso, 2004; Laird et al., 1987; Shalin,
Geddes, Bertram, Szczepkowski, & DuBois, 1997).

These  models,  based  on  symbolic  reasoning,  have  been  de¬ 
veloped  for  specific  generic  tasks  like  diagnosis  or  planning 
(Chandrasekaran, 1988; Ghallib et al., 2004; Gordon, 2004; G. A.
Miller et al., 1960; Mumford, Schultz, & Van Doorn, 2001;
Nareyek, Fourer, & Freuder, 2005). For example, computational
models  of  planning  focus  on  the  use  of  abstraction  hierarchies 
to  improve  search  efficiency  (Sacerdoti,  1974),  dealing  with  com¬ 
peting  and  complementary  goals  (Wilensky,  1983),  and  mixed 
top-down/bottom-up processing (B. Hayes-Roth & F. Hayes-Roth,
1979) to opportunistically take advantage of data as they
become available. More recent developments have sought to deal
with  planning  in  stochastic  environments  (Madani,  Condon,  & 
Hanks,  2003;  Majercik  &  Littman,  2003). 

In  contrast  to  such  sequential  search  models,  models  based 
on  case-based  reasoning,  recognition-primed  decision  making, 
or analogical reasoning (Crandall et al., 2006; Klein, 1993;
Kolodner, 1993; Riesbeck & Schank, 1989) focus on prestructured and
indexed  solutions  based  on  previously  experienced  situations. 
These  modeling  approaches  suggest  that  human  experts  in 
complex  operational  settings  rarely  describe  their  cognitions  as 
sequential  searches  to  construct  alternative  solution  states,  but 
rather  as  recognition  processes  that  match  features  of  the  situ¬ 
ation  to  prototypical,  preformulated  response  plans  (Zuboff, 
1988;  Rasmussen,  1983).  In  complex  dynamic  domains  (Oliver 
&  Roos,  2005)  like  aviation  or  firefighting,  these  preformulated 
plans  incorporate  a  great  deal  of  implicit  knowledge  reflecting 
the  constraints  of  the  equipment,  the  environment,  other  par¬ 
ticipants  in  the  work  system,  and  previous  experience.  Over 
time,  solutions  sensitive  to  these  constraints  result  in  familiar, 
accepted  methods  which  are  then  triggered  by  situational  cues 
and  modified  in  small  ways  as  needed  to  deal  with  the  specific 
scenario.  However,  it  is  important  to  recognize  that  such  recog¬ 
nition-based  processes  can  cause  familiar  task  features  to  invoke 
only  a  single  interpretation,  resulting  in  hypothesis  fixation  or 
hindering  creative  departure  from  normal  solutions.  This  limi¬ 
tation  can  be  critical  when  the  task  environment  contains  a  new, 
unexpected  feature  that  requires  a  new  approach. 
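
The recognition process described above, along with its limitation, can be sketched as a simple cue-matching retrieval. The stored cases, cues, and plans below are invented purely for illustration, and the overlap count is only one of many possible similarity measures; none of this is drawn from the cited models.

```python
def recognize(situation_cues, cases):
    """Return the best-matching stored case and the cues it fails to cover."""
    best = max(cases, key=lambda case: len(case["cues"] & situation_cues))
    unmatched = situation_cues - best["cues"]
    return best, unmatched


# Hypothetical case library of prototypical situations and response plans.
cases = [
    {"name": "kitchen fire",
     "cues": {"smoke", "single room", "grease odor"},
     "plan": "interior attack through the front door"},
    {"name": "basement fire",
     "cues": {"smoke", "hot floor", "no visible flame"},
     "plan": "withdraw crew and attack from outside"},
]

situation = {"smoke", "single room", "grease odor", "sagging ceiling"}
case, unmatched = recognize(situation, cases)
print("recognized as:", case["name"])
print("preformulated plan:", case["plan"])
print("cues not covered by the prototype:", sorted(unmatched))
```

The familiar cues trigger a single interpretation and its associated plan. The one new cue does not change which prototype is retrieved, so unless the unmatched cues are examined explicitly, the novel feature has no influence on the response.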

Normative  Optimal  Models 

In  addition  to  these  descriptive  models,  a  large  amount  of  lit¬ 
erature  characterizes  human  decision  making  relative  to  various 
normative  models  based  on  optimal  processes,  which  help  to 
emphasize  important  factors  within  the  task  and  task  environ¬ 
ment  that  should  be  considered  in  making  a  decision  in  order  to 
arrive  at  a  better  decision.  These  normative  models  are  based 
on  engineering  models  such  as  statistical  decision  and  utility 
theory, information theory, and control theory (Jagacinski &
Flach, 2003; Rouse, 1980; Sheridan & Ferrell, 1974). By
contrasting human performance with optimal performance on certain
tasks  and  emphasizing  the  factors  that  should  influence  deci¬ 
sion  making  in  order  to  achieve  a  high  level  of  performance,  this 
literature  helps  the  designer  to  look  for  areas  where  human  per¬ 
formance  may  benefit  from  some  type  of  DSS. 
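
As a minimal illustration of one such normative model, the sketch below computes expected utility for a set of actions and selects the action with the highest value. The actions, probabilities, and utilities are hypothetical; a real application would need to elicit or estimate them.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over the possible outcomes of one action."""
    return sum(p * u for p, u in outcomes)


def best_action(actions):
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical choice between two routes; each outcome is (probability, utility).
actions = {
    "direct route": [(0.8, 100.0), (0.2, 20.0)],  # 20% chance of a costly delay
    "detour": [(1.0, 80.0)],                      # slower but certain
}
for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))
print("normative choice:", best_action(actions))
```

Comparing the action that a person actually chooses with the action that such a model selects is one way to identify the factors, such as the probability of a delay, that should be influencing the decision.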

Finally, the literature on human expertise (Charness,
Feltovich, Hoffman, & Ericsson, 2006) emphasizes the ability of
people to learn and adapt to novel situations, and to develop
skeletal plans that guide initial decisions or plans, but that are
open to adaptation as a situation unfolds (Geddes, 1989;
Suchman, 1987; Wilensky, 1983). The literature also emphasizes
variability, both in terms of individual differences and in terms of
the  ability  of  a  single  individual  to  use  a  variety  of  decision-mak¬ 
ing  strategies  in  some  hybrid  fashion  in  order  to  be  responsive 
to  the  idiosyncratic  features  of  a  particular  scenario. 

Human Expertise — Implications for Design

An  understanding  of  the  literature  on  human  expertise  is  of 
value  to  the  designer  in  a  number  of  different  ways.  First,  in  terms 
of  the  initial  problem-definition  and  knowledge-engineering 
stages,  familiarity  with  models  of  human  problem  solving  and 
decision  making  can  help  guide  the  designer  in  looking  for  im¬ 
portant  features  that  influence  performance  within  that  appli¬ 
cation.  Second,  in  an  effort  to  provide  cognitive  compatibility 
with  the  users,  the  design  of  many  of  the  technologies  under¬ 
lying  DSSs  is  guided  by  these  same  computational  models  of  hu¬ 
man  performance.  Third,  even  if  the  underlying  technology 
isn’t  in  some  sense  similar  to  the  methods  used  by  human  ex¬ 
perts  in  the  application,  the  designer  needs  to  consider  how  the 
functioning of the DSS should be integrated within the
user’s  decision-making  processes. 

In  terms  of  some  specific  emphases,  the  above  discussion 
leads  to  some  suggestions: 

•  Don’t  fixate  on  active  DSS  technologies  as  the  only  way  to  im¬ 
prove  system  performance.  Enhancing  human  performance 
through  changes  in  procedures,  improvements  in  passive 
communication  tools,  better  external  memory  aids,  and  so 
on  may  be  more  cost-effective  in  some  cases.  In  addition, 
such  changes  may  be  needed  to  complement  an  active  DSS 
in  order  to  make  its  use  more  effective. 

•  Look  for  ways  in  which  the  direct  perception  of  ecological 
constraints allows people to perform expertly (Flach, Hancock,
Caird, & Vicente, 1995). The ability to perceive these
critical  parameters  or  constraints  directly  may  make  what 
seems  like  a  very  difficult  information-processing  task  much 
less  demanding. 

•  Related  to  the  consideration  of  ecological  constraints  was  the 
earlier  suggestion  that  problem  representation  has  a  strong 
influence on how easily a problem solver can find a good
solution. This issue of perspicuity (salience of the problem
solution), however, is dependent not only on the characteristics
of the task and task environment, but also on the nature of
the  problem  solver.  Thus,  in  order  to  enhance  human  perfor¬ 
mance  as  part  of  the  decision-making  process,  it  is  important 
to  consider  alternative  ways  of  representing  the  situation  that 
will  enable  human  perceptual  and  cognitive  processes  to  work 
more  effectively.  In  addition,  problem  representation  can  af¬ 
fect  the  performance  of  the  designer.  Consequently,  alterna¬ 
tive  problem  representations  need  to  be  generated  as  part  of 
the  design  process  to  help  the  designer  think  more  effectively. 

•  Consider  the  applicability  of  normative  optimal  models  of 
performance for the task in order to focus attention on factors
that should be influencing current performance, and to
contrast optimal strategies with the strategies that people are
actually  using  in  response  to  these  task-determined  factors. 
Aspects  of  these  normative  models  may  also  be  appropriate 
for  more  direct  inclusion  in  the  DSS  itself,  even  though  the 
strategies  used  by  people  do  not  fully  reflect  the  optimal 
processes  highlighted  by  these  normative  models.  In  addi¬ 
tion  to  considering  the  task  structure  highlighted  by  norma¬ 
tive  optimal  models,  use  knowledge  of  the  variety  of  different 
descriptive  models  of  human  problem  solving  and  decision 
making  to  guide  knowledge-engineering  efforts,  making  it 
easier  and  more  efficient  to  understand  how  people  are  cur¬ 
rently  performing  the  tasks.  Thus,  all  of  these  models  of 
decision  making  represent  conceptual  tools  that  help  the  de¬ 
signer  to  more  effectively  and  efficiently  understand  perfor¬ 
mance  in  the  existing  system,  and  to  develop  new  tools  and 
procedures  to  improve  performance. 

•  Whether  the  DSS  is  designed  to  process  information  “like”  the 
user  at  some  level,  or  whether  it  uses  some  very  different  pro¬ 
cessing  strategy,  the  user  needs  to  have  an  appropriate  and  ef¬ 
fective  mental  model  of  what  the  system  is  and  is  not  doing 
(Kotovsky, J. R. Hayes, & Simon, 1985; Lehner & Zirk,
1987;  Nickerson,  1988;  Zhang,  1997;  Zhang  &  Norman,  1994). 
To  help  ensure  such  cognitive  compatibility,  the  designer 
needs  to  understand  how  people  are  performing  the  task. 

• Consider design as a prediction task, trying to predict how
users will interact with the DSS, and how it will influence
their cognitive processes and performances.

Additional  Cognitive  Engineering  Considerations 

The  preceding  sections  focused  on  our  understanding  of  hu¬ 
man  performance  on  decision-making  tasks  and  the  implica¬ 
tions  of  that  literature  for  design.  Below,  some  additional  con¬ 
siderations  based  on  studies  of  the  use  of  DSSs  are  presented. 

The  Human  Operator  as  Monitor 

Numerous studies make it clear that, given that the designs of
DSSs for complex tasks must be assumed to be brittle, there is a
problem  with  designs  that  assume  that  human  expertise  can  be 
incorporated  as  a  safety  net  by  asking  a  person  to  simply  monitor 
the  output  of  the  DSS,  with  the  responsibility  for  overriding  the 
software  if  a  problem  is  detected.  One  problem  with  this  role  is 
that,  in  terms  of  maintaining  a  high  level  of  attentiveness,  people 
do  not  perform  well  on  such  sustained  attention  tasks  (Meister 
&  Enderwick,  2002).  As  Bainbridge  (1983)  noted: 

We  know  from  many  ‘vigilance’  studies  (Mackworth,  1950/1961)  that  it 
is  impossible  for  even  a  highly  motivated  human  being  to  maintain  ef¬ 
fective  visual  attention  towards  a  source  of  information  on  which  very 
little  happens,  for  more  than  about  half  an  hour.  This  means  that  it  is 
humanly  impossible  to  carry  out  the  basic  function  of  monitoring  for 
unlikely abnormalities. (p. 777)

A  related  issue  is  the  problem  of  loss  of  skill.  As  Bainbridge 
(1983)  further  noted,  if  the  operator  has  been  assigned  a  passive 
monitoring role, he “will not be able to take over if he has not
been reviewing his relevant knowledge, or practicing a crucial
manual  skill”  (p.  778).  Thus,  a  major  challenge  for  retaining  hu¬ 
man  expertise  within  the  system  is  “how  to  maintain  the  effec¬ 
tiveness  of  the  human  operator  by  supporting  his  skills  and  mo¬ 
tivation”  (p.  778). 

Complacency  and  Overreliance 

Studies reported by Parasuraman and Riley (1997) and
Parasuraman (2000) introduced further concerns about assigning
the person the role of critiquing the computer’s recommendations
before acting. These studies discuss how the introduction
of a DSS can lead to overreliance by the human user
when the software is generating the initial recommendations
(Metzger & Parasuraman, 2005; Skitka et al., 1999).

Excessive  Mental  Workload 

Designs  that  relegate  the  person  to  the  role  of  passive  moni¬ 
tor  run  the  risk  of  a  vigilance  decrement  due  to  insufficient  en¬ 
gagement  and  mental  workload.  At  the  other  extreme,  design¬ 
ers  must  recognize  that  “clumsy  automation”  that  leaves  the 
person  with  responsibility  for  difficult  parts  of  the  task  (such  as 
coping  with  an  emergency),  but  that  adds  additional  workload 
(Kushleyeva, F. Salvucci, & Schunn, 2005) due to the awkward
interactions now required to access information and functions
embedded in the DSS (such as navigating through complex menus
to view certain information), can actually impair performance
because of the added mental workload of interacting with the
DSS (Wiener & Nagel, 1988). One line of research that is now
getting increased attention is the potential to design adaptive
systems that can monitor the human operator using
neurophysiological measures that attempt to detect problems with alertness,
mental  workload,  or  attention  (Caggiano  &  Parasuraman,  2004; 
Freeman,  Mikulka,  &  Scerbo,  2004;  Grier,  Warm,  &  Dember, 
2003;  Luo,  Greenwood,  &  Parasuraman,  2001;  Mikulka,  Scerbo,  & 
Freeman,  2002).  This  work  seeks  to  “determine  whether  a  bio- 
cybernetic,  adaptive  system  could  enhance  vigilance  perfor¬ 
mance,”  with  the  goal  of  improving  “monitoring  performance 
on  critical  activities  such  as  air  traffic  control  and  radar  and  sonar 
operation”  (Mikulka  et  al.,  2002,  p.  654),  or  offloading  work  to 
software  or  other  people  when  the  computer  detects  that  exces¬ 
sive  attentional  demands  are  being  placed  on  an  individual. 

Lack of Awareness or Understanding

Even  if  the  person  is  sufficiently  engaged  with  the  DSS  and  the 
underlying  task,  designers  need  to  consider  how  to  ensure  that 
the  user  has  an  accurate  mental  model  of  the  situation,  and  of  the 
functioning  and  state  of  the  DSS  (Billings,  1997;  Larkin  &  Simon, 
1987;  Mitchell  &  R.  A.  Miller,  1986;  Roth  et  al.,  1987;  Sarter  & 
Woods,  1993).  If  the  person  does  not  have  such  an  understand¬ 
ing,  then  it  may  be  difficult  for  him  to  intervene  at  appropriate 
times  or  to  integrate  the  computer’s  inputs  into  his  own  think¬ 
ing  appropriately  (Geddes,  1997b).  This  concern  has  implications 
for selection of the underlying technology and conceptual model
for a system, as well as for the design of the visual or verbal
displays intended to represent the state of the world and the state
of the software for the user, including explanations of how the DSS
has arrived at its recommendations (Clancey, 1983; Hasling,
Clancey, & Rennels, 1984; Nakatsu & Benbasat, 2003).

Lack of Trust and User Acceptance

As  outlined  above,  overreliance  can  be  a  problem  with  cer¬ 
tain  assignments  of  roles  to  the  person  and  the  computer.  At  the 
other  extreme,  lack  of  trust  or  acceptance  of  the  technology  can 
eliminate  or  reduce  its  value  (Clarke,  Hardstone,  Rouncefield,  & 
Sommerville,  2006;  Hollnagel,  1990;  Muir,  1987).  This  lack  of  ac¬ 
ceptance  can  result  in  outright  rejection  of  the  DSS  (in  which 
case  it  either  is  not  purchased  or  not  used),  or  in  a  tendency  to 
underutilize  it  for  certain  functions.  It  is  important  to  note  that 
this  lack  of  acceptance  can  be  due  to  resistance  to  change 
(Cartwright & Zander, 1960; Forsyth, 1998; Levi, 2001) even if
there  is  no  intrinsic  weakness  in  the  DSS,  due  to  general  beliefs 
held  by  the  operators  (rightly  or  wrongly)  about  how  the  soft¬ 
ware  will  influence  their  lives,  or  due  to  beliefs  about  how  well 
such  a  DSS  can  be  expected  to  perform  (Andes  &  Rouse,  1992; 
J.  Lee  &  See,  2004;  Marsh  &  Dibben,  2003;  Muir  &  Moray,  1996; 
Riegelsberger,  Sasse,  &  McCarthy,  2005). 

Active  Biasing  of  the  User's  Cognitive  Processes 

Complacency  is  one  way  in  which  a  DSS  can  influence  the 
person’s cognitive processing. Studies have shown, however, that
the use of a DSS can also actively alter the user’s cognitive
processes (Beach & Connolly, 2005; Larson & C. Hayes, 2005;
P. J. Smith, McCoy, & Layton, 1997). The displays and
recommendations presented by the software have the potential to
induce  powerful  cognitive  biases,  including  biased  situation  as¬ 
sessment  and  failures  by  the  user  to  activate  and  apply  his  ex¬ 
pertise  because  normally  available  cues  in  the  environment  are 
no  longer  present.  The  net  result  is  that  the  person  fails  to  ex¬ 
hibit  the  expertise  that  he  would  normally  contribute  to  the  de¬ 
cision-making  task  if  working  independently,  not  because  he 
lacks  that  expertise  but  because  the  computer  has  influenced  his 
cognitive  processes  in  such  a  way  that  this  knowledge  is  never 
appropriately  activated.  Studies  have  shown  that  these  biasing  ef¬ 
fects  can  induce  practitioners  to  be  31%  more  likely  to  arrive  at 
an incorrect diagnosis on a medical decision-making task
(Guerlain et al., 1996) and 31% more likely to select a very poor
plan on a flight-planning task (P. J. Smith, McCoy, & Layton, 1997).

Distributed  Work  and  Alternative  Roles 

Much  of  the  literature  focuses  on  the  interactions  between 
a  single  user  and  the  DSS.  Increasingly,  however,  researchers 
and  system  developers  are  recognizing  that  one  approach  to 
effective performance enhancement is to think in terms of a
distributed work paradigm (Schroeder & Axelsson, 2005; P. J.
Smith, McCoy, & Orasanu, 2001), in which the software may be
one “agent” in this distributed system (Geddes & Lizza, 1999),
or the software may be viewed primarily as a mediator to support
human-human interactions within virtual or distributed
teams  (Baecker  et  al.,  1995;  Bowers  et  al.,  2006;  Caldwell,  2005; 
Carroll,  Neale,  Isenhour,  &  McCrickard,  2003;  Handy,  1995;  Hertel, 
Geister,  &  Konradt,  2005;  Hinds  &  Kiesler,  2005;  Hinds  & 
Mortensen,  2005;  Hutchins,  1990, 1995;  Katz,  Lazer,  Arrow,  &  Con¬ 
tractor,  2004;  Lurey  &  Raisinghani,  2001;  G.  M.  Olson  &  J.  S.  Olson, 
1997,  2003;  Orasanu  &  Salas,  1991;  Rasmussen  et  al.,  1991;  Salas, 
Bowers,  &  Edens,  2001;  Salas  &  Fiore,  2004;  Smith,  McCoy,  & 
Layton,  1997).  Such  distributed  systems  can  now  be  found  in 
the  military,  in  medicine,  in  aviation  and  a  variety  of  other  ap¬ 
plication  areas. 

Sycara  and  Lewis  (2004)  suggested  that: 

There are three possible functions that software agents might have
within human teams:

1. Support the individual team members in completion of their own
tasks

2. Assume the role of a (more or less) equal team member by
performing the reasoning and tasks of a human teammate

3. Support the team as a whole. (p. 204)

and  that  “teams  could  be  aided  along  four  general  dimensions:  ac¬ 
cessing  information,  communicating,  monitoring,  and  planning” 
(p.  227).  In  considering  the  design  and  use  of  DSSs  for  such  ap¬ 
plications,  all  of  the  more  traditional  factors  must  be  considered. 
In  addition,  questions  regarding  the  impact  of  the  technology  on 
team situation awareness (Endsley, 2003), rapport, trust, and
communication become paramount. It is also important to think of a
broader  set  of  design  parameters,  as  distributed  approaches  to 
work  open  up  the  potential  to  use  a  variety  of  different  architec¬ 
tures  for  distributing  the  work  in  terms  of  the  locus  of  control  or 
responsibility  and  the  distribution  of  knowledge,  data,  and  infor¬ 
mation-processing  capabilities  (Griffith,  Sawyer,  &  Neale,  2003; 
Sheridan, 1997; Sheridan, 2002; P. J. Smith, McCoy, Orasanu, et al.,
1997; P. J. Smith, Billings, et al., 2000; P. J. Smith, Beatty, Spencer, &
Billings,  2003).  Furthermore,  DSSs  may  be  necessary  to  make  ef¬ 
fective  certain  architectures  for  distributing  the  work. 

Organizational  Failures 

The  discussions  above  focus  on  design-induced  errors  made 
by  the  system  operators.  It  is  equally  important  for  the  design 
team  to  recognize  that  part  of  the  design  process  is  ensuring 
that  the  management  of  the  organization  into  which  the  DSS  is 
introduced will provide an effective safety net. This means that
the  design  team  needs  to  ensure  that  the  organization  has  es¬ 
tablished  effective  procedures  to  detect  significant  problems  as¬ 
sociated  with  the  introduction  and  use  of  the  DSS,  and  that 
such  problems  are  communicated  to  the  levels  of  management 
where  responsibility  can  and  will  be  taken  to  respond  effectively 
to  these  problems  (Horton  &  Lewis,  1991). 


CASE  STUDIES 


Below, case studies are presented focusing on the designs of
three quite different DSSs. Within these case studies, a number
of the issues raised earlier are discussed in more detail within
the context of real-world applications, along with presentations
of design solutions that provide concrete illustrations of how
to deal with those issues.

Case  Study  A — Distributed  Work  in  the  National 
Airspace  System:  Ground  Delay  Programs 

Earlier  in  this  chapter,  we  indicated  that  a  distributed-work  sys¬ 
tem  can  be  characterized  in  terms  of  the  assignment  of  control 
or  decision-making  authority  in  relationship  to  the  distribution 
of  knowledge  and  data.  The  Federal  Aviation  Administration’s 
(FAA’s)  Enhanced  Ground  Delay  Program  provides  a  very  infor¬ 
mative  example  for  illustrating  the  significance  of  these  para¬ 
meters  when  designing  a  distributed  system. 

The  Application  Area 

The FAA’s Enhanced Ground Delay Program (GDP) has been
in  operation  since  1998.  Its  goal  is  to  help  match  the  arrival  de¬ 
mand  at  an  airport  with  the  available  capacity,  where  demand 
could  exceed  capacity  because  of  a  weather  constraint  at  that 
airport,  a  closed  runway,  and  so  on. 

The  strategy  underlying  the  use  of  GDPs  is  to  delay  depar¬ 
tures  to  the  constrained  airport,  holding  these  departures  on 
the  ground  when  that  airport’s  arrivals  are  predicted  to  exceed 
capacity. Such a strategy has potential benefits for the FAA in
terms  of  reducing  air  traffic  controller  workload  (reducing  the 
need  for  the  use  of  airborne  holding  or  some  other  air  traffic 
control  tactic).  It  also  has  potential  benefits  for  the  NAS  Users, 
as  it  can  be  more  economical  to  delay  departures,  holding  these 
flights  on  the  ground  (reducing  fuel  consumption  due  to  air¬ 
borne  holding  as  well  as  reducing  the  potential  for  diversions). 

The  Original  Approach  to  Distributing  Work  in  a  GDP 

In  principle,  there  are  a  variety  of  different  ways  to  implement 
GDPs. Prior to 1998, the FAA used GDPs, but did so using an
approach that was much less effective than the currently used
procedure. Under this original paradigm, traffic managers at the
FAA’s Air Traffic Control System Command Center (ATCSCC)
would set the airport arrival rate (AAR) for an airport, and then
a DST would automatically assign an arrival slot to each specific
flight, which would then be held on the ground until it was time
to depart to make its assigned arrival slot. NAS Users had one
opportunity to swap flights in a batch mode (e.g., requesting all of
the swaps at one time). This limited their ability to monitor the
development of the situation over time and adapt as new
information became available. In addition, although NAS Users were
asked to cooperate in order to make this program work, there
was actually a disincentive for them to provide the FAA with
information about cancellations or potential delays. If the
cancellation of a flight was reported to the FAA, the NAS User lost
its assigned arrival slot; if it was reported that a flight was going
to miss its departure time (because of a mechanical problem, for
instance), that flight could be assigned an even greater delay. As a
result, NAS Users often chose to withhold this information from
the FAA, resulting in unused arrival slots (wasted capacity).

An  Alternative  Approach  to  the  Design  of  GDPs 

Given  these  limitations,  a  joint  industry-FAA  program 
evolved  to  identify  and  implement  a  different  approach  to 
GDPs.  This  Collaborative  Decision  Making  program  explored 
methods  for  still  achieving  the  FAA’s  primary  goal  (matching  ar¬ 
rival  rates  to  airport  capacity),  while  giving  NAS  Users  the  flexi¬ 
bility  to  better  achieve  their  business  goals. 

One  approach  would  have  been  to  have  NAS  Users  provide 
the  FAA  with  all  of  the  knowledge  and  data  about  their  business 
concerns  relevant  to  decisions  about  which  flights  to  assign  to 
which  arrival  slots.  In  principle,  the  FAA  traffic  managers  could 
have  then  determined  “good”  slot  allocations  for  each  NAS  User, 
and  assigned  flights  to  slots  accordingly.  Such  a  decision  would 
have  had  to  somehow  take  into  consideration  the  competitive 
nature  of  this  situation  and  through  some  decision  process  de¬ 
termine  an  “equitable”  solution. 

It  was  decided  that  such  an  approach  would  produce  unre¬ 
alistic  demands  in  terms  of  cognitive  complexity  and  mental 
workload  for  these  traffic  managers,  as  they  would  have  had  to 
master  the  relevant  expertise  of  the  dispatchers  working  for 
each  NAS  User  and  to  somehow  use  that  knowledge  to  generate 
equitable  slot  assignments.  (Dispatchers  work  for  airlines  and 
other  general  aviation  operations  that  are  required  to,  or  choose 
to,  abide  by  Part  121  regulations  of  the  U.S.  Federal  Air  Regula¬ 
tions,  sharing  responsibility  for  each  flight  with  the  pilots.  They 
complete  the  preflight  planning — determining  the  flight’s 
route,  departure  time,  fuel  load,  and  so  on — and  are  responsi¬ 
ble  for  following  that  flight  while  it  is  airborne.) 

Strategy  for  distributing  work.  Instead,  the  CDM  pro¬ 
gram  developed  a  new  architecture  for  distributing  the  work 
associated  with  GDPs.  This  architecture  uses  a  classic  systems 
engineering  approach  for  dealing  with  complexity,  decompos¬ 
ing  the  overall  plan  for  a  GDP  into  a  set  of  nearly  independent 
subplans  that  can  be  developed  by  different  individuals  and 
that,  when  completed  by  each  of  these  individuals,  achieves  the 
desired  goals  (assuming  the  plan  is  successfully  implemented) . 
The qualifier “nearly” should not be overlooked, however. The
reality is that there are cases where interactions among these
different individuals are required, so the system design has to
include mechanisms that support such interactions in order to
ensure that the appropriate perspectives are considered in setting
up a GDP and that exceptions can be made for individual flights
when that is required to deal with some overriding consideration.

Like the original GDP, this Enhanced GDP starts with the
assumption that a “neutral resource broker” is needed because, in
situations where a GDP is called for, the NAS Users are competing
for a limited resource (arrival slots at the constrained airport).
The critical difference, however, is the application of the
following principle: “The ‘referee’ (the FAA) should only control
the process at the level of detail necessary to achieve its
goals concerned with safety and throughput, thus allowing NAS
Users the flexibility to try to achieve their own goals to the
extent possible subject to this higher-level constraint.” Thus, the
critical change is the level of abstraction at which ATCSCC (the
neutral referee) controls the process. Instead of determining
which specific flights are allocated to specific arrival slots,
ATCSCC determines the start and stop times for the GDP, the arrival
rate for the constrained airport, and the departure airports to be
included in the GDP, and then assigns arrival slots to specific
NAS Users (such as a specific airline) rather than to specific
flights (using a software tool implemented by Metron Aviation
called the Flight Schedule Monitor or FSM, shown in Fig. 29.1).

The NAS User can then decide for itself which of its flights
to assign to one of its arrival slots. An airline might, for instance,
have flights from Dallas, Chicago, and Miami all scheduled to
fly to LaGuardia in New York, all scheduled to arrive between
1800Z and 1815Z, and all included in a GDP for LaGuardia. If that
airline were given “ownership” of an arrival slot at 1855Z in a GDP
for LaGuardia, it could assign that slot to any one of these three
flights based on their relative importance to that airline’s
business concerns, considering factors such as passenger
connections, aircraft requirements for later flights out of
LaGuardia, crew time restrictions, and aircraft maintenance
schedules.

Ration-by-schedule  (RBS).  In  order  to  accomplish  this 
new  process,  there  are  several  requirements.  First,  a  traffic  man¬ 
ager  at  ATCSCC  has  to  set  the  start  and  end  times  and  the  air¬ 
port  arrival  rate,  and  determine  which  departure  airports  to  in¬ 
clude.  To  do  this,  ATCSCC  solicits  input  from  the  relevant  FAA 
facilities  and  the  NAS  Users.  This  interaction  takes  the  form  of 
a  teleconference  that  is  held  every  two  hours,  during  which 
ATCSCC can propose the use of a GDP and provide an opportunity
for input from each of the participants. This decision-making
process is supported by a simulation or “what-if” function in FSM
that allows the user to input a set of parameters (proposed arrival
rate, start and end times, and included departure airports) and
view the predicted impact in terms of average and maximum delays,
and the potential for a spike in demand at the airport after the
end time for the GDP. Window 3 in Fig. 29.1 shows such a demand
pattern for SFO in one-hour time bins. The predicted peak demand,
for instance, is 45 flights arriving during the 0200Z hour.

The FSM display consists of four main windows. Clockwise,
starting from the upper right, they are:

1. The Control window

2. The Flight Information window

3. The Demand Graph

4. The Timeline

FIGURE 29.1. Screen display from FSM. (Window 1 controls the display of information in FSM;
Window 2 provides access to information about specific flights included in the GDP; Window 3
provides an indication of the predicted arrival rate, with the display changing based on the
parameters set for the GDP, thus supporting "what-iffing"; Window 4 indicates the predicted arrival
times for individual flights.) (Source: Metron Aviation, Inc.)

After  considering  these  inputs,  the  ATCSCC  traffic  manager 
sets the parameters for the GDP and lets FSM determine the
arrival  slot  allocations  for  each  NAS  User.  These  allocations  are 
based  on  a  “Ration  By  Schedule”  (RBS)  logic.  The  number  of 
arrival  slots  each  user  gets  is  proportional  to  its  share  of  the 
flights  that  are  included  in  the  GDP  that  were  originally  sched¬ 
uled  to  arrive  at  the  constrained  airport.  The  slots  “owned”  by 
that  NAS  User  are  initially  assigned  to  its  aircraft  based  on  the 
order  of  their  originally  scheduled  arrival  times  (see  Fig.  29.2). 
Thus,  if  that  NAS  User  originally  has  20  flights  out  of  a  total  of  80 
flights  that  are  included  in  the  GDP  and  are  scheduled  to  arrive 
at  the  constrained  airport  from  1300-1400Z  (for  a  GDP  from 
1200Z-1900Z),  and  if  the  GDP  requires  this  total  of  80  to  be  re¬ 
duced  to  40,  then  that  NAS  User  gets  10  slots  from  1300-1400Z. 
As  a  default,  the  first  10  of  its  included  flights  scheduled  to  arrive 
after  1300Z  are  assigned  these  10  slots.  The  arrival  times  for  the 
remaining  10  flights  are  pushed  back  to  slots  after  1400Z. 
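
The proportional rationing in this example can be sketched in a few
lines of Python. This is only an illustration of the arithmetic
described above, not the logic actually used in FSM: the function
names, the flight identifiers, and the rule for handing out slots
that do not divide evenly (largest remainder first) are all
assumptions made for the sketch.

```python
def rbs_slot_counts(flights_by_user, available_slots):
    """Split available_slots among NAS Users in proportion to their scheduled flights."""
    total = sum(flights_by_user.values())
    counts, remainders = {}, {}
    for user, n in flights_by_user.items():
        # Integer share, plus the remainder left over from the division
        counts[user], remainders[user] = divmod(n * available_slots, total)
    # Assumption: slots that do not divide evenly go to the largest remainders
    leftover = available_slots - sum(counts.values())
    for user in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[user] += 1
    return counts


def default_assignment(scheduled, slot_count):
    """scheduled: list of (flight_id, scheduled_arrival_hhmm) for one NAS User.

    Returns (flights that keep a slot in the period, flights pushed back).
    """
    ordered = sorted(scheduled, key=lambda flight: flight[1])
    return ordered[:slot_count], ordered[slot_count:]


# The example from the text: 20 of 80 flights, with 80 reduced to 40
counts = rbs_slot_counts({"ABC": 20, "others": 60}, available_slots=40)
abc_flights = [(f"ABC{i:02d}", 1300 + 3 * i) for i in range(20)]  # hypothetical flights
kept, pushed_back = default_assignment(abc_flights, counts["ABC"])
```

Here the NAS User with 20 of the 80 flights receives 10 of the 40
slots, and its 10 earliest scheduled flights receive them by default.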

Note  that,  to  make  this  logic  work,  the  GDP  needs  to  cover 
a  period  in  which,  during  the  GDP  or  just  after  its  end,  there 
are  valleys  in  the  predicted  demand  that  fall  below  the  reduced 
airport  arrival  rate  (AAR),  so  that  the  program  doesn’t  create  a 
huge peak in arrival demand at the end of the GDP. The parameters
defining the GDP, along with the slot allocations, are
then  disseminated  to  the  NAS  Users. 


Slot swapping. The strategy of controlling the process at a
more abstract level (by setting an AAR) and the strategy of using
the RBS algorithm to ration slots to NAS Users are critical, as
they make it possible to distribute the overall work as relatively
independent subtasks. The task of the FAA (the neutral referee) is
to set the high-level control parameters. The NAS Users can then
work within the resultant constraints to implement plans for their
own fleets that are most effective from a business perspective.

The NAS Users accomplish this by swapping flights among assigned
arrival slots. Each NAS User can do this by swapping flights
among its own slots (see Fig. 29.3), or by making a swap with a
slot “owned” by another NAS User, if such a swap is available.

Traditionally, decisions about what to do with a particular
flight were handled by the individual dispatcher responsible for
that flight. However, this new process involves making global
decisions that consider tradeoffs among a number of flights. As a
result, instead of requiring some type of potentially time-consuming
collaboration among the affected dispatchers in that NAS User’s
Flight Operations Center, the responsibility for managing
participation in a GDP has been assigned to a single senior
dispatcher (often a new position referred to as the “Air Traffic
Control” or “ATC Coordinator”).

Most ATC Coordinators manage the GDP for their fleets using
another set of DSSs that allow the ATC Coordinator to:

• Test alternative GDPs (GDPs with different parameters) to
see which alternative is best from his fleet’s perspective.
(This information can be used for discussions during the
planning teleconference in which the GDP is proposed,
providing input to ATCSCC regarding the GDP parameters that
should be used.)

• Swap arrival slots among different flights in the fleet either
manually or through an automated process. (It is not unusual
to make swaps involving a hundred flights based on passenger
connections. This requires a DST that uses an algorithm
to estimate the relative importance of the different flights.
The ATC Coordinator can, however, constrain or override the
solution recommended by the DST.)

• Test for the impact of strategies for canceling flights or
delaying their departure times.

FIGURE 29.2. Arrival slot allocation using the Ration-By-Schedule
algorithm. (Assume that the arrival airport can normally handle
one arrival every two minutes. If the GDP reduces the arrival rate
by 50% for all arrivals, then the flights are assigned revised arrival
slots that keep the original ordering, but show increasing delay
over time.)
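
The situation described in the caption of Fig. 29.2 can be
reproduced with a short sketch. It assumes times expressed as
minutes from the start of the GDP, and it assumes that a flight is
never given a slot earlier than its scheduled arrival; the function
name and the flight labels are hypothetical, and the real FSM logic
works on a fixed slot grid with many more constraints.

```python
def rbs_revised_slots(scheduled, start, slot_spacing):
    """Assign revised arrival slots in original schedule order.

    scheduled: list of (flight_id, scheduled_arrival_minute)
    start: first available slot, in minutes
    slot_spacing: minutes between slots under the reduced arrival rate
    Returns a list of (flight_id, revised_slot, delay_minutes).
    """
    result = []
    next_slot = start
    for flight_id, sched in sorted(scheduled, key=lambda flight: flight[1]):
        # Assumption: a flight is never moved earlier than its schedule
        slot = max(next_slot, sched)
        result.append((flight_id, slot, slot - sched))
        next_slot = slot + slot_spacing
    return result


# One scheduled arrival every 2 minutes, rate halved to one every 4 minutes
schedule = [("F1", 0), ("F2", 2), ("F3", 4), ("F4", 6)]
revised = rbs_revised_slots(schedule, start=0, slot_spacing=4)
```

The original ordering is preserved, and each successive flight
absorbs two more minutes of delay than the one before it.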


Figure  29.3  illustrates  swapping  flights  belonging  to  the  same 
NAS  User.  One  very  important  variation  of  this  feature  is  can¬ 
celing  a  flight  and  then  using  its  slot  to  swap  with  another  flight. 
This  is  important  because  this  provides  the  incentive  for  the 
NAS User to provide the FAA with critical information:

According to the “rules of the game” for GDPs (P. J. Smith et al., 2003),
when the NAS User cancels a flight, that User gets to keep its arrival
slot and can swap it just like any other slot under its control. In order
to make swaps with canceled flights, however, the User must inform
the FAA of the cancellation (accomplished through digital software
communications).

This  swapping  process  is  further  enhanced  by  a  procedure 
that  allows  swapping  among  the  different  NAS  Users.  This  pro¬ 
cedure  is  referred  to  as  “Slot-Credit  Substitution.” 

FIGURE 29.3. Arrival slot allocation after swapping FLT-321 and

Slot-Credit Substitution is best understood in the context of
making a decision to cancel a flight. Suppose Airline ABC has
flights from LGA (LaGuardia Airport) to ORD (Chicago O’Hare)
that arrive every hour on the hour. As shown in Fig. 29.4, these
flights are scheduled to arrive at ORD at 1000, 1100, 1200, and
1300Z.

Suppose  a  GDP  is  issued  for  ORD  that  delays  these  flights, 
assigning  revised  arrival  times  (slots)  of  1030,  1200,  1300  and 
1400Z,  respectively.  In  principle,  each  of  these  flights  is  still  ca¬ 
pable  of  departing  on  time  and  therefore  arriving  at  its  originally 
scheduled arrival time if a suitable swap could be found to
eliminate its delay due to the GDP. Thus the Earliest Runway Time
of  Arrival  (ERTA)  for  each  of  these  flights  is  the  same  as  its  orig¬ 
inally  scheduled  arrival  time.  Each  flight  could,  in  principle, 
therefore  be  swapped  to  move  its  arrival  to  its  ERTA.  However, 
it  could  not  be  swapped  earlier  than  its  ERTA. 

Given  this  situation,  the  ATC  Coordinator  for  Airline  ABC 
would  be  willing  to  cancel  Flight  1  (and  put  the  passengers  on 
the  next  flight)  if  this  would  significantly  reduce  the  delay  on 
later flights. In the case shown in Fig. 29.4, however, even if he
cancels Flight 1 the slot released is too early for Flight 2 (a flight
with an 1100Z ERTA cannot arrive early enough to use a slot with a
1030Z CTA). However, there is another airline that wants a slot with
a 1030Z CTA, and would gladly give up an 1100Z slot to get it.
We know this because the ground delay program software detects
that another airline has a flight with an 1100Z CTA and has
set its ERTA to 1030Z or earlier.

In  this  case  the  steps  in  the  process  work  as  follows: 

Airline  ABC  sends  a  message  saying: 

I am willing to cancel Flight 1 and give up my 1030Z CTA only if

a. I can get a slot at 1100Z and put Flight 2 into it

b. Moving Flight 2 frees up a CTA at 1200Z into which I will put Flight 3

c. Moving Flight 3 frees up a CTA at 1300Z into which I will put Flight 4.

The  ground  delay  program  software  looks  for  a  flight  to  form 
the  necessary  bridge  and  when  it  finds  the  one  mentioned 
above,  it  approves  the  entire  transaction.  (If  it  could  not  find  a 
suitable  flight  to  form  the  bridge,  it  would  have  rejected  the  en¬ 
tire  transaction  and  Airline  ABC  would  not  have  been  obligated 
to  cancel  Flight  1.) 

Note  that  it  is  not  the  setting  of  the  ERTA  that  kicks  off  the 
Slot-Credit  Substitution  process.  (ERTAs  are  always  set  as  a  mat¬ 
ter  of  course,  by  default  set  as  the  originally  scheduled  or  filed 
arrival  time  for  each  flight.)  Rather,  it  is  the  airline’s  request  for 
a  Slot-Credit  Substitution  transaction  that  causes  the  ground  de¬ 
lay  program  software  to  search  for  a  suitable  bridge. 
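
The all-or-nothing character of this transaction can be made
concrete with a small sketch. It is a simplified model built only
from the example above, not the actual ground delay program
software: the Flight record, the function name, the second airline
"XYZ", and its flight X9 with a 1015Z ERTA are all hypothetical,
and times are plain HHMM integers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flight:
    flight_id: str
    airline: str
    erta: int  # earliest runway time of arrival, HHMM (Z)
    cta: int   # currently assigned arrival slot, HHMM (Z)


def slot_credit_substitution(flights, cancel_id, wanted_cta, chain_ids):
    """Approve the whole request or reject it and leave every flight unchanged."""
    by_id = {f.flight_id: f for f in flights}
    cancelled = by_id[cancel_id]
    offered_cta = cancelled.cta
    # Look for a "bridge": another airline's flight that holds the wanted
    # slot and is able to arrive as early as the slot being given up
    bridge = next(
        (f for f in flights
         if f.airline != cancelled.airline
         and f.cta == wanted_cta
         and f.erta <= offered_cta),
        None,
    )
    if bridge is None:
        return False, list(flights)
    new_cta = {bridge.flight_id: offered_cta}
    slot = wanted_cta
    for fid in chain_ids:
        if by_id[fid].erta > slot:  # no flight may move earlier than its ERTA
            return False, list(flights)
        new_cta[fid] = slot
        slot = by_id[fid].cta  # moving this flight frees its old slot
    updated = [replace(f, cta=new_cta.get(f.flight_id, f.cta))
               for f in flights if f.flight_id != cancel_id]
    return True, updated


flights = [
    Flight("F1", "ABC", 1000, 1030), Flight("F2", "ABC", 1100, 1200),
    Flight("F3", "ABC", 1200, 1300), Flight("F4", "ABC", 1300, 1400),
    Flight("X9", "XYZ", 1015, 1100),  # hypothetical bridge flight
]
approved, after = slot_credit_substitution(flights, "F1", 1100, ["F2", "F3", "F4"])
```

In this sketch the request is approved because X9 can take the
1030Z slot; if X9 is removed from the list, the request is rejected
and Flight 1 keeps its slot, mirroring the rule that Airline ABC is
not obligated to cancel.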

Overall, then, this process has truly distributed the work by
changing the locus of control and the parameters of control of
certain decisions. The ATCSCC traffic manager’s task is to assign
the parameters defining the GDP. The ATC Coordinators for
each NAS User can then, subject to the constraints imposed by
the established GDP, make decisions about which flights to assign
to which slots and can even, by setting ERTAs, offer to swap
arrival slots with other NAS Users.

Adapting  to  deal  with  uncertainty.  The  GDP  process 
as described thus far basically assumes that the NAS is a
deterministic process. If this assumption were true, and if the
decision makers had perfect information, then the process would
run  in  a  flawless  manner,  with  the  right  number  of  flights  arriv¬ 
ing  at  the  constrained  airport  at  the  right  time. 

However,  there  are  numerous  sources  of  uncertainty  in  the 
NAS.  The  duration  and  extent  of  weather  constraints  are  not  to¬ 
tally  predictable;  the  estimated  time  required  to  clear  an  air¬ 
craft  with  a  blown  tire  off  a  runway  may  be  wrong;  a  flight  may 
miss  its  assigned  departure  slot  (calculated  by  working  back¬ 
ward  from  its  assigned  arrival  time  and  its  estimated  air  time);  or 
a  flight  may  take  more  or  less  time  than  predicted  to  arrive  once 
airborne.  To  deal  with  these  uncertainties,  certain  forms  of 
adaptation  have  been  built  into  the  process. 

First,  ATCSCC  can  override  the  slot  allocation  for  any  indi¬ 
vidual  flight,  and  assign  it  some  other  earlier  time  because  of 
an  extenuating  circumstance.  (In  the  parlance  of  the  field,  such 
a flight is referred to as a “white hat.”) Second, if there are a
number  of  slots  before  the  end  of  the  GDP  that  have  not  been 
filled  with  active  flights,  ATCSCC  can  run  a  “compression”  algo¬ 
rithm  that  moves  later  flights  up  into  those  slots  (subject  to  the 
constraints  imposed  by  the  ERTAs  for  those  flights  and  the  rule 
that  a  flight  cannot  be  moved  up  to  a  slot  that  would  require  it 
to  depart  within  the  next  30  minutes).  Third,  ATCSCC  can  revise 
the  GDP  itself  because  the  original  predictions  or  resultant  de¬ 
cisions  have  turned  out  to  be  too  inaccurate,  setting  new  para¬ 
meters  (such  as  a  new  AAR  or  a  new  end  time),  and  causing  a 
major  shift  in  arrival  slot  assignments. 
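
The compression step described above can be sketched as a simple greedy pass. This is only an illustration under stated assumptions, not the algorithm ATCSCC actually runs: the `Flight` fields, the `compress` function, and the use of minutes as the time unit are hypothetical, and each flight's ERTA is treated here as the earliest arrival time the flight can accept. The two constraints from the text are applied directly: a flight may not be moved to a slot earlier than its ERTA, and it may not be moved to a slot that would require it to depart within the next 30 minutes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Flight:
    erta: int      # assumed: earliest arrival time the flight can accept (minutes)
    air_time: int  # estimated minutes from departure to arrival


def compress(
    slots: List[Tuple[int, Optional[str]]],
    flights: Dict[str, Flight],
    now: int,
    min_notice: int = 30,
) -> List[Tuple[int, Optional[str]]]:
    """Greedy sketch: fill each open slot with the earliest later flight
    that satisfies the ERTA constraint and the 30-minute departure rule.

    `slots` is a time-ordered list of (arrival_slot_time, flight_id or None).
    """
    times = [t for t, _ in slots]
    occupant = [fid for _, fid in slots]
    for i, slot_time in enumerate(times):
        if occupant[i] is not None:
            continue  # slot already holds an active flight
        for j in range(i + 1, len(times)):
            fid = occupant[j]
            if fid is None:
                continue
            flight = flights[fid]
            departs_at = slot_time - flight.air_time
            if slot_time >= flight.erta and departs_at >= now + min_notice:
                # Move the flight up; its old slot opens for later flights.
                occupant[i], occupant[j] = fid, None
                break
    return list(zip(times, occupant))
```

Because a moved flight vacates its old slot, the same pass can fill that slot with a still later flight, which is how a single open slot can shorten delays for several flights.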

A  fourth  method  for  dealing  with  uncertainty  is  to  pad  the 
original  estimate  for  the  AAR.  The  major  reason  for  doing  this 
is  to  deal  with  potential  “pop-up”  flights.  As  described  earlier, 
slots  are  allocated  based  on  the  available  schedule  for  arrivals 
at  an  airport.  However,  many  general  aviation  aircraft  do  not 
publish  a  schedule  indicating  where  and  when  they  plan  to  fly 
on  a  given  day.  They  fly  on  an  as-needed  basis  for  their  owners. 
Nevertheless,  they  still  need  to  be  accommodated  by  the  sys¬ 
tem.  To  deal  with  such  flights,  when  setting  the  AAR  using  ESM, 
the  ATCSCC  traffic  manager  can  indicate  the  number  of  pop-ups 
to plan for, thus leaving some open slots for those flights. When
such a flight files its flight plan, it is then assigned the average
delay that was assigned to those flights that are now, under the
GDP, expected to arrive in the same 15-minute period.
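
The pop-up rule can be illustrated with a short calculation. The function name, the data layout, and the choice of clock-aligned 15-minute periods are assumptions made for the example. The text does not say what happens when no controlled flight falls in the period, so the zero-delay fallback below is a placeholder.

```python
from typing import List, Tuple


def popup_delay(popup_arrival_min: int,
                controlled: List[Tuple[int, float]],
                period_min: int = 15) -> float:
    """Average the delays of flights whose controlled arrival time falls
    in the same 15-minute period as the pop-up flight.

    `controlled` holds (controlled_arrival_minute, assigned_delay_minutes).
    """
    period = popup_arrival_min // period_min
    delays = [d for t, d in controlled if t // period_min == period]
    if not delays:
        return 0.0  # assumption: no peers in the period means no delay
    return sum(delays) / len(delays)
```

For instance, a pop-up expected at minute 610 shares the 600 to 614 period with flights delayed 30 and 50 minutes, so it would be assigned a 40-minute delay under this sketch.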

Administrative  controls.  In  addition  to  these  real-time 
decisions,  the  overall  process  is  monitored  for  problems.  One 
such  potential  problem  is  the  quality  of  the  data  submitted  by  a 
NAS  User.  To  deal  with  this  issue,  three  metrics  are  monitored: 

•  Time-out  cancels.  A  time-out  cancel  is  a  flight  that  is  expected  to 
operate  (one  that  was  scheduled,  but  for  which  a  cancellation 
message  was  never  submitted),  but  either  never  operates,  or 
operates well after its assigned departure slot.

• Cancelled-but-flew flights. A cancelled-but-flew flight is a flight
that  the  participant  cancels  but  that  ends  up  operating. 

•  Undeclared  flights.  An  undeclared  flight  is  a  flight  belonging 
to  a  NAS  User  that  normally  submits  a  schedule  of  its  ex¬ 
pected  flights  for  each  day,  and  that  operates  without  prior 
notice  to  the  system. 

NAS  Users  must  demonstrate  acceptable  performance  in  terms 
of  these  metrics  in  order  to  continue  to  participate  in  using  the 
slot-swapping  functions  provided  by  GDPs.  This  helps  to  ensure 
sufficient  data  integrity  for  the  overall  process  to  function  as 
planned. 
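
A minimal sketch of how one flight record might be checked against the three metrics follows. The record fields and the `LATE_THRESHOLD_MIN` value are hypothetical: the text says only that a time-out cancel operates "well after" its slot and gives no number.

```python
from dataclasses import dataclass
from typing import Optional

LATE_THRESHOLD_MIN = 120  # hypothetical cutoff for "well after" the slot


@dataclass
class FlightRecord:
    scheduled: bool                      # appeared in the User's schedule
    cancelled: bool                      # a cancellation message was sent
    slot_departure_min: Optional[int]    # assigned departure slot, if any
    actual_departure_min: Optional[int]  # None if the flight never operated


def classify(record: FlightRecord) -> Optional[str]:
    """Return the data-quality category for a record, or None if it is clean."""
    operated = record.actual_departure_min is not None
    if not record.scheduled:
        return "undeclared" if operated else None
    if record.cancelled:
        return "cancelled_but_flew" if operated else None
    if not operated:
        return "time_out_cancel"
    if (record.slot_departure_min is not None
            and record.actual_departure_min - record.slot_departure_min
            > LATE_THRESHOLD_MIN):
        return "time_out_cancel"
    return None
```

Counting these categories per NAS User over a reporting period would give the kind of performance figures the text describes.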

A  second  area  requiring  administrative  oversight  is  a  concern 
over  gaming.  Suppose,  for  example,  that  SFO  (San  Francisco) 
has  a  GDP  in  effect  because  of  reduced  visibility  at  that  airport. 
A  flight  that  wants  to  fly  to  SFO  should  take  a  delay  based  on  this 
GDP. However, if it files for OAK (Oakland, an airport very close
to  SFO)  instead,  and  then  requests  an  amendment  to  land  in¬ 
stead  at  SFO  once  airborne,  it  has  violated  the  spirit  of  the  GDP 
program.  ATCSCC  monitors  for  such  practices,  and  takes  ad¬ 
ministrative  actions  to  prevent  them  in  the  future. 

Evaluation of the Implemented Solution

There  are  two  components  of  the  benefits  from  the  En¬ 
hanced  Ground  Delay  Program  which  has  been  described  above 
in  terms  of  its  major  features.  The  first  is  the  benefit  to  the  NAS 
Users  derived  from  swapping  flights  in  order  to  reduce  delays 
for  their  higher-priority  flights.  These  savings  are  clearly  very 
substantial,  but  would  be  very  difficult  and  time  consuming  to 
estimate.  As  a  result,  to  date  no  study  has  been  conducted  to 
put  a  dollar  figure  on  that  class  of  benefits. 

The  second  benefit  arises  from  the  use  of  the  compression 
process  to  fill  in  unused  arrival  slots,  therefore  reducing  overall 
delays.  Figure  29.5  shows  the  frequency  of  use  of  the  Enhanced 
GDP  process  since  the  introduction  of  the  compression  process 
in 1999. Note that, over the past two years, it has not been
uncommon to run as many as 100 GDPs per month. Table 29.1
then  provides  an  estimate  of  the  minutes  and  dollars  saved  by 
the  compression  process  alone,  using  a  conservative  average 
cost  per  minute  of  operation  for  a  flight  of  $42.  These  estimates 
suggest  that  the  use  of  the  compression  process  alone  reduced 
delay  by  28,188,715  minutes  from  March  18,  1999  to  June  30, 
2005,  thus  saving  over  1  billion  dollars. 
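
The dollar figure follows directly from the minutes saved and the assumed cost per minute, and the percentile shares in Table 29.1 follow from the flight counts. The short check below reproduces that arithmetic. The variable names are illustrative; all of the numbers come from the text and the table.

```python
COST_PER_MINUTE = 42          # dollars per minute of flight operation
MINUTES_REDUCED = 28_188_715  # total delay minutes removed by compression
FLIGHTS = 3_573_796           # flights included in the analysis

dollars_saved = MINUTES_REDUCED * COST_PER_MINUTE

# Flights whose delay reduction exceeded each threshold (minutes).
flights_over = {15: 440_889, 30: 218_886, 45: 137_424, 60: 93_558}
share_over = {
    minutes: round(100 * count / FLIGHTS, 1)
    for minutes, count in flights_over.items()
}

print(f"${dollars_saved:,}")
print(share_over)
```

The product is $1,183,926,030, which is the basis for the "over 1 billion dollars" statement.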


Case  Study  A — Conclusions 

The  Enhanced  GDP  illustrates  a  deliberate  effort  to  design  a 
distributed  work  system.  The  design  strategy  is  to  have  one 
party  (ATCSCC  as  a  neutral  referee)  set  a  high-level  constraint 
(the  airport  arrival  rate),  and  to  then  let  the  other  involved  par¬ 
ties  (the  NAS  Users)  find  solutions  that  meet  their  business 
needs  as  effectively  as  possible  subject  to  that  constraint  (by 
swapping  slots  among  flights  to  favor  the  highest-priority 
flights).  Taking  this  approach,  knowledge  and  data  remain  dis¬ 
tributed,  matching  the  locus  of  control  for  the  various  decisions 
that  need  to  be  made. 

This  distribution  also  serves  to  reduce  the  cognitive  complex¬ 
ity  of  the  task  confronting  each  individual.  The  ATCSCC  traffic 
manager  doesn’t  need  to  know  anything  about  User  priorities, 
and  the  ATC  Coordinator  for  a  NAS  User  doesn’t  need  to  become 
an  expert  at  setting  airport  arrival  rates.  Finally,  because  excep¬ 
tions  are  necessary  to  override  poor  decisions  when  they  are  pro¬ 
posed  by  someone  due  to  a  slip  or  due  to  insufficient  access  to  the 
relevant  knowledge  or  data,  there  must  be  a  mechanism  to  sup¬ 
port  interactions  among  the  different  participants  in  order  to 
exchange  the  necessary  information  and  modify  those  decisions. 

FIGURE 29.5. Frequency of use of the enhanced GDP by year.
(Axis label: Month.)

TABLE 29.1. Time and Money Savings from the Use of the
Compression Process in GDPs since 1999

Data Included in Analysis:

- All compression from Mar. 18, 1999 to June 30, 2005
- 3,573,796 flights included
- 28,188,715 total minutes reduced
- $1,183,926,030 (@ $42/min) dollars saved

Departure Delay Performance:

- Average reduction: 7.7 minutes

Percentiles:

- Reductions greater than 15 minutes: 12.3% (440,889 flights)
- Reductions greater than 30 minutes: 6.1% (218,886 flights)
- Reductions greater than 45 minutes: 3.8% (137,424 flights)
- Reductions greater than 60 minutes: 2.6% (93,558 flights)

To make this process work, however, certain other requirements
arise during its design:

• To make it politically acceptable, non-participants (NAS Users
who do not choose to participate in slot swapping, and so
on) must not be penalized. Such non-participants are initially
assigned slots just like everyone else, but their flights are
never moved to a later slot as part of any swapping. (They
may actually benefit from compression, however, if they are
moved up to an earlier slot that is empty.)

•  There  must  be  an  incentive  for  information  sharing. 

•  DSTs  must  be  available  to  provide: 

-  digital  communication; 

-  simulation  of  the  predicted  impact  of  alternative  decisions 
(what-if capabilities);

-  algorithmic  support  for  identifying,  evaluating,  and  imple¬ 
menting  complex  decisions  involving  large  numbers  of 
flights  (swapping); 

-  information  displays  that  allow  the  user  to  understand  the  sit¬ 
uation  as  relevant  to  that  user’s  responsibilities,  and  to  effec¬ 
tively  evaluate  the  solution  recommended  by  the  technology; 

- control by the user to override the DST’s recommendations
in  cases  where  the  technology  exhibits  brittleness. 

•  Organizational  changes  in  roles  and  responsibilities  (e.g., 
new  specialists  at  ATCSCC  with  the  knowledge  and  skills  to 
develop  GDPs;  ATC  Coordinators  for  the  NAS  Users  to  look  at 
tradeoffs in delaying alternative flights).

•  The  overall  procedures  must  support  adaptation,  so  that  de¬ 
cisions  can  be  modified  as  the  scenario  evolves. 

In  summary,  this  case  study  serves  to  illustrate  the  importance 
of  looking  at  design  from  a  broad  systems  perspective  when 
dealing  with  a  distributed  work  environment,  considering  not 
just  the  detailed  design  issues  necessary  to  implement  a  given 
DST,  but  also  considering  the  alternative  strategies  for  distrib¬ 
uting  the  work  and  the  organizational  changes  required  to  match 
people  with  certain  skills  and  knowledge  with  corresponding 
responsibilities,  and  to  make  the  system  resilient  in  the  face  of 
slips,  mistakes,  and  unanticipated  events. 


Case  Study  B — Human  Interactions  with 
Autonomous  Mobile  Robotic  Systems 

A  rapidly  emerging  technical  area  is  that  of  autonomous  mo¬ 
bile  robotic  systems.  An  autonomous  mobile  robotic  system  can 
perform complex tasks for extended periods with no human
intervention, and thus represents a large step beyond remote
control.  But  no  system  is  truly  autonomous — interaction  with 
its  human  owners  and  operators  is  inevitable.  This  case  study 
illustrates  the  issues  and  needed  approaches  to  provide  an  ef¬ 
fective  and  responsible  relationship  between  a  human  super¬ 
visor  and  a  team  of  autonomous  mobile  robotic  systems. 

Remote  control  of  mobile  robotic  vehicle  systems  has  be¬ 
come  commonplace  over  the  past  two  decades.  Remote  control 
depends  on  a  human  operator  to  visualize  the  situation  around 
the  remote  vehicle  and  to  provide  control  inputs  to  maneuver 
the  vehicle  in  space  and  to  manage  its  sensors  and  manipulators 
to  achieve  the  desired  mission.  Examples  of  remote  control  of 
robotic  vehicles  include  deep  ocean  exploration  and  repairs, 
bomb  disposal  operations,  and  handling  of  very  hazardous  ma¬ 
terials  such  as  radioactive  waste. 

While  remote  control  has  been  successfully  used  for  many 
situations  in  which  it  would  be  highly  undesirable  to  attempt 
to  send  a  human,  the  requirement  to  maintain  continuous  in¬ 
formation  and  control  flow  between  the  vehicle  and  the  human 
operator  has  been  limiting.  Over  the  past  decade,  significant 
investment  has  been  made  in  research  and  development  of  au¬ 
tonomous  mobile  robotic  systems  with  promising  results  for 
land,  air,  and  undersea  vehicles. 

The  study  of  human  interaction  with  remotely  operated  and 
autonomous systems has a long history. The classic work is that
of Sheridan and Verplank (1978), which grew out of their study of
issues in the remote operation of undersea robotic vehicles; it
was recently updated by Parasuraman et al. (2000). This work
defines  ten  levels  of  autonomy,  ranging  from  fully  manual  to 
fully  autonomous.  Most  significantly,  the  levels  are  distinguished 
by  defining  the  nature  of  the  interactions  between  the  human 
and  the  machine. 

The  highest  six  levels  are  particularly  relevant  to  this  case  study: 

10. The computer decides everything, acts autonomously, ignoring the
    human
9. (The computer) informs the human only if it, the computer, decides to
8. (The computer) informs the human only if asked, or
7. (The computer) executes automatically, then necessarily informs the
   human, and
6. (The computer) allows the human a restricted time to veto before
   automatic execution, or
5. executes that suggestion if the human approves, or . . . (p. 933)
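
One way to see how these levels differ is to encode, for each level, whether the computer may act and whether it must tell the human. The sketch below is an interpretation of the six levels quoted above, not part of the cited work: the enum member names and the two helper functions are hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    EXECUTE_IF_APPROVED = 5
    VETO_WINDOW = 6
    EXECUTE_THEN_INFORM = 7
    INFORM_IF_ASKED = 8
    INFORM_IF_COMPUTER_DECIDES = 9
    IGNORE_HUMAN = 10


def may_execute(level: Autonomy, approved: bool = False,
                vetoed: bool = False, veto_window_open: bool = False) -> bool:
    """Can the computer carry out its chosen action right now?"""
    if level == Autonomy.EXECUTE_IF_APPROVED:
        return approved
    if level == Autonomy.VETO_WINDOW:
        # Execution waits until the veto period ends without a veto.
        return not vetoed and not veto_window_open
    return True  # Levels 7-10 execute without waiting for the human


def must_inform(level: Autonomy, asked: bool = False,
                computer_chooses_to: bool = False) -> bool:
    """Is the human told about the action?"""
    if level <= Autonomy.EXECUTE_THEN_INFORM:
        return True  # Levels 5-7 always involve or inform the human
    if level == Autonomy.INFORM_IF_ASKED:
        return asked
    if level == Autonomy.INFORM_IF_COMPUTER_DECIDES:
        return computer_chooses_to
    return False  # Level 10 ignores the human
```

Read this way, moving up the scale removes first the human's power to block an action and then the human's knowledge that it happened.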

Thus,  like  many  other  issues  in  human-computer  interaction, 
creating  a  decision-support  system  for  control  of  a  team  of 
autonomous  entities  requires  careful  attention  to  the  cognitive 
characteristics  of  humans. 


The Application Area

During  the  mid-1990s,  as  a  result  of  the  earlier  successful 
demonstrations  of  intelligent  systems  technology  within  the 
military, key technology decision makers within the defense
industry research and development community recognized that it
was  possible  to  create  an  unmanned  combat  air  vehicle. 

The unmanned combat air vehicle, or UCAV, was envisioned
as a platform for performing very dangerous missions or,
alternatively, very routine missions. An example of a dangerous mission
for modern combat aircraft is suppression of enemy air defenses,
or  SEAD.  In  the  SEAD  mission,  a  team  of  UCAVs  would  be  as¬ 
signed  the  role  of  finding  and  attacking  enemy  surface-to-air 
missile  sites,  so  that  the  airspace  over  the  battle  area  would  be 
safe  for  manned  aircraft. 

The  development  of  unmanned  reconnaissance  aircraft  has 
made  steady  progress,  resulting  in  the  deployment  of  the  Preda¬ 
tor  and  Global  Hawk  unmanned  aircraft  systems.  While  the 
Predator  is  still  a  remote  controlled  system,  flown  by  a  human 
pilot  at  a  ground  station,  the  Global  Hawk  is  a  true  autonomous 
aircraft.  These  successes  raised  confidence  that  the  UCAV  con¬ 
cept  was  feasible  as  well. 

The  demands  of  combat  aviation  are,  however,  quite  dif¬ 
ferent  from  reconnaissance  and  surveillance.  Reconnaissance 
missions  normally  are  single-aircraft  missions,  planned  so  that 
the  aircraft  avoids  potential  threats.  The  typical  mission  also 
has  well-defined  locations  where  information  is  sought,  and 
success  depends  on  collecting  the  information  without  being 
detected. 

Combat  aviation,  in  contrast,  is  conducted  in  an  environ¬ 
ment  of  uncertainty  and  hostility.  The  enemy  actively  seeks  to 
deny  information  about  his  location  and  his  intent  by  blocking 
and  disrupting  sensors  and  communications,  and  by  using  de¬ 
coys.  The  enemy  is  also  using  his  defensive  weapons  to  find  and 
destroy  his  opponent’s  forces.  In  this  environment,  a  team  of 
combat  aircraft  must  operate  successfully  with  degraded  infor¬ 
mation  and  limited  contact  with  human  supervisors.  The  team 
must  be  able  to  find  and  attack  the  enemy  while  avoiding  dam¬ 
age  to  noncombatants.  At  the  same  time,  the  team  must  be  able 
to  recognize  and  remediate  system  degradations  and  failures 
without  assistance. 

For  an  unmanned  combat  aircraft,  the  difficulty  of  this  envi¬ 
ronment  is  higher  still.  The  human  supervisor  no  longer  has  the 
direct  sensory  perceptions  available  to  a  pilot  flying  on  the  air¬ 
craft.  His  entire  understanding  of  the  situation  depends  on  lim¬ 
ited  sensor  data,  and  his  ability  to  control  the  responses  to  the 
situation  is  hindered  by  time  lags  and  dropouts  of  data  due  to 
remote  data  transmissions.  To  be  successful,  the  UCAV  concept 
needs  to  be  robust  and  flexible  in  the  face  of  a  decidedly  hos¬ 
tile  environment. 

The  Need  for  Decision  Aiding 

While  the  current  generation  of  unmanned  reconnaissance 
and  surveillance  aircraft  has  been  successful,  this  success  has 
largely  been  dependent  on  the  skill  and  direct  activity  of  their  hu¬ 
man  operators.  Several  humans  are  required  to  operate  the  Preda¬ 
tor,  which  is  actually  remotely  flown  and  not  an  autonomous  sys¬ 
tem.  Similarly,  Global  Hawk  requires  a  team  of  humans  that 
continuously  monitors  a  single  aircraft  throughout  its  mission,  as 
well  as  a  significant  mission-planning  effort  prior  to  launch. 

If  unmanned  combat  air  vehicles  are  to  be  useful,  a  significant 
reduction  in  human-operator  effort  is  required.  Modern  manned 
aircraft,  such  as  the  F-22  Raptor  and  the  newly  designed  F/A-35 
Joint  Strike  Fighter,  are  flown  by  a  single  human  pilot.  While 
training  of  a  military  combat  pilot  is  expensive  and  takes  consid¬ 
erable  time,  UCAVs  will  not  be  accepted  as  an  adjunct  to  manned 
aircraft unless the mission can be reliably performed and both
(a) the cost of the unmanned aircraft is less than that of a manned
aircraft and (b) the cost of the human operators needed for one
unmanned  air  vehicle  is  significantly  less  than  the  cost  of  training 
and  supporting  a  pilot  for  a  manned  aircraft. 

As  a  result,  the  design  goal  for  unmanned  combat  aircraft  is 
typically  stated  as  a  single  operator  controlling  a  flight  of  four 
or  more  UCAVs  in  a  combat  setting.  Given  the  issues  that  con¬ 
front  a  human  operator  of  an  unmanned  combat  aircraft  mis¬ 
sion,  researchers  (Elmore,  Dunlap,  &  Campbell,  2001;  Geddes 
&  Lee,  1998)  have  recognized  that  decision  aiding  is  needed 
across  a  broad  front: 

•  Situation  awareness.  The  human  operator  must  understand 
the  hostile  external  environment,  the  offensive  and  defensive 
postures  of  all  of  the  aircraft  under  his  control,  and  the  in¬ 
ternal  status  of  all  of  the  aircraft  systems  for  all  of  the  aircraft 
under  his  control.  This  is  a  massive  amount  of  information 
to  digest,  with  the  potential  for  sudden  and  significant 
changes  at  any  moment. 

•  Planning  and  resource  commitment.  The  UCAVs  are  assigned 
a  mission,  and  the  human  operator  is  responsible  for  man¬ 
aging  the  frequently  changing  tactical  behaviors  needed  to 
meet  the  mission  goals.  Generating  alternative  short-term 
tactical  plans,  evaluating  the  alternatives,  selecting  the  most 
effective  plans  and  committing  the  aircraft  resources  to  the 
plans  is  a  continuous  requirement. 

• Coordination and acting. As plans are generated and
selected, the human operator must also ensure that the plans
are  coordinated  across  the  flight,  free  of  conflicts,  and  per¬ 
formed  as  expected.  At  any  point  in  time,  multiple  plans 
across  subsets  of  the  UCAVs  in  the  flight  will  be  active  and  will 
require  monitoring. 

•  Understanding  and  responding  to  irregularities.  Combat  air¬ 
craft,  sensors,  and  weapons  are  extremely  complex,  and  as  a 
result,  behavior  anomalies  are  common.  In  addition,  the  en¬ 
emy  seeks  to  exploit  vulnerabilities  in  systems  to  create  ir¬ 
regularities.  Recognizing  abnormal  situations  and  responding 
in  a  timely  way  is  a  frequent  requirement  in  combat. 

In  meeting  these  cognitive  needs  of  the  human  operator,  the 
designer  must  be  careful  not  to  create  a  “black  box”  automation 
that  blocks  human  involvement  (Wiener  &  Curry,  1980).  More  re¬ 
cently,  a  related  but  distinctly  different  challenge  has  emerged — 
that  of  “point  intelligence”  that  is  not  as  robust  and  flexible  as  it 
appears  (Geddes,  1997b;  Hammer  &  Singletary,  2004).  Point  in¬ 
telligence  is  designed  to  solve  a  particular  narrow  problem,  such 
as  assessment  or  planning,  but  does  not  have  broad  competency. 
As  Hammer  and  Singletary  (2004)  pointed  out,  this  leads  to  be¬ 
havioral  breakdowns  between  the  human  and  computer. 

The  Design  Solution 

From  2002  through  2005,  the  U.S.  Air  Force  Research  Labo¬ 
ratories  sponsored  the  development  and  evaluation  of  an  ap¬ 
proach  to  supporting  human  management  of  autonomous  un¬ 
manned  combat  air  vehicles  (Atkinson,  2003,  2005).  This 
system,  known  as  AUCTIONEER,  combines  a  cognitive  model 
of decision making with a market auctioning system. The result
is a system that is capable of seamless operation across Levels 5-9
of Sheridan’s classification of levels of autonomy.

As  discussed  by  Atkinson,  the  functional  organization  of  a 
complex  vehicle,  such  as  a  combat  aircraft,  normally  has  a  num¬ 
ber  of  layers,  as  shown  in  Fig.  29.6.  The  lowest  layer  acts  to  con¬ 
trol  the  physical  parts  of  the  aircraft,  such  as  its  flight  control 
surfaces,  engine  control  surfaces  and  fuel  metering,  sensors, 
and  weapons.  The  next  layer  contains  functions  that  integrate 
the  behaviors  of  the  lowest  layer  into  more  complex  behaviors 
to  perform  tasks.  For  example,  the  Navigation  System  may  act  to 
receive  position  and  velocity  data  from  the  sensors  and  direct 
the  flight  controls  and  engines  to  new  settings  in  order  to  fly  the 
aircraft  toward  a  desired  waypoint  in  space.  This  second  layer 
contains  many  functions  that  are  typically  implemented  as  con¬ 
ventional  automation,  with  limited  provisions  for  human  inter¬ 
actions.  Even  so,  a  modern  manned  combat  aircraft  cockpit  may 
have  over  350  controls  for  a  pilot  to  manipulate  to  configure 
his  vehicle,  sensors,  and  weapons  systems. 

The  third  layer  in  Fig.  29.6  provides  the  overall  mission  or¬ 
ganization  of  the  aircraft’s  behaviors,  so  that  the  intended  utility 
of  the  system  can  be  achieved.  For  manned  aircraft,  this  layer  is 
mostly  performed  by  a  human  pilot.  The  typical  training  time 
for  a  carefully  selected  person  to  reach  threshold  proficiency 
as  a  combat  aircraft  pilot  is  two  years,  and  expertise  normally  re¬ 
quires  10  years  of  continued  training  and  practice. 

The fourth layer, the coordination layer, is often performed
by humans who are distantly located as part of a
command-and-control structure. With the present generation of manned
aircraft and command and control systems, little
“computer-to-computer” coordination is available. Most coordination is
performed human-to-human.

If  unmanned  vehicles  are  to  meet  their  economic  goals  by 
using  significantly  fewer  human  operators,  the  mission  layer  and 
the  coordination  layer  must  both  be  substantially  more  au¬ 
tonomous.  Yet,  because  an  unmanned  combat  system  must  also 
be  able  to  apply  lethal  weapons,  uninterrupted  autonomy  for 
any  significant  periods  of  the  mission  may  compromise  safety 
and human responsibility. As a result, operations at Level 10 of
Sheridan’s hierarchy are strongly undesirable. However,
seamless operation across Levels 5-9 of Sheridan’s model of
autonomy is strongly required.

FIGURE 29.6. Behaviors of complex systems are produced in
layers.

The  AUCTIONEER  system  is  based  on  the  architecture  from 
Elmore  et  al.  (2001),  which  has  evolved  directly  from  the  origi¬ 
nal  Pilot’s  Associate  architecture  (Lizza  &  Banks,  1991;  Rouse, 
Geddes,  &  Hammer,  1990)  of  the  early  1990s.  This  architecture 
uses  a  distributed  system  of  cognitive  engines,  providing  full  sit¬ 
uation  assessment,  planning,  coordinating,  and  acting  for  each 
entity in the system, both human and UCAV.

To  meet  the  need  for  situation  awareness,  the  AUCTIONEER 
system  provides  for  a  distributed  model  of  beliefs  that  is  shared 
across  the  unmanned  members  of  the  flight  and  the  human  op¬ 
erator.  The  belief  model,  embodied  as  a  concept  graph,  allows 
each  aircraft  to  combine  information  from  other  aircraft  with 
its  own  and  with  that  provided  by  other  sources,  as  well  as  input 
from  the  human.  The  belief  model  also  provides  for  active  mon¬ 
itoring  for  specific  events  that  are  coupled  to  the  ongoing  and 
planned tasks. When a task is initiated, specific monitors are
activated in the belief model. When monitored events are
detected, existing tasks may be modified or new tasks activated
by  the  planning  functions. 

Distributed  planning  for  AUCTIONEER  is  performed  using  a 
market-based  auctioning  system  that  has  no  centralized  auc¬ 
tion  manager  service.  Each  member  of  the  flight  can  initiate  an 
auction for a task that it determines needs to be performed
(because of a monitored event), and any members of the flight,
including the auction initiator, can bid on the task. More than one
auction  can  be  proceeding  simultaneously.  The  auction  mecha¬ 
nism  does  not  assume  that  all  of  the  members  can  receive  the 
auction  announcement,  or  that  all  of  the  bids  can  be  received 
in  return  because  of  communications  dropouts. 
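
The essential properties described above (no central auction manager, any member may initiate, the initiator may bid, and announcements or bids may be lost) can be sketched in a few lines. This is an illustrative model only, not the AUCTIONEER implementation. The function name, the `down_links` representation of communications dropouts, and the rule that the lowest-cost bid received wins are all assumptions.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Link = Tuple[str, str]  # (sender, receiver)


def run_auction(initiator: str,
                members: Iterable[str],
                cost_of: Callable[[str], Optional[float]],
                down_links: FrozenSet[Link] = frozenset()) -> Optional[str]:
    """Run one auction from the initiator's point of view.

    Only bids that actually reach the initiator are considered, so the
    auction still closes when some members are unreachable.
    """
    received = {}
    for member in members:
        if member != initiator:
            if (initiator, member) in down_links:
                continue  # the announcement never arrived
            if (member, initiator) in down_links:
                continue  # the bid was lost on the way back
        cost = cost_of(member)  # None means the member declines to bid
        if cost is not None:
            received[member] = cost
    if not received:
        return None
    # Assumed rule: lowest cost wins; ties are broken by member name.
    return min(received, key=lambda m: (received[m], m))
```

Because the initiator evaluates only what it hears, several members can run auctions like this at the same time without a shared coordinator.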

The  auctioning  system  uses  a  carefully  structured  represen¬ 
tation  of  the  tasks  that  may  need  to  be  performed  during  the 
mission.  The  structure  is  a  “Plan  Goal  Graph”  consisting  of  a  di¬ 
rected  acyclic  graph  with  alternating  plan  and  goal  nodes.  The 
Plan  Goal  Graph  defines  the  network  of  causal  relationships  be¬ 
tween  activities  described  by  plan  nodes  and  goal  nodes  that 
reflect  what  each  activity  is  expected  to  accomplish.  The  Plan 
Goal  Graph  has  the  additional  capability  to  be  used  as  a  model 
of  the  intentions  of  groups  of  active  entities  (Geddes,  1997a), 
and  is  used  in  AUCTIONEER  to  both  create  the  tasks  to  be  auc¬ 
tioned  and  to  interpret  the  actions  of  the  members  of  the  flight 
and  the  human  operator  as  the  tasks  are  performed. 
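
The structural rules stated for the Plan Goal Graph (a directed acyclic graph whose plan and goal nodes alternate) can be made concrete with a small data structure. The class below is a hypothetical sketch, not the representation used in AUCTIONEER. The node names in the usage note and the choice to direct edges from a goal to the plans that achieve it, and from a plan to its subgoals, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


class PlanGoalGraph:
    """Directed acyclic graph whose edges always join a plan and a goal."""

    def __init__(self) -> None:
        self.kind: Dict[str, str] = {}
        self.children: Dict[str, List[str]] = defaultdict(list)

    def add_node(self, name: str, kind: str) -> None:
        if kind not in ("plan", "goal"):
            raise ValueError("kind must be 'plan' or 'goal'")
        self.kind[name] = kind

    def add_edge(self, parent: str, child: str) -> None:
        if self.kind[parent] == self.kind[child]:
            raise ValueError("plan and goal nodes must alternate")
        if self._reaches(child, parent):
            raise ValueError("edge would create a cycle")
        self.children[parent].append(child)

    def _reaches(self, start: str, target: str) -> bool:
        """Depth-first search: is `target` reachable from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.children.get(node, []))
        return False
```

In use, a goal such as `suppress_site` might point to a plan `attack_site`, which in turn points to a subgoal `reach_release_point`. A task offered at auction would then correspond to a plan node, and the path above it records the intentions that the task serves.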

Similar studies by Stentz, Dias, and others at Carnegie Mellon
(Stentz & Dias, 1999; Zlot & Stentz, 2003) have shown that
market-based auctioning on tree-structured tasks has many
computational advantages over either the auctioning of
unstructured primitive tasks or the use of algorithms that attempt
optimal  task  allocations.  However,  successful  use  of  an  auc¬ 
tioning  system  must  provide  mechanisms  for  the  human  oper¬ 
ator  to  understand  the  processes  and  outcomes  of  the  auctions 
and  to  intervene  in  auctioned  tasks  without  disrupting  the  ben¬ 
efits  of  using  market-based  auctions  as  a  means  for  resource 
allocation. 

For example, in Fig. 29.7, the user can see the approved
paths  and  proposed  paths  of  vehicles  in  the  flight;  review  past, 
future,  and  proposed  tasking  for  each  target  or  for  each  vehicle; 
and edit proposed and active tasks if desired.

FIGURE 29.7. The set of UCAVs has begun execution of one coordinated attack while
proposing a second coordinated attack against a different target complex. The routes in the
lighter grey (magenta in true color) are for the proposed attack, while the darker routes are
the currently accepted paths. The group of straight paths in the lower right of center are for
vehicles that have not yet won any of the auctioned tasks.


Results 

The  AUCTIONEER  study  is  particularly  useful  as  a  case  study 
of  the  effects  of  opening  the  “black  box”  of  automation  because 
the  system  was  studied  with  both  passive  and  active  operator 
involvement. 

Performance.  In  the  first  study  results  (Atkinson,  2003), 
the  auction  processes  were  run  without  human  intervention. 
This  study  established  the  basis  for  measurements  of  effective¬ 
ness  for  the  auction  environment  as  a  fully  autonomous  process. 
This  first  study  found  that  the  auction  process  was  an  effective 
method  for  allocating  resources  in  terms  of  mission  accom¬ 
plishment.  More  targets  were  attacked  in  a  shorter  time  with 
fewer  abandoned  tasks  than  what  could  be  achieved  by  unaided 
human control of the vehicles. A second concern was
timeliness, because the SEAD mission requires that targets are
attacked or suppressed before they can disrupt the friendly forces.
The auction process also resulted in shorter average timelines
to attack targets.

The  most  significant  findings  dealt  with  the  stability  of  task¬ 
ing.  Because  the  winner  of  an  auction  might  later  find  a  better 
target  before  finishing  the  current  task,  winners  could  become 
auctioneers  and  “sell”  their  current  tasks  to  another.  Atkinson 
found results similar to those of Zlot and Stentz: that this reselling
process  did  not  result  in  churning  of  tasks  and  in  fact  led  to  im¬ 
proved  outcomes. 

Acceptance.  The  second  study  (Atkinson,  2005)  added 
the  human  operator  back  into  the  mission,  with  the  ability  to  in¬ 
tervene  in  the  auction  process  as  described  above.  In  this  study, 
the  overall  findings  showed  that  the  operator’s  understanding 
and  effectiveness  were  limited  by  the  lack  of  insight  into  the 
auction  process.  Even  though  the  underlying  decision-support 
model used appropriate human cognitive constructs, the
system was still a black box to the human because it did not reveal
its internal model at the user interface. Atkinson and her
colleagues provided a recommended set of features to open the
black box of the auction process.

Communication  of  beliefs.  The  operator  should  be  able 
to  view  the  beliefs  held  by  each  member  of  the  flight,  and  act  to 
modify  those  beliefs  when  required.  This  provides  assurance 
that  the  members  of  the  flight  are  acting  with  a  common  assess¬ 
ment  of  the  situation  and  are  exploiting  all  available  information. 

Communication of intentions in the context of a
bid. If the only information available to the operator about an
offered  bid  for  a  task  is  its  total  cost,  the  operator  loses  impor¬ 
tant  information  about  the  context  of  the  bid.  By  allowing  the 
operator  to  view  the  chain  of  intentions  that  led  to  the  bid,  as 
represented  in  the  Plan  Goal  Graph,  the  operator  would  be  able 
to  understand  how  the  bid  was  composed,  and  to  be  assured 
that  the  bid  was  proper. 

Communication  of  direct  commands.  Operator  com¬ 
mands  need  to  be  recognized  as  distinct  from  task-auction  re¬ 
quests.  This  would  allow  the  operator  to  act  with  assurance  that 
his  commanded  action  would  be  carried  out  by  the  aircraft  and 
resource  that  was  directed,  rather  than  incurring  an  auction  re¬ 
sponse  of  uncertain  outcome.  Any  additional  task  reallocations 
that  result  from  the  operator’s  commands  may  be  automatically 
handled  as  auctions. 

The  findings  in  this  case  study  underscore  two  important 
issues  in  designing  intelligent  decision-support  systems:  (a)  the 
importance  of  avoiding  “black  box”  automation  for  high-stakes 
environments,  and  (b)  the  advantages  of  designing  and  building 
decision-support  systems  that  are  based  on  models  of  human 
cognitive  processes. 

While  it  may  seem  easier  to  design  and  implement  ap¬ 
proaches  that  are  mathematically  elegant,  this  can  result  in  an 
automation  “black  box”  that  undermines  human  understand¬ 
ing.  Similarly,  efforts  to  reduce  the  scope  of  the  decision  sup¬ 
port  to  solve  just  one  part  of  the  operator’s  cognitive  prob¬ 
lem  may  result  in  point  intelligence  that  can  also  degrade 
performance. 

Humans  learn  to  interact  with  other  humans  as  part  of  the 
development  of  social  skills  for  effective  behaviors  within  their 
cultures.  In  cultures  based  on  trust  and  mutual  accomplish¬ 
ment,  the  ability  to  communicate  about  beliefs,  goals,  and  in¬ 
tentions  is  an  important  skill.  This  case  study  suggests  that  cre¬ 
ating  automated  behavioral  systems  that  can  use  explicit  models 
of  beliefs,  goals,  and  intentions  as  the  basis  for  communication 
with  humans  is  an  effective  way  to  open  the  black  box. 

Case  Study  C — Interactive  Critiquing 
as  a  Form  of  Decision  Support 

This  case  study  emphasizes  the  challenge  for  interaction  de¬ 
signers  in  dealing  with  the  use  of  decision-support  tools  that 
have  the  potential  for  brittle  performance  due  to  known  or  un¬ 
known  limitations.  While  there  are  no  perfect  solutions  to  this 
problem,  there  are  a  number  of  approaches  to  help  reduce  its 
impact. 


This  case  study  looks  at  a  tool  that  incorporates  three  com¬ 
plementary  approaches  to  the  design  of  an  expert  system  in  or¬ 
der  to  improve  overall  performance,  while  reducing  the  poten¬ 
tial  impact  of  brittle  performance  by  the  expert  system.  The  first 
approach  to  deal  with  the  impact  of  brittle  performance  by  an 
expert  system  is  to  design  a  role  that  encourages  the  user  to  be 
fully  engaged  in  the  problem  solving,  and  to  apply  his  knowl¬ 
edge  independently  without  being  first  influenced  by  the  soft¬ 
ware.  In  this  case  study,  the  approach  explored  to  achieve  this  is 
the  use  of  the  computer  as  a  critic,  rather  than  as  the  initial 
problem-solver  (Fischer,  Lemke,  Mastaglio,  &  Morch,  1991; 
P.  Miller,  1986;  Silverman,  1992;  Tianfield  &  Wang,  2004).  Thus,
instead  of  asking  the  person  to  critique  the  performance  of  the 
software,  the  computer  is  assigned  the  role  of  watching  over  the 
person’s  shoulder.  Note  that  this  is  more  accurately  described  as 
having  the  design  team  try  to  anticipate  all  of  the  scenarios  that 
can  arise,  and  then,  for  all  of  those  scenarios,  trying  to  incorporate
the  knowledge  necessary  to  detect  possible  slips  or  mistakes
on  the  part  of  the  user  and  to  provide  alerts  and  assistance  in
recovering  (Pritchett,  Vandor,  &  Edwards,  2002;  Wogalter  & 
Mayhorn,  2005). 

The  second  approach  to  reduce  sensitivity  to  brittleness  in 
the  computer’s  performance  is  to  incorporate  metaknowledge 
into  the  expert  system  that  can  help  it  to  recognize  situations 
in  which  it  may  not  be  fully  competent.  By  doing  so,  the  soft¬ 
ware  may  be  able  to  alert  the  person  to  be  especially  careful  be¬ 
cause  the  computer  recognizes  that  this  is  an  unusual  or  diffi¬ 
cult  case. 

The  third  approach  to  deal  with  brittleness  is  to  develop 
problem-solving  strategies  that  reduce  susceptibility  to  the  im¬ 
pact  of  slips  or  mistakes.  In  the  software  discussed  in  this  case 
study  this  is  accomplished  by  incorporating  a  problem-solving 
strategy  that  includes  the  collection  of  converging  evidence 
using  multiple  independent  problem-solving  strategies  and 
sources  of  data  to  arrive  at  a  conclusion. 

What  follows,  then,  is  a  discussion  of  a  specific  expert  sys¬ 
tem  that  incorporates  these  three  strategies  for  dealing  with 
brittleness.  Empirical  testing  of  the  system  suggests  that  this 
approach  can  significantly  enhance  performance,  even  in  cases 
where  the  software  is  not  fully  competent  (P.  J.  Smith  &
Rudmann,  2005).

The  Application  Area 

The  specific  problem  area  considered  in  this  case  study  is 
the  design  of  a  decision-support  tool  to  assist  blood  bankers  in 
the  identification  of  compatible  blood  for  a  transfusion.  One  of  the 
difficult  tasks  that  blood  bankers  must  complete  as  part  of  this 
process  is  the  determination  of  whether  the  patient  has  any  al- 
loantibodies  present  in  his  blood  serum,  and  if  so,  what  partic¬ 
ular  antibodies  are  present. 

Diagnosis  as  a  generic  task.  Abstractly,  this  is  a  classic 
example  of  the  generic  task  of  abduction  or  diagnosis  (J.  Joseph- 
son  &  S.  Josephson,  1994;  Psillos,  2002).  It  involves  deciding 
which  tests  to  run  (what  data  to  collect),  collecting  and  inter¬ 
preting  those  data,  and  forming  hypotheses,  as  well  as  decid¬ 
ing  what  overall  combination  of  problem-solving  strategies  to 
employ.  Characteristics  of  this  generic  task  that  make  it  difficult
for  people  include: 

•  The  occurrence  of  multiple-solution  problems,  where  more 
than  one  “primitive”  problem  is  present  at  the  same  time; 

•  The  occurrence  of  cases  where  two  or  more  “primitive”  prob¬ 
lems  are  present,  and  where  one  problem  masks  the  data  in¬ 
dicative  of  the  presence  of  the  other; 

•  The  existence  of  “noisy”  data; 

•  The  existence  of  a  large  “data  space”  where  data  must  be  col¬ 
lected  sequentially,  so  that  the  person  must  decide  what  data 
to  collect  next  and  when  to  stop; 

•  The  presence  of  time  stress,  in  which  an  answer  must  be 
determined  quickly  (Elstein,  Shulman,  &  Sprafka,  1978;
P.  J.  Smith  et  al.,  1998).

In  this  application,  the  “primitive”  problems  are  the  individ¬ 
ual  antibodies  that  may  be  present  in  the  patient’s  blood  serum. 
Time  stress  can  arise  when  the  patient  is  in  an  emergency  and 
needs  a  transfusion  quickly. 


Sample  problem.  To  illustrate  the  nature  of  this  diagno¬ 
sis  task,  consider  the  following  partial  description  of  an  interac¬ 
tion  with  the  decision-support  tool  AIDA  (the  Antibody  IDenti- 
fication  Assistant)  that  is  the  focus  of  this  case  study  (Guerlain 
et  al.,  1999).  Initially,  the  medical  technologist  needs  to  deter¬ 
mine  whether  the  patient  is  type  A,  B,  AB,  or  O,  and  whether  the 
patient  is  Rh  positive.  Then  the  technologist  determines 
whether  the  patient  shows  evidence  of  any  autoantibodies  or  al- 
loantibodies.  As  part  of  this  determination,  displays  like  the  one 
shown  in  Fig.  29.8  are  provided  by  AIDA. 

To  make  visual  scanning  easier  on  this  data  display,  some  of 
the  rows  have  been  highlighted  by  the  technologist  in  yellow 
(shown  in  light  gray  in  this  black-and-white  version).  In  addi¬ 
tion,  to  reduce  memory  load,  the  technologist  has  marked  a 
number  of  intermediate  conclusions,  indicating  that  the  patient 
could  have  antibodies  against  the  C,  E,  M,  and  N  antigens  (indi¬ 
cated  by  the  technologist  in  orange  in  the  actual  system,  but  left 
white  in  Fig.  29.8),  that  the  f,  V,  Cw,  Lua,  Kpa,  and  Jsa  antibodies
are  unlikely  (marked  in  blue  in  the  actual  system,  but  light  gray 
in  Fig.  29.8),  and  that  the  other  antibodies  shown  as  the  labels 
across  the  top  row  can  be  ruled  out  (marked  in  green  in  the  actual
system  but  shown  in  darker  gray  on  this  black-and-white
version).  These  color-coded  intermediate  answers  are  transferred
to  all  other  data  displays  to  reduce  the  memory  load  for
the  technologist.  Figure  29.8  also  provides  an  example  of  a  critique
that  AIDA  has  provided  in  response  to  an  error  made  by
the  technologist  in  marking  anti-S  as  ruled  out.  This  critique  was
generated  by  the  expert  model  (rule-based  system)  underlying
AIDA,  which  monitors  all  of  the  test  selections  and  markings
made  by  the  technologist  as  he  solves  the  case  using  the  interface
provided  by  AIDA.

FIGURE  29.8.  Full  test  panel  with  intermediate  results  marked  using  color-coded  markers
provided  by  AIDA  (shown  here  in  black  and  white).  In  this  case,  AIDA  has  interrupted  the  user  to
suggest  that  an  error  may  have  been  made  in  ruling  out  anti-S.  The  critique  shown  in  the  figure
reads:  “Cell  #10  does  not  possess  the  S  antigen.  Therefore,  even  if  anti-S  was  present  it  could  not
possibly  react  with  this  cell.  Consequently,  this  is  not  a  good  cell  for  ruling  out  anti-S.”

Thus,  this  figure  serves  to  illustrate  that: 

•  The  technologist  is  responsible  for  completing  the  analysis, 
deciding  which  tests  to  run  and  which  intermediate  and  fi¬ 
nal  conclusions  to  reach,  and  is  thus  very  engaged  in  the  task; 

•  The  computer  provides  an  interface  that  makes  it  easy  for  the 
technologist  to  select  the  tests  to  run  and  to  view  the  resul¬ 
tant  data  (using  the  highlighting  to  make  visual  scanning  eas¬ 
ier),  as  well  as  to  remember  intermediate  conclusions  (using 
color-coded  markings  to  indicate  these  conclusions);

•  Although  the  primary  motivation  for  the  technologist  in 
marking  the  data  forms  is  to  make  his  task  easier,  when  he 
does  so  the  computer  is  provided  with  a  great  deal  of  data 
regarding  what  the  user  is  thinking,  and  can  use  those  data  to 
make  inferences  based  on  its  expert  model  about  when  to 
provide  a  critique  cautioning  the  user  that  he  may  have  made 
a  slip  or  mistake. 

The  need  for  decision  aiding.  Initial  laboratory  and 
field  studies  indicated  that  the  task  of  determining  the  alloanti- 
bodies  present  in  a  patient’s  blood  is  a  difficult  one  for  many 
technologists.  A  variety  of  causes  of  errors  were  observed,  in¬ 
cluding  slips,  perceptual  distortions,  incorrect  or  incomplete 
knowledge  (Guerlain  et  al.,  1999),  and  cognitive  biases
(Brachman  &  Levesque,  2004;  Bradfield  &  Wells,  2005;  Fraser  et  al.,
1992;  Gilovich  et  al.,  2002;  Haselton  et  al.,  2005;  Kahneman  &
Tversky,  2000;  Plous,  1993;  Poulton,  1989).


The  Design  Solution

It  was  the  judgment  of  the  design  team  that  this  task  was  suf¬ 
ficiently  complex  that,  given  the  available  development  re¬ 
sources,  it  was  unlikely  that  all  of  the  possible  scenarios  could 
be  anticipated  and  dealt  with  by  the  design  team.  It  was  also 
noted  that,  for  this  task,  the  cost  of  an  erroneous  diagnosis  was 
potentially  high.  Thus  it  was  decided  that,  instead  of  automating 
the  task,  a  decision-support  system  should  be  developed  that 
would  keep  the  person  very  much  engaged  in  the  task,  and  pro¬ 
vide  other  safety  nets  to  reduce  the  chances  of  error  for  those 
cases  in  which  the  computer’s  knowledge  was  incomplete.  This 
conclusion  was  reinforced  by  a  preliminary  study  regarding  the 
impact  of  role  on  diagnostic  accuracy  in  antibody  identification. 
Guerlain  et  al.  (1996)  showed  that,  in  cases  where  the  computer’s
knowledge  was  incomplete,  the  final  answer  was  wrong  31%  more
often  when  the  user  was  asked  to  critique  the  computer’s
performance  than  when  the  computer  critiqued  the  user’s
performance.

Critiquing — additional  design  considerations.  The  literature
provided  additional  guidance  in  deciding  whether  and  how  to
develop  this  software  to  play  the  role  of  a  critic.  For  example,
P.  Miller  (1986)  developed  a  prototype  system  called
ATTENDING  that  focused  on  anesthesiology.  Based  on  studies
of  its  use,  Miller  suggested  that  critiquing  systems  are  most
effective  in  applications  where  the  user  has  a  task  that  is  fre¬ 
quently  performed,  but  that  requires  the  person  to  remember 
and  apply  a  great  deal  of  information  in  order  to  complete  a 
case.  Miller’s  conclusion  was  that,  on  such  tasks,  the  person  is 
more  susceptible  to  slips  and  mistakes  and  would  therefore 
benefit  significantly  from  the  decision-support  system. 

A  second  consideration  was  how  intrusive  the  interactions 
with  the  knowledge-based  system  would  be  for  the  user.  A 
number  of  researchers  have  suggested  that  an  interface  that  re¬ 
quires  that  the  user  spend  significant  time  entering  data  and  in¬ 
forming  the  computer  about  what  he  has  concluded  is  too 
cumbersome,  and  will  therefore  be  unlikely  to  be  adopted  in 
actual  practice  (Berner,  Brooks,  R.  Miller,  Masarie,  &  Jackson, 
1989;  Harris  &  Owens,  1986;  Miller,  1986;  Shortliffe,  1990;
Tianfield  &  Wang,  2004).

A  third  consideration  was  concern  over  the  potential  for 
complacency  if  the  person  played  the  role  of  critic,  letting  the 
computer  complete  an  initial  assessment  and  then  having  the 
person  decide  whether  to  accept  this  assessment.  Parasuraman 
and  Riley  (1997)  have  shown  that,  in  such  a  role,  there  is  a  risk 
that  the  person  will  become  overreliant  on  the  computer,  and 
will  not  adequately  apply  his  knowledge  in  completing  the  cri¬ 
tique  (Metzger  &  Parasuraman,  2005).  (Note,  however,  that  a 
person  could  become  overreliant  even  with  the  roles  reversed, 
as  the  person  might  start  to  get  careless  and  assume  the  com¬ 
puter  will  always  catch  his  slips.  Administrative  controls,  based 
on  regular  monitoring  of  the  person’s  performance,  might  help 
reduce  such  complacency,  but  this  is  an  as-yet  unexplored
aspect  of  the  use  of  critiquing  systems.) 

A  final  consideration  was  the  timeliness  of  critiques.  Waiting
for  the  user  to  reach  a  final  answer  before  providing  a  critique,
as  has  been  the  case  for  many  critiquing  systems,  is  potentially
inefficient  and  objectionable,  as  the  user  may  have
invested  considerable  time  and  effort  in  arriving  at  a  mistaken 
answer  that  the  computer  could  have  headed  off  earlier  in  the 
person’s  problem-solving  process.  Furthermore,  if  the  critique 
is  given  well  after  the  user  has  made  a  slip  or  mistake,  it  may 
be  difficult  for  him  to  remember  exactly  why  he  arrived  at  that 
intermediate  conclusion  (thus  making  it  harder  to  decide 
whether  to  accept  the  computer’s  critique).  This  consideration 
therefore  suggests  the  need  for  an  interface  that  provides  the 
computer  with  data  about  the  user’s  intermediate  conclusions 
rather  than  just  the  user’s  final  answer,  so  that  critiquing  can  be 
more  interactive. 

Based  on  these  considerations,  AIDA  was  developed  as  a  cri¬ 
tiquing  system  that  supported  the  user  as  he  completed  the 
antibody  identification  task.  In  order  to  provide  immediate,  con¬ 
text-sensitive  critiques,  an  interface  was  developed  that  encour¬ 
aged  the  user  to  mark  intermediate  conclusions  on  the  screen. 
As  suggested  earlier,  these  markings  in  fact  reduced  the  perceptual
and  memory  loads  for  the  user,  thus  implicitly  encouraging  this
form  of  communication  and  allowing  the  computer  to  detect  and
respond  immediately  with  context-sensitive  critiques  to  potential
slips  and  errors  made  by  the  person.
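
The critique quoted in Figure 29.8 (a cell that lacks the S antigen cannot be used to rule out anti-S) suggests how a critic can respond to a single marking as soon as it is made. The following is a minimal sketch of that one check, not AIDA's actual rule base: the `PanelCell` class, the `critique_rule_out` function, and the antigens assigned to the example cell are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelCell:
    """One row of a test panel (hypothetical structure for illustration)."""
    label: str
    antigens: frozenset  # antigens present on this test cell


def critique_rule_out(antigen, cell):
    """Return a caution if `cell` cannot support ruling out anti-<antigen>.

    Returns None when this particular check raises no objection. A real
    critic would apply many more rules than this single one.
    """
    if antigen not in cell.antigens:
        return (
            f"Cell {cell.label} does not possess the {antigen} antigen, "
            f"so it is not a good cell for ruling out anti-{antigen}."
        )
    return None


# Assumed example data: the antigen set for cell #10 is invented here.
cell_10 = PanelCell(label="#10", antigens=frozenset({"s", "M", "N"}))
print(critique_rule_out("S", cell_10))
```

Because the check runs on each marking rather than on the final answer, the caution arrives while the user still remembers why the rule-out was made.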

Complementary  strategies  to  reduce  susceptibility  to 
brittleness.  The  considerations  outlined  above  focused  on 
how  to  keep  the  person  engaged  in  the  diagnosis  task,  and  how 
to  avoid  triggering  the  cognitive  biases  that  can  arise  if  the  com¬ 
puter  suggests  an  answer  before  the  person  has  explored  the 
data  himself  (Larson  &  C.  Hayes,  2005;  P.  J.  Smith,  McCoy,  &
Layton,  1997).  AIDA  also  incorporated  two  additional  design
strategies  to  reduce  the  potential  for  error.  One  was  the
incorporation  of  metaknowledge  into  the  knowledge-based  system.
The  other  was  to  include  a  problem-solving  strategy  that  was  robust
even  in  the  face  of  slips  or  mistakes  by  either  the  person  or  the
computer  (i.e.,  the  design  team).

Metaknowledge  was  included  to  help  the  computer  identify 
cases  in  which  its  knowledge  might  be  incomplete.  Such  rules 
were  developed  by  identifying  the  potential  weak  points  in  the 
computer’s  problem-solving  process.  An  example  was  the  use  of 
thresholding  by  AIDA  when  applying  rules  such  as: 

If  a  test  cell’s  reactions  are  0  on  a  test  panel  for  all  three  of  the  testing 
conditions  (IS,  LISS  and  IgG)  as  they  are  for  the  first  test  cell  (Donor 
A478)  shown  in  Figure  29.8,  and  if  e  is  present  (shown  by  a  +  in  the
column  labeled  e  in  the  row  corresponding  to  the  first  test  cell)  on  that 
test  cell  but  E  is  not  (shown  by  a  0  in  the  column  labeled  E  in  the  row 
corresponding  to  the  first  test  cell),  then  mark  e  as  ruled  out. 

This  heuristic  usually  produces  the  correct  inference.  Since 
it  does  not  directly  reason  with  some  form  of  probabilistic  rea¬ 
soning  (Bochman,  2004;  Pearl,  1988;  Shafer,  1996;  Shafer  & 
Pearl,  1990),  however,  it  can  lead  to  ruling  out  an  antibody  that 
is  actually  present  in  the  patient’s  serum  with  no  consideration 
of  the  level  of  uncertainty.  This  is  most  likely  to  happen  when 
the  reaction  strengths  are  weak.  Thus,  AIDA  was  provided  with 
a  rule  that  monitored  for  weak  reactions  on  cells,  and  when  this 
situation  was  detected  and  the  user  was  observed  trying  to  com¬ 
plete  rule-outs  without  first  enhancing  the  reactions  with  some 
alternative  test  phase,  the  system  cautioned  the  user.  In  this 
way,  AIDA  put  the  user  on  alert  to  be  especially  careful  in  ap¬ 
plying  his  normal  strategies  and  rules. 
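
The thresholded rule-out heuristic and the metaknowledge rule that guards it can be sketched as follows. This is an illustrative reading of the text, not AIDA's implementation: the function names, the numeric reaction-strength scale, and the `weak_max` cutoff for what counts as a "weak" reaction are assumptions introduced here.

```python
PHASES = ("IS", "LISS", "IgG")  # the three testing conditions named in the rule


def can_rule_out_e(reactions, antigens):
    """Apply the quoted heuristic to one test cell.

    reactions: dict mapping each phase to a reaction strength (0 = none).
    antigens:  set of antigens present on the cell.
    """
    nonreactive = all(reactions[phase] == 0 for phase in PHASES)
    return nonreactive and "e" in antigens and "E" not in antigens


def weak_reaction_caution(panel_reactions, enhanced, weak_max=1):
    """Metaknowledge check: warn before rule-outs on a weakly reacting panel.

    panel_reactions: list of per-cell reaction dicts.
    enhanced:        True if an alternative, enhancing test phase was run.
    weak_max:        assumed cutoff; positive reactions at or below it are "weak".
    """
    positives = [s for cell in panel_reactions for s in cell.values() if s > 0]
    if positives and max(positives) <= weak_max and not enhanced:
        return "Reactions are weak; consider enhancing them before ruling out."
    return None


panel = [
    {"IS": 0, "LISS": 0, "IgG": 0},
    {"IS": 0, "LISS": 1, "IgG": 1},
]
print(can_rule_out_e(panel[0], {"e", "c"}))
print(weak_reaction_caution(panel, enhanced=False))
```

The heuristic itself stays simple and deterministic; the separate caution is what flags the situations in which that simplicity is most likely to mislead.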

A  second  strategy  incorporated  into  AIDA  as  protection 
against  brittleness  was  to  monitor  for  the  collection  and  consid¬ 
eration  of  converging  evidence  by  the  technologist.  This  prob¬ 
lem-solving  strategy  was  observed  in  use  by  one  of  the  experts 
involved  in  the  development  of  AIDA.  She  used  this  strategy  to 
catch  her  own  errors  and  those  of  the  technologists  working  un¬ 
der  her  supervision.  The  strategy  was  based  on  the  assumption 
that  any  one  data  source  or  line  of  reasoning  could  be  suscepti¬ 
ble  to  error,  and  that  it  is  therefore  wise  to  accept  a  conclusion 
only  if  independent  types  of  data  (test  results  that  are  not  based 
on  the  same  underlying  data  or  process)  and  independent  sets 
of  heuristics  or  problem-solving  strategies  have  been  used  to  test 
that  conclusion.  Based  on  this  expert  strategy,  AIDA  monitored 
the  user’s  problem-solving  process  to  see  whether  such  converging
evidence  had  been  collected  prior  to  reaching  a  final  answer.  If  not,
AIDA  cautioned  the  user  and  suggested  types  of  converging  evidence
that  could  be  collected.

TABLE  29.2.  Percentage  of  Blood  Bankers  Arriving  at  the
Wrong  Final  Answer  on  Four  Test  Cases

                    Control  Group    Treatment  Group
Posttest  Case  1         33%                0%
Posttest  Case  2         50%               19%
Posttest  Case  3         38%                0%
Posttest  Case  4         63%                0%
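
The converging-evidence strategy described in the text can be sketched as a check that runs before a final answer is accepted. This is a hedged illustration only: the function name, the representation of evidence as (data source, strategy) pairs, the example labels, and the requirement of at least two independent sources and strategies are assumptions, since the text does not specify how AIDA represented this.

```python
def needs_converging_evidence(evidence, min_independent=2):
    """Return True if a conclusion still lacks converging evidence.

    evidence: iterable of (data_source, strategy) pairs that support the
              conclusion, e.g. ("initial_panel", "rule_out").
    min_independent: assumed minimum number of distinct data sources and of
              distinct problem-solving strategies.
    """
    sources = {source for source, _ in evidence}
    strategies = {strategy for _, strategy in evidence}
    return len(sources) < min_independent or len(strategies) < min_independent


# Hypothetical labels: one data source and one strategy is not enough.
print(needs_converging_evidence([("initial_panel", "rule_out")]))
print(needs_converging_evidence([
    ("initial_panel", "rule_out"),
    ("antigen_typing", "confirm_antigen_negative"),
]))
```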


Evaluation  of  the  Implemented  Solution 

From  the  standpoint  of  human-computer  interaction,  the 
key  question  is  how  effectively  this  human-machine  system  per¬ 
forms  using  this  cooperative  problem-solving  paradigm  (Erkens, 
Andriessen,  &  Peters,  2003).  To  gain  insights  into  this,  an  em¬ 
pirical  study  was  conducted  using  AIDA. 

This  study  of  37  practitioners  at  seven  different  hospitals 
found  that  those  blood  bankers  using  AIDA  with  its  critiquing 
functions  turned  on  made  significantly  fewer  errors  (p  <  .01) 
than  those  who  used  AIDA  as  a  passive  interface.  Errors  in  the
final  diagnoses  (answers)  were  reduced  by  31-63  percentage  points.

On  those  cases  where  AIDA  was  fully  competent  (Posttest
Cases  1,  3,  and  4),  errors  (%  of  technologists  getting  the  wrong
final  answer)  were  reduced  to  0%  when  the  critiquing  functions
were  on  (Treatment  Group).  In  the  case  where  AIDA  was  not
fully  competent  (Posttest  Case  2),  AIDA  still  helped  reduce
errors  by  31  percentage  points  (from  50%  to  19%).
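
The reductions quoted in the text follow directly from the Table 29.2 values. As a small worked check (the variable names are illustrative; the numbers are those reported in the table), the difference between control and treatment error rates for each case is:

```python
# Table 29.2: percentage of blood bankers with a wrong final answer,
# given as (control group, treatment group) for each posttest case.
TABLE_29_2 = {1: (33, 0), 2: (50, 19), 3: (38, 0), 4: (63, 0)}

# Reduction in percentage points for each case.
reductions = {
    case: control - treatment
    for case, (control, treatment) in TABLE_29_2.items()
}

print(reductions)                                        # {1: 33, 2: 31, 3: 38, 4: 63}
print(min(reductions.values()), max(reductions.values()))  # 31 63
```

The smallest reduction (31 points) belongs to Posttest Case 2, the only case in which the treatment group's error rate did not fall to 0%.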

These  empirical  results  support  the  potential  value  of  in¬ 
teractive  critiquing,  supplemented  with  the  metaknowledge 
and  problem-solving  strategies  that  were  embedded  in  the  soft¬ 
ware,  as  a  design  strategy  for  applications  in  which  concerns  re¬ 
garding  the  potential  brittleness  of  the  technology  are  deemed 
significant. 


CONCLUSION 


One  goal  of  this  chapter  has  been  to  outline  different  concep¬ 
tual  approaches  to  support  the  design  of  DSSs,  including  the 
use  of  cognitive  task  analyses  and  work-domain  analyses.  A 
second  goal  has  been  to  review  relevant  knowledge  about  the 
psychologies  of  both  the  designers  and  users  of  DSSs,  identi¬ 
fying  important  design  questions  that  need  to  be  considered 
in  the  development  of  such  software  systems.  Finally,  the  chap¬ 
ter  emphasizes  the  need  to  take  a  broader  systems  perspective 
when  deciding  how  to  integrate  DSSs  into  a  work  environ¬ 
ment,  as  there  are  a  number  of  other  high-level  design  para¬ 
meters  (such  as  deciding  how  to  distribute  the  work  among  a 
team  of  people  and  machines)  that  may  offer  alternative  op¬ 
portunities  for  effectively  incorporating  DSSs  into  the  overall 
system. 
INTRODUCTION


The  expansion  of  the  Internet  has  resulted  in  an  increase  in  the 
usefulness  of  Computer  Mediated  Communication  (CMC)  and 
the  popularity  of  online  communities.  It  is  estimated  that  25%  of 
Internet  users  have  participated  in  chat  rooms  or  online  dis¬ 
cussions  (Madden  &  Rainie,  2003). 

This  chapter  begins  with  an  introduction  to  the  area  of  CMC 
and  online  communities  by  providing  their  definitions  as  well  as 
their  advantages  and  disadvantages.  Different  types  of  CMC  are 
then  introduced  and  two  special  types  of  online  communities 
are  described  in  more  depth.  More  specifically,  we  pay  special 
attention  to  the  evolution  of  Wiki  and  game-based  communities. 
These  two  relatively  new  areas  of  online  sociability  create  new 
opportunities  and  challenges  in  the  way  people  work,  learn,  and 
play  online.  Wiki-based  communities  facilitate  new  modes  of  so¬ 
cial  collaboration,  where  the  creation  of  online  content  is  no 
longer  an  individual  action  but  rather  it  is  transformed  into  a  so¬ 
cial,  collaborative  activity.  Massively  Multi-Player  Online  Role 
Playing  Games  (MMORPGs)  have  taken  the  social  aspects  of 
computer  game  playing  to  a  new  dimension,  where  players  in¬ 
teract,  socialize,  and  form  networks  of  communities  by  having 
fun  online. 

Online  communities  are  a  source  of  valuable  data  that,  when 
properly  analyzed,  can  provide  us  with  insights  about  the  social 
experience  of  the  people  who  are  part  of  them.  For  this  reason,
the  analysis  and  evaluation  of  online  communities  require  a
good  understanding  of  all  the  available  evaluation  frameworks 
and  methodologies  that  exist.  We  provide  a  description  of  the 
key  methods  later  in  this  chapter.  We  then  demonstrate  the  ap¬ 
plication  of  some  of  these  methods  to  two  characteristic  case 
studies.  Our  first  case  study  looks  at  how  learning  communities 
can  be  analyzed  and  how  results  from  this  analysis  can  be  used 
for  improving  the  pedagogical  value  of  e-learning.  The  second 
case  study  investigates  the  use  of  activity  theoretical  frameworks 
in  the  analysis  of  computer-game-based  communities.  We  then 
conclude  this  chapter  with  a  brief  summary  and  suggestions  for 
new  directions  in  the  area  of  online  communities. 


CMC  AND  ONLINE  COMMUNITIES 


Computer  Mediated  Communication 

It  is  by  now  no  secret  how  vital  the  Internet  was,  is,  and  will  con¬ 
tinue  to  be  in  our  lives.  One  of  the  most  important  characteris¬ 
tics  of  this  medium  is  the  opportunities  it  offers  for  human- 
human  communication  through  computer  networks.  As  Metcalfe 
(1992)  pointed  out,  communication  is  the  Internet’s  most  im¬ 
portant  asset.  E-mail  is  just  one  of  the  many  modes  of  communi¬ 
cation  that  can  occur  by  using  computers.  Jones  (1995)  pointed 
out  that  through  communication  services,  like  the  Internet, 
Usenet,  and  bulletin  boards,  online  communication  has  for 
many  people  supplanted  the  postal  service,  telephone,  and 
even  the  fax  machine.  All  these  applications  in  which  the  com¬ 
puter  is  used  to  mediate  communication  are  called  Computer- 
Mediated  Communication  (CMC). 


December  (1997)  defined  CMC  as  “the  process  by  which  peo¬ 
ple  create,  exchange,  and  perceive  information  using  networked 
telecommunications  systems  (or  non-networked  computers) 
that  facilitate  encoding,  transmitting,  and  decoding  messages.” 
He  emphasized  that  studies  of  CMC  view  this  process  from  dif¬ 
ferent  interdisciplinary  theoretical  perspectives  (social,  cogni¬ 
tive/psychological,  linguistic,  cultural,  technical,  and  political) 
and  often  draw  from  fields  as  diverse  as  human  communication,
rhetoric  and  composition,  media  studies,  human-computer
interaction  (HCI),  journalism,  telecommunications,  computer 
science,  technical  communication,  and  information  studies. 

Online  Communities 

Online  communities  emerge  by  using  CMC  applications.  The 
term  online  community  is  multidisciplinary  in  nature,  means 
different  things  to  different  people,  and  is  difficult  to  define 
(Preece,  2000).  As  a  general  characterization,  Rheingold  wrote
that  “[online]  communities
are  social  aggregations  that  emerge  from  the  Net  when  enough 
people  carry  on  those  public  discussions  long  enough,  with  suf¬ 
ficient  human  feeling,  to  form  webs  of  personal  relationships 
in  cyberspace”  (Rheingold,  1993,  p.  5). 

Online  communities  are  also  often  referred  to  as  cyber  soci¬ 
eties,  cyber  communities,  web  groups,  virtual  communities, 
web  communities,  virtual  social  networks,  and  e-communities, 
among  several  others. 

Cyberspace  is  the  new  frontier  in  social  relationships,  and 
people  are  using  the  Internet  to  find  friends,  colleagues,  and  lovers,
as  well  as  enemies  (Suler,  2004).  As  Korzeny  pointed  out,  even
as  early  as  1978,  online  communities  were  formed  around  inter¬ 
ests  and  not  physical  proximity  (Korzeny,  1978).  In  general,  what 
brings  people  together  in  an  online  community  are  common  in¬ 
terests  such  as  hobbies,  ethnicity,  education,  and  beliefs.  As  Wal¬ 
lace  (1999)  pointed  out,  meeting  in  online  communities  elimi¬ 
nates  prejudging  based  on  someone’s  appearance,  and  thus 
people  with  similar  attitudes  and  ideas  are  attracted  to  each  other. 

It  is  estimated  that  as  of  September  2002  there  were  over
600  million  people  online  (NUA  Internet  Surveys,  2004).  The
emergence  of  the  so-called  “global  village”  was  predicted  years 
ago  (McLuhan,  1964)  because  of  television  and  satellite  tech¬ 
nologies.  However,  it  is  argued  by  Fortner  (1993)  that  “global 
metropolis”  is  a  more  representative  term  (Choi  &  Danowski, 
2002).  If  one  takes  into  account  that  the  estimated  world  popu¬ 
lation  of  2002  was  6.2  billion  (U.S.  Census  Bureau,  2004),  then 
the  online  population  is  nearly  10%  of  the  world  population — a 
significant  percentage  which  must  be  taken  into  account  when 
analyzing  online  communities.  In  most  online  communities, 
time,  distance,  and  availability  are  no  longer  limiting  factors.
Given  that  the  same  individual  may  be  part  of  several  different
online  communities,  it  is  obvious  why  online
communities  keep  increasing  in  numbers,  size,  and  popularity. 
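
The "nearly 10%" figure can be checked from the two estimates quoted in the text (over 600 million people online and a world population of 6.2 billion in 2002). A minimal worked calculation, with illustrative variable names:

```python
online_users = 600e6        # estimated people online, September 2002
world_population = 6.2e9    # estimated world population, 2002

share = online_users / world_population
print(f"{share:.1%}")       # 9.7%
```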

Preece,  Rogers,  and  Sharp  (2002)  stated  that  an  online  com¬ 
munity  consists  of  people,  a  shared  purpose,  policies,  and  com¬ 
puter  systems.  They  identified  the  following  member  roles: 

•  Moderators  and  mediators  who  guide  discussions/serve  as  arbiters;

•  Professional  commentators  who  give  opinions/guide  discussions;

•  Provocateurs  who  provoke; 

•  General  Participants  who  contribute  to  discussions; 

•  Lurkers  who  silently  observe. 

CMC  has  its  benefits  as  well  as  its  limitations.  For  instance, 
CMC  discussions  are  often  potentially  richer  than  face-to-face 
discussions.  However,  users  with  poor  writing  skills  may  be  at  a 
disadvantage  when  using  text-based  CMC  (ScotCIT,  2003).  Table 
30.1  summarizes  the  advantages  and  disadvantages  of  CMC
(ScotCIT,  2003). 

Examples  of  CMC  and  Online  Communities 

Examples  of  CMC  include  asynchronous  communication  like 
e-mail  and  bulletin  boards;  synchronous  communication  like 
chatting;  and  information  manipulation,  retrieval,  and  storage 
through  computers  and  electronic  databases  (Ferris,  1997). 
Table  30.2  lists  the  main  types  of  CMC,  their  mode  (synchronous 
or  asynchronous),  and  the  type  of  media  they  support  (text, 
graphics,  audio,  video). 

For  website  designers,  choosing  which  CMC  tool  to
employ  (for  instance,  a  forum  or  a  chat  room)  is  not  a  matter  of
luck  or  randomness.  Selecting  the  right  CMC  tool  depends  on
many  factors.  For  example,  in  the  case  of  e-learning,  the  choice
of the appropriate mode of CMC will be made by asking and
answering questions such as (Bates, 1995; CAP, 2004; Heeren,
1996; Reiser & Gagne, 1983):

•  Are  the  users  spread  across  time  zones?  Can  all  participants 
meet  at  the  same  time? 

•  Do  the  users  have  access  to  the  necessary  equipment? 

•  What  is  the  role  of  CMC  in  the  course? 

• Are the users good readers/writers?

•  Are  the  activities  time-independent? 

•  How  much  control  is  allowed  to  the  students? 

Audio conferencing is a real-time communication mechanism,
since the communication happens synchronously. Depending
on the application, text chat and graphics may also be
supported. Video conferencing, like audio conferencing, offers a
useful mode of communication but has the added benefit of
allowing participants to see, not just hear, one another.

IRC  and  chats  also  support  synchronous  communication, 
since  they  enable  the  users  to  carry  out  conversations  using  text 
messaging.  MUDs  build  on  chats  by  providing  avatars  and 
graphical  environments  in  which  the  users  can  engage  in  inter¬ 
active  fantasy  games  (Preece,  2000). 


TABLE  30.1.  Advantages  and  Disadvantages  of  CMC 
Advantages 

•  Time  and  place  independence 

•  No  need  to  travel 

•  Time  lapse  between  messages  allows  for  reflection 

•  Participants  have  added  time  to  read  and  compose  answers 

• Questions can be asked without waiting for a “turn”

• It allows all participants to have a voice without the need to fight for
“airtime,” as in a face-to-face situation

•  The  lack  of  visual  cues  provides  participants  with  a  more  equal  footing 

• Many-to-many interaction may enhance the communication

•  Answers  to  questions  can  be  seen — and  argued — by  all 

•  Discussion  is  potentially  richer  than  in  a  face-to-face  situation 

•  Messages  are  archived  centrally  providing  a  database  of  interactions 
which  can  be  revisited 

Disadvantages 

•  Communication  mainly  takes  place  via  written  messages,  so 
participants  with  poor  writing  skills  may  be  at  a  disadvantage 

• Paralinguistic cues (facial expression, intonation, gesture, body
orientation) as to a speaker’s intention are not available, except
through combinations of keystrokes (emoticons) or the use of
typeface emphasis (italics, bold, capital letters)

•  Time  gaps  within  exchanges  may  affect  the  pace  and  rhythm  of 
communications  leading  to  a  possible  loss  in  textual  coherence 

•  The  medium  is  socially  opaque;  participants  may  not  know  which  or 
how  many  people  they  may  be  addressing 

•  The  normal  repair  strategies  of  face-to-face  communication  are  not 
available  and  misunderstandings  may  be  harder  to  overcome 

•  Context  and  reference  of  messages  may  be  unclear  and  misunder¬ 
standings  may  occur 


TABLE 30.2. CMC Systems, Their Mode, and the Types of Media that They Support

                                                        Supports
Type of CMC                       Communication Mode    Text               Graphics        Audio           Video
Audio conferencing                Synchronous           Some applications  No              Yes             No
Video conferencing                Synchronous           Yes                Yes             Yes             Yes
IRC                               Synchronous           Yes                as attachments  as attachments  as attachments
MUD                               Synchronous           Yes                No              No              No
WWW                               Sync & Async          Yes                Yes             Yes             Yes
E-mail                            Asynchronous          Yes                as attachments  as attachments  as attachments
Newsgroups/BBS                    Asynchronous          Yes                No              No              No
Discussion Boards                 Asynchronous          Yes                as attachments  as attachments  as attachments
Voice mail                        Asynchronous          Some applications  No              Yes             No
Wiki                              Asynchronous          Yes                Yes             Yes             Possible
Online Virtual Game Environments  Synchronous           Yes                Yes             Yes             Yes

Websites are usually asynchronous, providing community
information and links to other sites, but sometimes also have
synchronous software, like chats, embedded in them (Preece, 2000).

E-mail is an asynchronous mode of communication, usually
in the form of text. However, the ability to add attachments to
e-mail messages makes it possible to send audio, video, and graphic
files as well. Voice mail is an extension of e-mail whereby
users record themselves speaking a message and then
send that voice file to their contact instead of typing it.
Newsgroups, like e-mail, provide an asynchronous mode of
communication, but whereas e-mail messages come to the
users, newsgroups are a “pull” technology, meaning that the users
must go to the UseNet groups themselves (Preece, 2000). Finally,
discussion boards, also referred to as “forums” or “bulletin
boards,” provide an asynchronous mode of communication in which
users post messages for others to see and respond to in their own time.
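
The selection questions listed earlier (for example, whether all
participants can meet at the same time) can be read directly against
Table 30.2. The following is only an illustrative sketch, assuming the
table is encoded as a small lookup structure; the names CmcTool, TOOLS,
and candidates are hypothetical and do not come from the cited sources.

```python
from typing import NamedTuple


class CmcTool(NamedTuple):
    """One row of Table 30.2 (field names are illustrative assumptions)."""
    name: str
    modes: frozenset  # subset of {"sync", "async"}
    text: str         # "yes", "no", "some", "attachment", or "possible"
    graphics: str
    audio: str
    video: str


SYNC = frozenset({"sync"})
ASYNC = frozenset({"async"})
BOTH = frozenset({"sync", "async"})

# Rows transcribed from Table 30.2.
TOOLS = [
    CmcTool("Audio conferencing", SYNC, "some", "no", "yes", "no"),
    CmcTool("Video conferencing", SYNC, "yes", "yes", "yes", "yes"),
    CmcTool("IRC", SYNC, "yes", "attachment", "attachment", "attachment"),
    CmcTool("MUD", SYNC, "yes", "no", "no", "no"),
    CmcTool("WWW", BOTH, "yes", "yes", "yes", "yes"),
    CmcTool("E-mail", ASYNC, "yes", "attachment", "attachment", "attachment"),
    CmcTool("Newsgroups/BBS", ASYNC, "yes", "no", "no", "no"),
    CmcTool("Discussion Boards", ASYNC, "yes", "attachment", "attachment", "attachment"),
    CmcTool("Voice mail", ASYNC, "some", "no", "yes", "no"),
    CmcTool("Wiki", ASYNC, "yes", "yes", "yes", "possible"),
    CmcTool("Online Virtual Game Environments", SYNC, "yes", "yes", "yes", "yes"),
]


def candidates(mode, media):
    """Names of tools that offer `mode` and fully support every medium in `media`.

    Only an unqualified "yes" in the table counts as support; "some",
    "attachment", and "possible" are treated as not sufficient.
    """
    return [
        tool.name
        for tool in TOOLS
        if mode in tool.modes
        and all(getattr(tool, medium) == "yes" for medium in media)
    ]


# Learners spread across time zones who need both text and audio:
print(candidates("async", ["text", "audio"]))
```

Treating only an unqualified "Yes" as support is a deliberately strict
reading of the table; a designer who accepts attachments as sufficient
would relax that test.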

Preece (2000) described in detail some of these types of
CMC and their different characteristics. In this chapter, we focus
on Wikis and Online Virtual Game Environments, which in our
view provide a set of novel modes of communication and
online-community-building systems.

Wiki-based communities. The “Wiki,” named for the
Hawaiian word for “quick,” is a new technology that embodies the
notion of new media use where everyone is the author. It is a
freely expandable collection of hypertexts which can be easily
edited by any user with knowledge of a simple markup
language. It does not require any specific tools; all that is needed is a
form-capable web-browser client. This simple method gives
everyone reading the page the freedom to amend or correct
what he or she is reading, with the aim of encouraging people to
contribute to the expansion of the page (Halvorsen, 2005).
Every user can be an author by simply clicking the edit button,
changing the current text, and submitting the new version,
which is then converted automatically into HTML.
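
The conversion step can be illustrated with a minimal sketch. It assumes
a hypothetical markup with only two rules, '''bold''' and [[Page Title]]
links, loosely modeled on common Wiki syntax; real Wiki engines define
their own, much richer grammars, and the function name and URL scheme
below are assumptions made for illustration.

```python
import html
import re


def wiki_to_html(source, existing_pages):
    """Convert a tiny, hypothetical wiki markup into HTML paragraphs."""
    # Escape first, so that any HTML typed by a contributor is shown as text.
    text = html.escape(source, quote=False)

    # '''bold''' becomes <b>bold</b>.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)

    def link(match):
        title = match.group(1)
        # A link to a page that does not exist yet is marked, so that
        # following it can offer to create the page.
        marker = "" if title in existing_pages else ' class="new"'
        return f'<a href="/wiki/{title.replace(" ", "_")}"{marker}>{title}</a>'

    # [[Page Title]] becomes a hyperlink; titles are restricted to
    # letters, digits, and spaces to keep the sketch simple.
    text = re.sub(r"\[\[([A-Za-z0-9 ]+)\]\]", link, text)

    # Blank lines separate paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


print(wiki_to_html("'''Cells''' are described in [[Cell Biology]].", {"Cell Biology"}))
```

Marking links to missing pages reflects the idea, discussed in the text,
that a new page is created simply by making a new link.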

Wikis explicitly support collaboration and community building,
because they take the effort of creating a website out of the
hands of the few and distribute it to a huge
community of Internet users. In a Wiki environment, users are
not only editing; they are also encouraged to create their own
content and their own pages. A link to an existing page can be
made easily, and a new page can be created by making a new
link. Thus, apart from the content, the users also codesign the
structure of a Wiki site.

The goal of Wiki sites is to become a shared repository of
knowledge, with the knowledge base growing over time. Unlike
the content of chat rooms, Wiki content is expected to have some degree
of seriousness and permanence. In a Wiki, the users create the
content in collaboration and over time. Like weblogs, Wikis have
been around for some time and are popular among the
technology community. However, weblogs can be highly personal
while Wikis are intensely collaborative. Whereas weblogs tend to
use a modified WYSIWYG editing environment, Wikis use a
simple set of formatting commands.

Recently there has been an increasing interest in using Wikis
for learning (Wang et al., 2005; Jones, 2003). Although a Wiki
is not really suitable for any knowledge-building application that
demands the absolute and immutable integrity of the content,
it is useful in situations where communities of people are
developing shared ideas, values, or resources. A biology teacher,
for example, could start a Wiki site by posting some material,
creating a tentative structure of the subject, and uploading some
media files. When the students visit, they can expand the
contents by modifying or posting more material, or by making links
to new pages, thus enriching the learning resources. Through
shared construction cycles the students feel closer to the
learning system, as they contribute to its development instead of
being passively presented with the information.

A project has already been undertaken to rebuild a web-based
learning site for spectroscopy using Wiki technologies,
with the goals of making the resource more relevant and
content-rich and of attracting authors from different backgrounds to
provide content in multiple languages for international
users (Mader, 2004). Perhaps the most famous educational Wiki
in existence is Wikipedia (Wikipedia, 2005), a free-content
encyclopedia in many languages that anyone can edit. It is the
result of a vast collaboration and currently contains over a million
entries in multiple languages. The Wikipedia project shows that
the model might work, and that groups of people can
collaboratively create shared knowledge artifacts.

However,  a  Wiki  site  could  be  highly  unstructured  because 
no  editorial  function  examines  the  contributions  or  guarantees 
quality  or  accuracy  of  its  content.  It  is  the  responsibility  of  the 
users  to  ensure  correctness  and  it  is  their  collective  responsi¬ 
bility  to  take  care  of  all  aspects  of  policy,  such  as  appropriate 
behavior  in  the  community  as  well  as  rules  (Halvorsen,  2005). 

Since anyone can contribute literally anything, vandalism
sometimes occurs within Wikis. Therefore, version control is
crucial. Each version is saved, so that it is easy to revert to a
former version if necessary (Emigh & Herring, 2005). Studies have
confirmed that vandals are often stopped and the vandalized
pages are reverted quickly (Viegas, Wattenberg, & Dave, 2004;
Aronsson, 2002).
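
A minimal sketch of this kind of version control is given below. It
assumes that every saved version is kept in an append-only history and
that reverting simply appends a copy of an earlier version; the class
and method names are hypothetical and do not describe any particular
Wiki engine.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    text: str


@dataclass
class WikiPage:
    title: str
    history: list = field(default_factory=list)  # oldest revision first

    @property
    def current(self):
        """Text of the latest revision, or an empty string for a new page."""
        return self.history[-1].text if self.history else ""

    def edit(self, author, text):
        """Save a new version; earlier versions are never overwritten."""
        self.history.append(Revision(author, text))

    def revert(self, author, index):
        """Restore revision `index` by appending a copy of it.

        The vandalized version stays in the history, so the record of
        who changed what remains complete.
        """
        self.history.append(Revision(author, self.history[index].text))


page = WikiPage("Spectroscopy")
page.edit("teacher", "Spectroscopy studies the interaction of light and matter.")
page.edit("vandal", "")       # content blanked
page.revert("student", 0)     # any reader can restore the first version
print(page.current)
```

Because a revert is itself recorded as a new revision, the same history
that makes vandalism cheap to undo also documents how a page evolved.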

Wiki-based  collaboration  could  be  a  very  practical  platform 
for  asynchronous  distributed  brainstorming;  since  Wikis  allow 
ideas  to  be  captured  quickly,  they  facilitate  elaboration  on 
existing  ideas  and  document  the  evolution  of  revisions  made 
to  ideas. 

With the increasing popularity of Wikis, it is not surprising
that they have recently been used to assemble online communities
of various interests. Central access for all users has made the
Wiki appropriate for project work, document production, the
joint development of project concepts, and discussion forums
of all types.

The Wiki technology appears to be a suitable approach for
supporting a cooperative community, especially for knowledge
generation. One of the reasons it is a successful tool
for community building is its “anyone-can-contribute” policy,
under which producing new pages is quick and easy, as is linking them
to existing pages.

Moreover, it also provides a simple means of communication
through the discussion section associated with each Wiki page,
as well as the history entry, which keeps track of the changes
made by every user. In this way, the users construct
and discuss the page at the same time within the same space.
This results in the rapid appearance of a chaotic network of Wiki
pages, and the knowledge grows organically. Facilitated by the
Wiki technology, users can contribute and integrate knowledge
into, and obtain knowledge from, the existing database created
by others.

The particular advantage of the Wiki approach over
other CMC tools for knowledge generation and exchange
is its focus on both the process and the result of
communication (Kittowski & Kohler, 2002).

Discussion boards focus largely on the process of
communication, in which opinions are exchanged, conflicts are
resolved, and agreements are reached regarding the documents
or projects being worked on. The document or project
itself is not explicitly worked on or expanded through the
discussion board; instead, only after consensus is reached
is the document or project worked on, by extracting ideas from
the postings of the discussion.

Content and document management systems are used to
organize and facilitate the collaborative creation of documents and
other content. They are usually applications for maintaining documents
that include version control, access rights, document protection,
and so on. In terms of collaboration, they allow several people
to expand each version of a document.
Communication processes and discussions that lead to the final
product are limited, for example, to annotations.

Wikis,  in  contrast,  permit  users  to  discuss  and  work  on  the 
document  simultaneously.  Moreover,  cooperative  production  of 
content  becomes  very  efficient  through  the  realignment  of  the 
distinction  between  author  and  reader.  Thus,  the  Wikis’  features 
largely  fulfill  the  requirements  for  a  tool  to  support  a  coopera¬ 
tive,  knowledge-building  community. 

In fact, many Wikis are collaborative communities. As
mentioned previously, Wikis allow anyone to click the “edit button”
and change the website. Surprisingly, many Wikis are able to do
this successfully without major issues in terms of vandalism.
One of the important vandal-control features is that the Wiki
stores the history of each page. Thus, for each vandal, there is
probably a group of people who actually need the information
that was removed through vandalism, and who will revert the
page to its former version. In addition, some Wikis are in fact
not completely open, as they restrict access, while some even
have a democratic error/vandalism reporting system.

Online  virtual  game  communities.  With  the  advent 
of  ubiquitous  broadband  Internet  connection  and  the  increas¬ 
ing  graphical  processing  power  of  personal  computers,  a  new 
paradigm  of  gaming  has  emerged.  Massively  Multi-Player  Online 
Role  Playing  Games  (MMORPGs)  have  changed  the  game  in¬ 
dustry  dramatically.  MMORPGs  provide  a  fictional  setting  where 
a  large  group  of  users  voluntarily  immerse  themselves  in  a 
graphical  virtual  environment  and  interact  with  each  other  by 
forming  a  community  of  users. 

Although the concept of multiplayer gaming is not new, the
game worlds of most local network multiplayer games, as
opposed to those of MMORPGs, are simplistic and can accommodate only
around 16 concurrent players in a limited space.

An MMORPG enables thousands of players to play in an
evolving virtual world simultaneously over the Internet. The game
world is usually modeled with highly detailed 3D graphics,
allowing individuals to interact not only with the gaming
environment, but also with other players. Usually this involves the
players representing themselves with avatars, the visual
representation of the player’s identity in the virtual world.

The MMORPG environment is a new paradigm in computer
gaming in which players are part of a persistent world that
exists independently of the users (Yee, 2005). Unlike other games,
where the virtual world ceases to exist when players switch off
the game, in an MMORPG the world exists before the user logs
on and continues to exist when the user logs off. More
importantly, events and interactions occur in the world even when the
user is not logged on, as many other players are constantly
interacting and thus transforming the world. To accommodate the
large number of users, the worlds in MMORPGs are vast and
varied in terms of “geographical locations,” characters, monsters,
items, and so forth. More often than not, new “locations” or
items are added by the game developers from time to time
according to the demand of the players.

On one hand, an MMORPG is like a Role Playing Game (RPG):
it involves killing monsters, collecting items, developing
characters, and so on. However, it contains an extra aspect, which is
the internal sociability within the game. Unlike single-player
games, which rely on external modes of communication
(such as mailing lists and discussion forums outside the game) to
form the gaming culture, an MMORPG forms its culture within the
game environment itself.

In this way, these MMORPG virtual worlds represent a
persistent social and material world, structured around
narrative themes (usually fantasy), in which players are engaged in
various activities: slaying monsters, attacking castles,
scavenging goods, trading merchandise, and so on. On one hand, the
game’s virtual world represents an escapist fantasy; on the other,
it supports social realism (Kolbert, 2001).

That means games are no longer meant to be a solitary
activity played by a single individual. Instead, the player is
expected to join a virtual community that is parallel with the
physical world, in which societal, cultural, and economic systems
arise. The game has gradually become a world that allows players
to immerse themselves in experiences which closely match those of the
real world: virtual relationships are sought, virtual marriages are
held, virtual shops are set up, and so forth.

The  MMORPG  genre  now  boasts  hundreds  of  thousands  of 
users  and  accounts  for  millions  of  dollars  in  revenue  each  year. 
The number of people who play the games (and the time they
invest in terms of activities within and around the game) is
astounding. The MMORPG Lineage (NCsoft, 2005), for example,
had more than 2.5 million current subscribers in 2002 (Vaknin,
2002), and, within a year, Ultima Online (Electronic Art, 2005)
attracted more than 160 million person-hours (Kolbert, 2001).

Such games are ripe for cultural analysis of the social
practices around them. Although MMORPGs are fundamentally
video games with virtual spaces with which the players
interact, they should be regarded not just as a piece of game
software; they are a community, a society, and, if you wish, a culture.
These games are becoming the most interesting interactive
computer-mediated communication and networked activity
environment (Taylor, 2002). Thus, understanding the pattern of
participation in these game communities is crucial, as such virtual
communities function as a major mechanism for socializing
the players to the norms of the emerging game culture. As
Squire and Steinkeuhler (in press) noted:

Playing  one’s  character(s)  and  living  in  [these  virtual  worlds]  becomes 
an  important  part  of  daily  life.  Since  much  of  the  excitement  of  the  game 
depends  on  having  personal  relationships  and  being  part  of  [the]  com¬ 
munity’s  developing  politics  and  projects,  it  is  hard  to  participate  just  a 
little. (Squire & Steinkeuhler, in press)

Recently game designers have tried to stretch the
boundaries further by structuring in-game activities to maximize
interaction. One example of sociability by design in
MMORPGs is Star Wars Galaxies (Sony, 2005), which is organized
so that players are steered toward certain locations in the game
world where social playing is expected to take place
(Ducheneaut, Moore, & Nickell, 2004).

Such communities formed around the game can be broadly
divided into two categories: in-game and out-of-game
communities. Most MMORPGs are created to encourage long-term
relationships among the players through features that support
the formation of in-game communities. One of the most evident
examples is the concept of guilds. Guilds are a fundamental
component of MMORPG culture: they allow people who are natural
organizers to run a virtual association that has formalized membership
and rank assignments and that encourages participation. Sometimes,
a player might join a guild and get involved in a guild war in
order to fight for a castle. Each guild usually has a leader, and
several guilds could team up in a war. This involves complicated
leader-subordinate and leader-leader relationships.

In addition, to encourage social interaction, MMORPGs are
specially designed in such a way that some game goals are
almost impossible to achieve without forming communities.
For  example,  one  player  alone  could  spend  a  long  period  col¬ 
lecting  all  of  the  items  needed  to  assemble  a  device.  But  a  guild 
could  ask  its  members  to  fan  out  in  small  groups  and  collect  all 
of  the  necessary  components  in  one  day.  Complex  devices  be¬ 
yond  the  reach  of  any  individual  player  could  be  quickly  con¬ 
structed  by  the  guild.  The  guild  could  also  accept  donations 
from  members  and  then  distribute  those  contributions  to  oth¬ 
ers  according  to  their  needs,  benefiting  everyone  through  this 
collaboration  (Kelly,  2004). 

Apart from relatively long-term relationships such as guild
communities, MMORPGs also provide many opportunities for
short-term relationship experiences. For example, a player could
team up with another player to kill monsters in order to develop
the abilities of their avatars (“level up”), or more expert
players could help newer players to get through the game.

When  trying  to  win  the  game,  players  often  need  to  get  in¬ 
formation  from  other  resources:  guidebooks,  discussion  fo¬ 
rums,  other  players,  and  the  like.  Therefore  game  playing  is 
generally  more  concerned  with  player-player  interaction  than 
with  player-game  interaction.  What  is  at  first  confined  to  the 
game  alone  soon  spills  over  into  the  virtual  world  beyond  it 
(e.g.,  websites,  chatrooms,  e-mail)  and  even  life  off-screen  (e.g., 
telephone  calls,  face-to-face  meetings). 

Apart from these external communities around the game,
which are mediated through e-mail or online forums (and which
also exist for many other games), an interesting phenomenon
fuses the internal and external game communities.
Participation in an external community starts to break the magic
circle of the game (the game space is no longer separate from
real life) as the out-of-game community trades in-game items
for real money.

For  example,  Norrath,  the  world  of  EverQuest,  was  estimated 
to  have  the  77th  largest  economy  in  the  real  world  based  on 
buying  and  selling  in  online  auction  houses. 

About  12,000  people  call  it  their  permanent  home,  although  some 
60,000  are  present  there  at  any  given  time.  The  nominal  hourly  wage  is 
about  USD3.42  per  hour,  and  the  labors  of  the  people  produce  a  GNP 
per  capita  somewhere  between  that  of  Russia  and  Bulgaria.  A  unit  of 
Norrath’s  currency  is  traded  on  exchange  markets  at  USD0.0107,  higher 
than  the  Yen  and  the  Lira.  (Castronova,  2001) 

Having illustrated the social phenomena around such
playful virtual communities, we believe that it is fruitful to research
them, as we might be able to derive some useful
implications for how successful Computer Supported
Collaborative Work (CSCW) and Computer Supported Collaborative
Learning (CSCL) environments can be designed. For this reason,
in the next section we describe some of the methodologies
that can be used in such studies, and later we present the
application of some of these methods to two case studies.


ANALYZING  ONLINE  COMMUNITIES: 
FRAMEWORKS  AND  METHODOLOGIES 


There  are  various  aspects  and  attributes  of  CMC  that  can  be 
studied  to  help  us  better  understand  online  communities — for 
instance,  the  analysis  of  the  frequency  of  exchanged  messages 
and  the  formation  of  social  networks,  or  the  analysis  of  the  con¬ 
tent  of  the  exchanged  messages  and  the  formation  of  virtual 
communities.  To  achieve  such  an  analysis  a  number  of  theoreti¬ 
cal  frameworks  have  been  developed  and  proposed.  For  exam¬ 
ple,  Henri  (1992)  provided  an  analytical  model  for  cognitive 
skills  that  can  be  used  to  analyze  the  process  of  learning  within 
messages  exchanged  between  students  of  various  online  e-learn¬ 
ing  communities.  Mason’s  work  (1991)  provided  descriptive 
methodologies  using  both  quantitative  and  qualitative  analysis. 
Furthermore,  five  phases  of  interaction  analysis  are  identified 
in  Gunawardena,  Lowe,  and  Anderson’s  model  (1997): 

1. Sharing/Comparing of Information

2.  The  Discovery  and  Exploration  of  Dissonance  or  Inconsis¬ 
tency  among  Ideas,  Concepts,  or  Statements 

3.  Negotiation  of  Meaning/Co-Construction  of  Knowledge 

4.  Testing  and  Modification  of  Proposed  Synthesis  or  Co- 
Construction 

5.  Agreement  Statement(s)/Applications  of  Newly  Constructed 
Meaning 

In  this  section  we  provide  a  description  of  some  of  the  most 
commonly  used  online  community  evaluation  techniques  as 
well  as  their  weaknesses  and  strengths. 
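
As a brief illustration of how a phase model such as the one by
Gunawardena, Lowe, and Anderson (1997) might be applied, the sketch
below tallies messages that a human analyst has already coded with one
of the five phases. The function name and the sample data are
hypothetical; the coding itself is a qualitative judgment that the code
does not attempt to automate.

```python
from collections import Counter

# The five phases, numbered as in the list above.
PHASES = {
    1: "Sharing/comparing of information",
    2: "Discovery and exploration of dissonance",
    3: "Negotiation of meaning/co-construction of knowledge",
    4: "Testing and modification of proposed synthesis",
    5: "Agreement statements/applications of new meaning",
}


def phase_distribution(coded_messages):
    """Share of messages in each phase.

    `coded_messages` is a list of (message_id, phase) pairs produced by
    a human coder; phases outside 1 to 5 are rejected.
    """
    for message_id, phase in coded_messages:
        if phase not in PHASES:
            raise ValueError(f"message {message_id!r} has unknown phase {phase!r}")
    counts = Counter(phase for _, phase in coded_messages)
    total = len(coded_messages)
    if total == 0:
        return {phase: 0.0 for phase in PHASES}
    return {phase: counts[phase] / total for phase in PHASES}


# Hypothetical coding of four forum postings.
sample = [("m1", 1), ("m2", 1), ("m3", 2), ("m4", 3)]
for phase, share in phase_distribution(sample).items():
    print(f"Phase {phase} ({PHASES[phase]}): {share:.0%}")
```

A distribution concentrated in the first phase would suggest that a
discussion stayed at the level of information exchange rather than
moving toward negotiated, co-constructed knowledge.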




Query-Based Techniques and User Profiles: Interviews, Questionnaires, and Personas

Interviews. An interview can be defined as a type of
conversation that is initiated by the interviewer in order to obtain
research-relevant information. The interview reports have to be
carefully targeted and analyzed to make an impact (Usability Net,
2003). Interviews are usually carried out on a one-to-one basis in
which the interviewer collects information from the interviewee.
Interviews can take place by telephone or face to face (Burge &
Roberts, 1993). They can also take place via non-real-time methods
like fax and e-mail, although in these cases they function like
questionnaires. Interviews are useful for obtaining information that is
difficult to elicit through other approaches, such as background
knowledge and general principles. There are three types of interviews:

1.  Structured  interviews,  which  consist  of  predetermined  ques¬ 
tions  asked  in  fixed  order  like  a  questionnaire; 

2. Semi-structured interviews, in which questions are
determined in advance but may be reordered, reworded, omitted,
and elaborated;

3.  Unstructured  interviews,  which  are  not  based  on  predeter¬ 
mined  questions  but  instead  upon  a  general  area  of  inter¬ 
est,  thus  allowing  the  conversation  to  develop  freely. 

Interviews  can  be  used  to  gain  insights  about  general  char¬ 
acteristics  of  the  participants  of  an  online  community  and  their 
motivations  for  participating  in  the  community  under  investi¬ 
gation.  The  data  collected  comes  straight  from  the  participants 
of  the  online  communities,  whereby  they  are  able  to  provide 
feedback  based  on  their  own  personal  experiences,  activities, 
thoughts,  and  suggestions. 

Advantages of interviews (Usability Net, 2003) include that (a) what
is talked about can directly address the informant’s individual
concerns; (b) mistakes and misunderstandings can be quickly
identified and cleared up; (c) they are more flexible than
questionnaires; and (d) they can cover low-probability events.

Disadvantages of interviews include (a) the danger of analyst
bias towards his or her own knowledge and beliefs; (b) uncertainty
about the accuracy and honesty of responses; and (c) the frequent
need to combine them with other data-collection techniques to
improve the quality of the data collected.

Questionnaires. A questionnaire is a self-reporting
query-based technique. Questionnaires are typically produced
on printed paper, but thanks to recent technologies, and in
particular the Internet, many researchers use online
questionnaires, thus saving time and money and eliminating the
problem of a participant’s geographical distance. Three types
of questions can be used in questionnaires: (a) open
questions, where the participants are free to respond however they
like; (b) closed questions, which provide the participants with
several choices for the answer; and (c) scales, where the
respondents must answer on a predetermined scale.
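
For an online questionnaire, the three question types differ mainly in
which answers count as valid. The sketch below shows one possible way
to represent them; the class names, the default 1-to-5 scale, and the
sample prompts are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OpenQuestion:
    prompt: str

    def is_valid(self, answer):
        # Any non-empty free-text response is acceptable.
        return isinstance(answer, str) and answer.strip() != ""


@dataclass
class ClosedQuestion:
    prompt: str
    choices: tuple

    def is_valid(self, answer):
        # The response must be one of the offered choices.
        return answer in self.choices


@dataclass
class ScaleQuestion:
    prompt: str
    low: int = 1
    high: int = 5

    def is_valid(self, answer):
        # The response must be a whole number on the predetermined scale.
        return (
            isinstance(answer, int)
            and not isinstance(answer, bool)
            and self.low <= answer <= self.high
        )


questionnaire = [
    OpenQuestion("Why did you join this community?"),
    ClosedQuestion("How do you usually take part?", ("reading", "posting", "moderating")),
    ScaleQuestion("How satisfied are you with the discussions?"),
]
answers = ["To learn from others", "reading", 4]
print(all(q.is_valid(a) for q, a in zip(questionnaire, answers)))
```

Validating closed and scale answers at entry time is one practical
benefit of online questionnaires, although it does nothing for the
accuracy or honesty of the responses themselves.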

For online communities, questionnaires can be used to elicit
facts about the participants, their behaviors, and their beliefs/
attitudes. Like interviews, questionnaires are an important
technique for collecting user opinions about, and experiences of,
using CMC and taking part in online communities.


The main advantages of questionnaires are that they are faster
to carry out than observational techniques and that they can cover
low-probability events. Disadvantages include that (a) the information is
an idealized version of what should, rather than what does,
happen; (b) responses may lack accuracy or honesty; (c) there is a
danger of researcher bias toward the subset of knowledge he or she
possesses; and (d) they must be used in conjunction with other
techniques for validity.

Personas.  Findings  from  interviews  and  questionnaires  can 
be  further  used  as  a  basis  for  developing  user  profiles  using  per¬ 
sonas.  A  persona  is  a  precise  description  of  the  user  of  a  system, 
and  of  what  he  or  she  wishes  to  accomplish  (Cooper,  1999).  The 
specific  purpose  of  a  persona  is  to  serve  as  a  tool  for  software  and 
product  design.  Although  personas  are  not  real  people,  they  rep¬ 
resent  them  throughout  the  design  stage  (Blomkvist,  2002)  and  are 
best  based  on  real  data  collected  through  query-based  techniques. 

Personas  are  rich  in  details,  including  name,  social  history, 
and  goals,  and  are  synthesized  from  findings  using  query-based 
techniques  with  real  people  (Cooper,  1999).  The  technique 
takes  into  account  user  characteristics  and  creates  a  concrete 
profile  of  the  typical  user  (Cooper,  1999). 

For  online  communities,  personas  can  be  used  to  better  un¬ 
derstand  the  participants  of  the  community  and  their  back¬ 
grounds.  Personas  can  also  be  used  as  a  supplement  to  social 
network  analysis  (described  later  in  this  chapter)  to  get  a  greater 
overview  of  the  characteristics  of  key  participants  of  a  commu¬ 
nity.  Using  personas,  web  developers  gain  a  more  complete  pic¬ 
ture  of  their  prospective  and/or  current  users  and  are  able  to  de¬ 
sign  the  interfaces  and  functionality  of  their  systems  to  be  more 
personalized  and  suited  for  the  communication  of  the  members 
of  their  online  communities. 

Advantages of personas include (a) they can be used to create
user scenarios; (b) they can be anonymous, protecting user privacy;
and (c) they represent the user stereotypes and characteristics.

Disadvantages of personas include that, if not enough personas
are used, users are forced into a persona type that might not
accurately represent them, and that personas are time-consuming
to create.

Log  Analysis 

A  log,  also  referred  to  as  weblog,  server  log,  or  log-file,  is  in  the 
form  of  a  text  file  and  is  used  to  track  the  users’  interactions  with 
the  computer  system  they  are  using.  The  types  of  interaction 
recorded  include  key  presses,  device  movements,  and  other  in¬ 
formation  about  the  user  activities.  The  data  is  collected  and  an¬ 
alyzed  using  specialized  software  tools  and  the  range  of  data  col¬ 
lected  depends  on  the  log  settings.  Logs  are  also  time-stamped 
and  can  be  used  to  calculate  how  long  a  user  spends  on  a  par¬ 
ticular  task  or  how  long  a  user  lingers  in  a  certain  part  of  the 
website (Preece, Rogers, & Sharp, 2002). In addition, an analysis
of the server logs can reveal when people visited the site, the
areas they navigated, the length and frequency of their visits,
their navigation patterns, where they connected from, and details
about the computers they were using.
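
As a minimal sketch of how time-stamped log entries can be turned into
time-on-page estimates, the Python fragment below assumes a simplified,
hypothetical log format (user, ISO timestamp, page). Real server logs
depend on the log settings and would need their own parser; the names
SAMPLE_LOG and time_per_page are illustrative only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified log lines: "user timestamp page".
SAMPLE_LOG = """\
s1 2005-03-01T10:00:00 /home
s1 2005-03-01T10:02:30 /forum
s1 2005-03-01T10:10:00 /lesson1
s2 2005-03-01T10:01:00 /home
s2 2005-03-01T10:01:45 /forum
"""


def time_per_page(log_text):
    """Estimate the seconds each user spent on each page.

    Time on a page is taken as the gap until the same user's next
    request; a user's last request has no successor and is skipped.
    """
    visits = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        user, stamp, page = line.split()
        visits[user].append((datetime.fromisoformat(stamp), page))

    totals = defaultdict(float)
    for user, events in visits.items():
        events.sort()  # chronological order per user
        for (start, page), (end, _next_page) in zip(events, events[1:]):
            totals[(user, page)] += (end - start).total_seconds()
    return dict(totals)


durations = time_per_page(SAMPLE_LOG)
for (user, page), seconds in sorted(durations.items()):
    print(user, page, seconds)
```

The last page of each visit gets no duration here, which is one reason
log-based estimates of how long a user lingers should be read with care.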

Log analysis is a useful and easy-to-use tool for analyzing
online communities. For example, it can answer questions such as
student attendance in an online learning community more
accurately. Furthermore, logs can identify the web pages users
spend the most time viewing and the paths that they took. This
helps identify navigation problems on the website, and also gives
a picture of the users’ activities in the virtual communities. For
instance, in the case of e-learning communities, the log files will
show which students follow the CMC postings even if they are not
active participants (posting little themselves) and are just
observing the conversations. Preece and Maloney-Krichmar (2003)
noted that data logging does not interrupt the community, while at
the same time it can be used to examine mass interaction.

Advantages  of  logs  (Preece  et  al.,  2002)  are  that  they  (a)  help 
evaluators  analyze  users’  behavior;  (b)  help  evaluators  under¬ 
stand  how  users  worked  on  specific  tasks;  (c)  are  unobtrusive; 
and  (d)  can  automatically  log  large  volumes  of  data. 

Disadvantages  of  logs  (Preece  et  al.,  2002)  are  that  (a)  power¬ 
ful  tools  are  needed  to  explore  and  analyze  the  data  quantita¬ 
tively  and  qualitatively;  and  (b)  there  are  user  privacy  issues. 

Content  and  Textual  Analysis 

Content analysis is an approach to understanding the processes
that participants engage in as they exchange messages (McLoughlin,
1996). There have been several frameworks created for
studying the content of messages exchanged in online communities.
Examples include work from Archer, Garrison, Anderson,
and Rourke (2001); Gunawardena, Lowe, and Anderson’s (1997)
model for examining the social construction of knowledge in
computer conferencing; Henri’s (1992) content-analysis model;
and Fahy, Crawford, and Ally’s (2001) Transcript Analysis Tool
(TAT), which is described in more detail below.

The  TAT  focuses  on  the  content  and  interaction  patterns  at 
the  component  level  of  the  transcript  (Fahy  et  al.,  2001).  After 
a  lengthy  experience  with  other  transcript  tools  and  reviews  of 
previous  studies,  Fahy  et  al.  (2001)  chose  to  adapt  Zhu’s  (1996) 
analytical  model  for  the  TAT.  Zhu’s  model  (1996)  examined  the 
forms of electronic interaction and discourse, the forms of
participation, and the direction of participant interaction in
computer conferences. The TAT also contains echoes of Vygotskian
theory, primarily those dealing with collaborative sense making,
social negotiation, and proximal development (Cook & Ralston,
2003). The TAT developers made the following strategic decisions
(Fahy, 2003): (a) the sentence is the unit of analysis; (b) the TAT
is the method of analysis; and (c) interaction is the criterion for
judging conference success and topical progression (types and
patterns).

The TAT was designed to permit transcript content to be coded
reliably and efficiently (Fahy et al., 2001). Its advantages are
as follows (Fahy, 2003; Cook & Ralston, 2003; Fahy et al., 2001):

• It reveals interaction patterns that are useful in assessing
different communication styles and online behavioral preferences
among participants;

• It recognizes the complexity of e-conferences and measures
the intensity of interaction;

• It enables the processes occurring within the conferences
to be noted and recorded;

• It probes beyond superficial systems data, which mask the
actual patterns of discussion;

• It relates usefully to other work in the area;

• It discriminates among the types of sentences within the
transcript;

• It reflects the importance of both social and task-related
content and outcomes in transcript-analysis research.

The  unit  of  analysis  of  the  TAT  is  the  sentence.  In  the  case 
of  highly  elaborated  sentences,  the  units  of  analysis  can  be  in¬ 
dependent  clauses  which,  punctuated  differently,  could  be  sen¬ 
tences  (Fahy,  2003).  Fahy  et  al.  (2001)  have  concluded  that  the 
selection  of  message-level  units  of  analysis  might  partially  ex¬ 
plain  problematic  results  that  numerous  researchers  have  had 
with  previous  transcript-analysis  work.  They  also  believe  that 
the  finer  granularity  of  sentence-level  analysis  results  in  several 
advantages  (Fahy,  2003): 

• Reliability;

• Ability to detect and describe the nature of the widely varying
social interaction, and differences in networking pattern, in
the interactive behavior of an online community, including
measures of social network density and intensity;

• Confirmation of gender associations in epistolary/expository
interaction patterns, and in the use of linguistic qualifiers and
intensifiers.

Table 30.3 shows the TAT categories (Fahy et al., 2001; Fahy,
2003).
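
Because the sentence is the unit of analysis, a coded transcript can
be summarized by counting how many sentences fall into each TAT
category. The Python sketch below illustrates only that tallying step.
The category codes follow Table 30.3, while the coded sentences and
the function name category_profile are hypothetical; it is not the
TAT developers' own tooling.

```python
from collections import Counter

# Category codes as listed in Table 30.3.
TAT_CATEGORIES = ("1A", "1B", "2A", "2B", "3", "4", "5A", "5B")

# Hypothetical coded transcript: one (participant, TAT code) pair per sentence.
coded_sentences = [
    ("P1", "1A"), ("P2", "2B"), ("P1", "4"),
    ("P3", "1B"), ("P2", "3"), ("P1", "2A"), ("P3", "4"),
]


def category_profile(coded):
    """Return the share of sentences that falls in each TAT category."""
    counts = Counter(code for _participant, code in coded)
    unknown = set(counts) - set(TAT_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown TAT codes: {sorted(unknown)}")
    total = sum(counts.values())
    return {cat: counts[cat] / total for cat in TAT_CATEGORIES}


profile = category_profile(coded_sentences)
for category, share in profile.items():
    print(category, round(share, 2))
```

The same counts could be broken down per participant to compare
communication styles, which is the kind of pattern the TAT is meant
to reveal.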

Social  Network  Analysis  (SNA) 

Social  Network  Analysis  (SNA)  is  the  mapping  and  measuring  of  rela¬ 
tionships  and  flows  between  people,  groups,  organizations,  computers 
or  other  information/knowledge  processing  entities.  The  nodes  in  the 
network  are  the  people  and  groups  while  the  links  show  relationships 
or  flows  between  the  nodes.  SNA  provides  both  a  visual  and  a  mathe¬ 
matical  analysis  of  human  relationships.  (Krebs,  2004,  p.  1) 

Preece  (2000)  added  that  SNA  provides  a  philosophy  and  a 
set  of  techniques  for  understanding  how  people  and  groups  re¬ 
late  to  each  other,  and  has  been  used  extensively  by  sociologists 
(Wellman,  1982;  Wellman,  1992),  communication  researchers 
(Rice,  1994;  Rice,  Grant,  Schmitz,  &  Torobin,  1990)  and  others. 
Analysts  use  SNA  to  determine  if  a  network  is  tightly  bounded, 
diversified,  or  constricted;  to  find  its  density  and  clustering;  and 
to  study  how  the  behavior  of  network  members  is  affected  by 
their  positions  and  connections  (Garton,  Haythornthwaite  & 
Wellman, 1997; Hanneman, 1998; Scott, 2000). Network researchers
have developed a set of theoretical perspectives of
network analysis. Some of these are (Borgatti, 2002):

• Focus on relationships between actors rather than the attributes
of actors;

• Sense of interdependence: a molecular rather than atomistic view;

• Structure affects substantive outcomes;

• Emergent effects.

“The  aim  of  social  network  analysis  is  to  describe  why  people 
communicate  individually  or  in  groups”  (Preece,  2000,  p.  183), 
while the goals of SNA are (Dekker, 2002):

• To visualize relationships/communication between people
and/or groups using diagrams;

• To study the factors which influence relationships and the
correlations between them;

• To draw out implications of the relational data, including
bottlenecks;

• To make recommendations to improve communication and
workflow in an organization.

TABLE 30.3. TAT Categories

Category

1: Questioning

The questioning category is further broken down into two types of
questions:

1A Vertical Questions

These questions assume a “correct” answer exists, and that they
can be answered if the right authority to supply it can be found.
Example: “Does anybody know what time the library opens on
Saturdays?”

1B Horizontal Questions

For these questions, there may not be only one right answer. These
questions invite negotiation. Example: “Do you really think mp3 files
should become illegal, or you don’t see any harm by them?”

2: Statements

This category consists of two subcategories:

2A Nonreferential Statements

These statements contain little self-revelation and usually do not
invite response or dialogue; their main intent is to impart facts
or information. Example: “We found that keeping content up-to-date,
distribution and PC compatibility issues were causing a huge
draw on Ed. Centre time.”

2B Referential Statements

Referential statements are direct answers to questions. They can
include comments referring to specific preceding statements.
Example: “That’s right; it’s the 1997 issue that you want.”

3: Reflections

Reflections are significant personal revelations, where the speaker
expresses personal or private thoughts, judgments, opinions, or
information. Example: “My personal opinion is that it shouldn’t
have been a penalty kick.”

4: Scaffolding and Engaging

Scaffolding and engaging initiate, continue, or acknowledge
interpersonal interaction. They personalize the discussion and
can agree with, thank, or otherwise recognize someone for their
helpfulness and comments. Example: “Thanks Dave; I’ve been
trying to figure that out for ages ;)”

5: References/Authorities

Category 5 is comprised of two types:

5A Quotations, references to, paraphrases of other sources.
Example: “You said, ‘I’ll be out of the city that day.’”

5B Citations, attributions of quotations and paraphrases.
Example: “Mathew, P. (2001). A beginners’ guide to mountain
climbing.”

Preece (2002) and Beidernikl and Paier (2003) listed the following
as the limitations of SNA:

• More theory that speaks directly to developers of online
communities is needed;

• The data collected may be personal or private;

• The analysis of the data is quantitative and specific to the
particular network, while common survey data are qualitative
and generalize answers to the parent population.


It is also worth pointing out that network analysis is concerned
with dyadic attributes between pairs of actors (like kinship,
roles, and actions), while social science is concerned with
monadic attributes of the actor (like age, sex, and income).
There are two approaches to SNA:

• Ego-centered analysis: Focuses on the individual as opposed
to the whole network, and only a random sample of the
network population is normally involved (Zaphiris, Zacharia,
& Rajasekaran, 2003). The data collected can be analyzed using
standard computer packages for statistical analysis like
SAS and SPSS (Garton, Haythornthwaite, & Wellman, 1997).

• Whole network analysis: The whole population of the network
is surveyed and this facilitates conceptualization of the
complete network (Zaphiris et al., 2003). The data collected
can be analyzed using microcomputer programs like UCINET
and Krackplot (Garton et al., 1997).

The  following  are  important  units  of  analysis  and  concepts  of 
SNA  (Garton  et  al.,  1997;  Wellman,  1982;  Hanneman,  2001;  Za¬ 
phiris  et  al.,  2003;  Wellman,  1992): 


Nodes: The actors or subjects of study;

Relations: The strands between actors. They are characterized by
content, direction, and strength;

Ties: Connect a pair of actors by one or more relations;

Multiplexity: The more relations in a tie, the more multiplex the
tie is;

Composition: This is derived from the social attributes of both
participants;

Range: The size and heterogeneity of the social networks;

Centrality: Measures who is central (powerful) or isolated in
networks;

Roles: Network roles are suggested by similarities in network
members’ behavior;

Density: The number of actual ties in a network compared to the
total number of ties that the network can theoretically support;

Reachability: In order to be reachable, connections that can be
traced from the source to the required actor must exist;

Distance: The number of actors that information has to pass
through to connect one actor with another in the network;

Cliques: Subsets of actors in a network, who are more closely tied
to each other than to the other actors who are not part of the
subset.
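
Several of these concepts reduce to simple computations on a list of
ties. The Python sketch below is illustrative only: the five-actor
network is hypothetical, ties are treated as undirected, distance is
counted as the number of ties on the shortest path (a common graph
convention, assumed here), and a clique is checked in the strict sense
that every pair of members is directly tied.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network: each pair is a tie between two actors.
ties = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
actors = {"A", "B", "C", "D", "E"}  # "E" has no ties (an isolate)

neighbors = {actor: set() for actor in actors}
for u, v in ties:
    neighbors[u].add(v)
    neighbors[v].add(u)


def density():
    """Actual ties divided by the ties the network could support."""
    n = len(actors)
    possible = n * (n - 1) // 2
    return len(ties) / possible


def degree_centrality(actor):
    """Share of the other actors that this actor is directly tied to."""
    return len(neighbors[actor]) / (len(actors) - 1)


def distances_from(source):
    """Breadth-first search: shortest path length to each reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def is_clique(group):
    """True if every pair of actors in the group is directly tied."""
    return all(v in neighbors[u] for u, v in combinations(group, 2))


reach = distances_from("A")
print("density:", density())
print("centrality of C:", degree_centrality("C"))
print("distance A to D:", reach["D"])
print("E reachable from A:", "E" in reach)
print("A, B, C form a clique:", is_clique(["A", "B", "C"]))
```

An actor is reachable from the source exactly when it appears in the
dictionary returned by distances_from, so reachability and distance
come out of the same search.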

Social  Network  Analysis  is  a  very  valuable  technique  when  it 
comes  to  analyzing  online  communities  as  it  can  provide  a  visual 
presentation  of  the  community  and,  more  importantly,  it  can 
provide  us  with  qualitative  and  quantitative  measures  of  the  dy¬ 
namics  of  the  community.  The  application  of  SNA  to  the  analysis 
of  online  communities  is  further  demonstrated  with  a  case 
study later in this chapter.

CASE STUDIES


In  this  section  we  present  two  case  studies  that  demonstrate  the 
use  of  theoretical  and  analytical  techniques  for  studying  online 
communities.  In  the  first  case  study,  we  demonstrate  how  the 
results  from  an  “attitude  toward  thinking  and  learning”  ques¬ 
tionnaire  can  be  combined  with  social  network  analysis  to  de¬ 
scribe  the  dynamics  of  a  computer-aided  language-learning 
(CALL)  online  community.  In  the  second  case  study,  we  present 
a  theoretical  activity  theory  model  that  can  be  used  for  describ¬ 
ing  interactions  in  online  game  communities. 

Computer  Aided  Language  Learning  Communities 

In  the  first  case  study  we  demonstrate  a  synthetic  use  of  quan¬ 
titative  (SNA)  and  qualitative  (questionnaires)  methods  for  an¬ 
alyzing  the  interactions  that  take  place  in  a  Computer  Aided 
Language  Learning  (CALL)  course.  Data  was  collected  directly 
from  the  discussion  board  of  the  “Learn  Greek  Online”  (LGO) 
course  (Kypros-Net  Inc.,  2005). 

LGO  is  a  student-centered  e-Learning  course  for  learning 
Modern  Greek  and  was  built  using  a  participatory  design  and 
distributed  constructionism  methodology  (Zaphiris  &  Zacharia, 
2001).  In  an  ego-centered  SNA  approach,  we  have  carried  out  an 
analysis  of  the  discussion  postings  of  the  first  50  actors  (in  this 
case  the  students  of  the  course)  of  LGO. 

To carry out the social network analysis we used an SNA tool
called “NetMiner” (Cyram, 2004), which enabled us to obtain
centrality measures for our actors. The in-degree and out-degree
centrality were measured by counting the number of interaction
partners of each individual in the discussion threads
(for example, if an individual posts a message to three other
actors, then his or her out-degree centrality is three, whereas if
an individual receives posts from five other actors, then his or
her in-degree centrality is five).

Due  to  the  complexity  of  the  interactions  in  the  LGO  discus¬ 
sion  we  had  to  make  several  assumptions  in  our  analysis: 

•  Posts  that  received  zero  replies  were  excluded  from  the  analy¬ 
sis.  This  was  necessary  in  order  to  obtain  meaningful  visual¬ 
izations  of  interaction; 

•  Open  posts  were  assumed  to  be  directed  to  everyone  who 
replied; 

•  Replies  were  directed  to  all  the  existing  actors  of  the  specific 
discussion  thread  unless  the  reply  or  post  was  specifically  di¬ 
rected  to  a  particular  actor. 
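
The counting rule and the three assumptions above can be sketched in
Python as follows. This is not the NetMiner procedure; the thread data,
the function names, and the choice to normalize by the number of other
actors are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical threads. Each post is (author, addressee or None);
# the first post opens the thread, later posts are replies.
threads = [
    [("S1", None), ("S2", None), ("S3", "S2")],
    [("S2", None)],                      # no replies: excluded
    [("S3", None), ("S1", None)],
]
actors = {"S1", "S2", "S3", "S4"}


def directed_ties(threads):
    """Derive (sender, receiver) ties under the three stated assumptions."""
    ties = set()
    for posts in threads:
        if len(posts) < 2:
            continue  # posts that received zero replies are excluded
        opener = posts[0][0]
        seen = [opener]
        for author, addressee in posts[1:]:
            if author != opener:
                # the open post is directed to everyone who replied
                ties.add((opener, author))
            # a reply goes to its addressee, else to all actors so far
            targets = [addressee] if addressee is not None else seen
            for target in targets:
                if target != author:
                    ties.add((author, target))
            if author not in seen:
                seen.append(author)
    return ties


def degree_centrality(ties, actors):
    """Return {actor: (out_degree, in_degree)}, each divided by n - 1."""
    out_partners = defaultdict(set)
    in_partners = defaultdict(set)
    for sender, receiver in ties:
        out_partners[sender].add(receiver)
        in_partners[receiver].add(sender)
    others = len(actors) - 1
    return {
        actor: (len(out_partners[actor]) / others,
                len(in_partners[actor]) / others)
        for actor in actors
    }


ties = directed_ties(threads)
scores = degree_centrality(ties, actors)
for actor in sorted(scores):
    print(actor, scores[actor])
```

Counting distinct partners rather than messages means that repeated
exchanges between the same two actors do not inflate the scores.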


In addition to the analysis of the discussion board interactions,
we also collected subjective data through a survey. More
specifically, the students were asked to complete
an Attitudes Towards Thinking and Learning Survey (ATTLS).
The ATTLS uses 20 Likert scale questions to measure the extent
to which a person is a “connected knower” (CK) or a “separate
knower” (SK). People with higher CK scores tend to find learning
more enjoyable, and are often more cooperative and congenial,
and more willing to build on the ideas of others, while
those with higher SK scores tend to take a more critical and
argumentative stance to learning (Galotti, Clinchy, Ainsworth,
Lavin, & Mansfield, 1999).

The out-degree results of the social network analysis are
depicted in Fig. 30.1 in the form of a sociogram, and the in-degree
results are depicted in Fig. 30.2. Each node represents one
student (to protect the privacy and anonymity of our students, their
names have been replaced by a student number). The position
of a node in the sociogram is representative of the centrality of
that actor (the more central the actor, the more active). As can
be seen from Fig. 30.1, students S12, S7, S4, and S30 (with
out-degree scores ranging from 0.571 to 0.265) are at the centre of
the sociogram and possess the highest out-degree. The same
students also possess the highest in-degree scores (Fig. 30.2). This is
an indication that these students are the most active members of
this online learning community, posting and receiving the largest
number of postings. In contrast, participants in the outer circle
(e.g., S8, S9, and S14) are the least active, with the smallest
out-degree and in-degree scores (all with 0.02 out-degree scores).

FIGURE 30.1. Out-degree analysis sociogram.

FIGURE 30.2. In-degree analysis sociogram.

In addition, a clique analysis was carried out (Fig. 30.3) and it
showed that 15 different cliques (the majority of which are
overlapping) of at least three actors each have been formed in this
community.

FIGURE 30.3. Clique analysis sociogram.

TABLE 30.4. Clique Analysis of the LGO Discussions

Clique   Actors
K1       S12, S7, S30, S40, S43, S44, S45
K2       S12, S7, S30, S4
K3       S12, S7, S10, S11, S13
K4       S12, S7, S14
K5       S12, S7, S25
K6       S12, S7, S41
K7       S12, S20, S21, S22
K8       S12, S29, S4, S30, S31, S32, S33, S34
K9       S12, S38, S39, S40
K10      S12, S46, S49, S50
K11      S1, S2, S3, S4, S5, S6, S7
K12      S16, S26, S27, S28
K13      S23, S20, S24
K14      S47, S46, S49, S50
K15      S48, S46, S49, S50

As part of the ego-centered analysis for this case study we
look in more detail at the results for two of our actors: S12, who
is the most central actor in our SNA analysis (i.e., with the
highest out-degree score), and S9, an actor with the smallest
out-degree score. It is worth noting that both members joined the
discussion board at around the same time.

First, through a close look at the clique data (Table 30.4) we
can see that S12 is a member of 10 out of the 15 cliques, whereas
S9 is not a member of any, an indication of the high interactivity
of S12 versus the low interactivity of S9. In an attempt to
correlate the actors’ position in the SNA sociogram with their
self-reported attitudes towards teaching and learning we looked
more closely at the answers these two actors (S12, S9) provided
to the ATTLS. Actor S12 answered all 20 questions of the ATTLS
with a score of at least 3 (on a 1-5 Likert scale), whereas S9 had
answers ranging from 1 to 5. The overall ATTLS score of S12 is
86 whereas that of S9 is 60. A clear dichotomy of opinions
occurred on 5 of the 20 questions of the ATTLS. S12 answered all
5 of those questions with a score of 5 (strongly agree) whereas
S9 answered them with a score of 1 (strongly disagree). More
specifically, S12 strongly agreed that

1. S/He is more likely to try to understand someone else’s
opinion than to try to evaluate it;

2. S/He often finds herself/himself arguing with the authors of
books read, trying to logically figure out why they’re wrong;

3. S/He finds that he/she can strengthen his/her own position
through arguing with someone who disagrees with them;

4. S/He feels that the best way to achieve his/her own identity is
to interact with a variety of other people;

5. S/He likes playing devil’s advocate, arguing the opposite of
what someone is saying.

S9 strongly disagreed with all of the above statements. These
are all indications that S12 is a “connected knower” (CK)
whereas S9 is a “separate knower” (SK).

This case study showed that the combination of quantitative
and qualitative techniques can facilitate a better and deeper
understanding of online communities. SNA was found to be a very
useful technique for visualizing interactions and quantifying
strengths and dynamics in online communities. In combination
with the ATTLS, it was possible to identify the key players of the
e-Learning community. These members’ roles show them to be
more powerful and central in the discussions. Identifying their
characteristics enables us to reinforce the community by making
other participants more active in the discussion board
communication. This active learning approach could, in turn, improve
the pedagogical value of e-learning within these communities.
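
The clique memberships in Table 30.4 can be tallied directly, which
makes the contrast between S12 and S9 easy to check. The Python sketch
below transcribes the table, reading OCR-damaged labels such as
“SI 2” and “KIO” as S12 and K10 (an assumption); the function name
membership_count is illustrative.

```python
# Clique memberships transcribed from Table 30.4 (OCR readings repaired).
cliques = {
    "K1": ["S12", "S7", "S30", "S40", "S43", "S44", "S45"],
    "K2": ["S12", "S7", "S30", "S4"],
    "K3": ["S12", "S7", "S10", "S11", "S13"],
    "K4": ["S12", "S7", "S14"],
    "K5": ["S12", "S7", "S25"],
    "K6": ["S12", "S7", "S41"],
    "K7": ["S12", "S20", "S21", "S22"],
    "K8": ["S12", "S29", "S4", "S30", "S31", "S32", "S33", "S34"],
    "K9": ["S12", "S38", "S39", "S40"],
    "K10": ["S12", "S46", "S49", "S50"],
    "K11": ["S1", "S2", "S3", "S4", "S5", "S6", "S7"],
    "K12": ["S16", "S26", "S27", "S28"],
    "K13": ["S23", "S20", "S24"],
    "K14": ["S47", "S46", "S49", "S50"],
    "K15": ["S48", "S46", "S49", "S50"],
}


def membership_count(actor):
    """Number of cliques in the table that include the given actor."""
    return sum(actor in members for members in cliques.values())


for actor in ("S12", "S7", "S9"):
    print(actor, membership_count(actor), "of", len(cliques))
```

On this transcription S12 belongs to 10 of the 15 cliques and S9 to
none, matching the figures reported in the text.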


Game  Communities  and  Activity  Theoretical  Analysis 

The  main  motivation  of  the  second  case  study  arises  from  the 
more  general  area  of  computer  game-based  learning.  Game- 
based  learning  has  focused  mainly  on  how  the  game  itself  can 
be  used  to  facilitate  learning  activities  but  we  claim  that  the  ed¬ 
ucational  opportunity  in  computer  games  stretches  beyond  the 
learning activities in the game per se. Indeed, if you observe
most people playing games, you will likely see them downloading
guidelines from the Internet and participating in online forums
to talk about the game and share strategies. In actuality,
almost all game playing could be described as a social experience,
and it is rare for a player to play a game alone in any meaningful
sense (Kuo, 2004). This observation is even more evident in
Massively Multiplayer Online Role Playing Games (MMORPGs),
which were discussed earlier in this chapter. For example,
participation in an MMORPG is constituted through language
practice within the in-game community (e.g., in-game chatting
and joint tasks) and the out-of-game community (e.g., the creation
of written game-related narratives and fan sites). The learning
is thus not embedded in the game; rather, it lies in the community
practice of those who inhabit the game.

Types of game communities. Because of the points
mentioned earlier, we believe that the study of computer
games should be expanded to include the entire game community.
Computer game communities can be categorized into
three classes (Fig. 30.4) (Ang, Zaphiris, & Wilson, 2005):

• Single Game-play Community: This refers to a game community
formed around a single-player game. Although players
of a single-player game like “The Sims 2” and “Final Fantasy
VII” play the game individually, they are associated with
an out-of-game community which discusses the game either
virtually or physically.

• Social Game-play Community: This refers to multiplayer
games which are played together in the same physical location.
It creates game communities at two levels: in-game and
out-of-game. Occasionally, these two levels might overlap.
The out-of-game interaction might be affected by issues beyond
the specific game system; for example, the community
starts exchanging information about another game.

• Distributed Game-play Community: This is an extension of
the social game-play community, but it emphasizes the online
multiplayer game in which multiple sessions of the game are
established in different geographical locations.


FIGURE 30.4. Types of game communities: (a) single game-play
community; (b) social game-play community; (c) distributed
game-play community.


The  study  of  game  communities,  especially  out-of-game 
communities,  from  the  perspective  of  education  is  still  very 
much  unexplored.  We  believe  the  potential  of  games  in  educa¬ 
tion  is  not  limited  to  what  is  going  on  in  the  game.  Educators 
could  benefit  by  studying  games  as  a  social  community  because 
games  are  now  becoming  a  culture  that  permeates  the  lives  of 
many, especially the younger generation. Black (2004) investigated
the interactions among participants in a virtual community
of Japanese comic fans, which involves a lot of reading
and writing throughout the site. She examined how the community
of fans helps its members with English language
writing skills and with cross-cultural understanding. In this
section we have pointed out that game communities can emerge
from both single-player and multiplayer games. We believe that
by further studying the social interaction in the game community,
we will be able to use games for learning in a more fruitful
way. In the next section, we apply one of these models of game
communities to a specific knowledge-building scenario and
evaluate it using activity theory.

Activity  theory.  In  this  case  study,  we  demonstrate  how 
activity  theory  can  be  used  to  analyze  an  out-of-game  commu¬ 
nity  around  a  single-player  game  which  is  based  on  construc¬ 
tionist  activities  as  proposed  by  Papert  (1980).  Papert  claimed 
that  even  for  adults,  learning  remains  essentially  bound  to  con¬ 
text,  in  which  knowledge  is  shaped  by  the  use  of  external  sup¬ 
ports.  His  approach  helps  us  understand  how  learning  is  actu¬ 
alized  when  individual  learners  construct  their  own  favorite 
artifacts  or  objects-to-think-with  (Papert,  1980). 

Although  Papert’s  theory  provided  a  solid  framework  for 
understanding  children’s,  and  even  adults’,  ways  of  learning  by 
designing,  it  does  not  offer  a  systematic  framework  for  analyz¬ 
ing  the  construction  activities  within  a  learning  community. 
Analyzing  constructionist  activities  can  be  useful  as  it  could 
help  inform  constructional  design  for  educational  purposes. 
The  most  significant  analysis  includes  the  learning  within  a 
community  as  well  as  the  development  of  an  individual.  We  are 
also  interested  in  finding  out  how  tools  such  as  computers 
help learners construct artifacts and knowledge. Hence, we
would like to draw from the Vygotskian naturalist approach,
which emphasizes human activity systems. Lev Vygotsky
(1930) formulated a theoretical concept that was very different
from the prevailing understanding of psychology, which was
dominated by behaviorism at that time. This new orientation
was a model of tool-mediation and object-orientedness. He
proposed the classic triangle model to demonstrate the idea of
mediation (Fig. 30.5).

In Fig. 30.5, the subject is the individual engaged in the
mediated action, and the mediating artifact or tool can include
physical artifacts and/or prior knowledge of the subject. The
object is the goal or objective of the activity. Although
constructionist learning relies heavily on computational tools,
constructionism does not make the concept of mediation explicit.
Fig. 30.5 shows explicitly that the relationship between the
subject and the object is no longer direct; instead, it is
mediated by the tool. For example, when building a website, the
subject works toward an objective (e.g., adding a table to a web
page) using not only the computer (external tool) but also her
internal understanding of how websites and computers work
(internal tool).

FIGURE 30.5. Vygotsky’s mediation. The mediation tool sits at the
apex of the triangle; the subject and the object form its base.


Leont’ev  (1978)  extended  this  notion  of  activity  to  differen¬ 
tiate  between  an  individual  action  and  a  collective  activity  by 
proposing  a  hierarchy  of  activity  (Table  30.5).  Collective  activity 
is  connected  to  the  object  of  the  whole  community,  of  which 
each  individual  subject  is  often  not  consciously  aware.  An  indi¬ 
vidual  action  is  connected  to  a  conscious  goal.  Below  the  col¬ 
lective  activity  and  individual  action  there  is  the  level  of  opera¬ 
tions  that  are  dependent  on  the  conditions  in  which  the  action 
is  performed.  Thus,  an  activity  system  can  be  analyzed  at  three 
levels:  (a)  the  activity  level  which  is  oriented  toward  the  ob¬ 
ject/objective  and  carried  out  by  the  whole  community;  (b)  the 
action  level  which  is  directed  at  the  individual  goal,  and  (c)  the 
operation  level  which  is  elicited  by  conditions  and  is  performed 
unconsciously. 

This hierarchy is crucial in explaining the learning process
in an activity system. We illustrate it with the example of
learning a foreign language (Table 30.5). The
overall  objective  is  to  be  able  to  engage  in  a  meaningful  conver¬ 
sation.  In  the  beginning,  the  learner  has  to  work  on  the  gram¬ 
mar  and  the  choice  of  words  at  a  conscious  level.  When  the 
learner  has  reached  a  higher  proficiency  level,  these  actions  are 
transformed  into  operations.  The  learner  no  longer  needs  to  se¬ 
lect  appropriate  words  and  check  grammar  rules  deliberately,  as 
these  have  been  learned  thoroughly  and  are  now  operating  un¬ 
consciously.  The  consciousness  of  the  learner  is  now  focused  on 
expressing  himself  properly  depending  on  the  objective  of  the 
conversation.  Grammatical  rules  become  invisible  to  the  learner 
and  he  is  only  selecting  appropriate  goals  to  be  achieved.  There¬ 
fore,  it  can  be  inferred  that  activity  theory  treats  learning  as  the 
shift  from  the  higher  level  (action)  to  the  lower  level  (opera¬ 
tion)  in  the  hierarchy. 

Drawing on work by Vygotsky and Leont’ev, Engeström (2001)
viewed all human activities as contextualized within an
interdependent activity system. Engeström added collective
mediation to Vygotsky’s tool mediation and presented the
triangle model of the activity system (Fig. 30.6).


TABLE 30.5. Hierarchy of Activity

Unit of Analysis   Stimulus     Subject       Language Learning Example
----------------   ----------   -----------   -----------------------------------
Activity           Object       Community     Engage in a meaningful conversation
Action             Goal         Individual    Sentence construction
Operation          Conditions   Unconscious   Word selections, grammar rules

FIGURE 30.6. The triangle activity system diagram.


In  the  diagram,  the  subject  is  the  individual  or  group  se¬ 
lected  as  the  point  of  view  of  the  analysis.  The  object  refers  to 
the  raw  material  or  the  problem  space  at  which  the  activity  is 
directed  and  which  is  transformed  into  outcomes  with  the  help 
of  external  and  internal  tools.  Tools  are  the  concepts,  physical 
tools,  artifacts,  or  resources  that  mediate  a  subject’s  interactions 
with  an  object.  The  community  refers  to  those  with  whom  the 
subject shares the same general object. The division of labor
(DOL) is the classification of tasks among the members of the
community, while the rules are the regulations, norms, and
conventions within the activity system.

Constructionist  learning  can  be  described  and  visualized 
through  this  activity  triangle.  Mediated  by  the  tool  and  the  com¬ 
munity,  the  learner  externalizes  her  initial  stage  of  knowledge 
through  object  construction.  The  individual  externalization  (me¬ 
diated  by  the  tool)  can  be  broken  down  into  actions  and  opera¬ 
tions.  Actions  are  directed  toward  a  personal  goal  and  are  carried 
out  with  careful  deliberation.  For  example  (Fig.  30.7),  in  a  book¬ 
writing  activity,  the  author  (an  expert  word-processor  user)  will 
operate  (e.g.,  type)  the  word  processor  at  the  unconscious  level, 
and  consciously  act  on  the  book  (to  select  appropriate  words, 
and construct meaningful sentences and paragraphs) she is
writing. At a certain point, the author encounters a condition in
the word processor with which she is not familiar, for example,
having to insert a table into the book. Under this new condition,
a breakdown is said to have happened. The conscious effort of the
author is no longer placed on the book itself but on the word
processor (e.g., to achieve the action (insert a table), the
author performs the operation (read the help files)). Once the
author has thoroughly learned how to insert tables, she can again
act on the book consciously, and development is said to have
happened.

FIGURE 30.7. The transformation of individual action-operation
(tool: word processor).
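
The shift between actions and operations described above can be
sketched in a few lines of code. This is only an illustration
under stated assumptions, not a model taken from the study: the
class name Skill, the practice counter, and the threshold of three
repetitions are all hypothetical, and a breakdown is simplified to
resetting the counter.

```python
from dataclasses import dataclass

# Hypothetical cutoff: repetitions needed before an action becomes routine.
PRACTICE_THRESHOLD = 3


@dataclass
class Skill:
    """A single thing the subject does with a tool (illustrative only)."""
    name: str
    repetitions: int = 0

    @property
    def level(self) -> str:
        # 'action' while conscious effort is needed, 'operation' once routine.
        return "operation" if self.repetitions >= PRACTICE_THRESHOLD else "action"

    def practice(self) -> None:
        # Each repetition moves the skill closer to unconscious operation.
        self.repetitions += 1

    def breakdown(self) -> None:
        # An unfamiliar condition brings the skill back to conscious action.
        self.repetitions = 0


# The expert author types without thinking but has never inserted a table.
typing_skill = Skill("typing", repetitions=5)
insert_table = Skill("insert table")

levels = [typing_skill.level, insert_table.level]

# After enough practice, inserting a table also becomes an operation.
for _ in range(PRACTICE_THRESHOLD):
    insert_table.practice()
levels.append(insert_table.level)
```

In this sketch, learning is the move from "action" to "operation",
and a breakdown is the move back, which mirrors the word-processor
example in the text.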

Activity theoretical analysis for constructionist learning.
We conducted a study that used activity theory to analyze a
Wiki-based game community. The goal was to demonstrate the
usefulness of activity theory in researching online communities.
The study examined the collaborative construction activity in a
Wiki-supported community devoted to writing a game guidebook. In
this section, we report experiences and challenges from the
analysis to give some insight into how activity theory can help
online community research.

The  analysis  of  online  communities  can  be  done  through  the 
lenses  of  various  concepts  associated  with  activity  theory,  such 
as  the  levels  of  activity  hierarchy  and  different  perspectives  of 
the  triangle  model  as  explained  in  the  previous  section. 

For instance, we can start our analysis with the most basic
aspect of constructionism by simply examining the relationship
between subjects and the object. Then, we can analyze Vygotsky’s
mediation model of the activity system, consisting of individual
actions and tools. The analysis may also be extended to the whole
community of this system to include emerging rules as well as the
division of labor (DOL) that mediate the community. The focus can
also be placed mainly on the constructionist concept of
externalizing internal meanings onto a sharable artifact through
mediation. More specifically, we can look into these (and other)
aspects:

•  Subject-object:  What  are  the  constructionist  actions  that  act 
on  the  object  and  transform  the  object  into  outcome? 

•  Subject-tool-object:  How  do  actions  shift  to  operations  and 
vice  versa?  How  do  tools  mediate  individual  actions  and  op¬ 
erations?  What  is  the  nature  of  the  mediating  tools?  How  do 
they  support  knowledge  building? 

•  Subject-rules-object and subject-DOL-object: What is the nature
of implicit and explicit rules? How is DOL manifest in the
community? How do rules and DOL support knowledge building?
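
The activity system and the sub-triangles listed above can be
sketched as a small data structure. This is a minimal sketch for
illustration, not part of the study: the class name
ActivitySystem, its field names, the sub_triangle helper, and the
sample values for the Wiki game-guide community are all
assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ActivitySystem:
    """One snapshot of an activity system: the six nodes of the triangle."""
    subject: str
    object_: str  # trailing underscore avoids shadowing the builtin `object`
    tools: list[str] = field(default_factory=list)
    community: list[str] = field(default_factory=list)
    rules: list[str] = field(default_factory=list)
    division_of_labor: dict[str, str] = field(default_factory=dict)

    def sub_triangle(self, mediator: str) -> dict:
        """Return the subject-<mediator>-object view that focuses an analysis."""
        mediators = {
            "tool": self.tools,
            "rules": self.rules,
            "DOL": self.division_of_labor,
        }
        if mediator not in mediators:
            raise ValueError(f"unknown mediator: {mediator!r}")
        return {
            "subject": self.subject,
            "mediator": mediators[mediator],
            "object": self.object_,
        }


# Hypothetical values loosely modeled on the Wiki game-guide community.
wiki = ActivitySystem(
    subject="contributor",
    object_="game guidebook",
    tools=["wiki editor", "prior knowledge of the game"],
    community=["contributor", "other editors"],
    rules=["no abusive language"],
    division_of_labor={
        "contributor": "writes pages",
        "other editors": "review pages",
    },
)

individual_view = wiki.sub_triangle("tool")    # individual mediation
collective_view = wiki.sub_triangle("rules")   # collective mediation
```

Selecting a different mediator corresponds to asking a different
set of the questions above about the same collected data.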

Apart from its focus on both individual and collective
development through action-operation transformation, activity
theory also helps analyze the tools and capture the rules and the
division of labor (DOL) that mediate the actions. These uses need
further explanation in order to differentiate individual
mediation from collective mediation. Taking our study of the game
community as an example, individual mediation emphasizes how a
user uses the tool to write the game guide, without taking into
account how other users act in the community. In other words, it
concerns the affordances of the tool: what the tool supports an
individual in doing.

Collective mediation is about the community and consists of two
major components: rules and DOL. Rules define what can and cannot
be done in a community. This should not be confused with the
affordance of the tool. The tool might afford certain actions,
such as writing in abusive language, that the rules ban. DOL is
self-explanatory: it is how the workload is divided among the
users in a community.

Activity theory appears to be a promising framework because it
provides an analytical lens for interpreting the data.
Researchers can examine the collected data from several
perspectives by focusing on different sub-triangles of the
activity system diagram. It also helps us examine the learning
process: how learning occurs individually and collectively
through the transformation from action to operation in the
hierarchy of activity. Furthermore, individual and collective
aspects are given equal importance. Activity theory informs the
development of the whole community as well as the individual’s
development. It explains how individual development contributes
to community growth, and vice versa.

Based  on  our  study,  individual  actions  help  sharpen  the  me¬ 
diation  tool,  while  collective  actions  bring  about  new  rules  or 
refine existing rules that mediate the collective action. In short,
activity theory is useful for analyzing the community in the
following ways:

•  It helps in understanding the individual mediation process:
subject-tool-object;

•  It  clearly  presents  the  collective  mediation  process:  subject- 
community-object; 

•  It  reveals  the  emerging  rules  and  division  of  labor  in  the 
community. 

In a Wiki space, knowledge is socially constructed; it is created
individually with tools and negotiated and agreed to within a
community based on emerging rules and division of labor (DOL).
It starts as a single unit of information (a page in this context),
grows organically, and evolves into a complex and well-structured
set of knowledge. From our findings, we infer what contributes to
the development of the community:

•  Users share some historical background: they already share
some of the tools and rules before joining the community, and
they share an interest in the same game;

•  Users share the same object (goal), which is to build a game
guidebook;

•  A user’s individual action: this goal-oriented individual action
triggers negotiations that lead to the growth of the space;

•  The community’s agreement on the object: beyond sharing the
same object, the community must be able to negotiate and agree
on it;

•  Tools that support these actions and negotiations;

•  Emerging rules that coordinate the activity;

•  DOL that divides the responsibilities.

The evolution of a knowledge-building community needs
more than a group of devoted users who share the same object.
It involves negotiation and agreement among the users on
the object. Although every user tends to act toward her or his
own goal, it takes the compromise of the entire community to
agree on the object.

Activity theory helps online community researchers identify
design issues in two areas: the software applications that serve
as tools, and the social interaction within the community around
those tools. It also reveals how the community-building process
develops at the individual and collective levels, through shifts
in the activity hierarchy. We thus believe that analyzing online
communities with activity theory will yield fruitful results and
give insights into online community design.

Finally, we reiterate that activity theory is not limited to what
is presented in the triangle diagram proposed by Engeström.
Although his model is tremendously useful, it overlooks several
significant concepts of activity theory. One major limitation is
its static representation of activity. The triangle diagram
represents only a snapshot of a particular time, which makes it
hard to analyze the activity across time. Engeström’s model is
intended to be open so that it can be used in various domains,
but this openness has posed serious difficulty for practitioners,
and some researchers have started to operationalize the model to
make it more practical as a day-to-day methodology (Korpela,
Soriyan, & Olufokunbi, 2000; Barab, Hay, & Yamagata-Lynch, 2001;
Mwanza, 2002). We hope that, with the increasing attention drawn
by activity theory, the theory will be expanded and
operationalized to fit the purposes of HCI research in general
and online community research in particular.


DISCUSSION  AND  CONCLUSIONS 


This chapter looked at the definitions of computer-mediated
communication and online communities and pointed out the
multidisciplinary nature of the definitions and of the ways online
communities are analyzed and studied. We introduced the different
types of CMC and online communities and paid special attention to
Wikis and game-based communities. These two relatively new types
of communities exhibit new modes of interaction that are worth
studying further. We provided an overview of some of the most
commonly used techniques and theoretical frameworks for analyzing
online communities. Then, we used two case studies to demonstrate
the use of those techniques. Although both of our case studies
focus on e-learning communities, online communities are used in a
multitude of domains. Empathic online communities, for example,
seem to be gaining a lot of momentum, with people supporting each
other. Online communities are widely used for general
entertainment and current affairs discussions as well.
Professional communities (e.g., business, art, industry specific)
are also being formed within the online environment.

The study of online communities is flourishing, primarily due
to the increasing popularity of online services and tools that
support the construction of such networks of users. The study of
such complex communities requires a synthesis of methodologies
and theoretical foundations. In our first case study, we
demonstrated how Social Network Analysis can be used to model and
visualize online community interactions; in the second case
study, theoretical foundations based on activity theory were
applied to the domain of game-based communities.

In the early days of Internet technology, online communities
were used solely for synchronous (often just chat) or
asynchronous (most commonly discussion board) textual
interaction. This is no longer the case, as people now interact
in online communities using new and more innovative interaction
paradigms. Game-based communities, for instance, allow users to
represent themselves visually (using 3D avatars) in virtual
environments that are often depicted as fantastic and unrealistic
worlds. Users can navigate, change, or even create the virtual
world (and thus the context of the community) they interact with.
As such, the traditional boundary between author and reader is
blurred, because the designer (author) is no longer the only one
who determines what the system is like. Rather, the participants
(readers) co-construct the virtual world: they no longer merely
use the tool to communicate; they create and interact with a
virtual environment through which they can meet, socialize, and
work with others. Preece (2000) was the first to identify and
stress this important social dimension of online communities.
Now, this online sociability is becoming more mature and more
central to the online communities with which we interact.

This new paradigm of interaction poses new challenges for
researchers and practitioners. Appropriate representation of
emotions and other social and cultural cues in online communities
is becoming even more important. Textual interfaces attempted to
represent these social cues with emoticons. How can this, for
example, be transferred to the domain of virtual game
communities? Do we want our avatars to behave like us, or do we
want them to have alternate, illusory identities with
extraordinary abilities or unusual behaviors? How can we come up
with new interfaces and new interaction paradigms that facilitate
this better and cope with the new demands of users in such online
communities?

A second area that is gaining popularity is research on online
communities, or Internet research. Content analysis and
query-based techniques were sufficient when all we wanted was a
good first impression of the interactions taking place in an
online textual community. This is no longer the case. Studying
the sociability of game-based communities, for example, requires
a synthesis of more techniques. We might want to immerse
ourselves in the community, engage in long-term ethnographic
studies of its environment, and gain first-hand experience of
what happens in it. We might have to employ social-cultural
theoretical frameworks (e.g., Activity Theory) to better
understand how people in these inherently social environments
behave differently from the way they behave in the real world. We
might have to quantify, using Social Network Analysis for
example, the dynamics of the networks and subgroups that evolve
around these communities.

The possibilities for online communities are unlimited with the
emergence of more mature and imaginative virtual worlds. Only
by treating them as equal to their physical counterparts, which
encapsulate the practices of economy, politics, ideology, and
everyday life, can we study them with intellectual rigor.




References 


Ang, C. S., Zaphiris, P., & Wilson, S. (2005). Social interaction in game
communities and second language learning. The 19th British HCI
Group Annual Conference, Edinburgh, UK.

Archer, W., Garrison, R. D., Anderson, T., & Rourke, L. (2001). A
framework for analyzing critical thinking in computer conferences.
European Conference on Computer-Supported Collaborative Learning,
Maastricht, Netherlands.

Aronsson, L. (2002). Operation of a large scale, general purpose Wiki
website. Experience from susning.nu’s first nine months in service. In
J. A. Carvalho, A. Hübler, & A. A. Baptista (Eds.), Proceedings of the
6th International ICCC/IFIP Conference on Electronic Publishing
(pp. 27-37). Karlovy Vary, Czech Republic: Verlag für Wissenschaft
und Forschung.

Barab,  S.  A.,  Hay,  K.  E.,  &  Yamagata-Lynch,  L.  C.  (2001).  Constructing  net¬ 
works  of  action-relevant  episodes:  An  In  Situ  research  methodol¬ 
ogy.  The  Journal  of  Learning  Sciences,  10(1  &  2),  63-112. 

Bates, A. W. (1995). Technology, open learning and distance education.
London: Routledge.

Beidernikl,  G.,  &  Paier,  D.  (2003).  Network  analysis  as  a  tool  for  assess¬ 
ing  employment  policy.  Proceedings  of  the  Evidence-Based  Policies 
and  Indicator  Systems  Conference,  London. 

Black, R. W. (2004). Anime-inspired affiliation: An ethnographic inquiry
into  the  literacy  and  social  practices  of  English  language  learners 
writing  in  the  fan-fiction  community.  Paper  presented  at  2004  meet¬ 
ing  of  American  Educational  Research  Association,  San  Diego,  CA. 

Blomkvist,  S.  (2002).  Persona — an  overview,  Uppsala  University. 
Retrieved November 22, 2004, from http://www.it.uu.se/edu/course/
homepage/hcinet/ht04/library/docs/Persona-overview.pdf

Borgatti, S. (2000). What is social network analysis. Retrieved
November 9, 2004, from http://www.analytictech.com/networks/whatis.htm

Burge, E. L., & Roberts, J. M. (1993). Classrooms with a difference: A
practical guide to the use of conferencing technologies. Ontario:
University of Toronto Press.

CAP, University of Warwick (2004). E-Guide: Using computer mediated
communication in learning and teaching. Retrieved November 8,
2004, from http://www2.warwick.ac.uk/services/cap/resources/eguides/
cmc/cmclearning/

Castronova,  E.  (2001,  December).  Virtual  worlds:  A  first-hand 
account of market and society on the cyberian frontier. CESifo
Working  Paper  Series  No.  618.  Fullerton:  Center  for  Economic 
Studies  and  Ifo  Institute  for  Economic  Research,  California  State 
University. 

Choi,  J.  H.,  &  Danowski,  J.  (2002).  Cultural  communities  on  the  net — 
Global  village  or  global  metropolis:  A  network  analysis  of  Usenet 
newsgroups. Journal of Computer-Mediated Communication, 7(3).
Retrieved March 8, 2007, from http://jcmc.indiana.edu/vol7/issue3/
choi.html

Cook, D., & Ralston, J. (2003). Sharpening the focus: Methodological
issues in analyzing on-line conferences. Technology, Pedagogy and
Education, 12(3), 361-376.

Cooper,  A.  (1999).  The  Inmates  are  running  the  asylum.  Indianapolis, 
IN:  SAMS,  a  division  of  Macmillan  Computer  Publishing. 

Cyram. (2004). Netminer for Windows. Retrieved March 8, 2007, from
http://netminer.com

December, J. (1997). Notes on defining of computer-mediated
communication. Computer-Mediated Communication Magazine, 3(1).
Retrieved March 8, 2007, from http://www.december.com/cmc/mag/
1997/jan/december.html

Dekker, A. H. (2002, January-February). A category-theoretic approach
to social network analysis. Proceedings of Computing: The
Australasian Theory Symposium (CATS). Melbourne, Australia.


Ducheneaut, N., Moore, R. J., & Nickell, E. (2004). Designing for
sociability in massively multiplayer games: An examination of the
“Third Places” of SWG. Other Players Conference, Denmark.

Electronic Arts (2005). Ultima online. Retrieved November 8, 2005, from
http://www.uo.com/

Emigh, W., & Herring, S. C. (2005). Collaborative authoring on the web:
A genre analysis of online encyclopedias. Proceedings of the 38th
Hawai’i International Conference on System Sciences (HICSS-38).
Los Alamitos, CA: IEEE Press.

Engeström, Y. (2001). Expansive learning at work: Toward an activity
theoretical reconceptualisation. Journal of Education and Work, 14(1),
133-156.

Fahy, P. J. (2003). Indicators of support in online interaction.
International Review of Research in Open and Distance Learning, 4(1).
Retrieved March 8, 2007, from http://www.irrodl.org/index.php/
irrodl/article/view/129/209

Fahy, P. J., Crawford, G., & Ally, M. (2001). Patterns of interaction in a
computer conference transcript. International Review of Research
in Open and Distance Learning, 2(1). Retrieved March 8, 2007, from
http://www.irrodl.org/irrodl/article/view/36/73

Ferris, P. (1997). What is CMC? An overview of scholarly definitions.
Computer-Mediated Communication Magazine, 4(1). Retrieved
March 8, 2007, from http://www.december.com/cmc/mag/1997/jun/
december.html

Fortner, R. S. (1993). International communication: History, conflict,
and control of the global metropolis. Belmont, CA: Wadsworth.

Galotti, K. M., Clinchy, B. M., Ainsworth, K., Lavin, B., & Mansfield, A. F.
(1999). A new way of assessing ways of knowing: The attitudes towards
thinking and learning survey (ATTLS). Sex Roles, 40(9 & 10), 745-766.

Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying On-line
Social Networks. In S. Jones (Ed.), Doing internet research.
Thousand Oaks, CA: Sage.

Gunawardena,  C.,  Lowe,  C.,  &  Anderson,  T.  (1997).  Analysis  of  a  global 
online  debate  and  the  development  of  an  interaction  analysis  model 
for  examining  social  construction  of  knowledge  in  computer  confer- 
encing.  Journal  of  Educational  Computing  Research,  17(1),  397-431. 

Halvorsen,  R.  F.  (2005).  TheStateOfWiki.  Retrieved  September  22,  2005, 
from  http://heim.ifi.uio.no/~aurilla/web/rune.pdf 

Hanneman, R. A. (2001). Introduction to social network methods.
Retrieved November 9, 2004, from http://faculty.ucr.edu/~hanneman/
SOC157/TEXT/TextIndex.html

Heeren, E. (1996). Technology support for collaborative distance
learning. Doctoral dissertation, University of Twente, Enschede.

Henri, F. (1992). Computer conferencing and content analysis. In A. R.
Kaye (Ed.), Collaborative learning through computer conferencing:
The Najaden Papers (pp. 117-136). Berlin: Springer-Verlag.

Jones, G. B. (2003). Blogs and Wikis: Environments for on-line
collaboration. Language Learning & Technology, 7(2), 12-16.

Jones, S. (1995). Computer-mediated communication and community.
Computer-Mediated Communication Magazine, 2(3). Retrieved
March 8, 2007, from http://www.december.com/cmc/mag/1995/mar/
jones.html

Kelly, R. V. (2001). Massively multiplayer online role-playing games:
The people, the addiction and the playing experience. McFarland
& Company U.S.

Kittowski, F. E., & Kohler, A. (2002). Knowledge creating communities in
the context of work processes. ACM SIGGROUP Bulletin, 23(3), 8-13.

Kolbert, E. (2001, May 28). Pimps and dragons: How an online world
survived a social breakdown. The New Yorker. Retrieved February
15, 2005, from http://www.newyorker.com/fact/content//010528fa_FACT




Korpela,  M.,  Soriyan,  H.  A.,  &  Olufokunbi,  K.  C.  (2000).  Activity  analysis 
as  a  method  for  information  systems  development.  Scandinavian 
Journal  of  Information  Systems,  12,  191-210. 

Korzenny, F. (1978). A theory of electronic propinquity: Mediated
communication in organizations. Communication Research, 5, 3-23.

Krebs, V. (2004). An introduction to social network analysis. Retrieved
November 9, 2004, from http://www.orgnet.com/sna.html

Kuo,  J.  (2004).  Online  videogames  in  mental  health.  Paper  presented  at 
the  annual  meeting  of  the  American  Psychiatry  Association,  New  York. 

Kypros-Net  Inc.  (2005).  The  world  of  Cyprus.  Retrieved  March  8,  2007, 
from  http://kypros.org 

Leont’ev, A. N. (1978). Activity, consciousness, and personality. New York:
Prentice-Hall.

Madden,  M.,  &  Rainie,  L.  (2003).  Pew  Internet  &  American  Life  Project 
Surveys.  Pew  Internet  &  American  Life  Project,  Washington,  DC. 

Mader, S. (2004). OpenSpectrum: A Wiki-based learning tool for
Spectroscopy that anyone can edit. Winter 2005 CONFCHEM: Trends
and New Ideas in Chemical Education, virtual conference. Retrieved
March 8, 2007, from http://www.ched-ccce.org/confchem/2005/a/
index.html

Mason, R. (1991). Analyzing computer conferencing interactions.
Computers in Adult Education and Training, 2(3), 161-173.

McLoughlin, C. (1996, January 21-25). A learning conversation: Dynamics,
collaboration and learning in computer mediated communication.
In C. McBeath & R. Atkinson (Eds.), Proceedings of the Third
International Interactive Multimedia Symposium, Promaco
Conventions, Perth, Western Australia, 267-273.

McLuhan,  M.  (1964).  Understanding  media:  The  extension  of  man. 
New  York:  McGraw  Hill. 

Metcalfe, B. (1992). Internet fogies to reminisce and argue at Interop
Conference (p. 45). InfoWorld.

Mwanza, D. (2002). Towards an activity-oriented method for HCI
research and practice. PhD Thesis, Open University.

NCsoft (2005). Lineage. Retrieved November 8, 2005, from
http://www.lineage.com/

NUA  Internet  Surveys  (2004).  Retrieved  October  20,  2004,  from  http:// 
www.nua.ie/surveys/how_many_online/index.html 

Papert, S. (1980). Mindstorms: Children, computers, and powerful
ideas. New York: Basic Books.

Preece, J. (2000). Online Communities: Designing usability, supporting
sociability. Chichester, UK: John Wiley and Sons.

Preece, J., & Maloney-Krichmar, D. (2003). Online communities: Focusing
on sociability and usability. In J. Jacko & A. Sears (Eds.),
Handbook of Human-Computer Interaction (pp. 596-620). Mahwah, NJ:
Lawrence Erlbaum Associates.

Preece, J., Rogers, Y., & Sharp, H. (2002). Interaction design: Beyond
human-computer interaction. New York: John Wiley & Sons.

Reiser,  R.  A.,  &  Gagne,  R.  M.  (1983).  Selecting  media  for  instruction. 
Englewood  Cliffs,  NJ:  Educational  Technology  Publications. 

Rheingold, H. (1993). The virtual community: Homesteading on the
electronic frontier. Reading, MA: Addison-Wesley.

Rice, R. (1994). Network analysis and computer mediated communication
systems. In S. Wasserman & J. Galaskiewicz (Eds.), Advances in social
network analysis. Newbury Park, CA: Sage.

Rice,  R.  E.,  Grant,  A.  E.,  Schmitz,  J.,  &  Torobin,  J.  (1990).  Individual  and 
network  influences  on  the  adoption  and  perceived  outcomes  of 
electronic  messaging.  Social  Networks,  12,  17-55. 

ScotCIT.  (2003).  Enabling  large-scale  institutional  implementation  of 
communications  and  information  technology  (ELICIT).  Using  Com¬ 


puter  Mediated  Conferencing.  Retrieved  November  2,  2004,  from 
http://www.elicit.scotcit.ac.uk/modules/cmcl/welcome.htm 
Scott,  J.  (2000).  Social  network  analysis:  A  handbook  (2nd  ed.).  Lon¬ 
don:  Sage. 

Sony  (2005).  Star  Wars  Galaxies.  Retrieved  November  8,  2005,  from 
http://starwarsgalaxies.station.sony.com/ 

Squire,  K.,  &  Steinkeuhler,  C.  (in  press):  Generating  cyberculture/s:  The 
case  of  Star  Wars  Galaxies.  In  D.  Gibbs,  &  K.  L.  Krause  (Eds.),  Cyber- 
lines:  Languages  and  cultures  of  the  Internet  (2nd  ed.) .  Albert  Park, 
Australia:  James  Nicholas  Publishers. 

Suler,  J.  (2004).  The  final  showdown  between  in-person  and  cyberspace 
relationships.  Retrieved  November  3,  2004,  from  http://wwwl. rider 
.edu/~suler/psycyber/showdown.html 
Taylor,  T.  L.  (2002).  Whose  game  is  this  anyway?:  Negotiating  corporate 
ownership  in  a  virtual  world.  Proceedings  of  the  CGDC  Conference, 
227-242.  Tampere,  Finland 

U.S.  Census  Bureau  (2004).  Global  population  profile  2002 .  Retrieved 
October  20,  2004,  from  http://www.census.gov/ipc/www/wp02.html 
Usability  Net.  (2003).  UsabilityNet  (2003)  Interviews.  Retrieved  Decem¬ 
ber  3,  2004,  from  http://www.usabilitynet.org/tools/interviews.html 
Vaknin,  S.  (2002).  TrendSiters:  Games  people  play.  Electronic  Book  Web. 

Retrieved  February  23,  2002,  from  http://12.108.175.91/ebookweb 
Viegas,  F.  B.,  Wattenberg,  M.,  &  Dave,  K.  (2004).  Studying  cooperation 
and  conflict  between  authors  with  history  flow  visualizations.  In  E. 
Dykstra-Erickson  &  M.  Tscheligi  (Eds.),  Proceedings  of  ACM  CHI 
2004  Conference  on  Human  Eactors  in  Computing  Systems,  (pp. 
575-582).  Vienna,  Austria.  ACM  Press 
Vygotsky,  L.  (1930).  Mind  and  society.  Cambridge,  MA:  Harvard  Univer¬ 
sity  Press. 

Wallace,  P  (1999).  The  psychology  of  the  Internet .  Cambridge,  UK:  Cam¬ 
bridge  University  Press. 

Wang,  H.  C.,  Lu,  C.  H.,  Yang,  J.  Y,  Hu,  H.  W,  Chiou,  G.  E,  Chiang,  Y  T, 
et  al.  (2005).  An  empirical  exploration  of  using  Wiki  in  an  English  as 
a  second  language  course.  Proceedings  of  5th  IEEE  International 
Conference  on  Advanced  Learning  Technologies  (ICALT  2005), 
Kaohsiung,  Taiwan,  155-157. 

Wellman,  B.  (1982).  Studying  personal  communities.  In  P.  M.  N.  Lin 
(Ed.),  Social  structure  and  network  analysis  (pp.  61-80).  Beverly 
Hills,  CA:  Sage. 

Wellman,  B.  (1992).  Which  types  of  ties  and  networks  give  what  kinds 
of  social  SMpponl  Advances  in  Group  Processes,  9,  207-235. 
Wikipedia  (2005).  Wikipedia:  The  free  encyclopedia.  Retrieved  October 
1,  2005,  from  http://en.wikipedia.org/wiki/Wikipedia 
Yee,  N.  (2005,  May).  The  demographics  and  derived  experiences  of 
users  of  massively  multi-user  online  graphical  environments.  Paper 
presented  at  The  55th  Internal  Communication  Association  Annual 
Conference,  New  York. 

Zaphiris,  P,  &  Zacharia,  G.  (2001).  Design  methodology  of  an  online 
Greek  language  course.  Ext.  Abstracts  CHI  2001  conference .  ACM 
Press. 

Zaphiris,  E,  Zacharia,  G.,  &  Rajasekaran,  M.  (2003).  Distributed  con¬ 
structionism  through  participatory  design.  In  Ghaoui,  C.  (Ed.) 
E-education  applications:  Human  factors  and  innovative 
approaches  (pp.  164-179).  London,  UK:  Idea  Group. 

Zhu,  E.  (1996).  Meaning  negotiation,  knowledge  construction,  and  men¬ 
toring  in  a  distance  learning  course.  In  Proceedings  of  Selected 
Research  and  Development  Presentations  at  the  1996 National  Con¬ 
vention  of  the  Association  for  Educational  Communications  and 
Technology,  Indianapolis,  IN. 


VIRTUAL  ENVIRONMENTS 


Kay  M.  Stanney 

University  of  Central  Florida 

Joseph V. Cohn

Naval  Research  Laboratory 


Introduction
System Requirements
    Hardware Requirements
        Multimodal I/Os
        Tracking systems
        Interaction techniques
        Augmented cognition techniques
    Software Requirements
        Modeling
        Autonomous agents
        Networks
Design and Implementation Strategies
    Cognitive Aspects
        Multimodal interaction design
        Perceptual illusions
        Navigation and wayfinding
    Content Development
    Products Liability
    Usage Protocols
Health and Safety Issues
    Cybersickness, Adaptation, and Aftereffects
    Social Impact
Virtual Environment Usability Engineering
    Usability Techniques
    Sense of Presence
    Virtual Environment Ergonomics
Application Domains
    VE as a Selection and Training Tool
    VE as Entertainment Tool
    VE as a Medical Tool
        Psychological/cognitive rehabilitation applications
        Motor rehabilitation applications
    VE as a System Acquisition Tool
Conclusions
Acknowledgments
References




INTRODUCTION 


Interactive computing has evolved through the years from cryptic
command-based interfaces to a collection of task-based applications
to ecologically valid immersive environments, each advance dissolving
more of the barrier between users and their desired actions. To many,
virtual environments (VEs) represent the panacea of interactive
computing. Such environments immerse users in realistic settings,
allowing them to engage in an intuitive and intimate manner with
their digital universe. Such capability affords the opportunity to
“learn by doing,” “train like we fight,” and “involve me and I
understand.” While early VEs were low in resolution and sluggish in
responsiveness, advances in display and tracking technology have
largely resolved these issues such that today VE applications abound
(Davies, 2002). Yet, considerable HCI research and development is
required to resolve lingering issues, such as cybersickness and
usability, if VE technology is to be openly embraced by users. This
chapter reviews the current state of the art in VE technology,
provides design and implementation strategies, discusses health and
safety concerns and potential countermeasures, and presents the
latest in VE usability engineering approaches. Current efforts in a
number of application domains are reviewed. The chapter should enable
readers to better specify design and implementation requirements for
VE applications and prepare them to use this advancing technology in
a manner that minimizes health and safety concerns.


SYSTEM  REQUIREMENTS 


Virtual environments provide multimodal computer-generated
experiences, which are driven by the hardware and software used to
generate a virtual world (see Fig. 31.1). Hardware interfaces consist
primarily of the

•  Interface  devices  used  to  present  multimodal  information 
and  sense  the  VE; 

•  Tracking  devices  used  to  identify  head  and  limb  position  and 
orientation; 

•  Interaction  techniques  that  allow  users  to  navigate  through 
and  interact  with  the  virtual  world. 

Software  interfaces  include  the 

•  Modeling  software  used  to  generate  VEs; 

•  Autonomous  agents  that  inhabit  VEs; 

•  Communication  networks  used  to  support  multi-user  virtual 
environments. 


FIGURE 31.1. Hardware and software requirements for virtual environment generation.

Hardware Requirements

Virtual environments require very large physical memories,
high-speed processors, high-bandwidth mass-storage capacity, and
high-speed interface ports for interaction devices (Durlach & Mavor,
1995). These requirements are easily met by today’s high-speed,
high-bandwidth computing systems, many of which have surpassed the
gigahertz barrier (e.g., AMD’s Athlon, Intel’s Pentium IV and
Itanium, Sun’s UltraSPARC IV, IBM’s Power5). The future looks even
brighter, with promises of massive parallelism in quantum computing,
which will allow tomorrow’s computing systems to be exponentially
faster than their ancestors. With the rapidly advancing ability to
generate complex and large-scale virtual worlds, hardware advances in
multimodal input/output (I/O) devices, tracking systems, and
interaction techniques are needed to support generation of
increasingly engaging virtual worlds. In addition, the coupling of
augmented cognition and VE technologies can lead to substantial gains
in the ability to evaluate the effectiveness of VE applications.

Multimodal I/Os. To present a multimodal VE (see chapter 14),
multiple devices are used to present information to VE users. Stanney
and Zyda (2002) suggested that the single largest challenge in
presenting multimodal VE applications lies in the advances required
in peripheral computer connections. Currently, the serial ports used
to connect position trackers and other peripheral devices are
typically those designed for character input rather than high-speed
data transfer. This creates an input port speed problem that must be
resolved if multimodal VE systems are to effectively present
information to users.
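
The port-speed concern can be made concrete with a rough calculation. The sketch below estimates the bandwidth a position tracker needs and the sample rate a serial link can sustain. All of the numbers are illustrative assumptions rather than figures from this chapter: six 4-byte values per sample (position plus orientation), 120 samples per second, and 10 bits on the wire per byte (one start bit, eight data bits, one stop bit).

```python
def required_bits_per_second(values_per_sample, bytes_per_value,
                             samples_per_second, bits_per_byte_on_wire=10):
    """Bandwidth needed to stream tracker samples over a serial link.

    bits_per_byte_on_wire defaults to 10, assuming 8N1 framing
    (1 start bit + 8 data bits + 1 stop bit).
    """
    bytes_per_sample = values_per_sample * bytes_per_value
    return bytes_per_sample * samples_per_second * bits_per_byte_on_wire


def max_sample_rate(baud, values_per_sample, bytes_per_value,
                    bits_per_byte_on_wire=10):
    """Highest sample rate (samples/s) a link of the given baud rate can carry."""
    bits_per_sample = values_per_sample * bytes_per_value * bits_per_byte_on_wire
    return baud / bits_per_sample


if __name__ == "__main__":
    # Hypothetical 6DOF tracker: x, y, z, roll, pitch, yaw as 4-byte values.
    print(required_bits_per_second(6, 4, 120))   # bandwidth needed at 120 Hz
    print(max_sample_rate(9600, 6, 4))           # what a 9600-baud port allows
    print(max_sample_rate(115200, 6, 4))         # what a 115200-baud port allows
```

Under these assumptions a single tracker needs 28,800 bits per second, while a 9,600-baud port can deliver only 40 samples per second, which illustrates why ports designed for character input become a bottleneck once several devices must be serviced.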

In terms of peripheral devices, the one that has received the
greatest attention, both in hype and disdain, is almost certainly the
head-mounted display (HMD). One benefit of HMDs is their compact
size: an HMD coupled with a head tracker can provide a visual
experience similar to that of the multiple bulky displays associated
with Spatially Immersive Displays (SIDs) and desktop solutions. There
are three main types of HMDs: monocular (i.e., one display source),
biocular (i.e., two displays with separate display and optics paths
that show one image), and binocular (i.e., stereoscopic viewing via
two image generators) (Vince, 2004). HMDs generally use CRT or LCD
image sources, with CRTs typically providing higher resolution,
although at the cost of increased weight. When coupled with tracking
devices, HMDs can be used to present 3D visual scenes that are
updated as a user moves his or her head about a virtual world.
Although this often provides an engaging experience, poor optics,
sensorial mismatches, and slow update rates mean that these devices
are also often associated with adverse effects such as eyestrain and
nausea (May & Badcock, 2002). In addition, while HMDs have come down
substantially in weight, rendering them more suitable for extended
wear, they are still hindered by cumbersome designs, obstructive
tethers, suboptimal resolution, and insufficient fields of view.
These shortcomings may be the reason why, in a recent review of HMD
devices, more than a third had been discontinued by their
manufacturers (Bungert, 2005). Of the HMDs available, there are
several low- to mid-cost models, which are relatively lightweight and
provide a horizontal field of view and resolution far exceeding those
of predecessor systems.


Low-technology stereo viewing VE display options include anaglyph
methods, where a viewer wears glasses with a differently colored
filter over each eye; parallel or cross-eyed methods, in which right
and left images are displayed adjacently (parallel or crossed),
requiring the viewer to actively fuse the separate images into one
stereo image; parallax stereogram methods, in which an image is made
by interleaving columns from the left-eye and right-eye perspective
images of a 3D scene; polarization methods, in which the images for
the left and right eyes are projected on a plane through two
orthogonal linearly polarizing filters (e.g., the right image is
polarized horizontally; the left is polarized vertically) and glasses
with polarization filters are donned to see the 3D effect; Pulfrich
methods, in which an image of a scene moves sideways across the
viewer’s field of view and one eye is covered by a dark filter so
that the darkened image reaches the brain later, causing stereo
disparity; and shutter glass methods, in which images for the right
and left eyes are displayed in quick alternating sequence and special
shutter glasses are worn that “close” the right or left eye at the
correct time (Halle, 1997; Vince, 2004). All of these low-technology
solutions are limited in their resolution and in the maximum number
of views that they can display, and they are clunky to implement;
they can also be associated with pseudoscopic images (i.e., the depth
of an object can appear to flip inside out).
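
Two of these methods reduce to simple image operations, which the following sketch illustrates on images stored as nested lists. It is a minimal illustration under stated assumptions, not a description of any particular product: the function names are hypothetical, the anaglyph assumes red-cyan glasses with the red filter over the left eye, and the stereogram assumes that even-numbered columns are directed to the left eye.

```python
def _check_same_shape(left, right):
    """Both eye images must have the same number of rows and columns."""
    if len(left) != len(right) or any(
        len(l_row) != len(r_row) for l_row, r_row in zip(left, right)
    ):
        raise ValueError("left and right images must have the same shape")


def interleave_columns(left, right):
    """Parallax-stereogram style composite.

    Even columns come from the left-eye image and odd columns from the
    right-eye image (an assumed convention; the barrier or lens sheet in
    front of the display decides which eye sees which columns).
    """
    _check_same_shape(left, right)
    return [
        [l_row[x] if x % 2 == 0 else r_row[x] for x in range(len(l_row))]
        for l_row, r_row in zip(left, right)
    ]


def red_cyan_anaglyph(left, right):
    """Anaglyph composite for pixels given as (r, g, b) tuples.

    The red channel is taken from the left-eye image and the green and
    blue channels from the right-eye image (assumed red-cyan glasses).
    """
    _check_same_shape(left, right)
    return [
        [(l_px[0], r_px[1], r_px[2]) for l_px, r_px in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]
```

Both composites also show where the resolution limits noted above come from: the interleaved image gives each eye only half of the columns, and the anaglyph gives each eye only part of the color information.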

Other options in visual displays include Spatially Immersive Displays
(SIDs) (i.e., displays that surround viewers physically with
panoramic large field-of-view imagery generally projected via fixed
front or rear projection display units; Majumder, 2003), desktop
stereo displays, and volumetric displays that fill a volume of space
with a “floating” image (Halle, 1997). Examples of SIDs include the
Cave Automatic Virtual Environment (CAVE) (Cruz-Neira, Sandin, &
DeFanti, 1993), ImmersaDesk, PowerWall, InfinityWall, and VisionDome
(Majumder, 1999). Issues with SIDs include a stereo view that is
correct for only one or a few viewers, noticeable overlaps between
adjacent projections, and image warp on curved screens. Desktop
display systems have advantages over SIDs because they are smaller,
easier to configure in terms of mounting cameras and microphones,
easier to integrate with gesture and haptic devices, and more readily
provide access to conventional interaction devices, such as mice,
joysticks, and keyboards. Issues with such displays include stereo
that is accurate for only one viewer and a limited display volume.
Volumetric displays are advantageous because imagery can be seen from
a multitude of viewing angles, generally without the need for
goggles. A recent example of this type of display is the Perspecta
Spatial 3D, which projects images onto a rotating flat screen inside
a glass dome (Favalora et al., 2003). Issues with volumetric displays
include low resolution and the tendency for transparent images to
lose interposition cues. Also, view-independent shading of objects is
not possible with volumetric displays, and current solutions do not
exhibit arbitrary occlusion by interposition of objects (Halle,
1997).

The way of the future seems to be wearable computer displays (e.g.,
Microvision, MicroOptical), which can incorporate miniature LCDs
directly into conventional eyeglasses (Tredennick & Shimamoto, 2005).
If designed effectively, these devices should eliminate the tethers
and awkwardness of current designs, while enlarging the field of view
and enhancing resolution. Stanney and Zyda (2002, p. 3) suggested
that “with advances in wireless and laser technologies and
miniaturization of LCDs, during the next decade visual display
technology should realize the substantial gains necessary to provide
high fidelity virtual imagery in a lightweight non-cumbersome
manner.”

When virtual environments provide audio (see chapter 9), the
interactive experience is generally greatly enhanced (Shilling &
Shinn-Cunningham, 2002). Audio can be presented via spatialized or
nonspatialized displays. Just as stereo visual displays are a
defining factor for VE systems, so are “interactive” spatialized
audio displays (i.e., those with “on-the-fly” positioning of sounds).
VRSonic’s SoundScape3D (http://www.vrsonic.com/), Aureal’s A3D
Interactive (http://www.a3d.com/), and AuSIM3D (http://ausim3d.com/)
are examples of positional 3D audio technology. Recently there have
been promising developments in new sound modeling paradigms (e.g.,
VRSonic’s ViBe technology) and sound design principles that will
hopefully lead to a new generation of tools for designing effective
spatial-audio environments (Fouad, 2004; Fouad & Ballas, 2000; Hahn,
Fouad, Gritz, & Lee, 1998).

Developers must decide if sounds should be presented via headphones
or speakers. For nonspatialized audio, most audio characteristics
(e.g., timbre, relative volume) are generally considered to be
equivalent whether projected via headphones or speakers. This is not
so for spatialized audio, in which the information must generally be
projected through headphones in order to exploit subtle nuances.
However, spatialized audio headphones tend to be more expensive than
speakers, and they also tend to require customization to an
individual listener for optimum effect. Speaker systems have a larger
footprint and are sensitive to room acoustics, and the listener must
be placed in the “sweet spot” (i.e., ideal listening position) for
the proper 3D audio effect to be realized (Shilling &
Shinn-Cunningham, 2002). With true 3D audio, a sound can be placed in
any location, right or left, up or down, near or far, via the use of
a head-related transfer function (HRTF) to represent the manner in
which sound sources change as a listener moves his or her head
(Begault, 1994; Butler, 1987; Cohen, 1992). Recent advances have
allowed for the development of personalized HRTFs; currently,
however, these functions still require a significant amount of
calibration time and fail to provide adequate cues for front-to-back
or up-down localization (Crystal River Engineering, 1995). Ideally, a
more generalized HRTF could be developed that would be applicable to
a multitude of users. This may be possible by using a best-fit HRTF
selection process in which one finds the nearest matching HRTF in a
database of candidate HRTFs by comparing the physiological
characteristics associated with stored HRTFs to those of a target
user (Algazi, Duda, Thompson, & Avendano, 2001).
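
The best-fit selection idea can be sketched as a nearest-neighbor search. The example below is only an illustration of the general approach, not the procedure used by Algazi et al.: the measurement names (head_width_cm, pinna_height_cm), the database layout, and the use of unweighted Euclidean distance are all assumptions made for the sketch.

```python
import math


def best_fit_hrtf(target, database):
    """Return the id of the stored HRTF whose recorded body measurements
    are closest to those of the target user.

    target   -- dict mapping a measurement name to the user's value
    database -- dict mapping an HRTF id to a dict with the same keys
    """
    keys = sorted(target)

    def distance(entry):
        # Unweighted Euclidean distance over the shared measurements.
        return math.sqrt(sum((entry[k] - target[k]) ** 2 for k in keys))

    return min(database, key=lambda hrtf_id: distance(database[hrtf_id]))


if __name__ == "__main__":
    # Hypothetical measurements, in centimeters.
    candidates = {
        "subject_A": {"head_width_cm": 14.0, "pinna_height_cm": 6.0},
        "subject_B": {"head_width_cm": 15.5, "pinna_height_cm": 6.6},
    }
    listener = {"head_width_cm": 15.2, "pinna_height_cm": 6.5}
    print(best_fit_hrtf(listener, candidates))
```

A practical system would likely normalize or weight each measurement so that features with large numeric ranges do not dominate the match.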

While not as commonly incorporated into VEs as visual and auditory
interfaces, haptic devices (see chapter 10) can be used to enhance
aspects of touch and movement of the hand or body segments while
interacting with a virtual environment. Specifically, haptic displays
can be used to support two-way communications between humans and
interactive systems, enabling bidirectional interaction between a
user and his or her surroundings (Hale & Stanney, 2004). In general,
haptic displays are effective at alerting people to critical tasks
(e.g., warning), providing a spatial frame of reference within one’s
personal space, and supporting hand-eye coordination tasks. Texture
cues, such as those conveyed via vibrations or varying pressures, are
effective as simple alerts and may speed reaction time and aid
performance in degraded visual conditions (Akamatsu, 1994; Biggs &
Srinivasan, 2002; Massimino & Sheridan, 1993; Mulgund, Stokes,
Turieo, & Devine, 2002). Kinesthetic devices are advantageous when
tasks involve hand-eye coordination (e.g., object manipulation),
where haptic sensing and feedback are key to performance. Currently
available haptic-interaction devices include static displays (e.g.,
convey deformability or Braille); vibrotactile, electrotactile, and
pneumatic displays (e.g., convey tactile sensations such as surface
texture and geometry, surface slip, surface temperature); force
feedback systems (e.g., convey object position and movement
distances); and exoskeleton systems (e.g., enhance object interaction
and weight discrimination) (Hale & Stanney, 2004). Advances in this
area are worth pursuing because incorporating haptic feedback in VEs
can substantially enhance performance (Burdea, 1996).

The “vestibular system can be exploited to create, prevent, or modify
acceleration perceptions” in virtual environments (Lawson, Sides, &
Hickinbotham, 2002, p. 137). For example, by simulating acceleration
cues, a person can be psychologically transported from his or her
veridical location, such as sitting in a chair in front of a
computer, to a simulated location, such as the cockpit of a moving
airplane. While vestibular cues can be stimulated via many different
techniques in VEs, three of the most promising methods are physical
motion of the user (e.g., motion platforms), wide field-of-view
visual displays that induce vection (i.e., an illusion of
self-motion), and locomotion devices that induce illusions of
self-motion without physical displacement of the user through space
(e.g., walking in place, treadmills, pedaling, foot platforms)
(Hettinger, 2002; Hollerbach, 2002; Lawson et al., 2002). Of these
options, motion platforms are probably the most advanced. Motion
platforms are generally characterized via their range of
motion/degrees of freedom and actuator type (Isdale, 2000). In terms
of range of motion, motion platforms can move a person in many
combinations of translational (i.e., surge or longitudinal motion,
sway or lateral motion, heave or vertical motion) and rotational
(i.e., roll, pitch, yaw) degrees of freedom (DOF). A single-DOF
translational motion system might provide a vibration sensation via a
“seat shaker.” A common 6DOF configuration is a hexapod, which
consists of a frame with six or more extendable struts (actuators)
connecting a fixed base to a movable platform. In terms of actuators,
electrical actuators are quiet and relatively maintenance free;
however, they are not very responsive and they cannot hold the same
load as can hydraulic or pneumatic systems. Hydraulic and pneumatic
systems are smoother, stronger, and more accurate; however, they
require compressors, which may be noisy. Servos are expensive and
difficult to program.
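
The relationship between a commanded 6DOF pose and the struts of a hexapod can be sketched with standard rigid-body geometry: each strut length is the distance between its anchor on the fixed base and its anchor on the moved platform. The code below is a minimal sketch under stated assumptions, not a description of any particular motion platform. The axis convention (x for surge, y for sway, z for heave), the yaw-pitch-roll rotation order, and the anchor coordinates in the usage example are all hypothetical.

```python
import math


def rotation_matrix(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def strut_lengths(base_points, platform_points,
                  surge=0.0, sway=0.0, heave=0.0,
                  roll=0.0, pitch=0.0, yaw=0.0):
    """Length each strut must take for the platform to reach the given pose.

    base_points     -- strut anchors on the fixed base, in base coordinates
    platform_points -- matching anchors on the platform, in platform coordinates
    """
    rot = rotation_matrix(roll, pitch, yaw)
    shift = (surge, sway, heave)
    lengths = []
    for base, plat in zip(base_points, platform_points):
        # Rotate the platform anchor, then translate it into base coordinates.
        moved = [
            sum(rot[i][j] * plat[j] for j in range(3)) + shift[i]
            for i in range(3)
        ]
        lengths.append(math.dist(moved, base))
    return lengths


if __name__ == "__main__":
    # Hypothetical anchor layout: six points on a unit circle.
    ring = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0)
            for k in range(6)]
    print(strut_lengths(ring, ring, heave=1.0, pitch=math.radians(5)))
```

A controller would compare these target lengths with each actuator's travel limits before commanding a move, which is one way the range of motion of a platform follows from its actuator type.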

Although other multimodal interactions are possible (e.g., gustatory,
olfactory), there has been limited research and development beyond
the three primary interaction modes (i.e., visual, auditory, haptic).
Efforts have been made, however, to support advances in olfactory
interaction (Jones, Bowers, Washburn, Cortes, & Vijaya Satya, 2004).

Tracking systems. Tracking systems allow determination of users’ head
or limb position and orientation, or the location of hand-held
devices, in order to allow interaction with virtual objects and
traversal through 3D computer-generated worlds (Foxlin, 2002).
Tracking is what allows the visual scene in a VE to coincide with a
user’s point of view, thereby providing an egocentric real-time
perspective. Tracking systems must be carefully coupled with the
visual scene, however, to avoid unacceptable lags (Kalawsky, 1993).
Advances in tracking technology have been realized in terms of
drift-corrected gyroscopic orientation trackers, outside-in optical
tracking for motion capture, and laser scanners (Foxlin, 2002). The
future of tracking technology is likely hybrid tracking systems, with
an acoustic-inertial hybrid on the market (see
http://www.isense.com/products/) and several others in research labs
(e.g., magnetic-inertial, optical-inertial, and optical-magnetic). In
addition, ultra-wideband radio technology holds promise for an
improved method of omnidirectional point-to-point ranging.
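
One common way to correct gyroscopic drift, and the basic idea behind many hybrid trackers, is to blend the integrated gyroscope rate with a slower but drift-free reference. The sketch below uses a complementary filter to show this. It is an illustrative technique chosen for the example, not the method of any product named above, and the blend factor, time step, and bias values are assumptions.

```python
def complementary_filter(angle, gyro_rate, reference_angle, dt, alpha=0.98):
    """One update step of a complementary filter for a single axis.

    angle           -- previous fused estimate (degrees)
    gyro_rate       -- angular rate from the gyroscope (degrees/second)
    reference_angle -- drift-free but noisy or slow absolute reading, for
                       example from an acoustic or optical tracker (degrees)
    alpha           -- weight given to the gyroscope prediction (assumed 0.98)
    """
    predicted = angle + gyro_rate * dt
    return alpha * predicted + (1.0 - alpha) * reference_angle


def simulate_drift(steps=1000, dt=0.01, gyro_bias=0.5, alpha=0.98):
    """Compare pure integration with the fused estimate for a stationary head.

    The true angle is 0 throughout; the gyroscope reports only its bias.
    Returns (integrated_angle, fused_angle) after the given number of steps.
    """
    integrated = 0.0
    fused = 0.0
    for _ in range(steps):
        integrated += gyro_bias * dt
        fused = complementary_filter(fused, gyro_bias, 0.0, dt, alpha)
    return integrated, fused


if __name__ == "__main__":
    print(simulate_drift())
```

With these assumed values, ten seconds of pure integration accumulates about 5 degrees of error, whereas the fused estimate settles near 0.245 degrees because the reference continually pulls it back.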

Tracking technology also allows for gesture recognition, in which
human position and movement are tracked and interpreted to recognize
semantically meaningful gestures (Turk, 2002). Tracking devices that
are worn (e.g., gloves, bodysuits) are currently more advanced than
passive techniques (e.g., cameras, sensors), yet the latter hold much
promise for the future, as they can be more powerful and less
obtrusive than those that must be worn.

Interaction techniques. While one may think of joysticks and gloves
when considering VE interaction devices, there are many techniques
that can be used to support interaction with and traversal through a
virtual environment. Interaction devices support traversal, pointing
and selection of virtual objects, tool usage (e.g., through force and
torque feedback), tactile interaction (e.g., through haptic devices),
and environmental stimuli (e.g., temperature, humidity) (Bullinger,
Breining, & Braun, 2001).

Supporting traversal throughout a VE, via motion interfaces, is of
primary importance (Hollerbach, 2002). Motion interfaces are
categorized as either active (i.e., locomotion) or passive (i.e.,
transportation). Active motion interfaces require self-propulsion to
move about a virtual environment (e.g., treadmill, pedaling device,
foot platforms). Passive motion interfaces transport users within a
VE without significant user exertion (e.g., inertial motion, as in a
flight simulator, or noninertial motion, such as in the use of a
joystick or gloves). The utility, functionality, cost, and safety of
locomotion interfaces beyond traditional options (e.g., joysticks)
have yet to be proven. In addition, beyond physical training,
concrete applications for active motion interfaces have yet to be
clearly delineated.

Another interaction option is speech control (see chapters 8 and 35).
Speaker-independent continuous speech recognition systems are
currently commercially available (Huang, 1998). For these systems to
provide effective interaction, however, additional advances are
needed in acoustic and language-modeling algorithms to improve the
accuracy, usability, and efficiency of spoken language understanding.

Gesture interaction allows users to interact through nonverbal
commands conveyed via physical movement of the fingers, hands, arms,
head, face, or other body parts (Turk, 2002). Gestures can be used to
specify and control objects of interest, direct navigation,
manipulate the environment, and issue meaningful commands.


To support natural and intuitive interaction, a variety of
interaction techniques can be coupled. Combining speech interaction
with nonverbal gestures and motion interfaces can provide a means of
interaction that closely captures real-world communications.

Augmented cognition techniques. Augmented cognition is a new
computing paradigm in which users and computers are tightly coupled
via physiological gauges that measure the cognitive state of users
and adapt interaction to optimize human performance (Schmorrow,
Stanney, Wilson, & Young, 2005). If incorporated into VE
applications, augmented cognition could provide a means of evaluating
their educational validity and compelling nature. Neuroscience
studies have established that different areas of the brain are
engaged when learning different types of materials and that the areas
activated change with increasing competence (Kennedy, Drexler, Jones,
Compton, & Ordy, 2005). Thus, if VE users were immersed in an
educational experience, augmented cognition technology could be used
to gauge whether targeted areas of the brain were being activated and
to dynamically modify the content of a VE learning curriculum if
desired activation patterns were not being generated. Physiological
measures could also be used to detect the onset of cybersickness (see
“Cybersickness, Adaptation, and Aftereffects”) and to assess the
engagement, awareness, and anxiety of VE users, thereby potentially
providing much more robust measures of immersion and presence (see
“Sense of Presence”). Such techniques could prove invaluable to
entertainment VE applications (cf. Badique, Cavazza, Klinker, Mair,
Sweeney, Thalmann, et al., 2002) that seek to provide the ultimate
experience, military training VE applications (cf. Knerr, Breaux,
Goldberg, & Thurman, 2002) that seek to emulate the “violence of
action” found during combat, and therapeutic VE applications (cf.
North, North, & Coble, 2002; Strickland, Hodges, North, & Weghorst,
1997) that seek to overcome disorders such as fear of heights or
flying.
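
The closed loop described above, in which a gauge reading drives an adaptation of the experience, can be sketched as a simple threshold rule. This is a deliberately simplified illustration: the normalized workload value, the thresholds, and the notion of a numeric difficulty level are all assumptions made for the example, and real augmented cognition systems rely on far richer models of cognitive state.

```python
def adapt_difficulty(difficulty, workload, low=0.3, high=0.7,
                     step=1, min_level=1, max_level=10):
    """Return the next difficulty level given a normalized workload reading.

    workload -- hypothetical gauge output in the range 0.0 (idle) to
                1.0 (overloaded), derived from physiological measures
    low/high -- assumed thresholds below or above which content is adapted
    """
    if workload > high:
        difficulty -= step      # user appears overloaded: ease off
    elif workload < low:
        difficulty += step      # user appears under-engaged: add challenge
    # Keep the level inside the range the content supports.
    return max(min_level, min(max_level, difficulty))


if __name__ == "__main__":
    level = 5
    for reading in (0.2, 0.25, 0.8, 0.5):   # hypothetical gauge readings
        level = adapt_difficulty(level, reading)
        print(reading, level)
```

The same pattern could in principle be applied to a cybersickness gauge, for example by reducing scene motion when a reading crosses a threshold.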

Software  Requirements 

Software development of VE systems has progressed tremendously, from
proprietary and arcane systems to development kits that run on
multiple platforms (e.g., general-purpose operating systems to
workstations) (Pountain, 1996). Virtual environment system components
are becoming modular and distributed, thereby allowing VE databases
(e.g., editors used to design, build, and maintain virtual worlds) to
run independently of visualizers and other multimodal interfaces via
network links. Standard APIs (Application Program Interfaces) (e.g.,
OpenGL, Direct3D, Mesa) allow multimodal components to be
hardware-independent. Virtual environment programming languages are
advancing, with APIs, libraries, and particularly scripting languages
allowing nonprogrammers to develop virtual worlds (Stanney & Zyda,
2002). Advances are also being made in modeling of autonomous agents
and communication networks used to support multiuser virtual
environments.

Modeling.  A  VE  consists  of  a  set  of  geometry,  the  spatial 
relationships between the geometry and the user, and the
change in geometry invoked by user actions or the passage of
time  (Kessler,  2002).  Generally,  modeling  starts  with  building 
the  geometry  components  (e.g.,  graphical  objects,  sensors, 
viewpoints, animation sequences) (Kalawsky, 1993). These are
often  converted  from  CAD  data.  These  components  then  get 
imported  into  the  VE  modeling  environment  and  rendered 
when  appropriate  sensors  are  triggered.  Color,  surface  textures, 
and  behaviors  are  applied  during  rendering.  Programmers  con¬ 
trol  the  events  in  a  VE  by  writing  task  functions,  which  become 
associated  with  the  imported  components. 
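
The modeling pipeline described above (imported geometry components, sensors that trigger rendering, and task functions associated with components) can be illustrated with a minimal sketch. The names used here (`Component`, `VirtualEnvironment`, `attach_task`, `bind_sensor`, `trigger`) are hypothetical and are not taken from any particular VE toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """A geometry component imported into the VE (hypothetical structure)."""
    name: str
    color: str = "gray"
    visible: bool = False
    tasks: List[Callable[["Component"], None]] = field(default_factory=list)


class VirtualEnvironment:
    """Holds imported components and the sensors that trigger them."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self.sensors: Dict[str, List[str]] = {}  # sensor name -> component names

    def import_component(self, component: Component) -> None:
        self.components[component.name] = component

    def attach_task(self, name: str, task: Callable[[Component], None]) -> None:
        # Associate a programmer-written task function with a component.
        self.components[name].tasks.append(task)

    def bind_sensor(self, sensor: str, name: str) -> None:
        self.sensors.setdefault(sensor, []).append(name)

    def trigger(self, sensor: str) -> List[str]:
        """Render each component bound to the sensor and run its task functions."""
        rendered = []
        for name in self.sensors.get(sensor, []):
            component = self.components[name]
            component.visible = True
            for task in component.tasks:
                task(component)
            rendered.append(name)
        return rendered


def paint_red(component: Component) -> None:
    # Example task function: apply a color when the component is rendered.
    component.color = "red"


ve = VirtualEnvironment()
ve.import_component(Component("door"))
ve.attach_task("door", paint_red)
ve.bind_sensor("proximity", "door")
rendered = ve.trigger("proximity")
```

In this sketch, color is applied at render time by a task function, mirroring the statement that colors, textures, and behaviors are applied during rendering; a production toolkit would likely separate these concerns further.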

A  number  of  3D  modeling  languages  and  toolkits  are  avail¬ 
able  that  provide  intuitive  interfaces  and  run  on  multiple  plat¬ 
forms  and  renderers  (e.g.,  3D  Studio  Max,  AC3D  Modeler, 
ZBrush, modo 203, AccuRender, ACIS 3D, Ashlar-Vellum
Argon/Xenon/Cobalt, Carrara, CINEMA 4D, DX Studio, EON Studio,
MultiGen Creator and Vega, RenderWare, solidThinking)
(Ultimate  3D  Links,  2007).  In  addition,  there  are  scene  manage¬ 
ment  engines  (e.g.,  R3vis  Corporation’s  OpenRM  Scene  Graph 
is  an  API  that  provides  cross-platform  scene  management  and 
rendering  services)  that  allow  programmers  to  work  at  a  higher 
level,  defining  characteristics  and  behaviors  for  more  holistic 
concepts  (Kershner,  2002;  Menzies,  2002).  There  have  also  been 
advances in photorealistic rendering tools (e.g., Electric Image
Animation  System),  which  are  evolving  toward  full-featured 
physics-based  global  illumination  rendering  systems  (Heirich  & 
Arvo,  1997;  Merritt  &  Bacon,  1997).  Taken  together,  these  ad¬ 
vances  in  software  modeling  allow  for  the  generation  of  com¬ 
plex  and  realistic  VEs  that  can  run  on  a  variety  of  platforms, 
permitting  access  to  VE  applications  by  both  small-  and  large- 
scale  application-development  budgets. 

Autonomous  agents.  Autonomous  agents  are  synthetic 
or  virtual  human  entities  that  possess  some  degree  of  auton¬ 
omy,  social  ability,  reactivity,  and  pro-activeness  (Allbeck  & 
Badler,  2002;  also  see  chapter  15).  They  can  have  many  forms 
(e.g.,  human,  animal),  which  are  rendered  at  various  levels  of 
detail  and  style,  from  cartoonish  to  physiologically  accurate 
models.  Such  agents  are  a  key  component  of  many  VE  applica¬ 
tions  involving  interaction  with  other  entities,  such  as  adver¬ 
saries,  instructors,  or  partners  (Stanney  &  Zyda,  2002). 

There  has  been  significant  research  and  development  in 
modeling  embodied  autonomous  agents.  As  with  object  geom¬ 
etry,  agents  are  generally  modeled  off-line  and  then  rendered 
during  real-time  interaction.  While  the  required  level  of  detail 
varies, modeling of hair and skin adds realism to an agent’s
appearance (Allbeck & Badler, 2002). There are a few toolkits
available to support agent development, with one of the most
notable offered by Boston Dynamics, Inc. (BDI)
(http://www.bdi.com/), a spin-off from the MIT Artificial
Intelligence Laboratory. BDI’s products allow VE developers to
work directly in a 3D database, interactively specifying agent
behaviors, such as paths to traverse and sensor regions. The
resulting agents move realistically, respond to simple commands,
and travel about a VE as directed. Another option is ArchVision’s
(http://www.archvision.com/) 3D Rich Photorealistic Content (RPC)
People.

Networks.  Distributed  networks  allow  multiple  users  at 
diverse locations to interact within the same virtual
environment. Improvements in communications networks are required
to allow realization of such shared experiences in which users,
objects, processes, and autonomous agents from diverse
locations interactively collaborate (Durlach & Mavor, 1995). Yet the
foundation for such collaboration has been built within the Next
Generation Internet (NGI). The NGI initiative
(http://www.ngi.gov/) aimed to connect a number of universities
and national labs at speeds 100 to 1,000 times faster than the
1996 Internet in
order  to  experiment  with  collaborative-networking  technolo¬ 
gies,  such  as  high-quality  video  conferencing  and  audio  and 
video  streams.  The  NGI’s  successor,  the  Large  Scale  Networking 
(LSN)  Coordinating  Group  (http://www.hpcc.gov/iwg/lsn.html), 
aims  to  develop  technologies  and  services  that  enable  wireless, 
optical,  mobile,  and  wireline  communications  and  enhance  net¬ 
working  software,  with  a  goal  of  achieving  the  terabit  per  sec¬ 
ond  objective  which  proved  elusive  to  the  NGI. 

In addition, Internet2 is using existing networks (e.g., NSF’s
vBNS, the very high-speed Backbone Network Service) to
determine the transport designs necessary to carry real-time
multimedia data at high speed (http://apps.internet2.edu/).
Distributed VE applications can leverage the special capabilities (e.g.,
high  bandwidth,  low  latency,  low  jitter)  of  these  advancing  net¬ 
work  technologies  to  provide  shared  virtual  worlds  (Singhal  & 
Zyda,  1999). 


DESIGN  AND  IMPLEMENTATION  STRATEGIES 


While many conventional HCI techniques can be used to design
and implement VE systems, there are unique cognitive, content,
product  liability,  and  usage  protocol  considerations  that  must  be 
addressed  (see  Fig.  31.2). 

Cognitive  Aspects 

The  fundamental  objective  of  VE  systems  is  to  provide  multi¬ 
modal  interaction  or,  when  sensory  modalities  are  missing,  per¬ 
ceptual  illusions  that  support  human  information  processing 
in  pursuit  of  a  VE  application’s  goals,  which  could  range  from 
training  to  entertainment.  Ancillary  yet  fundamental  to  this  goal 
is to minimize cognitive obstacles, such as navigational
difficulties, that could render a VE application’s goals inaccessible.

Multimodal  interaction  design.  Virtual  environments 
are  designed  to  provide  users  with  direct  manipulative  and 
intuitive  interaction  with  multisensory  stimulation  (Bullinger 
et  al.,  2001).  The  number  of  sensory  modalities  stimulated  and 
the  quality  of  this  multisensory  interaction  are  critical  to  the  re¬ 
alism  and  potential  effectiveness  of  VE  systems  (Popescu,  Bur- 
dea,  &  Trefftz,  2002).  Yet  there  is  currently  a  limited  under¬ 
standing  of  how  to  effectively  provide  such  sensorial  parallelism 
(Burdea,  1996);  however,  Stanney,  Samman,  Reeves,  Hale,  Buff, 
Bowers  et  al.  (2004)  provided  a  set  of  preliminary  cross-modal 
integration  rules.  These  rules  consider  aspects  of  multimodal 
interaction,  including:  (a)  temporal  and  spatial  coincidence, 
(b) working memory capacity, (c) intersensory facilitation
effects, (d) congruency, and (e) inverse effectiveness. When
multimodal sensory information is provided to users, it is essential
to consider such rules governing the integration of multiple
sources of sensory feedback. VE users have adapted their
perception-action systems to “expect” a particular type of
information flow in the real world; VEs run the risk of breaking
these perception-action couplings if the full range of sensory
information is not supported or if it is supported in a manner that
is not contiguous with real-world expectations. Such pitfalls can be
avoided through consideration of the coordination between
sensing and user command and the transposition of senses in
the feedback loop. Specifically, command coordination considers
user input as primarily monomodal and feedback to the user
as multimodal. Designers need to consider which input modalities
are most appropriate to support execution of a given task
within the VE, if there is any need for redundant user input, and
whether or not users can effectively handle such parallel input
(Stanney, Mourant, & Kennedy, 1998; Stanney et al., 2004).

FIGURE 31.2. Design and implementation considerations for virtual environments.
[Figure not reproduced. The surviving panel text lists five usage-protocol steps:
(1) designing VE stimulus to minimize adverse effects; (2) quantifying the stimulus
intensity of a VE system; (3) identifying individual capacity of target user
population to resist adverse effects of VE exposure; (4) setting exposure duration
and intersession interval to minimize adverse effects; (5) educating users
regarding potential risks of VE exposure.]

A  limiting  factor  in  supporting  multimodal  sensory  stimula¬ 
tion  in  VEs  is  the  current  state  of  interface  technologies.  With 
the  exception  of  the  visual  modality,  current  levels  of  technol¬ 
ogy  simply  cannot  even  begin  to  reproduce  virtually  those  sen¬ 
sations,  such  as  haptics  and  audition,  which  users  expect  in  the 
real  world.  One  solution  to  current  technological  shortcomings, 
sensorial transposition, occurs when a user receives feedback
through senses other than those expected, which may occur because
a command coordination scheme has substituted available
sensory feedback for those that cannot be generated within
a  virtual  environment.  Sensorial  substitution  schemes  may  be 
one  for  one  (e.g.,  visual  for  force)  or  more  complex  (e.g.,  visual 
for  force  and  auditory;  visual  and  auditory  for  force).  If  designed 
effectively,  command  coordination  and  sensory  substitution 
schemes  should  provide  multimodal  interaction  that  allows  for 
better  user  control  of  the  virtual  environment.  On  the  other 
hand,  if  designed  poorly,  these  solutions  may  in  fact  exacerbate 
interaction  problems. 
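
The one-for-one and more complex substitution schemes mentioned above can be sketched as simple lookup tables. This is only an illustration under stated assumptions: the scheme names, the `plan_feedback` function, and the rule that an unavailable modality falls back to whichever of its substitutes are available are all hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical substitution schemes, following the examples in the text.
ONE_FOR_ONE: Dict[str, FrozenSet[str]] = {
    "force": frozenset({"visual"}),              # visual for force
}
COMPLEX: Dict[str, FrozenSet[str]] = {
    "force": frozenset({"visual", "auditory"}),  # visual and auditory for force
}


def plan_feedback(
    expected: Set[str],
    available: Set[str],
    scheme: Dict[str, FrozenSet[str]],
) -> Dict[str, Set[str]]:
    """Map each expected modality to the modalities that will deliver it.

    A modality the system can generate is delivered directly; otherwise
    the scheme's substitutes that are actually available are used. An empty
    set means the expected feedback cannot be delivered at all.
    """
    plan: Dict[str, Set[str]] = {}
    for modality in sorted(expected):
        if modality in available:
            plan[modality] = {modality}
        else:
            plan[modality] = set(scheme.get(modality, frozenset())) & available
    return plan


simple_plan = plan_feedback({"visual", "force"}, {"visual", "auditory"}, ONE_FOR_ONE)
complex_plan = plan_feedback({"visual", "force"}, {"visual", "auditory"}, COMPLEX)
```

An empty entry in the resulting plan flags exactly the kind of gap that, if left unaddressed, may exacerbate interaction problems.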

Perceptual  illusions.  When  sensorial  transpositions  are 
used,  there  is  an  opportunity  for  perceptual  illusions  to  occur. 
With  perceptual  illusions,  certain  perceptual  qualities  perceived 
by  one  sensory  system  are  influenced  by  another  sensory  sys¬ 
tem  (e.g.,  “feel”  a  squeeze  when  you  see  your  hand  “grabbing”  a 
virtual  object).  Such  illusions  could  simplify  and  reduce  the  cost 
of  VE  development  efforts  (Storms,  2002).  For  example,  when 
attending to a visual image coupled with a low-quality auditory
display, auditory-visual cross-modal perception allows for an
increase in the perceived quality of the visual image. Thus, in this
case, if the visual image is the focus of the task, there may be no
need  to  use  a  high-quality  auditory  display.  Unfortunately,  little 
is  known  about  how  to  leverage  these  phenomena  to  reduce 
development costs while enhancing one’s experience in a virtual
environment. Perhaps the one exception is vection (e.g., a
compelling illusion of self-motion throughout a virtual world),
which  is  known  to  be  enhanced  via  a  number  of  display  factors, 
including  a  wide  field  of  view  and  high  spatial  frequency  content 
(Hettinger,  2002).  Other  such  illusions  exist  (e.g.,  visual  domi¬ 
nance)  and  could  likewise  be  leveraged  if  perceptual  and  cog¬ 
nitive  design  principles  are  identified  that  can  be  used  to  trigger 
and  capitalize  on  these  illusory  phenomena. 

Navigation  and  wayfinding.  Effective  multimodal  in¬ 
teraction  design  and  use  of  perceptual  illusions  can  be  impeded 
if  navigational  complexities  arise.  Navigation  is  the  aggregate  of 
wayfinding  (e.g.,  cognitive  planning  of  one’s  route)  and  the 
physical  movement  that  allows  travel  throughout  a  virtual  envi¬ 
ronment  (Darken  &  Peterson,  2002).  A  number  of  tools  and 
techniques  have  been  developed  to  aid  wayfinding  in  virtual 
worlds,  including  maps,  landmarks,  trails,  and  direction  find¬ 
ing.  These  tools  can  be  used  to  display  current  position,  current 
orientation  (e.g.,  compass),  log  movements  (e.g.,  “breadcrumb” 
trails),  demonstrate  or  access  the  surround  (e.g.,  maps,  binoc¬ 
ulars),  or  provide  guided  movement  (e.g.,  signs,  landmarks) 
(Chen  &  Stanney,  1999).  Darken  and  Peterson  (2002)  provided 
a  number  of  principles  concerning  how  best  to  use  these  tools. 
If effectively applied to VEs, these principles should lead to
reduced disorientation and enhanced wayfinding in large-scale
virtual environments.
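
Two of the wayfinding aids mentioned above, a compass showing current orientation and a “breadcrumb” trail logging movements, can be sketched as follows. The class name `WayfindingAid`, the `min_spacing` parameter, and the 2D coordinate convention (north along +y) are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


class WayfindingAid:
    """Tracks a breadcrumb trail and reports a compass heading (illustrative)."""

    def __init__(self, min_spacing: float = 1.0) -> None:
        self.min_spacing = min_spacing  # minimum distance between breadcrumbs
        self.trail: List[Point] = []

    def update(self, position: Point) -> None:
        """Drop a breadcrumb once the user has moved far enough from the last one."""
        if not self.trail or math.dist(self.trail[-1], position) >= self.min_spacing:
            self.trail.append(position)

    @staticmethod
    def compass_heading(dx: float, dy: float) -> float:
        """Heading in degrees clockwise from north (+y), in the range [0, 360)."""
        return math.degrees(math.atan2(dx, dy)) % 360.0


aid = WayfindingAid(min_spacing=1.0)
for position in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.2, 0.0), (2.5, 0.0)]:
    aid.update(position)
east = WayfindingAid.compass_heading(1.0, 0.0)
```

Spacing the breadcrumbs keeps the trail readable in a large-scale environment; the appropriate spacing would depend on the scale of the virtual world.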

Content  Development 

Content  development  is  concerned  with  the  design  and  con¬ 
struction  of  the  virtual  objects  and  synthetic  environment  that 
support a VE experience (Isdale, Fencott, Heim, & Daly, 2002).
While  this  medium  can  leverage  existing  HCI  design  principles, 
it  has  unique  design  challenges  that  arise  due  to  the  demands  of 
real-time,  multimodal,  collaborative  interaction.  In  fact,  content 
designers  are  just  starting  to  appreciate  and  determine  what  it 
means  to  create  a  full  sensory  experience  with  user  control  of 
both  point  of  view  and  narrative  development.  Aesthetics  is 
thought  to  be  a  product  of  agency  (e.g.,  pleasure  of  being),  nar¬ 
rative  potential,  presence  and  co-presence  (e.g.,  existing  in  and 
sharing  the  virtual  experience),  as  well  as  transformation  (e.g., 
assuming  another  persona)  (Church,  1999;  Murray,  1997).  Con¬ 
tent  development  should  be  about  stimulating  perceptions 
(e.g., sureties, surprises), as well as contemplation over the
nature of being (Fencott, 1999; Isdale et al., 2002).

Existing design techniques, for example, from entertainment,
video games, and theme parks, can be used to support VE
content development (see chapters 43 and 46). Game development
techniques  that  can  be  leveraged  in  VE  content  development  in¬ 
clude  but  are  not  limited  to  providing  a  clear  sense  of  purpose, 
emotional  objectives,  perceptual  realism,  intuitive  interfaces, 
multiple  solution  paths,  challenges,  a  balance  of  anxiety  and  re¬ 
ward,  as  well  as  an  almost  unconscious  flow  of  interaction  (Is¬ 
dale  et  al.,  2002).  From  theme  park  design,  content  develop¬ 
ment  suggestions  include:  (a)  having  a  story  that  provides  the 
all-encompassing  theme  of  the  VE  and  thus  the  “rules”  that 
guide  design,  (b)  providing  location  and  purpose,  (c)  using 
cause-and-effect  to  lead  users  to  their  own  conclusions,  and  (d) 
anchoring users in the familiar (Carson, 2000a, 2000b). While
these suggestions provide guidelines for VE content
development, considerable creativity is still an essential component of
the process. Isdale et al. (2002) suggested that the challenges
of VE content development highlight the need for art to
complement technology.

Products  Liability 

Those  who  implement  VE  systems  must  be  cognizant  of  po¬ 
tential  product-liability  concerns.  Exposure  to  a  VE  system  of¬ 
ten  produces  unwanted  side  effects  that  could  render  users  in¬ 
capable  of  functioning  effectively  upon  return  to  the  real  world. 
These  adverse  effects  may  include  nausea  and  vomiting,  pos¬ 
tural  instability,  visual  disturbances,  and  profound  drowsiness 
(Stanney, Salvendy, Deisinger, DiZio, Ellis, Ellison, et al., 1998).
As  users  subsequently  take  on  their  normal  routines,  unaware 
of  these  lingering  effects,  their  safety  and  well-being  may  be 
compromised.  If  a  VE  product  occasions  such  problems,  liabil¬ 
ity  of  VE  developers  or  system  administrators  could  range  from 
simple  accountability  (e.g.,  reporting  what  happened)  to  full  le¬ 
gal  liability  (e.g.,  paying  compensation  for  damages)  (Kennedy 
&  Stanney,  1996;  Kennedy,  Kennedy,  &  Bartlett,  2002).  In  or¬ 
der  to  minimize  their  liability,  manufacturers  and  corporate 
users  should  design  systems  and  provide  usage  protocols  to 
minimize  risks,  warn  users  about  potential  aftereffects,  monitor 
users  during  exposure,  assess  users’  risk,  and  debrief  users 
after  exposure. 

Usage  Protocols 

In  order  to  minimize  products  liability  concerns,  VE  usage  pro¬ 
tocols  should  be  carefully  designed.  A  comprehensive  VE  us¬ 
age  protocol  will  involve  the  following  activities  (see  Stanney, 
Graeber, & Kennedy, in press):

1. Designing VE stimulus to minimize adverse effects:

• Ensure system lags/latencies are minimized and stable;

• Ensure frame rates are optimized;

• Ensure interpupillary distance of visual display is
adjustable and set appropriately;

• Determine if size of field of view is causing excessive
vection, such that the spatial frequency content of the visual
scene should be reduced;

• Determine if multimodal feedback is integrated into the
VE such that sensory conflicts are minimized.

2. Quantifying stimulus intensity of a VE system using the
Simulator Sickness Questionnaire (Kennedy, Lane, Berbaum,
&  Lilienthal,  1993)  or  other  means  and  comparing  the  out¬ 
come  to  other  systems  (see  Stanney  et  al.,  2005).  If  a  given 
VE  system  is  of  high  intensity  (say  the  50th  or  higher  per¬ 
centile)  and  is  not  redesigned  to  lessen  its  impact,  signifi¬ 
cant  dropouts  can  be  expected. 

3. Identifying individual capacity of target user population to
resist adverse effects of VE exposure via the Motion History
Questionnaire (Kennedy & Graybiel, 1965) or other means.

• For highly susceptible populations, redesigning the VE to
reduce stimulus intensity or expecting high dropout rates;

•  Providing  warnings  for  those  with  severe  susceptibility  to  mo¬ 
tion  sickness,  seizures,  migraines,  cold,  flu,  or  other  ailments. 

4.  Setting  exposure  duration  and  intersession  interval  to  min¬ 
imize  adverse  effects  by  (see  Stanney  et  al.,  2005): 

•  Limiting  initial  exposures  (e.g.,  10  minutes  or  less); 

•  Setting  intersession  exposure  intervals  two  to  five  days 
apart  to  enhance  individual  adaptability; 

•  Incrementally  increasing  VE  stimulus  intensity  within  one 
exposure  (incremental  adaptation)  or  across  multiple  ex¬ 
posures  (incremental  habituation); 

•  Determining  if  users  are  able  to  complete  an  adaptive 
process  at  each  increment  of  stimulus  intensity  (e.g.,  de¬ 
pressing  sensitization  to  preexposure  levels  or  below);  if 
not,  lowering  stimulus  intensity; 

•  Avoiding  repeated  exposure  intervals  occurring  less  than 
two  hours  apart  if  adverse  effects  are  experienced  in  an 
exposure. 

5. Educating users regarding potential risks of VE exposure (e.g.,
inform  users  they  may  experience  nausea,  malaise,  disorien¬ 
tation,  headache,  dizziness,  vertigo,  eyestrain,  drowsiness,  fa¬ 
tigue,  pallor,  sweating,  increased  salivation,  and  vomiting). 

6.  Educating  users  regarding  potential  adverse  aftereffects  of 
VE  exposure  (e.g.,  inform  users  they  may  experience  dis¬ 
turbed  visual  functioning,  visual  flashbacks,  and  unstable 
locomotor  and  postural  control  for  prolonged  periods  post¬ 
exposure). 

7.  Instructing  users  to  terminate  VE  interaction  if  they  start 
to  feel  ill. 

8.  Providing  adequate  air  flow  and  comfortable  thermal 
conditions. 

9.  Adjusting  equipment  to  minimize  fatigue. 

10. For strong VE stimuli, warning users to avoid extraordinary
maneuvers  (e.g.,  flying  backward  or  experiencing  high  rates 
of  linear  or  rotational  acceleration)  during  initial  interaction. 

11.  Providing  an  attendant  to  monitor  users’  behavior  and  en¬ 
sure  their  well-being: 

• Attendant should terminate exposure if they see red flags
(e.g., excessive sweating, verbal frustration, prolonged
lack of movement in VE, and less overall movement [e.g.,
restricting head movement]) or if users verbally request
termination or complain of excessive symptoms;

• Attendant should assess well-being of users postexposure
(e.g., can use battery similar to field sobriety tests;
Tharp, Burns, & Moskowitz, 1981).

12.  Specifying  amount  of  time  postexposure  that  users  must  re¬ 
main  on  premises  before  driving  or  participating  in  other 
such  high-risk  activities.  Do  not  allow  individuals  who  fail 
postexposure  tests  or  experience  adverse  aftereffects  to 
conduct  high-risk  activities  until  they  have  recovered  (e.g., 
have  someone  else  drive  them  home). 

13.  Calling  users  the  next  day  or  having  them  call  to  report  any 
prolonged  adverse  effects. 


Regardless of the strength of the stimulus or the susceptibility
of the user, following a systematic usage protocol can
minimize the adverse effects associated with VE exposure.
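
The exposure-scheduling guidance in step 4 of the protocol can be expressed as a simple check. The sketch below encodes only three of the stated rules (initial exposures of 10 minutes or less, intersession intervals of two to five days, and no repeat exposure within two hours of an exposure that produced adverse effects); the function name `check_session` and its parameters are hypothetical, and the thresholds are the illustrative values given in the text rather than validated limits.

```python
from datetime import datetime, timedelta
from typing import List, Optional

MAX_INITIAL_MINUTES = 10                    # "e.g., 10 minutes or less"
MIN_INTERVAL = timedelta(days=2)            # intersession interval lower bound
MAX_INTERVAL = timedelta(days=5)            # intersession interval upper bound
MIN_GAP_AFTER_ADVERSE = timedelta(hours=2)  # after an exposure with adverse effects


def check_session(
    start: datetime,
    minutes: int,
    previous_end: Optional[datetime] = None,
    previous_adverse: bool = False,
) -> List[str]:
    """Return protocol warnings for a planned VE session (empty list if none)."""
    warnings: List[str] = []
    if previous_end is None:
        # First exposure: only the initial-duration rule applies.
        if minutes > MAX_INITIAL_MINUTES:
            warnings.append("initial exposure exceeds 10 minutes")
        return warnings
    gap = start - previous_end
    if previous_adverse and gap < MIN_GAP_AFTER_ADVERSE:
        warnings.append("less than two hours since an exposure with adverse effects")
    if not (MIN_INTERVAL <= gap <= MAX_INTERVAL):
        warnings.append("intersession interval outside two to five days")
    return warnings


first_end = datetime(2024, 1, 1, 9, 0)
too_long = check_session(first_end, minutes=15)
well_spaced = check_session(first_end + timedelta(days=3), 20, previous_end=first_end)
too_soon = check_session(
    first_end + timedelta(hours=1), 10, previous_end=first_end, previous_adverse=True
)
```

Such a check covers only the scheduling rules; the monitoring, education, and debriefing steps of the protocol still depend on an attendant.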


HEALTH  AND  SAFETY  ISSUES 


The  health  and  safety  risks  associated  with  VE  exposure  com¬ 
plicate  usage  protocols  and  lead  to  products  liability  concerns. 
It  is  thus  essential  to  understand  these  issues  when  utilizing  VE 
technology.  There  are  both  physiological  and  psychological  risks 
associated  with  VE  exposure,  the  former  being  related  primar¬ 
ily  to  sickness  and  aftereffects  and  the  latter  primarily  being  con¬ 
cerned  with  the  social  impact. 

Cybersickness,  Adaptation,  and  Aftereffects 

Motion-sickness-like  symptoms  and  other  aftereffects  (e.g.,  bal¬ 
ance  disturbances,  visual  stress,  altered  hand-eye  coordination) 
are unwanted byproducts of VE exposure. The sickness related
to  VE  systems  is  commonly  referred  to  as  “cybersickness”  (Mc¬ 
Cauley  &  Sharkey,  1992).  Some  of  the  most  common  symptoms 
exhibited  include  dizziness,  drowsiness,  headache,  nausea,  fa¬ 
tigue,  and  general  malaise  (Kennedy  et  al.,  1993).  More  than 
80%  of  users  will  experience  some  level  of  disturbance,  with  ap¬ 
proximately  12%  ceasing  exposure  prematurely  due  to  this  ad¬ 
versity  (Stanney,  Kingdon,  Nahmens,  &  Kennedy,  2003).  Of 
those  who  drop  out,  approximately  10%  can  be  expected  to 
have an emetic response (i.e., vomit); however, only 1-2% of all
users  will  have  such  a  response.  These  adverse  effects  are 
known to increase in incidence and intensity with prolonged
exposure duration (Kennedy, Stanney, & Dunlap, 2000). While
most  users  will  experience  some  level  of  adverse  effects,  symp¬ 
toms  vary  substantially  from  one  individual  to  another,  as  well  as 
from one system to another (Kennedy & Fowlkes, 1992). These
effects can be assessed via the Simulator Sickness Questionnaire
(Kennedy  et  al.,  1993),  with  values  above  20  requiring  due  cau¬ 
tion  (e.g.,  warn  and  observe  users)  (Stanney  et  al.,  2005). 
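
The incidence figures above are mutually consistent, as a short calculation shows: if roughly 12% of users drop out and roughly 10% of those have an emetic response, then about 1.2% of all users do, which falls within the stated 1-2%. The sketch below assumes, for illustration only, that emetic responses occur solely among users who drop out; the function names are hypothetical.

```python
DROPOUT_RATE = 0.12           # approximate share of users ceasing exposure early
EMETIC_GIVEN_DROPOUT = 0.10   # approximate share of dropouts with an emetic response
SSQ_CAUTION_THRESHOLD = 20    # SSQ values above this call for due caution


def overall_emetic_rate(
    dropout_rate: float = DROPOUT_RATE,
    emetic_given_dropout: float = EMETIC_GIVEN_DROPOUT,
) -> float:
    """Share of all users expected to have an emetic response (assumption above)."""
    return dropout_rate * emetic_given_dropout


def expected_counts(n_users: int) -> dict:
    """Expected numbers of dropouts and emetic responses among n_users."""
    return {
        "dropouts": round(n_users * DROPOUT_RATE),
        "emetic": round(n_users * overall_emetic_rate()),
    }


def needs_caution(ssq_score: float) -> bool:
    """True when an SSQ score exceeds the caution threshold (warn and observe)."""
    return ssq_score > SSQ_CAUTION_THRESHOLD


rate = overall_emetic_rate()
counts = expected_counts(1000)
```

For a deployment of 1,000 users, this arithmetic suggests planning for on the order of 120 early terminations and about a dozen emetic responses.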

To  overcome  such  adverse  effects,  individuals  generally  un¬ 
dergo  physiological  adaptation  during  VE  exposure.  This  adap¬ 
tation  is  the  natural  and  automatic  response  to  an  intersensorily 
imperfect  VE  and  is  elicited  due  to  the  plasticity  of  the  human 
nervous  system  (Welch,  1978).  Due  to  technological  flaws  (e.g., 
slow  update  rate,  sluggish  trackers),  users  of  VE  systems  may 
be  confronted  with  one  or  more  intersensory  discordances 
(e.g.,  visual  lag,  a  disparity  between  seen  and  felt  limb  position). 
In order to perform effectively in the VE, they must compensate
for  these  discordances  by  adapting  their  psychomotor  behav¬ 
ior  or  visual  functioning.  Once  interaction  with  a  VE  is  discon¬ 
tinued,  these  compensations  persist  for  some  time  after  expo¬ 
sure,  leading  to  aftereffects. 

Once  VE  exposure  ceases  and  users  return  to  their  natural 
environment  they  are  likely  unaware  that  interaction  with  the 
VE  has  potentially  changed  their  ability  to  effectively  interact 
with  their  normal  physical  environment  (Stanney  &  Kennedy, 
1998).  Several  different  kinds  of  aftereffects  may  persist  for 
prolonged periods following VE exposure (Welch, 1997). For
example, hand-eye coordination can be degraded via
perceptual-motor disturbances (Kennedy, Stanney, Ordy, & Dunlap,
1997;  Rolland,  Biocca,  Barlow,  &  Kancherla,  1995),  postural  sway 
can  arise  (Kennedy  &  Stanney,  1996),  as  can  changes  in  the 
vestibulo-ocular  reflex  (VOR),  or  one’s  ability  to  stabilize  an  im¬ 
age  on  the  retina  (Draper,  Prothero,  &  Viirre,  1997).  The  impli¬ 
cations  of  these  aftereffects  are  that: 

1.  VE  exposure  duration  may  need  to  be  minimized; 

2.  Highly  susceptible  individuals  or  those  from  clinical  popula¬ 
tions  (e.g.,  those  prone  to  seizures)  may  need  to  avoid  or 
be  banned  from  exposure; 

3.  Users  should  be  closely  monitored  during  VE  exposure; 

4.  Users’  activities  should  be  closely  monitored  for  a  consider¬ 
able  period  of  time  postexposure  to  avoid  personal  injury 
or  harm. 

Social  Impact 

Virtual  environment  technology,  like  its  ancestors  (e.g.,  televi¬ 
sion,  computers),  has  the  potential  for  negative  social  implica¬ 
tions  through  misuse  and  abuse  (Kallman,  1993;  also  see  chap¬ 
ter  61).  Yet  violence  in  VE  is  nearly  inevitable,  as  evidenced  by 
the  violent  content  of  popular  video  games.  Such  animated  vi¬ 
olence  is  a  known  favorite  over  the  portrayal  of  more  benign 
emotions  such  as  cooperation,  friendship,  or  love  (Sheridan, 
1993;  also  see  chapter  4).  The  concern  is  that  users  who  engage 
in  what  seems  like  harmless  violence  in  the  virtual  world  may 
become  desensitized  to  violence  and  mimic  this  behavior  in  the 
look-alike  real  world. 

Currently, it is not clear whether or not such violent behavior
will result from VE exposure; early research, however, is not
reassuring. Calvert and Tan (1994) found VE exposure to
significantly increase the physiological arousal and aggressive
thoughts of young adults. Perhaps more
disconcerting  was  that  neither  aggressive  thoughts  nor  hostile 
feelings  were  found  to  decrease  due  to  VE  exposure,  thus  pro¬ 
viding  no  support  for  catharsis.  Such  increased  negative  stimu¬ 
lation  may  then  subsequently  be  channeled  into  real-world 
activities.  The  ultimate  concern  is  that  VE  immersion  may  po¬ 
tentially  be  a  more  powerful  perceptual  experience  than  past, 
less  interactive  technologies,  thereby  increasing  the  negative  so¬ 
cial  impact  of  this  technology  (Calvert,  2002).  A  proactive 
approach  is  needed  which  weighs  the  risks  and  potential  con¬ 
sequences  associated  with  VE  exposure  against  the  benefits. 
Waiting for the onset of harmful social consequences should not
be  tolerated. 


VIRTUAL  ENVIRONMENT 
USABILITY  ENGINEERING 


Most VE user interfaces are fundamentally different from
traditional graphical user interfaces, with unique I/O devices,
perspectives, and physiological interactions. Thus, when
developers and usability practitioners attempt to apply traditional
usability engineering methods to the evaluation of VE systems,
they find few if any that are particularly well suited to these
environments (for notable exceptions see Gabbard, Hix, & Swan,
1999; Hix & Gabbard, 2002; Stanney, Mollaghasemi, & Reeves,
2000). There is a need to modify and optimize available
techniques to meet the needs of VE usability engineering, as well as
to better characterize factors unique to VE usability, including
sense of presence and VE ergonomics.

Usability  Techniques 

Assessment  of  usability  for  VE  systems  must  go  beyond  tradi¬ 
tional  approaches,  which  are  concerned  with  the  determina¬ 
tion  of  effectiveness,  efficiency,  and  user  satisfaction  (see  chap¬ 
ter  55).  Evaluators  must  consider  whether  multimodal  input 
and  output  is  optimally  presented  and  integrated,  navigation 
is  supported  to  allow  the  VE  to  be  readily  traversed,  object  ma¬ 
nipulation  is  intuitive  and  simple,  content  is  immersive  and  en¬ 
gaging,  and  the  system  design  optimizes  comfort  while  mini¬ 
mizing  sickness  and  aftereffects.  The  affective  elements  of 
interaction  become  important  when  evaluating  VE  systems  (see 
chapter  58).  It  is  an  impressive  task  to  ensure  that  all  of  these 
criteria  are  met. 

Gabbard, Hix, and Swan (1999) have developed a taxonomy
of VE usability characteristics that can serve as a foundation for
identifying and evaluating usability criteria particularly relevant
to VE systems. Stanney, Mollaghasemi, and Reeves (2000) used
this taxonomy as the foundation on which to develop an
automated system, MAUVE (Multicriteria Assessment of Usability for
Virtual Environments), which assesses VE usability in terms of
how effectively each of the following are designed: (a) navigation,
(b) user movement, (c) object selection and manipulation,
(d) visual output, (e) auditory output, (f) haptic output, (g) presence,
(h) immersion, (i) comfort, (j) sickness, and (k) aftereffects.
MAUVE can be used to support expert evaluations of VE
systems, similar to the manner in which traditional heuristic
evaluations are conducted. Due to such issues as cybersickness
and aftereffects, it is essential to use these or other techniques
(cf. Modified Concept Book Usability Evaluation Methodology;
Swartz,  2003)  to  ensure  the  usability  of  VE  systems,  not  only  to 
avoid  rendering  them  ineffective  but  also  to  ensure  that  they  are 
not  hazardous  to  users. 
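
To make the idea of a multicriteria assessment concrete, the sketch below aggregates expert ratings across the eleven criteria listed above. It is not MAUVE's actual method, which the text does not describe; the 0-to-1 rating scale, the equal default weights, the weighted-mean aggregation, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# The eleven criteria named in the text.
CRITERIA = (
    "navigation", "user movement", "object selection and manipulation",
    "visual output", "auditory output", "haptic output", "presence",
    "immersion", "comfort", "sickness", "aftereffects",
)


def aggregate(
    ratings: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of ratings (hypothetical 0-to-1 scale, higher is better)."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    if weights is None:
        weights = {c: 1.0 for c in CRITERIA}  # equal weights by default
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


def weakest(ratings: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Criteria rated below the threshold, in the order listed in CRITERIA."""
    return [c for c in CRITERIA if ratings[c] < threshold]


ratings = {c: 0.8 for c in CRITERIA}
ratings["sickness"] = 0.3  # e.g., evaluators observed notable cybersickness
score = aggregate(ratings)
flagged = weakest(ratings)
```

Reporting the flagged criteria alongside the aggregate matters here, since a respectable overall score could otherwise mask a sickness or aftereffects rating that makes the system hazardous to users.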

Sense  of  Presence 

One  of  the  usability  criteria  unique  to  VE  systems  is  sense  of 
presence.  Virtual  environments  have  the  unique  advantage  of 
leveraging  the  imaginative  ability  of  individuals  to  psychologically 
“transport” themselves to another place, one that may not exist in
reality (Sadowski & Stanney, 2002). To support such transportation,
VEs provide physical separation from the real world by immersing
users in the virtual world via an HMD, then imparting
sensations via multimodal feedback that would naturally
be present in the alternate environment. A focus on generating
such presence is one of the primary characteristics distinguishing
VEs from other means of displaying information.

Presence  has  been  defined  as  the  subjective  perception  of 
being  immersed  in  and  surrounded  by  a  virtual  world  rather 
than the physical world one is currently situated in (Stanney,
Salvendy,  et  al.,  1998).  Virtual  environments  that  engender  a 
high  degree  of  presence  are  thought  to  be  more  enjoyable,  ef¬ 
fective,  and  well  received  by  users  (Sadowski  &  Stanney,  2002). 
To  enhance  presence,  designers  of  VE  systems  should  spread 
detail  around  a  scene,  let  user  interaction  determine  when  to 
reveal  important  aspects,  maintain  a  natural  and  realistic,  yet 
simple  appearance,  and  utilize  textures,  colors,  shapes,  sounds, 
and  other  features  to  enhance  realism  (Kaur,  1999).  To  generate 
the  feeling  of  immersion  within  the  environment,  designers 
should  isolate  users  from  the  physical  environment  (use  of  an 
HMD  may  be  sufficient),  provide  content  that  involves  users  in 
an  enticing  situation  supported  by  an  encompassing  stimulus 
stream,  provide  natural  modes  of  interaction  and  movement 
control,  and  utilize  design  features  that  enhance  vection  (Stan¬ 
ney  et  al.,  2000).  Presence  can  be  assessed  via  Witmer  and 
Singer’s  (1998)  Presence  Questionnaire  or  techniques  used  by 
Slater  and  Steed  (2000),  as  well  as  a  number  of  other  means 
(Sadowski  &  Stanney,  2002). 

Virtual  Environment  Ergonomics 

Ergonomics,  which  focuses  on  fitting  a  product  or  system  to  the 
anthropometric,  musculoskeletal,  cardiovascular,  and  psy¬ 
chomotor  properties  of  users,  is  an  essential  element  of  VE  sys¬ 
tem  design  (McCauley-Bell,  2002).  Supporting  user  comfort 
while wearing cumbersome HMDs or using unwieldy peripheral devices
is an ergonomics challenge of paramount importance; discomfort
could supersede any other sensations (e.g., presence,
immersion). If a VE produces discomfort, participants may limit
their  exposure  time  or  possibly  avoid  repeat  exposure.  Overall 
physical  discomfort  should  thus  be  minimized,  while  user  safety 
is  maximized. 

Ergonomic  concerns  affecting  comfort  include  visual  dis¬ 
comfort  resulting  from  visual  displays  with  improper  depth 
cues,  poor  contrast  and  illumination,  or  improperly  set  IPDs 
(Stanney  et  al.,  2000).  Physical  discomfort  can  be  driven  by  re¬ 
strictive  tethers;  awkward  interaction  devices;  or  heavy,  awk¬ 
ward,  and  constraining  visual  displays.  To  enhance  the  er¬ 
gonomics  of  VE  systems,  several  factors  should  be  considered, 
including (McCauley-Bell, 2002):

•  Is  operator  movement  inhibited  by  the  location,  weight,  or 
window  of  reach  of  interaction  devices  or  HMDs? 

•  Do the layout and arrangement of interaction devices and
HMDs  support  efficient  and  comfortable  movement? 

•  Is  any  limb  overburdened  by  heavy  interaction  devices  or 
HMDs? 

•  Do  interaction  devices  require  awkward  and  prolonged 
postures? 

•  If  a  seat  is  provided,  does  it  support  user  movement  and  is  it 
of  the  right  height  with  adequate  back  support? 

•  If  active  motion  interfaces  are  provided  (e.g.,  treadmills), 
are they adjustable to ensure fit to the anthropometrics of
users? 

•  Are  the  noise  and  sound  levels  within  ergonomic  guidelines 
and  do  they  support  user  immersion? 


APPLICATION  DOMAINS 


The  wide  range  of  VE  types  and  designs  makes  them  ideal  tools 
for  supporting  a  range  of  performance-enhancing  tasks.  Most 
commonly  VEs  are  used  as  training  aids;  a  less  common  but 
increasingly  well-supported  use  is  as  a  selection  tool,  the  flip 
side  to  the  training  application.  Other  applications  for  VE  in¬ 
clude  their  growing  potential  for  providing  entertainment, 
their  use  as  medical  rehabilitative  tools,  and  even  their  imple¬ 
mentation  within  the  training  system  acquisition  cycle  of  the 
U.S.  government. 

VE  as  a  Selection  and  Training  Tool 

Perhaps  the  most  common  application  for  VEs  is  as  training 
tools.  The  U.S.  military  has  focused  considerable  resources  on 
VE  as  a  means  to  resolve  many  of  the  training  deficits  that  re¬ 
sult  from  the  rigors  of  military  life  (e.g.,  sustaining  schoolhouse- 
gleaned  skills  and  knowledge  sets  during  prolonged  deploy¬ 
ment  periods,  acquiring  new  skills  and  knowledge  while  away 
from  the  schoolhouse,  providing  large-scale  environments  for 
multitrainee  exercises,  distributing  training  exercises).  As  VE 
training  exercises  become  more  closely  aligned  to  real-world 
conditions,  a  series  of  factors,  including  training  transfer  effec¬ 
tiveness,  cost,  logistics  and  safety,  move  to  the  forefront.  Fur¬ 
ther,  in  terms  of  applying  VE  to  enhance  human  performance, 
training  is  actually  the  second  stage  of  a  two-stage  process.  Ide¬ 
ally,  one  would  like  to  ensure  that  those  individuals  for  whom 
training  is  being  provided  have  a  certain  degree  of  “perfor¬ 
mance  capability.”  In  this  approach,  VE  systems  can  be  used  as 
part  of  a  comprehensive  performance-enhancement  program 
that  focuses  on  selecting  those  users  with  the  correct  set  of 
Knowledge,  Skills,  and  Abilities  (KSAs)  and  then  providing, 
when  needed,  training  to  fine-tune  those  KSAs. 

Given  the  potentially  high  cost  of  developing  VEs,  this 
makes  sense.  Given  the  broader,  yet  equally  high  costs  of 
bringing  individuals  into  an  organization  and  training  them  to 
some  level  of  proficiency,  it  becomes  glaringly  obvious  that 
having  an  ability  to  select  individuals  who  are  most  likely  to 
benefit  from  VE  training  is  critical.  For  example,  in  the  U.S.  Air 
Force  the  cost  of  a  single  individual  student  pilot  failing  to 
complete basic flight school can run to $100,000 (Siem,
Carretta, & Mercatante, 1988). Student failure can be attributed to
both  inadequate  selection  techniques  and  deficient  training 
techniques.  Clearly  both  selection  and  training  play  critical 
roles  in  producing  effective  users.  The  challenge  is  to  develop 
a program that ensures a smooth union between the two, one
that identifies the best candidates and then provides the optimum
training.

Traditional  approaches  to  selection  focus  on  social  and  psy¬ 
chophysical  assessments.  For  example,  aptitude  tests,  ranging 
from  traditional  pen-and-paper-type  efforts  (Carretta  &  Ree, 
1995)  to  psychomotor  tests  (Carretta  &  Ree,  1993)  to  computer- 
based  (but  not  VE!)  assessments  (Carretta  &  Ree,  1993)  have 
all  been  used  with  varying  levels  of  success.  The  single  most  im¬ 
portant  criticism  of  each  of  these  approaches  is  that  they  are 
designed  to  be  predictive  of  future  performance,  and  as  such 
are more often than not abstractions of aspects of the larger
task(s)  for  which  the  individual  is  being  selected.  An  alternate 
approach  would  be  to  provide  selectees  with  a  method  that 
provides  a  direct  indication  of  their  performance  abilities.  This 
distinction,  essentially  between  a  test  being  predictive  of  per¬ 
formance  ability  versus  indicative  of  performance  ability,  has  a 
great  impact  on  selection.  A  meta-analysis  performed  within  the 
aviation  domain,  where  much  of  selection  research  has  focused, 
found  that  typical  predictive  validities  (most  often  reported  as 
either  the  correlation  coefficient,  r,  or  the  multiple  correlation 
coefficient,  R,  and  representing  the  degree  to  which  a  given 
predictor/set  of  predictors  and  performance  metrics  are  re¬ 
lated)  for  such  assessments  range  from  a  low  of  0.14  to  a  “high” 
of  about  0.40  (Martinussen,  1996).  Yet,  when  a  simulation  com¬ 
ponent  is  added  to  this  mix,  these  values  have  been  shown  to 
improve  considerably,  pushing  correlations  towards  the  0.60 
level  (Gress  &  Willkomm,  1996). 
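
One way to read these validities is through the share of performance variance that a predictor accounts for, which for a correlation coefficient is its square. The sketch below is illustrative only: the function name is hypothetical, and the three inputs are simply the values quoted in the paragraph above.

```python
def variance_explained(r: float) -> float:
    """Proportion of performance variance accounted for by a predictor
    (or predictor set) whose correlation with performance is r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return r * r


# Validities quoted in the text: the low and "high" ends of the typical
# range, and the level reached once a simulation component is added.
for label, r in [("low", 0.14), ("high", 0.40), ("with simulation", 0.60)]:
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.2f}")
```

Read this way, moving from 0.40 to 0.60 more than doubles the variance accounted for, which is why the addition of a simulation component matters for selection.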

A potential deterrent to using VEs as part of a selection
toolkit  is  the  high  cost  associated  with  developing  both  the  sys¬ 
tem  as  well  as  the  performance  measures  that  need  to  be  inte¬ 
grated  with  it.  However,  when  considered  as  one  part  of  a  com¬ 
prehensive  performance  enhancement  program,  this  concern 
vanishes.  The  same  effort  and  cost  devoted  to  developing  the 
system  for  selection  can  typically,  with  minimal  additional  input, 
be  modified  (e.g.,  via  altered  scenarios)  to  meet  identified  train¬ 
ing  goals  and  objectives.  It  is  therefore  necessary  to  explore 
some  of  the  basic  requirements  for  using  VEs  as  training  tools 
in  order  to  understand  how  one  technology  can  truly  be  applied 
for  these  multiple  purposes. 

A  constant  thread  in  training  research  is  the  notion  that  in 
order  for  training  to  be  effective,  the  basic  skills  being  taught 
must  show  some  degree  of  transfer  to  real-world  performance. 
Over  100  years  ago,  Thorndike  and  Woodworth  (1901)  laid 
down  the  most  basic  training-transfer  principle  when  they  pro¬ 
posed  that  transfer  was  determined  by  the  degree  of  similarity 
between  any  two  tasks.  Applying  this  heuristic  to  VE  design, 
one  might  conclude  that  the  most  basic  way  to  ensure  perfect 
transfer  is  to  ensure  that  the  real-world  performance  elements 
that  are  meant  to  be  trained  should  be  replicated  perfectly  in 
a  virtual  environment.  This  notion  of  “identical  elements” 
could  easily  create  a  serious  challenge  for  system  designers 
even  by  today’s  technology  standards,  as  VEs  are  still  not  able 
to  perfectly  duplicate  the  wide  range  of  sensorial  stimuli  en¬ 
countered  during  daily  interactions  with  our  world  (Stoffregen, 
Bardy, Smart, & Pagulayan, 2003). Countering this somewhat
simplistic design approach is Osgood’s (1949) principle that
greater similarity between any two tasks along certain dimensions
will not guarantee wholesale, perfect transfer. The challenge,
as noted by Roscoe (1982), is to find the right balance between
technical fidelity and training effectiveness.

This  suggests  that  when  developing  training  systems,  it  is  im¬ 
portant  to  take  two  factors  into  consideration.  The  first  is  how 
one  can  measure  training  transfer,  and  the  second  is  how  one 
can  assess  the  cost/benefit  relationship  associated  with  making 
the  decision  to  use  VEs  to  provide  training.  The  most  basic  met¬ 
ric  for  assessing  transfer  is  the  percent  of  transfer  (Wickens  & 
Hollands,  2000)  in  which  the  amount  of  time  saved  in  learning 
the real-world task is expressed in terms of how much time a
control group, denied such training, required to learn the task
(see Equation 1):

%Transfer = [(Control Time - Transfer Time) / Control Time] * 100   (1)
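
Equation 1 can be sketched in a few lines. This is a minimal illustration, assuming that "Transfer Time" means the time the VE-trained group still needs on the real-world task; the function name and the hours used are hypothetical.

```python
def percent_transfer(control_time: float, transfer_time: float) -> float:
    """Equation 1: time saved on the real-world task, expressed as a
    percentage of the time the control group needed to learn it."""
    if control_time <= 0:
        raise ValueError("control_time must be positive")
    return (control_time - transfer_time) / control_time * 100.0


# Hypothetical figures: the control group needs 10 hours on the real task,
# while the group that first trained in the VE needs only 6 hours.
print(f"{percent_transfer(10.0, 6.0):.1f}% transfer")
```

A negative result would indicate negative transfer, that is, the VE-trained group needing more real-world time than the control group.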

A  quick  inspection  of  this  metric  suggests  that  it  is  overly 
simplistic.  Consider  a  situation  in  which  the  percent  transfer  is 
high,  yet  the  total  amount  of  time  spent  training  in  the  transfer 
platform (e.g., VE) may be greater than that spent by a control
group  to  achieve  similar  levels  of  performance.  Such  an  out¬ 
come  is  hardly  cost  effective.  Consequently,  in  parallel  to  a  ba¬ 
sic  quantification  of  transfer,  another  metric,  the  Training  Effec¬ 
tiveness  Ratio  (TER;  Povenmire  &  Roscoe,  1973)  focusing  on 
training efficiency is needed (see Equation 2).

TER = Amount of time saved learning task by Transfer Group /
      Time spent in transfer platform   (2)
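
The following sketch works Equation 2 through with made-up numbers to show the situation described above, in which percent transfer looks favorable while total training time does not. The function name and all hours are hypothetical assumptions.

```python
def training_effectiveness_ratio(control_time: float,
                                 transfer_time: float,
                                 time_in_transfer_platform: float) -> float:
    """Equation 2: real-world time saved by the transfer group per unit of
    time that group spent in the transfer platform (e.g., a VE)."""
    if time_in_transfer_platform <= 0:
        raise ValueError("time_in_transfer_platform must be positive")
    return (control_time - transfer_time) / time_in_transfer_platform


# Hypothetical figures: 10 h for the control group, 6 h of real-world time
# for the VE-trained group, which also spent 8 h in the VE beforehand.
control, transfer, ve_hours = 10.0, 6.0, 8.0
ter = training_effectiveness_ratio(control, transfer, ve_hours)
total_ve_group_time = transfer + ve_hours

print(f"TER = {ter:.2f}")
print(f"Total time, VE group: {total_ve_group_time} h vs control: {control} h")
```

With these assumed figures each hour in the VE saves only half an hour of real-world training, so the VE group spends more time overall even though 40% of the real-world time was saved.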

All  things  being  equal,  these  two  measures  should  perfectly 
quantify  VE  utility  as  a  training  tool.  Yet,  a  third  factor,  cost,  al¬ 
most  always  factors  in.  Indeed,  cost  is  often  the  single  most  crit¬ 
ical  driving  factor  in  determining  whether  to  supplement  a  train¬ 
ing  program  with  VE  tools.  In  terms  of  training  effectiveness, 
cost can be quantified using the Training Cost Ratio (TCR; Wickens
& Hollands, 2000; see Equation 3).

TCR = Training cost in real platform /
      Training cost in transfer platform   (3)

When making the determination to use a VE to provide
some  (or,  less  likely,  all)  of  a  given  set  of  training,  one  must  con¬ 
sider  the  interplay  between  these  measures.  Specifically,  the  re¬ 
lationship  between  training  effectiveness,  as  expressed  through 
the  TER,  and  training  cost,  as  expressed  through  the  TCR,  will 
drive this decision. A good rule of thumb, provided by Wickens
and Hollands (2000), is:

TER × TCR > 1.0, VE training is effective   (4)

TER × TCR < 1.0, VE training is ineffective   (5)
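
Equations 3 through 5 combine into a simple check, sketched below. The function names and the hourly costs are hypothetical. The source rule says nothing about a product of exactly 1.0, so labeling that case "break-even" is an assumption of this sketch.

```python
def training_cost_ratio(cost_real: float, cost_transfer: float) -> float:
    """Equation 3: cost of training in the real platform divided by the
    cost of training in the transfer platform (same unit, e.g., per hour)."""
    if cost_transfer <= 0:
        raise ValueError("cost_transfer must be positive")
    return cost_real / cost_transfer


def ve_training_verdict(ter: float, tcr: float) -> tuple[float, str]:
    """Equations 4 and 5: compare the product TER x TCR against 1.0."""
    product = ter * tcr
    if product > 1.0:
        verdict = "effective"
    elif product < 1.0:
        verdict = "ineffective"
    else:
        verdict = "break-even"  # not covered by the rule; assumption here
    return product, verdict


# Hypothetical figures: an aircraft hour costs 4000, a VE hour costs 500,
# and each VE hour saves half an hour of aircraft time (TER = 0.5).
tcr = training_cost_ratio(4000.0, 500.0)
product, verdict = ve_training_verdict(0.5, tcr)
print(f"TCR = {tcr:.1f}, TER x TCR = {product:.1f}: VE training is {verdict}")
```

The example shows why a TER below 1.0 need not rule out a VE: a sufficiently cheap transfer platform can still push the product above the threshold.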

A  slightly  different  way  of  interpreting  this  was  provided  by 
Roscoe  (1982)  who  suggested  that,  given  that  the  higher  the 
level  of  technical  fidelity  the  more  costly  the  system,  there  is  a 
region  of  design  space  within  which  cost/benefit  is  optimal; 
move  away  from  this  region  in  either  direction,  and  you  run  the 
risk  of  paying  a  lot  for  a  system  that  delivers  less  than  effective 
(and  perhaps  even  negative)  training. 

This  brief  overview  should  not  leave  one  with  the  impression 
that the decision to use, or not to use, VEs for training is simple.
Assessing the TER * TCR function is no easy matter. Simply
put,  the  cost  of  performing  the  study  needed  to  capture  the  ef¬ 
fectiveness  of  a  training  device  can  far  outstrip  the  perceived  im¬ 
mediate  impact  of  incorporating  the  device  into  a  training  cur¬ 
riculum.  Often,  the  type  of  system  being  developed  requires  a 
significant  investment  in  time,  money,  and  labor  in  order  to  be 
evaluated.  Consider  a  situation  in  which  a  new  flight  simulator  is 
being considered for purchase. In order to determine both the
TER  and  the  TCR,  one  must  conduct  an  actual  transfer  study, 
which  involves 

1.  Removing  a  set  of  trainees  from  their  curriculum  for  experi¬ 
mental  purposes; 

2.  Developing  reliable  performance  metrics; 

3.  Providing  both  the  experiment  group  and  the  control 
group  with  adequate  time  in  the  real  world  (e.g.,  aircraft) 
environment  and  the  experiment  group  time  in  the  VE; 

4.  Dealing  with  typically  low  effect  sizes,  which  forces  experi¬ 
menters  to  use  large  sample  sizes. 
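
The last step can be made concrete with a rough sample-size estimate. The sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2((z_alpha/2 + z_power) / d)^2, where d is the standardized effect size. It is not taken from the source, and the function name, significance level, and power are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def trainees_per_group(effect_size_d: float,
                       alpha: float = 0.05,
                       power: float = 0.80) -> int:
    """Approximate number of trainees needed in each of two groups to detect
    a standardized mean difference d with a two-sided test (normal
    approximation; a t-based calculation would give slightly larger n)."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Conventional "small", "medium", and "large" effect sizes.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {trainees_per_group(d)} trainees per group")
```

Under these assumptions a small effect calls for several hundred trainees per group, which illustrates why a full transfer study can cost more than the device it is meant to evaluate.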

Yet,  just  as  the  TER  *  TCR  function  could  guide  when  to  use 
and  not  use  VEs,  it  is  possible  to  pull  some  general  principles 
from  the  literature.  For  example,  cognitively  oriented  training, 
which  often  features  problem  solving  or  decision  making  as  a 
key  training  goal,  may  oftentimes  be  satisfied  by  using  simple, 
relatively  inexpensive  visualization  systems  (Gopher,  Weil,  & 
Bareket,  1994;  Morris  &  Tarr,  2002;  Stone,  2002;  Figueroa, 
Bischof, Boulanger, & Hoover, 2005; Milham, Hale, Stanney,
Cohn, Darken, & Sullivan, 2004). In contrast, motor-skill-based
training, which often requires complex interactions between
users and their environments, frequently requires more
costly  solutions.  When  considering  whether  or  not  to  “go  vir¬ 
tual,”  a  useful  solution  would  be  to  first  determine  the  general 
category of skill to be trained (cognitive or motor; cf. Anderson,
1987); estimate the TER and TCR from sources as broadly
ranging  as  basic  science  studies,  published  costs  of  similarly  de¬ 
sired  systems,  and  one’s  own  best  estimates;  and  then  develop 
a  basic  decision  matrix  to  establish  a  range  of  TER  *  TCR  values. 
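
A basic decision matrix of this kind might look like the sketch below, which crosses pessimistic, best-guess, and optimistic estimates of TER and TCR and reports the resulting range of products. All numbers and names are hypothetical placeholders, not values from the literature.

```python
from itertools import product


def ter_tcr_matrix(ter_estimates, tcr_estimates):
    """Return {(ter, tcr): ter * tcr} for every pairing of the estimates."""
    return {(ter, tcr): ter * tcr
            for ter, tcr in product(ter_estimates, tcr_estimates)}


# Hypothetical pessimistic, best-guess, and optimistic estimates.
ter_estimates = [0.3, 0.5, 0.8]
tcr_estimates = [1.5, 3.0, 6.0]

matrix = ter_tcr_matrix(ter_estimates, tcr_estimates)
lowest, highest = min(matrix.values()), max(matrix.values())
favorable = sum(value > 1.0 for value in matrix.values())

print(f"TER * TCR ranges from {lowest:.2f} to {highest:.2f}")
print(f"{favorable} of {len(matrix)} estimate pairings exceed 1.0")
```

If most pairings fall above 1.0, the case for a VE holds up even under pessimistic assumptions; if the outcome flips with small changes in the estimates, better data are needed before committing.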

VE  as  Entertainment  Tool 

In  many  ways  entertainment  can  be  considered  a  prime  tech¬ 
nology  pusher.  Consider  the  evolution  of  the  VCR,  Digital  Video 
Disk,  High  Definition  Television  and,  more  recently,  the  advent 
of the all-in-one device (phone, PDA, music/video player) in various
forms. Each of these devices captured a unique market
niche  precisely  because  they  pushed  ahead  current  technology 
barriers  in  ways  never  before  considered.  Yet,  if  entertainment 
provides  the  technology  push  necessary  to  achieve  revolutions, 
the  consumers  of  such  technologies  provide  an  equally  strong 
technology  pull,  placing  discrete  constraints  on  this  transfor¬ 
mation.  Often,  such  constraints  are  directly  at  odds  with  enter¬ 
tainment’s  evolution.  An  entertainment  developer  may  create  a 
new  game  that  requires  a  high-end  graphics  system,  a  costly 
sound  system,  and  perhaps  even  a  tactile  (haptic)  interface  to 
fully  take  advantage  of  the  system.  The  consumers,  on  the  other 
hand,  may  have  their  own  satisficing  heuristic,  requiring  that, 
in  addition  to  providing  novel  entertainment  value,  any  new 
technology  must  cost  less  than  a  fixed  amount  and  fit  into  their 
back  pockets.  Unless  there  is  significant  demand  for  a  technol¬ 
ogy  to  factor  these  constraints  into  its  development  (e.g.,  a  sig¬ 
nificant  consumer  base)  the  resultant  product  may  either  be 
forestalled  or,  when  placed  on  the  market,  flounder  and  vanish 
overnight. 


Until  recently,  the  high  cost  of  the  component  technologies 
comprising VEs precluded their becoming anything more than
a  research  test-bed  for  well-funded  labs  or  an  exclusive  tool  for 
high-end  consumers,  like  commercial  airline  companies  or  the 
military  establishments  of  entire  countries.  Yet,  after  nearly  four 
decades  of  research  and  development,  VEs  have  finally  begun 
to  realize  their  potential.  This  maturation  owes  much  to  ad¬ 
vancements  in  the  computer  industry  which,  proceeding  at  a 
Moore’s  Law  rate,  has  been  able  to  evolve  the  simple  desktop 
PC  into  a  powerful  graphical  rendering  system.  As  technology 
barriers  have  been  shattered,  so  too  have  many  of  the  HCI  chal¬ 
lenges  that  were  typically  associated  with  earlier  technology- 
limited  systems,  thus  resulting  in  a  number  of  entertainment- 
based  applications.  Current  VE  entertainment  applications 
include  games,  online  communities,  location-based  entertain¬ 
ment,  theme  parks,  and  other  venues  (Badique  et  al.,  2002). 
From  interactive  arcades  to  cybercafes,  the  entertainment  in¬ 
dustry  has  leveraged  the  unique  characteristics  of  the  VE 
medium, providing dynamic and exciting experiences. Such applications
tend to set the pace, both technologically and in terms
of  content,  for  advances  in  VE  technology  in  general.  Loosely 
put,  the  goal  of  such  entertainment  technologies  is  to  provide 
consumers  with  experiences  that  they  would  not  ordinarily  en¬ 
counter  in  their  typical  daily  activities.  Consequently,  the  goals 
and  objectives  of  entertainment-based  systems  are  quite 
broadly  defined  (and,  sometimes,  not  necessarily  defined  at 
all!)  and  shaped  primarily  through  user  feedback  vis-a-vis  fac¬ 
tors  such  as  “ease  of  use,”  “engagement  level,”  “interest  level,” 
and  so  forth.  These  factors  are  molded  to  a  large  extent  by  pop¬ 
ular  culture.  By  attending  to  these  global  trends,  developers 
can  determine  what  technologies  exist,  or  need  to  be  created, 
to support current user demand.

Thus, in spite of the optimism
generated  by  such  entertainment  applications,  a  cautionary 
note  is  warranted.  The  uses  for  which  typical  VEs  have  been, 
and  continue  to  be,  considered  (e.g.,  primarily  as  training 
tools)  have  a  strong  scientific  basis  upon  which  to  define  sys¬ 
tem  specifications,  and  a  lengthy  history  linking  these  specifi¬ 
cations  to  specific  technologies.  Consider,  for  example,  the  pre¬ 
requisites  for  designing  a  system  to  train  a  Naval  Officer  to 
maneuver  a  ship  in  a  harbor  (Hays,  Castillo,  Seamon,  &  Bradley, 
1997;  Nguyen,  Cohn,  Mead,  Helmick,  &  Patrey,  2001).  Since  the 
trainees  will,  ultimately,  be  graded  on  their  performances,  de¬ 
velopers  must  identify  the  necessary  training  goals  and  objec¬ 
tives  (e.g.,  Osgood’s  [1949]  identical  elements),  identify  the 
necessary technologies (or develop them if absent), integrate
them,  test  and  validate  their  training  efficacy,  and  ultimately 
hand  them  off  to  be  used.  Such  a  developmental  process  can 
require  tens  of  millions  of  dollars,  time  investments  on  the  or¬ 
der  of  years,  and  personnel  investments  numbering  in  the 
dozens.  Nevertheless,  precisely  because  training  is  perceived  as 
(and  often  is)  critical  to  a  given  organization’s  success,  train¬ 
ing  requirements  can  provide  a  justification  for  such  large  in¬ 
vestments.  On  the  other  hand,  in  entertainment,  the  driving 
force  is  delivering  customer  satisfaction  ahead  of  the  competi¬ 
tion.  As  a  result,  it  is  quite  likely  that  the  vast  majority  of  enter¬ 
tainment  developers  will  assume  less  risk  when  developing 
their  products  in  order  to  cater  to  a  constantly  shifting  market 
interest. 

VE as a Medical Tool

Much  has  been  written  about  applications  for  VE  within  the 
medical arena (cf. Moline, 1995; Satava & Jones, 2002). While
some  of  these  applications  represent  unique  approaches  to  har¬ 
nessing  the  power  of  VE,  many  other  applications,  such  as  sim¬ 
ulating  actual  medical  procedures,  reflect  training  applications, 
and  therefore  will  not  be  discussed  anew  here.  One  area  of 
medical  application  for  which  VE  is  truly  coming  into  its  own  is 
medical  rehabilitation.  In  particular,  two  areas  of  rehabilitation, 
psychological/cognitive  and  motor,  show  strong  promise. 

Psychological/cognitive  rehabilitation  applications. 

Perhaps  the  fastest-growing  application  for  VEs  in  medicine  is  in 
the  area  of  phobia  treatment  (Rizzo,  Buckwalter,  &  van  der  Zaag, 
2002; Satava & Jones, 2002; Thacker, 2003). This
is likely due to the ideal match between VE’s strengths (presenting
evolving information with which users can interact in
various ways) and such therapy’s basic requirements (incremental
exposure to the offending environment). Importantly,
compared  to  previous  treatment  regimens,  which  oftentimes 
simply  required  patients  to  mentally  revisit  their  fears,  VEs  of¬ 
fer  a  significantly  more  immersive  experience.  In  fact,  it  is  quite 
likely  that  many  of  VE’s  shortcomings,  such  as  poor  visual  reso¬ 
lution,  inadequate  physics  modeling  underlying  environmental 
cues,  and  failure  to  fully  capture  the  wide  range  of  sensorial 
cues  present  in  the  real  world  will  be  ignored  by  the  patient, 
whose  primary  focus  is  on  overcoming  anxiety  engendered  by 
her  or  his  specific  phobias.  On  a  practical  level,  VEs  enable  pa¬ 
tients  to  simply  visit  their  therapist’s  office,  where  they  can  be 
provided  an  individually  tailored  multimodal  treatment  experi¬ 
ence  (Rothbaum,  Hodges,  Watson,  Kessler,  &  Opdyke,  1996; 
Emmelkamp, Bruynzeel, Drost, & van der Mast, 2001; Anderson,
Rothbaum,  &  Hodges,  2003). 

Motor rehabilitation applications. Many of the qualities
that make VE an ideal tool for providing medical training,
such as tactile feedback and detailed visual information (Satava
& Jones, 2002), also make it an ideal candidate for supplementing
motor rehabilitation treatment regimens for such conditions
as stroke, Parkinson’s disease, and pain (Holden,
Todorov, Callahan, & Bizzi, 1999; Gershon, Zimand, Lemos, Rothbaum,
& Hodges, 2003). In determining how best to apply VE in
these  and  other  treatment  regimens,  Holden  (2005)  suggested 
considering  three  practical  areas  in  which  VE  is  strongest:  rep¬ 
etition,  feedback,  and  motivation.  All  three  elements  are  criti¬ 
cal  to  both  effective  learning  and  regaining  of  motor  function; 
the  application  of  VE,  in  each  case,  provides  a  powerful  method 
for rehabilitation specialists to maximize the effect of a treatment
regimen for a given session, and, because it may reduce
the  time  investment  required  by  therapists  (one  can  simply  im¬ 
merse  the  patient,  initiate  the  treatment,  and  then  allow  the 
program  to  execute),  to  also  expand  the  access  of  such  treat¬ 
ments  to  a  wider  population. 

Since  VE  is  essentially  computer-based,  patients  can  effec¬ 
tively  have  their  attention  drawn  to  a  specific  set  of  movement 
patterns  they  may  need  to  make  to  regain  function;  conduct¬ 
ing  this  in  a  “loop”  provides  unlimited  ability  to  repeat  a  pattern, 
while using additional visualization aids, such as a rendered cursor
or “follow-me” types of cues, to force the patient into moving
a particular way (cf. Chua, Crivella, Daly, Hu, Schaaf, Ventura,
et al., 2003). In addition, it is a relatively simple matter to digitize
movement information, store it, and then, based on
comparisons to previously stored, desired movement patterns,
provide additional feedback to assist the patient. In terms of
motivation,  treatment  scenarios  can  be  tailored  to  capture  spe¬ 
cific  events  that  individual  patients  find  most  motivating:  a  base¬ 
ball  fan  can  practice  her  movement  in  a  baseball-like  scenario; 
a  car  enthusiast  can  do  so  in  a  driving  environment. 
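
The comparison of digitized movement against a stored, desired pattern can be sketched as follows. This is a minimal illustration rather than any published system's method. It assumes that both trajectories are sampled at the same instants as (x, y, z) positions in meters, and the function names, sample data, and 5 cm tolerance are all hypothetical.

```python
from math import sqrt


def rms_deviation(recorded, reference):
    """Root-mean-square distance between corresponding samples of a
    recorded trajectory and a stored reference trajectory."""
    if not recorded or len(recorded) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [(rx - fx) ** 2 + (ry - fy) ** 2 + (rz - fz) ** 2
               for (rx, ry, rz), (fx, fy, fz) in zip(recorded, reference)]
    return sqrt(sum(squared) / len(squared))


def movement_feedback(recorded, reference, tolerance_m=0.05):
    """Turn the deviation into a coarse message for the patient."""
    error = rms_deviation(recorded, reference)
    message = "on target" if error <= tolerance_m else "adjust movement"
    return error, message


# Hypothetical reach along the x axis, with the patient drifting in y.
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
recorded = [(0.0, 0.0, 0.0), (0.1, 0.03, 0.0), (0.2, 0.04, 0.0)]

error, message = movement_feedback(recorded, reference)
print(f"RMS deviation: {error:.3f} m, feedback: {message}")
```

A clinical system would also need to align trajectories in time and to choose tolerances per patient and per movement; both are left out of this sketch.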

There  are  certain  caveats  that  must  be  considered  when  ex¬ 
ploiting  VE  for  rehabilitation  purposes,  most  significantly  the 
potentially  rapid  loss  of  motor  adaptations  following  VE  expo¬ 
sure.  Lackner  and  DiZio  (1994)  demonstrated  that  certain  basic 
patterns  of  sensori-motor  recalibrations  learned  in  a  given  phys¬ 
ical  environment  can  diminish  within  an  hour,  postexposure,  al¬ 
though  subsequent  findings  (DiZio  &  Lackner,  1995)  suggest 
that  there  are  certain  transfer  benefits  that  are  longer-lasting. 
Brashers-Krug, Shadmehr, and Bizzi (1996) provided additional
evidence that sensori-motor recalibrations of the type
likely to be required for rehabilitation have postexposure periods,
in excess of four hours, during which their effects can be
extinguished. Most importantly, Cohn, DiZio, and Lackner
(2000) demonstrated that such recalibrations, when learned in
VE, have essentially no transfer to real-world conditions postexposure.
Clearly, more research is needed to understand the
conditions  under  which  such  transfer  effects  can  be  made  most 
effective  within  the  clinical  setting. 

VE  as  a  System  Acquisition  Tool 

In  many  areas  of  research  and  development,  rising  costs,  with 
their  associated  increasing  consequences  of  failure,  have  all  but 
forced  scientists  and  engineers  to  focus  only  on  developing 
those products that have an almost guaranteed chance of succeeding
in the marketplace. While this may seem like an ideal
situation (after all, who would want to field technologies that
won’t work?), the net result is an increasing aversion, among
companies and their financial backers, to shouldering much risk.
Consequently,  much-needed  breakthrough  technologies  may 
be  ignored  in  favor  of  “sure-thing”  technologies,  which  more 
often  than  not  do  little  to  push  ahead  technology  barriers. 

The  United  States  Department  of  Defense  (DoD),  seeking  to 
reverse  this  trend  while  maintaining  a  level  of  control  over  risk, 
developed  the  process  of  Simulation  Based  Acquisition  (SBA), 
a  methodology  that  introduced  the  use  of  modeling  and  simu¬ 
lation  (M&S)  within  the  product-development  process  (Sanders, 
1997;  Zittel,  2001).  Most  recently,  these  M&S  solutions  have 
become  increasingly  integrated  into  4D  interactive  tools  that 
enable  human-in-the-loop  testing  of  design  and  operational 
principles  (Sanders,  1997).  As  one  example  that  cuts  across  DoD 
and  industry,  consider  the  development  of  their  newest  fighter 
aircraft, the Joint Strike Fighter (JSF). While more common
M&S  solutions,  such  as  the  ability  to  simulate  individual  com¬ 
ponents  digitally  before  manufacturing  them  (through  the 
CAD/CAM  process),  formed  a  core  component  of  the  develop¬ 
ment  process,  more  advanced  M&S  solutions,  such  as  VEs,  have 
enabled developers to include users throughout the development
process, in order to answer such questions as (a) how
users  (e.g.,  pilots)  would  implement  the  fighter  in  combat,  us¬ 
ing  virtual  war-gaming  techniques;  (b)  how  well  individual  pilots 
could  handle  a  given  JSF  design;  and,  (c)  how  the  JSF  would 
alter  the  nature  of  warfare  (Zittel,  2001). 


CONCLUSIONS 


Clearly, VEs have evolved significantly from Morton Heilig’s 1962
Sensorama  experience  (Burdea  &  Coiffet,  2003).  Yet,  despite 
significant  revolutions  in  component  technology,  many  of  the 
challenges  addressed  by  that  primitive  system,  such  as  multi¬ 
modal  sensori-interaction,  visual  representation,  and  scenario 
generation,  have  yet  to  be  fully  resolved  more  than  40  years 
later.  At  the  same  time,  our  understanding  of  the  potential  such 
tools have to offer has advanced considerably. No longer simple
amusements, these powerful machines can provide educational
value, assist in treating core physical and cognitive maladies,
and even help design better VE systems. As the uses for
which  VEs  are  ideally  suited  continue  to  be  defined  and  refined, 
one  can  anticipate  that  current  development  challenges  will  be 
resolved,  only  to  find  new  ones  waiting  to  take  their  place. 
INTRODUCTION


What  do  we  mean  when  we  say  that  we  trust  a  computer  system? 
While  trust  is  a  complex  concept  with  many  different  meanings 
in  behavioral  science  literature,  it  has  a  specific  meaning  when 
applied  to  a  person’s  interaction  with  a  technology  system.  Trust 
in  a  system  is  the  extent  to  which  a  system  behaves  as  the  user 
expects.  Put  another  way,  it  is  a  measure  of  how  much  users  be¬ 
lieve  that  the  system  will  do  what  they  ask  it  to  do  without  caus¬ 
ing  any  harm.  This  view  suggests  that  people  can  enter  an  inter¬ 
action  with  a  system  with  some  ideas  about  what  the  interaction 
will  involve,  and  that  their  trust  in  the  system  can  either  increase 
or  decrease  with  experience.  Trust  becomes  important,  particu¬ 
larly  for  people  or  organizations  that  rely  on  people  wanting  to 
use  their  websites  or  products,  because  without  trust  in  a  sys¬ 
tem,  people  will  find  other  ways  to  conduct  business.  Parts  of  the 
expectations  people  have  about  systems  are  related  to  the  in¬ 
tended  function  or  purpose  of  the  system.  They  trust  that  ATMs 
will  correctly  distribute  money  and  debit  their  accounts  appro¬ 
priately.  They  expect  that  merchandise  ordered  through  web¬ 
sites  will  arrive  safely  at  their  homes.  However,  there  are  also 
expectations  about  the  handling  of  information  they  exchange 
with  the  systems.  They  assume  that  transactions  are  secure — 
that  they  know  with  whom  they  are  dealing  and  that  the  infor¬ 
mation  reaches  only  the  intended  target.  They  also  assume  that 
their  privacy  is  respected — that  promises  the  owners  of  the  sys¬ 
tem  make  about  how  the  information  will  be  used  or  shared  are 
kept.  This  intersection  of  trust,  security,  and  privacy  is  becom¬ 
ing  increasingly  important,  as  information  technology  becomes 
a  more  pervasive  part  of  our  lives. 

As we will discuss below, there can be many aspects of and influences
on the trust placed by a user in a computer system. We
will  not  try  to  address  them  all  in  this  chapter,  but  will  focus  on 
two  important  contributing  factors  associated  with  the  risks  of 
harm  one  might  have  in  interacting  with  a  system.  When  people 
interact  with  systems,  they  assume  that  doing  so  will  not  cause 
them any harm. They expect (that is, they trust) that the interactions
are truthful and without hidden consequences. If they
interact with websites, they assume that each represents communication
with a specific person or organization. If they disclose
information  to  the  websites,  they  assume  that  the  disclosures 
will  only  reach  the  intended  targets  and  that  only  appropriate 
use  will  be  made  of  the  information. 

We  will  consider  security  as  the  concept  associated  with  how 
well  a  system  can  protect  information  it  contains.  Many  factors 
are  associated  with  security  in  systems  that  interact  with  hu¬ 
mans.  Perhaps  the  most  important  of  these  is  authentication — 
how  a  system  and  a  user  can  be  confident  of  each  other’s  iden¬ 
tities.  If  users  and  systems  have  confidence  that  they  know  the 
identities of the others, we generally assume that neither will attempt
to do harm to the other. While research in system security
is often aimed at minimizing risks associated with malicious
attacks  on  systems,  for  HCI  we  will  focus  on  the  tradeoffs  be¬ 
tween  rigorous  authentication  and  ease  of  interaction. 

The  concept  of  privacy  goes  beyond  security  to  examine  how 
well  the  use  of  information  that  the  system  acquires  about  a 
user  conforms  to  the  explicit  or  implicit  assumptions  regarding 
that use associated with the personal information. We would
like to draw an important distinction when discussing privacy
from  an  HCI  view.  In  general,  studies  have  looked  at  privacy 
from  two  different  perspectives.  From  an  end-user  perspective, 
privacy  can  be  considered  as  restricting  access  to  personal  in¬ 
formation,  or  it  can  be  viewed  as  controlling  use  of  personal  in¬ 
formation.  In  the  former,  the  user  expresses  a  wish  to  be  left 
alone;  in  the  latter  the  wish  is  to  use  information  according  to 
expressed  wishes.  While  much  has  been  said  about  the  end  of 
privacy  in  the  pervasive  computing  world — meaning  that  so 
much  is  known  about  each  of  us  that  it  is  futile  to  worry  about 
privacy — this  has  really  been  an  attempt  to  highlight  the  diffi¬ 
culty  individuals  face  in  trying  to  remain  anonymous  (the  first 
consideration above). While electronic surveillance is increasingly
common, this does not mean that people have to give up
their  rights  to  control  the  use  of  information  collected  about 
them.  It  is  our  assumption  that  collection  of  more  and  more 
data  about  us  is  a  trend  that  will  continue.  However,  we  also  as¬ 
sume  that  legislation  will  support  people’s  rights  to  ensure  that 
appropriate  use  is  made  of  data  collected.  In  considering  pri¬ 
vacy,  we  will  generally  assume  that  security  in  a  system  is  ade¬ 
quate.  Thus,  we  will  view  data  protection  failures  associated 
with  unauthorized  access  to  information  to  be  security  failures 
or  breaches,  and  those  associated  with  noncompliance  to  stated 
privacy  policies  to  be  privacy  breaches.  We  will  explore  the  pri¬ 
vacy  issues  in  more  detail  below.  For  the  purposes  of  this  chap¬ 
ter,  a  simple  but  useful  definition  of  privacy  is  “the  ability  of  in¬ 
dividuals  to  control  the  terms  under  which  their  personal 
information  is  acquired  and  used”  (Culnan,  2000,  p.  21). 

In summary, security involves technology to ensure that information
is appropriately protected. It involves users in that security
features, such as passwords for access control or encryption
to prevent unintended disclosure, often place requirements
on users in order to function correctly. Password schemes must
be  hard  to  guess  but  should  be  easy  to  remember.  Encryption 
mechanisms  must  prevent  unintended  decryption  but  be  trans¬ 
parent  (or  nearly  so)  to  intended  senders  and  recipients.  Privacy 
involves  mechanisms  to  support  compliance  with  some  basic 
principles  (e.g.,  OECD,  1980;  U.S.  Fair  Credit  Reporting  Act  of 
1978,  1978).  These  suggest  that  people  should  be  informed 
about  information  collection,  told  in  advance  what  will  be  done 
with  their  information,  and  given  a  reasonable  opportunity  to  ap¬ 
prove  of  such  use  of  information.  Trust  is  seen  as  increasing 
when  security  and  privacy  are  provided  for.  Without  trust,  it  is 
perceived  that  people  will  be  less  likely  to  use  systems. 

As  a  part  of  our  research,  we  identified  connections  between 
the concepts of privacy, security, and trust as they applied to interaction
with organizations that collect personal information (PI).
This work (Karat, J., Karat, C., Brodie, & Feng, 2005) employed
a  Contextual  Design  method  that  enabled  us  to  identify  themes 
present  in  data  and  to  understand  the  relationships  among 
them.  One  theme  involved  the  relationship  of  privacy  to  other 
concepts.  Many  of  the  participants  we  talked  with  discussed  how 
privacy  related  to  other  concepts  such  as  security,  personaliza¬ 
tion,  trust,  and  education.  Fig.  32.1  below  shows  how  each  of 
these  concepts  relates  to  privacy.  Good  security  enables  and  is 
a  building  block  for  privacy,  and  the  interviewees  have  noted 
that,  as  they  focus  on  managing  privacy  in  their  organizations, 
they  have  found  ways  to  enhance  their  security  as  well.  The  trust 
that  data  subjects  have  in  an  organization  is  paramount,  and 
FIGURE 32.1. The relationships between privacy and other
concepts. 

providing  privacy  protection  is  a  critical  means  of  ensuring  that 
trust.  Privacy  education  is  important  for  both  the  organizations 
and  their  external  users.  Many  interviewees  stated  that  privacy 
education  of  their  employees  is  an  ongoing  priority  now  and 
that  the  privacy  of  data  subjects’  PI  cannot  be  effectively  pro¬ 
tected  until  employees  (the  data  users)  believe  it  is  important 
and  follow  through  in  their  daily  actions.  Many  have  initiated 
training  programs  and  have  frequently  asked  questions  (FAQs) 
available  for  employees  to  access  to  understand  how  to  handle 
PI  correctly.  The  interviewees  in  the  current  research  study 
stated  that  their  customers  would  only  provide  PI  on  the  condi¬ 
tion  that  it  would  be  protected  and  not  misused,  and  this  is  a 
critical  element  in  personalization.  Privacy  is  woven  into  organi¬ 
zations’  business  processes.  Interviewees  stated  that  they  are 
reexamining  their  business  processes  in  order  to  protect  the 
personally  identifying  information  (PII)  of  their  employees,  cus¬ 
tomers,  constituents,  and  patients.  They  are  finding  redundan¬ 
cies  in  the  collection  of  PII  and  are  realizing  financial  benefits 
from streamlining their privacy management processes.

The  privacy  and  security  functionality  of  the  applications  that 
users  experience  must  also  be  usable  to  gain  the  user’s  trust. 
There  are  unique  challenges  in  making  these  capabilities  usable. 
We  will  explore  these  issues  before  discussing  the  research  in 
privacy,  security,  and  trust. 


HCI  AND  USABILITY  IN  THE  PRIVACY, 
SECURITY  AND  TRUST  DOMAINS 


While the core usability goals of understanding the user, their
tasks,  and  the  context  of  use  apply  across  a  wide  range  of  do¬ 
mains,  unique  aspects  of  privacy  and  security  present  challenges 
and opportunities when designing security and privacy functionality.
First, a key issue to consider is that the use of security
and privacy solutions is not the user’s main goal. Users value
and  want  security  and  privacy  functionality,  and  they  are  not 
likely  to  trust  systems  that  do  not  provide  them,  but  they  regard 
them  as  only  secondary  to  completing  their  primary  tasks  (e.g., 
completing  an  online  banking  transaction,  ordering  medica¬ 
tions).  The  central  conflict  in  the  minds  of  many  is  that  security 
mechanisms  are  seen  as  making  operations  harder,  while  us¬ 
ability  is  focused  on  making  operations  easier.  The  user  would 
like  the  solutions  to  be  as  transparent  as  possible,  but  users  do 
want  to  be  in  control  and  understand  what  is  occurring.  Therein 
lies  the  HCI  challenge.  Thus,  the  display  of  information  and  the 
interaction  methods  related  to  security  and  privacy  solutions 
need  to  be  accessible  when  desired  by  the  user. 

Second,  as  more  of  people’s  interactions  in  daily  life  involve 
the  use  of  computing  technology  and  sensitive  information,  dis¬ 
parate  types  of  users  must  be  accommodated.  Security  solutions 
in  particular  have  historically  been  designed  with  a  highly 
trained  technical  user  in  mind.  The  user  community  has  broad¬ 
ened  extensively  as  organizational  business  processes  have 
come  to  include  new  types  of  roles  and  users  in  the  security  and 
privacy  area.  Many  compliance  and  policy  roles  in  organizations 
are  handled  by  legal  and  business  process  experts  who  have  lim¬ 
ited  technical  skills.  Moreover,  the  end  user  base  includes  vir¬ 
tually  everyone  in  the  population.  The  functionality  provided  by 
the  system  for  people  with  different  roles  must  accommodate 
the  skill  sets  of  each.  Security  and  privacy  are  requirements  for 
doing  business  as  an  organization  and  must  be  done  well,  or  the 
organization  may  lose  the  user  as  a  customer  or  worse. 

Third,  the  risk  of  the  negative  impact  of  usability  problems 
is  higher  for  security  and  privacy  applications  than  for  many 
other  types  of  systems.  Although  complexity  is  at  the  very  heart 
of  many  security  and  privacy  solutions,  from  an  HCI  point  of 
view that complexity is really the enemy of the success of security
and privacy. If the system is so complex that the various
user groups (e.g., technical users, business users, and end
users) cannot understand it, then costly errors will occur. There
is a saying in the HCI field that if the user cannot understand
functionality, it doesn’t exist. In the case of security and privacy,
badly designed functionality may put users at more risk than if
they  used  less  sophisticated  solutions.  The  increased  risk  of  er¬ 
rors  in  this  domain  provides  an  even  greater  incentive  to  in¬ 
clude  HCI  work  in  system  research  and  development.  The  user 
issues  in  the  domain  provide  unique  technical  challenges  to  ar¬ 
chitects  to  design  the  solutions  to  be  simple  and  effective. 

Fourth,  users  will  need  to  be  able  to  easily  update  security 
and  privacy  solutions  to  accommodate  frequent  changes  in  leg¬ 
islation  and  regulations.  Additionally,  different  domains  (e.g., 
health  care,  banking,  government)  and  geographies  will  have 
unique  requirements.  Systems  must  be  designed  to  enable  easy 
and effective updates to them. While this list is not exhaustive,
these challenges provide a unique focus and strong incentive
to include HCI in the system lifecycle. We discuss the valuable
role  that  HCI  can  play  in  more  detail  below. 

Views  on  Privacy 

The  rapid  advancement  of  the  use  of  information  technology 
in  industry,  government,  and  academia  makes  it  much  easier  to 
collect, transfer, and store PI around the world. This raises challenging
questions and problems regarding the use and protection
of PI (Kobsa, 2002). Questions of who has what rights to
information  about  us  for  what  purposes  become  more  impor¬ 
tant  as  we  move  toward  a  world  in  which  it  is  technically  possi¬ 
ble  to  know  just  about  anything  about  just  about  anyone.  As 
Adams  and  Sasse  (2001,  p.  49)  stated,  “Most  invasions  of  pri¬ 
vacy  are  not  intentional  but  due  to  designers’  inability  to  antici¬ 
pate  how  this  data  could  be  used,  by  whom,  and  how  this  might 
affect  users.”  Deciding  how  we  are  to  design  privacy  considera¬ 
tions  in  technology  for  the  future  includes  philosophical,  legal, 
and  practical  dimensions — any  or  all  of  which  can  be  considered 
as within the domain of the field of HCI.

We provide a description of our scope in addressing this
area,  since  privacy  can  and  does  mean  different  things  to  differ¬ 
ent  people.  We  are  primarily  focused  on  a  view  of  privacy  as  the 
right  of  an  individual  to  control  personal  information  use  rather 
than  as  the  right  to  individual  isolation  (National  Research 
Council,  2003;  OECD,  1980;  U.S.  Fair  and  Accurate  Credit  Trans¬ 
action  Act,  2003).  Organizations  commonly  provide  a  descrip¬ 
tion  of  what  kind  of  information  they  will  collect  and  how  they 
will  use  it  in  privacy  policies.  In  some  areas  (e.g.,  the  collection 
and  use  of  health  care  information  in  the  United  States  or  move¬ 
ment  of  personal  information  across  national  boundaries  in  Eu¬ 
rope),  such  policies  can  be  required,  though  the  content  of  the 
policy  is  not  generally  specified  in  legislation.  While  there  has 
been  considerable  consensus  around  a  set  of  high-level  privacy 
principles  for  information  technology  (OECD,  1980),  we  do  not 
think  it  is  likely  that  a  single  privacy  policy  can  be  created  to  ad¬ 
dress  all  information  privacy  needs.  For  example,  there  will 
likely  be  considerable  differences  in  privacy  legislation  in  differ¬ 
ent  regions  of  the  world  (Manny,  2003).  Similarly,  organizations 
in  different  fields  (e.g.,  healthcare,  banking,  government)  need 
to  tailor  policies  to  their  domains  and  needs  (Ball,  2003;  Baumer, 
Earp,  &  Payton,  2000).  While  we  will  focus  on  privacy  policy,  we 
acknowledge  that  privacy  is  not  entirely  about  “setting  rules  and 
enforcing  them”  (Palen  &  Dourish,  2002).  To  implement  privacy 
within  an  organization,  the  coordination  of  people,  business 
processes,  and  technology  is  required.  Still  we  do  believe  that 
such  policies  are  essential  when  interacting  with  technology  and 
organizations  in  that  they  enable  people  to  better  understand 
the  boundaries  between  public  and  private  information  and 
technology  (Altman,  1975). 

It  is  interesting  to  note  that  while  privacy  policies  are  not 
new  to  most  organizations,  very  little  has  been  done  to  imple¬ 
ment  the  policies  through  technology  (Smith,  1993).  There  are 
emerging  standards  for  privacy  policies  on  websites  (Cranor, 
2002),  but  these  address  machine-readable  policy  content  with¬ 
out  specifying  how  the  policy  might  be  created  or  implemented. 
The  reality  is  that  there  is  very  little  capability  to  have  technol¬ 
ogy  actually  implement  access  and  disclosure  limitations  that  we 
might  expect  from  a  policy  statement  like  “We  will  not  share 
your  information  with  a  third  party  without  your  consent.”  The 
emerging  focus  is  on  how  organizations  could  create  a  wide 
range  of  policies,  and  how  technology  might  enable  the  policies 
to be enforced and audited for compliance. J. Karat, C. Brodie,
C. Karat, and J. Feng (2005) focused on technology to enable usable
privacy policy authoring and enforcement, rather than trying
to directly address what privacy rights people should have
(e.g., Warren & Brandeis, 1890) or how to deidentify information
such as video stored in systems (e.g., Senior, Pankanti,
Hampapur, Brown, Tian, & Ekin, 2004). We think privacy is an
important  social  issue  and  that  technology  can  enable  flexible, 
reliable, and accountable privacy policy (i.e., be privacy enabling)
and not just be a force that reduces individual rights.
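
To make the enforcement gap concrete, the following is a minimal Python sketch of how a statement such as “We will not share your information with a third party without your consent” might be checked and audited in software. It is an illustration only, not the authoring or enforcement technology described by Karat et al.; the names DataSubject, may_disclose, disclose, and audit_log are hypothetical, and a real system would need far richer notions of purpose, recipient, and data category.

```python
from dataclasses import dataclass, field


@dataclass
class DataSubject:
    """A person whose PI is held by the organization (hypothetical model)."""
    name: str
    # Third parties the subject has explicitly consented to (assumed field).
    consented_recipients: set = field(default_factory=set)


def may_disclose(subject, recipient, is_third_party):
    """Return True only if the one policy rule modeled here allows disclosure."""
    if not is_third_party:
        # Internal use is outside the scope of this single rule.
        return True
    return recipient in subject.consented_recipients


# Every decision is recorded so that compliance can be audited later.
audit_log = []


def disclose(subject, recipient, is_third_party=True):
    """Check the rule, log the decision, and report whether it was allowed."""
    allowed = may_disclose(subject, recipient, is_third_party)
    audit_log.append((subject.name, recipient, allowed))
    return allowed
```

The point of the sketch is that the rule and the audit trail live in the same code path: a disclosure cannot happen without a decision being recorded, which is the kind of link between policy and practice that the text argues is largely missing.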

The  current  situation  with  respect  to  privacy  in  organizations 
is  described  in  a  Forrester  report  (Hagen,  2000).  This  research  re¬ 
veals  a  mismatch  between  consumer  demands  for  privacy  and  en¬ 
terprise  practices  in  industry.  According  to  this  report,  although 
customer  concerns  about  privacy  remain  high,  the  majority  of 
executives  (58%)  believe  privacy  issues  are  addressed  extremely 
well  by  their  companies.  Most  executives  do  not  know  whether 
their  customers  check  the  privacy  policies  or  not  and  few  see  the 
need to enhance their privacy practices. Research in the Asia-Pacific
region complements these results (Office of the Federal Privacy
Commissioner of Australia, 2000). With these results in mind,
we  suggest  that  privacy  protection  must  extend  across  the  net¬ 
work  into  enterprise  processes  and  that  there  is  a  need  to  audit 
data  collection  and  sharing  mechanisms.  We  agree  that  technol¬ 
ogy  design  generally  does  reflect  concerns  of  society  in  general 
(Ackerman  &  Mainwaring,  2005),  and  believe  that  we  are  experi¬ 
encing  a  shift  toward  a  greater  concern  for  privacy  in  IT  design. 

Central  to  our  view  of  privacy  is  the  notion  that  the  parties 
involved  in  information  exchanges  have  implicit  or  explicit  poli¬ 
cies  with  regard  to  the  use  of  the  information.  This  applies  both 
to the person who the information is about and to the organization
collecting and using the information. In the privacy literature
on organizations, while there has been some attention
given to the generally implicit policies of end users whose PI is
being collected and used (often called data subjects), it is
mostly the policies of the organization collecting the information
that have been given attention. Smith (1993) described such
organizational  policies  and  noted  the  lack  of  technology  in  en¬ 
forcing  the  policies.  He  described  the  rather  unstructured  ways 
in  which  organizations  develop  privacy  policy — a  characteriza¬ 
tion  that  has  changed  little  in  the  nearly  10  years  since  his  re¬ 
search  was  published  in  spite  of  the  increased  legislation  of  the 
past  few  years.  In  our  work,  we  try  to  address  both  the  needs 
of  data  subjects  and  organizations  by  addressing  the  gap  be¬ 
tween  policy  and  practice. 

HCI  Studies  of  Privacy 

Rapid  advances  in  technology  are  pushing  us  closer  and 
closer  to  Mark  Weiser’s  (1991)  vision  of  a  ubiquitous  comput¬ 
ing  world.  Technology  has  permeated  every  facet  of  our  daily 
lives — from  enabling  us  to  shop  online  with  a  few  clicks  of  the 
mouse,  to  recommending  interest-appropriate  movies  (Good, 
et  al.,  1999),  to  helping  us  share  documents  with  colleagues  on 
the  team  (e.g.,  Horstmann  &  Bentley,  1997),  to  allowing  us  to 
estimate  when  a  collaborator  may  be  available  for  communica¬ 
tion  (e.g.,  Begole,  Tang,  Smith,  &  Yankelovich,  2002),  to  remind¬ 
ing  us  of  relevant  tasks  based  on  our  current  context  (e.g.,  Dey 
&  Abowd,  2000),  and  so  on.  The  benefits  of  these  technologi¬ 
cal  developments  promise  to  improve  quality  and  experience  of 
life by making it more convenient and efficient. In order to effectively
provide these benefits, systems need to capture, store,
and analyze information that may be deemed sensitive by the individuals
concerned. The quantity of information varies from
system  to  system,  and  the  associated  sensitivity  can  vary  from  in¬ 
dividual  to  individual  and  context  to  context.  As  a  result,  to 
achieve  their  potential,  such  systems  must  first  overcome  vari¬ 
ous  social  hurdles.  In  particular,  user  perceptions  regarding  po¬ 
tential  violations  of  privacy  have  emerged  as  a  key  factor  that 
affects  technological  acceptance  and  adoption  (Herbsleb, 
Atkins,  Boyer,  Handel,  &  Finholt,  2002;  Want,  Hopper,  Falcao, 
&  Gibbons,  1992).  For  example,  the  ambitious  Total  Information 
Awareness (TIA) initiative proposed by the U.S. government was
abandoned before proceeding beyond the planning stages due
to its inability to satisfactorily address privacy fears raised by citizens
concerned with civil liberties (Associated Press, 2003). As
rampant  increases  in  viruses,  worms,  spam,  spyware,  and  ad¬ 
ware  threaten  to  erode  user  confidence,  it  has  become  increas¬ 
ingly  important  that  a  system  empower  users  to  appropriately 
manage  privacy. 

The  motivation  in  studying  these  issues  within  the  field  of 
HCI  lies  in  the  interest  in  designing  technological  systems  with 
privacy  in  mind  at  the  outset.  While  researchers  have  acknowl¬ 
edged  the  existence  of  privacy  concerns  in  a  wide  variety  of 
technological domains (e.g., e-mail: Bellotti, 1996; e-commerce:
Ackerman, Cranor, & Reagle, 1999; media spaces:
Mantei, Baecker, Sellen, Buxton, Milligan, & Wellman, 1991; data
mining: Thuraisingham, 2002; homes of the future:
Meyer & Rakotonirainy, 2003), designing effective solutions
to address these concerns is challenging. One of the reasons
is that relatively few empirical studies have been conducted
with the sole aim of studying privacy issues. In addition,
empirical  investigations  of  privacy  pose  numerous  methodolog¬ 
ical  challenges.  To  start  with,  privacy  has  proved  hard  to  define 
due  to  its  highly  nuanced  and  context-dependent  nature.  Thus, 
individual  differences  in  perceptions  and  interpretations  of  pri¬ 
vacy  could  lead  to  researcher-introduced  bias.  For  example,  in 
a review of 23 privacy surveys, Harper and Singleton (2001)
pointed  out  that  surveys  may  suffer  from  manipulative  ques¬ 
tioning  on  the  one  hand,  and  that  unprompted  surveys  may  re¬ 
veal  very  little  privacy  concern  on  the  other  hand.  It  is  also  pos¬ 
sible  that  certain  privacy  issues  remain  undetected  because  the 
researchers  did  not  recognize  them  as  such.  Moreover,  cultural 
differences  in  expectations  and  behaviors  regarding  these  issues 
tend  to  be  quite  profound  (Milberg,  Burke,  Smith,  &  Kallman, 
1995)  making  it  difficult  to  generalize  findings  across  cultures, 
or  to  study  settings  that  involve  individuals  from  multiple  cul¬ 
tures.  Differences  in  privacy  laws  in  different  countries  could 
make  it  difficult  to  isolate  actual  intention  from  mere  legal  com¬ 
pliance.  Further,  methodologies  for  studying  privacy  may  them¬ 
selves  be  deemed  too  privacy  invasive,  causing  users  to  deviate 
from normal practice and to withhold sensitive aspects.
As a result, relying on self-reported attitudes and behavior
alone may not provide a valid view of normal practices
(Spiekermann, Grossklags, & Berendt, 2001).

With  this  overview  of  the  privacy  and  HCI  area  as  context,  we 
will  examine  some  key  research  topic  areas  in  more  depth. 
These  areas  will  include  anonymity,  personalization,  and  end- 
user  views  of  privacy,  including  privacy  policies,  the  Platform 
for  Privacy  Preferences  (P3P),  Peer-to-Peer  (P2P),  ubiquitous 
computing, and electronic voting. We will conclude this review
with a discussion of the views and requirements of end users
within  organizations  who  are  responsible  for  managing  the  per¬ 
sonal  data  held  by  the  organization. 

Anonymity.  Anonymity  and  anonymization  in  information 
technology include the ability of a user to maintain privacy
while  completing  transactions  on  a  network  and  the  ability  for 
a  user  to  keep  the  data  they  provide  from  identifying  them  per¬ 
sonally.  In  researching  anonymizing  networks,  Dingledine  and 
Mathewson  (2005)  concluded  that,  in  order  for  users  to  be  able 
to  preserve  their  privacy  on  networks  by  completing  transac¬ 
tions  without  revealing  communication  partners,  it  is  critical 
that  the  system  chosen  is  usable  so  that  other  users  can  suc¬ 
cessfully  employ  the  system  as  well.  Anonymizing  networks  are 
successful  at  hiding  users  among  users.  An  eavesdropper  may 
be able to determine who is using the network but cannot identify
who completes a particular transaction, and the larger the
group of users on the network, the more anonymous the participants
become. The catch phrase in this area is “anonymity
loves company” (Reiter & Rubin, 1998).
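
The effect of group size can be illustrated with a small Python sketch. It assumes, as a simplification not made in the chapter, that an eavesdropper regards every user of the network as equally likely to have completed a given transaction; the function names are hypothetical.

```python
import math


def guess_probability(n_users):
    """Chance that a uniform random guess names the right user."""
    if n_users < 1:
        raise ValueError("an anonymity set needs at least one user")
    return 1.0 / n_users


def anonymity_bits(n_users):
    """Size of the anonymity set expressed in bits (log base 2)."""
    if n_users < 1:
        raise ValueError("an anonymity set needs at least one user")
    return math.log2(n_users)


# Each doubling of the user base adds one bit of anonymity.
for n in (1, 2, 16, 1024):
    print(n, guess_probability(n), anonymity_bits(n))
```

Under this uniform assumption a lone user has no anonymity at all, which is why the usability of the network for other people matters to each individual participant.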

Malin  and  Sweeney  (2004;  Sweeney,  2002)  demonstrated 
that  minimal  amounts  of  information  believed  to  be  anonymous 
can  be  used  to  personally  identify  an  individual.  Since  data  are 
aggregated  from  a  myriad  of  databases  in  the  networked  world 
in  which  we  live,  users  may  falsely  believe  that  they  can  remain 
anonymous  by  providing  only  minimal  bits  of  information  in 
transactions  here  and  there  over  time.  When  the  data  are  ag¬ 
gregated  though,  the  person  can  be  identified  easily  based  on 
three  minimal  pieces  of  information.  Users  may  not  think  about 
the  types  of  data  from  different  sources  that  might  be  combined 
and  analyzed  in  order  to  identify  them,  and  thus  there  is  a  false 
sense  of  privacy.  This  research  also  makes  a  gray  area  clear  in 
terms of what data elements might be labeled “PI” versus “PII”:
it depends on the other data available in context.
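
The aggregation risk described above can be sketched in a few lines of Python. The chapter does not name the three pieces of information; the sketch assumes the triple commonly associated with Sweeney’s work (ZIP code, birth date, and sex). All records are invented for illustration, and the function name reidentify is hypothetical.

```python
# Fields assumed to act as quasi-identifiers (an assumption, see lead-in).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# A "de-identified" data set: no names, but quasi-identifiers remain.
medical = [
    {"zip": "00001", "birth_date": "1970-01-01", "sex": "F", "diagnosis": "asthma"},
    {"zip": "00002", "birth_date": "1980-05-05", "sex": "M", "diagnosis": "diabetes"},
]

# A separate, public data set that carries names with the same fields.
public_roll = [
    {"name": "A. Example", "zip": "00001", "birth_date": "1970-01-01", "sex": "F"},
    {"name": "B. Sample", "zip": "00003", "birth_date": "1990-09-09", "sex": "M"},
]


def reidentify(anonymous_rows, identified_rows, keys=QUASI_IDENTIFIERS):
    """Link rows whose quasi-identifiers match exactly one named person."""
    index = {}
    for row in identified_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = []
    for row in anonymous_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["diagnosis"]))
    return matches


print(reidentify(medical, public_roll))
```

Neither data set is sensitive and identifying on its own; the privacy failure appears only when the two are joined, which is the sense in which the PI/PII label depends on the other data available.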

Personalization.  J.  Karat,  C.  Karat,  and  Brodie  (2004) 
stated  that  the  user  experience  in  HCI  is  different  as  the  systems 
with  which  users  interact  become  aware  of  the  context  of  the  in¬ 
teraction.  Personalizing  interaction  involves  the  use  of  informa¬ 
tion  about  the  user  to  alter  the  content  presented  to  provide 
value  to  the  user.  In  their  empirical  studies,  J.  Karat,  C.  Karat, 
and  Brodie  found  that  the  most  critical  element  in  the  willing¬ 
ness  of  users  to  adopt  and  use  personalization  systems  was  user 
control  of  personal  data.  Users  want  to  be  in  control  of  their 
privacy  and  appreciate  the  benefits  of  personalization  when 
they  have  control  over  the  use  of  their  personal  data. 

Cranor  (2004)  complemented  this  line  of  research  with  an 
analysis  of  privacy  risks  associated  with  personalization  and  pre¬ 
sented  a  set  of  design  guidelines  to  reduce  privacy  risks  in  per¬ 
sonalization  systems.  Telzrow  and  Kobsa  (2004)  completed  a 
meta-analysis  of  30  consumer  studies  of  user  privacy  prefer¬ 
ences  regarding  personalization  systems  and  found  that  con¬ 
sumer  demands  and  current  practice  diverge  significantly  re¬ 
garding  control  of  personal  information.  The  vast  majority  of 
businesses  allow  neither  control  over  what  information  is  stored 
nor  ability  to  access  it  for  verification,  correction,  or  updates. 

End  user  views  of  privacy.  Researchers  have  conducted 
many  studies  on  end  user  preferences  regarding  privacy  on  the 
web and the HCI factors that are necessary to satisfy the user’s
desire  for  privacy.  Jensen  and  Potts  (2004)  found  that,  while 
most  surveys  of  user  concerns  about  privacy  show  high  rates  of 
behaviors  such  as  reading  the  privacy  policy  or  taking  concrete 
actions  to  protect  their  privacy,  informal  analysis  of  log-file  data 
suggests  that  the  actual  rates  of  these  user  behaviors  are  much 
lower.  Jensen,  Potts,  and  Jensen  (2005)  found  that  users  place 
inappropriate  trust  in  the  presence  of  trust  indicators  such  as 
the  TRUSTe  mark  or  a  privacy  policy  on  the  website,  assuming 
quality  in  the  presence  of  these  trust  indicators  rather  than  un¬ 
derstanding  that  the  level  of  their  trust  should  be  dependent  on 
the  content  of  these  policies.  The  users  were  willing  to  divulge 
personal  information  when  it  was  not  warranted.  Jensen,  Potts, 
and  Jensen  recommended  that  HCI  professionals  work  to  increase
awareness  of  the  privacy  issues  raised  by  the  policies  associated
with  the  trust  indicators,  as  unscrupulous  online  vendors  could
use  the  trust  indicators  to  mislead  users  into  accepting  privacy
policies  and  divulging  private  information  that  they  would  not
otherwise  willingly  disclose.  Buffett,  Fleming,  Richter,
Scott,  and  Fleming  (2004)  created  a  technique  for  enabling  a 
user  to  compute  the  value  of  the  consequences  of  exchanging 
personal  information  on  the  web.  The  paper  provides  a  demon¬ 
stration  of  the  effectiveness  of  the  technique  in  improving  a 
user’s  expected  utility  in  a  simple  privacy  negotiation. 

Cranor  (2005)  led  the  W3C  standardization  work  in  the  area
of  the  Platform  for  Privacy  Preferences  (P3P)  policies.  P3P  poli¬ 
cies  provide  end  users  with  information  about  the  privacy  policies 
of  a  website  before  they  interact  with  it.  Cranor  and  a  team  at 
AT&T  Labs  designed  and  developed  the  Privacy  Bird  (Cranor, 
2002),  a  browser  helper  object  that  provided  summary  information
about  the  agreement  or  lack  thereof  between  the  end
user’s  privacy  preferences  and  the  privacy  policy  of  the  website 
with  which  the  person  was  considering  interacting.  Research 
continues  to  improve  the  usability  of  the  Privacy  Bird  in  com¬ 
municating  to  users  about  website  privacy  policies  and  in  cap¬ 
turing  user  privacy  preferences. 
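
The kind of comparison that a P3P user agent such as the Privacy Bird performs can be illustrated with a small sketch. The Python below is only an illustration under simplifying assumptions: the practice labels, the `UserPreferences` structure, and the `compare` function are hypothetical, and they do not reproduce the P3P vocabulary or the Privacy Bird implementation.

```python
from dataclasses import dataclass, field


@dataclass
class UserPreferences:
    # Practices the user does not accept (hypothetical labels, not P3P terms).
    unacceptable: set = field(default_factory=set)


def compare(site_practices, prefs):
    """Return (status, conflicts) for a site's declared practices.

    site_practices is None when the site publishes no machine-readable policy.
    """
    if site_practices is None:
        return "unknown", []
    conflicts = sorted(set(site_practices) & prefs.unacceptable)
    return ("mismatch" if conflicts else "match"), conflicts


prefs = UserPreferences(unacceptable={"share-with-third-parties", "telemarketing"})
print(compare({"order-fulfilment", "telemarketing"}, prefs))
print(compare({"order-fulfilment"}, prefs))
print(compare(None, prefs))
```

The three-way result (match, mismatch, unknown) is a design choice of this sketch: it keeps "the site declares nothing" distinct from "the site declares nothing objectionable", so that a missing policy is not presented to the user as agreement.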

Peer-to-peer  (P2P)  systems  enable  users  to  share  files  easily
by  downloading  simultaneously  from  multiple  sources  and  by 
sharing  many  different  file  types.  Good  and  Krekelberg  (2003) 
illustrated  HCI  problems  with  P2P  file  sharing  systems  such  as 
KaZaA.  KaZaA,  the  most  popular  and  widely  used  P2P  system  in  2003,  had
an  average  of  120  million  downloads  worldwide  and  3  million 
users  online  at  any  given  time  during  the  course  of  the  study. 
Good  and  Krekelberg  (2004)  conducted  a  user  study  of  KaZaA 
and  found  that  a  large  proportion  of  users  unwittingly  shared
personal  and  private  files,  leaving  them  at  risk  of  being  taken 
advantage  of  through  unknown  exposure  of  PI.  The  majority 
of  the  users  in  the  study  were  unable  to  correctly  determine 
the  files  they  were  sharing.  Many  thought  they  were  sharing 
no  files  when  they  were  actually  sharing  all  of  the  files  on  their 
hard  drives. 

The  area  of  ubiquitous  computing  presents  many  new  capa¬ 
bilities  for  users  in  communicating  with  the  outside  world 
through  a  number  of  different  mobile,  pervasive  devices  and 
through  many  choices  in  terms  of  managing  their  privacy  in 
these  new  computing  contexts.  A  group  of  researchers  con¬ 
ducted  a  series  of  research  studies  in  this  domain.  In  the  first 
study,  Lederer,  Mankoff,  and  Dey  (2003)  conducted  a  question¬ 
naire-based  study  on  the  relative  importance  of  the  inquirer  and
the  situation  in  determining  a  user’s  decision  regarding  the  pre¬
ferred  accuracy  of  PI  disclosed  through  the  ubiquitous  device. 
They  found  that  users  were  more  likely  to  provide  the  same 
level  of  accuracy  in  the  disclosure  of  PI  to  the  same  inquirer  in 
different  situations  rather  than  to  different  inquirers  in  the  same 
situation.  In  later  research,  Hong,  Ng,  Lederer,  and  Landay 
(2004)  developed  a  privacy-risk  model  for  designing  privacy- 
sensitive  ubiquitous  computing  systems  and  presented  two  case 
studies  that  illustrated  the  use  of  the  privacy-risk  model  in  the 
design  of  applications.  The  goal  was  to  refine  the  concepts  of 
privacy  to  concrete  issues  that  could  be  addressed  by  users  em¬ 
ploying  different  applications.  Hong  and  Landay  (2004)  created 
a  toolkit  for  developers  called  “Confab”  that  facilitates  the  de¬ 
velopment  of  privacy-sensitive  ubiquitous  applications.  Confab 
includes  a  framework  and  customizable  privacy  mechanisms. 
The  tool  also  includes  extensions  for  managing  location  privacy. 
The  goal  of  Confab  is  to  enable  application  developers  to  ad¬ 
dress  the  range  of  trust  and  privacy  requirements  of  end  users 
of  ubiquitous  computing  systems. 

There  is  a  growing  area  of  research  related  to  HCI  issues  and 
privacy  in  electronic  voting.  Hochheiser,  Bederson,  Johnson,  C. 
Karat,  and  Lazar  (2005)  analyzed  the  current  electronic  voting 
systems  and  determined  that  increased  usability  was  needed  in 
these  systems  in  three  areas.  First,  voters  must  have  information 
available  to  them  about  the  electronic  voting  machines  prior  to 
arriving  at  the  polling  place  and  while  there.  Second,  voters  must 
be  able  to  easily  and  confidently  understand  the  machine,  read 
the  contents  of  the  ballot,  and  cast  their  votes.  Finally,  voters 
must  be  able  to  trust  the  fairness  and  accuracy  of  the  electronic 
voting  system.  A  critical  factor  in  trusting  the  system  is  being  able 
to  verify  their  votes  without  sacrificing  their  anonymity.  Thus,  pri¬ 
vacy  is  a  critical  requirement  in  the  design  of  successful  elec¬ 
tronic  voting  systems.  Voter-Verified  Paper  Trails  (VVPTs)  can  pro¬
vide  auditability  of  individual  votes.  It  is  also  crucial  to  have 
well-designed  audit  trails  regarding  large-scale  aspects  of  the  sys¬ 
tem  such  as  configuration,  installation,  and  voter-count.  Shubina 
and  Smith  (2004)  conducted  research  on  how  to  design  coer¬ 
cion-resistant,  voter  verifiable  electronic  voting  systems.  They 
stated  that  it  is  important  that  voters  can  verify  that  the  tally  re¬ 
flects  the  sum  of  votes  actually  cast  and  that  voters  can  maintain 
the  privacy  of  their  votes  and  thus  not  be  at  risk  of  coercion  from 
adversaries.  They  designed  and  prototyped  a  system  to  accom¬ 
plish  these  goals.  Their  design  ensured  that  voters  are  not  sub¬ 
ject  to  coercion  from  adversaries  by  providing  a  voter  verification 
receipt  that  provides  no  real  information  without  access  to  the 
voting  machine  or  the  contents  of  the  cast  ballots. 

In  terms  of  providing  solutions  for  users  to  manage  their 
privacy  in  the  future,  the  European  Union  is  sponsoring  re¬ 
search  on  a  system  called  PRIME  (Privacy  and  Identity  Manage¬ 
ment  for  Europe).  The  goal  of  the  PRIME  system  is  to  support 
the  user  in  controlling  what  PI  about  him-  or  herself  is  revealed 
to  others  through  interactions  with  individuals,  systems,  and 
organizations  (Pettersson  et  al.,  2005).  Pettersson  et  al.  (2005) 
described  three  alternative  user  interface  paradigms  for  privacy- 
enhanced  identity  management  and  illustrated  how  key  legal 
privacy  principles  from  the  European  Union  Directives  are  as¬ 
sociated  with  these  designs.  In  the  report  of  the  first  year  of 
research  in  this  effort,  the  team  reported  some  results  of  ini¬ 
tial  usability  evaluations  of  mock-ups  of  the  three  paradigms. 

Organizational  requirements  in  managing  privacy.

HCI  research  that  is  underway  and  focused  on  the  organiza¬ 
tional  view  of  privacy  includes  the  SPARCLE  Policy  Workbench 
(Karat,  J.,  Karat,  C.,  Brodie,  &  Feng,  2005)  and  efforts  by  Adams 
and  Blanford  (2005)  in  the  healthcare  domain.  J.  Karat,  C.  Karat,
Brodie,  and  Feng  employed  user-centered  design  methods  with 
109  target  users  to  identify  organizational  privacy  requirements, 
and  then  they  designed  and  tested  a  prototype  system  for  users 
to  author  privacy  policies,  implement  the  policies,  and  conduct 
compliance  audits  of  them.  The  prototype  was  a  Wizard  of  Oz 
version  of  a  system,  meaning  that  users  were  able  to  have  an  im¬ 
mersive  and  dynamic  experience  with  the  system  capability  that 
seemed  real,  although  there  was  no  functional  code  behind  the 
screens.  Empirical  results  show  that  organizational  users  value 
highly  a  set  of  capabilities  that  enables  policy  officers  to  use  nat¬ 
ural  language  to  author  the  policies,  perhaps  beginning  with  a 
template.  Users  found  a  visualization  of  the  policies  very  valu¬ 
able  for  communication,  review,  and  reaching  consensus  across 
the  organization.  The  implementation  capability  enabled  the  ex¬ 
perts  to  approve  the  nominated  mappings  between  rule  ele¬ 
ments  and  database  fields  and  applications  in  the  organization’s 
configuration.  The  compliance  audit  capability  enabled  the 
users  to  run  general  audits  to  verify  that  the  policy  complied 
with  regulations,  to  verify  that  the  policy  was  enforced  opera¬ 
tionally  as  stated,  and  to  run  specific  inquiries  based  on  indi¬ 
vidual  requests  for  information  about  use  of  personal  informa¬ 
tion  by  the  organization.  The  organizational  users  who  were 
participants  in  the  evaluation  study  rated  the  prototyped  func¬ 
tionality  as  being  of  very  high  value  to  them. 

C.  Karat,  J.  Karat,  Brodie,  and  Feng  (2006)  also  conducted
an  empirical  study  to  determine  whether  the  two  methods  of 
authoring  rules  that  were  prototyped  by  J.  Karat,  C.  Karat,
Brodie,  and  Feng  (2005;  natural  language  with  a  guide  or  struc¬ 
tured  list)  enabled  users  to  create  higher  quality  rules  than  an 
unguided  natural  language  control  method  (using  a  word-processing
window).  Empirical  results  demonstrated  that  both  pro¬
totyped  authoring  methods  yielded  higher  quality  rules  than 
the  control  condition.  With  no  training  in  use  of  the  methods, 
users  were  able  to  author  privacy  rules,  covering  about  80%  of
the  required  rule  elements  described  in  scenarios  when  using 
either  the  natural  language  with  a  privacy  rule  guide  tool  or  a 
structured  list  tool,  as  compared  to  covering  about  40%  of  the
required  elements  in  scenarios  using  the  unguided  natural  lan¬ 
guage  control  condition. 
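
The coverage measure reported above, the share of required rule elements that an authored rule contains, can be expressed as a simple set calculation. The sketch below is illustrative only: the element names and the two example rules are hypothetical, chosen so that the output mirrors the approximate percentages reported, and they are not data from the study.

```python
def coverage(required, authored):
    """Fraction of required rule elements that appear in the authored rule."""
    required = set(required)
    if not required:
        raise ValueError("scenario must require at least one rule element")
    return len(required & set(authored)) / len(required)


# Hypothetical scenario: five elements a complete rule should contain.
required = {"user category", "action", "data category", "purpose", "condition"}

# Hypothetical rules written with and without authoring guidance.
guided = {"user category", "action", "data category", "purpose"}
unguided = {"action", "data category"}

print(f"guided:   {coverage(required, guided):.0%}")
print(f"unguided: {coverage(required, unguided):.0%}")
```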

Research  and  development  on  the  SPARCLE  (Server  Privacy 
Architecture  and  Capability  Enablement)  policy  workbench  is 
continuing.  The  general  rule  authoring  utility  for  the  privacy  do¬ 
main  is  functional,  working  code  that  has  been  successfully 
tested  with  the  privacy  policies  of  banking  and  finance,  health¬ 
care,  and  government  organizations  (Brodie,  C.  Karat,  &  J.  Karat,
2006).  Users  can  create  new  policies  with  SPARCLE,  can  import 
existing  text  versions  of  policies,  or  can  cut  and  paste  sections  of 
policies  to  form  a  new  policy.  Then,  SPARCLE  employs  natural 
language  processing  technology  to  parse  and  identify  the  rule 
elements.  Users  review  and  modify  the  rules,  and  when  they  are 
happy  with  the  policy,  the  policy  is  transformed  into  XACML 
code  (the  OASIS  international  standard  for  the  format  of  security- 
access  control  rules  with  a  privacy  profile)  for  input  to  the  en¬ 
forcement  engine.  The  team  is  continuing  to  design  and  create
other  components  in  the  end-to-end  solution  for  privacy  and
believes  that  the  policy  workbench  can  be  generalized  to  the  se¬ 
curity  domain  and  possibly  others. 

Adams  and  Blanford  (2005)  addressed  the  gap  between  or¬ 
ganizational  and  user  perspectives  of  security  and  privacy  in  the 
healthcare  domain.  They  conducted  ethnographic  evaluations 
of  93  clinical  staff,  managers,  library  staff,  and  IT  department 
members  in  two  hospitals.  They  found  contrasting  perspectives 
on  security  and  privacy  within  the  groups.  They  employed  the 
concept  of  communities  of  practice  to  identify  security  and  pri¬ 
vacy  issues  within  organizations,  acknowledging  the  importance 
of  knowledge  gained  through  a  work  community  in  day-to-day 
work  practices  as  well  as  formal  learning.  The  community  of 
practice  acted  as  a  link  between  the  individual  and  the  organi¬ 
zation.  In  their  research,  one  hospital  improved  communication 
and  awareness  across  the  organization  through  a  community 
and  user-centered  approach  to  the  design  and  development  of 
an  organizational  privacy  and  security  application.  In  the  second 
hospital,  a  clash  between  the  formal  rules  and  the  perspective  of 
the  local  community  of  practice  was  at  the  heart  of  an  identi¬
fied  security  problem.  The  authors  highlighted  the  importance 
of  designing  security  and  privacy  practices  within  the  context 
of  communities  of  practice. 

Summary  of  Findings  and  Recommendations 

There  is  a  range  of  HCI  research  that  can  inform  the  usabil¬ 
ity  of  the  design  of  privacy  mechanisms  (see  Table  32.1).  The 
review  has  covered  topics  related  to  the  networked  world  in 
which  we  live,  current  privacy  legislation,  individual  differences 
in  defining  the  meaning  of  privacy,  end  users’  concern  with
anonymity  while  interacting  over  networks  and  in  terms  of  the 
risks  of  aggregation  of  data,  end  user  desire  for  personalization, 
the  impact  of  some  technologies  (e.g.,  pervasive  devices,  P2P, 
P3P)  on  privacy,  and  end  user  and  organizational  views  of  pri¬ 
vacy  in  a  variety  of  domains.  The  central  theme  that  emerges 
is  informed  user  control  over  the  disclosure  of  personal  infor¬ 
mation.  As  presented  in  this  section,  different  views  and  situa¬ 
tional  contexts  regarding  privacy  ensure  that  this  is  a  multi¬ 
faceted  challenge  for  experts  in  the  HCI  and  privacy  fields  to 
address.  In  our  roles  as  usability  professionals,  it  is  incumbent 
on  us  to  consider  the  issues  of  privacy  in  the  design,  review, 
approval,  development,  and  use  of  computing  systems  with 
which  we  are  involved. 

The  HCI  field  will  be  enriched  by  considering  privacy  more 
fully,  and  this  research  will  cross-pollinate  in  several  areas.  For 
example,  cross-cultural  studies  of  privacy  inform  the  growing 
interest  in  HCI  in  cross-cultural  interfaces  and  coordination. 
Kumaraguru  and  Cranor  (2005)  conducted  a  study  about  pri¬ 
vacy  in  India,  and  the  results  demonstrated  an  overall  lack  of 
awareness  and  concern  about  privacy  issues  among  users  as 
compared  to  studies  of  users  conducted  in  the  United  States.  In¬ 
creased  understanding  of  the  diversity  and  complexity  of  user 
preferences,  and  potential  clusters  of  variables  of  interest,  is 
providing  new  impetus  for  HCI  research  on  individual  differ¬ 
ences.  Moreover,  considerations  of  visualizations  and  intelligent 
tutoring  systems  for  privacy  in  a  range  of  applications  and  tasks 
could  germinate  new  emphases  in  HCI.  Privacy  in  information
technology  is  likely  to  remain  a  critical  issue  for  the  near  future.
As  it  directly  engages  aspects  of  user  control  and  power,  privacy
is  central  to  the  concerns  of  HCI.  HCI  and  privacy  will  remain
closely  interlocked,  and  in  their  intersection  are  the  seeds
for  successful  solutions.

TABLE 32.1. Design Factors Impacting Usable Privacy Solutions.

Variable                        Impact on HCI and Privacy

Networked World                 • There is technology and data collection in most
                                  aspects of everyday life in the developed world.

Privacy Legislation             • Requirements for privacy vary by geography and
                                  across cultures. Organizations and end users must
                                  become knowledgeable.

Individual Differences          • People perceive and define privacy intrusions
                                  differently, and privacy solutions must be designed
                                  to support varying definitions and choices.

End User Desire for Anonymity   • End users must be informed of the risks of choices
                                  involved in different anonymity-enabling situations.

                                • User control of data to be disclosed is critical for
                                  using personalization.

                                • Users want control over access to them but want it
                                  to be easy to use, fast, and flexible.

                                • Can inform user of any issues between their own
                                  policy and policy of a website they are considering
                                  interacting with.

                                • No verification of adherence to policy possible at
                                  this time.

                                • User must be clearly informed about data at risk of
                                  disclosure.

                                • User must understand how anonymity is protected and
                                  verification is enabled.

                                • Authors of privacy policies, who have the knowledge
                                  of organizational practices, must be able to tie
                                  written policies to implementation through
                                  technology, with verification through compliance
                                  audits.

                                • Critical to understand community of practice issues
                                  in an organization in creating privacy policies.

Views  on  Security 

It  is  broadly  recognized  that  one  of  the  major  challenges  to  the 
effective  deployment  of  information  security  systems  is  getting
people  to  use  them  correctly  (CRA,  2003).  As  far  back  as  the
1970s,  usability  with  specific  reference  to  security  mechanisms 
was  identified  as  a  key  principle  in  the  design  of  secure  systems 
(Saltzer  &  Schroeder,  1975).  Even  beyond  the  domain  of  elec¬ 
tronic  information  systems,  there  are  many  examples  of  the  fact 
that  overly  complex  security  systems  actually  reduce  effective 
security.  For  example,  Kahn  (1967),  cited  by  Anderson  (1993), 
suggested  that  Russian  military  disasters  of  World  War  II  were 
partly  because  Russian  soldiers  abandoned  the  official  army  ci¬ 
pher  systems  because  they  were  too  hard  to  use,  and  instead  re¬ 
verted  to  simpler  systems  that  proved  easier  to  crack.  Schneier
(2000)  summed  up  the  situation:  “Security  measures  that  aren’t 
understood  by  and  agreed  to  by  everyone  don’t  work”  (p.  373). 
However,  as  with  many  areas  in  the  design  of  complex  systems, 
recognizing  a  potential  problem  does  not  necessarily  create  a 
rush  of  work  to  resolve  it.  Work  on  making  security  usable — and 
balancing  the  complex  relationship  between  secure  and  easy 
to  use — is  just  beginning  to  become  a  major  topic. 

Networked  computer  systems  are  increasingly  the  site  of 
people’s  work  and  activity.  Millions  of  ordinary  citizens  conduct 
commercial  transactions  over  the  Internet,  manage  their  fi¬ 
nances,  and  pay  their  bills  online.  Companies  increasingly  use 
the  Internet  to  connect  different  offices  and  to  form  virtual 
teams  to  tackle  mission-critical  problems  through  virtual  inter¬ 
action.  For  example,  interaction  between  citizens  and  local  and 
federal  government  agencies  can  increasingly  be  conducted 
electronically;  and  the  2004  national  elections  in  Brazil  and  (to  a 
much  more  limited  extent)  the  United  States  saw  the  introduc¬ 
tion  of  electronic  voting,  which  will  no  doubt  become  more 
widespread. 

However,  these  new  opportunities  have  costs  associated 
with  them.  Commercial,  political,  and  financial  transactions  in¬ 
volve  disclosing  sensitive  information.  The  media  regularly  car¬ 
ries  stories  about  credit  card  fraud,  identity  theft,  and  hackers 
breaking  into  commercial  servers.  Many  people  are  nervous 
about  committing  PI  to  electronic  information  infrastructures. 
Even  though  modern  PCs  are  powerful  enough  to  offer  strong 
cryptographic  guarantees  and  high  levels  of  security,  these  con¬ 
cerns  remain. 

The  need  for  secure  systems  is  broadly  recognized,  but  most 
discussions  of  the  problem  of  security  focus  on  the  foundational 
elements  of  information  systems  (such  as  network  transmis¬ 
sion  and  information  storage)  and  the  mechanisms  available  to 
system  developers,  integrators,  and  managers  to  ensure  secure 
operation  and  management  of  data.  Security,  though,  is  a 
broader  concern  and  a  problem  for  the  end  users  of  informa¬ 
tion  systems  as  much  as  for  their  administrators.  Participation  in 
activities  such  as  e-commerce  requires  that  people  be  able  to 
trust  the  infrastructures  that  will  deliver  these  services  to  them. 

This  is  not  quite  the  same  as  saying  that  we  need  more  se¬ 
cure  infrastructures.  De  Paula  et  al.  (2005)  suggested  that  it  is 
important  to  separate  theoretical  security  (the  level  of  secure 
communication  and  computation  that  is  technically  feasible) 
from  effective  security  (the  level  of  security  that  can  practically 
be  achieved  in  everyday  settings).  Levels  of  effective  security  are
usually  lower  than  those  of  theoretical  security.  A  number  of  rea¬ 
sons  for  this  disparity  have  been  identified,  including  poor  im¬ 
plementations  of  key  security  algorithms  (Kelsey  et  al.,  1998),  in¬ 
secure  programming  techniques  (Wagner  et  al.,  2000;  Shankar 
et  al.,  2001),  insecure  protocol  design  (Kemmerer  et  al.,  1994;
Schneier  &  Mudge,  1998),  and  inadequate  operating  systems 
support  (Ames  et  al.,  1983;  Bernaschi  et  al.,  2000). 

One  important  source  of  the  disparity,  though,  is  problems 
around  the  extent  to  which  users  can  comprehend  and  make 
effective  use  of  security  mechanisms.  Approaches  that  attempt 
to  make  the  provision  of  system  security  automatic  or  trans¬ 
parent  essentially  remove  security  from  the  domain  of  the  end 
user.  However,  in  situations  where  only  the  end  user  can  de¬ 
termine  the  appropriate  use  of  information  or  the  necessary 
levels  of  security,  then  this  explicit  disempowerment  becomes 
problematic. 

Perhaps  the  best  consolidation  of  usability  issues  for  secu¬ 
rity  comes  from  Sasse  and  Flechais  (2005).  They  observed  that 
currently  users  often  disclose  (or  write  down)  passwords,  fail 
to  encrypt  confidential  messages,  switch  virus  checkers  off,  and 
generally  engage  in  behavior  counter  to  good  security,  and  they 
asked  why  this  is  so  common.  They  concluded  that  most  users: 

1.  Have  problems  using  security  tools  correctly

2.  Do  not  understand  the  importance  of  data,  software,  and  systems  for  their  organizations

3.  Do  not  believe  that  the  assets  are  at  risk

4.  Do  not  understand  that  their  behavior  puts  assets  at  risk

Whitten  and  Tygar  (1999)  identified  a  “weakest  link  prop¬ 
erty”  stating  that  attackers  need  only  to  exploit  a  single  weak 
point  in  system  security.  Frequently,  the  human  user  proves  to 
be  this  weakest  link — not  from  malicious  intention  but  from  in¬ 
ability  to  reasonably  do  the  right  thing.  Sometimes  this  is  a 
matter  of  education — people  need  to  be  made  aware  of  what  to
do  and  why.  Some  of  the  blame  and  burden  on  making  our  sys¬ 
tems  more  secure  rests  in  making  security  systems  more  usable. 
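
The weakest link property can be made concrete with a short sketch: if an attacker needs to defeat only one component, the system as a whole is no stronger than its weakest component. The component names and the 0-to-1 scores below are hypothetical and serve only to illustrate the reasoning; they are not measurements from any study.

```python
def effective_strength(components):
    """Return the weakest component and its score.

    Under the weakest-link property, this score bounds the strength of the
    whole system, however strong the other components are.
    """
    if not components:
        raise ValueError("at least one component is required")
    weakest = min(components, key=components.get)
    return weakest, components[weakest]


# Hypothetical scores for how hard each component is to defeat (1.0 = hardest).
system = {
    "encryption": 0.99,
    "network perimeter": 0.90,
    "password handling by users": 0.30,
}

print(effective_strength(system))
```

In this illustration, strengthening the encryption further would not change the result; only improving the lowest-scoring component would, which is the argument for investing in the usability of security mechanisms.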

Usability  of  Security  Software  and  Mechanisms 

In  a  series  of  studies,  researchers  at  University  College,  Lon¬ 
don  have  explored  some  of  the  interactions  between  usability 
and  security  (Adams,  Sasse,  &  Lunt,  1997;  Adams  &  Sasse,  1999).
They  focused  on  user-visible  elements  of  security  systems,  such 
as  passwords.  Although  many  information  systems  professionals 
regard  users  as  being  uninterested  in  the  security  of  their  sys¬ 
tems  (and,  indeed,  likely  to  circumvent  it  by  choosing  poor 
passwords,  etc.),  Adams  and  Sasse’s  investigations  demon¬ 
strated  that  users  are  certainly  motivated  to  support  the  security 
of  the  system  but  often  are  unable  to  determine  the  security  im¬ 
plications  of  their  actions.  The  specific  problems  that  they  iden¬ 
tify  with  passwords  have  also  led  to  interesting  design  alterna¬ 
tives  (Brostoff  &  Sasse,  2000;  Dhamija  &  Perrig,  2000). 

In  some  cases,  the  complexity  of  making  security  work  is  as 
much  a  matter  of  interface  design  as  anything  else.  Whitten  and 
Tygar  (1999)  presented  a  usability  analysis  of  PGP  5.0,  demon¬ 
strating  the  difficulties  that  users  have  in  completing  experi¬ 
mental  tasks  (in  their  user  study,  only  3  out  of  12  test  subjects 
successfully  completed  a  standard  set  of  tasks  using  PGP  to  en¬ 
crypt  and  decrypt  e-mail).  The  problems  that  they  uncovered
were  largely  problems  of  interface  design,  and  in  particular,  the
poor  matching  between  user  needs  and  the  structure  of  the  en¬
cryption  technology  provided  to  meet  these  needs. 

Zurko  and  Simon  (1996)  explored  similar  concerns  in  their 
focus  on  user-centered  security.  Their  work  addressed  their  per¬ 
ceptions  that  the  inscrutability  of  conventional  security  mecha¬ 
nisms  makes  it  less  likely  that  users  will  employ  them  effectively. 
The  approach  they  outlined  focused  on  graphical  interfaces  and 
query  mechanisms  to  MAP,  an  authorization  engine.  While  this
approach  is  clearly  helpful,  it  is  limited  to  a  particular  area  of  sys¬ 
tem  security  and  lacks  real-time  feedback.

Control  Over  Security 

One  area  at  the  intersection  of  usability  and  security  that  has 
received  some  attention  is  the  role  of  access  control  in  interac¬ 
tive  and  collaborative  systems.  For  example,  Dewan  and  Shen 
(1998;  Shen  &  Dewan,  1992)  explored  the  use  of  access  con¬ 
trol  and  meta-access  control  models  as  a  basis  for  describing 
and  controlling  degrees  of  information  access  and  management 
in  collaborative  systems.  This  is  not  simply  a  technical  matter 
since  the  structure  and  behavior  of  these  internal  components 
can  have  a  significant  effect  on  the  forms  of  interactivity  and  col¬ 
laboration  they  can  support  (Greenberg  &  Marwood,  1994). 

Many  collaborative  systems  involve  privacy  issues  and  need 
to  provide  users  with  control  over  the  disclosure  of  information. 
This  has  spurred  a  number  of  researchers  to  explore  the  devel¬ 
opment  of  privacy  control  systems  that  are  tailored  to  the  needs 
of  end  users.  For  instance,  Dourish  (1993)  described  the  rela¬ 
tionship  between  three  different  security  mechanisms  for  similar 
multimedia  communication  systems,  each  of  which  reflects  as¬ 
sumptions  and  requirements  of  the  different  organizations  in 
which  they  were  developed.  Bellotti  and  Sellen  (1993)  drew  on 
experiences  with  multimedia  and  ubiquitous  computing  environ¬ 
ments  to  identify  the  source  of  a  number  of  potential  privacy  and 
security  problems.  Their  primary  concepts — disembodiment  and 
dissociation — were  visibility  problems,  related  to  the  disconnec¬
tion  between  actors  and  actions  that  rendered  either  actor  invisi¬ 
ble  at  the  site  of  action,  or  actions  invisible  to  the  actor. 

Based  on  their  investigations  of  privacy  problems  in  online 
transactions,  Ackerman  and  colleagues  (Ackerman,  Cranor,  & 
Reagle,  1999;  Ackerman  &  Cranor,  1999)  proposed  the  idea  of 
privacy  critics — semiautonomous  agents  that  monitor  online  ac¬ 
tion  and  inform  users  about  potential  privacy  threats  and  avail¬ 
able  countermeasures.  Again,  this  mechanism  turns  on  the  abil¬ 
ity  to  render  invisible  threats  visible. 

One  important  related  topic  is  control  over  the  degree  of  se¬ 
curity  available.  One  of  our  criticisms  of  traditional  security  sys¬ 
tems  has  been  their  all-or-nothing  approach.  However,  some
work  has  attempted  to  characterize  degrees  of  security  provi¬ 
sion,  as  embodied  by  the  idea  of  “quality  of  security  service” 
(Irvine  &  Levin,  1999;  Spyropoulou  et  al.,  2000).  This  builds  on 
earlier  work,  establishing  a  taxonomy  of  security  service  levels
(Irvine  &  Levin,  1999).  The  fundamental  insight  is  that  organiza¬ 
tions  and  applications  need  to  trade  off  different  factors  against 
each  other,  including  security  of  various  forms  and  degrees,  in 
order  to  make  effective  use  of  available  resources  (Thomsen  & 
Denz,  1997;  Henning,  1999).  While  this  work  is  directed  towards 
resource  management  rather  than  user  control,  it  begins  to  unpack 
the  security  black  box  and  characterizes  degrees  and  qualities
of  security. 

For  end  users,  perceived  security  can  be  defined  as  the  level 
of  security  that  users  feel  while  they  are  shopping  on  e-com¬ 
merce  websites.  Yenisey,  Ozok,  and  Salvendy  (2005)  reported  a
study  that  aimed  to  determine  items  that  positively  influence 
this  feeling  of  security  by  users  during  shopping  and  to  develop 
guidelines  for  perceived  security  in  e-commerce.  An  experiment 
allowed  users  with  different  security  assurances  to  shop  on  sim¬ 
ulated  e-commerce  websites.  The  participants  were  divided  into 
three  groups — shopping  for  cheap,  midrange,  and  expensive 
products,  respectively.  Following  the  shopping  task,  a
virtual  shopping  security  questionnaire  was  presented  to  the 
users.  Generally,  there  were  no  significant  differences  in  item 
ratings  between  the  groups  of  different  shopping  item  values. 
A  factor  analysis  procedure  determined  two  main  factors  con¬ 
cerning  perceived  security  in  e-commerce.  The  perceived  op¬ 
erational  factor  includes  the  website’s  blocking  of  unauthorized 
access;  emphasis  on  login  name  and  password  authentication; 
funding  and  budget  spent  on  security;  monitoring  of  user  com¬ 
pliance  with  security  procedures;  integration  of  state-of-the-art 
systems;  distribution  of  security  items  within  the  website;  web¬ 
site’s  encryption  strategy;  and  consolidation  with  network  se¬ 
curity  vendors.  The  perceived  policy-related  factor  includes  the 
website’s  emphasis  on  network  security;  top  management  com¬ 
mitment;  effort  to  make  users  aware  of  security  procedures;  the 
website’s  keeping  up-to-date  with  product  standards;  the  web¬ 
site’s  emphasis  on  security  in  file  transfers;  and  issues  concern¬ 
ing  the  Web  browser. 

Key Topic Areas for HCI in Security

Secure  design  and  administration.  Yee  (2005)  pro¬ 
vided  guidance  for  designing  and  evaluating  usable  secure  soft¬ 
ware  aimed  at  protecting  the  interests  of  the  legitimate  user. 
Starting  with  the  work  of  Saltzer  and  Schroeder  (1975),  he  di¬ 
vided  his  guidelines  into  two  general  categories — guidelines 
for  authorizing  others  to  access  valuable  resources  and  guide¬ 
lines  for  communication  between  the  system  and  the  user.  For 
authorization,  he  suggested  associating  greater  risk  to  the  user 
with greater effort or less visible operations so that users' natural
tendencies lead to safe operations. For communication, he
suggested  enabling  users  to  express  safe  security  policies  that  fit 
their  tasks.  The  guidelines  are  not  empirically  derived,  but  are 
gleaned  from  the  experiences  of  security  software  designers. 

Kandogan  and  Haber  (2005)  applied  ethnographic  approaches 
to  the  study  of  a  particular  class  of  users  involved  in  the  protec¬ 
tion  of  systems — security  administrators.  The  issues  are  some¬ 
what  different  for  populations  whose  job  responsibilities  in¬ 
clude  the  security  of  systems  used  by  others  when  compared 
to  the  situation  in  which  users  are  acting  as  their  own  security 
administrators.  However,  they  represent  another  important  cat¬ 
egory  of  user  in  the  area  of  creating  usable  security  systems.  In 
the  ethnographic  tradition,  they  examined  their  users  in  typical 
work  situations  (e.g.,  detecting  and  addressing  a  security  attack) 
and  looked  at  how  well  their  tools  help  them  to  do  their  job. 
One important result of their work was the realization that administrators
simply have too much to look at in the course of
their daily work and that they specifically need tools to help
them  understand  vast  amounts  of  dynamic  data.  Their  work  fo¬ 
cused  on  developing  visualization  tools  to  assist  in  this  situation. 

Balfanz, Durfee, Smetters, and Grinter (2004) presented
guidelines  for  usable  security  systems  developed  over  a  course 
of  research  that  has  focused  on  the  intersection  of  security  and 
usability.  The  guidelines  included  advice  to  focus  on  designing 
both security and usability into the system rather than attempting
to retrofit either into existing systems. It seems that the
lessons learned in HCI work regarding the need to consider
usability early and often during design also apply to security (security
engineering  is  an  important  topic  in  that  field,  much  as  usabil¬ 
ity  engineering  is  in  HCI).  Additional  guidelines  also  called  for 
a  focus  on  the  user  by  looking  for  high-level  building  blocks  that 
can  be  used  to  create  user-oriented  solutions,  rather  than  as¬ 
suming  users  will  be  skilled  at  assembling  their  own  tools  for 
effective  security.  They  pointed  out  that  the  security  community 
has  long  held  the  belief  that  security  is  more  important  than 
user  needs  and  that  users  must  adapt  to  requirements  in  order 
to  assure  system  security.  An  empirical  study  (Balfanz,  Durfee, 
Grinter,  Smetters,  &  Stewart,  2004)  showed  how  hard  it  is  for  se¬ 
curity  researchers  to  accept  the  difficulty  of  using  some  of  their 
own  systems.  When  they  evaluated  the  time  it  took  to  secure  a 
wireless  network  using  a  tool  they  had  developed,  they  were 
surprised  to  find  that  it  took  users  over  two  hours  on  average. 
The  authors  mentioned  that  the  reaction  in  the  security  com¬ 
munity  was  often  that  the  empirical  data  must  be  wrong.  While 
these  views  are  changing  with  the  emergence  of  usable  secu¬ 
rity  research  groups  like  this  one,  we  should  not  expect  the  dif¬ 
ficult  trade-offs  to  be  quickly  resolved. 

Maxion  and  Reeder  (2005)  provided  a  study  in  which  a 
system  specifically  designed  to  assist  users  in  avoiding  errors 
(Salmon)  was  compared  to  the  standard  operating  system  in¬ 
terface  for  making  file-sharing  decisions.  Salmon  was  found  to 
increase successful task completion by 300% (the standard interface
used misleading terminology, which caused users to feel
that  they  were  sharing  files  when  in  fact  they  were  not).  Users 
also  spent  less  time  searching  for  information  using  Salmon  and 
spent a greater proportion of their time on essential task steps
rather  than  security  overhead.  In  their  study,  they  demonstrated 
that  attending  to  error  avoidance  in  interface  design  could  fa¬ 
cilitate  usable  security. 

Authentication  mechanisms.  Security  systems  are  de¬ 
signed  to  let  authorized  people  in  (the  permission  problem), 
and  to  keep  unauthorized  people  out  (the  prevention  prob¬ 
lem).  This  involves  three  distinct  steps:  identification,  authenti¬ 
cation,  and  authorization  (Renaud,  2005).  The  identification 
step  asks  a  person  to  identify  himself — usually  by  means  of  a 
token  or  an  identification  string  such  as  an  e-mail  address  or 
account  number.  Once  the  identification  token  has  been  pro¬ 
vided,  the  person  has  to  provide  some  evidence  of  his  or  her 
identity (authentication). This can be done by presenting something
he or she knows (e.g., password), something he or she
recognizes  (e.g.,  graphical  passwords),  something  he  or  she 
holds  (e.g.,  a  certificate),  or  something  he  or  she  is  (e.g.,  bio¬ 
metrics).  For  the  first  three,  authentication  depends  on  the  user 
and  the  system  sharing  a  secret,  which  for  security  purposes 
should  be  difficult  for  someone  else  to  guess,  and  for  usability 
purposes should be easy for the user to remember. The trade-off
between these two purposes can be difficult to resolve, and
contributes  to  discussions  of  how  much  security  rather  than  a 
view  of  security  as  something  that  is  either  present  or  not.  In  the 
case  of  biometrics,  the  system  records  a  digital  representation 
of  some  aspect  of  a  person’s  physiology  or  behavior  at  enroll¬ 
ment,  and  this  is  confirmed  at  authentication  time. 
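
The three steps named above (identification, authentication, and authorization) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the in-memory ACCOUNTS store, the function names, the example account, and the iteration count are all hypothetical, and the secret here is a password (something the user knows).

```python
import hashlib
import hmac
import os

# Hypothetical in-memory account store; a real system would use a database.
ACCOUNTS = {}
ITERATIONS = 100_000  # assumed work factor for the key-derivation function


def enroll(user_id, password, permissions):
    """Store a salted hash of the shared secret, never the secret itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    ACCOUNTS[user_id] = {"salt": salt, "digest": digest,
                         "permissions": set(permissions)}


def identify(user_id):
    """Step 1, identification: look up the claimed identity."""
    return ACCOUNTS.get(user_id)


def authenticate(account, password):
    """Step 2, authentication: check the evidence offered for that identity."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                    account["salt"], ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, account["digest"])


def authorize(account, action):
    """Step 3, authorization: decide whether this user may perform the action."""
    return action in account["permissions"]


def request_access(user_id, password, action):
    account = identify(user_id)
    if account is None:
        return False
    if not authenticate(account, password):
        return False
    return authorize(account, action)


enroll("alice@example.com", "correct horse battery", ["read_statement"])
print(request_access("alice@example.com", "correct horse battery", "read_statement"))
print(request_access("alice@example.com", "wrong guess", "read_statement"))
print(request_access("alice@example.com", "correct horse battery", "transfer_funds"))
```

Keeping the three steps as separate functions mirrors the distinction drawn in the text: a failure at any one step denies access, and each step can be replaced independently (for example, swapping the password check for a biometric comparison).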

Many  authentication  scenarios  can  be  strengthened  (in  a  se¬ 
curity  sense)  with  public  key  cryptography.  For  example,  a  user 
can  have  a  smart  card  that  contains  a  public  key  and  a  match¬ 
ing  private  key.  Instead  of  a  password,  the  user’s  public  key  can 
be  placed  on  file  at  a  remote  computer  system  (authentication 
server).  To  authenticate  the  user,  the  remote  system  sends  the 
user  a  random  challenge.  The  user  signs  the  challenge  with  his 
private  key  and  sends  the  result  back  to  the  remote  server, 
which  verifies  the  signature  with  the  public  key.  In  this  way,  the 
remote  server  can  verify  that  the  user  has  possession  of  the  pri¬ 
vate  key  without  having  to  receive  it.  Instead  of  having  the  pub¬ 
lic  key  on  file  at  the  remote  system,  the  smart  card  can  submit 
both  the  signed  challenge  and  a  public  key  certificate  that  has 
been  signed  by  a  third  party.  In  this  case,  the  use  of  public  key 
technology is called public key infrastructure (PKI).

Whatever methods are used, at each stage of an authentication
process, we can ask, “Is it secure?” The real areas of vulnerability
are the input mechanism and the user. In the case of
knowledge-based  authentication,  the  user  must  be  able  to  keep 
the  secret  and  the  secret  must  be  hard  to  discover. 
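
The public key challenge-response exchange described earlier can be sketched as follows. This is a toy illustration only: the SmartCard and AuthServer classes are hypothetical, and the key is a deliberately small, unpadded textbook RSA key built from two known Mersenne primes so that the example runs with the standard library alone. A real system would use a vetted cryptographic library and much larger keys.

```python
import hashlib
import secrets

# ASSUMPTION: a toy RSA key built from two known Mersenne primes, for
# illustration only. It is far too small, and unpadded, for any real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                               # public modulus
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes, modulus: int) -> int:
    """Hash a message and reduce it to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


class SmartCard:
    """Hypothetical card: holds the private key, which never leaves the card."""

    def __init__(self):
        self.public_key = (N, E)
        self._private_exponent = D

    def sign(self, challenge: bytes) -> int:
        return pow(digest_as_int(challenge, N), self._private_exponent, N)


class AuthServer:
    """Hypothetical server: keeps only public keys on file."""

    def __init__(self):
        self.public_keys = {}

    def register(self, user_id, public_key):
        self.public_keys[user_id] = public_key

    def new_challenge(self) -> bytes:
        return secrets.token_bytes(16)   # fresh random challenge per login

    def verify(self, user_id, challenge: bytes, signature: int) -> bool:
        modulus, exponent = self.public_keys[user_id]
        return pow(signature, exponent, modulus) == digest_as_int(challenge, modulus)


card = SmartCard()
server = AuthServer()
server.register("alice", card.public_key)

challenge = server.new_challenge()
print(server.verify("alice", challenge, card.sign(challenge)))      # True
print(server.verify("alice", challenge, card.sign(challenge) ^ 1))  # False
```

The server confirms possession of the private key without ever receiving it, and because each challenge is freshly generated, a captured signature is of no use for a later login.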

Passwords.  Random  passwords  are  currently  the  most 
popular  user  authentication  mechanism.  As  such,  they  also  rep¬ 
resent  the  most  common  research  target  for  work  in  the  secu¬ 
rity  and  usability  arena.  One  might  consider  them  the  white  rat 
for HCI work, similar to menu structures or word processor designs
of earlier HCI eras. Yan, Anderson, Blackwell, and Grant
(2005) conducted research to examine some of the commonly
held  beliefs  about  the  security  and  usability  properties  of  vari¬ 
ous  password  generation  guidelines.  In  their  work,  they  con¬ 
firmed  that  passwords  generated  by  typical  advice  (e.g.,  use  of 
a  random  sequence  of  letters  and  numbers)  can  be  difficult  for 
people  to  remember,  but  that  more  cognitively  friendly  advice 
(such as selecting mnemonic-based passwords) can be much easier
to  remember  without  sacrificing  the  theoretical  security  levels 
of  random  passwords.  They  also  explored  additional  security 
screening  capabilities — such  as  screening  for  weak  choices  (e.g., 
obvious  dates/places) — and  found  that  such  filtering  could  work 
well  with  other  advice  provided  to  users.  Contrary  to  beliefs 
held  within  security  communities,  they  did  not  find  that  theo¬ 
retical  analysis  necessarily  corresponded  to  actual  system  secu¬ 
rity.  For  example,  encouraging  the  use  of  truly  random  pass¬ 
words  resulted  in  more  frequent  writing  down  and  carrying 
passwords  than  encouraging  the  use  of  mnemonic  passwords. 
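
As a rough illustration of the two ideas discussed above, mnemonic-based password construction and screening for weak choices, consider the following sketch. It is not the procedure used by Yan et al.; the function names, the tiny blocklist, the date pattern, and the minimum length are all assumptions made for the example.

```python
import re

# Hypothetical blocklist of weak choices; a real filter would use a far
# larger dictionary of common words, names, and places.
WEAK_WORDS = {"password", "letmein", "qwerty123", "manchester"}

# Assumed pattern for obvious dates such as 12/25/1990 or 25121990.
DATE_PATTERN = re.compile(r"^\d{1,2}[/.-]?\d{1,2}[/.-]?(\d{2}|\d{4})$")


def mnemonic_password(phrase: str) -> str:
    """Build a password from the first character of each word in a phrase."""
    return "".join(word[0] for word in phrase.split())


def is_weak(password: str, min_length: int = 8) -> bool:
    """Screen out short passwords, blocklisted words, and obvious dates."""
    if len(password) < min_length:
        return True
    if password.lower() in WEAK_WORDS:
        return True
    if DATE_PATTERN.match(password):
        return True
    return False


candidate = mnemonic_password("My sister Peg is 24 years old & lives in Rome!")
print(candidate)                 # MsPi2yo&liR
print(is_weak(candidate))        # False
print(is_weak("12/25/1990"))     # True
print(is_weak("password"))       # True
```

The phrase is easy for its owner to recall, while the resulting string mixes cases, digits, and symbols; the screening step then rejects choices an attacker would try first.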

Others have taken up the challenge of providing usable passwords
through nontext password schemes. For example, Monrose
and Reiter (2005) provided an analysis of graphical passwords.
As with text passwords, graphical passwords might be
selected  by  the  user  (possibly  enhancing  usability  but  perhaps 
decreasing security) or selected by the system (enhancing security
at the cost of usability). While there is certainly ample evidence
that people exhibit powerful memory for images (e.g.,
Mandler, 1991), it is not completely clear that this translates easily
into a superior password mechanism. Results from Monrose
and  Reiter  demonstrated  that  graphical  password  schemes  suf¬ 
fer  drawbacks  similar  to  those  of  textual  schemes.  People 
tended  to  select  memorable  graphical  passwords  enabling  them 
to  be  more  easily  attacked,  and  random  graphical  figures  can 
be more secure but also more difficult to recall.

Wiedenbeck, Waters, Birget, Brodskiy, and Memon (2005) experimented
with an interesting variant of graphical password.
In  their  PassPoint  system,  users  are  provided  with  an  image  in 
which  they  establish  a  password  by  selecting  a  series  of  points  on 
the  image  (the  number  of  points  selected  and  the  order  in  which 
they  are  selected  are  factors  in  the  complexity — and  thus  the  po¬ 
tential  security  features  of  the  password).  Their  experimental 
work  indicated  that  such  a  system  is  promising  from  a  usability 
perspective — people  can  learn  the  systems  easily  and  remember 
their  sequences  over  time.  However,  performance  with  such  a 
system  was  not  as  good  as  with  a  comparison  textual  password 
system.  It  took  more  time  to  learn  the  passwords,  and  more  time 
to enter them even with practice. The extent to which such performance
differences might be due to novelty associated with
graphical  systems  compared  to  textual  systems  remains  unclear, 
and  is  in  need  of  further  research  before  we  might  expect  them 
to replace the textual passwords with which we have so much experience.
As De Angeli, Coventry, Johnson, and Renaud (2005)
reported in their work on graphical authentication systems,
successful design of password systems is a complex task and requires
considering and weighing a number of factors (such as
the  trade-off  between  security  and  usability). 
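
To make the notion of password complexity concrete, the sketch below compares the theoretical size of a click-point password space with that of a random text password. The image size, click tolerance, number of clicks, and alphabet size are hypothetical values chosen for illustration and are not taken from the studies cited; the calculation also assumes that every choice is made uniformly at random, which gives an upper bound rather than a measure of security in actual use.

```python
import math


def click_password_bits(image_width, image_height, tolerance, clicks):
    """Bits for an ordered sequence of click points chosen uniformly at random.

    The image is treated as a grid of tolerance-by-tolerance squares, and a
    click anywhere inside a square counts as selecting that square.
    """
    regions = (image_width // tolerance) * (image_height // tolerance)
    return clicks * math.log2(regions)


def text_password_bits(alphabet_size, length):
    """Bits for a text password whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


# Assumed example: 640 x 480 image, 20-pixel tolerance, five ordered clicks.
print(round(click_password_bits(640, 480, 20, 5), 1))   # 47.9

# Assumed example: eight characters drawn from letters and digits (62 symbols).
print(round(text_password_bits(62, 8), 1))              # 47.6
```

Under these assumptions the two schemes offer a similar theoretical password space, which is why differences in learning time, entry time, and memorability matter when choosing between them.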

Some  of  the  trade-offs  between  security  and  usability,  and 
some  of  the  tension  between  the  approaches  of  the  two  com¬ 
munities  can  be  seen  in  some  recent  research  lines.  Davis,  Mon¬ 
rose,  and  Reiter  (2004)  provided  a  study  that  looked  at  how  user 
selection  of  passwords  in  graphical  password  schemes  might  in¬ 
crease  the  likelihood  that  passwords  can  be  attacked.  For  text 
passwords,  user  selection  is  the  norm,  and  there  is  no  evidence 
that  allowing  users  to  select  passwords  (following  some  guide¬ 
lines  for  length  and  character  type)  makes  the  resulting  pass¬ 
words  any  easier  to  brute  force  guess  than  the  theoretical  max¬ 
imum  set  of  passwords  for  the  set  of  characters  involved. 
Several new approaches for password selection, including a
number that involve allowing the user to select a sequence of
images, have been developed to help with password memorability
(for a commercially available example, see Real User Corporation,
2002). While there has been research on several of
these  approaches  to  indicate  that  they  provide  passwords  that 
are  more  memorable  than  text  passwords,  and  have  similar  the¬ 
oretical  protection  (entropy),  there  has  not  been  much  research 
into  how  effective  they  are  in  actual  use.  The  research  reported 
by  Davis  et  al.  suggested  that  people  do  not  select  as  randomly, 
and  instead  form  passwords  with  a  limited  set  of  possibilities 
(perhaps  analogous  to  forming  passwords  with  strings  that 
might  be  easily  associated  with  an  individual  such  as  birthdates). 
Overall,  many  questions  remain  regarding  how  to  optimize  both 
security  and  usability  at  the  same  time  in  the  area  of  knowledge- 
based  passwords. 

Biometrics.  As  we  have  discussed,  authentication  in  com¬ 
puter  systems  has  suffered  from  the  limited  number  of  types  of 
mechanisms available for a system to know who it is interacting
with.  With  advances  in  technology,  we  have  new  opportunities 
(beyond  traditional  mechanisms  such  as  passwords)  for  systems 
to  recognize  us.  Biometrics  refers  to  a  means  of  identification 
that  can  be  uniquely  associated  with  an  individual  (e.g.,  voice 
patterns,  fingerprints,  and  hand  geometry).  Biometrics  involves 
the  comparison  of  live  anatomical,  physiological,  or  behavioral 
characteristics  to  a  stored  template  of  a  person  (Coventry,  2005). 
Physiological  biometrics  that  have  been  investigated  include 
those  based  on  fingerprints,  hand  and/or  finger  geometry,  and 
the  patterns  of  retinas,  veins,  irises,  and  faces.  Behavioral  bio¬ 
metric  techniques  that  have  been  advanced  include  those  based 
on voice, signature, and typing behavior (Peacock, Ke, & Wilkerson,
2005). Other techniques, which are not yet as well developed,
include recognition of characteristics such as ear shape,
gait, laughter, facial thermograms, and lip shapes.

To  begin,  it  is  important  to  distinguish  between  the  use  of 
biometrics  for  identification  and  verification.  For  identification, 
the  task  is  to  identify  an  individual  out  of  a  population  of  all  pos¬ 
sible  users.  As  the  population  of  possible  users  grows,  the  de¬ 
mands  on  identification  can  also  grow,  in  terms  of  either  per¬ 
formance  or  accuracy  of  the  recognition  system — challenges 
similar  to  speech  recognition  for  large  vocabulary  unconstrained 
speech.  For  verification,  the  task  is  to  verify  a  particular  identity 
by  matching  a  characteristic  to  a  stored  template  for  the  indi¬ 
vidual.  The  computational  task  is  much  simpler  for  verification 
than  for  identification  (where  only  fingerprints,  retinal  scanning, 
and  iris  scanning  have  been  proven  successful  for  large  popula¬ 
tions).  For  the  most  part,  we  will  talk  about  the  use  of  biomet¬ 
rics  for  verification  in  this  chapter. 
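
The difference between verification (a one-to-one comparison) and identification (a one-to-many search) can be sketched as follows. The feature vectors, the distance measure, and the match threshold are hypothetical; real biometric systems use far richer templates and matching algorithms.

```python
import math

# Hypothetical enrolled templates: user id -> feature vector recorded at enrollment.
TEMPLATES = {
    "alice": [0.10, 0.80, 0.35],
    "bob":   [0.90, 0.20, 0.60],
    "carol": [0.45, 0.50, 0.95],
}
THRESHOLD = 0.15  # assumed maximum distance that still counts as a match


def verify(claimed_id, sample):
    """Verification: compare the live sample with ONE stored template."""
    template = TEMPLATES.get(claimed_id)
    if template is None:
        return False
    return math.dist(sample, template) <= THRESHOLD   # math.dist needs Python 3.8+


def identify(sample):
    """Identification: search ALL stored templates for the closest match."""
    best_id, best_distance = None, float("inf")
    for user_id, template in TEMPLATES.items():
        d = math.dist(sample, template)
        if d < best_distance:
            best_id, best_distance = user_id, d
    return best_id if best_distance <= THRESHOLD else None


live_sample = [0.12, 0.78, 0.40]
print(verify("alice", live_sample))    # True
print(verify("bob", live_sample))      # False
print(identify(live_sample))           # alice
print(identify([0.5, 0.5, 0.5]))       # None
```

Verification performs a single comparison regardless of how many users are enrolled, whereas the work done by identification, and the chance of a false match, grows with the size of the enrolled population.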

Coventry  (2005)  provided  an  excellent  illustration  of  how 
biometrics  might  be  used  and  what  some  of  the  current  trade¬ 
offs  are  for  a  common  application — automatic  teller  machines 
(ATMs).  For  all  techniques,  usability  issues  can  be  cited  as  bar¬ 
riers  to  implementation  and  acceptance.  Fingerprint  identifica¬ 
tion  is  relatively  well  developed,  but  sensors  placed  in  public 
places  are  subject  to  environmental  issues  (such  as  dirt  or  latent 
prints)  as  well  as  user  training  issues  (they  can  be  sensitive  to 
where  the  finger  is  placed  on  the  reading  device).  Retina  scans 
have  attractive  potential,  but  are  currently  quite  invasive  and  dif¬ 
ficult  to  develop  for  a  full  range  of  users  (they  require  placing 
the  eye  close  to  a  sensor  location).  Speaker  verification  can  be 
subject  to  background  noise  problems  and  relatively  compli¬ 
cated  user  registration  requiring  more  input  than  a  single  word 
or  phrase.  Signature  verification  is  attractive  because  of  its  long 
use  and  association  with  financial  authorization,  but  current 
technology  is  not  yet  seen  as  reliable.  Typing  verification  is  seen 
as  having  some  potential,  but  lacking  the  reliability  necessary  for 
large  population  verification. 

Biometrics  researchers  have  determined  that  real-life  users 
are  the  biggest  variable  in  system  performance.  Ashbourn 
(2000) suggested several user characteristics for evaluating biometric
systems. These included a user's general acceptance of
the biometrics concept employed (e.g., Is the idea of having a
system read such characteristics acceptable?), general knowledge
of the technology (e.g., Does the user understand what is
being done?), knowledge of the particular biometric characteristic
(e.g., Does the user understand that, if fingerprints are being
read, his or her finger should be centrally placed on the
sensing device?), experience with the sensing device, environment
of use (e.g., public or private), and transaction criticality.

While  biometrics  technologies  are  improving  rapidly,  inher¬ 
ent  performance  limitations  remain  and  are  extremely  difficult 
to  work  around,  except  perhaps  by  combining  multiple  tech¬ 
nologies  or  providing  for  a  bypass. 

Other  security  approaches.  One  other  way  of  authenti¬ 
cation  by  something  I  have  is  with  smart  cards — here  referring 
to  a  portable  device  (or  card)  with  processing  power  and  au¬ 
thentication  information  contained  on  it.  Smart  cards  essentially 
add  an  integrated  circuit  to  familiar  plastic  credit  cards  enabling 
the  use  of  cryptographic  services  like  random  number  genera¬ 
tion  and  public  key  cryptography  (Piazzalunga,  Salvaneschi,  & 
Coffetti,  2005). 

Just  (2004)  examined  the  design  of  challenge  question  sys¬ 
tems,  looking  for  ways  to  improve  the  usability/security  charac¬ 
teristics  of  these.  Beckles,  Welch,  and  Basney  (2005)  looked  at 
the  usability  of  security  mechanisms  in  grid  computing  contexts. 

Encryption  of  data  is  a  well-established  mechanism  for  pro¬ 
tecting  information.  Rather  than  storing  or  transmitting  data 
which  can  be  read  in  the  clear  (e.g.,  data  which  is  stored  in  stan¬ 
dard  code  schemes  such  as  ASCII),  information  is  encrypted 
using  a  key  and  then  must  be  decrypted  using  a  key.  While  this 
contributes  to  achieving  some  security  goals  (for  example,  if  an 
encrypted  file  containing  sensitive  information  is  lost  or  stolen, 
it would be a considerably harder task to make use of the information
than if a plain data file were lost or stolen), there is a usability
cost for this added security: users might have to indicate
when  and  how  files  are  to  be  encrypted  or  decrypted.  Unfortu¬ 
nately,  the  complexity  of  this  process  has  been  found  to  cause 
user  difficulties,  variously  resulting  in  inability  to  use  security 
mechanisms appropriately (Whitten & Tygar, 1999; Caloyannides,
2004; Guttmann, 2003). Work continues addressing the
issues  associated  with  cryptographic  approaches  to  security 
(e.g.,  Balfanz,  Durfee,  &  Smetters,  2005). 

Usability  Challenges  Specific  to  Security — Attack  Defense 

Much  of  what  we  present  above  should  seem  familiar  to  the 
usability practitioner in the sense that security
should be considered as part of the overall system use context.
The  choice  of  authentication  technique  in  general  should  con¬ 
sider  the  various  trade-offs  in  the  techniques  available.  Biomet¬ 
ric  techniques  can  be  considered  if  users  are  likely  to  accept 
them  and  if  they  cover  the  population  of  expected  users.  If  pass¬ 
words  are  employed,  they  should  be  easy  for  users  to  recall  and 
enter  but  difficult  to  guess  or  steal.  Some  specific  issues  arise  be¬ 
cause  of  the  nature  of  security  as  a  protection  against  attack. 
The fact that not only the user but also other parties
with intent to do harm must be considered makes
considering usability issues complicated in new ways.

For  the  majority  of  usability  work,  the  goal  is  to  make  the 
user’s  primary  task  easy.  For  security  considerations,  we  en¬ 
counter  a  different  situation — the  fact  that  people  might  be  trying 
to  deceive  the  user  and  commit  fraud  (Conti,  Ahamad,  &  Stasko, 
2005).  Of  considerable  recent  interest  in  the  security  arena  is  the 
battle  against  phishing — the  misrepresentation  by  someone  of  an 
identity or website intended to draw the user into interactions
that  can  be  harmful  to  the  user.  A  phishing  attack  succeeds  when 
a  user  is  tricked  into  forming  an  inaccurate  model  of  the  interac¬ 
tion  and  when  the  user  takes  actions  contrary  to  their  intentions. 
Attacks often begin with e-mails sent to potential victims, purporting
to be from an individual or organization with whom
the user has some legitimate reason for interacting.

The  issue  of  phishing  has  been  viewed  as  a  model  problem  for 
illustrating  usability  concerns  in  security  (Dhamija  &  Tygar,  2005; 
Miller  &  Wu,  2005).  An  Anti-Phishing  Working  Group  (2005)  col¬ 
lected  and  described  phishing  attacks.  Analysis  of  the  reasons  why 
phishing  attacks  are  so  often  successful  points  to  a  number  of  in¬ 
teresting  usability  aspects.  For  example,  operating  systems  and 
windowing  platforms  that  permit  general-purpose  graphics  also  al¬ 
low  attackers  to  mimic  the  appearance  of  legitimate  websites. 
Users tend to habituate to commonly occurring warnings about
submitting  data  over  unencrypted  connections  and  can  easily  fail 
to  notice  when  they  are  actually  entering  information  to  an  inse¬ 
cure  website.  Because  organizations  can  invest  heavily  in  having 
their  names  and  logos  associated  with  trust,  attackers  can  take  ad¬ 
vantage  of  this  association  by  simply  convincing  the  user  that  they 
represent  a  trusted  organization.  Because  security  is  generally  a 
secondary  goal  for  users,  there  is  only  so  much  attention  we  can 
expect  them  to  pay  to  attempts  at  fraud. 

End  users  are  not  alone  in  trying  to  prevent  attacks  on  sys¬ 
tems.  For  people  working  in  organizations  with  system  admin¬ 
istrators,  the  usability  problem  is  partly  moved  from  the  end 
user  to  the  administrator.  Here  the  focus  shifts  to  making  tools 
for  users  whose  primary  activities  include  monitoring  the  health 
of  the  overall  system.  Security  is  one  of  their  main  tasks.  Intru¬ 
sion  detection — the  problem  of  detecting  computer  attacks  in  a 
timely  manner — is  one  of  both  great  difficulty  and  utmost  im¬ 
portance.  Finding  specific  evidence  of  attack  activity  in  the  enor¬ 
mous  number  of  potentially  relevant  alerts,  packets,  operating 
system  events,  and  user  actions  presents  an  almost  overwhelm¬ 
ing  task  for  intrusion  detection  (ID)  analysts.  Visualization  tools 
and  techniques  can  be  used  to  increase  the  effectiveness  of  in¬ 
trusion  detection  analysts  by  more  fully  exploiting  their  visual 
cognition  abilities.  Recent  work  identified  promising  avenues 
for research in visualization for intrusion detection (e.g., Goodall,
Ozok, Lutters, Rheingens, & Komlodi, 2005). Goodall et al.
presented a user-centered visualization based on an understanding of
the  work  of  ID  and  the  needs  of  analysts  derived  from  the  first 
significant  user  study  of  ID.  Consistent  with  good  information 
visualization  practice,  their  tools  presented  analysts  with  both 
at-a-glance  understanding  of  network  activity  and  low-level  net¬ 
work  link  details.  Results  from  preliminary  usability  testing  show 
that  users  performed  better  and  found  easier  those  tasks  deal¬ 
ing  with  network  state  in  comparison  to  network  link  tasks. 


Summary  of  Findings  and  Recommendations 

We  would  all  like  our  information  to  be  secure.  That  is  why 
we  would  like  our  systems  to  be  resistant  to  access  or  modifica¬ 
tion  from  individuals  who  we  do  not  authorize  to  have  access 
to them. This requires that users be aware of security mechanisms
even though they would generally prefer not to be. The design
challenge is to make security appropriately usable, that is,
to make it as easy to use as possible for the intended users,
while making it as difficult as possible to circumvent for the potential
attacker. This means that security administration and
use involve trade-offs that need to be evaluated in the design
of  a  system.  Absolute  security  is  a  myth — we  need  to  under¬ 
stand  that  we  are  looking  for  appropriate  levels  of  security. 

Some  of  the  aspects  of  security  that  researchers  have  investi¬ 
gated  are  mentioned  above  and  summarized  below  in  Table  32.2. 
We think that serious consideration of the trade-offs between usability
and security has just begun and that much work remains.
While  initial  approaches  involved  hiding  security  controls  from 
users,  we  believe  that  this  needs  to  be  done  in  balance  with  giv¬ 
ing  the  user  the  ability  to  control  the  level  of  security  required. 


TABLE 32.2. Design Factors Impacting Usable Security

Automatic Security Versus User Controlled Security
• Security is not generally the main user task focus for PIM.
• Risk Management is not a topic that end users explicitly understand.

Authentication Mechanisms
• Access to resources is controlled by knowledge of who the user is.
• Identification relies on authentication through one (or more) mechanisms.
• Mechanisms have usability/security trade-offs.
• Can be “something I know” or “something I have.”

Passwords
• Usually textual, but can be other.
• Users will forget passwords.
• Users will select passwords that are not optimally secure.

Biometrics
• A form of “something I have” authentication.
• Not all users might have a characteristic (e.g., fingerprint).
• Recognition technology is involved, so errors and attacks are possible.

Attack Considerations
• Complete design requires consideration of possible attacks.
• Anti-phishing approaches
• Visualization for system administrator tools


Views on Trust

Trust is important to every relationship that exists in society,
both personal and public in nature. Trust is related to
security and privacy but differs from these concepts in that it
is not an objective measure. Trust is based on the perception
that the person, organization, or system one is dealing with is
reliable. There are many definitions of trust and none of them
satisfies every use. These definitions vary across academic domains
and with context. When dealing with e-commerce and
other computer applications, the user is deciding to trust not
only the individual or organization but also the technology implementation
provided by the individual or organization.
Kuhlen (1998) defined trust as “allowing us to act as if we have
perfect knowledge about the reliability of that entity even if we
do  not”  (p.  3).  Marsh  and  Dibben  (2003)  defined  trust  as  con¬ 
cerning  “a  positive  expectation  regarding  the  behavior  of  some¬ 
one  or  something  in  a  situation  that  entails  risk  to  the  trusting 
party” (p. 466). Jøsang and Presti (2004) defined trust as “the
extent  to  which  one  party  is  willing  to  depend  on  somebody 
or  something,  in  a  given  situation  with  a  feeling  of  relative  se¬ 
curity,  even  though  negative  consequences  are  possible”  (p.  1). 
All  of  these  definitions  share  the  concept  that  trust  is  about  al¬ 
lowing  us  to  feel  comfortable  and  allowing  another  entity,  such 
as  an  organization  or  an  organization’s  computer  system,  to 
take  one  or  more  actions  for  us  even  though  there  is  risk  that 
we  will  be  harmed  in  some  way.  In  other  words,  in  the  domain 
of  e-commerce,  the  trusting  party  believes  that  the  trusted  or¬ 
ganization  has  implemented  privacy  and  security  so  that  it  will 
protect  the  PI  of  the  individuals  interacting  with  it. 

Earlier  in  this  chapter,  we  discussed  the  ongoing  research 
into  security  and  privacy  functionality  that  is  intended  to  protect 
information  that  is  collected  and  stored  by  organizations.  There¬ 
fore,  in  this  discussion,  we  will  assume  that  privacy  and  secu¬ 
rity  protections  are  in  place,  and  we  will  concentrate  on  the  fac¬ 
tors  that  affect  individuals’  perception  of  trust. 

Factors  Influencing  Trust 

Many  researchers  have  studied  the  factors  that  influence  trust 
when  dealing  with  online  applications.  Brodie,  C-M.  Karat,  and 
J.  Karat  (2004)  showed  that  greater  degrees  of  transparency  and 
control  regarding  the  use  of  personal  information  do  increase 
website visitor trust in the domain of IT equipment. In addition
to control of one’s own data and transparency of its use, researchers
have found that an individual’s familiarity with a website
and the perceived credibility and quality of the website also
affect trust. Fogg et al. (2001) found that website visitors’ perceptions
of the credibility of a website were most influenced by
the degree to which it was connected to a known organization in
the real world and whether it seemed well designed and implemented.
Building on this research, Fogg, Soohoo, Danielson,
Marable,  Stanford,  and  Tauber  (2003)  conducted  a  study  with 
2,500  participants  in  order  to  understand  factors  that  influence 
the  perceived  credibility  of  a  website.  They  found  18  factors  that 
influenced  credibility.  The  biggest  factor  by  far  was  the  design  of 
the  appearance  of  the  website.  This  was  followed  by  the  infor¬ 
mation  design  or  structure,  information  focus,  perceived  moti¬ 
vation  of  the  website  owner,  perceived  usefulness  of  the  infor¬ 
mation,  accuracy  of  the  information,  name  recognition  and 
reputation  of  the  website,  the  tone  of  the  writing,  the  identity 
of  the  sponsor,  functionality  on  the  website,  customer  service, 
past  experience  with  the  website,  clarity  of  the  information,  per¬ 
formance  on  a  test,  readability,  and  affiliations  (Fogg  et  al.,  2003). 
They  attributed  the  importance  of  the  design  look  of  the  website 
to  the  fact  that  it  is  very  prominent  when  a  visitor  looks  at  a  web¬ 
site.  This  supports  Fogg’s  (2003)  theory  that  credibility  is  based 
on  noticing  a  feature  and  then  interpreting  its  quality  to  create 
judgment. This theory is posited as a reason why there are apparent
differences in findings relating to the effect of privacy
guarantees (e.g., privacy statements and privacy seals) on websites
(Palmer, Bailey, & Faraj, 2000; Turow, 2003). Palmer et al.
found that privacy seals and statements did contribute to users’
trust of websites, while Turow found that website visitors do not
read privacy statements or understand the meaning of seals and
that there is a need to educate the public regarding privacy issues
and the web.

Given  that  familiarity  is  a  factor  in  establishing  trust,  Zhang 
and  Ghorbani  (2004)  identified  several  factors  that  influence 
familiarity, including the individual’s knowledge of similar services
(prior experience), the number of times he or she has visited
the particular website (repeated exposure), the length of
each visit to the website (level of processing), and the interval of
time between visits (forgetting rate).

Corritore,  Kracher,  and  Wiedenbeck  (2003)  created  a  model 
using  similar  factors  as  influencing  trust  in  the  online  world.  In 
this  model,  trust  is  determined  by  both  external  factors  and  the 
user’s  perception  of  the  application  or  website.  The  external 
factors include the user’s propensity to trust and prior experience
in similar situations; the user’s perceptions include his
or  her  perception  of  the  credibility  of  the  website,  how  easy  it 
is  to  use,  and  the  degree  of  risk  involved  (Corritore  et  al.,  2003). 
Briggs,  Simpson,  and  De  Angeli  (2004)  looked  at  trust  in  a  nar¬ 
rower  domain — advice  websites.  They  developed  a  model  of 
how  people  determine  whether  to  trust  the  advice  they  receive 
from  websites.  The  authors  developed  a  22-point  scale  designed 
to  break  down  trust  into  a  set  of  judgments  that  can  be  mea¬ 
sured.  This  list  includes  whether  the  user  perceives  the  infor¬ 
mation  to  be  prepared  by  an  expert  or  a  knowledgeable  source, 
whether  comments  from  other  users  were  available  on  the  web¬ 
site,  if  the  website  was  owned  by  a  known  and  respected  com¬ 
pany,  if  they  had  to  wait  a  long  time  on  the  website,  if  different 
options  were  suggested  by  the  website,  if  the  website  was  per¬ 
ceived  as  hard  to  use,  whether  the  user  felt  involved  in  how  the 
website  constructed  the  advice  offered,  if  the  site  was  perceived 
to  be  interactive,  if  the  advice  was  tailored  to  the  user,  if  the 
reasoning  was  explained  to  the  user,  if  there  was  an  opportunity 
to  contact  a  human,  if  the  advice  appeared  to  be  impartial,  if  the 
advice  was  perceived  to  be  good  and  the  user  trusted  it,  and  if 
the website behaved in a predictable way (Briggs et al., 2004).
This  study  supported  an  earlier  study  that  found  trust  was  in¬ 
fluenced  by  the  perceived  credibility  of  the  source  of  the  infor¬ 
mation,  whether  the  website  was  personalized  so  that  informa¬ 
tion  was  tailored  to  the  individual,  and  whether  the  website  was 
operated  in  a  predictable  way  (Briggs,  Buford,  De  Angeli,  & 
Lynch, 2002). While many have found that trust affects the use of
personalization,  Briggs,  Simpson,  and  De  Angeli  (2004)  found 
that  the  presence  of  personalization  could  contribute  to  trust  if 
it  allowed  the  visitor  to  feel  that  the  website  is  tailored  to  their 
needs.  Sillence,  Briggs,  Fishwick,  and  Harris  (2004)  had  similar 
results when studying trust of information on health-related
websites. They proposed a three-stage model of trust in which
visitors make an initial, rapid assessment of a website based on its
design and appearance and then do a more systematic
evaluation in which the credibility of the website and
the degree to which it appears to be personalized to their situations
become more important. The participants were looking
for information written by or for people in situations similar
to their own. The third stage of this model addressed the
maintenance of the trust relationship over time and was left as future
work in this paper (Sillence et al., 2004).

Patrick,  Briggs,  and  Marsh  (2005)  provided  a  thorough 
overview  of  many  issues  related  to  trust,  including  credibility 
of  information  on  a  website,  familiarity  with  the  website,  and 
external  factors  such  as  prior  knowledge  and  disposition  to 
trust.  They  also  described  how  trust  is  developed  slowly  over 
time,  but  a  bad  experience  or  even  a  processing  error  can  de¬ 
stroy  that  trust  very  quickly.  They  reviewed  a  number  of  pri¬ 
vacy  models  that  have  been  operationalized  using  question¬ 
naires,  including  Bhattacherjee’s  (2002)  model  that  described 
how  familiarity  leads  to  trust  and  to  a  willingness  to  do  busi¬ 
ness  with  a  website  and  Corritore  et  al.’s  (2003)  model  that 
is  described  above.  Patrick  et  al.  (2005)  concluded  with  design 
guidelines  for  promoting  trust  based  on  the  literature  that 
they  reviewed. 

Trust  and  Risk 

Trust  cannot  be  considered  without  considering  its  relation¬ 
ship  to  risk.  As  Patrick  et  al.  (2005)  pointed  out,  “Trust  is  inti¬ 
mately  associated  with  risk”  (p.  81),  and  people  evaluate  the  de¬ 
gree  of  risk  in  many  situations  both  online  and  offline.  In  the 
online world, people need different skills to judge correctly the
trustworthiness of websites and applications and the organizations that
create them. People can become victims of phishing attacks
and identity theft if they trust too much. However, Friedman,
Kahn, and Howe (2000) pointed out that trusting too little also has
costs, including opportunities that are missed.
Friedman et al. discussed several issues that make trust and
risk decisions more difficult online than in face-to-face transactions, such as
the anonymity of online transactions, the difficulty in assessing
the reliability and security of technology, and misleading statements
and images. Patrick et al. (2005) referenced work by
Chaiken  (1980)  that  described  two  strategies  used  depending  on 
the  perceived  level  of  risk  involved.  Where  people  are  not  highly 
involved  with  the  decision,  they  often  make  decisions  based  on 
appearances,  and  when  they  are  more  deeply  involved,  they  em¬ 
ploy  strategies  that  are  more  systematic  (Chaiken,  1980).  This 
suggests  that  the  degree  to  which  users  believe  they  are  at  risk 
will  determine  the  strategies  they  use  to  assess  the  credibility  of 
a  website  or  system  and  the  organization  behind  it.  This  model 
is  similar  to  the  three-stage  model  proposed  by  Sillence  et  al. 
(2004)  in  that,  in  the  first  stage  of  their  model,  users  make  quick 
decisions  based  on  appearance,  which  is  similar  to  Chaiken’s  de¬ 
scription  of  when  people  are  not  highly  involved.  In  the  second 
stage, Sillence et al. described the use of perceptions of the credibility
of the information, which is similar to the systematic strategies
that Chaiken described when people are more involved.

Trust  and  Personalization 

In the last few years, many e-commerce companies have
invested in personalizing their e-commerce websites in order to
encourage business. However, personalization is only useful
if users are willing to share data with the website, and this has
not always been the case. Sharing data requires that users trust that
owners of the website will not misuse their data. Hagan (2000)
found  that  website  visitors  often  show  their  distrust  of  internet 
websites  by  disabling  cookies  on  their  computers  and  entering 
incorrect  information  into  online  forms.  Many  researchers 
have  studied  what  aspects  of  an  application  or  website  affect 
users’ willingness to use personalized features of websites. The
Office  of  the  Federal  Privacy  Commissioner  of  Australia  (2000) 
conducted  a  large  study  of  the  factors  that  affect  whether  in¬ 
dividuals  are  willing  to  share  information  with  a  business  or 
other  organization.  They  found  that  it  is  important  to  individ¬ 
uals  that  they  understand  how  their  data  will  be  used  and  that 
they have control over who has access to it. Welty and Becerra-Fernandez
(2001) examined how information technologies
can  reduce  transaction  costs  and  increase  trust  by  increasing 
the  symmetry  of  information  available  to  both  sides  of  the 
transaction  in  a  business-to-business  (B2B)  setting.  Likewise, 
Brodie, C-M. Karat, and J. Karat (2004) showed that greater degrees
of transparency and control regarding the use of PI do
increase  willingness  to  share  data  in  order  to  get  access  to  per¬ 
sonalized  features  of  websites  in  the  IT  domain.  Overall,  trust 
and  personalization  have  a  complex  relationship  in  which  the 
presence  of  personalization  functionality  that  can  provide 
website  visitors  with  information  tailored  to  their  situations 
and  needs  has  been  found  to  lead  to  a  greater  degree  of  trust 
with  a  website.  At  the  same  time,  website  visitors  must  trust 
the  website  and  the  organization  behind  it  enough  to  share 
personal  information  if  the  functionality  is  going  to  be  effec¬ 
tive.  Research  has  shown  that  website  visitors  are  willing  to 
share  more  personal  information  when  they  understand  how 
and  by  whom  their  data  will  be  used  and  when  they  have  the 
ability  to  control  how  it  is  used,  which  suggests  that  these  are 
important  factors  for  designers  to  keep  in  mind  when  design¬ 
ing  their  websites. 

Summary  of  Findings  and  Recommendations 

Trust  is  necessary  in  an  uncertain  world.  Given  today’s 
world  in  which  the  Internet  is  playing  a  larger  role  all  of  the 
time  while  the  risks  from  phishing  scams,  spyware,  and 
viruses  continue  to  grow,  helping  users  to  develop  an  appro¬ 
priate  level  of  trust  is  an  important  goal.  While  there  has  been 
a  great  deal  of  research  on  why  and  when  people  trust  in  the 
online  world,  there  is  more  to  do.  HCI  researchers  and  prac¬ 
titioners  can  help  by  continuing  to  build  an  understanding  of 
what  leads  individuals  to  trust  a  website  or  an  application. 
Table  32.3  contains  a  summary  of  the  factors  identified  in  the 
research  cited  in  this  chapter  that  have  been  found  to  influ¬ 
ence  trust  in  the  online  world  to  date.  HCI  practitioners  and 
website  designers  need  to  understand  the  factors  that  influ¬ 
ence  trust  and  use  this  knowledge  as  they  design  websites 
and  applications.  Research  to  date  has  tended  to  focus  on  a 
few  domains,  such  as  e-commerce  and  online  advice  websites, 
and  it  is  not  clear  to  what  extent  the  findings  hold  across  cul¬ 
tural  boundaries.  HCI  researchers  can  address  these  issues 
by  determining  to  what  degree  the  factors  that  influence  trust 
generalize across domains and cultures.

TABLE 33.3. Design Factors Impacting Trust.

Perceived Credibility of the Website or Application
•  Credibility has been found to be influenced by many factors.
•  The most common factors cited were website and information design, the perceived usefulness of the data, and reputation of the provider.

User Familiarity with a Website
•  Familiarity has been found to be influenced by prior experiences with similar sites, repeated exposure to the site, the level of involvement with the site as measured by time spent on each page, and the time between visits to the site.

Predictability of the Behavior of the Website
•  Users tended to trust sites that behaved in a predictable fashion.

Perceived Quality of Information Provided
•  Judgments of the quality of online advice are related to the perceived credibility of the site.

External Factors
•  These include propensity of the user to trust and users' prior experiences in similar situations.

Ease of Use
•  Users trusted easy-to-use sites.

Perceived Degree of Risk in Dealing with the Site
•  Users were less likely to trust sites where they perceived more was at risk.

Site Personalized for the User
•  Does it appear to be tailored for the individual or be created by a person in a similar situation to the individual?

Ability to Control Information Shared with Site
•  Users were more willing to trust sites with personal information if they felt they had control of that information.

Degree of Transparency Regarding Use of Shared Information
•  Users were more willing to trust sites when they understood how their data would be used and by whom.

Presence of Privacy/Security Seals and Statements
•  Users trust the simple presence of these seals, though it has been shown that these statements and symbols are often not read or understood.


CONCLUSIONS  AND  DIRECTIONS 


The  intersection  of  HCI,  privacy,  security,  and  trust  is  emerging  as 
a  critical  area  for  research  amid  the  backdrop  of  recent  world 
events.  Information  is  being  improperly  disclosed  in  ways  that 
cause real harm to people through identity theft. Organizations
are  concerned  about  theft  of  proprietary  data  and  documents. 
Citizens  are  becoming  increasingly  concerned  about  the  use  of 
the  vast  amounts  of  data  that  organizations  and  governments 
have  about  them.  The  rapid  advancement  of  the  use  of  informa¬ 
tion  technology  in  industry,  government,  and  academia  makes 
it  much  easier  to  collect,  transfer,  and  store  sensitive  information 
of  a  variety  of  types  around  the  world.  We  are  faced  with  techni¬ 
cal  challenges  that  result  from  inadequate  consideration  of  se¬ 
curity  and  privacy  issues  in  designing  our  information  systems, 
which  if  not  addressed  will  result  in  a  significant  deterioration 
of  the  trust  that  people  have  in  IT.  While  there  certainly  is  con¬ 
siderable  research  in  this  area,  particularly  on  the  security  side,  it 
is becoming increasingly clear that really making our systems secure
and enabling appropriate attention to privacy issues will require
more than just a technology focus. While usability has been
identified  as  a  major  challenge  to  moving  the  results  of  security 
and  privacy  research  into  use  in  real  systems  (CRA,  2003),  it  is 
not  clear  that  this  challenge  has  reached  the  interdisciplinary 
researchers  needed  to  carry  out  the  complex  agenda  of  work. 

We  see  privacy  as  a  complex  social  issue  concerned  with  indi¬ 
viduals’  rights  to  know  what  information  is  collected  about  them 
and  how  it  might  be  used.  It  needs  to  be  concerned  with  more 
than just the protection of PII, such as credit card numbers, national
identification numbers, and other such identifiers. In isolation,
one data element might not identify the individual to
whom it relates; however, data aggregation techniques are changing
the playing field. Research by Sweeney (2001; Malin & Sweeney,
2004)  demonstrated  that  minimal  amounts  of  information  be¬ 
lieved  to  be  anonymous  can  be  used  to  personally  identify  some¬ 
one.  Questions  regarding  who  has  what  rights  to  information 
about  us  for  what  purposes  become  more  important  as  we  move 
toward  a  world  in  which  it  is  technically  possible  to  know  just 
about  anything  about  just  about  anyone.  As  Adams  and  Sasse 
(2001)  stated,  “Most  invasions  of  privacy  are  not  intentional  but 
due  to  designers’  inability  to  anticipate  how  this  data  could  be 
used,  by  whom,  and  how  this  might  affect  users”  (p.  50). 

Privacy  of  information  held  in  information  systems  depends 
on  security — the  ability  to  protect  a  system’s  resources  from 
harmful  attack  or  unauthorized  access.  Security  is  an  essential 
component  in  systems  that  house  personal  information  on  em¬ 
ployees,  clients,  constituents,  students,  customers,  and  patients. 
The  role  of  security  goes  well  beyond  the  protection  of  PI.  For 
example,  it  includes  corporate  and  national  security  information 
about trade secrets, business practices, payrolls, research and
development, strategy and logistics plans, fuel reserves, and
similar things that could wreak havoc if placed in the wrong
hands (e.g., those of industrial competitors who wish to obtain unfair
advantage, or of terrorists who wish to bring harm to business
or government entities, bring down power grids, or destabilize
banking and financial systems around the world).

The  security  functionality  in  these  systems  must  be  usable  for 
the  people  in  the  range  of  user  roles  with  different  perspectives 
and  skills  who  interact  with  it.  If  the  security  aspects  of  a  system 
are  so  complex  that  individuals  within  the  various  user  roles 
(e.g.,  technical  users,  business  users,  and  end  users)  cannot  un¬ 
derstand  it,  then  costly  errors  will  occur.  In  the  HCI  field,  there  is 
a  saying  that  if  the  user  cannot  understand  the  functionality,  it 
does  not  exist.  In  security,  the  problem  may  even  be  worse  than 
the  functionality  not  existing  because  the  organization  may  think 
data  is  secured  when  it  is  not.  Deciding  how  we  can  design  us¬ 
able  and  effective  privacy  and  security  technology  capabilities  for 
the  future  includes  philosophical,  legal,  cultural,  business,  so¬ 
cial  policy,  human  performance,  and  practical  dimensions.  Since 
at their core the goals of HCI and user-centered design are to understand
the user, the user’s tasks or goals, and the social and
physical  context  in  which  the  user  is  interacting  with  the  system, 
all  of  these  dimensions  can  be  considered  as  within  the  scope 
of  creating  usable  and  effective  privacy  and  security  solutions. 

HCI  research  in  privacy  and  security  is  a  critical  area  of  focus  in 
today’s world. Communication and collaboration are occurring between
people who have expertise in one or more of the areas of
privacy,  security,  and  HCI.  This  emerging  community  is  beginning 
to work together to incorporate HCI in the design and development
process for usable and effective privacy and security solutions.
Privacy and security pose some unique challenges to HCI research
as compared to other domains. First, the use
of security and privacy solutions is generally not the user’s main
goal. Users value and want security and privacy functionality, but
they regard this functionality as secondary to completing their
primary tasks (e.g., completing a transaction or process). Second,
as mentioned above, today’s solutions need to be usable for different
groups of users with differing skill sets as compared to the
past  when  security  solutions  in  particular  were  designed  with  a 
highly  trained  and  dedicated  technical  user  in  mind.  Third,  the 
risk  of  the  negative  impact  of  usability  problems  is  higher  for  se¬ 
curity  and  privacy  applications  than  for  many  other  types  of  sys¬ 
tems.  Finally,  users  must  be  able  to  update  security  and  privacy  so¬ 
lutions  easily  to  accommodate  frequent  changes  in  legislation, 
regulation,  and  organizational  requirements.


INTRODUCTION


U.S.  healthcare  expenditures  were  nearly  $1.7  trillion  in  2003 
and were expected to grow to $2 trillion by 2005 (BlueCross
BlueShield  Association).  Despite  such  large  spending,  many 
Americans  remain  uninsured  and  do  not  have  access  to  health¬ 
care  services.  Furthermore,  while  our  country  has  the  most  for¬ 
midable  medical  workforce  in  the  world  and  develops  and  uses 
the  most  modern  medical  technologies,  the  World  Health  Orga¬ 
nization  (2000)  recently  rated  the  quality  and  performance  of  the 
U.S.  healthcare  systems  as  being  worse  than  most  of  its  counter¬ 
parts  in  the  western  world.  Chassin,  Galvin,  and  the  National 
Roundtable  on  Health  Care  Quality  (1998)  documented  three 
types  of  quality  problems:  (a)  overuse,  (b)  underuse,  and  (c)  mis¬ 
use.  The  results  of  an  extensive  review  of  over  70  publications 
covering  years  1993  through  2000  provide  “abundant  evidence 
that  serious  and  extensive  quality  problems  exist  throughout 
American medicine resulting in harm to many Americans” (Institute
of Medicine, 2001, p. 24). In its first report, To Err is Human,
the Institute of Medicine (2000) reported serious and widespread
errors in healthcare delivery that resulted in frequent avoidable
injuries to patients. The Institute of Medicine (2001) suggested
four  key  underlying  reasons  for  inadequate  quality  of  care  in  the 
U.S.  healthcare  system:  (a)  the  growing  complexity  of  science 
and  technology,  (b)  the  increase  in  chronic  conditions,  (c)  a 
poorly  organized  delivery  system,  and  (d)  constraints  on  exploit¬ 
ing  the  revolution  in  information  technology. 

Reengineering  the  delivery  of  healthcare  services  through  in¬ 
novative  development,  application,  and  use  of  information  and 
medical  technologies  can  result  in  tremendous  cost  savings,  im¬ 
proved  access  to  healthcare  services,  as  well  as  improved  qual¬ 
ity  of  life  for  all  citizens.  Healthcare  professionals  in  the  United 
States  have  recognized  that  both  the  information  revolution  and 
the  biological  revolution  will  offer  tremendous  opportunities — 
and  challenges — for  (re)designing  the  healthcare  system  of  the 
future.  They  are  aware  of  the  need  to  better  utilize  new  infor¬ 
mation  and  communication  technologies  and  incorporate  com¬ 
puting  power  into  care  delivery  and  clinical  practice.  They  are 
also  aware  that  the  widely  publicized  biological  revolution 
(which  includes  both  advances  in  genetics  and  advances  in  bio¬ 
medical  engineering)  will  soon  bring  a  large  number  of  screen¬ 
ing  and  diagnostic  tests  as  well  as  new  treatment  strategies  and 
disease-management  tools.  It  is  clear  that  the  combination  of 
biotechnology,  computing  power,  information  and  communi¬ 
cation  technologies,  distance  technology,  and  sensor  technol¬ 
ogy  will  make  future  delivery  of  healthcare  in  the  United  States 
unrecognizable  from  the  care  we  deliver  today. 

However,  unlike  other  parts  of  the  American  economy,  the 
healthcare  system  has  been  slow  to  embrace  modern  informa¬ 
tion  and  communication  technology,  which  is  often  viewed  as 
too  expensive,  very  unusable,  and  quite  divergent  from  current 
practice  (Schoen,  Davis,  Osborn,  &  Blendon,  2000).  In  addition, 
technologies  do  not  typically  come  “ready-to-use”  off  the  shelf 
and  it  is  extremely  difficult,  if  not  impossible,  for  healthcare 
provider  organizations  to  have  the  range  of  expertise,  in  house, 
that  is  needed  to  design,  adapt,  and  implement  technologies 
to meet an organization’s needs. In fact, many healthcare organizations
are often not even fully aware of their own needs, do
not know which technologies are available for what, and do not
know  how  modern  information  and  communication  technolo¬ 
gies  can  be  effectively  used  to  improve  and  simplify  care  deliv¬ 
ery.  In  many  cases,  recent  information  technology  adoptions 
among healthcare providers have been driven by federal and
state regulations and requirements rather than well-recognized
internal needs and growth opportunities. For example, public
and private groups (i.e., the federal government and the Leapfrog
Group, a collaboration of businesses) are strongly encouraging
implementation of Computerized Physician Order Entry
(CPOE) and other information technologies in order to
achieve improvements in the quality and efficiency of providing
care (Kaushal, Shojania, et al., 2003).

Healthcare informatics is a field that can be broadly defined as
the  generation,  development,  application,  and  testing  of  infor¬ 
mation  and  communication  principles,  techniques,  theories, 
and  technologies  to  improve  healthcare  delivery.  It  includes  the 
understanding  of  data,  information,  and  knowledge  used  in  the 
delivery  of  healthcare  and  an  understanding  of  how  these  data 
are  captured,  stored,  accessed,  retrieved,  displayed,  interpreted, 
used  and  made  more  efficient.  While  healthcare  informatics  in¬ 
tersects  with  the  field  of  medical  or  bioinformatics,  it  is  different 
in  the  sense  that  it  focuses  on  healthcare  delivery,  and  hence  is 
centered  on  the  patient  (and/or  consumer),  the  clinician  (or 
healthcare  professional,  or  “provider”),  and,  more  importantly, 
the  patient-provider  interaction.  Human-computer  interaction 
(HCI), from the perspective of both the patient/consumer and
the provider, is essential to the success of “healthcare
informatics.”  In  this  chapter,  we  first  review  the  characteristics 
of  the  healthcare  industry  in  the  United  States.  We  then  review 
information  systems  used  by  consumers,  patients,  and  providers 
and  raise  HCI  issues  and  challenges  associated  with  both  per¬ 
spectives.  Then  we  propose  a  framework  for  evaluating  health¬ 
care  applications  and  conclude  with  a  discussion  of  future  op¬ 
portunities  and  challenges  for  HCI  in  healthcare. 


CHARACTERISTICS  OF  THE 
HEALTHCARE  INDUSTRY 


The  Healthcare  Industry 

As  previously  discussed,  the  Institute  of  Medicine  (2001)  puts 
forth  four  key  underlying  reasons  for  inadequate  quality  of  care 
in  the  U.S.  healthcare  system  today:  (a)  the  growing  complexity 
of  science  and  technology,  (b)  the  increase  in  chronic  condi¬ 
tions,  (c)  a  poorly  organized  delivery  system,  and  (d)  constraints 
on  exploiting  the  revolution  in  information  technology.  In  addi¬ 
tion,  a  growing  trend  toward  consumerism  is  becoming  a  major 
force  in  shaping  the  future  organization  of  the  healthcare  indus¬ 
try.  These  five  trends,  detailed  further  in  the  following  discus¬ 
sion,  are  shaping  the  future  of  healthcare  in  the  United  States. 

Complexity  of  Science  and  Technology 

The sheer volume of new healthcare science and technologies
(the knowledge, skills, interventions, treatments, drugs,
and devices) is very large today and has advanced much more
rapidly than our ability to use and deliver them in a safe, effective,
and efficient way. Government as well as private investment
in pharmaceutical, medical, and biomedical research and
development has increased steadily. The healthcare delivery
system  has  not  kept  up  with  phenomenal  advancement  in  sci¬ 
ence  and  technology  and  proliferation  of  knowledge,  treat¬ 
ments,  drugs,  and  devices.  With  current  advances  in  genomics 
(offering  promise  in  diagnosis  as  well  as,  possibly,  treatment), 
sensor  technologies  (offering  promise  in  automated  detection, 
measurement,  and  monitoring),  nanotechnologies  (offering 
promise  in  diagnosis,  treatment,  and  control),  and  informa¬ 
tion  and  communication  technologies  (enabling  remote  deliv¬ 
ery,  telemedicine,  e-health,  and  patient  empowerment),  the 
complexity  of  science  and  technology  in  healthcare  is  only  go¬ 
ing  to  increase. 

Chronic  Conditions 

As  noted  by  the  Institute  of  Medicine  (2001),  “because  of 
changing  mortality  patterns,  those  age  65  and  over  constitute  an 
increasingly  large  number  and  proportion  of  the  U.S.  population” 
(p.  26).  Therefore,  there  is  an  increase  in  both  the  incidence  and 
prevalence  of  chronic  conditions  (defined  as  conditions  lasting 
more  than  three  months  and  not  self-limiting).  Hoffman,  Rice, 
and  Sung  (1996)  estimated  that  patients  with  chronic  conditions 
make  up  80%  of  all  hospital  bed  days,  83%  of  prescription  drug 
use,  and  55%  of  emergency-room  visits.  Compared  to  acute  ill¬ 
nesses,  effectively  treating  chronic  conditions  requires  disease 
management  and  control  over  long  periods  of  time,  collaborative 
processes  between  providers  and  patient,  as  well  as  patient  in¬ 
volvement,  self-management,  and  empowerment. 

Organization  of  the  Delivery  System 

The  healthcare  delivery  system  in  the  United  States  is  a  highly 
complex  system  that  is  nonlinear,  dynamic,  and  uncertain.  The 
system  is  further  complicated  by  a  large  number  of  agents  who 
are  multiple  stakeholders,  each  with  multiple,  sometimes  con¬ 
flicting,  goals,  aspirations,  and  objectives.  As  a  result,  the  entire 
system leads to a lack of accountability; it frequently has
misaligned reward and incentive structures, and it suffers from
inefficiencies  embedded  in  multiple  layers  of  processes.  The 
healthcare  “product”  or  “service”  is  often  ill  defined  or  difficult 
to  define  and  evaluate.  The  processes  involved  in  delivering 
healthcare  services  are  complex,  ill  specified,  and  difficult  to 
measure,  monitor,  and  control.  Health  outcomes  are  difficult 
to  measure,  manage,  and  analyze.  The  system  experiences 
growing  cost  pressures,  faces  potential  insurance  premium  in¬ 
creases,  and  is  extremely  fragmented.  Wagner,  Austin,  and  Von 
Korff (1996) identified five elements needed to improve patients’
outcomes in a population increasingly afflicted by chronic
conditions:

•  evidence-based,  planned  care 

•  reorganization of practices to meet the needs of patients who
require more time and/or resources, and closer follow-up

•  systematic  attention  to  patients’  need  for  information  and 
behavioral  change 

•  ready  access  to  necessary  clinical  knowledge  and  expertise 

•  supportive  information  systems 

Regarding  this  last  point,  the  Institute  of  Medicine  (2001) 
pointed  to  the  fact  that: 

healthcare  organizations  are  only  beginning  to  apply  information  tech¬ 
nology  to  manage  and  improve  patient  care.  A  great  deal  of  medical  in¬ 
formation  is  stored  on  paper.  Communication  among  clinicians  and 
with  patients  does  not  generally  make  use  of  the  Internet  or  other  con¬ 
temporary  information  technology.  Hospitals  and  physician  groups  op¬ 
erate  independently  of  one  another,  often  providing  care  without  the 
benefit  of  complete  information  on  the  patient’s  condition  or  medical 
history,  services  provided  in  other  settings,  or  medications  prescribed 
by other providers. (p. 30)

Information  Technology 

The  revolution  in  information  technology  holds  great 
promise  in  a  number  of  areas  for  consumers,  patients,  clinicians, 
and  all  organizations  involved  in  the  delivery  of  healthcare  ser¬ 
vices.  A  recent  report  by  the  National  Research  Council  of  The 
National  Academies  (2000)  identified  six  major  information 
technology application domains in healthcare: (a) consumer
health, (b) clinical care, (c) administrative and financial
transactions, (d) public health, (e) professional education, and
(f) research. While many applications are currently in use (such as
online  search  for  medical  information  by  patients),  others,  such 
as  remote  and  virtual  surgery  and  simulation  of  surgical  proce¬ 
dures,  are  in  early  stages  of  development  (Institute  of  Medicine, 
2001).  While  the  Internet  (and  intranets)  has  been  a  driving 
force  for  changes  in  the  information  technology  landscape  in 
the  past  decade,  many  healthcare  applications  are  not  web 
based. Applications such as administrative billing systems,
electronic medical records (EMR), and computerized physician order
entry systems frequently remain on legacy systems, which are
often built around older mainframe systems and lack integration
with other applications.

With  respect  to  consumers/patients  and  providers,  the  Com¬ 
mittee  on  the  Quality  of  Health  Care  in  America  identified  five 
key  areas  in  which  information  technology  could  contribute  to 
an  improved  delivery  system  (Institute  of  Medicine,  2001): 

•  access  to  medical  knowledge  base 

•  computer-aided  decision  support  systems 

•  collection  and  sharing  of  clinical  information 

•  reduction  in  medical  errors 

•  enhanced  patient  and  clinician  communication 

Consumerism 

The  Internet  and  other  developments  in  information  and 
communication  technologies  are  contributing  to  greater  con¬ 
sumerism  with  stronger  demands  from  individuals  for  informa¬ 
tion  and  convenience.  People  are  more  demanding;  they  want 
timely and easy access to medical information, the latest in
technology,  and  the  latest  in  customer  service.  Patients  are  start¬ 
ing  to  have  access  to  tools  that  can  lead  to  empowerment  and 
shared  decision  making  regarding  their  own  healthcare.  In  ad¬ 
dition,  because  of  increased  fragmentation  and  specialization 
in  medical  care,  patients  need  to  take  a  more  active  role  in  man¬ 
aging  their  health  and  healthcare  to  ensure  that  the  various 
providers  involved  in  providing  their  care  (i.e.,  primary  care 
physician,  specialist,  pharmacist,  etc.)  have  the  information  they 
need  to  provide  appropriate,  quality  care. 

There are, however, many technical, organizational, behavioral,
and social challenges and barriers to increased use of
information technology by individuals in managing their
health/healthcare. These technological challenges not only include the
design  of  optimal,  effective,  flexible  human-computer  interfaces 
(HCIs),  but  also  issues  of  privacy  and  security  of  information. 
Those  are  discussed  in  the  following  section. 

The  Healthcare  Regulatory  Environment 

As  Kumar  and  Chandra  (2001)  mentioned,  the  healthcare  in¬ 
dustry  has  unique  legislative  challenges.  Among  them,  two  in 
particular have implications for the field of healthcare
informatics: (a) the Health Insurance Portability and Accountability Act
(HIPAA);  and  (b)  health  information  and  other  business  data  se¬ 
curity.  The  Health  Insurance  Portability  and  Accountability  Act 
was  passed  and  signed  into  law  in  1996  and  is  designed  to  im¬ 
prove  the  portability  of  health-insurance  coverage  in  the  group 
and  individual  markets,  limit  healthcare  fraud  and  abuse,  and 
simplify  the  administration  of  health  insurance.  The  act  has  se¬ 
rious,  impending  implications  for  healthcare  providers  and  in¬ 
formation  managers.  Of  all  its  mandates,  administrative  simpli¬ 
fication  is  perhaps  the  most  critical  for  healthcare  information 
managers,  who  are  faced  with  everything  from  establishing  stan¬ 
dardized  financial  and  clinical  electronic  data  interchange  (EDI) 
code  sets  to  adopting,  assigning,  and  using  unique  numerical 
identifiers for each healthcare provider, payer, patient, and
employer. Both HIPAA and the growing role of Internet-based
technologies in delivering healthcare make data security and
privacy an even greater and more critical concern.

HIPAA contains five key Titles that apply to every provider,
payer (including self-insured employers), and healthcare
clearinghouse. Title I pertains to healthcare access, portability, and
renewability. Title II relates to healthcare fraud and abuse. Title III
addresses tax-related provisions and medical savings accounts.
Title IV discusses the application and enforcement of group
health-plan requirements. Lastly, Title V relates to revenue offsets.

Currently,  the  healthcare  industry  is  focusing  its  efforts  on 
Title  II  of  the  Act,  also  known  as  Administrative  Simplification. 
Unlike  past  legislation  that  has  imposed  regulations  on  only 
Medicare  and  Medicaid,  all  healthcare  providers,  payers,  and 
clearinghouses  must  meet  HIPAA  Administrative  Simplification 
regulations  within  26  months  of  each  subsection’s  finalization. 
There  are  four  subsections  to  Title  II.  They  are: 

•  transaction  standards 

•  coding  sets 

•  patient  privacy 

•  security 


Transaction  Standards  and  Coding  Sets 

The  Department  of  Health  and  Human  Services  has  adopted 
national  standards  for  electronic  administrative  and  financial 
healthcare  transactions.  This  is  one  of  the  most  positive  attrib¬ 
utes  of  Title  II  as  it  eliminates  the  conflicting  transaction  stan¬ 
dards,  coding  sets,  and  identifiers  used  by  the  various  players 
in the industry. Initial transaction and code sets were
implemented in October 2002, with revised code set standards
defined and implemented by October 2003. The standards relate
to  enrollment,  referrals,  claims,  payments,  eligibility  for  a  health 
plan,  payment  and  remittance  advice,  premium  payments, 
health-claim  status,  and  referral  certifications  and  authoriza¬ 
tions.  By  developing  national  standards,  it  is  anticipated  that 
Electronic  Data  Interchange  (EDI)  of  healthcare  data  will  signif¬ 
icantly  reduce  administrative  costs.  The  Workgroup  on  Elec¬ 
tronic  Data  Interchange  (2000)  estimated  that  EDI  can  reduce 
administrative  costs  by  $26  billion  per  year  by  streamlining  pre¬ 
certification,  enrollment  status,  and  reimbursement  processes. 
Code  Sets  have  also  been  defined  in  which  variations  are  not 
permitted.  Code  Sets  are  based  on  the  following  standards: 

ICD-9  Volumes  1  &  2 — Diagnosis  Coding 

ICD-9  Volume  3 — Inpatient  Hospital  Service  Coding 

CPT  4 — Physician  Service  Coding 

CDT-2 — Dental  Service  Coding 

HCPCS — Other  Health  Related  Coding 

DRG — Diagnosis  Related  Groups 

NDC — National  Drug  Coding 

HIPAA  also  establishes  unique  identifiers  for  healthcare 
providers,  health  plans  and  payers,  employers,  and  eventually 
patients/individuals.  Currently  the  patient/individual  identifier  is 
pending  as  no  consensus  could  be  reached. 
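
As a rough illustration of what it means for code set variations not to be permitted, the Python sketch below checks that every coded entry on a claim names one of the code sets listed above. The `CodedEntry` type, the `unknown_code_sets` function, the short labels used as dictionary keys, and the sample codes are all hypothetical and chosen only for illustration. The sketch checks the code set name only; validating each individual code against the official code list is not attempted here.

```python
from dataclasses import dataclass

# Code sets named in the text, keyed by a short label (labels are illustrative).
PERMITTED_CODE_SETS = {
    "ICD-9 Vol 1-2": "Diagnosis coding",
    "ICD-9 Vol 3": "Inpatient hospital service coding",
    "CPT-4": "Physician service coding",
    "CDT-2": "Dental service coding",
    "HCPCS": "Other health-related coding",
    "DRG": "Diagnosis related groups",
    "NDC": "National drug coding",
}


@dataclass(frozen=True)
class CodedEntry:
    """One coded line on a claim: which code set it uses and the code itself."""
    code_set: str
    code: str


def unknown_code_sets(entries):
    """Return the entries whose code set is not one of the permitted sets."""
    return [e for e in entries if e.code_set not in PERMITTED_CODE_SETS]


# Hypothetical claim with one entry that uses a local, nonstandard code set.
claim = [
    CodedEntry("ICD-9 Vol 1-2", "250.00"),
    CodedEntry("CPT-4", "99213"),
    CodedEntry("LOCAL", "X12"),
]
rejected = unknown_code_sets(claim)
```

Under these assumptions, the entry labeled "LOCAL" is the only one rejected, which is the behavior a single national standard is meant to enforce.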

Privacy  and  Security 

Administrative  simplification  also  addresses  patient  privacy 
and security. Privacy standards exist for disclosure of
patient-identifiable information (including demographic data), training
of the healthcare workforce, individuals’ rights to see records,
procedures for amending inaccuracies in medical records,
maintenance of privacy when patient information is exchanged between
business  associates,  designation  of  a  privacy  officer,  procedures 
for  complaints,  sanctions  for  infractions,  duty  to  mitigate,  and 
document  compliance.  As  a  component  of  disclosure,  covered 
entities  (providers,  payers,  and  clearinghouses)  must  make  rea¬ 
sonable  efforts  not  to  use  or  disclose  more  than  the  minimum 
amount of protected patient information necessary to accomplish
the intended purpose of the use or disclosure. Wide adoption
and use of electronic medical record (EMR) systems will
present  technical  challenges  to  preventing  disclosure  of  this  in¬ 
formation.  In  the  past,  accessibility  of  paper-based  patient 
records to clinicians was restricted in part by the chart’s physical
location (e.g., in medical records storage, on the nursing unit in
a hospital). However, EMR makes these charts available to a
much larger number of clinicians in a wider range of physical
locations, making it more difficult to prevent inappropriate uses
of this protected information by authorized users of the EMR.
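
One way to picture the minimum-necessary requirement in an EMR is a filter that releases only the fields a given role needs, paired with an audit trail so that use by authorized users can be reviewed afterward. The Python sketch below is a minimal illustration under stated assumptions: the role names, the field names, the `FIELDS_BY_ROLE` mapping, and the `view_record` function are hypothetical and are not taken from HIPAA or from any particular EMR product.

```python
from datetime import datetime, timezone

# Hypothetical mapping from a user's role to the record fields that role needs.
FIELDS_BY_ROLE = {
    "billing_clerk": {"patient_id", "insurance", "procedure_codes"},
    "nurse": {"patient_id", "allergies", "medications", "vitals"},
    "physician": {"patient_id", "allergies", "medications", "vitals", "history"},
}

# In-memory audit trail; a real system would write to protected storage.
audit_log = []


def view_record(record, user, role, purpose):
    """Return only the fields permitted for the role and log the access."""
    allowed = FIELDS_BY_ROLE.get(role, set())  # unknown roles see nothing
    visible = {k: v for k, v in record.items() if k in allowed}
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "purpose": purpose,
        "fields": sorted(visible),
    })
    return visible


# Hypothetical patient record and a billing clerk's view of it.
record = {
    "patient_id": "P-001",
    "insurance": "Plan A",
    "procedure_codes": ["99213"],
    "allergies": ["penicillin"],
    "medications": ["warfarin"],
    "vitals": {"pulse": 72},
    "history": "free-text clinical history",
}
clerk_view = view_record(record, "u17", "billing_clerk", "claim submission")
```

The filter limits what each role can see, while the log addresses the harder problem noted above, inappropriate use by people who are authorized: it does not prevent such use, but it makes each access attributable to a user and a stated purpose.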

Security  standards  require  establishment  of  administrative 
procedures  (policies  and  procedures),  physical  safeguards 
(physical  access  to  computers),  technical  security  (individual 
and  network  computer  access),  and  electronic  signature  (op¬ 
tional,  but  if  used,  it  must  be  digital).  Implementation  require¬ 
ments  of  the  privacy  and  security  standards  have  received  a 
tremendous  amount  of  debate.  While  most  feel  the  intent  of 
the  standards  is  good,  the  cost  implications  have  left  many  ad¬ 
vocating  for  a  longer  implementation  period.  The  Department 
of Health and Human Services estimated the total cost to implement
Subtitle II to be approximately $17 billion to $18 billion
over the next 10 years. Some industry experts (Kennedy
and  Blum,  2001)  purported  implementation  and  maintenance 
costs  ranging  from  $50  to  $200  million  for  a  large  healthcare 
provider  and/or  payer.  However,  these  numbers  may  increase 
as  more  healthcare  providers  implement  wireless  networks  to 
support  the  use  of  EMR  and  other  clinical  systems.  Use  of  wire¬ 
less  networks  and  centralized  databases  that  support  EMR  and 
other  systems  creates  a  new  set  of  technical  security  require¬ 
ments  and  issues. 


HEALTHCARE  INFORMATICS 


As  defined  earlier,  healthcare  informatics  comprises  the  genera¬ 
tion,  development,  application,  and  testing  of  information  and 
communication  principles,  techniques,  theories,  and  technolo¬ 
gies  to  improve  the  delivery  of  healthcare  with  a  focus  on  the 
patient/consumer,  the  provider,  and,  more  importantly,  the  pa¬ 
tient-provider  interaction.  While  a  number  of  systems  using  a 
number  of  platforms  and  technologies  have  already  been  de¬ 
veloped  and  are  currently  being  developed,  the  field  itself  is  still 
in  its  infancy.  In  addition,  current  systems  have  been  designed 
to  fit  within  the  existing  healthcare  delivery  system  and  thus 
are  only  marginally  or  superficially  impacting  the  way  healthcare 
is  being  delivered.  The  true  potential  of  healthcare  informatics 
is  yet  to  be  experienced  and  will  radically  transform  the  way 
healthcare  is  being  delivered  and  managed  in  the  future.  The 
following  sections  provide  background  information  on  current 
healthcare  applications  with  an  emphasis  on  two  types  of  users: 
(a)  the  consumer/patient  and  (b)  the  clinician/provider. 

Consumer  Health  Information 

Consumer  health  information  has  been  defined  as  “any  infor¬ 
mation  that  enables  individuals  to  understand  their  health  and 
make  health-related  decisions  for  themselves  or  their  families” 
(Harris,  1995).  Patrick  and  Koss  (1995)  listed  a  variety  of  orga¬ 
nizations  and  entities  that  produce  and/or  disseminate  con¬ 
sumer  health  information  including  health-related  organizations 
(involved  in  provision  of,  or  payment  for,  healthcare  services 
and  supporting  services),  libraries,  health  voluntary  organiza¬ 
tions  (i.e.,  American  Heart  Association,  American  Cancer,  Amer¬ 
ican  Lung,  etc.,  and  6000+  other  health-interest  societies), 
broadcast and print media, employers, government agencies,
community-based organizations (e.g., churches, YMCA, agencies
for the elderly), and networked computer health information
providers (“virtual” communities). Methods of dissemination are
diverse  and  include  informal  channels,  printed  text,  broadcast 
electronic  media,  dial-up  services  (telephone),  nonnetworked 
computer-based  information  (i.e.,  CD-ROM,  Kiosk  technology), 
and  networked  interactive  computer-based  information.  Harris 
(1995)  discussed  a  number  of  problems  with  health  information 
including  how  to  interpret  conflicting  or  differing  information; 
how  to  judge  reliability;  how  to  choose  among  many  alterna¬ 
tives;  and  how  to  deal  with  the  vast  quantities  of  information, 
much  of  which  is  superficial  or  even  inaccurate. 

Interactive  Health  Communication 

Print  and  broadcast  dissemination  of  health  information  leads  to 
a  number  of  problems  such  as  the  timing  relative  to  need,  single 
directionality  (difficulty  of  following  up,  clarifying,  and  under¬ 
standing),  timeliness  and  relevance  vis-a-vis  updates,  and  not 
being  unique  to  individuals  (Patrick  &  Koss,  1995).  As  Harris 
(1995)  pointed  out,  electronic  sources  of  health  information 
have  the  potential  to  be  more  timely  and  complete  than  other 
media  and  can  become  more  accessible  to  all  citizens.  Conse¬ 
quently,  the  use  of  interactive  health  communication  or  con¬ 
sumer  health  informatics  has  become  increasingly  popular.  Un¬ 
fortunately,  the  advantages  have  not  been  fully  realized.  The 
science  panel  on  interactive  communication  and  health  defined 
interactive  health  communication  as  “the  interaction  of  an  indi¬ 
vidual — consumer,  patient,  caregiver,  or  professional — with  or 
through  an  electronic  device  or  communication  technology  to 
access  or  transmit  health  information  or  to  receive  guidance 
and  support  on  health-related  issues”  (Robinson,  Patrick,  Eng, 
&  Gustafson,  1998,  p.  1284).  The  panel  identified  six  potential 
functions  of  interactive  health  communication  applications: 
(a) relay information, (b) enable informed decision making,
(c) promote healthful behaviors, (d) promote peer information
exchange and emotional support, (e) promote self-care, and
(f) manage demand for health services.

Ferguson  (1997)  referred  to  two  types  of  consumer  health 
informatics:  community  and  clinical.  Community  consumer 
health  informatics  includes  resources  such  as  online  networks, 
forums,  databases,  and  websites  that  anyone  with  a  home  com¬ 
puter  can  access.  Clinical  consumer  health  informatics  includes 
resources  such  as  programs  or  systems  developed  by  clinicians, 
system  developers,  or  HMOs,  and  is  provided  only  to  selected 
groups  of  members  or  patients.  With  respect  to  community  con¬ 
sumer  health  informatics  resources,  there  is  a  growing  concern 
that  the  barriers  to  use  of  online  health  resources  are  becom¬ 
ing  similar  to  those  for  actual  health  services.  These  barriers 
include  cost,  geography,  literacy,  culture,  disability,  and  other 
factors  related  to  the  capacity  of  people  to  use  services  appro¬ 
priately  and  effectively.  Eng  et  al.  (1998)  urged  public  and  pri¬ 
vate  sector  organizations  to  collaborate  in  reducing  the  gap  be¬ 
tween  the  “haves”  and  “have  nots”  for  health  information.  They 
suggested  the  need  for  supporting  health  information  technol¬ 
ogy  access  in  homes  and  public  places,  developing  applications 
for  the  growing  diversity  of  users,  funding  research  on  access- 
related  issues,  enhancing  literacy  in  health  and  technology, 
training health information intermediaries, and ensuring the
quality  of  health  information  and  support. 

In  a  report  on  consumer  health  informatics,  the  U.S.  Gen¬ 
eral  Accounting  Office  (1996)  described  three  general  cate¬ 
gories  of  consumer  health  informatics.  They  include  systems 
that (a) provide health information to the user (one-way
communication); (b) tailor specific information to the user’s unique
situation (customized communication); or (c) allow the user to
communicate  and  interact  with  healthcare  providers  or  other 
users  (two-way  communication).  The  report  also  cited  a  num¬ 
ber  of  issues  of  concern:  access,  cost,  information  quality,  secu¬ 
rity  and  privacy,  computer  literacy,  copyright,  systems  develop¬ 
ment  (compatibility,  infrastructure,  and  standardization),  and 
the  potential  for  information  overload. 

Consumer/Patient  Web-Based 
Applications  and  e-Health 

An estimated 95 million Americans seek health information
online (Fox, 2005). In 1999, the Benton Foundation estimated
there  were  10,000  or  more  health-related  websites.  However, 
this  number  has  grown  tremendously  to  meet  increasing  con¬ 
sumer  demand  for  healthcare  information.  In  early  2006,  the 
Yahoo directory of health-related websites
(dir.yahoo.com/health) listed over 10,000 websites related to diseases and
conditions alone and thousands of other sites addressing topics
ranging  from  pharmacy  to  alternative  medicine  to  public  health. 
This  variety  of  information  available  demonstrates  that  con¬ 
sumers  and  patients  desire  a  range  of  information  and  services 
including  getting  disease-treatment  information,  obtaining  re¬ 
port  cards  on  physicians  or  hospitals,  exchanging  information 
with  other  patients,  interacting  online  with  their  physicians,  and 
managing  their  own  health  benefits.  The  sheer  volume  of  in¬ 
formation  on  the  Internet  exceeds  most  expectations  yet  raises 
problems:  efficient  search  for  information,  information  retrieval, 
information  visualization,  human-information  processing,  un¬ 
derstanding,  and  assimilation.  Authors  of  Internet  information 
need  to  determine  how  to  best  structure  the  information  for 
use  by  others.  E-Health  can  be  described  as  the  transition  of 
healthcare  processes  and  transactions  into  the  Internet-delivered 
electronic  superhighway.  Potential  problems  exist  that  are  spe¬ 
cific  to  patients  seeking  electronic  health-related  information 
(Sonnenberg,  1997),  including  the  lack  of  editorial  control  of  in¬ 
formation,  conflict  of  interest  for  website  sponsors,  and  unfil¬ 
tered  information  presenting  an  unbalanced  view  of  medical  is¬ 
sues.  However,  referring  primarily  to  self-help  groups,  Ferguson 
(1997)  believed  that  the  problem  of  obtaining  bad  medical  in¬ 
formation  online  is  not  very  different  from  obtaining  bad  medical 
information  at  cocktail  parties,  in  the  tabloids,  in  magazine  ads, 
etc.  In  addition  to  the  content  of  information,  there  are  tech¬ 
nological  problems  that  may  affect  both  access  and  use  of  health 
information  on  the  Internet:  slow  modems,  poor  institutional 
Internet  connections,  firewalls  that  interfere  with  Internet  traffic, 
malfunctioning  message  routers,  and  heavy  Internet  usage  in 
the  immediate  geographic  area  (Lindberg  &  Humphreys,  1998). 
Most current concerns about interactive health communication
center on the fact that these applications have the potential
both to improve health and to cause harm, thus highlighting the
need to ensure their accuracy, quality, safety, and effectiveness
(Robinson et al., 1998).

Provider  Healthcare  Informatics 

Providers  and  healthcare  provider  organizations  have  long  used 
health-information  systems  to  support  both  administrative  and 
clinical functions of healthcare delivery and management.
However, despite the fact that healthcare is one of the most
information-intensive industries, it has very few state-of-the-art
information-management  systems.  Healthcare  is  fragmented, 
with  hundreds  of  thousands  of  payers,  hospitals,  physicians,  lab¬ 
oratories,  medical  centers,  pharmacies,  and  clinics,  each  with  its 
own  legacy  of  systems,  hardware,  software,  and  platforms.  Elec¬ 
tronic  data  interchange  and  connectivity  issues  have  become 
critical.  Numerous  information  systems  have  been  developed 
and  implemented,  the  most  noteworthy  being  integrated  EMRs 
and  computerized  physician  order  entry  systems.  The  follow¬ 
ing  is  a  discussion  of  some  important  clinical  applications. 

Electronic Medical Records (EMRs)

EMRs  are  increasingly  being  adopted  in  both  primary-care 
and  inpatient-care  environments.  EMRs  provide  functions  to 
document  all  clinical  processes  and  patient-related  information 
relevant  to  the  delivery  of  patient  care.  EMRs  are  advocated  and 
used  to  improve  the  quality,  accessibility,  and  timeliness  of  pa¬ 
tient  medical  information.  However,  because  EMRs  replace  tra¬ 
ditional  patient  charts,  they  affect  a  wide  range  of  clinical  users 
and  clinical  processes.  As  such,  they  face  a  number  of  barriers  to 
adoption,  including  cost,  a  lack  of  tested  systems,  problems 
with data entry, inexperienced vendors, confidentiality
concerns, and security concerns (Wager, Lee, White, Ward, &
Ornstein, 2000). In addition, EMR may have barriers that are
specifically related to the practice of medicine. For example, physician
use  of  EMR  while  with  patients  could  affect  patient  perceptions 
of  quality  of  care  or  quality  of  physician-patient  interactions. 
While  some  past  studies  examining  this  issue  have  not  shown 
this to be the case (e.g., Legler & Oates, 1993), a recent survey of
pediatric primary-care physicians using EMR indicated that they
felt it reduced eye contact with patients and increased the
duration of patient visits (Adams, Mann, & Baushner, 2003). Thus,
the  impact  of  these  technologies  on  physician-patient  inter¬ 
actions  is  not  clear  at  this  point. 

A  key  factor  that  will  contribute  to  whether  EMR  systems  be¬ 
come  more  rapidly  adopted  in  primary-care  practices  is  whether 
or not physicians themselves perceive the systems as improving
quality (of the medical records, patient care, and overall
performance). A recent study examined this issue using qualitative
methods  in  five  community-based  practices  that  had  used  EMR 
for  at  least  two  years  and  did  not  use  a  duplicate  record  system 
(Wager  et  al.,  2000).  Results  indicated  that  many  physicians  and 
staff  members  perceived  benefits,  such  as  increased  access,  an 
ability  to  search  the  system,  and  improved  overall  quality  of 
medical  records.  There  were,  however,  several  disadvantages 
mentioned,  including  the  frequency  of  downtime  and  the  time 
necessary  to  develop  customized  templates. 
In a recent study examining the quality of worklife of family
physicians,  Karsh,  Beasley,  Hagenauer,  and  Sainfort  (2001)  col¬ 
lected  quantitative  data  about  EMR  in  order  to  compare  per¬ 
ceptions  of  medical  records  between  physicians  using  EMR  and 
those  not  using  EMR.  Specifically,  they  assessed  whether  or  not 
the  use  of  EMR  was  related  to  perceptions  of  improved  quality 
of  medical  records.  The  results  showed  that  physicians  who 
used  EMR  perceived  their  medical  records  to  be  more  up  to 
date  and  accessible.  Physicians  who  used  EMR  were  also  more 
satisfied  with  the  overall  quality  of  their  medical-records  sys¬ 
tems.  On  the  other  hand,  there  were  no  differences  in  percep¬ 
tions  of  whether  medical  records  could  be  modified  to  meet  in¬ 
dividual  needs.  This  suggests  that  EMR  can  have  positive 
impacts on medical records (it is possible, though, that the
positive responses were caused not by positive traits of the EMR
but rather by cognitive dissonance).

The  results  of  Karsh  et  al.’s  (2001)  study  supported  those  of 
qualitative  studies  of  physicians  who  use  EMR.  Wager  et  al. 
(2000)  found  that  physicians  in  primary  practice  who  had  used 
EMR  for  at  least  two  years  believed  EMR  to  have  many  benefits 
over  paper-based  systems,  including  increased  access  and  avail¬ 
ability  of  patient  information  to  multiple  users,  the  ability  to 
search  the  system,  improved  overall  quality  of  patient  records, 
improved  quality  of  documentation,  increased  efficiency,  facili¬ 
tated  cross-training,  and  improved  communication  within  the 
practice.  Thus,  it  is  clear  that  EMR  has  the  potential  to  improve 
the  quality  of  patient  records  and  therefore,  possibly,  the  quality 
of care. However, such systems deliver their full benefit only if
they are designed to maximize usability, connectivity, and
portability while guaranteeing privacy and security.

Computerized  Physician  Order  Entry  Systems 

Computerized  physician  order  entry  (CPOE)  systems  have 
substantial  potential  for  improving  the  medication  ordering 
process  because  they  enable  physicians  to  write  orders  directly 
online.  They  ensure  complete,  unambiguous,  and  legible  or¬ 
ders;  they  assist  physicians  at  the  time  of  ordering  by  suggesting 
appropriate  doses  and  frequencies,  by  displaying  relevant  data 
to assist in prescription decisions, and by checking prescribed drugs
for allergies and drug-drug interactions. As such, those systems
are  believed  to  potentially  reduce  the  incidence  of  medical  er¬ 
rors  in  general,  and  medication  errors  in  particular.  In  a  study  of 
the impact of CPOE, Bates, Teich, Lee, Seger, Kuperman,
Ma’Luf, Boyle, and Leape (1999) found that such systems
substantially decreased the rate of nonmissed-dose medication errors.
In addition, other studies have demonstrated that use of CPOE
decreases turn-around time for medication orders as well as
radiation and laboratory orders (Mekhijan, Kumar, Kuehn, Bentley,
Teater, & Thomas, 2002) and improves adherence to clinical
guidelines (Overhage & Tierney, 1997).
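
The order-time checks described above (allergies, drug-drug interactions, and suggested doses) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any CPOE product: the `Patient` type, the `check_order` function, the interaction table, and the daily dose limits are hypothetical placeholders and are not clinical guidance. A real system would draw on a maintained drug knowledge base.

```python
from dataclasses import dataclass, field

# Hypothetical reference data, for illustration only (not clinical guidance).
INTERACTING_PAIRS = {frozenset({"warfarin", "aspirin"})}
MAX_DAILY_DOSE_MG = {"warfarin": 10, "aspirin": 4000, "amoxicillin": 3000}


@dataclass
class Patient:
    allergies: set = field(default_factory=set)
    active_drugs: set = field(default_factory=set)


def check_order(patient, drug, dose_mg, times_per_day):
    """Return a list of warnings for a proposed medication order."""
    warnings = []
    # Allergy check: simple name match against the recorded allergies.
    if drug in patient.allergies:
        warnings.append(f"allergy: patient is allergic to {drug}")
    # Interaction check: compare the new drug with each active drug.
    for other in sorted(patient.active_drugs):
        if frozenset({drug, other}) in INTERACTING_PAIRS:
            warnings.append(f"interaction: {drug} with {other}")
    # Dose check: total daily dose against the (hypothetical) limit.
    limit = MAX_DAILY_DOSE_MG.get(drug)
    daily = dose_mg * times_per_day
    if limit is not None and daily > limit:
        warnings.append(f"dose: {daily} mg/day exceeds {limit} mg/day")
    return warnings


# Hypothetical patient: allergic to amoxicillin, currently taking warfarin.
patient = Patient(allergies={"amoxicillin"}, active_drugs={"warfarin"})
aspirin_warnings = check_order(patient, "aspirin", 500, 2)
```

The function returns warnings rather than blocking the order, leaving the decision with the physician. How such warnings are presented is itself an interface design question, since every extra alert adds to the time a physician spends entering an order.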

However,  these  systems  are  not  a  panacea.  In  some  cases, 
their  use  has  resulted  in  negative  outcomes,  including  increased 
physician  order  entry  time  (Bates  et  al.,  1994;  Shu  et  al.,  2001) 
and  an  increased  coordination  load  on  clinical-care  teams  (Cheng 
et al., 2003).

For computerized physician-order-entry systems to be fully
successful, physicians need to use them. However, as the studies
previously discussed indicate, the gains achieved come at a
cost,  frequently  imposed  on  the  physicians  using  the  system. 
This  leads  to  issues  of  HCI  design,  usability,  and  integration 
within  the  care  delivery  processes  to  ensure  that  the  burden 
on  these  users  is  minimized.  The  importance  of  addressing 
these  issues  has  also  been  highlighted  in  several  recent  studies 
showing  that  the  potential  for  new  types  of  prescribing  errors 
can  be  created  by  usability  problems  in  the  interface  and/or  lack 
of  integration  of  the  system  with  clinical  work  practices  (Koppel, 
Metlay,  Cohen,  Abaluck,  Localio,  Kimmel,  &  Strom,  2005).  In 
certain  care  situations,  such  as  critical-care  environments,  this 
lack  of  integration  between  the  clinical  processes  and  patient 
care  needs  (i.e.,  urgency  of  care)  can  have  dire  consequences 
for  patients  (Han,  Carcillo,  Venkataraman,  Clark,  Watson,  & 
Nguyen,  2005).  This  emphasizes  the  importance  of  identifying 
and  resolving  potential  usability  problems  prior  to  implemen¬ 
tation,  continually  monitoring  their  use,  and  incrementally  im¬ 
proving  their  usability  in  the  context  of  the  needs  of  specific 
clinical-care  environments. 


Patient  Monitoring  Systems 

Gardner  and  Shabot  (2001)  defined  patient  monitoring  as 
“repeated  or  continuous  observations  or  measurement  of  the 
patient,  his  or  her  physiological  function,  and  the  function  of  life 
support  equipment,  for  the  purpose  of  guiding  management 
decisions”  (p.  444).  Electronic  patient  monitors  are  used  to  col¬ 
lect,  display,  store,  and  interpret  physiological  data.  Increasingly, 
such data are collected using newly developed sensors from patients
in all care settings as well as in patients’ own homes. While
such data can be extremely useful for diagnosis, monitoring,
alerts, and treatment suggestions, the amount, diversity,
and complexity of the data collected present challenges to HCI
design.

Imaging Systems

Imaging  is  a  central  element  of  the  healthcare  process  for 
diagnosis,  treatment  plan  design,  image-guided  treatment, 
and  treatment-response  evaluation.  Greenes  and  Brinkley 
(2001)  noted  that  the  proliferation  in  number  and  kind  of  im¬ 
ages  generated  in  healthcare  led  to  the  creation  of  a  subdis¬ 
cipline  of  medical  informatics  called  “imaging  informatics.”  In 
their  review  and  summary  of  the  field,  Greenes  and  Brinkley 
noted  that: 

as processing power and storage have become less expensive, newer,
computationally intensive capabilities have been widely adopted. Widespread
access to images and reports will be demanded throughout
healthcare delivery networks. . . . We will see significant growth in image-guided
surgery and advances in image-guided minimally invasive
therapy as imaging is integrated in real time with the treatment process.
Telesurgery will be feasible. (pp. 534-536)

They  also  highlight  that  such  ambitious  evolution  of  imag¬ 
ing  systems  will  be  in  part  dependent  on  continued  advances 
in  user  interfaces  and  software  functionality. 

Information-Retrieval Systems

Hersh, Detmer, and Frisse (2001) defined information retrieval
as the “science and practice of identification and efficient
use of recorded media” (p. 539). In their review of the systems,
they addressed four elements of the information-retrieval
process:  (a)  indexing,  (b)  query  formulation,  (c)  retrieval,  and 
(d)  evaluation  and  refinement.  Indexing  is  the  process  by  which 
content  is  represented  and  stored,  and  query  formulation  is 
the  process  by  which  user  information  needs  are  translated  in 
terms  of  an  actionable  query.  Focusing  on  clinicians’  use,  Hersh 
et  al.  (2001)  pointed  out  that  the  evolution  of  the  Internet  has 
posed  new  and  so  far  unmet  challenges  regarding  the  indexing 
of  (essentially  loosely  and  irregularly  structured)  documents. 
They called for a unified medical-language system. For
novice users such as consumers and patients, another significant
challenge resides in query formulation, since novice users
generally have a hard time expressing ill-defined information
needs because of their lack of domain knowledge.
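
To make the first three elements concrete, the following is a minimal sketch in Python, not a description of any system reviewed by Hersh et al. It assumes a toy collection of three invented document titles, and the function names (tokenize, build_index, retrieve) are hypothetical. Indexing is shown as building an inverted index, and query formulation as turning free text into a conjunction of terms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Query formulation and retrieval: require every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Invented sample documents, for illustration only.
documents = {
    "d1": "Asthma treatment guidelines for children",
    "d2": "Guidelines for managing adult asthma",
    "d3": "Diabetes treatment options",
}
index = build_index(documents)

print(sorted(retrieve(index, "asthma guidelines")))  # documents with both terms
print(sorted(retrieve(index, "wheezing")))           # lay term absent from the index
```

The second query illustrates the novice-user problem noted above: a lay term that does not appear in the indexed vocabulary retrieves nothing, even though relevant documents exist.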


Decision-Support  Systems 

Decision-support  systems  (DSS)  are  computer  systems  used  to 
support complex decision making. The goal of DSS is to improve
the efficiency and/or effectiveness of decision-making
processes (Shim, Warkentin, Courtney, Power, Sharda, & Carlsson,
2002). As such, many of the complex decision processes
in healthcare can benefit from the use of DSS. One class of DSS
in healthcare is clinical decision support systems (CDSS), which
provide decision support to clinical users during the process
of providing patient care. Another class of DSS, patient-decision-support
systems (Patient DSS), is designed to educate and support
patients as they make decisions that affect their healthcare.
The following sections provide a brief overview of these two
types of healthcare-related DSS.


Clinical Decision-Support Systems

Musen, Shahar, and Shortliffe (2001) defined a clinical decision
support system (CDSS) as “any computer program designed to
help  health  professionals  make  clinical  decisions”  (p.  575). 
They  characterized  clinical  decision  support  systems  along  five 
dimensions: 

1.  system  function 

2.  mode  for  giving  advice 

3.  style  of  communication 

4.  underlying  decision-making  process 

5.  HCI 

A large number of applications already exist and can be further
developed. The opportunities are almost unlimited. Bates
et al. (2001) proposed that appropriate increases in the use of information
technology in healthcare, especially the introduction
of clinical decision-support systems and better linkages in and
among systems, could result in substantial reduction in medical
errors. For example, studies have demonstrated that implementation
of CPOE systems that include CDSS functions (e.g., drug
interaction checking and allergy checking) has reduced medication
errors and adverse drug events in both general-care settings
(e.g., Bates, Teich, Lee, Seger, Kuperman, Ma’Luf, Boyle,
& Leape, 1999) and pediatric-care settings (e.g., Potts, Barr, Gregory,
Wright, & Patel, 2004; Cordero & Kuehn, 2004).
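
As a rough illustration of what drug-interaction and allergy checking at order entry might look like, consider the following Python sketch. It is an assumption-laden toy and not the logic of any system cited above: the drug names are placeholders, the interaction table is invented, and Patient and check_order are hypothetical names. A real CDSS would draw on a curated clinical knowledge base.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table of unordered drug pairs flagged as interacting.
# The names are placeholders, not real clinical data.
INTERACTIONS = {frozenset({"drug_a", "drug_b"})}


@dataclass
class Patient:
    allergies: set = field(default_factory=set)
    active_medications: set = field(default_factory=set)


def check_order(patient, new_drug):
    """Return alert messages for a proposed order; an empty list means no alerts."""
    alerts = []
    if new_drug in patient.allergies:
        alerts.append(f"allergy: patient is allergic to {new_drug}")
    # Compare the new drug against every medication the patient already takes.
    for current in sorted(patient.active_medications):
        if frozenset({current, new_drug}) in INTERACTIONS:
            alerts.append(f"interaction: {new_drug} with {current}")
    return alerts


patient = Patient(allergies={"drug_c"}, active_medications={"drug_a"})
print(check_order(patient, "drug_b"))  # interaction alert
print(check_order(patient, "drug_c"))  # allergy alert
print(check_order(patient, "drug_d"))  # no alerts
```

Returning alerts as data, rather than blocking the order outright, leaves the decision of how and when to interrupt the clinician to the interface layer, which is where the usability concerns discussed in this section arise.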

However,  as  noted  by  Musen  et  al.  (2001),  “systems  can  fail 
...  if  they  require  that  a  practitioner  interrupt  the  normal  pat¬ 
tern  of  patient  care”  (p.  587).  Thus,  new  technologies  (mobile 
devices,  wireless  networks,  and  distance-communication  tech¬ 
nologies)  as  well  as  novel  HCIs  (based  on  speech,  gestures,  and 
virtual  reality)  offer  huge  potential  to  maximize  usability  and 
permit  seamless  integration  of  clinical  decision-support  systems 
within  complex,  dynamic  work  processes. 

Patient  Decision-Support  Systems 

The  medical  industry  has  increasingly  acknowledged  the 
need  to  enable  patients  to  participate  in  making  health-related 
decisions.  In  order  to  effectively  participate  in  decision  mak¬ 
ing,  most  patients  need  to  become  better  informed  about  the 
options  available  and  need  help  assimilating  that  information 
to  apply  it  to  the  decision  at  hand.  These  health-related  deci¬ 
sions  range  from  choosing  a  health-insurance  plan  to  working 
with  their  doctor  to  select  a  treatment  for  cancer. 

Patient  DSS  are  one  of  several  types  of  decision  aids  used  to 
help  patients  participate  in  health-related  decisions.  Decision 
aids  are  interventions  provided  to  assist  individuals  as  they  make 
a  deliberative  choice  between  two  or  more  alternatives  (Bekker, 
Hewison, & Thornton, 2003). A patient DSS supports a patient in
one or more stages of making a health-related decision. Much
of the past and current patient DSS research and development
effort has targeted patients with life-threatening or chronic diseases.
Most efforts have focused on supporting decisions regarding
treatment options (i.e., medical or surgical therapies), although a
few have examined early-detection and other issues (O’Connor,
1999). Patient DSS that support patients faced with treatment decisions
provide one or more of the following functions that facilitate
patient participation in decision making (O’Connor, 1999;
Scott & Lenert, 1998):

Educate  the  patient.  Provide  the  patient  with  informa¬ 
tion  about  the  treatment  alternatives  and  outcomes,  especially 
highlighting  risks  and  benefits  associated  with  each  treatment 
alternative. 

Tailor information. Tailor information content and/or
presentation based on patient characteristics such as
health and demographic factors.

Assess  preferences.  Use  preference-elicitation  methods 
to  assess  the  patient’s  values/preferences  for  the  possible  inter¬ 
vention  outcomes. 

Optimize  decision.  Optimize  the  decision  outcome 
based  on  context,  heuristics,  probabilities  of  outcomes,  and 
algorithms. 
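
One simple way to combine elicited preferences with outcome probabilities is an expected-utility calculation. The Python sketch below is only an illustration under stated assumptions: the treatments, probabilities, and utility values are invented, the function names are hypothetical, and real patient DSS may use quite different methods.

```python
def expected_utility(outcome_probs, utilities):
    """Sum of probability times patient-assigned utility over all outcomes."""
    return sum(p * utilities[outcome] for outcome, p in outcome_probs.items())


def rank_treatments(treatments, utilities):
    """Order treatment alternatives from highest to lowest expected utility."""
    scored = {
        name: expected_utility(probs, utilities)
        for name, probs in treatments.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Utilities elicited from one hypothetical patient (0 = worst, 1 = best).
utilities = {"full_recovery": 1.0, "partial_recovery": 0.6, "complication": 0.0}

# Invented outcome probabilities for two hypothetical treatment alternatives.
treatments = {
    "surgery": {"full_recovery": 0.7, "partial_recovery": 0.1, "complication": 0.2},
    "medication": {"full_recovery": 0.4, "partial_recovery": 0.5, "complication": 0.1},
}

for name, score in rank_treatments(treatments, utilities):
    print(f"{name}: {score:.2f}")
```

Because the utilities are supplied by the patient, two patients given the same probabilities can reasonably end up with different rankings, which is the point of assessing preferences before optimizing.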




Reviews of patient decision aids including patient DSS (e.g.,
Molenaar, Sprangers, Postma-Schuit, Rutgers, Noorlander,
Hedricks, & De Haes, 2000; O’Connor, 1999; O’Connor, Rostom,
Fiset, Tetroe, Entwistle, Llewellyn-Thomas, Holmes-Rovner,
Barry, & Jones, 1999; O’Connor, Stacey, Entwistle, Llewellyn-Thomas,
Rovner, Holmes-Rovner, Tait, Tetroe, Fiset, Barry, & Jones,
2003b) have revealed that “decision-support strategies have received
generally consistent positive ratings by patients in terms
of feasibility, acceptability, length, balance, clarity, amount of information,
and usefulness in decision making” (O’Connor, 1999,
p. 260). Additionally, they demonstrate a number of benefits for
patients,  including  improved  decision-relevant  knowledge,  re¬ 
duced  decisional  conflict,  improved  congruence  between  values 
and  choice,  more  realistic  expectations,  more  active  patient  par¬ 
ticipation,  and  fewer  patients  who  are  unable  to  make  a  deci¬ 
sion.  Bekker,  Hewison,  and  Thornton  (2003)  proposed  that 
many  observed  benefits  are  a  result  of  decision  aids  enabling  pa¬ 
tients  to  use  more  effective  cognitive  and  emotional  strategies. 
Note  that  the  reviews  do  not  indicate  a  consistent,  significant  ef¬ 
fect  of  decision  aids  on  decision  satisfaction,  although  a  gener¬ 
ally  positive  impact  on  satisfaction  with  the  decision  process  is 
indicated.  Use  of  decision  aids  tends  to  reduce  the  number  of 
patients  who  take  a  passive  role  in  decision  making  and  the 
number  who  are  undecided  (O’Connor  et  al.,  2003). 

In  addition  to  being  used  to  support  patients  faced  with  treat¬ 
ment  choices,  similar  support  systems  can  be  used  to  help 
healthy  individuals  make  health-related  choices.  For  example, 
DSS can support healthcare consumers as they make the important
choice of a health-insurance plan. Considerable effort has
been  made  to  educate  consumers  about  health  insurance.  One 
system  available  for  learning  about  health  insurance  plans  is  the 
Health  Plan  Report  Card  created  by  the  National  Committee  for 
Quality  Assurance  (NCQA)  (The  National  Committee  for  Quality 
Assurance,  2003).  The  report  card  is,  in  effect,  an  interactive  de¬ 
cision-support  tool  that  helps  consumers  compare  and  select  a 
health  plan.  The  Report  Card  provides  information  on  a  variety  of 
attributes  for  each  plan  and  enables  consumers  to  identify  plans 
that  meet  their  specifications.  This  system  provides  access  to  plan 
quality  information  and  uses  an  easily  comparable  matrix  format 
and  a  “star”  rating  system  to  simplify  comparisons  of  plans.  De¬ 
spite  the  increasing  availability  of  information  on  health  plans, 
many  consumers  still  find  the  information  confusing  and  the  de¬ 
cision  difficult.  By  presenting  health-plan  information  in  a  way 
that  supports  and,  perhaps,  adapts  to  a  consumer’s  characteris¬ 
tics  and  their  health-plan  preferences,  we  may  improve  the  us¬ 
ability  of  the  information  and  enable  consumers  to  make  more  in¬ 
formed  decisions  about  their  health-insurance  coverage. 

Patient  Preferences 

Patients’  preferences  and  priorities  regarding  their  health 
and  healthcare  are  applied  each  time  they  participate  in  making 
health-related choices. However, an individual’s stated preferences
can be influenced by external factors. For example, researchers
have found that the way information about treatments
is framed (positive, negative, or neutral) affects
patients’ treatment preferences (Llewellyn-Thomas, McGreal,
& Thiel, 1995). It is likely that these tendencies for external factors
to influence individual preferences will hold for a variety
of health-related choices. Therefore, presenting decision makers
with information previously not considered in their decision
may cause them to change their preference structure. Slovic
(1995)  viewed  “preference  construction  as  an  active  process” 
(p. 369). Taking this view, it is important for patient DSS designers
to take a patient-centered (user-centered) approach to the design of
these  systems  to  ensure  that  patient  decision  outcomes  are  im¬ 
proved  and  to  avoid  introducing  systematic  biases  into  the  deci¬ 
sion  process.  By  structuring  the  education  and  decision  process 
to  help  the  patient  develop  a  comprehensive  picture  of  relevant 
facets  of  the  decision,  we  can  help  the  patient  develop  a  more 
comprehensive,  rational  set  of  preferences  to  apply  to  his  or  her 
health  choices.  In  this  way,  DSS  can  help  patients/healthcare 
consumers  make  more  informed  decisions. 

HCI Challenges in Patient DSS

The heterogeneity of the patient/healthcare consumer population
presents a number of challenges to Patient DSS designers.
Patients/consumers vary greatly in age, physical ability, mental
ability, computer experience, healthcare/health condition knowledge,
and, as discussed previously, decision-related preferences.
They also employ different decision strategies when faced with
multi-attribute decision tasks such as many of those faced in
healthcare. These decision strategies influence what information
the patient (decision maker) wants to see and how he or
she  views  that  information.  Decision  makers  frequently  use  either 
compensatory  or  noncompensatory  strategies  to  evaluate  al¬ 
ternatives  and  make  a  decision.  Compensatory  strategies  are 
those  in  which  the  information  about  every  alternative  is  weighed 
and compared (Johnson, M. M. S., 1990). In contrast, in noncompensatory
strategies, alternatives may be eliminated after an
incomplete  search,  thereby  reducing  the  cognitive  load  of  the 
decision  task  (Johnson,  M.  M.  S.,  1990).  Selection  of  a  decision 
strategy  is  related  to  a  number  of  factors,  including  age  (Johnson, 
M.  M.  S.,  1990)  and  personality  type.  In  addition  to  the  charac¬ 
teristics  of  the  individual,  selection  of  a  decision  strategy  is  also 
influenced  by  decision-task  characteristics  such  as  the  complex¬ 
ity  of  the  decision  (Payne,  1976)  and  the  type  of  decision  aids 
available  (Todd  &  Benbasat,  2001). 
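
The difference between the two families of strategies can be sketched in a few lines of Python. This is an illustrative example only: the health plans, ratings, weights, and cutoffs are invented, and weighted-additive scoring and cutoff-based elimination are used here as common textbook instances of compensatory and noncompensatory strategies, not as the specific models studied in the work cited above.

```python
def weighted_additive(alternatives, weights):
    """Compensatory: weigh every attribute of every alternative and pick the best total."""
    def score(attrs):
        return sum(weights[name] * attrs[name] for name in weights)
    return max(alternatives, key=lambda name: score(alternatives[name]))


def eliminate_by_cutoffs(alternatives, cutoffs):
    """Noncompensatory: drop an alternative as soon as one attribute misses its cutoff."""
    survivors = []
    for name, attrs in alternatives.items():
        # all() stops at the first failed cutoff, so the search is incomplete.
        if all(attrs[attr] >= minimum for attr, minimum in cutoffs.items()):
            survivors.append(name)
    return survivors


# Invented ratings on a 1-5 scale for three hypothetical health plans.
plans = {
    "plan_a": {"quality": 5, "affordability": 2, "access": 4},
    "plan_b": {"quality": 3, "affordability": 4, "access": 3},
    "plan_c": {"quality": 4, "affordability": 3, "access": 2},
}
weights = {"quality": 0.5, "affordability": 0.3, "access": 0.2}
cutoffs = {"affordability": 3, "access": 3}

print(weighted_additive(plans, weights))     # high quality offsets low affordability
print(eliminate_by_cutoffs(plans, cutoffs))  # only plans meeting every cutoff remain
```

With these numbers the two strategies disagree: the compensatory rule selects plan_a because its quality rating outweighs its poor affordability, while the cutoff rule removes plan_a without ever looking at quality. A patient DSS that presents information suited to only one strategy may therefore steer some users toward a choice they would not otherwise make.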

Because  of  the  differing  information  and  decision-support 
needs  of  various  user  populations,  designing  effective  decision 
support  for  a  broad  range  of  patients/healthcare  consumers  is 
challenging.  Therefore,  it  is  imperative  that  Patient  DSS  de¬ 
signers  model  the  needs  of  their  target  user  population  and  de¬ 
sign  systems  to  meet  those  needs.  For  example,  certain  patient 
populations  may  suffer  from  limited  physical  or  cognitive  abili¬ 
ties,  making  it  vital  to  create  accessible  designs  that  meet  their 
special  needs. 


EVALUATING  COMPUTERIZED 
HEALTHCARE  APPLICATIONS 


As web-based healthcare applications available to consumers and
patients have proliferated, the issue of
evaluating applications and guiding users in choosing the best
applications  has  become  extremely  important.  While  evaluation 
is  obviously  also  important  for  applications  targeted  at  health¬ 
care  professionals,  the  issue  is  not  as  critical  because  healthcare 
professionals  are  experts  and  can  exercise  their  own  judgment  in 
the  suitability  and  quality  of  applications  designed  to  assist  them 
in  their  work.  Consumers  and  patients,  on  the  other  hand,  have 
no  or  limited  basis  for  exercising  such  judgment. 

Concerned that the growth of the Internet was leading to
too much health information, with vast chunks of it incomplete,
misleading, or inaccurate, Silberg, Lundberg, and Musacchio
(1997) proposed four standards for websites:

1.  Authorship:  authors  and  contributors,  their  affiliations,  and 
relevant  credentials  should  be  provided. 

2.  Attribution:  references  and  sources  for  all  content  should  be 
listed  clearly,  and  all  relevant  copyright  information  noted. 

3.  Disclosure:  website  “ownership”  should  be  prominently 
and  fully  disclosed,  as  should  any  sponsorship,  advertising, 
underwriting,  commercial  funding  arrangements  or  sup¬ 
port,  or  potential  conflicts  of  interest.  This  includes  arrange¬ 
ments  in  which  links  to  other  sites  are  posted  because  of 
financial  considerations.  Similar  standards  should  hold  in 
discussion  forums. 

4.  Currency:  dates  that  content  was  posted  and  updated  should 
be  indicated. 
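
The four standards lend themselves to a simple checklist. The Python sketch below is one hypothetical way a reviewer might record which standards a site satisfies; the record format and the function name missing_standards are assumptions made for illustration, not part of Silberg et al.'s proposal.

```python
# The four standards named by Silberg et al. (1997), used as checklist keys.
SILBERG_STANDARDS = ("authorship", "attribution", "disclosure", "currency")


def missing_standards(site):
    """Return the standards for which the site record holds no evidence."""
    return [standard for standard in SILBERG_STANDARDS if not site.get(standard)]


# Hypothetical reviewer notes for one website; empty or absent means not found.
site = {
    "authorship": "Jane Doe, MD, Example University",
    "attribution": ["reference list at the end of each page"],
    "disclosure": "",
    "currency": None,
}

print(missing_standards(site))  # standards the reviewer could not confirm
```

Such a checklist captures only whether the information is disclosed, not whether the content itself is accurate, a limitation taken up later in this section.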

Murray  and  Rizzolo  (1997)  pointed  out  that: 

somehow,  just  the  fact  that  information  is  traveling  quickly  through 
space  and  being  presented  on  the  computer  screen  lends  it  an  air  of 
authority  which  may  be  beyond  its  due.  Sites  with  official-sounding 
names  can  dupe  the  inexperienced  or  uncritical  into  unquestioned  ac¬ 
ceptance  of  the  content. 

In addition to the standards identified by Silberg et al.
(1997), Murray and Rizzolo (1997) and others have cited other
criteria for evaluating websites. These include the authority of
the  author/creator,  the  accuracy  of  information  and  compara¬ 
bility  with  related  sources,  the  workability  (user  friendliness, 
connectivity,  search  access),  the  purpose  of  the  resource  and 
the  nature  of  the  intended  users,  whether  criteria  for  infor¬ 
mation  inclusion  are  stated,  the  scope  and  comprehensiveness 
of  the  materials,  and  the  uniqueness  of  the  resource. 

Jadad  and  Gagliardi  (1998)  identified  a  number  of  instru¬ 
ments  used  to  provide  external  ratings  of  websites.  These  rat¬ 
ings  are  used  to  produce  awards  or  quality  ratings,  provide  seals 
of  approval,  identify  sites  that  are  featured  as  the  “best  of  the 
web”  or  “best”  in  a  given  category,  and/or  to  declare  sites  as 
meeting  quality  standards.  They  attempted  to  determine  what 
criteria  were  used  to  establish  these  ratings  and  to  establish  the 
degree  of  validation  of  these  rating  instruments.  However,  few 
organizations  listed  the  criteria  behind  their  ratings  and  none 
provided  information  on  interobserver  validity  or  construct 
validity.  Jadad  and  Gagliardi  (1998)  also  discussed: 

•  whether  it  is  desirable  to  evaluate  information  on  the  In¬ 
ternet  due  to  concerns  over  freedom  of  expression,  exces¬ 
sive  regulatory  control,  etc. 


•  whether  it  is  possible  to  evaluate  information  on  the  Inter¬ 
net  due  to  the  lack  of  a  gold  standard  for  quality  information 
and  the  controversy  surrounding  its  definition. 

They also pointed out that information on the Internet is “different
from that found in journals — information is produced
and  exchanged  by  groups  of  people  (i.e.,  health  professionals, 
consumers,  vendors,  etc.)  using  multiple  formats  (i.e.,  text, 
video,  sound)  modified  at  fast  and  unpredictable  rate,  and 
linked  within  a  highly  elaborate  and  complex  network  of  Inter¬ 
net  sites.”  They  concluded  that  with  respect  to  external  inde¬ 
pendent  evaluation  of  sites  it  is  not  clear: 

•  whether  evaluation  instruments  should  exist  in  the  first  place, 

•  whether  they  measure  what  they  claim  to  measure, 

• whether they lead to more good than harm,

• whether users may ever notice, or if they notice, whether
they will ignore evidence in support of or against the desirability,
feasibility, or benefits of formal evaluations of health information
on the Internet.

Robinson  et  al.  (1998)  focused  more  on  internal  evaluation 
of  applications  such  as  self-evaluation  by  the  sponsors  or  devel¬ 
opers  of  interactive  health-communication  applications.  They 
pointed  out  a  number  of  barriers  to  evaluating  these  applications. 
These include the fact that the media and infrastructure underlying
applications are in a dynamic state; applications themselves
change  frequently;  many  applications  are  used  in  situations 
where  a  variety  of  influences  on  health  outcomes  exist,  few  of 
which  are  subject  to  easy  assessment  or  experimental  controls; 
developers  lack  familiarity  with  evaluation  methods  and  tools; 
and  developers  often  believe  that  evaluation  will  delay  develop¬ 
ment,  increase  front-end  costs,  and  have  limited  impact  on  sales. 

Patrick  and  Koss  (1995)  suggested  that  the  effectiveness  of  con¬ 
sumer  health  information  should  be  measured  by  how  rapidly  and 
completely  desired  messages  are  communicated  and  how  com¬ 
pletely  intended  changes  in  behavior  occur.  Saying  that  Silberg 
et  al.’s  criteria  (explicit  authorship  and  sponsorship,  attribution 
of  sources,  and  dating  of  materials)  are  not  enough,  Wyatt  (1997) 
provided  far  more  specific  direction  for  evaluating  websites.  He 
believes  that  evaluation  of  websites  should  go  beyond  mere  ac¬ 
countability  to  assess  the  quality  of  their  content,  functions,  and 
likely  impact.  Evaluating  the  content  should  include: 

• determining the accuracy of web material by comparing it to
the best evidence (i.e., randomized trials for effectiveness of
treatment, cohort studies for risk factors, or blinded comparisons
of a test with a standard for diagnostic accuracy).

• determining timeliness by checking the date on web pages
while recognizing that material may not have been current at
that time, so an independent comparison with the most up-to-date
facts is preferable.

• determining whether people can read and understand web material.
Asking visitors to record satisfaction is unlikely to reveal problems
with comprehension, since visitors may not realize they
misunderstood something or may blame themselves. A minimum
reading age is therefore needed for material intended for the public,
although it is more accurate to ask users questions based on the
web content.
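
A reading-level check of the kind mentioned in the last item can be approximated automatically. The Python sketch below uses the Flesch-Kincaid grade-level formula as a stand-in; this is a substitution made for illustration, since Wyatt does not name a particular readability measure. The syllable counter is a crude vowel-group heuristic, so the result should be treated as a rough estimate.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Estimate the U.S. school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


plain = "The cat sat. The dog ran."
dense = "Pharmacological interventions necessitate comprehensive evaluation."

print(round(flesch_kincaid_grade(plain), 2))
print(round(flesch_kincaid_grade(dense), 2))
```

As the text above cautions, a formula of this kind measures surface features only; asking users questions based on the content remains the more accurate test of comprehension.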

Evaluating the functions of websites should include determining
how easy it is to locate a site, how easy it is to locate material
within the site, and whether the site is actually used and
by whom. For those investing resources in a website, Wyatt
suggested evaluating its impact on clinical processes and
patient outcomes, and its cost effectiveness
compared to other methods of delivering the same
information. He recommended using randomized controlled trials
as  the  most  appropriate  method  for  determining  impact.  Finally, 
with  respect  to  evaluation  methodology,  he  pointed  out  the  im¬ 
portance  of  choosing  appropriate  subjects  (not  technology  en¬ 
thusiasts)  and  the  need  to  make  reliable  and  valid  measure¬ 
ments.  Wyatt  (1997)  believed  that: 

Ideally,  investigators  would  have  access  to  a  library  of  previously  vali¬ 
dated  measurement  methods,  such  as  those  used  for  quality  of  life. 
However,  few  methods  are  available  for  testing  the  effect  of  informa¬ 
tion  resources  on  doctors  and  patients,  so  investigators  must  usually 
develop their own and conduct studies to explore their validity and
reliability. (p. 1880)

The  Science  Panel  on  Interactive  Communication  and  Health 
(SciPICH)  was  convened  by  the  Office  of  Disease  Prevention 
and  Health  Promotion  of  the  U.S.  Department  of  Health  and 
Human  Services  to  examine  interactive  health  communication 
technology  and  its  potential  impact  on  the  health  of  the  public. 
The  panel  was  comprised  of  14  experts  from  a  variety  of  disci¬ 
plines  related  to  interactive  technologies  and  health,  including 
medicine, HCI, public health, communication sciences, educational
technology, and health promotion. One of the SciPICH
products is an evaluation reporting template
(Robinson et al., 1998) for developers and evaluators of interactive
health communication applications to help them report
evaluation  results  to  those  who  are  considering  purchasing  or 
using  their  applications.  The  template  has  four  main  sections: 
(a)  description  of  the  application,  (b)  formative  and  process 
evaluation,  (c)  outcome  evaluation,  and  (d)  background  of  eval¬ 
uators.  The  panel  defined  the  three  different  types  of  evaluation 
as  follows: 

Formative  evaluation.  Used  to  assess  the  nature  of  the 
problem  and  the  needs  of  the  target  audience  with  a  focus  on 
informing  and  improving  program  design  before  implementa¬ 
tion.  This  is  conducted  prior  to  or  during  early  application  de¬ 
velopment,  and  commonly  consists  of  literature  reviews,  re¬ 
views  of  existing  applications,  and  interviews  or  focus  groups 
of  “experts”  or  members  of  the  target  audience. 

Process  evaluation.  Used  to  monitor  the  administra¬ 
tive,  organizational,  or  other  operational  characteristics  of  an 
intervention.  This  helps  developers  successfully  translate  the 
design  into  a  functional  application  and  is  performed  during 
application  development.  This  commonly  includes  testing  the 
application  for  functionality  and  may  be  known  as  alpha  and 
beta  testing. 

Outcome  evaluation.  Used  to  examine  an  intervention’s 
ability  to  achieve  its  intended  results  under  ideal  conditions 
(i.e., efficacy) or under real world circumstances (i.e., effectiveness),
and its ability to produce benefits in relation to its costs
(i.e.,  efficiency  or  cost  effectiveness).  This  helps  developers 
learn  whether  the  application  is  successful  at  achieving  its  goals 
and  objectives  and  is  performed  after  the  implementation  of  the 
application. 

Evaluating  the  effectiveness  of  web-based  applications  de¬ 
signed  to  relay  information  and/or  enable  informed  decision 
making  is  complicated  because  the  “success”  of  these  particu¬ 
lar  types  of  applications  is  (a)  not  necessarily  always  related  to 
observable  behaviors;  (b)  based  on  the  quality  and  usability  of 
the  information  within  the  application;  and  (c)  a  function  of 
the application itself as well as the users. In terms of outcome
evaluation, Robinson et al. (1998) gave examples of
the types of questions that such evaluation should address:

1.  How  much  do  users  like  the  application? 

2. How helpful/useful do users find the application?

3.  Do  users  increase  their  knowledge? 

4.  Do  users  change  their  beliefs  or  attitudes  (i.e.,  self-efficacy, 
perceived  importance,  intentions  to  change  behavior,  and 
satisfaction)? 

5.  Do  users  change  their  behaviors  (i.e.,  risk  factor  behav¬ 
iors,  interpersonal  interactions,  compliance,  and  use  of 
resources)? 

6.  Are  there  changes  in  morbidity  or  mortality  (i.e.,  symptoms, 
missed  days  of  school/work,  physiologic  indicators)? 

7.  Are  there  effects  on  cost/resource  utilization  (i.e.,  cost- 
effectiveness  analysis)? 

8. Do organizations or systems change (i.e., resource utilization
and effects on “culture”)?

However,  for  websites  designed  primarily  to  provide  infor¬ 
mation  or  enable  informed  decision  making,  only  the  first  three 
questions  apply.  Other  potential  outcomes  related  to  change  in 
behavior  might  be  applicable  depending  on  the  nature  of  deci¬ 
sions  made.  Consequently,  the  evaluation  of  these  types  of  web 
guides  needs  to  focus  on  the  use  of  the  guides  (assuming  that 
users  who  “like”  an  application  will  use  it  more  than  those  who 
do  not),  the  usefulness  of  the  guides,  the  usability  of  the  guides, 
the ability of the guide to increase knowledge, and the contribution
of the guide to decision making. These elements can be
integrated into a conceptual framework as shown in Fig. 33.1.
Part of this framework draws from the Agency for Health Care
Policy and Research report on consumer health informatics
and patient decision making (1997) as well as Sainfort and
Booske (1996).

The  framework  suggests  that  to  evaluate  the  effectiveness 
of  web-based  health  applications,  at  least  three  perspectives  (at 
the  bottom  of  the  figure)  can  be  taken  individually  or  in  combi¬ 
nation:  (a)  the  consumer,  (b)  the  site  sponsor,  and  (c)  outside 
experts.  The  framework  posits  that  the  characteristics  of  the  sys¬ 
tem  under  evaluation  primarily  influence  accessibility  and  us¬ 
ability  of  information.  Then,  accessibility,  in  conjunction  with 
consumer/patient  characteristics,  influences  actual  access  to  information.
In  turn,  the  usability  of  this  information,  again  in
conjunction  with  consumer  characteristics,  will  influence  actual 
use  of  information  by  the  consumer.  Use  of  information  is  a 
complex  construct.  The  framework  emphasizes  three  main  uses 
of  information:  (a)  knowledge  (inquiry,  verify,  learn,  augment,
etc.),  (b)  decision  making,  and  (c)  behavior  (intentions  and  actual
behavior  change).

FIGURE  33.1.  Conceptual  framework.

All  three  uses  are  generally  interconnected,
with  behavior  usually  following  knowledge  and/or  decision
making  (whether  explicitly  or  implicitly).  Also  emphasized
in  the  framework  is  the  fact  that  “use”  can  be  generic  (common 
to  most  web-based  health  applications)  or  specific  (to  the  ap¬ 
plication)  and  also  that  use  of  information  accessed  can  be 
intended  (i.e.,  the  information  was  sought  to  accomplish  a  spe¬ 
cific  purpose  or  use)  or  emergent  (i.e.,  a  piece  of  information 
accessed  triggers  a  new  use).  Finally,  the  framework  shows,  sur¬ 
rounding  this  entire  process,  that  the  context  (the  situation, 
time,  place,  organization,  etc.)  can  influence  all  key  elements: 
accessibility,  access,  usability,  and  use.  Consumer  evaluations  are 
formed  because  of  the  experiences  they  have  using  the  system 
over  time.  Expert  evaluation  usually  involves  assessment  of  the 
system  itself  and  its  content,  as  well  as  an  assessment  of  its  (anticipated)
impact  on  users.  Sponsor  evaluation  involves  both
consumer  and  expert  evaluations  as  well  as  consideration  of 
organizational  resources  expended  in  the  design,  development, 
implementation,  and  operation  of  the  system. 

It  is  important  to  differentiate  information  access  and  quan¬ 
tity  from  use  and  usefulness  (Booske  &  Sainfort,  1998).  Indeed, 
while  many  studies  primarily  address  attempts  to  measure  the 
“quantity”  of  information  used,  others  have  acknowledged  the
need  to  consider  the  “quality”  of  information.  The  introduction 
of  the  web  as  a  source  of  information  has  introduced  a  number 
of  additional  considerations  in  evaluation  of  the  use  and  use¬ 
fulness  of  information.  The  most  common  measure  of  informa¬ 
tion  use  on  the  web  is  the  number  of  “hits”  or  “visits”  to  a  par¬ 
ticular  page.  Most  of  these  counts  do  not  differentiate  between 
multiple  visits  by  a  single  user  versus  multiple  users  nor  do  they 
consider  the  amount  of  time  spent  on  a  particular  page,  i.e.,
whether  the  page  is  merely  used  as  a  link  to  elsewhere  or  as  a
source  of  information  itself.
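
The limitation of raw hit counts can be illustrated with a minimal sketch. The log format, the visitor identifiers, and the 10-second threshold for treating a view as a pass-through are all assumptions made for illustration; they do not come from any particular web analytics tool.

```python
from collections import defaultdict

# Hypothetical page-view log: (visitor_id, page, seconds_on_page).
# All values are invented for illustration.
LOG = [
    ("u1", "/guide", 4),
    ("u1", "/guide", 6),
    ("u1", "/guide", 5),
    ("u2", "/guide", 180),
    ("u3", "/faq", 90),
]


def summarize(log, link_threshold=10):
    """Return hits, unique visitors, and 'read' views for each page.

    A view shorter than link_threshold seconds is treated as a
    pass-through, i.e., the page was used only as a link to elsewhere.
    The threshold is an assumed cutoff, not an established standard.
    """
    hits = defaultdict(int)
    visitors = defaultdict(set)
    reads = defaultdict(int)
    for visitor, page, seconds in log:
        hits[page] += 1
        visitors[page].add(visitor)
        if seconds >= link_threshold:
            reads[page] += 1
    return {
        page: {
            "hits": hits[page],
            "unique_visitors": len(visitors[page]),
            "reads": reads[page],
        }
        for page in hits
    }


summary = summarize(LOG)
print(summary)
```

In this invented log, the guide page records four hits, but they come from only two visitors, and only one view lasts long enough to suggest that the page was read rather than passed through.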

In  describing  three  general  approaches  to  information  sys¬ 
tems  evaluation  (i.e.,  the  system’s  output,  behavior,  and  archi¬ 
tecture),  Orman  (1983)  defined  the  quality  and  quantity  of  in¬ 
formation  as  the  relevant  variables  in  defining  an  information 
system’s  output.  However,  he  went  on  to  say  that  both
quantity  and  quality  of  information  were  “of  considerable  theo¬ 
retical  interest  but  of  little  practical  value  since  neither  can  be 
defined  or  measured  with  acceptable  precision.”  Orman  defined 
the  quality  of  information  produced  by  a  system  in  terms  of  “its 
contributions  to  the  quality  of  the  decisions  it  aids”  and  pointed
out  that  this  is  “highly  influenced  by  the  style  and  the  behavior 
of  the  information  user  and  the  state  of  the  environment.” 

This  ties  in  with  the  concept  of  the  “value”  of  information,  an 
important  part  of  traditional  decision  and  economic  analysis. 
Within  the  field  of  decision  science,  the  primary  use  of  information
is  to  help  make  decisions.  However,  an  examination  of
literature  from  other  disciplines  has  shown  that  there  are  other 
ways  to  conceptualize  the  value,  and  thus  the  effectiveness  of  in¬ 
formation  and  its  usefulness  (Orman,  1983).  Although  it  can  be 
thought  that  the  term  “information  usefulness”  could  be  an  ana¬ 
logue  for  value  of  information,  within  the  field  of  HCI,  the  con¬ 
cept  of  usefulness  includes  additional  dimensions.  The  con¬ 
cepts  of  “perceived  usefulness”  and  “usability”  of  information 
systems  are  both  receiving  more  attention  as  more  systems  are 
developed  for  novice  users.  For  example,  Davis,  Bagozzi,  and 
Warshaw  (1989)  developed  scales  to  measure  perceived  usefulness
and  ease  of  use  to  evaluate  specific  software  in  the  work
setting.  They  defined  perceived  usefulness  as  “a  prospective 
user’s  subjective  probability  that  using  a  specific  application  sys¬ 
tem  will  increase  his  or  her  job  performance  within  an  organi¬ 
zational  context.”  Ease  of  use  was  defined  as  “the  degree  to 
which  the  prospective  user  expects  the  target  system  to  be  free 
of  effort.”  Using  a  seven-point  Likert  scale,  potential  users  of 
software  responded  to  six  items  addressing  perceived  useful¬ 
ness  and  six  to  assess  perceived  ease  of  use. 
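
As a rough illustration of how such an instrument might be scored, the sketch below averages six seven-point responses per construct. Averaging the items is an assumption made here for illustration; the text above states only the number of items and the scale width, and the response values are invented.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7   # seven-point Likert scale
ITEMS_PER_CONSTRUCT = 6       # six items per construct, as described above


def score_construct(responses):
    """Average one respondent's item responses for a single construct.

    Using the mean as the construct score is an assumed convention.
    """
    if len(responses) != ITEMS_PER_CONSTRUCT:
        raise ValueError("expected exactly six item responses")
    for r in responses:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"response {r} is outside the 1-7 range")
    return mean(responses)


# Invented responses from one hypothetical respondent.
usefulness = score_construct([6, 7, 5, 6, 6, 6])
ease_of_use = score_construct([3, 4, 2, 3, 4, 2])
print(usefulness, ease_of_use)
```

A respondent with these invented answers would rate the software as quite useful but only moderately easy to use, which shows why the two constructs are measured separately.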

Of  particular  relevance  in  considering  the  usability,  and  po¬ 
tentially  the  usefulness,  of  information  is  the  manner  of  dis¬ 
play.  Although  technological  advances  continue  to  increase  the 
number  of  possible  methods  for  disseminating  information,  the 
primary  output  media  for  healthcare  information  are  still  print, 
video,  and  sound.  Print  information  is  often  categorized  as  text, 
tables,  graphs,  and  figures.  There  is  a  quite  extensive  body  of  lit¬ 
erature  comparing  the  effectiveness  of  displaying  information 
in  tables  versus  graphs  (Jarvenpaa  &  Dickson,  1988,  and 
Dahlberg,  1991  provided  comprehensive  summaries  of  this  em¬ 
pirical  work).  Many  of  these  studies  rely  on  elicited  “directed 
interpretations”  where  subjects  are  given  specific  questions 
about  the  information  in  tables  and  graphs.  In  contrast,  a  more 
recent  study  by  Carswell  and  Ramzy  (1997)  elicited  spontaneous 
interpretation  of  a  series  of  tables,  bar  graphs,  and  line  graphs 
to  find  out  “what  information  subjects  choose  to  take  away 
from  a  display  rather  than  their  ability  to  extract  information 
when  prompted.”

Within  the  context  of  decision  making,  Schkade  and  Kleinmuntz
(1994)  found  that  different  characteristics  of  information
display  affected  aspects  of  choice  processes.  They  found  that  organization
of  information  (such  as  a  matrix  versus  a  list)  influenced
information  acquisition;  form  (numeric  versus  linguistic)
influenced  information  combination  and  evaluation;  while
sequence  had  only  a  limited  effect  on  acquisition.  Johnson,  E.  J.,
Payne,  and  Bettman  (1988)  and  others  have  provided  evidence 
of  preference  reversals  because  of  different  information  displays. 

Many  of  the  issues  surrounding  the  use  of  the  Internet  are 
issues  of  usability.  Usability  is  a  central  notion  of  the  HCI  field. 
Simply  put,  Shackel  (1991)  defined  usability  as  “the  capability
to  be  used  by  humans  easily  and  effectively”  while  his
more  formal  operational  and  goal-oriented  definition  said,  “for 
a  system  to  be  usable,  the  following  must  be  achieved”: 

Effectiveness.  The  required  range  of  tasks  must  be  ac¬ 
complished  better  than  some  required  level  of  performance 
(i.e.,  in  terms  of  speed  and  errors);  by  some  required  percent¬ 
age  of  the  specified  target  range  of  users;  within  some  required 
proportion  of  the  range  of  usage  environments. 

Learnability.  Within  some  specified  time  from  commissioning
and  start  of  user  training;  based  on  some  specified
amount  of  user  training  and  support;  and  within  some  specified 
relearning  time  each  time  for  intermittent  users. 

Flexibility.  With  flexibility  allowing  adaptation  to  some
specified  percentage  variation  in  tasks  and/or  environments  be¬ 
yond  those  first  specified. 

Attitude.  And  within  acceptable  levels  of  human  cost  in 
terms  of  tiredness,  discomfort,  frustration,  and  personal  effort; 
so  that  satisfaction  causes  continued  and  enhanced  usage  of  the 
system.  (p.  48)
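
The effectiveness criterion in this definition lends itself to a simple operational check. The sketch below is one possible reading, assuming one recorded trial per user and treating the time limit, error limit, and required share of users as thresholds the evaluator chooses; the class and function names and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    user: str
    seconds: float  # time taken to complete the task
    errors: int     # errors made while completing the task


def meets_effectiveness(trials, max_seconds, max_errors, required_share):
    """Check whether enough users reached the required performance level.

    Assumes one trial per user. All three thresholds are chosen by the
    evaluator; the definition itself leaves them unspecified.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    passed = sum(
        1 for t in trials
        if t.seconds <= max_seconds and t.errors <= max_errors
    )
    return passed / len(trials) >= required_share


# Invented results for four hypothetical participants.
TRIALS = [
    Trial("p1", 42.0, 0),
    Trial("p2", 55.0, 2),
    Trial("p3", 71.0, 1),  # exceeds the time limit
    Trial("p4", 38.0, 1),
]

result = meets_effectiveness(TRIALS, max_seconds=60, max_errors=2,
                             required_share=0.75)
print(result)
```

Raising the required share of users from 0.75 to 0.9 would make the same invented data fail the check, which shows how much the verdict depends on thresholds that the definition leaves to the evaluator.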

In  an  attempt  to  place  system  usability  in  relation  to  other 
system  concepts,  Shackel  (1991)  suggested  that: 

utility  (will  it  do  what  is  needed  functionally?),  usability  (will  the  users 
actually  work  it  successfully?)  and  likeability  (will  the  users  feel  it  is  suitable?)
must  be  balanced  in  a  trade-off  against  cost  (what  are  the  capital
and  running  costs  and  what  are  the  social  and  organizational  conse¬ 
quences?)  to  arrive  at  a  decision  about  acceptability  (on  balance  the 
best  possible  alternative  for  purchase).  (p.  50)

These  concepts  are  all  relevant  in  a  decision  to  purchase  or  accept
an  information  system,  but  may  not  all  be  relevant  when  the
resource  or  product  under  consideration  is  information.  As  Or¬ 
man  (1983)  pointed  out,  the  value  of  an  information  system  is  dif¬ 
ferent  from  the  value  of  its  information  content,  “just  as  the  value 
of  a  candy  machine  is  different  from  the  value  of  the  candy  it  dis¬ 
penses.”  Furthermore,  usability  of  information  is  a  necessary  but 
not  sufficient  criterion  for  information  to  be  useful  (p.  312).

In  the  context  of  informing  healthcare  consumer  decisions, 
Hibbard,  Slovic,  and  Jewett  (1997)  asked  how  much  information
is  too  much.  In  response  to  this  question,  they  determined
that  the  critical  element  is  the  ability  to  interpret  and
integrate  information  items:  integration  is  a  very  difficult  cognitive
process.  They  suggest  that  for  a  consumer,  more  information
is  not  necessarily  better  and  that  the  simple  provision
of  information  is  not  sufficient  when  the  information  is  complex.  In  complex
decision  situations,  it  is  important  to  pay  particular  attention
to  human  information-processing  capabilities  and  differences
across  individuals.  For  example,  specifically  addressing  information
presentation  format,  Togo  and  Hood  (1992)  found  a
significant  interaction  effect  between  gender  and  format.  Per¬ 
sonality  differences  have  also  been  found  to  contribute  to  vari¬ 
ation  in  information  processing.  The  way  people  gather  and 
evaluate  information  is  at  least  in  part  based  on  their  psycho¬ 
logical  type  (Slocum  &  Hellriegel,  1983).  Some  people  are  dri¬ 
ven  to  know  details  before  decisions  are  made  while  others 
feel  more  comfortable  assuming  what  is  not  known.  Individu¬ 
als  vary  on  how  they  receive  messages,  seek  information,  or¬ 
ganize  information,  and  process  information.  One  approach  to 
categorizing  cognitive  style,  the  Myers-Briggs  Type  Indicator 
(MBTI),  is  based  on  Jung’s  typology  (Myers,  1987). 

Other  models  of  learning  styles  can  be  used  to  differentiate 
among  individuals  and  their  preferred  methods  or  strategies  for 
taking  in  and  processing  information.  Felder  (1996)  provided 
a  useful  summary  of  these  models.  For  example,  Kolb’s  learn¬ 
ing-style  model  classified  people  as  having  a  preference  for 
(a)  concrete  experience  or  abstract  conceptualization  (how 
they  take  information  in)  and  (b)  active  experimentation  or 
reflective  observation  (how  they  internalize  information).  The 
Herrmann  Brain  Dominance  Instrument  (HBDI)  classified  people
in  terms  of  their  relative  preferences  for  thinking  that  are
based  on  the  task-specialized  functioning  of  the  physical  brain, 
i.e.,  left  brain  versus  right  brain,  cerebral  versus  limbic.  The 
Felder-Silverman  Learning  Style  model  classified  learners  along 
five  dimensions:  (a)  sensing  or  intuitive,  (b)  visual  or  verbal, 
(c)  inductive  or  deductive,  (d)  active  or  reflective,  and  (e)  se¬ 
quential  or  global. 

In  addition  to  individual  differences,  other  factors  that  can 
affect  information  processing  reflect  characteristics  of  the  in¬ 
formation  itself  (John  &  Cole,  1986),  i.e.,  information  quantity, 
information  source,  information  format  (mode  of  presentation, 
organization,  order),  information  complexity,  nature  of  access 
(i.e.,  voluntary  vs.  mandatory),  instructions  in  the  use  of  the  in¬ 
formation,  response  formats  (i.e.,  recognition,  recall,  judgment, 
choice),  as  well  as  the  interface  itself.

Understanding  cognitive  processes  involved  in  accessing, 
processing,  interpreting,  and  using  healthcare  information  is 
critical  to  the  successful  design  and  implementation  of  web- 
based  healthcare  applications.  This  particular  point  extends  to 
applications  targeted  at  providers  and  other  healthcare  profes¬ 
sionals.  In  looking  at  the  impact  of  computer-based  patient 
record  systems  on  data  collection,  knowledge  organization,  and 
reasoning,  Patel,  Kushniruk,  Yang,  and  Yale  (2000)  indicated  that 
exposure  to  computer-based  patient  records  was  associated 
with  changes  in  physicians’  information  gathering  and  reason¬ 
ing  strategies.  They  concluded  that  such  technology  could  have 
profound  influence  in  shaping  cognitive  behavior.  In  such  sys¬ 
tems,  the  HCI  itself  can  have  a  strong  influence  on  information 
gathering,  processing,  and  reasoning  strategies.  Recently,  Patel, 
Arocha,  and  Kaufman  (2001)  surveyed  literature  on  aspects  of 
medical  cognition  and  suggested  that: 

cognitive  sciences  can  provide  important  insights  into  the  nature  of  the 
processes  involved  in  human-computer  interaction  and  help  improve 
the  design  of  medical  information  systems  by  providing  insight  into  the 
roles  that  knowledge,  memory,  and  strategies  play  in  a  variety  of  cogni¬ 
tive  activities  (p.  324). 


FUTURE  OPPORTUNITIES  AND  CHALLENGES 


Information  and  communication  technologies  have  only  begun 
to  impact  the  healthcare  industry.  Regarding  the  use  of  the  In¬ 
ternet,  a  number  of  applications  will  be  at  the  leading  edge. 
Mittman  and  Cain’s  1999  prediction  that  these  applications 
would  include  consumer  health-information  services,  online 
support  groups  for  patients  and  caregivers,  healthcare-provider- 
information  services,  provider-patient  e-mail,  communications 
infrastructure  and  transaction  services,  and  EMRs  appears  to
have  held  true.  Similarly,  the  barriers  they  predicted  would  impede  or
slow  down  the  development  of  the  Internet  in  healthcare  have 
largely  proven  true  as  well:  security  concerns,  weaknesses  in¬ 
herent  to  web  interfaces  (especially  browsers,  search-engine 
technology,  and  inability  to  interact  with  legacy  systems),  mixed
or  lacking  quality  of  information,  physician  ambivalence,  disarray
of  current  healthcare-information  systems,  lack  of  resources  for 
web  development,  and  lack  of  unified  standards  for  electronic 
communications  and  transactions. 

For  information  technology  to  fully  impact  healthcare  delivery
with  a  focus  on  both  patients  and  providers,  these  barriers
need  to  be  overcome.  We  believe  that  the  greatest  benefits
will  be  reached  in  the  short  term  by  focusing  on  the  following 
five  areas. 


Supporting/Enhancing  the  Patient-Provider  Interaction 

The  Internet  will  become  a  critical  element  of  the  physician- 
patient  encounter.  Organizational  websites  will  evolve  from 
publishing  generic  consumer  content  to  providing  personal¬ 
ized,  online  services  to  all  consumers  (individuals,  patients, 
providers,  etc.).  The  Internet  needs  to  provide  for  technology 
that  truly  supports  two-way  interaction  between  patients  and 
providers  and  directly  impacts  care  delivery. 

Effective  technologies  need  to  be  developed  to  fully  support, 
enhance,  and  extend  the  patient-physician  interaction  so  as  to 
increase  the  efficiency  of  care  delivery,  increase  the  quality  of 
care  received  by  patients,  as  well  as  increase  the  effectiveness 
of  the  work  performed  by  the  physician.  This  latter  point  is  important
since  a  recent  study  by  Linzer,  Konrad,  Douglas,  McMurray,
Pathman,  Williams,  Schwartz,  Gerrity,  Scheckler,  Bigby,  and
Rhodes  (2000)  showed  that  time  stress,  defined  as  reports  by
the  physician  that  they  needed  more  time  for  patients  than  they 
were  allotted,  was  significantly  related  to  burnout,  job  dissatis¬ 
faction,  and  patient-care  issues.  This  study  was  performed  with 
a  national  sample  (n  =  5704)  of  physicians  in  primary  care  specialties
and  medical  and  pediatric  subspecialties.  It  demonstrates
that  the  ability  to  spend  sufficient  and  quality  time  with
each  patient  is  critical  to  ensure  long-term  job  satisfaction,  avoid 
physician  burnout,  and  increase  quality  of  patient  care. 

In  developing  such  technologies,  as  we  mentioned  earlier, 
one  would  need  a  full  understanding  of  the  various  cognitive 
processes  involved  in  information  gathering,  knowledge  acqui¬ 
sition  and  organization,  reasoning  strategies,  and  decision  mak¬ 
ing.  Of  particular  importance  is  the  recognition  of  various  users 
with  various  characteristics  potentially  using  technologies  in  a 
variety  of  situations,  contexts,  organizations,  and  environments. 


Supporting/Enhancing  Collaborative
Work  among  Providers

The  nature  of  the  physician/associate  provider  (nurse,  physician 
assistant)  relationship  is  evolving  from  mere  interaction  to  true 
collaboration  with  technologies  allowing  associate  providers  to 
increase  involvement  in  clinical  decision  making  and  implementation.
Technologies  to  support  collaborative  work
in  fast-paced,  mobile  environments  are  needed.

As  mentioned  earlier,  the  healthcare  work  environment  is 
very  complex  and  involves  a  variety  of  healthcare  profession¬ 
als,  all  with  varying  needs  for  information,  knowledge,  and 
support.  Designing  technologies  that  support  individual
needs  as  well  as  collaboration  among  individuals  presents  a 
number  of  challenges.  In  a  recent  study,  Jacko,  Sears,  and 
Sorensen  (2001)  showed  that  different  healthcare  profession¬ 
als  (physicians,  pharmacists,  nurses)  exhibit  very  different  pat¬ 
terns  of  use  of  the  Internet  for  clinical  purposes  and  have  very 
different  perceptions  of  needed  enhancements  to  support 
their  respective  needs.  The  patterns  differed  in  terms  of  the 
range  and  type  of  information  as  well  as  the  depth  and  speci¬ 
ficity  of  information. 

Healthcare  organizations  will  be  faced  with  technology  in¬ 
tegration  challenges  and  will  make  use  of  the  Internet  as  well 
as  intranet  technologies  to  manage  and  organize  the  variety 
of  applications  in  use  by  providers  and  their  patients.  Of  criti¬ 
cal  importance  for  healthcare  organizations  will  be  the  design 
of  systems  to  support  clinical  decisions,  knowledge  manage¬ 
ment,  organizational  learning,  administrative  transactions,  and 
supply  chain. 


Developing  and  Utilizing  New  Information 
and  Communication  Technologies 

Wireless,  handheld,  and  nanotechnologies  will  drive  higher 
adoption  rates  of  the  Internet  in  clinical  settings,  patient  moni¬ 
toring,  and  disease  management.  Both  the  number  of  inter¬ 
active  wireless  device  users  and  the  number  of  devices  will  in¬ 
crease  significantly  in  the  next  few  years.  New  applications  for 
such  technologies  need  to  be  developed,  based  on  fundamental 
research  conducted  to  investigate  usage  of  such  devices. 

Mobile  computing  and  wireless  technologies  will  increas¬ 
ingly  become  an  important  part  of  healthcare’s  information 
technology.  Turisco  and  Case  (2001)  reported  the  following 
predictions: 

•  The  number  of  wireless  Internet  users  will  reach  83  million 
by  the  end  of  2005. 

•  By  the  end  of  2004,  there  will  be  95  million  browser-enabled 
cellular  phones  and  more  than  13  million  web-enabled  per¬ 
sonal  digital  assistants. 

•  The  wireless  LAN  market  is  expected  to  reach  $1  billion  in 
2001  and  this  figure  will  double  by  2004. 

Turisco  and  Case  (2001)  reported  that  mobile  computing 
applications  for  healthcare  began  with  reference  tools  and  then 
moved  to  transaction-based  systems  to  automate  simple  clini¬ 
cal  and  business  tasks.  They  predict  that  the  next  step  will  be 
to  provide  multiple  integrated  applications  on  a  single  device.  In 
particular,  they  mention  the  following  applications:  prescription
writing,  charge  capture  and  coding,  lab  order  entry  and  results
reporting,  clinical  documentation,  alert  messaging  and  communication,
clinical-decision  support,  medication  administration,
and  inpatient-care  solutions.


Designing  and  Utilizing  Adaptive  HCIs

The  development  and  integration  of  information  and  communi¬ 
cation  technologies  designed  to  support,  facilitate,  and  enhance 
patient/provider  interactions  are  critical.  In  particular,  research  is 
needed  to  design  optimal  HCIs.  A  dual  focus  on  (a)  increasing 
job  satisfaction  and  effectiveness  for  the  medical  personnel  and 
(b)  increasing  quality  of  care  as  well  as  safety  for  patients  is 
needed.  To  support  these  objectives,  the  HCI  must  possess  at 
least  four  qualities: 

•  multimodal,  i.e.,  have  the  ability  to  display  and  accept  infor¬ 
mation  in  a  combination  of  visual,  aural,  and  haptic  modes, 

•  personalized,  i.e.,  tailored  to  respond  in  a  manner  best  suited 
to  the  current  user  and  his  or  her  needs, 

•  multisensor,  i.e.,  have  the  ability  to  detect  and  transmit 
changes  in  the  user  and/or  situation, 

•  adaptive,  i.e.,  have  the  ability  to  change  its  behavior  in  real 
time  to  accommodate  user  preferences,  user  disabilities,  and 
changes  in  the  situation/environment. 
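
A minimal sketch can make the adaptive quality concrete. It assumes a deliberately simplified model in which the interface picks output modes from a user profile and one sensed property of the situation; the class names, fields, and the 70 dB noise limit are hypothetical and do not describe any existing system.

```python
from dataclasses import dataclass


@dataclass
class User:
    visual_impairment: bool = False
    hearing_impairment: bool = False


@dataclass
class Situation:
    ambient_noise_db: float = 40.0  # hypothetical sensor reading


def choose_output_modes(user, situation, noise_limit_db=70.0):
    """Select output modes suited to the current user and situation.

    Starts from all three modes (multimodal) and drops those that the
    user profile (personalized) or the sensed situation (multisensor)
    makes unsuitable. The noise limit is an assumed cutoff.
    """
    modes = {"visual", "aural", "haptic"}
    if user.visual_impairment:
        modes.discard("visual")
    if user.hearing_impairment or situation.ambient_noise_db > noise_limit_db:
        modes.discard("aural")
    return sorted(modes)


default_modes = choose_output_modes(User(), Situation())
noisy_ward = choose_output_modes(User(), Situation(ambient_noise_db=85.0))
print(default_modes, noisy_ward)
```

Calling the function again whenever the sensed noise level or the active user changes would approximate the real-time adaptation described in the last item of the list, although a working system would need far richer models of users, tasks, and environments.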

Optimal  interfaces  are  critical  to  virtually  all  applications 
connecting  people  to  information  technologies  and  people  to 
people  via  information  and  communication  technologies.  Cur¬ 
rent  research  is  underway  to  develop  intelligent  adaptive  multi¬ 
modal  interface  systems.  In  the  future,  interfaces  will  automati¬ 
cally  adapt  themselves  to  the  user  (capabilities,  disabilities,  etc.), 
task,  dialogue,  environment,  and  input/output  modes  in  order 
to  maximize  the  effectiveness  of  the  HCI. 

This  is  especially  critical  in  healthcare  for  both  types  of  users: 
consumers/patients  and  providers.  Consumers/patients  present 
a  number  of  challenges  regarding  the  interface.  In  addition  to 
presenting  different  personal  and  sociodemographic  character¬ 
istics,  users  in  healthcare  present  varying  degrees  of  health  sta¬ 
tus:  healthy  consumers,  newly  diagnosed  patients,  chronically  ill 
patients,  and/or  their  caregivers  will  use  the  same  devices  in 
potentially  very  different  ways.  Similarly,  different  providers  will 
have  very  different  characteristics  and  will  perform  a  multitude 
of  varying  and  highly  dynamic  tasks  in  different  contexts  and 
environments. 


Moving  to  e-Health 

The  opportunities  for  improving  service  and  decreasing  cost  via 
e-commerce  technologies  and  the  supply  chain  are  tremen¬ 
dous.  In  addition,  modern  information  and  communication 
technologies  (including  sensors,  wireless  communication,  and 
implant  technologies)  will  enable  electronic  delivery  of  health 
care.  This  is  far  more  comprehensive  than  the  mere  electronic 
delivery  of  health  information  to  patients  and  providers  and 
includes  new  developments  such  as  telemedicine  and  virtual 
reality.  Krapichler,  Haubner,  Losch,  Schuhmann,  Seeman,  and 
Englmeier  (1999)  claimed  that  “virtual  environments  are  likely
to  be  used  in  the  daily  clinical  routine  in  [the]  medicine  of  to¬ 
morrow”  (p.  448).  However,  a  number  of  barriers  will  need  to 
be  overcome.  Organizational  barriers  to  e-Health  include  infrastructure,
organization,  culture  and  strategy,  systems  integration,
and  workflow  integration.  Technological  barriers  include
integration  and  security,  interface  design,  connectivity,  speed, 
reliability,  and  usability  issues. 

The  evolution  toward  e-Health  will  involve  a  number  of  ma¬ 
jor  changes.  For  example,  organizational  websites  will  evolve  from 
publishing  generic  consumer  content  to  providing  personalized, 
online  interactive  services  to  profitable  patient  segments.  Delivery
organizations  will  focus  on  direct-to-patient  relationship
building,  migrating  to  a  health  system  truly  centered  on  patients 
and  communities.  Communities  of  interest  will  rapidly  expand 
to  become  a  force  in  healthcare  navigation.  Wireless  and  hand¬ 
held  technologies  will  drive  higher  adoption  rates  of  the  Inter¬ 
net  in  clinical  settings.  The  Internet  will  become  a  critical  element 
of  the  physician-patient  encounter  and  will  support  and  enhance 
the  patient-provider  interaction.  These  changes  will  lead  to  a  restructuring
of  the  healthcare  industry.  Digital  health  plans  will
emerge  and  compete  with  or  threaten  the  traditional  healthcare-payer
business  model.  Virtual  networks  will  begin  to  emerge  around 
specialty  services  to  provide  efficient  personalized  healthcare. 


WITHOUT  EMOTION  THERE  IS  NO  GAME


Shakespeare  designed  the  emotional  space  between  characters;  game 
developers  design  the  emotional  space  between  players  and  game. 

Researchers  have  only  just  begun  to  explore  the  role  emotion 
plays  in  human  activities.  In  designing  emotional  responses, 
most  aesthetic  disciplines  treat  audiences  as  consumers  of  con¬ 
tent  and  pay  little  attention  to  designing  emotions  from  inter¬ 
action  or  contribution.  Whether  it  is  a  movie,  a  bottle  of  per¬ 
fume,  or  a  website  advertisement,  a  broadcast  model  is  used  to 
elicit  emotions  from  observers  rather  than  participants.  Scien¬ 
tific  research  is  now  beginning  to  show  how  emotion  influences 
cognition  and  behavior  offering  new  opportunities  to  solicit 
emotion  through  action.  Emotion  emerges  from  and  plays  a 
part  in  most  activities,  from  following  a  goal  to  just  goofing 
around  (Damasio,  1994;  Ekman,  2003;  Norman,  2004;  Lazzaro, 
2004b).  Video  games  lead  the  way  as  interactive  products  that 
create  emotion.  More  emotional  than  software  and  more  inter¬ 
active  than  films,  games  manipulate  player  affect  to  create 
poignant  experiences.  How  they  do  this  provides  lessons  for  the 
design  of  games,  entertainment  interfaces,  and  other  interactive 
products. 

Emotion  is  essential  to  maintain  player  focus,  make  deci¬ 
sions,  perform,  learn,  and  enjoy  the  process  of  play.  Emotion- 
rich  stimuli  grab  players’  attention,  such  as  a  swashbuckling 
adventure  in  Sid  Meier’s  Pirates!  The  emotions  surrounding
swordplay  increase  players’  immersion,  and  negative  affect  or
mindset  focuses  players  on  applying  effort  to  overcome  obsta¬ 
cles.  Meanwhile,  positive  affect  from  finding  pirate  treasure  im¬ 
proves  exploration  of  alternatives.  Exploring  options  to  get  the 
player’s  sticky  ball  up  on  top  of  a  table  in  Katamari  Damacy  is 
made  easier  by  the  positive  affect  created  as  it  squashes  and 
picks  up  cutely  rendered  bon-bons,  toys,  and  other  items. 
Strong  emotional  states  also  allow  easier  formation  and  recol¬ 
lection  of  memories,  especially  if  the  user’s  emotion  matches 
the  emotion  of  the  item  to  be  remembered  (Ekman,  2003).  Special
moves  in  Top  Spin  Tennis  that  evoke  an  emotional  response
in  an  opponent  are  easier  to  remember  and  motivate  the  search
for  more.  Games  are  innovators  in  the  design  of  emotional  re¬ 
sponses  integrated  into  the  activity  to  accelerate  it  or  provide 
friction  for  game  goals.  Each  affective  state,  each  emotion, 
carves  a  unique  signature  into  a  player’s  psychology,  physiology, 
and  behavioral  state  to  create  a  player  experience.  Hoping  that 
graphic  realism  alone  will  create  emotions  is  nearly  the  same  as 
adding  more  background  color  to  fix  a  usability  issue. 

Lessons  about  game  experiences  inform  entertainment  in¬ 
terfaces  and  other  product  experiences.  Games  are  not  just  en¬ 
tertaining;  they  are  self-motivating  activities.  User-experience 
designers  for  all  types  of  products  can  take  advantage  of  how 
games  create  these  emotions  from  participation.  To  transition 
from  interface  design  to  user  experience  design,  interfaces  need 
to  be  more  than  “transparent.”  In  games,  the  interface  that 
makes  everything  easy  robs  the  player  of  the  fun.  Pushing  a  but¬ 
ton  for  a  car  to  drive  itself  is  less  thrilling  than  winning  the 
Grand  Prix.  In  software,  the  interface  cannot  do  it  all  because  it 
does  not  have  the  user’s  knowledge.  Therefore,  not  only  must
interfaces  get  out  of  a  user’s  way,  they  should  allow  the  user  to
express  him  or  herself  by  motivating  and  supporting  the  cogni¬ 
tive  and  behavioral  functions  required  for  use.  Designers  can 
take  inspiration  from  games  to  fashion  emotional  responses 
during  interaction;  for  instance,  computers  can  detect  and  re¬ 
spond  to  user  emotion,  such  as  “being  more  helpful”  when  soft¬ 
ware  detects  frustration  in  a  user’s  face  or  fingertips.  Like  inter¬ 
changeable  colored  lenses,  games  employ  emotion-producing 
attributes  to  support  human  performance:  rose-colored  here 
for  a  mood  boost,  yellow  for  sharper  vision  later  on,  green  night 
vision  with  a  distance  indicator  for  the  dark  corners,  and  black 
to  win  style  points  from  others.  Current  usability  methods  (in¬ 
creasing  efficiency,  effectiveness,  and  satisfaction)  mostly  re¬ 
move  frustration  points;  they  do  not  yet  include  techniques  to 
measure and craft other emotions. To exaggerate, a 100% usable
product would be boring once it eliminates all the challenge.
Customers  strap  software  onto  their  boots  like  a  crampon,  but  it 
should  not  do  the  job  so  well  that  it  climbs  the  mountain  for 
them  (Lazzaro,  2004a).  It  is  not  a  productivity  tool  if  one  button- 
click  creates  the  whole  spreadsheet.  User  experiences  should 
focus  on  making  the  process  of  the  task  not  only  easier,  but  also 
more  enjoyable. 

Forget  Usability:  Make  It  Fun 

“I don’t want to feel like the game just wasted two hours of my life.”

— A Might and Magic Player

Wasting time has a whole different meaning to a gamer. Unlike
a  spreadsheet,  the  outcome  of  a  game  is  in  the  experience  of 
play  rather  than  in  the  quality  of  the  end  result  and  so  is  harder 
to  quantify.  Traditionally  productivity  focuses  more  on  design¬ 
ing  a  process  that  creates  a  better  end-product  or  result.  Game 
design,  on  the  other  hand,  focuses  mostly  on  designing  inter¬ 
active  play  that  is  enjoyable  in  its  own  right.  Rather  than  effi¬ 
ciency,  game  enjoyment  requires  the  pure  pleasure  of  the  ex¬ 
perience  and  a  fair  degree  of  frustration  (Lazzaro,  2004a).  Similar 
to  user  experiences,  player  experiences  are  created  when  what 
happens  in  the  game  affects  the  player  internally  as  well  as  ex¬ 
ternally.  Player  experiences  are  the  combination  of  emotion, 
thoughts,  and  other  sensations  that  occur  inside  and  in  between 
players during play. Beyond usability, player-experience design
focuses on affect as well as ease of use (Fig. 34.1).


Productivity Software Goals                Game Goals
Task completion                            Entertainment
Eliminate errors                           Fun to beat obstacles
External reward                            Intrinsic reward
Outcome-based rewards                      Process is its own reward
Intuitive                                  New things to learn
Reduce workload                            Increase workload
Assumes technology needs to be humanized   Assumes humans need to be challenged

FIGURE 34.1. The goals of productivity and game experiences
have several important differences. (Source: Lazzaro & Keeker,
2004).

The interactive entertainment offered by games provides
unique  opportunities  to  create  emotions  in  the  player  and 
unique  challenges  for  the  design  professional.  A  game  must  be 
usable  enough  to  play  but  not  so  usable  as  to  allow  someone 
to  push  a  button  and  win.  Players  crave  the  illusion  of  superior 
control  that  allows  them  to  accomplish  more  than  others  in¬ 
cluding  out-thinking  the  game  designer;  but  not  so  much  con¬ 
trol  that  they  lose  their  way  and  do  not  have  a  good  time. 

Emotions  Are  for  More  than  Entertainment 

“Experience is the feeling of what happens.”

— Damasio,  1994 

Playing  games  in  their  discretionary  time,  gamers  mainly  play 
for the emotions the games create. Recent neuropsychology research
suggests that two interconnected information-processing
systems continually scan the environment to create a person’s
experience of the world. A person’s cognitive system interprets
and represents the world internally in order to reason,
understand, and interact with it. A person’s affective
system  interprets  external  and  internal  stimuli  relative  to  goals 
and  needs.  This  affective  system  kicks  in  with  an  emotional  and 
physiological  reaction  before  a  cognitive  response  is  ready. 
Ideas,  thoughts,  memories,  and  knowledge  are  components 
of  cognition;  emotions,  moods,  sentiments,  and  other  internal 
sensations comprise a person’s affective response (Damasio,
1994; Norman, 2004). From a cognitive-psychology perspective,
cognition understands the world and affect evaluates it (Norman,
2004). On a basic level, we bring items with positive affect
(sweet-tasting,  soothing  voices,  warm  to  the  touch)  closer  to 
us.  We  push  (generally  speaking)  objects  with  negative  affect 
(bitter,  bloody,  sharp,  diseased)  away  from  us.  In  the  context 
of  games,  the  study  of  affect  must  also  include  the  discussion  of 
enjoyment. 

Player  experiences  emerge  inside  the  player  from  the  pro¬ 
cess  of  interacting  with  the  game.  Player-experience  design 
crafts  these  cognitive  and  affective  responses  in  conjunction 
with  user  behavior.  Therefore,  the  design  of  player  experiences 
must  refine  not  only  the  gamer’s  cognitive  response  to  a  system, 
for  example,  by  reducing  complexity;  it  must  also  design  the 
gamer’s  affective  response,  for  example,  by  inspiring  interest 
or  rewarding  success  in  order  to  increase  engagement  and  sup¬ 
port  cognitive  tasks.  For  the  purposes  of  game  design,  affect 
supports  cognitive  as  well  as  behavioral  tasks,  because  emotion 
has  a  significant  effect  on  enjoyment,  attention,  memory,  learn¬ 
ing,  and  performance. 

Emotion  and  cognition  walk  hand  in  hand.  According  to 
Ekman,  beyond  entertainment,  emotions  are  about  goals  and 
the  things  we  care  about  (2003).  This  makes  the  creation  of 
emotions  ideally  suited  for  game  design  because  most  game- 
play  offers  goals  with  levels  and  scores  to  indicate  progress.  In 
films,  we  feel  emotions  only  if  we  somehow  identify  with  the 
characters  on  the  screen  and  vicariously  feel  their  emotions 
(Boorstin,  1990).  This  also  happens  in  games,  but  more  cen¬ 
tral  to  interactive  entertainment  is  when  players  feel  emotions 
from  what  they  accomplish  and  fail  at  during  the  game.  In  pro¬ 
ductivity,  feedback  on  effectiveness,  such  as  creating  a  sales 
presentation, happens after the task of creating it is complete.
In games, the success-feedback loop is immediate and built
into the process of play.


REQUIREMENTS  FOR  A  PLAYER 
EXPERIENCE  FRAMEWORK 


Emotion’s  Five  Impacts  on  Play 

“Emotions  prepare  us  to  deal  with  important  events  without  us  hav¬ 
ing  to  think  about  what  to  do.  ” 

— (Ekman,  2003,  p.  20) 

Games  entertain  with  emotions  so  players  enjoy  the  ride. 

Emotions  impact  player  experiences  in  five  ways.  What  play¬ 
ers  like  most  about  games  is  not  the  packaging,  graphics,  or  the 
AI,  but  the  total  experience  that  the  game  creates  for  the  player. 
This  experience  lies  in  the  cognitive,  affective,  and  behavioral 
changes gamers create for themselves as they play. Emotion generates
a big part of the entertainment value: the opportunity for
challenge and mastery, the thrill from novelty, the ticket to
relaxation, and the opportunity to hang out with friends. Customers
buy this ride designed by game designers. These emotions create
moving gameplay and make a victory taste sweet (Fig. 34.2).

Not  only  does  the  act  of  play  produce  emotions,  but  also 
emotions  and  affect  substantially  influence  the  player  and  how 
he  or  she  plays.  Unlike  user  experiences,  the  primary  aim  of 
player  experiences  is  to  move  the  player  emotionally  along  with 
or  counter  to  the  game  goal.  Such  techniques  heighten  the 
emotional  response  in  the  player.  Emotions  entertain  players, 
focus  their  attention,  help  them  decide,  aid  their  performance, 
and  assist  and  motivate  learning.  Enjoyable  emotions  from 
gameplay  increase  motivation  to  play  further  in  order  to  expe¬ 
rience  more  emotions,  which  makes  games  self-motivational. 
Emotions  are  there  from  the  first  click  to  final  volley. 


FIGURE 34.2. Game interaction involves and creates many
emotions that are clearly visible on the face during play, as seen
here during the tutorial for a popular action game. Copyright ©
2004 XEODesign, Inc. All rights reserved.

Emotion During Play Helps Gamers

1.  Enjoy:  Creates  entertainment  from  strong  shifts  in  internal 
sensations 

2.  Focus:  Directs  effort  and  attention 

3.  Decide:  Aids  decision-making 

4.  Perform:  Supports  different  approaches  to  action  and 
execution 

5. Learn: Provides motivation for learning, aids memory, and
rewards progress

Games heighten emotional responses to increase enjoyment.
Mario Kart, for example, adjusts the speed of players to keep them
together for “close” races; likewise, Jak and Daxter keeps players
just ahead of rolling boulders. A choice between certain
death and escape via a narrow window ledge in Tom Clancy’s
Splinter Cell focuses and fills attention in a way that scrolling
through options in a word processor’s drop-down menu or walking
down an office corridor of options does not. Strong, exaggerated
affect aids player decisions, whether it is to attack a goblin
in World of Warcraft (WoW) or to grant a Sim bathroom privileges
in The Sims. Gameplay creates moods that help players perform:
negative affect during a firefight in Battlefield improves
the pursuit of a narrow course of alternatives such as stopping a
sniper, while positive affect from munching letter sounds in
Bookworm improves identification of new word patterns in this
Scrabble-like tile game. A player learns to keep diners happy in
Diner Dash when an angry customer empties the tip jar.
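
The speed adjustment attributed to Mario Kart above is often called "rubber-banding." The following is only a minimal sketch of the general idea, not the game's actual implementation: the function name, the parameters (strength, max_boost), and their default values are all hypothetical and chosen purely for illustration.

```python
def rubber_band_speed(base_speed, gap_to_leader, strength=0.1, max_boost=1.25):
    """Return an adjusted speed for a racer who trails the leader.

    gap_to_leader is the distance behind the leader (0 for the leader).
    strength and max_boost are illustrative assumptions, not values
    taken from any real game.
    """
    if gap_to_leader < 0:
        raise ValueError("gap_to_leader must be non-negative")
    # The boost grows with the gap but is capped, so trailing players
    # catch up without being handed the win outright.
    multiplier = min(1.0 + strength * gap_to_leader, max_boost)
    return base_speed * multiplier
```

Capping the boost reflects the design tension described in this chapter: the race stays "close" enough to sustain emotion, yet the outcome still depends on the player's own effort.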

1.  Enjoy:  Emotions  create  strong  shifts  in  internal  sensations  to 
heighten  and  refresh  player  experiences 

Suspend  players  over  boiling  lava  or  confront  them  with 
hideous  boss  monsters  and  the  game  heightens  their  emotions. 
Players  look  for  emotional  rewards  as  well  as  a  high  score.  Much 
of  the  enjoyment  comes  from  the  player’s  affective  response  and 
ultimately  separates  player  experiences  from  user  experiences. 
In  games,  player  participation  is  essential.  The  emotions  come 
from  the  players’  efforts  in  accomplishing  a  task  and  game  en¬ 
joyment  centers  on  the  experience  of  strong  emotional  shifts 
from  their  actions.  Games  provide  the  environment  where  the 
player  becomes  the  central  hero  to  accomplish  the  extraordinary. 
The  buddy  rescued  or  the  enemy  vanquished  in  Battlefield  is  no 
more  real  than  in  film,  but  the  player’s  role  in  the  achievement  is. 
Because games are an interactive medium, the choices offered by
the game (including the graphical user interface) must sharply
enhance the experience. A poor interface reduces interactivity and harms the
game  experience.  Therefore,  a  large  part  of  good  game  design 
lies  in  creating  effective  interfaces  that  also  create  emotional  re¬ 
sponses.  Movies,  by  comparison,  invite  the  audience  to  share  in 
the joys and sorrows of characters on screen. Where games move
beyond  film  to  claim  their  true  power  is  by  rewarding  player  ac¬ 
tion  with  emotions.  Movies  can  never  hand  the  audience  a  jet 
ski  for  the  thrill  of  stopping  the  impending  global  thermonuclear 
war.  As  part  of  their  unique  value  proposition,  games  have  to. 

2. Focus: Emotion directs effort and attention, and aids or influences
many aspects of cognition by focusing attention and providing
immediate feedback and rewards


Emotion  supports  cognitive  tasks  by  directing,  focusing,  and 
holding  attention,  creating  absorbing  engagement  and  at  the 
same  time  allowing  emotion-laden  ideas  to  dominate  thought 
(Clore  &  Gasper,  2000  as  cited  in  Brave  &  Nass,  2003).  Events 
are filtered through mood; we attend more to thoughts that
match the current mood, and ideas similar to a user’s current
mood are remembered better (Ekman, 2003; Thorson & Friestad,
1985 as cited in Brave & Nass, 2003). Emotional stimuli
reward more detailed analysis.

In  games,  tight  feedback  loops  reward  player  actions  with 
immediate  visual  and  audio  feedback  that  motivate  the  player  to 
want  to  take  another  action.  This  rewarding-stimulus-response 
loop  is  a  powerful  motivating  force  that  reaches  a  wide  mass- 
market  audience.  Several  game-design  techniques  magnify  the 
effect emotions already have to focus players on a task. For example,
game obstacles sometimes use negative affect to increase
player focus; at other times, games provide interesting
behaviors or situations that encourage players to explore, using
positive affect to increase creativity and problem solving. Emotions
reward attention. Game feedback provides new stimuli to
interpret and creates new experience. Rewards create positive
and negative affect. Failures create negative affect in the
hope that players will redouble their efforts and try again.

3.  Decide:  Emotion  is  vital  to  decision-making  in  games 

Humans  use  both  thought  and  emotion  to  make  decisions. 
Experimental  evidence  suggests  that  people  with  damage  to 
brain  structures  involved  in  emotions  can  generate  appropriate 
logical  options  and  discuss  each  decision’s  impacts  and  tradeoffs 
with  great  clarity,  but  are  unable  to  make  the  actual  choice  itself 
(Damasio, 1994; Norman, 2004). When we select an entree from
a  restaurant  menu,  it  is  what  we  “feel”  like  having.  There  are  also 
logical  components  to  that  choice  (calories,  fiber,  who’s  paying, 
etc.),  but  there  is  something  other  than  logic  that  happens  inside 
that  helps  us  choose  and  “feel”  good  about  the  selection.  Some 
people  even  feel  that  their  car  drives  better  after  it  has  been 
washed or had the oil changed (positive affect). Emotions also
let  people  make  snap  decisions  (Ekman,  2003;  Norman,  2004). 
Emotions  help  players  decide  and  combine  with  logic  to  make 
these  decisions  interesting.  Because  positive  and  negative  affect 
guides  players,  it  is  easier  to  choose  between  options  with  strong 
emotional  stimuli.  Game  interfaces  that  supply  strong  emotional 
responses also have this effect. Games also use affect to add conflict.
The emotions either support the objective or make achieving
it more difficult, as when players must resist the urge to run away.
Recognizing this initial role emotion plays in decision-making offers
a crucial way to improve decision-support products.

4.  Perform:  Emotion  supports  different  approaches  to  action 
and  execution 

Emotions are a key component in most tasks. This is easy to
see in games, which are often designed to create a particular
affective state for a task: some tasks require following strict, detailed
procedures with zero tolerance for error, while others require
creative exploration of alternative options. The lesson for
human-computer interaction (HCI) is that properly designed emotions support the right
affective state to get the job done as well as increase appeal. In
productivity  software,  this  is  important  for  both  the  task  (such 
as  struggling  to  find  the  right  word  while  writing)  and  the  soft¬ 
ware  (such  as  struggling  to  find  the  right  feature),  and  both  cre¬ 
ate  frustration.  Like  a  car’s  seatbelt  alarm,  certain  levels  of  frus¬ 
tration  create  mild  negative  affect  in  the  user  and  direct  the  user 
toward a certain action. Too much negative affect (such as when
players cannot keep cars on the racetrack) makes players feel
that they will fail, and they quit. Software needs aspects that create
emotions  to  support  tasks,  and  allow  options  that  let  users  bal¬ 
ance  their  own  levels  of  frustration.  Like  cycles  of  hard  and  easy 
things  to  do  in  a  game,  users  need  to  experience  new  emotions 
to  refresh  their  experiences.  Emotion  creates  moods  that  persist 
and  help  the  player  perform.  The  emotion  the  designer  chooses 
should  help  the  actions  required.  Negative  affect  narrows  atten¬ 
tion  on  aspects  relevant  to  the  problem,  while  positive  affect 
opens  it  to  explore  new  alternatives.  Positive  moods  influence 
creativity  and  flexible  thinking  for  problem  solving.  Relaxed  and 
happy  thought  processes  expand  and  become  more  creative  and 
imaginative,  and  make  players  more  tolerant  of  minor  difficul¬ 
ties  (Norman,  2004).  Moods  with  a  negative  affect  focus  atten¬ 
tion,  while  positive  moods  help  players  generate  new  options. 
Cycling  between  moods  creates  variety  to  refresh  the  experience, 
and  offers  the  option  of  approaching  a  problem  with  a  different 
problem-solving  strategy  related  to  the  new  mood. 

5. Learn: Emotion provides motivation and rewards progress

Played  for  pure  enjoyment,  game  emotion  motivates  play¬ 
ers  to  pay  closer  attention  and  repeat  an  action,  enabling  them 
to  master  highly  complex  interfaces  and  interactions,  learn 
countless  features  and  strategies,  and  spend  hours  doing  this 
even though they may fail repeatedly. To increase product mastery,
many game methods apply to nongame interfaces. While
productivity software users prefer to learn the bare minimum number
of features to accomplish their work rather than achieving
level  42  in  spreadsheet  wizardryness,  game-like  motivation  may 
increase  exploration  of  additional  product  features.  All  games 
involve  learning.  Emotions  reward  changes  and  growth  inside 
players  as  they  master  what  they  could  not  do  before.  Learning 
and  remembering  are  easier  for  emotional  stimuli,  and  when  ex¬ 
periencing  an  emotion  or  a  mood,  it  is  easier  to  remember 
thoughts  that  have  a  matching  emotional  context.  Improved 
performance  such  as  learning  a  new  skill  or  high  score  is  a  big 
part  of  the  enjoyment  of  play. 


Requirements  for  an  Emotion  Framework  for  Games 


“The  problem  with  the  words  ‘enjoyment’  and  ‘happiness’  is  that 
they’re  not  specific  enough;  they  imply  a  single  state  of  mind  and  feel¬ 
ing,  in  the  same  way  that  the  terms  ‘upset’  and  ‘negative’  don’t  reveal 
whether  someone  is  sad,  angry,  or  disgusted.  ” 

— (Ekman, 2003, p. 190)


A practical methodology for designing emotions should support
designers as well as researchers in building better player experiences.
More than a model, a practical method for designing and
examining player experiences should take into account four
perspectives: (a) what players like most, (b) what creates emotions,
(c) what game designers can control, and (d) what researchers
can measure. The method’s components must be observable,
salient to the player, relevant to the player’s experience of fun,
and applicable to a wide variety of game genres and hardware
platforms. It must account for how activities offered by best-selling
games make them more popular than others with similar features.
It should track internal and external aspects of play, including
complex emotions and their impact on players’ lives. A
framework must capture emotions’ role in enjoyment, maintaining
player interest, decision making, performance, and
learning. Designing for emotion in games should build on the
research and theories of several disciplines. It must cover differences
in expectations and play patterns, such as those described by
Bartle, who first classified players as (a) Achievers, (b) Explorers,
(c) Socializers, or (d) Player Killers (Bartle, a). It must measure emotion
from play from different game components such as challenge
and fantasy (Malone, 1981), biometrics such as heart rate,
control pressure, and facial expression (Mandryk, 2004; Sykes
& Brown, 2003; Hazlett, 2003), and specific emotional states
such as pleasure and aggression using fMRI (Weber, Ritterfeld,
& Mathiak, 2006). It should cover emotions from game goals and
open-ended play, as well as emotions from game events and
those that happen in real life. Most importantly, the method
must describe which gameplay mechanisms produce specific
emotions that are central components of the player experience
(Fig. 34.3).

[Figure 34.3 diagram. Axes: game versus life; structured versus
open-ended. Emotion labels: fiero, curiosity, amuse, relax. Player
quotes: “I like feeling smarter than the game.” “Me and my
character are one.” “I play this rather than eating a muffin.”
“I play what my friends play.”]

FIGURE 34.3. Players play for “the ride” that games take them
on. Player experiences (PX) generate emotions from reactions
along two axes: goal-directed versus open-ended play, and what
happens inside the game versus how gameplay relates to other
aspects of the player’s life. Copyright © 2004-2007 XEODesign,
Inc. All rights reserved.

Evaluating Relevant Frameworks for Emotion,
Products, and Entertainment

A limitation shared by current models of emotions and products
is that they focus more on positive and negative affect than
on how to create specific emotions from interaction; and even
less attention is given to how the actions of a game player create
specific emotions. To be useful to game and product designers,
player-experience methodologies must identify more emotions
than simply positive and negative affect. Affective states grow
and change during gameplay.

Many emotions are enjoyable, such as curiosity, which can
also  lead  to  other  positive  emotions  such  as  wonder  and  even 
love.  However,  players  also  enjoy  play  experiences  containing 
negative  emotions.  The  pleasure  that  comes  from  many  game 
designs  requires  mastering  difficult  situations  or  experiencing 
unpleasant  emotions.  How  emotions  grow  and  change  during 
play  is  a  critical  part  of  player  experiences.  During  game  testing, 
designers need more information than whether or not an action was
fun. Player observation leads to the design of better
games, but this requires a deeper understanding of the player
experience and better techniques for measuring it (Fig. 34.4).

Flow  the  Optimal  Experience:  Csikszentmihalyi 

The most relevant and influential psychological research outside
of games is Csikszentmihalyi’s model of optimal experiences,
or “flow” (Csikszentmihalyi, 1990). He found that people
are  happiest  when  engaged  in  intrinsically  motivating  activities 
such  as  rock  climbing,  dancing,  and  gardening.  These  activities  of¬ 
fer  long-term,  deep,  memorable  experiences  that  require  con¬ 
centration  and  growth.  He  noted  that  optimal  experiences  care¬ 
fully  balance  skill  with  difficulty  to  create  a  deeply  enjoyable 
mental  and  physical  state.  Without  challenge,  an  activity  becomes 
routine and boredom sets in. With too much challenge, the person
becomes too anxious and leaves because he or she feels frustrated
(Csikszentmihalyi, 1990). The activity should aim to achieve
flow  through  experience  design  and  create  forward  movement 
and  personal  development  through  pleasurable  engaging  expe¬ 
riences.  With  its  balance  of  difficulty  and  user  skill,  flow  clearly  de¬ 
scribes  a  critical  component  of  play  experiences.  However,  Csik¬ 
szentmihalyi’s  model  only  took  into  account  two  emotions:  (a) 
anxiety  (frustration)  and  (b)  boredom,  and  ignored  several  other 
emotions essential for gameplay. While these first two emotions
are important in games, many other emotions play an important
role in player engagement. Csikszentmihalyi grouped together
physical, mental, and aesthetic challenges, which prevented
discussing the emotional effects of art and audio separately from the
challenge of gameplay. A model for emotions in games should connect
the emotions most important to games to how games create
them.

Comparison of Models and Methods to Create Affect from Products and Entertainment Experiences

XEODesign Four Fun Keys Model:
  Hard Fun: Fiero; Challenge; Game, Goal
  Easy Fun: Curiosity; Novelty, Fantasy; Game, Open Ended
  Serious Fun: Relaxation; Real World Purpose; Life, Open Ended
  People Fun: Amusement; Social; Life, Goal
Bartle’s Original 4 Player Types (1996, 2003a, 2003b): Achiever; Player Killer; Explorer; Socializer; Player Killer
Boorstin (1990): Voyeuristic Eye; Visceral Eye; Vicarious Eye
Csikszentmihalyi (1990): Enjoyment, flow; Pleasure, microflow
Ekman (2003): Auto appraisal, memory of emotion, imagination, reflective appraisal; Empathy w/ another, violation of social norm, talking about emotion, making facial expression of emotion
Hassenzahl et al. (2000): Ergonomic quality; Hedonic quality
Kim (2000): Community
La Blanc et al. (2004): Mechanics, Dynamics, Aesthetics; Aesthetics
Malone (1981): Challenge; Curiosity; Fantasy
Norman (2004): Behavioral; Reflective, Visceral; Reflective
Piaget (1962): Formal games with rules; Sensory-motor play; Pretend play
Tiger (1992), Jordan (2000): Physio-pleasure; Psycho-pleasure; Ideo-pleasure; Socio-pleasure
Wright et al. (2003): Spatial-temporal thread; Compositional thread, Sensual thread; Emotional thread
Common Drama and Theater Constructs: Character “Motivation,” Plot points, Objectives, 3-act structure; Setting, Plot, Story, Character, Suspension of disbelief; Catharsis, Music, Set and costume design; Character, Dialog, Acting

FIGURE 34.4. Several frameworks describe the emotion resulting from entertainment experience
or the use of a product. Comparing their similarities and differences provides interesting insight
into the basic requirements for entertainment and product emotions. Copyright © 2004-2007
XEODesign, Inc. All rights reserved.
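
The balance of skill and difficulty in Csikszentmihalyi's flow model, with boredom on one side and anxiety on the other, can be illustrated with a small classifier. This is a rough sketch under stated assumptions rather than a validated measure: the numeric skill and challenge scales, the ratio test, and the tolerance parameter are all hypothetical, and the sketch deliberately captures only the two emotions the model itself addresses.

```python
def flow_state(skill, challenge, tolerance=0.2):
    """Classify a moment of play as 'boredom', 'flow', or 'anxiety'.

    skill and challenge are positive numbers on the same arbitrary
    scale; tolerance is the relative mismatch still treated as
    balanced. All three are illustrative assumptions.
    """
    if skill <= 0 or challenge <= 0:
        raise ValueError("skill and challenge must be positive")
    ratio = challenge / skill
    if ratio < 1 - tolerance:
        return "boredom"   # too little challenge for this skill level
    if ratio > 1 + tolerance:
        return "anxiety"   # too much challenge for this skill level
    return "flow"          # challenge roughly matches skill
```

That the function can return only three labels mirrors the limitation noted above: a model built on skill and difficulty alone has no place for emotions such as curiosity or amusement.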

Pleasure  from  Products:  Jordan,  Norman,  and  Boorstin 

Pleasure  comes  from  different  aspects  of  experiencing  a 
product.  To  define  what  he  calls  the  new  human  factors,  Jordan 
expanded  on  anthropologist  Tiger’s  Four  Pleasures  model 
to create a framework for thinking about pleasure from products.
Jordan discussed (a) the ideo-pleasure, (b) physio-pleasure,
(c) psycho-pleasure, and (d) socio-pleasure of a product
(or emotions from a product’s idea, physical, psychological, and
social attributes). Each of these Four Pleasures describes enjoyment
from a different perspective, including how the use, ownership,
or identification with a product produces emotion
(Tiger, 1992, as cited in Jordan, 2000; Jordan, 2000). Changing an
aspect can strengthen a desired emotional response. Norman
built on and simplified Jordan’s model to three layers of mental
processing: (a) the Visceral (an automatic biological response),
(b) the Behavioral (learned actions), and (c) the Reflective
(involving thought, self-image, and relationship to others) (Norman,
2004). Interestingly, filmmaker Jon Boorstin also proposed
three ways that films delight audiences: (a) the Visceral Eye
(enjoyment on the biological level such as automatic sensory
reactions to explosions or speed), (b) the Voyeuristic Eye (enjoyment
through seeing events unfold), and (c) the Vicarious Eye
(enjoyment through identification with people on screen)
(Boorstin, 1990). Unfortunately, none of these frameworks goes
deeply into what actions create specific emotions.

Enter  Ekman  and  Facial  Expressions 

Through his research on universal facial gestures, his
Facial Action Coding System (FACS), and a compilation
of others’ research, Paul Ekman has increased understanding
of specific human emotions across cultural boundaries. His
fascinating and highly accessible book Emotions Revealed described
how to identify emotions through facial gestures and the important
roles emotions play in our lives (Ekman, 2003) (Fig. 34.5).

Because  of  its  specificity,  the  FACS  coding  system  for  facial 
gestures  offers  promise  as  an  emotional  measure  for  games  and 
software, and is the basis for many of XEODesign’s observational
studies. However, Ekman’s work focused on the identification
of individuals’ experience of specific emotions and
stops short of discussing how products or their use can create
these emotions. What game designers need is a method that
connects specific emotions to player actions in the game.
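
One simple way to begin connecting emotions to player actions is to tally which emotions observers record after each action. The sketch below is purely illustrative and is neither FACS nor XEODesign's method: the function name, the (action, emotion) pair format, and the example labels are hypothetical, and the observations are assumed to have been coded by hand, for example from session video.

```python
from collections import Counter, defaultdict


def emotions_by_action(observations):
    """Count observed emotions for each player action.

    observations is an iterable of (action, emotion) pairs, assumed
    here to come from a human coder; the labels are whatever the
    coder chose to use.
    """
    tally = defaultdict(Counter)
    for action, emotion in observations:
        tally[action][emotion] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {action: dict(counts) for action, counts in tally.items()}


# Hypothetical coded observations from one play session.
session = [
    ("jump over lava", "fear"),
    ("jump over lava", "fear"),
    ("open linking book", "surprise"),
]
summary = emotions_by_action(session)
```

A tally like this says which actions tend to co-occur with which emotions in one session; it does not by itself establish that the action caused the emotion.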

All  of  these  models  lay  the  groundwork  for  what  is  ultimately 
needed  by  game  designers — a  method  for  producing  specific 
emotions  from  the  interactions  players  most  appreciate  during 
goal-oriented and open-ended play. To do this, the game industry
needs an expanded method to create specific captivating
emotions from the best-loved types of gameplay. This is a core
focus of XEODesign’s independent research.

Six Plus One Universal Emotions with Universal Facial Gestures

Emotion: Example
Frustration: Figuring out how to get character off a roof in Tom
  Clancy’s Splinter Cell (and all-too-often created by
  usability issues that detract from the player experience)
Fear: Falling into boiling lava, fast-moving projectiles
  aimed at the player in Doom
Surprise: Using Myst’s linking books for the first time to
  transport to a new world
Sadness: When the young magician Aerith in Final Fantasy VII
  is murdered
Amusement: When two Sims get married in The Sims, or rolling
  over and picking up sumo wrestlers in Katamari Damacy
Disgust: Becoming a social outcast (social disgust) after
  losing the dancing challenge in Sid Meier’s Pirates!
Curiosity:* Wanting to know what happens by driving the
  race track the wrong way in Project Gotham Racing 3

*Not all researchers (including Ekman) considered curiosity a universal emotion
with a unique facial gesture. I include it here as a seventh emotion because
of its importance in games and ease of observation.

FIGURE 34.5. Researchers generally agree that there are at least
six emotions with universal facial gestures.


THE  PLAYER  EXPERIENCE  FRAMEWORK 


Why  We  Play  Games:  Four  Fun  Keys  Model 

XEODesign conducted independent research to identify four
key processes that create emotion in best-selling games (Figure
34.6). Each experience involves a different emotion to create a
different  Player  Experience  Profile.  By  presenting  a  goal  and 
breaking  it  into  small  achievable  steps,  games  create  emotions 
from  Hard  Fun,  where  the  frustration  of  the  attempt  is  com¬ 
pensated  by  the  feelings  of  accomplishment  and  mastery  from 
overcoming  obstacles.  Outside  of  goals,  games  provide  novel 
opportunities  for  interaction,  exploration,  and  imagination, 
which create Easy Fun. Games that use emotions in play to motivate
real-world benefits, helping players change how they think,
feel, and behave or accomplish real work, create Serious Fun.
Finally,  games  that  invite  friends  along  get  an  interpersonal  emo¬ 
tional  boost  from  People  Fun  (Lazzaro,  2004b).  The  Four  Fun 
Keys  are  a  collection  of  related  game  interactions  (game  me¬ 
chanics)  that  deliver  what  players  like  most  about  games.  Each 
offers a key to “unlock” unique emotions such as frustration, curiosity,
relaxation, excitement, and amusement. Best-selling
games provide features that support at least three of these Four
Fun Keys to create a wider emotional response in the player. To
keep things fresh during a single play session, gamers move between
the four different play styles (Lazzaro, 2004b). Developing
each key focuses and rewards the player with emotion from
a self-motivating experience that deepens the game’s player-experience
profile. Designers of products and productivity software
can also use these Four Fun Keys to increase emotional
engagement for applications outside of games.

[Figure 34.6: The 4 Fun Keys Model for Emotion through Gameplay.
Diagram of the player experience (PX) between "game" and "life",
with four keys and their lead emotions: Hard Fun (fiero), Easy Fun
(curiosity), People Fun (amuse), and Serious Fun (relax).]

FIGURE 34.6. Each Fun Key is a collection of game mechanics
that create a favorite aspect of gameplay. Emotions prepare players
for action and reward players. Each of these Four Fun Keys
contributes a unique set of emotions to the game. Changing one of
the four key mechanisms will change the Player Experience profile
of the game. Copyright © 2004-2007 XEODesign, Inc. All
rights reserved.

Only  some  of  the  emotions  from  playing  basketball  in  the 
real  world  come  from  the  Hard  Fun  of  making  baskets.  Close  ex¬ 
amination  reveals  that  all  four  Fun  Keys  are  part  of  this  popular 
sport.  Dribbling  the  ball  or  doing  tricks  like  a  Harlem  Globe¬ 
trotter  offers  Easy  Fun  from  novelty  and  role  play.  Intentionally 
blowing  away  frustration  and  getting  a  workout  creates  Serious 
Fun.  Competition  and  teamwork  make  the  game  even  more 
emotional  from  People  Fun.  All  four  types  of  fun  make  basket¬ 
ball’s  player  experience  more  enjoyable.  None  of  these  require 
story  or  character.  Through  examination  of  how  each  type  of 
fun  creates  emotions,  designers  and  researchers  can  create  bet¬ 
ter  and  more  emotional  player  experiences  (Fig.  34.7). 
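
The Four Fun Keys model described above can be read as a small lookup structure: each key has a lead emotion and a set of related game mechanics, and a game's player experience profile follows from which keys its features touch. The sketch below is only an illustration under stated assumptions. The feature lists are copied from this chapter's figures, but the class name `FunKey`, the function names, and the rule that a single shared feature counts as "supporting" a key are hypothetical and are not part of XEODesign's research method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunKey:
    """One Fun Key: its name, lead emotion, and related game features."""
    name: str
    lead_emotion: str
    mechanics: frozenset


# Feature lists are taken from the chapter's figures. The matching rule
# used below is an illustrative assumption, not the author's method.
FOUR_FUN_KEYS = (
    FunKey("Hard Fun", "fiero", frozenset({
        "goals", "challenge", "obstacles", "strategy",
        "power ups", "puzzles", "score", "levels"})),
    FunKey("Easy Fun", "curiosity", frozenset({
        "explore", "fantasy", "fool around", "role play",
        "ambiguity", "detail", "uniqueness"})),
    FunKey("Serious Fun", "relaxation", frozenset({
        "repetition", "rhythm", "knowledge", "stimulation",
        "meditation", "work out", "get into shape", "study"})),
    FunKey("People Fun", "amusement", frozenset({
        "competition", "cooperation", "socializing"})),
)


def supported_keys(game_features):
    """Return the Fun Keys that share at least one feature with the game."""
    features = {feature.lower() for feature in game_features}
    return [key for key in FOUR_FUN_KEYS if key.mechanics & features]


def experience_profile(game_features):
    """Map each supported Fun Key name to its lead emotion."""
    return {key.name: key.lead_emotion for key in supported_keys(game_features)}


def covers_three_keys(game_features):
    """Check the chapter's 'at least three of the Four Fun Keys' observation."""
    return len(supported_keys(game_features)) >= 3


# Hypothetical feature list for an imaginary game.
print(experience_profile(["score", "levels", "explore", "cooperation"]))
```

Keeping the keys as data rather than as branches in code makes it easy to ask the question raised in the text: what happens to the profile when one of the four mechanisms is changed or removed.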


FIERO  FROM  HARD  FUN  GAMEPLAY 


Gameplay  that  rewards  effort  through  challenges  to  create  fiero. 

Desire  for  Fiero  Helps  Players  Master  Challenge 

“It’s easy to tell what games my husband enjoys the most. If he screams
‘I hate it. I hate it. I hate it,’ then I know two things: a) he will finish it
and b) buy version two. If he doesn’t say this, he’ll put it down after a
couple of hours.”

— Wife of a hardcore PC gamer


Four Fun Keys to More Emotion through Gameplay

Lead Emotion   Fun Key       Key Game Mechanic

Fiero*         Hard Fun      Affect related to challenge, strategy,
                             and mastery

Curiosity      Easy Fun      Affect related to novelty, ambiguity,
                             detail, fantasy, role-play, and absorbing
                             attention

Relaxation     Serious Fun   Affect related to purposefully changing
                             oneself, learning, or doing real work

Amusement      People Fun    Affect related to competition,
                             cooperation, and socializing with others

*Personal triumph over adversity.

FIGURE 34.7. Game mechanics create emotion from what players
like most about play. To this group of easily observable emotions,
we add frequently reported emotions collected through
verbal descriptions of internal sensations as well as players' body
language during play. Copyright © 2004-2007 XEODesign, Inc.
All rights reserved.

The  most  obvious  enjoyment  in  games  comes  from  mastering  a 
challenge  and  reaching  a  goal.  Hard  Fun  is  a  self-motivating  ac¬ 
tivity  that  keeps  the  user  focused  and  enthusiastic  by  providing 
an  obstacle,  an  objective,  and  a  score.  Hard  Fun  game  mechanics 
challenge  a  player  to  overcome  an  obstacle  to  achieve  a  goal. 
Hard  Fun  experiences  reward  mastery,  either  explicitly  with 
points  and  bonuses  or  implicitly  through  new  levels  or  abilities. 
Because  this  type  of  play  requires  application  of  effort,  we  call 
this  Hard  Fun.  In  our  research,  we  expand  on  this  phrase  (Csik- 
szentmihalyi,  1990;  Papert)  to  define  Hard  Fun  as  the  rewarding 
process  of  mastering  a  challenge  that  involves  the  creation  and 
testing  of  strategies  and  the  application  of  effort.  Hard  Fun  re¬ 
wards  effort  and  discourages  failed  approaches.  Hard  Fun  cre¬ 
ates  the  emotions  of  frustration  and  boredom.  More  importantly, 
it  produces  fiero,  the  Italian  word  Isabella  Poggi  and  Ekman  used 
to  describe  the  personal  feeling  of  triumph  over  adversity  (Poggi, 
as  cited  in  Ekman,  2003).  One  of  the  most  important  game  emo¬ 
tions,  fiero  is  a  strong  feeling  of  personally  accomplishing  some¬ 
thing  difficult  such  as  defeating  the  boss  monster  (Figure  34.8). 

Hard Fun requires a high investment of energy from the
player. By perfectly balancing player skill with game difficulty,
Hard Fun meets many of the characteristics and requirements
for flow (Csikszentmihalyi, 1990; Lazzaro, 2004b). For example,
Pac Man’s simple game mechanic (eat dots and avoid ghosts)
offers clear long-term and short-term goals, the opportunity to
concentrate, achievable tasks, an uncertain outcome, and immediate
feedback to player decisions. It creates a deep sense
of control through tight feedback loops between player input
and action in the game. All of these enhance players’ absorption
into a challenging activity and improve their ability to perform.
Beyond flow’s balance of difficulty and skill, game designers do
other things to change how players feel about their progress in
the game. According to game designer Steve Meretzky, rewards
along the path to the goal, such as power ups, big jumps in
score, animations, and sounds, enhance enjoyment (Lazzaro, 2005a).
For example, the power ups in Pac Man create super-charged
feelings as players turn the tables to chase ghosts.

[Figure 34.8: Hard Fun. Chart labels: frustration, fiero!, player
skill. Game Features: goals, challenge, obstacles, strategy, power
ups, puzzles, score, levels. Emotions: fiero, frustration, boredom.]

FIGURE 34.8. Similar to Csikszentmihalyi's concept of flow,
games must balance difficulty with player skill and provide
enough variety of challenge, strategies, and puzzles. If the player
gets too frustrated or bored, he or she leaves the game. Charting
a player’s progress through a game onto Csikszentmihalyi’s
flow model illustrates that, in addition to the requirements for
flow, games can be made more emotional through sharp changes in
the level of difficulty for game challenges. Game difficulty
increases to the end of a level and then resets at the start of the
next. When players succeed at the point where they are about to
quit, they are more likely to experience fiero. In this figure,
fiero occurs at the end of level two before continuing to level
three of this game (modified from Csikszentmihalyi, 1990).
Copyright © 2004-2007 XEODesign, Inc. All rights reserved.


Hard Fun focuses player attention on achieving results by
providing opposition and constraints, such as removing an alien
threat in Halo or aligning puzzle pieces in Tetris. Often games
provide a choice of strategy, and games with high replayability
offer a choice of goals. For example, The Sims offers several winning
conditions such as the best-looking house, the most
friends, or the most money. The new obstacles, constraints, and
tradeoffs from different goals suggest multiple strategies and
enhance the challenge. These emotions facilitate achievement
of a goal, focus play effort, and reward accomplishment. Negative
affect increases focus and concentration (Norman, 2004),
such as when the player encounters resistance to the goal
through failed attempts. Sometimes Hard Fun emotions run
counter to what needs to be done. For example, performing
well or moving toward something rather than running away is
more challenging when surrounded by negative affect. Other
Hard Fun emotions, such as fiero from achievement, put the
player into a positive affective state by rewarding the player’s
efforts. Unlike many emotions, fiero does not require an audience
(Poggi, as cited in Ekman, 2003) and is a powerful emotion
unavailable in film or novels. Players enjoy these emotions from
Hard Fun, including the mental focus provided by frustration
that helps players concentrate (Fig. 34.9).

While general enjoyment results from emotions that provide
motivation, other emotions impact the player’s response to the
game’s stimulus. Decision making is aided or made more challenging
by affect, whether it is the positive affect of collecting
Bejeweled’s colored gemstones or the negative affect from
Bookworm’s burning letter tiles (which makes their use a priority
and picks up the perceived pace of the game). Emotional
states also focus effort and attention on available choices in play.
Positive affect, for example, helps to generate new ideas, such
as which gemstones to match in Bejeweled; negative affect
surrounding an impending invasion in Civilization provides
additional focus for deciding how to respond.

Fiero Enhances a Player’s Sense of Progress


“I have to concentrate!”

— Traveler in St. Louis airport on what he likes most about JamDat
Bowling on his mobile phone. Playing removes distractions from his
consciousness and engrosses him in a rewarding activity while he waits.


Players enjoy many other aspects of games with emotions
outside of flow’s balance of difficulty and skill. To enhance a
player’s sense of progress as play continues, games offer new
tools or abilities along with new obstacles, constraints, and
tradeoffs to maintain interest. Levels divide challenge into gradually
increasing amounts of difficulty. The difficulty also increases
inside each level, with most levels ending in a “boss
monster” final challenge similar to a major plot point in a story.
Defeating boss monsters or boss puzzles produces fiero, and
the start of the next level is often dramatically easier to provide
emotional relief. The best-selling game Halo uses Hard Fun to
enhance a player’s performance by the way it breaks a variety
of challenges into levels, rewards progress, provides power ups
such as new weapons or armor, and adds new vehicles to offer
fresh strategy options (Fig. 34.10).
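
The level structure described above, in which difficulty climbs inside each level and then drops back at the start of the next, can be sketched as a sawtooth curve compared against player skill. The following is a minimal illustration, not a model taken from the chapter. The function names, the linear ramp, the numeric gains, and the fixed `tolerance` band standing in for the flow channel are all assumptions made for the example.

```python
def difficulty_curve(levels, steps_per_level, base=1, step_gain=2, level_gain=1):
    """Build a sawtooth difficulty curve as (level, step, difficulty) tuples.

    Difficulty rises by step_gain inside a level. Each new level starts
    only level_gain above the previous level's start, so the first step of
    a level is easier than the last step (the "boss") of the level before.
    """
    curve = []
    for level in range(levels):
        level_start = base + level_gain * level
        for step in range(steps_per_level):
            curve.append((level + 1, step + 1, level_start + step_gain * step))
    return curve


def player_state(difficulty, skill, tolerance=1):
    """Classify a moment of play relative to an assumed flow band."""
    if difficulty > skill + tolerance:
        return "frustration"  # too hard for current skill: risk of quitting
    if difficulty < skill - tolerance:
        return "boredom"      # too easy for current skill: risk of quitting
    return "flow"


# Hypothetical two-level game played at a constant skill of 3.
for level, step, difficulty in difficulty_curve(levels=2, steps_per_level=3):
    print(level, step, difficulty, player_state(difficulty, skill=3))
```

In this sketch the last step of each level lands in the frustration band, which corresponds to the point where the text says success is most likely to produce fiero. The easier first step of the next level then plays the role of emotional relief.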

Hard  Fun  Creates  Challenge  with  Strategies  and  Puzzles 

Central to the enjoyment of Hard Fun is the creation and testing
of strategies, the application of creativity, and the development
of skills. While some players focus on the goal, many enjoy the
process of learning how to win or take pride in how much better
they play with each repetition of the game. Unlike real-world sports
such as baseball, computer games increase the challenge by
changing the rules and winning strategies between levels to
keep the emotions from gameplay fresh. Players must strategize
to find new ways to play. Rather than requiring more points in
less time, best-selling games such as Collapse keep players engaged
by making a winning strategy obsolete a few levels later
(Lazzaro, 2004b). The process of devising new strategies or solving
puzzles in new ways is something that players like most
about games. The success and failure of these mental tasks and
the increase in player skill all create Hard Fun emotion and help
players stay engaged and playing.


CURIOSITY  FROM  EASY  FUN  GAMEPLAY 


Curiosity  Retains  Attention 

“Part of the enjoyment comes from the spy technology . . . cool spy tools
are part of the Spy experience.”

— Xavier playing Splinter Cell

Chain of Hard Fun Emotions Increases Enjoyment

Goal-directed gameplay creates Hard Fun emotions that focus and reward players for overcoming obstacles.

Emotion            Common Themes and Triggers

Frustration        Opposition to an important goal, sudden reversal,
                   feeling of being thwarted, physical restraint. Anger
                   prepares the body to remove an obstacle by force
                   (Ekman, 2003).

Fiero* (Italian)   Personal triumph over adversity (Ekman, 2003).
                   Overcoming difficult obstacles, players raise their
                   arms over their heads. They do not need to
                   experience anger prior to success, but it does
                   require effort and some frustration.

Boredom**          Repetitive, dull, or tedious tasks (Ekman, 2003).
                   Lack of interest in the outcome or in playing.
                   Dispelling boredom is also a major reason to play
                   games.

[Cycle diagram labels: fiero!, frustration, relief, boredom.]

Hard Fun emotions involve obstacles, strategy, and success and help
players accomplish a goal by focusing and rewarding effort.

*Fiero is a positive emotion that has a body rather than a facial gesture. It is not yet known what fiero gestures look like across cultural boundaries.
**Ekman does not list boredom as having a universal facial gesture; however, it is frequently seen in games that lack sufficient challenge.

(Source: Lazzaro, 2004b)


FIGURE 34.9. Hard Fun Emotion Cycle. Copyright © 2004-2007 XEODesign, Inc. All rights reserved.


In fiero, the ultimate game emotion, the sensations are powerful. It is
how players feel when they beat the boss monster or make level 20
after difficult struggles, or when they win a tennis match at Wimbledon.
As fists punch the air, a victorious player screams, “Yes! I did it!”

In XEODesign’s play lab, fiero appears as a positive upward gesture of the
arms or body after succeeding at a challenge. Some players jump their
characters up and down to express this emotion. (Source: Lazzaro, 2004b)

FIGURE 34.10. In fiero a player raises an arm to show their excitement.
Copyright © 2004 XEODesign, Inc. All rights reserved.


Less apparent but equally important to Hard Fun, top-rated
games offer a lot of gameplay outside of or en route to a goal.
Easy Fun is a self-motivating activity that maintains player engagement
through novelty beyond an obstacle, goal, or score.
Easy Fun offers novel interaction to inspire player curiosity to
explore, fantasize, and role play. By balancing what is expected
and unexpected, careful game design prevents the player from
quitting from either disinterest or disbelief. Easy Fun derives
from the ability to explore and create exceptional experiences
unavailable in the real world. The emotions of curiosity, surprise,
wonder, and awe surrounding Easy Fun capture and retain
the player’s attention, as opposed to Hard Fun where the
emotions of frustration in hopes of fiero help players focus on
and apply effort toward attaining a goal (Fig. 34.11).

If Hard Fun revolves around goal achievement, Easy Fun focuses
attention by offering opportunities to explore, get lost in
a fantasy, role play, or simply horse around. With compelling Easy
Fun, the player ignores the goal completely or forgets about
keeping score. Easy Fun players find rewarding, open-ended activities
on top of the game’s main goal. The so-called sandbox
play patterns capture player attention and pull players into deep
states of immersion through curiosity instead of daring them
with challenge. Best-selling games offer opportunities for interaction
through unusual yet enjoyable behavior of the controls
and novel interaction with the world. Players become fascinated
just by interacting with the game, such as being able to flip a car
off freeway exit ramps in GTA (Grand Theft Auto). Like the fiero
from Hard Fun, Easy Fun has positive peak emotions that reward
play such as surprise, wonder, and that “ah-ha” feeling from figuring
something out. However, unlike Hard Fun, none of these
Easy Fun emotions require frustration. Story can often generate
the emotions of wonder and surprise, but role-playing and a
player’s own discovery through exploration create these emotions
on a much more personal level. Positive affect encourages
creativity and exploration of alternatives (Norman, 2004), which
opens the player up to free associations and other possibilities.
Easy Fun emotions focus on filling player attention.

[Figure 34.11: Easy Fun. Gameplay that fills attention through
novelty to inspire curiosity. Game Features: explore, fantasy, fool
around, role play, ambiguity, detail, uniqueness. Emotions:
curiosity, surprise, wonder, awe.]

FIGURE 34.11. Easy Fun emotions maintain player attention
without challenge through novelty and inspiring fantasy. Similar
to Csikszentmihalyi's concept of flow, players will leave a game
because of disbelief or disinterest. To maintain player interest,
the game design must balance the expected with the unexpected.
The player experience profile of Easy Fun includes curiosity,
surprise, and wonder. Copyright © 2004-2007 XEODesign, Inc.
All rights reserved.

Close examination of the conditions, causes, and relationships
between emotions provides opportunities for game designers
to create even bigger emotional responses in players. It
is not that a player cannot feel curious during Hard Fun; but
with Easy Fun, curiosity and the sheer joy of interaction drive
the player rather than only the score, as it is in Hard Fun. Like
improv theater, Easy Fun in games such as Grand Theft Auto
makes offers to players such as a car and a plate-glass store window.
It is up to the player to accept this opportunity and discover
how the car and window interact.


Chain of Easy Fun Emotions Increases Enjoyment

Open-ended  gameplay  creates  Easy  Fun  emotions  and  in¬ 
creases  immersion  in  an  activity  (Fig.  34.12). 


Easy  Fun:  the  Bubble  Wrap  of  Game  Design 


“In real life, if a cop pulled me over I’d stop and hand over my driver’s
license. Here I can run away and see what happens.”

— Xavier  playing  GTA  Vice  City 


Easy Fun is the bubble wrap of game design. It is fun without a
purpose. Easy Fun provides novel opportunities for interaction
that players “discover” alongside of or outside of the main play.
The enjoyment and label for Easy Fun come from the way players
goof around, frolic, explore, and play with an ease they do
not have when pursuing a specific winning condition as with
Hard Fun. Best-selling games such as GTA, Halo, and Myst offer
numerous opportunities for Easy Fun, so that when the challenge
gets too tough or loses its appeal, the players have many
other things to do that create emotional responses. A big role
of Easy Fun in best-selling games is to refresh the player between
or in the middle of challenges. In Halo, for example, players can
cycle between the Hard Fun of combat and the Easy Fun of exploring
a ring world. The game’s battlefield is on the inside of a
ring-shaped planet, which inspires curiosity as players approach
a horizon that, instead of dipping down from view, curves up
overhead. Easy Fun also provides interest when
players pursue the opposite direction of a game goal, such as
putting the Sims in the pool and removing the ladders in The
Sims or placing predator and prey animals in the same pen just
to see what happens in Zoo Tycoon (a flurry of dust and the prey
disappears). Through exploring both what’s right and what’s
wrong, games offer more opportunities for emotion, especially
from violating norms. In addition to relief from challenge, Easy
Fun prevents progress in gameplay from feeling like a skeleton of
correct decisions. Easy Fun reinvigorates emotionally and often
reinterests the player in the goal. By offering both kinds of fun,
the game extends the average play session and lets the player
self-regulate the challenge if the Hard Fun becomes too hard.

“The journey is the reward.” — Design philosophy at Cyan, creators of Myst

The emotions from Easy Fun both inspire and satisfy a
player’s curiosity. To create the emotions of curiosity, surprise,
wonder, and awe in addition to novelty, Easy Fun gameplay uses
juxtaposition, where contrast between items or events requires
the player to investigate. Like Magritte’s surrealist painting, “This
is not a pipe,” a player of The Sims must interpret the Simlish
pictograph language spoken by the characters. The player projects
in and provides an explanation for any conversation between
Sims. The Easy Fun of games provides opportunities for
fantasy and role play. Players can take elements of the game and
do their own thing with them, whether it is wielding an orc’s
mace in World of Warcraft (WoW) or donning a superhero’s
cape in City of Heroes. Games also reward player curiosity with
details, such as in Cyan’s Myst. In addition to the Hard Fun of
puzzles, Myst offers Easy Fun gameplay from exploring worlds.
To encourage players to slow down and notice, Myst provides
numerous small details in the environment that reward closer
inspection and encourage a slow, careful gameplay style.

Instead of running through at top speed, players spend more
time exploring and looking for clues to solve the mystery. This
supports the style of interaction needed to play and win the
game. In addition to detail, Myst captures player attention
through ambiguity in the setting and surrealistic juxtapositions such
as the boiler room inside a tree, linking books, and faded
sketches of other wondrous technology. The conflict that creates
emotional tension in Easy Fun is often between what the
player knows and does not know. Easy Fun offers detail that rewards
player exploration and paying closer attention. Because
challenges can feel like a grind, Easy Fun refreshes and provides
new and interesting things to do in the game. Easy Fun provides
novelty to keep play open ended and interesting, not because
players wonder about whether they have “the stuff” it takes to
reach a goal, but simply to make the player experience interesting,
surprising, and far from routine.

Chain of Easy Fun Emotions Increases Enjoyment

Unstructured sandbox play creates Easy Fun emotions that reward players outside of challenge and keeping score.

Emotion      Common Themes and Triggers

Curiosity*   Unusual, unresolved situation that piques the player’s
             inquisitiveness (Ekman, 2003). Something that players
             find strange, odd, or intriguing, such as Myst's surreal
             ship-rock island.

Surprise     Sudden change. Briefest of all emotions; does not feel
             good or bad. After interpreting the event, this emotion
             merges into fear, relief, etc. (Ekman, 2003), such as
             when matching two blocks clears the whole board.

Wonder       Overwhelming improbability (Ekman, 2003). Curious items
             amaze players with their unusualness, unlikelihood, and
             improbability without breaking out of the realm of
             possibilities, such as Myst's linking books.

Awe          Combination of wonder with fear and dread (Ekman, 2003),
             such as a beautiful but impossibly powerful dragon or
             female warrior in EverQuest.

[Cycle diagram labels: curiosity, surprise, wonder, awe, relief.]

Easy Fun offers emotions surrounding the
unique. Games provide a sequence of
emotions, often starting with curiosity
then creating surprise. Easy Fun can
sometimes create wonder or awe if the
effect is particularly strong.

*Not all researchers, including Ekman, recognized curiosity or interest as a universal emotion with a distinct facial gesture. However, those who did recognize it saw the
emotion indicated by a lifting and drawing together of the inner eyebrows. In our research, we saw it frequently combined with leaning forward. It was also a feeling
reported verbally by players (Lazzaro, 2004b).


FIGURE 34.12. Easy Fun Emotion Cycle. Copyright © 2005-2007 XEODesign, Inc. All rights reserved.


RELAXATION  FROM  SERIOUS  FUN  GAMEPLAY 


Relaxation  Changes  How  Players  Think,  Feel,  and  Behave 


“Playing helps me blow off frustration at my boss.”

— a hard core Halo player

In Serious Fun, people play games with a purpose (Fig. 34.13).
They play to improve their lives by changing how they think, feel,
or behave. The enjoyment motivates continued engagement
with an activity that brings the desired results. The Serious Fun in
games reliably relaxes or excites gamers as they play for a purpose:
to change the player’s internal state, develop good habits,
improve self-esteem, learn, or do work outside of the game itself.
In Serious Fun, the entertainment value captures attention from
the emotions surrounding the human values expressed through
the act of play. Almost like therapy, players report unparalleled
states of concentration and focus during play, making this shift in
emotional state one of gaming’s biggest takeaways. Whether the
activity is fast action with lots of explosions or slow-paced
colored-block matching, players play to experience feelings of
excitement, blow away frustration, “get perspective” on their
troubles, or create a meditative experience. The stimulation and
concentration required drive out bothersome thoughts. Some
players simply want to feel more relaxed, excited, or less bored.
Players look for a physical and mental workout such as exercise
in Dance Dance Revolution, heightened reflexes in Project
Gotham Racing, or an increase in mental acuity from word
games such as Bookworm. Positive and negative affect from play
guides the player toward correct moves, and negative affect increases
the perceived pace of the game. Real-world benefits include
releasing stress therapeutically, learning new vocabulary or
math, improving physical fitness through exercise, and increasing
a player’s mental agility, which some players believe wards off
the effects of aging. Physical movement also creates an emotional
release felt after exercise and positive feelings from learning
and accomplishing tasks in an engaging way. Games played
for any of these reasons offer Serious Fun.

[Figure 34.13: Serious Fun. Gameplay that changes the self and the
real world. Diagram labels: do real work, learn, thoughts,
sensations, values, self improvement, behavior. Game Features:
repetition, rhythm, knowledge, stimulation, meditation, work out,
get into shape, study. Emotions: relax/excite, desirable self
change, real work, esteem boost.]

FIGURE 34.13. Serious Fun creates emotions to engage in
activities that players hope will change how they think, feel, and
behave or that will accomplish real work. The player experience
profile of Serious Fun includes relaxation, excitement, boosts in
self-esteem, and the satisfaction from a job well done.
Copyright © 2004-2007 XEODesign, Inc. All rights reserved.

Players  play  to  change  themselves.  Serious  Fun  is  similar  to 
Easy  Fun  in  that  players  enjoy  immersion  rather  than  challenge. 


However,  one  aspect  of  enjoyment  many  players  prefer  is 
achieving  a  purpose  outside  the  game  experience  itself  such  as 
the  ability  of  the  game  to  calm  or  excite  them.  As  in  Easy  Fun, 
players  want  the  game  to  fill  their  attention,  but  Serious  Fun  focuses 
on  play’s  effects  long  after  the  game  is  over.  The  term  Serious  Fun 
captures  the  real  effort  and  intent  that  players  expend 
to  alter  their  internal  states,  express  their  values  and  beliefs,  and 
improve  their  real-world  skills.  Without  Serious  Fun,  a  game 
leaves  few  long-lasting  effects  and  the  play  experience  feels 
more  like  a  waste  of  time.  Playing  may  be  hard  or  easy,  but  what 
is  important  to  many  players  is  that  the  effects  of  a  game  last 
long  after  the  game  is  over. 


Chain  of  Serious  Fun  Emotions  Increase  Emotions 

In  Serious  Fun,  many  players  play  to  change  how  they  feel,  and 
they  take  different  emotional  paths  to  get  there  (Fig.  34.14). 

Serious  Fun:  Enjoyment  Helps  Accomplish  Real  Work 

“I  felt  better  about  playing  [crosswords]  because  it’s  good  for  me.  If 
someone  would  tell  me  Tetris  was  good  for  me  I’d  feel  better  about 
playing  that.” 

— Ellen  on  doing  crosswords.  She  believes  the  memory  demands 
of  the  game  will  keep  her  mentally  sharp  and  delay  the  onset 
of  Alzheimer’s  disease. 

Chain  of  Serious  Fun  Emotions  Increases  Enjoyment 

Gameplay  that  changes  how  a  player  feels,  thinks,  or  behaves  creates  Serious  Fun  emotions  from  creating  something  players  value. 

Emotion:  Common  Themes  and  Triggers 

Excitement:  From  sudden  changes,  novelty,  and  challenge  (Ekman,  2003).  The 
unexpected  catches  attention.  Many  players  enjoy  the  adrenaline 
rush.  Some  want  to  raise  their  state  of  arousal;  others  like  how  the 
emotional  intensity  makes  them  more  relaxed  afterwards. 

Relaxation:  Relief  from  negative  emotion  (Ekman,  2003).  Gamers  often  start  a 
game  to  attain  relief  from  negative  emotions  and  thoughts  prior  to 
playing. 

[Diagram:  emotion  cycle  linking  frustration,  boredom,  excitement,  zen-like  focus,  and  relaxation] 

*Self-esteem,  knowledge  acquisition,  and  the  results  of  exercise  are  not  emotions,  but  changes  in  a  player's  internal  states  that  are  reported  as  desirable  results 
from  play  (Lazzaro,  2004b). 

FIGURE  34.14.  Serious  Fun  Emotion  Cycle.  A  large  part  of  the  appeal  of  games  comes  from  how 
they  change  the  player  inside.  This  is  referred  to  as  Serious  Fun.  Here  a  player  starts  in  one  of  two 
emotional  states  and  plays  a  game  to  end  up  feeling  different.  Many  gamers  appreciate  real-world 
benefits  from  play.  Some  play  games  to  become  deeply  immersed  in  a  process  that  they  hope  will 
change  them  long  after  the  game  is  over.  Copyright©  2005-2007  XEODesign,  Inc.  All  rights  reserved. 

Serious  Fun  teaches  or  accomplishes  real  work.  Learning-type 
games,  from  multiplication  to  database  administration, 
utilize  emotion  to  encourage  the  learning  of  important  content 
or  to  provide  a  mental  workout,  as  in  an  effort  to  prevent 
Alzheimer’s.  Relaxation  and  excitement  make  it  easier  for  the 
player  to  focus.  Serious  Fun  provides  emotions  and  opportuni¬ 
ties  for  success  unavailable  to  the  player  in  real  life  by  offering 
a  cheaper,  safer,  more  rewarding  experience  with  the  content  or 
activity.  Rescue  simulations  can  train  firefighters  for  situations 
that  are  too  dangerous  or  expensive  to  stage  in  real  life.  In 
educational  games  such  as  DBA  Day  (an  Oracle  workplace 
simulation  XEODesign  worked  on),  players  role-play  a  database 
administrator,  including  managing  their  own  time.  Gee  argued 
that  mastering  any  simulation  requires  mastering  the  content 
embedded  in  that  simulation.  By  making  decisions  for  the  char¬ 
acter  in  a  simulation,  he  believed  that  simulation  games  also 
teach  the  morals  and  values  of  that  character  coming  from  their 
merged  identities.  The  player  makes  the  decisions  and  the  game 
character  has  the  special  abilities  to  make  changes  in  the  vir¬ 
tual  world  (Gee,  2003).  These  provide  learning  opportunities, 
whether  it  is  values  and  actions  of  a  thief  in  Sly  Cooper  and  the 
Thievius  Raccoonus  or  a  restaurant  owner’s  time  and  people-management 
skills  in  Diner  Dash  (Lazzaro,  2005b).  Serious  Fun 
provides  real  benefits  from  play  by  using  game-like  structures  to 
reward  concentration  and  focus  attention  on  an  activity  that  is 
good  for  players.  LeapFrog’s  FLY  pen-top  computer  helps  students 
with  their  Spanish  vocabulary  by  offering  a  verbal  translation 
of  words  written  in  English;  hearing  it  spoken  in  another 
language  motivates  practice. 

Games  offer  a  new  spin  on  learning  by  capitalizing  on  player 
emotion  and  interaction  to  create  customized  training.  Mavis 
Beacon  Teaches  Typing  motivates  typists  with  typing  games, 
such  as  keeping  ants  out  of  a  picnic  basket  or  driving  a  racecar. 
In  addition  to  explicit  learning  games  (such  as  Oracle’s  DBA  Day 
or  an  unrelated  project,  Doom  DBA),  games  also  embed  techniques 
to  offer  players  more  immersive  instruction.  The  introductory 
experience  in  Halo  not  only  provides  a  seamless  introduction 
to  the  user  interface  by  integrating  it  into  the  story,  but 
also  adjusts  the  controls  to  meet  the  player’s  preferences  without 
a  dialog  box.  In  Gran  Turismo,  a  detailed  car-racing  simulation 
game,  the  opening  time  trials  break  down  racing  skills, 
such  as  cornering,  into  small,  easy-to-master  steps  that  fit  together 
to  create  a  more  enjoyable  practicing  experience.  By  providing 
a  motivating  alternative  to  accomplishing  an  otherwise 
boring  or  unmotivating  task,  Serious  Fun  helps  players  accomplish 
real-world  objectives  such  as  getting  in  shape.  Someone 
may  play  Dance  Dance  Revolution  for  the  excitement  of 
moving  to  the  music,  to  lose  weight  by  burning  calories,  or  to 
learn  the  physical  skills  and  coordination  required  to  dance 
better.  Others  play  Karaoke  Revolution  to  learn  how  to  sing. 

Games  even  make  work  fun.  Players  pay  for  the  experience  of 
being  a  waitress  (Diner  Dash),  business  owner  (Lemonade  Tycoon), 
dungeon  master  (Dungeon  Keeper),  professional  sports 
player  (Madden  Football),  or  theme-park  owner  (Rollercoaster 
Tycoon).  They  even  buy  the  opportunity  to  sort  bugs  by  color 
(Tumblebugs),  pick  up  their  rooms  (Katamari  Damacy),  or 
manage  a  city  (Sim  City).  Games  model  life  problems  (The  Sims) 
and  improve  performance  during  real  work.  In  The  ESP  Game, 
developed  at  Carnegie  Mellon  University,  people  play  a  guessing 
game  to  enliven  the  otherwise  boring  task  of  providing  text 
labels  for  images  (von  Ahn,  2004).  Idea  or  prediction  markets  use 


games  to  beat  expert  opinion  polls.  Prediction  markets  allow  participants 
to  express  their  opinions  through  buying  and  selling 
shares  with  either  virtual  or  real  money.  One  of  the  oldest,  the  University 
of  Iowa’s  Iowa  Electronic  Market,  has  allegedly  beaten  expert 
polls  in  predicting  U.S.  presidential  elections  since  1988.  Players 
of  the  Hollywood  Stock  Exchange  use  virtual  money  to  predict 
what  actor,  director,  or  film  will  receive  an  Oscar  nomination.  At 
NewsFutures,  players  compete  for  prizes  based  on  their  ability  to 
predict  news  events.  Several  companies,  such  as  Google,  use  internal 
markets  to  predict  launch  dates  and  job  openings.  Other 
companies  have  created  internal  prediction  markets  for  sales  forecasting 
(Newsweek,  2004)  and  managing  manufacturing  capacity 
(Harvard  Business  Review,  2003).  Not  without  controversy  (gambling 
is  illegal  in  many  countries),  markets  have  even  been  proposed 
to  predict  terrorist  attacks  (The  New  York  Times,  2003).  Because 
playing  prediction  markets  accomplishes  a  real  purpose,  it 
changes  how  participants  feel  about  playing  in  general.  Like  offering 
a  prize  for  a  competition  to  develop  a  solar  car,  having  real 
money  or  reputation  on  the  line  increases  the  excitement.  In  Serious 
Fun,  players  accomplish  real  work  for  many  reasons,  including 
to  meditate,  lose  weight,  label  every  image  on  the  Internet,  or 
beat  Wall  Street  predictions. 


AMUSEMENT  FROM  PEOPLE  FUN  GAMEPLAY 


Amusement  Encourages  Social  Interaction 

“It’s  the  people  that  are  addictive,  not  the  game.” 

— Bob,  a  hardcore  sports  game  player 

One  advantage  that  a  computer  game  has  is  that  it  is  ready  to 
play  when  you  are.  No  friend  required.  Still,  for  many  people  it 
is  the  experience  of  playing  with  friends  or  family  that  makes 
play  worthwhile.  Group  gaming  provides  a  mechanism  for  so¬ 
cial  interaction,  a  quick  excuse  to  invite  friends  over,  and  more 
challenges  to  gameplay.  People  Fun  creates  amusement  between 
gamers  as  they  play  to  spend  time  with  their  friends.  People 
report  playing  games  when  their  friends  do,  playing  games 
they  do  not  like,  and  even  playing  types  of  games  that  they 
do  not  like,  just  to  spend  more  time  with  their 
friends  (Lazzaro,  2004b). 

“Since  we  lost  half  our  guild  to  Star  Wars  Galaxies  it’s  not  as  fun.” 

— A  Hard  Core  Gamer  Playing  Dark  Age  of  Camelot 

People  Fun  is  a  self-motivating  activity  that  maintains  player 
engagement  by  supporting  interaction  with  other  people  (Fig. 
34.15).  People  Fun  encourages  interaction  with  other  players 
as  they  establish  social  hierarchies,  joke,  and  develop  social 
bonds.  It  creates  affect  by  providing  opportunities  to  cooperate, 
compete,  and  watch  others  play.  Some  emotions,  such  as  gratitude, 
generosity,  and  schadenfreude,  the  German  word  for 
“pleasure  at  the  misfortune  of  a  rival”  (Ekman,  2003),  cannot  occur 
while  playing  alone.  When  people  play  together  they  invent 
new  ways  to  interact,  develop  house  rules,  and  add  their  own 
content  to  create  more  pleasure  for  themselves  and  their 
friends. 
People  Fun 

Gameplay  that  involves  others  to  create  emotion  between  players 
and  increase  social  bonds. 

Game  Features:  cooperate,  compete,  communicate,  perform,  spectacle,  characters,  personalize 

Feelings:  amusement,  schadenfreude,  naches,  generosity,  gratitude,  elevation 


FIGURE  34.15.  People  Fun  creates  emotions  that  increase  a 
player’s  enjoyment  from  social  interaction.  The  player  experience 
profile  of  People  Fun  includes  many  emotions  that  require  two 
people  such  as  Schadenfreude,  amusement,  naches,  and  grati¬ 
tude.  Copyright©  2004-2007  XEODesign,  Inc.  All  rights  reserved. 


The  most  emotion  from  games  arises  when  people  play  to¬ 
gether  in  the  same  room.  Whether  playing  in  a  living  room  or 
cyber  cafe,  the  frequency,  duration,  variety,  and  intensity  of  emo¬ 
tional  displays  all  increase  (Lazzaro,  2004b).  The  emotions  of 
schadenfreude,  amusement,  and  naches  (Yiddish  for  the  plea¬ 
sure  and  pride  felt  when  a  child  or  mentee  succeeds)  occur 
more  frequently  when  playing  with  others.  Playing  over  the  In¬ 
ternet,  even  with  video  or  voice  chat,  elicits  fewer  emotional 
displays  than  when  playing  in  the  same  room.  A  major  reason 
behind  this  increased  emotion  is  that  being  in  the  same  room  al¬ 
lows  for  additional  interaction  between  players  beyond  what  is 
available  in  the  game  alone.  Players  exchange  insults  and  outdo 
each  other  with  witty  commentary.  They  add  content  and  new 
rules  as  real  time  presence  allows  for  more  flexibility  (Lazzaro, 
2004b).  Considered  in  light  of  Ekman’s  nine  pathways  to  creat¬ 
ing  emotions  (Fig.  34.16),  People  Fun  uses  more  of  them  than 
Hard  Fun,  Easy  Fun,  or  Serious  Fun.  In  fact,  the  last  five  out  of 
the  nine  in  Fig.  34.16  are  enhanced  in  group  face-to-face  play. 


People  Fun:  Creates  More  Emotion  When  Playing 
Face  to  Face 

“Enjoyment  when  your  friends  blow  up.” 

— Pat,  a  PS2  gamer  on  his  favorite  game  emotion 

The  involvement  of  others  increases  the  complexity  of  the 
game  by  creating  cooperation  and  competition,  which  in  turn  in¬ 
creases  the  emotions  from  play.  Different  roles  for  players  with 
shared  and  opposing  objectives  increase  player  interaction. 
Achieving  a  goal  with  a  simple  rule  set  becomes  exponentially 
more  complex  when  a  player’s  opponent  is  another  person  or 
group.  Not  all  players  like  the  feelings  of  rivalry  from  competition 
with  their  friends.  Many  prefer  to  team  up  with  their  buddies  to 


Nine  Pathways  to  Emotions  (Ekman,  2004),  Each  with  an  Example  from  Games 

1.  Auto  appraisal:  Explosions,  fire,  and  boiling  lava 
2.  Reflective  appraisal:  Thinking  about  last  night’s  dungeon  raid 
3.  Memory  of  an  emotion:  Remembering  falling  from  a  cliff 
4.  Imagination:  Thinking  about  what  happens  from  falling  off  a  cliff 
5.  Talking  about:  Discussing  game  events  with  other  players 
6.  Empathy  or  witnessing  another’s  emotion:  Another  player’s  facial  expression  or  character  emote 
7.  Instruction  by  others  on  how  to  feel:  Another  player’s  assessment  of  an  event 
8.  Violation  of  social  norm:  Driving  over  other  players  instead  of  racing  against  them 
9.  Making  facial  expression  of  an  emotion:  Smiling  and  laughing  after  defeat  in  front  of  friends  makes  it  feel  more  positive 

FIGURE  34.16.  There  are  nine  ways  to  create  emotions  in  a  person. 


compete  against  computer-controlled  characters.  To  encourage 
group  formation,  Massively  Multiplayer  Online  Games  (MMOs)  often 
have  interdependent  classes  of  players,  a  design  that  comes 
from  the  Dungeons  and  Dragons  paper-based  role-playing  game. 
In  these  games,  such  as  WOW,  character  classes  encourage  the  for¬ 
mation  of  groups,  and  treasure  quests  and  dangerous  dungeons 
require  players  to  pool  their  characters’  specializations  such  as 
archers,  swordsmen,  hunters,  paladins,  wizards,  and  healers.  Im¬ 
proved  success  rates  and  score  bonuses  reward  players  working 
together  in  the  game  and  help  form  the  social  bonds  that  increase 
a  player’s  emotional  attachment  to  the  game  (Fig.  34.17). 

People  Fun  keeps  players  going  by  providing  new  ways  for 
players  to  interact.  It  provides  a  variety  of  player-to-player  outcomes 
and  bright  spectacles  for  the  audience.  It  provides  the  opportunity 
to  perform  difficult  maneuvers  and  stunts  that  reward 
practice  and  that  amuse  and  amaze  friends.  In  Soul  Calibur 
II,  players  interact  with  each  other  by  fighting  rather  than  racing 
side  by  side  to  beat  the  clock.  The  frequency  and  the  variety  of 
player-to-player  interaction  increase  commentary  between  players 
and  make  it  more  enjoyable  for  an  audience  to  watch.  Spectators 
can  participate  throughout  with  comments  and  criticisms.  Peo¬ 
ple  Fun  games  with  hidden  aspects,  special  moves,  cheat  codes, 
and  Easter  Eggs  (hidden  games,  objects,  rooms,  or  animations)  of¬ 
fer  even  more  opportunity  to  impress  friends.  All  of  these  create 
admiration  and  envy  between  players  (Lazzaro,  2004b). 

Some  Emotions  Require  Two  People 

People  Fun  wins  out  over  other  types  of  gameplay  with  its  ex¬ 
clusive  access  to  emotions  that  require  two  people,  such  as 
schadenfreude  and  naches.  The  most  frequent  emotion  in  co¬ 
located  group  gaming  is  amusement.  In  a  group  context,  even 
negative  events  are  sources  of  laughter.  The  feeling  of  naches 
Chain  of  People  Fun  Emotions  Increases  Enjoyment  between  Players 

People  Fun  motivates  group  interaction  and  interpersonal  relationships,  and  creates  emotions  between  players. 

Emotion:  Common  Themes  and  Triggers 

Amusement:  Something  that’s  funny  (Ekman,  2003).  Regardless  of  the  emotional  tenor  of  the  game, 
people  in  groups  laugh  more  than  when  playing  alone. 

Schadenfreude*:  Pleasure  over  misfortune  of  a  rival  (Ekman,  2003).  Competitive  players  enjoy  beating 
each  other,  especially  a  long-term  rival.  Boasts  are  made  about  player  prowess  and 
ranking. 

Naches*:  Pleasure  or  pride  at  the  accomplishment  of  a  child  or  mentee  (Ekman,  2003).  Players 
enjoy  seeing  friends  succeed  as  well  as  the  achievements  of  nonplayer  characters  they 
have  helped,  such  as  in  Back  to  Baghdad,  where  players  can  exchange  health  packs. 

[Diagram:  chain  linking  generosity,  gratitude,  and  elevation] 

*Schadenfreude  and  naches  do  not  have  universal  facial  gestures;  however,  they  are  frequently  described  by  players  (Lazzaro,  2004b). 

FIGURE  34.17.  People  Fun  Emotion  Cycle.  In  People  Fun,  some  emotions  link  together  like  a 
chain  between  players,  such  as  offering  a  healing  spell  or  health  pack  (generosity)  to  a  fellow 
player  in  need  (gratitude).  A  third  player  feels  elevation  at  witnessing  this  human  kindness  and 
becomes  more  likely  to  be  generous.  Next  time  a  different  player  may  be  in  need  and  the  links 
on  this  chain  would  rotate.  In  this  way,  one  game  mechanic  offers  three  emotions  depending  on 
who  starts  the  chain.  Copyright©  2005-2007  XEODesign,  Inc.  All  rights  reserved. 


and  emotional  attachment  created  in  Nintendogs  (a  puppy  sim¬ 
ulation  game  for  the  Nintendo  DS)  are  so  strong  that  they  not 
only  created  a  best-selling  title,  but  the  title’s  popularity  in¬ 
creased  the  sales  of  the  hardware  platform  as  well. 

People  Fun  in  games  is  also  present  in  interaction  with  char¬ 
acters  on  screen.  It  enhances  emotions  with  facial  expressions 
for  nonplayer  characters  (NPCs)  as  well  as  for  player-controlled 
characters.  In  Diner  Dash,  players  not  only  have  to  react  to 
changes  in  characters’  facial  expressions,  but  winning  the  game  also 
requires  managing  customer  emotions  to  prevent  them  from  getting 
so  angry  that  they  leave  (Lazzaro,  2005b).  Games  such  as 
World  of  Warcraft  increase  emotions  between  players  by  offering 
emotes  and  chat  features.  Adding  voice  and  video  communication 
between  players  increases  emotions,  and  co-located 
group  play  takes  this  one  step  further  by  supporting  face-to-face 
collaboration  in  addition  to  face-to-screen. 

Players  Modify  Games  to  Enhance  Emotions 
During  Group  Play 

In  People  Fun,  players  like  playing  together  so  much  that  they  cre¬ 
ate  house  rules  and  convert  single  player  games  to  make  group 
play  more  enjoyable.  In  order  to  play  a  Buzz  Lightyear  game  to¬ 
gether,  one  family  assigns  each  person  a  different  key  to  run, 
jump,  or  shoot.  To  share  in  a  pirate  adventure,  one  set  of  three  college 
students  plays  Sid  Meier’s  Pirates!  game  by  taking  turns.  In  a 
bike-racing  game  where  players  outnumbered  the  controllers,  the 


winner  is  the  player  who  “ran  the  gauntlet”  and  beat  all  challengers 
in  the  room,  not  by  winning  the  race,  as  that  would  take  too  long, 
but  by  outdistancing  each  opponent.  This  said,  some  players  es¬ 
chew  contact  with  other  people  if  they  mostly  care  about  devel¬ 
oping  strategy  or  understanding  how  to  play  better.  To  them,  play¬ 
ing  a  game  such  as  Hearts  online  is  better  when  played  alone 
because  they  do  not  want  to  chat  or  posture  with  players  that  they 
do  not  know.  Such  players  prefer  the  Hard  Fun  even  in  multiplayer 
games.  Whether  it  is  WOW  or  Spades,  and  depending  on  their 
mood,  some  end  up  soloing  rather  than  joining  a  group. 


EMOTION  FROM  ENTERTAINMENT 
INTERFACES  AND  INTERACTIVE  PRODUCTS 


The  Four  Fun  Keys  (Hard  Fun,  Easy  Fun,  Serious  Fun,  and  People 
Fun)  play  an  important  role  in  making  games  self-motivating. 
They  provide  the  opportunity  for  challenge  and  mastery,  the 
prospect  of  exploration  and  discovery,  a  boost  to  self-esteem, 
and  an  excuse  to  spend  time  with  friends.  The  gameplay 
in  games  capitalizes  on  the  interaction  between  human  emo¬ 
tion,  thought,  and  behavior.  From  this  perspective,  productiv¬ 
ity  applications  have  the  same  goal.  Productivity  applications 
take  for  granted  that  the  user  provides  the  motivation  for  using 
the  tool,  such  as  workplace  responsibilities  or  a  boss’s  dead¬ 
line.  The  compelling  and  enjoyable  nature  of  computer  games 
makes  the  process  itself  enjoyable,  so  game  designers  spend  a 
great  deal  of  design  effort  to  ensure  that  their  games  provide  a 
rich  emotional  experience  during  play.  Study  of  these  interactions 
offers  clues  as  to  how  to  make  productivity  tools  more  motivating 
without  making  them  games.  The  importance  of  emotions 
in  design  is  that  emotions  can  influence  when,  how,  and  why 
to  use  a  product. 

Game-Inspired  Affect  Makes  Product 
Use  More  Entertaining 

Games  offer  many  lessons  for  product-interaction  design  outside 
of  entertainment,  especially  in  the  relationship  between  design 
and  emotion.  Software  should  make  tasks  easy  by  requiring  just 
a  few  steps  and  providing  appropriate  emotional  stimuli  to  focus 
and  reward  the  user  for  completing  them.  One  of  the  biggest  differences 
between  productivity  software  and  software  used  for 
entertainment  is  that,  in  productivity  software,  the  pleasure  and  motivation  come  from 
accomplishing  outside  tasks,  such  as  completing  a  spreadsheet 
or  creating  a  database.  Emotions  in  nongames  should  create  enjoyment, 
aid  decision  making,  focus  effort  and  attention,  and 
provide  motivation.  As  with  games,  it  is  possible  to  design  interactions 
that  are  similar  to  all  four  types  of  fun  to  create  a  wide  spectrum 
of  emotions  in  nongame  software  (Fig.  34.18). 

Playful  Product  Attributes  Increase  Emotions  During  Use 

Many  products  employ  novel  opening  techniques  to  increase 
customer  appeal.  Beyond  “out-of-the-box”  experiences,  the  ex¬ 
perience  of  the  product  in  use  is  becoming  an  important  part 
of  a  product’s  competitive  advantage.  Similar  to  the  way  pack¬ 
aging  frames  consumer  expectations  and  emotional  associations 


[Table:  Creating  Affect  Through  Actions  for  Nongames.  Columns  include  Hard  Fun  and  Easy  Fun;  surviving  entries  include  searching,  time  management,  cell  phones,  learning  tools,  management  applications,  and  cleaning  products.] 

FIGURE  34.18.  The  affect  from  the  Four  Fun  Keys  can  increase 
emotions  and  enjoyment  for  several  kinds  of  productivity-software 
applications  and  consumer  products.  Copyright© 
2004-2007  XEODesign,  Inc.  All  rights  reserved. 


with  a  product,  how  a  product  opens  or  switches  on  can  create 
emotions  and  other  associations  every  time  the  product  is  used. 
For  example,  Danger’s  Sidekick  cell-phone  lid  spins  open  horizontally 
with  a  snap.  The  Robo-Book  Light  surprises  by  unfolding 
like  a  mechanical  arm.  The  nozzle  on  the  All  liquid  laundry  soap 
dispenser  features  a  bright  red  rubber  ball  to  squeeze  for  detergent. 
The  designers  at  IDEO  transformed  the  tired  old  carpet 
sweeper  to  create  the  Swiffer  Carpet  Flick.  The  Carpet  Flick 
combines  a  free-flowing  universal  joint  at  the  head  (Easy  Fun),  a 
transparent  body  so  customers  miss  none  of  the  action  (Hard 
Fun’s  feedback  on  goals),  and  the  ability  to  accomplish  real  work  in  an  enjoyable 
way  (Serious  Fun).  The  buttons  on  some  productivity 
software  offer  Easy  Fun  if  they  glow  or  highlight  when  moused 
over.  A  hard  disk  defragmenting  utility  displays  progress  with  an 
animation  that  is  mesmerizing  and  game-like  to  watch,  as  the 
colored  segments  are  organized  and  grouped  together.  None  of 
these  products  are  games,  yet  all  of  them  offer  Easy  Fun  with 
novelty  to  inspire  curiosity,  surprise,  and  a  little  wonder.  Re¬ 
gardless  of  whether  it  is  a  productivity  application  or  a  trash  can, 
if  it  creates  enjoyable  emotions,  it  will  be  used  more  often  and 
shown  off  to  friends  (People  Fun). 

Breath-relaxation  techniques  are  more  effective  if  practiced 
daily.  To  encourage  frequent  use,  the  Stresseraser 
(a  home-therapy  device  that  provides  biofeedback  on 
the  user’s  pulse  and  breathing  exercises)  uses  both  Hard  Fun 
and  Serious  Fun.  The  device  rewards  customers  with  a  point 
system  and  game-like  chart  feedback  of  their  pulse  data.  The 
use  of  a  display  graph  and  a  gradual  increase  in  number  of 
points  earned  per  session  encourages  use  by  offering  Hard  Fun 
and  Serious  Fun  at  the  same  time. 

“Why  not  challenge  yourself  to  see  how  good  you  can  make  yourself  feel 
by  getting  100  or  more  points  a  day  for  two  months?” 

“A  session  of  30  or  more  points  is  a  great  way  to  start  the  day  off  right. 
Especially  on  days  when  you  feel  like  you  got  up  on  the  wrong  side  of 
the  bed.” 

— (www.stresseraser.com) 

The  catalog  copy  on  the  company’s  website  frames  customer 
expectations  in  terms  of  Hard  Fun.  The  game-like  intent  of  the 
product  designers  is  even  more  evident  in  unreleased  game 
modules  where  customers  float  a  bird  around  obstacles  as  a  reward 
for  reducing  their  pulse  (Fabricant,  2005).  Their  approach  to 
relaxation  is  not  about  building  frustration  in  order  to  eventually  succeed 
through  skill  and  feel  fiero.  Instead,  with  the  Stresseraser,  as 
in  the  Interactive  Institute’s  Brain  Ball  game  (Hjelm,  2003),  people 
win  by  relaxing  and  taking  deep  breaths.  For  these  products, 
goals  and  feedback  allow  players  to  monitor  and  change  their 
own  emotional  state  for  Hard  Fun  and  Serious  Fun. 


Designing  Game  Affect  for  Entertainment  Access 

Playful  Interaction  and  Feedback  Facilitate 
Entertainment  Access 

Game-inspired  affect  increases  the  appeal  of  entertainment  ap¬ 
plications  such  as  televisions,  game  consoles,  e-zines,  and  music 
players.  Because  the  primary  customer  goal  is  to  locate  content, 
accessing  entertainment  options  need  not  be  a  game  in  itself. 
However,  entertainment  devices  can  have  a  game-like  feel.  They 
should  be  even  easier  to  use  than  productivity  applications,  be¬ 
cause  they  don’t  borrow  motivation  from  completing  an  outside 
task.  They  should  also  create  excitement  about  the  entertain¬ 
ment  options  they  provide. 

Since  the  beginning,  graphical  user  interfaces  have  employed 
metaphors  such  as  windows,  trashcans,  and  elevators  to  explain 
how  features  work.  Apple’s  Macintosh  OS  X  operating  system  established 
a  new  trend  in  modern  interface  design  by  going  beyond 
the  clear  explanation  of  function  to  creating  an  experience. 
Mac  OS  X  is  an  interface  that  is  “fun”  to  use.  The  dock 
magnifies  icons  under  the  cursor  to  make  them  easier  to  select, 
at  the  same  time  creating  pleasurable  animation  that  is  pure 
Easy  Fun,  even  without  the  goal  of  opening  an  application.  The 
genie-out-of-a-bottle  animation  of  document  windows  as  they 
expand  out  of  a  tiny  icon  on  the  dock  offers  more  Easy  Fun  to 
create  curiosity,  surprise,  and  wonder.  Operating  these  features 
entertains  all  by  itself. 

Creating  Emotions  in  Entertainment  Access: 
Apple’s  iPod  Case  Study 

The  most  game-inspired  device  to  come  out  of  Apple  is  the 
iPod.  The  iPod’s  novel  combination  of  hardware  and  software 
interfaces  creates  a  new  experience  full  of  affect.  The  novel  in¬ 
teraction  facilitates  entertainment  access  by  offering  a  customer 
experience  unlike  any  other.  While  not  technically  a  game,  the 
iPod  uses  Hard  Fun,  Easy  Fun,  Serious  Fun,  and  People  Fun  to 
create  emotional  experiences  for  the  user. 

Central  to  the  user  experience  of  the  iPod  is  the  control 
wheel.  It  attracts  attention  and  creates  emotions  of  Easy  Fun  such 
as  curiosity  and  surprise.  Its  novel  round  motion  easily  scrolls 
through  long  play  lists.  Emotions  arising  from  exploring  the 
free-flowing  motion  focus  attention  on  finding  a  song.  The  quick 
feedback  (including  a  separate  speaker  just  for  the  scroll  clicks) 
enhances  the  sense  of  mastery  (Hard  Fun)  as  a  customer  gains 
control.  It  feels  like  a  game.  Even  without  pressing  play,  scrolling 
is  enjoyable  because  using  the  four  directional  buttons  and  wheel 
mimics  a  game  controller.  In  addition  to  the  form  resembling  a 
stereo  speaker,  the  circular  DJ  Scratching  action  takes  advantage 
of  an  action  already  associated  with  making  music  more  excit¬ 
ing  and  even  offers  an  opportunity  to  role  play  for  more  Easy 
Fun.  The  music  visualizations  on  the  companion  iTunes  software 
offer  fascinating  animations  in  time  with  the  music. 

A  game  of  Bricks  (similar  to  Breakout)  shipped  as  an  Easter 
egg  (hidden  game)  in  the  first  edition  and  later  more  elaborate 
games  such  as  Zuma  under  their  own  games  tab  further  es¬ 
tablished  the  iPod’s  connection  to  fun  and  enjoyment.  As  easy 
as  the  iPod  is  to  use,  users  must  experiment  and  might  expe¬ 
rience  frustration  in  learning  to  understand  the  menu  hierar¬ 
chy.  This  offers  opportunities  for  Hard  Fun  as  customers 
search for their favorite track. Users may experience a feeling
of accomplishment rather than fiero when they succeed, but
the dancing black silhouette models in the iPod’s ad campaigns,
frolicking with arms overhead, clearly evoke fiero and other
kinds of joy. That the iPod offers Serious Fun of mood-altering
music and videos is a given. For additional Serious Fun, iPods
also  play  business-audio  books  and  offer  file  storage  to  do  the 
real  work  of  transporting  documents  and  spreadsheets  along 
with music files. The trademark white headphones offer emotions
from People Fun, even while separating the wearer from others
auditorily, by sending a clear message about the fun-loving social
group to which an iPod customer belongs. These status-symbol
icons  create  People  Fun  emotions  from  association  with  be¬ 
longing  to  the  “in  crowd”  as  well  as  the  emotions  expressed  in 
the  ads.  Several  iPod  accessories  allow  customers  to  share  their 
music  during  playback.  And  iTunes’  trading  of  play  lists,  cre¬ 
ation  of  custom  CDs,  and  podcasting  all  provide  ways  for  iPod 
users  to  reach  out  to  other  people  for  more  People  Fun. 

Creating  Emotions  in  Entertainment  Access: 

Microsoft’s XBOX 360 Case Study

Ready  to  Inhale.  The  designers  of  the  XBOX  360  video 
game  platform  had  a  specific  customer  experience  in  mind. 
Whereas the design inspiration for the first XBOX was like the
Hulk  bulging  with  muscles  to  break  out  of  a  box,  the  XBOX  360 
was  to  be  more  of  an  inhale,  like  Bruce  Lee  drawing  breath  in 
preparation  for  play,  poised,  and  ready  to  strike.  Instead  of  a  file- 
and-folder  metaphor,  the  main  UI  for  the  video-game  console 
is  designed  to  capture  a  moment  of  preparation  before  the  chal¬ 
lenge.  The  idea  was  to  avoid  fussy  animations,  yet  provide 
something  interesting  and  intuitive  to  get  the  player  emotion¬ 
ally  ready  and  mentally  focused  before  play  begins.  The  Easy 
Fun  of  navigating  the  menus  increases  immersion.  The  emo¬ 
tions  for  this  interface  match  the  player’s  goals  during  this  part 
of  the  use  cycle. 

In  terms  of  Easy  Fun,  the  menus  for  the  XBOX  360’s  main  UI 
curve  sideways  like  rib  bones  and  mimic  the  gesture  of  inhala¬ 
tion  as  they  move.  The  menus  slip  from  side  to  side  in  a  breath- 
ing-like  motion.  The  sounds  are  organic  and  whip-like  with  a  sat¬ 
isfying  thump  at  the  end  as  the  menu  snaps  into  place.  This 
novel  auditory  and  visual  experience  makes  the  menus  enjoyable 
to  explore  without  becoming  too  complicated  or  intrusive.  Nov¬ 
elty  of  the  sideways  menus  creates  curiosity  and  encourages  ex¬ 
ploration.  The  designers  were  going  for  playful  (the  attitude  in 
games) without being so entertaining as to be a game in itself. It
is  about  providing  very  simple  access  and  a  pleasurable  experi¬ 
ence  that  creates  excitement  but  doesn’t  get  in  the  way. 

The  most  important  way  the  XBOX  360  increases  enjoyment 
of  games  played  on  its  console  is  through  the  use  of  People 
Fun.  To  deliver  more  People  Fun  emotions  and  to  foster  con¬ 
nections  between  players,  the  XBOX  360  offers  community  fea¬ 
tures  for  all  its  titles.  Gamers  can  create  their  own  persistent 
player  profile  (or  gamer  card)  that  spans  game  titles.  Personal 
profiles  provide  an  in-game  identity  and  list  in-game  accom¬ 
plishments  so  friends  can  compare  game-specific  achieve¬ 
ments,  scores,  and  what  game  they  are  currently  playing.  For 
the  nongamers  in  the  household,  the  XBOX  360  allows  access 
and  display  of  family  photos  and  video  for  a  different  kind  of  so¬ 
cial  interaction. 

The emotions in these user experiences for iPods and XBOX
360s do more than help customers accomplish tasks. The UIs create
experiences that make the devices pleasurable to use, put users
in the appropriate emotional and cognitive mindset, and capture
the  next  level  of  the  four  pleasures  that  Jordan  considers  essen¬ 
tial  in  the  new  human  factors:  (a)  Physio-pleasure,  (b)  Psycho¬ 
pleasure,  (c)  Ideo-pleasure,  and  (d)  Socio-pleasure  (Jordan, 
2000).  As  the  focus  for  HCI  shifts  from  interface  design  to  user 
experience,  more  accurate  taxonomies  for  internal  experiences 
and  methods  to  measure  these  experiences  are  needed.  The  de¬ 
velopment  of  these  tools  will  facilitate  the  design  of  more  en¬ 
joyable  player  experiences.  While  entertainment  interfaces  such 
as  the  iPod  and  the  XBOX  360  have  done  much  to  create  pow¬ 
erful  user  experiences  on  several  levels,  there  is  still  much  more 
that  is  possible  to  do. 


NEW  DIRECTIONS  AND  OPEN  ISSUES 


New  Play  With  Smaller,  Smarter,  More  Flexible  Devices 

As  technology  shrinks  and  integrates  itself  into  more  aspects  of 
our  lives,  people  play  more  games  in  more  places.  Devices  that 
are  smaller,  smarter,  mobile,  and  contextually  aware  create  new 
opportunities  for  electronic  gaming,  which  until  now  has  largely 
been  restricted  to  players  at  home  around  a  single  screen.  What 
happens  when  the  services  offered  by  a  laptop  become  cheap 
enough  to  print  on  a  candy  wrapper?  Mobile  and  alternate-real¬ 
ity  gaming  offer  the  opportunity  to  make  things  happen  in  the 
real  world,  which  further  enhances  other  experiences  such  as 
enjoying  the  company  of  friends  and  dispelling  boredom  while 
waiting  in  line.  Games  break  out  into  the  real  world  through 
geo-caching and games of tag through a city. Promotional Alternate
Reality Games (ARGs), such as The Beast for the movie
A.I. and I Love Bees for the game Halo 2, use real-world,
web-enhanced gameplay to market products and create communities
of  millions  of  players  (4ourty2wo,  2005).  With  ubiquitous  com¬ 
puting,  everything  from  mobile  phones,  to  the  front  door,  to  a 
ketchup  bottle  in  a  diner  could  contain  enough  smarts  to  offer 
services.  Will  they  all  contain  a  screen  and  therefore  the  potential 
to  host  a  game?  Will  we  surf  the  net  from  our  saltshaker  or  will 
it  provide  other  opportunities  to  engage  our  attention? 

Many  designers  chase  faster  hardware  and  better  graphics, 
yet  the  emotional  power  of  games  does  not  occur  on  screen;  it 
takes place inside the player’s head. The biggest emotions that
new technology creates will not come from rendering blood, sweat,
and tears in molecular detail. The real changes in emotional gaming
will happen through supporting a more agile design process
and  by  making  gaming  devices  smaller,  sharable,  context-aware, 
and  more  accessible,  such  as  game  controllers  with  fewer  than 
12  buttons  like  Nintendo’s  Wii.  Until  we  all  get  personal 
holodecks  and  natural  language  processing,  the  success  of  many 
new  kinds  of  games  will  come  from  connecting  the  cheapest 
emotion  generators  around:  a  player’s  rivals  and  friends. 

Life is becoming more game-like as games and elements of
play  inspire  new  dimensions  of  product  appeal.  Like  games, 
products  create  experiences.  Cutting-edge  product  designers 
now  design  customer  experiences,  not  products,  as  witnessed 
by  the  iPod.  Designers  aim  for  engagement  in  addition  to  mak¬ 
ing  something  better/faster/cheaper  and  easier  to  use  or  market. 
Adding playful elements to goods and services increases the
attraction of everything from advertising messages, to Southwest
Airlines’ in-flight safety announcements, to the design of public
spaces. We see the increasing importance of emotions in the design
of products such as the playful squid shape of the Philippe Starck
orange juicer, the surprising opening rotation of Danger’s Sidekick
mobile phone, and the pleasing octave chords produced
by Segway’s acoustically designed motors. The shift toward more
emotional design is even more apparent in games.

Games  are  already  redesigning  how  we  work  and  shop.  Em¬ 
ployers  use  games  to  screen  potential  hires  for  three-dimen¬ 
sional  (3D)  reasoning  skills  and  train  them  to  solve  problems 
with multiple variables. Games have changed consumer processes
such as buying and selling on eBay or the dining experience at
Dave and Buster’s (Chuck E. Cheese for adults). Even web software
applications like Flickr (www.flickr.com) include more fun to increase
appeal by sharing items between friends or by offering
features to be “gamed,” such as displaying the number of
friends a person has in the Friendster or LinkedIn social-networking
software (www.friendster.com, www.linkedin.com). Understanding
how games create emotions makes products and services
more engaging and enjoyable and even improves community
and the quality of life.

Open  Issues 

Better  design  of  the  emotions  required  for  and  that  result  from 
interaction  will  unlock  more  human  potential  from  using  prod¬ 
ucts  and  games  by  priming  and  rewarding  users  with  appropri¬ 
ate  affective  states.  We  know  that  cognition  and  emotion  walk 
hand  in  hand.  We  are  only  just  beginning  to  discover  how  they 
support  each  other  and  how  interaction  creates  emotion.  The 
Four  Fun  Keys  model  starts  this  journey  to  build  and  examine 
player  experiences  in  terms  of  specific  emotions  such  as  fiero 
from  Hard  Fun,  curiosity  from  Easy  Fun,  relaxation  from  Serious 
Fun,  and  amusement  from  People  Fun.  Further  measurement  of 
specific  emotions  during  different  types  of  cognitive,  behavioral, 
and  emotional  experiences  will  help  piece  together  the  role 
individual  emotions  play  in  cognition  and  activities.  For  example, 
a  surprising  event  orients  an  individual’s  attention  to  determine 
whether  the  source  is  a  benefit  or  a  threat.  Surprise  does  this 
in  addition  to  creating  internal  sensations,  producing  facial  ex¬ 
pressions,  and  communicating  information  about  the  source 
to  others.  Curiosity,  the  lead  emotion  from  Easy  Fun,  has  a 
strong  cognitive  component,  which  also  focuses  attention  in  a 
pleasurable way. Satisfying curiosity, especially when it results
in feelings of surprise and wonder, produces strong pleasurable
sensations that motivate an individual to repeat the activity. Additional
research will tell us how to create designs that inspire,
maintain, and intensify feelings of curiosity. This is important for
applications such as Google (www.google.com), to improve searching
the Internet, or for browsing an e-commerce catalog.
Conversely, intensifying fiero from finding the search item also
improves the user experience. Informed by a deep understanding
of how curiosity and frustration employ different affective
states to focus attention, the design of entertainment and
productivity products can achieve the next level of engagement
by sculpting emotional responses that complement the tasks.



CONCLUSION 


Games  have  the  creative  freedom  to  push  technical  boundaries 
and  be  light  years  ahead  of  productivity  interfaces.  They  have  been 
champions  and  early  adopters  of  new  interface  techniques  from 
joysticks  to  voice  commands.  Game  interfaces  lasso  new  hardware 
and experiment with dialogs and menus to deliver novel experiences
with a vigor never seen in productivity applications. Mastering
these innovative interaction techniques provides richer experiences
that inspire a devotion to learning features. Radial
menus in The Sims, audio menus, and draw-your-own-game interfaces
in LeapFrog’s FLY pen-top computer, the Line Rider web
game, the iPod’s scroll wheel, and the XBOX 360’s organic side-scrolling
tabs are all examples of original alternatives to navigating
file-and-folder hierarchies and dialog boxes. Games boast
heads-up displays and voice commands to allow players to multitask
(Tom Clancy’s Rainbow Six on the XBOX) or sing (Karaoke
Revolution). Still other games use camera interfaces that track
the body’s movement (EyeToy games, Virus Attack for camera
phones), physical motion (Dance Dance Revolution dance pad,
Wii Sports), and touch and gestures (Black and White, Nintendogs,
Yoshi’s Touch and Go, Electroplankton, Pac-Pix). Others use
positional sensing/ubiquitous computing and motion sensors
(WarioWare: Twisted). Games offer rich social systems for interpersonal
collaboration and communication (WoW, EverQuest).
Games even prototype future interface technology such as context-sensitive
holograms to encourage cursor exploration (Star
Wars Commando). The interface for each of these technologies
supports a fresh experience.

In  addition,  games  offer  an  emotional  punch  that  has  be¬ 
come  a  cultural  force.  The  automotive  industry  already  consults 
racing-game  designers  on  how  to  make  more  exciting  cars.  U.S. 
politicians  such  as  Howard  Dean  use  web  games  to  teach 
democracy  and  increase  interest  in  their  campaigns.  Games  are 
not  only  innovating  interface  design,  they  are  also  full  of  emo¬ 
tional  experiments.  For  example,  in  Fable,  the  moral  values  ex¬ 
pressed  by  player  decisions  change  the  character  into  a  shin¬ 
ing,  blonde-haired  hero  or  demonic  devil  with  horns.  Although 
not  everyone  plays  computer  games,  they  command  a  profound 
influence  on  other  media  and  culture.  Additionally,  the  genera¬ 
tion  raised  on  games  (today’s  college  students  all  played  Ore¬ 
gon  Trail  in  grade  school)  will  play  more  games  as  adults  than 
their  parents  did. 

Games’ distinct emotional processes come from what players
like most during play, which informs the functional design of
products and software. Unfortunately, the sole emotional goal
addressed by usability is to reduce frustration. While important,
this  falls  short  of  offering  strong  emotional  rewards  for  accom¬ 
plishing  difficult  tasks.  Crafting  these  emotions  contributes 
to  user  success  because,  when  inspired  by  rich  emotional  re¬ 
sponses,  users  will  explore  and  learn  more  of  an  application’s 
features,  making  them  more  efficient  at  their  jobs.  Over-engi¬ 
neering  a  task  by  making  it  too  predictable,  repetitive,  and  easy 
to  complete  increases  the  likelihood  of  boredom  and  actually 
reduces  satisfaction  over  the  long  term.  Players  and  workers  ex¬ 
perience  huge  emotional  rewards  for  completing  complex  chal¬ 
lenging  tasks.  It  would  be  unfortunate  if  these  were  eliminated 
from  the  work  we  do  as  humans. 


In  HCI,  the  big  mistake  used  to  be  fixing  clumsy  complex  fea¬ 
tures  with  pretty  background  graphics.  Today,  the  big  mistake  lies 
in  believing  that  user  experiences  will  improve  if  designers  re¬ 
move  frustration  points  (usability),  have  the  interface  do  all  the 
work,  and  reduce  the  task  to  a  series  of  trivial  steps.  Customers 
already  have  emotional  reactions  to  their  software.  Designers 
must  learn  to  speak  this  language  of  emotion  from  interaction. 

The  role  of  a  good  interface  is  to  prepare  users  to  master 
something  difficult,  then  allow  them  to  give  themselves  credit 
for  mastering  the  difficult  skill,  while  leaving  enough  ambiguity 
and  challenge  to  make  the  task  fun.  What’s  next  for  user-expe¬ 
rience  design  goes  beyond  refining  features  to  logically  support 
how users perform tasks. Design must also address how features
motivate users through affective states to support and
refresh the task at hand, not by making the software unnecessarily
complicated, but by doing what games do: support sequential
skill-building to achieve complex goals and encourage
players  to  move  beyond  their  points  of  failure  to  feel  empow¬ 
ered  masters  of  their  own  destiny.  Experience  designers  are  al¬ 
ready  using  Hard  Fun,  Easy  Fun,  Serious  Fun,  and  People  Fun  by 
embedding  emotions  from  goal-directed,  open-ended,  pur¬ 
poseful,  and  social  play. 

By  offering  life  problems  in  miniature,  games  provide  im¬ 
portant  clues  to  the  relationship  of  emotion  in  human  problem 
solving,  goal  achievement,  and  interaction  with  other  people. 
Researching  games  clearly  defines  the  relationships  between  ac¬ 
tion  and  emotion  as  well  as  between  emotions  themselves.  In 
games,  many  emotions  are  opposites,  happen  in  sequences, 
have  prerequisites,  and  share  links  with  others.  Some  emotions 
require  people,  relate  to  goals,  or  involve  the  future  or  the  past. 
Connecting  how  gameplay  leads  to  specific  emotions  estab¬ 
lishes  a  framework  for  the  process  of  creating  more  emotional 
user  experiences.  The  Four  Fun  Keys  Model  creates  effective 
analysis  and  design  techniques  to  identify  and  create  a  wider 
range  of  emotions.  It  connects  emotions  to  popular  types  of 
gameplay  and  demonstrates  how  chains  of  emotions  are  em¬ 
bedded  or  released  from  different  activities.  By  harnessing  play, 
even  productivity  software  and  workflow  design  can  take  ad¬ 
vantage  of  the  motivating  force  of  game  emotions. 

Games  are  the  new  medium  of  the  21st  century.  Unlike  any 
other  design  discipline,  player  experiences  that  unfold  over 
time  are  more  interactive  than  movies,  painting,  architecture, 
industrial  design,  literature,  or  fashion.  Games  are  dynamic 
processes  that  create  dynamic  experiences.  They  are  much 
more  than  a  series  of  static  impressions  seen  in  sequence. 
Games  offer  a  unique  set  of  emotions  from  accomplishment 
and  failure.  At  their  core  is  the  ability  for  players  to  interact  with 
the  content.  The  experiences  that  come  from  this  interaction, 
the  ability  to  create  emotions,  and  the  way  that  this  interaction 
moves  will  separate  games  as  an  art  form  distinct  from  movies 
and  other  visual  arts. 

Emotions play an essential role in providing the entertainment
value that captivates players. From games, we can learn
how  to  improve  the  emotions  that  keep  users  engaged  in  activ¬ 
ities  of  different  types.  Through  player  interaction  and  control 
of  events,  games  promise  to  become  more  emotional  than 
movies  and  other  entertainment,  but  whether  these  are  the 
same  emotions  remains  to  be  seen.  Games  now  include  more 
detailed  storylines  to  increase  player  engagement.  However,  it  is 
clear from XEODesign’s research that games create strong emotions
even without stories, and it is clear that games already create
more emotions through interaction with other players in the
game  world.  Many  aspects  of  the  storytelling  language  of  cin¬ 
ema  apply  to  games  and  they  will  still  move  players  emotion¬ 
ally.  However,  emotions  from  the  player’s  goals  and  the  things 
he  or  she  cares  about  will  likely  prove  to  be  stronger.  In  games, 
the  mechanic  is  the  moral  of  the  story. 

Releasing  the  full  emotional  potential  of  games  will  not  be 
easy.  It  requires  deep  understanding  of  how  emotions  work,  be¬ 
cause  more  is  known  about  crafting  emotional  entertainment 
experiences  (such  as  a  movie)  by  offering  the  viewer  empathy 
with  a  protagonist  engaged  in  a  predetermined  sequence  of 
events.  Less  is  known  about  creating  emotion  through  being  the 
protagonist.  Viewing  a  prerendered  video  between  game  levels 
is  less  compelling  than  having  the  player’s  actions  create  an 
emotionally  rich  experience  itself,  but  at  the  moment  these 
mini-movies  are  easier  to  design.  Going  forward,  these  dramatic 
tools  for  creating  affect  should  inspire  rather  than  dictate  game 
design.  Eventually  the  emotional  props  from  these  cut  scenes 
will  fade  from  games  just  as  title  cards  disappeared  from  old 
silent  pictures  once  sound  technology  allowed  the  actors  and 
the  action  to  speak  for  themselves. 

Psychology,  sociology,  theater,  literature,  film,  and  only  re¬ 
cently  games  have  all  studied  how  entertainment  engages  au¬ 
diences.  Games  are  unique  in  that  they  entertain  by  creating 
emotion  from  interaction.  Before  XEODesign’s  research  on 
emotion and the fun of games, none had studied how entertainment
creates specific emotions and uses them to focus attention
and motivate play. The Four Fun Keys Model creates the
four  most  important  sets  of  emotions  for  games.  Each  fun  key 
is  a  collection  of  game  interactions  (game  mechanics)  that  cap¬ 
tivate  player  attention  with  different  series  of  emotion  that 
make  games  self-motivating.  These  emotion  cycles  create  the 
player  experience  and  separate  best-selling  games  from  their 
imitators. Based on contextual research of people playing their
favorite games across genre, platform, and gender, the Four Fun
Keys describe how emotion comes from what is the most fun
about  games.  Entertainment  interfaces  and  products  such  as 
Apple’s  iPod  and  Microsoft’s  XBOX  360  already  use  interaction 
from  each  of  the  Four  Fun  Keys  to  unlock  powerful  emotions 
to  increase  user  enjoyment  and  build  stronger  brand  impres¬ 
sions.  Interaction  that  generates  a  lot  of  emotion  is  more 
memorable  and  feels  like  play. 


INTRODUCTION


This  chapter  is  written  for  professionals  familiar  with  human- 
computer  interaction  (HCI),  but  not  with  the  issues  and  con¬ 
siderations  particular  to  motor  vehicles.  For  non-HCI  profes¬ 
sionals,  reading  chapters  1-16  of  this  text  should  provide  the 
desired  background.  For  automotive  industry  interface  design¬ 
ers,  this  chapter  should  pull  together  information  dispersed 
throughout  the  literature. 

HCI is of interest to motor vehicle designers because of the
rapid growth of driver information systems that utilize computers
and communications, collectively referred to as “telematics.”
Included under the umbrella of telematics are navigation systems,
cell phones, collision-warning systems of various types,
lane departure warning systems, and so forth. These systems
can (a) potentially enhance public safety, (b) make transport
more efficient (saving time and fuel), (c) make driving more enjoyable,
and (d) make drivers more productive.

More  specifically,  this  chapter  reviews  design  documents  and 
evaluation  methods  for  driver  interfaces  in  order  to  promote 
safety  and  ease  of  use.  An  underlying  theme  is  that  the  safety- 
critical  nature  of  driving  leads  to  significant  departures  from 
standard  HCI  practice,  and  to  some  methods  and  measures  that 
are  unique  to  automotive  applications.  Since  this  is  a  reference 
handbook,  engineering  practice  receives  more  attention  than 
scientific  theory  and,  furthermore,  given  its  technology  focus, 
the  design  of  traditional  (noncomputer)  motor  vehicle  driver 
interfaces  (such  as  switches  for  headlights  and  windshield 
wipers)  is  not  covered.  For  information  on  those  interfaces,  see 
Peacock  and  Karwowski  (1993)  or  the  latest  edition  of  the  Soci¬ 
ety  of  Automotive  Engineers  (SAE)  Handbook  (Society  of  Auto¬ 
motive  Engineers,  2005).  Readers  interested  in  additional  re¬ 
search  literature  on  telematics  should  see  Michon  (1993),  Parkes 
and  Franzen  (1993),  Barfield  and  Dingus  (1997),  Noy  (1997), 
or the proceedings from the 2005 Driving Assessment conference
(http://ppc.uiowa.edu/driving-assessment/2005/final/index.htm).


What  Kinds  of  Telematics  Products  Are  Likely 
in  the  Near  Term? 

Over  the  years,  several  studies  have  used  expert  opinions  to 
predict  the  future  of  automotive  electronics,  specifically  telem¬ 
atics  applications  (e.g.,  Ribbens  &  Cole,  1989;  Underwood,  Chen, 
&  Ervin,  1991;  Underwood,  1989,  1992;  Richardson  &  Green, 
2000).  The  most  extensive  information,  however,  is  often  con¬ 
tained  in  proprietary  market  surveys  not  for  public  distribution. 
The  most  recent  publicly  available  projections  appear  in  Green 
and  colleagues  (2001),  summarizing  input  from  83  senior  exec¬ 
utives  in  the  automotive  industry.  Interestingly,  those  projec¬ 
tions  coincide  with  the  time  period  when  this  handbook  is  to  be 
published.  Table  35.1  lists  the  mean  value  of  the  year  estimated 
when  each  feature  would  be  introduced  into  10%  of  new  lux¬ 
ury  vehicles  in  2004-2008. 

Since these data were collected in August 2000, and the
planning horizon for a new vehicle is three years, estimates
through 2004 should reflect real product plans, not speculation
about the future. For some features (built-in personal digital assistant
[PDA], electronic toll tag), there was not complete agreement
on when introduction would occur, as a significant fraction
of respondents thought some features would never be
introduced.  Nonetheless,  the  table  makes  the  point  that  overall, 
these  predictions  tend  to  be  slightly  optimistic,  but  not  unrea¬ 
sonable.  Future  products  are  most  likely  related  to  driving  sup¬ 
port  systems  (various  collision  warning  and  avoidance  systems, 
night  vision),  information  systems  (cell  phones,  e-mail/Internet, 
PDAs),  and  the  use  of  voice  technology.  In  about  a  decade  or 
less, the author believes that wearable computers will be a topic
of significant interest for driving (see chapter 15 by Starner,
Siewiorek, & Smailagic). In aggregate, these systems could reshape
the driver’s task from one of real-time control of the vehicle
(steering the vehicle and using the throttle to adjust
speed)  to  information  management. 
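
The remark that a significant fraction of respondents thought some
features would never be introduced can be made concrete with a little
arithmetic on Table 35.1. The sketch below is illustrative only: it
copies a handful of rows from the table and assumes, without confirmation
from the source, that all 83 executives answered every item, so the
percentages are rough guides rather than reported results. The names
TOTAL_RESPONDENTS, ROWS, never_share, and summarize are hypothetical.

```python
# Illustrative sketch: share of "never" responses for selected Table 35.1 rows.
# ASSUMPTION: all 83 executives answered every item (not stated in the source).
TOTAL_RESPONDENTS = 83  # senior executives surveyed (Green et al., 2001)

# (feature, mean year estimate, number of "never" responses), copied from Table 35.1
ROWS = [
    ("GPS navigation", 2004.4, 0),
    ("Built-in PDA docking station", 2004.8, 10),
    ("Built-in electronic toll and payment tag", 2005.8, 12),
    ("Karaoke", 2006.3, 62),
    ("Alcohol-impaired driver detection", 2007.1, 25),
]


def never_share(never_count, total=TOTAL_RESPONDENTS):
    """Return the fraction of respondents who answered 'never'."""
    return never_count / total


def summarize(rows):
    """Return (feature, mean_year, never_fraction), largest 'never' share first."""
    shares = [(name, year, never_share(count)) for name, year, count in rows]
    return sorted(shares, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, year, share in summarize(ROWS):
        print(f"{name:<42} mean year {year:.1f}   never: {share:.0%}")
```

Under that assumption, roughly 12 percent of respondents answered
"never" for the PDA docking station and roughly 14 percent for the toll
tag, while about three quarters did so for karaoke. A mean year is
presumably computed only over respondents who named a year, so it should
be read alongside the "never" count rather than on its own.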

Chapter  Organization 

How  might  one  organize  information  on  telematics?  In  their 
classic paper on usability, Gould and Lewis (1985) identified
three  key  principles  to  be  followed  when  designing  products  for 
ease  of  use: 

1.  Early  focus  on  users  and  tasks 

2.  Empirical  measurement 

3.  Iterative  design 

These principles apply not only to office applications and web
development, but to automotive applications as well. In the automotive
context, designers need to understand (a) who drives the
vehicle (users), (b) what in-vehicle tasks they perform, and, most
importantly, (c) the driving task, (d) task context, and (e) the
consequence  of  task  failures.  These  topics  are  the  focus  of  the 
first  part  of  this  chapter. 

Secondly,  it  is  important  to  be  able  to  measure  driver  and  sys¬ 
tem  performance  (empirical  measurement).  That  topic  consti¬ 
tutes  the  second  part  of  this  chapter. 

Surprisingly,  there  has  been  little  reporting  of  how  iterative 
design  is  used  in  developing  driver  interfaces,  though  the  ap¬ 
proach  is  used.  A  great  deal  of  automotive  design  relies  upon  fol¬ 
lowing  design  guidelines,  which  is  the  final  focus  of  this  chapter. 


WHAT  IS  THE  DRIVING  CONTEXT  IN  WHICH 
USERS  PERFORM  TASKS? 


Scott McNealy, CEO of Sun Microsystems, once said, “A car is
nothing more than a Java technology-enabled browser with
tires” (Kayl, 2000). He is dead wrong. The author has never
heard of anyone claiming, “A computer came out of nowhere,
hit me, and vanished.” Yet police officers and insurance adjusters
hear such claims about motor vehicles every day. Likewise,
the author knows of no one who has ever been killed as
a consequence of operating a computer at their desk, but the
loss  of  life  associated  with  crashes  arising  from  normal  motor 
vehicle  operation  is  huge. 

TABLE 35.1  Estimated Year of Feature Introduction into Luxury Vehicles from Green et al. (2001)
(Bold = estimated to be in 10 percent of 2006 model year luxury vehicles)

Mean Year  "Never"     Feature                                                Description
Estimate   Responses*
---------  ----------  -----------------------------------------------------  -----------
2004.3     1           Built-in wireless phone interface
2004.4     0           GPS navigation                                         Uses Global Positioning Satellites for location, e.g., Hertz NeverLost
2004.6     1           Automatic collision notification                       Calls when air bag deploys, e.g., OnStar in GM vehicles
2004.7     2           Satellite radio                                        Nationwide satellite broadcast of 100 channels, e.g., XM, Sirius
2004.8     4           Removable media for entertainment and data             (e.g., iPod, but for entertainment only)
2004.8     0           Email/internet access
2004.8     10          Built-in PDA (e.g., Palm) docking station              PDA is a personal digital assistant
2004.8     0           Adaptive cruise control (ACC)                          Scans for traffic and adjusts speed to maintain driver-set separation
2005.0     1           Rear parking aid                                       Several SUVs (e.g., Murano)
2005.2     1           MP3 support                                            MP3 is an audio file format
2005.2     0           Bluetooth support                                      Short-range wireless communication allowing direct link of cellular phones
2005.3     0           Automatic download of traffic/congestion information   Navigation systems in several Honda products
2005.3     2           Blind spot detection and warning                       Radar and other systems to detect vehicles to the side and rear
2005.5     1           Voice operation of some controls                       Jaguar, several Nissan vehicles
2005.6     2           Downloadable software features
2005.6     2           Downloadable software fixes
2005.6     3           Forward collision warning                              Laser radar systems
2005.7     4           Forward parking aid                                    Sonar-based short range systems to assess distance ahead
2005.7     6           Lane departure warning                                 Video systems that scan for lane markings, Infiniti SUVs
2005.7     4           Dual voltage (42/12 volt)                              Current power is 12-volt
2005.8     12          Built-in electronic toll and payment tag               Allows travel through toll booth without stopping
2005.8     0           General purpose text/data speech capability
2005.9     7           Large general-purpose display
2005.9     6           Offboard applications via data link
2005.9     9           Night vision systems                                   Infrared-based systems to show objects ahead on a head-up display (HUD), Cadillac
2006.0     16          Black box crash recorder
2006.0     7           Active suspension
2006.1     8           Forward collision braking only                         Active, not just a warning
2006.2     9           General purpose computer (e.g., AutoPC)
2006.3     62          Karaoke                                                Most respondents said never
2006.3     9           Drowsy driver detection
2006.7     1           42 V electrical system
2006.8     10          Open electronics bay with utilities
2006.8     7           Fingerprint or voice-controlled entry
2006.9     11          Interface to wearable computer
2007.1     14          Large area HUD                                         For speedometer, navigation, back up camera, etc.
2007.1     25          Alcohol-impaired driver detection                      Most respondents said never
2007.1     1           Brake by wire
2007.1     4           Drive by wire
2007.2     6           Hybrid drive-train (electric/combustion)
2007.6     15          Automatic lane control                                 Takes over steering
2008.0     20          Forward collision braking and steering
2008.2     26          All-electric drive-train

* Respondents had the option of identifying a particular year or selecting never.


According to the World Health Organization (Peden et al.,
2004), 0.75 to 1.8 million people die in road traffic crashes each
year, or about 3,000 per day. (See also
http://www.unece.org/trans/main/welcwp1.html.) Traffic crashes have been ranked
ninth in terms of the percentage of disability-adjusted life years
lost (Murray & Lopez, 1996b), well ahead of war (16th), violence
(19th), and alcohol use (20th). If the current trends continue,
by the year 2020, traffic crashes will become the third
largest cause of death and disability after clinical depression
and heart disease. In 1990, traffic crashes ranked as the ninth
biggest killer in the world.

Additional  insights  come  from  crash  data  for  the  United 
States,  for  which  reliable,  detailed  crash  statistics  are  available. 
According  to  the  U.S.  Department  of  Transportation  (2005a), 
42,636  deaths  occurred  in  traffic  crashes  in  the  United  States  in 
2004.  Of  them,  19,091  were  passenger  car  occupants;  5,801  were 
pickup truck occupants; 4,641 were pedestrians; 4,735 were in
SUVs; 4,008 were motorcyclists; 2,036 were in vans; 761 were
in larger trucks; 725 were bicyclists; and 808 were in other
categories. (For additional information, see the latest edition of
Injury Facts [National Safety Council, 2004] and the U.S. Department
of Transportation Fatality Analysis Reporting System
[FARS] website [http://www-fars.nhtsa.dot.gov/].)

What  Kinds  of  Crashes  Are  Associated  With  Telematics? 

It  is  widely  accepted  that  some  telematics  tasks  are  distracting 
and  that  distraction  can  lead  to  crashes.  The  most  commonly 
cited  statistics  concerning  distraction-related  crashes  (Table 
35.2)  were  based  on  data  from  the  Crashworthiness  Data  Sys¬ 
tem  (CDS).  CDS  is  an  annual  probability  sample  of  approxi¬ 
mately  5,000  police-reported  crashes  involving  at  least  one 
passenger  vehicle  that  was  towed  from  the  scene  (out  of  a  pop¬ 
ulation  of  almost  3.4  million  tow-away  crashes).  Minor  crashes 
(involving  property  damage  only)  are  not  in  CDS.  CDS  crashes 
are  investigated  in  detail  by  specially  trained  teams  of  profes¬ 
sionals  who  provide  much  more  information  than  is  given  in 
police  reports. 

Note  that  distraction-related  crashes  primarily  involved  a 
single  vehicle  (41%),  though  rear-end  crashes  (moving,  10%; 
stopped,  22%)  were  also  common.  Intersection  crashes  repre¬ 
sented  another  18%  of  the  total.  Crashes  tended  to  peak  in  the 
morning  rush  hour  and,  to  a  much  lesser  extent,  in  the  evening 
rush.  The  overwhelming  majority  of  the  crashes  occurred  in 
good  weather  (86%  clear,  10%  rain,  4%  snow/hail/sleet),  and 
many  occurred  at  lower  speeds  (40%  at  0-35  mph,  40%  at  40-50 
mph,  17%  at  55-60  mph,  and  4%  at  over  65  mph).  Thus,  these 
data  suggest  that  device  test  scenarios  should  emphasize  situa¬ 
tions  in  which  single-vehicle  crashes  (often  run-off-the-road),  as 
well  as  those  involving  rear-end  collisions  into  stopped  vehicles, 
are  likely. 

Other  recent  evidence  appeared  in  Stutts,  Reinfurt,  Staplin, 
and  Rodgman  (2001)  and  Eby  and  Kostyniuk  (2004).  Table  35.3 
shows  summary  statistics  from  Stutts  and  colleagues  (2001;  CDS 


files  from  1995,  the  first  year  for  which  driver  attention  was  ex¬ 
amined,  until  1999)  concerning  the  types  of  distraction  involved. 

These  data,  however,  should  be  used  with  some  caution. 
First,  there  is  a  high  percentage  of  unknown  and  missing  data  in 
the  set,  an  outcome  typical  when  distraction  is  coded.  For  ex¬ 
ample,  the  driver  attention  status  (1  of  360  variables  included) 
was “unknown” for 36% of the cases. Second, several years have
elapsed since the midpoint of this study, and many driver interfaces
have changed considerably since the data were collected. For
example, according to the Cellular Telecommunications and Internet
Association (CTIA;
http://files.ctia.org/pdf/CTIA_Survey_Year_End_2006_Graphics.pdf),
the number of cell phones in use grew by more than a factor of four
between 1997 and 2006.
Given  that  increase  in  exposure,  the  percentage  of  such  crashes 
is  likely  to  be  underestimated  by  that  factor  using  the  data  from 
Stutts  and  colleagues  (2001).  Unlike  external  distractions,  dis¬ 
tractions  due  to  vehicle  hardware  can  be  reduced  or  eliminated 
by  appropriate  engineering  on  the  part  of  manufacturers  and 
suppliers. 

The  Scope  of  the  Cell  Phone  Problem 

Because  telematics  devices  are  new,  the  data  on  device-related 
crashes  is  limited.  This  situation  has  hampered  progress  in  un¬ 
derstanding  the  risks  of  such  devices,  especially  cell  phones, 
which receive the most attention. At the time this chapter was
written, 22 states recorded whether cell phones were causal factors
in crashes, with most having started to collect this information
only in the last few years (Sundeen, 2005). The National Conference of
State  Legislatures  data  suggested  that  cell  phones  were  involved 
in  approximately  0.18%  of  all  crashes.  Currently,  6%  of  all  drivers 
are  using  cell  phones  at  any  moment  (U.S.  Department  of  Trans¬ 
portation,  2005b);  however,  given  increased  feature  content 
(MP3  support,  text  messaging,  broadcast  TV),  cell-phone  use  is 
likely  to  increase  even  more  than  over  the  last  few  years. 

In  terms  of  fatalities,  the  data  from  the  National  Police 
Agency  (NPA)  in  Japan  is  often  cited.  Admittedly,  there  is  less 
evidence  to  assess  the  quality  of  that  data  than  there  is  for  U.S. 


TABLE 35.2 Distraction/Inattention Crashes by Crash Type from Knipling and Goodman (1996)

                                                                 Looked But
Crash Type                      %        Sleepy    Distracted    Did Not See    Unknown    Attentive    Total
-----------------------------   ------   ------    ----------    -----------    -------    ---------    -----
Single Vehicle                  Row      5.8       18.1          0.2            31.8       44.1         100.0
                                Column   66.2      41.2          0.7            20.6       45.9         30.0
Rear-End/Lead Vehicle Moving    Row      12.7      21.3          3.4            48.3       14.3         100.0
                                Column   27.9      9.6           2.0            6.4        2.9          5.9
Rear-End/Lead Vehicle Stopped   Row      —         23.9          11.4           52.6       11.8         100.0
                                Column   —         21.9          13.8           14.1       4.9          12.1
Intersection/Crossing Path      Row      —         7             17.9           52.8       22.3         100.0
                                Column   —         18.1          63.6           39.8       26.6         34.3
Lane Change/Merge               Row      —         5.6           17.2           41.8       35.3         100.0
                                Column   —         1.6           6.7            3.4        4.6          3.8
Head-On                         Row      1.0       7.0           8.1            46.4       37.5         100.0
                                Column   1.7       2.2           3.5            4.3        5.4          4.2
Other                           Row      —         7.3           9.7            53.5       28.9         100.0
                                Column   —         5.4           9.7            11.4       9.7          9.7
Total Crashes                   Row      2.6       13.2          9.7            45.7       28.8         100.0
                                Column   100.0     100.0         100.0          100.0      100.0        100.0
TABLE 35.3 Crashes for Which Distraction was a Known Factor
(8.3 percent of the crashes)
(Bold = equipment related distraction)

Distraction                           Estimate (%)    Error (%)
-----------------------------------   ------------    ---------
Outside person, object, or event      29.4            ±4.7
Adjusting radio/cassette/CD           11.4            ±7.2
Other occupant                        10.9            ±3.3
Moving object in vehicle              4.3             ±3.2
Other device/object                   2.9             ±1.6
Adjusting vehicle/climate controls    2.8             ±1.1
Eating and/or drinking                1.7             ±0.6
Using/dialing cell phone              1.5             ±0.9
Smoking-related                       0.9             ±0.4
Other distractions                    25.6            ±6.0
Unknown distraction                   8.6             ±5.3

TABLE 35.4 Mobile Phone and Navigation Systems-Related
Crashes in Japan

System                Result      1997     1998     1999
------------------    --------    -----    -----    -----
Cell Phones           Injuries    2,095    2,397    2,418
                      Deaths      20       28       24
Navigation Systems    Injuries    117      131      205
                      Deaths      1        2        2

TABLE 35.5 Cell Phone and Navigation Crashes
per Year in Japan

Crashes               1997    1998    1999    2000    2001    2002    2003    2004
------------------    ----    ----    ----    ----    ----    ----    ----    ----
Cell phones           2297    2648    2583    1453    3040    2847    2597    1868
Navigation systems    127     146     219     190     1415    1307    1098    1253

federal  or  state  data.  Further,  it  is  suspected  that  even  when  in¬ 
vestigated,  mobile  phone  use  is  underreported,  because  re¬ 
porting  involves  drivers  admitting  to  something  inappropriate. 
Tables  35.4  and  35.5  show  the  annual  crash  statistics  for  phones 
and,  for  perspective,  navigation  systems.  The  decreases  in  the 
number  of  cell-phone  crashes  after  1999  and  2003  reflect  new 
legal  restrictions  on  cell  phone  use. 

Using  these  data,  cell-phone-related  deaths  in  the  United 
States  for  the  year  in  which  this  chapter  was  written  (2005)  can 
be  estimated.  (See  Wierwille,  1995.)  The  average  number  of 
cell-phone-related  deaths  in  Japan  was  about  25  per  year  for 
1997-1999.  In  determining  this  estimate,  one  must  realize  that 
a  random  variation  exists  in  deaths  per  year,  and  the  number  in¬ 
creased  over  that  period.  On  average,  the  number  of  traffic 
deaths  in  the  United  States  is  about  four  and  a  half  times  that  of 
Japan. Assuming everything else is equal, this would suggest at
least 112 (= 25 × 4.5) cell-phone-related deaths per year in the
United States for 1997-1999. If the number of calls per driver
and the mean call duration are assumed unchanged (they have
actually increased by 30%), then the change in exposure of the
population between 1998 and 2005 is simply due to the increase
in the number of phones in existence, roughly 3.19
times. Using the CTIA data to determine the number of users
and 112 deaths in 1998 as the base estimate, the estimated number
of cell-phone-related deaths in the United States for 2005 is 357
(= 112 × 3.19); this number should continue to increase at 15%
or so per year, the current growth rate of cell phone subscribers.
Other sources suggest the number of cell-phone-related deaths
is much higher (Pena, 2007).
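
The arithmetic behind this estimate can be retraced in a few lines. The sketch below is only a restatement of the figures quoted in the text; the function and variable names are illustrative, and the proportional-exposure assumption is the same simplifying assumption the text makes, not an independent model.

```python
# Back-of-envelope reproduction of the estimate in the text.
# Every constant is a figure quoted above; the names are illustrative only.

JAPAN_DEATHS_PER_YEAR = 25         # approximate NPA mean for 1997-1999
US_TO_JAPAN_DEATH_RATIO = 4.5      # U.S. traffic deaths relative to Japan
PHONE_GROWTH_1998_TO_2005 = 3.19   # growth in phones in use (CTIA)
SUBSCRIBER_GROWTH_PER_YEAR = 0.15  # "15% or so per year"


def us_base_estimate(japan_deaths: float, ratio: float) -> int:
    """Scale the Japanese count to the U.S.; the text truncates 112.5 to 112."""
    return int(japan_deaths * ratio)


def scale_by_exposure(base_deaths: float, growth_factor: float) -> float:
    """Assume deaths grow in proportion to the number of phones in use."""
    return base_deaths * growth_factor


def project(deaths: float, annual_growth: float, years: int) -> float:
    """Compound the estimate forward at a constant annual growth rate."""
    return deaths * (1.0 + annual_growth) ** years


deaths_1998 = us_base_estimate(JAPAN_DEATHS_PER_YEAR, US_TO_JAPAN_DEATH_RATIO)
deaths_2005 = scale_by_exposure(deaths_1998, PHONE_GROWTH_1998_TO_2005)

print(deaths_1998)         # 112
print(round(deaths_2005))  # 357
print(round(project(deaths_2005, SUBSCRIBER_GROWTH_PER_YEAR, 1)))
```

Because each step is a simple multiplication, the result is only as good as the "everything else is equal" assumption; changing the call-duration assumption (the 30% increase noted above) would scale the result accordingly.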

By  way  of  comparison,  about  120  children  and  small-stature 
women  were  killed  in  front-passenger  air-bag  crashes,  and  about 
200  people  in  Firestone/Explorer  rollover  crashes  over  several 
years.  Both  of  these  situations  have  led  to  major  changes  in  fed¬ 
eral  requirements  and  product  engineering  changes,  as  well  as 
significant  financial  losses  to  several  manufacturers  (Green, 
2001c).  In  response  to  the  Firestone  situation,  the  U.S.  Senate 
passed  the  TREAD  Act  (http://www.driveusa.net/treadact.htm) 
three  weeks  after  it  was  introduced.  Hence,  designing  telemat¬ 
ics  interfaces  to  assure  safety  and  usability  is  a  topic  worthy  of 
significant  effort. 

Supporting  the  crash  statistics,  the  most-often-cited  study 
of cellular-phone use is Redelmeier and Tibshirani (1997; see
also Redelmeier & Tibshirani, 2001). They examined data for
almost  700  drivers  who  were  mobile-phone  users  and  were  in¬ 
volved  in  motor-vehicle  crashes  that  resulted  in  substantial  prop¬ 
erty  damage.  Each  driver’s  mobile-phone  records  for  the  day  of 
the  crash  and  the  previous  week  were  examined.  Redelmeier 
and  Tibshirani  reported  the  risk  of  a  crash  was  4.3  times  greater 
when  a  mobile  phone  was  used  than  when  it  was  not.  Interest¬ 
ingly,  hands-free  units  had  a  greater  risk  ratio  (though  not  sig¬ 
nificant)  than  hand-held  units  (5.9 : 1  versus  3.9 : 1).  Other  data 
(Koushki,  Ali,  &  Al-Saleh,  1999;  McEvoy  et  al.,  2005;  Violanti  & 
Marshall,  1996)  suggested  similar  risk  ratios.  Other  human  fac¬ 
tors  data  show  drivers  taking  longer  to  respond  to  brake  lights 
of  lead  vehicles,  departing  from  the  lane  more  often,  and  ex¬ 
hibiting  other  undesired  characteristics  (Horrey  &  Wickens, 
2004)  while  using  a  cell  phone. 

If  phone  use  is  a  concern,  which  tasks  lead  to  crashes?  Table 
35.6  shows  task-specific  crash  statistics  for  cell  phones  and,  for 
comparison,  navigation  systems.  These  NPA  data  identify  four 
primary  tasks:  (a)  receiving  a  call,  (b)  dialing,  (c)  talking,  and 
(d)  other.  What  other  represents  is  unknown.  It  is  also  unknown 
if  talking  includes  voice  services  (voice  mail,  bank  by  phone, 
etc.).  Notice  that  NPA  reported  the  number  one  cause  of  crashes 
was  receiving  a  call.  Most  people,  either  at  work  or  at  home  (and 
while  driving,  even  in  hazardous  traffic  situations),  immediately 
(within  1-4  seconds)  answer  the  phone  when  it  rings,  almost  re¬ 
gardless  of  the  situation  (Nowakowski,  Friedman,  &  Green, 
2001).  (An  exception  is  at  dinnertime  in  the  United  States,  when 
telemarketing  calls  are  common.)  For  example,  one  could  be 
in  his  or  her  office  talking  to  a  very  important  person  such  as  the 


TABLE 35.6 Driver Tasks and Crashes
(January-November, 1999) from NPA Data

Cell Phone                    Navigation System
Task              Crashes     Task         Crashes
--------------    -------     ---------    -------
Receiving Call    1077
Dialing           504         Looking      151
Talking           350         Operating    46
Other             487         Other        8
Total             2418        Total        205
President of the United States. If the phone rang, many people
might ask the President to wait a moment while they answered
the incoming call, though it is highly unlikely that someone more
important is calling. In fact, many people now have call
waiting  so  they  can  accept  additional  incoming  calls  while  on 
the  phone.  Logically,  answering  the  phone  should  not  usually 
be  more  important  than  driving,  but  people  behave  otherwise 
out  of  habit. 
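
To make the relative size of each task concrete, the crash counts in Table 35.6 can be converted to shares of the total. The sketch below uses only the counts from that table; the dictionary and function names are illustrative.

```python
# Shares of crashes by task, computed from the counts in Table 35.6
# (NPA data, January-November 1999). Names are illustrative only.

CELL_PHONE_CRASHES = {
    "Receiving call": 1077,
    "Dialing": 504,
    "Talking": 350,
    "Other": 487,
}
NAVIGATION_CRASHES = {
    "Looking": 151,
    "Operating": 46,
    "Other": 8,
}


def task_shares(counts: dict) -> dict:
    """Return each task's percentage of the total crash count."""
    total = sum(counts.values())
    return {task: 100.0 * n / total for task, n in counts.items()}


for label, counts in (("Cell phone", CELL_PHONE_CRASHES),
                      ("Navigation", NAVIGATION_CRASHES)):
    print(label, "total:", sum(counts.values()))
    for task, share in task_shares(counts).items():
        print(f"  {task}: {share:.1f}%")
```

On these counts, receiving a call accounts for roughly 45% of the cell-phone crashes, more than dialing and talking combined.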

Not  only  is  the  answering  task  a  problem  because  of  its  im¬ 
mediacy,  but  also  sometimes  because  of  its  visual  demands. 
Many  drivers  use  portable  phones  that  might  be  in  a  jacket 
pocket  on  the  seat  or  in  a  purse.  Thus,  in  answering  the  phone, 
the  driver’s  first  task  might  be  to  search  for  it.  Although  hands¬ 
free  systems  can  reduce  the  duration  of  the  disruption,  they  do 
not  eliminate  the  disruption  itself  or  its  incorrect  prioritization. 

In  the  Japanese  data,  the  second  most  common  cause  of 
crashes  was  dialing.  Clearly,  going  to  a  hands-free  interface  can 
reduce  the  visual  and  manual  demands,  though  not  the  cogni¬ 
tive  demands  of  dialing;  however,  the  current  evidence  is  that 
there  is  no  difference  in  crash  risk  when  hands-free  phones  are 
provided.  It  could  be  that  providing  a  hands-free  phone  makes 
it  easier  to  dial,  so  hands-free  phone  users  engage  in  more  con¬ 
versations  while  driving.  Another  possibility  is  that  seeing  a 
phone  being  held  is  a  warning  to  other  drivers  of  a  potentially 
distracted  driver. 

Finally, a substantial number of crashes were associated with just
talking on the phone. While some distractions associated with
talking  on  the  phone  are  manual  (such  as  holding  the  phone), 
most  distractions  are  cognitive,  with  drivers  focusing  on  the 
conversation,  rather  than  on  driving.  It  is  this  aspect  of  cell 
phone  use  that  is  commonly  observed  by  drivers.  Current  wis¬ 
dom  in  the  U.S.  is  that  if  a  person  is  weaving  between  lanes  and 
it  is  night,  they  are  drunk.  If  they  are  weaving  and  it  is  during  the 
day,  they  are  on  the  phone. 

Talking  on  the  phone  is  different  from  talking  to  a  passenger, 
especially  an  adult  in  the  front  seat.  Often,  that  adult  behaves  as 
a  co-driver,  looking  both  ways  at  intersections  and  checking  the 
mirrors  during  lane  changes,  and  speaking  less  in  higher  risk 
situations  (Drews,  Pasupathi,  &  Strayer,  2004;  Crundall,  Bains, 
Chapman,  &  Underwood,  2005).  When  drivers  move  their  heads 
to  scan  an  intersection  or  expressway  entrance  or  exit,  passen¬ 
gers  often  stop  talking.  People  on  the  phone  have  no  knowledge 
of  the  driving  situation,  and  just  keep  talking.  Technology  could 
assist  in  reducing  the  scope  of  this  problem,  for  example  by  pro¬ 
viding  cues  to  inform  callers  of  the  driving  situation  and  drivers 
of  the  call  duration  or  their  driving  performance. 

Recognizing  the  risks  of  cell  phone  use,  many  countries  (Aus¬ 
tralia,  England,  Israel,  Italy,  Switzerland,  and  Spain)  permit  only 
hands-free  operation.  As  was  noted  earlier,  however,  hands-free 
use  is  essentially  no  different  in  risk  from  handheld  operation. 
Many  states  are  also  considering  restrictions  on  cell  phone  use 
while  driving,  with  the  exception  of  making  911  and  other  emer¬ 
gency  calls.  Obtaining  passage  of  such  laws  has  proven  difficult. 

The  author  believes  that  a  complementary  approach  is  to  im¬ 
plement  an  in-vehicle  workload  manager  (Michon,  1993;  Green, 
2000b,  2004;  Hoedemaeker,  de  Ridder,  &  Janssen,  2002;  Piechulla, 
Mayser,  Gehrke,  &  Konig,  2003).  As  initially  conceived,  such  sys¬ 
tems  would  use  data  from  four  sources:  (a)  the  navigation  system 
(such as lane width and radius of curvature), to assess the
demands due to road geometry, (b) the adaptive cruise control
system (headway and range rate to vehicles ahead), to assess traffic
demands, (c) the traction control system, to assess road surface
friction, and (d) the wipers, lights, and clock, to assess
visibility. This information, along with information on the driver
(e.g., age) and the specific visual, auditory, cognitive, and motor
demands of each in-vehicle task, could be used to schedule the
occurrence of in-vehicle tasks. Thus, when driving on a curving
road  in  heavy  traffic  in  a  downpour,  incoming  mobile  phone 
calls  could  be  directed  to  an  answering  machine  and  30,000- 
mile  maintenance  reminders  could  be  postponed.  When  the 
driving  task  demand  is  low,  drivers  could  have  access  to  a  wide 
range  of  functions.  Being  able  to  reliably  predict  the  momentary 
workload  of  driving,  however,  has  proven  to  be  very  difficult. 
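
The scheduling idea can be illustrated with a deliberately simple sketch. It is not the design of any of the cited systems: the field names, thresholds, scoring rule, and demand budget below are all hypothetical, chosen only to show how the four data sources might be combined into a present-or-defer decision. Driver characteristics such as age are omitted for brevity.

```python
# Minimal, hypothetical sketch of a workload manager decision rule.
# All thresholds and the additive scoring scheme are assumptions made
# for illustration; they do not come from the cited literature.
from dataclasses import dataclass


@dataclass
class DrivingContext:
    curve_radius_m: float  # from the navigation system (road geometry)
    headway_s: float       # from adaptive cruise control (traffic)
    low_friction: bool     # from the traction control system (road surface)
    wipers_on: bool        # visibility proxy
    lights_on: bool        # visibility proxy


def driving_demand(ctx: DrivingContext) -> int:
    """Count how many of the four demand sources are elevated (0 to 4)."""
    score = 0
    score += int(ctx.curve_radius_m < 300.0)       # hypothetical threshold
    score += int(ctx.headway_s < 1.5)              # hypothetical threshold
    score += int(ctx.low_friction)
    score += int(ctx.wipers_on or ctx.lights_on)
    return score


def schedule(task_demand: int, ctx: DrivingContext, budget: int = 4) -> str:
    """Present the task only if driving plus task demand fits the budget."""
    if driving_demand(ctx) + task_demand <= budget:
        return "present"
    return "defer"


downpour = DrivingContext(curve_radius_m=150.0, headway_s=1.0,
                          low_friction=True, wipers_on=True, lights_on=True)
open_road = DrivingContext(curve_radius_m=2000.0, headway_s=3.0,
                           low_friction=False, wipers_on=False, lights_on=False)

INCOMING_CALL_DEMAND = 2  # hypothetical demand rating for answering a call

print(schedule(INCOMING_CALL_DEMAND, downpour))   # defer
print(schedule(INCOMING_CALL_DEMAND, open_road))  # present
```

The hard part, as the text notes, is not the decision rule but obtaining a reliable estimate of momentary workload to feed into it.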

As  an  alternative,  it  is  useful  to  think  more  broadly  about 
when  the  driver  is  likely  to  be  overloaded  and,  as  some  vehicle 
engineers  have  realized,  that  often  occurs  when  the  driver  is  ma¬ 
neuvering  or  about  to  maneuver.  Maneuvering  includes  (a)  chang¬ 
ing  lanes,  (b)  merging  onto  an  expressway,  (c)  turning  at  an  in¬ 
tersection,  (d)  parking,  (e)  braking  in  response  to  a  lead  vehicle, 
(f) accelerating from a traffic light, and so forth. These situations
are much easier to identify than the overload situations
described earlier. Furthermore, they are situations in which drivers
are less likely to want additional tasks (such as responding to an
incoming call), so a workload manager keyed to them would likely
gain much greater market acceptance (Eoh, Green, Schweitzer, &
Hegedus, 2006).
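
A maneuver-based rule is simpler to sketch than a workload estimate, because it depends on signals that are directly observable. The example below is a hypothetical illustration only: the choice of signals, the acceleration threshold, and the "hold" action are assumptions, not a description of any production system.

```python
# Hypothetical maneuver gate: hold incoming calls while the driver is
# maneuvering or about to maneuver. Signals and threshold are assumptions.


def is_maneuvering(turn_signal_on: bool,
                   brake_pressed: bool,
                   in_reverse: bool,
                   accel_mps2: float,
                   accel_threshold_mps2: float = 1.5) -> bool:
    """Treat signaling, braking, reversing, or hard acceleration as a maneuver."""
    return (turn_signal_on
            or brake_pressed
            or in_reverse
            or accel_mps2 > accel_threshold_mps2)


def route_incoming_call(maneuvering: bool) -> str:
    """Hold the call during a maneuver; otherwise let the phone ring."""
    return "hold" if maneuvering else "ring"


# Lane change: turn signal on, steady speed.
print(route_incoming_call(is_maneuvering(True, False, False, 0.0)))   # hold
# Steady cruising: no signal, no braking, negligible acceleration.
print(route_incoming_call(is_maneuvering(False, False, False, 0.2)))  # ring
```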

What  Kind  of  Trips  Do  People  Make  and  Why? 

Every  five  years,  the  U.S.  Department  of  Transportation  con¬ 
ducts  the  National  Household  Travel  Survey  to  obtain  travel  data 
for  the  United  States  (Hu  &  Reuscher,  2004).  In  the  most  re¬ 
cent  data  from  2001,  an  average  person  in  the  United  States  trav¬ 
eled  about  14,500  miles  per  year,  making  four  trips  per  day.  On 
average,  people  drove  about  40  miles  per  day,  with  most  of  the 
miles  (about  35)  covered  in  a  personal  vehicle.  Keep  in  mind 
that these are averages, and that public transit (including school
buses), prevalent in urban areas, accounts for only 2% of all trips.
The travel situation is likely to be different in more urbanized
countries (Japan, most of Europe), where public transit is
more prevalent.

According  to  the  2001  data  (Table  35.7),  the  most  common 
reason  for  travel  is  family  and  personal  business,  which  includes 
shopping,  running  errands,  and  dropping  off  and  picking  up 
others,  accounting  for  almost  half  of  the  trips. 

These  and  other  data  (on  trip  distances,  travel  speeds,  time 
of  day,  etc.)  in  the  National  Household  Travel  Survey  provide 


TABLE 35.7 Summary of Trip Purposes

Purpose                       % Person Trips
--------------------------    --------------
Family & personal business    44.6
Work                          14.8
Social & recreational         27.1
School & church               9.8
Work-related                  2.9
Other                         0.8
background information on the context for which telematics devices
should be designed and may suggest scenarios that should
be  used  in  evaluating  safety  and  usability. 

In  contrast  to  the  emerging  understanding  of  the  driving 
task,  less  is  known  about  the  real  use  of  in-vehicle  devices  while 
driving,  in  particular  the  frequency  and  duration  of  use. 


Who  Are  the  Users? 

Almost  any  adult  has  the  potential  of  driving,  with  the  determi¬ 
nation  being  made  by  completing  simple  driver  licensing  re¬ 
quirements.  Thus,  in  some  ways,  the  driving  population  repre¬ 
sents  the  population  of  candidate  users  for  office  computer 
systems.  For  most  states,  the  percentage  of  drivers  of  either 
gender  is  within  1%  of  being  equal  (Highway  Statistics  2003, 
2004a).  Data  with  regard  to  age  are  shown  in  Fig.  35.1.  Notice 
that  the  percentage  of  the  population  that  is  licensed  hits  80%  at 
age  21  and  remains  at  that  level  until  the  mid-70s.  Thus,  in  de¬ 
signing  in-vehicle  systems  for  motor  vehicles,  few  adults  can  be 
excluded,  which  differs  from  the  design  of  office  computer  sys¬ 
tems,  where  the  emphasis  is  on  the  working  population  (gener¬ 
ally  less  than  65  years  old).  Further,  because  of  a  wide  age  range 
and  other  differences,  significant  differences  in  individual  per¬ 
formance  can  be  expected.  For  example,  in  UMTRI  telematics 
studies,  older  drivers  typically  required  one  and  a  half  to  two 
times  longer  to  complete  tasks  than  younger  drivers  (Green, 
2001d).  This  fact,  along  with  the  requirement  to  design  and  test 
for  the  reasonable  worst-case  drivers,  makes  testing  drivers  over 
age  65  imperative. 

In  contrast  to  users  of  computers,  operators  of  motor  vehi¬ 
cles  must  be  licensed.  In  the  United  States,  the  process  of  be¬ 
coming  a  licensed  driver  begins  with  obtaining  a  copy  of  the 
state driving manual and learning the state’s traffic laws. Candidates
must also pass vision tests (see
http://www.lowvisioncare.com/visionlaws.htm) and take a test of
rules of the road to obtain a learner’s permit, often on their 16th
birthdays. Consistent with the increasingly common practice of
graduated driver
licensing,  the  learner  can  drive  at  restricted  times  with  adult  su¬ 
pervision,  must  generally  complete  a  driver’s  education  class 
(which  often  includes  gory  crash  movies  to  convince  teenagers 
not  to  drink  and  drive  and  to  wear  seat  belts),  and  after  a  few 
years  and  an  on-the-road  test,  is  licensed  to  drive.  (For  details,  see 
http://www.highwaysafety.org/laws/state_laws/grad_license.html,




FIGURE  35.1.  Distribution  of  driver  age  groups  developed  from 
U.S.  Department  of  Transportation  data. 


http://www.insure.com/auto/teenstates.html, and
http://www.hsrc.unc.edu/pubinfo/grad_overview.htm.) The rationale for
graduated licensing is to provide new drivers with more experience
under less risky conditions. To put this in perspective, Mayhew,
Simpson, and Pak (2003) show that crash rates per 10,000
novice  drivers  drop  dramatically  with  time,  being  about  120, 
100,  and  70  after  one,  three,  and  six  months  of  being  licensed. 
Similarly,  recognizing  the  increased  risk  of  elderly  drivers,  some 
states  have  special  renewal  procedures  for  them  (IIHS,  2006). 

In  the  United  States,  obtaining  a  commercial  driver’s  license, 
which  is  needed  to  drive  buses,  large  trucks,  and  other  vehicles, 
is  a  more  complex  process.  Most  candidates  either  obtain  (a)  on- 
the-job  training,  (b)  training  integrated  into  their  lifestyle  (using 
machinery  on  a  farm),  or  (c)  training  at  truck  driving  schools 
(Sloss  &  Green,  2000).  That  population  tends  to  be  older  than 
the  working  population  as  a  whole,  and  is  predominantly  male. 

Driver  licensing  practice  varies  from  country  to  country.  In 
Japan,  for  example,  the  failure  rate  for  the  basic  licensing  exam 
is  much  higher  than  in  the  United  States,  and  there  is  much 
greater  use  of  special  schools  to  train  drivers. 


What  Kinds  of  Vehicles  Do  People  Drive 
(The  Platform  Question)? 

For  computers,  people  are  concerned  about  the  (a)  brand, 
(b)  amount  of  memory,  (c)  processor  speed,  (d)  capacity  of  the 
hard  drive,  (e)  type  and  version  of  operating  system  (Windows, 
Mac,  or  Linux),  (f)  type  and  version  of  browser,  and  so  forth. 
The  hardware  rarely  moves  very  far  from  its  initial  location; 
however,  the  content  of  individual  computers  is  in  a  state  of 
flux,  being  constantly  updated  over  a  life  span  of  just  a  few 
years. Fortunately, the physical interface is fairly consistent: a
QWERTY keyboard, mouse, and often a 17-inch or 19-inch monitor.
The on-screen “desktop” is a more flexible space than the
motor  vehicle  instrument  panel. 

Unlike  personal  computers,  a  motor  vehicle  is  almost  com¬ 
pletely  identified  by  its  make,  model,  and  year.  Updates  over  an 
average  11-year  life  span  are  rare.  In  most  countries  (at  least 
where  there  is  left-hand  drive),  many  aspects  of  driving  are  fairly 
consistent:  the  (a)  input  devices  (steering  wheel,  brake,  and 
throttle),  (b)  method  of  operation,  (c)  location,  and  (d)  primary 
displays  (windshield  and  mirrors).  In  contrast,  there  is  no  con¬ 
sistency  in  the  controls  or  displays  for  telematics  interfaces.  Fur¬ 
thermore,  although  new  motor  vehicle  models  are  offered  once 
per  year,  major  changes  typically  occur  every  five  years.  Computer 
software  and  hardware  model  upgrades  occur  almost  continu¬ 
ally,  with  a  three-year-old  computer  generally  being  considered 
obsolete.  Thus,  the  hardware  life  cycles  of  the  two  contexts  are 
quite  different. 

According  to  the  U.S.  Department  of  Transportation,  237  mil¬ 
lion  motor  vehicles  were  registered  in  the  United  States  in  2003 
(Highway  Statistics  2003,  2004b).  Of  them,  57.3%  were  automo¬ 
biles,  40.1%  were  trucks,  2.3%  were  motorcycles,  and  0.3%  were 
buses.  In  less-developed  countries,  motorcycles  make  up  a 
much  larger  fraction  of  the  vehicle  fleet. 

For  2004,  North  American  car  and  truck  production  was  16.9 
million  vehicles  (Tierney,  2005).  Many  people  do  not  realize  that 
the  United  States  now  produces  more  trucks  than  cars.  The  pre- 
TABLE 35.8 Top 20 Vehicles in U.S. (Sales in 2004) from Wards
Auto World
(Bold = trucks)

Rank    Make         Model                  Sales
----    ---------    -------------------    -------
1       Ford         F-Series               901,931
2       Chevrolet    Silverado              680,768
3       Toyota       Camry/Camry Solara     426,990
4       Dodge        Ram Pickup             426,289
5       Honda        Accord                 386,770
6       Honda        Civic                  309,196
7       Chevrolet    Impala                 290,259
8       Ford         Explorer               276,303
9       Chevrolet    Malibu                 268,017
10      Ford         Taurus                 248,148
11      Toyota       Corolla                243,208
12      Dodge        Caravan                242,307
13      Nissan       Altima                 235,889
14      GMC          Sierra                 213,756
15      Ford         Focus                  208,339
16      Chevrolet    Cavalier               195,275
17      Chevrolet    Trail Blazer           193,862
18      Chevrolet    Tahoe                  186,161
19      Ford         Escape                 183,430
20      Jeep         Grand Cherokee         182,313

dominance of trucks is due to the popularity of minivans, sport
utility vehicles (SUVs), and light trucks for personal use, not
the production of heavy trucks for commercial use. Table 35.8
summarizes U.S. sales, including vehicles made in Japan, which
comprise a significant fraction of U.S. sales. Trucks dominate
sales. Thus, as designers think about packaging telematics
applications, trucks are a likely platform.


WHAT  MEASURES  OF  SAFETY 
AND  USABILITY  ARE  OF  INTEREST? 
(THE  EMPIRICAL  MEASUREMENT  ISSUE) 


As was noted earlier, usability is difficult to achieve without
empirical measurement. First impressions suggest that the
measurement of usability of office computer and web applications,
and the measurement of the usability of telematics applications,
are quite similar. In an office, one measures task completion
time, errors, and ratings of ease of use. A typical usability lab has
(a) a one-way mirror, (b) cameras, (c) video editing equipment,
(d) audio mixers, and (e) at least two rooms, one for the subject
and one for an experimenter. (See chapter 57.)

In a typical laboratory for examining telematics, the same
measures may be obtained; however, other driving-specific
measures, as listed in Table 35.9, may also be obtained, especially in
driving simulators and on-the-road tests. (See Green, 1995a, b,
c.) In addition, a host of other performance measures often
used in non-HCI contexts are also collected: (a) ratings
of workload (Cooper-Harper ratings, NASA TLX), (b) measures
of situation awareness, and (c) measures of object and event
detection (pertaining to traffic). (See Roskam et al., 2002;
Tijerina, Angell, Austria, Tan, & Kochhar, 2003; Johansson et al.,
2004.) Because the user is engaged in concurrent performance
of manual control and information management tasks, the
situation is in some ways similar to that described by Landry in
the aerospace chapter. (See chapter 36.)

TABLE 35.9. Some Driving-Specific Usability Measures

Category      Measure

Lateral       Number of lane departures
              Mean and standard deviation of lane position
              Standard deviation of steering wheel angle
              Number of larger steering wheel reversals
              Time to line crossing
              Steering entropy

Longitudinal  Number of collisions
              Time to collision
              Headway (time or distance to lead vehicle)
              Mean and standard deviation of speed
              Speed drop during a task
              Heading entropy
              Number of braking events over some g threshold

Visual        Number of glances
              Mean glance duration
              Maximum glance duration
              Total eyes-off-the-road time
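
Several of the measures in Table 35.9 are simple summaries of logged data. The following is a minimal sketch, not a standard implementation: the function names, the 0.9 m departure limit, the use of the sample standard deviation, and the example data are all assumptions made for illustration.

```python
from statistics import mean, stdev


def lateral_measures(lane_offsets_m, departure_limit_m=0.9):
    """Summarize lateral control from equally spaced lane-position samples.

    lane_offsets_m: offset of the vehicle center from the lane center (m).
    departure_limit_m: hypothetical offset beyond which the vehicle is
    treated as having left the lane.
    """
    outside = [abs(x) > departure_limit_m for x in lane_offsets_m]
    # A departure is counted each time the vehicle goes from inside to outside.
    departures = sum(
        1
        for was_out, is_out in zip([False] + outside[:-1], outside)
        if is_out and not was_out
    )
    return {
        "mean_lane_position_m": mean(lane_offsets_m),
        "sd_lane_position_m": stdev(lane_offsets_m),  # sample SD (assumed)
        "lane_departures": departures,
    }


def visual_measures(glance_durations_s):
    """Summarize glances away from the road (one duration per glance, in s)."""
    return {
        "number_of_glances": len(glance_durations_s),
        "mean_glance_duration_s": mean(glance_durations_s),
        "max_glance_duration_s": max(glance_durations_s),
        "total_eyes_off_road_time_s": sum(glance_durations_s),
    }


# Made-up example data for one task.
offsets = [0.0, 0.5, 1.0, 0.2, -1.2, -1.0, 0.0]
glances = [1.0, 2.5, 0.5]
lateral = lateral_measures(offsets)
visual = visual_measures(glances)
print(lateral)
print(visual)
```

In practice the departure limit depends on lane and vehicle width, so it would be set per test road rather than fixed.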

A major challenge in assessing the safety and usability of
telematics is dealing with the trade-offs that drivers naturally
make. The impression is that when preoccupied with an
in-vehicle task, drivers lose awareness of the driving context, that is,
situation awareness. Drivers attempt to compensate by slowing
down (to make driving easier), allowing for larger headways
and, if very preoccupied, paying less attention to steering (so
lane variance and the number of lane departures increase);
however, drivers can respond in strange ways. For example, if
asked to use two different in-vehicle systems, one of which is
not well designed, they might attempt to maintain equal
performance on both: slow down more for the more difficult
interface, but compensate by having better steering performance
for the poorer interface. Assessment is difficult because the
trade-off functions for all of these measures are unknown.
One strategy used to overcome the trade-off problem is to
minimize the opportunity for trade-offs. For example, this might
include using cruise control to fix the speed (and in some cases,
headway) and providing incentives and feedback to maintain
driving performance, so only task time and errors might trade off.

HOW  ARE  TELEMATICS  EVALUATED? 


The  assessment  of  mobile  applications  often  occurs  in  contexts 
other  than  simulators  and  real  roads  as  suggested  above.  Table 
35.10  provides  a  summary  of  the  contexts  and  their  strengths 
and  weaknesses.  (See  Green,  1995a,  for  additional  details.) 

Over the last decade, enormous improvements have been
made in the quality of the tools available for human factors
evaluations of in-vehicle systems. These include:

1. Reductions in the size of video cameras (to that of a postage
stamp) and their cost.

2. Significant improvements in the fidelity, along with
significant reductions in the cost, of wide field-of-view driving
simulators.

3. Electronic innovations that allow packaging compact
instrumentation systems in vehicles and that reduce power
consumption (and heat generation).

4. GPS systems for precise tracking of vehicle location.

5. Digital cellular phones for remote downloading of vehicle
data.


TABLE 35.10. Evaluation Contexts

Focus groups

   Method and Equipment
   • Groups of 8 to 12 people demographically similar
     to customers sit around a table and discuss a product
     or service guided by a facilitator
   • Camera is often behind one-way mirror
   • Generally done in multiple cities (one or two groups/city)
   • Often conducted by a marketing firm

   Advice and Comments
   • Useful in getting ideas for product concepts, but not
     predictions of the safety or usability of new products
     because the products are never used
   • Approach might be used by manufacturers when a
     usability test might be more appropriate
   • Generally no quantitative data
   • Essential to report actual quotes from participants, not
     what the facilitator recalls

Clinics

   Method and Equipment
   • Customers in various cities are given the opportunity to
     experience a new product and its competition, often
     two or three vehicles, side by side
   • Customers say which product or feature they prefer

   Advice and Comments
   • Only exposes users to a limited number of options
   • Performance data often not collected
   • Approach is commonly used by industry
   • Because the results are highly proprietary, published
     studies are rare

Part task simulation

   Method and Equipment
   • A sample of users operate the device (e.g., computer
     simulation of a new radio) and user task times, errors,
     and comments are recorded
   • Test facility is not sophisticated

   Advice and Comments
   • Not done that often
   • Relatively less common now than in the past as
     simulators have improved and instrumented vehicles
     have become more common

Driving simulator

   Method and Equipment
   • Typically driving simulators are fixed base (no motion)
     and cost $25,000 to $250,000 each
   • Simulators at manufacturers tend to be in the 1-3 million
     dollar range, though some are much more (e.g., Ford is
     about $10,000,000)
   • 1-5 projectors with total 40-210 degree forward field
     of view, real vehicle cab, steering system with torque
     feedback, and realistic sound
   • Rear image may be projected or mirrors may be
     replaced with small LCDs
   • For advice, see Green, Nowakowski, Mayer, and
     Tsimhoni, 2003; Green, 2005

   Advice and Comments
   • Operation requires considerable experience
   • Simulator sickness is a major problem
   • Each experiment requires construction of a test
     road/world and scripting the behavior of vehicles and
     pedestrians
   • Facility can require considerable space (e.g., 1,000
     square feet)
   • Generally requires large number of fixed small (lipstick
     or smaller) cameras
   • Best-known vendors in U.S. are Systems Technology
     Inc. ($5,000-$25,000, http://www.systemstech.com/
     content/view/23/39/), GlobalSim Corporation
     ($75,000-$150,000, http://www.drivesafety.com/), and
     OKTAL in Europe (http://www.scaner2.com/php/)

Instrumented vehicle on test track or public roads

   Method and Equipment
   • Production vehicle (usually a car and often in the past,
     a station wagon) is fitted with cameras aimed at driver,
     forward scene, instrument panel, and lane markings,
     and with sensors for steering wheel angle, brake
     pressure, speed, and headway
   • Eye fixation system may also be provided
   • System of interest is also installed
   • In a typical experiment, the driver is asked to follow a
     test route while a back seat experimenter operates the
     test equipment

   Advice and Comments
   • Typical cost is at least $100,000-$250,000 though
     some low-cost systems may utilize a single camcorder
     on a vertically-mounted curtain rod in the back seat
     aimed at the instrument panel
   • Problem in the past has been finding enough space and
     power for the equipment, which has been solved by
     laptops and lipstick cameras
   • Commonly, instrumented research test vehicles are
     used, though there is increasing use of systems that
     can be installed in the subject's own vehicle

Operational field test

   Method and Equipment
   • Compact instrumentation is installed in a fleet of
     vehicles (10-50)
   • Each vehicle is borrowed by a potential user for a week,
     a month, or even a year
   • Driving performance is surreptitiously recorded by the
     vehicle
   • Unlike an instrumented car, continuous video is not
     recorded
   • In addition to data recorded by the instrumented
     vehicle, GPS-determined location is also recorded
   • Vehicles are periodically polled for data (and data is
     automatically dumped) by an independent digital
     cellular phone
   • Test is confined to a single metropolitan area

   Advice and Comments
   • Each test requires unique instrumentation
   • Tests are very expensive ($10,000,000 to $40,000,000)
     and can only be conducted with significant government
     support
   • Experiment generally lasts several years
   • Planning stage for experiments takes several years
   • At any given time, there may be only one operational field
     test in progress in the U.S.

Development of better, lower-cost eye-fixation systems is still
needed, though the leading vendors of hardware for driving
research, Seeing Machines, the manufacturer of FaceLab (http://
seeingmachines.com/), and Smart Eye (http://www.smarteye.se),
are making progress. Challenges include their accuracy and
ability to hold calibration, dealing with glasses (common for older
drivers), and operation in bright sunlight.


WHAT  DESIGN  DOCUMENTS  EXIST 
FOR  TELEMATICS? 


What  Types  of  Documents  Are  There? 

To establish that a product is safe and easy to use, empirical
measurement is important. Beyond verification, findings from
such measurements can be fed back to the design team to
improve the design.

Design is also guided by product and service specifications.
In the case of office applications, style guidelines exist for
various platforms and applications. (See chapter 47 by Mayhew for
related information on requirements specification.) For
telematics applications, publicly available, product-specific interface
guidelines do not exist, but there are other types of important
written materials (Green, 2001a, 2001b; Schindhelm et al.,
2004). Automotive design documents fall into five general
classes: (a) principles, (b) information reports, (c) guidelines,
(d) recommended practices, and (e) standards. Principles give
high-level recommendations for design and are similar to those
found in office HCI applications, such as “design interfaces to
minimize learning.”

Information report is a term used by SAE to refer to a
compilation of engineering reference data or educational material
useful to the technical community. Information reports do not
specify how something should be designed, but provide useful
background information.

Guidelines give much more specific advice about how to
design an interface element. For example, guideline 9 in chapter 7
of Green (1995) stated, “Turn displays should show two turns
in a row when the turns are in close proximity” (p. 41), where
close proximity means 0.1 miles apart or less. The impact of
guidelines can depend on the issuing organization. For
example, automotive design guidelines written by research
organizations have no real authority. Guidelines written by the
International Organization for Standardization (ISO), while technically
voluntary, can become requirements, because in some
countries, type approval (approval for sale) requires compliance with
ISO guidelines. For vehicle models sold worldwide, global
manufacturers find building common vehicle systems that comply
with ISO standards to be less costly than building noncompliant,
country-specific systems. In Japan, the Japan Automobile
Manufacturers Association (JAMA, 2004) has a set of guidelines for
navigation systems. Although theoretically voluntary, “requests”
from the National Police Agency make the JAMA guidelines a
requirement for all original equipment manufacturers (OEMs).

Recommended practice, a term used by SAE, refers to
methods, procedures, and technology that are intended as guides to
standard engineering practice. The content may be of a general
nature, or may present data that have not yet gained broad
engineering acceptance. Common practice is to follow a
recommended practice, if possible. It is recognized that a product
liability action against a product (especially in the United States)
is extraordinarily difficult to defend if the product design
deviates from recommended practice.

A standard specifies how something must be done and
includes broadly accepted engineering practices, or specifications
for a material, product, process, procedure, or test method. In
the case of SAE, a standard is technically voluntary because SAE
has no enforcement powers. Even more so than with a
recommended practice, a product not complying with an SAE
standard is almost indefensible in a product liability action, and is
unlikely to be purchased from a supplier by an OEM.

Non-ISO  Documents 

Table 35.11 provides a summary of the design document
activities to date, excluding those of the ISO (described later). As
indicated in the table, the EU guidelines are quite brief and are
merely statements of very general principles (for example, that
interfaces should be simple to operate). Others contain a
limited set of specifics (Alliance, JAMA, SAE J2364 and J2365,
TRL), and others contain significant design details (Battelle,
HARDIE, UMTRI). Readers should not take the view that the
newer or longer sets of guidelines are necessarily better.
Representatives from industry (with academic involvement for SAE
efforts) developed the Alliance, EU, JAMA, and SAE guidelines.
Contractors (Battelle and UMTRI) working for the U.S.
Department of Transportation developed the guidelines bearing their
names. The TRL and HARDIE guidelines were contracted
efforts in Europe.

See http://www.umich.edu/~driving/guidelines/guidelines.html
for unofficial electronic copies. (To avoid copyright
problems, only draft and not final versions have been posted for SAE
and ISO documents.)

All of these guidelines are voluntary except for the JAMA
guidelines, as noted earlier. In the United States, the most
important guidelines from the perspective of regulation are SAE
Recommended Practices J2364 and J2365, especially J2364,
and the Alliance guidelines. They are all discussed later in this
chapter. Many manufacturers and suppliers are complying
with them.

The guidelines developed by the Alliance, EU, and JAMA are
being revised and that process is expected to continue for
several years. For updates, readers are therefore advised to see
either the authoring organization or the previously noted UMTRI
website.

TABLE 35.11. Major Non-ISO Telematics Guidelines and Recommended Practices

Alliance guidelines
   Reference: Alliance of Automobile Manufacturers (AAM)
      (2003, June 17); version 3
   Size (pages): 67
   Comments: Restatement of EU principles, plus considerable
      details and rationale, to be used by almost all
      manufacturers in U.S.; key sections are
      principles 2.1 and 2.2, which still need
      development; still being updated by AAM

Battelle guidelines
   Reference: Campbell, Carney, and Kantowitz (1997)
   Size (pages): 261
   Comments: Voluminous document with references to interface
      design, heavy on trucks. User interface has
      been said to have a Windows OS flavor; includes
      physical ergonomics information (e.g., legibility,
      control sizes), which is not included in the
      UMTRI guidelines.

EU guidelines
   Reference: Commission of the European Communities (1999)
   Size (pages): 2
   Comments: Mostly "motherhood" statements. Some revisions
      are expected.

HARDIE guidelines
   Reference: Ross, Midtland, Fuchs, Pauzie, Engert, Duncan,
      Vaughan, Vernet, Peters, Burnett, and May (1996)
   Size (pages): 480
   Comments: Early set of European guidelines, less data than
      UMTRI or Battelle.

JAMA guidelines
   Reference: Japan Automobile Manufacturers Association (2004)
   Size (pages): 15
   Comments: First set of detailed design guidelines for driver
      interfaces. These guidelines are voluntary in
      Japan but followed by all OEMs there and
      sometimes by aftermarket suppliers. Some
      aspects are particular to Japan. Device location
      restrictions are important.

SAE J2364 ("15-second rule")
   Reference: Society of Automotive Engineers (2004, August)
   Size (pages): 13
   Comments: Specifies the maximum allowable task time and
      test procedures for navigation system tasks
      performed while driving for systems with visual
      displays and manual controls; also describes an
      interrupted vision (visual occlusion) method.
      See also SAE J2678.

SAE J2365 (SAE calculations)
   Reference: Society of Automotive Engineers (2002, May)
   Size (pages): 23
   Comments: Method to compute total task time for tasks not
      involving voice; used early in design to estimate
      compliance.

TRL checklist
   Reference: Quimby (1999)
   Size (pages): 18
   Comments: Simple checklist

UMTRI guidelines
   Reference: Green, Levison, Paelke, and Serafin (1993)
   Size (pages): 111
   Comments: First set of comprehensive design guidelines for
      the U.S. Includes principles, general guidelines,
      and specific design criteria with an emphasis on
      navigation interfaces.

ISO  Documents 

Much of the ISO activity over the last decade has occurred
under the auspices of International Organization for
Standardization Technical Committee 22, Subcommittee 13 (ISO TC 22/SC
13, Ergonomics Applicable to Road Vehicles), in particular
Working Group 8 (WG8: Transport, Information, and Control
Systems or TICS; Green, 2000a). WG8 has about 50 delegates from
the major vehicle producing nations, with the 15 most active
members appearing at meetings held 2 to 3 times per year,
usually in Europe.

Table 35.12 shows the standards and technical reports
developed (or in progress) by Working Group 8 that pertain to
telematics. (For the complete list, see ISO, 2007.) Most of the
standards can be quite general, sometimes not containing the detail
found in the Battelle, HARDIE, or UMTRI guidelines. For a
variety of reasons, they tend to emphasize measurement methods
and organization over specifications and safety limits. To
promote international harmonization, national standards
organizations, technical societies (e.g., SAE), and government
organizations (e.g., U.S. Department of Transportation) often permit ISO
standards to supersede their own standards, so ISO standards
are very important.

Note: ISO documents follow a very well-defined, three-year
process through several stages (Preliminary Work Item [PWI],
Committee Draft [CD], Draft International Standard [DIS], Final
Draft International Standard [FDIS], and International Standard
[IS]) as they are passed from the working group to the
subcommittee to the technical committee, and finally, to the secretariat
for review and approval. The major hurdles are the working
group and subcommittee, where passage requires two-thirds of
the nations participating. The emphasis of this process is on
building a voluntary consensus (ISO, 2003). Some items that are
more informational in nature become technical reports instead of
standards. Because of the limited number of experts available,
WG8 is very selective in adding items to its work program.

TABLE 35.12. ISO TC 22/SC 13/WG 8 Work Program

Dialogue management principles and compliance procedures
   Status: Std 15005
   Summary: Provides high-level ergonomic principles
      (compatibility with driving, consistency, simplicity, error
      tolerance, etc.) to be applied in the design of dialogues
      that take place between the driver of a road vehicle and an
      in-vehicle information system while the vehicle is in
      motion. Provides general directions on how to test for
      compliance.

Specifications and compliance procedures for in-vehicle auditory
presentation
   Status: Std 15006
   Summary: Provides requirements for auditory messages including
      signal levels, appropriateness, coding, etc., along with
      compliance test procedures.

Measurement of driver visual behavior
   Status: Std 15007, part 1 and TS part 2
   Summary: Generally describes video-based equipment (cameras,
      recording procedures, etc.) and procedures (subject
      descriptions, experiment design parameters, tasks,
      performance measures, etc.) used to measure driver visual
      behavior.

Legibility (visual presentation of information)
   Status: Std 15008
   Summary: Provides requirements for character size, contrast,
      luminance, etc. and specifies how they are to be measured.

Warning system messages
   Status: TR 16352
   Summary: This state-of-the-art literature review (circa 2002)
      of warning systems covers topics such as alarm theories; the
      design of visual, auditory, and tactile warnings;
      redundancy; etc. The report contains summaries of a
      significant number of studies.

Message priority
   Status: TS 16951
   Summary: Provides two methods for determining a priority index
      for in-vehicle messages (e.g., navigation turn instruction,
      collision warning, low oil) presented to drivers while
      driving. For one method, priority is based on criticality
      (likelihood of injury if the event occurs) and urgency
      (required response time), determined on four-point scales by
      experts.

Suitability of TICS while driving
   Status: Std 17287
   Summary: Generally describes a process for assessing whether a
      specific TICS, or a combination of TICS with other
      in-vehicle systems, is suitable for use while driving. It
      addresses a user-oriented TICS description and context of
      use, TICS task description and analysis, assessment, and
      documentation.

Occlusion method to assess distraction
   Status: Std 16673 (in press)
   Summary: Describes how the visual demand of a display can be
      assessed by periodically blocking (occluding) the driver's
      view of the display. Includes requirements for number and
      training of subjects, test hardware, viewing and occluded
      periods, and two metrics for data analysis.

Lane change test
   Status: NWI 26022
   Summary: Proposes a procedure for testing the demand of
      telematics devices using a PC-based driving simulator.
      Subjects perform a number of lane changes, some of which
      occur while using an in-vehicle device. The test is based on
      results of the Advanced Driver Attention Metrics (ADAM)
      project. Complements SAE and AAM procedures.

SAE J2364 (The 15-Second Rule)

Of the design documents in the literature, few have generated
as much discussion as SAE Recommended Practice J2364, also
known as “The 15-Second Total Task Time Rule” or “15-Second
Rule.” SAE J2364 establishes two procedures for determining if
a navigation-system-related data entry task involving visual
displays and manual controls is excessive while driving (Society of
Automotive Engineers, 2004a, b; Green, 1999c). The practice
applies to OEM and aftermarket products. The practice does
not and should not apply to voice interfaces or passenger
operation because the task demands are fundamentally different.
Given the practice's performance basis, there is no reason its
requirements would not also make sense for similar tasks in
other systems.

In developing this practice, criteria were selected to be related
to crash risk, likely to lead to design improvements, and easy to
measure. Some suggested this document should contain design
criteria, such as a specification for a maximum number of items
on a menu. It became clear, however, that such a design practice
needed a performance basis, and specifying a design standard
would constrain innovation, so a performance-based practice was
developed instead. Many measures were then proposed, with
eyes-off-the-road time being most popular. The logic behind this
measure is simple. If drivers are not looking at the road while
driving, they are more likely to crash, with the likelihood of a crash
increasing with increased eyes-off-the-road time. As shown in
Green (1999d), this is reflected in the following equation:

\[
\text{\# U.S. deaths in 1989} = \text{market.penetration.fraction} \times
\left[ -0.133 + 0.0447 \, (\text{mean glance time})^{1.5} \, (\text{\# of glances}) \, (\text{frequency}) \right]
\]

where

market.penetration.fraction is the fraction of vehicles with a
system (10% = 0.1),
mean glance time is the number of seconds drivers looked away
from the road,
# of glances is the number of times the device is looked at
during each use sequence, and
frequency is the number of use sequences per week.

As an example, a task, say, entering an address, could have
a mean glance duration of 2.7 seconds, require an average of
27.5 glances per entry, and be performed twice per week.
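
The example values can be substituted into the equation directly. The sketch below is illustrative only: the function and parameter names are hypothetical, and the exponent of 1.5 on mean glance time and the 10% market penetration are assumptions made for this calculation.

```python
def predicted_deaths(market_penetration_fraction, mean_glance_time_s,
                     glances_per_use, uses_per_week, exponent=1.5):
    """Evaluate the glance-based fatality equation for one task.

    The exponent default (1.5) is an assumption for this sketch.
    """
    exposure = (mean_glance_time_s ** exponent) * glances_per_use * uses_per_week
    return market_penetration_fraction * (-0.133 + 0.0447 * exposure)


# Address-entry example from the text, with an assumed 10% market penetration.
estimate = predicted_deaths(
    market_penetration_fraction=0.1,
    mean_glance_time_s=2.7,
    glances_per_use=27.5,
    uses_per_week=2,
)
print(f"Predicted deaths: {estimate:.2f}")
```

Because the result scales linearly with the number of glances and with frequency of use, halving either one roughly halves the prediction, while shortening the mean glance has a more than proportional effect.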
Unfortunately, eyes-off-the-road time is time-consuming,
expensive to measure, and requires specialized equipment and
skilled personnel. It also requires a fully functional system
installed either in a simulator or test vehicle, something that is
only available late in design. Invariably, problems are identified
so close to production that few, if any, changes can be made.
One of the key lessons from the Gould and Lewis design
principles from HCI is that feedback from early on in the design
process is critical.

A review of the literature (Green, 1999d) found that task
time while driving was highly correlated with eyes-off-the-road
time. Looking at in-vehicle systems is time shared with driving,
so the more the driver needs to look, the longer the task will
take. Further, dynamic (on-the-road) task time and
eyes-off-the-road time are correlated with static task time, the time to
complete the task when the vehicle is parked (e.g., Green, 1999d;
Farber et al., 2000; Young et al., 2005). Static task time is easy to
measure and can be done using prototypes available early in
design.

The current SAE J2364 test procedure (Society of
Automotive Engineers, 2004a) requires that 10 subjects between the
ages 45 and 65 be tested. Each subject completes five practice
trials and three test trials in a parked vehicle, simulator, or
laboratory mockup. The test cases to be examined (addresses for
destination entry) are to be representative of what is planned
for production.

In the static method, the subject performs the task, with the
duration being from when the subject is told to start until the
goal is achieved. Timing is continuous except for computational
interruptions equal to or greater than 1.5 seconds, a time
period during which the device is computing (for example, a
route). If feedback is provided to the driver, that period is
excluded from the 15-second task time limit. The interface
complies with J2364 if the mean of the log of the task times is less
than the log of 15 seconds. (Logs were used to reduce the
influence of long outliers.)
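
The compliance check described above can be sketched as follows. This is a minimal illustration rather than the procedure text of J2364: the function names and the example times are hypothetical, and how trials are pooled across subjects is not addressed here.

```python
import math


def net_task_time(elapsed_s, computation_pauses_s, min_pause_s=1.5):
    """Subtract computational interruptions of at least min_pause_s seconds."""
    excluded = sum(p for p in computation_pauses_s if p >= min_pause_s)
    return elapsed_s - excluded


def complies_with_static_limit(task_times_s, limit_s=15.0):
    """True if the mean of the log task times is below the log of the limit."""
    mean_log = sum(math.log(t) for t in task_times_s) / len(task_times_s)
    return mean_log < math.log(limit_s)


# Made-up static task times (seconds), including one long outlier.
times = [12.0, 14.0, 13.0, 30.0, 11.0]
arithmetic_mean = sum(times) / len(times)
passes = complies_with_static_limit(times)
print(arithmetic_mean, passes)
```

Comparing mean logs is equivalent to comparing the geometric mean of the task times with 15 seconds, which is why a single long trial moves the result less than it would move an arithmetic mean.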

It must be emphasized that the 15-second limit is for a static
test. On the road, drivers will take 30% to 50% longer overall, and
furthermore alternate between looking inside the vehicle and
looking at the road. The test procedure does not suggest drivers
can safely look away from the road continuously for 15 seconds.

Some have argued that use of static task time fails to
identify interfaces requiring long glance durations. However,
analysis of real products shows the primary risk is from tasks that take
too long to complete. In fact, it is very difficult to think of
driver tasks for navigation systems that have short total task times
but very long glance durations. In real driving, people truncate
glances to the interior when the glances become too long but
tend to complete tasks, even if they are unacceptably long. In
practice, eliminating tasks with long completion times (the
worst tasks) also eliminates many of the tasks with long glance
durations.

Nonetheless, J2364 provides an alternative method involving
visual occlusion. In that method, either the subjects wear
special LCD glasses, or vision to the device is otherwise periodically
interrupted, simulating looking back and forth between the
road and the device. (Unlike driving, though, subjects do
nothing in the occlusion interval.) The device is visible for 1.5
seconds and occluded for 1 to 2 seconds, with 1.5 seconds being
recommended. Compliance is achieved if the sum of the log of
the viewing times is less than the log of 20 seconds.

Alliance  Principles 

The Alliance of Automobile Manufacturers, the trade association
of most major manufacturers of automobiles in the United States
(except for Nissan, Honda, Subaru, and Isuzu), has devoted
considerable effort to developing design guidelines. Although the
document scope states that it applies only to advanced
information and communication systems, it should apply to
traditional information or communications systems (e.g.,
entertainment systems), since the low-level tasks (reading displays,
pressing buttons, etc.) and the manner in which those tasks
interfere with driving are the same for all systems.

Their design principles document is a detailed elaboration of the
24 principles in the EU guidelines (e.g., “The system should be
located ... in accordance with relevant ... standards ...”; “No
part ... should obstruct any vehicle controls or displays ...”),
which seem obvious at a high level; however, the important content
of the principles is in how they are achieved. Each principle has
four parts: (a) rationale, (b) criterion/criteria, (c) verification
procedures, and (d) examples.

The most important principle is 2.1: “Systems with visual displays
should be designed such that the driver can complete the desired
task with sequential glances that are brief enough not to adversely
affect driving.” Two alternative sets of criteria are offered.
Alternative A says that the 85th percentile of single-glance
durations should generally not exceed 2 seconds, and task
completion should not require more than 20 seconds of total glance
time. Verification can be achieved by a visual occlusion procedure
(1.5 second viewing time, 1.0 second occlusion time), or by
monitoring eye fixations directly using either a camera aimed at
the face or an eye fixation monitoring system in either a
divided-attention or on-road test.
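
As a rough illustration of Alternative A, the sketch below checks a set
of recorded glance durations against the two numeric criteria (85th
percentile of single-glance durations and total glance time). The
function names, the nearest-rank percentile method, and the sample data
are assumptions made for illustration only; the Alliance document
specifies its own sampling and verification details, which are not
reproduced here.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method (an assumed choice)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_alternative_a(glances_s, max_p85_s=2.0, max_total_s=20.0):
    """Compare glance durations (seconds) for one task with the Alternative A limits."""
    p85 = percentile_nearest_rank(glances_s, 85)
    total = sum(glances_s)
    return {
        "p85_s": p85,
        "total_s": total,
        "passes": p85 <= max_p85_s and total <= max_total_s,
    }


# Hypothetical glance durations (seconds) recorded while completing one task.
short_glances = [1.2, 1.5, 0.9, 1.8, 1.1, 1.4, 1.6, 1.0]
long_glances = [2.5, 2.5, 2.5, 1.0, 1.0]

print(check_alternative_a(short_glances))
print(check_alternative_a(long_glances))
```

In this sketch a task fails if either limit is exceeded, so a task made
up of a few very long glances is rejected even when its total glance
time is short.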

Alternative B requires that the number of lane exceedances does not
exceed the number associated with a reference task such as manual
radio tuning, and that car-following headway should not degrade
under those conditions, either. The verification procedure is
stipulated to be driving on a divided road (either real or
simulated) at 45 mph or less in daylight, on dry pavement, with low
to moderate traffic. Additional details are provided describing the
location of the radio, the stations to choose among, what
constitutes a trial, subject selection (equal numbers of men and
women between the ages of 45 and 65), and so forth.

Although  both  procedures  seem  well  described  on  the  sur¬ 
face,  additional  details  and  constraints  are  needed  to  make 
those  procedures  repeatable.  For  example,  the  differences  in 
performance  between  driving  in  “low”  and  “medium”  traffic 
could  be  quite  considerable.  One  source  of  ideas  is  Lai’s  (2005) 
dissertation,  which  specifically  proposes  using  variations  in  lat¬ 
eral  position  and  variations  in  speed  maintenance  as  assess¬ 
ment  measures.  Nonetheless,  the  principles  represent  a  rea¬ 
sonable  first  step  in  developing  a  test  protocol.  On  the  other 
hand,  on-road  and  simulator  tests  are  extremely  expensive  and 
time-consuming,  often  impractical,  and  occur  too  late  to  have  a 
useful  impact.  It  is  therefore  the  author’s  preference  to  rely  on 
simpler evaluation procedures such as J2364 and the J2365
calculation procedure described in the next section.


WHAT  TOOLS  AND  ESTIMATION  PROCEDURES 
CAN  AID  TELEMATICS  DESIGN? 


SAE J2365 Calculations

SAE J2365 (Green, 1999a) was developed to allow designers and
engineers to calculate completion times early in design, when the
design is still a concept that can easily be modified. As with
J2364, J2365 is for in-vehicle tasks involving visual displays and
manual controls evaluated statically, that is, while parked (or in
a bench-top simulation). SAE J2365 applies to both OEM and
aftermarket equipment. Though intended for navigation systems,
J2365 should provide reasonable estimates for most in-vehicle tasks
involving manual controls and visual displays.

The  calculation  method  is  based  on  the  Goals,  Operators, 
Methods,  and  Selection  rules  (GOMS)  model  described  by  Card, 
Moran,  and  Newell  (1980)  with  task  time  data  from  several 
sources.  (See  Kieras’  chapter  60  for  background  information  on 
GOMS.)  The  keystroke  data  was  drawn  from  UMTRI  studies  of  the 
Siemens  Ali-Scout  navigation  system  (Steinfeld,  Manes,  Green,  & 
Hunter,  1996;  Manes,  Green,  &  Hunter,  1998).  Search  times  were 
based  on  Olson  and  Nilsen  (1987-1988)  and  the  mental  time  es¬ 
timates  were  drawn  from  the  Keystroke-Level  Model  (Card, 
Moran,  and  Newell,  1983)  and  UMTRI  Ali-Scout  studies.  Thus,  the 
times shown in Table 35.13 have been tailored for the automotive
context. (See also Nowakowski, Utsui, & Green, 2000.)

The basic approach involves top-down, successive decomposition of a
task. The analyst divides the task into logical steps. For each
step, the analyst identifies the human and device task operators.
Sometimes analysts get stuck using this approach because they are
not sure how to divide a task into steps. In those cases, utilizing
a bottom-up approach may overcome such roadblocks. For each goal,
the analyst identifies the method used.


The  analyst  is  advised  to  use  paragraph  descriptions  of  each 
method  to  document  them  and  then  convert  those  descriptions 
into  pseudo  code.  All  steps  are  assumed  to  occur  in  series;  mul¬ 
tiple  tasks  cannot  be  completed  at  the  same  time.  Furthermore, 
most  drivers  are  assumed  to  use  only  visible,  noncognitively 
loading  shortcuts.  Invisible  shortcuts  are  likely  to  be  used  only 
by  experts. 

Next, the pseudo code task description is entered into an Excel
spreadsheet. The analyst looks up the associated time for each
operator listed in Table 35.13 and sums them to determine total
task time. To assist in understanding the process, the practice
provides a step-by-step example of entering a street address into a
PathMaster/NeverLost navigation system, a popular U.S. product.
For background on the calculation method, see Green (1999b).
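
The look-up-and-sum step can also be sketched in a few lines of code
instead of a spreadsheet. The operator times below are those listed in
Table 35.13; the function name, the (code, count) step format, and the
sample menu-selection task are hypothetical and are not taken from the
recommended practice itself.

```python
# Operator times in seconds from Table 35.13: (young drivers, older drivers).
OPERATOR_TIMES_S = {
    "Rn": (0.31, 0.53),  # reach near
    "Rf": (0.45, 0.77),  # reach far
    "C1": (0.80, 1.36),  # first cursor keystroke
    "C2": (0.40, 0.68),  # each successive cursor keystroke
    "L1": (1.00, 1.70),  # first letter or space keystroke
    "L2": (0.50, 0.85),  # each successive letter or space keystroke
    "N1": (0.90, 1.53),  # first number keystroke
    "N2": (0.45, 0.77),  # each successive number keystroke
    "E": (1.20, 2.04),   # Enter key
    "F": (1.20, 2.04),   # function keys or Shift
    "M": (1.50, 2.55),   # mental operation
    "S": (2.30, 3.91),   # search the display
    "Rs": (0.00, 0.00),  # system response, scroll one line
    "Rm": (0.50, 0.50),  # system response, new menu
}


def estimate_task_time(steps, age_group="young"):
    """Sum operator times for a serial list of (operator code, count) steps."""
    column = {"young": 0, "older": 1}[age_group]
    return sum(OPERATOR_TIMES_S[code][column] * count for code, count in steps)


# Hypothetical task: reach to the console, decide, find the menu item,
# press the cursor key three times, press Enter, wait for the new menu.
SELECT_MENU_ITEM = [
    ("Rf", 1), ("M", 1), ("S", 1), ("C1", 1), ("C2", 2), ("E", 1), ("Rm", 1),
]

print(round(estimate_task_time(SELECT_MENU_ITEM, "young"), 2))
print(round(estimate_task_time(SELECT_MENU_ITEM, "older"), 2))
```

Because all steps are assumed to occur in series, the estimate is a
plain sum; comparing two candidate interfaces amounts to comparing two
such step lists.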

The J2365 approach makes a number of assumptions, many of which are
also shared with the basic GOMS model. For example, the model
assumes error-free performance, which, while not likely, can be
adjusted for (say by increasing the computed value by 25%).
Further, activities are assumed to be routine cognitive tasks, with
users knowing each step and executing them in a serial manner.
Again, adjustments in computed time can account for users sometimes
forgetting what is next.

Though many of these assumptions are not true, adjustments can be
made for them and many times the adjustments are small.
Furthermore, violations of assumptions tend to affect all
interfaces equally, so decisions about which of several interfaces
is best still hold. As a practical matter, the estimates are good
enough for most engineering decisions. Readers should keep in mind
that J2364 only requires the use of 10 subjects at most, so there
is some error in those estimates. Those errors are likely to be as
large as variability among analysts and among J2365 estimates.


TABLE 35.13. Operator Times (seconds)

Code | Name                             | Operator Description                                                        | Young Drivers (18-30) | Older Drivers (55-60)
Rn   | Reach near                       | From steering wheel to other parts of the wheel, stalks, or pods            | 0.31                  | 0.53
Rf   | Reach far                        | From steering wheel to center console                                       | 0.45                  | 0.77
C1   | Cursor once                      | Press a cursor key once                                                     | 0.80                  | 1.36
C2   | Cursor 2 times or more           | Time/keystroke for the second and each successive cursor keystroke          | 0.40                  | 0.68
L1   | Letter or space 1                | Press a letter or space key once                                            | 1.00                  | 1.70
L2   | Letter or space 2 times or more  | Time/keystroke for the second and each successive letter or space keystroke | 0.50                  | 0.85
N1   | Number once                      | Press a number key once                                                     | 0.90                  | 1.53
N2   | Number 2 times or more           | Time/keystroke for the second and each successive number key                | 0.45                  | 0.77
E    | Enter                            | Press the Enter key                                                         | 1.20                  | 2.04
F    | Function keys or shift           | Press the function keys or Shift                                            | 1.20                  | 2.04
M    | Mental                           | Time/mental operation                                                       | 1.50                  | 2.55
S    | Search                           | Search for something on the display                                         | 2.30                  | 3.91
Rs   | Response time of system-scroll   | Time to scroll one line                                                     | 0.00                  | 0.00
Rm   | Response time of system-new menu | Time for new menu to appear                                                 | 0.50                  | 0.50

Note 1: The keystroke times shown in Table 35.13 include the time to move between keys.
Note 2: System response times to display new menus may be empirically determined.


IVIS Estimates

A more complex estimation procedure is the IVIS DEMAnD model
(In-Vehicle Information System Design Evaluation and Model of
Attention Demand), described by Hankey, Dingus, Hanowski,
Wierwille, and Andrews (2000a, 2000b). The model, which runs under
Windows 98 or Windows NT, allows analysts to calculate a wide range
of performance characteristics for proposed user interfaces. (The
CD can be obtained from the U.S. Department of Transportation,
Federal Highway Administration, Turner-Fairbank Highway Research
Center in McLean, Virginia.)

Consistent  with  common  understanding,  the  model  assumes 
there  are  five  basic  human  resources:  (a)  visual  input,  (b)  audi¬ 
tory  input,  (c)  central  processing,  (d)  manual  output,  and 
(e)  speech  output.  Overload  of  any  one  of  these  resources  will 
affect  task  performance.  The  more  demanding  an  in-vehicle 
task,  the  greater  the  likelihood  of  a  crash.  The  model  does  not 
include  a  haptic  component  because  haptic  displays  are  rare  in 
contemporary  vehicles.  Although  many  have  developed  models 
of  human  performance  that  partition  human  resources  more 
finely,  a  five-component  model  is  sufficient  for  most  in-vehicle 
analyses. The data in the model were based on four experiments:
(a) Gallagher (2001), (b) Blanco (1999), (c) Biever (1999), and
(d) research conducted by Westat, a consulting company. These
experiments concentrated on reading visual displays while driving,
though there was work on auditory information as well.

To  use  the  model,  analysts  need  to  create  a  description  of 
each  task  drivers  perform.  Generally,  that  involves  selecting 
the  task  in  question  from  a  large  library  of  tasks  in  the  data¬ 
base,  modifying  an  existing  description,  or  creating  one  from 
scratch. Tasks are grouped into seven categories: (a) conventional,
(b) search, (c) search-plan, (d) search-plan-interpret,
(e) search-plan-compute, (f) search-compute, and
(g) search-plan-interpret-compute. Analysts need to select the
driver age category (or specify all ages), the traffic density, the
road complexity, the reliance on symbols/labels, the location of
the display, and other characteristics.

The  output  of  the  model  includes  about  20  parameters, 
such  as  (a)  the  expected  number  of  glances,  (b)  total  task  time, 
(c)  ratings  of  mental  demand  and  frustration,  (d)  total  task  de¬ 
mand,  and  so  forth.  In  addition,  the  model  output  specifies  if 
design  thresholds  are  exceeded.  The  model  proposes  two  sets 
of  thresholds:  (a)  yellow-line  and  (b)  red-line.  Yellow-line 
thresholds  were  sets  of  points  at  which  there  was  a  measurable 
degradation in driving performance (p < .05) from baseline
driving in the research conducted to support model development.
Red-line thresholds indicated that a composite group of
surrogate  safety  measures  of  driving  performance  was  substan¬ 
tially  affected.  The  red-line  values  were  determined  primarily 
from  the  literature  and  expert  opinion.  Table  35.14  shows  those 
thresholds. 

Independent validation data for the model have yet to appear
in the literature.


TABLE 35.14. IVIS Yellow and Red Line Thresholds from
Hankey, Dingus, Hanowski, Wierwille, and Andrews (2000a)

Measure                | Affected (yellow) | Substantially Affected (red)
Single glance time     | 1.6 s             | 2.0 s
Number of glances      | 6 glances         | 10 glances
Total visual task time | 7 s               | 15 s
Total task time        | 12 s              | 25 s

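
To show how the Table 35.14 thresholds might be applied to a predicted
or measured value, the following sketch classifies a single measure as
unaffected, yellow-line, or red-line. The dictionary keys, the function
name, and the choice to treat a value equal to a threshold as exceeding
it are assumptions for illustration; the IVIS DEMAnD software itself may
apply the thresholds differently.

```python
# (yellow-line, red-line) thresholds from Table 35.14.
THRESHOLDS = {
    "single_glance_time_s": (1.6, 2.0),
    "number_of_glances": (6, 10),
    "total_visual_task_time_s": (7, 15),
    "total_task_time_s": (12, 25),
}


def classify(measure, value):
    """Return 'red', 'yellow', or 'ok' for one measure (inclusive limits assumed)."""
    yellow, red = THRESHOLDS[measure]
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "ok"


# Hypothetical predictions for one task.
predicted = {
    "single_glance_time_s": 1.7,
    "number_of_glances": 12,
    "total_visual_task_time_s": 6.0,
    "total_task_time_s": 10.0,
}

for name, value in predicted.items():
    print(name, value, classify(name, value))
```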

CLOSING  THOUGHTS 


This  chapter  makes  the  following  key  points: 

1.  Driving  is  quite  different  from  sitting  at  a  desk  in  an  office 
because  of  the  concern  for  crash  risk. 

2. A large number of new telematics applications will appear in
vehicles over the next few years. They should radically reshape
the driver’s task, providing the driver with a flood of
information.

3.  This  flood  of  information  has  the  potential  to  distract  dri¬ 
vers  from  driving.  Distraction  is  associated  with  specific 
types  of  crashes  (single  vehicle  run  off  road,  rear  end)  that 
are  most  common  under  generally  good  driving  conditions. 

4.  Relative  to  other  distractions,  the  number  of  telematics- 
related  crashes  has  been  low  in  the  recent  past;  however, 
current  projections  are  that  in  excess  of  200  people  are 
killed  per  year  in  the  U.S.  in  cell-phone-related  crashes,  and 
the  number  is  growing.  This  number  of  fatalities  exceeds 
that  from  front  air  bags  and  Firestone/Explorer  tire  rollovers, 
both  of  which  had  major  financial  impact  on  industry  and 
led  to  federal  regulatory  changes. 

5.  Workload  managers  may  be  a  long-term  alternative  to  leg¬ 
islation  to  reduce  distraction  that  leads  to  crashes. 

6.  Trips  are  made  for  a  wide  variety  of  purposes  that  need  to 
be  considered  in  assessing  telematics  applications. 

7.  Telematics  need  to  be  developed  for  a  wide  range  of  driver 
ages.  Even  well  into  their  retirement  years,  most  people  still 
drive  and  will  need  to  be  able  to  use  telematics. 

8.  Though  people  commonly  think  of  passenger  cars  as  the 
primary  form  of  personal  transportation  in  the  United 
States,  more  trucks  (especially  pickups,  SUVs,  and  vans)  are 
produced  in  the  United  States  than  passenger  cars,  and 
they  will  be  the  most  common  platform  in  the  future. 

9.  In  assessing  safety  and  usability,  a  wide  variety  of  measures 
of  longitudinal  and  lateral  control  are  used  in  addition  to 
task  completion  time,  errors,  and  subjective  ratings  of  ease 
of  use.  A  major  challenge  in  interface  evaluation  is  dealing 
with  performance  trade-offs  between  measures. 

10.  Over  the  last  few  years,  there  have  been  significant  ad¬ 
vances  in  driving  simulators  and  instrumented  vehicles  that 
have  improved  their  quality  and  reduced  their  cost  for 
safety  and  usability  evaluations;  however,  the  cost  of  these 
systems  may  be  out  of  the  range  of  most  laboratories,  es¬ 
pecially  those  in  academic  settings. 

11.  The  Alliance,  Battelle,  EU,  HARDIE,  JAMA,  SAE,  TRL,  and 
UMTRI,  along  with  ISO,  have  developed  key  design  docu¬ 
ments.  The  SAE  recommended  practices  (especially  SAE 
J2364,  the  15-second  rule)  and  the  Alliance  guidelines  have 
the  most  impact  in  the  United  States,  as  does  JAMA  in  Japan. 
ISO  is  in  the  process  of  developing  a  large  set  of  standards 
and  other  organizations  are  updating  theirs. 

12.  SAE  J2365  and  IVIS  can  assist  in  predicting  task  perfor¬ 
mance  time,  a  simple  measure  for  evaluating  telematics 
safety  and  usability. 

Thus, while the HCI literature provides a framework for test
methods and evaluation, a great deal is specific to the motor
vehicle context because of the safety-critical nature of the
context and the timesharing not found in office activities. To meet
the needs of the future, the cost of the methods needs to be
reduced, and reliable tools, especially for recording eye
fixations, are needed. Significant research is needed to support
the development of driver performance models (and workload
managers) and to understand how drivers use real telematics
applications.

HUMAN-COMPUTER INTERACTION
IN  AEROSPACE 


Aviation’s interest in human-computer interaction (HCI) concepts
began with the need to understand how pilots could interact with
mechanical displays that indicated such things as airspeed,
altitude, aircraft orientation, heading, and bearing from a radio
navigation aid. As aircraft technology progressed and displays
became more numerous and more complex, the interface issues became
more pronounced. As travel by aircraft was commercialized, the
consequences of these interface issues became more severe. As
modern air-traffic control was introduced, a new arena for human
interface issues opened. By the 1970s, computers were in commercial
aircraft, and air-traffic control was utilizing radar displays and
automated flight data processing. This progression has not stopped,
as higher levels of automation and new types of interfaces have
been steadily added to flight decks and air-traffic control rooms
over the succeeding decades.

This continuous progression of displays and automation has taxed
the capabilities of researchers to fully investigate the vast and
varied challenges associated with aviation HCI. This chapter
discusses the efforts that have been made, those that are being
made, and the issues that are just now on the horizon.


PAST  CHALLENGES,  SOLUTIONS, 
AND  PRINCIPLES 


Human factors issues related to aviation displays were recognized
as early as 1923, as indicated by the following quotation from a
National Advisory Committee for Aeronautics (the predecessor to
NASA) report:

The  reaction  of  the  aviator  to  his  instruments  has  to  be  con¬ 
sidered,  as  well  as  the  operation  of  the  instruments  themselves. 
This  is  evident  enough  in  the  case  of  appliances  such  as  oxy¬ 
gen  apparatus,  intended  solely  for  the  comfort  and  efficiency 
of  the  aviator,  or  in  the  case  of  complicated  instruments  such 
as  bombsights.  It  is  equally  true,  however,  with  the  more  simple, 
direct-reading  instruments.  It  is  not  enough  for  such  instru¬ 
ments  to  be  mechanically  correct;  they  must  be,  in  the  case  of 
service  instruments,  readily  intelligible  to  the  pilot.  The  manip¬ 
ulation  of  the  instrument  must  not  make  an  appreciable  de¬ 
mand  on  his  time  or  attention.  The  visibility  must  be  satisfactory 
both day and night. Finally, service instruments must be
“foolproof.” While much can be accomplished by technical
instruction courses for aviation personnel, still the personal prejudice
of  the  average  pilot  has  to  be  reckoned  with.  If  the  instrument 
for  any  reason  fails  to  appeal  to  the  individual  pilot,  he  will  take 
great  chances  rather  than  trouble  to  look  at  it.  On  the  other 
hand,  if  the  instrument  pleases  his  fancy,  he  may  grow  so  at¬ 
tached  to  it  that  he  will  claim  he  could  not  fly  safely  without  it, 
even  though  the  instrument  is  scientifically  known  to  be  incor¬ 
rect.  Curious  examples  of  this  circumstance  were  found  in  the 
popularity  of  the  earlier  liquid  type  Pitot  tube  among  the  British 
pilots  and  the  spinning-top  inclinometer  among  the  French. 
(Hersey  1923,  pp.  8-9) 


Another  early  example  of  aviation  cognitive  engineering  re¬ 
search  was  in  1939,  when  an  Applied  Psychology  Unit  at  Cam¬ 
bridge  University  in  England  (headed  by  Frederic  Bartlett)  be¬ 
gan  studying  the  design  of  aviation  equipment  (Bartlett,  1943). 
Bartlett’s  group  studied  manual  control  performance,  the  in¬ 
terrelationship  between  controls  and  displays,  and  human  vigi¬ 
lance.  Also  in  1939,  the  National  Research  Council  in  the  United 
States  created  a  Council  on  Aviation  Psychology.  The  U.S.  Army 
and  Navy  created  programs  on  aviation  psychology  in  1940  and 
1945,  respectively. 

World War II provided a wealth of aviation experience, and after
the war was over, aviation-savvy graduate students entered
universities on the GI Bill. At the same time, advances were being
made in navigation systems technology, cathode ray tube (CRT)
displays, aircraft instrumentation, and control technology. The
rapid increase in civil aviation made the development of
air-traffic control a priority as well, and the advent of radar
displays piqued interest in cognitive engineering issues for
air-traffic controllers.

These  developments  led  to  an  explosion  of  research  in  the 
1950s,  both  in  the  United  States  and  in  Europe.  One  of  the  more 
influential  reports  was  a  1951  blue  ribbon  committee  report 
titled  “Human  Engineering  for  an  Effective  Air-Navigation  and 
Traffic-Control  System”  (Fitts,  1951b).  This  paper  summarized 
the  state  of  research  as  it  concerned  air  transportation,  and 
identified  key  areas  of  future  research.  As  such,  it  provides  a 
convenient  milepost  for  a  discussion  of  the  history  of  aviation 
cognitive  engineering  research. 

This  chapter  will  begin  from  a  historical  perspective,  cover¬ 
ing  the  major  issues  conceived  of  in  the  1950s  and  how  they 
were  addressed  in  the  succeeding  decades,  if  at  all.  Following 
that  will  be  a  discussion  of  modern  concerns,  which  grew  out 
of  the  computerization  of  flight  decks  and  innovative  display 
concepts  such  as  heads-up  displays.  Lastly  will  be  a  discussion  of 
the  future  challenges  for  aviation  and  air-traffic  control  displays. 

One  concern  for  researchers  in  the  1950s  regarded  the  dis¬ 
play  of  information  on  “transitory  displays,”  mainly  dials  and 
gauges.  As  such,  the  main  focus  was  on  (a)  whether  (and  how) 
to  integrate  information  on  a  single  display,  (b)  how  to  arrange 
displays,  (c)  how  displays,  controls,  and  the  comprehension  of 
spatial  information  were  related,  and  even  (d)  what  types  of  dis¬ 
plays  were  best  for  certain  functions. 

Many  of  these  questions  have  been  successfully  addressed, 
providing  today’s  engineers  with  some  of  the  most  concrete 
guidance  that  cognitive  engineering  has  to  offer.  Some  of  them 
continue  to  be  fascinating  topics  of  research  to  this  day,  and  of 
course,  many  new  topics  related  to  information  display  and  as¬ 
sessment  have  arisen  over  the  years. 

Single-Sensor Single-Instrument (SSSI)
Displays versus Integrated Displays

The  problems  inherent  in  SSSI  displays  have  become  increas¬ 
ingly  obvious  as  the  number  and  complexity  of  sensors  (and 
other  information  sources)  has  increased.  This  has  been  partic¬ 
ularly  noticeable  in  aviation,  where  flight  decks  have  undergone 
significant  changes  since  1951,  but  has  also  been  true  for  nu¬ 
clear  plants,  which  were  just  beginning  to  be  tested  in  1951. 
Interestingly, air-traffic controllers have only recently begun to
encounter  this  problem  as  the  number  of  displays  (and  ele¬ 
ments  on  those  displays)  provided  to  a  controller  (and  in  the 
Traffic  Management  Unit  [TMU])  has  substantially  increased. 

SSSI  displays  have  been  gradually  replaced  (or  incorpo¬ 
rated  into  multifunction  displays)  as  display  technology  has 
improved,  although  there  are  still  examples  of  traditional  SSSI 
displays on older aircraft and plants (it is an expensive venture
to replace them with graphical displays). It seems clear
from  the  research,  however,  that  fewer  displays  are  better  than 
many  displays  (although  the  former  is  not  without  its  own 
problems). 

The  reason  that  SSSI  displays  are  generally  considered  to  be 
worse  than  an  integrated  display  seems  to  be  related  to  the  fact 
that  a  person’s  ability  to  monitor  dynamic  information  is  very 
low,  even  when  only  keeping  track  of  a  few  items  (Monty,  1973). 
Also,  aircraft  operators  have  demonstrated  that  they  have  diffi¬ 
culty  integrating  a  large  amount  of  displayed  information  (Mur¬ 
phy,  McGee,  Paler,  Paulk,  &  Wempe,  1978).  These  findings  mean 
that  it  is  difficult  for  an  operator  to  maintain  comprehension  of 
a  dynamic  situation  when  forced  to  scan  from  display  to  display. 
A study of the operation of manipulators showed subjects performed
30% to 40% better using two displays as opposed to four displays,
despite a greater amount of information in the four-display case
(Bejczy & Paine, 1977). This result extended a much earlier finding
showing that fewer detections would be missed if there were, for
example, five times the error rate on one display as compared to
five displays each with one fifth the error rate (Conrad, 1951).

The  presence  of  large  numbers  of  displays  also  increases  the 
amount  of  irrelevant  data  through  which  the  operator  must 
sort,  leading  to  the  problem  of  clutter.  If  the  SSSI  displays  must, 
at  every  moment,  display  all  the  information  that  may  be 
needed  in  any  state  of  the  system,  then  there  must  be  irrele¬ 
vant  information  present  at  any  given  time,  and  the  operator 
must  filter  out  this  irrelevant  information.  As  the  quantity  of 
irrelevant  information  increases,  operator  performance  de¬ 
creases,  particularly  if  the  information  is  similar  (Hodge  &  Reid, 
1971;  Well,  1971;  Dorris,  Connolly,  Sadosky  &  Burroughs,  1977). 

Irrelevant data is a problem, even assuming the operator knows
what data is relevant (which was, and continues to be, its own
problem). The relevant data must then be located from the
different  (and  often,  physically  separated)  instruments,  and 
then  mentally  integrated  to  obtain  the  higher  order  represen¬ 
tation.  This  operation  can  be  cognitively  intensive,  and  during 
situations  of  stress  or  short  time  constraints,  it  may  not  be  pos¬ 
sible  (Vicente  &  Rasmussen,  1992). 

In the next section, these issues (general design principles,
arrangement of displays, and clutter) will be discussed in more
detail. Many of the same (or similar) principles can be found in
chapter 2 of this volume.

Basic  Display  Principles 

In examining the question of whether fewer displays containing
integrated information were better, and how information may be
integrated, researchers have developed a number of principles for
the design of displays. In particular, researchers early on studied
problems in discerning the direction of motion of a pointer, and in
the use of multipointer displays (which were being used in
altimeter designs; Fitts, 1951a). Since then, a number of design
principles with regard to such quantitative displays have been put
forth (Sanders & McCormick, 1993).

One  study  compared  digital  speedometers  to  dials  and 
curvilinear  displays  (all  electronically  generated).  For  accuracy 
and  speed  of  reading,  the  digital  display  performed  consis¬ 
tently  better  (Simmonds,  Galer,  &  Baines,  1981).  In  general, 
digital  displays  have  been  found  superior  when  a  precise  nu¬ 
meric  value  is  required,  and  when  the  values  presented  remain 
visible  long  enough  to  be  read. 

Fixed-scale  displays  with  moving  pointers  are  better  in  the 
opposite  cases — where  precision  is  not  required  or  the  values 
change  quickly.  They  are  also  better  in  cases  where  direction 
and  rate  of  change  of  the  values  are  important.  For  the  design  of 
analog  displays,  the  following  guidelines  (Heglin,  1973)  have 
been  generally  accepted: 

•  A  moving  pointer  against  a  fixed  scale  is  preferred. 

•  Small  changes  in  quantity  are  more  visible  with  a  moving 
pointer  display. 

•  If  the  quantity  to  be  read  is  analogous  to  some  physical  in¬ 
terpretation,  horizontal  or  vertical  fixed  scales  with  moving 
pointers  should  be  used  in  order  to  provide  a  zero  reference. 

•  Do  not  mix  multiple  moving-element  indicators  when  they  are 
related  to  the  same  function,  as  this  can  cause  reversal  errors. 

•  Moving  scale  displays  can  be  used  where  the  scale  is  too  great 
to  be  displayed  on  the  face  of  an  instrument. 

•  If  a  control  changes  the  quantity  on  the  display,  it  is  less  am¬ 
biguous  for  the  pointer  to  move  than  the  scale. 

In addition, for quantitative displays the numeric progression and
length of scale units affect the speed and accuracy of quantitative
reading (Whitehurst, 1982), and scale progressions should be in
increments of 1s or 5s (to ease determination of readings in
multiples of 1 and 5). Unusual progressions and decimals should be
avoided. It was also found that graduation markers should be no
less than 0.05 to 0.07 inches apart (greater under low illumination
conditions), that scale markers should be used, and that they
should have graduation marks for the lowest unit to be read, but in
no case lower than one fifth or one tenth (Cohen & Follert, 1970).
Bar-type displays should only show one segment of the bar, should
not have the bar extended to zero, and should not be used where
an indication at zero is needed (Green, 1984).
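
Two of the scale guidelines above (progressions in 1s or 5s, and a
minimum spacing between graduation marks) lend themselves to a simple
automated check. The sketch below is one possible reading of those
guidelines, not a published procedure: it assumes that "1s or 5s" means
1 or 5 times a power of ten, that the scale is linear, and that 0.05
inch is used as the spacing floor. All function and parameter names are
hypothetical.

```python
def is_ones_or_fives(step):
    """True for 1, 5, 10, 50, 100, ... (assumed meaning of '1s or 5s')."""
    if not isinstance(step, int) or step <= 0:
        return False  # decimals and non-positive steps are rejected
    while step % 10 == 0:
        step //= 10
    return step in (1, 5)


def check_scale(progression_step, graduation_step, value_range,
                scale_length_in, min_spacing_in=0.05):
    """Return a list of guideline problems for a linear scale.

    progression_step: difference between successive numbered markers
    graduation_step:  value difference between adjacent graduation marks
    value_range:      total span of values shown on the scale
    scale_length_in:  physical length of the scale in inches
    """
    problems = []
    if not is_ones_or_fives(progression_step):
        problems.append("progression is not in 1s or 5s")
    spacing_in = scale_length_in * graduation_step / value_range
    if spacing_in < min_spacing_in:
        problems.append("graduation marks closer than %.2f in" % min_spacing_in)
    return problems


# Hypothetical 4-inch scale covering 0 to 100.
print(check_scale(10, 2, 100, 4.0))  # numbered in 10s, marks every 2 units
print(check_scale(20, 1, 100, 4.0))  # numbered in 20s, marks every 1 unit
```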

Altimeter  designs  were  of  particular  concern  in  the  late 
1940s,  due  to  a  spate  of  accidents  related  to  misreading  them. 
One  of  the  early  altimeters  responsible  for  some  of  the  prob¬ 
lems  violated  one  of  the  principles  above:  it  had  three  pointers 
on  the  same  scale.  The  operator  would  have  to  read  three 
pieces  of  information  and  integrate  them  to  get  altitude.  In 
1968,  a  study  determined  that  an  integrated  vertical  display  of 
altitude  was  best  in  terms  of  time  and  accuracy  of  reading 
(Roscoe,  1968).  In  explaining  the  results,  the  researcher  indi¬ 
cated  that  key  factors  were  the  analogy  of  the  vertical  instru¬ 
ment  to  altitude  and  the  integration  of  all  elements  of  altitude 
into  one  display. 

The  distinction  between  quantitative  and  check-reading  instruments 
has  survived  to  this  day,  with  the  added  distinction 
of  qualitative  and  situational  awareness  instruments  (Sanders 
&  McCormick,  1993).  In  fact,  despite  the  prevalence  of  digital 
computers  driving  them,  graphical  displays  often  show  repre¬ 
sentations  of  analog-like  displays  for  instruments  designed  for 
qualitative  or  check-reading.  This  is  because  analog  displays 
were  found  to  be  more  quickly  read  for  this  purpose,  and  con¬ 
tributed  less  to  workload,  when  compared  to  digital  displays 
(Hanson,  Payne,  Shively,  &  Kantowitz,  1981). 

Check-reading  display  research.  Researchers  dem¬ 
onstrated  that  for  a  number  of  instruments  configured  together 
for  check  reading,  the  pointers  on  round  instruments  were  best 
positioned  at  9  or  12  o’clock  (Sanders  &  McCormick,  1993). 
When  all  pointers  show  nearly  identical  readings,  the  grouping 
of  the  instruments  and  positioning  of  the  pointers  appeal  to  the 
similarity  gestalt  of  human  cognition.  Violation  of  the  gestalt  is 
immediately  and  reliably  recognized.  Further  research  indicated 
that  the  addition  of  a  line  joining  the  instruments  at  their  null 
position  improved  or  added  to  the  gestalt  and  improved  performance 
(Dashevsky,  1964). 

Qualitative  instrument  research.  The  distinction  does 
not  appear  to  have  been  made  between  quantitative  and  quali¬ 
tative  (perhaps  a  subset  of  which  is  check  reading)  instruments 
before  1960.  Qualitative  instruments  are  distinct  from  quantitative 
instruments  in  that  the  specific  numerical  value  is  irrelevant;  instead  the 
approximate  value,  its  value  in  relation  to  some  standard,  its 
rate,  or  its  trend  are  of  interest.  The  difference  may  have  first 
been  noticed  in  a  study  that  examined  two  types  of  readings  of 
an  instrument  using  three  types  of  scales.  In  that  study,  subjects 
had  to  either  read  the  quantitative  value,  or  read  high,  low,  or 
OK.  Open-window  displays  (see  Fig.  36.1)  were  read  the  quick¬ 
est  for  quantitative  values,  but  vertical  scales  were  read  quick¬ 
est  for  the  qualitative  assessment  (Elkin,  1959). 

Qualitative  scales  also  typically  have  some  identification  of 
ranges.  This  can  be  done  through  the  use  of  color,  or  by  the  ad¬ 
dition  of  coded  markings  that  have  some  association  to  the 
meaning  of  the  range  (Sabeh,  Jorve,  &  Vanderplas,  1958). 

FIGURE  36.1.  Display  types. 


Arrangement  of  Displays 

A  second  question  brought  up  in  the  Fitts  (1951b)  report  re¬ 
garded  the  organization  of  the  display  elements.  A  number  of 
advances  in  relation  to  this  arrangement  problem  were  made 
shortly  after  its  publication.  Extensive  studies  (most  notably 
one  in  which  Fitts  himself  participated)  were  underway  exam¬ 
ining  visual  scanning  patterns  to  determine  the  optimal  layout 
of  instruments  (Fitts,  Jones,  &  Milton,  1950).  These  studies  led 
the  Civil  Aeronautics  Board  (later  to  become  the  FAA)  to  es¬ 
tablish  a  standard  arrangement  for  flight  instruments,  com¬ 
monly  referred  to  as  the  “basic  T,”  in  1953  (Civil  Aeronautics 
Board,  1953).  It  included  six  basic  instruments,  namely,  (a)  air¬ 
speed,  (b)  attitude,  (c)  altitude,  (d)  direction,  (e)  “climb” 
(which  is  taken  here  to  mean  vertical  speed),  (f)  flight  path 
deviation,  and  (g)  their  required  arrangement.  This  was  later 
reduced  to  just  the  first  four  instruments  (Civil  Aeronautics 
Board,  1957). 

The  analysis  used  in  relating  visual  scanning  patterns  to  dis¬ 
play  positioning  was  referred  to  as  “link  analysis.”  In  this  analysis, 
a  “link”  was  a  pair  of  sequential  fixations  on  two  items.  The  more 
occurrences  of  sequential  fixations  between  two  items,  the  stronger 
the  link.  Items  strongly  linked  should  then  be  located  next  to 
one  another,  a  technique  and  principle  used  in  other  fields  (such 
as  in  operation  centers  aboard  naval  vessels)  (Chapanis,  1959). 
Other  studies  of  eye  movements  determined  that  the  best  loca¬ 
tion  for  important  information  is  just  above  or  below  the 
glareshield  of  a  vehicle  (Cole,  Milton,  &  McIntosh,  1954).  Since 
then,  advances  have  been  made  in  automating  information  or¬ 
ganization  (Mendel  &  Sheridan,  1986)  and  instrument-panel  lay¬ 
out  (Pulat  &  Ayoub,  1979),  although  methods  of  this  sort  are  not 
widely  used. 
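
The counting step of link analysis described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the instrument names, and the scan record are hypothetical, and real studies derived fixation sequences from recorded eye movements.

```python
from collections import Counter


def link_strengths(fixations):
    """Count sequential fixations between each unordered pair of instruments."""
    links = Counter()
    for current, following in zip(fixations, fixations[1:]):
        if current != following:  # a re-fixation on the same item is not a link
            links[tuple(sorted((current, following)))] += 1
    return links


# Hypothetical scan record; the instrument names are illustrative only.
scan = ["attitude", "airspeed", "attitude", "altitude",
        "attitude", "heading", "attitude", "airspeed"]

links = link_strengths(scan)
strongest_pair, count = links.most_common(1)[0]
```

Under the principle stated above, the pairs with the highest counts would be the first candidates for adjacent placement on the panel.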

The  layout  of  the  instruments  not  only  serves  to  reduce  the  time 
to  scan  the  instruments,  but  can  also  affect  workload.  A  principle  of 
display  design  called  the  “proximity  compatibility  principle”  states 
that,  when  attempting  to  integrate  two  sources  of  information  from 
two  locations,  greater  workload  is  induced  by  greater  separation, 
as  the  physical  proximity  of  the  sources  should  mimic  the  cogni¬ 
tive  proximity  of  the  information  (Wickens  &  Carswell,  1995). 

This  principle  relates  to  the  requirement  of  the  operator  to 
integrate  different  sources  of  information.  There  are  two  basic 
types  of  integration  covered  by  this  principle.  The  first  involves 
locating  a  source  of  information  (such  as  the  heading  indicator) 
next  to  related  sources  of  information  (such  as  bank  indication 
and  turn  coordinator).  The  second  involves  the  display  itself  in¬ 
tegrating  more  than  one  piece  of  information,  such  as  the  atti¬ 
tude  indicator  showing  both  pitch  and  bank  at  the  same  time. 

Multifunction  graphical  displays  (MFDs)  overcome  these  two 
limitations.  These  displays  provide,  in  one  viewport,  a  number  of 
elements  that  may  need  to  be  integrated  frequently  (although 
not  always),  satisfying  the  proximity  compatibility  principle  (Seidler 
&  Wickens,  1992).  The  content  of  the  MFD  can  be  manually 
controlled,  providing  the  opportunity  to  present  a  wealth  of  in¬ 
formation  for  which  there  previously  was  no  physical  space. 

Two  problems  arise,  however.  First,  information  that  is  not 
currently  displayed  is  unavailable  to  the  pilot.  If  that  information 
changes,  the  pilot  will  be  unaware  of  the  change.  Second,  the 
“depth”  added  to  the  display  presents  the  problem  of  navigation, 
although  this  can  be  mitigated  through  good  organization  (and 
minimal  depth)  of  the  pages  (Allen,  1983;  Shneiderman,  1998; 
Francis,  2000).  If  these  principles  are  not  met,  however,  there 
is  the  possibility  that  the  operator  will  become  lost  in  the  data¬ 
base.  In  addition,  there  is  workload  (and  heads-down  time)  as¬ 
sociated  with  the  controls  for  the  display. 

How  Much  Information  Can  Be  Displayed  Effectively? 

The  problem  of  how  much  information  can  be  fit  on  a  display 
has  evolved  significantly  since  1951,  but  is  still  a  topic  of  debate 
and  research.  A  large  body  of  work  on  visual  processing  has 
been  amassed  over  the  years,  and  many  of  the  theories  are  still 
hotly  debated.  What  is  certain  is  that  distinguishability  of  items 
on  a  display  is  overwhelmed  by  the  somewhat  limited  percep¬ 
tual  processing  capacity  of  the  human.  Without  going  into  a 
great  amount  of  detail,  some  of  the  findings  with  respect  to  this 
issue  will  be  summarized. 

There  are  two  fundamental  limits  on  extracting  information 
from  a  visual  scene:  (a)  processing  and  (b)  attention.  At  any  time, 
only  a  small  amount  of  the  information  available  to  the  retina  can 
be  processed  (although  all  of  it  is  perceived).  The  mechanics  of 
processing  are  still  being  determined,  but  it  appears  likely  that 
some  high-level  representation  is  quickly  obtained,  followed  by 
a  “weeding  out”  of  some  portions  of  the  image  (VanRullen  & 
Thorpe,  2001).  Scene  processing  allows  a  variety  of  information 
to  be  extracted,  but  specific  objects  of  interest  may  or  may  not  be 
extracted  in  sufficient  detail.  Focused  attention  captures  this  de¬ 
tail,  but  objects  can  only  be  processed  serially,  and  subsequent 
processing  of  these  objects  is  inhibited  (referred  to  as  “negative 
priming”;  Tipper  &  Driver,  1988).  The  combination  of  these  ef¬ 
fects  yields  an  effective  capacity  of  4  to  6  items  (VanRullen  & 
Koch,  2003).  This  number  represents  how  many  items  can  be 
represented  in  visual  awareness  at  any  one  time.  Objects/infor¬ 
mation  may  then  be  transferred  to  long-term  memory,  a  process 
that  has  also  been  researched  extensively. 

How  much  information  can  be  fit  on  one  display,  then,  is  de¬ 
pendent  on  the  task  and  environmental  factors.  If  one  is  read¬ 
ing  a  depiction  of  an  instrument  approach  with  the  intention  of 
memorizing  details,  and  there  is  no  time  pressure,  a  great  deal 
of  information  can  be  displayed  and  recalled  effectively.  If  an 
air-traffic  controller  has  a  fraction  of  a  second  to  glean  track  in¬ 
formation  from  a  radar  display,  he  or  she  is  unlikely  to  extract 
more  than  4  to  6  pieces  of  information  from  that  display.  More¬ 
over,  what  information  would  be  extracted  from  that  display  is 
unclear,  although  factors  of  the  displayed  information  (salience, 
size,  color,  etc.)  can  influence  selective  attention. 

One  further  question  remains  with  regard  to  information  display 
that  we  will  address  here:  the  rate  at  which  information 
can  be  extracted.  This  is  also  complex,  dependent  upon  physi¬ 
ological  factors,  features  of  the  display,  complexity  of  the  in¬ 
formation  to  be  extracted,  and  other  factors.  Simple  responses 
to  visual  stimuli  can  occur  within  30  to  60  msec  (Iwasaki,  1996), 
but  more  complex  objects  (such  as  text)  can  require  a  great 
deal  more  time. 

In  summary,  the  answer  to  the  question  of  how  much  information 
can  be  fit  on  a  display  is,  “It  depends.”  The  question  has 
become  even  more  relevant  as  electromechanical  displays  have 
been  almost  universally  replaced  by  computer-driven  elec¬ 
tronic  displays,  which  can  display  a  great  deal  of  information 
in  one  place.  These  displays  are  also  three-dimensional  (3D) 
in  the  sense  that  they  can  contain  pages  of  information  (adding 
a  new  task — navigation  of  a  display),  and  their  contents  can  be 
modified  by  the  user  (or  automation). 

Clutter 

Clutter  is  a  significant  issue  in  two  main  classes  of  displays: 
(a)  multifunction  displays  (such  as  air-traffic  control  displays  and 
flight  deck  displays  that  show  route,  weather,  and  traffic),  and  (b) 
heads-up  displays  (HUDs).  The  source  of  the  clutter  is  different 
in  the  two  cases.  In  the  former  case,  a  large  quantity  of  data  is  be¬ 
ing  overlaid  on  the  display,  whereas  in  the  latter,  the  display  is 
overlaid  on  a  source  of  great  clutter  (the  outside  world).  In  both 
cases,  the  problem  is  the  same:  response  time  for  a  stimulus  is  in¬ 
creased  with  the  proximity  of  nonstimulus  items  (C.  W.  Eriksen 
&  B.  A.  Eriksen,  1974).  Additionally,  clutter  can  increase  the  time 
it  takes  for  the  pilot  to  locate  an  item,  or  even  obscure  items  of 
interest  (Wickens,  2003).  Clutter  has  also  been  mentioned  by 
pilots  in  subjective  ratings  for  some  time  (Abbott  et  al.,  1980). 

The  solution  to  this  problem  is  to  provide  the  operator  with 
decluttering  capability — that  is,  the  ability  to  either  manually  or 
automatically  remove  information  from  the  display;  however,  if 
the  information  is  no  longer  present,  it  cannot  be  integrated 
with  other  information,  and  changes  to  the  information,  which 
could  be  important,  will  of  course  not  be  noticed.  This  is  in  ad¬ 
dition  to  the  workload  required  to  remove  or  add  information 
to  the  display,  which  could  be  considerable,  particularly  in  the 
case  of  navigating  a  multifunction  display. 

Displaying  Spatial  Relationships 

The  Fitts  (1951b)  report  suggested  that  “studies  should  be 
made  to  determine  principles  governing  the  effective  display 
of  information  about  the  relative  position  of  aircraft  and  ground 
objects  in  tri-dimensional  space”  (p.  xvii).  Specifically,  the  com¬ 
mittee  felt  that  research  should  be  directed  at  determining  the 
best  types  of  projections  for  displaying  spatial  relations,  and 
how  symbolic  and  pictorial  displays  are  best  used  for  this  pur¬ 
pose.  In  addition,  more  studies  were  called  for  to  determine 
what  display  characteristics  were  required  when  control  manip¬ 
ulation  resulted  in  changes  in  the  display  (e.g.,  turning  a  control 
yoke  results  in  changes  in  bank  and  heading  indications). 

Aviation  is  a  natural  place  for  interest  in  the  display 
of  spatial  relations  to  develop.  Air-traffic  controllers  must  gauge 
both  absolute  and  relative  position  and  motion  of  aircraft  in  at 
least  two  dimensions  (often  three),  and  provide  spatial  control 
guidance  to  pilots.  A  pilot  must,  through  use  of  instrumenta¬ 
tion,  determine  relative  (and  sometimes  absolute)  position  in 
3D  coordinates,  and  keep  track  of  3D  orientation.  This  is  ac¬ 
complished  through  the  use  of  (a)  an  attitude  indicator  (ADI), 
which  shows  pitch  and  bank,  (b)  a  heading  indicator,  which 
shows  absolute  orientation,  (c)  a  turn  indicator,  which  shows 
yaw  rate,  and  (d)  one  or  more  instruments  showing  relative  position 
and  bearing  (e.g.,  altimeters,  radio  magnetic  indicator, 
course  deviation  indicator,  vertical  deviation  indicator,  distance 
measuring  equipment). 

One  problem  that  almost  certainly  drove  the  recommenda¬ 
tion  in  the  Fitts  (1951b)  report  regarding  the  display  of  orienta¬ 
tion  was  the  problem  of  the  ADI.  In  particular,  what  should 
move,  the  aircraft  or  the  earth?  Several  principles  related  to  this 
question  (Roscoe,  1981)  have  been  put  forth  in  consideration  of 
the  body  of  research  into  interfaces: 

•  Principle  of  pictorial  realism:  The  display  should  be  pictorially 
analogous  to  the  real  world. 

•  Principle  of  compatible  motion:  The  part  that  moves  in  the 
real  world  should  be  the  part  that  moves  on  the  display. 

•  Principle  of  integration:  Related  information  should  be  inte¬ 
grated  on  the  same  display. 

•  Principle  of  pursuit  presentation:  A  display  should  allow  pur¬ 
suit  tracking  versus  just  compensatory  tracking — that  is,  if  a 
target  moves  both  absolutely  and  relatively,  both  sources  of 
movement  should  be  displayed. 

So  a  moving  aircraft  on  an  ADI  or  an  air-traffic  controller’s 
radarscope  would  be  compatible  with  the  principle  of  com¬ 
patible  motion;  however,  while  the  radarscope  also  complies 
with  the  principle  of  pictorial  realism,  from  the  pilot’s  per¬ 
spective,  the  moving  aircraft  would  (loosely)  violate  it;  from 
the  pilot’s  perspective,  it  appears  that  the  horizon  is  banked 
as  opposed  to  the  aircraft.  Although  designers  have  settled  on 
the  moving  earth  model  for  ADI  design,  the  debate  was  never 
satisfactorily  resolved.  Experiments  based  on  (actually,  an  ex¬ 
tension  of)  a  proposal  to  integrate  the  two  models  (Fogel, 
1959)  showed  improved  performance  (Beringer,  Williges,  & 
Roscoe,  1975).  This  topic  continues  to  be  discussed  to  this 
day,  with  researchers  arguing  that  the  moving  earth  ADI  has 
continued  to  be  used  only  because  of  the  lack  of  interest  on 
the  part  of  pilots  to  change  it,  despite  evidence  that  it  has 
been  causing  accidents  for  decades  (Bryan,  Stonecipher,  & 
Aron,  1954). 

In  addition  to  the  orientation  question  addressed  mainly  by 
the  ADI,  there  are  questions  about  how  best  to  portray  relative 
position.  This  is  a  common  problem  for  both  air-traffic  con¬ 
trollers,  who  must  assess  relative  bearing  and  range  between  air¬ 
craft,  and  pilots,  who  must  frequently  assess  their  relative  bear¬ 
ing  and  range  from  some  navigational  fix.  More  recently,  the 
question  of  vertical  navigation  has  received  attention. 

A  typical  horizontal-navigation  problem  for  a  pilot  is  com¬ 
mon  to  anyone  familiar  with  basic  instrument  flying.  From  any 
given  position,  a  pilot  could  be  asked  to  fly  to  a  “fix,”  whose  position 
is  defined  by  bearing  and  range  relative  to  a  radio  navigation 
aid  (e.g.,  a  VOR).  Although  this  can  be  calculated  from  current 
heading  and  the  ownship’s  bearing  and  range  to  the  VOR, 
it  is  not  a  trivial  task.  A  simpler  task  is  to  fly  inbound  on  a  course 
to  a  VOR  or  outbound  on  a  radial  from  a  VOR.  Another  example 
is  the  task  of  determining  the  position  of  the  ownship  given 
access  to  several  radio  navigational  aids. 
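
To show why the fix problem above is not trivial to do mentally, the geometry can be worked through in code. The following Python sketch is a simplification under stated assumptions: a flat-earth approximation, all bearings referenced to the same north, no wind, and hypothetical function and parameter names. It yields the course and distance from the ownship to the fix; the required turn is then that course minus the current heading.

```python
import math


def to_xy(radial_deg, range_nm):
    """Convert a radial (degrees from north) and range into east/north offsets from the VOR."""
    theta = math.radians(radial_deg)
    return range_nm * math.sin(theta), range_nm * math.cos(theta)


def course_to_fix(bearing_to_vor_deg, range_to_vor_nm, fix_radial_deg, fix_range_nm):
    """Return (course in degrees, distance in nm) from the ownship to the fix."""
    # The ownship sits on the radial that is the reciprocal of its bearing to the VOR.
    own_radial = (bearing_to_vor_deg + 180.0) % 360.0
    own_x, own_y = to_xy(own_radial, range_to_vor_nm)
    fix_x, fix_y = to_xy(fix_radial_deg, fix_range_nm)
    dx, dy = fix_x - own_x, fix_y - own_y
    course = math.degrees(math.atan2(dx, dy)) % 360.0  # atan2(east, north) gives a compass bearing
    return course, math.hypot(dx, dy)


# Ownship 10 nm south of the VOR (bearing to the station is north);
# fix on the 090 radial at 10 nm.
course, distance = course_to_fix(360.0, 10.0, 90.0, 10.0)
```

Even in this idealized form, the pilot must combine two polar positions and convert the result back to a bearing, which is the kind of internal computation a map-type display makes unnecessary.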

Recent  research  has  qualified  the  difficulties  inherent  in 
these  operations  as  “representational”  problems  (Zhang  &  Norman, 
1994).  The  theory  follows  from  the  argument  that  the  information 
to  accomplish  many  tasks  is  distributed  between  the 
external  stimuli  (the  interface)  and  internal  cognition  (in  this 
case,  of  the  pilot;  Norman,  1993).  Furthermore,  the  information 
representations  may  have  different  scales:  (a)  ratio,  (b)  interval, 
(c)  ordinal,  or  (d)  nominal.  Efficient  external  representations  re¬ 
flect  all  the  categories  of  information  present  in  the  “real-world” 
object.  Deficiencies  in  information  would  require  that  informa¬ 
tion  to  be  distributed  to  the  internal  cognition  of  the  operator, 
while  surplus  information  (information  present  in  the  display 
that  is  not  in  the  object)  could  be  misleading. 

Application  of  this  theory  to  the  problem  of  horizontal  navi¬ 
gation  shows  that  different  equipment  requires  different  levels  of 
internal  cognition.  A  simple  ADF  or  VOR  system  requires  internal 
computation  of  angular  differences  and  range.  A  map  type 
display,  however,  explicitly  represents  these,  simplifying  the  task. 

An  extreme  example  of  this  representational  problem  is 
the  difficulty  pilots  have  in  determining  their  position  relative  to  terrain  (or 
weather)  shown  on  a  paper  map  (or  even  presented  aurally!)  from  their 
knowledge  of  absolute  position  as  given  by  the  flight  instruments. 
It  is  difficult  and  time-consuming  to  accurately  comprehend 
this  situation.  This  was  apparently  recognized,  resulting  in 
the  display  of  “minimum  safe”  altitudes,  which  changes  the  nature 
of  what  needs  to  be  comprehended  (the  pilot  need  only 
know  his  or  her  position  with  respect  to  that  altitude,  a 
one-dimensional,  relative  task). 


INTRODUCING  AUTOMATION 
TO  THE  FLIGHT  DECK 


In  addition  to  research  concerning  information  display  and  as¬ 
sessment,  much  was  being  learned  about  automation  in  gen¬ 
eral.  The  Fitts  report  (1951b)  mentioned  the  need  to  under¬ 
stand  how  tasks  should  be  allocated  between  human  and 
machine,  and  this  was  the  subject  of  some  effort  on  the  part  of 
Fitts  himself.  Computer  technology  was  in  its  infancy,  and  little 
was  known  in  the  1950s  about  the  possible  limits  on  the  ability 
of  machines  to  assist  humans.  Many  thought  that  these  ma¬ 
chines  would  gradually  replace  all  human  functions.  Today,  re¬ 
searchers  better  understand  the  different  ways  in  which  we  can 
describe  the  relationship  between  human  and  automation,  although 
this  relationship  is  changing  as  new  types  of  automated 
assistance  are  being  added  to  aircraft  and  air-traffic  systems. 

This  section  will  discuss  the  concept  of  allocating  roles  to  au¬ 
tomation  and  humans,  beginning  with  some  of  the  early  con¬ 
cepts,  and  then  progressing  to  more  recent  characterizations 
of  the  roles  of  humans  and  automation.  Following  this  will  be  a 
discussion  of  how  cognitive  engineering  has  helped  shape  early 
systems,  consisting  mainly  of  control  automation,  and  how  it  is 
significantly  involved  in  newer  types  of  systems,  consisting 
mainly  of  information  automation. 

Autopilots 

In  a  sense,  automatic  control  capability,  such  as  an  autopilot, 
neatly  satisfied  the  prominent  conception  of  the  proper  roles  of 
automation  and  human  operators.  The  automation  took  over  the 
routine  work  of  simultaneously  maintaining  course,  altitude,  and 
speed.  Humans  had  a  new  system  to  deal  with,  but  could  lever¬ 
age  their  long-term  memory  capability,  reasoning,  and  judgment. 
However,  Fitts  (1951b)  also  noted  that  humans  are  poor  moni¬ 
tors,  and  even  suggested  a  specific  proper  role  of  automation 
may  be  to  monitor  humans.  The  addition  of  a  highly  reliable  sys¬ 
tem  to  monitor  for  long  periods  of  time  violates  that  principle. 

The  autopilots,  in  fact,  were  a  source  of  a  number  of  acci¬ 
dents.  A  much-discussed  accident  in  which  an  L-1011  crashed 
into  the  Everglades  because  the  autopilot  was  inadvertently  dis¬ 
connected  is  an  example  of  the  “graceful”  degradation  of  the  au¬ 
topilot  system.  The  automation  fails  with  little  or  no  notice  to 
the  flight  crew,  who  are  not  good  at  monitoring  it  to  start  with. 
Several  other  incidents  demonstrated  that  pilots  can  overrely  on 
automation  to  perform  its  tasks  (Parasuraman  &  Riley,  1997). 

The  autopilot’s  development  began  in  1914,  when  Lawrence 
Sperry  created  a  gyroscopic  device  capable  of  stabilizing  an  aircraft’s 
flight  without  input  from  the  pilot.  This  led  to  the  design 
of  a  fully  automatic  aircraft  by  1915,  and  a  fully  automatic  pilot  by 
1916,  although  it  would  not  be  until  1931  that  an  autopilot  was 
licensed  to  operate  with  passengers  on  board  (Davenport,  1978). 
An  automatic  landing  system  was  invented  in  1937  by  Capt.  Carl 
Crane,  and  since,  numerous  refinements  in  control  theory  have 
been  applied  to  autopilots.  One  particularly  interesting  addition 
to  control  theory  was  the  finding  that  the  behavior  of  human- 
machine  systems  with  simple  feedback  control  systems  (such  as 
an  aircraft  under  nominal  flight  conditions)  can  be  modeled  using 
a  simple  control  law  (McRuer  &  Graham,  1965). 

While  changes  to  the  underlying  control  theory  remain 
transparent  to  pilots,  changes  in  the  capabilities  of  the  autopi¬ 
lots  have  not.  Pilots  interfacing  with  the  autopilot  often  face 
“mode  confusion,”  a  term  that  indicates  when  the  pilot’s  un¬ 
derstanding  of  the  mode  in  which  the  autopilot  is  operating  dif¬ 
fers  from  its  actual  mode.  When  this  occurs,  incidents  such  as 
those  described  above  are  more  prevalent. 

In  general,  however,  autopilots  have  been  amazingly  suc¬ 
cessful.  They  relieve  pilots  of  a  great  deal  of  manual  workload, 
particularly  during  routine  phases  of  flight  such  as  cruise.  Au¬ 
topilots  are  also  capable  of  increased  accuracy  at  navigation, 
which  is  reflected  in  their  ability  to  land  aircraft  in  poor  weather, 
including  when  poor  visibility  completely  obstructs  the  pilot 
from  seeing  the  runway. 

Role  Allocation 

The  “Fitts  list”  shown  in  Table  36.1  served  as  the  launching  point 
for  studies  on  the  allocation  of  tasks  to  humans  and  automation. 


TABLE  36.1.  Fitts  List 

Humans Are Good At               Machines Are Good At
-------------------------------  ------------------------
Sensory functions, detection     Speed and power
Perceptual ability               Routine, repetitive work
Flexibility, improvisation       Computation
Judgment and selective recall    Short-term storage
Reasoning                        Simultaneous operations
Long-term memory                 Short-term memory

Automation  for  the  control  of  aircraft  had  been  around  for 
decades  by  1951.  Not  much  more  was  automated,  however,  un¬ 
til  the  late  1960s  and  early  1970s,  when  some  systems,  naviga¬ 
tion,  and  automated  alerting  were  added.  The  early  1980s  saw 
the  advent  of  graphical  displays,  flight  management  systems,  and 
the  removal  of  flight  engineers.  The  late  1980s  introduced  the 
concept  of  glass  cockpits,  replacing  nearly  all  electromechanical 
“steam  gauges”  with  graphical  displays  (Billings,  1997). 

Over  the  decades,  several  different  attempts  at  methodolo¬ 
gies  for  allocating  function  have  been  made.  This  section  will  be 
organized  around  those  efforts:  (a)  the  Fitts  list,  (b)  the  “auto¬ 
mate  everything”  philosophy,  (c)  supervisory  control,  (d)  guide¬ 
lines  for  automation,  and  (e)  levels  of  automation. 

Fitts  list.  In  order  to  discuss  the  advantages  and  dis¬ 
advantages  of  this  approach,  but  without  rehashing  Fitts’  dis¬ 
cussion  of  them,  we  will  first  discuss  the  contents  of  the  list,  and 
findings  surrounding  the  truth  or  fallacy  of  the  claims.  Then  we 
will  give  examples  of  applying  the  Fitts  list  to  automation,  ex¬ 
amining  the  strengths  and  weaknesses  of  the  approach.  Finally, 
we  will  give  some  examples  of  criticism  levied  at  this  method  of 
function  allocation. 

Many  of  the  items  in  the  list  still  appear  to  be  true.  Very  little 
argument  can  be  made  with  the  assertion  that  machines 
can  generate  more  power  than  humans  can,  and  they  can  per¬ 
form  many  tasks  faster  than  a  human.  On  the  other  side  of  the 
list,  humans  are  still  much  more  capable  of  improvisation  and 
making  judgments,  and  computers  are  not  able  to  reason  in 
any  meaningful  way,  despite  significant  advances  in  computer 
technology. 

Yet  some  comparisons  appear  to  be  more  complicated  than 
suggested  by  the  Fitts  list.  While  a  human’s  perceptual  ability  is 
quite  impressive,  some  machines  are  capable  of  remarkable 
feats  of  detection:  thermal  imaging  systems  that  can  “see” 
through  walls,  satellites  that  can  read  individual  license  plates, 
microscopes  that  can  image  individual  atoms,  explosive  detec¬ 
tion  equipment  that  can  “smell”  tiny  amounts  of  specific  chem¬ 
icals,  and  so  on.  Despite  this,  it  still  requires  a  human  to  inter¬ 
pret  the  results.  Machines  still  have  only  primitive  (and  brittle) 
abilities  to  deal  with  recognition  (particularly  of  symbol  systems, 
faces,  and  expressions,  etc.). 

In  general,  it  appears  humans  are  “wired”  for  rapid  operation 
in  a  highly  uncertain  and  diverse  environment,  whereas  ma¬ 
chines  are  still  generally  relegated  to  rather  specific  operations 
in  a  well-regulated  and  defined  environment.  Therefore,  while 
machines  may  be  better  at  routine  and  repetitive  work,  they  are 
generally  unable  to  deal  with  events  or  circumstances  outside  of 
the  expected  operating  regime.  Humans,  however,  are  able  to 
operate  in  the  face  of  such  events,  leading  to  the  notion  that  humans 
are  suited  for  “supervising”  automation  (which  will  be  discussed 
in  a  later  section). 

In  practice,  Fitts  lists  have  been  difficult  to  apply  in  allocat¬ 
ing  function  (Sheridan,  1998).  One  likely  reason  is  that  a  Fitts 
list  approach  requires  automation  to  be  assigned  to  a  particular 
function.  In  reality,  most  functions  require  that  some  aspects 
are  done  by  human  operators  and  some  by  automation.  Since 
the  functions  are  shared  between  humans  and  automation,  any 
distinction  of  human  or  machine  for  a  function  is  likely  to  be 
suboptimal. 

The  “automate  everything”  philosophy.  Although  this 
method  of  allocation  seems  unjustifiable  on  its  face,  it  has  had  its 
proponents,  and  its  reasons.  Humans  have  typically  been  seen  as 
the  source  of  a  majority  of  error  in  the  system.  Replacing  them 
with  “trustworthy”  automation,  which  also  came  with  the  bene¬ 
fits  of  lower  cost  and  more  efficient  operation,  did  not  seem 
quite  so  ridiculous  several  decades  ago.  Officials  of  two  differ¬ 
ent  aircraft  manufacturers  have  been  quoted  as  adhering  to 
some  form  of  this  approach  even  within  this  last  decade  (Billings, 
1997).  In  addition,  these  thoughts  came  at  a  time  when  the 
growth  of  technology  appeared  to  be  facilitating  the  design  of 
automation. 

By  the  late  1970s,  it  was  becoming  apparent  that  humans  and 
automation  had  trouble  getting  along.  Among  the  problems 
frequently  cited  (Wiener  &  Curry,  1980)  were: 

•  Crew  errors  exacerbating  automation  errors 

•  Improper  setup  of  automated  systems 

•  Improper  response  to  alerts 

•  Failure  to  monitor 

•  Loss  of  proficiency 

In  truth,  however,  the  hurdles  for  an  entirely  automated  air¬ 
craft  have  been,  so  far,  too  great  to  be  surmounted.  For  one,  the 
current  social  and  political  climate  would  have  to  undergo  sig¬ 
nificant  changes.  Furthermore,  since  that  time  it  has  become  ap¬ 
parent  that  automation  is  not  as  reliable  as  had  been  believed 
at  the  time.  Examples  of  failures  in  automatic  equipment  have 
been  common  and  persistent  (Wiener  &  Curry,  1980). 

In  addition,  under  this  strategy  the  operator  would  be  left 
with  any  functions  that  could  not  be  automated  (Parasuraman, 
Sheridan, & Wickens, 2000). The result is that ill-defined, difficult
tasks may be assigned to an operator who is significantly detached
from the operation of the system. It is unlikely that the
operator  could  perform  well  under  these  circumstances. 

In such a system, the operator may be left with the task of
monitoring the system and resuming control should
the  automation  fail.  Unfortunately,  the  skills  necessary  to  do  so 
would  likely  be  degraded  from  nonuse.  In  addition,  retrieval 
from  long-term  memory  is  more  difficult  if  information  goes  un¬ 
used  for  long  periods  of  time,  and  the  model  of  the  system  re¬ 
quired  for  diagnosis  and  action  is  unlikely  to  be  present  in  an 
operator  who  rarely  actively  controls  it.  These  last  few  points 
have  been  referred  to  as  “ironies  of  automation,”  since  the  more 
advanced  the  automation,  the  more  valuable  may  be  the  oper¬ 
ator’s  function,  but  the  less  able  the  operator  may  be  to  fulfill 
that function (Bainbridge, 1982).

Human  supervisory  control.  About  the  same  time  as 
the  automate-everything  approach,  a  concept  of  interaction  be¬ 
tween  humans  and  automation  was  being  discussed,  called  hu¬ 
man  supervisory  control  (Ferrell  &  Sheridan,  1967).  This  was 
an offshoot of research into how humans could control a remote
vehicle (specifically, vehicles on the moon). It became apparent
that under the three-second delay for information to be
sent  to  and  return  from  the  moon,  some  control  loops  would 
have  to  be  closed  by  the  automation,  with  the  operator  acting  as 
a  supervisor  of  the  automation  (Sheridan,  1992). 


[Figure 36.2 diagram: supervisory functions (plan, teach, monitor, learn); navigation, guidance, and control loops; controlled process.]

FIGURE 36.2. Sheridan’s supervisory and control loops.
(Adapted from Thomas B. Sheridan, Telerobotics, Automation, and
Human Supervisory Control, Figures 1.2 and 1.3, Massachusetts
Institute of Technology, published by The MIT Press).


The  concept  reflected  the  growing  role  of  automation,  and 
the  changing  of  the  human’s  role  from  operator  to  manager  of 
automation.  As  shown  in  Fig.  36.2,  several  levels  of  supervisory 
and  control  loops  exist,  with  the  human  closing  the  supervi¬ 
sory  loops,  and  each  control  loop  being  able  to  be  closed  by 
either  the  automation  or  the  human. 

Designers could now apply control theory concepts to the
human-automation  problem.  The  role  of  humans  was  essen¬ 
tially  defined  by  this  paradigm,  and  the  designer  could  then 
look  at  lower  level  tasks  and  turn  to  questions  such  as  (a)  what 
human  resources  are  required  for  each  task,  (b)  how  much  time 
and  effort  is  required,  (c)  how  much  time  and  energy  is  avail¬ 
able,  and  (d)  what  the  consequences  are  of  accomplishing  or 
not  accomplishing  an  action  (Sheridan,  1992). 

Sheridan  goes  on  to  describe  the  limitations  of  humans  and 
automation,  and  the  implications  of  this  for  applying  the  con¬ 
cept.  These  admonishments  went  unheeded,  however,  and  au¬ 
tomation  was  steadily  added  to  the  flight  deck  and  control 
rooms,  with  the  expectation  that  the  operators  would  handle 
the  additional  managerial  workload. 

Levels of automation. Parasuraman, Sheridan, and Wickens
(2000) recently proposed a new model for allocating automation
functions across four classes and along a continuum from
low automation to high automation. The four classes are
(a) information acquisition, (b) information analysis,
(c) decision and action selection, and (d) action implementation.
The level of automation can be set separately for each of these
four classes, and can be adapted “on the fly” during operational
use. The overriding consideration in determining the level of
automation is the consequence of that automation decision for
human performance.

Information  acquisition  automation  relates  to  the  “sensing 
and  registration  of  input  data.”  This  includes  such  things  as 
moving  sensors  (low  level  of  automation),  organizing  infor¬ 
mation  (moderate),  highlighting  information  (moderate),  and 
filtering  information  (could  be  a  high  level  of  automation). 

Information analysis automation relates to the requirement to
integrate  and/or  transform  information.  Low-level  examples  in¬ 
clude  showing  track  predictions  or  trends.  At  higher  levels,  data 
may  be  integrated,  displaying  only  high-level  information  to  the 
operator. 

Decision and action selection automation assists in determining
courses of action and in making decisions. Low-level automation
in this class may simply recommend courses of action.
At higher levels the automation may perform the action that it
determines is appropriate. The difference is demonstrated by the
ground-proximity  warning  system  (GPWS),  which  recommends 
an  action  and  is  at  a  fairly  low  level,  and  the  ground  collision 
avoidance  system,  which  is  at  a  significantly  higher  level  as  it  au¬ 
tomatically  takes  control  to  avoid  a  ground  impact  if  the  pilot 
does  not  take  action. 

Action  implementation  automation  performs  actions  for  the 
operator.  The  level  is  determined  by  the  amount  of  manual 
control  left  to  the  human  operator.  For  example,  an  autopilot 
on  heading  hold  may  still  require  heading  changes  using  a  turn 
controller  and  speed  control  using  throttles.  This  would  be 
a  lower  level  of  action  implementation  than  a  fully  coupled 
autopilot. 

The  appropriate  level  of  automation  for  different  classes 
should  then  be  decided  upon  by  examining  possible  solutions 
at  different  levels  of  automation  and  applying  primary  and  sec¬ 
ondary  criteria  in  a  possibly  iterative  fashion.  The  primary  cri¬ 
teria  are  based  upon  aspects  of  human  performance  such  as 
mental  workload,  situation  awareness,  complacency,  and  skill 
degradation.  The  secondary  criteria  are  based  upon  more  prac¬ 
tical  concerns,  such  as  reliability  of  the  automation  and  the 
costs  versus  benefits  of  automation  decisions. 
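
The four-class model described above can be pictured as a small data structure. The following Python sketch is illustrative only: the names `Stage`, `Level`, `make_profile`, and `adapt` are hypothetical, and the three-step scale stands in for what the source describes as a continuum from low to high automation.

```python
from enum import Enum


class Stage(Enum):
    """The four classes of function named in the text."""
    INFORMATION_ACQUISITION = "information acquisition"
    INFORMATION_ANALYSIS = "information analysis"
    DECISION_SELECTION = "decision and action selection"
    ACTION_IMPLEMENTATION = "action implementation"


class Level(Enum):
    """Assumed coarse scale; the model itself describes a continuum."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def make_profile(default=Level.LOW):
    """Assign one level of automation to each of the four classes."""
    return {stage: default for stage in Stage}


def adapt(profile, stage, level):
    """Return a new profile with a single class changed ("on the fly")."""
    updated = dict(profile)
    updated[stage] = level
    return updated


profile = make_profile()
# Filtering information is given in the text as a possibly high level.
profile = adapt(profile, Stage.INFORMATION_ACQUISITION, Level.HIGH)
# A system that only recommends an action, as GPWS does, stays low here.
profile = adapt(profile, Stage.DECISION_SELECTION, Level.LOW)
```

Keeping the level separate for each class makes it possible to represent, for instance, a system that filters heavily but leaves every decision to the operator.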

Ecological  allocation.  Advocates  of  this  approach  (al¬ 
though  they  do  not  use  this  term  to  describe  it)  regard  function- 
based  representations  of  work  as  inadequate  representations 
of  the  actual  workplace.  The  workplace  is  complex,  and  there 
is  interplay  between  technology  and  work — that  is,  just  as  the 
workplace  is  changed  by  automation,  the  use  of  the  automa¬ 
tion  may  be  changed  by  the  workplace.  The  claim  is  then  that 
one  cannot  adequately  allocate  function  without  fully  compre¬ 
hending  the  work  environment  and  the  situations  that  arise 
therein.  Suchman  (1987)  stressed  that  work  is  “situated,”  and  as 
a  result  the  system  must  “incorporate  both  a  sensitivity  to  local 
circumstances  and  resources  for  the  remedy  of  troubles  in  un¬ 
derstanding  that  inevitably  arise”  (p.  28). 

Moreover,  the  situations  that  arise  can  fundamentally  change 
the  work  environment.  In  a  discussion  of  previous  work,  Wright, 
Dearden,  and  Fields  (1998)  related  three  themes  that  emerged 
from  their  (and  other  related)  studies: 

The  first  is  that  divisions  of  labor  are  often  dynamically  re-allocated  on 
the  basis  of  local  contingencies  on  an  occasion  of  practice.  The  second 
is  that  the  processes  by  which  such  working  divisions  of  labor  are 
achieved  are  a  significant  component  of  that  work.  The  third  theme  is 
that  even  where  divisions  of  labor  are  pre-computed  and  reified  in  such 
mechanisms  as  standard  operating  procedures,  there  is  still  work  to  be 
done  in  making  those  procedures  work.  (Wright  et  al.,  1998,  p.  339) 

Often,  due  to  the  changing  nature  of  work,  organization, 
and  even  situation,  the  underlying  functions  of  automation  can 
change.  In  a  study  of  two  incidents  in  the  London  Under¬ 
ground  (subway  system)  in  which  trains  left  their  drivers  be¬ 
hind,  it  was  discovered  that  the  operators  had  altered  one  of 
the  two  controls  required  to  operate  the  train  (Wright  et  al., 
1998).  One  automated  control  requires  all  doors  of  the  train 
to  be  shut  before  the  train  will  move.  A  second  control  is  sim¬ 
ply  a  button  for  the  operator  to  push  to  start  the  train  moving. 


The  operators,  apparently  believing  that  the  second  control 
was  redundant,  taped  the  button  in  the  on  position.  The  two  in¬ 
cidents  happened  when  operators  left  their  station  to  fix  stuck 
doors.  When  the  doors  were  closed,  the  “circuit”  was  closed, 
and  the  trains  began  moving. 

Wright et al. (1998) argued that the operators had unknowingly
changed the allocation of functions. Their function had
been  removed — the  automation  now  had  sole  control  to  oper¬ 
ate  the  vehicle.  Yet  these  types  of  changes  to  function  are  not 
only  the  result  of  errors.  Pilots  make  use  of  automation  in  ways 
that  change  their  function  as  well  (e.g.,  stall  warning  systems  as 
angle  of  attack  indicators,  situational  awareness  instruments  as 
navigation  sensors). 
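
The London Underground incidents can be read as a change to a two-input interlock. The sketch below is a simplified illustration, not a description of the actual train control circuitry; the names `TrainControls` and `train_moves` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainControls:
    doors_closed: bool   # automated control: True only when every door is shut
    start_pressed: bool  # operator control: the start button


def train_moves(controls: TrainControls) -> bool:
    """The train departs only when both controls agree."""
    return controls.doors_closed and controls.start_pressed


# As designed: closing the doors does nothing until the operator presses start.
as_designed = TrainControls(doors_closed=True, start_pressed=False)

# With the button taped down, start_pressed is always True, so the door
# interlock alone decides when the train moves.
taped = TrainControls(doors_closed=True, start_pressed=True)

print(train_moves(as_designed))  # False
print(train_moves(taped))        # True
```

Once one input is fixed at true, the conjunction reduces to the other input, which is the sense in which the operator's function was removed.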

For  this  reason,  it  may  be  desirable  to  allocate  functions  dy¬ 
namically,  as  it  is  done  in  normal  social  work  contexts.  In  this 
concept,  the  allocation  of  function  would  depend  on  the  con¬ 
text  of  each  occasion  of  use  (Scallen,  Hancock,  &  Dudley,  1996). 
In  this  case,  designers  would  have  to  examine  the  consequences 
of improper configuration of the automation and how to mitigate
the likelihood of that occurring (Dearden, Harrison, &
Wright, 1998).

One  method  of  obtaining  the  richer  representations  of  the 
work  context  called  for  in  this  concept  would  be  the  use  of  sce¬ 
narios  (Dearden  et  al.,  1998),  as  discussed  in  chapter  54.  Other 
methods  include  ethnographic  approaches  (Hughes,  Randall,  & 
Shapiro,  1992)  and  contextual  design  methods  (Beyer  & 
Holtzblatt,  1998),  which  are  discussed  in  detail  in  chapters  51 
and  50,  respectively.  The  former  finds  researchers  immersing 
themselves  in  the  work  environment  under  study  to  attempt  to 
understand  the  social  organization  of  the  work.  In  the  latter,  the 
work  environment  is  viewed  from  different  perspectives  (e.g., 
user,  cultural  relationships,  sequences  of  actions,  information 
artifacts,  physical  layout)  to  define  roles  and  tasks. 

Guidelines  for  automation.  Over  the  years,  a  num¬ 
ber  of  efforts  at  establishing  guidelines  for  automation  have 
been  described.  Several  of  these,  by  notable  authors,  are  dis¬ 
cussed  here. 

Wiener  and  Curry  first  presented  guidelines  for  automation 
in  1980.  They  advised  designers  to  create  automation  that  was 
comprehensible  to  the  operator,  so  that  the  operator  could  de¬ 
tect  and  properly  react  to  failures  in  the  system.  They  also  ad¬ 
vised  that  the  system  should  appear  to  perform  the  task  in 
much  the  same  way  as  the  operator,  allowing  for  individual 
styles  of  operation  while  not  affecting  overall  system  perfor¬ 
mance.  They  suggested  that  operators  should  be  well  trained  on 
the  setup  of  the  system,  as  well  as  nominal  and  off-nominal  op¬ 
eration,  emphasizing  its  use  as  an  aid  rather  than  as  a  replace¬ 
ment.  Operators  should  also  be  trained  on  the  alerts  of  the  sys¬ 
tems,  which  should  be  able  to  be  validated  by  the  operator,  and 
should  clearly  indicate  both  the  source  of  the  alarm  and  the 
severity  of  the  situation.  Wiener  and  Curry  also  suggested  that 
monitoring  may  be  an  important  task  for  the  operator,  who 
should  be  trained  and  motivated  to  monitor,  and  whose  work¬ 
load  should  be  kept  at  a  moderate  level  to  avoid  both  high-  and 
low-workload  situations. 

Billings,  in  1997,  agreed  with  many  of  these  findings,  but 
stressed  that  the  automation  should  never  have  control  over  the 
system. The operator should be in control, and should not even
be automatically prevented from exceeding normal operating
limitations.  Billings  also  warned  against  overautomation,  the 
tendency  of  complex  automation  to  become  incomprehensible, 
and  the  resulting  system  being  too  complex  for  the  operator  to 
control  in  the  case  of  automation  failure. 

Parasuraman  and  Riley  (1997)  added  to  these  guidelines  by 
pointing  out  the  possibility  that  in  certain  situations  operators 
can  “over-rely”  on  automation,  while  in  other  situations  opera¬ 
tors  can  “under-rely”  on  the  automation  (Parasuraman  et  al., 
2000).  These  situations  generally  occur  when  the  operator  does 
not  have  sufficient  knowledge  about  the  automation  or  the  sit¬ 
uation,  or  when  confidence  in  the  automation  is  misplaced  (ei¬ 
ther  high  or  low).  They  found  that  sudden  alerts  do  not  ade¬ 
quately  allow  the  operators  to  prepare  to  respond,  and 
suggested  the  use  of  preparatory,  or  “likelihood,”  alerts.  They 
warned  that  in  many  cases,  automation  replaced  operator  er¬ 
ror  with  designer  error,  which  can  occur  if  designers  do  not  in¬ 
corporate  knowledge  of  operational  practice  into  their  design, 
and  do  not  allow  for  some  nonintended  uses  of  the  automation. 

Early  Air-Traffic  Control 

Efforts  at  controlling  air  traffic  began  quickly  after  the  advent  of 
aircraft.  These  early  efforts  consisted  of  drafting  procedures  for 
safe  operation  of  aircraft,  and  were  undertaken  by  private  com¬ 
panies  until  the  1930s  (Gilbert,  1973).  As  aviation  grew,  dedi¬ 
cated  professionals  increased  their  efforts  to  track  aircraft  and 
ensure  their  separation  from  one  another. 

This  process  was,  and  to  a  surprising  degree  still  is,  a  highly 
manual  process.  Initially,  aircraft  had  contact  via  radio  with 
their  companies,  who  relayed  position  and  timing  information 
to  a  central  control  facility.  The  controllers  in  that  facility  would 
record  this  information  on  a  large  blackboard  and  manipulate 
aircraft  markers  (also  called  “shrimp  boats”)  on  a  map  table. 
They  would  analyze  this  information  in  an  attempt  to  predict 
potential  conflicts  between  aircraft.  This  ability  to  mentally  pro¬ 
ject  aircraft  states  is  still  relied  on  today  when  radar  displays  fail. 

Progress  from  these  modest  beginnings  was  slow  compared 
to  the  progress  of  aircraft  technology.  The  two  most  significant 
advances  in  air-traffic  technology  occurred  after  World  War  II. 
At  least  partially  due  to  their  experiences  in  the  Berlin  Airlift,  air- 
traffic  personnel  adopted  the  practice  of  speaking  directly  to  air¬ 
craft.  In  addition,  the  development  of  radar  allowed  controllers 
to  track  aircraft  in  real  time. 

Early  radar  displays.  Although  the  application  to  air- 
traffic  control  is  obvious,  early  efforts  at  controlling  aircraft  with 
radar  were  troubled.  Early  radar  displays  showed  the  position 
of  each  aircraft  relative  to  the  radar  station,  but  did  not  display 
other  critical  information,  such  as  aircraft  identification,  speed, 
or  altitude.  Because  of  this,  associating  targets  with  radar  re¬ 
turns  was  difficult,  and  projecting  their  status  into  the  future 
was  still  a  largely  cognitive  process.  The  recognition  of  this 
helped  bring  about  the  introduction  of  secondary  surveillance 
radar,  in  which  a  transponder  on  board  each  aircraft  transmits 
an  identification  code  and  altitude  to  a  ground  receiving  station 
(Gilbert,  1973).  This  information  can  then  be  correlated  with 
the  radar  returns,  eliminating  the  need  for  map  tables. 


Flight  strips  and  strip  bays.  Controllers  gradually  re¬ 
placed  the  blackboard  with  small  strips  of  paper  on  which  in¬ 
formation  about  each  flight  was  recorded.  These  would  be 
placed  in  a  rack  (or  bay)  in  temporal  order,  and  even  passed 
from  controller  to  controller  as  the  flight  progressed.  As  com¬ 
puters  were  added  in  the  1960s  and  1970s,  one  of  their  first 
tasks  was  to  automate  this  flight  data  processing,  including 
printing  flight  strips. 


MODERN  CHALLENGES,  SOLUTIONS, 
AND  PRINCIPLES 


The  first  50  years  of  aviation  saw  significant  changes  and  inno¬ 
vations  in  both  aircraft  and  air-traffic  control.  Recent  develop¬ 
ments  such  as  (a)  complex  flight  management  systems,  (b)  mul¬ 
tifunction  displays,  (c)  highly  integrated  information  systems, 
(d)  alerting  and  warning  systems,  and  (e)  sophisticated  auto¬ 
matic  control  capability  have  presented  even  more  complex 
challenges  for  aviation  professionals  and  researchers. 

Glass Cockpits, Flight Management Systems,
Mental Models, and Mode Confusion

Beginning  in  the  1970s,  airliners  were  equipped  with  flight  man¬ 
agement  computers,  which  would  assist  pilots  in  numerous 
tasks,  including  flight  path  management,  navigation,  and  fuel 
burn  management.  The  interface  to  the  flight  management  com¬ 
puter  is  through  a  control  display  unit,  which  consists  of  (a)  a 
small,  alphanumeric  keyboard,  (b)  several  function  keys,  and  (c) 
a  small  display  screen,  all  of  which  fits  into  about  a  6-inch  by  10- 
inch  space.  On  this  unit,  pilots  must  enter  navigation  informa¬ 
tion,  select  information  to  be  displayed,  and  navigate  through 
menus.  By  the  1980s,  flight-management  computers  were  inte¬ 
grated  with  cathode  ray  tube  (CRT)  displays,  creating  flight  decks 
mostly  devoid  of  gauges,  also  called  “glass  cockpits.”  An  example 
of a glass cockpit (from an MD-11 aircraft) is shown in Fig. 36.3.

Such  systems  integrated  the  operation  of  the  aircraft  with 
navigation,  resulting  in  a  “fundamental  shift  in  aircraft  automa¬ 
tion”  (Billings,  1997).  Although  such  displays  provided  the  ben¬ 
efits  of  a  highly  integrated  display,  there  was  a  cost  in  terms  of 
workload,  crew  coordination,  error  management,  and  vigilance 
(Wickens, Fadden, Merwin, & Ververs, 1998). In addition, a number
of accidents have been associated with pilots misunderstanding
the state of the automation. These accidents have been
attributed  to  the  pilots  lacking  an  accurate  “mental  model”  of 
the  automation. 

Mental models, described in detail in chapter 4, are conceptual
representations of a system that are held in the mind. The precursors
to mental model research date back nearly a century. These
early inquiries (Wittgenstein, 1922; Craik, 1943), and a significant
body of recent work, argue that cognition is inherently imagistic,
so that mental models are a theory of how people think.
This theory is a topic of contentious debate in the psychological
community, with many researchers vehemently denying that
people think in these “pictures in the head” and arguing instead that
thought consists of formal rules of inference (Pylyshyn, 2003).


FIGURE 36.3. MD-11 flight deck instruments. (Photo from Transportation Safety Board of
Canada Report Number A98H0003, page 32, Transportation Safety Board of Canada, 1998. Reproduced
with the permission of the Minister of Public Works and Government Services, 2007).


Undeniably, however, people at least think about such imagistic
models. In doing so, operators reason about their system
based  on  these  models.  There  is  convincing  evidence  that  such 
models  correlate  well  with  certain  operator  errors,  and  point 
to  ways  in  which  good  design  can  mitigate  these  errors.  As  such, 
we  will  try  to  skirt  the  contentious  issue  of  whether  mental 
models  are  theories  or  models,  and  stick  to  how  the  concept 
has  been  deployed  in  support  of  cognitive  engineering. 

Mental model research was born out of the observation that
many  people  use  analogy  to  understand  complex  phenomena. 
This  idea  became  more  formalized  in  the  cognitive  science  lit¬ 
erature,  highlighted  by  a  flood  of  papers  and  books  on  mental 
models  in  the  early  1980s  (Gentner  &  Stevens,  1983;  Johnson- 
Laird,  1983).  If  one  believed  that  a  person’s  knowledge  of  a 
complex  system  was  a  product  of  that  person’s  model  of  the  sys¬ 
tem,  then  an  understanding  of  the  limitations  of  these  models 
was  crucial  to  explaining  behavior  and  errors. 

Nowhere  did  mental  models  seem  to  fit  in  better  than  in  ex¬ 
plaining  a  number  of  human-automation  interaction  failures. 
Flight deck automation was becoming increasingly complex,
taxing the abilities of pilots to accurately model it. Error is likely
where  such  inaccuracies  exist.  Several  particularly  prominent 
examples  of  this  have  occurred  in  the  last  15  years.  In  1988,  a 
Boeing  767  was  inadvertently  put  into  a  vertical  speed  mode, 
nearly resulting in a crash. A similar situation occurred in an Airbus
320, when a crew mistakenly selected a descent rate of
3,300 feet per minute instead of a 3.3° descent angle and
flew into terrain. In 1994, a China Airlines Airbus 300 aircraft
crashed  after  pilots  tried  to  continue  a  landing  while  the  au¬ 
topilot  had  been  inadvertently  put  into  a  go-around  mode.  Also 
in  1988,  an  Airbus  320  was  flown  into  terrain  by  an  aircrew  dur¬ 
ing  an  air  show;  the  crew  believed  that  the  envelope  protection 
of  the  aircraft  would  prevent  a  stall,  but  the  aircraft  was  in  a 
mode  where  that  protection  was  not  provided. 
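
The descent-rate example above shows how one numeric entry can command very different behavior depending on the active mode. The sketch below is a hypothetical illustration, not a model of any real flight control unit: it assumes the same entry is read as hundreds of feet per minute in one mode and as tenths of a degree in the other, and it assumes a groundspeed of 140 knots. The function and mode names are invented for this example.

```python
import math

FEET_PER_NAUTICAL_MILE = 6076.12


def commanded_descent_fpm(entry, mode, groundspeed_kt):
    """Descent rate (feet per minute) implied by an entry in a given mode."""
    if mode == "vertical_speed":
        # Assumed encoding: entry is hundreds of feet per minute.
        return entry * 100.0
    if mode == "flight_path_angle":
        # Assumed encoding: entry is tenths of a degree below the horizon.
        angle_deg = entry / 10.0
        horizontal_fpm = groundspeed_kt * FEET_PER_NAUTICAL_MILE / 60.0
        return horizontal_fpm * math.tan(math.radians(angle_deg))
    raise ValueError(f"unknown mode: {mode}")


ENTRY = 33
GROUNDSPEED_KT = 140.0  # assumed approach groundspeed

as_vertical_speed = commanded_descent_fpm(ENTRY, "vertical_speed", GROUNDSPEED_KT)
as_path_angle = commanded_descent_fpm(ENTRY, "flight_path_angle", GROUNDSPEED_KT)

print(f"vertical speed mode:    {as_vertical_speed:.0f} ft/min")
print(f"flight path angle mode: {as_path_angle:.0f} ft/min")
```

Under these assumptions the two readings differ by roughly a factor of four, which suggests why an inaccurate mental model of the active mode can be so consequential.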

These  incidents  indicated  that  one  piece  of  flight  deck  au¬ 
tomation  particularly  prone  to  such  mental  model  inaccuracies 
is  the  flight  management  system,  which  interacts  closely  with 
the  autopilot.  Researchers  had  been  aware  of  mode  problems  in 
HCI  for  some  time,  and  in  due  time  these  issues  arose  in  the 
interaction of the pilot with the computer that operates the
autopilot. This aspect of the problem, as a computer interface
issue, has significantly affected the research on mental models
in aviation.

In  addition  to  calling  for  training  for  pilots  to  better  under¬ 
stand  their  systems,  much  effort  has  been  put  into  creating 
models  of  the  pilot’s  interaction  with  the  flight  management  sys¬ 
tem  in  an  effort  to  understand  how  systems  might  be  designed 
to  mitigate  these  model  inaccuracies. 

Heads-Up  Displays 

In  order  to  minimize  the  need  to  view  instruments  inside  the 
cockpit,  “heads-up”  displays  (HUDs)  were  superimposed  on  the 
windscreen,  providing  at  least  the  basic  T  information.  The 
same  method  can  be  used  to  superimpose  images  on  the  in¬ 
side  of  the  visor  of  a  helmet  (known  as  “helmet  mounted  dis¬ 
plays”  [HMDs]).  HUDs  have  moved  from  the  military,  where 
they  found  their  first  application,  to  commercial  aircraft  and 
even  automobiles. 

One  major  difference  between  the  instrumentation  on  a  HUD 
and  the  same  instrumentation  on  a  traditional  instrument  panel 
is  that  the  world  can  be  viewed  through  the  HUD.  This  allows 
for  the  possibility  of  integrating  real  external  images  with  the  in¬ 
strumentation  without  violating  the  proximity  principle.  The  flip 
side  of  this,  however,  is  that  the  world  peeking  through  the 
HUD  provides  a  great  deal  of  clutter  on  the  display,  although 
some  research  indicates  that  this  is  not  a  significant  increase  in 
difficulty  (Ververs  &  Wickens,  1998).  Overall,  the  HUD  has  a 
number  of  advantages  that  make  it  superior  to  conventional  po¬ 
sitioning  of  displays.  The  HUD  enhances  the  operator’s  ability 
to  switch  between  near  and  far  domains  (Levy,  Foyle,  &  McCann, 
1998),  although  strangely  this  reverses  to  a  detriment  for  the  de¬ 
tection  of  unexpected  events  (Wickens  &  Prevett,  1995). 

Information  Automation 

The  rapid  rise  in  computer  and  display  technology  has  allowed 
engineers  to  develop  systems  that  alleviate  some  of  the  infor¬ 
mation  burden  on  the  operator.  These  systems  perform  auto¬ 
mated  monitoring  (such  as  conflict  alerting  and  prevention), 
provide  information  to  assist  the  pilot’s  decision  making,  and 
help  provide  coordination  between  different  operators. 

Alerting  systems.  In  the  late  1970s,  aircraft  were  under¬ 
going  a  rapid  expansion  in  terms  of  warnings  and  alerting  sys¬ 
tems.  One  survey  found  general  agreement  that  there  were  too 
many  warnings  and  that  standards  and  guidelines  were  needed 
for  warning  system  design  (Cooper,  1977).  Another  researcher 
determined  that  the  number  of  visual  alerts  and  flags  had  dou¬ 
bled,  while  the  number  of  aural  alerts  (see  chapter  14)  had  in¬ 
creased  by  50%  since  the  beginning  of  the  commercial  jet  age 
(Veitengruber,  1977).  Based  on  the  data  at  the  time,  researchers 
were  recommending  that  alerts  be  reduced,  prioritized,  and  in¬ 
hibited  during  critical  phases  of  flight. 

Since then, alerting systems have grown significantly in capability
and complexity. Current alerting systems may utilize
multiple sensors, ground communications, and large knowledge
databases, and may have sophisticated algorithms for alert
threshold and guidance recommendations (Pritchett, 2001).

One  of  the  most  widely  analyzed  problems  in  alerting  sys¬ 
tem  technology  relates  to  problems  with  conformance  to  the 
ground-proximity  warning  system.  The  ground-proximity  warn¬ 
ing  system  (GPWS)  was  a  complex  alert  designed  to  warn  of 
something  extremely  hazardous:  proximity  to  or  closure  to  ter¬ 
rain.  It  was  complex  in  that  closure  to  terrain  can  be  caused  by 
a  number  of  things,  including  landing,  and  the  system  had  to 
discriminate  these  events.  In  part  due  to  this  complexity,  and  in 
part  due  to  poor  design,  the  system  frequently  produced  false 
alarms.  This  had  the  effect  of  reducing  operator  trust  in  the 
automation  and  causing  underreliance  (the  pilots  would  not 
believe the alert).

High  false  alarm  rates  can  also  be  caused  by  improper  setting 
of  thresholds  for  alerts,  a  process  that  is  analogous  to  signal  de¬ 
tection.  The  threshold  for  an  alert  can  be  based  upon  the  sys¬ 
tem  operating  characteristics  curve,  which  is  in  turn  a  function 
of  the  sensors  (Kuchar,  1996).  Setting  the  threshold  is  then  a 
tradeoff  between  missed  detections  and  false  alarms.  Typically, 
this  means  that  the  fewer  the  false  alarms,  the  more  the  missed 
detections.  Interestingly,  some  researchers  have  warned  against 
having  too  low  a  false  alarm  rate  (Farber  &  Paley,  1993).  If  the 
false  alarm  rate  is  too  low,  not  only  will  the  occurrence  be  in¬ 
frequent  and  unexpected,  but  also  it  will  likely  require  a  later 
alert,  compounding  the  difficulty  of  reacting  to  the  alert. 
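
The tradeoff between missed detections and false alarms can be made concrete with a textbook signal detection sketch. The Gaussian sensor distributions, their parameters, and the names below are assumptions chosen for illustration; they are not taken from the cited work.

```python
from statistics import NormalDist

# Assumed distributions of a sensor reading (arbitrary units).
safe = NormalDist(mu=0.0, sigma=1.0)    # no hazard present
hazard = NormalDist(mu=2.0, sigma=1.0)  # hazard present


def rates(threshold):
    """Alert when reading > threshold; return (false alarm rate, miss rate)."""
    false_alarm = 1.0 - safe.cdf(threshold)  # safe reading above threshold
    miss = hazard.cdf(threshold)             # hazard reading below threshold
    return false_alarm, miss


for threshold in (0.5, 1.0, 1.5, 2.0):
    false_alarm, miss = rates(threshold)
    print(f"threshold {threshold:.1f}: "
          f"false alarms {false_alarm:.3f}, misses {miss:.3f}")
```

Raising the threshold lowers the false alarm rate and raises the miss rate; no setting reduces both unless the sensor separates the two distributions better.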

The  problem  of  false  alarms  is  further  exacerbated  by  the  low 
base  rate  of  incidents  requiring  an  alert  (Parasuraman  &  Riley, 
1997).  In  these  cases,  the  relative  rate  of  false  alarms  is  in¬ 
creased.  Yet  in  many  cases,  the  cost  of  a  false  alarm  is  consid¬ 
ered  much  lower  than  a  missed  detection,  leading  designers  to 
err  on  the  side  of  economy  and/or  safety. 
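
The effect of a low base rate follows from Bayes' rule. The sketch below uses made-up numbers (a 99% detection rate and a 1% false alarm rate) only to show the shape of the effect; the function name is hypothetical.

```python
def probability_alert_is_real(base_rate, hit_rate, false_alarm_rate):
    """P(hazard | alert) by Bayes' rule."""
    true_alerts = hit_rate * base_rate
    false_alerts = false_alarm_rate * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# Same alerting system, two assumed base rates of real hazards.
rare = probability_alert_is_real(base_rate=0.001, hit_rate=0.99,
                                 false_alarm_rate=0.01)
common = probability_alert_is_real(base_rate=0.1, hit_rate=0.99,
                                   false_alarm_rate=0.01)

print(f"hazard in 1 of 1000 cases: {rare:.1%} of alerts are real")
print(f"hazard in 1 of 10 cases:   {common:.1%} of alerts are real")
```

With these assumed figures, fewer than one alert in ten is real when hazards are rare, even though the system itself is unchanged.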

These  (generally)  are  cases  where  the  alerting  system  is  func¬ 
tioning  as  a  signal  detector  or  a  hazard  detector,  the  latter  being 
a  more  complex  version  of  the  former.  Alerting  systems  have  also 
added  hazard-resolving  logic,  such  as  in  TCAS II  (Pritchett,  2001) 
and  NASA’s  Advanced  Airspace  Concept  (Erzberger,  2004).  The 
benefit  of  such  an  approach  is  that  the  resolving  maneuver  can 
be  incorporated  into  the  alerting  logic,  resulting  in  a  reduction  of 
false  alarms;  however,  such  a  system  relies  heavily  on  the  pilot 
appropriately  performing  the  maneuver.  It  is  also  possible  that 
the  maneuver  may  actually  induce  a  collision  by  creating  a  prob¬ 
lem  where  none  actually  existed. 

Decision  support  systems.  In  addition  to  alerting  sys¬ 
tems,  there  are  also  systems  on  board  aircraft  and,  to  an  even 
greater  extent,  in  air-traffic  control  which  are  considered  deci¬ 
sion  support  tools  (see  chapter  30  for  a  discussion  of  design 
approaches  for  decision  support  systems).  These  tools  do  not 
warn  or  alert,  but  provide  information  to  support  decision  mak¬ 
ing.  This  information  is  often  integrated  from  several  sources,  or 
enhanced  in  some  way. 

One  simple  example  of  a  decision  support  system  on  a  flight 
deck  is  the  weather  radar  system.  This  system  provides  no  dis¬ 
crete  alert,  but  rather  shows  the  pilot  where  hazardous  weather 
exists  to  help  pilots  determine  how  to  deviate  to  avoid  it.  With 
no  weather  radar,  pilots  had  to  rely  on  visual  means,  which  can 
only  detect  gross  features  such  as  height,  darkness,  and  the 
presence of lightning. Weather radar, however, not only shows
rates of precipitation (an indication of severity), but can also
show  the  lateral  and  vertical  extent  of  the  weather. 

Another  decision  support  system  is  the  Engine  Indication 
and  Crew  Alerting  System.  While  some  aspects  of  this  system 
are  more  correctly  considered  an  alerting  system,  much  of  the 
system  is  simply  decision  support.  This  display  shows  pilots  the 
status  of  most  of  the  systems  on  board.  It  can  be  configured  to 
depict  engine  instruments  or  other  graphical  depictions  of  the 
operation  of  the  systems  on  board  the  aircraft. 

As  mentioned,  there  are  a  number  of  decision-support  tools 
being  used  in  air-traffic  control.  In  fact,  the  number  of  such  dis¬ 
plays  is  increasing  at  such  a  rate  that  the  FAA  is  attempting  to 
integrate  these  systems  to  reduce  the  sheer  number  of  sepa¬ 
rate  systems  operating  in  air-traffic  facilities. 

Modern  Air-Traffic  Control 

The  current  air-traffic  control  system,  while  reliant  on  com¬ 
puter  resources,  has  been  designed  to  be  resilient  to  the  loss  of 
such  resources.  Computers  are  still  used  for  display  of  primary 
and  secondary  radar  information,  as  well  as  flight  data  process¬ 
ing.  The  upgrading  of  these  capabilities  from  the  1970s  has 
been  slow,  impacted  by  the  widespread  replacement  of  experi¬ 
enced  controllers  following  the  air-traffic  controller  strike  of 
1981  and  the  failure  of  the  Advanced  Automation  System  (AAS) 
program  of  the  late  1980s  (Lee  &  Davis,  1996).  On  the  heels  of 
the  AAS  system,  however,  have  come  a  number  of  advances,  in¬ 
cluding  attempts  to  automate  flight  strips  and  the  introduction 
of  automation  aids. 

Automated  flight  strips  and  situation  awareness.  Flight  strips
have  proven  remarkably  resilient  to  change.  Many
controllers  still  use  flight  strips,  and  flight  strip  bays  are  still 
used  in  air-traffic  control  facilities.  Recently,  however,  manual
flight  strips  have  begun  to  be  replaced  with  electronic  flight  strips,
most  notably  in  the  User  Request  and  Evaluation  Tool  (URET), 
which  took  25  years  to  implement  (Arthur  &  McLaughlin, 
1998).  Similar  efforts  to  replace  flight  strips  have  been  under¬ 
taken  in  Europe  (Berndtsson  &  Normark,  1999).  The  reasons 
for  their  slow  demise  are  highly  instructive  of  some  of  the  prob¬ 
lems  with  automating  some  forms  of  human  work. 

Despite  the  significant  workload  of  the  manual  process  of 
dealing  with  flight  strips,  controllers  have  been  reluctant  to  part 
with  them,  or  to  use  suggested  replacements.  This  is,  in  large 
part,  due  to  the  flight  strips’  contribution  to  the  controller’s  situation
awareness.  For  a  controller,  whose  job  is  still  to  a  large
degree  cognitive,  their  situation  awareness  is  crucial  (Whitfield 
&  Jackson,  1983).  Researchers  have  found  that  controllers  indi¬ 
cate  that  flight  strips  are  a  primary  means  of  establishing  and 
maintaining  their  situation  awareness  (Harper  &  Hughes,  1991). 

It  is  a  notable  theme  that  modern  research  attention  has 
turned  from  information  display,  as  described  above,  to  information
assessment.  As  more  complex  automation  was  added  to
flight  decks  and  air-traffic  facilities,  and  as  more  information  was
provided  to  operators,  accessing  and  comprehending  that  automation
and  information  became  problematic.  Researchers  began
to  understand  that  they  needed  to  know  not  only  how  to
display  information,  but  also  how  operators  represented  and
utilized  that  information,  in  order  to  provide  a  better  match  between
technology  and  the  human.

Interest  in  information  assessment  has  led  to  a  significant
body  of  research  into  situation  awareness  (SA).  This  has  been
a  particularly  important  concept  in  aviation,  since  safe  and  suc¬ 
cessful  operation  in  aviation  requires  a  great  deal  of  knowledge 
about  the  environment  outside  of  an  individual  aircraft.  Many 
aviation  accidents  can  be  attributed  to  a  lack  of  knowledge  of 
the  environment  external  to  the  aircraft  (e.g.,  collisions  with 
aircraft  and  the  ground).  Air-traffic  controllers’  knowledge  of 
the  “big  picture”  includes  not  only  the  states  of  the  aircraft  in 
their  own  sector,  but  those  of  neighboring  sectors,  airports, 
weather,  and  much  more.  Attempts  to  understand  how  pilots 
and  controllers  obtain  and  maintain  this  “big  picture”  have  been 
the  focus  of  SA  research. 

Since  as  far  back  as  the  First  World  War,  military  aviators  have
understood  the  importance  of  SA  and  have  instructed  their  pi¬ 
lots  in  developing  and  maintaining  good  SA.  One  of  Germany’s 
top  World  War  I  fighter  aces,  Oswald  Boelcke,  listed  among  his 
“dicta”  that  “the  pilot  must  acquire  the  habit  of  “taking  in”  un¬ 
consciously  the  general  progress  of  the  whole  multi-aircraft 
dogfight  going  on  around  the  individual  combat  in  which  the  pi¬ 
lot  will  become  involved  .  .  .  (so  that)  no  time  (is)  wasted  in  as¬ 
sessment  of  the  general  situation  after  the  end  of  an  individual 
combat”  (Hacker,  1984,  p.  1).  Boelcke  also  prescribed  knowl¬ 
edge  of  one’s  own  machine,  the  enemy’s  machine,  and  naviga¬ 
tional  fixes.  SA  has  been  a  significant  part  of  aeronautical  train¬ 
ing  since  that  time. 

As  a  research  topic,  however,  the  concept  was  mostly  ig¬ 
nored  by  researchers  until  the  1980s.  Due  to  increasing  flight 
deck  and  air-traffic  automation,  pilots’  and  air-traffic  con¬ 
trollers’  role  as  supervisor  of  these  systems  was  increasing,  re¬ 
ducing  the  time  they  could  spend  in  developing  SA.  At  the 
same  time,  a  great  deal  of  new  sensor  information  was  becom¬ 
ing  available  to  designers  of  aviation  automation,  information 
that  could  be  used  to  reinforce  the  controllers’  and  pilots’  SA. 
Researchers  began  looking  into  what  SA  is,  what  affects  an  op¬ 
erator’s  ability  to  construct  SA  and  to  keep  it,  and  how  it  might 
be  measured. 

Many  definitions  of  SA  exist,  with  most  agreeing  that  SA  is, 
at  least  in  part,  the  comprehension  of  elements  of  the  environ¬ 
ment  that  have  (or  may  have)  some  bearing  on  the  task  being 
accomplished.  Some  efforts  have  viewed  SA  as  a  static,  infor¬ 
mation-driven  product,  some  have  viewed  it  as  a  dynamic 
process,  while  others  have  viewed  SA  as  a  high-level  descrip¬ 
tion  of  certain  aspects  of  task  behavior.  A  significant  body  of 
work  has  also  gone  towards  determining  how  to  measure  SA. 

One  of  the  most  widely  quoted  definitions  of  SA  is  “the  per¬ 
ception  of  elements  in  the  environment  within  a  volume  of  time 
and  space,  the  comprehension  of  their  meaning,  and  the  pro¬ 
jection  of  their  status  in  the  near  future”  (Endsley,  1988,  p.  97). 
Another  researcher  has  called  SA  “an  integrated  understanding 
of  factors  that  will  contribute  to  the  safe  flying  of  the  aircraft 
under  normal  or  nonnormal  conditions”  (Regal,  1988,  p.  65). 
These  definitions  describe  SA  as  a  product,  something  that  an 
operator  either  has  or  does  not  have.  As  such,  errors  resulting 
from  a  lack  of  SA  can  be  studied.  Researchers  have  found  that 
controllers,  viewing  recorded  air-traffic  files,  were  unable  to  re¬ 
call  many  details  about  traffic  when  asked  about  them  during 
the  scenario,  including  call  signs,  control  level  (who  has  con¬
trol  of  the  aircraft),  altitude,  speed,  and  heading;  controllers  of¬ 
ten  even  failed  to  report  that  some  aircraft  were  present  at  all 
(Endsley  &  Rogers,  1996).  Other  researchers  studying  error  re¬ 
port  databases  have  found  that  in  only  about  20%  of  serious  op¬ 
erational  air-traffic  errors  were  controllers  aware  of  the  problem 
developing  (Gosling,  2002). 

Researchers  who  view  SA  as  a  product  often  treat  the
activity  of  obtaining  and  maintaining  SA  as  a  separate  process,
often  referring  to  it  as  “situation  assessment.”  This  type  of  SA
has  been  described  as  “(an)  adaptive,  externally-directed  con¬ 
sciousness”  (Smith  &  Hancock,  1995,  p.  137),  and  “the  integra¬ 
tion  of  knowledge  resulting  from  recurrent  situation  assess¬ 
ments”  (Sarter  &  Woods,  1991,  p.  45).  Here,  the  sources  of 
information  important  to  SA  (visual,  auditory,  tactile,  other  sen¬ 
sory  input,  knowledge  of  procedures  and  regulations,  etc.)  are 
diverse  and  often  clearly  identifiable.  The  operator’s  ability 
to  obtain  SA  begins  with  the  most  fundamental  cognitive  pro¬ 
cesses  (detection),  and  progresses  to  very  sophisticated  con¬ 
cepts  (comprehension  and  projection). 

Some  researchers  have  called  into  question  the  utility  of  the 
SA  concept.  Many  feel  it  is  a  high-level  concept  that  does  not  have 
sufficient  granularity  to  really  explain  anything.  SA  has  been  re¬ 
ferred  to  as  a  description  of  observations  of  humans  operating 
complex  systems  in  a  dynamic  environment  (Billings,  1995)  and 
as  simply  equivalent  to  expertise  (Crane,  1992).  As  such,  SA  is  a 
description  of  a  set  of  cognitive  processes  that  are  used  together, 
and  is  really  only  useful  in  categorizing  or  grouping  behaviors  and 
errors.  Some  researchers  feel  that  considering  SA  as  a  causal 
agent  can  be  counterproductive  to  understanding  operator  be¬ 
havior  (Flach,  1995). 

No  one  set  of  measures  has  been  clearly  identified  for  SA. 
Typical  measures  can  be  broken  into  three  categories:  (a)  ex¬ 
plicit  (or  knowledge-based),  (b)  implicit  (or  performance- 
based),  and  (c)  subjective  (Vidulich,  1992).  Examples  of  explicit 
methods  include  participants’  recall  of  a  situation  (post  hoc),  an 
ongoing  narrative  provided  concurrently  with  the  task,  or  freez¬ 
ing  a  scenario  and  questioning  the  participant  about  the  deci¬ 
sions,  events,  or  the  task  environment.  These  measures  can  be 
compared  to  actual  state  of  the  system  to  provide  a  better  mea¬ 
sure,  but  have  been  criticized  as  too  subjective  (in  the  case  of 
recall;  Fracker,  1991),  or  too  intrusive  (in  the  case  of  freezing  a 
scenario;  Sarter  &  Woods,  1991). 
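
The comparison of a participant's answers with the actual state of the system, as in the freeze technique described above, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `score_freeze_probe`, the queried items, and the tolerances are hypothetical and are not drawn from any published SA measure.

```python
# Illustrative sketch only: names, items, and tolerances are assumptions.

def score_freeze_probe(reported, actual, tolerances):
    """Return the fraction of queried items reported within tolerance.

    reported   -- participant's answers at the freeze (missing = not recalled)
    actual     -- true simulator state for the same items
    tolerances -- allowed absolute error per item
    Heading wrap-around at 360 degrees is ignored for simplicity.
    """
    correct = 0
    for item, true_value in actual.items():
        answer = reported.get(item)  # None means the item was not recalled
        if answer is not None and abs(answer - true_value) <= tolerances[item]:
            correct += 1
    return correct / len(actual)


actual = {"altitude_ft": 33000, "speed_kt": 450, "heading_deg": 270}
reported = {"altitude_ft": 33000, "speed_kt": 410}  # heading not recalled
tolerances = {"altitude_ft": 300, "speed_kt": 20, "heading_deg": 10}
print(score_freeze_probe(reported, actual, tolerances))
```

In this made-up example only the altitude is recalled within tolerance, so one of the three queried items is scored as correct.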

Implicit  measures  examine  task  performance,  and  correlate 
that  with  SA.  These  measures  are  generally  unobtrusive  and  ob¬ 
jective,  and  (as  mentioned  earlier)  can  be  used  in  conjunction 
with  explicit  measures.  It  has  been  suggested  that  these  mea¬ 
sures  can  succeed  where  explicit  methods  cannot,  particularly 
in  situations  where  a  determination  of  the  timing  of  events  is 
important  and  where  the  subject  may  be  unaware  of  his  or  her 
deficiency  (Pritchett,  Hansman,  &  Johnson,  1995).  These  mea¬ 
sures  may  examine  performance  at  the  task  overall,  or  alter  the 
task  to  determine  whether  the  subject  notices  the  change. 

Subjective  ratings  are  assessments  of  SA  made  either  by  an 
observer  or  by  the  subject.  These  ratings  can  use  a  number  of 
different  scales,  and  can  be  either  direct  or  relative  (such  as 
by  comparing  SA  in  one  situation  to  SA  in  another  situation). 
Although  these  ratings  can  be  affected  by  a  number  of  factors, 
including  task  performance,  one  technique  that  has  been  studied
extensively  is  the  Situation  Awareness  Rating  Technique
(SART;  Taylor,  1989).  This  technique  has  the  operator  rate  10
constructs  on  a  seven-point  scale.  The  constructs  are  grouped 
into  three  categories:  (a)  attentional  demand  (which  includes 
the  instability,  variability,  and  complexity  of  the  situation), 
(b)  attentional  supply  (including  arousal,  spare  mental  capac¬ 
ity,  concentration,  and  division  of  attention),  and  (c)  under¬ 
standing  (which  includes  information  quality  and  quantity,  and 
familiarity).  SART  has  been  found  to  be  more  sensitive  than 
overall  subjective  measures  of  SA  (Selcon  &  Taylor,  1989),  al¬ 
though  some  researchers  feel  that  SART  confounds  SA  with 
workload  (Jones  &  Endsley,  2000). 
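
The grouping of the ten SART constructs into three categories can be sketched as follows. The construct names follow the text, but the combination rule SA = U - (D - S) is an assumption: it is the rule commonly associated with SART, and the text above does not state how the ratings are combined.

```python
# Illustrative SART scoring sketch. The combined score U - (D - S) is an
# assumption; the chapter does not specify how ratings are combined.

DEMAND = ("instability", "variability", "complexity")
SUPPLY = ("arousal", "spare_capacity", "concentration", "division_of_attention")
UNDERSTANDING = ("information_quantity", "information_quality", "familiarity")


def sart_score(ratings):
    """Return (demand, supply, understanding, combined) from ten 1-7 ratings."""
    for name in DEMAND + SUPPLY + UNDERSTANDING:
        value = ratings[name]  # raises KeyError if a construct is missing
        if not 1 <= value <= 7:
            raise ValueError(f"{name} must be rated 1-7, got {value}")
    demand = sum(ratings[n] for n in DEMAND)
    supply = sum(ratings[n] for n in SUPPLY)
    understanding = sum(ratings[n] for n in UNDERSTANDING)
    combined = understanding - (demand - supply)
    return demand, supply, understanding, combined


example = {name: 4 for name in DEMAND + SUPPLY + UNDERSTANDING}
example["complexity"] = 6  # a more complex situation raises attentional demand
print(sart_score(example))
```

Because attentional demand and supply enter the combined score directly, a change in workload alone shifts the result, which illustrates the confound with workload noted above.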

Decision  support  tools  in  air-traffic  control.  One  of  the  most
significant  areas  of  improvement  within  air  traffic  over
the  last  two  decades  has  been  in  decision  support  tools.  Over 
the  years,  a  number  of  automation  tools  have  been  slowly  added 
to  the  air-traffic  control  system,  and  others  have  been  consid¬ 
ered  and  failed.  These  advancements  have  included  the  intro¬ 
duction  of  two  conflict  prediction  systems,  the  failed  attempt 
to  provide  sequencing  and  runway  selection  aids,  and  the  in¬ 
troduction  of  centrally  scheduled  time-based  metering  tools. 

The  first  conflict  prediction  system  was  created  in  the  early 
1980s  as  a  quality  assurance  program.  The  “operational  error  de¬ 
tection  program”  (OEDP)  was  designed  to  identify  when  two 
aircraft  had  come  closer  to  one  another  than  allowed  under  FAA 
regulations.  If  the  OEDP  was  set  off,  a  record  was  made  of  the 
transgression,  and  an  investigation  took  place.  This  is  obviously
highly  undesirable  for  controllers,  who  have  nicknamed  the  system
“the  snitch  patch.”  As  a  result  of  the  OEDP,  controllers  routinely
pad  separation  between  aircraft  to  ensure  that  the  snitch
patch  does  not  activate  (Cotton,  2003). 

The  OEDP  also  gives  a  three-minute  warning  that  a  violation 
is  going  to  occur,  although  the  system  is  not  considered  highly 
reliable  by  controllers.  Part  of  the  reason  for  the  system’s  unre¬ 
liability  is  that  its  predictions  are  based  on  simple  extrapolations 
of  the  current  states  of  the  aircraft  involved,  without  consideration
of  their  intentions  (such  as  an  aircraft’s  intention  to  stop  climbing  or  to
change  course).  The  system’s  unreliability  has  led  to  several  ef¬ 
forts  to  replace  it.  The  URET  system  mentioned  previously  in¬ 
cludes  a  “conflict  probe,”  which  examines  the  predicted  flight 
path  of  aircraft  and  gives  an  indication  of  the  potential  for  colli¬ 
sion  (Brudnicki  &  McFarland,  1997). 
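
A prediction based on simple extrapolation of current states can be sketched as below. This is a minimal illustration, not the OEDP or URET logic: it treats only the horizontal plane, assumes straight-line flight at constant speed, and the function name and the 5 nautical mile separation value are assumptions. The 180-second look-ahead corresponds to the three-minute warning mentioned above.

```python
import math

# Minimal sketch of state-based conflict prediction (an assumption-laden
# illustration, not the logic of any fielded system).


def seconds_to_conflict(pos_a, vel_a, pos_b, vel_b,
                        min_sep_nm=5.0, lookahead_s=180.0):
    """Return seconds until horizontal separation is first lost, else None.

    Positions are (x, y) in nautical miles; velocities are (vx, vy) in knots.
    Both aircraft are assumed to hold their current velocity (no intent).
    """
    rx, ry = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    wx = (vel_b[0] - vel_a[0]) / 3600.0  # relative velocity, nm per second
    wy = (vel_b[1] - vel_a[1]) / 3600.0
    a = wx * wx + wy * wy
    b = rx * wx + ry * wy
    c = rx * rx + ry * ry - min_sep_nm ** 2
    if c <= 0:
        return 0.0  # separation is already lost
    if a == 0:
        return None  # no relative motion, distance never changes
    disc = b * b - a * c
    if disc < 0:
        return None  # the extrapolated paths never come within min_sep_nm
    t = (-b - math.sqrt(disc)) / a
    return t if 0 <= t <= lookahead_s else None


# Two aircraft 40 nm apart, flying head-on at 480 knots each.
print(seconds_to_conflict((0, 0), (480, 0), (40, 0), (-480, 0)))
```

Because the extrapolation ignores intent, the same alert would be raised even if one aircraft were about to turn away, which is the kind of unreliability described above.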

In  the  1990s,  NASA  undertook  an  effort  to  provide  automa¬ 
tion  tools  for  controllers  and  traffic  managers.  The  suite  of  tools, 
called  the  Center-TRACON  Automation  System  (CTAS),  was 
based  upon  combining  radar  position  and  speed  data,  aircraft 
flight  plans,  weather  data,  and  models  of  aircraft  flight  charac¬ 
teristics.  The  models  were  based  upon  those  used  in  the  flight 
management  systems  in  aircraft,  which  are  known  to  be  highly 
accurate.  One  of  the  first  tools  proposed  was  the  Final  Approach 
Spacing  Tool  (FAST).  This  system  was  to  be  used  by  controllers
handling  aircraft  close  to  busy  airports.  These  controllers  transition
aircraft  from  intermediate  altitudes,  about  fifty  miles  out
from  an  airport,  and  coordinate  their  arrival  at  the  runway.  This  is
a  very  complex  task,  requiring  controllers  to  predict  the  proper 
sequencing  and  merging  of  aircraft  from  several  directions.  FAST 
generated  advisories  for  sequencing,  runway  assignment,  head¬ 
ings,  and  speeds  to  ease  the  merging  and  spacing  of  aircraft  (Lee 
&  Davis,  1996).  Although  deemed  acceptable  by  controllers  (Lee
&  Sanford,  1998),  the  system  was  ultimately  not  implemented. 
One  of  the  likely  factors  was  that  although  the  system  was  ex¬ 
tremely  accurate  in  its  predictions,  it  was  not  perfectly  so,  and 
controllers  may  have  felt  that  the  workload  and  distraction  of  the 
few  inaccurate  predictions  outweighed  its  benefits. 

Another  part  of  the  CTAS  suite  of  tools  is  the  Traffic  Management
Advisor  (TMA),  which  provides  a  service  similar  to  FAST
for  controllers  handling  aircraft  farther  out  from  the  airport
(out  to  about  300  miles).  This  system  provides  delay  advisories 
for  each  aircraft  approaching  a  busy  airport.  Controllers  are 
then  responsible  for  slowing  aircraft  to  meet  the  assigned  de¬ 
lays,  utilizing  the  same  techniques  they  use  to  space  aircraft  nor¬ 
mally.  The  main  shift  between  the  current  method  and  TMA  is 
that  the  spacing  TMA  provides  is  time-based,  versus  the  dis¬ 
tance-based  method  currently  used.  Time-based  metering  in 
this  fashion  has  been  shown  to  be  more  efficient  than  distance- 
based  (Sokkappa,  1989;  Vandevenne  &  Andrews,  1993),  and 
TMA  has  shown  significant  benefits  where  it  has  been  adopted 
(Knorr,  2003).  Similar  efforts  focused  on  runway  sequencing 
and  scheduling  have  been  developed  in  Europe,  Canada,  and 
Australia  (Ljungberg  &  Lucas,  1992;  Robinson  III,  Davis,  &  Isaac¬ 
son,  1997;  Barco  Othogon,  2003;  NavCanada,  2003).  More  ad¬ 
vanced  systems  are  also  being  developed  which  move  some 
separation  authority  to  the  flight  deck  (Prevot  et  al.,  2003). 
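
The idea of a time-based delay advisory can be illustrated with a short sketch. It is not the TMA algorithm: it assumes first-come-first-served ordering and a single fixed time spacing at the metering point, and the function name, call signs, and 90-second spacing are hypothetical.

```python
# Sketch of time-based metering under stated assumptions: first-come-first-
# served order and one fixed spacing value. Not the actual TMA scheduler.


def schedule_arrivals(etas_s, min_spacing_s):
    """Return a list of (callsign, eta, scheduled_time, delay) tuples.

    etas_s maps each call sign to its estimated time of arrival in seconds.
    Each aircraft is scheduled no earlier than its estimate and no closer
    than min_spacing_s behind the aircraft scheduled ahead of it.
    """
    schedule = []
    last_scheduled = None
    for callsign, eta in sorted(etas_s.items(), key=lambda item: item[1]):
        if last_scheduled is None:
            scheduled = eta
        else:
            scheduled = max(eta, last_scheduled + min_spacing_s)
        schedule.append((callsign, eta, scheduled, scheduled - eta))
        last_scheduled = scheduled
    return schedule


etas = {"AAL1": 0, "UAL2": 60, "DAL3": 100, "SWA4": 400}
for row in schedule_arrivals(etas, min_spacing_s=90):
    print(row)
```

The last element of each tuple is the delay the controller would be asked to absorb for that aircraft; how it is absorbed (speed reduction, vectoring) is left to the controller, as described above.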


FUTURE  CHALLENGES 


Aviation  has  undergone  a  great  deal  of  change  since  its  inception,
although  predictions  of  its  development  path  have  not  always
been  accurate.  Forty  or  fifty  years  ago,  predictions  of  the  future
air  transport  system  would  probably  have  included
much  more  automation  of  air-traffic  control  and  high  reliance  on 
satellite  systems,  with  larger  and  faster  aircraft  than  were  then 
utilized.  These  aspects  have  largely  not  been  realized  to  the  ex¬ 
tent  expected.  The  aviation  system  of  the  future  will  undoubt¬ 
edly  hold  other  unexpected  developments,  but  a  few  chal¬ 
lenges  for  researchers,  engineers,  and  practitioners  are  clear. 

Three-Dimensional  Displays 

Air-traffic  and  navigation  displays  both  try  to  represent  three
dimensions  on  a  two-dimensional  (2D)  display,  and  some  information
is  inevitably  lost  in  doing  so.  To  overcome  this  limitation,  several
alternatives  for  providing  three  dimensions  in  a  display  have  been
investigated.  A  true  third  dimension  can  be  provided  through
binocular,  stereoscopic,  or  holographic  devices  (Mountford  &
Somberg,  1981),  but  practical  problems  make  their  adoption  unlikely.
Also  unlikely,  because  of  the  cognitive  demands  of  integration,  is
the  use  of  time  frame  compression  (a  display  of  sequential  images;
Roscoe,  1981).  Significant  research  has  been  conducted,  however,
on  perspective  illusion  displays,  where  the  illusion  of  three
dimensions  is  given  through  the  use  of  perspective.

Three-dimensional  displays  can  portray  a  variety  of  viewpoints,
to  different  effect  (Wickens  &  Prevett,  1995).  In  testing
such  displays,  2D  displays,  with  either  some  analog  display  of  altitude
or  an  additional  profile  view,  are  used  for  comparison.
This  adds  complexity  to  testing  these  displays;  often,  additional 
information  has  to  be  added  to  current  displays  in  order  to  iso¬ 
late  the  3D  presentation  from  the  additional  information  pro¬ 
vided  by  the  display. 

In  addition,  dual  panel  2D  displays,  although  they  may  con¬ 
tain  the  same  information  as  3D  displays,  have  an  additional 
cost — that  of  scanning  and  integrating.  Therefore,  it  would  seem 
that  3D  displays  adhere  to  several  tenets  of  good  display  design: 
proximity  compatibility,  integration,  and  pictorial  realism. 

If  the  airplane  is  not  navigating  vertically,  however,  the  3D 
display  may  impose  a  clutter  cost,  as  the  operator  must  sort 
through  the  irrelevant  altitude  information.  This  is  in  addition 
to  the  problem  of  ambiguity  of  position  of  objects  along  the  line 
of  sight  of  the  display  (McGreevy  &  Ellis,  1986).  Lastly,  if  the  3D 
perspective  is  immersed  (the  perspective  is  as  the  world  looks 
from  the  cockpit),  no  information  concerning  the  world  to  the
side  or  behind  the  aircraft  is  presented,  resulting  in  a  “keyhole” 
view  of  the  world  (Woods,  1984). 

For  judgments  requiring  mapping  between  the  world  and 
the  display,  a  3D  display  with  a  viewpoint  outside  the  cockpit 
seems  to  provide  the  best  performance  (Olmos,  Wickens,  & 
Chudy,  1997).  This  benefit  is  again  tied  to  the  principles  of  real¬ 
ism  and  integration,  and  is  generally  viewed  as  providing  better 
overall  situation  awareness  than  immersed  3D  views. 

Tunnels  in  the  sky.  Enhancements  to  the  immersed 
viewpoint  of  a  3D  display  can  present  the  operator  with  a  highly 
integrated  pathway,  or  “tunnel  in  the  sky.”  Superior  perfor¬ 
mance  in  navigation  and  tracking  has  been  demonstrated  using 
such  displays  (Wickens  &  Prevett,  1995),  although  they  are  currently
still  in  development.  The  biggest  obstacle  to  the  deployment
of  such  automation  seems  to  be  the  cost  of  certification,
rather  than  any  technical  or  cognitive  issues.  It  appears  that 
these  displays  only  suffer  from  effects  similar  to  those  found  on 
other  perspective  displays  (confounding  of  distance  along  the 
line  of  sight)  and  on  HUDs  (difficulty  in  detecting  unusual 
events  due  to  the  compelling  nature  of  the  display;  Wickens, 
Mavor,  Parasuraman,  &  McGee,  1998). 

The  scale  problem  in  air-traffic  control.  Increases 
in  computational  ability  have  made  3D  displays  even  more 
practical,  but  many  issues  surrounding  them  have  not  been  re¬ 
solved.  In  some  ways,  aviation  is  a  3D  task,  with  vertical  and 
horizontal  navigation;  however,  the  scales  of  these  navigation 
tasks  are  different,  one  expressed  in  feet,  the  other  in  nauti¬ 
cal  miles.  It  is  extremely  difficult  to  portray  these  two  axes  on 
a  single  display  without  distorting  one  or  the  other,  with  sig¬ 
nificant  repercussions  for  judgments  and  decisions  of  the  op¬ 
erator  using  the  display.  In  addition,  air-traffic  control  is  only 
marginally  a  3D  task,  with  aircraft  separated  for  much  of  their
flight  through  the  use  of  altitude  stratification.  This  separa¬ 
tion  method  places  aircraft  at  discrete  altitudes,  with  climbing 
and  descending  transitions  between  these  strata.  As  such,  sep¬ 
aration  in  the  vertical  axis  can  be  assured  for  much  of  the  air¬ 
craft  simply  by  comparing  two  numbers.  Additional  informa¬ 
tion  about  vertical  position  is  superfluous  (except  during 
climbs  and  descents).
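
The mismatch of scales, and the "comparing two numbers" check, can be made concrete with a small worked example. The separation values used here (5 nautical miles laterally, 1,000 feet vertically) are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Worked illustration of the scale mismatch; separation values are assumed.

FEET_PER_NM = 1852 / 0.3048  # one international nautical mile in feet


def scale_ratio(lateral_sep_nm, vertical_sep_ft):
    """Return how many times larger the lateral minimum is than the vertical."""
    return lateral_sep_nm * FEET_PER_NM / vertical_sep_ft


def vertically_separated(alt_a_ft, alt_b_ft, min_vertical_ft):
    """Level-flight vertical separation reduces to comparing two numbers."""
    return abs(alt_a_ft - alt_b_ft) >= min_vertical_ft


print(round(scale_ratio(5, 1000), 1))
print(vertically_separated(35000, 36000, 1000))
print(vertically_separated(35000, 35400, 1000))
```

With these assumed values the lateral minimum is roughly thirty times the vertical one, so a display drawn to a common scale would compress the vertical separation to a small fraction of the lateral one.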

Coordination  Support  Systems

One  type  of  information  system  which  is  likely  to  become 
prevalent  in  the  future  is  what  is  referred  to  here  as  a  coordi¬ 
nation  support  system  (for  a  further  discussion  of  computer 
supported  cooperative  work,  see  chapter  28).  It  is  becoming 
more  common  that  work,  including  in  the  air-traffic  system,  is 
distributed  across  organizations.  For  example,  alerts  generated 
by  an  aircraft’s  TCAS  system  have  to  be  coordinated  between 
the  two  aircraft  in  the  conflict  pair  to  avoid  having  the  aircraft’s 
resolutions  conflict.  In  addition,  coordinating  arrivals  into  air¬ 
ports  involves  affecting  aircraft  many  hundreds  of  miles  away 
from  the  airport,  across  air-traffic  control  organizational  and 
even  facility  boundaries. 

One  approach  to  enabling  such  cooperative  work  is  to  ac¬ 
tively  share  information  between  organizations  through  proce¬ 
dural  means.  Such  an  approach  means  that  one  organization, 
with  localized  information  of  interest,  distributes  that  informa¬ 
tion  to  other  parties.  In  the  air-traffic  system,  however,  the  great 
majority  of  the  information  needed  for  collaborative  work  is 
present  in  the  task  environment,  or  relates  to  the  impact  of 
other  organizations’  decisions  on  the  traffic  situation.  If  such 
ecological  information  could  be  provided,  then  collaborative 
work  would  happen  naturally. 

Such  an  approach  is  supported  by  the  concept  of  a  common 
information  space  (CIS)  (Schmidt  &  Bannon,  1992).  A  CIS  is  a 
source  of  information  shared  within  and  across  organization 
boundaries  in  order  to  conduct  coordinated  work  for  common 
purposes.  Instead  of  passing  information  from  one  organiza¬ 
tion  to  another,  information  is  made  available  through  the  CIS, 
which  can  then  be  accessed  and  manipulated  by  any  interested 
party.  Moreover,  objects  can  be  created  within  the  CIS  which 
serve  as  translators  across  different  communities.  These  ob¬ 
jects,  referred  to  as  “boundary  objects,”  are  entities  in  which 
decontextualized  information  resides  in  a  particular  form  (Starr 
&  Griesemer,  1989).  The  distinction  which  defines  boundary 
objects  is  their  purpose  in  acting  as  translators,  or,  as  defined  by 
Bannon  (2000),  they  are  a  “means  for  sharing  items  in  a  com¬ 
mon  information  space”  (p.  5).  Information,  devoid  of  context, 
is  presented  in  a  form  that  allows  users  from  different  commu¬ 
nities  of  interest  to  comprehend  it  within  their  own  contexts. 
This  object  is  accessed  by  users  in  different  communities  of  in¬ 
terest  in  order  to  conduct  their  particular  portion  of  the  work. 
The  information  in  boundary  objects  is  contextualized  only 
within  each  community  of  interest,  and  this  context  differs  in 
each  separate  community. 

Technologies  Needed  for  3X  Air  Traffic

The  rate  of  growth  of  air  traffic  has  been  consistent  for  a  number 
of  decades,  and  is  not  expected  to  change  significantly  in  the 
next  few  decades.  Such  a  rate  of  growth  would  double  or  triple 
air  traffic,  a  level  which  some  researchers  feel  is  unsupportable 
by  the  current  method  of  air-traffic  control. 

Such  a  conclusion  is  consistent  with  the  historical  develop¬ 
ment  of  the  air  traffic  system.  In  order  to  grow  beyond  its  capa¬ 
bilities  in  the  1940s,  radar  was  added,  reducing  the  workload  of 
the  controllers,  which  enabled  them  to  deal  with  more  traffic  and
higher  aircraft  speeds.  To  continue  to  grow,  computerized  flight
data  processing  was  added,  further  reducing  workload  and  al¬ 
lowing  controllers  to  handle  more  traffic.  Increases  in  staffing,  re¬ 
duction  in  sector  sizes,  strategic  control,  and  other  innovations 
have  allowed  further  increases  in  air  traffic  levels.  To  further  con¬ 
tinue  these  increases,  researchers  argue  that  a  more  fundamen¬ 
tal  shift  is  needed,  by  sharing  some  of  the  air  traffic  responsibili¬ 
ties  between  the  current  authority  (air  traffic  controllers)  and  the 
flight  deck,  by  allowing  automation  to  accomplish  some  current 
tasks  of  the  controllers,  or  some  combination  of  these. 

Datalink  and  the  loss  of  the  party  line.  One  high- 
workload  task  for  controllers  is  communication.  For  some  air 
traffic  control  tasks,  the  capacity  of  the  radio  frequency  is  a  lim¬ 
iting  factor  for  how  many  aircraft  can  be  handled  within  a  sector. 
This  is  also  a  limiting  factor  for  how  small  sectors  can  be,  since 
controllers  must  still  talk  to  each  aircraft  at  least  twice  (once 
upon  entering,  once  upon  leaving  the  sector),  and  small  sectors 
reduce  the  time  between  these  communications. 

One  solution  to  this  communication  workload  is  to  allow 
communication  directly  between  the  data  processing  system 
and  the  flight,  through  a  system  known  as  “datalink.”  This  tech¬ 
nology  has  been  suggested  for  decades  (Schmidt  &  Saint,  1969), 
but  its  use  has  been  stifled  by  several  issues. 

One  of  these  issues  is  that  pilots’  situation  awareness  is  af¬ 
fected  by  the  availability  of  information  on  the  communication 
channel.  Pilots  hear  the  communication  of  other  aircraft  with 
the  controller,  and  can  glean  information  that  is  relevant  to 
them,  such  as  expectations  of  future  clearances,  the  impact  of 
weather  on  aircraft  ahead  of  them,  and  positions  and  intentions 
of  other  aircraft.  If  this  “party  line”  information  is  passed  over  a 
data  channel,  pilots  would  no  longer  have  access  to  it,  reduc¬ 
ing  their  situation  awareness  (Midkiff  &  Hansman,  1993). 

4D  contracts:  Future  changes  in  the  roles  of  pilots 
and  controllers?  The  flight  management  systems  used  by
the  current  generation  of  commercial  aircraft  are  capable  of  meeting
arrival  times  over  points  in  their  flight  paths,  subject  to  the  constraints
of  the  flight  envelope.  The  logic  behind  these  flight
management  systems  is  also  available  to  ground-based  systems, 
and  is  in  fact  the  heart  of  the  Center-TRACON  Automation  System,
a  suite  of  air-traffic  management  tools  developed  by  NASA 
(Erzberger  &  Nodell,  1989). 

These  ground-based  systems  can  use  the  flight  manage¬ 
ment  algorithms  to  make  predictions  of  arrival  times  for  all  air¬ 
craft  in  the  National  Airspace  System  (NAS).  This  has  been  ac¬ 
complished  for  flights  arriving  at  a  number  of  the  busiest 
airports  in  the  United  States  in  the  Traffic  Management  Advisor 
system  (Swenson  et  al.,  1997).  This  capability  is  being  expanded
to  regional  or  national  airspace,  and  for  scheduling  aircraft
over  any  congested  point  in  the  NAS  (Landry,  Farley,  &
Hoang,  2005).

One  proposal  for  increasing  the  capacity  of  the  air  traffic  sys¬ 
tem  is  to  allow  such  a  system  to  negotiate  “4D  contracts,”  indi¬ 
cating  navigational  waypoints,  altitudes,  and  times  of  arrival  at 
those  points  for  all  aircraft.  The  flight  management  systems 
would  then  be  responsible  for  adhering  to  the  contract.  These 
contracts  would  ensure  conflict-free  routes,  and  coordinate 
arrivals  into  congested  resources,  including  airports. 
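
One way to picture a 4D contract is as a list of waypoints, each with an altitude and a required time of arrival, against which actual crossing times are checked. The sketch below is purely illustrative: the class and function names, the waypoint names, and the 30-second tolerance are hypothetical and do not describe any proposed contract format.

```python
from dataclasses import dataclass

# Hypothetical representation of a 4D contract; all names are assumptions.


@dataclass(frozen=True)
class ContractPoint:
    waypoint: str
    altitude_ft: float
    time_s: float  # contracted time of arrival, seconds from a common epoch


def contract_violations(contract, crossings_s, tolerance_s):
    """Return waypoints crossed more than tolerance_s off the contracted time.

    crossings_s maps waypoint names to actual crossing times; waypoints not
    yet crossed are skipped rather than counted as violations.
    """
    violations = []
    for point in contract:
        actual = crossings_s.get(point.waypoint)
        if actual is None:
            continue  # not yet reached
        if abs(actual - point.time_s) > tolerance_s:
            violations.append(point.waypoint)
    return violations


contract = [
    ContractPoint("ALPHA", 35000, 0),
    ContractPoint("BRAVO", 24000, 600),
    ContractPoint("CHARL", 10000, 1200),
]
crossings = {"ALPHA": 5, "BRAVO": 640, "CHARL": 1190}
print(contract_violations(contract, crossings, tolerance_s=30))
```

In this sketch the monitoring role falls to a simple comparison of times, which hints at how the human roles would shift toward supervising the automation.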

Such  a  system  would  radically  alter  the  roles  of  pilots  and
controllers  in  relation  to  their  automation.  Pilots  and  controllers 
would  be  monitors  of  a  highly  automated  system,  in  comparison 
to  their  current  roles  of  actively  controlling  the  flight  paths  of 
aircraft.  As  such,  this  presents  enormous  challenges  to  the  avi¬ 
ation  HCI  community. 

Automating  separation  assurance.  The  primary 
responsibility  of  an  air-traffic  controller  is  to  provide  separation 
assurance  between  aircraft.  Even  if  datalink  and  4D  contracts 
were  established,  the  controller  is  still  limited  by  the  cognitively 
intensive  process  of  monitoring  for  conflicts.  This  task  is  also 
under  consideration  for  automated  assistance,  either  on  the 
flight  deck  or  through  a  centralized  system. 

One  proposal  is  to  equip  flight  decks  with  a  Cockpit  Display  of 
Traffic  Information  (CDTI)  system,  and  utilize  an  advanced  form 
of  the  current  Traffic  Alert  and  Collision  Avoidance  System  (TCAS).  A
second  proposal  is  to  utilize  a  centralized  system  for  providing 
alerts  (Erzberger,  2004).  Such  a  system  would  automatically  co¬ 
ordinate  resolutions,  and  provide  for  an  independent  backup  in 
case  of  failures  to  resolve.  In  both  cases,  automation  would  detect 
conflicts,  identify  resolutions,  and  (perhaps)  automatically  per¬ 
form  the  resolutions.  Such  systems  overcome  the  latencies  in 
the  human  detection,  communication,  and  action  cycles;  how¬ 
ever,  as  the  human  operators  are  pushed  back  further  from  the 
actual  operation  of  the  systems,  their  ability  to  intervene  in  the 
case  of  automation  failures  wanes.  Because  of  this,  such  automa¬ 
tion  must  be  highly  reliable  for  the  system  to  succeed. 
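
The detection step that both proposals rely on can be sketched in a few lines. The Python example below is a simplified illustration, not the method of any fielded system: it assumes straight-line flight, samples the look-ahead horizon at fixed intervals, and uses 5 nautical miles laterally and 1,000 feet vertically as illustrative separation minima. All names and default values are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AircraftState:
    callsign: str
    x_nmi: float            # east position, nautical miles
    y_nmi: float            # north position, nautical miles
    altitude_ft: float
    vx_kt: float            # east velocity, knots
    vy_kt: float            # north velocity, knots
    climb_fpm: float = 0.0  # vertical rate, feet per minute


def first_conflict_time(a, b, horizon_min=20.0, step_min=0.5,
                        lateral_min_nmi=5.0, vertical_min_ft=1000.0):
    """Return the first sampled time (minutes) at which both separation
    minima are violated, or None if no conflict occurs within the horizon.

    Positions are extrapolated along straight lines (an assumption).
    """
    steps = int(horizon_min / step_min)
    for i in range(steps + 1):
        t_min = i * step_min
        t_hr = t_min / 60.0
        dx = (a.x_nmi + a.vx_kt * t_hr) - (b.x_nmi + b.vx_kt * t_hr)
        dy = (a.y_nmi + a.vy_kt * t_hr) - (b.y_nmi + b.vy_kt * t_hr)
        dalt = ((a.altitude_ft + a.climb_fpm * t_min)
                - (b.altitude_ft + b.climb_fpm * t_min))
        if math.hypot(dx, dy) < lateral_min_nmi and abs(dalt) < vertical_min_ft:
            return t_min
    return None


# Two aircraft flying head-on at the same altitude, 80 nmi apart.
eastbound = AircraftState("A1", 0.0, 0.0, 33000.0, 480.0, 0.0)
westbound = AircraftState("B2", 80.0, 0.0, 33000.0, -480.0, 0.0)
print(first_conflict_time(eastbound, westbound))  # 5.0
```

An automated system can run a check of this kind for every aircraft pair far faster than a human can scan a display, which is the latency advantage noted above. The same speed is also why operators may find it hard to second-guess the automation when it fails.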

UAVs  in  the  Airspace  System 

The  rapid  development  of  uninhabited  aerial  vehicles  (UAVs) 
will  also  greatly  impact  the  air-traffic  system  of  the  future.  To 
date  such  vehicles  have  been  mostly  constrained  to  special  uses, 
such  as  the  military.  It  seems  likely,  however,  that  these  vehi¬ 
cles  will  need  to  be  integrated  into  the  air-traffic  control  system 
as  their  uses  expand. 

Recently,  efforts  have  begun  to  identify  the  issues  related  to 
integrating  UAVs  into  the  airspace  system  (Wegerbauer,  2005).
Such  vehicles,  while  not  piloted  in  the  traditional  sense,  are
controlled  by  a  remote  pilot.  The  issues  related  to  their  use  of 
the  airspace  system  concern  collision  danger  and  safe  extraction 
of  the  vehicle  in  the  case  of  a  communication  (or  other  system) 
failure.  These  are  but  the  first  issues  to  be  dealt  with,  particularly 
as  one  considers  more  advanced  UAVs  with  increasing  amounts 
of  autonomy. 


CONCLUSIONS 


Fitts  (1951b)  and  his  colleagues  described  the  research  challenges
in  visual  displays  as  of  1951.  The  document  is  remarkable
in  that  it  spurred  (or  foresaw)  a  great  deal  of  research  that  con¬ 
tinues  to  this  day.  Some  of  that  can  be  attributed  to  the  general 
nature  of  the  questions,  but  many  of  the  specific  research  ques¬ 
tions  and  methods  have  been  examined  and  used  over  the  last 
forty  years.  What  was  noticeably  (and  understandably)  absent 
from  the  discussion  of  transitory  displays  was  any  reference  to 
higher  cognition.  In  some  ways,  this  is  indicative  of  the  progress 
made  in  aerospace  human-machine  system  research.  Many  of  the 
difficulties  associated  with  these  concepts  originated  as  higher 
levels  (and  greater  amounts)  of  automation  were  added  to  the 
flight  deck. 

The  progression  of  research  has  also  followed  this  model. 
Early  research  resulted  in  a  number  of  principles  for  integrating 
displays,  for  determining  the  form  of  displays  for  different  pur¬ 
poses,  and  for  the  positioning  of  displays.  Later  work  has  con¬ 
centrated  on  higher  level  cognitive  issues  such  as  mental  models 
and  situation  awareness.  In  addition,  work  has  been  done  on  un¬ 
derstanding  how  to  apply  automation,  resulting  in  a  number  of 
principles  regarding  automation  use  and  allocation  of  function. 
Most  recently,  new  types  of  displays  and  automation  such  as 
warning  systems  and  decision  support  systems  have  been  intro¬ 
duced,  yielding  a  new  set  of  problems  for  researchers  to  tackle.  As 
the  air  transportation  system  matures  and  the  demand  for  ever- 
greater  levels  of  capacity  increases,  it  seems  that  greater  reliance 
on  automated  systems  will  be  required.  For  such  a  system  to  be 
successful,  such  automation  must  be  based  on  sound  principles.


INTRODUCTION


The  intent  of  this  chapter  is  to  review  principles  and  challenges 
in  game  design  improvement  and  evaluation  and  to  discuss  user- 
centered  techniques  that  address  those  challenges.  First,  we  will 
present  why  games  are  important,  followed  by  the  definitions 
and  differences  between  games  and  productivity  software.  In  the 
next  section,  we  discuss  the  principles  and  challenges  that  are 
unique  to  the  design  of  games.  That  discussion  provides  a  framework
for  what  we  believe  are  the  core  variables  that  should
be  measured  to  aid  in  game  design  testing  and  evaluation.  The 
chapter  will  conclude  with  some  examples  of  how  those  vari¬ 
ables  can  be  operationalized  through  methods  used  by  the 
Games  User  Research  group  at  Microsoft  Game  Studios. 


WHY  GAMES  ARE  IMPORTANT 


Computers  that  appeared  commercially  in  the  1950s  created  a 
technological  barrier  that  was  not  easy  to  overcome  for  the 
greater  population.  Only  scientists,  engineers,  and  highly  tech¬ 
nical  persons  were  able  to  use  these  machines  (Preece,  Rogers, 
Sharp,  Benyon,  Holland,  &  Carey,  1994).  As  computers  became 
less  expensive,  more  advanced,  and  more  reliable,  the  technology
that  was  once  available  only  to  a  small  group  of  people
began  to  spread  throughout  the  entire  population  and
into  everyday  life.  In  order  to  ease  this  transition,  the  need  for 
a  well-designed  interface  between  the  user  and  the  technology 
became  a  necessity.  Computer  games  come  from  similar  origins, 
and  potentially  may  head  down  a  similar  path.  Some  early  at¬ 
tempts  at  making  commercial  video  games  failed  due  to  unnec¬ 
essarily  high  levels  of  complexity.  Nolan  Bushnell,  cofounder  of 
Atari,  stated  the  issue  quite  succinctly:  “No  one  wants  to  read  an 
encyclopedia  to  play  a  game”  (Kent,  2000,  p.28).  In  retrospect, 
some  of  the  most  successful  video  games  were  the  ones  that 
were  indeed  very  simple. 

Entertainment  in  general  is  a  field  of  great  importance  to 
everyone.  Today,  the  video  games  industry  is  one  of  the  fastest 
growing  forms  of  entertainment  to  date.  According  to  the  En¬ 
tertainment  Software  Association  (ESA),  revenue  from  com¬ 
puter  and  console  games  has  grown  from  3.2  billion  dollars  in 
1994  (Interactive  Digital  Software  Association,  2000)  to  7.3  bil¬ 
lion  dollars  in  2004  (Entertainment  Software  Association,  2005). 
Approximately  248  million  games  were  sold  in  2004  (Entertain¬ 
ment  Software  Association,  2005).  To  put  this  into  perspective, 
according  to  the  ESA,  47%  of  Americans  either  purchased  or
planned  to  purchase  one  or  more  games  in  2005  (Entertain¬ 
ment  Software  Association,  2005).  These  statistics  do  not  even 
consider  the  international  importance  of  video  games. 

Video  games  are  quite  important  to  the  software  develop¬ 
ment  industry  due  to  their  roles  in  pushing  research  and  tech¬ 
nology  forward.  Games  have  acted  as  catalysts  for  hardware  in¬ 
novations  (e.g.,  graphics  cards,  processors,  and  sound  cards; 
e.g.,  Kroll,  2000),  research  on  the  relationships  between  artificial 
intelligence  and  synthetic  characters  or  computer  graphics  (e.g., 
Funge,  2000;  Laird,  2001),  and  physics  (e.g.,  Hecker,  2000). 
These  are  all  areas  that  previously  were  confined  to  engineering 
and  computer  science  communities. 


Players  Are  as  Diverse  as  the  Games  That  They  Play 

With  annual  sales  reaching  8  billion  dollars  across  a  broad  range 
of  game  genres  and  platforms,  it  is  no  surprise  that  the  players 
are  also  quite  diverse.  Video  games  are  no  longer  associated 
with  male  adolescents  and  young  adults.  According  to  the  ESA 
(2005),  75%  of  American  heads  of  household,  over  half  of  all
Americans  over  the  age  of  9,  and  19%  of  Americans
over  the  age  of  50  have  played  an  electronic  game.  The
average  age  of  those  who  play  video  games  is  30,  and  43%  of
Americans  who  play  games  are  female  (ESA,  2005).  In  addition,
the  extremely  popular  casual  online  games  market  is  dominated 
by  older  females. 

Assumptions  that  games  are  limited  to  a  narrow  market  may 
stem  from  claims  that  the  electronics  industry  has  traditionally 
been  marketed  to  boys  (e.g.,  Chaika,  1996).  Controversial  be¬ 
liefs  about  males  possessing  an  innate  proclivity  towards  video 
gaming  or  an  innate  skill  advantage  over  females  are  commonly 
sported  in  casual  conversation  and  sometimes  offered  by  schol¬ 
arly  publications  (e.g.,  Williams,  1987).  While  Nintendo’s  posi¬ 
tioning  has  been  demonstrated  by  previous  marketing  cam¬ 
paigns  and  promotions  on  such  things  as  cereal  boxes  (Herz, 
1997;  Provenzo,  1991),  genetic  claims  tend  to  be  driven  by  spec¬ 
ulation  and  stereotype. 


DEFINING  GAMES 


As  stated,  the  gaming  industry  is  becoming  as  widespread  and 
popular  as  computers  and  televisions.  As  the  industry  matures, 
games  become  more  complex,  making  games  difficult  to  define 
narrowly.  Articulating  a  clear  and  succinct  set  of  principles  that 
capture  the  essence  of  games  is  not  straightforward.  However, 
shedding  some  light  on  the  difference  between  games  and  productivity
applications  may  help.  Many  of  the  same  principles  and  methods  can  be
applied  successfully  to  both  games  and  productivity  applications.  Regardless
of  the  domain,  the  same  techniques  would  be  applied  to  under¬ 
stand  a  misleading  button  label  or  a  confusing  process  model. 

At  the  same  time,  there  are  fundamental  differences  between 
productivity  applications  and  gaming  applications.  Some  of  the 
differences  can  be  clear.  Games  are  usually  more  colorful, 
wacky,  and  escapist  than  productivity  applications,  with  the  in¬ 
clusion  of  interesting  story  lines,  or  animated  characters. 

User-centered  design  principles  have  not  reached  game  mak¬ 
ers  to  the  degree  that  they  have  influenced  other  electronic  ap¬ 
plications.  The  following  sections  will  describe  some  differences 
that  have  important  implications  for  user  research  on  games. 

The  Goals  Are  Different 

At  their  roots,  productivity  applications  are  tools.  Accordingly,  the
design  intentions  behind  productivity  applications  are  to  make
task  performance  easier,  quicker,  less  error  prone,  more  quality
driven,  and  accessible  to  populations  of  interest.  In  this
sense,  a  word  processor  or  spreadsheet  differs  from  a  powered
woodworking  tool  only  in  terms  of  its  complexity  and  domain
of  application.  One  can  argue  that  better  word  processors  allow
a  wider  set  of  people  to  make  documents  better,  faster,  and
easier  and  with  fewer  errors.  Similarly,  a  power  mitre  box  cuts 
wood  with  angles  that  are  more  precise,  faster,  and  more  reli¬ 
able  than  a  hand  mitre  box  or  a  hand  saw.  The  focus  of  design 
and  usability  is  to  produce  an  improved  product  or  result. 

Games  at  their  roots  are  different.  Games  are  intended  to 
be  a  pleasure  to  play.  Ultimately,  games  are  like  movies,  litera¬ 
ture,  and  other  forms  of  entertainment.  They  exist  in  order  to 
stimulate  thinking  and  feeling.  Their  outcomes  are  more  expe¬ 
riential  than  tangible.  This  is  not  to  say  that  word  processors  or 
other  tools  cannot  be  a  pleasure  to  use  or  that  people  do  not 
think  or  feel  when  using  them,  but  that  is  rarely  their  primary 
design  intention.  In  a  good  game,  both  the  outcomes  and  the 
journeys  are  rewarding.  This  fundamental  difference  leads  us 
to  devote  more  of  our  effort  to  measuring  perceptions,  while 
productivity  applications  focus  more  on  task  completion. 

The  Rules  Are  in  the  Game  World 

Games  create  an  artificial  world  with  artificial  objectives.  Cap¬ 
turing  the  king  has  a  chess-specific  meaning  that  is  different 
from  the  precise  meaning  outside  of  the  chess  world.  In  some 
sense,  this  simplifies  the  problem  of  the  game  designer  and  of 
the  usability  practitioner.  User  goals  are  almost  completely  de¬ 
fined  by  the  game  designer.  The  resulting  concern  is  to  ensure 
that  user  interpretations  match  the  game  design’s  expectations 
(or  differ  in  a  creative  and  entertaining  manner).

Industry  Competition  Is  Intense 

Competition  within  the  games  industry  is  more  intense  than 
in  other  software  domains.  Games  compete  with  each  other  and
many  other  forms  of  entertainment  for  your  attention.  Pretend 
that  you  have  to  write  a  book  chapter.  For  that,  there  are  a  lim¬ 
ited  number  of  viable  tools:  handwriting,  dictation,  a  typewriter, 
or  a  limited  set  of  software  applications.  However,  imagine  that 
you  have  a  few  hours  to  kill  and  that  you  want  to  avoid  produc¬ 
tivity  at  all  costs.  You  could  consult  the  New  York  Times  cross¬ 
word  puzzle,  argue  politics  with  your  best  friend,  watch  a  mys¬ 
tery  film,  or  stock  up  with  health  potions  and  attack  demons  in 
Diablo  (2000). 

Games  Must  Challenge 

Perhaps  the  most  important  distinction  between  games  and 
productivity  applications  is  the  complex  relationship  between 
challenge,  consistency,  and  frustration.  To  illustrate  this  differ¬ 
ence,  consider  the  following  example.  It  has  been  said  that  the 
easiest  game  to  use  would  consist  of  one  button  labeled  “Push.” 
When  you  push  it,  the  display  would  say,  “YOU  WIN.”  This  game 
would  have  few,  if  any,  usability  issues.  However,  it  would  not
be  fun  either.  Games  deliberately  impose  constraints,  whereas 
productivity  applications  attempt  to  remove  them.  Games  de¬ 
liberately  evolve,  whereas  productivity  applications  strive  for 
consistency.  At  their  hearts,  the  fun  in  most  games  comes  from
learning  how  to  overcome  ever-increasing  challenges.  Ironically,
some  educators,  practitioners,  and  researchers  think  it  comes 
from  fantasy  and  bright  colors. 

Many  of  the  characteristics  mentioned  above  are  not  unique 
to  games.  All  of  these  characteristics  are  relevant  to  the  discus¬ 
sion  of  the  ways  in  which  user  research  must  adapt  to  be  useful 
for  game  development.  Later  we  will  discuss  how  these  differ¬ 
ences  create  particular  challenges  for  both  the  design  and  eval¬ 
uation  of  games. 


TYPES  OF  GAMES 


In  the  following  sections,  we  review  some  common  gaming  plat¬ 
forms  and  game  types. 

Platforms 

One  of  the  simplest  classifications  of  games  is  by  the  platform  or 
hardware  on  which  they  are  played.  Cassell  and  Jenkins  (2000) 
differentiated  games  played  on  a  PC  from  those  played
on  a  console.  Crawford  (1982)  divided  games  into  even  finer
categories:  arcade/coin  operated,  console,  PC,  mainframe,  and
handheld.  Different  gaming  platforms  can  be  differentiated  by 
their  technical  capabilities,  physical  control  interface,  visual 
interface,  and  the  context  in  which  they  are  played. 

PC.  A  useful  distinction  can  be  made  between  PC  games 
that  are  normally  acquired  through  retail  outlets  on  CDs,  per¬ 
sistent  world  games  in  which  much  of  the  content  lives  on  the 
Internet,  and  casual,  web-based  games.  While  there  are  many 
technical  differences  between  these  kinds  of  games,  the  impor¬ 
tant  distinction  is  the  business  model  and  level  of  investment  re¬ 
quired  by  the  players.  Most  retail  games  need  to  rely  heavily  on 
flash,  reputation,  and  recognizable  novelty  to  attract  users  to  a 
relatively  large  investment.  Continued  investment  in  the  game  is 
useful  only  to  build  a  reputation  or  to  convince  customers  with 
higher  thresholds  that  they  will  get  their  money’s  worth  from
the  game.  Massively  multiplayer  online  persistent  worlds  (MMOs) 
require  continued  user  investment  to  obtain  monthly  subscrip¬ 
tion  fees.  The  long  learning  curves  and  reinforcement  schedules 
in  MMOs  are  tailored  to  make  players  invest  more  time,  money 
and  effort  for  long-term  rewards.  Many  casual  gamers  are  at¬ 
tracted  to  free,  familiar  games.  For  these  casual  users,  game- 
play  is  squeezed  between  activities  that  are  more  important  to 
them.  As  a  result,  games  with  very  little  learning  investment  are 
very  appealing  to  casual  gamers.  Removing  penalties  for  setting 
aside  the  game  (for  minutes,  hours,  days,  or  forever)  can  mean 
the  difference  between  successful  and  unsuccessful  casual 
games  (see  the  casual  games  section  for  more  details). 

Console.  For  user  research  purposes,  the  most  important 
unique  characteristic  of  console  video  game  systems  is  the  ad¬ 
vantage  of  a  fixed  set  of  hardware.  By  contrast  with  PC  games, 
there  is  very  little  game  setup,  minimal  maintenance  efforts 
(e.g.,  few  software  patches,  video  card  conflicts,  etc.),  and  a  consistent
set  of  input  device  capabilities.  Some  game  genres  are
more  popular  on  console  than  PC.  This  difference  in  popularity
can  usually  be  attributed  to  differences  in  the  input  devices
that  are  typically  associated  with  each  platform.  It  is  hard  to  en¬ 
ter  text  using  a  gamepad.  Most  popular  MMOs  are  still  pub¬ 
lished  for  PCs  in  spite  of  the  fact  that  fast  Internet  connections 
and  hard  drives  are  becoming  more  popular  additions  to  con¬ 
sole  systems.  This  may  change  as  it  becomes  easier  to  attach 
other  devices  such  as  keyboards  or  voice-command  hardware 
that  would  make  communication  more  robust. 

More  recently,  casual  games  are  also  available  for  all  console 
systems,  including  Xbox  360,  where  consumers  can  download, 
try,  and  purchase  casual  games  directly  from  the  console  via  a 
broadband  connection. 

Portable  devices.  Portable  devices  include  portable  gam¬ 
ing  systems  (handhelds),  mobile  phones,  and  personal  digital 
assistants  (PDAs).  Handheld  devices  are  a  standardized  plat¬ 
form,  with  screens  and  controls  designed  primarily  for  gaming. 
Mobile  phones,  on  the  other  hand,  pose  greater  challenges
to  creating  a  meaningful  gameplay  experience,  including  limited
screen  resolution  and  small  controls  (Hyman,  2005).  Mobile
games  generally  must  be  playable  in
shorter  blocks  of  time  and  must  work  well  within  the  limitations
of  the  handsets.  The  unlimited  combinations  of  handset
controls,  processor  and  network  capabilities,  and  runtime  environments
create  even  more  challenges  to  developing  quality
games  (IDGA,  2005;  Hyman,  2005).

Consumers  primarily  view  mobile  phones  as  communication 
devices,  which  are  not  meant  to  replace  the  gaming  experience 
available  on  handhelds  such  as  the  Nintendo  DS  and  Sony  PSP 
(Moore  &  Rutter,  2004).  Unlike  handheld  gaming  devices,  mo¬ 
bile  phones  are  generally  carried  at  all  times.  This  creates  an 
opportunity  to  provide  gaming  at  any  time,  anywhere  (Hyman,
2005;  IDGA,  2005).  However,  with  mobility  comes  the  greater 
need  to  design  for  time,  location,  social,  emotional,  and  moti¬ 
vational  context  (Sidel  &  Mayhew,  2003). 

Cross  platform.  Cross-platform  convergence  is  an 
emerging  trend  in  the  games  industry  that  complicates  the  clas¬ 
sification  of  games  by  platform  and  that  creates  new  challenges 
in  user-centered  game  design.  Developers  are  now  creating 
multiple  versions  of  a  game  that  interact  with  one  another 
across  platforms.  For  example,  Nintendo  games  such  as  Animal
Crossing  (2002),  The  Legend  of  Zelda:  The  Wind  Waker
(2003),  and  Harvest  Moon  (2003)  allow  users  to  connect  a 
Game  Boy  portable  device  to  the  GameCube  console  to  access 
additional  features  and  content  or  to  use  the  Nintendo  DS  as  a 
touch  screen  controller.  Another  example  of  cross-platform 
games  would  be  a  title  where  the  player  may  execute  specific 
planning  or  strategic  tasks  on  a  mobile  device  at  any  time,  then 
have  those  actions  affect  the  full  game  experience  the  next  time 
it  is  played  on  a  console  or  PC.  This  cross-platform  design  ef¬ 
fect  has  been  referred  to  as  a  reciprocation  effect  (Yuen,  2005). 

The  next  generation  of  cross-platform  experiences  may  involve
matching  players  for  real-time  or  asynchronous  competitive
and  cooperative  multiplayer  experiences  regardless  of  what
platform  they  are  on  (portable,  PC,  console,  etc.).  As  these
trends  become  realized,  the  promise  of  interesting  challenges
and  opportunities  for  game  developers  continues  to  increase.


Game  Types 

Retail  games  (NPD  classifications).  The  NPD  Group 
(a  marketing  company)  uses  a  fine-grained  classification  scheme 
for  game  type  that  is  referred  to  quite  often  in  the  games  in¬ 
dustry.  They  offer  the  following  classes  and  subclasses  as  seen  in 
Table  37.1.  Games  can  also  be  categorized  in  a  more  granular 
fashion  by  splitting  each  genre  up  into  a  subgenre.  For  instance, 
action  games  can  be  divided  into  Action  Driving  Hybrid  (True 
Crime,  GTA3,  etc.),  Action  Combat  (Max  Payne,  Tomb  Raider,
etc.),  and  Platformer  (Jak  and  Daxter,  Super  Mario  Sunshine,
etc.).  In  general,  the  genres  represent  types  of  gameplay  mechanics,
themes,  or  content.

TABLE  37.1  The  NPD  Group  Super  Genre  Classification  Scheme  for  Games

Category                  Description

Action                    Control a character and achieve one or more
                          objectives with that character.

Fighting                  Defeat an opponent in virtual physical hand-to-hand
                          or short-range combat.

Racing                    Complete a course before others do and/or accumulate
                          more points than others while completing a course.

Shooter                   Goal is to defeat enemies in combat with ranged
                          weapons (first-person shooters, third-person
                          shooters).

Strategy                  Strategically manage resources to develop and
                          control a complex system in order to defeat an
                          opponent or achieve a goal. The goal is to defeat
                          opponent(s) using a large and sophisticated array
                          of elements.

Role-Playing              Control a character that is assuming a particular
                          role in order to achieve a goal or mission. Rich
                          story and character development are common.

Family entertainment      The primary objective is to interact with others
                          and/or to solve problems. This genre includes
                          puzzles and parlor games.

Children's entertainment  Same as family entertainment, but geared to a
                          younger audience.

Sports                    Manage a team or control players, to either win a
                          game or develop a team to championship status.
                          Involves individual, team, and extreme sports.

Adventure                 Control a character to complete a series of puzzles
                          that are required to achieve a final goal.

Arcade                    Games on coin-operated arcade machines, or games
                          that have similar qualities to classic arcade
                          games. Generally, they are fast-paced action games.

Flight                    Plan flights and pilot an aircraft in a realistic,
                          simulated environment.

All other games           Educational, compilations, non-flight simulators,
                          rhythm games, etc.


Casual  games.  Casual  games  are  a  rapidly  growing  segment
of  the  video  games  market,  producing  an  estimated  $500
million  in  online  sales  in  2005  (IDG  Entertainment,  2005).  Casual
games  as  a  whole  are  appealing  to  a  broad  range  of  users,
can  be  found  on  all  platforms,  and  represent  several  genres.
For  these  reasons,  they  do  not  fit  into  either  of  the  previous
categorization  schemes.  Casual  games  generally  meet  the  following
requirements:  (a)  are  easy  to  start  and  control;  (b)  can
be  enjoyed  in  small  time  intervals;  (c)  do  not  require  an  extensive
investment  in  time  to  enjoy  or  progress,  and  therefore
usually  do  not  have  deep  linear  story  lines;  and  (d)  have  discrete
goals  and  rules  that  are  intuitive  or  easy  to  learn.  Casual
games  often  have  significantly  lower  development  costs,  can
be  distributed  digitally  (or  via  physical  media),  offer  trial  versions
and  lower  retail  prices  (or  are  free),  and  have  lower  system
and  storage  requirements.  These  requirements  and
attributes  make  these  games  well  suited  for  platforms  such  as
PCs  (with  distribution  via  the  web)  and  mobile  devices,  where  they  are
most  common.

Websites  such  as  Real  Arcade,  Pogo.com,  Yahoo!  Games,  and 
MSN  Games  provide  free  online  casual  games,  pay-per-play  casual
games,  subscription  services,  downloadable  games  that  include
free  trials,  and  games  that  can  be  purchased  for  $5  to  $20.
Mobile  phone  carriers  and  a  variety  of  websites  provide  casual 
games  for  mobile  devices.  Although  there  are  feature  rich,  3-D 
noncasual  games  available  for  mobile  devices,  the  top  selling 
titles  are  casual  games  such  as  Tetris,  Bejeweled,  Bowling,  and
Pacman  (Hyman,  2005). 


PRINCIPLES  AND  CHALLENGES 
OF  GAME  DESIGN 


Having  differentiated  games  from  other  applications,  we  can 
look  at  some  of  the  unique  issues  in  game  design  and  evaluation. 


Identifying  the  Right  Kind  of  Challenges 

Games  are  supposed  to  be  challenging.  This  requires  a  clear 
understanding  of  the  difference  between  good  challenges  and 
frustrating  usability  problems.  Most  productivity  tools  struggle 
to  find  a  balance  between  providing  a  powerful  enough  tool  set 
for  the  expert  and  a  gradual  enough  learning  curve  for  the 
novice.  However,  no  designer  consciously  chooses  to  make  a 
productivity  tool  more  difficult.  The  ideal  tool  enables  users  to 
experience  challenge  only  in  terms  of  expressing  their  own  cre¬ 
ativity.  For  games,  learning  the  goals,  strategies,  and  tactics  to 
succeed  is  part  of  the  fun. 

Unfortunately,  it  is  not  always  clear  which  tasks  should  be 
intuitive  (i.e.,  easy  to  use)  and  which  ones  should  be  challenging.
Input  from  users  becomes  necessary  to  distinguish  good
challenges  from  incomprehensible  design.  Take  a  driving  game 
for  example.  It  is  not  fun  having  difficulty  making  your  car  move 
forward  or  turn.  However,  learning  to  drive  is  still  a  fundamen¬ 
tal  part  of  the  challenge  in  the  game.  While  all  cars  should  use 
the  same  basic  mechanisms,  varying  the  ways  that  certain  cars 
respond  under  certain  circumstances  may  be  fun.  It  should  be 
challenging  to  identify  the  best  car  to  use  on  an  icy,  oval  track 
as  opposed  to  a  rally-racing  track  in  the  middle  of  the  desert. 
The  challenge  level  in  a  game  must  gradually  increase  in  order 
to  maintain  the  interest  of  the  player. 


Addressing  Different  Skill  Levels 

Unfortunately,  not  all  players  start  from  the  same  place  in  terms 
of  gaming  experience  or  talent.  Obviously,  frequent  failure  can 
be  a  turn-off.  Success  that  comes  too  easily  can  also  become
repetitive.  Games  must  address  the  problem  of  meeting  all  play¬ 
ers  with  the  correct  level  of  challenge.  Tuning  a  game  to  the 
right  challenge  level  is  called  “game  balancing.” 

There  are  many  ways  to  balance  the  difficulty  of  the  game. 
The  most  obvious  way  is  to  let  players  choose  the  difficulty 
themselves.  Many  games  offer  the  choice  of  an  easy,  medium,  or 
hard  difficulty  level.  While  this  seems  like  a  simple  solution,  it 
is  not  simple  to  identify  exactly  how  easy  the  easiest  level  should 
be.  Players  want  to  win,  but  they  do  not  want  to  be  patronized. 
Too  easy  is  boring  and  too  hard  is  unfair.  Either  experience  can 
make  a  person  cease  playing. 

Another  approach  to  varying  skill  levels  is  to  require  explicit 
instruction  that  helps  all  users  become  skilled  in  the  game.  You 
might  imagine  a  tutorial  in  which  a  professional  golfer  starts  by 
explaining  how  to  hit  the  ball  and  ends  by  giving  instruction  on 
how  to  shoot  out  of  a  sand  trap  onto  a  difficult  putting  green.  In¬ 
struction,  however,  need  not  be  presented  in  a  tutorial.  It  could 
be  as  direct  as  automatically  selecting  the  appropriate  golf  club 
to  use  in  a  particular  situation  with  no  input  from  the  user,  sim¬ 
ilar  to  the  notion  of  an  adaptive  interface,  where  the  interface 
provides  the  right  information  at  the  right  time. 

The  environments,  characters,  and  objects  in  a  game  provide 
another  possibility  for  self-regulation.  Most  games  will  offer  play¬ 
ers  some  choices  regarding  their  identities,  their  opponents,  and 
their  environments.  The  better  games  will  provide  a  variety  of 
choices  that  allow  users  to  regulate  the  difficulty  of  their  first  ex¬ 
periences.  With  learning  in  mind,  it  is  not  uncommon  for  novice 
players  to  choose  champion  football  teams  to  play  against  weak 
opponents.  As  long  as  players  can  distinguish  the  good  teams 
from  the  bad  ones,  and  the  teams  are  balanced  appropriately, 
users  will  be  able  to  manage  their  own  challenge  levels. 

Some  games  take  it  even  further  by  identifying  the  skill  level 
of  the  player  and  regulating  the  instruction  level  appropriately.  In
this  situation,  instruction  can  be  tuned  to  the  skill  level  of  the
player  by  associating  it  with  key  behavioral  indicators  that  signify
that  the  player  is  having  difficulty.  If  the  game  does  not  detect  a
problem,  it  does  not  have  to  waste  the  player’s  time  with  those
instructions.  In  Halo  2  (2004),  the  game  detects  difficulties  that  a
player  may  have  with  certain  tasks.  For  example,  to  get  into  a  vehicle,
the  player  must  press  and  hold  the  X  button  on  his  or  her
controller  when  standing  next  to  the  driver’s  seat  of  that  vehicle.  If
the  player  is  standing  in  the  right  position,  but  taps  the  X  button
repeatedly  (instead  of  holding  the  button  down),  the  game  will
present  a  more  explicit  instruction  to  press  and  hold  the  button. 
This  is  just  one  of  many  dynamic  instructions  that  appear  through¬ 
out  Halo  2  based  on  behavioral  indicators  of  player  difficulty. 
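
The tap-versus-hold case can be sketched in a few lines. The following is only an illustration of how a behavioral indicator might trigger an instruction, not a description of how Halo 2 is implemented. The class name, the thresholds, and the hint text are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# All three thresholds are assumed values chosen for illustration.
HOLD_SECONDS = 0.5   # press length that counts as a "hold"
TAP_LIMIT = 3        # short taps needed before a hint is shown
TAP_WINDOW = 2.0     # seconds within which the taps must occur


@dataclass
class HoldHintDetector:
    """Tracks short presses made while the player stands by a vehicle."""
    recent_taps: List[float] = field(default_factory=list)

    def on_button_release(self, pressed_at: float, released_at: float,
                          near_vehicle: bool) -> Optional[str]:
        """Return a hint when the tap pattern suggests confusion."""
        if not near_vehicle:
            # Taps made elsewhere say nothing about this task.
            self.recent_taps.clear()
            return None
        if released_at - pressed_at >= HOLD_SECONDS:
            # The player held the button long enough; no hint is needed.
            self.recent_taps.clear()
            return None
        # Keep only the taps that fall inside the counting window.
        self.recent_taps = [t for t in self.recent_taps
                            if released_at - t <= TAP_WINDOW]
        self.recent_taps.append(released_at)
        if len(self.recent_taps) >= TAP_LIMIT:
            self.recent_taps.clear()
            return "Press and HOLD X to enter the vehicle"
        return None
```

Because the hint appears only after the indicator fires, a player who already holds the button never sees it.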

Productivity  tools  have  implemented  similar  problem-iden¬ 
tification  features,  but  often  with  mixed  success  due  to  the  open 
nature  of  tasks  in  most  productivity  applications.  Good  game  tu¬ 
torials  have  succeeded  by  setting  clear  goals  and  completely 
constraining  the  environment.  Doing  so  focuses  the  user  on  the 
specific  skill  and  simplifies  the  detection  of  problem  behavior. 
Other  lessons  from  game  tutorial  design  will  be  described  in 
later  sections  of  this  chapter. 




Another  in-game  approach  to  auto-regulating  the  difficulty 
level  requires  adjusting  the  actual  challenge  level  of  the  oppo¬ 
nents  during  the  game.  Evaluating  the  success  of  the  player  and 
adjusting  the  opponent  difficulty  during  the  game  is  often  called 
“dynamic  difficulty  adjustment,”  or  “rubber  banding.”  When 
players perform very skillfully, their performances are moderated
by computer-generated bad luck and enhanced opponent
attributes. In a football game, the likelihood of fumbling, throwing
an interception, or being sacked may increase as the player
increases his or her lead over the opponent. Even though this
may  seem  like  a  good  solution,  there  can  be  a  downside.  Most 
people  would  prefer  to  play  a  competitive  game  (and  win)  than 
to  constantly  trounce  a  less-skilled  opponent.  However,  overde¬ 
veloped  rubber  banding  can  cheat  a  skilled  player  out  of  the 
crucial  feeling  of  mastery  over  the  game. 
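
One way to keep rubber banding from becoming overdeveloped is to cap how far the adjustment can go. The sketch below is a hypothetical illustration using the football example. The function name and every numeric value are assumptions, not figures from any shipped game.

```python
def fumble_probability(lead: int, base: float = 0.02,
                       per_point: float = 0.002, cap: float = 0.06) -> float:
    """Return the leading player's per-play fumble chance.

    lead      -- the player's lead in points (zero or negative when trailing)
    base      -- assumed fumble chance when the game is tied
    per_point -- assumed increase for each point of lead
    cap       -- assumed upper bound, so that skilled play still pays off
    """
    if lead <= 0:
        # Trailing or tied players receive no penalty in this sketch.
        return base
    return min(cap, base + per_point * lead)
```

With these assumed values the penalty grows with the lead but stops at the cap, which limits how much a skilled player can be held back.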

A  final  approach  focuses  on  providing  tools  that  maximize 
the  ability  of  the  trailing  player  to  catch  up  with  the  leading 
player.  The  key  for  the  game  designer  is  to  think  of  ways  to 
maintain  challenge,  reward,  and  progress  for  the  unskilled 
player  without  severely  hampering  the  skilled  player.  One  in¬ 
teresting  and  explicit  example  of  this  is  found  in  Diddy  Kong 
Racing  (1997).  In  this  game,  the  racer  can  collect  bananas  along 
the  roadway.  Each  banana  increases  the  top  speed  of  your  car. 
The  player  can  also  collect  missiles  to  fire  forward  at  the  lead¬ 
ing  cars.  Each  time  you  hit  a  car  with  a  missile,  it  not  only  slows 
the  car’s  progress,  but  it  jars  loose  several  bananas  that  the  trail¬ 
ing  player  can  pick  up.  Thus,  trailing  players  have  tools  that  they 
can  use  to  catch  the  leaders  even  if  the  leaders  are  not  making 
any  driving  mistakes.  The  chief  distinction  between  this  and  dy¬ 
namic  difficulty  adjustment  is  that  the  game  is  not  modifying 
skills  based  on  success.  Instead,  the  rules  of  the  game  provide 
the  trailing  player  with  known  advantages  over  the  leader. 

Players  Must  Be  Rewarded  Appropriately 

Explicit  or  slow  reinforcement  schedules  may  cause  users  to 
lose  motivation  and  quit  playing  a  game.  Because  playing  a 
game  is  voluntary,  games  need  to  quickly  grab  the  user’s  atten¬ 
tion  and  keep  him  or  her  motivated  to  come  back  repeatedly. 
One  way  to  accomplish  this  is  to  reward  players  for  continued 
play.  Theories  of  positive  reinforcement  suggest  behaviors  that 
lead  to  positive  consequences  tend  to  be  repeated.  Thus,  it 
makes  sense  that  positive  reinforcement  can  be  closely  tied  to 
one’s  motivation  to  continue  playing  a  game.  However,  it  is  less 
clear  which  types  of  reinforcement  schedules  are  most  effective. 

Although  the  model  should  not  necessarily  be  used  for  all 
games,  research  suggests  that  continuous  reinforcement  sched¬ 
ules  can  establish  desired  behaviors  in  the  quickest  amount  of 
time (Steers & Porter, 1991). Unfortunately, once continuous
reinforcement is removed, desired behaviors extinguish very
quickly. Behaviors established under partial reinforcement
schedules take longer to extinguish, but such schedules may take
too long to capture
the interest of gamers. Research suggests that variable ratio
schedules  are  the  most  effective  in  sustaining  desired  behav¬ 
iors  (Jablonsky  &  DeVries,  1972).  This  kind  of  schedule  is  a  sta¬ 
ple  of  casino  gambling  games,  in  which  a  reward  is  presented  af¬ 
ter  a  variable  number  of  desired  responses.  Overall,  there  is  no 
clear answer. Creating a game that establishes immediate and
continued motivation to keep playing over long periods is
a very complex issue.
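
To make the difference between schedules concrete, the following sketch models a variable ratio schedule, in which a reward follows an unpredictable number of responses. The class name and the choice of a uniform distribution are assumptions made for illustration. Setting the mean ratio to 1 reduces the schedule to continuous reinforcement.

```python
import random


class VariableRatioSchedule:
    """Rewards a response after a randomly varying number of responses."""

    def __init__(self, mean_ratio: int, rng: random.Random):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng
        self._remaining = self._draw()

    def _draw(self) -> int:
        # Uniform on 1 .. 2 * mean_ratio - 1, so the average is mean_ratio.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Register one desired response; return True if it is rewarded."""
        self._remaining -= 1
        if self._remaining == 0:
            self._remaining = self._draw()
            return True
        return False
```

A designer could call `respond()` each time the player performs the target behavior, for example defeating an enemy, and grant a reward whenever it returns `True`.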

Another  facet  of  reinforcement  systems  that  may  impact  en¬ 
joyment  of  a  game  is  whether  the  player  attributes  the  fact  that 
they  have  been  playing  a  game  to  extrinsic  or  intrinsic  motiva¬ 
tions.  Intrinsic  explanations  for  behavior  postulate  that  the  mo¬ 
tivators  to  perform  the  behavior  come  from  the  personal  needs 
and desires of the person performing the behavior, whereas
extrinsically motivated behaviors are those that people perform in
order  to  gain  a  reward  from  or  please  other  people.  In  research 
on  children’s  self-perceptions  and  motivations,  M.  Lepper, 
Greene,  and  Nisbett  (1973)  discovered  that  children  who  were 
given  extrinsic  rewards  for  drawing  were  less  likely  to  continue 
drawing  than  those  who  had  only  an  intrinsic  desire  to  draw. 
The  conclusion  that  they  drew  is  that  children  perceived  their 
motivations  to  draw  as  coming  from  extrinsic  sources  and  thus 
discounted  their  self-perceptions  that  they  liked  to  draw. 

The  same  may  be  true  of  reward  systems  in  games  (Lepper, 
M.  R.  &  Malone,  1987;  Malone,  1981).  To  a  certain  degree,  all 
reinforcement  systems  in  games  are  extrinsic  because  they  are 
created  or  enabled  by  game  developers.  Some  reward  systems 
are  more  obviously  extrinsic  than  others  are.  For  instance,  imag¬ 
ine  the  following  rewards  that  could  be  associated  with  combat 
in  a  fantasy  role-playing  game  (RPG).  The  player  who  slays  a 
dragon  with  the  perfect  combination  of  spell  casting  and  sword¬ 
play  may  acquire  the  golden  treasure  that  the  dragon  was  hoard¬ 
ing.  In  this  situation,  the  personal  satisfaction  comes  from  being 
powerful  enough  to  win  and  smart  enough  to  choose  the  cor¬ 
rect  tactics.  The  gold  is  an  extrinsic  motivator.  The  satisfaction  is 
intrinsic.  By  analogy  from  M.  Lepper,  Greene,  and  Nisbett’s 
(1973)  research,  feelings  of  being  powerful  and  smart  (intrinsic 
motivators)  are  more  likely  to  keep  people  playing  than  extrin¬ 
sic  rewards. 

Collecting  and  Completing 

The  chief  goal  of  many  games  is  to  acquire  all  of  the  available 
items,  rewards,  and  knowledge  contained  in  the  game.  In  games 
such as Pokemon Crystal (2000), the central challenge is to acquire
as many of the Pokemon characters as you can and learn
all  of  their  skills  well  enough  to  outsmart  your  opponent  at  se¬ 
lecting  the  right  characters  for  a  head-to-head  competition.  Not 
coincidentally,  the  catch  phrase  for  the  Pokemon  Crystal  game 
is  “Gotta  catch  ’em  all!” 

This  game  mechanic  is  also  used  by  numerous  games  to  add 
depth  and  repeat  play.  Though  this  is  not  the  primary  mechanic 
in Madden NFL 06 (2005), the ability to collect electronic player
cards  and  use  them  strategically  in  games  provides  incentive 
for  gamers  to  experiment  with  much  of  the  content  that  they 
may  not  experience  if  playing  through  a  standard  season. 

Story 

Characters and narrative help gamers attach meaning and significance
to action sequences. Those in the games industry propose
that many games neither have nor need a story. It is our
contention that the key to understanding narrative in games is
to realize that story lines may be embedded in the game,
or they may emerge in the course of playing a game.

When  most  consumers  think  about  story,  they  think  about 
embedded  story  lines  (Levine,  2001).  Final  Fantasy  X  (2001) 
tells  its  story  by  cutting  action  sequences  with  a  series  of  full- 
motion  cut  scenes,  real-time  cut  scenes,  and  player-driven  char¬ 
acter  interactions.  The  embedded  story  forms  a  central  part  of 
the  appeal  of  the  game.  Levine  (2001)  pointed  out  that  much  of 
the  narrative  in  a  game  emerges  in  the  successes  and  failures  ex¬ 
perienced  by  players  throughout  the  course  of  the  game.  This 
is  especially  true  of  multiplayer  games,  in  which  story  is  often 
generated  exclusively  by  the  interactions  of  the  participants.  As 
Levine  described  it,  the  story  is  generated  by  replaying  an  ab¬ 
stract  narrative  structure  with  a  strict  set  of  rules  and  possibilities 
within  a  novel  set  of  circumstances.  Sporting  events  both  within 
and  outside  of  the  video  game  world  provide  an  excellent  ex¬ 
ample  of  this  type  of  narrative.  No  author  scripted  the  result  of 
the  last  World  Cup  tournament,  but  each  such  event  has  the  po¬ 
tential  to  be  an  epic  narrative  for  both  participants  and  viewers. 

Technological  Innovation  Drives  Design 

There  is  a  great  deal  of  pressure  on  designers  to  utilize  new  tech¬ 
nologies  that  may  break  old  interaction  models.  The  desire  to  ex¬ 
perience  ever  more  realistic  and  imaginative  games  has  pushed 
game  developers  into  engineering  and  computer-science  re¬ 
search  domains.  Likewise,  technology  often  drives  game  design 
in  order  to  showcase  new  capabilities.  The  constant  demand  for 
novelty  can  be  strong  enough  incentive  for  game  makers  to  try 
untried  designs,  “spruce  up”  familiar  interfaces,  and  break  rules 
of consistency. For example, the original NFL2K series (1999)
sported  a  new  interface  model  in  which  users  selected  interface 
areas  by  holding  the  thumbstick  in  a  direction  while  pressing  a 
button.  It  is  possible  that  Sega  chose  the  new  design  primarily 
because  it  was  new  and  different  from  existing  interfaces.  It  re¬ 
quired  the  somewhat  new  (at  that  time)  mechanics  of  the 
thumbstick  on  the  controller,  it  minimized  movement  because 
one  could  point  directly  at  any  given  item  in  the  menu,  and  it  was 
cleverly  shaped  like  a  football  (which  made  more  sense  for 
NFL2K than for another sport). However, less precise targeting
and the hold-and-click interaction metaphor made the system
more error-prone and harder for first-time players to use.

Perceptual-Motor  Skill  Requirements 

The  way  that  functions  are  mapped  onto  available  input  devices 
can  determine  the  success  or  failure  of  a  game.  A  crucial  part 
of  the  fun  in  many  games  comes  from  performing  complex  per¬ 
ceptual-motor  tasks.  While  today’s  arcade-style  games  include 
increasingly  more  sophisticated  strategic  elements,  a  core  ele¬ 
ment  of  many  games  is  providing  the  ability  to  perform  ex¬ 
tremely  dexterous,  yet  satisfying  physical  behaviors.  These  be¬ 
haviors  are  usually  quick  and  well-timed  responses  to  changes 
and  threats  in  the  environment.  If  the  controls  are  simple 
enough  to  master  and  the  challenges  increase  at  a  reasonable 
difficulty, these mostly physical responses can be extremely
satisfying (Csikszentmihalyi, 1990).


Problems  can  arise  when  games  require  unfamiliar  input  de¬ 
vices.  This  is  a  common  complication  in  console  game  usability 
research  because  new  console  systems  usually  introduce  new 
input  device  designs  unique  to  that  system  (see  Fig.  37.1).  Fur¬ 
thermore,  game  designers  often  experiment  with  new  meth¬ 
ods  for  mapping  the  features  of  their  games  to  the  unique  tech¬ 
nical  requirements  and  opportunities  of  new  input  devices. 
Unfortunately,  this  does  not  always  result  in  a  better  gameplay 
experience. 

Balancing  Multiplayer  Games 

As we have seen, different strategies can be used to support a
broad range of player skill levels. While this is complicated
in single-player games, it becomes even more daunting when
players of different skills make up both the opponents and the
allies. By far the most common strategy for regulating the challenge
level of online multiplayer games is to provide strong
matchmaking tools. Internet Backgammon (2001) automatically
matches players from around the world based on their self-reported
skill level. Other games use algorithms that take into account
players’ winning percentages and the strength of their opponents.
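
The text does not say which algorithms these games use. As one familiar illustration of a rating that reflects both results and strength of opponent, the sketch below uses the Elo formula from chess, swapped in here as an assumption. The function names, the K-factor of 32, and the candidate pool are hypothetical.

```python
from typing import Dict


def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score: near 1 against weak opponents, near 0 against strong."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_rating(rating: float, opponent: float, score: float,
                  k: float = 32.0) -> float:
    """Adjust a rating after one game (score: 1 win, 0.5 draw, 0 loss).

    Beating a stronger opponent earns more than beating a weaker one,
    which is how strength of opponent enters the calculation.
    """
    return rating + k * (score - expected_score(rating, opponent))


def closest_match(player_rating: float, candidates: Dict[str, float]) -> str:
    """Pick the waiting player whose rating is nearest to the player's."""
    return min(candidates,
               key=lambda name: abs(candidates[name] - player_rating))
```

A matchmaking service along these lines would update both players' ratings after each game and then use `closest_match` to pair players of similar strength.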

Another  strategy  seeks  to  solve  skill  balance  problems  out¬ 
side  of  the  game  by  offering  players  a  wide  array  of  arenas  and 
game  types.  Rather  than  actively  connecting  players  of  like  skill, 
this  approach  provides  a  broad  array  of  places  to  play  and  allows 
players  to  self-select  into  game  types  that  suit  their  style  and  at¬ 
tract  the  players  with  whom  they  want  to  associate.  A  final 
approach,  used  frequently  by  instant  messaging  clients  (e.g., 
Yahoo  IM,  MSN  Messenger),  is  to  make  it  easy  to  start  a  game 
with  friends.  Though  skill  levels  of  friends  may  not  always 
match,  you  are  presumably  less  likely  to  perceive  the  match  as 
unfair  against  people  that  you  know. 

Game designers also employ a variety of in-game strategies
to balance out the competition. The most common way to allow
less skilled players to compete effectively with players that are
more skilled is to play team games. Many games allow players to
take on a variety of roles. Capture the Flag is a common backyard
tag-style game that forms the basis for a game type in many
first-person shooters. Some players go on offense to capture the
opponent’s flag, some stay back to defend their flag. Likewise,
some players may take a long-range sniper weapon to frustrate
the enemy, while others take weapons that are more effective at
close range. The defending and sniping roles can be more comfortable
for some novice players because they allow the player
to seek protective cover, they require less knowledge of the play
field, and one-on-one combat can be avoided.

FIGURE 37.1. This is a sample of different types of input devices
used in games, from the traditional keyboard and mouse to
different console gamepads.

Some  first-person  shooters  also  allow  the  game  host  to  set 
handicaps  for  successful  players  and  bonuses  for  less  successful 
players.  For  example,  one  version  of  Unreal  Tournament  in¬ 
creased  the  size  of  the  player’s  character  with  every  point  that 
they  won,  and  decreased  the  size  of  the  character  every  time  that 
the  character  died.  The  size  differences  made  successful  players 
easier  targets  and  unsuccessful  players  more  difficult  targets. 
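
A handicap of this kind can be expressed as a simple scaling rule. The sketch below is a hypothetical illustration and not the actual Unreal Tournament implementation. The step size and the minimum and maximum scale are assumed values.

```python
def character_scale(points: int, deaths: int, step: float = 0.05,
                    min_scale: float = 0.5, max_scale: float = 2.0) -> float:
    """Return a size multiplier for a player's character.

    Each point won grows the character by `step` and each death shrinks
    it by the same amount. The result is clamped so that no character
    becomes impossibly small or absurdly large. All defaults are
    assumptions made for this example.
    """
    scale = 1.0 + step * (points - deaths)
    return max(min_scale, min(max_scale, scale))
```

Because the rule is visible to everyone in the match, players can see the handicap at work instead of suspecting hidden adjustments.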

Enabling  Social  Networking 

Skill  level  is  not  the  only  dimension  of  importance.  Recent  sys¬ 
tems,  such  as  the  Xbox  Live  service,  seek  to  group  players  of 
similar  attitudes  and  behaviors.  The  goal  is  to  place  more  an¬ 
tagonistic  players  with  like-minded  others  who  appreciate 
tough  talk,  while  protecting  players  who  do  not  want  to  or 
should  not  be  exposed  to  aggressive  or  offensive  language. 

This  is  just  one  of  many  tools  that  games  have  developed  to 
promote  group  play  and  improve  communication  between  on¬ 
line  players.  Most  online  games  include  some  form  of  in-game 
messaging.  Often  this  messaging  is  tailored  for  fast  and  efficient 
communication  of  key  functions  or  timely  game  events.  Most 
massively  multiplayer  online  RPGs  have  negotiation  systems 
that  help  people  trade  objects  safely  and  efficiently.  Most  first- 
person  shooters  present  automatic  messages  telling  you  which 
players  have  just  been  killed  by  whom.  Real-time  strategy  (RTS) 
games allow gamers to set flares or markers on the map to notify
their allies of key positions. Each game genre has a key set of
in-game  communication  tools  to  support  the  game  play. 

More frequently, game designers are starting to embed cooperative
tools into the environment itself. Much of the benefit
of  vehicles  in  first-person  shooting  games  comes  from  their  use 
as  cooperative  tools.  Used  in  concert,  these  tools  can  often  be 
extremely  effective.  From  a  game  design  perspective,  they  pro¬ 
vide  a  great  incentive  for  people  to  come  together,  strategize, 
and  work  co-operatively.  These  shared  successes  can  be  a  huge 
part  of  the  fun  in  online  games. 

Massively  multiplayer  games  employ  a  wide  variety  of  incen¬ 
tives  for  players  to  form  groups.  Success  comes  from  taking  on 
missions  that  are  far  too  dangerous  for  any  single  player  to  ac¬ 
complish  on  their  own.  Players  choose  a  role  and  learn  to  com¬ 
plement  the  strengths  of  the  other  members  of  their  group  in 
order  to  overcome  fearful  enemies  and  accomplish  great  quests. 
Recent  massively  multiplayer  games  have  invested  even  more  in 
the  creation  of  large-scale  guilds.  Most  massively  multiplayer 
games  intentionally  keep  large  areas  of  information  secret  from 
their  players — to  provide  the  need  and  the  opportunity  for  sym¬ 
biotic  relationships  between  experts  and  novices.  In  addition, 
modern MMOs often provide pyramid-scheme-style bonuses in
wealth and experience to the leaders and captains of guilds. In
return, members receive physical, material, educational, and
social protection from their leaders and peers.


IMPORTANT  FACTORS  IN  GAME 
TESTING  AND  EVALUATION 


Most  game  genres  are  subtly  different  in  the  experiences  that 
they  provoke.  It  may  seem  obvious  that  the  point  of  game  de¬ 
sign  is  making  a  fun  game.  Some  games  are  so  fun  that  people 
will  travel  thousands  of  miles  and  spend  enormous  amounts  of 
money to participate in gaming events. However, we would like
to propose a potentially controversial assertion: that the fundamental
appeal of some games lies in their ability to challenge,
to teach, to bring people together, or simply to let people experience
unusual phenomena. Likewise, the definition of fun may be different
for every person. When you play simulation games, your
ultimate reward may be a combination of learning and mastery.
When you play something like the MTV Music Generator
(Codemasters, 1999), your ultimate reward is the creation of
something new. When you go online to play card games with
your uncle in Texas, you get to feel connected. Flight Simulator
2000 (2000) lets people understand and simulate experiences
that they always wished they could have. While these may
be subcomponents of fun in many cases, there may be times
when using “fun” as a synonym for overall quality will lead to
underestimations of the quality of a game. While a fun game is
often synonymous with a good game, researchers are advised to
consider carefully which measures best suit each
game that they evaluate.

Game  Designer  Intent 

As stated earlier, the goal of a game is to create a pleasurable
experience. Goals in a game are not necessarily derived from external
user needs as in productivity applications. In games, goals
are defined in accordance with the game designer’s vision,
which is a novel position because historically, success has
been defined by the accomplishment of user tasks and goals
(Pagulayan, Gunn, & Romero, 2006). When approaching a game
for user-centered design or testing, it is best to assume the role
of facilitating the designer’s vision for the game (Pagulayan &
Steury, 2004; Pagulayan, Steury, Fulton, & Romero, 2003) because
many times, only the designer can recognize when the
player is not experiencing the game as intended. In traditional
usability testing, user error is often easy to recognize,
but not so in games.

Davis,  Steury,  and  Pagulayan  (2005)  discussed  a  case  study  in 
the game Brute Force, which revealed that players were not encountering
certain gameplay features early enough in their
gameplay experience. The players were not failing, but they
were taking much longer to play through the second mission
than  intended.  By  understanding  the  design  vision,  the  authors 
were  able  to  work  with  the  designers  to  provide  feedback  that 
resulted  in  shortening  the  second  mission  and  in  reordering 
other  missions  to  match  the  design  intent  of  the  game. 
Ease of Use

The ease of use of a game’s controls and interface is closely related
to fun ratings for that game. Think of this factor as a gatekeeper
on the fun of the game. If users must struggle or cannot
adequately translate their intentions into in-game behaviors,
they will become frustrated. This frustration can lead users to
perceive the game as unfair, inaccessible, or
simply not fun. This is why usability is
so important in games. Ease of use should be evaluated
with both usability and attitude-measurement methodologies,
which are discussed later in the chapter.

Starting  a  game.  Starting  the  kind  of  game  that  the  user 
wants  is  an  easily  definable  task  with  visible  criteria  for  success. 
This  is  something  one  can  measure  in  typical  usability  laborato¬ 
ries.  Though  designers  often  take  game  shell  (the  interface  used 
to  start  the  game)  design  for  granted,  a  difficult  or  confusing 
game  shell  can  limit  users’  discoveries  of  features  and  impede 
their  progress  towards  enjoying  the  game.  The  immediate  con¬ 
cern  for  users  can  be  starting  the  kind  of  game  that  they  want 
to  play.  Games  often  provide  several  modes  of  play.  When  the 
game  shell  is  difficult  to  navigate,  users  may  become  frustrated 
before  they  have  even  begun  the  game.  For  example,  we  have 
found  that  many  users  are  unable  to  use  one  of  the  common 
methods that sports console games use to assign a game controller
to a particular team. This has resulted in many users mistakenly
starting a computer-versus-computer game. Depending
on the feedback in the in-game interface, users may think that
they are playing when, in fact, the computer is playing against
itself. In these cases, users may even press buttons, develop
incorrect theories about how to play the game, and become increasingly
confused and frustrated with the game controls. The
most  effective  way  to  avoid  these  problems  is  to  identify  key 
user  tasks  and  test  their  usability. 

Pagulayan, Steury, Fulton, et al. (2003) discussed a case study
in which they found issues with difficulty settings in the game
shell. In early usability testing, participants had problems
setting the difficulty level of opponents in Combat
Flight Simulator (2000). In this case, the users’ gameplay
experiences would have been quite frustrating because of a
usability error in the game shell had it not been addressed.

Tutorials  or  instructional  gameplay.  As  mentioned 
earlier,  tutorials  are  sometimes  necessary  to  introduce  basic 
skills  in  order  to  play  the  game.  In  this  situation,  instructional 
goals  are  easily  translated  into  the  usability  labs  with  compre¬ 
hension  tasks  and  error  rates. 

One  of  the  risks  of  not  testing  tutorials  or  instructional  mis¬ 
sions  is  inappropriate  pacing,  which  can  often  result  from  an 
ill-conceived  learning  curve  at  the  start  of  the  game.  Many 
games  simply  start  out  at  too  difficult  a  challenge  level.  This  is  an 
easy  and  predictable  trap  for  designers  and  development  teams 
to  fall  into  because  when  designers  spend  months  (or  even 
years)  developing  a  game,  they  risk  losing  track  of  the  skill  level 
of  the  new  player.  A  level  that  is  challenging  to  the  development 
team  is  likely  to  be  daunting  to  the  beginner. 

Unfortunately, the reverse can also be troubling to the new
user. Faced with the task of addressing new players, designers
may resort to lengthy explanations. Frequently, developers will
not  budget  time  to  build  a  ramped  learning  process  into  their 
game.  Instead,  they  may  realize  late  in  the  development  cycle 
that  they  need  to  provide  instruction.  If  this  is  done  too 
abruptly,  the  learning  process  can  end  up  being  mostly  expla¬ 
nation,  and  to  be  frank,  explanation  is  boring.  The  last  thing  that 
you  want  to  do  is  to  bore  your  user  with  a  longwinded  explana¬ 
tion  of  what  they  are  supposed  to  do  when  they  get  into  your 
game. It is best to let users learn in context and at a measured pace,
or they may just quit the game.

A  very  positive  example  is  the  first  level  of  Banjo  Kazooie 
(2000).  At  the  start,  the  player  is  forced  to  encounter  a  helpful  tu¬ 
tor  and  listen  to  a  few  basic  objectives.  Then,  they  must  complete 
some  very  basic  objectives  that  teach  some  of  the  basic  character 
abilities.  Much  of  the  tutorial  dialogue  may  be  skipped,  but  the 
skills  necessary  to  continue  must  be  demonstrated.  In  this  way, 
the  game  teaches  new  skills  but  never  requires  tedious  instruc¬ 
tion.  The  player  learns  primarily  by  doing.  All  of  this  is  done  in 
the  shadow  of  a  very  visible  path  onto  the  rest  of  the  game  so  the 
user  never  loses  sight  of  where  they  need  to  go. 

In-game  interfaces.  In-game  interfaces  are  used  primarily 
to  deliver  necessary  status  feedback  and  to  perform  less-frequent 
functions. We measure effectiveness with more traditional lab
usability testing techniques and desirability with attitude measurements
such as surveys (see next section for example).

Some  PC  games  make  extensive  use  of  in-game  interfaces  to 
control  the  game.  For  example,  simulation  and  RTS  games  can 
be  controlled  by  keyboard  and  mouse  presses  on  interface  ele¬ 
ments  in  the  game.  Usability  improvements  in  these  interfaces 
can  broaden  the  audience  for  a  game  by  making  controls  more 
intuitive  and  reducing  tedious  aspects  of  managing  the  game 
play.  In-game  tutorial  feedback  can  make  the  difference  be¬ 
tween  confusion  and  quick  progression  in  learning  the  basic 
mechanisms for playing. In this situation, iterative usability evaluations
become a key methodology for identifying problems
and testing the effectiveness of their fixes (see next section for example).

Many  complex  PC  and  console  video  games  make  frequent 
use  of  in-game  feedback  and  heads-up  displays  (HUD)  to  dis¬ 
play  unit  capabilities  and  status.  For  example,  most  flight  com¬ 
bat  games  provide  vital  feedback  about  weapons  systems  and 
navigation  via  in-game  displays.  Without  this  feedback,  it  can 
be difficult to determine distance and progress towards objectives,
unit health, and attack success. This feedback is crucial for
player  learning  and  satisfaction  with  the  game.  With  increasing 
game  complexity  and  3-D  movement  capabilities,  these  displays 
have become a crucial part of many game genres. Usability testing
is required to establish whether users can detect and correctly
identify these feedback systems (see Pagulayan, Steury,
Fulton, et al., 2003 for a detailed example of usability testing the
HUD in MechWarrior 4: Vengeance, 2000).

Mapping  input  devices  to  functions.  A  learnable  map¬ 
ping  of  buttons,  keys,  or  other  input  mechanisms  to  functions  is 
crucial  for  enjoying  games.  We  measure  effectiveness  with  us¬ 
ability  techniques  and  desirability  with  attitude  measurements. 
Without  learnable  and  intuitive  controls,  the  user  will  make  fre¬ 
quent  mistakes  translating  their  desires  into  onscreen  actions. 
We have seen consistently that these kinds of mistakes are enormously
frustrating to users, because learning to communicate
one’s  desires  through  an  eight-button  input  device  is  not  very 
fun.  The  selection  of  keys,  buttons,  and  other  input  mechanisms 
to  activate  particular  features  is  often  called  “control  mapping.” 
Players  tend  to  feel  that  learning  the  control  mapping  is  the 
most  basic  part  of  learning  the  game.  It  is  a  stepping-stone  to 
getting  to  the  fun  tasks  of  avoiding  obstacles,  developing  strate¬ 
gies,  and  blowing  things  up. 

In contrast to other ease-of-use issues, evaluating the control
mapping may involve as much subjective measurement as behavioral
observation. Button presses are fast, frequent, and hard
to  collect  automatically  in  many  circumstances.  Furthermore, 
problems  with  control  mappings  may  not  manifest  themselves 
as  visible  impediments  to  progress,  performance,  or  task  time.  In¬ 
stead,  they  may  directly  influence  perceptions  of  enjoyment,  con¬ 
trol,  confidence,  or  comfort.  Due  to  differences  in  experience  lev¬ 
els  and  preferences  between  participants,  there  may  also  be 
significant  variation  in  attitudes  about  how  to  map  the  controls. 

Dissatisfaction  with  the  controller  design  can  also  be  a  cen¬ 
tral  factor  that  limits  enjoyment  of  all  games  on  a  system.  For 
example,  the  results  of  one  set  of  studies  on  the  games  for  a  par¬ 
ticular  console  game  system  were  heavily  influenced  by  com¬ 
plaints  about  the  system’s  controller.  Grasping  the  controller 
firmly  was  difficult  because  users’  fingers  were  bunched  up  and 
wrists  were  angled  uncomfortably  during  game  play.  Ratings  of 
the  overall  quality  of  the  games  were  heavily  influenced  by  the 
controller rather than the quality of the game itself.

Because  of  these  concerns  and  the  importance  of  optimizing 
control  mappings,  we  recommend  testing  control  mappings 
with  both  usability  and  attitude-assessment  methodologies. 

Challenge 

Challenge  is  distinct  from  ease  of  use  and  is  measured  almost 
exclusively with attitude-assessment methodologies. It can be
a critical factor in the enjoyment of a game, is obviously
highly individualized, and is rightly considered subjective.

Consumers  may  have  difficulties  distinguishing  the  appro¬ 
priate  kinds  of  challenge  that  result  from  calculated  level  and  ob¬ 
stacle  design,  from  the  difficulty  that  is  imposed  by  inscrutable 
interface  elements  or  poor  communication  of  objectives.  In 
either case, the result is the same. If the game is not designed
properly, the player’s experience will be poor. Thus, it is up to the user-testing
professional  to  make  measurement  instruments  that  evaluate 
the appropriateness of the challenge level independent of
usability concerns. In one example, Pagulayan, Steury, Fulton, et al.
(2003)  used  attitude-assessment  methodologies  to  determine 
the  final  design  of  the  career  mode  in  RalliSport  Challenge 
(2002).  In  this  situation,  finding  a  solution  to  the  design  problem 
was  not  necessarily  related  to  ease  of  use  issues,  or  other  usabil¬ 
ity-related  issues.  The  final  design  was  based  on  what  was  most 
fun  and  appropriately  challenging  for  users. 

Pace 

We  define  pace  as  the  rate  at  which  players  experience  new  chal¬ 
lenges  and  novel  game  details.  We  measure  this  with  attitude- 
measurement  methodologies. 


Most  designers  will  recognize  that  appropriate  pacing  is  re¬ 
quired  to  maintain  appropriate  levels  of  challenge  and  tension 
throughout  the  game.  You  might  think  of  this  as  the  sequence 
of  obstacles  and  rewards  that  are  presented  from  the  start  of 
the  game  to  the  end.  However,  the  way  a  designer  will  address 
pace  will  depend  on  a  variety  of  issues,  including  game  type, 
game genre, and their particular vision for the gameplay
experience. In a tennis game, pace can be affected by a number of
things, from the number of cut-scenes between points to the
actual player and ball movement speed (Pagulayan
& Steury, 2004).

One  group  at  Microsoft  uses  a  critical  juncture  analogy  to 
describe  pacing.  As  a  metaphor,  they  suggest  that  the  designer 
must  attend  to  keeping  the  user’s  attention  at  10  seconds, 
10  minutes,  10  hours,  and  100  hours.  The  player  can  always  put 
down  the  game  and  play  another  one,  so  one  must  think  cre¬ 
atively  about  giving  the  user  a  great  experience  at  these  criti¬ 
cal  junctures.  Some  games  excel  at  certain  points  but  not  oth¬ 
ers.  For  example,  the  massively  multiplayer  game  may  have  the 
user’s  rapt  attention  at  10  seconds  and  10  minutes.  The  fact 
that hundreds of thousands pay $10 per month to continue
playing  indicates  that  these  games  are  very  rewarding  at  the 
100-hour  mark.  Anyone  who  has  played  one  of  these  games  can 
tell  you  that  they  are  extremely  difficult  and  not  too  fun  to  play 
at  the  10-hour  mark.  At  this  point,  you  are  still  being  killed 
repeatedly.  That  is  no  fun  at  all.  Arcade  games  obviously  take 
this  approach  very  seriously.  Though  they  may  not  scale  to 
100  hours,  good  arcade  games  attract  you  to  drop  a  quarter  and 
keep  you  playing  for  long  enough  to  make  you  want  to  spend 
another  quarter  to  continue. 

Pacing  may  also  be  expressed  as  a  set  of  interwoven  objec¬ 
tives  much  like  the  subplots  of  a  movie.  Again,  Banjo  Kazooie 
(2000)  provides  an  excellent  example  of  good  pacing.  Each  level 
in  Banjo  Kazooie  contains  the  tools  necessary  to  complete  the 
major  objectives.  Finding  the  tools  is  an  important  part  of  the 
game.  New  abilities,  objectives,  skills,  and  insights  are  gradually 
introduced  as  the  player  matures.  While  progressing  towards 
the  ultimate  goal  (of  vanquishing  the  evil  witch  and  saving  the 
protagonist’s  sister),  the  player  learns  to  collect  environmental 
objects  that  enable  them  to  fly,  shoot,  become  invincible, 
change  shape,  gain  stamina,  add  extra  lives,  and  unlock  new  lev¬ 
els.  This  interlocking  set  of  objectives  keeps  the  game  interest¬ 
ing  and  rewarding.  Even  if  one  is  unable  to  achieve  a  particular 
goal, there are always sets of subgoals to work on, some of
which may provide cues about how to achieve the major goal.

Summary 

Attitude methodologies are better suited to measuring factors such
as overall fun, graphics, sound, challenge, and pace. The typical
iterative usability test is an exploratory exercise designed to
uncover problem areas where the designer’s intentions do not
match the user’s expectations. As a result, we typically choose
not to use usability tests to assess fun or challenge. When
attempting to assess attitudinal issues such as overall fun and
challenge, we make use of a survey technique that affords testing
larger samples. Internally, we have adopted the term Playtest
or sometimes Consumer Playtest for this technique. At the
same time, we use typical iterative usability methods to
determine design elements that contribute to or detract from the
experience of fun.


USER  RESEARCH  IN  GAMES 


Introduction  to  Methods — Principles  in  Practice 

In the following section, we propose various methodologies and
techniques that attempt to accurately measure and improve
game usability and enjoyment. Many of the examples are
taken  from  techniques  used  and  developed  by  the  Games  User 
Research  group  at  Microsoft  Game  Studios. 

Our  testing  methods  can  be  organized  by  the  type  of  data  be¬ 
ing  measured.  At  the  most  basic  level,  we  categorize  our  data 
into  two  types:  behavioral  and  attitudinal.  Behavioral  refers  to 
observable  data  based  on  performance,  or  particular  actions 
performed  by  a  participant  that  one  can  measure.  This  is  very 
similar  to  typical  measures  taken  in  usability  tests  (e.g.,  time  it 
takes  to  complete  a  task,  number  of  attempts  it  takes  to  suc¬ 
cessfully  complete  a  task,  task  completion,  etc.).  Attitudinal 
refers  to  data  that  represent  participant  opinions  or  views,  such 
as  subjective  ratings  from  questionnaires  or  surveys.  These  are 
often  used  to  quantify  user  experiences.  Selection  of  a  particu¬ 
lar  method  will  depend  on  what  variables  are  being  measured 
and  what  questions  need  to  be  answered. 

Another  distinction  that  is  typically  made  is  between  forma¬ 
tive  and  summative  evaluations,  which  we  apply  to  our  testing 
methods  as  well.  Formative  refers  to  testing  done  on  our  own 
products in development. Summative evaluations are benchmark
evaluations, done either on our own products or on competitor
products. They can be useful tools for defining metrics or
measurable attributes in planning usability tests (Nielsen, 1993),
or  to  evaluate  strengths  and  weaknesses  in  competitor  products 
for  later  comparison  (Dumas  &  Redish,  1999). 

While  these  methods  are  useful,  they  do  not  allow  us  to  ad¬ 
dress  issues  with  extended  gameplay,  that  is,  issues  that  may 
arise  after  playing  the  game  for  a  couple  of  days  or  more.  This 
is  problematic,  because  one  of  the  key  challenges  in  game  de¬ 
sign  is  longevity.  With  the  competition,  the  shelf  life  of  a  game 
becomes  very  limited. 

Usability  techniques.  Traditional  usability  techniques 
can  be  used  to  address  a  portion  of  the  variables  identified  as 
important  for  game  design.  In  addition  to  measuring  perfor¬ 
mance,  we  use  many  standard  usability  techniques  to  answer 
how  and  why  process-oriented  questions.  For  example,  how  do 
users  perform  an  attack,  or  why  are  controls  so  difficult  to  learn? 
We  use  (a)  structured  usability  tests,  (b)  rapid  iterative  testing 
and  evaluation  (RITE;  Medlock,  Wixon,  Terrano,  &  Romero, 
2002),  and  (c)  other  variations  and  techniques,  including  open- 
ended  usability  tasks,  paper  prototypes,  and  gameplay  heuris¬ 
tics.  For  clarity  of  presentation,  each  technique  will  be  discussed 
separately,  followed  by  a  case  study.  Each  case  study  will  only 
contain  information  pertinent  to  a  specific  technique,  thus  ex¬ 
amples  may  be  taken  from  a  larger  usability  test. 


Structured  usability  test.  A  structured  usability  test 
maintains  all  the  characteristics  that  Dumas  &  Redish  (1999) 
proposed  as  common  to  all  usability  tests:  (a)  goal  is  to  im¬ 
prove  usability  of  the  product,  (b)  participants  represent  real 
users,  (c)  participants  do  real  tasks,  (d)  participant  behavior  and 
verbal  comments  are  observed  and  recorded,  and  (e)  data  are 
analyzed,  problems  are  diagnosed,  and  changes  are  recom¬ 
mended.  We  have  found  that  issues  relating  to  expectancies,  ef¬ 
ficiency,  and  performance  interaction  are  well  suited  for  this 
type  of  testing.  Some  common  areas  of  focus  for  structured  us¬ 
ability  testing  are  in  game  shell  screens,  or  control  schemes.  The 
game  shell  can  be  defined  as  the  interface  where  a  gamer  can 
determine and/or modify particular elements of the game. This
may  include  main  menus  and  options  screens  (e.g.,  audio, 
graphics,  controllers,  etc.). 

An  example  that  uses  this  method  is  in  the  MechComman- 
der  2  (2001)  usability  test.  Portions  of  the  method  and  content 
have  been  omitted. 

Case Study: MechCommander 2 Usability Test. MechCommander 2
(MC2) is a PC RTS game where the gamer takes control
of a unit of mechs (i.e., giant mechanical robots). One
area  of  focus  for  this  test  was  on  the  ’Mech  Lab,  a  game  shell 
screen  where  mechs  can  be  customized  (see  Fig.  37.2).  The 
gamer  is  able  to  modify  weaponry,  armor,  and  other  similar  fea¬ 
tures  and  is  limited  by  constraints  such  as  heat,  money,  and 
available  slots. 

The  first  step  in  approaching  this  test  was  to  define  the 
higher-order  goals.  Overall,  the  game  shell  screens  had  to  be 
easy  to  navigate,  understand,  and  manipulate,  not  only  for  those 
familiar  with  mechs  and  the  mech  universe,  but  also  for  RTS 
gamers  who  are  not  familiar  with  the  mech  universe.  Our  goal 
was  for  gamers  to  be  able  to  modify  and  to  customize  mechs  in 
the  ’Mech  Lab. 

As mentioned, one of the most important steps in this
procedure is defining the participant profiles. Getting the
appropriate users for testing is vital to the success and validation of
the data since games are subject to much scrutiny and criticism
from their players. To reiterate, playing games is a choice. For
MechCommander 2 (2001), we defined two participant profiles
which represented all of the variables we wanted to cover. The
characteristics of interest included those who were familiar with
RTS games (experienced gamers) and those who were not RTS
gamers (novice gamers). We also wanted gamers who were
familiar with the mech genre, or the mech universe. Overall, we
needed a landscape of gamers that had some connection or
interest that would make them a potential consumer for this
title, whether through RTS experience or mech knowledge.

FIGURE 37.2. Screenshot of the 'Mech Lab in MechCommander 2.

Tasks  and  task  scenarios  were  created  to  simulate  situations 
that  a  gamer  may  encounter  when  playing  the  game.  Most  im¬ 
portantly,  tasks  were  created  in  order  to  address  the  predefined 
higher  order  goals.  Participants  were  instructed  to  talk  aloud, 
and  performance  metrics  were  recorded  (e.g.,  task  completion, 
time).  The  following  are  examples  from  the  usability  task  list. 

1.  Give  the  SHOOTIST  jumping  capabilities. 

This  task  allowed  us  to  analyze  participant  expectations.  To 
succeed  in  this  task,  one  had  to  select  a  “CHANGE  WEAPONS” 
button  from  a  different  game  shell  screen,  which  brought  them 
into  the  ’Mech  Lab.  If  the  task  was  to  change  a  weapon  on  a 
mech,  the  terminology  would  probably  have  been  fine.  Thus, 
this task probed two main issues: (a) could they get to
the  ’Mech  Lab  where  you  modify  the  mech,  and  (b)  were  they 
able  to  discover  the  process  of  modifying  the  mech.  It  was  ac¬ 
curately  predicted  that  gamers  would  have  difficulties  with  the 
button  terminology.  Thus,  that  button  was  changed  to  “MODIFY 
MECH.” 

To change the components (e.g., add the jump jets),
participants could either select the item and drag it off the mech or
select the item and press the “REMOVE” button (see Fig. 37.2).
One  unexpected  issue  that  arose  was  that  participants  unknow¬ 
ingly  removed  items  because  the  distance  required  for  removing 
an  item  was  too  small.  The  critical  boundary  that  was  imple¬ 
mented  was  too  strict.  In  addition,  participants  had  difficulties 
adding  items  by  dragging  and  dropping  because  the  distance  re¬ 
quired  for  adding  an  item  was  too  large  (e.g.,  the  item  would 
not  stay  on  the  mech  unless  it  was  placed  exactly  on  top  of  the 
appropriate  location).  Appropriate  recommendations  were 
made  and  implemented. 

2.  Replace  the  MG  Array  with  the  Flamer. 

One  of  the  constraints  presented  for  modifying  a  mech  was 
heat limit. Each weapon had a particular heat rating. For
example, if the heat limit for a mech is 35 and the current heat
rating is 32, only weapons with a rating of 3 or less could be added.
In this task, the “Flamer” had a heat rating much larger than the
“MG Array,” thus making it impossible to accomplish this task
without removing more items. The issues here were the usability of
the heat indicator, heat icons, and the discoverability of the heat
limit concept. None of the participants figured this out.
Recommendations included changing the functionality of the Heat Limit
Meter  and  adding  better  visual  cues  to  weapons  that  exceed  the 
heat  limit.  Both  of  these  changes  were  implemented. 
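
The heat constraint in this task amounts to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the function names and the heat ratings assigned to the MG Array and the Flamer are assumptions made for the example. Only the limit of 35 and the current rating of 32 come from the text.

```python
def can_add_weapon(heat_limit: int, current_heat: int, weapon_heat: int) -> bool:
    """Return True if adding the weapon keeps the mech at or under its heat limit."""
    return current_heat + weapon_heat <= heat_limit


def can_swap_weapon(heat_limit: int, current_heat: int,
                    removed_heat: int, added_heat: int) -> bool:
    """Return True if replacing one weapon with another stays within the limit."""
    return current_heat - removed_heat + added_heat <= heat_limit


# Values from the text: heat limit 35, current heat rating 32.
print(can_add_weapon(35, 32, 3))   # a rating of 3 still fits
print(can_add_weapon(35, 32, 4))   # a rating of 4 exceeds the limit

# Hypothetical ratings (not taken from the game): MG Array = 2, Flamer = 8.
print(can_swap_weapon(35, 32, removed_heat=2, added_heat=8))
```

With ratings like these, the swap fails until further items are removed, which is the situation participants had to discover from the heat indicator.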

Rapid  iterative  testing  and  evaluation  method  (RITE). 

Medlock  et  al.  (2002)  documented  another  common  usability 
method  used  by  the  Games  User  Research  Group  at  Microsoft 
Game Studios, which they refer to as the RITE method. In this
method, fewer participants are used before implementing
changes,  but  more  cycles  of  iteration  are  performed.  With  RITE, 
it  is  possible  to  run  almost  two  to  three  times  the  total  sample 
size  of  a  standard  usability  test.  However,  only  one  to  three  par¬ 
ticipants  are  used  per  iteration  with  changes  to  the  prototype 
immediately  implemented  before  the  next  iteration  (or  group  of 
one  to  three  participants). 

The  goal  of  the  RITE  method  is  to  address  as  many  issues 
and  fixes  as  possible  in  a  short  amount  of  time  in  hopes  of  im¬ 
proving  the  gamer’s  experience  and  satisfaction  with  the  prod¬ 
uct.  However,  the  utility  of  this  method  is  entirely  dependent  on 
achieving  a  combination  of  factors  (Medlock  et  al.,  2002).  The 
situation  must  include  (a)  a  working  prototype;  (b)  the  identi¬ 
fication  of  critical  success  behaviors,  important,  but  not  vital 
behaviors,  and  less  important  behaviors;  (c)  commitment  from 
the  development  team  to  attend  tests  and  immediately  review 
results;  (d)  time  and  commitment  from  development  team  to 
implement  changes  before  next  round;  and  (e)  the  ability  to 
schedule  and  run  new  participants  as  soon  as  the  product  has 
been  iterated.  Aside  from  these  unique  requirements,  planning 
the  usability  test  is  very  similar  to  more  traditional  structured  us¬ 
ability  tests. 

It  is  very  helpful  to  categorize  potential  usability  issues 
into  four  categories:  (a)  clear  solution,  quick  implementation, 
(b)  clear  solution,  slow  implementation,  (c)  no  clear  solution, 
and  (d)  minor  issues.  Each  category  has  implications  for  how 
to address each issue. In the first category, fixes should be
implemented immediately and should be ready for the next
iteration of testing. In the second category, fixes should be started in
the hope that they can be tested in later rounds of testing. For the
third and fourth categories, more data should be collected.
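
The four categories and their implications can be read as a small decision rule. The Python sketch below is one possible encoding, not a tool described by Medlock et al. (2002); the class, field, and action names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FIX_BEFORE_NEXT_ITERATION = "fix before next iteration"
    START_FIX_FOR_LATER_ROUND = "start fix for a later round"
    COLLECT_MORE_DATA = "collect more data"


@dataclass
class Issue:
    name: str
    clear_solution: bool
    quick_to_implement: bool
    minor: bool = False


def triage(issue: Issue) -> Action:
    """Map an issue onto the action implied by its category."""
    # Categories (c) and (d): no clear solution, or a minor issue.
    if issue.minor or not issue.clear_solution:
        return Action.COLLECT_MORE_DATA
    # Category (a): clear solution, quick implementation.
    if issue.quick_to_implement:
        return Action.FIX_BEFORE_NEXT_ITERATION
    # Category (b): clear solution, slow implementation.
    return Action.START_FIX_FOR_LATER_ROUND


# Hypothetical issue for illustration only.
print(triage(Issue("button label unclear", clear_solution=True,
                   quick_to_implement=True)).value)
```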

The advantage of using the RITE method is that it allows for
immediate  evaluation  and  feedback  of  recommended  fixes  that 
were  implemented.  Changes  are  agreed  upon  and  made  directly 
to  the  product.  If  done  correctly,  the  RITE  method  affords  more 
fixes  in  a  shorter  period.  In  addition,  by  running  multiple  itera¬ 
tions  over  time,  we  are  potentially  able  to  watch  the  number  of 
usability issues decrease. It provides a nice, easily
understandable, and accessible measure. In general, the more iterations
of testing, the better. However, this method is not without its
disadvantages. In this situation, we lose the ability to uncover
unmet  user  needs  or  work  practices,  we  are  unable  to  develop  a 
deep  understanding  of  gamer  behaviors,  and  we  are  unable  to 
produce  a  thorough  understanding  of  user  behavior  in  the  con¬ 
text  of  a  given  system  (Medlock  et  al.,  2002). 

The  following  example  demonstrates  how  the  RITE  method 
was used in designing the Age of Empires II: The Age of Kings
(1999) tutorial. Again, portions of the method and content have
been omitted (see Medlock et al., 2002, for more details).

Case Study: Age of Empires II: The Age of Kings Tutorial. Age
of Empires II: The Age of Kings (AoE2; 1999) is an RTS game for
the PC in which the gamer takes control of a civilization
spanning over a thousand years, from the Dark Ages through the late
medieval period. In this case study, a working prototype of the
tutorial was available, and critical concepts and behaviors were
defined. Also, the development team was committed to
attending each of the sessions. They were committed to quickly
implementing agreed-upon changes. Finally, the resources for
scheduling were available. The key element in this situation for
success was the commitment from the development team to
work in conjunction with us.

TABLE 37.2. Age of Empires Example Concepts and Behaviors
Categorized into Three Concepts Using the RITE Method
(Medlock, Wixon, Terrano, & Romero, 2002)

Essential                    Important                    Concepts/Behaviors
Concepts/Behaviors           Concepts/Behaviors           of Lesser Interest

• movement                   • queuing up units           • using hotkeys
• multiselection of units    • setting gathering points   • using mini-map modes
• “fog of war”               • garrisoning units          • using trading
• scrolling main screen      • upgrading units through    • understanding sound
  via mouse                    technology                   effects

In the AoE2 tutorial, there were four main sections: (a)
marching and fighting (movement, actions, unit selection, the fog of
war¹), (b) feeding the army (resources, how to gather, where
to  find),  (c)  training  the  troops  (use  of  mini-map,  advancing 
through  ages,  build  and  repair  buildings,  relationship  between 
housing  and  population,  unit  creation  logic),  and  (d)  research 
and  technology  (upgrading  through  technologies,  queuing 
units,  advancing  through  ages).  Each  of  these  sections  dealt 
with  particular  skills  necessary  for  playing  the  game.  In  essence, 
the  tutorial  had  the  full  task  list  built  in.  In  the  previous  case 
study,  this  was  not  the  case. 

At  a  more  abstract  level,  the  goals  of  the  tutorial  had  to  be 
collectively defined (with the development team). In more
concrete terms, specific behaviors and concepts that a gamer
should  be  able  to  perform  after  using  the  tutorial  were  identi¬ 
fied,  then  categorized  into  the  three  levels  of  importance:  (a)  es¬ 
sential  behaviors  that  users  must  be  able  to  perform  without  ex¬ 
ception,  (b)  behaviors  that  are  important,  but  not  vital  to 
product  success,  and  (c)  behaviors  that  were  of  lesser  interest. 
Table  37.2  lists  some  examples  of  concepts  and  behaviors  from 
each  of  the  three  categories.  This  is  an  important  step  because 
it  indirectly  sets  up  a  structure  for  decision  rules  to  be  used 
when  deciding  what  issues  should  be  addressed  immediately, 
and  what  issues  can  wait. 
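
To show how such a categorization can act as a decision structure, the following Python sketch orders observed problem areas by importance level. The concept names are the examples listed in Table 37.2; the enum, the dictionary, and the `prioritize` function are hypothetical names introduced for illustration.

```python
from enum import IntEnum


class Importance(IntEnum):
    ESSENTIAL = 1
    IMPORTANT = 2
    LESSER = 3


# Example concepts and behaviors from Table 37.2.
CONCEPT_IMPORTANCE = {
    "movement": Importance.ESSENTIAL,
    "multiselection of units": Importance.ESSENTIAL,
    "fog of war": Importance.ESSENTIAL,
    "scrolling main screen via mouse": Importance.ESSENTIAL,
    "queuing up units": Importance.IMPORTANT,
    "setting gathering points": Importance.IMPORTANT,
    "garrisoning units": Importance.IMPORTANT,
    "upgrading units through technology": Importance.IMPORTANT,
    "using hotkeys": Importance.LESSER,
    "using mini-map modes": Importance.LESSER,
    "using trading": Importance.LESSER,
    "understanding sound effects": Importance.LESSER,
}


def prioritize(observed_concepts):
    """Order concepts with observed problems so essential ones come first."""
    return sorted(observed_concepts, key=lambda c: CONCEPT_IMPORTANCE[c])


print(prioritize(["using hotkeys", "garrisoning units", "movement"]))
```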

The  general  procedure  for  each  participant  was  similar  to 
other  usability  tests  we  often  perform.  If  participants  did  not 
go  to  the  tutorial  on  their  own,  they  were  instructed  to  do  so 
by  the  specialist. 

During the session, errors and failures were recorded. In
this situation, an error was defined as anything that caused
confusion. A failure was considered an obstacle that prevented
participants from being able to continue. After each session, a
discussion  ensued  among  the  specialist  and  the  development 
team  to  determine  what  issues  (if  any)  warranted  an  immedi¬ 
ate  change  at  that  time. 

In order to do this successfully, certain things had to be
considered. For example, how can one gauge how serious an issue
is? In typical usability tests, the proportion of participants
experiencing the error is a way to estimate its severity. Since changes
are made rapidly here, the criterion must change to the intuitively
estimated likelihood that users will continue to experience the
error. Another thing to consider is clarity of the issue, which was
assessed by determining if there is a clear solution. We have
often found that if issues do not have an obvious solution, then the
problem is not fully understood. Finally, which errors or failures
were essential, important, or of lesser interest? Efforts of the
development team should be focused on issues related to the
essential category when possible.
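
For comparison, the severity estimate used in a typical usability test (the proportion of participants who experienced an error) can be sketched as follows. The data layout, the function name, and the sample data are assumptions made for illustration. As noted above, RITE replaces this calculation with an intuitive judgment, because the build changes between participants.

```python
from typing import Iterable, Set


def error_proportion(sessions: Iterable[Set[str]], error: str) -> float:
    """Proportion of sessions in which the given error was observed.

    Each session is the set of error labels recorded for one participant.
    """
    sessions = list(sessions)
    if not sessions:
        raise ValueError("at least one session is required")
    return sum(error in observed for observed in sessions) / len(sessions)


# Hypothetical observations from four participants.
observed = [
    {"missed tutorial prompt", "could not select units"},
    {"missed tutorial prompt"},
    {"could not find resources"},
    set(),
]
print(error_proportion(observed, "missed tutorial prompt"))
```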

At  this  point,  we  broke  down  the  issues  into  three  groups. 
The  first  group  included  issues  with  a  solution  that  could  be 
quickly  implemented.  Every  issue  in  this  group  was  quickly  im¬ 
plemented  before  the  next  participant  was  run.  The  second 
group  consisted  of  issues  with  a  solution  that  could  not  be 
quickly implemented. The development team began working
on these in the hope that they could be implemented for later iterations of
the test. Finally, there were issues with no clear solutions. These
issues  were  left  untouched  because  more  data  was  needed  to 
assess  the  problem  at  a  deeper  level  (e.g.,  more  participants 
needed  to  be  run).  Any  fixes  implemented  in  the  builds  were 
kept  as  each  participant  was  brought  in.  Thus,  it  was  possible 
that  many  of  the  participants  experienced  a  different  version  of 
the  tutorial  over  the  duration  of  testing. 

Overall,  seven  different  iterations  were  used  across  16  par¬ 
ticipants.  Fig.  37.3  represents  the  number  of  errors  and  failures 
recorded  over  time.  The  number  of  errors  and  failures  gradually 
decreased  across  participants  as  new  iterations  of  the  build  were 
introduced.  By  the  seventh  and  final  iteration  of  the  build,  the 
errors and failures were reliably reduced to zero.
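
A record of this kind can be summarized per build with a short script. The sketch below is illustrative only: the session records are invented, not the AoE2 data, and the names `Session` and `mean_issues_per_build` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Session(NamedTuple):
    participant: int
    build: int      # iteration of the prototype the participant used
    errors: int
    failures: int


def mean_issues_per_build(sessions):
    """Average number of errors plus failures per participant, by build."""
    counts = defaultdict(list)
    for s in sessions:
        counts[s.build].append(s.errors + s.failures)
    return {build: sum(v) / len(v) for build, v in sorted(counts.items())}


# Invented records for illustration.
records = [
    Session(participant=1, build=1, errors=4, failures=2),
    Session(participant=2, build=1, errors=3, failures=1),
    Session(participant=3, build=2, errors=2, failures=0),
    Session(participant=4, build=3, errors=0, failures=0),
]
print(mean_issues_per_build(records))
```

Averaging per build rather than per participant reflects that several participants may see the same iteration before the next change is made.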

Although  we  feel  that  the  AoE2  tutorial  was  an  enormous 
success,  largely  due  to  the  utilization  of  the  RITE  method,  the 
method  does  have  its  disadvantages  and  should  be  used  with 
caution.  Making  changes  when  issues  and  solutions  are  unclear 
may  result  in  not  solving  the  problem  at  all,  while  creating 
newer  usability  problems  in  the  interface.  We  experienced  this 
phenomenon a couple of times in the AoE2 study.

Also,  making  too  many  changes  at  once  may  introduce  too 
many  sources  of  variability  and  create  new  problems  for  users. 
Deducing  specifically  the  source  of  the  new  problem  becomes 
very  difficult.  A  related  issue  is  not  following  up  changes  with 
enough  participants  to  assess  whether  or  not  the  solution  really 
addressed the problem. Without this follow-up, there is little
evidence that the changes made were appropriate
(which is a problem with traditional usability methods as
well). The last thing to consider is that other important usability
issues that surface less frequently are likely to be missed.
Using such small samples between iterations allows for the
possibility that those less frequent issues may not be detected.

Variations on usability methods. Now that we have
presented two general types of usability testing, it is worth
mentioning some variations on these methods: (a) open-ended
tasks, (b) paper prototyping, and (c) gameplay heuristics.


¹ The fog of war refers to the black covering over areas of a mini-map or radar that have not yet been explored by the gamer. The fog of war lifts once that area
has been explored. Use of the fog of war is most commonly seen in RTS games.

FIGURE 37.3. A record of failures and errors over time for the
Age of Empires Tutorial when using the RITE method (Medlock
et al., 2002).


In  general,  it  is  often  recommended  that  tasks  in  usability  tests 
be  small,  with  a  specified  outcome  (e.g.,  Nielsen,  1993).  However, 
we  have  found  situations  where  the  inclusion  of  an  open-ended 
task  yields  important  data  as  well.  In  many  usability  studies,  we  of¬ 
ten  include  an  open-ended  task  where  participants  are  not  in¬ 
structed  to  perform  or  achieve  anything  in  particular.  In  other 
words,  there  is  no  specified  outcome  to  the  participant.  These 
tasks  can  be  used  to  analyze  how  gamers  prioritize  certain  tasks 
or  goals,  in  a  nonlinear  environment.  These  tasks  are  also  useful 
in  situations  where  structured  tasks  may  confound  the  participant 
experience  or  situations  where  we  are  interested  in  elements  of 
discovery.  An  example  of  an  open-ended  task  is  as  follows. 

1. Play the game as if you were at home. The moderator will tell
you when to stop.

This  example  was  taken  from  a  usability  test  on  Halo:  Com¬ 
bat  Evolved  (2001;  Pagulayan,  Steury,  Fulton  et  al.,  2003).  Par¬ 
ticipants  were  presented  with  a  mission  with  no  instruction 
other  than  playing  as  if  they  were  at  home.  Traditional  usability 
metrics were not tracked or used. Instead, the focus was on
watching players and the tactics and strategies they employed while
playing  through  the  game.  Results  demonstrated  that  novice 
players  would  start  firing  at  enemies  as  soon  as  they  were  visi¬ 
ble,  which  was  not  how  the  designers  intended  combat  to  oc¬ 
cur.  The  design  intent  was  for  combat  to  occur  at  closer  ranges. 

By  allowing  participants  to  play  through  the  mission  with  no 
structured  task,  designers  were  able  to  detect  the  strategies  play¬ 
ers  would  employ.  As  a  result,  several  changes  were  made  to  the 
gameplay  to  encourage  players  to  engage  in  much  closer  combat 
(see  Pagulayan,  Steury,  Fulton  et  al.,  2003  for  more  details). 

Prototyping,  heuristic  evaluations,  and  empirical  guideline 
documents  are  other  techniques  we  often  use  when  more  time- 
consuming  testing  cannot  be  done.  In  practice,  these  techniques 
do  not  differ  when  used  on  games.  Nielsen  (1993)  categorized 
prototyping  and  heuristic  evaluations  as  “discount  usability  en¬ 
gineering,”  and  we  would  agree.  We  also  tend  to  view  empirical 
guideline  documents  in  a  similar  manner.  Empirical  guideline 
documents are essentially lists of usability principles for particular
content areas based on our collective experience doing user-testing
research. Examples of some of these content areas include
console game shell design, PC game shell design, PC tutorial
design principles, movement, aiming, and camera issues in
first  and  third  person  shooter  games,  and  online  multiplayer  in¬ 
terfaces.  Desurvire,  Caplan,  and  Toth  (2004)  developed  a  list  of 
heuristics  targeted  at  computer  and  video  games.  These  have 
been  broken  down  into  four  general  categories:  game  play,  game 
story,  game  mechanics,  and  game  usability.  For  the  full  list  of 
gameplay heuristics, see Desurvire et al. (2004).

Survey  Techniques 

The  use  of  surveys  has  been  explored  in  great  depth  (e.g., 
Bradburn  &  Sudman,  1988;  Couper,  2000;  Labaw,  1981;  Payne, 
1979;  Root  &  Draper,  1983;  Sudman,  Bradburn,  &  Schwarz, 
1996)  and  is  considered  a  valid  approach  for  creating  an  attitu- 
dinal  data  set  as  long  as  you  ask  questions  that  users  are  truly  ca¬ 
pable  of  answering  (see  Root  &  Draper,  1983).  We  conduct  sur¬ 
veys  in  the  lab  and  over  the  Internet  in  order  to  collect 
attitudinal  data  regarding  gameplay  experiences,  self-reported 
play  behaviors,  preferences,  styles,  and  motivations. 

Formative  Playtest  surveys.  We  combine  hands-on  play 
with  surveys  to  create  a  formative  lab  method  called  “Playtest.” 
In  nearly  all  instances,  questioning  follows  the  pattern  of  a 
forced  choice  question  (sometimes  a  set  of  forced-choice  ques¬ 
tions)  followed  by  a  more  open-ended  question  encouraging 
the  participants  to  state  the  reasons  behind  their  response.  The 
goal  of  a  Playtest  is  to  obtain  specific  information  about  how 
consumers  perceive  critical  aspects  of  a  game  and  provide  ac¬ 
tionable  feedback  to  game  designers  (Davis  et  al.,  2005).  The  ba¬ 
sic  Playtest  method  is  used  formatively  to  compare  a  product 
at time one and time two during its development, while a modified
version of the basic method is used cumulatively, as a summative
evaluation that compares the product with other, relevant games.

Playtests  have  several  advantages  and  characteristics  that  dif¬ 
ferentiate  them  from  usability  studies  and  online  survey  re¬ 
search.  For  one,  they  focus  mainly  on  perceptions.  A  larger  sam¬ 
ple  size  is  required  for  statistical  power  and  generalization  of  the 
results  to  the  target  audience.  In  most  cases,  25  to  35  partici¬ 
pants  are  used  as  a  pragmatic  trade-off  between  confidence  and 
resource  constraints.  Specialized  studies  may  require  more.  Sta¬ 
tistical  representations  of  the  attitudinal  measures  are  used  to 
identify  potential  strengths  and  weaknesses  that  need  to  further 
be  explored,  provide  information  about  the  extent  or  severity  of 
an  attitude  or  to  describe  trends  of  interest  that  come  to  light. 
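
To make the trade-off between confidence and resources concrete, the following sketch computes an approximate 95% Wilson score interval for a survey proportion at several sample sizes. It is an illustration only: the chapter does not state which statistics were used, and the function name `wilson_interval` and the 40% example proportion are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical case: about 40% of participants choose a given answer.
for n in (10, 25, 35, 100):
    low, high = wilson_interval(round(0.4 * n), n)
    print(f"n={n:3d}: interval {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

The interval narrows as the sample grows, but with diminishing returns, which is one way to read the choice of 25 to 35 participants as a pragmatic compromise.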

Conducting the studies in the lab, as opposed to online, allows
us to test in-progress games during early development. It also
allows  us  to  take  more  control  of  the  testing  situations  and  mon¬ 
itor  a  limited  amount  of  play  behaviors  that  can  be  paired  with 
the  attitudinal  data. 

On  the  other  hand,  there  are  a  few  notable  disadvantages. 
First,  structured  lab  studies  limit  the  amount  of  gameplay  that 
can  be  tested  in  a  single  study.  An  artificial  lab  scenario  or  time 
line  may  prevent  the  engineer  from  collecting  critical  informa¬ 
tion  about  the  game  experience.  This  is  a  classic  validity  trade¬ 
off  present  in  most  lab  studies.  Second,  the  amount  of  prepara- 


37.  User-Centered  Design  In  Games  •  755 


tion  that  is  required  to  design  a  Playtest  questionnaire  takes 
some  time.  The  study  plan  cannot  be  adjusted  during  the 
course  of  the  study.  The  engineer  must  be  intimately  aware  of 
the  development  team’s  questions  to  include  all  of  the  relevant 
questions  in  the  survey.  This  requires  domain  knowledge  in 
games  and  experience  in  survey  design. 

Third, the data can sometimes be difficult to interpret if
the  problem  is  not  major  or  if  the  game  facet  is  novel.  Some¬ 
times  the  data  does  not  provide  specific  causes  for  the  problem, 
making it difficult to nail down the best solution. Follow-up
studies (Playtest or usability) are often recommended (or
necessary) to parse out these issues. Finally, the approachable
nature of the data is ripe for misinterpretation by team members
who  may  not  be  familiar  with  the  current  state  of  the  game,  de¬ 
sign  details,  or  the  methods  used  in  Playtest.  It  has  been  right¬ 
fully  asserted  that  usability  tests  do  not  necessarily  require  a  for¬ 
mal  report  upon  completion  (Nielsen,  1993).  We  believe  that 
may  be  less  true  for  using  this  kind  of  technique.  Careful  and 
consistent  presentation  formats  are  required.  Instructions  and 
contextual  information  that  will  help  with  interpretation  should 
be  included,  along  with  some  form  of  qualitative  data  (e.g., 
open-ended  comments).  Metrics  from  games  in  development 
should  focus  on  problem  detection  and  often  cannot  be  used 
for  predictive  purposes. 

Development  of  the  Playtest  surveys.  Several  steps 
have  been  taken  to  increase  the  effectiveness  and  efficiency  of 
Playtest.  First,  core  questions  for  each  genre,  play  style,  and  core 
game  feature  have  been  developed,  using  a  multistep  iteration 
and  evaluation  process.  This  process  included  construct  devel¬ 
opment,  survey  item  development,  formative  evaluation  with 
subject  matter  experts,  and  formative  evaluation  of  question 
validity  using  Cognitive  Interviewing  techniques  and  statistical 
validation.  While  many  survey  items  have  been  validated,  most 
Playtests require new items to be created in order to meet the
specific  needs  of  that  particular  game  and  study  goal.  Therefore, 
a  repository  of  previously  used  customized  questions  has  been 
created,  along  with  guidelines  for  writing  new  survey  items. 

Types of Formative Studies

There  are  essentially  three  types  of  formative  studies  that 
are  conducted  during  development:  (a)  the  critical  facet  test, 
(b)  the  initial  experience  test,  and  (c)  the  extended  Playtest.  We 
also  run  tests  that  do  not  easily  fit  into  any  of  these  categories, 
including  subtle  variations  on  the  above,  a  few  cases  of  large 
sample  observationally  based  studies,  and  studies  that  focus  on 
games  from  a  more  conceptual  level. 

As  in  the  usability  section,  for  clarity  of  presentation,  each 
technique  will  be  discussed  separately,  followed  by  a  case  study. 
Each  case  study  will  only  contain  information  pertinent  to  a 
specific  technique,  thus  examples  may  be  taken  from  a  larger 
Playtest. 

Critical facet Playtest. Games often take the form of repeating
a core experience within an array of different contexts
and  constraints.  A  driving  game  is  always  about  aiming  an  object 
that  is  hurtling  through  space.  The  critical  facet  Playtest  focuses 


directly  on  that  core  experience  and  making  sure  that  it  is  fun. 
While the core experience can often be assessed in usability
testing, Playtesting is necessary to assess attitudes and
perceptions about the core experience.

The  following  example  demonstrates  how  a  critical  facet  test 
was  applied  to  the  critical  facets  of  Oddworld:  Munch’s  Oddysee 
(2001), an Xbox game. Oddworld: Munch’s Oddysee is a platform/
adventure  game  that  allows  you  to  switch  back  and  forth  be¬ 
tween  two  main  characters  as  they  proceed  through  the  in¬ 
creasingly  difficult  dangers  of  Oddworld  on  a  quest  to  save 
Munch’s  species  from  extinction.  The  core  gameplay  is  explor¬ 
ing  the  realm  by  running,  jumping,  and  swimming  through  the 
environment.  In  this  case,  there  were  concerns  about  the  user’s 
visual  perspective — which  we  typically  call  the  camera.  Does  the 
camera  support  exploration  of  the  environment? 

Case study: Munch’s Oddysee, Camera. Previous usability
testing with Oddworld: Munch’s Oddysee (2001) determined
that while some users indicated dissatisfaction with the
behavior  of  the  camera,  other  participants  chose  not  to  mention 
it  at  all  while  engaged  in  the  open-ended  usability  tasks.  The 
camera’s  behavior  was  programmed  to  create  maximal  cine¬ 
matic  effects  (e.g.,  sometimes  zooming  out  to  show  the  size  of 
an  area)  and  attempt  to  enhance  gameplay.  The  camera  would 
often  show  a  specific  view  with  the  intent  of  showing  you  what 
was  behind  the  next  door  or  on  the  other  side  of  the  wall  while 
still keeping the main character in view. While many users liked
the  look,  style,  and  behavior  of  the  camera,  users  often  wanted 
more control over the behavior of the camera. Indeed, some
participants  would  actively  say  things  such  as,  “That  looks  re¬ 
ally  cool  right  now  [after  the  camera  had  done  something  visu¬ 
ally  interesting]  but  I  want  the  camera  back  pointing  this  way 
now.” Because feedback from the usability lab contained both
positive  and  negative  feedback,  the  development  team  did  not 
see  the  usability  data  as  conclusive.  Further,  changing  the  cam¬ 
era  would  be  a  major  cost  to  the  game  in  terms  of  redesign  and 
redevelopment  time. 

After  having  played  the  game  for  an  hour,  25  participants 
were  asked  for  general  perceptions  of  the  game.  More  specific 
questions  followed.  Questions  related  to  the  camera  were 
asked  in  the  latter  portion  of  the  questionnaire  because  previ¬ 
ous  experience  in  the  usability  lab  had  shown  that  merely  men¬ 
tioning  the  camera  as  part  of  a  task  would  often  cause  partici¬ 
pants  previously  silent  on  the  subject  to  vociferously  criticize 
aspects of the camera’s behavior. With the knowledge that we
wanted  to  factor  out  any  priming-related  effects,  two  analyses 
were  conducted. 

The  first  analysis  was  based  on  the  response  to  the  questions 
related to the behavior of the camera itself. Nearly half of the
participants  (46%)  indicated  that  the  camera  did  not  give  them 
enough  flexibility  of  control. 

The  second  analysis  went  back  through  individual  responses 
to  determine  the  attitudes  of  those  participants  who  mentioned 
the  camera  before  the  survey  first  broached  the  subject.  Forty- 
three  percent  of  the  participants  were  found  to  have  mentioned 
the camera in a negative fashion prior to being asked specifically
about the camera.
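
The two analyses can be sketched as two proportions computed over the same set of responses. The record layout below is hypothetical: it assumes a researcher has already hand-coded each participant's earlier open-ended comments for unprompted negative remarks about the camera, and the four sample records are invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Response:
    participant_id: int
    # Answer to the direct (prompted) camera-control question.
    enough_camera_control: bool
    # Hand-coded flag: criticized the camera before any camera question.
    unprompted_negative_camera_comment: bool


def share(responses, predicate):
    """Fraction of responses for which predicate is true."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if predicate(r)) / len(responses)


# Invented sample records, for illustration only.
responses = [
    Response(1, False, True),
    Response(2, True, False),
    Response(3, False, False),
    Response(4, True, True),
]

prompted = share(responses, lambda r: not r.enough_camera_control)
unprompted = share(responses, lambda r: r.unprompted_negative_camera_comment)
print(f"prompted dissatisfaction: {prompted:.0%}")
print(f"unprompted negative mentions: {unprompted:.0%}")
```

Keeping the unprompted measure separate is what lets the second figure stand as evidence that is free of priming by the camera questions themselves.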

Based  on  this  data  and  other  anecdotal  evidence,  the  devel¬ 
opment  team  chose  to  give  the  players  more  flexibility  of  camera 
control.  The  result  was  more  frequent  use  of  a  camera  behavior 
we termed a third-person follow camera. The behavior of this
camera  had  the  double  advantage  of  being  more  easily  con¬ 
trolled  by  users  and  of  conforming  to  a  set  of  behaviors  more 
often  expected  by  users.  It  maintained  focus  on  the  main  char¬ 
acter  without  major  adjustments  to  point  of  view  (e.g.,  to  “look 
over  a  wall”  or  “behind  a  door”).  Other  camera  behaviors  (e.g., 
still  camera  behaviors  that  pan  with  the  character)  are  still  a 
part  of  the  game  but  have  been  localized  to  areas  where  these 
alternative  camera  behaviors  can  only  create  an  advantage  for 
the  user. 

Initial  experience  Playtest.  As  with  many  things,  first 
impressions  are  a  key  component  of  overall  satisfaction  with  a 
game.  Given  that  many  games  are  a  linear  experience/narrative, 
there  is  a  lot  of  value  in  obtaining  attitudinal  data  related  to  the 
first  portions  of  gameplay.  Lessons  learned  at  first  are  often 
applied  throughout  the  game.  Obviously,  the  later  portions  of 
the  game  will  never  be  experienced  unless  the  first  portions  of 
the  game  are  enjoyed. 

The following example explains how a set of formative initial
experience tests were run for MechCommander 2, the RTS
game  described  earlier  in  the  chapter.  The  earlier  usability  test 
focused  on  the  choices  that  users  could  make  prior  to  starting 
a  mission,  whereas  this  test  focused  on  in-game  components, 
such as users’ ability to take control of their squad and lead
them  through  battles,  while  experiencing  satisfaction  and  mo¬ 
tivation  to  continue  playing. 

Case study: MechCommander 2, Initial Missions. Twenty-five
participants, who were representative of the target market,
were  brought  onsite  and  were  asked  to  begin  playing  from  the 
first  mission  of  the  game.  After  each  mission,  participants  were 
asked  to  stop  playing  and  report  their  impressions  of  the  fun, 
the  excitement,  and  the  clarity  of  objectives.  Participants  also  in¬ 
dicated  their  comfort  level  with  the  basics  of  controlling  the 
game.  The  participants  were  able  to  play  through  as  many  as 
three  missions  before  the  session  ended.  For  purposes  of 
brevity,  this  case  study  will  focus  on  the  results  related  to  the 
first  mission. 

Although  the  first  mission  had  been  designed  to  offer  tutor¬ 
ial  elements  in  a  brief  and  fun  way,  the  experience  was  gener¬ 
ally  not  satisfying  to  participants.  Approximately  a  third  of  the 
participants  had  a  poor  initial  impression  of  the  game.  They  felt 
that  the  challenge  was  too  easy  and  that  the  mission  was  not 
exciting.  Furthermore,  there  were  some  problems  related  to 
clarity  of  goals.  By  combining  the  survey  results  with  oppor¬ 
tunistic  observation,  it  was  noted  that  some  participants  moved 
their  units  into  an  area  where  they  were  not  intended  to  go. 
This  disrupted  their  experience  and  limited  their  ability  to  pro¬ 
ceed  quickly.  Responses  to  more  open-ended  questions  indi¬ 
cated  that  some  of  the  core  gameplay  components  and  unique 
features  did  not  come  across  to  users.  Some  users  complained 
about  the  standard  (predictable  or  commonplace)  nature  of  the 
game.  Finally,  several  participants  complained  about  the  fact 
that  they  were  being  taught  everything  and  wanted  to  turn  the 
tutorial off.

A  number  of  actions  were  selected  from  team  insight  and 
user  testing  recommendations.  First,  the  team  decided  to  sepa¬ 
rate  the  tutorial  missions  from  the  required  course  of  the 
game — in  order  to  save  experienced  players  from  the  require¬ 


ment  of  completing  the  tutorial.  Second,  the  scope  of  the  first 
mission  was  expanded  so  that  users  would  have  more  time  in 
their  initial  experience  with  the  game.  Third,  the  clarity  of  ob¬ 
jectives  was  improved  via  minor  interface  changes.  Fourth,  ad¬ 
dressing  the  same  issue,  the  design  of  the  map  on  which  the  mis¬ 
sion  took  place  was  revamped  to  require  a  linear  approach  to 
the first mission that limited the likelihood of users becoming lost.
Fifth,  the  amount  and  challenge  of  combat  was  increased.  Finally, 
one  of  the  unique  components  of  the  game  was  introduced.  The 
new  mission  included  the  ability  to  call  in  support  from  off-map 
facilities.  Despite  all  these  changes,  the  mission  was  still  targeted 
to  be  completed  within  approximately  10-15  minutes. 

A  follow-up  Playtest  was  intended  to  verify  the  efficacy  of  the 
changes.  Twenty-five  new  participants  were  included  in  the 
study.  Results  from  this  test  indicated  that  the  design  changes 
had  created  a  number  of  payoffs.  Far  fewer  participants  felt  the 
mission  was  too  easy  (13%  vs.  33%),  only  3%  indicated  that  the 
mission was not exciting, measures of clarity of objectives
improved, and, surprisingly, there was no drop-off in ratings of
comfort  with  basic  controls  because  of  the  tutorial  aspects  being 
improved.  Results  were  not  a  total  success,  as  some  participants 
were  now  rating  the  mission  as  too  hard  while  others  in  re¬ 
sponse  to  open-ended  questions  complained  about  the  overly 
linear nature of the mission (e.g., there was only one way to
proceed and few interesting decisions related to how to proceed
through  the  mission).  In  response  to  these  results,  the  devel¬ 
opment  team  decided  to  retune  the  first  mission  to  make  it  a 
little  easier,  but  not  to  address  the  comments  related  to  the  lin¬ 
earity  of  the  mission.  The  second  mission  of  the  game  was  less 
linear,  and  so  it  was  hoped  that,  with  the  major  constraints  to 
enjoyment  removed,  most  participants  would  proceed  to  the 
second  mission  and  find  the  mission  goals  to  be  more  engaging. 
The second mission was rated far more fun than the first
mission, validating the development team’s assumption.

Extended  Playtests.  While  the  initial  experience  with  a 
game  is  very  important,  the  success  or  failure  of  a  game  often 
depends  on  the  entire  experience.  In  an  attempt  to  provide 
game  developers  with  user-centered  feedback  beyond  the  first 
hour  of  gameplay,  we  conduct  extended  Playtests  that  test  sev¬ 
eral  hours  of  consecutive  gameplay.  This  is  done  by  conducting 
Playtest  studies  that  run  for  more  than  two  hours,  or  by  having 
participants  participate  in  more  than  one  study.  Attitudinal  data 
is  taken  at  significant  time  or  experience  intervals  (e.g.,  after  a 
mission).  Participants  can  also  provide  self-directed  feedback 
about a specific point in the game at any time during the session.
Basic  behaviors  and  progress  (e.g.,  completed  level  4  at  12:51 
p.m.)  can  be  recorded  by  Playtest  moderators  and  used  in  con¬ 
junction  with  the  attitudinal  data. 
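
As a rough sketch of how moderator-recorded progress might be paired with attitudinal ratings, the snippet below joins a small progress log with per-segment fun ratings. The segment names, clock times, and rating scale are all hypothetical and are not taken from any actual Playtest.

```python
from datetime import datetime


def minutes_between(start, end):
    """Elapsed minutes between two same-day clock times given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Hypothetical moderator log: segment -> (start time, completion time).
progress = {
    "level_3": ("12:10", "12:31"),
    "level_4": ("12:31", "12:51"),
}

# Hypothetical survey answers collected after each segment (1 to 5 scale).
fun_ratings = {"level_3": 4, "level_4": 2}

# Pair each behavioral record with the attitudinal rating for that segment.
paired = [
    {
        "segment": segment,
        "minutes": minutes_between(start, end),
        "fun": fun_ratings.get(segment),
    }
    for segment, (start, end) in progress.items()
]

for row in paired:
    print(row)
```

A table of this kind lets a low rating be read alongside how long the segment took, which can hint at whether the problem was difficulty, pacing, or something else.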

Focus  groups.  In  addition  to  usability  and  survey  tech¬ 
niques,  we  sometimes  employ  other  user-centered  design 
methods  that  are  qualitative  in  nature.  Focus  groups  provide 
an  additional  source  of  user-centered  feedback  for  members  of 
the  project  team.  Our  Playtest  facilities  offer  us  the  opportunity 
to  precede  a  candid  user-discussion  with  hands-on  play  or  game 
demonstration.  In  this  setting,  participants  can  elaborate  on 
their  experience  with  the  product,  provide  feedback  about  the 
user-testing process, speculate about the finished version of the
product, generate imaginative feature or content ideas, or find
group  consensus  on  issues  in  a  way  that  models  certain  forms  of 
real-world  shared  judgments. 

For  example,  a  focus  group  discussion  regarding  a  sports 
game  revealed  that  users  had  varying  opinions  about  what  it 
meant  for  a  game  to  be  a  simulation  versus  an  arcade-style 
game.  Additionally,  they  had  many  different  ideas  as  to  what 
game  features  made  up  each  style  of  game.  From  this  conversa¬ 
tion,  we  learned  a  few  of  the  Playtest  survey  questions  were  not 
reliable  and  needed  revision. 

Besides  the  general  limitations  inherent  in  all  focus  group 
studies  (Greenbaum,  1988;  Krueger,  1994),  the  focus  groups  de¬ 
scribed  here  typically  include  practical  limitations  that  deviate 
from  validated  focus  group  methods.  Generally,  focus  group 
studies comprise a series of 4 to 12 focus groups (Greenbaum,
1988; Krueger, 1994); however, we typically run one to
three  focus  groups  per  topic.  This  makes  the  focus  group  more 
of  a  generative  research  tool  rather  than  a  confident  method  for 
identifying  real  trends  and  topics  in  the  target  population. 
Fewer  group  interviews  make  the  study  more  vulnerable  to  idio¬ 
syncrasies  of  individual  groups. 

A  Combined  Approach 

As  you  can  see,  we  have  provided  a  number  of  different  tech¬ 
niques  that  we  use  in  game  development  and  evaluation.  The 
presentation has been structured in such a way as to clearly
demonstrate the differences between the techniques we use. The
important thing to realize is that no one method exists
independent of other methods. The following is a brief case study on
an  Xbox  game  called  Blood  Wake  (2001)  that  demonstrates  how 
some  of  these  techniques  can  be  used  together. 

Case  Study:  Blood  Wake.  Blood  Wake  (2001)  is  a  vehicular 
combat  console  game  using  boats  and  a  combination  of  pow¬ 
erful  weapons.  Our  initial  main  concern  was  the  aiming  mech¬ 
anism  and  the  controls — users  must  be  able  to  control  their 
boat  and  destroy  the  enemy.  A  secondary  concern  was  related 
to  the  initial  missions,  goals,  gameplay,  and  general  look  and 
feel of the game.

The  first  problem  we  addressed  was  the  aiming  mechanism. 
Users  were  presented  with  two  different  types  of  aiming 
schemes:  auto-aim  and  manual-stick  aim.  In  the  auto-aim 
scheme,  the  target  reticule  automatically  snapped  to  an  enemy 
that  was  within  a  reasonable  distance  allowing  the  player  to  ma¬ 
neuver  their  boat  while  shooting  the  enemy.  In  the  manual-stick 
aim  scheme,  the  user  had  to  manually  aim  the  reticule  using  the 
right  stick  on  the  gamepad  with  no  assistance.  In  previous  us¬ 
ability  studies,  we  have  found  difficulties  with  the  manual  aim¬ 
ing  mode.  However,  this  method  is  used  quite  often  in  popular 
games,  thus  the  general  belief  was  that  it  was  better  and  pre¬ 
ferred.  Many  felt  the  auto-aim  method  would  be  too  easy. 

We  decided  to  run  a  within-subjects  design  Playtest  to  assess 
which scheme users liked better. Our results showed that 73%
of participants liked the auto-aim, whereas only 38% liked the
manual-stick  aim.  This  was  proof  enough  to  convince  the  de¬ 
velopers  to  focus  their  efforts  on  the  auto-aim  method. 
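
Because every participant rated both schemes, the two percentages come from paired responses and need not sum to 100%. One common way to compare paired yes/no answers is an exact McNemar test on the participants who liked only one scheme. The sketch below is an assumption about how such data could be analyzed: the chapter does not say which test, if any, was applied, and the response counts are invented.

```python
from math import comb


def discordant_counts(pairs):
    """Count participants who liked only one of the two schemes.

    Each pair is (liked_auto_aim, liked_manual_aim) for one participant.
    """
    auto_only = sum(1 for auto, manual in pairs if auto and not manual)
    manual_only = sum(1 for auto, manual in pairs if manual and not auto)
    return auto_only, manual_only


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of a split at least this uneven.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented responses for 20 hypothetical participants.
pairs = (
    [(True, False)] * 10   # liked auto-aim only
    + [(False, True)] * 1  # liked manual aim only
    + [(True, True)] * 6   # liked both
    + [(False, False)] * 3 # liked neither
)

auto_only, manual_only = discordant_counts(pairs)
print("auto only:", auto_only, "manual only:", manual_only)
print("exact McNemar p-value:", mcnemar_exact(auto_only, manual_only))
```

Participants who liked both schemes or neither do not affect the test; only those who preferred one over the other carry information about which scheme is favored.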

Although we solved the aiming controversy, unoptimized
navigation  controls  made  it  difficult  to  assess  the  fun  and  chal¬ 


lenge  of  the  game.  Poor  attitudinal  ratings  could  be  explained  by 
the  gameplay  or  the  poorly  tuned  boat  controls.  In  order  to  op¬ 
timize  these  controls,  we  headed  to  the  usability  labs.  There  we 
presented  participants  with  two  different  control  schemes.  In 
one  scheme,  throttle  and  steering  were  both  mapped  to  the 
left  analog  stick  (push  forward/back  for  throttle;  push  left/right 
for  steering).  This  enabled  the  user  to  drive  and  steer  with  one 
finger.  In  another  scheme,  the  throttle  was  mapped  to  the  right 
trigger,  and  steering  was  on  the  left  analog  stick.  Separating  the 
steering  from  the  throttle  would  map  more  directly  onto  famil¬ 
iar  mental  models  for  driving  and  steering  cars.  Previous  data 
indicated  that  neither  scheme  was  clearly  superior,  so  the  goal 
of  this  test  was  to  optimize  the  controls  for  both  schemes. 

Participants  were  asked  to  perform  common  maneuvers  in 
the  game.  The  following  are  examples  taken  from  the  usability 
task  list: 

1.  Once  the  game  starts,  you  should  see  a  bunch  of  boxes  that 
form  a  path  in  the  distance.  As  fast  as  you  can,  try  to  follow 
the  trail  of  items  and  collect  as  many  boxes  as  you  can  with¬ 
out  turning  around  if  you  get  off  course. 

2.  Somewhere  on  this  map,  there  is  a  straight  line  of  boxes. 
Find  the  straight  line  of  boxes  then  try  to  slalom  (like  a  skier) 
through  the  line  of  the  boxes.  Try  to  do  this  as  fast  as  you  can 
without  collecting  any  of  the  boxes. 

As  you  can  see  from  the  two  examples,  the  tasks  were  very  per¬ 
formance  based.  Participants  were  instructed  to  think  aloud 
while  we  measured  task  performance  and  task  time. 
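
A minimal sketch of how such performance measures could be summarized per control scheme and task follows. The field names, scheme labels, and numbers are hypothetical and serve only to show the shape of the summary.

```python
from collections import defaultdict
from statistics import mean


def summarize(trials):
    """Average task time and boxes collected per (scheme, task) pair."""
    grouped = defaultdict(list)
    for trial in trials:
        grouped[(trial["scheme"], trial["task"])].append(trial)
    return {
        key: {
            "mean_seconds": mean(t["seconds"] for t in rows),
            "mean_boxes": mean(t["boxes"] for t in rows),
        }
        for key, rows in grouped.items()
    }


# Invented trial records, for illustration only.
trials = [
    {"scheme": "single_stick", "task": "follow_trail", "seconds": 40.0, "boxes": 9},
    {"scheme": "single_stick", "task": "follow_trail", "seconds": 50.0, "boxes": 7},
    {"scheme": "trigger_throttle", "task": "follow_trail", "seconds": 44.0, "boxes": 8},
]

for key, stats in summarize(trials).items():
    print(key, stats)
```

In a study like the one described, numbers of this kind would be read together with the think-aloud comments, since the stated goal was to tune both schemes rather than to pick a winner.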

By  the  end  of  the  study,  we  were  able  to  provide  detailed  in¬ 
formation  to  the  team  about  how  to  optimize  the  controls.  Some 
of  the  issues  that  were  addressed  included  the  sensitivity  of  the 
steering,  controlling  boat  speed,  and  difficulties  moving  the  boat 
in  reverse.  Iteration  on  the  controls  was  followed  by  multiple 
Playtests  that  focused  on  user  perceptions  of  five  of  the  initial 
missions.  The  results  of  the  Playtest  validated  our  earlier  recom¬ 
mendations  and  allowed  us  to  tweak  the  controls  even  more. 

By  running  multiple  Playtests,  we  were  also  able  to  get  data 
and  feedback  on  gameplay.  After  each  mission,  we  elicited  feed¬ 
back  on  the  perceived  difficulty,  clarity  of  objectives,  attitudes 
about  the  environment,  and  general  fun  ratings.  This  allowed  us 
to  pinpoint  where  participants  were  running  into  problems, 
such  as  not  knowing  where  to  go  next  and  not  understanding 
how  to  complete  objectives.  This  feedback  led  to  dramatic 
changes  in  the  design  of  the  missions. 


CONCLUSION 


The  need  for  the  continued  development  of  user-centered  design 
methods  in  video  games  has  indeed  arrived.  Games  drive  new 
technologies and affect an enormous number of people. In addition,
games represent a rich space for research areas involving
technology,  communication,  attention,  perceptual-motor  skills, 
social behaviors, and virtual environments, just to name a few. It is
our  position  that  video  games  will  eventually  develop  an  intel¬ 
lectual  and  critical  discipline,  like  films,  which  would  result  in 
continually evolving theories and methodologies of game design.
The result will be an increasing influence on interface design
and evaluation. This relationship between theories of game
design  and  traditional  HCI  evaluation  methods  has  yet  to  be  de¬ 
fined,  but  definitely  yields  an  exciting  future. 

As  mentioned  earlier,  user-centered  design  methods  are  be¬ 
ginning  to  find  their  way  into  the  video  games  industry.  Com¬ 
mercial  game  companies,  such  as  Ubisoft  Entertainment,  Elec¬ 
tronic  Arts,  Inc.,  and  Microsoft  Corporation,  employ  some  level 
of user-centered methodologies in their game development
processes,  and  this  presumably  will  grow.  However,  this  is  not 
to  say  that  video  games  are  qualitatively  different  from  other 
computer  applications.  There  are  just  as  many  similarities  as 
differences to other computer fields that have already benefited
from  current  user-centered  design  methods.  Thus,  it  makes 
sense  to  utilize  these  methods  when  applicable,  but  also  to 
adapt  methods  to  the  unique  requirements  that  we  have  iden¬ 
tified  in  video  games. 

In  this  chapter,  we  emphasized  the  difference  between 
games  and  productivity  applications  in  order  to  illustrate  the 
similarities  and  differences  between  these  two  types  of  software 
applications.  We  also  chose  to  reference  many  different  video 
games  in  hopes  that  these  examples  would  resonate  with  a 
number  of  different  readers.  Case  studies  were  included  to  try 
to  demonstrate  in  practice  how  we  tackle  some  of  the  issues 
and  challenges  mentioned  earlier  in  the  chapter.  It  is  our  inten¬ 


tion  that  practitioners  in  industry,  as  well  as  researchers  in  acad¬ 
emia,  should  be  able  to  take  portions  of  this  chapter,  and  adapt 
them  to  their  particular  needs  when  appropriate,  similar  to  what 
we  have  done  in  creating  the  actual  methods  mentioned  in  this 
chapter. That said, we are upfront that most, if not all, of our
user-testing  methods  are  not  completely  novel.  Our  user-testing 
methods  have  been  structured  and  refined  based  on  a  combi¬ 
nation  of  our  applied  industry  experience,  backgrounds  in  ex¬ 
perimental  research,  and  of  course,  a  passion  for  video  games. 
This  allows  us  to  elicit  and  utilize  the  types  of  information 
needed for one simple goal: to make the best video games
that  we  can  for  as  many  people  as  possible. 
INTRODUCTION


This  is  the  age  of  the  computer.  Whether  we  use  a  personal 
computer  to  balance  our  checkbook,  operate  a  mainframe  that 
sorts  signal  intercepts  for  our  government,  or  work  the  cash 
register  at  the  local  Burger  King,  the  steady  hum  of  computer 
technology  permeates  our  daily  existence.  Comparison  shop¬ 
ping  can  now  be  done  in  seconds  online  with  a  few  mouse 
clicks,  rather  than  in  days  of  real-world  driving.  University  book¬ 
stores  now  compete  not  only  with  local  equivalents,  but  also 
with online merchants on other continents, often with favorable
consequences  to  the  wallet  and  purse.  The  cost  of  distributing 
ideas  has  reached  an  all-time  low,  forever  altering  political  dia¬ 
logue,  and  websites  have  brought  together  far-flung  communi¬ 
ties  dedicated  to  everything  from  Japanese  cartoons  to  satanic 
cults.  According  to  the  latest  available  figures  from  the  U.S.  Cen¬ 
sus,  61.8%  of  households  in  the  United  States  have  home  com¬ 
puters  and  54.7%  have  Internet  access  (U.S.  Census,  2005). 

This  prevalence  masks  a  persistent  problem,  however.  While 
the  61.8%  figure  is  impressive  from  the  perspective  of  merely  a 
decade  ago,  it  is  nonetheless  true  that  the  increase  in  comput¬ 
ers  has  not  been  uniform  across  every  subgroup  in  society  nor 
has  it  affected  all  groups  in  the  same  way.  To  the  contrary,  the 
computer  revolution  has  left  some  groups  behind.  A  person 
with  a  bachelor’s  degree  is  30%  more  likely  to  own  a  computer 
than  a  person  with  only  a  high-school  education.  A  household 
with an income of $75,000-$99,999 has a 90% chance of owning
a computer; one with an income of $25,000-$49,999 has a
67%  chance.  There  are  also  racial  differences  in  computer  own¬ 
ership.  White  and  Asian  Americans  are  over  20%  more  likely  to 
own  a  computer  than  Black  and  Hispanic  Americans  (U.S.  Cen¬ 
sus,  2005).  Moreover,  in  the  last  decade  of  the  20th  century,  the 
gap  in  computer  ownership  between  African  Americans  and 
Whites  widened.  These  differences  persist  even  when  control¬ 
ling  for  income.  It  has  been  shown  that  owning  a  computer 
leads  to  dramatic  advantages  on  academic  test  scores.  It  is  par¬ 
ticularly  interesting  that,  controlling  for  the  number  of  comput¬ 
ers  in  a  particular  household,  wealthy  Americans  and  White 
Americans  gained  even  more  of  an  advantage  than  poor  and 
minority  students  (Atwell  &  Battle,  1999). 

A  divide  also  exists  between  men  and  women,  with  women 
not  enjoying  the  benefits  of  the  technological  revolution  on  par 
with  men  (Cooper  &  Weaver,  2003).  The  difficulties  women  face 
while  using  computers  are  sweeping.  They  are  underrepre¬ 
sented  in  their  use  and  ownership  of  computers  (Pinkard,  2005; 
Wilson,  Wallin,  &  Reiser,  2003;  Yelland  &  Lloyd,  2001),  take  fewer 
technology  classes  in  high  school  and  college  (Pinkard,  2005), 
are  far  less  likely  to  graduate  college  with  degrees  in  IT  fields 
and,  most  significantly,  enjoy  interacting  with  computers  much 
less  than  do  men  (Mitra,  Lenzmeir,  Steffensmeir,  Avon,  Qu,  & 
Hazen,  2000). 

Computers  are  becoming  central  to  more  jobs  every  year. 
Current  estimates  suggest  that  by  2010,  25%  of  all  new  jobs  in
the  public  and  private  sectors  will  be  technologically  oriented
(AAUW,  2000).  However,  even  more  important,  computers  play
a  role  in  all  of  the  basic  activities  of  life,  from  banking  to  shopping
to,  increasingly,  voting.  Decades  ago,  computer  innovation
was  driven  by  the  space  program,  the  cold  war,  and  military
technology.  Now,  a  new  car’s  computer  technology  is  more
than  1,000  times  more  powerful  than  what  guided  the  Apollo 
moon  missions  (Alliance  of  Automobile  Manufacturers,  2006). 
Computers  are  inescapable.  With  all  of  this  in  mind,  it  is  a  soci¬ 
etal  problem  that  the  path  to  computer  efficacy  is  more  diffi¬ 
cult  for  the  poor,  ethnic  minorities,  and  women  (Wilson,  Wallin, 
&  Reiser,  2003). 


THE  DIGITAL  DIVIDE 


Discrimination  against  women,  at  least  in  certain  domains,  has 
deep  and  complex  roots.  Understanding  the  basis  of  such  dis¬ 
crimination  is  important  and  complex.  The  roots  of  the  digital 
divide  share  some  commonalities  with  discrimination  that 
women  have  faced  in  employment  and  professional  advance¬ 
ment  but  also  have  their  own  distinct  origins.  The  use  of  com¬ 
puters  in  the  home,  classroom,  and  workplace  is  only  a  few 
decades  old,  which  affords  us  the  opportunity  to  gain  a  glimpse 
at  the  genesis  of  the  particular  problem  of  the  gender  divide  in 
information  technology. 

In  the  late  1970s,  computers  began  to  replace  television  as 
the  technological  innovation  in  the  classroom.  By  the  1980s 
they  were  ubiquitous  in  education  and  on  their  way  to  becoming
a  fixture  in  most  households.  In  this  context,  Wilder,  Mackie,
and  Cooper  (1985)  surveyed  school  children  to  assess  their  at¬ 
titudes  toward  computers.  They  found  a  large  difference  in  the 
degree  to  which  boys  and  girls  were  attracted  to  the  computer. 
As  early  as  kindergarten,  boys  indicated  more  positive  attitudes 
about  computer  technology  than  girls.  These  small  attitudinal 
differences  became  dramatic  in  the  fifth  grade  and  continued  to 
grow  through  the  middle-  and  high-school  years  (Wilder,  Mackie, 
&  Cooper,  1985).  Computer  use  had  just  begun  to  spread  into 
the  mainstream  of  public  life,  and  numerous  explanations  for 
the  difference  were  considered.  Wilder  et  al.  hoped  that  the 
gender  differences  with  regard  to  computers  were  either  an  artifact
of  the  particular  geographic  area  studied  in  the  investigation
or  something  that  would  diminish  as  technology  became
more  widely  accessible.  It  was  easy  to  hope  that  the  problem 
would  fix  itself  in  those  days;  public  education  is  a  great  equalizer. 

This  was  not  to  be  the  case.  Disturbing  effects  discovered  in 
the  1980s  persisted  into  the  1990s.  The  clearest  data  were  not
on  the  question  of  usage,  but  on  anxiety.  In  a  host  of  domains, 
both  young  girls  and  older  women  reported  that  computers  are 
not  creators  of  fun  and  amusement  but  rather  the  source  of  ap¬ 
prehension.  Weil,  Rosen  and  Sears  (1987)  reported  that  about 
1  in  3  adults  in  the  United  States  experienced  what  they  called 
“computerphobia” — adverse  anxiety  reactions  to  the  use  of 
computers.  Dambrot  and  her  colleagues  were  among  the  first  to
investigate  the  imbalance  in  computer  anxiety  as  a  function  of 
gender.  They  found  that  female  college  students  expressed  con¬ 
siderably  more  anxiety  about  computers  than  did  their  male 
counterparts  (Dambrot,  Watkins-Malek,  Silling,  Marshall,  & 
Carver,  1985;  see  also  Temple  &  Lips,  1989).  This  finding  was 
replicated  frequently  throughout  the  1990s  (e.g.,  Colley,  Gale,
&  Harris,  1994;  Todman  &  Dick,  1993).  In  the  late  1990s,  these 
differences  between  males  and  females  were  as  ubiquitous  as 
they  were  in  the  1980s,  with  females  from  elementary  school 
grades  to  university  graduates  expressing  greater  anxiety  and
negative  attitudes  (Brosnan,  1998;  Whitley,  1997). 

Now,  in  the  first  decade  of  the  21st  century,  there  are  some 
promising  signs  that  the  gender  gap  in  computer  technology  may 
be  weakening.  The  U.S.  Census  showed  a  marked  increase  in  computer
use  by  women,  especially  in  the  use  of  the  Internet  and
e-mail  and  in  the  workplace.  Nonetheless,  despite  the  increased 
use,  women  continue  to  lag  behind  men  in  feelings  of  compe¬ 
tence  with  the  computer.  They  also  continue  to  suffer  greater 
anxiety  about  using  information  technology  and  have  fewer  pos¬ 
itive  attitudes  about  working  and  playing  with  the  computer  than 
do  men  (Colley  &  Comber,  2003;  Schumacher  &  Morahan-Martin, 
2001).  Surveying  school-age  children  and  comparing  their  re¬ 
sponses  to  those  collected  more  than  a  decade  ago,  Colley  and 
Comber  (2003)  found  that  girls’  interest  in  computer  applications 
improved,  but  that  girls  continue  to  like  the  computer  less  than 
boys  do.  When  given  a  chance  to  use  computers  in  the  volun¬ 
tary  world  outside  of  school,  girls  use  the  computers  less  fre¬ 
quently  than  do  boys.  Similarly,  Mucherah  (2003)  recently  re¬ 
ported  that  teenage  girls  feel  far  less  involved  with  computers 
and  enjoy  them  less  than  boys  of  comparable  ages.  At  Princeton 
University,  researchers  asked  incoming  college  students  about 
their  reactions  to  computers  (Cooper  &  Weaver,  2003).  Despite 
having  a  highly  capable  and  academically  accomplished  sample, 
they  found  that  the  young  women  were  far  less  confident  of  their 
ability  with  computers  than  were  the  young  men.  The  incoming 
female  undergraduates  reported  feeling  significantly  less  com¬ 
fortable  with  computers  than  the  men  did,  even  though  most  of 
them  had  taken  computer  classes  in  their  high  schools  and  more 
than  80%  of  them  had  taken  higher-level  mathematics,  including
calculus.  That  any  differences  were  seen  in  such  circumstances
is  very  discouraging,  and  the  effects  were  not  small.

Those  same  researchers  also  asked  incoming  students  to 
imagine  that  they  were  going  to  take  a  course  in  psychological 
statistics.  They  presented  the  following  question:  Suppose  that 
you  were  asked  to  complete  a  statistics  homework  assignment 
on  the  computer.  How  comfortable  would  you  feel  in  doing  that 
assignment?  These  highly  capable  students  again  differed  based 
on  gender.  Men  felt  that  they  would  be  comfortable  complet¬ 
ing  the  assignment  while  women  felt  uncomfortable  (Cooper 
&  Weaver,  2003).  Therefore,  the  lack  of  confidence  just  noted 
is  not  merely  an  abstract  concept.  Even  in  the  context  of  a  spe¬ 
cific  example,  women  were  just  not  as  sure  of  their  abilities  as 
men  were.  We  can  easily  imagine  that  the  difference  might  have 
been  even  greater  had  4  out  of  5  of  the  women  not  already  com¬ 
pleted  courses  in  calculus. 

The  digital  divide  is  a  worldwide  problem.  Much  of  the  pre¬ 
vious  research  was  conducted  in  the  United  States.  Other  studies 
from  Western  Europe  and  other  highly  developed  countries 
show  similar  effects.  Data  have  been  gathered  in  Great  Britain 
(Colley,  Gale,  &  Harris,  1994),  Australia  (Okebukola  &  Woda, 
1993),  Canada  (Temple  &  Lips,  1989),  and  Spain  (Farina,  Arce,
Sobral,  &  Carames,  1991),  always  with  the  same  result.  In  a  review
of  this  literature  for  the  International  Association  for  the  Evalua¬ 
tion  of  Educational  Achievement,  Reinen  and  Plomp  (1997)  con¬ 
cluded  that,  “concern  about  gender  equity  is  right.  .  .  .  Females 
know  less  about  information  technology,  enjoy  using  the  com¬ 
puter  less  than  male  students  and  perceive  more  problems  with 
.  .  .  activities  carried  out  with  computers  in  schools”  (p.  65). 


As  interest  in  this  issue  has  intensified,  additional  international
data  have  led  to  the  same  conclusion.  Recent  data  reported
from  Romania  (Durndell  &  Haag,  2002),  Egypt  (Abdelhamid,
2002)  and  Italy  (Fabio  &  Antonietti,  2002),  for  example,
continue  to  show  the  persistence  of  the  digital  divide  in  a  wide 
array  of  educational  systems  around  the  globe.  Although  there 
are  some  exceptions  (e.g.,  Solvberg,  2002,  in  Norway),  gender
differences  have  been  remarkably  durable. 

What  the  Digital  Divide  Is  Not 

The  gender  gap  is  less  about  total  hours  using  a  computer  than 
about  using  a  computer  voluntarily  for  enjoyment  and  comfort 
with  information  technology.  In  schools,  it  is  not  about  total 
number  of  hours  spent  in  front  of  the  computer  screen,  but  rather
the  interference  of  computer  anxiety  with  the  ability  and  ex¬ 
citement  to  learn.  In  the  workplace,  the  digital  divide  is  not 
about  the  magnitude  of  use,  but  rather  about  women’s  reac¬ 
tions  to  the  technology  with  which  they  interact.  It  is  about  their 
comfort,  attitudes,  and  levels  of  anxiety.  Women  use  computers 
at  their  jobs  more  than  men  do.  The  use  of  the  computer  as 
typewriter  and  cash  register,  for  example,  necessarily  requires 
human-computer  interaction  (HCI)  in  the  workplace,  and  with
women  holding  far  more  service  and  administrative  support 
jobs  than  men,  their  computer  use  is  relatively  high.  In  fact,  in 
2003,  63%  of  women  used  computers  for  their  jobs,  whereas
only  51%  of  men  did  so.

Exposure  in  the  workplace  and  in  the  school  has  not  ended 
the  disparity  between  men  and  women  in  terms  of  their  levels 
of  comfort  using  a  computer,  attitudes  about  computers,  and 
willingness  to  use  computers  in  contexts  in  which  computer  use 
is  not  required.  Especially  in  educational  settings,  anxiety  with 
using  computers  can  not  only  result  in  a  feeling  of  discomfort, 
but  also  can  lead  to  less-adequate  performance  with  the  com¬ 
puter  and  the  material  that  was  supposed  to  be  learned  more 
enjoyably  and  efficaciously  with  computer  technology. 


UNDERSTANDING  THE  ROOTS 
OF  THE  DIGITAL  DIVIDE 


The  digital  divide  is  not  caused  by  lack  of  use,  nor  is  it  due  to 
differences  in  economic  status,  social  class,  or  heredity.  We 
also  assume  that  differences  based  on  biological  sex  play,  at 
most,  a  negligible  role  in  accounting  for  the  differences.  Rather, 
we  see  the  different  reactions  to  information  technology  to 
be  rooted  in  the  socialization  of  boys  and  girls  as  they  learn 
to  cope  with  the  social  constructions  that  form  the  norms, 
rules,  and  expectations  for  their  gender.  As  a  heuristic  guide  to 
understanding  the  digital  divide,  we  propose  a  model  that  describes
a  series  of  factors  whose  result  is  differential  attitudes
and  differential  comfort  levels  with  the  use  of  computers  in 
contemporary  society. 

The  model,  which  we  will  describe  in  more  detail  in  the  fol¬ 
lowing  sections,  takes  as  its  starting  point  the  idea  that  there 
exist  in  our  social  world  entrenched  stereotypes  of  the  behaviors
and  attitudes  that  are  appropriate  for  children  and  adults
of  each  gender.  Boys  are  supposed  to  be  more  eager  to  play
with  computers  than  girls.  The  most  important  consequence 
of  this  stereotype  is  that  girls  will  experience  more  anxiety  when 
playing  with,  or  learning  from,  computers;  thus  making  it  diffi¬ 
cult  for  them  to  have  pleasant  and  successful  computer  inter¬ 
actions.  This  will  happen  whether  or  not  girls  accept  the  stereo¬ 
type  as  valid.  Girls  who  accept  the  stereotype  as  valid  will  be 
harmed  by  what  is  referred  to  as  the  “self-fulfilling  prophecy” 
(Merton,  1948;  Rosenthal  &  Jacobson,  1968).  Ironically,  girls 
who  do  not  believe  that  the  stereotype  is  true  will  nonetheless 
experience  anxiety  with  computers  because  of  the  phenome¬ 
non  known  as  “stereotype  threat”  (Steele  &  Aronson,  1995). 
The  mere  knowledge  that  the  stereotype  exists  and  other  mem¬ 
bers  of  society  believe  it  sets  in  motion  processes  that  lead  to 
confirmation  of  the  stereotype.  As  the  model  shows,  a  girl  who
knows  that  there  is  a  stereotype  predicting  poor  computer  com¬ 
petency  on  her  part  will  experience  more  computer  anxiety 
and,  in  the  end,  poorer  performance  and  more  negative  atti¬ 
tudes  about  computers.  This,  in  turn,  will  lead  to  anxiety  and  a 
greater  chance  of  failure. 

Our  model  also  shows  that  different  attributional  patterns 
for  boys  and  girls  contribute  to  the  cycle  that  perpetuates  the 
digital  divide.  Because  of  the  different  interpretations  that  boys 
and  girls  are  taught  with  regard  to  success  in  achievement  do¬ 
mains,  the  stereotype  about  the  relation  of  gender  to  computer 
use  may  become  reinforced  and  more  resistant  to  change.  As 
Fig.  38.1  suggests,  the  dilemma  is  a  self-reinforcing  cycle  in 
which  boys,  typically  to  their  advantage,  and  girls,  to  their  dis¬ 
advantage,  become  enveloped  in  the  veil  of  the  gender  stereo¬ 
type  for  computing. 

In  the  Beginning 

Undeniably,  gender  stereotypes  abound.  Like  most  stereotypes, 
they  were  created  by  society  over  an  extended  time,  and  even
though  they  are  now  undesirable,  they  resist  being  dismantled.
Regardless  of  whether  or  not  they  are  true,  stereotypes
have  a  dramatic  impact  on  behavior.  For  example,  in  most
western  societies,  we  share  common  societal  expectations 
about  the  toys  boys  and  girls  are  supposed  to  play  with.  We  do 
not  expect  to  see  the  war  characters  in  our  favorite  toy  store
sharing  shelf  space  with  dress-up  dolls  and  doll  strollers.  To  the
contrary,  we  expect  that  the  warriors  will  be  near  the  cars,  trains,
and  space  heroes.  The  dolls  will  be  near  the  carriages,  play
houses,  and  play  schools.  Boys  will  find  their  toys  in  the  former
section;  girls  in  the  latter.  In  reality,  of  course,  there  is  gender
overlap  such  that  some  boys  find  their  favorite  toy  in  the  section
with  dolls  and  carriages.  And  many  girls  are  in  toy-heaven  when
confronted  by  the  cars  and  trucks.  However,  in  general,  there
is  a  strong  effect  for  gender  stereotyping  of  toys,  based  on,  and
reinforced  by,  adult  expectations  of  what  will  be  interesting
and  pleasing  to  boys  and  girls.

FIGURE  38.1.  A  logical  model  of  digital  divide.

Gender  stereotypes  abound  in  the  classroom  as  well.  Re¬ 
gardless  of  whether  or  not  the  stereotypes  are  true,  we  see 
mathematics,  science,  and  technology  as  the  province  of  boys 
more  than  girls.  Girls  write  well  and  are  interested  in  literature 
and  poetry.  Computers  are  the  bedrock  of  information  technol¬ 
ogy  and,  as  we  have  seen  from  surveys  of  children  and  adults  de¬ 
scribed  above,  in  established  democracies  and  developing  na¬ 
tions,  we  have  a  similar  stereotype  about  who  enjoys  and 
benefits  from  computers.  It  is  not  immediately  apparent  why 
gender  stereotypes  that  developed  for  computers  became  nearly
identical  to  stereotypes  about  science  and  mathematics.  Although
the  algorithms  that  comprise  computer  software  are
complex  and  mathematically  sophisticated,  and  computers 
burst  into  our  consciousness  in  large-scale  space  and  science 
ventures,  most  computer  users  do  not  interact  with  computers 
at  that  level.  A  screen,  keyboard  and  mouse  pad  form  the  basis 
of  the  interfaces  that  most  people  have  with  computers.  Why 
did  the  use  of  computers  become  associated  with  gender? 

The  answer  to  that  question  is  multidetermined  and  a  full 
analysis  is  beyond  the  scope  of  the  current  chapter.  However, 
we  can  isolate  one  of  the  causes  of  the  gender  stereotype  in  the 
introduction  of  the  computer  into  the  educational  system.  The 
classroom  is  a  ubiquitous  melting  pot  and  its  influence  on  chil¬ 
dren’s  attitudes  is  profound.  When  educators  first  looked  to 
computers  to  supplement  their  normal  educational  methods, 
they  made  an  understandable,  though  fundamental,  error.  They 
drew  their  inspiration  from  the  world  of  the  video  game  and 
video  arcade.  The  best  examples  of  popular  computer  games 
in  the  1980s  were  not  the  increasingly  rich  variety  currently 
available,  but  the  far  less  diverse  sampling  of  the  video  arcade 
and  the  early  Nintendo  and  Sega  gaming  systems.  While  these 
programs  drew  their  contexts  from  a  multiplicity  of  domains, 
everything  from  medieval  combat  to  futuristic  space  adven¬ 
tures,  what  most  had  in  common  was  an  emphasis  on  competi¬ 
tive  responding.  As  games  grew  more  elaborate,  story  lines  were 
increasingly  incorporated  to  keep  children’s  interest  in  space 
adventures,  sports,  and  battles. 

Educators  have  always  searched  for  ways  to  make  learning 
more  efficient  and  more  enjoyable.  That  computers  can  give 
students  an  interactive  experience  makes  them  an  obvious  and 
attractive  addition  to  the  classroom.  Computer-software  manu¬ 
facturers  turned  out  hundreds  of  programs  designed  to  assist 
teachers  in  delivering  instruction  in  every  discipline  from  art  to 
zoology.  They  most  likely  contemplated  how  to  design  such  pro¬ 
grams  and  wondered  what  children  wanted.  One  thing  that  was 
obvious  at  this  time  was  that  children  would  rush  to  finish  (or
ignore  entirely)  their  homework  for  a  chance  to  hit  the  arcade.
Video-game  designers  were  posting  large  profits  and  the  growth
of  their  industry  seemed  to  be  limited  only  by  rapidly  disappearing
technical  issues.  If  children  would  voluntarily  spend
hours  navigating  the  story  line  of  King’s  Quest,  it  seemed  fair  to
assume  that  instruction  delivered  in  the  same  game-like  format 
would  be  popular. 

What  went  unnoticed  was  that  predominantly  boys  visited  arcades
and  spent  hours  playing  their  favorite  video  games.  Turning
classrooms  into  video  arcades  by  adopting  software  that  resembles
video  games  might  have  been  attractive  to  most  young
boys,  but  it  presented  a  problem  for  girls.  Every  benefit  that  was 
gained  among  video-game  players  by  making  learning  software 
in  the  image  of  video  games  was  a  deficit  to  those  who  did  not
relate  to  or  enjoy  such  games. 

One  of  the  earliest  analyses  of  the  issue  of  computer  tech¬ 
nology  in  the  classroom  was  that  of  Lepper  and  Malone  (1987). 
They  asked  girls  and  boys  what  they  liked  about  computers.  The 
responses  of  boys  matched  intuitions.  Boys  liked  activities  that 
were  in  the  form  of  games.  Story-lines  featuring  sports,  war,  and 
space  were  popular.  Boys  liked  eye-hand  coordination  and  com¬ 
petition.  They  preferred  their  feedback  in  the  form  of  flashing 
lights,  blaring  sounds,  and  explosions.  Girls  were  different.  They 
disliked  competitive  programs,  the  videogame  emphasis  on  re¬ 
flexes,  and  the  dramatic  end  games.  What  they  wanted  was  bet¬ 
ter  learning  tools.  Frequent  and  clear  feedback,  preferably  not  in 
the  form  of  explosions,  was  also  favored.  Girls  appreciated  com¬ 
puters  as  learning  tools,  but  found  the  game  focus  very  dis¬ 
heartening.  With  computer-assisted  instruction  (CAI)  programs 
like  Word  Invasion,  Demolition  Division,  and  Slam  Dunk  Math, 
it  is  clear  that  boys  were  more  precisely  targeted  than  girls. 
Chappel  (1996)  reported  that  in  the  real  worlds  of  education 
and  business,  programs  overwhelmingly  favor  male  identifica¬ 
tion  and  male  interests.  Yet  such  programs  have  been  one  of  the 
major  ways  in  which  children  are  introduced  to  computers.  The 
grand  attempt  to  “make  learning  fun”  has  been  premised  on  an 
unfortunate,  gender-biased  definition  of  the  word  “fun”  that  is 
not  widely  applicable. 

A  Designer's  Bias 

Why  is  there  such  a  strange  disconnect  between  program 
designers  and  female  users?  (See  Lynn,  Raphael,  Olefsky,  & 
Bachen,  2003  for  a  discussion.)  There  are  many  possibilities. 
Men  are  more  interested  in  computer  programming  than 
women,  and  one  could  say  that  male  programmers  are  more 
likely  to  produce  male-focused  software,  but  that  is  both  un¬ 
charitable  and  hard  to  test.  Of  far  greater  interest  is  the  possi¬ 
bility  that  software  designers  implicitly  assume  that  their  users 
will  be  male  and  tailor  their  work  accordingly.  In  everyday  com¬ 
munication,  we  often  adapt  our  tone  to  our  audience.  If  we 
think  our  audience  is  hostile,  we  are  likely  to  act  in  a  more  hos¬ 
tile  manner  ourselves  (Snyder  &  Swann,  1978).  Similarly,  if  a 
male  college  student  believes  that  the  female  he  is  speaking  to 
is  attractive,  he  is  likely  to  act  in  a  friendly  and  inviting  manner 
(Snyder,  Tanke,  &  Berscheid,  1977).  In  HCI,  a  software  designer 
is  interacting  with  a  communication  partner,  albeit  a  strange 
one.  In  the  mind’s  eye  of  the  programmer  is  an  eventual  user. 
Someone  is  sitting  at  the  front  end  of  the  computer  screen,  answering
questions,  and  interacting  with  the  program.  But  who  is
that  user?  It  is  possible  that  the  software  designer  uses  stereotypical
beliefs  to  portray  the  most  likely  consumer  for  whom
he  or  she  is  preparing  an  educational  or  entertainment  product. 
In  this  view,  computer  programmers  write  software  as  though 
they  are  communicating  with  boys  because  their  automatic  rep¬ 
resentation  of  the  gender  of  the  user  is  male. 

Huff  and  Cooper  (1987)  investigated  the  possibility  that  the 
gender  of  the  typical  user  influences  the  communication 
process  and,  accordingly,  the  characteristics  of  the  software  that 
is  produced.  They  asked  teachers  in  the  New  Jersey  public 
schools  to  design  software  to  help  seventh-grade  children  learn 
the  appropriate  use  of  commas.  The  teachers  were  given  one 
of  three  different  instructions.  One  group  of  teachers  was  asked 
to  design  the  software  for  seventh-grade  boys,  another  for  seventh-grade
girls,  and  the  third,  most  interestingly  of  all,  was
just  told  to  tailor  their  work  for  seventh-grade  “students.”  These 
teachers  were  all  well  versed  in  the  likes  and  dislikes  of  seventh 
graders.  Their  designs  were  always  fascinating  and  the  two  gen¬ 
der-specific  conditions  produced  the  kind  of  program  concepts 
that  Lepper  and  Malone  (1987)  would  have  expected.  In  de¬ 
scribing  her  boy-directed  idea,  one  teacher  wrote: 

here  is  an  opportunity  to  enjoy  the  world,  do  sports,  and  learn  English 
grammar  at  the  same  time.  Your  child  will  enjoy  shooting  cannons  and 
competing  for  the  highest  score.  After  playing  with  this  program,  your
child  will  use  commas  in  a  natural  and  correct  manner.

The  group  writing  for  girls  had  no  difficulty  in  guessing  the 
right  features  to  include.  A  typical  response  was  expressed  by 
one  teacher  when  she  described  her  program  as,  “Two  girls  go 
on  a  shopping  trip  to  a  record  shop  to  find  music  for  a  dance  be¬ 
ing  given  at  school.  They  converse  with  each  other  and  make 
decisions  about  what  to  buy.  The  use  of  commas  and  rules  in¬ 
volved  are  taught  through  this  trip.  Reinforcement  is  available  in 
worksheet  form.”  The  activity  is  social,  not  military,  and  lacks 
the  boyish  embellishments  that  girls  find  anxiety  inducing. 

The  teachers  recognized  that  they  needed  to  write  vastly  dif¬ 
ferent  programs  to  motivate  students  of  different  genders. 
While  their  results  appear  to  rely  heavily  on  gender  stereotypes, 
they  do  recognize  the  problem  and  produce  programs  that  are 
appropriate  to  the  point  of  caricature.  Based  solely  on  these 
conditions,  one  would  expect  that  teachers  would  be  able  to 
find  a  happy  medium  when  working  for  “students.” 

The  “student”  condition,  however,  is  precisely  where  the 
problem  arises.  Teachers  designing  without  a  specified  gender 
did  not  assume  that  half  their  users  would  be  male  and  half 
would  be  female.  Their  answers  were  resoundingly  like  the  pro¬ 
grams  that  had  been  written  for  boys  and  nothing  like  the  pro¬ 
grams  that  had  been  written  when  girls  were  the  focus  of  atten¬ 
tion.  A  typical  description  in  this  condition  began,  “Here’s  a 
fast-paced  program  for  your  arcade-game  lovers.  Just  what  the 
teenager  spends  his  quarters  on!  ....  Sentences  zip  across  the 
screen — some  correctly  punctuated  with  commas,  some  not. 
Correct  sentences  are  “zapped”  off  the  screen  by  your  students 
as  they  try  to  be  on  the  roster  of  top  scorers.” 

Programs  that  most  teachers  wrote  for  “students”  were 
nearly  identical  to  programs  that  other  teachers  wrote  for  boys. 
All  of  the  programs  were  coded  and  assessed  by  independent 
raters  and  then  subjected  to  a  multidimensional  scaling  analysis. 
The  results  showed  that  the  programs  written  for  students  were
statistically  indistinguishable  from  the  programs  written  for  boys 
on  a  dimension  that  ranged  from  “learning  tool”  to  “toy.”  Both 
were  markedly  on  the  “toy”  end  of  the  dimension.  Programs 
written  when  girls  were  the  focus  of  attention  were  written  as 
learning  tools  and  were  significantly  different  from  both  the  boy 
and  student  programs  (Fig.  38.2). 

Greater  Anxiety  and  Poorer  Performance

Even  though  it  was  clear  that  most  educational  programs  were 
designed  with  the  boy  definition  of  fun  in  mind,  it  still  remained 
to  be  seen  what  effect  these  programs  had  on  girls  compared 
to  more  gender-neutral — though  far  less  common — alterna¬ 
tives.  Cooper,  Hall,  and  Huff  (1990)  examined  this  issue  using 
Demolition  Division,  a  program  intended  to  teach  division  in  a 
stereotypically  boy  manner  by  employing  war-related  imagery, 
competition,  and  eye-hand  coordination.  Its  manufacturer  de¬ 
scribed  it  as  “an  opportunity  to  practice  the  division  of  problems 
[sic]  in  a  war  game  format.  Tanks  move  across  the  screen  as 
guns  from  bulkheads  are  fired  by  the  students  as  he  answers  the 
problem.  Hits  and  misses  (correct  and  incorrect  answers)  are 
recorded  at  the  bottom  of  the  screen.”  The  researchers  had 
middle-school  boys  and  girls  learn  division  with  either  this  program
or  another  one,  Arithmetic  Classroom,  a  CAI  program  that
lacked  all  of  the  features  of  a  stereotypically  boy-focused  game 
but  taught  essentially  the  same  information.  The  students 
worked  with  the  program  for  several  minutes  in  a  computer 
cluster  in  their  school.  Following  the  exercise,  the  children  filled 
out  a  questionnaire  assessing  their  liking  for  the  CAI-learning 
program  as  well  as  their  level  of  anxiety  and  stress.  When  they 
returned  to  the  classroom,  their  ability  to  perform  division 
problems  was  assessed. 

FIGURE  38.2.  Level  of  computer  anxiety  after  learning  from  a
male-oriented  (Demolition  Division)  and  Control  (Arithmetic
Classroom)  Computer  Assisted  Learning  program.  (Source:  Huff
&  Cooper).

The  data  showed  that  girls  liked  the  Demolition  Division  program
less  than  boys  did,  and  the  girls  also  were  considerably
more  anxious  at  the  conclusion  of  the  CAI  lesson.  In  addition,  the
level  of  anxiety  was  negatively  correlated  with  performance:  the
more  anxious  the  student,  the  less  she  had  learned.  In  Arithmetic
Classroom,  however,  the  results  were  quite  different.  Remember
that  this  program  had  no  competitive  elements,  no  eye-hand
coordination  tasks,  and  no  war-story  plot  line.  The  anxiety  levels 
of  girls  using  this  program  were  not  any  higher  than  those  of 
boys.  Girls  felt  quite  comfortable  and  experienced  slightly  less 
anxiety  than  did  the  boys.  The  results  are  shown  in  Fig.  38.3. 
When  the  computer  software  had  the  formal  elements  that  boys 
enjoy,  girls  showed  the  typical  pattern  of  the  digital  divide:  They 
experienced  stress  and  anxiety  and,  consequently,  did  not  per¬ 
form  as  well.  There  was  no  evidence  for  the  gender  divide  when 
the  program  had  the  formal  features  that  girls  enjoy. 

Nearly  a  decade  after  Cooper  et  al.,  another  group  of  re¬ 
searchers  found  a  similar  effect  in  a  more  elaborate  computer 
game.  In  this  case,  the  variable  of  concern  was  not  flashing  lights 
or  explosions,  but  identification.  Littleton,  Light,  Joiner,  Messer, 
and  Barnes  (1992)  worried  that  many  of  the  storyline  CAI  games 
did  not  include  characters  that  girls  could  relate  to.  These  researchers
targeted  a  popular  CAI  game  called  King  and  Crown,
which  taught  a  series  of  spatial  reasoning  skills  in  an  adventure 
format.  The  characters  in  the  game,  however,  were  primarily 
warriors  and  the  game  was  aggressive.  Boys  learned  the  skills 
necessary  for  the  game  and  fully  succeeded  in  the  adventure  ap¬ 
proximately  50%  of  the  time.  Girls  were  successful  only  8%  of 
the  time.  Littleton  et  al.  found  that  the  male-oriented  world  of 
King  and  Crown  caused  the  girls  to  disidentify,  become  anx¬ 
ious,  and  withdraw.  Yet,  the  problem  was  with  the  context,  not 
the  content.  When  performance  was  examined  in  another  pro¬ 
gram,  one  that  taught  the  same  skills  in  a  gender-neutral  con¬ 
text,  girls  and  boys  performed  equally  well.  Now,  50%  of  both 
genders  completed  the  game.  Taken  together,  the  Cooper  et  al. 
(1990)  and  Littleton  et  al.  studies  suggested  that  girls  are  not 
innately  inferior  at  learning  through  computer  programs.  The 
problem  is  in  the  design  of  the  game.  Programs  rich  in  boy- 
favorable  elements  are  not  efficient  learning  tools  for  girls;  their 
anxiety  level  increases  while  their  interest  and  performance  de¬ 
crease.  Also  important  is  that  boys  do  not  do  worse  on  the  pro¬ 
grams  that  allow  girls  to  do  better. 

FIGURE  38.3.  Results  of  multi-dimensional  scaling  depicting
programs  written  by  teachers  for  instructing  boys,  girls  and  students.
The  arrows  depict  the  central  tendency  of  each  condition.
(Source:  Huff  &  Cooper).

The  problem  is  not  one  confined  to  young  children  in  the
lower  grades.  The  problem  may  start  there,  but  it  persists  into
adulthood.  Surveys  have  revealed  negative  attitudes  and  higher
computer  anxiety  among  females  in  college,  the  workplace,  and
among  retirees  (Zhan,  2005).  Experimental  data  collected  with 
college  students  also  support  the  negative  effects  of  male- 
favored  software  on  women’s  levels  of  computer  anxiety.  For  example,
in  a  study  by  Robinson-Staveley  and  Cooper  (1990),  men
and  women  students  at  Princeton  University  played  the  game  of 
Zork,  in  which  players  compete  to  find  a  buried  treasure  in  an 
adventure-game  format.  Women  reported  a  high  degree  of 
stress  while  playing  the  game  and,  in  turn,  performed  poorly. 
Male  students,  on  the  other  hand,  performed  considerably  bet¬ 
ter  and  did  not  experience  computer  anxiety. 

Another  interesting  finding  in  the  Robinson-Staveley  and 
Cooper  (1990)  study  is  that  the  social  context  of  computing  has 
a  substantial  effect  on  the  experience  of  computer  anxiety.  In 
their  research,  gender  differences  in  performance  occurred 
only  in  the  presence  of  other  people.  If  the  students  were  asked 
to  solve  the  Zork  adventure  in  complete  privacy,  the  women  did 
well  (better  than  the  men)  and  experienced  only  a  little  com¬ 
puter  anxiety.  Similarly,  middle-school  girls  in  the  Demolition 
Division  study  did  not  experience  more  computer  anxiety  than 
boys  if  they  worked  with  the  CAI  program  without  the  presence 
of  others  (Cooper  et  al.,  1990). 

The  social  context  also  matters  when  the  gender  composi¬ 
tion  of  learning  groups  is  considered.  Girls  learning  in  the  pres¬ 
ence  of  boys  suffer  from  increased  computer  anxiety  and  learn 
less.  Light,  Littleton,  Bale,  Joiner,  and  Messer  (2000)  had  boys 
and  girls  work  with  a  mildly  competitive  problem-solving  game 
in  which  the  players’  task  was  to  reach  a  geographical  location 
without  being  captured  by  monsters.  The  children  worked  in 
groups  of  two,  either  same-sex  or  mixed-sex  dyads.  Light  et  al.
(2000)  found  that,  overall,  boys  performed  better  than  girls  in 
this  game.  However,  in  same-sex  dyads,  the  difference  in  per¬ 
formance  was  small.  In  mixed-sex  dyads,  the  difference  was  en¬ 
hanced.  Boys’  performance  was  markedly  improved  relative  to 
their  performance  in  the  same-sex  group  while  girls’  perfor¬ 
mance  showed  significant  decrements. 

Nicholson,  Gelpi,  Young,  and  Sulzby  (1998)  examined  the  in¬ 
teractions  that  occurred  when  first-grade  girls  and  boys  were 
asked  to  work  together  on  a  computer  task.  They  found  that, 
in  mixed-gender  groups,  girls  were  more  likely  to  have  their  competence
and/or  their  work  criticized  or  laughed  at  than  were
girls  working  with  other  girls.  It  is  unsurprising  that  taking  an  already
somewhat  anxious  person  and  putting  her  in  a  position
where  she  may  be  subject  to  ridicule  or  competition  worsens
performance,  but  it  is  still  an  important  consideration.  While  the
centrality  of  social  context  in  these  effects  relates  more  to  us¬ 
age  environment  than  program  design,  these  findings  under¬ 
score  the  socially  constructed  nature  of  the  gender  gap.  In  a 
more  friendly  and  accepting  environment,  girls  can  prosper.

What  we  have  seen  through  the  analysis  of  patterns  in  educational
software  is  just  part  of  one  path  through  the  model  we
propose.  Gender  stereotypes  have  led  to  the  creation  of  programs
that  increase  girls’  computer  anxiety.  This  anxiety
then  lowers  their  computer  performance,  making  them  more 
skeptical  about  their  computer  ability  and  reaffirming  the 
stereotype.  However,  software  and  peer  environment  are  not 
the  only  means  through  which  society  can  affect  a  woman’s 
computer  success.  What  are  the  effects  of  an  authority  figure,  a
parent,  a  teacher,  a  boss,  holding  such  a  stereotype? 


The  Hazards  of  Low  Expectations 

Reactions  to  stereotypes  can  vary.  First,  the  stereotype  can  be 
believed.  There  is  good  reason  to  believe  that  gender-based 
stereotypes  can  have  the  power  of  the  self-fulfilling  prophecy, 
creating  further  evidence  for  the  stereotype.  The  classic  self- 
fulfilling  prophecy  study  was  conducted  in  a  classroom  setting 
by  Rosenthal  and  Jacobson  (1968).  These  researchers  convinced 
teachers  that  a  new  test  of  intelligence  could  predict  which  of 
their  students  were  likely  to  experience  sudden  improvements 
in  their  academic  ability  during  the  next  school  year.  Some  stu¬ 
dents  were  identified  as  likely  to  show  these  “spurts,”  but  for 
the  majority,  nothing  was  said.  When  they  tested  the  students  at
the  end  of  the  year,  they  found  that  students  that  the  teachers 
had  expected  to  show  considerable  improvement  did  perform 
much  better  on  standardized  tests  than  students  not  labeled 
that  way,  especially  among  younger  students.  The  interesting 
part  of  this  study  was  that  the  test  Rosenthal  and  Jacobson  ad¬ 
ministered  was  entirely  bogus  and  the  results  were  intentionally 
randomized.  The  teachers  responded  differently  to  students 
whom  they  expected  to  perform  well.  While  no  data  shows  exactly
the  nature  of  this  difference,  imagination  provides  a  host  of
possible  explanations;  the  teachers  could  have  been  more  en¬ 
couraging,  more  attentive,  and  more  supportive  of  those  they 
“knew”  could  do  well.  This  study  has  long  been  seen  as  a  warn¬ 
ing  to  teachers  to  be  careful  in  labeling  their  students  either  pos¬ 
itively  or  negatively.  Students  are  attuned  to  the  expectations  of 
their  superiors  and  can  be  persuaded  to  either  try  harder  or  give 
up  based  on  what  is  expected. 

Word,  Zanna,  and  Cooper  (1974)  showed  that  stereotypes
about  groups  of  people  also  impact  people’s  performance.  Their
study  made  use  of  implicit  racial  stereotypes  about  African  Americans.
In  the  context  of  a  job  interview,  Word  et  al.  showed  that
the  negative  racial  stereotypes  that  White  job  interviewers  held
about  the  traits  and  capabilities  of  Black  candidates  subtly  and
nonconsciously  affected  the  way  in  which  they  behaved  toward
White  and  Black  applicants.  Although  the  interviewers  did  not
consciously  realize  it,  they  behaved  in  subtly  more  negative  ways
toward  the  Black  applicants  than  they  did  toward  the  White  applicants.
They  spoke  more  quickly  to  African  Americans,  avoided
eye  contact,  and  sat  at  greater  distances.  In  the  end,  it  was  clear 
that  candidates  who  were  treated  in  these  ways  performed  ob¬ 
jectively  worse  in  their  job  interviews  (Word  et  al.,  1974). 

These  studies,  as  well  as  many  others,  should  serve  as  a  cau¬ 
tion  to  parents  and  teachers.  Believing  that  girls  dislike  com¬ 
puters  or  are  not  competent  with  them  can  directly  lead  to  girls 
being  treated  differently  when  interacting  with  computers. 
Adult  educators  and  parents  who  introduce  the  computer  tasks 
to  children  may  have  negatively  stereotyped  beliefs  (like  the  interviewers
in  the  Word  et  al.  study)  and  communicate  these  beliefs
in  subtle  ways  to  their  children.  Self-fulfilling  prophecies
have  the  added  danger  of  promoting  resiliency  in  the  stereotype
because  those  parents  and  teachers  who  hold  stereotypical  beliefs
will  often  see  them  confirmed.  What  they  do  not  realize  is
that  their  lessons,  their  examples,  and  their  communications 
may  have  contributed  to  creating  the  very  disparity  they  believe 
they  are  observing  (Schofield,  1995). 

The  self-fulfilling  prophecy  literature  also  provides  examples
of  how  expectations  about  oneself  can  affect  one’s  future  performance.
Zanna,  Sheras,  Cooper,  and  Shaw  (1975)  modified
Rosenthal  and  Jacobson’s  procedure  by  telling  not  only  the 
teachers  but  also  the  students  of  the  results  of  a  bogus  ability 
test.  As  was  the  case  when  the  teachers  believed  certain  stu¬ 
dents  would  excel,  students  who  had  been  told  that  they  should 
improve  dramatically  during  the  year  drastically  outperformed 
their  peers  in  reading  and  math  at  year’s  end.  In  the  context  of
societal  messages,  a  girl  who  comes  to  believe  the  stereotype
about  her  gender  can  be  expected  to  give  up  more  easily  and 
not  become  as  competent  as  her  peers.  This  link  between  com¬ 
puter  attitudes  and  ability  makes  computer  anxiety  part  of  a 
vicious  and  mutually  reinforcing  cycle. 

Boys,  in  believing  that  technology  is  in  their  domain,  are 
likely  to  benefit  from  the  self-fulfilling  prophecy.  Their  parents, 
teachers,  and  other  socializing  agents  act  in  ways  that  produce 
positive  feelings  about  computers,  and  the  boys  may  respond 
positively  in  the  way  they  approach,  think  about,  and  perform 
with  computers.  This  is  also  a  reinforcing  cycle.  When  trying  to
dismantle  the  negative  cycle  that  holds  back  girls,  one  cannot 
neglect  the  positive  cycle  that  enables  boys.  The  message 
should  be  that  technology  is  everyone’s  domain. 

Mere  Awareness:  Stereotype  Threat 

People  can  also  believe  that  the  stereotype  exists  but  disbelieve 
in  the  truth  of  the  stereotype.  For  example,  Devine’s  (1990) 
work  on  racial  stereotypes  shows  that,  regardless  of  whether 
White  participants  were  prejudiced  or  not,  they  could  quickly 
and  automatically  produce  the  list  of  traits  that  form  Whites’ 
stereotypes  of  Blacks.  Even  the  most  self-confident  female  com¬ 
puter  scientist  cannot  help  but  be  aware  that  most  of  the  pop¬ 
ulation  considers  her  an  anomaly.  Unfortunately,  even  the  mere 
knowledge  of  a  stereotype  can  have  harmful  effects.  This  is  an¬ 
other  path  between  the  gender  stereotype  and  computer  anxi¬ 
ety.  It  is  called  “stereotype  threat.” 

Research  on  stereotype  threat  has  shown  that  the  mere 
knowledge  of  a  negative  stereotype  applying  to  one’s  group  can 
cause  one  to  perform  poorly  at  a  particular  task  (Spencer, 
Steele,  &  Quinn,  1999;  Steele,  1997;  Steele  &  Aronson,  1995). 
How,  then,  does  a  girl  in  an  introductory  programming
course  get  harmed  by  a  stereotype  she  doesn’t  believe?  She 
feels  that  girls  are  just  as  good  at  computers  as  boys.  Or  maybe 
she  thinks  that,  even  if  the  stereotype  is  true  about  girls  in  gen¬ 
eral,  it  is  not  true  about  her.  She  likes  to  use  computers  and  gen¬ 
erally  does  very  well  with  them.  Where  is  the  weakness? 

She  could  have  two  worries  that  would  cause  her  to  feel  anx¬ 
ious.  First,  she  may  worry  that  others  will  judge  her  based  on  the 
stereotype.  Even  though  she  is  bright  and  accomplished,  she  may 
worry  that  others  still  view  her  as  a  stereotypical  woman.  She  may 
still  have  her  work  questioned  and  belittled  just  because  of  her 
sex  (Nicholson  et  al.,  1998;  Schofield,  1995).  It  is  a  worry  common
to  groups  subject  to  discrimination.  Such  worries  can  distract  her 
from  the  task  at  hand.  With  cognitive  resources  devoted  to  wor¬ 
rying  about  whether  she  is  being  judged  according  to  the  gender 
stereotype,  the  student  may  perform  less  well  at  the  task. 

The  other  worry  is  linked  to  the  first.  She  knows  there  is  a 
stereotype  and  she  wants  to  disprove  it.  She  doesn’t  just  want  to 
be  good;  she  wants  to  overcome  the  deficit  that  people  might 
believe  exists.  Task  importance  has  been  shown  to  be  a  key  variable
in  stereotype  threat  research.  The  more  something  matters,
the  more  it  hurts  to  be  reminded  of  the  stereotype.

As  in  much  literature  about  stereotypes,  work  on  stereo¬ 
type  threat  began  with  studies  involving  African  Americans. 
Given  the  stereotype  that  African-American  students  are  not  as 
academically  capable  as  White  students,  Steele  and  Aronson 
(1995)  predicted  that  making  this  stereotype  relevant  would 
decrease  the  performance  of  Black  subjects  taking  a  standardized
test.  Initially,  they  made  the  test  relevant  by  saying  that
the  test  was  a  very  reliable  measure  of  academic  ability.  And 
students  who  thought  the  test  was  relevant  performed  worse 
than  those  who  did  not.  The  researchers  went  on  to  show  that 
making  the  stereotype  more  salient  also  worked  even  if  they 
did  not  tell  the  test-takers  that  the  exam  should  accurately  reflect
their  intelligence.  Think  about  the  two  worries  in  the
context  of  these  experiments.  Steele  and  Aronson’s  (1995) 
subjects  were  so  concerned  about  confirming  the  stereotype 
that  they  did  precisely  that. 

Spencer,  Steele,  and  Quinn  (1999)  showed  a  similar  effect 
using  gender  differences  in  mathematics.  When  women  were 
told  that  the  results  of  a  test  showed  gender  differences,  they 
performed  more  poorly  than  when  gender  differences  were  not 
mentioned.  Interestingly,  the  experimenters  did  not  actually  say 
that  men  outperformed  women  in  the  “gender-differences”  con¬ 
ditions,  but  all  subjects  came  to  the  appropriate  conclusion. 

Other  research  has  shown  that  stereotype  threat  can  affect 
a  wide  array  of  activities.  Stone  and  colleagues  have  shown 
that  Whites  are  likely  to  experience  stereotype  threat  on  ac¬ 
tivities  that  are  alleged  to  be  indicants  of  “natural  athletic  abil¬ 
ity”  when  compared  to  Blacks  (Stone,  Lynch,  Sjomeling,  & 
Darley,  1991;  Stone,  Perry,  &  Darley,  1991).  To  revisit  the  math 
finding,  Aronson  and  his  colleagues  showed  that  stereotype 
threat  affected  White  males’  performance  on  a  mathematics 
test  when  they  thought  that  the  test  was  diagnostic  of  their 
performance  relative  to  that  of  Asian-American  males  (Aron¬ 
son,  Lustina,  Good,  Keough,  Steele,  &  Brown,  1999).  Stereo¬ 
type  threat  is  a  pervasive  phenomenon  that  works  by  increas¬ 
ing  anxiety  and  cognitive  load. 

The  stereotype  that  links  gender  to  computer  performance 
is  as  well  known  as  the  others.  Although  there  are  currently  no 
published  studies  to  link  the  stereotype  threat  experienced  by 
women  to  performance  on  the  computer,  the  link  between  the 
two  is  clear  to  see.  Especially  when  a  student  identifies  with  her
gender  and  gender  is  a  salient  aspect  of  the  social  situation  in
which  the  student  finds  herself,  the  same  threat  that  dealt
a  blow  to  women’s  performance  on  mathematics  tests  and  to  White
males’  performance  on  an  athletic  task  should  affect  women
using  the  computer.  Simply  through  knowledge  of  the  existence
of  the  stereotype,  the  woman  is  more  likely  than  the  man  to  succumb
to  that  stereotype,  demonstrating  greater  anxiety  and
poorer  performance.

Compounding  the  Problem:  Gender  Differences  in  Attribution

One  of  the  more  ubiquitous  assumptions  in  the  behavioral  sci¬ 
ences  is  that  people  strive  to  understand  the  causes  of  their  be¬ 
havior  in  the  social  world.  This  axiom  has  given  rise  to  a  field  of 
study  known  as  “attribution”  (Heider,  1958;  Jones  &  Davis,  1965; 
Kelley,  1967;  Weiner,  1979).  In  the  wake  of  witnessing  behavior,
people  are  motivated  to  make  attributions  for  the  causes  of  the 
act,  whether  it  is  their  own  act  or  the  behavior  of  another.  If  an 
athlete  catches  a  football  on  the  gridiron,  was  it  because  he  was 
a  good  athlete  (an  internal  attribution)  or  because  it  was  an  easy 
pass  to  catch  (an  external  attribution)  or  simply  because  he  was 
lucky,  opening  his  arms  at  the  right  moment  with  the  ball  float¬ 
ing  into  his  hands  at  just  the  right  moment?  Similarly,  if  I  an¬ 
swer  a  question  correctly  on  a  standardized  test,  I  wonder  if  it 
was  because  I  am  smart,  because  the  test  question  was  easy,  or 
because  I  luckily  picked  the  correct  answer  from  a  set  of  equally 
obscure  alternatives. 

In  achievement  domains  that  are  stereotypically  male,  a  pat¬ 
tern  of  attributions  occurs  that  is  protective  for  boys  but  damag¬ 
ing  for  girls.  Boys  come  to  feel  that  any  success  they  achieve  at  a 
stereotypically  male  task  is  a  function  of  their  ability,  whereas  any 
failure  is  due  to  lack  of  trying,  bad  luck,  or  an  unduly  difficult 
task.  This  is  a  protective  pattern  because  success  serves  to  bol¬ 
ster  boys’  opinions  of  how  good  they  are.  It  enables  them  to 
complete  even  more  difficult  tasks  with  a  strong  belief  in  their 
own  ability  to  succeed.  If  they  don’t  succeed,  they  can  rely  on  the 
notion  that  they  only  need  to  try  harder,  pay  more  attention,  or 
be  more  judicious  in  their  choice  of  tasks.  Lack  of  success,  in 
short,  does  not  translate  into  a  belief  in  lack  of  ability. 

Girls,  on  the  other  hand,  make  a  very  different  pattern  of  attribution.
Success  is  attributed  more  to  external  factors
such  as  luck,  effort,  or  an  unduly  easy  task,  whereas  failure
is  taken  personally  as  confirmation  of  a  lack  of  ability.  This  po¬ 
tentially  damaging  attributional  pattern  causes  girls  to  believe 
that  the  primary  route  to  success  in  a  stereotypically  male  do¬ 
main  like  math,  science,  or  computer  technology  is  through 
luck  or  effort.  By  continuously  working  hard,  a  girl  may  feel  that 
she  can  achieve  success,  but  it  is  not  because  of  her  ability  or 
intelligence  at  the  task.  Failure,  by  contrast,  can  be  devastating 
because  it  can  provide  evidence  for  what  she  already  believed: 
That  she  is  not  a  capable  performer  in  the  world  that  has,  ac¬ 
cording  to  stereotypes,  been  the  province  of  boys  (Diener  & 
Dweck,  1978;  Licht  &  Dweck,  1984;  Nicholls,  1975). 

Parsons  and  colleagues  examined  attributional  differences  in 
the  field  of  mathematics  (Parsons,  Meece,  Adler,  &  Kaczula, 
1982).  Children  were  asked  to  rate  their  own  ability  at  mathematics,
as  well  as  the  difficulty  level  of  the  courses  they  were  taking.
The  children’s  parents  were  also  asked  to  rate  their  children’s
math  ability  and  course  difficulty.  Parents  of  girls  believed
that  their  children  had  less  mathematical  ability  than  did  parents 
of  boys.  Moreover,  the  girls’  view  of  their  own  ability  was  related 
to  their  parents’  view,  but  not  to  their  actual  performance  in 
math  classes.  Objective  school  records  showed  neither  overall 
difference  in  the  children’s  performance  nor  any  difference  in 
the  difficulty  of  the  courses.  Nonetheless,  parents  unintention¬ 
ally  socialized  their  children  into  thinking  that  the  girls  were  less 
gifted  in  math  than  their  male  counterparts. 

More  recently,  Tiedermann  (2000)  surveyed  several  hundred 
elementary-school  students  and  their  parents  concerning  the 
children’s  ability  in  mathematics.  As  in  the  Parsons  et  al.  (1982) 
study,  there  were  no  objective  differences  between  boys  and 
girls  as  measured  by  their  school  records.  Tiedermann  found 
that  both  mothers  and  fathers  thought  that  boys  were  more 
skilled  in  mathematics  than  girls.  And  the  more  strongly  parents 
believed  in  the  gender  stereotype  about  math,  the  more  they  attributed
greater  mathematics  ability  to  their  sons,  but  not  to
their  daughters. 

The  consensus  of  studies  that  have  been  conducted  in  the  at¬ 
tribution  tradition  shows  that  boys  and  girls  do  make  different 
attributions  for  success  and  failure  in  stereotypically  male  do¬ 
mains  and  that  the  impact  on  girls’  confidence  in  their  own  abil¬ 
ity  is  damaging.  Nelson  and  Cooper  (1997)  adopted  these  in¬ 
sights  and  applied  them  to  the  field  of  information  technology. 
Ten-year-old  boys  and  girls  were  asked  to  unscramble  anagrams 
on  a  computer.  After  a  few  trials,  half  of  the  boys  and  half  of  the 
girls  began  to  see  error  messages  appear  on  the  screen.  The  er¬ 
ror  messages  increased  in  frequency  and  severity  until  they  fi¬ 
nally  stated  that  the  computer  was  shutting  down  and  the  drive 
was  about  to  be  destroyed.  The  other  half  of  the  children  re¬ 
ceived  no  such  error  messages  and,  in  the  end,  reported  that  no 
computer  errors  had  been  detected. 

Nelson  and  Cooper  (1997)  then  asked  the  children  to  indicate 
what  they  thought  their  ability  level  was  at  computer  tasks.  Fol¬ 
lowing  a  successful  performance  (i.e.,  without  error  messages), 
girls  thought  that  their  ability  was  about  average  compared  with 
other  10-year-old  girls.  In  the  failure  message  condition,  however,
they  thought  their  ability  was  significantly  inferior  to  the
average  10-year-old.  Boys  thought  their  computer  ability  was
higher  than  the  average  10-year-old,  and  failure  messages  did
not  affect  their  confidence  in  their  ability.  In  addition,  when  the 
boys  and  girls  in  the  error-message  (failure)  condition  were  asked 
to  describe  the  reason  for  their  failure  to  complete  the  task,  girls 
were  three  times  more  likely  to  attribute  the  failure  to  their  lack 
of  ability  than  were  boys.  In  the  success  condition,  boys  were 
much  more  likely  than  girls  to  attribute  their  smooth  and  er¬ 
rorless  performance  to  their  own  ability,  while  girls  were  more 
likely  to  evenly  distribute  their  attributions  to  good  luck,  per¬ 
sistent  effort,  or  an  easy  task. 

Working  with  college  students  in  Germany,  Dickhauser  and
Stiensmeyer-Pelster  (2002)  also  asked  students  to  make  attribu¬ 
tions  for  success  and  failure  at  a  computer  task.  Like  the  young 
children  in  the  Nelson  and  Cooper  (1997)  study,  university 
males  were  much  more  likely  to  attribute  failure  to  a  defective 
computer;  whereas  females  were  more  likely  to  attribute  fail¬ 
ure  to  their  own  ability,  causing  females  to  feel  greater  shame 
about  their  performance  and  lowered  expectation  about  future 
interactions  with  the  computer. 

The  conclusion  of  these  studies  points  to  the  fact  that  chil¬ 
dren  are  taught  by  their  parents  about  how  to  make  attributions 
for  success  and  failure.  At  least  in  the  academic  areas  that  are 
stereotypically  seen  as  male  domains,  parents  teach  their  chil¬ 
dren  that  success  for  boys  is  due  to  ability,  but  success  for  girls  is 
due  to  more  ephemeral  factors  that  do  not  rely  on  girls’  internal 
capabilities.  In  contrast,  girls’  failure  is  an  indictment  of  their 
capability  at  the  task  but  boys  need  only  work  harder  or  have 
better  luck  in  order  to  achieve  in  the  future.  These  attributions 
have  consequences.  Boys  are  encouraged  to  keep  trying,  be¬ 
cause  they  have  the  basic  ability  to  succeed.  Girls,  on  the  other 
hand,  are  not  stimulated  by  success  because  success  is  not  a  re¬ 
flection  of  their  ability.  Failure  is.  Therefore,  it  is  not  surprising 
that,  in  the  last  step  of  the  Nelson  and  Cooper  (1997)  study,  chil¬ 
dren  were  asked  if  they  would  like  to  try  another  task  either  on 
the  computer  or  by  traditional  paper  and  pencil.  Following  fail¬ 
ure  at  the  anagrams  task,  girls  were  much  less  likely  than  boys  to 
want  to  interact  with  a  computer  for  the  future  task. 

DISMANTLING  THE  DIVIDE


The  digital  divide  has  made  it  difficult  for  women  to  participate 
fully  in  the  technological  age.  As  we  have  seen,  women  now  use 
computers  frequently  but  continue  to  feel  greater  anxiety,  more 
negative  attitudes,  and  lower  personal  efficacy  in  their  inter¬ 
actions  with  computer  technology.  The  cycle  is  continuous.  Be¬ 
ginning  with  the  shared  knowledge,  or  stereotype,  that  com¬ 
puters  are  the  province  of  men  and  boys,  women  and  girls 
either  succumb  to  that  stereotype  or  fight  against  it.  Either  way, 
the  mere  knowledge  that  the  stereotype  exists  causes  girls  and 
women  to  experience  anxiety.  They  need  not  believe  the  stereo¬ 
type  is  true;  nor  do  they  need  to  believe  that  the  stereotype  ap¬ 
plies  to  them.  This  computer  anxiety  often  leads  to  negative  at¬ 
titudes  and  lowered  performance,  which  is  then  interpreted  via 
attributional  processes  to  reflect  the  accuracy  of  the  stereotype. 
Parents  and  educators  inadvertently  teach  girls  to  attribute  any 
success  they  may  have  with  computer  technology  to  luck  or  ef¬ 
fort,  which  limits  the  ameliorative  role  that  positive  perfor¬ 
mance  can  have  on  a  girl’s  estimation  of  her  capability  to  suc¬ 
ceed  at  computer  tasks.  The  attributional  patterns  that  boys 
develop  allow  them  to  benefit  from  successful  performance  and 
shield  them  from  being  dissuaded  or  discouraged  by  the  occa¬ 
sional  occurrence  of  errors  and  failures. 


Software  Design  Can  Limit  the  Gender  Stereotype 

How  then  can  the  cycle  be  disrupted?  One  place  to  begin  is  with 
the  stereotype  itself.  It  cannot  be  willed  out  of  existence
overnight.  However,  its  pervasiveness  can  be  disrupted  in  a 
number  of  ways.  Educators  and  software  designers  were  par¬ 
tially  insightful  when  they  saw  the  opportunity  to  use  technol¬ 
ogy  to  make  learning  fun.  With  the  power  of  digital  technology 
to  accommodate  children’s  varied  interests  and  fantasies,  learn¬ 
ing  can  be  placed  in  a  context  that  children  find  meaningful. 
What  the  designers  did  not  see  clearly  was  that  their  collective 
decision  to  model  educational  software  on  the  image  of  the 
video  arcade  set  in  motion  a  series  of  psychological  and  socio¬ 
logical  factors  that  helped  to  reify  the  image  of  computers  as  be¬ 
ing  the  province  of  boys — i.e.,  the  people  who  are  fascinated 
by  the  video  arcade. 

The  design  of  educational  software  needs  to  change.  This  has 
already  begun  to  happen,  with  far  more  of  a  variety  of  educational 
programs  available  in  the  current  decade  than  existed  in  the  past 
two  decades.  Nonetheless,  the  development  of  educational  software  that
appeals  equally  to  both  genders  needs  to  quicken.  The  software,
along  with  the  various  peripheral  interfaces,  is  a  significant  com¬ 
municator  of  the  computer  stereotype.  As  we  have  seen,  the  ed¬ 
ucational  and  entertainment  software  packages  available  at  the 
end  of  the  20th  century  were  communication  packages  directed 
at  boys.  They  were  written  as  though  educators  were  speaking 
directly  to  boys,  encouraging  them  to  learn  by  having  fun,  and 
simultaneously  leaving  girls  out  of  the  conversation. 

One  means  to  change  the  communication  pattern  of  com¬ 
puter  technology  is  to  change  the  metaphor  that  characterizes 
the  fantasy  elements  of  the  game.  A  war  metaphor  is  a  strong 
communication  that  the  learning  technology  is  for  boys;  a
sports  metaphor  is  a  less  potent,  but  probably  similar,  communication.
Learning  a  lesson  for  the  purpose  of  hitting  a  home
run,  making  a  goal,  or  scoring  a  touchdown  is  less  likely  to
pique  the  interests  of  most  girls;  nor  would  such  metaphors 
serve  to  weaken  the  stereotype  that  educational  computing  has 
been  designed  for  boys. 

Learning  what  girls  like,  and  ultimately  designing  informa¬ 
tion  technology  software  that  appeals  to  girls,  is  an  empirical 
question.  We  need  to  study  girls’  interactions  with  computer 
technology  and  design  software  that  addresses  their  interests 
and  preferences.  Lepper  and  Malone  (1987)  led  the  way  by 
showing  that  girls  prefer  educational  communication  to  be  in 
the  form  of  a  learning  tool.  If  it  is  to  be  preferred  over  other 
communications,  then  it  must  teach  what  needs  to  be  learned
in  a  direct  and  efficient  way  (Lynn  et  al.,  2003).  Moreover,  the 
software  attracts  the  interest  of  girls  to  the  extent  that  it  is  in¬ 
teractive,  and  involves  communication  and  sharing  (Light  et  al., 
2000;  Littleton  et  al.,  1992). 

Other  human-computer  interaction  (HCI)  issues  are 
also  important.  What  kinds  of  peripheral  devices  should  be 
used?  How  will  the  student  ask  for  help?  Will  the  screen  be  used 
to  keep  score?  Will  there  be  a  score?  Will  lights  flash  and  objects 
explode  in  order  to  increase  attention,  curiosity,  and  interest? 
These  are  not  trivial  considerations.  When  girls  come  to  feel  that 
the  computer  is  not  a  tool  intended  for  them,  part  of  that  belief 
may  stem  from  the  human  factors  assumptions  made  about 
their  interactions  with  the  machine.  Passig  and  Levin  (1999) 
worked  with  kindergarten  children  to  assess  what  they  liked 
about  computer  interfaces.  They  found  that  boys,  compared  to 
the  girls,  preferred  to  use  navigational  buttons  to  discover  how 
a  game  should  be  played.  Girls  preferred  that  writing  be  part  of 
the  game  and  preferred  a  way  to  ask  for  help  directly  rather  than 
use  computer  interfaces  such  as  buttons.  Passig  and  Levin 
(1999)  found  that  the  children’s  satisfaction  with  the  games,  and 
their  time-on-task  for  learning  at  the  computer,  were  direct 
functions  of  whether  the  human-factor  decisions  were  consis¬ 
tent  with  their  gender  preferences. 

The  creation  of  software  programs  that  speak  to  both  gen¬ 
ders  is  a  first  step.  A  second  step  is  to  encourage  parents,  school 
boards,  and  educators  to  purchase  such  programs.  This  is  a 
business  and  educational  issue  with  obvious  payoffs  to  society 
as  well  as  the  corporate  bottom  line.  Recent,  successful  commercial 
experience  with  such  programs  as  Barbie  Fashion  Designer 
(Subrahmanyam  &  Greenfield,  1998)  and  Purple  Moon 
attests  to  the  financial  viability  of  such  enterprises.  Whether  it 
is  more  appropriate  to  utilize  current  gender  stereotypes  in  or¬ 
der  to  be  certain  that  there  are  computer  programs  accessible 
to  each  gender,  or  whether  it  is  best  to  design  programs  that 
have  elements  that  both  genders  enjoy  is  a  complex  issue  that 
goes  beyond  the  scope  of  the  current  chapter.  However,  a 
conscious  and  deliberate  effort  to  ensure  that  the  programs 
educators  put  in  schools  and  parents  bring  into  the  home  appeal 
to  both  genders  is  a  necessary  step  toward  weakening  the 
stereotype  that  forms  the  crux  of  the  digital  divide. 

Decisions  about  software  design  can  also  be  informed  by  ex¬ 
amining  data  collected  from  adults.  Men  and  women  across  a 
broad  age  range  have  logged  into  Massively  Multiplayer  Online 
Role-Playing  Games  (MMORPGs).  Men  vastly  outnumber  women  in 
these  games  (Yee,  2006a),  but  it  is  still  informative  to  examine  the 
kinds  of  activities  and  parameters  within  the  MMORPGs  that  interest 
women.  Players  can  choose  a  character,  or  avatar,  to  represent 
them  in  the  game.  In  one  data  set,  48%  of  the  males  chose  to 
be  represented  by  a  female  character,  whereas  only  23%  of  the 
female  players  chose  a  male  avatar.  This  suggests  the  impor¬ 
tance  of  having  female  protagonists  in  computer  activities. 
Those  activities,  such  as  learning  tools  for  school-aged  children, 
that  require  an  identification  between  player  and  a  character 
on  the  screen,  will  most  likely  cause  a  female  player  to  feel  un¬ 
comfortable  if  she  is  not  represented  by  a  female  protagonist. 
Women  also  are  more  likely  than  men  to  view  their  avatar  as  an 
idealized  representation  of  themselves  and  to  see  their  in-game 
behavior  as  similar  to  their  real-life  behavior  (Yee,  2007). 

Women  are  also  more  likely  than  men  to  use  the  MMORPG 
environment  to  build  supportive  social  networks.  According  to 
Yee  (2006b),  women  form  stronger  friendships  in  the  game 
than  do  men  and  shun  playing  for  achievement,  dominance  and 
advancement.  On  a  percentage  basis,  women  are  more  likely 
than  men  to  join  the  game  with  friends  and  are  much  more 
likely  to  join  with  a  romantic  partner.  They  are  also  more  likely 
to  share  contact  information.  Men  play  for  power,  dominance, 
and  points.  Women  play  to  enjoy  the  communication,  social 
contact,  and  social  interaction.  Games  that  are  multifaceted 
such  as  the  MMORPG  environment  allow  for  players  to  find  el¬ 
ements,  activities,  and  goals  that  appeal  to  both  genders.  Care¬ 
ful  scrutiny  of  what  appeals  to  women  in  these  multifaceted 
games  can  highlight  the  features  that  would  make  games  for 
young  and  adult  females  more  interesting  and  more  acceptable. 

Disrupting  the  digital  divide  at  the  level  of  software  stereo¬ 
typing  is  the  most  important  step  in  reducing  its  deleterious 
impact  on  women.  The  self-fulfilling  nature  of  the  stereotype 
and  the  existence  of  stereotype  threat  will  cease  to  be  issues 
that  support  the  divide  if  the  stereotypes  are  diminished.  Soft¬ 
ware  designers  need  to  orient  their  software  to  appeal  to  all 
groups  of  users.  This  is  particularly  true  of  programs  used  in  ed¬ 
ucation  because,  for  many  children,  school  represents  their  first 
exposure  to  the  computer.  It  is  here  that  they  learn  whether 
computer  technology  is  competitive,  like  the  games  in  video 
arcades,  or  is  a  medium  of  communication  that  is  equally 
accessible  to  girls  and  boys. 

Adjusting  the  Social  Context 

The  social  context  of  computing  can  also  help  to  weaken  the 
digital  divide.  In  line  with  the  research  of  Robinson-Staveley  and 
Cooper  (1990)  and  Cooper  et  al.  (1990),  female  college  students 
and  middle-school  students  performed  better  and  experienced 
less  anxiety  when  computing  in  private  than  when  computing  in 
public.  One  suggestion  from  these  studies  is  that  girls  should  be 
afforded  more  private  space  when  working  with  computer  tech¬ 
nology.  Discomfort  about  competitive  interactions,  worry  about 
ridicule  (Nicholson  et  al.,  1998),  and  stress  about  the  stereotype 
held  by  others  (Steele  &  Aronson,  1995)  can  exacerbate  the  anx¬ 
iety  that  perpetuates  the  digital  divide.  Where  possible,  struc¬ 
turing  the  work  and  school  environments  to  allow  private  com¬ 
puter  interactions  may  be  quite  helpful. 

Another  strategy  for  diminishing  the  gender  divide  in  com¬ 
puting  is  to  consider  an  educational  structure  in  which  girls 
have  an  opportunity  to  engage  in  computer  instruction  with 
other  girls  rather  than  boys.  This  can  be  accomplished  by  single¬ 
sex  schools  or  by  classes  within  a  traditional  school  that  are  for 
girls  only.  Research  with  young  children  (Light  et  al.,  2000) 
showed  that  girls  performed  significantly  better  in  same-sex 
dyads.  Jackson  and  Smith  (2000)  applied  the  single-gender  con¬ 
cept  to  11-13-year-old  girls  in  their  math  classes.  They  found 
evidence  for  increased  performance  and  lower  stress  when  the 
girls  took  their  class  in  the  single-gender  format. 

Changing  the  Conclusions 

Parents  and  educators  are  prone  to  succumb  to  the  same 
stereotype  as  the  rest  of  society:  If  computer  technology  is  an 
area  that  boys  do  well  in,  then  it  must  be  true  that  girls’  success 
is  a  lucky  break  or  a  matter  of  sheer  determined  effort.  In  our 
roles  as  parents  and  educators,  we  must  be  vigilant  about  the 
two  elements  of  that  trap.  First,  it  is  only  a  stereotype  that  sup¬ 
ports  the  first  premise:  Girls  are  not  intrinsically  worse  at  inter¬ 
acting  with  computers  than  boys.  Second,  girls  and  boys  must 
be  allowed  to  use  the  outcomes  of  their  behavior  to  make  judg¬ 
ments  about  how  good  they  are  at  computer  tasks.  If  a  child 
does  well  at  computer  tasks,  it  should  be  taken  as  evidence  for 
his  or  her  ability.  However,  research  has  shown  that  girls  are 
not  given  the  opportunity  to  benefit  from  their  success.  By  un¬ 
wittingly  communicating  attributional  patterns  that  differ  by 
gender,  parents  and  educators  have  systematically  deprived  girls 
of  benefiting  from  their  success  and  deprived  all  children  of 
learning  from  their  mistakes.  As  Fig.  38.1  depicts,  the  effects 
of  successful  and  unsuccessful  computer  performance  pass 
through  an  attributional  filter  that  undermines  the  beneficial  ef¬ 
fects  of  success  and  exacerbates  the  digital  divide. 
INTRODUCTION 


The  expanding  power  of  computers  and  the  recent  growth  of  in¬ 
formation  technologies  such  as  the  Internet  have  made  it  possi¬ 
ble  for  large  numbers  of  people  to  have  direct  access  to  an 
increasingly  wide  array  of  information  sources  and  services.  Net¬ 
work  usage  is  exploding  and  new  interfaces,  search  engines,  and 
features  are  becoming  available  at  an  unprecedented  rate.  In 
2003,  about  61.8%  of  households  in  the  United  States  owned 
a  personal  computer  (PC)  and  approximately  54.7%  of  these 
households  had  access  to  the  Internet  (U.S.  Department  of  Com¬ 
merce,  2005;  Fig.  39.1).  In  essence,  use  of  technology  has  become 
an  integral  component  of  work,  education,  communication,  and 
entertainment.  Technology  is  also  being  increasingly  used  within 
the  healthcare  arena  for  service  delivery,  in-home  monitoring, 
interactive  communication  (e.g.,  between  patient  and  physician), 
transfer  of  health  information  and  peer  support.  Use  of  automatic 
teller  machines,  interactive  telephone-based  menu  systems,  cel¬ 
lular  telephones,  and  VCRs  is  also  quite  common.  Furthermore, 
telephones,  television,  home  security  systems,  and  other  com¬ 
munication  devices  are  becoming  more  integrated  with  com¬ 
puter  network  resources  providing  faster  and  more  powerful  in¬ 
teractive  services.  In  essence,  in  order  to  function  independently 
and  successfully  interact  with  the  environment,  people  of  all  ages 
need  to  interact  with  some  form  of  technology. 

Coupled  with  the  technology  explosion  is  the  aging  of  the 
population.  In  2004,  persons  65  years  or  older  represented 
12.4%  of  the  U.S.  population  and  it  is  estimated  that  people  in 
this  age  group  will  represent  20.6%  of  the  population  by  2050 
(Fig.  39.2).  In  addition,  the  older  population  itself  is  getting 
older.  Currently,  about  44.5  million  people  are  over  the  age  of 
75  years,  and  by  the  year  2050,  almost  50  million  people  will 
be  75  years  or  older  (National  Center  for  Health  Statistics,  2005; 
Fig.  39.2). 

FIGURE  39.1.  Trends  of  computer  and  Internet  access:  1984-2003. 
(Source:  U.S.  Department  of  Commerce,  October  2005). 

FIGURE  39.2.  U.S.  population.  (Source:  National  Center  for 
Health  Statistics,  2005). 

Given  that  older  people  represent  an  increasingly  large 
proportion  of  the  population  and  will  need  to  be  active  users  of 
technology,  issues  surrounding  aging  and  information 
technologies  are  of  critical  importance  within  the  domain  of 
human-computer  interaction  (HCI).  To  ensure  that  older  adults  are  able 
to  adapt  to  the  new  information  environment,  we  need  to  un¬ 
derstand  (a)  the  implications  of  age-related  changes  in  func¬ 
tional  abilities  for  the  design  and  implementation  of  technol¬ 
ogy  systems;  (b)  the  needs  and  preferences  of  older  people  with 
respect  to  the  design  of  technology  interfaces  and  software  ap¬ 
plications;  and  (c)  the  problems  and  challenges  older  adults 
confront  when  adopting  new  technologies.  Although  this  topic 
has  received  increased  attention  within  the  research  commu¬ 
nity,  there  are  still  many  unanswered  questions. 

The  intent  of  this  chapter  is  to  summarize  the  current  state 
of  knowledge  regarding  information  technologies  and  older 
adults.  Topics  that  will  be  discussed  include  adoption  and  use  of 
information  technologies  by  older  adults,  training,  hardware 
and  software  design,  and  other  issues  such  as  privacy  and  trust 
in  technology.  A  detailed  discussion  of  the  aging  process  will  not 
be  provided.  There  are  many  excellent  sources  of  this  material 
(e.g.,  Fisk,  Rogers,  Charness,  Czaja,  &  Sharit,  2004;  Birren  & 
Schaie,  2001;  Fisk  &  Rogers,  1997).  Instead,  a  brief 
review  of  age-related  changes  in  abilities  that  have  relevance  to 
the  design  of  technology  systems  will  be  presented.  Finally,  sug¬ 
gestions  for  design  guidelines  and  areas  of  needed  research  will 
be  summarized.  It  is  hoped  that  this  chapter  will  serve  to  moti¬ 
vate  researchers  and  system  designers  to  consider  older  adults 
as  an  important  component  of  the  HCI  community. 


USE  OF  COMPUTER  TECHNOLOGY 
BY  OLDER  ADULTS 


Usage  Patterns  and  Trends 

As  noted,  there  are  a  number  of  settings  where  older  people  are 
likely  to  encounter  computers  and  other  forms  of  communica¬ 
tion  technologies,  such  as  the  Internet,  including  the  work¬ 
place,  the  home,  healthcare,  and  service  settings.  However,  de¬ 
spite  a  trend  toward  increased  computer  use  among  older 
people,  use  of  computers  and  other  forms  of  technology  is  still 
lower  among  older  people  as  compared  to  younger  people. 
As  shown  in  Fig.  39.3,  more  than  32  million  older  people 
(55  years  and  older)  have  a  computer  at  home  as  compared  to 
62  million  people  between  35  and  54  years.  Similarly,  although 
use  of  the  Internet  among  older  people  is  increasing,  it  is  still 
lower  than  that  of  younger  age  groups  (Fig.  39.4).  In  2005,  about 
26%  of  people  age  65  and  older  were  Internet  users  as  com¬ 
pared  to  67%  of  people  age  50  to  64  and  80%  of  those  30  to  49 
years  old.  Furthermore,  people  aged  65  and  older  are  much  less 
likely  than  younger  people  to  have  a  high-speed  Internet  con¬ 
nection  (Pew  Internet  &  American  Life  Project,  2005).  In  addi¬ 
tion,  seniors  who  do  use  the  Internet  tend  to  be  White,  highly 
educated,  and  living  in  households  with  higher  incomes  (Pew 
Internet  &  American  Life  Project,  2004).  Recent  data  (Czaja 
et  al.,  in  press)  also  indicated  that  older  adults  are  less  likely 
than  younger  adults  to  use  other  forms  of  technology  such  as 
automatic  teller  machines  or  VCRs  and  that  general  use  of  tech¬ 
nology  is  a  predictor  of  use  of  computers  and  use  of  the  Internet. 

FIGURE  39.3.  Percent  of  U.S.  persons  using  computer.  (Source: 
U.S.  Census  Bureau,  Current  Population  Survey,  October  2003). 

FIGURE  39.4.  Percent  of  U.S.  persons  using  Internet.  (Source: 
U.S.  Census  Bureau,  Current  Population  Survey,  October  2003). 

Lack  of  technology  use  puts  older  adults  at  a  disadvantage 
in  terms  of  their  abilities  to  live  and  function  independently  and 
to  successfully  negotiate  the  built  environment.  Furthermore, 
the  full  benefits  of  technology  may  not  be  realized  by  older 
populations.  Technology  holds  great  potential  for  improving  the 
quality  of  life  for  older  people.  For  example,  computer  networks 
can  facilitate  linkages  between  older  adults  and  healthcare 
providers  and  communication  with  family  members  and  friends, 
especially  those  who  live  far  away.  It  is  quite  common  within 
the  United  States  for  family  members  to  be  dispersed  among 
different  geographic  regions.  In  fact,  nearly  7  million  Americans 
are  long-distance  caregivers  for  older  relatives  (Family  Care¬ 
giving  Alliance,  2005).  Clearly,  network  linkages  can  make  it  eas¬ 
ier  for  family  members  to  communicate,  especially  those  who 
live  in  different  time  zones.  Computers  may  also  be  used  to  help 
older  people  communicate  with  healthcare  providers  or  other 
older  people  and  may  help  older  people  become  involved  in 
continuing  education.  For  example,  telemedicine  applications  allow 
direct  communication  between  healthcare  providers  and 
patients.  There  are  also  many  opportunities  to  enroll  in  distance 
learning  courses  online.  These  opportunities  will  be  further 
enhanced  by  future  developments  in  multimedia  and 
videoconferencing  technologies. 

Computers  and  the  Internet  can  also  help  older  people  ac¬ 
cess  information  about  community  services  and  resources.  The 
Internet  may  also  be  used  to  facilitate  the  performance  of  rou¬ 
tine  tasks  such  as  financial  management  or  shopping.  Access  to 
these  resources  and  services  may  be  particularly  beneficial  for 
older  people  who  have  mobility  restrictions  or  lack  of  trans¬ 
portation.  Finally,  many  government  services,  such  as  Medicare 
and  Social  Security,  have  online  information  and  are  moving 
toward  online  application  processes.  Clearly,  technology  is  becoming 
an  integral  component  of  our  everyday  lives.  The  following 
section  will  present  a  more  detailed  discussion  of  the  potential 
use  of  computer  technology  by  older  adults.  This  is  followed  by 
a  discussion  of  the  implications  of  aging  for  system  design. 


Computer  and  Technology  Usage  in  Everyday  Domains 
Work  Environments 

One  setting  where  older  people  are  likely  to  encounter  com¬ 
puter  technology  is  the  workplace.  The  rapid  introduction  of 
computers  and  other  forms  of  automated  technology  into  oc¬ 
cupational  settings  implies  that  most  workers  need  to  interact 
with  computers  simply  to  perform  their  jobs.  Computer-interactive 
tasks  are  becoming  prevalent  within  the  service  sector, 
office  environments,  and  manufacturing  industries.  In  2003,  77 
million  people  in  the  United  States  used  a  computer  on  the  job; 
these  workers  accounted  for  55%  of  the  labor  force,  and  about  2  out 
of  every  5  employed  individuals  used  the  Internet  on  the  job 
(U.S.  Bureau  of  Labor  Statistics,  2005).  This  is  in  comparison  to 
45%  of  workers  in  1993  and  25%  in  1984  (U.S.  Census  Bureau, 
1999).  Currently,  more  than  2  million  adults  aged  65  and  older 
use  a  computer  at  work,  and  more  than  1.3  million  use  the  In¬ 
ternet  at  work  (U.S.  Census  Bureau,  2003). 

The  data  also  suggest  that  technology  will  have  a  major 
impact  on  the  future  structure  of  the  labor  force,  changing  the 
types  of  jobs  that  are  available  and  the  way  in  which  jobs  are  per¬ 
formed.  In  this  regard,  most  of  the  growth  will  come  from  three 
occupational  groups:  computer  and  mathematical  occupations, 
healthcare  practitioners  and  technical  occupations,  and  educa¬ 
tion,  training,  and  library  occupations.  Other  occupations  that 
will  experience  growth  include  management  and  financial  oc¬ 
cupations,  sales  and  related  occupations,  office  and  administrative 
support  operations,  and  installation,  maintenance  and  repair 
occupations,  especially  within  the  telecommunications  industry 
(U.S.  Bureau  of  Labor  Statistics,  2005).  Computer  and  commu¬ 
nication  technologies  will  be  an  integral  component  of  most  of 
these  occupations. 

The  number  of  people  who  are  telecommuting  is  also  rapidly 
increasing.  Telecommuting  encompasses  a  number  of  work 
arrangements  including  home-based  work,  satellite  offices,  and 
neighborhood  telework  centers.  It  can  be  done  on  a  full-  or 
part-time  basis.  In  2001,  about  29  million  workers  in  the  United  States 
engaged  in  some  form  of  telecommuting,  and  it  is  estimated  that 
there  could  be  slightly  more  than  40  million  telecommuters  by 
2010  (Potter,  2003).  Telecommuting  may  be  particularly 
appropriate  for  older  adults,  as  they  are  more  likely  than  younger 
people  to  be  mobility  impaired  or  engaged  in  some  form  of 
caregiving.  Telecommuting  also  allows  for  more  flexible  work  schedules 
and  autonomy  and  is  more  amenable  to  part-time  work.  These 
job  characteristics  are  generally  preferred  by  older  people.  A 
recent  study  by  Sharit,  Czaja,  Hernandez,  et  al.  (2004)  found  that 
older  people  are  interested  in  pursuing  this  type  of  work  as  it 
provides  them  with  an  opportunity  to  remain  productive  and  en¬ 
gaged  in  work  activities  at  a  more  flexible  schedule. 

However,  in  today’s  world  telecommuting  typically  involves 
the  use  of  computers  and  the  Internet,  which  may  be 
problematic  for  older  people,  as  they  are  less  likely  to  have  computer 
skills.  The  prevalence  of  telecommuting  also  raises  other  inter¬ 
esting  issues  such  as  job  training.  For  example,  can  online  train¬ 
ing  programs  be  used  to  train  teleworkers,  what  is  the  best  way 
to  keep  these  workers  updated  on  changes  in  job  demands,  and 
what  strategies  can  be  used  to  provide  teleworkers  with  techni¬ 
cal  support?  As  will  be  discussed,  issues  regarding  job  training 
are  particularly  pertinent  for  older  adults. 

In  summary,  to  ensure  that  older  adults  are  able  to  adapt  suc¬ 
cessfully  to  new  workplace  technologies,  it  is  important  that 
older  adults  be  provided  with  access  to  retraining  programs  and 
incentives  to  invest  in  learning  new  skills  and  abilities.  Greater  at¬ 
tention  also  needs  to  be  given  to  the  design  of  training  and  in¬ 
structional  materials  for  older  learners.  It  is  also  important  to  un¬ 
derstand  how  to  design  technology  so  that  it  is  useful  and  usable 
for  older  adult  populations,  especially  those  with  some  type  of 
impairment.  Adaptive  technologies  may  make  continued  work 
more  viable  for  older  people,  especially  those  with  some  type 
of  chronic  condition  or  disability.  For  example,  a  number  of  avail¬ 
able  technologies  can  aid  people  with  blindness  or  low  vision, 
such  as  portable  Braille  computers,  speech  synthesizers,  optical 
character  recognition,  screen  enlargement  software,  and  video 
magnifiers.  Similarly,  personal  amplifying  devices  and  amplified 
telephone  receivers  can  aid  persons  with  hearing  loss,  and  voice 
recognition  software  may  be  beneficial  for  persons  who  have  lim¬ 
ited  ability  to  use  traditional  input  devices,  such  as  a  mouse  or 
keyboard  because  of  hand  or  finger  limitations.  Clearly,  a  number 
of  technologies  can  improve  the  ability  of  older  adults  to  func¬ 
tion  in  work  environments;  however,  the  availability  of  these 
technologies  does  not  guarantee  their  success.  The  degree  to 
which  these  technologies  improve  the  work  life  of  older  persons 
depends  on  the  usability  of  these  technologies,  the  availability  of 
these  technologies  within  organizations,  the  manner  in  which 
these  technologies  are  implemented  (e.g.,  training),  and  the  will¬ 
ingness  of  older  people  to  use  these  devices. 


Home  Environments 

There  are  a  number  of  ways  older  people  can  use  computers 
at  home  to  enhance  their  independence  and  quality  of  life. 
Home  computers  can  provide  access  to  information  and  ser¬ 
vices  and  can  facilitate  the  performance  of  tasks  such  as  banking 
and  grocery  shopping.  Many  older  people  have  problems  per¬ 
forming  these  tasks  because  of  restricted  mobility,  lack  of  trans¬ 
portation,  inconvenience,  and  fear  of  crime  (Nair,  1989).  For 
example,  about  13%  of  older  people  report  restrictions  in  per¬ 
forming  activities  such  as  shopping  (Federal  Interagency  Forum 
on  Aging-Related  Statistics,  2004).  Home  computers  can  also 
be  used  to  expand  educational,  recreational,  and  communica¬ 
tion  opportunities. 

Several  studies  (e.g.,  Furlong,  1989;  Eilers,  1989;  Czaja,  Guerrier, 
Nair,  &  Landauer,  1993)  have  shown  that  older  adults  are 
receptive  to  using  e-mail  as  a  form  of  communication  and  that 
e-mail  is  effective  in  increasing  social  interaction  among  the 
elderly.  Increased  social  connectivity  can  also  be  beneficial  for 
older  people,  especially  those  who  are  isolated  or  live  alone.  In 
2003,  about  31%  of  all  older  Americans  lived  alone,  including 
41%  of  older  women  and  17%  of  older  men  (Federal  Interagency 
Forum  on  Aging-Related  Statistics,  2004).  A  recent  study 
(Cody,  Dunn,  Hoppin,  &  Wendt,  1999)  found  that  older  adults 
who  learned  to  use  the  Internet  had  more  positive  attitudes  to¬ 
ward  aging,  higher  levels  of  perceived  social  support,  and 
higher  levels  of  connectivity  with  friends  and  relatives. 

The  Internet  can  also  be  used  by  older  people  for  continuing 
education.  There  are  websites  and  software  programs  available 
on  a  wide  variety  of  topics.  In  fact,  e-learning,  which  refers  to 
learning  delivered  or  enhanced  by  electronic  technology,  is  be¬ 
coming  one  of  the  most  popular  forms  of  training  within  indus¬ 
try  and  in  the  education  industry  for  the  lifelong  learner  (Willis, 
2004).  There  are  also  formal  online  degree  programs  and  op¬ 
portunities  to  be  linked  via  videoconferencing  and  networking 
facilities  to  actual  classrooms.  The  American  Association  of 
Retired  Persons  (AARP)  has  begun  to  offer  several  online  courses. 
These  learning  opportunities  can  enable  older  adults  to  remain 
intellectually  engaged  and  active,  especially  those  who  have 
difficulty  accessing  more  traditional  classroom-based  adult  education 
programs.  Research  (e.g.,  Baltes  &  Smith,  1999)  clearly  shows 
that  cognitive  engagement  and  stimulation  are  important  to 
successful  aging.  In  fact,  lifelong  learning  is  a  growing  interest 
among  older  people.  Currently,  in  the  United  States,  more  than 
33  million  adults  aged  45  and  older  are  engaged  in  some  form 
of  continuing  education  (Adler,  2002).  The  success  of  SeniorNet 
also  points  to  the  receptivity  of  older  people  to  use  computers 
for  activities  such  as  communication  and  continuing  education. 
SeniorNet  is  a  nonprofit  organization  whose  mission  is  to  pro¬ 
vide  people  over  the  age  of  50  with  access  to  computer  technol¬ 
ogy.  Members  learn  to  use  computers  for  communicating,  con¬ 
tinuing  education,  financial  management,  and  other  activities, 
such  as  desktop  publishing.  SeniorNet  also  has  online  courses  on 
a  variety  of  other  topics  such  as  literature  and  poetry.  Currently 
the  organization  has  over  40,000  members  and  over  240  Learning 
Centers  throughout  the  United  States  (SeniorNet,  2005). 

The  Internet  can  also  create  online  learning  communities, 
which  bring  social  interaction  to  learning  and  support  the  learn¬ 
ing  process.  An  online  community  refers  to  an  aggregation  of 
people  who  have  a  shared  goal,  interest,  need,  or  activity  and 
who  have  repeated  interactions  and  share  resources  (Preece  & 
Maloney-Krichmar,  2003).  The  imminent  availability  of  the  next 
generation  Internet  and  interactive  multimedia  programming 
will  further  expand  the  education  experiences  that  are  available 
to  individuals  and  enable  information  to  be  tailored  to  the  spe¬ 
cific  needs  and  characteristics  of  users.  This  may  be  particularly 
beneficial  to  older  adults,  who  often  learn  at  a  slower  pace  than 
younger  people  and  need  more  instructional  support.  A 
recent  pilot  study  (Stoltz-Loike,  Morrell,  &  Loike,  2005)  found 
that  e-learning  can  be  an  effective  tool  for  teaching  older  adults 
technology-  and  business-related  skills.  The  e-learning  tool 
evaluated  in  the  study  was  customized  for  older  adults.  However, 
as  noted  by  the  authors,  the  results  were  based  on  a  small  sam¬ 
ple  and  the  e-learning  methods  were  not  compared  to  other 
traditional  training  methods.  They  also  pointed  out  that  many  of 
the  e-learning  materials  available  on  the  market  assume  a  rela¬ 
tively  sophisticated  knowledge  of  technology  and  familiarity 
with  e-learning  environments.  This  may  be  disadvantageous  to 
older  people  who  have  less  knowledge  of  technology  and  less 
experience  with  computers. 

In  fact,  despite  the  significant  market  of  multimedia  user  in¬ 
terfaces  in  training,  multimedia  design  “currently  leaves  a  lot  to 
be  desired,”  and  as  with  other  products  with  which  humans  in¬ 
terface,  it  needs  to  “adopt  a  usability  engineering  approach”  (Sut¬ 
cliffe,  2002).  Currently,  there  is  little  empirical  knowledge  to  guide 
the  development  of  these  applications.  In  addition,  almost  no 
research  has  been  done  with  older  adults.  This  issue  is  especially 
compelling  given  that  multimedia  applications  place  demands  on 
cognitive  processes,  such  as  visual  search,  working  memory,  and 
selective  attention,  which  are  known  to  decline  with  age. 

Computers  may  also  be  used  to  augment  the  memory  of 
older  people  by  providing  reminders  of  appointments,  important 
dates,  and  medication  schedules.  Chute  and  Bliss  (1994)  showed 
that  computers  could  be  used  for  cognitive  rehabilitation  and 
for  the  training  and  retraining  of  cognitive  skills  such  as  memory 
and  attention.  Schwartz  and  Plude  (1995)  found  that  compact 
disc  (CD)  interactive  systems  are  an  effective  medium  for  mem¬ 
ory  training  with  the  elderly.  Computers  can  also  enhance  the 
safety  and  security  of  older  people  living  at  home.  As  noted,  a 
large  proportion  of  older  women  live  at  home  alone.  Systems  can 
be  programmed  to  monitor  home  appliances  and  electrical  and 
ventilation  systems  and  can  be  linked  to  emergency  services. 

Healthcare 

Computer  technology  also  holds  the  promise  of  improving 
healthcare  for  older  people.  Electronic  links  can  be  established 
between  healthcare  professionals  and  older  clients,  providing 
caregivers  with  easy  access  to  their  patients  and  allowing  them  to 
conduct  daily  status  checks  or  to  remind  patients  of  home 
healthcare  regimes.  Computers  may  also  be  used  for  healthcare 
assessment  and  monitoring.  Ellis,  L.  B.  M.,  Joo,  and  Gross  (1991) 
demonstrated that older people could successfully use a
microcomputer-based health risk assessment. In the future, computers
may be used to monitor a patient’s physical functioning by
measuring blood pressure, pulse rate, temperature, and so forth.
The use of computers in healthcare management offers the potential
of allowing many people who are at risk for institutionalization
to remain at home.

The  Internet  is  also  shaping  and  having  a  pronounced  im¬ 
pact  on  personal  health  behavior.  Interactive  health  communi¬ 
cation,  or  e-Health,  generally  refers  to  the  interaction  of  an 
individual with an electronic device or communication technology
(such as the Internet) to access or transmit health information
or to receive or provide guidance and support on a health-related
issue (Robinson, Eng, & Gustafson, 1998). The scope of
e-Health  applications  is  broad  but  mostly  encompasses  search¬ 
ing  for  health  information,  participating  in  support  groups  and 
consulting  with  healthcare  professionals.  Currently,  more  than 
70,000  websites  provide  health  information,  and  in  2003,  77  mil¬ 
lion  American  adults  searched  the  Internet  for  health  informa¬ 
tion  (Pew  Internet  &  American  Life,  2003).  The  majority  of  con¬ 
sumers  search  for  information  on  a  specific  disease  or  medical 
problem,  medical  treatments  or  procedures,  medications,  al¬ 
ternative  treatments  or  medicine,  and  information  on  providers 
or hospitals. Reasons for the growth in online health information
seeking include easier access by a more diverse group of
users to more powerful technologies, the development of participative
healthcare models, the growth of health information
that makes it difficult for any one physician to keep pace, cost
containment efforts that reduce physicians’ time with patients,
and rising concerns about self-care and prevention (Cline &
Hayes, 2001).

However,  the  fact  that  consumers  have  access  to  e-Health  ap¬ 
plications  has  significant  implications  for  both  patients  and 
providers.  On  the  positive  side,  access  to  health  information  can 
empower  patients  to  take  a  more  active  role  in  the  healthcare 
process.  Patient  empowerment  can  result  in  more  informed  de¬ 
cision  making,  more  tailored  treatment  decisions,  stronger  pa¬ 
tient-provider  relationships,  increased  patient  compliance,  and 
better  medical  outcomes.  On  the  negative  side,  access  to  this 
wide array of health information can overload both patients and
physicians, disrupt existing relationships, and lead to poor decision
making on the part of consumers. For example, one major
concern within the e-Health arena is the lack of quality
control  mechanisms  for  health  information  on  the  Internet. 
Currently,  consumers  can  access  information  from  credible 
scientific and institutional sources (e.g., Medline Plus) and
unreviewed sources of unknown quality. Inaccurate health information
could result in inappropriate treatment or cause delays
in seeking healthcare. In fact, a recent study (El-Attar, Gray,
Nair,  Ownby,  &  Czaja,  2005)  found  that  older  adults  trust  health 
information  on  the  Internet  and  generally  find  the  Internet  to 
be  a  valuable  source  of  health  information.  However,  they  also 
indicated  that  they  found  health  websites  somewhat  difficult  to 
use. Other concerns relate to the ability of nonspecialists to
integrate and interpret the wealth of information that is available
and  the  ability  of  healthcare  providers  to  keep  pace  with  their 
patients.  Physicians  increasingly  report  that  patients  come  to  of¬ 
fice  visits  armed  with  information  on  their  illness  or  conditions 
and  treatment  options  (Ferguson,  1998).  Results  from  an  Inter¬ 
net  user  survey  (Pew  Internet  &  American  Life  Project,  2000) 
indicated  that  access  to  Internet  health  information  has  an  in¬ 
fluence  on  consumer  decisions  about  seeking  care,  treatment 
choices, and their interactions with physicians. Finally, some
consumers may find health information difficult to access because of
design features that result in usability problems, lack of training, or
limited access to technology.

Computer  technology  may  also  be  beneficial  for  family  care¬ 
givers  who  are  providing  care  for  an  older  person  with  a  chronic 
illness  or  disease  such  as  dementia.  Generally,  the  prevalence 
of  chronic  conditions  or  illnesses,  such  as  dementia,  diabetes, 
heart  disease,  or  stroke,  increases  with  age,  and  consequently, 
older  adults  (especially  the  “oldest  old”)  are  more  likely  to  need 
some form of care or assistance. Approximately 7 million people
aged 65 years and older have mobility or self-care limitations,
and about 4.5 million Americans suffer from Alzheimer’s
disease (AD). Family members are the primary and preferred
source  of  help  for  elders.  Currently,  at  least  22  million  Americans 
are  providing  care  for  an  adult  who  is  ill  or  disabled  (Family 
Caregiver  Alliance,  2005). 

Current  information  technologies  offer  the  potential  of  pro¬ 
viding  support  and  delivering  services  to  caregivers  and  other 
family  members.  Computer  networks  can  link  caregivers  to  each 
other,  healthcare  professionals,  community  services,  and  edu¬ 
cational  programs.  Information  technology  can  also  enhance  a 
caregiver’s  ability  to  access  health-related  information  or  infor¬ 
mation  regarding  community  resources.  Gallienne,  Moore  and 
Brennan  (1993)  found  that  access  to  a  computer  network, 
“ComputerLink”,  increased  the  amount  of  psychological  sup¬ 
port  provided  by  nurses  to  a  group  of  homebound  caregivers 
of  AD  patients  and  enabled  caregivers  to  access  a  support  net¬ 
work  that  enabled  them  to  share  experiences,  foster  new  friend¬ 
ships,  and  gather  information  on  the  symptoms  of  AD.  Tech¬ 
nology  can  also  aid  caregivers’  abilities  to  manage  their  own 
healthcare needs as well as those of their patients by giving
them access to information about medical problems, treatments,
and  prevention  strategies.  Software  is  available  on  several 
health-related  topics  such  as  stress  management,  caregiving 
strategies, and nutrition. For example, the Alzheimer’s Association
has a website with information for caregivers, as does
the Family Caregiver Alliance.

The  Miami  site  of  the  REACH  (Resources  for  Enhancing 
Alzheimer’s  Caregiver  Health)  program  evaluated  a  family- 
therapy  intervention  augmented  by  a  computer-telephone 
system  (CTIS)  for  family  caregivers  of  Alzheimer’s  patients. 
The  intent  of  the  CTIS  system  was  to  enhance  the  family  ther¬ 
apy  intervention  by  facilitating  the  caregivers’  ability  to  access 
formal and informal support services. The system
enabled  the  caregivers  to  communicate  with  therapists,  fam¬ 
ily,  and  friends;  to  participate  in  online  support  groups;  to 
send  and  receive  messages;  and  to  access  information  data¬ 
bases  such  as  the  Alzheimer’s  Association  Resource  Guide.  A 
respite  function  was  also  provided.  In  addition,  the  CTIS  sys¬ 
tem  provided  therapists  with  enhanced  access  to  both  the 
caregivers  and  their  family  members.  The  experience  with  the 
system  was  very  positive  with  high  acceptance  of  the  system 
by caregivers. The majority of caregivers liked the system and
found it valuable and easy to use. The most common reason
that  caregivers  used  the  system  was  to  communicate  with 
other  family  members,  especially  those  who  did  not  live 
nearby.  The  data  also  indicated  that  the  system  facilitated 
communication with other caregivers. Most caregivers reported
that they found participation in the online support
groups to be very valuable (Czaja & Rubert, 2002). Finally, the
data showed that caregivers who received the CTIS intervention
reported fewer depressive symptoms, especially those who
were Cuban American (Eisdorfer et al., 2003).

Clearly,  computer  technology  holds  the  promise  of  improv¬ 
ing  the  quality  of  life  for  older  adults  and  their  families.  How¬ 
ever,  for  the  full  potential  of  technology  to  be  realized  for  these 
populations,  the  needs  and  abilities  of  older  adults  must  be  con¬ 
sidered  in  system  design.  As  will  be  demonstrated  in  this  chap¬ 
ter,  older  adults  generally  find  technologies  such  as  computers 
to  be  valuable  and  are  receptive  to  using  this  type  of  technology. 
However, available data (e.g., Mead, Spaulding, Sit, Meyer, &
Walker, 1997; Czaja, Sharit, Ownby, Roth, & Nair, 2001) also
indicate that, although older people are generally willing and
able  to  use  computers,  they  typically  have  more  problems  learn¬ 
ing  to  use  and  operate  computer  technology  than  younger 
adults.  They  also  have  less  knowledge  about  potential  uses  of 
computers  and  about  how  to  access  computers  and  other  forms 
of  technology.  Morrell,  Mayhorn,  and  Bennett  (2000)  found  that 
the  two  primary  predictors  for  not  using  the  Internet  among 
people  aged  60  years  and  older  were  lack  of  access  to  a  com¬ 
puter  and  lack  of  knowledge  about  the  Internet.  Other  barriers 
to  computer  and  Internet  access  include  cost,  lack  of  technical 
support,  and  usability  problems.  A  recent  study  examining  web 
usability  among  people  aged  65  and  older  found  that  standard 
websites  are  twice  as  difficult  to  use  for  seniors  as  compared  to 
younger  adults  (Nielsen  Norman  Group,  2005).  People  over  the 
age  of  65  also  tend  to  report  lower  confidence  than  younger 
people in their abilities to learn to use computers (American Association
of Retired People, 2002). Before the full benefits of
computer technology can be realized for older people, it is important
to maximize the usefulness and usability of these technologies
for this population. The following section will review
characteristics  of  older  adults  that  have  relevance  to  the  design 
of  computer-based  systems. 


UNDERSTANDING  THE  OLDER 
COMPUTER  USER 


Who  Are  Today's  Elderly? 

In  general,  older  Americans  today  are  healthier,  more  diverse, 
and  more  educated  than  previous  generations  (Bass,  1995).  Be¬ 
tween  1970  and  2000,  the  percentage  of  adults  aged  65  and 
older  who  had  completed  high  school  increased  by  about  40%, 
and in 2000, at least 16% of people in this age group had
a bachelor’s degree or higher. As noted, higher levels of education are
typically  associated  with  technology  adoption,  and  people  who 
are  more  educated  are  more  likely  to  use  computers  and  the 
Internet. 

On  some  indices,  today’s  older  adults  are  healthier  than  pre¬ 
vious  generations.  The  number  of  people  65  and  older  report¬ 
ing  very  good  health  and  experiencing  good  physical  function¬ 
ing,  such  as  ability  to  walk  a  mile  or  climb  stairs,  has  increased  in 
recent years. Disability rates among older people are also declining
(Federal Interagency Forum on Aging-Related Statistics,
2004). However, the likelihood of developing a disability increases
with age, and many older people have at least one
chronic  condition  such  as  arthritis  or  hearing  and  vision  impair¬ 
ments  (Fig.  39.5). 

Disability  among  older  adults  has  important  implications  for 
system  design.  People  with  disabilities,  especially  disabled  el¬ 
ders  and  minorities  with  disabilities,  are  less  likely  to  use  tech¬ 
nology  such  as  computers  both  at  home  and  at  work  (Kaye, 
2000; Fig. 39.6). In fact, recent data indicate that Americans of all
ages who have a disability have among the lowest rates of Internet
access (Pew Internet & American Life Report, 2004).

Consistent with demographic changes in the U.S. population
as a whole, the older population is becoming more ethnically
diverse. The greatest growth will be seen among Hispanic persons,
followed by non-Hispanic Blacks. Currently, individuals
from ethnic minority groups are less likely to own or use technologies
such as computers. This implies that technology access
and training programs need to be targeted for older minority
populations. Finally, there are more elderly women than men,
and the proportion of the population that is female increases
with age. In sum, health status, race, educational background,
cultural traditions, and economic circumstances may all influence
the adoption and use of computer-based technologies.
Thus, system designers need to understand the heterogeneity
of the older adult population and ensure that usability testing
is done with truly representative user groups.

FIGURE 39.5. Chronic illness and older adults. (Source: Clinical Geriatrics, 1999).

FIGURE 39.6. Computer use experience by disability status.

Aging  and  Abilities 

Several  age-related  changes  in  functional  abilities  have  relevance 
to  the  design  of  computer  systems.  These  include  changes  in 
sensory/perceptual  processes,  motor  abilities,  response  speed, 
and  cognitive  processes.  In  this  chapter,  a  brief  review  of  aging 
is provided as a framework for understanding the potential implications
of the aging process for system design. It is important
to recognize that there are substantial individual differences
in  rate  and  degree  of  functional  change.  Within  any  age  group, 
young  or  old,  there  is  significant  variability  in  range  of  abilities, 
and  this  variability  tends  to  increase  with  age. 

Sensory  Processes 

A  number  of  changes  in  visual  abilities  have  relevance  to  the 
design  of  computer  systems.  Currently,  about  14  million  peo¬ 
ple  in  the  United  States  suffer  from  some  type  of  visual  impair¬ 
ment,  and  as  shown  in  Fig.  39.4,  the  incidence  of  visual  impair¬ 
ment  increases  with  age.  Generally,  with  increased  age,  there  is 
a loss of static and dynamic visual acuity. Older adults also experience
a reduction in the range of accommodation, a loss of
contrast sensitivity, decreases in dark adaptation, declines in
color  sensitivity  (especially  in  the  blue  region),  and  heightened 
susceptibility  to  problems  with  glare.  Visual  search  skills  and  the 
ability  to  detect  targets  against  a  background  also  decline  with 
age  (Kline  &  Schieber,  1985). 

Although  most  older  people  will  not  experience  severe  visual 
impairments, they may experience declines in eyesight sufficient
to make it more difficult to perceive and comprehend visual
information. This has vast implications for the design of
computer systems given that computer communication is primarily
based  on  visually  presented  text.  Charness,  Schumann,  and 
Boritz  (1992)  reported  that  the  majority  of  older  participants, 
in  their  studies  of  word  processing,  experienced  some  difficulty 
reading  the  screen  and  reported  that  these  difficulties  may  have 
contributed  to  the  lower  performance  of  older  people.  In  a  more 
recent  study,  Charness  and  Holley  (2001)  found  that  older  peo¬ 
ple  had  more  difficulty  than  younger  adults  selecting  small  targets 
on  a  computer  screen,  especially  when  they  were  using  a  mouse 
as  opposed  to  a  light  pen.  Visual  decrements  may  make  it  more 
difficult  for  older  people  to  perceive  small  icons  on  toolbars,  read 
e-mail,  or  locate  information  on  complex  screens  or  websites. 
Age-related  changes  in  vision  also  have  implications  for  the  design 
of written instructions and computer manuals (Fisk et al., 2004).

Aging  is  also  associated  with  declines  in  auditory  acuity.  Many 
older adults experience some decline in auditory function. Age-associated
losses in hearing include a loss of sensitivity for pure
tones, especially for high-frequency tones; difficulty understanding
speech, especially if the speech is distorted; problems
localizing sounds; problems in binaural listening; and increased
sensitivity to loudness (Schieber, Fozard, Gordon-Salant, & Weiffenbach,
1991). These changes in audition are also relevant to the
design of computer systems. Older people may also find it difficult
to understand synthetic speech, as this type of speech is typically
characterized by some degree of distortion. Multimedia
systems  may  also  be  problematic  for  older  adults.  High  fre¬ 
quency  alerting  sounds  such  as  beeps  or  pings  may  also  be  dif¬ 
ficult  for  older  adults  to  detect. 

Motor  Skills 

Aging  is  also  associated  with  changes  in  motor  skills  including 
slower  response  times,  declines  in  ability  to  maintain  continu¬ 
ous  movements,  disruptions  in  coordination,  loss  of  flexibility, 
and  greater  variability  in  movement  (Rogers  &  Fisk,  2000).  The 
incidence  of  chronic  conditions  such  as  arthritis  also  increases 
with  age  (Fig.  39.4).  These  changes  in  motor  skills  have  direct 
relevance  to  the  ability  of  older  people  to  use  current  input  de¬ 
vices  such  as  a  mouse  or  keyboard.  For  example,  various  aspects 
of  mouse  control  such  as  moving,  clicking,  fine  positioning,  and 
dragging  are  likely  to  be  difficult  for  older  people. 

A  study  by  Smith,  Sharit,  and  Czaja  (1999)  also  examined  age 
differences in the performance of basic computer-mouse control
techniques (pointing, clicking, double-clicking, and dragging).
The data indicated that the older participants had more
difficulty performing the tasks than the younger participants,
especially the complex tasks such as double-clicking. Furthermore,
age-related changes in psychomotor abilities such as
manual  dexterity  were  related  to  performance  differences. 

Other  studies  that  have  examined  age  differences  in  mouse 
performance have found similar results. For example, Riviere
and  Thakor  (1996)  found  that  older  adults  were  less  successful 
in  performing  tracking  tasks  with  a  mouse  than  were  younger 
adults.  Increased  task  difficulty  resulted  in  greater  age  differ¬ 
ences.  Walker,  Millians,  and  Worden  (1996)  compared  older  and 
younger  people  on  a  basic  target  acquisition  task  and  found  that 
older people had more difficulty, especially with smaller and
more  distant  targets.  Charness,  Bosman,  and  Elliott  (1995)  com¬ 
pared  people  of  different  ages  on  a  target  acquisition  task  us¬ 
ing  mice  and  keyboard  control  in  one  study  and  target  acquisi¬ 
tion  and  scroll-bar  dragging  tasks  using  mice  and  light  pens  in 
another  study.  Although  the  performance  of  the  older  people 
improved  with  practice,  they  performed  more  slowly  in  all 
cases,  especially  with  the  mouse.  Findings  from  these  studies 
suggest  that  alternative  input  devices  might  be  beneficial  for 
older  people.  In  fact,  recent  data  (Charness  &  Holley,  2001)  sug¬ 
gested  that  for  target  selection  tasks  age  differences  in  perfor¬ 
mance  might  be  minimized  with  the  use  of  alternative  input  de¬ 
vices  such  as  a  light  pen.  In  this  regard,  Murata  and  Iwase  (2005) 
compared  target-pointing  times  among  younger,  middle-aged, 
and  older  adults  using  a  mouse  and  a  touch  panel.  They  found 
that  pointing  time  was  longer  for  the  older  age  group  when  us¬ 
ing  a  mouse  but  that  there  were  no  significant  age-related  dif¬ 
ferences  in  pointing  time  with  the  touch  panel.  Based  on  these 
results, they recommend that for pointing tasks, a touch panel
interface should be used for middle-aged and older adults and
provide design guidelines for touch panel interfaces.

Cognitive  Abilities 

Age-related  changes  in  cognition  also  have  relevance  to  the  per¬ 
formance  of  computer-based  tasks.  It  is  well  established  that 
component  behaviors  comprising  cognition  decline  with  age. 
Processes  that  decline  include  attentional  processes,  working 
memory,  discourse  comprehension,  problem  solving  and  rea¬ 
soning,  inference  formation  and  interpretation,  and  encoding 
and  retrieval  in  memory  (Park,  1992).  Aging  is  also  associated 
with  declines  in  information  processing  speed.  Older  people 
tend  to  take  longer  to  process  incoming  information  and  typi¬ 
cally  require  a  longer  time  to  respond. 

Generally, computer tasks are characterized by cognitive demands.
For example, component abilities such as psychomotor
speed and attention are important predictors of performance of
data entry tasks and menu-based tasks (Czaja & Sharit, 1998a;
Sharit, Czaja, Nair, & Lee, 2003), and skills such as spatial memory
and motor skills are important to word-processing performance.
Similarly, searching for information on the Internet is a
complex  cognitive  task  and  involves  cognitive  skills  such  as 
memory,  reasoning,  attention,  learning,  and  problem  solving. 
Recent data also indicate that cognitive abilities are related
to  technology  adoption  such  that  people  with  higher  fluid  in¬ 
telligence  are  more  likely  to  adopt  new  technologies  such  as 
computers  and  the  Internet  (Czaja  et  al.,  in  press).  Given  that 
there  are  age-related  declines  in  these  component  abilities,  use 
of  current  information  technologies  such  as  the  Internet  is  likely 
to  be  challenging  for  older  adults.  Morrell  and  Echt  (1996) 
posited  that  age-related  declines  in  cognitive  abilities  such  as 
working  memory,  processing  speed,  and  text  comprehension 
are  influential  in  age-related  differences  in  the  acquisition  of 
computer  skills.  They  recommended  that  software,  instruc¬ 
tional  materials,  and  training  protocols  be  designed  to  reduce 
demands  on  these  cognitive  mechanisms. 

Although  the  current  technologies  may  present  challenges 
for  older  adults,  the  literature  generally  suggests  that  older  peo¬ 
ple  are  receptive  to  using  new  technologies.  The  following  sec¬ 
tion  will  summarize  the  literature  on  aging  and  attitudes  toward 
computers. 


ACCEPTANCE  OF  INFORMATION 
TECHNOLOGIES  BY  OLDER  ADULTS 


One  issue  that  warrants  discussion  when  considering  age  and 
information  technology  is  the  degree  to  which  older  people  are 
willing  to  interact  with  these  types  of  systems  and  the  factors 
that  influence  technology  adoption.  A  commonly  held  belief  is 
that  older  people  are  resistant  to  change  and  unwilling  to  in¬ 
teract  with  high-tech  products.  However,  the  available  data 
largely dispute this stereotype. The majority of studies that
have  examined  the  attitudes  of  older  people  toward  computer 
technology  indicate  that  older  people  are  receptive  to  using 
computers. However, older people do report more computer
anxiety, less computer self-efficacy, and less comfort using computers
than younger adults (Nair, Lee, & Czaja, 2005; Czaja et al.,
in  press).  Furthermore,  computer  anxiety  and  computer  self- 
efficacy  are  important  predictors  of  technology  adoption  (Ellis, 
E.  R.,  &  Allaire,  1999;  Czaja  et  al.,  in  press).  However,  the  data 
also  indicated  that  attitudes  toward  technology  and  comfort  us¬ 
ing  technology  are  influenced  by  experience  and  the  nature  of 
interactions  with  computer  systems  as  well  as  system  design. 

Several studies (e.g., Jay & Willis, 1992; Charness, Schumann,
&  Boritz,  1992;  Dyck  &  Smither,  1994)  showed  that  people  who 
had  experience  with  computers  had  more  positive  attitudes  and 
greater  computer  confidence.  Other  studies  showed  that  the 
nature  of  the  experience  with  technology  influences  attitudes 
toward  that  technology.  For  example,  in  a  study  examining  age 
differences in acquisition of word-processing skills (Czaja, Hammond,
Blascovich, & Swede, 1986), the results showed that posttraining
attitudes were related to the training experience such
that  people  who  rated  the  training  experience  and  their  own 
performances  positively  also  had  more  positive  posttraining  at¬ 
titudes  toward  computers.  Furthermore,  there  were  no  age  ef¬ 
fects  for  either  pretraining  or  posttraining  computer  attitudes. 
Danowski  and  Saks  (1980)  also  found  that  a  positive  experience 
with  computer-mediated  communication  resulted  in  positive  at¬ 
titudes  toward  computers  among  a  sample  of  older  people.  A 
more recent study (Czaja & Sharit, 1998b) examined age differences
in attitudes toward computers as a function of computer experience
among  a  community  sample  of  384  adults  ranging  in  age  from 
20  to  75  years.  The  results  indicated  that,  in  general,  older  peo¬ 
ple  perceived  less  comfort  and  efficacy  using  computers  than 
younger  people.  However,  experience  with  computers  resulted 
in  more  positive  attitudes  for  all  participants  irrespective  of  age. 
Kalasky, Czaja, Sharit, and Nair (1999) also found that experience
with  computer  systems  increased  the  comfort  of  older  people 
toward  using  computers. 

Generally,  user  satisfaction  with  a  system,  which  is  deter¬ 
mined  by  how  pleasant  the  system  is  to  use,  is  considered  an  im¬ 
portant  aspect  of  usability.  User  satisfaction  is  an  especially 
important  usability  aspect  for  systems  that  are  used  on  a  dis¬ 
cretionary  basis  such  as  home  computers  (Nielsen,  1993).  Kel¬ 
ley,  Morrell,  Park,  and  Mayhorn  (1999)  found  that  the  most 
important  predictor  of  continued  use  of  a  bulletin  board  system, 
ELDERCOMM,  was  success  at  initial  training.  They  also  found 
that  the  most  positive  attitudes  toward  computers  were  found 
among  persons  who  used  the  system  most  often.  Cody  et  al. 
(1999),  in  their  study  of  Internet  use  among  older  adults,  also 
found  that  people  who  spent  the  most  time  online  had  the  least 
computer anxiety and high computer efficacy. Finally, in our
study  of  e-mail  (Czaja,  Guerrier,  et  al.,  1993),  all  participants 
found  it  valuable  to  have  a  computer  in  their  home.  However, 
the  perceived  usefulness  of  the  technology  was  an  important 
factor  with  respect  to  use  of  the  technology.  When  the  partici¬ 
pants  were  asked  what  type  of  computer  applications  they 
would  like  available,  the  most  common  requests  included  emer¬ 
gency  response,  continuing  education,  health  information,  and 
banking/shopping. 

The available data suggest that older people are not technophobic
and are willing to use computers. However, the nature of
their experience with technology and available technology applications
are important determinants of attitudes, confidence,
and comfort using technology. Ultimately, confidence and comfort
influence technology adoption (Czaja et al., in press).


OLDER  ADULTS  AND  THE  ACQUISITION 
OF  COMPUTER  SKILLS 


A  number  of  studies  examined  the  ability  of  older  adults  to 
learn to use technology such as computers (e.g., Elias, P. K.,
Elias, M. F., Robbins, & Gage, 1987; Gist, Rosen, & Schwoerer,
1988; Zandri & Charness, 1989; Czaja, Hammond, Blascovich,
et  al.,  1986;  Czaja,  Hammond,  &  Joyce,  1989;  Charness  et  al., 
1992;  Morrell,  Park,  Mayhorn,  &  Echt,  1995;  Mead  et  al.,  1997). 
These  studies  encompass  a  variety  of  computer  applications 
and  vary  with  respect  to  training  strategies  such  as  concep¬ 
tual  versus  procedural  training  (Morrell,  Park,  et  al.,  1995)  or 
active  versus  passive  learning  approaches  (Czaja,  Hammond, 
&  Joyce,  1989).  The  influence  of  other  variables,  such  as  atti¬ 
tude  toward  computers  and  computer  anxiety,  on  learning  has 
also  been  examined. 

Overall,  the  results  of  these  studies  indicate  that  older  adults 
are,  in  fact,  able  to  use  technology  such  as  computers  for  a  va¬ 
riety  of  tasks.  However,  they  are  typically  slower  to  acquire  new 
skills  than  younger  adults  and  generally  require  more  help  and 
hands-on  practice.  In  addition,  when  compared  to  younger 
adults  on  performance  measures,  older  adults  often  achieve 
lower  levels  of  performance.  However,  the  literature  also  indi¬ 
cates  that  training  interventions  can  be  successful  in  terms  of 
improving  performance,  and  it  points  to  the  importance  of 
matching  training  strategies  with  the  characteristics  of  the 
learner.  For  example,  Czaja,  Hammond,  and  Joyce  (1989)  found 
that  older  adults  benefit  from  using  analogies  to  familiar  con¬ 
cepts  and  from  a  more  active,  hands-on  training  approach.  Sim¬ 
ilarly,  Mead  et  al.  (1997)  examined  the  effects  of  type  of  train¬ 
ing  on  efficiency  in  a  World  Wide  Web  search  activity.  The 
participants  were  trained  with  a  hands-on  web  navigation  tuto¬ 
rial  or  a  verbal  description  of  available  navigation  tools.  The 
hands-on training was found to be superior, especially for older
adults.  Older  adults  who  received  hands-on  training  increased 
the  use  of  efficient  navigation  tools.  Mead  and  Fisk  (1998)  ex¬ 
amined  the  impact  of  the  type  of  information  presented  during 
training  on  the  initial  and  retention  performance  of  younger 
and  older  adults  learning  to  use  ATM  technology.  Specifically, 
they  compared  two  types  of  training:  concept  and  action.  The 
concept  training  presented  factual  information  whereas  the  ac¬ 
tion  training  was  procedural  in  nature.  The  action  training  was 
found  to  be  superior  for  older  adults.  Generally,  the  literature 
suggests  that  these  types  of  strategies  are  also  beneficial  for 
younger  people.  The  literature  (e.g.,  Czaja,  Sharit,  Ownby  et  al., 
2001;  Charness,  Kelley,  Bosman,  &  Mottram,  2001)  also  suggests 
that  prior  experience  with  technology  is  an  important  predic¬ 
tor  of  ability  to  learn  to  use  new  technology.  Finally,  it  is  impor¬ 
tant  to  provide  older  people  with  training  on  the  potential  uses 
of  the  technical  system  (e.g.,  what  the  Internet  can  be  used  for), 
as  well  as  training  on  basic  procedural  operations  (e.g.,  use  of 
the  mouse).  As  one  would  expect,  the  usability  of  the  system 
from  both  a  hardware  and  a  software  perspective  is  also  impor¬ 
tant  (Fisk  et  al.,  2004). 


Mayhorn,  Stronge,  McLaughlin,  and  Rogers  (2004)  provided 
suggestions  for  development  of  effective  computer  training  for 
older  adults.  They  stressed  the  importance  of  applying  a  sys¬ 
tems  approach  to  the  design  of  training  programs  where  the 
goals,  abilities,  and  experience  levels  of  older  adults  are  consid¬ 
ered  in  the  design  and  evaluation  of  instructional  programs  and 
materials.  Also,  given  the  important  role  of  anxiety  and  self-effi¬ 
cacy  in  technology  adoption  (Czaja  et  al.,  in  press),  it  is  impor¬ 
tant  that  training  environments  are  relaxed  and  strategies  that 
reduce  anxiety  and  increase  self-efficacy  are  incorporated  into 
training  programs. 


AGING  AND  COMPUTER  TASK  PERFORMANCE 


In  terms  of  actual  performance,  only  a  handful  of  studies  have 
examined  the  ability  of  older  people  to  perform  computer- 
based  tasks  that  are  common  in  work  settings.  For  example, 
Czaja and Sharit (1993, 1998a, 1998b), Sharit and Czaja (1999), and
Czaja et al. (1998) conducted a series of studies examining age-related
performance differences on a variety of simulated computer-based
tasks (e.g., data entry, inventory management, customer
service). Overall, the results of these studies indicate that older
adults  are  willing  and  able  to  perform  these  types  of  tasks.  How¬ 
ever,  the  younger  adults  generally  performed  at  higher  levels 
than  the  older  people  did.  Importantly,  the  data  also  indicated 
that  there  was  considerable  variability  in  performance  among 
the  older  people  and  that  with  task  experience  those  in  their 
middle  years  (40  to  59  years  old)  performed  at  roughly  the  same 
levels  as  the  young  adults.  In  fact,  task  experience  resulted  in 
performance  improvements  for  people  of  all  ages.  The  results 
also  indicated  that  interventions  such  as  redesigning  the  screen, 
providing  on-screen  aids  and  reconfiguring  the  timing  of  the 
computer  mouse  improved  the  performance  of  all  participants. 

Other  investigators  examined  age  as  a  potential  factor  im¬ 
pacting  on  ability  to  use  the  computer  for  information  search 
and  retrieval.  This  is  an  important  area  of  investigation,  given 
that  this  is  one  of  the  most  common  reasons  people  use  tech¬ 
nology  such  as  computers.  For  example,  this  type  of  activity  is 
also  central  to  use  of  the  Internet.  Also,  in  many  work  settings 
such  as  department  stores,  airlines,  hotels,  utility  and  health  in¬ 
surance  companies,  and  educational  institutions,  workers  are 
required  to  search  through  computer  databases  and  access  in¬ 
formation  to  respond  to  customer  requests.  Generally,  the  find¬ 
ings  from  these  studies  indicated  that,  while  older  adults  are 
capable  of  performing  these  types  of  tasks,  there  are  age-related 
differences  in  performance.  Furthermore,  these  differences  ap¬ 
pear  to  relate  to  age  differences  in  cognitive  abilities. 

For  example,  Westerman,  Davies,  Glendon,  Stammers,  and 
Matthews  (1995)  examined  the  relationship  between  spatial 
ability,  spatial  memory,  vocabulary  skills,  and  age,  and  the  abil¬ 
ity  to  retrieve  information  from  a  computer  database  which  var¬ 
ied  according  to  how  the  database  was  structured  (e.g., 
hierarchical  vs.  linear).  In  general,  they  found  that  the  older 
subjects  were  slower  in  retrieving  the  information  than  the 
younger  adults;  however,  there  were  no  age  related  differences 
in  accuracy.  The  learning  rates  also  differed  for  the  two  groups 
such that the older people were slower than the younger people
were. They found that the slower response on the part of
the older adults was dependent on general processing speed.

Freudenthal  (1997)  examined  the  degree  to  which  latencies 
on  an  information  retrieval  task  were  predicted  by  movement 
speed  and  other  cognitive  variables  in  a  group  of  younger  and 
of  older  adults.  The  participants  were  required  to  search  for  an¬ 
swers  to  questions  in  a  hierarchical  menu  structure.  Results  in¬ 
dicated  that  the  older  subjects  were  slower  than  the  younger 
subjects  were  on  overall  latencies  for  information  retrieval  and 
that  this  slowing  increased  with  each  consecutive  step  in  the 
menu.  Similar  to  Westerman  et  al.  (1995),  movement  speed  was 
a  significant  predictor  of  overall  latency.  Other  cognitive  abili¬ 
ties,  such  as  reasoning  speed,  spatial  ability,  and  memory,  were 
also  predictive  of  response  latencies.  However,  memory  and 
spatial  abilities  predicted  only  latency  on  steps  further  into  the 
menu  structure.  Freudenthal  (1997)  suggested  that  deep  menu 
structures  may  not  be  appropriate  for  older  adults  as  navigation 
through  these  types  of  structures  is  dependent  on  spatial  skills 
that  tend  to  decline  with  age.  Vicente,  Hayes,  and  Williges 
(1987)  also  found  that  age,  spatial  ability,  and  vocabulary  were 
highly  predictive  of  variance  in  search  latency  for  a  computer- 
based  information  retrieval  task.  They  postulated  that  people 
with  low  spatial  ability  tend  to  “get  lost”  in  the  database.  Mead, 
Sit,  Jamieson,  Rousseau,  and  Rogers  (1996)  examined  the  abil¬ 
ity  of  younger  and  older  adults  to  use  an  online  library  data¬ 
base.  Overall,  the  younger  adults  achieved  more  success  than 
did  the  older  adults  in  performing  the  searches.  They  also  used 
more  efficient  search  strategies.  The  older  adults  also  made 
more  errors  when  formulating  search  queries  and  had  more 
difficulty  recovering  from  these  errors.  Czaja,  Sharit,  Ownby 
et  al.  (2001),  in  their  investigation  of  this  type  of  task,  which 
simulated  a  customer  service  representative  for  a  health  insur¬ 
ance  company,  also  found  that  spatial  skills  were  important  pre¬ 
dictors  of  performance  as  were  response  speed  and  prior  com¬ 
puter  experience.  Kubeck,  Miller-Albrecht,  and  Murphy  (1999) 
investigated  age  differences  in  finding  information  using  the  In¬ 
ternet  in  a  naturalistic  setting.  They  found  that  older  people 
tended  to  use  less  efficient  search  strategies  than  younger  peo¬ 
ple  and  were  less  likely  to  find  correct  answers.  However,  they 
also  found  that,  with  training,  older  adults  were  successful  in 
their  searches  and  had  very  positive  reactions  to  their  Internet 
experiences. 

As discussed, information seeking is a complex process and
places demands on cognitive abilities such as working memory,
spatial memory, reasoning, and problem solving. Information
seeking within electronic environments also requires special
skills such as knowledge related to the search system. Given
that older adults typically experience declines in cognitive abilities,
such as working memory, and that they are less likely than
younger people to have knowledge of the structure and organization
of search systems, a relevant question is the degree to
which they will experience difficulty searching for information
in electronic environments. Generally, the available literature
suggests that older adults are able to search and retrieve information
within electronic environments. However, they appear
to have more difficulty than do younger adults and tend to use
less efficient navigation strategies. They also appear to have problems
remembering where and what they searched. In order to
maximize the ability of older people to successfully interact with
electronic information systems, such as the Internet, and to
have access to the information highway, we need to have an understanding
of the source of age-related difficulties. This type of
information will allow us to develop interface design and training
strategies to accommodate individual differences in performance.
Currently, there is very little information on problems
experienced by older people when attempting to learn and navigate
the Internet, especially in real-world contexts.


DESIGNING  COMPUTER  SYSTEMS  TO 
ACCOMMODATE  OLDER  ADULTS 


Hardware  Considerations 

As  discussed,  age-related  changes  in  functioning  have  implications 
for  the  design  of  technology  systems.  For  example,  careful  atten¬ 
tion  needs  to  be  paid  to  the  design  of  display  screens,  placement, 
size,  shape  and  labeling  of  controls,  and  design  and  layout  of  in¬ 
structional  materials  and  manuals.  Design  features  such  as  char¬ 
acter  size  and  contrast  are  especially  important  for  older  computer 
users.  Larger  characters  and  high  contrast  displays  are  generally 
beneficial  for  older  people.  This  may  not  be  a  major  problem  with 
most computers used in the home or the workplace, as it is relatively
easy to enlarge screen characters. However, it may be an issue
for computers in public places such as information kiosks and
ATMs. In addition to character size and contrast, it is also
important  to  minimize  the  presence  of  screen  glare. 

The  organization  and  amount  of  information  on  a  screen  is 
important,  as  there  are  declines  in  visual  search  skills  and  se¬ 
lective  attention  with  age.  Only  necessary  information  should  be 
presented  on  a  screen  and  important  information  should  be 
highlighted. Further, principles of perceptual organization, such
as  grouping,  should  be  applied.  Caution  must  also  be  exercised 
with  respect  to  the  use  of  color  coding,  as  there  are  declines  in 
color  discrimination  abilities  with  age.  Finally,  as  far  as  possible, 
information  should  be  in  a  consistent  location  and  important  in¬ 
formation  should  stand  out  and  be  in  central  locations.  A  flaw 
with  many  existing  web  pages  is  that  advertising  information  has 
more  prominence  than  the  site  information  (e.g.,  pharmaceuti¬ 
cal  advertisements  on  health  websites). 

Design  and  labeling  of  controls  and  input  devices  need  spe¬ 
cial  consideration.  In  our  study  of  e-mail  (Czaja,  Guerrier,  et  al., 
1993)  people  commonly  confused  the  send  and  the  cancel  keys. 
Even  though  these  keys  were  labeled,  they  were  identical  in  size, 
identical  in  shape,  and  close  in  proximity.  These  findings  under¬ 
score  the  importance  of  clearly  differentiating  controls  so  that  la¬ 
beling  is  easy  to  read  and  functions  are  easily  identified. 

Although  there  is  a  growing  body  of  research  examining  the 
relative  merits  and  disadvantages  of  various  input  devices,  only 
a  few  studies  have  examined  age  effects.  These  studies  generally 
suggested  that  commonly  used  input  devices,  such  as  key¬ 
boards  and  the  mouse,  might  be  problematic  for  older  people. 
More  research  needs  to  be  directed  toward  identifying  the  effi¬ 
cacy  of  alternative  input  devices  for  older  people. 

For example, Charness and Holley (2001) suggested that older
adults may find a light pen easy to use because it is a direct
addressing device, and it eliminates the need to translate
the target selection device onto the display. However, light pens
may be difficult to use for people with arthritis or tremors, as it
may be hard for them to grasp and point the pen. Ellis
et al. (1991), in their study of a computerized healthcare
system, concluded that a touchscreen version of the appraisal
system may have eliminated the interface problems experienced
by older users. Casali (1992) evaluated the ability of
persons with impaired hand and arm functions (age unknown)
and nondisabled persons to perform a target acquisition task
with five cursor control devices: a mouse, trackball, cursor keys,
joystick, and tablet. She found that even persons with profound
disabilities were able to operate each device by using minor
modifications and unique operating strategies. The mouse,
trackball, and tablet resulted in better performance than the joystick
and cursor keys for all participants. Consistent with Charness
and Holley (2001, June), Casali found that small targets
were problematic for the physically impaired users, as was the
task of dragging. In a follow-up study (Casali & Chase, 1993)
with a sample of persons with arm and hand disabilities, the
data indicated that while the mouse, trackball, and tablet tended
to result in quicker performance, these devices were more error
prone than the keyboard and joystick. In addition, performance
improved with practice.

Clearly,  much  work  needs  to  be  done  to  evaluate  which  types 
of  input  devices  are  optimal  for  older  people,  especially  for 
those  who  have  restrictions  in  hand  function.  It  appears  that 
input  devices,  such  as  mice  and  trackballs,  might  not  be  appro¬ 
priate  for  older  adults,  or  they  will  at  least  require  more  practice 
to  effectively  use  these  types  of  devices.  It  may  be  that  speech 
recognition  devices  will  eliminate  many  of  the  problems  (visual 
and  movement)  associated  with  manual  input  devices.  In  this 
regard,  Ogozalek  and  Praag  (1986)  found  that,  although  using 
a  simulated  listening  typewriter  as  compared  to  a  keyboard  ed¬ 
itor  made  no  difference  in  performance  of  a  composition  task 
for  both  younger  and  older  people,  the  voice  input  was  strongly 
preferred  by  all  participants.  More  recently,  Kalasky  et  al.  (1999) 
found  that  a  commercially  available  speech-recognition  program 
was  robust  in  accepting  the  speech  input  of  both  younger  and 
older  adults.  The  technology  was  also  acceptable  to  both  user 
groups,  and  the  data  suggested  that  the  older  participants 
found  the  system  more  useful  than  their  younger  counterparts. 
These  results  are  encouraging  and  suggest  that  speech  inter¬ 
faces  may  be  especially  beneficial  for  older  people. 

Software  Considerations 

There  has  been  very  little  research  examining  the  impact  of  the 
design  of  the  software  interface  on  the  performance  of  older 
computer  users.  Given  that  there  are  age-related  changes  in 
cognitive  processes,  such  as  working  memory  and  selective  at¬ 
tention,  it  is  highly  likely  that  interface  style  (e.g.,  function  keys 
vs.  menus)  will  have  a  significant  influence  on  the  performance 
of  older  adults.  The  limited  data  that  are  available  support  this 
conclusion. 

Joyce (1990) evaluated the ability of older people to learn a
word-processing program as a function of interface style. The
participants interacted with the program using one of three
interface styles: on-screen menu, function keys, and pull-down
menus. She found that people using the pull-down menus performed
better on the word-processing tasks. Specifically, they
performed the tasks more quickly and made fewer errors. They
also executed a greater number of successful editorial changes.
Joyce hypothesized that the pull-down menu reduced the memory
demands of the task, as the users did not have to remember
editing procedures. Although the on-screen menu provided
memory cues, the names of the menu items were not reflective
of menu contents, thus requiring the user to remember which
items were contained in which menu. The names of the pull-down
menus were indicative of menu contents.

Egan and Gomez (1985) found that a display editor was less
difficult for older adults than a line editor. The line editor was
command based and required the user to remember command
language and produce complicated command syntax. Using the
display editor, changes were made by positioning a cursor at the
location of change and using labeled function keys rather than a
command language. Thus, fewer memory demands were associated
with the display editor. Further, the display editor was
less complex than the line editor, and in accordance with the
age-complexity hypothesis, we would anticipate smaller age effects
(Cerella, Poon, & Williams, 1980).

In  our  study  of  e-mail  (Czaja,  Hammond,  &  Joyce,  1989),  we 
found  that  study  participants  sometimes  had  difficulty  remem¬ 
bering  the  address  name  required  to  access  a  particular  appli¬ 
cation.  We  also  had  to  provide  on-screen  reminders  of  basic 
commands  (e.g.,  “Press  enter  after  entering  the  address”)  even 
though  the  system  was  simple  and  had  no  complex  command 
procedures.  Similarly,  in  our  research  on  text  editing,  we  found 
that  the  older  participants  had  more  difficulty  remembering 
editing procedures than younger adults did. Likewise, in our
study  of  interactive  telephone  menu  systems  (Sharit,  Czaja,  Nair, 
et  al.,  2003),  we  found  that  older  adults  experienced  difficulties 
using  the  menu  systems  and  had  to  use  the  repeat  function 
more  frequently  than  younger  adults.  They  also  reported  frus¬ 
tration  using  the  menus. 

Charness  and  Holley  (2001)  examined  the  differential  effect 
of  a  keystroke-based  command,  menu,  and  menu  +  icon  inter¬ 
face  on  word-processing  skill  acquisition  among  a  sample  of 
younger  and  older  adults.  They  found  that  the  menu  and  the 
menu  +  icon  interface  yielded  better  performance  for  all  par¬ 
ticipants.  The  menu  conditions  provided  more  environmental 
support  and  were  associated  with  fewer  memory  demands. 

Consistent  with  the  cognitive  aging  literature,  these  results 
suggested  that  systems  that  place  minimal  demands  on  working 
memory  would  be  suitable  for  older  people.  In  addition,  mini¬ 
mal  demands  should  be  placed  on  spatial  abilities.  In  this  re¬ 
gard,  interface  styles  such  as  windows  should  prove  beneficial. 
However,  windows  may  be  problematic  for  older  adults  as  win¬ 
dows  are  spatially  organized,  and  it  is  sometimes  easy  to  get  lost 
within  a  windows  environment.  Also,  windows  may  initially  in¬ 
crease  cognitive  demand  as  users  are  required  to  learn  window 
operations.  On-screen  aids  such  as  maps  and  history  markers 
may  also  prove  to  be  beneficial  for  older  people.  We  found,  for 
example,  that  a  simple  graphical  aid  that  depicted  the  structure 
of  the  menu  system  helped  older  adults  use  interactive  tele¬ 
phone  menu  systems  (Sharit,  Czaja,  Nair,  et  al.,  2003). 
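The two design points above, limiting the demands of deep menu structures and offering a graphical aid that depicts the menu structure, can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the menu contents, the names `EXAMPLE_MENU`, `menu_depth`, and `menu_map`, and the indented-text rendering are all hypothetical and are not taken from the studies cited. It measures how many selections the deepest option requires and prints a simple map of the structure.

```python
from typing import Dict, List, Union

# Hypothetical menu for illustration only: nested dicts are submenus,
# strings are the final actions reached at the end of a path.
Menu = Dict[str, Union[str, "Menu"]]

EXAMPLE_MENU: Menu = {
    "Billing": {
        "Check balance": "balance",
        "Make a payment": {
            "By card": "pay_card",
            "By bank transfer": "pay_bank",
        },
    },
    "Claims": {
        "File a claim": "new_claim",
        "Claim status": "claim_status",
    },
    "Speak to an agent": "agent",
}


def menu_depth(menu: Menu) -> int:
    """Return the number of selections needed to reach the deepest action."""
    deepest_below = 0
    for child in menu.values():
        if isinstance(child, dict):
            deepest_below = max(deepest_below, menu_depth(child))
    return 1 + deepest_below


def menu_map(menu: Menu, indent: int = 0) -> List[str]:
    """Render the menu as indented lines, one line per option."""
    lines: List[str] = []
    for label, child in menu.items():
        lines.append("  " * indent + "- " + label)
        if isinstance(child, dict):
            lines.extend(menu_map(child, indent + 1))
    return lines


if __name__ == "__main__":
    print("Deepest path:", menu_depth(EXAMPLE_MENU), "selections")
    print("\n".join(menu_map(EXAMPLE_MENU)))
```

A designer might use a depth count of this kind as a rough flag for structures that deserve flattening, and the printed map as a starting point for an on-screen or printed aid; neither is a substitute for testing with older users.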

Clearly, a number of interface issues need to be investigated.
At the present time, we can only offer general guidelines
with respect to designing interfaces to accommodate the
needs of older users. Table 39.1 presents a summary of these
guidelines. Much more research needs to be carried out
within this area.

Other  Important  Design  Considerations 

In  addition  to  general  guidelines  for  the  design  of  training  pro¬ 
grams,  hardware,  and  software,  other  issues  need  to  be  consid¬ 
ered  to  ensure  that  the  benefits  of  technology  are  maximized  for 
older  people.  As  noted,  the  older  adult  population  is  very  hetero¬ 
geneous  in  terms  of  culture/ethnicity,  health,  education,  and  ex¬ 
perience  with  technology  so  it  is  important  to  take  into  account 
the  wide  range  of  abilities,  needs,  and  desires  of  older  users  to 
ensure  that  technology  can  adapt  to  their  individual  differences. 
For  example,  it  is  important  to  consider  different  designs  (e.g.,  lan¬ 
guages)  and  approaches  (e.g.,  advertisements)  for  different  cul¬ 
tural/ethnic  groups.  It  is  also  important  to  consider  how  to  cus¬ 
tomize  or  adjust  interfaces  and  response  devices  to  accommodate 
those  with  varying  abilities.  Privacy  issues  are  also  an  important 
concern as the use of technologies becomes more widespread for
health, financial, and service applications. The issue of trust
also  needs  consideration.  Trust  and  ultimately  acceptance  of  tech¬ 
nology  may  be  weakened  by  unreliability  or  excessive  complexity. 
There  are  also  potential  problems  with  overtrust,  which  can  be  se¬ 
rious  if  technology  fails.  Issues  of  safety  and  maintainability  must 
also  be  addressed.  Finally,  the  cost  of  systems  needs  to  be  consid¬ 
ered  not  only  in  terms  of  finances  but  also  in  terms  of  effort  re¬ 
quired  on  the  part  of  the  user  with  respect  to  technology  access 
and  learning  and  maintenance  requirements. 


TABLE  39.1.  Summary  of  Interface  Design  Guidelines 
for  Older  Adults 

1. Maximize the contrast between characters and screen background.
2. Minimize screen glare.
3. Avoid small targets and characters that are small (fonts < 12).
4. Minimize irrelevant screen information.
5. Present screen information in consistent locations (e.g., error messages).
6. Adhere to principles of perceptual organization (e.g., grouping).
7. Highlight important screen information.
8. Avoid color discriminations among colors of the same hue or in the blue-green range.
9. Clearly label keys.
10. Maximize size of icons.
11. Use icons that are easily discriminated and meaningful, and label icons if possible.
12. Provide sufficient practice on the use of input devices such as a mouse or trackball.
13. Provide sufficient practice on window operations.
14. Minimize demands on spatial memory.
15. Provide information on screen location.
16. Minimize demands on working memory.
17. Avoid complex command languages.
18. Use operating procedures that are consistent within and across applications.
19. Provide easy-to-use online aiding and support documentation.
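Guidelines 1 and 3 lend themselves to a simple automated check. The following Python sketch is offered only as an illustration under stated assumptions: the table gives no numeric contrast target, so the contrast formula and the 7:1 threshold are borrowed from the WCAG 2.x definition of contrast ratio rather than from this chapter, the font limit of 12 is assumed to be in points, and the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

MIN_FONT_POINTS = 12   # from guideline 3; the unit (points) is an assumption
MIN_CONTRAST = 7.0     # assumed target (WCAG 2.x enhanced level), not from the table


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def check_text_style(font_points: float, foreground: RGB, background: RGB) -> List[str]:
    """Return a list of guideline problems; an empty list means none were found."""
    problems: List[str] = []
    if font_points < MIN_FONT_POINTS:
        problems.append("font smaller than 12")
    if contrast_ratio(foreground, background) < MIN_CONTRAST:
        problems.append("low contrast between characters and background")
    return problems


if __name__ == "__main__":
    # Mid-gray 10-point text on white fails both checks.
    print(check_text_style(10, (119, 119, 119), (255, 255, 255)))
    # Black 14-point text on white passes.
    print(check_text_style(14, (0, 0, 0), (255, 255, 255)))
```

A check like this covers only what can be computed from a style definition; glare, grouping, and the other guidelines in the table still call for evaluation with actual users and displays.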


CONCLUSIONS 


There are many areas where older people are likely to interact
with computer technology, including the workplace, the home,
and service and healthcare settings. Current data indicate that older
adults are generally receptive to using this type of technology
but often have more difficulty than younger people acquiring
computer skills and using current computer systems. They also
report more anxiety about using technologies such as computers
and express less confidence in their ability to learn to interact
with these systems than do younger adults. This presents a
challenge for the HCI community.

For many older people, especially those who are frail, isolated,
or have some type of mobility restriction, access to technologies
such as computers and the Internet holds the promise of enhancing
independence and quality of life by providing
linkages to goods and resources, facilitating communication, and
enhancing opportunities for work and lifelong learning. Technology
can also enhance the delivery of health services to older
adults and help them access information on health-related topics.
However, before the potential of technology can be realized for older
adults, the needs, preferences, skills, and abilities of older people
need to be understood by system designers.

Although research in this area has grown, there are many
unanswered questions. For example, there are still many issues
regarding design of input devices and interface design, such as
how to best design speech recognition systems and menus and help
systems to accommodate older people. We also know little about
the efficacy of design aids and support tools for older adults. In
addition, we need more information on how to best train older
adults to learn to use new technologies, and there are many
questions regarding the design of online training programs and
multimedia formats. Issues regarding privacy and trust in technology
also represent critical areas of needed research. There are
also many unanswered questions related to the Internet, such as how
access to Internet information affects healthcare behavior
and how best to train seniors to identify and integrate the
enormous amount of information that is available on the Internet.
We also need to examine how technology
in the workplace impacts on employment opportunities
and the work performance of older people. The issue of telecommuting
has received little attention. In addition, we need
more information on factors influencing technology adoption,
especially for minority elderly or those of lower education or economic
status. There are also many questions related to quality
of life and socialization that need to be addressed.

Many  needs  of  older  people  would  be  amenable  to  techno¬ 
logical  solutions.  However,  what  is  lacking  is  a  systematic  effort 
to  understand  these  needs  and  to  incorporate  them  into  de¬ 
sign  solutions  and  the  marketplace  (National  Research  Council, 
2004). In essence, in order to design information systems so
that they are useful and usable for older people, it is important
to understand (a) why technology is difficult to use, when it is;
(b) how to design technology for easier and more effective use; and
(c) how to effectively teach people to use and take advantage
of technologies that are available. Answers to these questions
will not only serve to benefit older adults but all potential users
of technology systems.
DESIGNING FOR AND WITH CHILDREN


How  is  designing  computer  software  and  hardware  for  kids  dif¬ 
ferent  from  designing  for  adults?  Many  researchers  have  ad¬ 
dressed  questions  about  the  impact  of  technology  on  children; 
less  has  been  said  about  the  impact  children  can  have  on  the  de¬ 
sign  of  technology.  Methods  for  designing  with  and  for  children 
are  only  recently  becoming  widespread  features  of  the  design 
literature (see Jensen & Skov, 2005).

In  designing  for  children,  people  tend  to  assume  that  kids 
are  creative,  intelligent,  and  capable  of  great  things  if  they  are 
given  good  tools  and  support.  If  children  cannot  or  do  not  care 
to  use  technologies  we  have  designed,  it  is  our  failure  as  de¬ 
signers.  These  assumptions  are  constructive,  because  users  gen¬ 
erally  rise  to  designers’  expectations.  In  fact,  the  same  assump¬ 
tions  are  useful  in  designing  for  adults.  Designers  of  software  for 
children  start  at  an  advantage  because  they  tend  to  believe  in 
their  users.  However,  they  may  also  be  at  a  disadvantage  be¬ 
cause  they  no  longer  remember  the  physical  and  cognitive  dif¬ 
ferences  of  being  a  child. 

In  this  chapter,  we  will 

•  Describe  how  children’s  abilities  change  with  age,  as  it  relates 
to  HCI 

•  Discuss  how  children  differ  from  adults  cognitively  and  phys¬ 
ically,  for  those  characteristics  most  relevant  for  HCI 

•  Discuss  children  as  participants  in  the  design  process 

•  Review  recommendations  for  usability  testing  with  kids 

•  Review  genres  of  computer  technology  for  kids,  and  design 
recommendations  for  each  genre 


HOW  ARE  CHILDREN  DIFFERENT? 


As people develop from infants to adults, their physical and cognitive
abilities increase over time (Kail, 1991; Miller, L. T., & Vernon,
1997; Thomas, 1980). The Swiss psychologist Jean Piaget
was  a  leading  figure  in  analyzing  how  children’s  cognitions 
evolve  (Piaget,  1970).  Piaget  showed  that  children  do  not  just 
lack  knowledge  and  experience  but  also  fundamentally  experi¬ 
ence  and  understand  the  world  differently  than  adults.  He  di¬ 
vided  children’s  development  into  a  series  of  stages: 

•  Sensorimotor  (birth  to  2  years) 

•  Preoperational  (ages  2  to  7) 

•  Concrete Operational (ages 7 to 11)

•  Formal  Operational  (ages  11  and  up;  Piaget,  1970,  pp.  29-33) 

Contemporary  research  recognizes  that  all  children  develop 
differently,  and  individuals  may  differ  substantially  from  this  typ¬ 
ical  picture  (Schneider,  1996).  However,  this  general  character¬ 
ization  remains  useful. 

In the sensorimotor stage, children’s cognitions are heavily
dependent on what their senses immediately perceive. Software
for children this young is difficult to design. Little interaction
can be expected from the child. Obviously, all instructions
must be given in audio, video, or animation, since babies
cannot read. Furthermore, babies generally cannot be expected
to use standard input devices like a mouse effectively, even with
large targets.

“Reader  Rabbit  Toddler”  (www.thelearningcompany.com) 
is  targeted  at  children  ages  one  to  three.  To  eliminate  the  need 
for  mouse  clicking,  the  cursor  is  transformed  into  a  big  yellow 
star  with  room  for  five  small  stars  inside  it.  As  the  mouse  is  held 
over  a  target,  the  small  stars  appear  one  at  a  time.  When  the 
fifth  star  appears,  it  counts  as  clicking  on  that  target.  If  the  child 
does  click,  the  process  simply  moves  faster.  The  only  downside 
is  the  occasional  unintended  click  on  the  “Go  back  to  the  main 
menu”  icon. 
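
The hover-to-select behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's actual implementation: the class name DwellCursor, the pacing of 0.6 seconds per star, and the rule that a click adds exactly one star are all hypothetical. Only the five-star threshold comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellCursor:
    """Hover-to-select cursor: stars fill in while the pointer rests on a target."""
    stars_needed: int = 5           # from the text: the fifth star triggers selection
    seconds_per_star: float = 0.6   # assumed pacing, not taken from the product
    target: Optional[str] = None    # target currently under the pointer
    stars: int = 0
    elapsed: float = 0.0

    def _reset(self, target: Optional[str]) -> None:
        self.target = target
        self.stars = 0
        self.elapsed = 0.0

    def update(self, hovered: Optional[str], dt: float, clicked: bool = False) -> Optional[str]:
        """Advance by dt seconds; return the target name once it is selected."""
        if hovered != self.target:
            self._reset(hovered)    # a new target (or no target) restarts the count
        if hovered is None:
            return None
        self.elapsed += dt
        while self.elapsed >= self.seconds_per_star:
            self.elapsed -= self.seconds_per_star
            self.stars += 1
        if clicked:
            self.stars += 1         # assumed: a click only adds a star, so it just speeds things up
        if self.stars >= self.stars_needed:
            selected = self.target
            self._reset(None)
            return selected
        return None
```

Because a click can only shorten the wait and never does anything different from hovering, a child who cannot yet click reliably reaches the same outcome as one who can.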

In  most  activities  in  “Reader  Rabbit  Toddler,”  nearly  random 
mouse  movement  will  successfully  complete  the  activity.  For 
example,  in  the  “Bubble  Castle”  activity,  the  child  needs  to  res¬ 
cue  animals  trapped  in  soap  bubbles  that  are  bouncing  around 
the  screen.  Random  mouse  movements  will  catch  the  animals 
relatively  quickly.  Yet,  the  parent  or  teacher  watching  a  child’s 
use  of  the  software  over  time  will  typically  begin  to  detect  pat¬ 
terns  in  that  mouse  movement  that  become  more  obviously  in¬ 
tentional — the  mouse  moves  more  directly  toward  the  bubbles 
with  animals  in  them.  This  is  a  particularly  well  thought  out  in¬ 
terface  because  it  mimics  how  young  children  learn  language. 
A  baby’s  first  attempts  at  sounds  are  greeted  with  great  enthu¬ 
siasm — the  child  says  an  unrecognizable  phoneme  and  the  par¬ 
ents  smile  and  say,  “You  said  Dada!  This  is  Dada!”  Over  time,  the 
utterance  begins  to  really  sound  like  the  child  said  “Dada.”  An 
initial  positive  reinforcement  for  even  the  most  remote  attempt 
at  the  target  behavior  puts  the  child  on  a  good  learning  trajectory
toward  acquiring  that  behavior  (Holdaway,  1979).

Many  examples  of  software  and  other  cultural  artifacts  for 
young  children  are  designed  in  accordance  with  adult  expecta¬ 
tions  of  what  a  child  should  like.  There  are  a  few  noteworthy  ex¬ 
ceptions — for  example,  the  television  show  Teletubbies  is  out  of 
harmony  with  those  stereotypes.  Many  adults  find  the  television 
show  bizarre  and  grating,  but  it  is  wildly  popular  with  toddlers. 
The  designers  of  the  original  BBC  television  series,  Davenport 
and  Wood  (1997),  used  detailed  observations  of  young  chil¬ 
dren’s  play  and  speech  in  their  design.  Wood  commented: 

Our  ideas  always  come  from  children.  If  you  make  something  for  chil¬ 
dren,  the  first  question  you  must  ask  yourself  is,  “What  does  the  world 
look  like  to  children?”  Their  perception  of  the  world  is  very  different  to 
that  of  grown-ups.  We  spend  a  lot  of  time  watching  very  young  children: 
how  they  play;  how  they  react  to  the  world  around  them;  what  they  say. 
(Davenport  &  Wood,  1997)

Focus  groups  also  played  an  important  role  (BBC,  1997). 
Young  children  are  so  radically  different  from  adults  that  innov¬ 
ative  design  requires  careful  fieldwork. 

While  toddlers’  interactions  with  software  on  a  standard 
desktop  computer  afford  limited  possibilities,  specialized  hard¬ 
ware  can  expand  the  richness  and  complexity  of  interactions. 
For  example,  “Music  Blocks”  (www.neurosmith.com)  is  recom¬ 
mended  for  ages  two  and  up.  Five  blocks  fit  in  slots  in  the  top  of 
a  device  rather  like  a  “boom  box”  portable  music  player.  Each 
block  represents  a  phrase  of  music.  Each  side  of  the  block  is  a 
different  instrumentation  of  that  musical  phrase.  Rearranging 
the  blocks  changes  the  music.  Interaction  of  this  complexity
would  be  impossible  for  two-year-olds  using  a  screen-based  in¬ 
terface,  but  is  quite  easy  with  specialized  hardware.  Research  on 
alternative  computer  interfaces  such  as  tangible  technologies 
(Ishii  &  Ullmer,  1997;  Dourish,  2001)  holds  great  promise  for 
novel  children’s  interface  designs  (Price,  Rogers,  Scaife,  Stan¬ 
ton,  &  Neale,  2003;  O’Malley  &  Stanton,  2004). 

In  the  preoperational  stage  (ages  2  to  7),  children’s  attention 
spans  are  brief.  They  can  hold  only  one  thing  in  memory  at  a
time.  They  have  difficulty  with  abstractions.  They  cannot  understand
situations  from  other  people’s  points  of  view.  While
some  children  may  begin  to  read  at  a  young  age,  designs  for  this 
age  group  generally  assume  the  children  are  still  preliterate.  It  is 
reasonable  to  expect  children  at  this  age  can  click  on  specific 
mouse  targets,  but  they  must  be  relatively  large.  Most  design¬ 
ers  generally  still  avoid  use  of  the  keyboard  (except  “hit  any  key” 
approaches). 

In  the  concrete  operational  stage  (ages  7  to  11),  “We  see 
children  maturing  on  the  brink  of  adult  cognitive  abilities. 
Though  they  cannot  formulate  hypothesis,  and  though  abstract 
concepts  such  as  ranges  of  numbers  are  often  still  difficult,  they 
are  able  to  group  like  items  and  categorize”  (Schneider,  1996, 
p.  69).  Concrete  operational  children  are  old  enough  to  use  rel¬ 
atively  sophisticated  software,  but  still  young  enough  to  appre¬ 
ciate  a  playful  approach.  It  is  reasonable  to  expect  simple  key¬ 
board  use.  Children’s  abilities  to  learn  to  type  grow  throughout 
this  age  group.  It  is  reasonable  to  expect  relatively  fine  control 
of  the  mouse. 

Finally,  by  the  time  a  child  reaches  the  formal  operational 
stage  (ages  11  and  up),  designers  can  assume  the  child’s  think¬ 
ing  is  generally  similar  to  that  of  adults.  Their  interests  and 
tastes,  of  course,  remain  different.  Designing  for  this  age  group 
is  much  less  challenging,  because  adult  designers  can  at  least 
partially  rely  on  their  own  intuitions. 

Using  age  as  a  guide  can  be  useful;  however,  designers 
should  be  aware  that  designing  too  young  can  be  just  as  prob¬ 
lematic  as  designing  too  old.  Children  are  acutely  aware  of  their 
own  abilities;  being  asked  to  interact  with  technology  designed 
for  younger  children  can  be  perceived  as  an  affront  or  boring 
(Halgren,  Fernandes,  &  Thomas,  1995;  Gilutz  &  Nielsen,  2002). 


FIGURE  40.1.  Children  playing  with  Music  Blocks.


In  addition  to  considering  the  implications  of  cognitive  devel¬ 
opment  on  children’s  abilities  to  use  technologies,  it  is  impor¬ 
tant  to  remember  that  different  age  groups  differ  culturally  too. 
Understanding  what  is  fun  or  interesting  for  a  particular  age 
group  involves  understanding  both  children’s  developmental 
abilities  and  children’s  culturally-dependent  aesthetic  sensibilities.
Oosterholt,  Kusano,  and  Vries  (1996)  suggested  that  designers
should  avoid  trying  to  be  fashionable,  since  what  is  cool
changes  quickly,  and  should  target  a  limited  age  range  because
children’s  abilities  and  sensibilities  change  quickly  as  well. 

In  the  next  sections,  we  will  focus  on  several  characteristics 
of  children  that  are  relevant  for  HCI  research: 

•  Dexterity 

•  Speech 

•  Reading 

•  Background  knowledge 

•  Interaction  style 

Dexterity 

Young  children’s  fine  motor  control  is  not  equal  to  that  of  adults 
(Thomas,  1980),  and  they  are  physically  smaller.  Devices  de¬ 
signed  for  adults  may  be  difficult  for  children  to  use.  Joiner, 
Messer,  Light,  and  Littleton  (1998)  noted  that  “the  limited 
amount  of  research  on  children  has  mainly  assessed  the  per¬ 
formance  of  children  at  different  ages  and  with  different  input 
devices”  (p.  514). 

Numerous  studies  confirm  that  children’s  performance  with
mice  and  other  input  devices  improves  with  age  (Joiner  et  al.,
1998;  Hourcade,  2002).  Compared  to  adults,  children  have  diffi¬ 
culty  holding  down  the  mouse  button  for  extended  periods  and 
have  difficulty  performing  a  dragging  motion  (Strommen,  1994). 
This  means  that  many  standard  desktop  interface  features  pose 
problems  for  young  users.  For  example,  kids  have  difficulty  with 
marquee  selection.  Marquee  selection  is  a  technique  for  select¬ 
ing  several  objects  at  once  using  a  dynamic  selection  shape.  In 
traditional  marquee  selection,  the  first  click  on  the  screen  is  the 
initial,  static  corner  of  the  selection  shape  (typically  a  rectangle). 
Dragging  the  mouse  controls  the  diagonally  opposite  corner  of 
the  shape,  allowing  you  to  change  the  dimensions  of  the  se¬ 
lected  area  to  encapsulate  the  necessary  objects.  Dragging  the 
mouse  away  from  the  initial  static  corner  increases  the  size  of  the 
selection  rectangle,  while  dragging  the  mouse  toward  the  initial 
static  corner  decreases  the  size  of  the  selection  rectangle.  A  badly 
placed  initial  corner  can  make  it  difficult  and  sometimes  impos¬ 
sible  to  select/encapsulate  all  of  the  objects.  Berkovitz  (1994) 
experimented  with  a  new  encirclement  technique:  the  initial 
area  of  selection  is  specified  with  an  encircling  gesture  and  mov¬ 
ing  the  mouse  outside  of  the  area  enlarges  it. 
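
The geometry of traditional marquee selection, and the reason a badly placed first corner can make a selection impossible, can be illustrated with a short Python sketch. The function names and the simplification of treating each object as a single point are assumptions made for illustration. The sketch does not model Berkovitz's encirclement technique.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def marquee_rect(anchor: Point, drag: Point) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) for a rectangle whose fixed corner is
    `anchor` and whose diagonally opposite corner follows the pointer at `drag`."""
    (ax, ay), (dx, dy) = anchor, drag
    return (min(ax, dx), min(ay, dy), max(ax, dx), max(ay, dy))


def selected(objects: Dict[str, Point], anchor: Point, drag: Point) -> List[str]:
    """Names of the objects (treated as points) that lie inside the marquee."""
    left, top, right, bottom = marquee_rect(anchor, drag)
    return sorted(name for name, (x, y) in objects.items()
                  if left <= x <= right and top <= y <= bottom)


objects = {"a": (1, 1), "b": (5, 1), "c": (3, 4)}

# A first click outside the group lets one drag enclose everything.
print(selected(objects, anchor=(0, 0), drag=(6, 5)))

# A first click in the middle of the group fixes a corner at (3, 2); the
# rectangle can then grow toward "a" or toward "b", but never around both.
print(selected(objects, anchor=(3, 2), drag=(0, 0)))
print(selected(objects, anchor=(3, 2), drag=(6, 0)))
```

Since the anchor never moves once the button is pressed, recovering from a poor first click means releasing and starting over, which is exactly the kind of sustained, precise dragging that young children find difficult.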

Kids  may  have  trouble  double-clicking,  and  their  small  hands 
may  have  trouble  using  a  three-button  mouse  (Bederson,  Hollan,
Druin,  Stewart,  Rogers,  &  Proft,  1996).  As  with  adults,  children
can  use  point-and-click  interfaces  more  easily  than  drag-and-drop
(Inkpen,  2001;  Joiner  et  al.,  1998).  Inkpen  (2001)
noted,  “Despite  this  knowledge,  children’s  software  is  often  im¬ 
plemented  to  utilize  a  drag-and-drop  interaction  style.  Bringing 
solid  research  and  strong  results  ...  to  the  forefront  may  help
make  designers  of  children’s  software  think  more  about  the  im¬ 
plications  of  their  design  choices”  (p.  30). 

Strommen  (1998)  noted  that  since  young  children  cannot  re¬ 
liably  tell  their  left  from  their  right,  interfaces  for  kids  should  not 
rely  on  that  distinction.  In  his  Actimates  interactive  plush  toy  de¬ 
signs,  the  toys’  left  and  right  legs,  hands,  and  eyes  always  per¬ 
form  identical  functions.  More  recent  interactive  soft  toys  like 
“Hug  &  Learn  Baby  Tad”  by  Leapfrog  (http://www.leapfrog.com) 
rely  on  clear  visual  markings  on  left  and  right  paws  to  indicate
their  distinct  functions. 

Speech 

Speech  recognition  has  intriguing  potential  for  a  wide  variety 
of  applications  for  children.  O’Hare  and  McTear  (1999)  studied 
use  of  a  dictation  program  by  12-year-olds  and  found  that  they 
could  generate  text  more  quickly  and  accurately  than  by  typ¬ 
ing.  They  note  that  dictation  automatically  avoids  some  of  the 
errors  children  would  otherwise  make,  because  the  recognizer 
generates  correct  spelling  and  capitalization.  This  is  desirable 
in  applications  where  generating  correct  text  is  the  goal.  If  in¬ 
stead,  the  goal  is  to  teach  children  to  write  correctly  (and,  for 
example,  to  capitalize  their  sentences),  then  dictation  software 
may  be  counterproductive.

While  O’Hare  and  McTear  (1999)  were  able  to  use  a  standard
dictation  program  with  12-year-olds,  Nix,  Fairweather,  and 
Adams  (1998)  noted  that  speech  recognition  developed  for 
adults  will  not  work  with  very  young  children.  In  their  research 
on  reading  tutors  for  children  5  to  7  years  old,  they  first  tried  a 
speech  recognizer  designed  for  adults.  The  recognition  rate  was 
only  75%,  resulting  in  a  frustrating  experience  for  their  subjects. 
Creating  a  new  acoustic  model  from  the  speech  of  children  in 
the  target  age  range,  they  were  able  to  achieve  an  error  rate  of 
less  than  5%.  Further  gains  were  possible  by  explicitly  account¬ 
ing  for  common  mispronunciations  and  children’s  tendency  to 
respond  to  questions  with  multiple  words  where  adults  would 
typically  provide  a  one-word  answer.  Even  with  the  improved 
acoustic  model,  the  recognizer  still  made  mistakes.  To  avoid 
frustrating  the  children  with  incorrect  feedback,  they  chose  to 
have  the  system  never  tell  the  child  they  were  wrong.  When  the
system  detects  what  it  believes  to  be  a  wrong  answer,  it  simply 
gives  the  child  an  easier  problem  to  attempt. 
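
The error-handling policy described above, in which a suspected wrong answer silently leads to an easier problem, might be sketched as follows. This is a hypothetical illustration rather than the authors' system: the numeric difficulty levels, the step size of one level, the rule that a correct answer moves to a harder problem, and the feedback phrases are all assumptions.

```python
def next_level(level: int, judged_correct: bool,
               min_level: int = 1, max_level: int = 10) -> int:
    """Pick the next problem's difficulty without ever reporting an error to the child."""
    if judged_correct:
        return min(level + 1, max_level)   # assumed: success moves to a harder problem
    return max(level - 1, min_level)       # from the text: a suspected miss yields an easier problem


def feedback(judged_correct: bool) -> str:
    """Spoken response; the second branch deliberately avoids negative wording."""
    return "Great job!" if judged_correct else "Let's try this one."
```

The design choice is that a recognizer mistake costs the child at most one easier problem, whereas a false "that's wrong" would penalize a child who had in fact answered correctly.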

Reading 

The  written  word  is  a  central  vehicle  for  communicating  infor¬ 
mation  to  humans  in  human-computer  interfaces.  Conse¬ 
quently,  designing  computer  technology  for  children  with  de¬ 
veloping  reading  skills  presents  a  challenge.  Words  that  are  at  an 
appropriate  reading  level  for  the  target  population  must  be  cho¬ 
sen.  Larger  font  sizes  are  generally  preferred.  Bernard,  Mills, 
Frank,  and  McKown  (2001)  found  that  kids  9  to  11  years  old
prefer  14-point  fonts  over  12-point.  Surprisingly,  very  little  em¬ 
pirical  work  has  been  done  in  this  area.  Most  designers  follow 
the  rule  of  thumb  that  the  younger  the  child,  the  larger  the  font 
should  be. 


Designing  for  preliterate  children  presents  a  special  chal¬ 
lenge.  Audio,  graphics,  and  animation  must  substitute  for  all 
functions  that  would  otherwise  be  communicated  in  writing. 
The  higher  production  values  required  can  add  significantly  to 
development  time  and  cost.  Likewise,  visually  impaired  children 
pose  unique  challenges  for  designers.  Audio,  tactile,  and  other 
sensory  interfaces  can  provide  opportunities  for  visually  and 
otherwise  impaired  children  who  cannot  use  traditional  inter¬ 
faces  to  interact  with  computers  autonomously  (McElligott  & 
van  Leeuwen,  2004). 

Background  Knowledge 

Many  user  interfaces  are  based  on  metaphors  (Erickson,  1990) 
from  the  adult  world.  Jones  (1992)  noted  that  children  are  less 
likely  to  be  familiar  with  office  concepts  like  file  folders  and  in- 
out  boxes.  In  designing  an  animation  system  for  kids,  Halgren 
et  al.  (1995)  found  many  kids  to  be  unfamiliar  with  the  metaphor
of  a  frame-based  filmstrip  and  that  of  a  VCR.  It  is  helpful
to  choose  metaphors  that  are  familiar  to  kids,  though  kids  often 
have  success  in  learning  interfaces  based  on  unfamiliar  meta¬ 
phors  if  they  are  clear  and  consistent  (Schneider,  1996). 

Interaction  Style 

Children’s  patterns  of  attention  and  interaction  are  quite  different
from  those  of  adults.  Traditional  task-oriented  analyses  of  activity
may  fail  to  capture  the  playful,  spontaneous  nature  of  children’s
interactions  with  technology.  For  example,  when  adults
are  the  intended  users,  designers  take  great  pains  to  create  er¬ 
ror  messages  that  are  informative  and  understandable  based  on 
the  assumption  that  users  want  to  avoid  generating  the  message 
again.  Hanna,  Risden,  and  Alexander  (1997)  used  a  funny  noise 
as  an  error  message  and  found  that  the  children  repeatedly  gen¬ 
erated  the  error  to  hear  the  noise.  Similarly,  Halgren  et  al.  (1995) 
found  that  children  would  click  on  any  readily  visible  feature  just 
to  see  what  would  happen,  and  they  might  click  on  it  repeatedly 
if  it  generated  sound  or  motion  in  feedback.  This  behavior  was 
causing  young  users  to  be  trapped  in  advanced  modes  they  did 
not  understand.  The  designers  chose  to  hide  advanced  func¬ 
tionality  in  drawers — a  metaphor  that  is  familiar  to  children. 

By  hiding  the  advanced  tools,  the  novice  users  would  not  stumble  onto 
them  and  get  lost  in  their  functionality.  Rather,  only  the  advanced  users 
who  might  want  the  advanced  tools  would  go  looking  for  more  options. 
This  redesign  allows  the  product  to  be  engaging  and  usable  by  a  wider 
range  of  ages  and  abilities.  (Halgren  et  al.,  1995) 

Resnick  and  Silverman  (2005)  went  even  further  in  their  de¬ 
sign  principle:  “Make  it  as  simple  as  possible — and  maybe  even 
simpler.”  They  warned  against  “functionality  creep”  and  sug¬ 
gested  removing  advanced  functionality  altogether  if  it  is  not 
clearly  required  to  support  children’s  creative  efforts. 

Children  also  bring  unique  interaction  styles  to  online  envi¬ 
ronments;  they  respond  to  information  they  encounter  while 
browsing  the  web  in  markedly  different  ways  than  adults.  In  a 
study  of  55  first  through  fifth  graders,  the  Nielsen  Norman 
Group  (Gilutz  &  Nielsen,  2002)  found  that  kids  were  often  unable
to  distinguish  between  website  content  and  advertisements.
Moreover,  they  rarely  scrolled  down  to  find  content;  instead, 
they  chose  to  interact  with  website  elements  that  were  immedi¬ 
ately  visible.  When  examining  a  new  website,  children  were  will¬ 
ing  to  hunt  for  links  in  the  content  by  “scrubbing  the  screen” 
with  the  mouse  instead  of  relying  solely  on  visual  cues  (Gilutz  & 
Nielsen,  2002). 

Children  are  more  likely  than  adults  to  work  with  more  than 
one  person  at  a  single  computer.  They  enjoy  doing  so  to  play 
games  (Inkpen,  1997),  and  they  may  be  forced  to  do  so  because 
of  limited  resources  in  school  (Stewart,  Raybourn,  Bederson,  & 
Druin,  1998).  Teachers  may  also  create  a  shared-computer  setup 
to  promote  collaborative  learning.  When  multiple  children  work 
at  one  machine  simultaneously,  they  need  to  negotiate  sharing 
control  of  input  devices.  Giving  students  multiple  input  devices 
increases  their  productivity  and  their  satisfaction  (Inkpen,  1997; 
Inkpen,  Gribble,  Booth,  &  Klawe,  1995;  Stewart  et  al.,  1998). 
Inkpen  et  al.  (1995)  compared  two  different  protocols  for  trans¬ 
ferring  control  between  multiple  input  devices:  give  and  take.  In 
a  give  protocol,  the  user  with  control  clicks  the  right  mouse  but¬ 
ton  to  cede  it  to  the  other  user;  in  a  take  protocol,  the  idle  user 
clicks  to  take  control.  In  one  study  with  12-year-olds  and  an¬ 
other  with  9-  to  13-year-olds,  they  found  that  girls  solve  more 
puzzles  with  a  give  protocol,  but  boys  are  more  productive  with 
a  take  protocol  (Inkpen,  1997;  Inkpen  et  al.,  1995). 
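
The difference between the two control-transfer protocols can be made concrete with a small Python sketch for two children sharing one cursor. The class and method names are hypothetical, and the sketch assumes that a right-click is the transfer gesture in both protocols, which the description above states explicitly only for the give protocol.

```python
from dataclasses import dataclass


@dataclass
class SharedControl:
    """Two mice, one cursor: tracks which child currently drives the cursor."""
    protocol: str        # "give" or "take"
    active: int = 0      # index (0 or 1) of the child in control

    def right_click(self, child: int) -> int:
        """Handle a right-click from `child`; return who is in control afterwards."""
        if self.protocol not in ("give", "take"):
            raise ValueError(f"unknown protocol: {self.protocol!r}")
        if self.protocol == "give" and child == self.active:
            self.active = 1 - child      # the child in control hands it over
        elif self.protocol == "take" and child != self.active:
            self.active = child          # the idle child claims control
        return self.active
```

Under give, the idle child's clicks are ignored until the partner chooses to hand over control. Under take, the child in control can be interrupted at any moment. That asymmetry is what the two studies compared.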


CHILDREN  AND  THE  DESIGN  PROCESS 


Users  play  a  variety  of  roles  in  HCI  design  processes.  Visionary 
designers,  such  as  Kay  (1972)  and  Papert  (1972),  began  consid¬ 
ering  the  abilities  and  sensibilities  of  children  in  the  design  of 
new  technologies  as  early  as  the  1970s.  Today,  ethnographic  and 
participatory  (Schuler  &  Namioka,  1993)  methods  are  becoming 
increasingly  common  features  of  the  human-centered  design 
toolkit,  as  HCI  designers  attempt  to  deeply  understand  the  prac¬ 
tices  and  preferences  of  people  who  will  be  using  new  tech¬ 
nologies.  When  designers  enter  the  world  of  children,  and,  con¬ 
versely,  when  children  enter  the  laboratory,  many  of  the 
traditional  rules  change.  As  we  have  seen,  children  are  not  just 
little  adults;  they  engage  with  the  world  in  fundamentally  dif¬ 
ferent  ways.  Naturally,  they  bring  a  host  of  social,  emotional,  and 
cognitive  elements  to  the  design  process  that  are  unfamiliar  to 
designers  who  are  accustomed  to  working  with  adults.  In  this 
section,  we  examine  new  and  traditional  methods  for  working 
with  children  in  a  human-centered  design  process. 


Use  of  Video  With  Children 

Like  adults,  children  may  change  their  behaviors  when  a  video 
camera  is  present.  Druin  (1999)  and  a  design  team  found  in  their 
early  work  that  children  tended  to  “freeze”  or  “perform”  when 
they  saw  a  video  camera  in  the  room.  In  subsequent  work, 
Druin’s  team  observed  that  the  problems  associated  with  video¬ 
taping  had  more  to  do  with  power  relationships  than  with  the 
video  cameras  themselves.  When  the  children  are  in  control  of
the  cameras,  their  discomfort  decreases  (Alborzi  et  al.,  2000).  In
addition  to  considering  the  social  impact  of  using  a  camera  with 
children,  there  are  also  technical  difficulties  to  deal  with.  Druin’s 
research  team  found  that,  even  with  smaller  cameras,  it  was  dif¬ 
ficult  to  capture  data  in  small  bedrooms  and  large  public  spaces. 
The  sound  and  speech  captured  in  public  spaces  was  difficult  to 
understand  or  even  inaudible.  Finally,  it  was  difficult  to  know 
where  to  place  cameras  because  they  did  not  know  where  chil¬ 
dren  would  sit,  stand,  or  move  in  the  environment.  Druin  rec¬ 
ommended  using  multiple  data  sources  to  capture  “messy”  de¬ 
sign  environments  with  children,  including  note  takers  and 
participant  observers  in  addition  to  videotaping.  Druin  also  en¬ 
couraged  the  design  team  to  use  video  cameras  (along  with  jour¬ 
nal  writing,  team  discussion,  and  adult  debriefing)  as  a  way  to 
record  their  brainstorming  sessions  and  other  design  activities. 

Goldman-Segall  (1996)  explained  why  video  data  are  an  im¬ 
portant  part  of  ethnographic  interviews  and  observations.  When 
using  video,  the  researcher  does  not  have  to  worry  about  re¬ 
membering  or  writing  down  every  detail:  “She  can  concentrate 
fully  on  the  person  and  on  the  subtleties  of  the  conversation.” 
The  researcher  also  has  access  to  “a  plethora  of  visual  stimuli 
which  can  never  be  ‘translated’  into  words  in  text,”  such  as  body 
language,  gestures,  and  facial  expressions.  It  is  especially  im¬ 
portant  to  be  able  to  review  the  body  language  of  children  as 
they  interact  with  software.  Hanna  et  al.  (1997)  stated  that  chil¬ 
dren’s  “behavioral  signs  are  much  more  reliable  than  children’s 
responses  to  questions  about  whether  or  not  they  like  some¬ 
thing,  particularly  for  younger  children.  Children  are  eager  to 
please  adults,  and  may  tell  you  they  like  your  program  just  to 
make  you  happy.”  MacFarlane,  Sim,  and  Horton  (2005)  suggested
that  both  signs  (behaviors)  and  symptoms  (children’s  direct
responses)  should  be  used  together  to  understand  children’s
enjoyment  of  and  ability  to  use  new  technologies.
Video  is  extremely  useful  in  being  able  to  study  behavioral  signs
as  the  researchers  may  miss  some  important  signs  and  gestures 
during  the  actual  observation  or  interview. 

Instead  of  using  video  in  its  traditional  capacity  for  ethno¬ 
graphic-style  observation,  some  researchers  have  attempted  to 
capitalize  on  children’s  playful  treatment  of  video  cameras  to 
elicit  articulation  about  new  technologies.  In  studies  using  video 
probes  to  capture  domestic  communication  patterns,  Hutchin¬ 
son  et  al.  (2003)  observed  that  images  are  particularly  attractive  to 
young  people  as  an  entertaining  medium  for  interacting  and  com¬ 
municating.  During  classroom  observations  of  children  using 
math-learning  software,  Lamberty  and  Kolodner  (2005)  encour¬ 
aged  children  to  engage  in  “camera  talk”  with  stationary  cameras 
if  they  wished.  Many  of  the  children  regularly  talked  to  the  cam¬ 
era.  This  spontaneous  behavior  revealed  both  their  preferences 
for  using  the  software  and  their  developing  understanding  of  frac¬ 
tions.  Likewise,  Iversen  (2002)  suggested  that,  by  provoking  chil¬ 
dren  to  verbalize,  video  cameras  provide  a  communication  link 
between  designers  and  young  informants,  thereby  enriching  both 
the  data  collected  and  the  design  experience. 

Methods  for  Designing  and  Testing  With  Kids 

In  this  section,  we  review  a  variety  of  methods  for  designing 
with  and  for  children.  These  methods  differ  dramatically  in  the 
amount  of  power  they  grant  to  children.  Some  methods  encourage
us  to  view  children  as  codesigners  with  an  equal  voice
in  determining  design  direction,  whereas  others  place  children 
in  a  more  reactive  role  as  evaluators  or  subjects  in  laboratory- 
based  usability  tests.  In  practice,  designers  use  methods  from 
different  points  on  this  power  spectrum  depending  on  the  ma¬ 
turity  of  the  project,  and  often  move  back  and  forth  between 
testing  with  kids  and  open-ended  exploration  (Scaife,  Rogers, 
Aldrich,  &  Davies,  1997). 

Druin  (2002)  unpacked  this  spectrum  of  control  by  describing
four  different  roles  that  children  can  play  in  the  design  of
new  technologies:  user,  tester,  informant,  and  design  partner. 
The  most  reactive  role  she  described  for  children  in  design  is 
user.  As  users,  children  interact  with  existing  technologies  and 
have  no  direct  impact  on  the  design  of  the  technology,  except  in 
the  form  of  recommendations  for  future  designs.  As  testers, 
children  are  asked  to  provide  feedback  about  technology  in  de¬ 
velopment  so  that  it  can  be  refined  before  it  is  released;  how¬ 
ever,  adult  designers  determine  the  goals  of  the  technology 
much  earlier.  As  informants,  children  play  an  earlier,  more  active 
role  in  determining  the  goals  and  features  of  new  technologies. 
When  children  play  the  role  of  informant,  they  interact  directly 
with  designers,  but  ultimately,  the  designers  decide  what  the 
children  need  or  want  based  on  observations,  interviews,  or 
other  data  collection  methods.  Finally,  Druin  explained  that  as  design
partners,  children  are  equal  stakeholders  in  the  design
process.  Although  they  may  not  be  able  to  contribute  to  the  de¬ 
velopment  of  the  technology  in  equivalent  ways,  their  exper¬ 
tise  is  equal  in  importance  to  that  of  other  contributors  to  the 
design  process. 

The  notion  of  children  as  design  partners  will  be  explored 
more  fully  in  the  section  on  cooperative  inquiry.  Methods  for
including  children  as  informants  and  design  partners  borrow 
from  the  tradition  of  participatory  design  that  emerged  in  the 
Scandinavian  workplace.  Participatory  design  is  an  “approach 
towards  computer  systems  design  in  which  the  people  destined 
to  use  the  system  play  a  critical  role  in  designing  it”  (Schuler  & 
Namioka,  1993,  p.  xi).  With  children,  this  idea  is  even  more  im¬ 
portant:  Since  they  are  physically  and  cognitively  different  from 
adults,  their  participation  in  the  design  process  may  offer  significant
insights.  Schuler  wrote:

[Participatory  Design]  assumes  that  the  workers  themselves  are  in  the 
best  position  to  determine  how  to  improve  their  work  and  their  work 
life  ...  It  views  the  users’  perceptions  of  technology  as  being  at  least  as 
important  to  success  as  fact,  and  their  feelings  about  technology  as  at 
least  as  important  as  what  they  can  do  with  it.  (Schuler  &  Namioka, 
1993,  p.  xi) 

Empowering  children  in  this  way  and  including  them  in  the  de¬ 
sign  process  can  be  difficult  due  to  the  traditionally  unequal 
power  relationships  between  kids  and  adults. 

On  the  other  end  of  the  spectrum,  methods  for  including 
children  as  users  and  testers  often  borrow  from  the  traditional 
practices  of  experimental  psychology.  Usability  testing  generally 
takes  place  in  a  controlled  setting.  Sometimes  a  single  design  is 
tested  with  the  goal  of  improving  it;  at  other  times,  different  design
ideas  might  be  compared  to  establish  which  ones  generate
more  positive  feedback  or  enable  better  task  completion.  Data
collection  methods  like  verbal  protocol  analysis  (Ericsson  & 
Simon,  1993)  are  commonly  used  and  will  be  further  discussed 
in  the  section  on  adapting  traditional  usability  methods  for  kids. 

Cooperative  Inquiry 

Druin  (1999)  developed  a  systematic  approach  to  develop¬ 
ing  new  technologies  for  children  with  children;  she  created 
new  research  methods  that  included  children  in  various  stages 
of  the  design  process.  This  approach,  called  “cooperative  in¬ 
quiry,”  is  a  combination  of  participatory  design,  contextual  in¬ 
quiry,  and  technology  immersion.  Children  and  adults  work  to¬ 
gether  on  a  team  as  research  and  design  partners.  She  reiterated 
the  idea  that  “each  team  member  has  experiences  and  skills  that 
are  unique  and  important,  no  matter  what  the  age  or  discipline” 
(Alborzi  et  al.,  2000,  p.  97). 

In  this  model,  the  research  team  frequently  observes  chil¬ 
dren  interacting  with  software,  prototypes,  or  other  devices  to 
gain  insight  into  how  child  users  will  interact  with  and  use  these 
tools.  When  doing  these  observations,  both  adult  and  child  re¬ 
searchers  observe,  take  notes,  and  interact  with  the  child  users. 
During  these  observations,  there  are  always  at  least  two  note- 
takers  and  one  interactor,  and  these  roles  can  be  filled  by  either 
an  adult  or  a  child  team  member.  The  interactor  is  the  re¬ 
searcher  who  initiates  discussion  with  the  child  user  and  asks 
questions  concerning  the  activity.  If  there  is  no  interactor  or  if 
the  interactor  takes  notes,  the  child  being  observed  may  feel 
uncomfortable,  like  being  “on  stage”  (Druin,  1999).  Other  re¬ 
searchers  have  also  found  that  the  role  of  interactor  can  be  use¬ 
ful  for  members  of  the  design  team.  Scaife  and  Rogers  (1999) 
successfully  involved  children  as  informants  in  the  development 
of  ECOi,  a  program  that  teaches  children  about  ecology.  They 
wanted  the  kids  to  help  them  codesign  some  animations  in 
ECOi.  Rather  than  just  having  the  software  designer  observe  the 
children  as  they  played  with  and  made  comments  about  the 
ECOi  prototypes,  the  software  designer  took  on  the  role  of  interactor
to  elicit  suggestions  directly.  Through  these  on-the-fly,
high-tech  prototyping  sessions,  they  learned  that  “it  was  possi¬ 
ble  to  get  the  software  designer  to  work  more  closely  with  the 
kids  and  to  take  on  board  some  of  their  more  imaginative  and 
kid-appealing  ideas”  (Scaife  &  Rogers,  1999). 

When  working  as  design  partners,  children  are  included  from 
the  beginning.  The  adults  do  not  develop  all  the  initial  ideas  and 
then  later  see  how  the  children  react  to  them.  The  children  par¬ 
ticipate  from  the  start  in  brainstorming  and  developing  the  initial 
ideas.  The  adult  team  members  need  to  learn  to  be  flexible  and 
learn  to  break  away  from  carefully  following  their  session  plans, 
which  is  too  much  like  school.  Children  can  perform  well  in  this 
more  improvisational  design  setting,  but  the  extent  to  which  the 
child  can  participate  as  a  design  partner  depends  on  his  or  her 
age.  Children  younger  than  7  years  may  have  difficulty  in  ex¬ 
pressing  themselves  verbally  and  in  being  self-reflective.  These 
younger  children  also  have  difficulty  in  working  with  adults  to 
develop  new  design  ideas.  Children  older  than  10  are  typically 
beginning  to  become  preoccupied  with  preconceived  ideas  of 
the  way  things  are  supposed  to  be.  In  general,  it  has  been  found 
that children ages 7 to 10 years old are the most effective prototyping
partners. They are “verbal and self-reflective enough to
discuss  what  they  are  thinking”  (Druin  et  al.,  1999,  p.  61),  and 
understand  the  abstract  idea  that  their  low-tech  prototypes  and 
designs  are  going  to  be  turned  into  technology  in  the  future. 
They  also  do  not  get  bogged  down  with  the  notion  that  their  de¬ 
signs  must  be  similar  to  preexisting  designs  and  products. 

Through  her  work  with  children  as  design  partners,  Druin 
(1999,  2002)  has  discovered  that  there  are  stumbling  blocks  on 
the  way  to  integrating  children  into  the  design  process  and  to 
helping  adults  and  children  work  together  as  equals.  One  set 
of  problems  deals  with  the  ability  of  children  to  express  their 
ideas  and  thoughts.  When  the  adult  and  children  researchers 
are  doing  observations,  it  is  best  to  allow  each  group  to  develop 
its  own  style  of  note  taking.  Adults  tend  to  take  detailed  notes, 
and  children  tend  to  prefer  to  draw  cartoons  with  short,  ex¬ 
planatory  notes.  It  is  often  difficult  to  create  one  style  of  note 
taking  that  will  suit  both  groups.  Since  children  may  have  a  dif¬ 
ficult  time  communicating  their  thoughts  to  adults,  low-tech 
prototyping  is  an  easy  and  concrete  way  for  them  to  create  and 
discuss  their  ideas.  Art  supplies  such  as  paper,  crayons,  clay,  and 
string  allow  adults  and  children  to  work  on  an  equal  footing.  A 
problem  that  arises  in  practice  is  that  since  these  tools  are  child¬ 
like,  adults  may  believe  that  only  the  child  needs  to  do  such  pro¬ 
totyping.  It  is  important  to  encourage  adults  to  participate  in 
these  low-tech  prototyping  sessions. 

The  second  set  of  problems  emerges  from  the  traditionally 
unequal  power  relationships  between  adults  and  children.  In 
what  sense  can  children  be  treated  as  peers?  When  adults  and 
children  are  discussing  ideas,  making  decisions,  or  conducting 
research,  traditional  power  structures  may  emerge.  In  conduct¬ 
ing  a  usability  study,  the  adult  researcher  might  lead  the  child 
user  through  the  experiment  rather  than  allowing  the  child  to 
explore  freely  on  his  or  her  own.  In  a  team  discussion,  the  chil¬ 
dren  may  act  as  if  they  are  in  a  school  setting  by  raising  their 
hands  to  speak.  Adults  may  even  inadvertently  take  control  of 
discussions.  Is  it  sensible  to  set  up  design  teams  where  children 
are  given  equal  responsibilities  to  those  of  adult  designers?  Get¬ 
ting  adults  and  children  to  work  together  as  a  team  of  equals  is 
often  the  most  difficult  part  of  the  design  process.  It  is  to  be 
expected  that  it  may  take  a  while  for  a  group  to  become  com¬ 
fortable  and  efficient  when  working  together.  It  can  take  up  to 
six  months  for  an  “intergenerational  design  team  to  truly  de¬ 
velop  the  ability  to  build  upon  each  other’s  ideas”  (Druin  et  al., 
2001). To help defuse such traditional adult-child relationships,
adults  are  encouraged  to  dress  casually,  and  there  always  should 
be  more  than  one  adult  and  more  than  one  child  on  a  team.  A 
single  child  may  feel  outnumbered  by  the  adults,  and  a  single 
adult  might  create  the  feeling  of  a  school  environment  where 
the  adult  takes  on  the  role  of  teacher.  Alborzi  et  al.  (2000) 
started  each  design  session  with  15  minutes  of  snack  time, 
where  adults  and  children  can  informally  discuss  anything.  This 
helps  both  adults  and  children  get  to  know  each  other  better 
as  “people  with  lives  outside  of  the  lab”  (Alborzi  et  al.,  2000)  and 
to  improve  communication  within  the  group. 

Scaife  et  al.  (1997)  identified  aspects  of  working  with  chil¬ 
dren  in  the  role  of  informant  that  require  special  attention.  They 
found that, when working in pairs, children feel less inhibited
about telling unfamiliar adults what they are thinking. Other researchers
have also found that pairing children, especially with
friends, can help ease discomfort (Dindler, Eriksson, Iversen,
Ludvigsen, & Lykke-Olesen, 2005; Als, Jensen, & Skov, 2005).
Scaife  et  al.  (1997)  cautioned  that  adults  also  need  to  become 
comfortable  in  the  role  of  facilitator  and  should  take  care  not 
to  intervene  too  quickly  if  children’s  discussions  wander. 

In  addition  to  the  social  challenges  associated  with  mixing 
adults  and  children  as  equal  design  partners,  kids  do  not  always 
know  how  to  collaborate  well  with  one  another  in  the  first  place. 
Because  collaborating  on  a  design  project  is  often  a  novel  ex¬ 
perience  for  children,  organizing  the  activities  (without  impos¬ 
ing  too  rigid  a  structure)  can  help  create  productive  sessions. 
For  example,  Guha,  Druin,  Chipman,  Fails,  Simms,  and  Farber 
(2005)  described  a  technique  to  support  collaboration  among 
kids  and  adults  during  cooperative  inquiry  sessions  called  “mix¬ 
ing  ideas.”  First,  kids  generate  ideas  in  a  one-on-one  session 
with an adult facilitator, then they work in small groups to integrate
these ideas, and finally they work in larger groups until the
whole group is working together.
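
The staged structure of "mixing ideas" can be sketched in a few lines of Python. This is only an illustration of the progression from individuals to one whole group, not the procedure used by Guha et al. (2005): the function name `mix_ideas`, the `merge_factor` parameter, and the simple pooling of ideas (real sessions involve negotiating and combining ideas, not just concatenating them) are all assumptions made for the sketch.

```python
from typing import List


def mix_ideas(individual_ideas: List[List[str]],
              merge_factor: int = 2) -> List[List[List[str]]]:
    """Return the groupings at every stage, from individuals to one group.

    Each group is represented by the pooled list of its members' ideas.
    merge_factor is a hypothetical knob: how many groups join per stage.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")

    # Stage 0: every child has generated ideas one-on-one with an adult.
    current = [list(ideas) for ideas in individual_ideas]
    stages = [current]

    # Keep merging neighbouring groups until the whole team works together.
    while len(current) > 1:
        merged = []
        for start in range(0, len(current), merge_factor):
            pooled: List[str] = []
            for group in current[start:start + merge_factor]:
                pooled.extend(group)
            merged.append(pooled)
        stages.append(merged)
        current = merged
    return stages


if __name__ == "__main__":
    ideas = [["talking map"], ["robot pet"], ["story wall"], ["sound blocks"]]
    for number, groups in enumerate(mix_ideas(ideas)):
        print(f"stage {number}: {len(groups)} group(s)")
```

With the default `merge_factor` of 2, the number of groups roughly halves at each stage, so even a large team reaches a single shared discussion in a handful of rounds.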

Although  there  have  been  many  successes  in  having  children 
participate  as  design  and  research  partners  in  the  development 
of  software,  there  are  still  many  questions  to  be  answered  about 
the  effectiveness  of  this  approach.  Scaife  and  Rogers  (1999)  at¬ 
tempted  to  address  many  of  the  questions  and  problems  faced 
when  working  with  children  in  their  work  on  informant  design. 
The  first  question  deals  with  the  multitude  of  ideas  and  sugges¬ 
tions  produced  by  children.  Children  say  outrageous  things. 
How  do  you  decide  which  ideas  are  worthwhile?  When  do  you 
stop  listening?  The  problem  of  selection  is  difficult  since,  in  the 
end,  the  adult  decides  which  ideas  to  use  and  which  ideas  to 
ignore.  Scaife  and  Rogers  suggested  creating  a  set  of  criteria  to: 

Determine  what  to  accept  and  what  not  to  accept  with  respect  to  the 
goals  of  the  system  .  .  .  You  need  to  ask  what  the  trade-offs  will  be  if  an 
idea or set of ideas are implemented in terms of critical ‘kid’ learning
factors:  that  is,  how  do  fun  and  motivation  interact  with  better  under¬ 
standing?  (Scaife  &  Rogers,  1999,  pp.  46-47) 

In  addition  to  deciding  which  of  the  children’s  ideas  to  use, 
there  is  also  the  problem  of  understanding  the  meaning  behind 
what  the  child  is  trying  to  say.  Adults  tend  to  assume  that  they 
can  understand  what  kids  are  getting  at,  but  kid  talk  is  not  adult 
talk.  It  is  important  to  remember  that  children  have  “a  different 
conceptual  framework  and  terminology  than  adults”  (Scaife  & 
Rogers,  1999,  p.  47). 

Another  problem  with  involving  children,  particularly  with 
the  design  of  educational  software,  is  that  “children  can’t  discuss 
learning  goals  that  they  have  not  yet  reached  themselves”  (Scaife 
&  Rogers,  1999,  p.  30).  Can  children  make  effective  contributions 
about  the  content  and  the  way  they  should  be  taught,  something 
which  adults  have  always  been  responsible  for?  Adults  have  as¬ 
sumptions  about  what  is  an  effective  way  to  teach  children.  Kids 
tend  to  focus  on  the  fun  aspects  of  the  software  rather  than  the 
educational  agenda.  A  mismatch  of  expectations  may  exist  if  kids 
are  using  components  of  the  software  in  unanticipated  ways.  In¬ 
volving  children  in  the  design  and  evaluation  process  may  help 
detect  where  these  mismatches  occur  in  the  software. 

Adapting Usability Evaluation Methods

HCI practices have evolved to address usefulness, enjoyability,
and other measures of design success; however, usability
remains a fundamental concern for HCI designers. Although efficiency
and task completion are often not central to kids’ goals
in  using  technology,  usability  problems  can  create  barriers  to 
achieving  other  goals.  For  example,  much  research  done  to  date 
has  focused  on  designing  educational  software,  and  evaluation 
is  primarily  of  learning  outcomes,  not  usability.  However,  us¬ 
ability  is  a  prerequisite  for  learning.  In  student  projects  in  Geor¬ 
gia  Tech’s  graduate  class  “Educational  Technology:  Design  and 
Evaluation,”  many  student  designers  are  never  able  to  show 
whether  the  educational  design  of  their  software  is  successful. 
What they find instead is that usability problems intervene, and
they  are  unable  even  to  begin  to  explore  pedagogical  efficacy. 
If  children  cannot  use  educational  technology  effectively,  they 
certainly will not learn through the process of using it. MacFarlane
et al. (2005) found that measurements of usability and
“fun”  were  significantly  correlated  in  studies  of  educational 
software  for  science.  Usability  is  similarly  important  for  enter¬ 
tainment,  communications,  and  other  applications.  Many  re¬ 
searchers  have  explored  the  effectiveness  of  traditional  usability 
methods  with  children.  In  this  section,  we  examine  compara¬ 
tive  assessments  of  usability  methods  and  review  findings  and 
recommendations.

Traditional  usability  testing.  Several  guidelines  devel¬ 
oped  for  work  with  adults  become  more  important  when  ap¬ 
plied  to  children.  For  example,  when  children  are  asked  to  work 
as  testers,  it  is  important  to  emphasize  that  it  is  the  software  that 
is  being  tested,  not  the  participant  (Rubin,  1994).  Children 
might  become  anxious  at  the  thought  of  taking  a  test,  and  test 
taking  may  conjure  up  thoughts  of  school.  The  researcher  can 
emphasize  that  even  though  the  child  is  participating  in  a  test, 
the  child  is  the  tester  and  not  the  one  being  tested  (Hanna  et  al., 
1997).  Rubin  recommends  that  you  show  the  participant  where 
video  cameras  are  located;  let  them  know  what  is  behind  the 
one-way  mirror  and  whether  people  will  be  watching.  With  chil¬ 
dren,  showing  them  behind  the  one-way  mirrors  and  around 
the  lab  gives  them  “a  better  sense  of  control  and  trust  in  you” 
(Hanna et al., 1997, p. 12).

Markopoulos  and  Bekker  (2002)  described  characteristics 
of  kids  that  can  impact  the  process  and  outcome  of  usability 
testing: 

•  Children’s  capacity  to  verbalize  thoughts  is  still  developing. 

• Personality may impact both kids’ willingness to speak up to
adults and their motivation to please authority figures.

•  The  capacity  to  concentrate  is  variable  among  kids. 

•  Young  children  are  still  developing  the  capacity  for  abstract 
and  logical  thinking;  they  may  differ  in  cognitive  ability  such 
as  remembering  several  items  at  once. 

•  The  ability  to  monitor  goal-directed  performance  develops 
throughout  childhood  and  adolescence. 

• Some ages may have more pronounced gender differences
than others.

•  With  small  children,  basic  motor  skills  ability  may  be  a  bar¬ 
rier  to  effective  evaluation  if  kids  cannot  use  prototypes  with 
standard  input  devices. 

Hanna  et  al.  (1997)  developed  a  set  of  guidelines  for  laboratory- 
based  usability  testing  with  children: 

•  The  lab  should  be  made  a  little  more  child  friendly  by  adding 
some  colorful  posters  but  avoid  going  overboard  as  too  many 
extra  decorations  may  become  distracting  to  the  child. 

• Try to arrange furniture so that children are not directly facing
the video camera and one-way mirror, as the children may
choose to interact with the camera and mirror rather than
doing the task.

•  Children  should  be  scheduled  for  an  hour  of  lab  time. 
Preschoolers  will  generally  only  be  able  to  work  for  30  min¬ 
utes  but  will  need  extra  time  to  play  and  explore.  Older  chil¬ 
dren  will  become  tired  after  an  hour  of  concentrated  com¬ 
puter  use,  so  if  the  test  will  last  longer  than  45  minutes, 
children  should  be  asked  if  they  would  like  to  take  a  short 
break  at  some  point  during  the  session. 

•  Hanna  et  al.  (1997)  suggested  that  you  “explain  confidential¬ 
ity  agreements  by  telling  children  that  designs  are  ‘top- 
secret’”  (p.  12).  Parents  should  also  sign  the  agreements, 
since  they  will  inevitably  also  see  and  hear  about  the  designs. 

•  Children  up  to  seven  or  eight  years  old  will  need  a  tester  in 
the  room  with  them  for  reassurance  and  encouragement. 
They  may  become  agitated  from  being  alone  or  following  di¬ 
rections  from  a  loudspeaker.  If  a  parent  will  be  present  in  the 
room  with  the  child,  it  is  important  to  explain  to  the  parent 
that  he  or  she  should  interact  with  the  child  as  little  as  possi¬ 
ble  during  the  test.  Older  siblings  should  stay  in  the  observa¬ 
tion  area  or  a  separate  room  during  the  test,  as  they  may 
eventually  be  unable  to  contain  themselves  and  start  to  shout 
out  directions. 

•  Hanna  et  al.  (1997)  suggested  that  you  should  “not  ask  chil¬ 
dren  if  they  want  to  play  the  game  or  do  a  task — that  gives 
them  the  option  to  say  no.  Instead  use  phrases  such  as  ‘Now 
I  need  you  to,’  ‘Let’s  do  this,’  or  ‘It’s  time  to.’” 
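
The timing and staffing guidelines above can be summarized as a small planning helper. The following Python sketch is one possible reading of them, not a tool from Hanna et al. (1997): the names `SessionPlan` and `plan_session` are hypothetical, and the age cutoffs (`preschool_max_age=5`, `tester_max_age=8`) are assumptions, since the guidelines say only "preschoolers" and "up to seven or eight years old."

```python
from dataclasses import dataclass


@dataclass
class SessionPlan:
    scheduled_minutes: int      # total lab time to book
    expected_work_minutes: int  # time the child is likely to stay on task
    offer_break: bool           # ask the child whether they want a short break
    tester_in_room: bool        # keep a tester beside the child


def plan_session(age_years: int, test_minutes: int,
                 preschool_max_age: int = 5,   # assumed cutoff
                 tester_max_age: int = 8       # assumed cutoff
                 ) -> SessionPlan:
    """Turn the lab guidelines into concrete settings for one child."""
    is_preschooler = age_years <= preschool_max_age
    # Preschoolers generally work for about 30 minutes; older children
    # tire after roughly an hour of concentrated computer use.
    work_cap = 30 if is_preschooler else 60
    return SessionPlan(
        scheduled_minutes=60,  # every child is scheduled for an hour
        expected_work_minutes=min(test_minutes, work_cap),
        # Offer a break to older children when the test exceeds 45 minutes.
        offer_break=(not is_preschooler) and test_minutes > 45,
        tester_in_room=age_years <= tester_max_age,
    )


if __name__ == "__main__":
    print(plan_session(age_years=4, test_minutes=40))
    print(plan_session(age_years=10, test_minutes=50))
```

A helper like this is mainly useful as a checklist when scheduling many sessions; the cutoffs should be adjusted to the children actually being recruited.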

Think/talk  aloud.  An  important  method  for  collecting 
usability  data  with  adults  is  think-aloud  protocols.  Think-aloud 
protocols  in  HCI  research  are  related  to  verbal  protocol  analy¬ 
sis  methods  in  psychology,  in  which  subjects  are  asked  to 
describe  what  they  are  thinking  about  and  paying  attention 
to  while  they  complete  some  set  of  tasks  (Ericsson  &  Simon, 
1993).  In  usability  tests,  think-aloud  methods  are  generally  used 
in  concert  with  direct  observation  (Nielsen,  1993).  Researchers 
who  have  used  this  method  with  children  have  observed  that 
children  may  make  very  few  comments  during  testing  (Donker 
&  Reitsma,  2004).  In  some  cases,  they  seem  to  have  difficulty 
with  concurrent  verbalization — verbalizing  thoughts  while  they 
complete  tasks.  The  cognitive  load  associated  with  learning  and 
executing the task itself might interfere with kids’ abilities to talk
about  it  (Hoysniemi,  Hamalainen,  &  Turkki,  2003). 

Despite  potential  obstacles,  it  has  been  demonstrated  that 
verbal  comments  from  children  can  play  an  important  role  in 
identifying usability problems. Donker and Reitsma (2004) reported
that, although children produced fewer comments,
those  few  comments  provided  important  information  about  the 
severity  of  usability  problems  that  were  identified  by  direct  ob¬ 
servation.  Likewise,  other  work  suggested  that,  although  using 
think  alouds  with  kids  may  result  in  fewer  utterances  than  other 
approaches,  it  can  be  used  to  generate  useful  usability  data  with 
both  older  kids  aged  8  to  14  (Donker  &  Markopoulos,  2002; 
Baauw  &  Markopoulos,  2004)  and  younger  children  aged  6  to  7 
(van  Kesteren,  Bekker,  Vermeeren,  &  Lloyd,  2003). 

Active  intervention  is  closely  related  to  think-aloud  protocols 
but  involves  investigators  asking  planned  questions  to  encour¬ 
age  testers  to  reflect  aloud  on  actions  at  specific  points  while 
completing  a  task.  In  a  small  comparative  study  with  kids  ages 
six  and  seven,  van  Kesteren  et  al.  (2003)  found  that  active  inter¬ 
vention  elicited  the  most  comments  when  compared  to  think 
alouds,  posttask/retrospection,  codiscovery,  peer  tutoring,  and 
traditional  usability  testing. 

Posttask interviews. In posttask interviews, testers are
asked  to  describe  their  experiences  after  they  have  finished  us¬ 
ing  a  new  technology  to  complete  a  set  of  tasks.  In  some  cases, 
video  data  are  reviewed  with  the  participant  to  evoke  comments. 
This  kind  of  retrospective  verbal  protocol  emerged  from  the 
same  tradition  as  think  alouds  (Ericsson  &  Simon,  1993).  Van 
Kesteren  et  al.  (2003)  raised  the  question  of  whether  young  kids’ 
limited  capacities  to  hold  in  memory  several  concepts  at  once 
and  still-developing  abilities  to  engage  in  abstract  thought  limit 
their  abilities  to  accurately  recall  and  recount  past  actions.  They 
found  that  some  kids  age  six  and  seven  were  able  to  recall  past 
actions  and  describe  the  ways  in  which  their  understanding 
changed.  They  note  that  keeping  things  interesting  is  important; 
children  become  bored  with  reviewing  videos  unless  the  tasks 
themselves  are  engaging  to  watch.  In  studies  with  kids  ages  9  to 
11,  Baauw  and  Markopoulos  (2004)  determined  that  posttask 
interviews  alone  revealed  fewer  usability  problems  than  think 
alouds;  however,  when  combined  with  data  from  observations, 
which  is  standard  practice,  there  was  no  significant  difference  be¬ 
tween  the  problems  that  the  two  methods  revealed. 

Codiscovery.  Codiscovery  exploration  is  a  usability  method 
that  is  used  to  understand  users’  experiences  and  perceptions  of 
new  product  designs,  especially  those  that  may  be  unfamiliar.  In 
codiscovery  sessions,  two  users  who  know  one  another  work 
together  to  perform  a  set  of  tasks  using  the  product.  The  goal 
of  using  two  acquainted  users  is  to  encourage  them  to  talk  about 
the  problems  they  encounter  and  their  perceptions  of  the  prod¬ 
uct  in  the  natural  course  of  collaborating  on  a  task  instead  of  re¬ 
lying  on  a  single  user’s  verbal  performance  for  an  experimenter 
(Kemp  &  van  Gelderen,  1996).  First,  the  two  users  are  asked  to 
figure out what a product does and compare it to other products
they  know  about.  Next,  they  are  asked  to  collaborate  on  a  set  of 
specific  tasks  using  the  product.  Finally,  a  discussion  period  al¬ 
lows  designers  to  ask  about  observed  problems  and  behaviors; 
in  addition,  participants  can  ask  questions  about  the  design  and 
the  intended  purpose  of  the  product.  Van  Kesteren  et  al.  (2003) 
found that using codiscovery with kids ages six and seven can be
difficult  because  they  often  attempt  to  complete  tasks  individu¬ 
ally.  Even  when  seated  next  to  one  another,  two  children  may  not 
interact at all, resulting in very few comments about the product
being tested. When compared with traditional usability tests,
think  alouds,  posttask  interviews,  peer  tutoring  and  active  inter¬ 
vention,  codiscovery  was  found  to  elicit  the  fewest  comments 
from  kids  (van  Kesteren  et  al.,  2003). 

Peer  tutoring.  Peer  tutoring  is  a  method  for  usability  test¬ 
ing  that  was  developed  to  capitalize  on  the  ways  that  children 
interact with one another in natural, playful settings. When children
play together, they regularly teach one another games and
invent  rules  of  play.  Hoysniemi  et  al.  (2003)  explained,  “One  de¬ 
finition  of  the  usability  of  a  children’s  software  application  is 
that  a  child  is  able  and  willing  to  teach  other  children  how  to  use 
it”  (p.  209).  Instead  of  relying  on  task  completion  in  a  lab,  peer 
tutoring  is  an  approach  to  usability  testing  that  allows  kids  to  en¬ 
gage  in  exploratory  and  playful  interactions  in  a  naturalistic  set¬ 
ting.  Peer  tutoring  involves  first  helping  one  or  more  kids  to  de¬ 
velop  expertise  using  a  piece  of  software  and  then  asking  them 
to  teach  other  kids  how  to  use  it.  By  observing,  recording,  and 
analyzing  interactions  between  tutors  and  tutees,  it  is  possible 
to  identify  usability  problems  in  software  as  the  kids  attempt  to 
teach  it  to  one  another.  Hoysniemi  et  al.  (2003)  pointed  out 
that,  although  it  can  be  useful,  the  peer-tutoring  approach  re¬ 
quires  time,  training,  and  careful  implementation  to  be  effective. 


GENRES  OF  TECHNOLOGY  FOR  KIDS 


Technology  for  kids  falls  into  two  broad  categories:  education 
and entertainment. When game companies try to mix these genres,
they may use the term edutainment. New products for kids
increasingly  include  specialized  hardware  as  well  as  software. 

Entertainment 

Designers  of  games  and  other  entertainment  software  rarely 
write  about  how  they  accomplish  their  jobs.  Talks  are  presented 
each year at the Game Developers Conference (http://www.gdconf.com),
and some informal reflections are gathered as conference
proceedings. Attending the conference is recommended
for  people  who  wish  to  learn  more  about  current  issues  in  game 
design.  Game  Developer  magazine  is  the  leading  publication 
with  reflective  articles  on  the  game  design  process. 

Game  designers  are  usually  gamers  themselves  and  often 
end  up  simply  designing  games  that  they  themselves  would  like 
to  play.  This  simple  design  technique  is  easy  and  requires  little 
if  any  background  research  with  users.  Because  most  game  de¬ 
signers  have  traditionally  been  male,  this  approach  allowed 
them  to  appeal  quite  effectively  to  a  core  gaming  audience: 
young  men  and  teenage  boys.  However,  female  designers  are 
becoming  more  common  on  design  teams  and  gaming  com¬ 
panies  are  increasingly  recognizing  that  people  outside  the  typ¬ 
ical  gamer  stereotype  represent  a  large  potential  market  for 
their  products.  Designing  for  teenagers  is  relatively  easy.  As  we 
have  seen,  designing  for  very  young  children  presents  substan¬ 
tial  challenges.  The  younger  your  target  audience,  the  more 
they  should  be  tightly  connected  to  every  stage  of  the  design 
process. 

Brenda Laurel pioneered the use of careful design methods
for  nontraditional  game  audiences  in  her  work  with  the  com¬ 
pany  Purple  Moon  in  the  mid-1990s.  Laurel  aimed  to  develop 
games  that  appeal  to  preteen  girls  both  to  tap  this  market  seg¬ 
ment  and  to  give  girls  an  opportunity  to  become  fluent  with 
technology. Many people believe that use of computer games
leads to skills that later give kids advantages at school and work.
Through  extensive  interviews  with  girls  in  their  target  age  range, 
Purple  Moon  was  able  to  create  successful  characters  and  game 
designs.  However,  the  process  was  so  time  consuming  and  ex¬ 
pensive  that  the  company  failed  to  achieve  profitability  fast 
enough  to  please  its  investors.  The  company  closed  in  1999,  and 
its  characters  and  games  were  sold  to  Mattel.  Purple  Moon  per¬ 
haps  did  more  research  than  was  strictly  necessary,  particularly 
because  their  area  was  so  new.  The  broader  lesson  is  that  the 
game  industry  typically  does  not  budget  for  needs  analysis  and 
iterative  design  early  in  the  design  process.  Playtesting  and  qual¬ 
ity  assurance  typically  take  place  relatively  late  in  the  design  cy¬ 
cle.  Designers  contemplating  incorporating  research  early  in 
their  design  process  must  consider  the  financial  costs. 

Oosterholt  et  al.  (1996)  described  several  design  constraints 
that  are  specific  to  the  design  of  products  for  children.  First,  they 
suggested  that  trying  to  be  fashionable  might  result  in  products 
that kids quickly perceive as outdated. They also pointed out
that  fun  is  just  as  important  to  measure  as  usability,  that  mea¬ 
surements  of  fun  should  be  shared  with  development  teams,  and 
moreover,  that  the  product  should  grow  with  users  over  time 
and  continue  to  be  fun  long  after  kids  have  learned  to  use  it. 

Game  designer  Carolyn  Miller  (1998)  highlighted  seven  mis¬ 
takes  (or  “kisses  of  death”)  commonly  made  by  people  trying 
to  design  games  for  kids: 

“Death  kiss  #1:  Kids  love  anything  sweet” 

Miller  (1998)  wrote,  “Sweetness  is  an  adult  concept  of  what  kids  should 
enjoy.”  Only  very  young  children  will  tolerate  it.  Humor  and  good  char¬ 
acter  development  are  important  ingredients.  Do  not  be  afraid  to  use 
off-color  humor  or  to  make  something  scary. 

“Death  kiss  #2:  Give  ’em  what’s  good  for  ’em” 

Miller  advised,  “Don’t  preach,  don’t  lecture,  and  don’t  talk  down — noth¬ 
ing  turns  kids  off  faster.” 

“Death  kiss  #3:  You  just  gotta  amuse  ’em” 

Miller  wrote,  “Don’t  assume  that  just  because  they  are  little,  they  aren’t 
able  to  consume  serious  themes.” 

“Death  kiss  #4:  Always  play  it  safe!” 

Adult  games  often  rely  on  violence  to  maintain  dramatic  tension.  Since 
you  probably  will  not  want  to  include  this  in  your  game  for  kids,  you  will 
need  to  find  other  ways  to  maintain  dramatic  tension.  Do  not  let  your 
game  become  bland. 

“Death  kiss  #5:  All  kids  are  created  equal” 

Target  a  specific  age  group,  and  take  into  consideration  humor,  vocab¬ 
ulary,  skill  level,  and  interests.  If  you  try  to  design  for  everyone,  your 
game  may  appeal  to  no  one. 

“Death  kiss  #6:  Explain  everything” 

In  an  eagerness  to  be  clear,  some  people  overexplain  things  to  kids.  Kids 
are  good  at  figuring  things  out.  Use  as  few  words  as  possible,  and  make 
sure  to  use  spoken  and  visual  communication  as  much  as  possible. 


“Death  kiss  #7:  Be  sure  your  characters  are  wholesome!” 

Miller  warned  that  if  every  character  is  wholesome,  the  results  are  pre¬ 
dictable  and  boring.  Characters  need  flaws  to  have  depth.  Miller  identi¬ 
fied  a  number  of  common  pitfalls  in  assembling  groups  of  characters. 
It  is  not  a  good  idea  to  take  a  “white  bread”  approach,  in  which  every¬ 
one  is  White  and  middle  class.  On  the  other  end  of  the  spectrum,  it  is 
also  undesirable  to  take  a  “lifesaver  approach”  with  one  character  for 
each  ethnicity.  Finally,  you  also  need  to  avoid  an  “off-the-shelf”  ap¬ 
proach,  in  which  each  character  represents  a  stereotype:  “You’ve  got 
your beefy kid with bad teeth; he’s the bully. You’ve got the little kid
with glasses; he’s the smart one.” Create original characters that have
depth and have flaws that they can struggle to overcome (Miller, 1998).

Education 

To  design  educational  software,  we  must  expand  the  concept 
of  user-centered  design  (UCD)  to  one  of  learner-centered  de¬ 
sign  (LCD;  Soloway,  Guzdial,  &  Hay,  1994).  There  are  several 
added  steps  in  the  process: 

•  Needs  analysis 

-  For  learners 

-  For  teachers 

•  Select  pedagogy 

•  Select  media/technology 

•  Prototype 

-  Core  application 

-  Supporting  curricula 

-  Assessment  strategies 

•  Formative  evaluation 

-  Usability 

-  Learning  outcomes 

•  Iterative  design 

•  Summative  evaluation 

-  Usability 

-  Learning  outcomes 

In  our  initial  needs  analysis,  for  software  to  be  used  in  a 
school  setting,  we  need  to  understand  not  just  learners  but  also 
teachers.  Teachers  have  heavy  demands  on  their  time  and  are 
held  accountable  for  their  performances  in  ways  that  vary  be¬ 
tween  districts  and  between  election  years. 

Once we understand our learners’ and teachers’ needs, we
need  to  select  an  appropriate  pedagogy — an  approach  to  teach¬ 
ing  and  learning.  For  example,  behaviorism  views  learning  as  a 
process  of  stimulus  and  reinforcement  (Skinner,  1968).  Con¬ 
structivism  sees  learning  as  a  process  of  active  construction  of 
knowledge  through  experience.  A  social-constructivist  perspec¬ 
tive  emphasizes  learning  as  a  social  process  (Newman,  Griffin,  & 
Cole,  1989).  (A  full  review  of  approaches  to  pedagogy  is  beyond 
the  scope  of  this  chapter.) 

Next,  we  are  ready  to  select  the  media  we  will  be  working 
with,  matching  their  affordances  to  our  learning  objectives  and 
pedagogical  approach.  Once  the  prototyping  process  has  be¬ 
gun,  we  need  to  develop  not  just  software  or  hardware,  but  (for 
applications  to  be  used  in  schools)  also  supporting  curricular 
materials  and  assessment  strategies. 

Assessment should not be confused with evaluation. The
goal  of  assessment  is  to  judge  an  individual  student’s  perfor¬ 
mance.  The  goal  of  evaluation  is  to  understand  to  what  extent 
our  learning  technology  design  is  successful.  An  approach  to  as¬ 
sessing  student  achievement  is  an  essential  component  of  any 
school-based  learning  technology.  For  both  school  and  free-time
use,  we  need  to  design  feedback  mechanisms  so  that  learners 
can  be  aware  of  their  progress.  It  is  also  important  to  note 
whether  learners  find  the  environment  motivating.  Does  it  ap¬ 
peal  to  all  learners  or  to  specific  gender,  learning  style,  or  inter¬ 
est  groups? 

As  in  any  HCI  research,  educational  technology  designers 
use  formative  evaluation  to  understand  informally  what  needs 
improvement  in  their  learning  environments,  and  guide  the 
process  of  iterative  design.  Formative  evaluation  must  pay  at¬ 
tention  first  to  usability,  and  second  to  learning  outcomes.  If  stu¬ 
dents  cannot  use  the  learning  hardware  or  software,  they  cer¬ 
tainly  will  not  learn  through  its  use.  Once  it  is  clear  that  usability 
has  met  a  minimum  threshold,  designers  then  need  to  evalu¬ 
ate  whether  learning  outcomes  are  being  met.  After  formative 
evaluation  and  iterative  design  are  complete,  a  final  summative 
evaluation  serves  to  document  the  effectiveness  of  the  design 
and  justify  its  use  by  learners  and  teachers.  Summative  evalua¬ 
tion  must  similarly  pay  attention  to  both  usability  and  learning 
outcomes. 

A  variety  of  quantitative  and  qualitative  techniques  are  com¬ 
monly  used  for  evaluation  of  learning  outcomes  (Gay  &  Airasian, 
2000).  Most  researchers  use  a  complementary  set  of  both  quan¬ 
titative  and  qualitative  approaches.  Demonstrating  educational 
value  is  challenging,  and  research  methods  are  an  ongoing  sub¬ 
ject  of  research. 

This  represents  an  idealized  learner-centered  design  process. 
Just  as  many  software  design  projects  do  not  in  reality  follow  a 
comprehensive  UCD  process,  many  educational  technology 
projects  do  not  follow  a  full  LCD  process.  LCD  is  generally  sub¬ 
stantially  more  time  consuming  than  UCD.  While  in  some  cases 
it  may  be  possible  to  collect  valid  usability  data  in  a  single  ses¬ 
sion,  learning  typically  takes  place  over  longer  periods.  To  get 
meaningful  data,  most  classroom  trials  take  place  over  weeks 
or  months.  Furthermore,  classroom  research  needs  to  fit  into 
the  school  year  at  the  proper  time.  If  you  are  using  Biologica 
(Hickey,  Kindfield,  Horwitz,  &  Christie,  2000)  to  teach  about 
genetics,  you  need  to  wait  until  it  is  time  to  cover  genetics  that 
school  year.  You  may  have  only  one  or  two  chances  per  year  to 
test  your  educational  technology.  It  frequently  takes  many  years 
to  complete  the  LCD  process.  In  the  research  community,  one 
team  may  study  and  evolve  one  piece  of  educational  technology 
over  many  years.  In  a  commercial  setting,  educational  products 
need  to  get  to  market  rapidly,  and  this  formal  design  process  is 
rarely  used. 

Genres  of  Educational  Technology 

Taylor  (1980)  divided  educational  technology  into  three  genres: 

•  Computer  as  tutor 

•  Computer  as  tool 

•  Computer  as  tutee 


Suppose  that  we  are  learning  about  acid  rain.  If  the  computer 
is  serving  as  tutor,  it  might  present  information  about  acid  rain 
and  ask  the  child  questions  to  verify  the  material  was  under¬ 
stood.  If  the  computer  is  a  tool,  the  child  might  collect  data 
about  local  acid  rain  and  input  that  data  into  an  ecological  model 
to  analyze  its  significance.  If  the  computer  is  a  tutee,  the  child 
might  program  his  or  her  own  ecological  model  of  acid  rain. 

With  the  advent  of  the  Internet,  we  must  add  a  fourth  genre: 

•  Computer-supported  collaborative  learning  (CSCL) 

In  a  CSCL  study  of  acid  rain,  kids  from  around  the  country 
might  collect  local  acid  rain  data,  enter  it  into  a  shared  database, 
analyze  the  aggregate  data,  and  talk  online  with  adult  scientists 
who  study  acid  rain.  This  is  in  fact  the  case  in  the  NGS-TERC 
Acid  Rain  Project  (Tinker,  1993).  See  Table  40.1  for  an  overview 
of  genres  of  children’s  software. 
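The data-pooling step of such a project can be sketched in a few lines of Python. This is only an illustrative sketch, not the NGS-TERC software: the names `Reading`, `aggregate_by_site`, and `acidic_sites`, the site labels, the pH values, and the threshold of 5.0 are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    site: str   # hypothetical label for the school or town that took the sample
    ph: float   # measured pH of one rain sample


def aggregate_by_site(readings):
    """Return {site: mean pH} over every reading in the shared pool."""
    by_site = defaultdict(list)
    for reading in readings:
        by_site[reading.site].append(reading.ph)
    return {site: mean(values) for site, values in by_site.items()}


def acidic_sites(site_means, threshold=5.0):
    """List sites whose mean pH is below an illustrative threshold."""
    return sorted(site for site, m in site_means.items() if m < threshold)


# Invented sample data standing in for the shared database.
shared_database = [
    Reading("Site A", 4.4), Reading("Site A", 4.6),
    Reading("Site B", 5.6), Reading("Site B", 5.8),
]
site_means = aggregate_by_site(shared_database)
print(acidic_sites(site_means))
```

The point of the sketch is that no single classroom could draw the comparison on its own; the analysis only becomes possible once the local readings are combined.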

Computer  as  Tutor 

In  most  off-the-shelf  educational  products,  the  computer  acts
as  tutor.  Children  are  presented  with  information  and  then
quizzed  on  their  knowledge.  This  approach  to  education  is
grounded  in  behaviorism  (Skinner,  1968).  It  is  often  referred
to  as  “drill  and  practice”  or  “computer-aided  instruction”  (CAI).


TABLE 40.1. Genres of Children's Software

Genre                     Description

Entertainment             Games created solely for fun and pleasure.

Educational               Software created to help children learn about a
                          topic using some type of pedagogy (an approach
                          to teaching and learning).

Computer as Tutor         Often referred to as "drill and practice" or
                          "computer-aided instruction" (CAI), this approach
                          is grounded in behaviorism. Children are presented
                          with information and then quizzed on their
                          knowledge.

Computer as Tool          The learner directs the learning process, rather
                          than being directed by the computer. This approach
                          is grounded in constructivism, which sees learning
                          as an active process of constructing knowledge
                          through experience.

Computer as Tutee         Typically, the learner uses construction kits to
                          help reflect upon what he or she learned through
                          the process of creation. This approach is grounded
                          in constructivism and constructionism.

Computer-supported        Children use the Internet to learn from and
collaborative learning    communicate with knowledgeable members of the
(CSCL)                    adult community. Children can also become involved
                          in educational online communities with children
                          from different geographical regions. This approach
                          is grounded in social constructivism.

Edutainment               A mix of the entertainment and educational genres.

The  computer  tracks  student  progress  and  repeats  exercises  as
necessary.

Researchers  with  a  background  in  artificial  intelligence  have 
extended  the  drill  and  practice  approach  to  create  intelligent  tu¬ 
toring  systems.  Such  systems  try  to  model  what  the  user  knows 
and  tailor  the  problems  presented  to  an  individual’s  needs. 
Many  systems  explicitly  look  for  typical  mistakes  and  provide 
specially  prepared  corrective  feedback.  For  example,  suppose 
a  child  adds  17  and  18  and  gets  an  answer  of  25  instead  of  35. 
The  system  might  infer  that  the  child  needs  help  learning  to 
carry  from  the  ones  to  the  tens  column  and  present  a  lesson 
on  that  topic.  One  challenge  in  the  design  of  intelligent  tutors  is 
in  accurately  modeling  what  the  student  knows  and  what  their 
errors  might  mean. 
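The carrying example above can be made concrete with a small sketch. It assumes a deliberately simple student model with a single known misconception (dropping every carry); the function names and diagnosis labels are hypothetical and are not taken from any particular tutoring system.

```python
def add_without_carry(a: int, b: int) -> int:
    """Add two non-negative integers column by column, dropping every carry."""
    result, place = 0, 1
    while a > 0 or b > 0:
        column_sum = (a % 10) + (b % 10)
        result += (column_sum % 10) * place  # keep only the ones digit of the column
        a, b, place = a // 10, b // 10, place * 10
    return result


def diagnose_addition(a: int, b: int, answer: int) -> str:
    """Classify a student's answer to a + b using hypothetical labels."""
    if answer == a + b:
        return "correct"
    if answer == add_without_carry(a, b):
        return "forgot_to_carry"   # candidate for a lesson on carrying
    return "unrecognized_error"    # the model has no explanation for this answer


print(diagnose_addition(17, 18, 25))
```

For 17 + 18 the ones column gives 15; a student who writes the 5 and drops the 1 arrives at 25, which is exactly what `add_without_carry` reproduces. The third label shows the modeling difficulty noted above: many wrong answers will match no stored misconception, and some will match more than one.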

Byrne,  Anderson,  Douglass,  and  Matessa  (1999)  experi¬ 
mented  with  using  eye  tracking  to  improve  the  performance 
of  intelligent  tutors.  Using  an  eye  tracker,  the  system  can  tell 
whether  the  student  has  paid  attention  to  all  elements  necessary 
to  solve  the  problem.  In  early  trials  with  the  eye  tracker,  they 
found  that  some  of  the  helpful  hints  the  system  was  providing 
to  the  user  were  never  actually  read  by  most  students.  This 
helped  guide  their  design  process.  They  previously  focused  on 
how  to  improve  the  quality  of  hints  provided;  however,  that  is 
irrelevant  if  the  hints  are  not  even  being  read  (Byrne  et  al.,  1999). 

An  interesting  variation  on  the  traditional  ‘computer  as  tutor’
paradigm  for  very  young  children  is  the  Actimates  line  of  interactive
plush  toys.  Actimates  Barney  and  other  characters  lead
children  in  simple  games  with  educational  value,  like  counting 
exercises.  The  tutor  is  animated  and  anthropomorphized.  The 
embodied  form  lets  young  children  use  the  skills  they  have  in 
interacting  with  people  to  learn  to  interact  with  the  system,  en¬ 
hancing  both  motivation  and  ease  of  use  (Strommen,  1998; 
Strommen  &  Alexander,  1999). 

Computer  as  Tool 

When  the  computer  is  used  as  a  tool,  agency  shifts  from  the
computer  to  the  learner.  The  learner  is  directing  the  process,
rather  than  being  directed.  This  approach  is  preferred  by  constructivist
pedagogy,  which  sees  learning  as  an  active  process  of
constructing  knowledge  through  experience.  The  popular  drawing
program  Kid  Pix  is  an  excellent  example  of  a  tool  customized
for  kids’  interests  and  needs.  Winograd  (1996)  commented  that
Kid  Pix’s  designer  Craig  Hickman  “made  a  fundamental  shift
when  he  recognized  that  the  essential  functionality  of  the  program
lay  not  in  the  drawings  that  it  produced,  but  in  the  experience
for  the  children  as  they  used  it”  (p.  60).  For  example,  Kid
Pix  provides  several  different  ways  to  erase  the  screen,  including
having  your  drawing  explode  or  be  sucked  down  a  drain.

Simulation  programs  let  learners  try  out  different  possibili¬ 
ties  that  would  be  difficult  or  impossible  in  real  life.  For  exam¬ 
ple,  Biologica  (an  early  version  was  called  “Genscope”)  allows 
students  to  learn  about  genetics  by  experimenting  with  breed¬ 
ing  cartoon  dragons  with  different  inherited  characteristics  like 
whether  they  breathe  fire  or  have  horns  (Hickey  et  al.,  2000). 
Model-It  lets  students  try  out  different  hypotheses  about  water
pollution  and  other  environmental  factors  in  a  simulated 
ecosystem  (Soloway  et  al.,  1996). 


The  goal  of  such  programs  is  to  engage  students  in  scien¬ 
tific  thinking.  The  challenge  in  their  designs  is  how  to  get  stu¬ 
dents  to  think  systematically  and  not  to  simply  try  out  options  at 
random.  Programs  like  Model-It  provide  the  student  with  scaf¬ 
folding.  Initially,  students  are  given  lots  of  support  and  guid¬ 
ance.  As  their  knowledge  evolves,  the  scaffolding  is  faded,  al¬ 
lowing  the  learner  to  work  more  independently  (Guzdial,  1994; 
Soloway  et  al.,  1994). 
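One way to picture fading is as a rule that maps an estimate of the learner's mastery to a level of support. The sketch below is only an assumption-laden illustration: the function name, the three support levels, and the cut-off values 0.4 and 0.8 are invented here and do not describe how Model-It or any cited system works.

```python
def support_level(mastery: float) -> str:
    """Map an estimated mastery in [0, 1] to a hypothetical level of scaffolding."""
    if not 0.0 <= mastery <= 1.0:
        raise ValueError("mastery must be between 0 and 1")
    if mastery < 0.4:          # illustrative cut-off: novice
        return "step_by_step_guidance"
    if mastery < 0.8:          # illustrative cut-off: developing
        return "hints_on_request"
    return "no_scaffolding"    # support has been fully faded


# As the estimate rises over time, the support offered is reduced.
for estimate in (0.1, 0.5, 0.9):
    print(estimate, support_level(estimate))
```

In practice the hard part is the mastery estimate itself, which this sketch simply takes as given.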

Computer  as  Tutee 

Papert  (1992)  commented  that  much  computer-aided  in¬ 
struction  is  “using  the  computer  to  program  the  child”  (p.  163). 
Instead,  he  argued  that  the  child  should  learn  to  program  the 
computer  and  through  this  process  gain  access  to  new  ways  of 
thinking  and  understanding  the  world.  Early  research  argued 
that  programming  would  improve  children’s  general  cognitive 
skills,  but  empirical  trials  produced  mixed  results  (Clements, 
1986;  Clements  &  Gullo,  1984;  Pea,  1984).  Some  researchers  ar¬ 
gued  that  the  methods  of  these  studies  are  fundamentally 
flawed,  because  the  complexity  of  human  experience  cannot 
be  reduced  to  pretests  and  posttests  (Papert,  1987).  The  coun¬ 
terargument  is  that  researchers  arguing  that  technology  has  a 
transformative  power  need  to  back  up  their  claims  with  evi¬ 
dence  of  some  form,  whether  quantitative  or  qualitative  (Pea, 
1987;  Walker,  1987).  More  recently,  the  debate  has  shifted  to  the 
topic  of  technological  fluency.  As  technology  increasingly  sur¬ 
rounds  our  everyday  lives,  the  ability  to  use  it  effectively  as  a 
tool  becomes  important  for  children’s  successes  in  school  and 
later  in  the  workplace  (Resnick  &  Rusk,  1996). 

In  the  late  1960s,  Feurzeig  (1996)  and  colleagues  at  BBN  in¬ 
vented  Logo,  the  first  programming  language  for  kids.  Papert 
(1980)  extended  Logo  to  include  turtle  graphics,  in  which  kids 
learn  geometric  concepts  by  moving  a  turtle  around  the  screen. 
A  variety  of  programming  languages  for  kids  have  been  developed
over  subsequent  years,  including  StarLogo  (Resnick,  1994),
Boxer  (diSessa  &  Abelson,  1986),  Stagecast  (Cypher  &  Smith,
1995),  Agentsheets  (Repenning  &  Fahlen,  1993),  MOOSE  (Bruckman,
1997),  and  Squeak  (Guzdial  &  Rose,  2001).  Lego  Mindstorms
(originally  “LEGO/Logo”)  is  a  programmable  construction
kit  with  physical  as  well  as  software  components  (Martin  & 
Resnick,  1993).  Another  programmable  tool  bridging  the  gap 
between  physical  constructions  and  representations  on  the 
screen  is  Hypergami,  a  computer-aided  design  tool  for  origami 
developed  at  the  University  of  Colorado  at  Boulder.  Students 
working  with  Hypergami  learn  about  both  geometry  and  art 
(Eisenberg,  Nishioka,  &  Schreiner,  1997). 
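To illustrate the kind of geometric idea that turtle graphics makes visible, the sketch below simulates a turtle in plain Python. It is a minimal stand-in written for this example, not Logo itself: the `Turtle` class and `polygon` function are assumptions, and nothing is drawn on screen. The equivalent Logo command for the pentagon would be `REPEAT 5 [FORWARD 100 RIGHT 72]`.

```python
import math


class Turtle:
    """A tiny text-only turtle: it tracks position and heading but draws nothing."""

    def __init__(self):
        self.x, self.y, self.heading = 0.0, 0.0, 0.0  # heading in degrees

    def forward(self, distance):
        self.x += distance * math.cos(math.radians(self.heading))
        self.y += distance * math.sin(math.radians(self.heading))

    def right(self, degrees):
        self.heading = (self.heading - degrees) % 360


def polygon(turtle, sides, length):
    """Walk a regular polygon, turning 360/sides degrees at each corner."""
    for _ in range(sides):
        turtle.forward(length)
        turtle.right(360 / sides)


t = Turtle()
polygon(t, 5, 100)
# The turtle ends where it started (up to floating-point rounding).
print(abs(t.x) < 1e-9, abs(t.y) < 1e-9)
```

Because the turns add up to one full rotation of 360 degrees, the turtle closes the figure and returns to its starting point, whatever the number of sides. A child can discover this by experimenting with different turn angles rather than by being told the rule.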

In  most  design  tools,  the  goal  is  to  facilitate  the  creation  of 
a  product.  In  educational  construction  kits,  the  goal  instead  is 
what  is  learned  through  the  process  of  creation.  So  what  makes 
a  good  construction  kit?  In  an  Interactions  article  entitled  “Pi¬ 
anos,  Not  Stereos:  Creating  Computational  Construction  Kits,” 
Resnick,  Bruckman,  and  Martin  (1996)  discussed  the  art  of  de¬ 
signing  construction  kits  for  learning  (constructional  design): 

FIGURE  40.2.  Penguins  created  using  Hypergami.

The  concept  of  learning-by-doing  has  been  around  for  a  long  time.  But
the  literature  on  the  subject  tends  to  describe  specific  activities  and
gives  little  attention  to  the  general  principles  governing  what  kinds  of
“doing”  are  most  conducive  to  learning.  From  our  experiences,  we  have
developed  two  general  principles  to  guide  the  design  of  new  construc¬ 
tion  kits  and  activities.  These  constructional-design  principles  involve 
two  different  types  of  “connections”: 

•  Personal  connections.  Construction  kits  and  activities  should  con¬ 
nect  to  users’  interests,  passions,  and  experiences.  The  point  is  not 
simply  to  make  the  activities  more  “motivating”  (though  that,  of 
course,  is  important).  When  activities  involve  objects  and  actions  that
are  familiar,  users  can  leverage  their  previous  knowledge,  connecting 
new  ideas  to  their  pre-existing  intuitions. 

•  Epistemological  connections.  Construction  kits  and  activities  should 
connect  to  important  domains  of  knowledge — more  significantly, 
encourage  new  ways  of  thinking  (and  even  new  ways  of  thinking 
about  thinking).  A  well-designed  construction  kit  makes  certain  ideas 
and  ways  of  thinking  particularly  salient,  so  that  users  are  likely  to 
connect  with  those  ideas  in  a  very  natural  way,  in  the  process  of  de¬ 
signing  and  creating. 

Bruckman  (2000)  added  a  third  design  principle: 

•  Situated  support.  Support  for  learning  should  be  from  a  source 
(either  human  or  computational)  with  whom  the  learner  has  a 
positive  personal  relationship,  ubiquitously  available,  richly  con¬ 
nected  to  other  sources  of  support,  and  richly  connected  to  every¬ 
day  activities. 

Resnick  and  Silverman  (2005)  suggested  several  more  design 
principles  for  creating  construction  kits  for  kids.  Some,  such  as 
“iterate,  iterate — then  iterate  again,”  are  familiar  mantras  for 
HCI  designers.  Others  may  be  less  familiar: 

•  Low  Floor  and  Wide  Walls 

If  a  technology  has  a  low  floor,  it  means  that  it  is  easy  for 
novices  to  begin  using  it.  Wide  Walls  suggest  a  wide  range  of 
possible  areas  of  design  and  exploration.  Construction  kits  de¬ 
fine  “a  place  to  explore,  not  a  collection  of  specific  activities.” 

•  Make  Powerful  Ideas  Salient — Not  Forced 

When  designing  toward  specific  learning  goals,  construction 
kits  should  make  these  ideas  visible  and  useful  in  design  ac¬ 
tivities  rather  than  imposing  the  ideas  on  students  as  a  pre¬ 
determined  solution. 

•  Support  Many  Paths,  Many  Styles 

Because  kids  approach  problems  in  different  ways,  it  is  important  to
support  a  variety  of  design  approaches  in  a  construction  kit. 


•  Make  it  as  Simple  as  Possible — and  Maybe  Even  Simpler 
Constraints  can  be  the  designer’s  best  friend.  Limited  func¬ 
tionality  sometimes  wins  out  over  designs  that  are  more  so¬ 
phisticated  because  simplicity  allows  kids  to  find  creative  new 
ways  to  use  a  product. 

•  Choose  Black  Boxes  Carefully 

This  principle  is  related  to  the  previous  one;  deciding  when 
to  reveal  complexity  and  when  to  conceal  it  is  a  difficult  ques¬ 
tion.  Resnick  and  Silverman  (2005)  suggested  that  the  sim¬ 
plest  choice  is  often  the  best  one. 

•  A  Little  Bit  of  Programming  Goes  a  Long  Way 

Because  programming  is  the  fundamental  mode  of  construc¬ 
tion  with  computers,  designers  of  construction  kits  for  kids 
often  include  some  programming  functionality.  Focusing  on 
powerful,  simple  commands  that  kids  can  do  well  is  often  the 
best  way  to  support  a  diverse  range  of  activities. 

•  Give  People  What  They  Want — Not  What  They  Ask  For
Observations  of  kids  can  often  tell  designers  more  than  their 
direct  answers  to  questions.  Kids  may  ask  for  unrealistic  fea¬ 
tures  or  may  not  know  themselves  why  they  are  having  diffi¬ 
culty  completing  a  task. 

•  Invent  Things  That  You  Would  Want  to  Use  Yourself 
Although  they  caution  against  overgeneralizing  one’s  own  per¬ 
sonal  likes  and  dislikes,  Resnick  and  Silverman  (2005)  pro¬ 
posed  that  the  most  respectful  approach  to  designing  for  kids 
is  to  create  something  that  the  designer  herself  finds  enjoyable. 

Computer-Supported  Collaborative  Learning  (CSCL) 

Most  tools  for  learning  have  traditionally  been  designed 
for  one  child  working  at  the  computer  alone.  However,  learn¬ 
ing  is  generally  recognized  to  be  a  social  process  (Newman 
et  al.,  1989).  With  the  advent  of  the  Internet  came  new  opportunities
for  children  to  learn  from  one  another  and  from
knowledgeable  members  of  the  adult  community.  This  field  is 
called  “Computer-Supported  Collaborative  Learning”  (CSCL; 
Koschmann,  1996). 

CSCL  research  can  be  divided  into  four  categories: 

1.  Distance  education 

Students  attempt  to  use  online  environments  in  ways  that 
emulate  a  traditional  classroom. 

2.  Information  retrieval 

Research  projects  in  which  students  use  the  Internet  to  find 
information. 

3.  Information  sharing 

Students  debate  issues  with  one  another.  One  of  the  first 
such  tools  was  the  Computer-Supported  Intentional  Learning 
Environment  (CSILE),  a  networked  discussion  tool  designed  to 
help  students  engage  in  thoughtful  debate  as  a  community  of  sci¬ 
entists  does  (Scardamalia  &  Bereiter,  1994).  They  may  also  col¬ 
lect  scientific  data  and  share  it  with  others  online.  In  the  “One 
Sky,  Many  Voices”  project,  students  learn  about  extreme  weather
phenomena  by  sharing  meteorological  data  they  collect  with 
other  kids  from  around  the  world,  and  also  by  talking  online  with 
adult  meteorologists  (Songer,  1996).  In  the  Palaver  Tree  Online
project,  kids  learn  about  history  by  talking  online  with  older
adults  who  lived  through  that  period  of  history  (Ellis  &  Bruckman,
2001).  A  key  challenge  in  the  design  of  information  sharing
environments  is  how  to  promote  serious  reflection  on  the
part  of  students  (Guzdial,  1994;  Kolodner  &  Guzdial,  1996). 


4.  Technological  samba  schools

In  Mindstorms,  Papert  (1980)  had  a  vision  of  a  “technologi¬ 
cal  samba  school.”  At  samba  schools  in  Brazil,  a  community  of 
people  of  all  ages  gather  together  to  prepare  a  presentation  for 
carnival.  “Members  of  the  school  range  in  age  from  children  to 
grandparents  and  in  ability  from  novice  to  professional.  But  they 
dance  together  and  as  they  dance  everyone  is  learning  and 
teaching  as  well  as  dancing.  Even  the  stars  are  there  to  learn 
their  difficult  parts”  (Papert,  1980).  People  go  to  samba  schools 
not  just  to  work  on  their  presentations,  but  also  to  socialize  and 
be  with  one  another.  Learning  is  spontaneous,  self-motivated, 
and  richly  connected  to  popular  culture.  Papert  imagined  a  kind 
of  technological  samba  school  where  people  of  all  ages  gather 
together  to  work  on  creative  projects  using  computers.  The 
Computer  Clubhouse  is  an  example  of  such  a  school  in  a  face- 
to-face  setting  (Resnick  &  Rusk,  1996).  MOOSE  Crossing  is  an 
Internet-based  example  (Bruckman,  1998).  A  key  challenge  in 
the  design  of  such  environments  is  how  to  grapple  with  the 
problem  of  uneven  achievement  among  participants.  When  kids 
are  allowed  to  work  or  not  work  in  a  self-motivated  fashion,  typ¬ 
ically  some  excel  while  others  do  little  (Elliott,  Bruckman,  Ed¬ 
wards,  &  Jensen,  2000). 


Child  Safety  Online 

One  challenge  in  the  design  of  Internet-based  environments  for 
kids  is  the  question  of  safety.  The  Internet  does  contain  infor¬ 
mation  that  is  sexually  explicit,  violent,  and  racist.  Typically,  such 
information  does  not  appear  unless  one  is  looking  for  it;  how¬ 
ever,  it  is  unusual  but  possible  to  stumble  across  it  accidentally. 
Filtering  software  blocks  access  to  useful  information  as  well  as 
harmful  (Schneider,  1997).  Furthermore,  companies  that  make 
filtering  software  often  fail  to  adequately  describe  how  they  de¬ 
termine  what  to  block,  and  they  may  have  unacknowledged  po¬ 
litical  agendas  that  not  all  parents  will  agree  with.  Resolving  this 
issue  requires  a  delicate  balance  of  the  rights  of  parents,  teach¬ 
ers,  school  districts,  and  children  (Electronic  Privacy  Information 
Center,  2001).  Another  danger  for  kids  online  is  the  presence  of 
sexual  predators  and  others  who  wish  to  harm  children.  While
such  incidents  are  rare,  it  is  important  to  teach  kids  not  to  give
out  personal  information  online  such  as  their  last  names,  addresses,
or  phone  numbers.  Kids  who  wish  to  meet  online
friends  face  to  face  should  do  so  by  each  bringing  a  parent
and  meeting  in  a  well-populated  public  place  like  a  fast-food
restaurant.  A  useful  practical  guide  “Child  Safety  on  the  Infor¬ 
mation  Superhighway”  is  available  from  the  Center  for  Missing 
and  Exploited  Children  (http://www.missingkids.com).  Educat¬ 
ing  kids,  parents,  and  teachers  about  online  safety  issues  is  an 
important  part  of  the  design  of  any  online  software  for  kids. 


CONCLUSION 


To  design  for  kids,  we  must  have  a  model  of  what  kids  are  and 
what  we  would  like  them  to  become.  Adults  were  once  kids.  Many 
are  parents.  Some  are  teachers.  We  tend  to  think  that  we  know 
kids — who  they  are,  what  they  are  interested  in,  and  what  they 
like.  However,  we  do  not  have  as  much  access  to  our  former 
selves  as  many  would  like  to  believe.  Furthermore,  it  is  worth  not¬ 
ing  that  our  fundamental  notions  of  childhood  are  in  fact  cultur¬ 
ally  constructed  and  change  over  time.  Calvert  (1992)  wrote 
about  the  changing  notion  of  childhood  in  America,  and  the  im¬ 
pact  it  has  had  on  artifacts  designed  for  children  and  child  rearing: 

In  the  two  centuries  following  European  settlement,  the  common  per¬ 
ception  in  America  of  children  changed  profoundly,  having  first  held  to 
an  exaggerated  fear  of  their  inborn  deficiencies,  then  expecting  consid¬ 
erable  self-sufficiency,  and  then,  after  1830,  endowing  young  people 
with  an  almost  celestial  goodness.  In  each  era,  children’s  artifacts  me¬ 
diated  between  social  expectations  concerning  the  nature  of  childhood 
and  the  realities  of  child-rearing:  before  1730,  they  pushed  children 
rapidly  beyond  the  perceived  perils  of  infancy,  and  by  the  nineteenth 
century  they  protected  and  prolonged  the  perceived  joys  and  innocence
of  childhood.  (p.  8)

While  Calvert  (1992)  reflected  on  the  design  of  swaddling 
clothes  and  walking  stools,  the  same  role  is  played  by  new  tech¬ 
nologies  for  kids  like  programmable  Legos  and  drill  and  practice 
arithmetic  programs:  These  artifacts  mediate  between  our  social 
expectations  of  children  and  the  reality  of  their  lives.  If  you  be¬ 
lieve  that  children  are  unruly  and  benefit  from  strong  discipline, 
then  you  are  likely  to  design  CAI.  If  you  believe  that  children  are 
creative  and  should  not  be  stifled  by  adult  discipline,  then  you 
might  design  an  open-ended  construction  kit  like  Logo  or 
Squeak.  In  designing  for  kids,  it  is  crucial  to  become  aware  of 
one’s  own  assumptions  about  the  nature  of  childhood.  Design¬ 
ers  should  be  able  to  articulate  their  assumptions,  and  be  ready 
to  revise  them  based  on  empirical  evidence.


INTRODUCTION


Well-designed  communication  and  information  technology  sys¬ 
tems  have  great,  unrealized  potential  to  enhance  the  quality  of 
life  and  independence  for  those  with  cognitive  dysfunction, 
including  elderly  people,  by 

•  Allowing  them  to  retain  a  high  level  of  independence  and 
control  over  their  lives 

•  Providing  appropriate  levels  of  monitoring  and  supervision  of 
at-risk  people,  without  violating  privacy 

•  Keeping  people  intellectually  and  physically  active 

•  Providing  communication  methods  to  reduce  social  isolation 
and  foster  social  inclusion 

Using  technology  to  augment  human  cognitive  capacity  is 
not  a  new  idea,  taking  the  term  technology  in  its  widest  sense, 
to  include  tools  and  techniques  which  humans  have  developed 
to  help  them  improve  upon  their  limitations.  The  first  cogni¬ 
tive  function  to  be  augmented  was  probably  memory.  Mnemonic 
methods  help  with  this  and  are  still  in  wide  use  today.  One  of 
the  most  common  (e.g.,  the  method  of  loci)  is  to  link  what  is 
to  be  remembered  with  a  well-established  memory  structure 
that  is  easy  to  recall,  such  as  the  layout  of  a  familiar  city,  or  a 
narrative  that  has  already  been  memorized.  The  introduction 
of  written  language  extended  people’s  cognitive  abilities  by  al¬ 
lowing  memories  to  be  recorded  externally.  Although  at  the 
time  some  feared  this  backwards  step  would  allow  people’s 
memory  powers  to  wither,  it  has  since  become  apparent  that 
the  overriding  effect  has  been  to  free  cognitive  abilities  that  can 
further  develop  on  the  basis  of  the  external  support.  The  de¬ 
velopment  of  information  processing  technologies  has  contin¬ 
ued  this  trend  of  extending  the  potential  of  most  people’s  cog¬ 
nitive  abilities.  Thus,  there  should  be  similar  potential  for 
computer  technology  to  extend  the  cognitive  abilities  of  those 
with  some  form  of  impairment  or  other  limitation. 

The  increasing  power  and  decreasing  size  of  computer  tech¬ 
nology,  along  with  its  capacity  to  provide  communication  as  well 
as  computation  and  storage,  offers  the  possibility  of  quite  so¬ 
phisticated  help  for  cognitive  impairments,  if  we  can  develop 
the  appropriate  software.  Computers  have  the  potential  to  act 
as  a  kind  of  scaffolding  for  cognitive  tasks,  taking  over  functions 
that  have  been  affected  by  illness,  accident,  or  aging.  They  could 
also  provide  prompts  for  daily  living,  if  they  were  able  to  track 
successfully  through  the  user’s  sequence  of  tasks  and  actions. 

Particular  strengths  of  computers  as  assistants  for  people 
with  cognitive  impairments  include  being  consistent,  tireless, 
and  not  becoming  emotionally  involved.  In  addition,  multimedia
and  multimodal  systems  can  provide  a  very  rich  interaction,
which  may  be  particularly  advantageous  for  users  with
cognitive  dysfunction.  For  example,  such  systems  have  great  po¬ 
tential  in  addressing  the  problems  of  memory  loss  and  the  re¬ 
lated  difficulties  presented  by  dementia.  In  addition,  communi¬ 
cation  systems  that  use  synthetic  speech,  predictive  programs 
that  can  facilitate  writing,  and  a  range  of  nonlinguistic  methods 
of  communication  can  be  used  by  those  with  speech  and  language
dysfunction  caused  by  cognitive  impairments.  A  great
deal  of  work  will  be  required  in  order  to  realize  this  potential
both  in  uncovering  new  knowledge  about  how  cognition  works 
and  in  developing  assistive  systems  based  on  this  knowledge. 

Because  of  the  wide  range  of  skills  and  knowledge  needed  to 
understand  the  problems  faced  by  people  with  cognitive  im¬ 
pairment,  it  is  essential  that  research  work  in  this  field  should  be 
multidisciplinary,  including  psychologists,  members  of  the 
health  and  therapeutic  professions,  and  engineers.  It  is  also  vital 
to  involve  potential  users  of  the  technology  as  partners  in  the 
research  at  all  stages  of  development. 

In  addition,  it  is  essential  that  interface  designers  be  much 
more  aware  of  the  range  of  abilities  that  exist  in  any  population 
of  users.  For  example,  access  to  the  Internet  for  disabled  people 
is  often  thought  to  be  synonymous  with  access  for  blind  people, 
but  in  fact,  blind  people  form  only  a  small  percentage  of  the 
disabled  population.  Ogozalec  (1997)  pointed  out  that,  if  cur¬ 
rent  trends  continue  in  the  United  States,  by  2030,  one  fifth  of 
the  population  will  be  over  65  years  of  age  and  commented,  “It 
is  difficult  to  categorize  and  draw  conclusions  about  ‘the  el¬ 
derly’,  since  they  comprise  such  a  diverse  and  heterogeneous 
population”  (p.  65).  This  diversity,  particularly  of  cognitive  func¬ 
tion,  ought  to  be  taken  into  account  if  we  are  to  make  software 
and  the  Internet  available  to  as  large  a  percentage  of  the  population
as  possible.

We  will  describe  the  major  types  of  cognitive  impairment  and
include  within  this  discussion  the  effects  of  aging.  We  will  then
illustrate  the  development  of  systems  to  support  people  with  a
variety  of  cognitive  impairments  with  specific  projects  in  which
we  have  been  involved.  They  will  not  cover  all  aspects  of  cogni¬ 
tive  impairment,  but  they  do  illustrate  a  methodology  and  an 
approach  to  developing  assistive  technology,  which  may  have  a 
wider  relevance. 

In  the  concluding  part  of  this  chapter,  we  will  address  the  de¬ 
velopment  of  methodologies  that  we  believe  will  be  valuable  for 
designers  of  systems  to  support  people  with  cognitive  impair¬ 
ment  and  to  assist  designers  of  general  systems  to  take  into  ac¬ 
count  the  needs  of  people  with  cognitive  impairments. 


COGNITIVE  IMPAIRMENT 


The  use  of  the  term  cognitive  impairment  implies  that  two  cat¬ 
egories  of  human  cognitive  systems  exist — impaired  and  unim¬ 
paired.  However,  this  is  not  the  case,  although  it  can  reasonably 
be  stated  that  there  are  normal  or  average  cognitive  systems. 
The  vast  majority  of  experimental  cognitive-psychology  litera¬ 
ture  relates  to  this  normal  system.  In  many  contexts,  this  level  of 
explanation  is  suitable  for  indicating  of  what  most  people  are  ca¬ 
pable.  It  should  always  be  borne  in  mind,  however,  that  in  real- 
world  situations  there  is  no  marked  distinction  between  that 
which  is  normal  and  that  which  is  not.  In  other  words,  everyone 
has  some  limits  to  their  cognitive  abilities.  Some  have  a  highly 
specific  impairment,  some  have  more  diffuse  problems,  and 
some  experience  interrelated  constellations  of  impairments.  In 
addition,  the  cognitive  abilities  of  any  one  human  being  will 
change  over  longer  and  shorter  periods.  Aging  can  have  substantial
effects  on  cognitive  ability,  which  are  particularly  marked
in  some  age-related  conditions,  such  as  dementia.


4 1 .  Information  Technology  for  Cognitive  Support 


For  ease  of  exposition,  the  forms  of  cognitive  impairment 
identified  and  described  below  will  in  the  main  refer  to  general 
categories.  It  should  be  noted,  however,  that  all  these  categories 
lie  somewhere  on  a  continuum  and,  while  they  are  delineated 
on  the  basis  of  educational  and  clinical  or  medical  criteria,  such 
cutoff  points  are  relatively  arbitrary  in  the  context  of  the  wide
variability  of  cognitive  ability  across  the  population.  It  is  also 
worth  noting  that  within  the  context  of  normal  cognitive  sys¬ 
tems,  there  is  significant  diversity  among  people  in  regard  to  dif¬ 
ferential  preferences  for  types  of  material  and  ways  of  approach¬ 
ing  and  processing  information.  For  example,  some  people  may 
be  considered  primarily  verbal  and  tend  to  excel  in  language- 
based  tasks,  relative  to  those  considered  visuo/spatial  (e.g., 
Lohman,  2000).  Thus,  many  of  the  types  of  impairment  ad¬ 
dressed  below  can  be  construed  to  an  important  extent  as  the 
extremities  of  normal  diversity. 


INTELLIGENCE  QUOTIENT 


The  most  widely  known  dimension  of  general  cognitive  ability  is 
probably  intelligence.  Scientific  investigation  of  this  dimension 
has  a  controversial  past,  and  many  aspects  of  this  are  beyond 
the  scope  of  the  present  chapter  (for  a  more  comprehensive  ac¬ 
count,  see  Gould,  1997).  One  underlying  reason  for  such  con¬ 
troversy  is  that  the  word  intelligence  has  a  rather  nebulous  de¬ 
finition.  In  day-to-day  usage,  this  is  rarely  problematic,  but  the 
differences  between  scientific  and  lay  definitions  can  cause  mis¬ 
understanding  (e.g.,  Sternberg,  2000).  Such  misunderstanding 
can  lead  to  controversy  as  most  definitions  of  intelligence  in¬ 
clude  connotations  that  are  considered  socially  important  and, 
thus,  can  often  be  highly  emotive.  Despite  these  difficulties,  the 
investigation  of  intelligence  has  provided  many  insights  into  a 
wide  range  of  more  particular  cognitive  abilities,  and  has  devel¬ 
oped  methods  for  quantifying  general  intellectual  ability  such  as 
the  various  forms  of  IQ  (intelligence  quotient)  tests.  Again,  con¬ 
troversy  has  surrounded  the  use  of  these  tests  over  the  years 
(see  Gould,  1997;  Kaufman,  2000),  but  such  tests  have  been 
widely  used  and  are  accepted  as  a  general  benchmark  of  a  per¬ 
son’s  intellectual  capability. 

An  IQ  score  of  100  is,  by  definition,  normal  with  about  50%
of  the  population  scoring  above  and  50%  scoring  below,  but  it
should  be  noted  that  elderly  people  are  not  generally  included
in  the  standardization  of  these  scores.  Approximately  50%  of
the  population  is  considered  within  the  bounds  of  normality 
and  deviate  either  side  of  100  by  no  more  than  10  points.  The 
nonnormal  50%  are  distributed  evenly  above  and  below  this 
band.  Thus  about  a  quarter  of  the  (nonelderly)  population  fall 
below  the  level  of  what  is  considered  normal.  Although  the  ter¬ 
minology  varies  across  cultures  and  over  time,  around  20%  of 
the  population  have  IQ  scores  between  75  and  90  and  would 
generally  be  classified  as  slow  learners.  The  final  5%  will  gen¬ 
erally  have  very  special  needs  that  overall  are  best  addressed  on 
an  individual  basis  (Kaluger  &  Kolson,  1987).  Further  to  this 
and  as  an  example  of  the  emotive  connotations  associated  with 
the  issue  of  ‘intelligence’,  it  is  worth  noting  that  the  first  official 
classification  scheme  (see  Detterman,  Gabriel,  &  Ruthsatz, 
2000)  associated  with  IQ  tests  further  broke  down  this  latter
5%.  These  classifications  were  moron  (IQ  50-75),  imbecile  (IQ
25-50),  and  idiot  (IQ  <  25),  terms  which  today  would  be  con¬ 
sidered  wholly  unacceptable  as  a  description  of  anyone  with  a 
cognitive  impairment. 
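
The percentages quoted above can be checked against a simple model. The sketch below assumes, purely for illustration, that IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15; the standard deviation is an assumption and is not stated in this chapter. Under that assumption the modeled shares come out close to the 50%, 20%, and 5% figures given in the text.

```python
from statistics import NormalDist

# Assumption (not from the chapter): IQ is modeled as a normal
# distribution with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def share_between(low, high):
    """Fraction of the modeled population scoring between low and high."""
    return iq.cdf(high) - iq.cdf(low)


# Bands named in the text, with their modeled population shares.
bands = {
    "within 10 points of 100 (90 to 110)": share_between(90, 110),
    "slow learners (75 to 90)": share_between(75, 90),
    "below 75": iq.cdf(75),
    "above 110": 1 - iq.cdf(110),
}

for label, share in bands.items():
    print(f"{label}: {share:.1%}")
```

The four bands cover the whole modeled population, so their shares sum to one.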

An  IQ  score  reflects  a  person’s  intellectual  ability  as  a  whole. 
A  low  score  may  be  due  to  the  whole  system  functioning  at  a 
suboptimal  level,  but  a  similar  result  can  also  be  due  to  one  or 
more  component  abilities  being  impaired.  There  are  many  tests 
of  IQ,  some  of  which  give  an  indication  of  this  while  others  do 
not.  Some  IQ  tests  are  explicitly  broken  down  into  subtests  that 
reflect  the  relative  levels  of  ability  in  the  component  cognitive 
abilities,  such  as  the  verbal  and  visuo/spatial  abilities  mentioned 
above.  The  more  common  forms  of  cognitive  impairment  are 
described  below  in  the  context  of  a  brief  overview  of  the  cogni¬ 
tive  system. 

For  any  information  in  the  outside  world  to  enter  the  cogni¬ 
tive  system,  it  must  first  be  detected  and  transmitted  by  the  sen¬ 
sory  apparatus.  In  an  important  sense,  this  is  not  simply  the 
start  of  the  process  because  aspects  of  attention  will  influence 
what  is  and  is  not  detected/transmitted,  and,  to  a  certain  ex¬ 
tent,  how.  Basic  perceptual  processing  creates  a  sensory  specific 
representation  of  the  stimulus  event.  Streams  of  such  stimulus 
events  are  summated  into  meaningful  cognitive  entities  (e.g., 
strokes  on  a  page  recognized  as  letters  and  numerals  are  summated
into  a  name  and  telephone  number).  These  will  then  be
either  passed  immediately  to  short-term  memory  (STM),  fur¬ 
ther  processed  by  working  memory  (WM),  or  rehearsed  for 
maintenance  in  STM  or  for  encoding  into  longer  term  storage. 
For  example,  rehearsal  could  refer  to  rote  rehearsal  of  a  tele¬ 
phone  number  between  reading  and  dialing  it  or  to  more  elab¬ 
orate  processing  to  associate  it  with  relevant  extant  memories  to 
improve  the  chance  of  subsequent  recall  (e.g.  method  of  loci, 
mentioned  above).

Output  from  the  cognitive  system  will  generally  be  initiated 
in  response  to  some  form  of  external  stimulus,  or  probe,  by  ac¬ 
cessing  extant  memories  relevant  to  the  probe  using  executive 
processes  to  organize  them  in  a  task  relevant  way  and  then  pro¬ 
ducing  a  response.  Output  of  this  kind  has  been  most  com¬ 
monly  studied  with  the  use  of  memory  tests.  This  minimizes  the 
influence  of  intellectual  processing  (problem  solving,  etc.)  per 
se,  and  emphasizes  the  registration,  rehearsal  and  encoding  of 
information,  the  effects  of  decay,  interference  and  other  forms 
of  forgetting,  and  the  effectiveness  of  different  cues  (probes) 
in  eliciting  specific  memories  (e.g.,  recall  vs.  recognition).

Virtually  all  aspects  of  the  processing  outlined  above  are
shaped  by  attention,  and  it  is  important  to  note  that,  regardless 
of  impairment,  while  we  all  have  some  control  over  attention, 
it  can  also  be  the  case  that  attention  can  have  some  control  over 
us.  That  is,  we  can  utilize  attention  to  focus  on  searching  a  list 
for  a  particular  telephone  number  while  ignoring  the  chatter  of 
people  around  us.  Having  read  the  number,  however,  our  at¬ 
tention  can  exert  its  own  control  if  someone  calls  our  name  and 
asks  if  we  have  made  that  call  yet.  Despite  our  best  efforts  at  re¬ 
hearsal,  it  is  likely  that  our  attention  will  be  grabbed  by  our 
name  and  the  ensuing  question,  and  this  brief  distraction  can  be
enough  to  lose  the  information  from  temporary  storage. 

In  general  terms,  mild  to  moderate  global  cognitive  impair¬ 
ment  will  be  associated  with  decrements  in  efficiency  across 
most  of  the  processing  stages  outlined  above  and  in  aspects  of
the  utilization  of  attention.  The  following  will  describe  some  of
the  main  decrements  in  cognitive  ability  related  to  interacting 
with  computer  type  systems. 


INTERFACE  DESIGN  TO  SUPPORT  PEOPLE 
WITH  COGNITIVE  IMPAIRMENT 


It  is  often  important  to  develop  special  technology  to  provide 
support  for  people  with  various  types  of  cognitive  impairment, 
and  some  such  projects  are  described  later.  It  is  also  important, 
however,  to  address  the  challenge  of  providing  access  to  more 
mainstream  technology  for  people  with  cognitive  impairments. 
When  designing  or  specifying  mainstream  technology  for  such 
users,  it  is  important  to  focus  on  their  characteristics  and  to  be 
fully  aware  of  the  range  of  cognitive  diversity,  even  amongst 
those  without  clinical  dysfunction.  This  is  rarely  mentioned  in 
human  interface  design,  where  the  cognitive  diversity  of  the  hu¬ 
man  race  has  not  been  the  focus  of  much  research.  It  is  also  im¬ 
portant  to  consider  the  effects  of  age  on  cognitive  function.  As 
Worden,  Walker,  Bharat,  and  Hudson  (1997)  commented: 

It  is  known  that,  as  people  age,  their  cognitive  perceptual  and  motor 
abilities  decline  with  negative  effects  on  their  ability  to  perform  many 
tasks.  Computers  can  play  an  increasingly  important  role  in  helping 
older  adults  function  well  in  society.  Despite  this  little  research  has  focused
on  computer  use  of  older  adults.  (p.  266)

A  key  aspect  of  any  intellectual  task,  in  regard  to  interactive 
technology  for  people  with  mild  or  moderate  global  cognitive 
impairment,  is  speed  (Salthouse,  T.  A.,  1991).  Whatever  level  of 
performance  a  person  can  achieve  in  any  given  situation  will  be 
made  worse  if  the  task  must  be  done  under  externally  imposed 
time  constraints  whether  actual  or  simply  inferred  by  the  user. 
Thus,  wherever  possible,  the  design  of  any  interaction  should  al¬ 
low  every  step  to  be  carried  out  at  the  user’s  own  pace.  This  is¬ 
sue  also  raises  the  first  distinction  between  older  and  younger 
people.  A  relatively  greater  proportion  of  the  extra  time  needed 
by  older  people  is  due  to  age-related  declines  in  their  sensory 
systems,  particularly  in  hearing  and  vision  rather  than  cognitive 
impairment  per  se.  For  example,  given  comparable  levels  of 
cognitive  impairment,  an  older  person  will  need  relatively  more 
time  to  perceive  the  relevant  stimulus  before  cognitive  processes 
can  be  brought  to  bear  on  it.  Thus,  for  many  older  people,  the 
requirement  for  extra  time  can  be  reduced,  though  rarely  re¬ 
moved,  if  care  is  taken  to  present  text  and  other  aspects  of  on¬ 
screen  layout  in  a  suitably  clear  way  (e.g.,  Carmichael,  A.  R., 
1999;  Charness  &  Bosman,  1994).  Attention  to  text  layout  can 
also  benefit  people  with  specific  learning  difficulties,  such  as 
those  with  dyslexia  and  people  with  limited  literacy  levels.  Be¬ 
yond  this,  clear  text  and  presentation  layout  will  always  be 
worth  considering  carefully  as  such  aspects  have  been  found  to 
benefit  those  who  do  not  specifically  need  it,  albeit  to  a  less 
marked  extent  (Freudenthal,  1999;  Pirkl,  1994). 

Another  key  concept  related  to  interface  design  for  people 
with  cognitive  impairment  is  complexity  and  its  avoidance. 
Complexity  in  interfaces  can  manifest  itself  in  many  different 
ways  and  at  many  different  levels.  A  truly  comprehensive  coverage
of  this  is  beyond  the  scope  of  the  present  chapter,  but
some  illustrative  examples  will  be  given  to  elucidate  this  idea. 

The  use  of  language  in  an  interactive  system  should  be  given 
careful  consideration,  and  the  syntax  and  vocabulary  should  be 
kept  as  straightforward  and  commonplace  as  the  context  allows. 
This  is  particularly  pertinent  for  any  form  of  instruction.  If  the 
requirements  of  a  particular  stage  of  an  interaction  cannot  be 
captured  in  a  few  simple  concrete  statements,  then  serious  consideration
should  be  given  to  redesigning  the  interaction  itself.
Similarly,  any  on-screen  display  should  be  kept  as  uncluttered
as  practicable  and  wherever  possible  should  present  the  user 
with  only  a  single  issue  (menu,  subject,  decision,  etc.)  at  any 
particular  point  in  time.  Similarly,  but  at  a  larger  scale,  progres¬ 
sion  through  an  interaction  should  be  kept,  again  wherever 
practicable,  as  linear  as  possible.  That  is,  the  user  should  only 
need  to  consider  one  thing  at  a  time.  Any  requirement  to  deal 
with  different  things  in  parallel  will  markedly  increase  the  pos¬ 
sibility  of  errors  and  general  user  dissatisfaction  (Detterman 
et  al.,  2000;  Salthouse,  T,  1985). 
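
The recommendations above (a single issue on screen, linear progression, and every step taken at the user's own pace) can be sketched in a few lines. The class name, method names, and example steps below are hypothetical and serve only as an illustration; this is not a description of any system discussed in this chapter.

```python
class LinearInteraction:
    """Present the steps of a task strictly one at a time.

    Nothing here is timed: the interaction moves on only when the
    user asks it to, so each step is taken at the user's own pace.
    """

    def __init__(self, steps):
        if not steps:
            raise ValueError("at least one step is required")
        self.steps = list(steps)
        self.position = 0

    def current(self):
        """Return the single instruction the user should see now."""
        return self.steps[self.position]

    def advance(self):
        """Move to the next step, staying put on the last one."""
        if self.position < len(self.steps) - 1:
            self.position += 1
        return self.current()

    def back(self):
        """Return to the previous step, staying put on the first one."""
        if self.position > 0:
            self.position -= 1
        return self.current()

    def finished(self):
        """True once the user has reached the final step."""
        return self.position == len(self.steps) - 1


# Hypothetical three-step task, shown one short instruction at a time.
flow = LinearInteraction(["Choose a contact", "Check the number", "Press Call"])
print(flow.current())
print(flow.advance())
```

Keeping each step to one short, concrete instruction also serves the earlier point about language: if a step cannot be written that briefly, the interaction itself may need to be redesigned.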

Unfortunately,  as  the  designers  of  a  system  will  have  a  com¬ 
prehensive  understanding  of  the  functions  of  that  system,  they 
are  unlikely  to  assess  issues  of  complexity  from  the  users’  point 
of  view,  particularly  that  of  a  novice  user.  In  addition,  prescriptive
checklists  for  avoiding  complexity  will  ultimately  be  inadequate,
as  the  optimum  approach  will  always  depend  on  the
specifics  of  the  task  the  interactive  system  is  intended  to  sup¬ 
port  (Carmichael,  A.  R.,  1999).  This  is  one  of  the  main  issues 
that  highlights  the  importance  of  early  and  rigorous  user  in¬ 
volvement  in  the  design  of  interactive  systems.  This  is  particu¬ 
larly  important  in  the  case  of  young  designers  developing  in¬ 
terfaces  for  older  users. 

Research  into  cognitive  changes  in  later  life  has  indicated  the 
heuristic  value  of  the  concepts  of  fluid  and  crystallized  abilities 
(Horn  &  Cattell,  1967).  In  general,  fluid  abilities  (novel  prob¬ 
lem  solving)  are  those  that  decline  with  age,  and  crystallized 
abilities  (existing  world  knowledge)  are  those  that  do  not.  Re¬ 
search  that  has  looked  at  the  very  old  (80  years  and  older)  found 
that  ultimately  everything  declines  but  that  the  crystallized  abil¬ 
ities  follow  a  markedly  slower  trajectory  (Backman,  Small, 
Wahlin,  &  Larsson,  2000).  This  is  another  distinction  between 
younger  and  older  people,  as  the  former  will  tend  to  have  much 
less  accumulated  knowledge.  It  is  also  worth  noting  that  there 
are  advantages  and  disadvantages  on  both  sides  of  this  distinc¬ 
tion.  Rabbitt  (1993)  has  shown  that,  in  many  circumstances,  rel¬ 
evant  accumulated  knowledge  can  ameliorate  decline  in  fluid 
ability  (e.g.  well-learned  strategies).  In  other  circumstances, 
however,  the  opposite  can  be  the  case,  wherein  a  well-learned, 
but  essentially  inappropriate,  strategy  can  put  a  relatively 
greater  burden  on  the  associated  fluid  abilities. 

The  Effects  of  Attention 

Many  of  the  constraints  imposed  by  cognitive  impairment  can 
be  further  shaped  by  decrements  in  various  aspects  of  attention. 
One  major  aspect  of  this  is  generally  referred  to  as  “selective 
attention,”  which  allows  us  to  focus  on  salient  aspects  of  a  task 
and  at  the  same  time  helps  us  actively  to  ignore  irrelevant  aspects.
The  efficiency  of  selective  attention  is  markedly  diminished
in  most  forms  of  cognitive  impairment.  This  factor  further
supports  the  recommendation  to  present  the  user  with  just  one 
thing  at  a  time,  which  will  avoid  the  user  erroneously  devoting 
time  and  cognitive  resources  to  processing  irrelevant  informa¬ 
tion.  Similarly,  if  the  nature  of  the  interaction  requires  the  user 
to  attend  to  some  critical  information  at  a  particular  time/loca¬ 
tion,  appropriately  obvious  highlighting  should  be  employed 
to  grab  the  users’  attention.  These  issues  become  emphasized 
in  situations  where  selective  attention  must  be  maintained  over 
periods  of  more  than  just  a  few  minutes. 

Another  aspect  of  attention  known  to  be  less  efficient  in  cog¬ 
nitive  impairment  is  referred  to  as  “divided  attention.”  In  gen¬ 
eral,  this  refers  to  the  ability  to  allocate  cognitive  resources  ap¬ 
propriately  when  trying  to  do  two  or  more  distinct  cognitive 
tasks  or  distinct  portions  of  the  same  task  at  the  same  time.  Many 
scenarios,  where  the  user  is  required  to  do  more  than  one  thing 
at  a  time,  can  simply  demand  more  cognitive  resources  than  are 
available.  However,  declines  in  the  efficiency  of  divided  attention 
can  mean  that  even  if  the  tasks  involved  demand  no  more  than 
the  resources  available,  they  may  not  be  allocated  appropriately. 
Generally,  the  interactive  system  should  be  designed  to  relieve 
the  user  of  this  kind  of  burden.  It  is  difficult  to  be  prescriptive 
about  suitable  solutions  to  this  problem,  as  the  appropriate  ap¬ 
proach  will  depend  on  the  specifics  of  the  interaction  involved, 
but  some  general  ideas  may  be  of  use.  For  example,  the  provi¬ 
sion  of  some  form  of  notepad  function  may  be  helpful  for  tem¬ 
porarily  recording  information  for  subsequent  use,  although 
great  care  is  needed  to  ensure  that  the  instantiation  of  such  a 
function  and  its  utilization  does  not  put  further  cognitive  load  on 
the  user.  Another  possibility  would  be  the  provision  of  an 
overview  of  the  task,  which  could  show  or  remind  the  user 
where  they  are  and  what  they  have  and  have  not  done. 
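
As a minimal sketch of the task overview suggested above, the function below marks each step of a task as done, current, or still to do. The function name, the marker text, and the example steps are assumptions made for illustration only.

```python
def task_overview(steps, current_index):
    """Return one line per step showing what is done, current, or pending."""
    lines = []
    for i, step in enumerate(steps):
        if i < current_index:
            marker = "[done]"
        elif i == current_index:
            marker = "[now]"
        else:
            marker = "[todo]"
        lines.append(f"{marker} {step}")
    return lines


# Hypothetical task with the user currently on the second step.
for line in task_overview(["Choose a contact", "Check the number", "Press Call"], 1):
    print(line)
```

Such an overview is only useful if it stays simple; as noted above for the notepad function, a reminder that needs effort to read adds to the user's load instead of relieving it.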

Memory  Loss  and  Dementia 

Limitations  in  memory  affect  people  with  age-  and  nonage- 
related  cognitive  impairment.  Thus,  wherever  practicable,  in¬ 
teractive  systems  should  be  designed  to  take  the  burden  of 
memory  off  the  user,  for  example,  by  judicious  use  of  prompts 
and  reminders.  Also,  careful  consideration  of  the  steps  in  an  in¬ 
teraction  and  the  way  they  are  presented  to  the  user  can  help 
mitigate  the  most  common  problem  of  deficient  short  term  and 
working  memory.  Even  with  the  best  design  efforts,  however, 
such  problems  are  likely  to  make  users  with  cognitive  impair¬ 
ment  relatively  error  prone.  It  is  thus  very  important  to  ensure 
that  the  interactive  system  allows  for  error  correction  in  an  easy 
to  use  form.  To  ensure  that  the  user  spots  such  errors,  the  sys¬ 
tem  should  provide  feedback  regarding  user  actions  and  where 
appropriate  elicit  active  confirmation  from  them. 
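
The three ideas in this paragraph (feedback on each action, active confirmation, and easy error correction) can be sketched as follows. The class, its method names, and the wording of the messages are hypothetical; a real system would need to test such wording with its intended users.

```python
class ConfirmedActions:
    """Echo each requested action, wait for confirmation, and allow undo."""

    def __init__(self):
        self.history = []    # confirmed actions, most recent last
        self.pending = None  # action awaiting the user's confirmation

    def request(self, action):
        """Feed the action back to the user and ask for confirmation."""
        self.pending = action
        return f"You chose: {action}. Is that right?"

    def confirm(self):
        """Carry out the pending action once the user has agreed to it."""
        if self.pending is None:
            return "There is nothing to confirm."
        done, self.pending = self.pending, None
        self.history.append(done)
        return f"Done: {done}."

    def cancel(self):
        """Drop the pending action without carrying it out."""
        self.pending = None
        return "Cancelled. Nothing has been changed."

    def undo(self):
        """Reverse the most recent confirmed action, if there is one."""
        if not self.history:
            return "There is nothing to undo."
        undone = self.history.pop()
        return f"Undone: {undone}."


ui = ConfirmedActions()
print(ui.request("Call Anna"))
print(ui.confirm())
print(ui.undo())
```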

Various  forms  of  dementia  can  exaggerate  the  relatively  mild
effects  of  normal  aging  on  the  cognitive  system.  At  the  age  of 
60  years,  about  1%  of  the  population  is  diagnosed  with  demen¬ 
tia.  This  percentage  approximately  doubles  for  every  subse¬ 
quent  5-year  age  band  (e.g.,  4%  at  70  years  and  16%  at  80  years; 
Backman  et  al.,  2000).  Of  the  elderly  population  with  demen¬ 
tia,  Alzheimer’s  disease  accounts  for  about  60%,  depending  on 
the  diagnostic  criteria  used,  and  a  further  25%  is  vascular  dementia
(e.g.,  related  to  circulatory  problems).  Most  of  the  vascular
dementias  are  referred  to  as  “multi-infarct  dementias”  and
tend  to  be  caused  by  a  series  of  ministrokes,  and  thus,  they  tend
to  have  more  diffuse  and  less  predictable  effects  on  ability  than 
a  major  stroke.  The  remaining  15%  of  dementias  are  made  up  of 
various  relatively  rare  conditions  (Backman  et  al.,  2000). 
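
The doubling rule quoted above is easy to work through. The sketch below takes the chapter's figures (about 1% at age 60, doubling with every 5-year band) and reproduces the 4% and 16% examples; the function and constant names are hypothetical, and the upper age limit is an assumption added here so that the rule is not extrapolated beyond the ages the text mentions.

```python
BASE_AGE = 60        # age at which about 1% are diagnosed (from the text)
BASE_RATE = 0.01
DOUBLING_YEARS = 5   # prevalence roughly doubles per 5-year band (from the text)
MAX_AGE = 80         # assumption: do not extrapolate past the quoted examples

# Approximate make-up of dementia in the elderly population, in percent.
SUBTYPE_SHARE_PERCENT = {
    "Alzheimer's disease": 60,
    "vascular dementia": 25,
    "other, relatively rare conditions": 15,
}


def dementia_prevalence(age):
    """Approximate share of people of the given age diagnosed with dementia."""
    if not BASE_AGE <= age <= MAX_AGE:
        raise ValueError("only ages 60 to 80 are covered by the quoted figures")
    return BASE_RATE * 2 ** ((age - BASE_AGE) / DOUBLING_YEARS)


for age in range(BASE_AGE, MAX_AGE + 1, DOUBLING_YEARS):
    print(f"age {age}: about {dementia_prevalence(age):.0%}")
```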

Regardless  of  the  various  causes  and  effects,  all  forms  of  de¬ 
mentia  involve  damage  to  the  brain,  such  damage  being  more 
or  less  widespread,  affecting  cortical  and/or  subcortical  areas.  In 
general,  damage  to  the  cortex  results  in  cognitive/perceptual 
impairment,  while  damage  to  the  subcortical  areas  is  more  re¬ 
lated  to  physical  impairment.  There  are,  however,  a  number  of 
well-known  problems  related  to  the  diagnosis  of  dementia.  Two 
of  these  are  relevant  here.  The  first  involves  the  grey  area  be¬ 
tween  the  worst  effects  of  normal  aging  and  the  initial  effects 
of  pathological  aging  at  the  onset  of  dementia.  The  second  is 
that  the  effects  of  depression  in  later  life  can  closely  mimic 
those  of  dementia.  These  additional  complexities  further  ex¬ 
pand  the  overall  diversity  of  cognitive  impairment  in  relation 
to  human  interface  design,  both  in  regard  to  the  general  level  of 
ability  and  in  the  variation  of  that  level  over  periods  of  days, 
weeks,  months,  and  even  years.  The  situation  is  further
complicated  by  the  relatively  greater  probability
of  ill  health  among  older  people.  It  is  estimated  that
around  80%  of  those  aged  65  and  over  have  at  least  one  chronic 
illness  and  that  many  will  have  more  than  one.  In  addition  to  the 
effects  of  health  per  se,  there  is  also  potential  for  cognitive  abil¬ 
ity  to  be  affected  by  a  variety  of  medication  and  by  interactions 
between  different  medicines. 

Despite  the  above,  some  systematic  changes  associated  with 
extreme  old  age  and  dementia  are  relevant  to  human-interface 
design.  In  general,  the  first  ability  to  deteriorate  in  dementia, 
particularly  with  Alzheimer’s  disease,  is  episodic  as  distinct  from 
semantic  memory.  Episodic  memory  is  memory  for  events,  usu¬ 
ally  from  the  viewpoint  of  personal  experience,  rather  than  for 
facts.  That  is,  remembering,  “X  is  the  capital  of  Y”  is  the  prod¬ 
uct  of  semantic  memory,  whereas  remembering  when  and 
where  you  were  while  you  were  reading  “X  is  the  capital  of  Y” 
is  the  product  of  episodic  memory.  This  generalized  decrement 
in  episodic  memory  may  be  related  to  findings  in  normal  aging 
research  such  as  disproportionate  decrements  in  source  mem¬ 
ory  (e.g.,  specifically  remembering  where  an  item  was  rather 
than  what  it  was)  and  to  prospective  memory  (e.g.,  remember¬ 
ing  to  do  something  in  the  future).  These  changes  have  impor¬ 
tant  implications  for  successful  navigation  in  interactive  systems. 
For  example,  keeping  track  of  where  you  have  just  been  is  often
an  important  prompt  to  where  you  are  going  now. 
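
One way an interface might compensate for this is to keep the trail on the user's behalf and state it explicitly. The following is a minimal sketch under that assumption; the class name, the screen names, and the reminder wording are hypothetical.

```python
class NavigationTrail:
    """Remember visited screens so the interface can remind the user."""

    def __init__(self, start):
        self.trail = [start]

    def go_to(self, screen):
        """Record a move to a new screen."""
        self.trail.append(screen)

    def reminder(self):
        """Say where the user is now and, if known, where they came from."""
        if len(self.trail) == 1:
            return f"You are at: {self.trail[-1]}."
        return f"You came from: {self.trail[-2]}. You are now at: {self.trail[-1]}."


nav = NavigationTrail("Home")
nav.go_to("Contacts")
print(nav.reminder())
```

The reminder places the burden of source memory on the system, in the same spirit as the prompts and reminders recommended earlier.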

Visuo-Spatial,  Iconic,  and  Verbal  Abilities 

In  addition  to  memory  problems,  there  is,  at  least  at  the  level
of  the  population,  marked  deterioration  of  visuo-spatial  and  ver¬ 
bal  abilities  in  older  people.  Decline  in  visuo-spatial  abilities  can 
cause  difficulty  with  decoding  layouts  and  utilizing  any  inher¬ 
ent  organization.  Related  to  this  is  a  deterioration  in  iconic 
memory  which,  given  the  graphical  nature  of  many  interfaces, 
can  be  problematic  in  its  own  right.  Particular  difficulties  have 
also  been  found  in  the  comprehension  of  abstract  and  meta¬ 
phorical  phrases,  with  the  tendency  being  to  take  them  literally. 


816  •  NEWELL  ET  AL. 


Such  conditions  can  develop  into  more  global  aphasia  (e.g., 
Broca’s  aphasia,  related  to  the  production  of  speech,  and  Wer¬ 
nicke’s  aphasia,  related  to  comprehension).  The  general  diffi¬ 
culty  with  recall  of  proper  nouns,  found  in  normal  aging,  can 
likely  develop  into  more  profound  anomia.  The  depth  of  such 
problems  may  not  be  apparent  to  an  outside  observer  as  the 
ability  to  read  aloud  may  be  well  preserved,  although  the  content
may  not  be  properly  understood.

Another  distinct  form  of  global  cognitive  impairment  is 
autism,  including  a  set  of  rarer  but  related  syndromes  (Kaluger 
&  Kolson,  1987).  The  precise  causes  of  autism  are  not  clearly  un¬ 
derstood.  Briefly  stated,  it  is  a  general  neurological  disorder  that 
impacts  the  normal  development  of  the  brain  particularly  in  re¬ 
lation  to  social  interaction  and  communication  skills.  Its  effects 
will  usually  become  apparent  within  the  first  three  years  of  life. 
People  with  autism  typically  have  difficulties  in  verbal  and  non¬ 
verbal  communication  and  social  interactions.  The  disorder 
makes  it  hard  for  them  to  communicate  with  others  and  relate 
to  the  outside  world;  they  will  also  tend  to  have  relatively  low  IQ
scores.  Closely  related  to  autism  is  Asperger’s  Syndrome.  People 
with  Asperger’s  experience  similar  social  communication  diffi¬ 
culties  but  will  generally  demonstrate  a  normal  IQ.  Further  to 
this,  there  are  several  generally  similar  conditions,  some  of 
which  have  varying  physical  and  behavioral  elements  associated 
with  them.  These  come  under  the  collective  heading  of  perva¬ 
sive  developmental  disorders  and  all  tend  to  produce  difficulties 
with  communication.  An  important  element  of  these  social 
communication  difficulties  in  the  context  of  the  present  chapter 
is  an  inability  to  grasp  the  implications  of  metaphorical  or  id¬ 
iomatic  language.  This  is  similar  to  that  mentioned  above  for  de¬ 
mentia  but  in  autism  tends  to  be  more  profound.  However, 
some  evidence  shows  that  people  with  autism  or  Asperger’s 
syndrome  are  more  able  to  communicate  with  computers  than 
with  people,  or  with  people  via  computers,  rather  than  face  to 
face,  and  thus  properly  designed  computer  systems  may  have 
potential  for  assisting  such  user  groups.  A  variety  of  other  impairments
could  also  be  ameliorated  with  suitably  designed  support
for  communication.


Augmentative  and  Alternative  Communication 

Most  interfaces  use  traditional  orthography  (the  written  word).
However,  cognitive  limitations  can  have  a  major  impact  on  the 
ability  to  encode  and  decode  traditional  orthography.  People 
with  acquired  cognitive  impairments  (e.g.  aphasia  resulting 
from  a  stroke  or  cerebral  vascular  accident;  CVA)  can  experience 
varying  degrees  of  difficulty  with  both  expressive  and  receptive 
language.  A  person  with  aphasia  may  have  slight  word-finding 
problems  through  to  more  pervasive  problems  in  understanding
spoken  language.  While  some  individuals  may  retain  some
literacy  skills,  damage  to  the  language  processing  centers  will 
usually  also  affect  symbolic  representations  of  language. 

Individuals  with  congenital  language  and/or  intellectual  dis¬ 
abilities  (e.g.  congenital  aphasia  and  Down’s  syndrome)  may 
never  become  literate.  The  physical  inability  to  speak  (e.g. 
dysarthria  resulting  from  cerebral  palsy)  may  also  impact  literacy 
learning,  as  basic  skills  required  for  reading  and  writing  (e.g. 
phonemic  awareness)  may  be  absent. 


Visual  images,  such  as  photographs  and  drawings,  provide  augmentative
and  alternative  ways  to  access  communication.  Such  images
can  be  used  to  enhance  text-based  interfaces.  In  addition,
sets  of  symbols  (e.g.,  the  Picture  Communication  System,  PCS;
Rebus;  and  semantic-based  writing  systems  such  as  Blissymbolics)
can  be  used  as  an  alternative  to  text  (Beukelman  &  Mirenda,
1998;  Wilson,  2003).  The  type  of  picture,  symbol,
or  graphic  used  will  depend  on  the  iconicity  (ease  of  recognition),
transparency  (guessability),  opaqueness  (logical  organization),
and  learnability  of  the  image.  For  instance,  a  photograph
of  a  house  may  be  transparent  (that  is,  it  is  easily  recognizable),
while  a  Blissymbol  representing  the  emotion  of  happy
has  logic  (heart  =  feeling;  up  arrow  =  up)  and  is  thus  opaque
and  learnable  (see  Table  41.1).  The  more  concrete  a  representation,
the  more  recognizable  it  will  be.  However,  representations
of  abstract  meanings,  such  as  emotions,  will  involve  less  trans¬ 
parent  images  necessitating  a  longer  learning  curve  (see 
Table  41.1). 


Cognitive  Prostheses 

In  addition  to  providing  better  access  to  everyday  software, 
specially  designed  computer  systems  have  great  potential  to  offer
more  specific  support  for  people  with  a  wide  variety  of  cognitive
impairments.  We  will  describe  some  specific  examples
of  such  systems  so  that  the  reader  is  able  to  see  how  such  development
can  proceed.  Within  the  Applied  Computing  Department
at  Dundee  University,  a  number  of  projects  have  developed
computer-based  systems  to  support  people  with  cognitive
dysfunction,  and  examples  from  the  work  of  this  group  can  illustrate
the  range  of  areas  where  this  technology  has  particular
potential.  These  examples  will  also  demonstrate  the  need
for  appropriate  methodologies  for  research  in  this  area,  including
the  need  for  research  into  developing  more  sensitive  and
effective  methodologies.


TABLE  41.1.  Examples  of  Three  Symbol  Sets  That  Have  Been
Developed  for  Augmentative  and  Alternative  Communication
(symbol  images  not  reproduced;  one  surviving  label:  sad)

Software  as  a  Cognitive  Scaffolding 
and  a  Prompt  for  Communication 

One  sequence  of  projects  took  as  its  starting  point  the  improve¬ 
ment  of  communication  systems  for  physically  impaired  non¬ 
speaking  people.  It  became  apparent  that  this  could  be  done 
very  effectively  by  developing  models  of  the  cognitive  tasks  in¬ 
volved  in  communication.  This  research  has  now  spawned  a 
new  area  of  development,  which  is  cognitive  support  for  people 
with  dementia,  where  communicative  impairment  is  just  part 
of  their  range  of  difficulties. 

For severely physically impaired nonspeaking people, even with
current speech output technology, speaking rates of 2 to 10
words per minute are common, whereas unimpaired speech proceeds
at 150 to 200 words per minute. In an attempt to reduce this
disparity, a certain amount of progress has been made in using
computers to replace or augment some of the cognitive aspects of
communication. Although the cognitive processes underlying
language use are incompletely understood, a number of theories
that attempt to explain language use have been used to improve
the functionality of communication systems for nonspeaking
people. This approach to the problem usually involves taking a
sociolinguistic view of language. Instead of focusing on the
building blocks (a bottom-up approach), the interaction as a
whole is analyzed, paying attention to its goals (a top-down
approach). This may well be a realistic simulation of the
natural process, since the production of speech by an unimpaired
speaker occurs at such a rate that conscious processing and
control of the speech at a microlevel is not possible. In
common with other learned skills, speech is produced, to some
extent, automatically, with the speaker being aware of giving
high-level instructions to the speech production system, but
leaving the details of its implementation to the system.

The  nonconscious  control  of  much  of  speech  production 
has been modeled in the CHAT prototype (Alm, Arnott, &
Newell,  1992).  This  produced  quick  greeting,  farewell,  and  feed¬ 
back  remarks  by  giving  the  user  semiautomatic  control  of  ex¬ 
actly  what  form  the  remarks  would  take,  within  parameters  that 
the  user  had  previously  selected.  This  mimicked  the  phenome¬ 
non  of  a  speaker  responding  automatically  to  greetings  and 
other  commonly  occurring  speech  routines,  without  giving  the 
process  any  detailed  thought. 
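
The idea of semiautomatic control can be illustrated with a minimal sketch. It assumes a small phrase table keyed by speech act and by a mood that the user has set in advance; the table contents, the category names, and the function `quick_remark` are hypothetical and are not taken from the CHAT prototype itself.

```python
import random

# Hypothetical phrase table: speech act -> mood -> candidate wordings.
PHRASES = {
    "greeting": {
        "polite": ["Good morning.", "Hello, how are you?"],
        "informal": ["Hi there!", "Hey, good to see you."],
    },
    "farewell": {
        "polite": ["Goodbye, it was nice talking to you."],
        "informal": ["See you later!", "Bye for now."],
    },
    "feedback": {
        "polite": ["I see.", "That is interesting."],
        "informal": ["Really?", "Uh-huh."],
    },
}


def quick_remark(speech_act, mood, rng=random):
    """Return one wording for the requested speech act.

    The user selects only the speech act at conversation time. The mood
    is a parameter chosen beforehand, and the exact wording is picked by
    the system from the matching candidates.
    """
    return rng.choice(PHRASES[speech_act][mood])
```

With one selection the user obtains a complete, varied remark, which is the sense in which the detail of the utterance is left to the system.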

The  CHAT-like  conversation  described  illustrates  an  attempt  to 
achieve  a  particular  communicative  goal,  achieving  social  close¬ 
ness  by  observing  social  etiquette.  Some  recent  research  efforts 
have been directed at finding ways to incorporate large chunks of
text  into  an  augmented  conversation,  to  help  users  carry  out 
topic  discussion.  This  has  been  driven  by  the  observation  that  a 
great  deal  of  everyday  discourse  is  reusable  in  multiple  contexts. 

Much of this type of discourse takes the form of conversational
narratives. Research into the conversational narrative at a
sociolinguistic level indicates several interesting
characteristics. These include the way in which narratives are
told and to whom they are told. For example, a recent event is
told repeatedly for a limited time to most people with whom the
speaker has contact. As the event recedes in history, the
narrative is retold when it is relevant to the topic of
conversation. The length of the narrative depends on its age
(the older a story is, the more embellished it can become,
particularly if it has previously gone down well) and the time
available within the conversation. The version of the story (the
sequence of events may remain the same while the details or
embellishments of a story can differ) depends on factors such as
the conversational context and other interlocutors present.

One  of  the  ways  to  make  the  retrieval  of  text  chunks  easier  is 
to  anticipate  the  chunks  that  the  user  may  want  to  use.  This  has 
been  achieved  by  modeling  the  way  in  which  conversational  nar¬ 
ratives  are  used  using  techniques  from  the  fields  of  artificial  in¬ 
telligence  and  computational  linguistics  (Waller,  1992).  The  pro¬ 
totypes  developed  constantly  adapt  to  the  users’  language  use, 
thus  mirroring  the  user’s  perception  of  where  conversation  items 
are  stored.  In  this  way,  the  system  adapts  to  the  way  the  user 
thinks  instead  of  having  the  user  learn  a  new  retrieval  system. 
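
One simple way to picture such adaptation is a store that ranks narratives by how often, and how recently, the user has told them in connection with a topic. The sketch below is only an illustration under that assumption; the class `NarrativeStore` and its scoring rule are hypothetical and do not reproduce the techniques reported by Waller (1992).

```python
class NarrativeStore:
    """Ranks prestored narratives by how the user has used them before."""

    def __init__(self):
        self.texts = {}      # story id -> stored text
        self.uses = {}       # (topic, story id) -> number of tellings
        self.last_used = {}  # story id -> logical time of the last telling
        self.clock = 0       # counts tellings, used as a recency stamp

    def add(self, story_id, text):
        self.texts[story_id] = text

    def record_use(self, story_id, topic):
        """Note that the user told this story while talking about `topic`."""
        self.clock += 1
        key = (topic, story_id)
        self.uses[key] = self.uses.get(key, 0) + 1
        self.last_used[story_id] = self.clock

    def suggest(self, topic, n=3):
        """Return up to n story ids, most used for this topic first.

        Ties on use count are broken in favor of the more recently told
        story.
        """
        def score(story_id):
            return (self.uses.get((topic, story_id), 0),
                    self.last_used.get(story_id, 0))

        return sorted(self.texts, key=score, reverse=True)[:n]
```

Because the ranking is driven entirely by the user's own history, the ordering of suggestions drifts toward that user's habits without any explicit configuration.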

One  of  the  arguments  against  using  such  prediction  was 
raised  when  word  prediction  systems  were  first  developed  in 
the  early  1980s.  Therapists  and  teachers  were  concerned  that 
nonspeaking  people,  especially  children,  would  select  what  was 
offered  on  the  screen  rather  than  what  they  originally  wanted  to 
say. Although this may happen, research into predictive systems
applied to writing suggests that the help they offer may carry
over and have a wider effect on the development of users’
abilities. Some of this research reports an increase in written
output by reluctant writers and people with spelling problems
(Newell,  Arnott,  Booth,  &  Beattie,  1992).  A  general  improve¬ 
ment  in  spelling  has  also  been  noted.  Children  with  language 
dysfunction  and/or  learning  disabilities  have  shown  improve¬ 
ment  in  text  composition  (Newell,  Booth,  &  Beattie,  1991).  This 
research  is  in  the  writing  domain,  but  the  results  suggest  that 
predictive  systems  can  offer  assistance  without  becoming  mere 
substitutes  for  creative  expression. 

Also,  it  is  true  of  unimpaired  conversation  that  speakers 
often  change  direction  in  their  communication  depending  on 
chance  occurrences,  or  on  the  sudden  recollection  of  a  point 
they  would  like  to  include.  Thus,  there  is  a  degree  of  oppor¬ 
tunism  in  all  conversations.  Another  argument  in  favor  of  offer¬ 
ing  predicted  phrases  and  sequences  is  that  the  current  situa¬ 
tion  for  most  augmented  communicators  is  that  their 
conversations  tend  to  be  quite  sparse,  with  control  tending  to 
reside  with  unaided  speaking  partners.  If  it  is  not  possible  to 
go  boating  on  the  lake,  easily  going  off  in  any  direction  you 
please,  is  it  not  preferable  to  build  a  boardwalk  out  over  the  wa¬ 
ter  than  to  stay  on  the  shore? 

One  of  the  motivations  to  improve  communication  sys¬ 
tems  for  nonspeaking  people  is  the  fact  that  they  are  com¬ 
monly  perceived  by  people  who  do  not  know  them  as  being 
less  intellectually  capable  than  they  actually  are.  It  is  often  re¬ 
ported  by  nonspeaking  people  that  they  are  considered  un¬ 
intelligent  or  immature  by  strangers.  The  issue  of  perceived 
communicative competence is one that needs increased attention
(McKinlay, 1991).

Related to this, an interesting finding emerged from work in
which  one  of  the  authors  was  involved.  Here,  a  prototype  com¬ 
munication  system  was  used  to  evaluate  listeners’  impressions 
of  the  content  of  computer-aided  communication  based  on  pre¬ 
stored  texts,  as  compared  to  naturally  occurring  dialogues.  The 
nonspeaking  user  was  able  to  use  only  prestored  texts  in  order 
to  conduct  the  conversations.  Most  of  the  text  was  material 
about  one  subject  (holidays).  A  number  of  rapidly  accessible 
comments  and  quick  feedback  remarks  were  also  available.  The 
unaided  conversations  were  between  pairs  of  normally  speaking 
volunteers  who  were  asked  to  converse  together  on  the  topic  of 
holidays.  Transcripts  of  randomly  sampled  sections  of  the  con¬ 
versations  and  audio  recordings  of  reenactments  of  the  samples 
with  pauses  removed  were  rated  for  social  competence  on  a  six- 
item  scale  (coefficient  alpha  =  0.83)  by  24  judges.  The  content 
of  the  computer-aided  conversations  was  rated  significantly 
higher than that of the unaided samples (p < .001). The judges
also  rated  the  individual  contributions  of  the  computer-aided 
communicator  and  the  unaided  partners  on  how  socially  worth¬ 
while  and  involving  these  appeared.  There  was  no  significant 
difference  between  the  ratings  of  their  respective  contributions 
(p > .05; Todman, Elder, & Alm, 1995).
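
Coefficient alpha, reported above for the six-item scale, indicates how consistently the items of a scale vary together. A minimal sketch of the standard calculation is given here for readers unfamiliar with it; the function name and the layout of the ratings (one row per rating, one column per item) are assumptions made for illustration, and no data from the study are reproduced.

```python
from statistics import variance


def coefficient_alpha(ratings):
    """Compute coefficient (Cronbach's) alpha.

    `ratings` is a list of rows, each row holding one score per scale
    item. Alpha is k / (k - 1) * (1 - sum of item variances / variance
    of the row totals), where k is the number of items.
    """
    k = len(ratings[0])
    item_variances = [variance(column) for column in zip(*ratings)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Values close to 1 indicate that the items rise and fall together, so that their sum can reasonably be treated as a single measure.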

This  finding  came  as  something  of  a  surprise  to  the  re¬ 
searchers,  since  the  original  purpose  had  been  to  establish 
whether  conversations  using  prestored  material  would  simply 
be  able  to  equal  naturally  occurring  conversations  in  quality  of 
content.  Of  course,  the  pauses  in  actual  computer-aided  con¬ 
versation  (removed  in  the  above  analysis)  do  have  an  effect  on 
listeners’  impressions  of  the  quality  of  the  communication,  but 
this  finding  is  still  of  interest,  since  it  suggests  that  in  some  ways 
augmented  communication  could  have  an  edge  over  naturally 
occurring  talk.  A  plausible  explanation  for  this  finding  is  that  nat¬ 
urally  occurring  talk  is  full  of  high-speed  dysfluencies,  mistakes, 
substitutions,  and  other  messy  features  that  listeners  tend  to 
discount  with  their  abilities  to  infer  what  the  speaker  is  intend¬ 
ing  to  say.  Prestored  material  is  by  its  nature  selected  because  it 
may  be  of  particular  interest,  and  it  is  expressed  more  carefully 
than  quick  flowing  talk,  and  thus  may  appear  more  orderly  and 
dense  with  meaning  than  natural  talk. 

In  addition  to  conversational  narratives,  another  common 
structure  in  everyday  communication  is  the  script,  particularly 
where  the  speaker  is  undertaking  some  sort  of  transaction. 
Scripts  may  be  a  good  basis  for  organizing  prestored  utterances 
to  attempt  to  overcome  the  problem  of  memory  load  when  op¬ 
erating  a  complex  communication  system  based  on  a  large 
amount  of  prestored  material.  Users’  memory  load  can  be  re¬ 
duced  by  making  use  of  their  existing  long-term  memories  to 
help  them  locate  and  select  appropriate  utterances  from  the 
communication  system.  Schank  and  Abelson  (1977)  proposed 
a  theory  that  people  remember  frequently  encountered  situa¬ 
tions  in  structures  in  long-term  memory,  which  they  termed 
scripts.  A  script  captures  the  essence  of  a  stereotypical  situa¬ 
tion  and  allows  people  to  make  sense  of  what  is  happening  in  a 
particular  situation  and  to  predict  what  will  happen  next.  Other 
research (e.g., Vanderheiden, Demasco, McCoy, & Pennington,
1992) has shown the potential that similar script-based
techniques offer to this field.

An initial experiment was devised (Alm, Arnott, & Newell, 1995)
to investigate the potential of a script-based approach to
transactional interactions with a communication system, and a
prototype system was developed to facilitate this experiment.
The aim was to ascertain whether or not a transactional
interaction could be conducted using a script-based communication
system. It was decided to simulate a particular transactional
interaction that could reasonably be expected to follow a
predictable sequence of events (that is, one that would be
amenable to the script approach), in order to find out whether a
computer-based script could enable a successful interaction.

The  transaction  chosen  for  the  experiment  was  that  of  ar¬ 
ranging  the  repair  of  a  household  appliance  over  the  telephone. 
Although  the  script  interface  was  a  relatively  simple  one  devised 
for  the  purpose  of  this  experiment,  it  was  successful  in  facilitat¬ 
ing  the  interaction,  and  produced  a  significant  saving  in  the 
amount  of  physical  effort  required. 

To take this work further, a large-scale project was undertaken
to incorporate scripts into a more widely usable device.
The  user  interface  of  this  system  is  made  up  of  three  main  com¬ 
ponents:  scripts,  rapidly  produced  speech  acts,  and  a  unique 
text  facility.  The  scripts  component  is  used  in  the  discussion 
phase  of  a  conversation  and  consists  of  a  set  of  scripts  with 
which  the  user  can  interact.  The  rapid  speech  act  component 
contains  high  frequency  utterances  used  in  the  opening  and 
closing  portions  of  a  conversation  and  in  giving  feedback  and 
consists  of  groups  of  speech-act  buttons.  This  facility  is  based  on 
previous  work  with  CHAT.  The  unique  text  component  is  used 
when  no  appropriate  prestored  utterances  are  available  and 
consists  of  a  virtual  keyboard,  a  word  prediction  mechanism, 
and  a  notebook  facility. 

To  provide  access  to  a  set  of  scripts,  an  interface  was  devised 
that  involved  a  pictorial  representation  of  the  scenes  in  the 
script.  The  pictorial  approach  was  taken  in  order  to  give  users 
easier  access  to  the  stored  material  and  to  assist  users  with  vary¬ 
ing  levels  of  literacy  skills.  In  this  interface,  scripts  are  presented 
to  the  user  as  a  sequence  of  cartoon-style  scenes.  The  scenes 
give  the  user  an  indication  of  the  subject  matter  and  purpose 
of  the  script  and  assist  the  user  to  assess  quickly  whether  the 
script  is  appropriate  for  current  needs.  Each  scene  is  populated 
with  realistic  objects  chosen  to  represent  the  conversation  tasks 
that  can  be  performed.  Thus,  the  user  receives  a  pictorial  over¬ 
view  of  the  script,  what  happens  in  it,  and  what  options  are  avail¬ 
able.  This  assists  the  user  to  see  quickly  what  the  script  will  be 
able to do in the context of the current conversation. An example
of the interface for the system can be seen in Fig. 41.1 below,
which shows a scene within the “at the doctor” script.

Research  into  picture  recognition  and  memory  structures 
has  demonstrated  that  groups  of  objects  organized  into  realistic 
scenes  corresponding  to  stereotypical  situations  better  assist 
recognition  and  memory  compared  to  groups  of  arbitrarily 
placed  objects  (Mandler  &  Parker,  1976;  Mandler,  1984).  The 
scene-based  interface  using  a  realistic  arrangement  of  objects 
within  a  scene  was  therefore  chosen  to  facilitate  recognition  and 
remembering  by  the  user  and  thus  reduce  the  cognitive  load 
required  to  locate  suitable  objects  during  a  conversation. 

As  it  would  be  impractical  to  provide  scripts  for  every  con¬ 
ceivable  situation,  it  was  decided  to  provide  users  with  a  lim¬ 
ited  number  of  scripts  together  with  an  authoring  package  with 
which  they  can  develop  their  own  custom  scripts  with  help  from 
their therapists.

FIGURE 41.1. The script system user interface showing a scene
from  the  doctor  script. 

A text preview and display box appears at the top of the user
interface. The main interface area (bottom right) contains the
scene image. The function buttons on the left side of the
interface are, from top to bottom: “I’m listening” rapid
speech-act button; button to access the main rapid speech-act
interface; scene navigation backtrack to previous scenes button;
scene navigation overview button; and tool button to access the
notepad and additional system control facilities.

It  was  initially  decided  to  develop  six  complete  scripts.  These 
were  chosen  after  discussions  with  a  user  advisory  group  about 
situations  in  which  they  found  difficulty  communicating.  The 
scripts  developed  were  at  the  doctor,  at  the  restaurant,  going 
shopping,  activities  of  daily  living,  on  the  telephone,  meeting 
someone  new,  and  talking  about  emotions. 

The  system  uses  the  script  to  guide  the  user  through  a  dia¬ 
logue.  There  is  a  prediction  mechanism,  which  predicts  the  next 
most  probable  stage  in  the  dialogue  that  the  user  will  need 
(based  on  the  script),  so  the  user  can  usually  follow  a  predicted 
path  through  a  conversation.  This  prediction  mechanism  mon¬ 
itors  the  sequences  of  objects  selected  and  uses  this  informa¬ 
tion  to  modify  future  predictions. 
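
A prediction mechanism of this general kind can be sketched as a table of transition counts that starts from the scripted order and is updated by what the user actually selects. The class below is a hypothetical illustration under that assumption; the stage names are invented, and the mechanism used in the system described may differ.

```python
from collections import defaultdict


class ScriptPredictor:
    """Predicts the next stage of a scripted dialogue and adapts to use."""

    def __init__(self, stages):
        self.stages = list(stages)
        # Transition counts: stage -> following stage -> count.
        self.counts = defaultdict(lambda: defaultdict(int))
        # Seed the counts with the scripted order so that, before any
        # use, the script itself is the predicted path.
        for current, following in zip(self.stages, self.stages[1:]):
            self.counts[current][following] = 1

    def predict(self, current):
        """Return the most probable next stage, or None if none is known."""
        options = self.counts.get(current)
        if not options:
            return None
        return max(options, key=options.get)

    def observe(self, current, chosen):
        """Record the stage the user actually moved to."""
        self.counts[current][chosen] += 1
```

A user who repeatedly departs from the scripted order at some point will, after a few conversations, be offered their own habitual next step there instead.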

Help  for  Aphasia 

Communication  systems  for  nonspeaking  people  have  been  de¬ 
scribed  which  in  some  way  model  the  cognitive  processes  un¬ 
derlying  communication.  In  the  case  of  most  physically  disabled 
nonspeaking  people,  this  is  needed  in  order  to  speed  up  the 
communication  process.  However,  in  the  case  of  speech  prob¬ 
lems  that  are  caused  by  a  stroke  or  other  trauma  (aphasia),  the 
person  trying  to  communicate  will  also  have  cognitive  problems 
to  deal  with.  Interestingly,  the  objections  that  conversation 
modeling  might  provide  an  active  prompt  to  communication 
suggested  a  way  of  possibly  helping  people  who  might  need 
such  a  prompt  in  order  to  initiate  communication  at  all. 

In a research project investigating the possibility of prompting
people with Broca’s aphasia in their communications, a predictive
communication system was developed with a very simple interface
(Waller, Dennis, Brodie, & Cairns, 1998). The system held
personal sentences and stories, which were entered with the help
of a caregiver. The user could then retrieve the prestored
conversational items, with the system offering probable items
based on previous use of the system. The interface was designed
to be as simple as possible and to be usable by people who were
unfamiliar with technology and had language difficulties, but
had retained the ability to recognize familiar written topic
words.

To access the sentences and stories, the user is led through a
sequence of choices on the screen. First, they are offered a
choice of conversational partners, then a list of topics most
likely to be appropriate for the chosen partner, and then a list
of the four most common sentences for that partner and that
topic. The user can choose to speak one of the sentences through
a speech synthesizer, or have the system look for more
suggestions. The sentences and topic categories are personal to
each user, and the order in which topic words or sentences are
presented depends on the past use of the system. Thus, the
system is specific to users, both in the information content and
in how it adapts to individual ways of communicating.
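
The sequence of choices described above (partner, then topic, then a short list of sentences ordered by past use) can be sketched with nested usage counts. The class `PromptingAid` and its method names are hypothetical; the sketch illustrates the ordering principle and is not the published system.

```python
from collections import Counter, defaultdict


class PromptingAid:
    """Offers prestored sentences in order of past use."""

    def __init__(self):
        # partner -> Counter of topics
        self.topic_use = defaultdict(Counter)
        # (partner, topic) -> Counter of sentences
        self.sentence_use = defaultdict(Counter)

    def add_sentence(self, partner, topic, sentence):
        """Store a sentence, for example one entered with a caregiver."""
        self.topic_use[partner][topic] += 0
        self.sentence_use[(partner, topic)][sentence] += 0

    def topics(self, partner):
        """Topics for this partner, most used first."""
        return [t for t, _ in self.topic_use[partner].most_common()]

    def sentences(self, partner, topic, offset=0, n=4):
        """Up to n sentences, most used first; raise offset to see more."""
        counts = self.sentence_use[(partner, topic)]
        ranked = [s for s, _ in counts.most_common()]
        return ranked[offset:offset + n]

    def speak(self, partner, topic, sentence):
        """Record a use; a real system would also drive a synthesizer."""
        self.topic_use[partner][topic] += 1
        self.sentence_use[(partner, topic)][sentence] += 1
        return sentence
```

Calling `sentences` again with `offset=4` corresponds to asking the system to look for more suggestions beyond the first four.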

The  system  was  evaluated  with  five  adults  with  nonfluent 
aphasia  who  were  able  to  recognize  but  not  produce  familiar 
written  sentences.  There  was  little  change  in  the  underlying 
comprehension  and  expressive  abilities  of  the  participants  while 
not  using  the  system.  When  making  use  of  the  system,  the  re¬ 
sults  showed  that  some  adults  with  nonfluent  aphasia  were  able 
to  initiate  and  retain  control  of  the  conversation  to  a  greater  ex¬ 
tent  when  familiar  sentences  and  narratives  were  predicted.  In 
other  words,  users’  existing/residual  abilities  (e.g.,  small  vocab¬ 
ulary,  pragmatic  knowledge  of  conversation)  were  to  a  degree 
augmented  by  the  computer  functioning  as  a  cognitive  pros¬ 
thesis.  This  project  indicated  that  a  communication  system 
based  on  prompting  could  be  of  help  to  people  with  cognitive 
and  communication  difficulties. 

Support  for  Nonliterate  Users 

Children  with  complex  disabilities  may  have  a  combination  of 
physical  and  intellectual  impairments  that  impact  on  the  devel¬ 
opment  of  speech  and  language.  Some  individuals  will  learn  to 
read and write, using this medium in combination with nonverbal
means  to  communicate.  Others  will  use  an  alternative  graphic 
medium  for  interaction.  One  of  the  drawbacks  of  providing  sym¬ 
bol-based  communication  is  the  difficulty  in  providing  access  to 
novel  vocabulary.  The  ongoing  BlissWord  project  (Andreasen, 
Waller,  &  Gregor,  1998)  is  investigating  ways  in  which  users  with 
physical  and  cognitive  limitations  can  explore  new  vocabulary. 
Blissymbolics is a semantic-based natural written (pictographic)
language, similar to Chinese. Because of its generative
characteristics (BlissWords are spelled using a sequence of one
or more Bliss characters), predictive algorithms can be applied
to Blissymbolics to assist users in the retrieval of words.

BlissWords are sequenced beginning with a classifier (e.g., all
emotions begin with a heart). As illustrated in Fig. 41.2, when
the user selects a shape from the Bliss keyboard, the interface
produces a list of BlissWords which begin with classifiers using
that shape. Frequency and word lists can be used to refine
further the BlissWords that are displayed. Users do not need to
be literate to explore language and vocabulary. It is envisaged
that video clips and spoken explanation could further augment
learning through exploration.

FIGURE 41.2. Screenshot showing Blissymbol keyboard on left
and prediction list on right.
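
The retrieval of BlissWords by classifier shape, refined by frequency, can be sketched as follows. The tiny lexicon, the character names, the shape names, and the function `predict_blisswords` are all hypothetical and serve only to illustrate the lookup; they are not drawn from the BlissWord software.

```python
# Hypothetical lexicon: each BlissWord is a gloss plus its sequence of
# Bliss characters. The first character of the sequence is the classifier.
LEXICON = {
    "happy": ("heart", "up_arrow"),
    "sad": ("heart", "down_arrow"),
    "house": ("building",),
    "school": ("building", "give", "knowledge"),
}

# Hypothetical mapping from classifier characters to the keyboard shape
# used to draw them.
CLASSIFIER_SHAPE = {"heart": "heart_shape", "building": "square"}


def predict_blisswords(shape, frequency, limit=5):
    """List glosses whose classifier uses `shape`, most frequent first.

    `frequency` maps a gloss to how often it has been used; glosses that
    are missing count as zero, and ties are ordered alphabetically.
    """
    matches = [
        gloss for gloss, characters in LEXICON.items()
        if CLASSIFIER_SHAPE.get(characters[0]) == shape
    ]
    return sorted(matches, key=lambda g: (-frequency.get(g, 0), g))[:limit]
```

Since the user only has to recognize a shape and then pick from a short list, the lookup makes no demand on spelling or reading.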

Support  for  Dementia 

Dementia,  which  involves  the  loss  of  short-term  memory  in  el¬ 
derly  people,  is  a  very  serious  problem  for  the  person  and  for 
their  families  and  caregivers.  It  can  rule  out  most  social  activities 
and  interactions,  since  these  depend  on  a  working  short-term 
memory  for  effective  participation.  This  includes  even  the  es¬ 
sential  ability  to  communicate. 

As  well  as  being  valuable  with  all  older  people,  reminiscence 
is  an  important  tool  used  to  help  elderly  people  who  have  de¬ 
mentia  (Sheridan,  1992;  Feil,  1993).  This  is  because,  while  their 
short-term  memories  may  be  impaired,  their  long-term  memo¬ 
ries  are  often  more  or  less  intact  (Rau,  1993).  The  difficulty  is  ac¬ 
cessing  these  long-term  memories  without  the  capability  of 
keeping  a  conversation  going,  which  depends  on  short-term 
memory.  Activities  that  do  not  require  patients  to  maintain  the 
structure  of  a  conversation  can  help,  for  instance,  looking  at  and 
commenting  on  a  series  of  photographs,  which  can  provide  a 
framework  for  meaningful  person-to-person  interaction. 

The  tools  used  in  such  reminiscence  work  can  also  include 
videos,  sound,  music,  and  written  materials.  However,  tradi¬ 
tionally  these  are  all  in  separate  media,  and  it  can  be  very  time 
consuming  searching  for  a  particular  photo,  sound,  or  film  clip. 
Bringing  all  these  media  together  into  a  digital  multimedia 
scrapbook  could  mean  easier  access  to  content  for  more  lively 
reminiscence  sessions.  The  intention  of  this  project  was  to  be¬ 
gin  to  develop  a  system  that  could  act  as  a  conversation  pros¬ 
thesis,  giving  the  user  the  support  needed  to  carry  out  a  satis¬ 
fying  conversation  about  the  past. 

An investigation was undertaken to determine which aspects of
multimedia would be most helpful for such a reminiscence
experience, and the best way to present them. A number of
prototype interfaces for a multimedia reminiscence experience
were developed. These included text, photographs, videos, and
songs from the past life of the city. The materials were
collected with the assistance of the University, Dundee City
archives, and the Dundee Heritage Project. The prototypes were
demonstrated for people with dementia and their care staff at a
day center run by Alzheimer Scotland Action on Dementia. The
following issues were addressed in these evaluation sessions:

1.  Is  it  better  for  the  display  to  use  the  metaphor  of  a  real-life 
scrapbook  or  a  standard  computer  screen?  Six  of  the  staff 
members  preferred  the  book  presentation,  three  preferred 
the  screen,  and  two  had  no  preference.  Interestingly,  this 
was  almost  a  reversal  of  the  preference  shown  by  the  peo¬ 
ple  who  had  dementia,  with  the  majority  preferring  the 
screen  presentation.  The  preference  shown  for  the  screen 
presentation  could  be  due  to  reduced  cognitive  ability.  The 
book  presentation  is  a  metaphor  that  may  not  be  interpreted 
suitably  by  the  person  with  dementia. 

2.  How  should  the  scrapbook  material  be  organized — by  sub¬ 
ject  or  by  media  type?  The  majority  of  the  staff  evaluators 
preferred  the  arrangement  by  subject  saying  it  was  more  log¬ 
ical,  some  were  unsure;  however,  no  one  showed  a  prefer¬ 
ence  for  the  arrangement  by  media.  The  clients  with  dementia 
reflected  these  findings.  Despite  preferring  the  arrangement 
being  by  subject,  the  majority  of  evaluators  could  see  bene¬ 
fits  from  having  access  to  both  arrangements.  It  was  con¬ 
cluded  that  for  basic  reminiscence  sessions  the  arrangement 
by  subject  is  preferable.  But  access  to  the  arrangement  by 
media  should  be  an  option,  to  make  the  software  available 
for  use  in  other  ways. 

3. How do the sounds, pictures, video, and music add to the
reminiscence process, and what are their differential effects?
Most of the videos and photographs and all the songs
were  able  to  spur  conversations.  However,  it  was  found  that 
with  videos  the  clients  were  able  only  to  identify  strongly 
with  them  when  they  triggered  off  specific  personal  memo¬ 
ries,  whereas  songs  and  photographs  were  more  generally 
appreciated.  Attention  remained  focused  longest  with  the 
songs,  which  were  particularly  enjoyed  when  played  repeat¬ 
edly  with  everyone  singing  along.  However,  the  staff  felt  that 
some  individual  clients  had  enjoyed  the  videos  most. 
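
The conclusion reached under the second issue, that material should be arranged by subject while remaining reachable by media type, amounts to keeping two indexes over the same collection. A minimal sketch follows, assuming each scrapbook item carries a subject and a media label; the `Item` fields and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    subject: str
    media: str  # for example "photo", "video", "song", or "text"


def build_indexes(items):
    """Index the same items both by subject and by media type."""
    by_subject = defaultdict(list)
    by_media = defaultdict(list)
    for item in items:
        by_subject[item.subject].append(item)
        by_media[item.media].append(item)
    return by_subject, by_media
```

The subject index would drive the default view for a reminiscence session, while the media index supports other uses of the same material.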

One  general  finding  was  that  the  multimedia  presentation 
as  a  whole  produced  a  great  deal  of  interest  and  motivation 
from  the  people  with  dementia.  The  staff  was  also  very  keen  to 
see  the  idea  developed  further.  A  new  project  is  now  beginning 
which  will  fully  develop  the  ideas  produced  by  this  preliminary 
work  and  will  produce  a  fully  functional  multimedia  reminis¬ 
cence  system.  The  project  will  be  carried  out  by  a  multidiscipli¬ 
nary  team  consisting  of  a  software  engineer,  a  designer,  and  a 
psychologist  specializing  in  aging. 

In  addition  to  its  benefits  for  the  person  with  dementia,  par¬ 
ticipation  in  reminiscence  activities  has  been  shown  to  have  a 
positive  outcome  for  the  caregivers  who  take  part.  Thus,  help 
for  reminiscence  is  not  only  a  tool  to  stimulate  interaction,  but 
also  a  contributor  to  improved  quality  of  life,  for  the  person  with 
dementia  and  his  or  her  family. 

Recent work on using videos to present life histories for people
with dementia has shown that new technologies, where sensitively
and appropriately applied, can bring a substantial added
impact  to  supportive  and  therapeutic  activities  for  people  with 
cognitive  problems  (Cohen,  2000).  We  hope  the  reminiscence 
system  developed  will  have  the  immediate  application  de¬ 
scribed  above  and  will  also  serve  as  an  exploration  in  developing 
a  range  of  computer-based  entertainment  systems  for  people 
with  dementia. 


SYSTEMS  TO  SUPPORT 
PEOPLE  WITH  DYSLEXIA 


Dyslexia  is  a  language  disorder  that  is  still  in  the  process  of  be¬ 
ing  investigated  and  thoroughly  understood.  A  word-processing 
environment  has  been  developed  to  alleviate  some  of  the  visual 
problems  encountered  by  some  people  with  dyslexia  when  they 
read and produce text. However, in the absence of a full
understanding of the nature of dyslexia, the researchers’ approach
was to identify some of the most commonly noted problems which
dyslexics experience. On the basis of these common difficulties, they
identified  ways  in  which  each  individual  might  be  able  to  mini¬ 
mize  the  consequences  of  their  own  particular  problems  by  ma¬ 
nipulating  the  appearance  of  their  word-processing  environment 
and  text  presented  within  it.  This  stage  of  the  development 
process  involved  the  start  of  an  iterative  cycle  of  prototype  de¬ 
velopment  and  evaluation  with  dyslexic  computer  users.  The 
work  led  to  a  software  system  which  provides  a  highly  (and  eas¬ 
ily)  configurable  environment  for  dyslexic  people  (who  experi¬ 
ence  a  very  wide  range  of  differing  preferences  and  problems)  to 
use  for  reading  and  producing  text.  The  approach  was  to  exam¬ 
ine  the  parameters  of  the  situation,  including  cost,  cost  benefit, 
existing  software,  and  user  demand,  with  a  view  to  finding  an  op¬ 
timal  path  to  the  production  of  a  software  system  which  is  of  real 
use  in  practical  situations.  While  users  were  extensively  involved 
throughout  the  development  process,  the  researchers  did  not  al¬ 
ways  rely  directly  on  their  individual  inputs  for  further  develop¬ 
ment;  rather  they  ensured  that  at  all  times  the  process  was  sensi¬ 
tive  to  user  need  and  opinion,  but  informed  by  relevant  sources 
ranging  from  documented  wisdom  about  the  subject  to  hunches 
about  what  might  work. 
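
A configurable environment of this sort can be pictured as a set of display settings with sensible defaults that each user overrides as needed. The sketch below is an illustration only: the particular settings (font, size, colors, line spacing) and the names `DisplaySettings` and `personalise` are assumptions, since the parameters offered by the actual system are not listed here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisplaySettings:
    # Hypothetical appearance parameters for the text display.
    font_family: str = "Arial"
    font_size_pt: int = 12
    text_color: str = "black"
    background_color: str = "white"
    line_spacing: float = 1.0


DEFAULTS = DisplaySettings()


def personalise(base, **changes):
    """Return a copy of `base` with one user's preferred changes applied."""
    return replace(base, **changes)
```

Keeping the settings immutable and deriving each user's profile from the defaults means that one person's adjustments never alter the environment seen by another.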

This  development  of  a  word  processor  to  assist  dyslexics  is 
thus  a  particularly  illuminating  example  of  how  the  needs  of 
people  with  a  particular  type  of  cognitive  impairment  can  be 
effectively  factored  into  the  design  and  development  process. 
This  research  will  thus  be  described  in  some  depth,  as  an  ex¬ 
ample  of  a  development  and  to  assist  the  reader  in  appreciat¬ 
ing  the  generic  importance  of  this  approach,  which  requires 
knowledge  of  the  underlying  syndrome,  and  a  methodology, 
which  encourages  an  innovative  approach  to  user  involvement. 

Dyslexia,  as  has  been  noted,  is  still  being  investigated  as  a 
language  disorder,  and  at  present,  it  has  a  number  of  definitions. 
The  British  Dyslexia  Association  (2006)  offered  this  description: 

Dyslexia  is  best  described  as  a  combination  of  abilities  and  difficulties 
that  affect  the  learning  process  in  one  or  more  of  reading,  spelling,  and 
writing.  Accompanying  weaknesses  may  be  identified  in  areas  of  speed 
of processing, short-term memory, sequencing and organisation, auditory
and/or visual perception, spoken language and motor skills. It is
particularly related to mastering and using written language, which may
include alphabetic, numeric and musical notation.

The  symptoms  of  dyslexia  vary  greatly  and  the  reasons  for 
the  existence  of  these  symptoms  are  also  varied.  However,  two 
distinct  types  of  dyslexia  are  recognized:  acquired  dyslexia  and 
developmental  dyslexia. 

Acquired  dyslexia  is  associated  with  those  people  who  have 
difficulties  caused  by  damage  to  the  brain.  Therefore,  prior  to 
the  occurrence  of  brain  damage,  no  difficulties  would  have  been 
identified.  This  area  can  be  subdivided  into  disorders  in  which 
the  visual  analysis  system  is  damaged — peripheral  dyslexia — and 
disorders  in  which  processes  beyond  the  visual  analysis  system 
are  damaged,  resulting  in  difficulties  affecting  the  comprehen¬ 
sion  and/or  pronunciation  of  written  words — central  dyslexia. 

Peripheral  dyslexia  is  associated  with  difficulties  such  as  mis¬ 
reading  letters  within  words  and  migrating  letters  between  words. 
Central  dyslexia  on  the  other  hand  concerns  issues  such  as  bad 
comprehension  due  to  an  impaired  semantic  system,  and  the  in¬ 
ability  to  read  unfamiliar  words,  while  familiar  words  are  read  eas¬ 
ily.  Examples  would  include  monkey  being  read  as  ape  (a  seman¬ 
tic  error),  and  patient  being  read  as  parent  (a  visual  error). 

Developmental  (or  congenital)  dyslexia  is  present  from  birth 
but  will  generally  become  apparent  later,  and  has  been  de¬ 
scribed  as  follows: 

Developmental  dyslexia  is  a  learning  disability  which  initially  shows  itself 
by  difficulty  in  learning  to  read,  and  later  by  erratic  spelling  and  by  lack 
of  facility  in  manipulating  written  as  opposed  to  spoken  words.  The  con¬ 
dition  is  cognitive  in  essence,  and  usually  genetically  determined.  It  is 
not  due  to  intellectual  inadequacy  or  to  lack  of  sociocultural  opportu¬ 
nity,  or  to  emotional  factors,  or  to  any  known  structural  brain  defect. 
(Critchley  1964,  p.  5) 

One  of  the  main  features  of  dyslexia  is  the  individual  nature 
of  the  disorder.  The  condition  is  not  typically  characterized  by 
one  single  difficulty,  but  by  a  range  of  difficulties  that  will  vary  in 
combination  and  in  intensity  between  individuals,  giving  rise  to 
an  enormous  variation  between  individuals  in  the  problems  en¬ 
countered.  Each  dyslexic  person  thus  has  a  range  of  difficulties 
that  need  to  be  addressed  differently  from  those  of  others. 
Dyslexia  is  an  example  of  the  need  to  design  for  dynamic  diversity. 

The  wide-ranging  characteristics  of  dyslexia  provide  a  chal¬ 
lenge  for  technological  assistance,  as  a  single  approach  will  not  be 
appropriate  for  the  range  of  problems  presented  by  the  popula¬ 
tion  of  dyslexic  people.  Computer  technology  offers  the  oppor¬ 
tunity  to  provide  reading  and  writing  systems  that  are  highly  con¬ 
figurable  for  each  individual  user  but  they  need  to  be  based  on  an 
understanding  of  the  problems  which  dyslexics  have  in  reading 
and  writing,  and  some  of  the  visual  problems  which  can  affect 
them.  The  approach  adopted  in  this  research  was  to  offer  dyslexic 
users  a  range  of  appropriate  visual  settings  for  the  display  of  a 
word  processor,  together  with  the  opportunity  to  configure  eas¬ 
ily  the  way  in  which  text  is  displayed  to  them.  The  user  can  se¬ 
lect,  by  experimentation,  the  settings  that  best  suit  them.  These 
settings  are  then  saved  and  later  recalled  each  time  that  person 
uses  the  word  processor.  It  will  be  seen  that  this  approach  affords 
the  potential  to  make  computer-based  text  significantly  easier  to 
read  than  printed  text,  as  well  as  improving  the  usability  of  com¬ 
puter  word-processing  systems  for  a  wide  range  of  dyslexics. 

Some Common Problems of Dyslexia

It  was  first  necessary  to  determine  the  parameters  which  should 
be  offered  for  configuration  by  the  user.  An  initial  investigation 
revealed that some of the most commonly encountered problems
are as follows (adapted from Willows, Kruk, & Corcos,
1993; D. Shaw, personal communication, May 1994):

1.  Number  and  letter  recognition.  One  of  the  fundamental 
problems  faced  by  dyslexics  is  the  recognition  of  individual 
alphanumeric symbols. This is often seen when letters that
are similar in shape, such as n and h, or f and t, are confused.
The problem is exacerbated with the introduction of
uppercase  letters.  In  addition,  many  dyslexic  adults,  who  are 
capable  of  reading  printed  letters,  have  difficulty  in  reading 
cursive  writing. 

2.  Letter  reversals.  Many  dyslexics  are  prone  to  reversing  let¬ 
ters,  which  results  in  a  particular  letter  being  interpreted  as 
another  letter.  Examples  of  these  characters  would  be  b,  d,  p, 
and  q.  This  problem  can  result  in  poor  word  recognition  with 
words  containing  reversal  characters  being  substituted  for 
other  words  such  as  bad  for  dad. 

3.  Word  recognition.  As  well  as  the  substitution  effect  caused 
because  of  letter  reversals,  words  that  are  similar  in  their  out¬ 
line  shape  (word  contour)  can  be  substituted  by  dyslexics. 
Typical  examples  of  this  problem  are  the  words  either  and 
enter. Both words have the same start and finishing characters
and this, allied with their similar word contours, makes
them  candidates  for  being  substituted  for  each  other  when 
they  occur  in  the  text. 

4.  Number,  letter  and  word  recollection.  Even  if  the  ability  to 
recognize  numbers  and  letters  is  adequate,  it  can  still  prove 
difficult  for  a  dyslexic  individual  to  recall  the  actual  form  and 
shape  of  a  character.  Many  dyslexics  have  so  much  difficulty 
recalling  upper  and  lower  case  characters  that  they  continue 
to  print  later  in  life.  Similarly,  poor  visual  memory  means  that 
dyslexics  have  little  ability  to  distinguish  whether  or  not  a 
word  looks  right. 

5.  Spelling  problems.  Due  to  the  problems  discussed  above, 
dyslexics  can  have  great  difficulty  with  spelling,  and  many 
dyslexics  have  very  poor  spelling.  Much  of  the  spelling  of  dyslex¬ 
ics  appears  to  reflect  a  phonic  strategy  with  words  like  of  and 
all  being  spelled  ov  and  ohl. 

6.  Punctuation  recognition.  As  with  characters,  dyslexics  ap¬ 
pear  to  have  difficulty  recognizing  punctuation  marks. 

7.  Saccadic  and  fixation  problems.  Another  problem  that  is 
found  in  many  dyslexics  is  their  lack  of  ability  to  follow  text 
without  losing  their  places.  Many  find  it  difficult  to  move 
from  the  end  of  one  line  to  the  beginning  of  the  next  and 
find  themselves  getting  lost  in  the  text. 

8.  Word  additions  and  omissions.  Dyslexics  may  add  or  re¬ 
move  words  from  a  passage  of  text,  apparently  at  random. 
This  is  manifested  by  words  being  omitted  or  duplicated,  ex¬ 
tra  words  being  added,  or  word  order  being  reversed  or 
otherwise  jumbled. 

9.  Poor  comprehension.  With  the  variety  of  errors  caused  by 
the  factors  described  above,  a  dyslexic  person  may  perceive 
a  totally  different  (or  impoverished)  passage  of  text  from  the 
one that is actually in front of them. Dyslexics thus display
poor comprehension skills because the text they perceive
is significantly different from the actual text.


Computer  Aid  for  the  Problems  of  Dyslexia 

In  an  attempt  to  alleviate  some  of  the  problems  discussed 
above,  dyslexics,  particularly  within  the  education  system,  are 
encouraged  to  use  computers  for  text  manipulation.  The  use 
of  a  computer  keyboard  has  the  potential  to  alleviate  the  prob¬ 
lems  of  character  recollection,  but  this  only  really  helps  with  the 
recollection  of  characters,  not  the  recognition  of  them  once 
they  are  on  the  screen. 

There  is,  however,  strong  evidence  to  suggest  that  the  use 
of lexical and spelling aids can greatly assist with spelling problems
exhibited by dyslexics (e.g., Newell & Booth, 1991). However,
merely highlighting an incorrect word and offering a replacement
may not be enough, since one of the other problems
that some dyslexics face is an inability to tell if a word looks
right; they will thus have difficulty selecting the appropriate
correction.

Some dyslexics are sensitive to color, and colored acetate
screens or tinted glasses, as well as lighting conditions, can improve
their abilities to read text. Many dyslexics also report interference
from peripheral vision, indicating that anything that
can be done to reduce screen clutter outside the main screen
window, such as making the document page fill the whole
screen, may be of benefit (D. Shaw, personal communication,
May 1996).

Based  on  the  above  difficulties  encountered  by  dyslexics,  the 
researchers  considered  ways  that  the  screen  image  of  the  text 
could be manipulated. These included foreground and background
color; character typeface, font, and spacing; making letters
or words with similar shapes distinguishable by using different
fonts, sizes, and/or colors; and presenting text in narrower
columns,  or  with  different  spacing  between  words  and/or  lines 
to  reduce  saccade  and  fixation  problems.  The  researchers  in¬ 
vestigated  potentially  promising  ideas  by  implementing  proto¬ 
types  and  evaluating  their  utility  with  dyslexic  users,  with  an 
overall  view  to  developing  a  configuration  system  that  will  en¬ 
able  all  dyslexics  to  set  up  their  own  optimized  environments. 


An  Experimental  Text  Reader  for  Dyslexics 

The  first  stage  of  the  research  was  to  develop  an  experimental 
text  reader.  This  prototype  presented  the  user  with  an  easily 
configurable  interface,  which  allowed  for  a  number  of  display 
variables  to  be  altered.  Initially,  these  were  background,  fore¬ 
ground  and  text  colors,  font  size  and  style,  and  the  spacing  be¬ 
tween  paragraphs,  lines,  words,  and  characters.  The  interface 
was  designed  in  such  a  way  that  it  gave  visual  feedback  on  se¬ 
lections  before  they  were  confirmed  and  made  minimal  use  of 
text  instruction. 

This prototype was evaluated with 12 computer-literate dyslexic students
from higher education using think-aloud techniques, as
well  as  questionnaires  and  interviews.  At  various  development 
stages,  the  helpers  were  asked  to  try  out  the  system  with  a  view 
to seeing if it was possible for them to put together a display
that improved their abilities to read text from the screen. All
the  users  were  able  to  find  a  setting  that  was  subjectively  supe¬ 
rior  for  them  to  standard  black  text  on  a  white  background  with 
Times  Roman  10-  or  12-point  text,  but  the  screen  layouts  that 
were  developed  by  the  test  subjects  were  extremely  varied.  This 
highlighted  the  individual  nature  of  the  disorder,  and  the  di¬ 
verse  characteristics  of  any  interface  that  would  be  appropriate 
for  this  group. 

Each  appeared  to  have  a  favorite  color  combination,  although 
brown  text  on  a  green  background  was  liked  by  all  the  testers. 
Subjects  were  in  greatest  agreement  about  the  selection  of  a  type¬ 
face:  Sans  Serif  Arial  was  rated  the  best  by  almost  all  the  testers. 
All  reported  that  increasing  the  spacing  between  the  characters, 
words,  and  lines  was  beneficial.  The  most  interesting  point  that 
arose  during  the  testing,  however,  was  the  fact  that  at  the  begin¬ 
ning  of  the  evaluation  period  the  dyslexic  subjects  did  not  ap¬ 
pear  to  be  aware  that  altering  these  variables  might  be  of  any  use. 

A  second  prototype  was  then  developed  based  on  Word  for 
Windows  (Microsoft,  1994;  1995)  macros  to  provide  the  re¬ 
quired  configuration  interfaces.  This  was  based  on  the  concept 
of  an  evolutionary  system,  rather  than  a  fixed  prototype.  It  was 
clear  that  there  would  be  a  substantial  advantage  in  developing 
a  dyslexic  configuration,  but  this  design  decision  raised  an  in¬ 
teresting  deviation  from  the  received  wisdom  of  the  desirabil¬ 
ity  of  WYSIWYG  (What  You  See  Is  What  You  Get).  In  the  case  of 
a  dyslexic  user,  what  you  see  should  be  whatever  you  can  read 
best,  and  print  previewing  facilities  would  have  to  be  used  to 
show  how  the  layout  will  appear  when  printed. 

This  prototype  provided  a  facility  to  enhance  characters 
prone to reversals (e.g., b, d) by using color, font type, and size.
This  idea  of  coloring  reversal  characters  provided  very  interest¬ 
ing  and  unanticipated  results,  which  are  described  below.  Fixa¬ 
tion  problems  were  tackled  by  reducing  the  page  width,  and  a 
speech  synthesizer  that  could  read  the  text  on  the  screen  was 
included. 
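
As a rough illustration of the reversal-marking idea, the sketch below splits a passage into runs so that a display layer could color the reversal-prone letters. It is written in Python and is not the Word macro code used in the research; the function name, the color assignments, and the run representation are all assumptions made for illustration.

```python
# Hypothetical sketch only: the original prototype used Word for Windows macros.
# The colour chosen for each reversal-prone letter is an assumed example.
REVERSAL_COLOURS = {"b": "red", "d": "blue", "p": "green", "q": "purple"}


def mark_reversals(text, colours=REVERSAL_COLOURS):
    """Split text into (segment, colour) runs.

    Ordinary text is returned with colour None; each reversal-prone
    letter becomes its own run carrying the colour to display it in.
    """
    runs = []
    plain = []  # characters collected since the last coloured letter
    for ch in text:
        colour = colours.get(ch)
        if colour is None:
            plain.append(ch)
            continue
        if plain:
            runs.append(("".join(plain), None))
            plain = []
        runs.append((ch, colour))
    if plain:
        runs.append(("".join(plain), None))
    return runs
```

A renderer could then walk the runs and apply the color (or a different font or size) to each marked letter while leaving the rest of the text in the user's normal settings.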

There  were  two  distinct  parts  to  the  overall  solution,  a  pref¬ 
erence  program,  and  a  reading/editing  program.  The  first  al¬ 
lowed  users  to  experiment  with  the  various  parameters  and  the 
second  made  use  of  these  preferences  within  a  reading  and  edit¬ 
ing  environment.  The  preference  program  menu  presents  the 
user with various options and variables, together with a preview
facility, to enable the user to experiment with, and finally
store, his or her data in a preferences file.
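
The split between a preference program and a reading/editing program can be pictured with a small sketch. The Python below is only an assumed illustration of storing and recalling per-user display settings; the field names, the JSON file format, and the default values (taken from the standard black-on-white Times Roman display mentioned earlier) are not drawn from the original system.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class DisplayPreferences:
    """Hypothetical per-user display settings; all field names are assumed."""
    background: str = "white"
    foreground: str = "black"
    typeface: str = "Times Roman"
    font_size_pt: int = 12
    char_spacing_pt: float = 0.0
    word_spacing_pt: float = 0.0
    line_spacing: float = 1.0
    column_width_chars: int = 80
    highlight_reversals: bool = False


def save_preferences(prefs, path):
    """Preference-program side: write the chosen settings to a file."""
    Path(path).write_text(json.dumps(asdict(prefs), indent=2), encoding="utf-8")


def load_preferences(path):
    """Reader/editor side: recall stored settings, or fall back to defaults."""
    p = Path(path)
    if not p.exists():
        return DisplayPreferences()
    return DisplayPreferences(**json.loads(p.read_text(encoding="utf-8")))
```

In use, the preference program would call `save_preferences` once the user is happy with the preview, and the reading/editing program would call `load_preferences` each time that user opens a document.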

The fact that a unique user environment, tailored to the
needs of each individual, is provided means that the document
is (deliberately) not WYSIWYG. A print option thus allows the
user to print the document as it appears with his or her preferred
formatting applied to it, or as it would appear without any
special  formatting. 

This  second  prototype  was  developed  as  an  add-on  module 
to Microsoft Word, and was evaluated by seven dyslexic users
aged 15 to 30, in a similar fashion to that above. The users
found  the  system  easy  and  intuitive  to  use,  reporting  that  each 
of  the  options  had  an  effect  on  their  abilities  to  read.  The  op¬ 
tions  that  allowed  the  user  to  change  the  color  scheme  of  the 
document  appeared  to  be  the  most  helpful,  but  font  size  and 
spacing,  column  width,  and  indications  of  reversals  were  also  re¬ 
ported  to  assist  reading  by  some  or  all  of  the  users. 


The  reversals  option  provided  the  most  interesting  results  of 
all.  However,  the  reason  for  the  improvement  was  not  always 
that  the  reversal  characters  were  clearly  distinguished  and  easy 
to  read.  Instead,  it  was  claimed  that  the  sporadic  coloring  broke 
the text up and resulted in the user being less likely to get lost;
that is, the system was reducing fixation problems rather
than recognition problems.

As  the  testing  progressed,  the  testers  appeared  to  be  sur¬ 
prised  at  times  by  the  effect  some  of  the  changes  had  on  their 
abilities  to  read  the  document.  Comments  included  “I  would 
never  have  thought  of  doing  that,”  or  “I  don’t  think  that  will  do 
me  much  good”  before  finding  that  a  feature  did  indeed  help. 

The  prototypes  were  developed  from  the  perspective  that  the 
user  population  was  diverse,  and  that  the  design  process  must 
accommodate  potential  changes  in  preferences  over  time.  The 
fact that dyslexia is a very idiosyncratic disorder, and the finding that
the users were often unaware of how easy it was to improve their
reading potential by changing visual aspects of the reading
environment, illustrate that a standard user-centered design
methodology is not appropriate for such user groups.


RESEARCH  METHODOLOGIES 


The  research  described  above  gives  a  flavor  of  successful  ap¬ 
proaches  to  developing  human  interfaces  and  software  to  sup¬ 
port  people  with  various  types  of  cognitive  impairment.  Much 
of  the  methodology  used  in  these  developments,  however,  had 
to  be  developed  ab  initio.  Traditional  user-centered  design  does 
not  have  the  flexibility  for  these  user  groups,  and  most  research 
and  development  in  the  field  of  communication  and  informa¬ 
tion  technology  to  support  people  with  disabilities  has,  to  date, 
concentrated  on  the  development  of  special  assistive  systems 
and  on  accessibility  features  for  younger,  mainly  physically  or 
sensorially  disabled  people.  Similarly,  the  human  interfaces  to 
most  computer  systems  for  general  use  have  been  designed, 
either  deliberately  or  by  default,  for  a  typical,  younger  user 
(Newell  &  Cairns,  1993;  Newell,  1995;  Newell  &  Gregor,  1997). 
Knowledge  from  these  fields  does  not  necessarily  transfer  com¬ 
fortably  to  the  challenges  encompassed  in  universal  design 
(Beirmann,  1997;  Hypponen,  1999;  Sleeman,  1998;  Stephanidis, 
2001)  and,  in  particular,  the  widely  varying  and  often  declining 
abilities  associated  with  the  range  of  cognitive  impairments. 

This  section  addresses  the  particular  issues  for  the  design 
process  which  accompany  cognitive  impairment  and  suggests 
a  paradigm  and  methodology  to  support  the  process  of  design¬ 
ing  software  which  is  as  near  to  the  universal  accessibility  ideal 
as  is  possible,  derived  from  the  approach  to  specific  projects  de¬ 
scribed  above. 

Software systems that are aimed at the mainstream (rather
than being of a prosthetic nature) need to address the wide variation
in the types and severity of cognitive impairment between
individuals.  This  demand  is  further  complicated  by  the  fact  that 
as  people  grow  older  their  abilities  change.  This  process  of 
change  includes  a  decline  over  time  in  the  cognitive,  physical, 
and  sensory  functions,  and  each  of  these  will  decline  at  different 
rates  relative  to  one  another  for  each  individual.  This  pattern 
of  capabilities  varies  widely  between  individuals,  and  as  people 
grow older, the variability between people increases. In addition,
any given individual’s capabilities vary in the short term
due,  for  example,  to  temporary  decrease  in,  or  loss  of,  function 
due  to  a  variety  of  causes,  such  as  the  effects  of  drugs,  illness, 
blood  sugar  levels,  and  state  of  arousal. 

This  broad  range  and  variability  of  change  presents  a  fun¬ 
damental  problem  for  the  designers  of  computing  systems, 
whether  they  be  generic  systems  for  use  by  all  ages,  or  specific 
systems  to  compensate  for  loss  of  function.  Systems  tend  to  be 
developed  for  a  typical  user  and  either  by  design  or  by  default, 
this  user  tends  to  be  young,  fit,  male,  and  crucially,  has  abilities 
that  are  static  over  time.  These  abilities  are  assumed  broadly 
similar  for  everybody.  Not  only  is  this  view  wrong,  in  that  it  does 
not  take  account  of  the  wide  diversity  of  abilities  among  the 
wider  population  of  users,  but  it  also  ignores  the  fact  that  for 
individuals,  these  abilities  are  dynamic  over  time. 

Current  software  design  also  typically  produces  an  artifact 
which  is  static  and  which  has  no,  or  very  limited,  means  of  adapt¬ 
ing  to  the  changing  needs  of  users  as  their  abilities  change.  Even 
the  user-centered  paradigm  (e.g.,  ISO  13407,  1999;  Nielsen, 
1993;  Preece,  1994;  Shneiderman,  1992)  looks  typically  at  issues 
such  as  representative  user  groups,  without  regard  for  the  fact 
that  the  user  is  not  a  static  entity.  Thus,  it  is  important  not  only  to 
be  aware  of  the  diverse  characteristics  of  people  with  cognitive 
dysfunction,  but  also  the  dynamic  aspects  of  their  abilities. 

It  is  clear  that  people  with  cognitive  impairments,  whatever 
their  cause,  can  have  very  different  characteristics  to  most  hu¬ 
man  interface  and  software  designers.  It  is  also  clear  that  in 
these  circumstances  user-centered  design  principles  need  to 
be  employed  if  appropriate  technology  is  to  be  developed  for 
this  user  group  (Gregor  &  Newell,  1999).  These  methodologies, 
however, have been developed for user groups with relatively
homogeneous characteristics. People with dementia, for example,
are a diverse group and even small subsets of this group
tend  to  have  a  greater  diversity  of  functionality  than  is  found  in 
groups  of  able  young  people. 

An  additional  complication  is  that  there  can  be  serious  ethi¬ 
cal  issues  related  to  the  use  of  such  people  as  participants  in  the 
software  development  process.  Some  of  these  are  medically  re¬ 
lated,  but  also  include,  for  example,  the  ability  to  obtain  in¬ 
formed  consent.  It  is  thus  suggested  that  the  standard  method¬ 
ology  of  user-centered  design  is  not  appropriate  for  designing 
for  the  inclusion  of  this  user  group.  The  importance  of  research 
and development taking into account the full diversity of the potential
user population, including cognitive diversity, was addressed
by Newell (1993) in his keynote address to InterCHI ’93,
where  the  concept  of  “Ordinary  and  Extraordinary  Human 
Computer  Interaction”  was  developed  (Newell,  1993;  Newell  & 
Cairns,  1993;  Newell  &  Gregor,  1997). 

Market  share  is  clearly  an  important  consideration,  and  this 
has  been  given  impetus,  not  only  by  demographic  trends,  but 
also  by  recent  legislation  in  the  United  States  and  other  coun¬ 
tries,  on  accessibility  of  computer  systems  for  people  with  dis¬ 
abilities.  In  terms  of  the  workplace,  both  the  Americans  with 
Disabilities  Act  and  the  United  Kingdom  Disability  Discrimi¬ 
nation  Act  put  significant  requirements  on  employers  to  en¬ 
sure  that  people  with  disabilities  are  able  to  be  employed 
within  companies  and  to  provide  appropriate  technology  so 
that such employees have full access to the equipment and information
necessary for their employment. Increasingly, there
is  political  pressure  to  increase  this  access,  and  more  require¬ 
ments  for  improved  access  by  disabled  people  are  being  en¬ 
shrined  in  legislation.  However,  access  does  not  mean  only 
that  people  with  wheelchairs  can  maneuver  around  buildings; 
it  also  means  that  there  needs  to  be  provision  for  people  with 
cognitive  (and  sensory  and  other  physical)  impairments  to  be 
able  to  operate  computers  and  other  equipment  essential  to 
the  workplace. 

An  important  additional  factor  in  the  value  for  money  equa¬ 
tion  is  that  design  that  takes  into  account  the  needs  of  those 
with  slight  or  moderate  cognitive  dysfunction  can  produce  bet¬ 
ter  design  for  everyone.  An  example  where  this  has  not  oc¬ 
curred  is  illustrated  by  the  problems  that  the  majority  of  users 
have  had  with  video  tape  recorders.  If  the  designers  had  con¬ 
sidered  those  with  cognitive  impairments  within  their  user 
group,  they  may  have  been  able  to  design  more  usable  systems. 
Another  example  is  an  e-mail  system  specifically  designed  to  be 
simple  to  use  by  older  people  with  reduced  cognitive  function¬ 
ing,  which  was  found  to  be  preferred  by  executives  to  the  stan¬ 
dard  e-mail  system  that  they  were  used  to. 

Some  people  are  impaired  from  birth,  but  some  may  be¬ 
come  temporarily  or  permanently  disabled  by  accident  or  illness 
(suddenly  or  more  slowly),  or  even  by  normal  functioning 
within  their  employment.  This  is  particularly  noticeable  in  cog¬ 
nitive  functioning.  Short-term  changes  in  cognitive  ability  oc¬ 
cur  with  everyone.  These  can  be  caused  by  fatigue,  noise  lev¬ 
els,  blood  sugar  fluctuations,  lapses  in  concentration,  stress,  or 
a  combination  of  such  factors,  and  can  produce  significant 
changes  over  minutes,  hours,  or  days.  In  addition,  alcohol  and 
drugs  can  also  induce  serious  changes  in  cognitive  functioning, 
which  is  recognized  in  driving  legislation,  but  not  in  terms  of 
how  easy  it  is  to  use  computer-based  systems. 

Most  people  at  one  time  or  another,  will  exhibit  cognitive 
functional  characteristics,  which  are  significantly  outside  the 
normal  range.  Although  neither  they,  nor  their  peers,  would 
consider  these  people  disabled,  their  abilities  to  operate  stan¬ 
dard  equipment  may  well  be  significantly  reduced. 

The  questions  that  designers  need  to  consider  include: 

•  Does  the  equipment  that  I  provide  comply  with  the  legislation 
concerning  use  by  employees  who  may  be  cognitively  disabled? 

•  To  what  extent  do  I  need  to  take  into  account  the  needs  of 
employees  who  are  not  considered  disabled,  but  have  signif¬ 
icant  temporary  or  permanent  cognitive  dysfunction? 

•  Should  I  make  specific  accommodation  for  the  known  re¬ 
ductions  in  cognitive  abilities  which  occur  as  employees  get 
older  (e.g.,  less  requirement  for  short-term  memory,  or  the 
need  to  learn  new  operating  procedures)? 

•  What  are  the  specific  obligations  designers  and  employers 
have  to  provide  systems  that  can  be  operated  by  employees 
whose  cognitive  ability  has  been  reduced  due  to  the  stress, 
noise,  or  other  characteristics  of  the  workplace? 

The  argument  is  that  it  would  be  very  unusual  for  anyone  to 
go  through  their  working  life  without  at  some  stage,  or  many 
stages,  being  significantly  cognitively  disabled.  If  equipment  de¬ 
signers  considered  this,  it  is  probable  that  the  effectiveness  and 
efficiency  of  the  work  force  could  be  maintained  at  a  higher 
level than would be the case if the design of the equipment
were  based  on  an  idealistic  model  of  the  characteristics  of  the 
user  and  their  work  environment. 

The  Disabling  Environment 

In  addition  to  the  user  having  characteristics  that  can  be  con¬ 
sidered  disabled,  it  is  also  possible  for  them  to  be  disabled  by 
the environments within which they have to operate. Newell and
Cairns (1993) made the point that the human-machine interaction
problems of an able-bodied (ordinary) person operating
in a high-workload, high-stress, or otherwise extreme (i.e.,
extraordinary) environment have very close parallels with those of a
disabled (extraordinary) person operating in an ordinary situation
(e.g., an office).

High  workloads  and  the  stress  levels  to  which  this  can  lead 
often reduce the cognitive performance of the human operator.
For example, a very noisy environment can not only create
a  similar  situation  to  hearing  or  speech  impairment,  but  can 
also  lead  to  reduced  cognitive  performance.  The  stress  level 
in  the  dealing  room  of  financial  houses  can  be  very  high  and 
is often accompanied by high noise levels. A significant advance
might be made if the software to be used in
these houses were designed on the assumption that the
users  would  be  hearing  impaired  and  have  a  relatively  low 
cognitive  performance.  It  is  interesting  to  speculate  as  to 
whether  such  systems  would  produce  higher  productivity, 
better  decision  making  and  less  stress  on  the  operators. 
Other  examples  of  extreme  environments  in  which  people 
have  to  operate  are  the  battlefield,  under  water,  or  out  in 
space. The stress and fatigue caused by working within such
environments mean that people's performance is similar to that
which  could  be  achieved  by  a  very  disabled  person  operating 
in  a  more  normal  environment.  It  is  not  always  clear  that  the 
equipment  such  people  need  to  operate  has  been  designed 
with  this  view  of  the  user. 

It  is  very  important  to  describe  the  users  of  technology  in 
terms  of  their  functional  abilities  related  to  technology  rather 
than  generic  definitions  of  either  medical  conditions,  or  pri¬ 
marily  medical  descriptions  of  their  disabilities.  Unfortunately, 
most statistical data is presented as generic, medical categorizations
of disability. Gill and Shipley (1999), however, defined
disabled user groups in terms of their functional abilities,
with specific emphasis on the use of the telephone. They
estimated  that  within  the  European  Union,  which  has  a  popu¬ 
lation  of  385  million,  there  were  9  million  people  with  cognitive 
impairment, which could lead to problems using the telephone.
These figures do not take into account multiple impairments,
and the authors pointed out that, in the elderly
population  in  particular,  there  may  be  a  tendency  towards  cog¬ 
nitive,  hearing,  vision,  and  mobility  impairments  being  present 
to  a  varying  extent  and  these  may  interact  when  considering 
the use of technological systems. It is this combination of multiple
minor reductions in function (often together with a major disability)
that means that the challenges to technological support for
older people have significantly different characteristics from those
of younger disabled people and of the nondisabled, nonelderly
population.


There  has  been  some  movement  in  mainstream  research 
and  development  in  technology,  both  in  academia  and  indus¬ 
try,  away  from  a  technology  led  focus  to  a  more  user  led  ap¬ 
proach,  and  this  has  led  to  the  development  of  user-centered 
design  principles  and  practices  in  many  industries.  In  addition, 
a  number  of  initiatives  have  been  launched  to  promote  a  con¬ 
sideration  of  people  with  disabilities  within  the  user  group  in 
mainstream  product  development  teams  with  titles  including 
“Universal Design,” “Design for All,” “Accessible Design,” and
“Inclusive Design.” The “Design for All/Universal Design” movement
has been very valuable in raising the profile of disabled
users  of  products,  and  has  laid  down  some  important  principles. 
This  approach  however  has  tended  not  to  place  too  much  sig¬ 
nificance  on  cognitive  impairment,  and,  particularly  if  this  is  in¬ 
cluded  as  a  factor  in  the  design  process,  then  it  becomes  more 
difficult  to  use  traditional  user-centered  design  approaches. 

Newell  and  Gregor  (2000)  suggested  that  a  new  design  ap¬ 
proach  should  be  developed,  which  would  be  based  on  the  al¬ 
ready  accepted  user-centered  design  methodology.  There  are 
some  important  distinctions  between  traditional  user-centered 
design  with  able-bodied  users,  and  the  approach  needed  when 
the  user  group  either  contains,  or  is  exclusively  made  up  of, 
people  with  cognitive  dysfunction.  These  include: 

•  Much  greater  variety  of  user  characteristics  and  functionality 

•  The  difficulty  in  finding  and  recruiting  representative  users 

•  Situations  where  design  for  all  is  certainly  not  appropriate 
(e.g.,  where  the  task  requires  a  high  level  of  cognitive  ability) 

•  The  need  to  specify  exactly  the  characteristics  and  function¬ 
ality  of  the  user  group 

•  Conflicts  of  interest  between  user  groups,  including  tem¬ 
porarily  able  bodied 

•  Tailored,  personalizable,  and  adaptive  interfaces 

•  Provision  for  accessibility  using  additional  components  (hard¬ 
ware  and  software) 

The  balance  in  the  design  process  also  needs  to  shift  from  a 
focus  on  user  needs  to  one  on  the  users  themselves.  There  will 
be  additional  problems  when  considering  people  with  cognitive 
dysfunction,  which  will  include 

•  The  lack  of  a  truly  representative  user  group 

•  That  a  different  attitude  of  mind  of  the  designer  is  required 

• Ethical issues (Alm, 1994; Balandin & Raghavendra, 1999)

•  It  may  be  difficult  to  get  informed  consent  from  some  users 

•  Difficulties  of  communication  with  users 

•  The  users  may  not  be  able  to  (sufficiently)  articulate  their 
thoughts,  or  even  may  be  incompetent  in  a  legal  sense 

Thus,  there  can  be  particularly  difficult  ethical  problems 
when  involving  users  with  cognitive  impairments  in  the  design 
process.  In  addition,  it  is  often  necessary  to  involve  clinicians 
when  such  users  are  involved,  so  some  of  the  user-centered  de¬ 
sign  actually  focuses  on  professional  advice  about  the  user, 
rather  than  direct  involvement  of  the  user.  Even  with  these 
problems,  however,  it  is  possible  to  include  users  with  cognitive 
dysfunction  sensitively  in  the  design  process. 

The Inclusion of Users With Disabilities
Within  Research  Groups 

In  Dundee  users  with  disabilities  have  a  substantial  involvement 
in  the  research,  and  they  have  made  a  significant  contribution 
both  to  the  research  and  to  the  commercial  products  that  have 
grown  from  this  research.  Users  are  involved  in  two  major  ways: 

•  As  disabled  consultants  on  the  research  team,  where  they  act 
essentially  as  test  pilots  for  prototype  systems. 

•  By  the  traditional  user-centered  design  methodology  of  hav¬ 
ing  user  panels,  formal  case  studies,  and  individual  users  who 
assess  and  evaluate  the  prototypes  produced  as  part  of  the 
research. 

The contribution made by clinicians is also vital to the research,
and they are full members of the research team. Dundee’s Applied
Computing Department is also one of the few computing
departments  that  has  employed  speech  therapists,  nurses,  spe¬ 
cial  education  teachers,  linguists,  and  psychologists  (both  clini¬ 
cal  and  cognitive). 

User  Sensitive  Inclusive  Design 

Some  significant  changes  must  be  introduced  to  the  user-centered 
design  paradigm  if  users  with  disabilities  are  to  be  included,  and 
this  is  particularly  important  if  the  users  have  cognitive  impair¬ 
ment.  In  order  to  ensure  that  these  differences  are  fully  recog¬ 
nized  by  the  field,  the  title  “User  Sensitive  Inclusive  Design”  has 
been  suggested.  The  use  of  the  term  inclusive  rather  than  uni¬ 
versal  reflects  the  view  that  inclusiveness  is  a  more  achievable, 
and  in  many  situations,  appropriate  goal  than  universal  design  or 
design  for  all.  Sensitive  replaces  centered  to  underline  the  extra 
levels  of  difficulty  involved  when  the  range  of  functionality  and 
characteristics  of  the  user  groups  can  be  so  great  that  it  is  im¬ 
possible  in  any  meaningful  way  to  produce  a  small  representative 
sample  of  the  user  group,  nor  often  to  design  a  product  that 
truly  is  accessible  by  all  potential  users. 

Design  for  Dynamic  Diversity 

In  addition  to  the  aspects  of  user  sensitive  inclusive  design  de¬ 
scribed  above,  it  is  necessary  to  make  designers  fully  aware  of 
the  range  of  diversity  which  can  be  expected  with  cognitively 
impaired  people,  and  also  the  changing  nature  of  the  cognitive 
functioning  of  people.  Thus,  Gregor  and  Newell  (2000)  suggested 
that  this  be  drawn  particularly  to  the  attention  of  designers  by 
introducing  the  concept  of  “Designing  for  Dynamic  Diversity.” 
This  process,  described  above,  entails  recognition  that  people’s 
abilities  are  diverse  at  any  given  age  and  that  as  they  grow  older 
this  diversity  grows  dynamically;  it  also  involves  a  recognition 
that  any  given  individual’s  abilities  will  vary  according  to  factors 
such as mood, fatigue, blood sugar levels, and so on. Only by
taking on board the factors associated with Designing for Dynamic
Diversity will software design avoid producing artifacts which are
static and which have no, or very limited, means of adapting
to the changing needs of users as their abilities change.


As  has  been  seen  above,  metaphors  and  processes  in  use  at 
present  are  limited  in  meeting  the  needs  of  this  design  paradigm 
or  addressing  the  dynamic  nature  of  diversity.  New  processes 
and  practices  are  needed  to  address  the  design  issues;  aware¬ 
ness  raising  among  the  design,  economic  and  political  commu¬ 
nities  has  to  start;  and  research  is  needed  to  find  methods  to  pin 
down  this  moving  target. 

A  story-telling  metaphor.  In  addition,  researchers  need 
to consider how best to disseminate the concepts behind universal
usability and the results of user sensitive inclusive research.
User sensitive inclusive design needs to be an attitude
of mind rather than simply the mechanistic application of
“design for all” guidelines. This offers a further challenge to the
community. The dangers of using such studies to produce more extensive
guidelines have been referred to above, but it is important
that  the  results  of  user  sensitive  inclusive  design  are  made  avail¬ 
able  to  other  designers  and  researchers.  However,  it  is  too  early 
to  lay  down  principles  and  practices  that  must  be  followed  by  de¬ 
signers,  and  it  may  even  be  impossible  to  do  this  for  some  of  the 
contexts  and  environments  in  which  designers  work.  Thus,  it  is 
suggested  that  we  follow  a  story-telling  approach,  in  which  in¬ 
formation  about  accessibility  issues,  and  design  methods  which 
focus  on  accessibility  is  presented  in  narrative  form,  with  partic¬ 
ular  examples  to  illustrate  generic  principles.  This  is,  in  some 
sense,  an  extension  of  the  single  case-study  methodology.  This 
methodology  could  provide  very  useful  insights  to  designers  in  a 
form that they will find easy to assimilate and act upon. This will
assist in their education and will help them to design more accessible
products, and better products for everyone.

The  use  of  theater.  As  an  extension  of  the  story-telling 
metaphor,  the  research  group  in  Applied  Computing  at  Dundee 
University  has  investigated  the  use  of  dramatic  techniques  and 
theater  as  a  way  of  addressing  the  challenges  of  user  sensitive  de¬ 
sign.  For  example,  as  part  of  the  UTOPIA  (Usable  Technology  for 
Older  People:  Inclusive  and  Appropriate)  project,  they  worked  in 
collaboration  with  the  Foxtrot  Theater  Company  to  use  theater 
to  encourage  interaction  between  (older)  users  of  technology  and 
designers.  The  outcome  was  the  “UTOPIA  Trilogy,”  a  series  of 
short video plays addressing problems older people have in using
technology (Carmichael, A., Newell, Morgan, Dickinson, & Mival,
2005). The films dramatized some of the issues the researchers had
encountered during the project. Each was an amalgamation of many
real events, conversations, and observations, and was intended to
convey older people’s experiences with technology and the situations
they encounter. These videos were evaluated with a variety of audiences
including  academics,  practitioners,  software  engineers,  relevant 
groups  of  undergraduates,  and  older  people.  This  established  that 
the  videos  provided  a  useful  channel  for  communication  between 
users  of  technology  and  designers,  and  changed  the  perceptions 
of  both  students  and  more  mature  designers  of  IT  systems  and 
products  about  older  people’s  requirements.  A  similar  technique 
has  also  been  used  in  the  requirements  gathering  phase  for  an  IT 
system  designed  to  monitor  older  people  in  case  of  falls  at  home 
(Marquis-Faulkes,  McKenna,  Gregor,  &  Newell,  2003). 

This  research  showed  that  the  use  of  theater  can  be  a  very 
powerful method of encouraging dialogue between various professional
groups, particularly in a clinical environment, for keeping
a focus for discussions, and also for providing a channel for
communication between users of technology and designers.
The researchers’ view is that the success of this approach was
in  large  part  due  to  the  plays  being  narrative  based  rather  than 
having  a  pedagogic  style.  That  is,  they  illustrated  the  issues  in¬ 
volved  within  interesting  story  lines,  with  all  the  characteristics 
of  a  good  narrative — humor,  tension,  human  interest,  and  an¬ 
tagonists  and  protagonists.  In  addition,  the  quality  of  the  pro¬ 
duction,  having  been  produced  by  theater  professionals,  played 
a  major  part  in  the  success  of  the  venture. 

Newell,  Morgan,  Carmichael,  and  Gregor  (2006)  discussed 
the  various  ways  in  which  actors  and  theater  can  play  a  part  in 
the  design  process  for  human-computer  interfaces.  This  could 
provide  a  particularly  valuable  methodology  for  the  design 
process  when  the  target  users  have  cognitive  impairment  and 
thus  may  not  be  appropriate  for  including  within  standard  user- 
centered  design  methodologies. 


CONCLUSION 


Although  it  is  not  necessary  for  human  interface  designers 
whose systems may be used by people with cognitive impairment
to be fully versed in all aspects of cognition, it is important
for  them  to  have  some  background  knowledge  of  the  area.  They 
should  also  be  in  contact  with  experts  in  other  disciplines,  such 
as psychology, and have access to appropriate clinical knowledge.
In addition, the development of the concept of, and a
methodology for, user sensitive inclusive design and design for
dynamic diversity, together with the development of story-telling
methods for communicating results, will assist researchers in this
field and will provide mainstream engineers with an effective and efficient
way of including people with disabilities within the potential
user groups for their projects. If both of these can be achieved,
it will go some way towards providing appropriate technological
support for people with cognitive impairment.


INTRODUCTION


Computing  devices  are  becoming  smaller,  more  mobile,  more 
powerful,  more  ubiquitous,  and  less  expensive,  changing  the 
way people work, shop, communicate, and entertain themselves. As a result,
an inability to use these technologies can significantly limit
employment,  educational,  and  social  opportunities.  While  com¬ 
puting  devices  can  be  convenient  tools  for  traditional  computer 
users,  they  can  also  create  barriers  for  individuals  with  physi¬ 
cal,  cognitive,  or  perceptual  impairments.  For  example,  individ¬ 
uals  with  certain  types  of  tremors  may  experience  difficulty  us¬ 
ing  small  keyboards.  However,  a  design  process  that  considers 
the  impairments  of  users  can  turn  these  potential  barriers  into 
powerful  tools  that  increase  employment  opportunities,  pro¬ 
vide  enhanced  communication  capabilities,  and  enable  in¬ 
creased  independence  (Young,  Tumanon,  &  Sokal,  2000). 

While  cognitive,  perceptual,  and  physical  impairments  (PI) 
can  all  hinder  the  use  of  computing  technologies,  this  chapter 
focuses  on  specific  physical  impairments  that  contribute  to  dis¬ 
ability.  More  specifically,  we  focus  on  PI  that  may  hinder  an  in¬ 
dividual’s  ability  to  physically  interact  with  computing  tech¬ 
nologies.  Therefore,  we  do  not  address  PI  that  affect  the  lower 
body  or  PI  that  hinder  the  production  of  speech.  For  additional 
information  regarding  cognitive  or  perceptual  impairments,  see 
chapters  41,  43,  and  45.  Chapters  39  and  40  may  also  provide 
useful  insights  as  they  discuss  the  design  of  technologies  for 
older  adults  and  children,  two  groups  whose  physical  capabili¬ 
ties  may  require  special  attention. 

The  objectives  of  this  chapter  are  to: 

•  Provide  an  introduction  to  specific  PI  that  can  hinder  the  use 
of  traditional  computing  devices; 

•  Highlight  critical  characteristics  of  PI  that  must  be  considered 
when  designing  computing  systems; 

•  Discuss  the  relationship  between  PI  that  result  from  health 
conditions  and  those  that  result  from  environmental  or  con¬ 
textual  factors; 

•  Summarize  existing  HCI  research  involving  individuals  with  PI; 

•  Offer  observations  based  upon  the  literature  to  guide  future 
research  and  development  efforts. 

We believe that understanding PI will allow for the design of
more  effective  computing  technologies  that  lessen  or  even  elim¬ 
inate  the  associated  disabilities.  We  begin  by  presenting  a  set  of 
definitions for impairment, disability, and handicap that are offered
by the World Health Organization (WHO). By formally defining
these terms, and using the terms in a way that is consistent
with  these  definitions,  we  hope  to  eliminate  potential  ambiguity 
and  confusion.  Subsequently,  we  define  the  subset  of  PI  that  is  of 
primary  concern  when  interacting  with  computing  technologies. 
Through  this  definition,  we  refine  the  scope  of  this  chapter. 

Given these definitions, we proceed to discuss the relationship
between health conditions and PI. Understanding the underlying
health condition is critical since this often provides
valuable  insights  regarding  the  nature  of  the  resulting  impair¬ 
ments.  We  describe  common  health  conditions  (e.g.,  cerebral 
palsy,  spinal  cord  injuries)  associated  with  PI  that  affect  interac¬ 
tions  with  computing  devices.  We  identify  the  associated  PI,  im¬ 
portant  characteristics  of  these  PI,  and  any  additional  impair¬ 
ments  that  may  prove  critical  when  designing  computing 
technologies.  While  health  conditions  are  most  often  associated 
with  PI,  both  contextual  and  environmental  factors  can  also  hin¬ 
der  interactions  with  computing  devices.  Building  on  this  ob¬ 
servation,  we  briefly  discuss  the  relationship  among  the  envi¬ 
ronment,  context,  physical  impairments,  and  disabilities.  Most 
importantly,  we  highlight  similarities  and  differences  between 
those  PI  associated  with  health  conditions  and  difficulties  indi¬ 
viduals  may  experience  as  a  result  of  the  environment  or  the 
activities  in  which  they  are  engaged. 

We  conclude  by  reviewing  recent  research,  discussing  exist¬ 
ing  technologies,  highlighting  some  challenges  associated  with 
adopting  these  technologies,  and  offering  directions  for  addi¬ 
tional  research. 


DEFINING  IMPAIRMENT 


The  World  Health  Organization  published  the  “International 
Classification  of  Impairments,  Disabilities,  and  Handicaps” 
(ICIDH)  in  1980.  The  current  version  of  this  classification 
serves  as  a  foundation  for  this  discussion  (WHO,  2006).  The 
ICIDH  model  acknowledges  the  complex  relationships  that  ex¬ 
ist  among  health  conditions,  impairments,  disabilities,  and 
handicaps  while  highlighting  the  potentially  important  role  of 
both  the  context  and  environment  in  which  activities  are  taking 
place.  Figure  42.1  highlights  the  relationships  between  user 
capabilities  and  the  influence  of  both  health  conditions  and 
context.  Both  health  conditions  and  context  can  directly  affect 
the  users’  capabilities.  Often,  this  influence  is  gradual  rather 
than abrupt, causing an individual’s capabilities to move along
a continuum from fully capable to the point at which they are
no longer able to effectively perform a particular task. A specific
health condition may or may not negatively affect an individual’s
capabilities, depending on the environment in which the
individual is located and the tasks in which they are engaged.

FIGURE 42.1. Model illustrating the relationship between
health conditions, context, and an individual’s capabilities.
[Diagram labels: Health Condition; Context (Environment and Tasks).]

The following definitions are adapted from the ICIDH:

• Health condition: A disease, disorder, injury, or trauma. Examples
include spinal cord injuries, arthritis, cerebral palsy,
stroke, multiple sclerosis, and amyotrophic lateral sclerosis.

•  Impairment:  A  loss  or  abnormality  of  body  structure  or  func¬ 
tion.  Physical  impairments  include  a  loss  of  muscle  power,  re¬ 
duced  mobility  of  a  joint,  uncontrolled  muscle  activity,  and 
absence  of  a  limb.  Impairments  can  be  caused  by  health  con¬ 
ditions.  For  example,  an  individual  with  arthritis  (health  con¬ 
dition)  may  experience  weakness,  stiffness,  and  a  reduced 
range  of  motion  (impairments).  Impairments  can  also  be 
caused  by  context.  For  example,  a  cold  environment  can  limit 
joint  mobility. 

• Disability (activity limitations): Difficulties an individual may
have  in  executing  a  task  or  action.  Examples  include  difficulty 
communicating,  learning,  performing  tasks,  or  using  com¬ 
puting  technologies.  Disabilities  can  be  caused  by  impair¬ 
ments;  an  individual  has  a  disability;  disabilities  are  activity- 
specific.  For  example,  a  loss  of  muscle  power  in  both  arms 
(impairment)  may  interfere  with  an  individual’s  ability  to  use 
a  keyboard  and  mouse  (disability).  Context  can  also  lead  to 
disabilities.  For  example,  working  in  a  moving  vehicle  does 
not necessarily result in an impairment (i.e., the individual’s
capabilities  have  not  been  altered),  but  can  result  in  a  dis¬ 
ability  (e.g.,  the  individual  may  not  be  able  to  use  a  mouse 
due  to  the  movement  of  the  vehicle). 

• Handicap (participation restrictions): Problems an individual
may experience while involved in life situations. Handicaps
can be caused by disabilities (which originate with either
a  health  condition  or  context);  an  individual  experiences  a 
handicap;  handicaps  occur  at  a  social  level.  For  example,  an 
individual  may  have  difficulty  communicating  (disability) 
which  places  restrictions  on  her  ability  to  participate  in  edu¬ 
cational  and  work  activities,  participate  in  the  exchange  of 
information,  and  maintain  social  relationships  (handicaps). 

Given these definitions, the distinction between a disability
and  a  handicap  is,  in  many  ways,  dependent  upon  the  individ¬ 
ual.  If  an  individual  has  an  impairment  (e.g.,  difficulty  bending 
his  fingers)  that  makes  it  difficult  to  use  a  standard  keyboard, 
then  this  individual  has  a  disability  with  respect  to  the  operation 
of  the  keyboard.  However,  this  individual  only  experiences  a 
handicap  if  the  inability  to  use  the  keyboard  interferes  with  his 
normal  life  activities.  Our  focus  is  on  designing  computing  tech¬ 
nologies  to  lessen  or  eliminate  disabilities  that  may  result  from 
either  PI  or  context.  As  a  result,  the  potentially  subtle  differ¬ 
ences  between  disabilities  and  handicaps  are  not  important. 
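
The definitions above can be read as a simple chain: an impairment or a context may produce a disability for a specific activity, and that disability becomes a handicap only when it interferes with the individual's normal life activities. The following Python sketch is one illustrative way to express that chain. The class and function names, and the idea of listing the impairments and contexts that interfere with a task, are assumptions made for this example and are not part of the ICIDH.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Task:
    """An activity plus the conditions assumed (for illustration only)
    to interfere with it."""
    name: str
    hindered_by_impairments: FrozenSet[str]
    hindered_by_contexts: FrozenSet[str]


def has_disability(task: Task,
                   impairments: FrozenSet[str],
                   contexts: FrozenSet[str]) -> bool:
    """A disability is activity-specific: it exists when an impairment
    or the current context interferes with this particular task."""
    return bool((task.hindered_by_impairments & impairments)
                or (task.hindered_by_contexts & contexts))


def experiences_handicap(task: Task,
                         impairments: FrozenSet[str],
                         contexts: FrozenSet[str],
                         needed_in_daily_life: bool) -> bool:
    """A handicap arises only if the disability interferes with the
    individual's normal life activities."""
    return needed_in_daily_life and has_disability(task, impairments, contexts)


# Hypothetical tasks based on the examples in the text.
keyboard = Task(
    name="use a standard keyboard",
    hindered_by_impairments=frozenset({"difficulty bending fingers"}),
    hindered_by_contexts=frozenset(),
)
mouse = Task(
    name="use a mouse",
    hindered_by_impairments=frozenset({"loss of muscle power in both arms"}),
    hindered_by_contexts=frozenset({"moving vehicle"}),
)

# An impairment can lead to a disability for a given task.
print(has_disability(keyboard, frozenset({"difficulty bending fingers"}), frozenset()))
# Context alone, with no impairment, can also lead to a disability.
print(has_disability(mouse, frozenset(), frozenset({"moving vehicle"})))
# The same disability is a handicap only if the task matters in daily life.
print(experiences_handicap(keyboard, frozenset({"difficulty bending fingers"}),
                           frozenset(), needed_in_daily_life=False))
```

Because a design can change which impairments and contexts hinder a task, this framing is consistent with the chapter's focus on reducing disabilities rather than on separating them from handicaps.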


PHYSICAL  IMPAIRMENTS  AND  COMPUTING 


Impairments, as described by the WHO, can affect every structure
and function of the body. In this chapter, we focus on the
subset of PI that are most relevant in the context of physical
interactions with computing technologies. Impairments, even PI,
that  interfere  with  vision,  hearing,  and  other  activities  involved 
in  the  use  of  computing  devices  are  not  discussed  (see  chap¬ 
ters  41,  43,  and  45  for  coverage  of  some  of  these  issues).  Four 
categories  of  PI  are  particularly  relevant: 

•  Structural  deviations — Situations  in  which  there  is  significant 
deviation  or  loss  with  regard  to  anatomical  parts  of  the  body. 
Examples  include  the  partial  or  total  absence  of  a  body  part 
(e.g.,  missing  finger,  hand,  arm)  as  well  as  situations  in  which 
a  body  part  deviates  from  the  norm  in  either  position  or 
dimension. 

•  Mobility  (of  bone  and  joint)  functions — Mobility  functions 
address  an  individual’s  ability  to  move  a  joint  or  bone.  This  in¬ 
cludes  both  the  range  and  ease  of  movement.  For  example, 
an  individual  may  have  limited  range  of  motion  or  experience 
difficulty  bending  her  fingers  (e.g.,  arthritis). 

•  Muscle  power  functions — Muscle  power  functions  address  an 
individual’s  ability  to  generate  force  by  contracting  a  muscle 
or  muscle  group.  Paresis  (e.g.,  weakness)  refers  to  the  par¬ 
tial  loss  of  power  while  paralysis  refers  to  the  complete  loss  of 
power.  Both  can  be  caused  by  various  conditions  (e.g.,  brain 
injury,  spinal  cord  injuries).  Muscle  tone  and  endurance  func¬ 
tions  may  also  be  important,  but  tend  to  be  less  critical  for 
most  interactions  with  computing  technologies. 

• Movement functions — While additional movement functions
exist, we focus on an individual’s ability to control voluntary
and involuntary movements. Difficulty controlling movements
that involve a rapid change in direction (e.g., dysdiadochokinesia)
is an example of a problem with a voluntary
movement. Uncontrolled shaking or trembling of the hands
(e.g., essential tremors or tremors associated with Parkinson’s
Disease) is an example of an involuntary movement.

While there are many other types of PI, including the stability
of the joints, motor reflex functions, and gait pattern functions,
the four categories identified earlier account for the majority
of PI that hinder the use of computing technologies. As we
discuss various health conditions, it is useful to identify specific
PI that may occur as well as important characteristics of these
PI that may influence the design of alternative strategies or
technologies. While PI vary in many ways, the following four
dimensions are perhaps the most critical:

• PI can be permanent or temporary;

• PI may be continuous or intermittent. For example, individuals
with Multiple Sclerosis experience relapses and
remissions;

• PI may be progressive (getting more severe with time), regressive
(getting less severe with time), or static (no change
with time);

•  The  severity  of  a  PI  can  range  from  mild  to  severe.  In  other 
cases,  severity  is  variable. 

Temporary  PI  can  be  challenging  because  individuals  are  un¬ 
likely  to  invest  significant  time  or  effort  in  adapting  to  new  in¬ 
teraction  techniques.  If  a  PI  is  permanent,  individuals  will  de¬ 
velop  accommodation  or  adaptive  strategies  and  learning  new 
interaction techniques may be acceptable. Intermittent PI create
difficulties  because  alternative  interaction  techniques  must  be 
accessible  when  needed  without  hindering  interactions  when 
the  PI  is  not  present.  Progressive  and  regressive  PI  are  most  ef¬ 
fectively  addressed  by  accommodations  that  adapt  as  the  PI 
changes.  This  adaptation  may  be  automatic  or  user  directed. 
Adaptation is also useful when the severity of the PI is variable,
although the solution can be more complex since severity can change at
any  time.  PI  that  are  permanent,  continuous,  static,  and  of  a 
fixed  severity  are  perhaps  the  easiest  to  address.  Under  these 
conditions,  the  individuals  are  likely  to  adapt  to  their  impair¬ 
ment,  develop  their  own  accommodation  strategies,  and  be 
more  willing  to  invest  time  learning  new  interaction  techniques. 
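
The four dimensions and their design implications described above can be summarized in a small lookup. The Python sketch below is only an illustration of that reasoning: the type names, the `design_considerations` function, and the simplification of severity to "fixed" versus "variable" are assumptions introduced here, and the returned notes paraphrase the preceding paragraph rather than define a validated design procedure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Duration(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"


class Occurrence(Enum):
    CONTINUOUS = "continuous"
    INTERMITTENT = "intermittent"


class Trend(Enum):
    PROGRESSIVE = "progressive"
    REGRESSIVE = "regressive"
    STATIC = "static"


class Severity(Enum):
    FIXED = "fixed"        # assumption: severity does not fluctuate
    VARIABLE = "variable"  # severity can change at any time


@dataclass(frozen=True)
class PIProfile:
    """Hypothetical description of a physical impairment (PI)."""
    duration: Duration
    occurrence: Occurrence
    trend: Trend
    severity: Severity


def design_considerations(profile: PIProfile) -> List[str]:
    """Return design notes paraphrased from the discussion above."""
    notes = []
    if profile.duration is Duration.TEMPORARY:
        notes.append("Keep learning effort low; users are unlikely to "
                     "invest in new interaction techniques.")
    if profile.occurrence is Occurrence.INTERMITTENT:
        notes.append("Make alternatives available when needed without "
                     "hindering interaction when the PI is absent.")
    if profile.trend is not Trend.STATIC:
        notes.append("Adapt the accommodation as the PI changes "
                     "(automatically or user directed).")
    if profile.severity is Severity.VARIABLE:
        notes.append("Adapt to severity that can change at any time.")
    if not notes:
        notes.append("Users are likely to accept investing time in "
                     "learning new interaction techniques.")
    return notes


# Example: a PI that is permanent, intermittent, progressive, and variable.
example = PIProfile(Duration.PERMANENT, Occurrence.INTERMITTENT,
                    Trend.PROGRESSIVE, Severity.VARIABLE)
for note in design_considerations(example):
    print("-", note)
```

A profile that is permanent, continuous, static, and of fixed severity yields only the final note, which mirrors the observation that such PI are perhaps the easiest to address.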


HEALTH  CONDITION-INDUCED 
IMPAIRMENTS 


It  is  tempting  to  view  all  individuals  with  a  particular  PI  as  ben¬ 
efiting  equally  from  a  given  accommodation.  Unfortunately,  this 
perspective  can  result  in  solutions  that  fail  to  adequately  ad¬ 
dress  the  needs  of  the  individual.  Understanding  the  underly¬ 
ing  health  condition  provides  valuable  insights  into  the  nature 
of  the  associated  PI,  which  in  turn  provides  insights  into  the 
characteristics  of  the  accommodations  that  will  prove  most 
beneficial. 

We  briefly  describe  some  of  the  more  common  diseases,  dis¬ 
orders,  and  injuries  associated  with  PI  that  hinder  the  use  of 
computing  systems.  For  additional  information  regarding  any  of 
the  underlying  health  conditions,  we  refer  the  reader  to  the 
Merck  Manual  (Berkow,  1997)  or  the  references  included 
throughout  this  chapter.  Our  descriptions  assume  that  appro¬ 
priate  treatments  (e.g.,  medication,  surgery,  therapy)  have  al¬ 
ready  taken  place.  While  we  do  not  address  issues  involved  in 
perceptual,  speech,  or  cognitive  impairments,  we  do  highlight 
situations  in  which  these  impairments  may  be  present.  We  do 
this  for  two  reasons:  (a)  to  provide  additional  insights  into  diffi¬ 
culties  individuals  with  specific  health  conditions  may  face  (e.g., 
vision,  hearing,  or  cognition  difficulties),  and  (b)  to  highlight 
situations  in  which  specific  alternative  interaction  strategies  or 
technologies  may  not  be  appropriate  (e.g.,  speech  recognition). 

ALS  (Amyotrophic  Lateral  Sclerosis) 

ALS  is  also  known  as  Lou  Gehrig’s  Disease  (Taylor  &  Lieberman, 
1988).  ALS  is  a  progressive  disease  that  begins  with  weakness 
in the hands (less often the weakness begins in the feet). ALS
progresses  at  a  variable  rate,  but  50%  of  those  affected  die 
within  3  years  of  the  initial  symptoms  and  only  10%  live  beyond 
10  years.  As  the  disease  progresses,  the  weakness  spreads  to 
additional  muscles  and  becomes  more  severe  (eventually  lead¬ 
ing  to  paralysis  throughout  the  body).  With  time,  spasticity  oc¬ 
curs  resulting  in  muscles  becoming  tight.  Spasms  and  tremors 
can  also  occur.  ALS  only  affects  those  nerves  that  stimulate  mus¬ 
cle  action.  Cognitive  functions  and  sensation  remain  intact.  ALS 
is characterized primarily by muscle power PI (weakness progressing
to paralysis), but movement PI also occur (stiff muscles,
spasms, and tremors). The PI are permanent, continuous,
and progressive. Initially, the PI are mild, but with time they become
severe. Eventually, the muscles involved in producing
speech  are  affected. 

Arthritis 

Arthritis  refers  to  the  inflammation  of  a  joint  (Hicks,  1988; 
Schumacher,  1993).  The  two  most  common  forms  of  arthritis 
are  osteoarthritis  and  rheumatoid  arthritis.  Osteoarthritis  (OA) 
is  a  chronic  joint  disorder  that  can  cause  joint  pain  and  stiffness. 
OA  is  characterized  by  degeneration  of  joint  cartilage  and  adja¬ 
cent  bone,  is  the  most  common  joint  disorder,  and  is  one  of  the 
most  common  causes  of  physical  impairments  (Berkow,  1997). 
OA  is  not  an  inevitable  part  of  aging,  but  does  occur  more  of¬ 
ten  in  older  populations.  OA  typically  progresses  slowly  after 
symptoms  first  appear.  OA  can  result  in  significant  mobility  PI 
(reduced  range  of  motion,  weakness,  and  difficulty  with  repeti¬ 
tive  motions),  which  tend  to  be  permanent,  intermittent  or  con¬ 
tinuous,  progressive,  and  can  range  from  mild  to  severe. 

Rheumatoid  arthritis  (RA)  is  an  autoimmune  disease  that 
can  cause  swelling,  pain,  and  often  the  eventual  destruction  of 
the  joint’s  interior.  RA  is  characterized  by  symmetric  inflamma¬ 
tion  of  the  joints.  RA  typically  appears  first  in  people  between  25 
and  50  years  of  age,  affecting  1%  of  the  population.  Most  often, 
RA  starts  subtly,  progressing  at  a  highly  variable  rate.  In  some 
rare  cases,  RA  spontaneously  disappears  and  treatment  is  often 
successful,  but  RA  can  result  in  significant  mobility  PI  (reduced 
range  of  motion,  weakness,  and  difficulty  with  repetitive  mo¬ 
tions).  The  PI  tend  to  be  permanent,  continuous,  and  progres¬ 
sive.  The  severity  of  the  PI  is  typically  mild  initially,  but  in  10% 
of  the  cases  will  progress  to  severe. 

Brain  Injury 

Brain  Injury  (BI)  is  a  term  used  to  describe  a  collection  of  injuries 
(Horn  &  Zasler,  1996;  Rosenthal,  Griffith,  Bond,  &  Miller,  1990). 
Technically,  there  is  a  disconnection  between  the  cause  of  the 
injury  (e.g.,  penetration  of  the  skull  by  a  foreign  object,  or  the 
rapid  acceleration  or  deceleration  resulting  from  a  forceful  blow 
to the head or vehicular accident) and the injury itself. Common
head  injuries  include  skull  fractures,  cerebral  contusions  (bruises 
on  the  brain),  cerebral  lacerations  (torn  brain  tissue),  concus¬ 
sions  (brief  loss  of  consciousness  after  an  injury),  and  intracranial 
hematomas  (collection  of  blood  in  the  brain  or  between  the 
brain  and  skull).  Head  injuries  are  the  most  common  cause  of 
death  and  disabilities  for  individuals  under  50  years  of  age.  BI  are 
not  always  the  result  of  trauma.  For  example,  anoxia  can  also  re¬ 
sult  in  BI.  The  consequences  of  a  brain  injury  depend  upon  the 
area  of  the  brain  affected  and  the  severity  of  the  injury,  but  may 
include  death;  language,  vision,  and  motor-control  difficulties; 
and  periodic  headaches.  Other  parts  of  the  brain  often  assume 
the  responsibilities  of  the  damaged  portion  of  the  brain,  lessen¬ 
ing  the  severity  of  the  resulting  impairments.  This  ability  is  more 
prevalent  in  children,  where  the  brain  is  more  adept  at  shifting 
functions  to  different  parts  of  the  brain.  The  most  common  PI 
are  movement  oriented  (e.g.,  difficulty  controlling  muscles).  Af¬ 
ter  rehabilitation,  the  PI  are  considered  permanent,  continuous, 
and  static.  Severity  may  range  from  mild  to  severe. 

Cerebral Palsy

Cerebral  Palsy  (CP)  is  a  condition  that  results  from  brain  injury 
that  typically  occurs  before,  during,  or  shortly  after  birth.  CP  is 
not  a  disease  and  is  not  progressive  (Molnar,  1992).  Over  90%  of 
those  affected  with  CP  survive  to  adulthood.  There  are  four 
main  types  of  CP:  spastic,  choreoathetoid,  ataxic,  and  mixed. 
Spastic  CP  occurs  in  approximately  70%  of  those  individuals 
with  CP  and  is  characterized  by  both  movement  and  muscle 
power  PI  (stiff  muscles  and  weakness).  Choreoathetoid  CP  is 
characterized by movement PI (spontaneous, slow, uncontrolled
muscle movements; abrupt and jerky movements) and
occurs in 20% of those individuals with CP. Ataxic CP occurs in
approximately 10% of those individuals with CP and is characterized
by movement and muscle power PI (poor coordination,
weakness, trembling; rapid or fine movements can be difficult).
Mixed CP occurs in many individuals and is characterized by a
combination of two of the previously mentioned forms. Seizures
are also possible (most often with Ataxic CP). In all forms of CP,
the PI are permanent, continuous, static, and may range from
mild to severe. It is important to note that speech and intelligence
can be affected in all forms of CP.

Missing  Limbs  or  Digits  (Amelia  or  Amputation) 

Technically,  missing  limbs  or  digits  is  an  impairment,  not  a 
health  condition  (Banerjee,  1982).  However,  like  tremors,  it  is 
possible  for  limbs  or  digits  to  be  missing  without  any  associated 
health  condition  (e.g.,  congenital  absence  of  a  limb  or  digit  is 
not  considered  a  health  condition).  The  PI  associated  with  miss¬ 
ing  limbs  or  digits  is  permanent,  continuous,  static,  and  can  vary 
from  mild  to  severe  depending  on  the  use  and  effectiveness  of 
prosthetic  devices. 

Multiple  Sclerosis 

Multiple  Sclerosis  (MS)  is  a  disorder  in  which  the  nerve  fibers  as¬ 
sociated  with  the  eye,  brain,  and  spinal  cord  lose  patches  of 
myelin  (a  protective,  insulating  sheath)  (Ropper  &  Brown,  2005). 
MS  is  progressive,  but  no  single  pattern  typifies  the  progression. 
MS  results  in  numerous  symptoms,  including  a  variety  of  move¬ 
ment  and  muscle  power  PI  (clumsiness,  tremors,  stiff  muscles, 
weakness).  MS  often  results  in  periods  of  flare-ups  alternating 
with  periods  of  relatively  good  health  (e.g.,  relapses  and  remis¬ 
sions),  but  in  some  individuals  the  symptoms  become  more  se¬ 
vere  within  weeks  or  months.  As  flare-ups  become  more  fre¬ 
quent,  the  symptoms  become  more  severe  and  may  become 
continuous.  The  PI  are  permanent,  intermittent,  and  progressive. 
The  PI  may  be  mild  initially,  but  often  progress  to  become  severe. 
In  addition  to  multiple  PI,  MS  can  result  in  visual  impairments. 

Muscular  Dystrophy 

Muscular Dystrophy (MD) is a group of inherited muscle disorders
(Pidcock & Christensen, 1997). The two most common
forms of MD are Duchenne’s MD (affecting 20-30 of every
100,000 boys) and Becker’s MD (affecting three of every 100,000
boys). Duchenne’s MD is more severe, appearing between the
ages  of  three  and  seven;  most  children  must  use  a  wheelchair  by 
10  or  12  and  death  is  common  by  age  20.  Becker’s  MD  is  less 
severe,  with  symptoms  appearing  around  age  10;  very  few  chil¬ 
dren  are  confined  to  a  wheelchair  by  age  16,  and  more  than  90% 
live  beyond  age  20.  Both  forms  begin  with  muscle  power  PI 
(weakness)  that  spreads  and  becomes  more  severe.  This  is  of¬ 
ten  followed  by  mobility  PI  (joints  that  cannot  be  fully  extended 
due  to  contracted  muscles).  Some  PI  may  not  be  present  early 
(e.g.,  inability  to  fully  extend  the  elbows).  In  both  forms,  the  PI 
are  permanent,  continuous,  and  progressive.  Typically,  the  PI  are 
mild  initially  and  progressively  get  worse. 

Parkinson's  Disease 

Parkinson’s Disease (PD) is a progressive degenerative disorder
of the nervous system (Duvoisin, 1991). PD is characterized
by  degeneration  of  the  nerve  cells  in  the  basal  ganglia  which  re¬ 
sults  in  reduced  production  of  dopamine  (the  main  neuro¬ 
transmitter  of  the  basal  ganglia).  PD  affects  approximately  one 
in  250  individuals  over  the  age  of  40,  and  one  in  100  individuals 
over  the  age  of  65.  PD  typically  begins  with  mild  symptoms  and 
progresses  gradually.  PD  often  begins  with  resting  tremors 
(movement  PI).  As  the  disease  progresses,  initiating  a  move¬ 
ment  can  become  difficult,  muscles  become  stiff,  and  bending 
the  elbow  can  become  difficult  or  uncomfortable  (movement 
and  mobility  PI).  The  tremors  vary  dramatically.  Some  individu¬ 
als  never  develop  tremors.  For  others,  tremors  may  be  perma¬ 
nent  or  temporary;  continuous  or  intermittent;  progressive,  re¬ 
gressive,  or  static;  and  can  vary  from  mild  to  severe.  Other  PI, 
including  difficulty  initiating  movements  and  stiffness  of  mus¬ 
cles,  are  permanent,  continuous,  and  progressive.  They  are  mild 
initially,  but  progress  to  become  severe.  Still  other  PI,  such  as 
difficulty  bending  the  elbow,  may  never  occur.  Individuals  with 
PD  often  speak  softly  in  a  monotone  and  may  stutter. 

Repetitive  Stress  Injury 

Repetitive  Stress  Injuries  (RSI)  are  known  by  a  variety  of  names 
including  “cumulative  trauma  disorders”  and  “repetitive  trauma 
syndromes”  (Moore  &  Garg,  1992).  RSI  refers  to  a  collection  of 
disorders  or  injuries  that  are  believed  to  be  associated  with 
repetitive  activities,  but  the  precise  cause  is  still  a  subject  of  de¬ 
bate.  RSI  are  most  commonly  associated  with  the  wrists,  but  can 
affect  other  parts  of  the  body.  RSI  most  often  result  in  mobility 
PI  (reduced  range  of  motion,  weakness,  and  difficulty  with 
repetitive  motions)  that  are  permanent,  continuous  or  inter¬ 
mittent,  progressive,  and  can  vary  from  mild  to  severe.  A  clas¬ 
sic  example  is  carpal  tunnel  syndrome. 

Stroke 

A  stroke  is  defined  as  the  death  of  brain  tissue  resulting  from  a 
lack  of  blood  flow  and  insufficient  oxygen  to  the  brain  (Wade, 
Langton  Hewer,  Skilbeck,  &  David,  1985).  A  stroke  typically 
occurs when a blood vessel becomes blocked or bursts. The
symptoms  depend  upon  the  area  of  the  brain  that  is  affected, 
but  can  include  vision,  speech,  cognitive,  and  physical  impair¬ 
ments.  Since  other  parts  of  the  brain  can  assume  the  responsi¬ 
bilities  of  the  damaged  portion  of  the  brain,  it  is  possible  to 
lessen  the  severity  of  the  impairments  that  are  apparent  imme¬ 
diately  following  a  stroke.  Rehabilitation  services  are  critical  in 
this  respect.  The  most  common  PI  are  movement  oriented  (e.g., 
difficulty  controlling  muscles).  After  rehabilitation,  the  PI  can  be 
defined  as  permanent,  continuous,  and  static.  The  severity  of 
the PI may range from mild to severe.


Spinal  Cord  Injury 

Spinal  Cord  Injuries  (SCI)  occur  when  the  spinal  cord  (a  collec¬ 
tion  of  nerves  extending  from  the  base  of  the  brain  through  the 
spinal  column)  is  compressed,  cut,  damaged,  or  affected  by  dis¬ 
ease  (Stiens,  Goldstein,  Hammond,  &  Little,  1997).  The  spinal 
cord  contains  motor  nerves  (controlling  movement)  and  sen¬ 
sory  nerves  (providing  information  about  temperature,  pain, 
position,  and  touch).  The  consequences  of  an  SCI  depend  upon 
the  level  and  completeness  of  the  injury. 

The  level  of  an  injury  is  based  upon  the  nerves  that  are  af¬ 
fected.  The  spinal  column  is  divided  into  four  areas:  cervical 
(neck),  thoracic  (chest),  lumbar  (lower  back),  and  sacral  (tail 
bone). There are seven cervical vertebrae, numbered C1
through C7 from top to bottom. There are 12 vertebrae in the
thoracic region (T1-T12), five in the lumbar region (L1-L5), and
five in the sacral region. The level of an SCI refers to the location
of the damaged nerves. In the cervical region, injuries are labeled
based upon the vertebra immediately below the damaged
nerves (e.g., level C1 refers to damage to the nerves just
above vertebra C1). Damage to nerves just below C7 and above
T1 is referred to as “level C8.” In all other regions, injuries are
labeled  based  upon  the  vertebrae  immediately  above  the  dam¬ 
aged  nerve  (level  T3  refers  to  injuries  to  the  nerves  between  T3 
and  T4).  In  general,  injuries  higher  on  the  spinal  cord  will  re¬ 
sult  in  greater  impairment.  Table  42.1  summarizes  the  possible 
consequences  of  high-level  SCI. 
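
The labeling convention just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
name `sci_level` and the choice to describe an injury site by the vertebra
immediately above the damaged nerves (or `None` when the damage lies above
C1) are hypothetical and are not part of any clinical standard. The sacral
region is omitted for brevity.

```python
# Vertebrae in top-to-bottom order, as listed in the text (sacral region omitted).
VERTEBRAE = (
    [f"C{i}" for i in range(1, 8)]      # C1-C7
    + [f"T{i}" for i in range(1, 13)]   # T1-T12
    + [f"L{i}" for i in range(1, 6)]    # L1-L5
)


def sci_level(vertebra_above):
    """Return the level label for nerves damaged just below `vertebra_above`.

    Pass None when the damaged nerves lie above C1 (hypothetical convention).
    """
    if vertebra_above is None:
        return "C1"  # nerves just above vertebra C1
    if vertebra_above not in VERTEBRAE:
        raise ValueError(f"unknown vertebra: {vertebra_above!r}")
    if vertebra_above == "C7":
        return "C8"  # special case: between C7 and T1
    if vertebra_above.startswith("C"):
        # Cervical region: label by the vertebra immediately below the nerves.
        return VERTEBRAE[VERTEBRAE.index(vertebra_above) + 1]
    # All other regions: label by the vertebra immediately above the nerves.
    return vertebra_above
```

For example, damage to the nerves between T3 and T4 is passed as `"T3"` and
labeled T3, while damage between C3 and C4 is passed as `"C3"` and labeled C4.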

The completeness of an injury is often assessed using a scale
defined by the American Spinal Injury Association (ASIA, 2001).
While there are additional details, and assessments must be
completed by trained professionals, the following summaries
provide sufficient detail for our purposes. Table 42.2 provides a
brief description of the ASIA scores.

TABLE 42.1 Relationship Between the Level of a Spinal Cord
Injury and the Resulting Effects on Muscles

Level of Injury   Effect
C1 to C5          Paralysis of muscles used for breathing, controlling
                  the arms, and controlling the legs.
C5 or C6          Paralysis of the legs. Some ability to flex the arms
                  remains.
C6 or C7          Paralysis of the legs. Paralysis of part of the wrists
                  and hands. Much of the ability to move the
                  shoulders and to bend the elbows remains.
C8 or T1          Legs and trunk paralyzed. Hands paralyzed. Arms
                  remain relatively unaffected.

SCI  result  in  muscle  power  PI  with  the  level  of  the  injury  de¬ 
termining  which  muscles  are  affected  and  the  completeness  of 
the  injury  determining  how  much  muscle  power  is  lost.  Injuries 
with  ASIA  scores  of  A  or  B  result  in  a  complete  loss  of  muscle 
power  (e.g.,  paralysis)  in  the  affected  muscles.  Injuries  assigned 
ASIA  scores  of  C  or  D  are  more  difficult  to  describe  since  the 
amount  of  residual  muscle  power  can  be  highly  variable  (e.g., 
weakness  or  paresis).  PI  associated  with  SCI  are  considered  per¬ 
manent,  continuous,  and  static,  with  the  severity  of  the  PI  rang¬ 
ing  from  mild  to  severe. 
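
The relationship between ASIA score and muscle power PI can be written as a
simple lookup. The sketch below mirrors only the simplified summaries given
in this chapter (Table 42.2 and the paragraph above); it is not a substitute
for an assessment by a trained professional, and the names
`ASIA_MOTOR_SUMMARY` and `muscle_power_pi` are hypothetical.

```python
# Simplified summaries taken from Table 42.2; not the full ASIA assessment.
ASIA_MOTOR_SUMMARY = {
    "A": "paralysis",  # complete: no residual motor or sensory function
    "B": "paralysis",  # motor complete, sensory incomplete
    "C": "weakness",   # most muscles below 60% of normal strength
    "D": "weakness",   # most muscles at or above 60% of normal strength
    "E": "none",       # normal motor and sensory function
}


def muscle_power_pi(asia_score):
    """Return the muscle power PI expected in the affected muscles."""
    key = asia_score.strip().upper()
    if key not in ASIA_MOTOR_SUMMARY:
        raise ValueError(f"unknown ASIA score: {asia_score!r}")
    return ASIA_MOTOR_SUMMARY[key]
```

Which muscles are affected still depends on the level of the injury; the
lookup covers only how much muscle power is lost.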


Tremors 

A  tremor  is  an  involuntary  movement  (Hallett,  1991).  Tremors 
are produced by involuntary muscle activity and are often described
as rhythmic shaking movements. Some tremors occur
when muscles are in use (action tremors) while others occur
when the muscles are resting (resting tremors). Intention
tremors  occur  when  an  individual  makes  a  purposeful  motion. 
Tremors  are  a  common  consequence  of  various  health  condi¬ 
tions  (e.g.,  MS  or  stroke).  In  these  situations,  the  tremors  are 
described  under  the  appropriate  health  condition  (e.g.,  see 
MS  or  stroke). 

Essential  tremors  usually  begin  in  early  adulthood  and  slowly 
become  more  obvious  (essential  tremors  that  begin  in  older 
adults  are  referred  to  as  senile  tremors).  Essential  tremors  typi¬ 
cally  stop  when  the  arms  (or  legs)  are  at  rest,  but  become  more 
obvious  when  the  arms  are  held  away  from  the  body  or  in  awk¬ 
ward  positions.  Essential  tremors  are  relatively  fast  but  result  in 
little  movement.  The  vocal  cords  can  be  affected  by  essential 
tremors,  resulting  in  inconsistent  speech.  Essential  tremors  are 
permanent,  intermittent,  and  can  be  progressive.  Essential 
tremors  are  typically  mild  initially,  but  may  progress  to  be  severe 
and  can  be  variable.  While  tremors  are  technically  a  movement 
PI,  essential  tremors  have  no  known  cause,  which  is  why  they 
are  included  in  this  list  of  health  conditions. 


TABLE 42.2 Residual Motor and Sensory Function Associated
with Each ASIA Score

ASIA Score/
Completeness   Description of Motor and Sensory Function
A              Complete. No residual motor or sensory function
               below the level of the injury.
B              Motor complete, sensory incomplete. There is no
               residual motor function, but some sensory
               capabilities remain intact.
C              Motor and sensory incomplete. Most muscles
               below the level of injury are below 60% of
               normal strength.
D              Motor and sensory incomplete. Most muscles
               below the level of injury are at or above 60% of
               normal strength.
E              Normal motor and sensory function.

Locked-In Syndrome

Locked-in  syndrome  is  a  rare  condition  in  which  a  person  is  con¬ 
scious  and  able  to  think  but  is  so  severely  paralyzed  that  com¬ 
munication  is  possible  only  by  opening  and  closing  the  eyes  in 
response  to  questions  (Berkow,  1997).  Many  diseases  or  injuries 
may result in locked-in syndrome, including ALS, subcortical hemorrhage,
polyneuropathy, Guillain-Barré syndrome, and brain
injuries.

Many  cases  of  locked-in  syndrome  are  the  result  of  various 
progressive  health  conditions,  such  as  ALS.  In  these  cases,  indi¬ 
viduals  gradually  lose  their  physical  capabilities,  usually  starting 
from  the  lower  limbs  and  progressing  to  the  body  and  arms. 
Eventually,  speech  and  movement  of  the  neck  are  affected. 
Locked-in  syndrome  results  in  muscle  power  and  movement  PI 
including  paralysis  of  most  muscles,  stiff  muscles,  and  tremors. 
The  PI  are  permanent,  continuous,  and  severe.  The  initial  sever¬ 
ity  may  be  less  severe  depending  on  the  underlying  disease  or 
injury,  but  by  the  time  an  individual  is  considered  locked-in, 
muscle  power  PI  are  always  severe. 

Summary 

While  the  examples  listed  are  some  of  the  most  common  health 
conditions  that  result  in  PI,  this  is  by  no  means  a  complete  list. 
Table  42.3  provides  a  summary  of  the  information  presented 
earlier,  including  both  the  PI  and  any  additional  impairments 
associated with each health condition that may influence
the  design  of  alternative  strategies  or  technologies. 

Associated  PI  are  listed  as  well  as  additional  impairments  as¬ 
sociated  with  the  health  condition  that  could  interfere  with  the 
use  of  standard  computing  technologies. 
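
The four characteristics used throughout this section (duration, frequency,
course, and severity) lend themselves to a small record type. The sketch
below is illustrative only: the class name `PIProfile`, its field names, and
the helper `conditions_with` are hypothetical, and the data are a few rows
transcribed from Table 42.3 rather than the complete table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIProfile:
    condition: str
    pi_type: str      # "muscle power", "movement", or "mobility"
    duration: str     # "permanent" or "temporary"
    frequency: tuple  # any of "continuous", "intermittent"
    course: tuple     # any of "progressive", "regressive", "static"
    severity: str     # free-text range, e.g., "mild to severe"


# A few rows transcribed from Table 42.3 (not the complete table).
PROFILES = [
    PIProfile("ALS", "muscle power", "permanent",
              ("continuous",), ("progressive",), "mild progressing to severe"),
    PIProfile("Osteoarthritis", "mobility", "permanent",
              ("intermittent", "continuous"), ("progressive",), "mild to severe"),
    PIProfile("Multiple sclerosis", "movement", "permanent",
              ("intermittent",), ("progressive",), "mild to severe"),
    PIProfile("Spinal cord injuries", "muscle power", "permanent",
              ("continuous",), ("static",), "mild to severe"),
]


def conditions_with(profiles, course=None, frequency=None):
    """Return condition names whose PI match the requested characteristics."""
    matches = []
    for profile in profiles:
        if course is not None and course not in profile.course:
            continue
        if frequency is not None and frequency not in profile.frequency:
            continue
        matches.append(profile.condition)
    return matches
```

A designer could use such a structure to ask, for instance, which conditions
involve intermittent PI and may therefore call for interfaces that adapt over
time: `conditions_with(PROFILES, frequency="intermittent")`.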


SITUATIONALLY-INDUCED 
IMPAIRMENTS  AND  DISABILITIES 


As  Fig.  42.1  indicates,  both  the  environment  in  which  an  indi¬ 
vidual  is  working  and  the  activities  in  which  she  is  engaged  can 
contribute  to  the  existence  of  impairments  that  interfere  with 
the  use  of  computing  technologies.  Environmental  characteris¬ 
tics  such  as  lighting,  noise,  vibration,  and  temperature  can  all  af¬ 
fect  an  individual’s  ability  to  interact  with  computing  technolo¬ 
gies.  Similarly,  the  user  may  be  engaged  in  multiple  activities, 
each  placing  demands  on  her  cognitive,  perceptual,  or  physical 
capabilities,  resulting  in  an  overload  such  that  one  or  more  of 
the  tasks  cannot  be  completed  optimally.  Stress  can  also  affect 
an  individual’s  ability  to  interact  with  computers. 

In some situations, an individual’s context creates temporary
impairments. In other situations, there is no impairment, but a
temporary disability does exist. When the individual’s physical
capabilities are altered, the result is a temporary PI. For example,
extended exposure to cold temperatures can make it difficult
for an individual to bend his fingers, creating a temporary
mobility impairment (similar to arthritis) that can hinder the use
of information technologies. In contrast, when the individual’s
physical capabilities are not altered (e.g., a vibrating environment),
but are simply rendered inadequate for the task, the result
is a temporary disability. An example is an environment that
is vibrating (e.g., a moving vehicle). Vibration does not alter any
of the body’s structures or functions so there is no impairment,
but it can make it difficult to use a stylus to enter text using gesture
recognition, so the individual does experience a disability.
Whether the context results in an impairment or disability, our
goal is to minimize the negative impact on the user’s computing
activity.
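
The distinction drawn above between a temporary impairment and a temporary
disability reduces to two questions, which the following sketch makes
explicit. The function name and its two boolean parameters are hypothetical
simplifications introduced for illustration; real contexts are rarely this
clear-cut.

```python
def classify_situational_effect(capabilities_altered, adequate_for_task):
    """Classify the effect of a context on a user, following the text above.

    capabilities_altered: the context changes the user's physical
        capabilities (e.g., cold fingers that are hard to bend).
    adequate_for_task: the user's capabilities, as they stand in this
        context, are sufficient for the task.
    """
    if capabilities_altered:
        return "temporary impairment"
    if not adequate_for_task:
        # Capabilities are unchanged but rendered inadequate (e.g., vibration).
        return "temporary disability"
    return "no impairment or disability"
```

Under these assumptions, extended exposure to cold corresponds to
`classify_situational_effect(True, False)`, while using a stylus in a moving
vehicle corresponds to `classify_situational_effect(False, False)`.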

While extensive research has studied dual-task scenarios, in
which  participants  were  required  to  attend  to  two  tasks  at  the 
same  time,  these  studies  tend  to  highlight  decreases  in  cogni¬ 
tive  performance  rather  than  in  physical  performance.  At  the 
same  time,  individuals  are  finding  themselves  in  situations  in 
which  they  are  engaged  in  secondary  computing  tasks  with  in¬ 
creasing  frequency.  For  example,  a  paramedic’s  hands  may  be 
busy  providing  medical  care  while  they  engage  in  a  secondary 
computing  activity  to  take  notes  and  complete  required  forms. 
Simply  walking  can  also  make  the  physical  activities  required  to 
interact with a mobile computing device more complex (e.g.,
Price, Lin, Feng, Goldman, Sears, & Jacko, 2006).

While many factors can adversely affect an individual’s ability
to perform the physical actions required to interact with
computing  technologies,  and  some  studies  have  been  con¬ 
ducted,  these  relationships  have  yet  to  be  comprehensively  in¬ 
vestigated  and  reported  within  the  HCI  literature.  At  the  same 
time,  the  fundamental  characteristics  of  these  situationally  in¬ 
duced  impairments  and  disabilities  (SIID)  are  clear.  Most  im¬ 
portantly,  unlike  impairments  associated  with  health  conditions 
which  tend  to  be  permanent,  SIID  are  temporary.  As  a  result, 
individuals  experiencing  SIID  are  less  likely  to  develop  accom¬ 
modation  strategies  and  are  therefore  more  likely  to  benefit 
from  technology-based  accommodations.  Since  SIID  are  tem¬ 
porary,  they  are  also  intermittent.  SIID  may  be  progressive,  re¬ 
gressive,  or  static  depending  on  the  conditions  and  can  range 
from  mild  to  severe  or  could  be  variable.  Effectively  addressing 
the  highly  variable  nature  of  SIID  is  an  interesting,  and  chal¬ 
lenging,  area  in  need  of  additional  research. 


HCI  RESEARCH  AND  PI 


Researchers have approached the issue of designing computing
technologies for individuals with PI from a variety of perspectives.
Many projects were clearly motivated by a desire to address
the needs of individuals with disabilities, and a subset of
these projects focuses on specific health conditions or PI. For example,
researchers have investigated the development of a new
interaction alternative for individuals with cerebral palsy (Roy,
Panayi, Erenshteyn, Foulds, & Fawcus, 1994), cursor control for
individuals with spinal cord injuries (Casali, 1992), and text entry
for individuals with spinal cord injuries (Sears, Karat, Oseitutu,
Karimullah, & Feng, 2001). Other projects are motivated
by specific impairments or disabilities regardless of the underlying
cause. For example, researchers have investigated the effect
neck range of motion has on the use of head controls (LoPresti,
Brienza, Angelo, Gilbertson, & Sakai, 2000), and the use
of force feedback for individuals with various movement and
muscle power PI that affect the hands (Keates, Langdon, Clarkson,
& Robinson, 2000). Numerous articles that investigate the
needs of individuals with specific health conditions or PI are
discussed later.

TABLE 42.3 Common Health Conditions that can Result in PI that
Hinder Interactions with Standard Computing Technologies

Each entry lists the health condition, the common PI and their
characteristics, and any additional impairments.

ALS
  Muscle power: Weakness often begins in hands or feet. With time,
    loss of muscle power spreads and becomes more severe. Permanent,
    continuous, progressive, mild progressing to severe (paralysis).
  Movement: Muscles can become stiff. Spasms and tremors can occur.
    Permanent, continuous, progressive, mild to severe.
  Additional impairments: In time, muscles throughout the body can be
    affected, including those involved in producing speech.

Arthritis: Osteoarthritis
  Mobility: Reduced range of motion, weakness, difficulty with
    flexibility of joints and movement. Permanent, intermittent or
    continuous, progressive, mild to severe.
  Additional impairments: None.

Arthritis: Rheumatoid arthritis
  Mobility: Reduced range of motion, weakness, difficulty with
    flexibility of joints and movement. Permanent, continuous,
    progressive, mild to severe.
  Additional impairments: None.

Brain injury
  Movement: A wide range of movement-oriented difficulties are possible
    depending on the location and severity of the injury. Permanent,
    continuous, static, mild to severe.
  Additional impairments: Speech, vision, and cognitive impairments
    are possible.

Cerebral palsy: Spastic
  Muscle power: Weakness. Permanent, continuous, static, mild to
    severe. Paralysis does not occur.
  Movement: Stiff muscles. Permanent, continuous, static, mild to
    severe.
  Additional impairments: Speech and intelligence can be affected.

Cerebral palsy: Choreoathetoid
  Movement: Spontaneous, slow, uncontrolled muscle movements as well
    as jerky, abrupt movements. Permanent, continuous, static, mild to
    severe.
  Additional impairments: Speech and intelligence can be affected.

Cerebral palsy: Ataxic
  Muscle power: Weakness. Permanent, continuous, static, mild to
    severe. Paralysis does not occur.
  Movement: Poor coordination, trembling, difficulty with rapid or fine
    movements. Permanent, continuous, static, mild to severe.
  Additional impairments: Speech and intelligence can be affected.

Cerebral palsy: Mixed
  Any combination of two forms listed earlier.

Locked-in syndrome
  Muscle power: Locked-in syndrome involves complete loss of power
    in all muscles except those that control the eyes. Permanent,
    static, severe.
  Movement: Muscles can become stiff. Spasms and tremors can occur.
    Permanent, static, severe.
  Additional impairments: Speech is affected.

Missing limbs or digits
  Mobility. Permanent, continuous, static, mild to severe.
  Additional impairments: None.

Multiple sclerosis
  Muscle power: Weakness. Permanent, intermittent, progressive, mild
    to severe. Paralysis does not occur.
  Movement: Clumsiness, tremors, stiff muscles. Permanent,
    intermittent, progressive, mild to severe.
  Additional impairments: Vision can be affected.

Muscular dystrophy
  Muscle power: Weakness that spreads and becomes more severe.
    Permanent, continuous, progressive, mild to severe. Paralysis does
    not occur.
  Mobility: Joints that cannot be fully extended. Permanent,
    continuous, progressive, mild to severe.
  Additional impairments: None.

Parkinson's disease
  Movement: Tremors (pill rolling) of one side are often the first
    sign. Permanent or temporary; continuous or intermittent;
    progressive, regressive, or static; mild to severe. Difficulty
    initiating movements and stiff muscles. Permanent, continuous,
    progressive, mild to severe.
  Mobility: Difficulty bending the elbow. Permanent, continuous,
    progressive, non-existent to severe.
  Additional impairments: Speech can be affected. Cognitive
    impairments are possible.

Repetitive stress injuries
  Mobility: Reduced range of motion, weakness, difficulty with
    repetitive motions. Permanent, continuous or intermittent,
    progressive, mild to severe.
  Additional impairments: None.

Stroke
  Movement: A wide range of movement-oriented difficulties are possible
    depending on the location (in the brain) and severity of the
    injury. Stroke does not always result in movement disorders.
    Permanent, continuous, static, mild to severe.
  Additional impairments: Speech, vision, and cognitive impairments
    are possible.

Spinal cord injuries
  Muscle power: The specific muscles affected are determined by the
    level of the injury. How much power is lost is determined by the
    completeness of the injury. Permanent, continuous, static, mild to
    severe.
  Additional impairments: None.

Tremors
  Movement: Tremors often affect the hands. Permanent, intermittent,
    progressive or static, mild to severe or variable.
  Additional impairments: Speech can be affected.

Another group of projects addresses issues associated with PI,
but does not identify any specific PI or health conditions as the
motivation for the research. For example, several articles discuss
communication issues for individuals “with severe speech and
motor impairments” (McCoy, Demasco, Jones, Pennington, Vanderheyden,
& Zickus, 1994), while others explore speech-based
interfaces for “motor-control challenged computer users” (Manaris
& Harkreader, 1998), new techniques for eye tracking that
“could be useful to some people with physical disabilities” (Patmore
& Knapp, 1998), and the use of word-prediction for text
entry by “people with severe physical disabilities” (Garay-Vitoria
& Gonzalez-Abascal, 1997). Multiple articles that fit into this
category will also be discussed.

The  goals  for  this  literature  review  include  (a)  summarizing 
the  current  state  of  our  knowledge  with  respect  to  designing 
computing  technologies  for  individuals  with  PI,  (b)  highlighting 
numerous  unsolved  problems  in  need  of  additional  research, 
and  (c)  providing  pointers  to  related  articles  that  are  not  dis¬ 
cussed.  While  there  are  many  ways  to  organize  this  review,  we 
group  articles  by  health  condition  or  PI  whenever  possible.  Ar¬ 
ticles  that  focus  on  a  specific  health  condition  or  PI  are  dis¬ 
cussed,  followed  by  articles  that  address  less-precisely  defined 
PI  (e.g.,  various  impairments  that  affect  the  hands,  regardless 
of  the  underlying  health  condition).  Next,  we  present  several  ar¬ 
ticles  that  have  a  clear  association  with  PI,  but  do  not  discuss  any 
specific  PI  (e.g.,  articles  that  present  a  technology  specifically 
in  the  context  of  individuals  with  PI,  but  do  not  identify  the  PI  or 
health  condition).  We  conclude  by  reviewing  articles  that  ex¬ 
plore  the  adoption  or  abandonment  of  assistive  technologies. 
Occasionally,  articles  that  focus  on  a  specific  health  condition  are 
included  in  other  sections  due  to  the  PI  or  technology  discussed 
(e.g.,  Doherty,  Cockton,  Bloor,  &  Benigno,  2000).  Throughout 
this  section,  we  use  terminology  that  is  consistent  with  the  def¬ 
initions  presented  earlier.  As  a  result,  our  terminology  may  dif¬ 
fer  from  that  used  in  the  original  articles. 

Before  discussing  specific  articles,  we  must  acknowledge 
that  several  related  bodies  of  literature  are  not  discussed.  First, 
we  do  not  discuss  the  articles  that  simply  mention  individuals 
with  disabilities  as  one  of  the  possible  groups  of  users  who  may 
benefit  from  the  results  of  the  research.  For  example,  many  ar¬ 
ticles  discussing  multimodal  interfaces  mention  potential  ben¬ 
efits  for  individuals  with  disabilities  at  some  point,  but  few  are 
designed  for  or  evaluated  using  individuals  with  disabilities  (see 
chapter  21  for  an  introduction  to  multimodal  interactions  and 
pointers  to  the  relevant  research).  Second,  we  do  not  discuss  ar¬ 
ticles  that  focus  on  the  more  general  problem  of  designing  ac¬ 
cessible  computer  systems  without  a  significant  focus  on  PI.  For 
articles  of  this  nature,  the  reader  is  referred  to  the  proceedings 
of  the  ASSETS  conferences  (ASSETS,  2006),  CUU  conference 
(CUU,  2003),  and  the  ERCIM  Workshops  on  User  Interfaces  for 
All  (ERCIM,  2006).  Stephanidis  (2001)  also  discussed  these  is¬ 
sues.  We  encourage  the  reader  to  review  this  literature  as  it  of¬ 
ten  provides  valuable  insights  into  the  methods  that  can  be 
used  to  more  effectively  integrate  accessibility  concerns  into  the 
development process. Finally, we do not discuss the extensive
literature available in the rehabilitation and assistive-technology
communities. We do encourage the reader to review the
contributions  of  these  communities.  More  specifically,  we  refer 
the  reader  to  the  proceedings  of  the  annual  Closing  the  Gap 
conferences  (CTG,  2006),  Rehabilitation  Engineering  and  Assis¬ 
tive  Technology  Society  of  North  America  conferences  (RESNA, 
2006), and the CSUN conferences (CSUN, 2006). The IEEE Transactions
on Rehabilitation Engineering often includes articles
discussing the technical details of new devices, and various
books and Internet sites can provide valuable insights (e.g., Edwards,
1995; Gray, Quatrano, & Liberman, 1998; King, 1999;
Stephanidis, 2001; Trace Center, 2007).

Spinal  Cord  Injuries  (SCI) 

SCI  are  one  of  the  most  frequently  studied  health  conditions  in 
the  HCI  community.  Given  the  relationship  between  the  level  of 
injury  and  the  resulting  impairments,  studies  typically  include  in¬ 
dividuals  with  injuries  that  are  no  lower  than  C7.  The  following 
articles  discuss  various  topics  including  cursor  control,  validating 
keystroke-level  models,  a  brain-controlled  interface,  and  the  use 
of  speech  recognition  for  communication-oriented  activities. 

Casali  (1992)  investigated  the  efficacy  of  five  cursor-control 
devices  when  used  by  individuals  with  SCI.  Twenty  individuals 
with  SCI,  and  ten  individuals  with  no  PI,  participated  in  this 
study.  Participants  with  SCI  were  divided  into  two  groups  using 
a  custom  assessment  test  (see  Casali  &  Chase,  1993;  Casali, 
1995)  that  evaluates  upper-extremity  motor  skills.  As  a  result, 
three  groups  of  participants  were  discussed:  low  motor  skills, 
high  motor  skills,  and  participants  with  no  PI.  Each  participant 
used  a  trackball,  mouse,  tablet,  and  joystick  in  addition  to  the 
cursor  keys  to  complete  a  series  of  tasks  that  required  either 
selecting  or  dragging  a  target.  Participants  in  the  low-motor-skill 
group  took  longer  to  complete  the  tasks  than  the  individuals 
with  no  PI,  but  the  difference  between  participants  in  the  high- 
motor-skill  group  and  the  participants  with  no  PI  was  not  sig¬ 
nificant.  For  all  three  groups,  the  mouse,  trackball,  and  tablet  re¬ 
sulted  in  the  shortest  task-completion  times,  the  cursor  keys 
took longer, and the joystick was the slowest device. Target size
became more important as motor skills decreased. Interestingly, the
authors  noted  that  all  participants  were  able  to  complete  the 
tasks  with  all  five  devices  with  minimal  customization.  However, 
individuals in the low- and high-motor-skill groups experienced
substantial  difficulty  when  using  the  mouse,  especially  when 
tasks  required  holding  a  button  down  while  moving  the  mouse. 

Koester  and  Levine  (1994)  investigated  the  efficacy  of 
keystroke-level  models  for  predicting  performance  improve¬ 
ments  which  individuals  with  SCI  would  experience  when  using 
word  prediction  software.  The  authors  described  a  study  that 
provides  insights  into  the  benefits  word  prediction  software  can 
provide and the effectiveness of keystroke-level models for predicting
performance by individuals with PI. Eight individuals
with  no  PI  and  six  individuals  with  high-level  SCI  participated 
in  the  study.  The  individuals  with  SCI  used  their  normal  method 
of  interacting  with  a  keyboard  (two  used  a  mouthstick,  four 
used  hand  splints).  Participants  completed  the  tasks  with  and 
without  word  prediction  to  allow  performance  improvements 
to  be  evaluated. 
Two models were explored: (a) a generic model based upon
parameters  derived  from  the  existing  literature  combined  with 
new  data  from  individuals  with  no  PI,  and  (b)  a  user-driven 
model  based  upon  parameters  derived  from  the  data  collected 
in  the  current  study.  The  user-driven  models  resulted  in  an  av¬ 
erage  error  rate  of  6%  while  the  generic  models  resulted  in  er¬ 
ror  rates  of  53%  and  11%  for  the  participants  with  and  without 
PI,  respectively.  Perhaps  more  interesting  were  the  changes  in 
data  entry  rates  observed  for  the  two  groups  of  participants. 
Overall,  the  models  predicted  modest  improvements  in  perfor¬ 
mance  ranging  from  slight  decreases  to  improvements  of  over 
40%.  Results  for  participants  without  PI  matched  the  predic¬ 
tions,  but  participants  with  PI  showed  a  consistent,  and  large, 
decrease  in  data-entry  rates  when  using  word  prediction  (rang¬ 
ing  from  approximately  20-50%  reductions  in  data-entry  rates). 
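
The trade-off that a keystroke-level model captures here can be sketched in a few lines. This is a simplified illustration, not the model used by Koester and Levine (1994): the function names, the single search-time term, and all parameter values are assumptions chosen only to show how keystroke savings and the extra time spent scanning a prediction list can combine into a net decrease in entry rate.

```python
def entry_rate(keystroke_time_s, search_time_s=0.0, keystroke_savings=0.0):
    """Predicted characters per second under a simple keystroke-level model.

    keystroke_time_s: seconds per key press (hypothetical value)
    search_time_s: extra seconds spent scanning the prediction list per key press
    keystroke_savings: fraction of key presses removed by word prediction
    """
    keystrokes_per_char = 1.0 - keystroke_savings
    seconds_per_char = keystrokes_per_char * (keystroke_time_s + search_time_s)
    return 1.0 / seconds_per_char


def percent_change(baseline, assisted):
    """Relative change in entry rate, in percent (negative means slower)."""
    return 100.0 * (assisted - baseline) / baseline


def model_error(predicted, observed):
    """Absolute prediction error as a percentage of the observed value."""
    return 100.0 * abs(predicted - observed) / observed


# Hypothetical user: 1 s per key press, 1 s of list searching per key press,
# and 40% of key presses saved by word prediction.
baseline = entry_rate(keystroke_time_s=1.0)
assisted = entry_rate(keystroke_time_s=1.0, search_time_s=1.0, keystroke_savings=0.4)
change = percent_change(baseline, assisted)  # negative: entry is slower with prediction
```

With these made-up parameters the search overhead outweighs the keystroke savings, so the predicted rate drops even though fewer keys are pressed.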

Mason,  Bozorgzadeh,  and  Birch  (2000)  and  Birch,  Bozorg- 
zadeh,  and  Mason  (2002)  described  a  brain-controlled  switch 
that  can  be  activated  by  imagining  movement.  Earlier  articles 
described  studies  in  which  this  technology  was  used  by  individ¬ 
uals  without  PI  (Lisogurski  &  Birch,  1998;  Bozorgzadeh,  Birch,  & 
Mason,  2000),  demonstrating  that  this  technology  could  effec¬ 
tively  identify  both  real  and  imagined  finger  movements.  In  the 
current  articles,  the  authors  describe  a  study  that  involved  two 
individuals with high-level SCI with no residual motor or sensory
function in their hands. Their results indicated false-positive
rates of less than 1% (i.e., the system detected activity when
none was intended) and hit-rates of 35-48% (i.e., the system
detected activity when it was intended). While higher hit-rates
are  desirable,  these  results  were  viewed  as  positive  in  that  they 
confirmed  that  individuals  with  no  residual  motor  or  sensory 
function  in  their  hands  could  activate  the  system  by  imagining 
finger  movements. 
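
The two rates reported for the brain-controlled switch can be computed from per-window labels as follows. This is a minimal sketch under the assumption that each observation window is labeled with whether activation was intended and whether the switch fired; the function name and the example data are hypothetical and are not taken from the original studies.

```python
def switch_rates(intended, detected):
    """Return (hit_rate, false_positive_rate) for a binary switch.

    intended, detected: equal-length lists of booleans, one per time window.
    """
    pairs = list(zip(intended, detected))
    hits = sum(1 for i, d in pairs if i and d)
    false_positives = sum(1 for i, d in pairs if not i and d)
    n_intended = sum(1 for i in intended if i)
    n_idle = len(intended) - n_intended
    hit_rate = hits / n_intended if n_intended else 0.0
    false_positive_rate = false_positives / n_idle if n_idle else 0.0
    return hit_rate, false_positive_rate


# Made-up example: two intended activations, four idle windows.
intended = [True, True, False, False, False, False]
detected = [True, False, False, False, True, False]
hit_rate, fp_rate = switch_rates(intended, detected)
```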

Sears,  Karat,  Oseitutu,  Karimullah,  and  Feng  (2001)  discussed 
the  results  of  an  experiment  designed  to  explore  the  effective¬ 
ness  of  speech  recognition  for  communication-oriented  activi¬ 
ties  when  used  by  individuals  with  SCI  as  well  as  individuals  with 
no  PI.  The  individuals  with  SCI  all  had  injuries  at  or  above  C6 
with ASIA scores of A or B, resulting in no use or only limited
use of their arms and paralysis of their hands. Overall, there
were  no  significant  differences  between  the  two  groups  of 
users,  with  participants  producing  text  at  approximately  13 
words  per  minute  (wpm)  with  few  errors.  An  analysis  of  subjec¬ 
tive  satisfaction  ratings  indicated  that  participants  with  PI  were 
more  positive  about  their  experience.  A  more  detailed  analysis 
revealed  significant  differences  with  respect  to  the  processes 
employed  by  the  two  groups  of  participants.  For  example,  par¬ 
ticipants  with  PI  spent  more  of  their  time  dictating  and  inter¬ 
rupted  their  dictation  more  often  to  correct  errors,  but  they  also 
spent  less  of  their  time  navigating  from  one  location  to  another 
within  the  document.  Oseitutu,  Feng,  Sears,  and  Karat  (2001) 
provided  additional  details  that  highlighted  the  different  strate¬ 
gies  adopted  by  these  two  groups  of  participants. 

Sears,  Feng,  Oseitutu,  and  Karat  (2003)  focused  on  speech- 
based  navigation  techniques  to  help  people  with  spinal  cord  in¬ 
juries  edit  text  documents.  They  found  that  traditional  target- 
based  techniques  (e.g.,  “Select  book”)  and  direction-based 
techniques (e.g., “Move up four lines”) were highly ineffective,
with users spending one third of their time issuing navigation
commands while experiencing failure rates of approximately 15%
to 19%, respectively. Further, when navigation commands failed,
the  consequences  were  often  rather  severe.  Frequently,  failed 
commands  would  alter  the  content  of  the  document,  requiring 
significant  time  and  effort  to  recover.  The  authors  proposed  sev¬ 
eral  changes  to  existing  navigation  techniques  to  address  these 
difficulties.  These  revisions  significantly  reduced  the  time  spent 
on  navigation  and  significantly  improved  data-entry  rates.  Feng 
and  Sears  (2004)  proposed  a  new  speech-based  navigation  so¬ 
lution,  which  introduces  carefully  specified  navigation  anchors 
to  allow  users  to  select  target  words  efficiently  and  reliably.  A 
preliminary  study  confirmed  that  their  new  anchor-based  navi¬ 
gation  technique  allowed  users  to  reach  targets  quickly  while 
experiencing  fewer  errors. 

LoPresti  and  Brienza  (2004)  investigated  use  of  head-controlled 
devices  by  individuals  with  SCI.  Although  head-controlled  de¬ 
vices  can  help  people  with  limited  use  of  their  hands,  the  devices 
can  prove  challenging  when  the  users  also  have  limited  neck 
movement.  To  address  this  issue,  the  authors  developed  soft¬ 
ware  that  automatically  adjusted  the  interface  sensitivity  to  the 
needs  of  a  particular  user.  The  software  was  evaluated  in  two 
stages.  The  first  stage  involved  16  novice  users  with  SCI  or  mul¬ 
tiple  sclerosis.  Icon  selection  speed  increased  significantly  with 
the  new  software.  The  second  stage  involved  five  users  with  ex¬ 
tended  use  of  the  original  head-controlled  device.  Only  one  par¬ 
ticipant  experienced  improvements  in  performance.  Unfortu¬ 
nately,  calibrating  their  system  required  five  steps,  resulting  in  a 
complex  and  time-consuming  process,  especially  for  novice 
users  who  have  yet  to  commit  to  long  term  use  of  the  system. 

Cerebral  Palsy  (CP) 

CP  can  result  in  a  variety  of  impairments  due  to  reduced  mus¬ 
cle  power  or  a  loss  of  control  over  voluntary  or  involuntary 
movements.  CP  can  also  result  in  both  impaired  speech  and  in¬ 
telligence.  As  a  result,  designing  technologies  for  individuals 
with CP can be particularly challenging. In this section, we discuss
the results of a single study that was motivated explicitly
by CP. Additional studies discussed elsewhere in this chapter
may have been motivated by CP. For example, Alm, Todman, Elder,
and Newell (1993) described a system designed for individuals
with severe speech and motor impairments, including
an evaluation that included one individual with CP.

Roy et al. (1994) discussed a gestural interface
designed for individuals with speech and motor impairments
due to CP. Fourteen students with CP, ages 5-17, were observed
during their regular school activities. Numerous communicative
acts were observed that combined facial expressions; eye-gaze;
vocalization; dysarthric (slurred, slow, difficult to produce, and
difficult  to  understand)  speech;  and  upper  extremity  gestures 
using  the  head,  arms,  hands,  and  upper  torso.  Subsequently, 
four  students  participated  in  sessions  in  which  data  were  col¬ 
lected  for  a  predefined  set  of  gestures  (using  a  3-D  magnetic 
tracker  and  EMG  electrodes).  An  evaluation  of  the  complete  set 
of gestures, or the subsequent use of such gestures by individuals
with CP, was not reported. However, the preliminary results
that were reported provided encouragement with respect to
the use of computer-based gesture recognition as a communications
tool for individuals with CP.

Impairments of the Hands and Arms

Multiple  studies  have  investigated  the  difficulties  individuals  ex¬ 
perience  due  to  movement  or  muscle  power  PI  that  affect  their 
hands  and  arms.  These  studies  often  include  participants  with 
a range of underlying health conditions (e.g., CP, SCI, stroke)
as  well  as  varying  PI.  The  unifying  theme  is  that  each  study  fo¬ 
cused  on  the  ability  of  the  participants  to  interact  with  comput¬ 
ers  using  the  arms  and  hands. 

Trewin  and  Pain  (1999)  discussed  the  errors  that  occur 
when  individuals  with  PI  that  affect  their  hands  and  arms  in¬ 
teract  with  a  keyboard  and  mouse.  Twenty  individuals  with  var¬ 
ious  PI  participated,  as  did  six  individuals  with  no  PI.  Each  par¬ 
ticipant  completed  tasks  using  the  keyboard,  mouse,  or  both. 
Several  important  categories  of  typing  errors  were  identified: 
long  key  presses,  additional  keys  (both  local  and  remote),  miss¬ 
ing  keys,  simultaneous  keys,  and  bounces.  Each  type  of  error 
occurred  more  often  for  the  participants  with  PI.  For  additional 
related  work,  see  Trewin  (1996)  as  well  as  Trewin  and  Pain 
(1998a,  1998b). 
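
Two of the error categories named above, long key presses and bounces, can be flagged from key press and release timestamps. The sketch below is only an illustration: the event format, the function name, and both thresholds are assumptions, and Trewin and Pain's own definitions and criteria may differ.

```python
def classify_key_events(events, long_press_s=0.5, bounce_gap_s=0.1):
    """Flag long key presses and bounces in a stream of key events.

    events: list of (key, press_time_s, release_time_s), sorted by press time.
    long_press_s and bounce_gap_s are hypothetical thresholds in seconds.
    """
    findings = []
    previous = None
    for key, press, release in events:
        # A key held down longer than the threshold counts as a long press.
        if release - press >= long_press_s:
            findings.append((key, "long key press"))
        # The same key pressed again almost immediately counts as a bounce.
        if (previous is not None and previous[0] == key
                and press - previous[2] <= bounce_gap_s):
            findings.append((key, "bounce"))
        previous = (key, press, release)
    return findings


events = [("a", 0.0, 0.8), ("b", 1.0, 1.05), ("b", 1.10, 1.15), ("c", 2.0, 2.1)]
findings = classify_key_events(events)
```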

Keates  et  al.  (2000)  discussed  the  use  of  haptic  feedback  for 
individuals  with  PI.  More  specifically,  they  investigated  the  po¬ 
tential  of  force  feedback  to  facilitate  point-and-click  activities. 
They  reported  on  two  pilot  studies  that  focused  on  individuals 
with  movement  and  muscle  power  PI  but  unaffected  sensitivity 
of touch. Six individuals participated in the studies; four participants
had choreoathetoid CP, one had Friedreich’s Ataxia, and
one  had  Kalman-Lamming’s  Syndrome.  Participants  completed 
a  series  of  point-and-click  activities.  In  the  first  study,  the  efficacy 
of  several  alternative  forms  of  feedback  was  explored.  These 
forms  included  a  pointer  trail,  changing  the  color  of  a  target 
when  the  cursor  is  over  it,  force  feedback  gravity  wells  that 
pulled  the  cursor  toward  targets,  vibrating  the  mouse  when  the 
cursor  is  over  a  target,  or  a  combination  of  all  of  these  forms  of 
feedback.  Their  results  indicated  that  gravity  wells  may  reduce 
target  selection  times  by  10-20%.  Similar  benefits  were  also 
seen  for  pointer  trails — a  standard  accessibility  option  in  the 
Windows  operating  systems.  Participants  disliked  the  vibrating 
cursor,  which  also  increased  target  selection  times.  Interestingly, 
these  results  are  similar  to  those  reported  for  participants  with 
no  PI  using  the  Phantom  input  device  (Oakley,  McGee,  Brew¬ 
ster,  &  Gray,  2000). 

The  researchers’  second  study  explored  the  relationship 
between  force  feedback  and  target  size.  Force  feedback  re¬ 
duced  the  time  required  to  select  targets  by  30-50%  and  er¬ 
ror  rates  by  approximately  80%.  Without  force  feedback,  the 
time  required  increased  as  target  size  decreased,  but  this  pat¬ 
tern  was  greatly  reduced  when  force  feedback  was  provided. 
Interestingly,  at  least  one  participant  was  unable  to  complete 
the  tasks  without  force  feedback,  but  successfully  selected  the 
targets  when  force  feedback  was  provided.  These  results  sug¬ 
gest  that  force  feedback  could  prove  useful  for  individuals  with 
PI,  but  it  should  be  noted  that  these  results  may  be  unique  to 
the  specific  form  of  force  feedback  utilized  in  the  current 
study.  Careful  evaluation  of  other  forms  of  force  feedback  is 
still  necessary. 
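
To make the idea of a gravity well concrete, the following sketch computes a force that pulls the cursor toward a target once the cursor is within a given radius. It is one possible formulation only: the constant-magnitude force, the circular well, and every name and parameter are assumptions, and the form of force feedback used in the studies above is not specified here.

```python
import math


def gravity_well_force(cursor, target, radius, strength):
    """Return an (fx, fy) force pulling the cursor toward the target center.

    The force has constant magnitude `strength` inside the well and is zero
    outside it or exactly at the center (hypothetical design choice).
    """
    dx = target[0] - cursor[0]
    dy = target[1] - cursor[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0 or distance > radius:
        return (0.0, 0.0)
    scale = strength / distance  # unit direction vector times strength
    return (dx * scale, dy * scale)


inside = gravity_well_force((0.0, 0.0), (3.0, 4.0), radius=10.0, strength=2.0)
outside = gravity_well_force((0.0, 0.0), (30.0, 40.0), radius=10.0, strength=2.0)
```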

Building  on  the  previous  study,  Hwang,  Keates,  Langdon, 
and Clarkson (2003) investigated the use of force feedback gravity
wells for point-and-click tasks when an undesired haptic
distracter is present on the screen. Ten individuals with PI together
with eight individuals without PI participated. Their results
indicated that the location of the distracter has an important effect
on performance. While selection times and error rates improved
when  force-feedback  gravity  wells  were  present,  participants 
were  significantly  slower  when  the  distracter  was  directly  be¬ 
tween  the  start  point  and  the  target.  The  greatest  improvements 
were  evident  for  the  most  impaired  users. 

Keates,  Clarkson,  and  Robinson  (2000)  investigated  the  use 
of  the  Model  Human  Processor  (Card,  Moran,  &  Newell,  1983) 
for  describing  the  behavior  of  computer  users  with  PI.  This 
model  combines  perceptual,  cognitive,  and  motor  activities  to 
predict  the  time  required  to  complete  a  task,  but  was  originally 
developed  with  an  emphasis  on  individuals  without  PI.  Two 
groups  of  individuals  took  part  in  their  study.  The  first  group 
consisted  of  six  individuals  with  PI  caused  by  various  health 
conditions (e.g., quadriplegia, MD, spastic CP, choreoathetoid
CP, and Friedreich’s Ataxia) while the second group included
three individuals with no PI. Their results are summarized in
Table 42.4.

Participants  with  PI  appeared  to  take  longer  to  complete  fun¬ 
damental  perceptual  and  cognitive  activities.  These  results  are 
consistent with those discussed earlier (Koester & Levine,
1994), but should be interpreted carefully given the varied
health  conditions  of  the  participants.  Through  observation  and 
an  analysis  of  the  different  times  recorded  for  motor  activities, 
the  authors  concluded  that  additional  cognitive  or  perceptual 
activities  were  taking  place  when  individuals  with  PI  completed 
basic motor tasks (as compared to individuals with no PI). While
their  results  are  not  definitive,  they  confirm  the  need  for  addi¬ 
tional  research  that  focuses  on  the  time  required  to  complete 
basic  perceptual  or  cognitive  activities. 
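
The way the Model Human Processor combines these cycle times can be shown with a small worked example using the values in Table 42.4. This is a sketch under stated assumptions: the task is assumed to need exactly one perceptual, one cognitive, and one motor cycle, the function and dictionary names are hypothetical, and only the shortest of the three motor times reported for participants with PI is used.

```python
# Cycle times in milliseconds, taken from Table 42.4.
CYCLE_TIMES_MS = {
    "no_pi": {"perceptual": 80, "cognitive": 93, "motor": 70},
    # Three motor times were reported for participants with PI
    # (110, 210, and 300 ms); the shortest is used here as an assumption.
    "pi": {"perceptual": 100, "cognitive": 110, "motor": 110},
}


def predict_ms(group, perceptual=1, cognitive=1, motor=1):
    """Sum the cycle times for the given number of cycles of each type."""
    times = CYCLE_TIMES_MS[group]
    return (perceptual * times["perceptual"]
            + cognitive * times["cognitive"]
            + motor * times["motor"])


no_pi_ms = predict_ms("no_pi")  # 80 + 93 + 70
pi_ms = predict_ms("pi")        # 100 + 110 + 110
```

Under these assumptions the simplest perceive-decide-act task takes 243 ms without PI and 320 ms with PI; using the longer motor times would widen the gap.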

Keates,  Hwang,  Langdon,  and  Clarkson  (2002)  investigated 
the challenges of point-and-click tasks for people with PI
by  studying  detailed  records  of  cursor  movements.  The  moti¬ 
vation  for  their  study  was  the  assumption  that  traditional  task 
measures  such  as  speed  and  error  rate  are  not  sufficient  if  the 
goal  is  to  fully  understand  the  interaction  process  including  the 
reasons  users  experience  difficulties.  Five  individuals  with  PI 
and  three  without  PI  completed  a  series  of  multidirectional  cur¬ 
sor  positioning  tasks,  providing  detailed  information  regarding 
the path the cursor followed during each activity. Based on
those data, the authors proposed six metrics that focus on deviations
from the optimal path for a number of predefined cursor positioning
tasks: the distance traveled relative to cursor displacement,
the distribution of distance traveled for a range of cursor speeds,
submovements, the distance traveled away from the target,
the distribution of distance traveled for a range of curvatures,
and the distribution of distance traveled for a range of radii
from the target. Next, a subset of the participants (four with PI and
three without PI) completed a set of target selection tasks.
These tasks differed slightly from those used in the initial study
in that a circular starting point was defined near the center of
the screen. The results indicated that the new metrics helped
highlight differences between users with and without PI, especially
those difficulties experienced by the more impaired users.

TABLE 42.4. Average Time Required, By Individuals With
and Without PI, to Complete Fundamental Perceptual,
Cognitive, and Motor Tasks

        Perceptual  Cognitive  Motor
No PI   80 ms       93 ms      70 ms
PI      100 ms      110 ms     110 ms, 210 ms, 300 ms
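
As an illustration of the first cursor-path metric proposed by Keates, Hwang, Langdon, and Clarkson (2002), the distance traveled relative to cursor displacement can be computed from a sampled cursor path. The sketch below assumes the path is a list of (x, y) samples and that the metric is the simple ratio of path length to straight-line displacement; the function names are hypothetical and the authors' exact formulation may differ.

```python
import math


def path_length(points):
    """Total distance traveled along a sampled cursor path."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def distance_ratio(points):
    """Distance traveled divided by start-to-end displacement.

    A perfectly straight movement gives 1.0; larger values indicate
    more deviation from the direct path.
    """
    displacement = math.dist(points[0], points[-1])
    if displacement == 0:
        raise ValueError("start and end points coincide")
    return path_length(points) / displacement


straight = distance_ratio([(0, 0), (3, 4)])
detour = distance_ratio([(0, 0), (3, 4), (6, 0)])
```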

Building  on  the  results  of  the  previous  study,  Hwang, 
Keates, Langdon, and Clarkson (2004) investigated the submovements
of the cursor path for individuals with PI. When
conducting navigation tasks, users complete a rapid movement
that is composed of a sequence of smaller, discrete submovements.
The authors studied the cursor trajectories of six
users  with  PI  together  with  three  individuals  without  PI.  The 
results  showed  that  some  individuals  with  PI  paused  more  of¬ 
ten  and  longer  than  participants  without  PI,  resulting  in  up  to 
five  times  more  submovements  to  complete  the  same  task. 
There  was  a  correlation  between  error  rate  and  peak  sub¬ 
movement  speed  for  users  with  PI,  which  was  not  observed  for 
users  without  PI. 
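
One simple way to count submovements is to treat each pause in the cursor's speed profile as a boundary. The sketch below makes that assumption explicit: the pause threshold, the function name, and the sample data are hypothetical, and the parsing criteria used by Hwang et al. (2004) may be more elaborate.

```python
def count_submovements(speeds, pause_threshold):
    """Count runs of samples whose speed exceeds the pause threshold.

    speeds: cursor speed at successive, evenly spaced time samples.
    Each run of above-threshold samples is treated as one submovement.
    """
    count = 0
    moving = False
    for speed in speeds:
        if speed > pause_threshold:
            if not moving:
                count += 1  # a new submovement begins after a pause
                moving = True
        else:
            moving = False
    return count


# Made-up speed profile with two pauses separating three bursts of movement.
submovements = count_submovements([0, 5, 8, 0, 0, 3, 4, 0, 6], pause_threshold=1)
```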

Myers,  Wobbrock,  Yang,  Yeung,  Nichols,  and  Miller  (2002)  fo¬ 
cused  on  individuals  with  PI  that  result  in  a  loss  of  gross  motor 
control, such as movement of the arm and the wrist, but retention
of specific fine motor control, such as movement of the fingers.
Software was developed allowing the use of a PDA as the mouse and
keyboard.  Four  participants  with  muscular  dystrophy  tested  the 
software;  all  provided  positive  feedback.  The  parents  of  one  par¬ 
ticipant  noted  that  although  performance  was  slower  using  the 
PDA  as  compared  to  the  regular  keyboard  and  mouse,  the  PDA 
was  much  less  tiring  and  could  be  used  much  longer  without  a 
break. 

Impairments  in  Infants 

Infants with severe PI often become passive, with limited
or  nonexistent  speech.  This  is  true  even  when  cognitive  skills 
are  normal.  Fell,  Delta,  Peterson,  Ferrier,  Mooraj,  and  Valleau 
(1994)  discussed  the  development  and  evaluation  of  the  Baby- 
Babble-Blanket  as  a  technology  to  enable  infants  with  severe  PI 
to  control  their  environment  and  communicate.  One  specific 
goal  was  to  help  these  infants  improve  their  motor  skills.  The 
key  component  of  the  system  is  a  pad,  containing  12  equally 
spaced  switches,  which  can  be  placed  on  the  floor.  When  a 
switch  is  pressed,  a  computer  provides  feedback.  The  feedback 
could  include  digitized  speech,  music,  and  sounds.  The  authors 
report  on  a  single-subject  experiment  involving  a  five-month-old 
infant  with  poor  muscle  tone,  hydrocephaly,  and  clubfeet.  This 
infant  was  able  to  activate  switches,  became  more  active  when 
sounds  were  played  in  response  to  switch  activations,  associ¬ 
ated  switches  with  a  desired  effect  (e.g.,  hearing  his  mother’s 
voice),  and  was  able  to  modify  these  associations  when  the  feed¬ 
back  was  moved  from  one  switch  to  another. 

Significant  Speech  and  Physical  Impairments  (SSPI) 

While  the  terminology  may  vary,  several  studies  have  explored 
technologies  for  individuals  with  significant  speech  and  physical 
impairments. The article by Roy, Panayi, Erenshteyn, Foulds, and
Fawcus (1994) discussed earlier could have been included here,
but was listed separately due to its explicit focus on impairments
caused by CP.

Alm, Todman, Elder, and Newell (1993) described the development
of a computer-aided communication technology for individuals
with SSPI. This article focused on situations in which
the  speech  impairment  results  from  PI  rather  than  cognitive  im¬ 
pairments.  The  system  tracks  the  conversation,  guiding  users  as 
they  specify  the  type  of  utterance  they  wish  to  generate.  The 
system was evaluated with two participants. The mean rate of
speech was 144.4 words per minute (wpm). One participant with
no PI took part in eight conversations with student volunteers,
producing speech at a rate of 67.4 wpm while the conversation
partners spoke at 132.9 wpm. While selecting an utterance could
require one to four clicks, fewer than 0.5% of the selections involved
more than two clicks. This suggests that the desired utterances
were easily accessed given the current design of the
system.  The  second  participant  had  CP  and  used  a  word  board 
containing  400  words  as  his  primary  method  for  communicat¬ 
ing.  Using  the  experimental  system,  this  participant  was  able  to 
express  himself  more  fully,  increasing  his  vocabulary  from  143  to 
534  words  and  the  number  of  times  he  took  control  of  the  con¬ 
versation  from  10  to  27.  Many  important  questions  must  still  be 
addressed  for  systems  such  as  the  one  described  in  this  article. 
One  critical  issue  is  the  effectiveness  of  such  a  system  as  the 
number  of  conversation  topics  and  phrases  increases.  Another 
critical  issue  is  how  the  user  can  efficiently  enter  and  organize 
these  phrases  as  they  customize  the  system  for  use  in  new  situ¬ 
ations. 

Demasco,  Newell,  and  Arnott  (1994)  discussed  the  develop¬ 
ment  of  a  system  to  support  communication  activities  based 
upon  principles  of  visual  information  seeking.  Building  on  five 
key  principles  derived  from  the  literature,  a  computerized  word 
board was designed. Key features included the separation of
navigation and selection, which allows these activities to be supported
by different devices, and the expansion of the collection of accessible
words using word associations (e.g., is-like, isa, hasa,
goes-with). At the time the article was written, the system was
under  development  and  a  variety  of  directions  for  research  were 
presented,  but  no  evaluation  had  been  completed. 

McCoy,  Demasco,  Pennington,  and  Badman  (1997)  investi¬ 
gated  the  application  of  natural  language  processing  to  Aug¬ 
mentative  and  Alternative  Communication  (AAC)  to  improve 
the  communication  capability  for  individuals  who  have  cogni¬ 
tive  disabilities  that  hinder  their  ability  to  speak.  The  objective  is 
to develop intelligent communication aids that provide linguistically
“correct” output while speeding the communication rate.
The authors proposed a prototype system with a key component
called the “Intelligent Parser Generator.” Several research issues
related to the system and the specific user population were proposed
and discussed. However, no user evaluation results were
reported.

Albacete,  Chang,  Polese,  and  Baker  (1994)  and  Albacete, 
Chang,  and  Polese  (1998)  discussed  the  design  of  iconic  lan¬ 
guages  for  use  by  individuals  with  SSPI.  Their  approach  is  based 
upon semantic compaction in which ambiguous icons are combined
to represent concepts. For example, “APPLE VERB” could
mean “eat” while “APPLE NOUN” means “food.” An icon algebra
was defined, including operators to combine, mark, provide
context for, enhance, or invert icons. As a result, this approach
can allow a vocabulary of several thousand words to be represented
with just 50 to 120 icons. This approach served as the
foundation for the Minspeak system.
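
The core idea of semantic compaction, that short sequences of ambiguous icons select words, can be sketched as a lookup table. The two entries below come from the example in the text; the table and function names, and the restriction to ordered two-icon sequences, are assumptions made only for illustration and do not describe the actual system.

```python
# Icon sequences mapped to words; the two entries follow the example above.
LEXICON = {
    ("APPLE", "VERB"): "eat",
    ("APPLE", "NOUN"): "food",
}


def decode(icon_sequence):
    """Return the word for an icon sequence, or None if it is undefined."""
    return LEXICON.get(tuple(icon_sequence))


def two_icon_capacity(icon_count):
    """Number of distinct ordered two-icon sequences available."""
    return icon_count * icon_count


word = decode(["APPLE", "VERB"])
small_set = two_icon_capacity(50)
large_set = two_icon_capacity(120)
```

Even under this restricted assumption, 50 icons allow 2,500 ordered pairs and 120 icons allow 14,400, which suggests how a small icon set can cover a vocabulary of several thousand words.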

Input  Using  Electrophysiological  Data 

Traditional  input  techniques  involve  some  kind  of  physical  ac¬ 
tivity.  Most  often,  the  hands  and  arms  are  involved,  but  other 
technologies  allow  users  to  provide  input  by  moving  their  eyes, 
heads, tongues, or eyelids. However, a growing number of researchers
are investigating input techniques that do not require
physical  activities.  The  article  by  Mason,  Bozorgzadeh  and  Birch 
(2000) discussed earlier provides one example, in which electrophysiological
data were used to generate input for a computer.
In this section, we review several additional articles that discuss
the use of electrophysiological data. This is a new, exciting,
and  promising  form  of  input  for  individuals  with  significant  PI. 

Patmore  and  Knapp  (1998)  discussed  an  approach  to  eye 
tracking  that  uses  the  electrooculogram  (EOG)  and  visual 
evoked  potentials  (VEP)  as  input.  While  a  number  of  EOG-based 
eye-tracking  systems  have  been  discussed,  additional  research  is 
needed  to  make  such  systems  reliable  (see  LaCourse  &  Hludik, 
1990;  Patmore  &  Knapp,  1995).  The  VEP  is  a  response  to  a  flash 
of  light  that  is  highly  sensitive  to  the  stimulus’  distance  from  the 
center of the field of vision. The authors presented the technical
details of a system that combines the EOG, the VEP, and
fuzzy  logic  to  improve  accuracy.  While  promising,  significant 
additional  research  is  necessary  before  such  a  system  will  be  suf¬ 
ficiently  reliable  for  use  under  realistic  conditions. 

Allanson,  Rodden,  and  Mariani  (1999)  described  a  toolkit  de¬ 
signed  to  explore  the  use  of  electrophysiological  data  to  sup¬ 
port  interactions  with  computers.  They  discussed  the  variety 
of  electrophysiological  data  available  including  the  electroen¬ 
cephalograph  (EEG),  electromyograph  (EMG),  and  galvanic 
skin  resistance  (GSR).  The  EEG  is  an  electrical  trace  of  brain  ac¬ 
tivity,  the  EMG  can  detect  the  state  (e.g.,  complete  relaxation, 
partial  contraction,  complete  contraction)  of  a  muscle,  and  the 
GSR  measures  changes  in  the  resistance  of  the  skin.  The  authors 
provided  interesting  references  describing  the  use  of  the  EEG  to 
turn switches on and off (Kirkup, Searle, Craig, McIsaac, & Moses,
1997)  and  to  differentiate  between  five  tasks  (Keirn  &  Aunon, 
1990),  as  well  as  references  illustrating  the  use  of  the  EMG  to 
control  computer  games  (Bowman,  1997)  and  manipulate  ob¬ 
jects  on  a  computer  (Lusted  &  Knapp,  1996).  Their  toolkit  inte¬ 
grated  these  signals  using  predefined  widgets  that  allow  the  sig¬ 
nals  to  be  used  to  drive  applications. 

Barreto,  Scargle,  and  Adjouadi  (1999)  discussed  the  design 
of  an  integrated  assistive  real-time  system  as  an  alternative  input 
device  to  computers  for  individuals  with  significant  PI.  The  sys¬ 
tem  uses  EMG  biosignals  from  cranial  muscles  and  EEG  from 
the  cerebrum’s  occipital  lobe  to  control  mouse  input.  The  sys¬ 
tem,  combined  with  an  onscreen  keyboard,  provides  full  oper¬ 
ation  functionalities  with  computers.  Six  healthy  participants 
tested  the  system.  Participants  required  an  average  of  13-21  sec¬ 
onds  to  select  a  button.  Although  the  selection  speed  is  quite 
low,  the  study  demonstrated  that  the  EMG  and  EEG  based  sys¬ 
tem  could  allow  users  to  complete  target  selection  tasks. 


Kübler, Kotchoubey, Hinterberger, Ghanayim, Perelmouter,
Schauer,  Fritsch,  Taub,  and  Birbaumer  (1999)  discussed  the  use 
of  slow  cortical  potentials  in  a  2-s  rhythm  to  control  cursor 
movements  on  a  computer  screen.  Three  individuals  with  ad¬ 
vanced  ALS  participated  in  extensive  practice  sessions  to  learn 
to  control  this  signal.  Ultimately,  two  of  the  three  reached  accu¬ 
racy  rates  between  70  and  80%,  which  allowed  these  individu¬ 
als  to  select  letters  and  words  displayed  on  a  computer  screen. 
Building on these results, Birbaumer, Hinterberger, Kübler, and
Neumann  (2003)  developed  a  Thought-Translation  Device 
(TTD)  using  slow  cortical  brain  potentials.  The  system  allows 
the  user  to  directly  select  letters  and  words  with  brain  response 
alone.  Eleven  locked-in  patients  tested  the  device  with  a  training 
period  ranging  from  six  months  to  six  years.  Nine  participants 
had end-stage ALS, one had experienced a subcortical hemorrhage,
and one had polyneuropathy and Guillain-Barré syndrome. At the
beginning of the training session, the seven participants who
were not completely locked-in patients were able
to  communicate  “yes”  or  “no”  signals  with  face  or  eye  muscles. 
Two  of  these  seven  patients  lost  all  motor  control  during  the 
study  but  were  still  able  to  communicate  using  TTD.  None  of 
the  patients  who  were  completely  locked-in  at  the  beginning 
of  training  were  able  to  gain  effective  control  over  the  system, 
but  one  patient  did  reach  a  selection  speed  of  one  letter  or  word 
per  minute.  Although  only  two  participants  were  able  to  use  this 
system  successfully,  this  study  provided  additional  confirmation 
that  a  brain-computer  interface  has  the  potential  to  benefit  users 
with severe PI.

Craig, Tran, McIsaac, Moses, Kirkup, and Searle (2000) developed
a system that employs Alpha waves, which increase
in  amplitude  when  the  eyes  are  closed  and  decrease  in  ampli¬ 
tude  when  the  eyes  are  open,  to  activate  electrical  devices.  Ten 
individuals  without  PI  and  one  individual  with  severe  PI  eval¬ 
uated  the  system.  The  participant  with  PI  had  childhood-onset 
poliomyelitis  which  resulted  in  the  loss  of  motor  function  be¬ 
low  her  neck.  Participants  without  PI  completed  one  50-60 
minute  session  on  each  of  four  consecutive  days.  In  each  ses¬ 
sion,  they  activated  the  environmental  control  system  30 
times.  The  participant  with  severe  PI  completed  four  trials  of 
approximately  ten  minutes  duration  in  each  trial.  There  was  a 
significant  decrease  in  selection  times  between  day  one  and  all 
subsequent  days,  including  a  16%  reduction  from  12.7  seconds 
on  day  one  to  10.7  seconds  on  day  two.  Error  rates  also  de¬ 
creased  significantly  from  approximately  50%  to  23%  by  the 
second  day. 

Doherty  et  al.  (2000)  discussed  the  use  of  formative  experi¬ 
ments  and  contextual  design  to  assess  the  potential  of  applica¬ 
tions  controlled  by  eye  movements,  the  EOG  signal,  and  the 
EEG  signal  (e.g.,  the  Cyberlink  interface).  An  initial  study  in¬ 
volving  44  participants  with  various  PI  demonstrated  some  ex¬ 
pected  limitations  (e.g.,  the  mouse  was  not  effective  when 
arm/hand  control  was  limited,  eye  tracking  was  not  effective 
when  peripheral  vision  was  impaired)  as  well  as  unexpected  dif¬ 
ficulties  (e.g.,  some  quadriplegic  participants  could  not  produce 
signals  below  the  neck  and  therefore  could  not  operate  the  GSR 
device).  Interestingly,  the  Cyberlink  interface  was  the  only  al¬ 
ternative  that  all  44  participants  could  use  to  navigate  a  maze. 
Based  on  those  results,  Doherty  et  al.  (2001)  proposed  ten  de¬ 
sign  guidelines  for  developing  interfaces  with  Cyberlink  and 
brain-injured users. Example guidelines include avoiding strobes
when  sampling  the  cursor,  using  tunnels  instead  of  clicking  for 
selection,  avoiding  long  distances  by  providing  rest  stops,  and 
not  using  timeouts  to  indicate  selection.  Overall,  their  results 
confirmed that interfaces built upon electrophysiological data
can  be  useful  for  individuals  with  PI,  but  that  substantial  re¬ 
search  is  necessary  before  the  full  potential  of  these  technolo¬ 
gies  will  be  realized. 

Gnanayutham,  Bloor,  and  Cockton  (2005)  improved  the 
CyberLink  system  described  earlier  by  adding  a  discrete  accel¬ 
eration  function  for  cursor  movement.  Two  studies  were  con¬ 
ducted.  The  first  tested  the  adaptive  cursor  acceleration  algo¬ 
rithm  with  30  participants,  including  19  with  PI.  The  results 
suggested  that  the  acceleration  function  did  improve  perfor¬ 
mance,  but  was  not  sufficient  to  overcome  the  wide  variations  in 
capabilities  of  the  study  participants.  To  address  this  shortcom¬ 
ing,  the  authors  explored  personalized  tiling  for  brain  interfaces, 
with  the  goal  of  producing  a  better  match  between  the  demands 
of  the  device  and  the  capabilities  of  the  users.  In  the  second 
study,  five  individuals  with  brain  injuries  evaluated  the  person¬ 
alized  interface.  All  five  participants  were  able  to  communicate 
with  the  system.  Two  participants  communicated  through  a  two- 
target,  yes/no  interface  with  an  average  success  rate  of  70%.  Two 
other participants communicated using a switch, also achieving
a 70% success rate. Three participants showed interest beyond
communication tasks and tried using the interface for television
and music systems in their rooms.

Kennedy, Bakay, Moore, Adams, and Goldwaithe (2000) discussed
an experimental interface that requires the implantation
of special electrodes into specific areas of the brain. For
one  individual,  EMG  signals  from  various  muscles  were  uti¬ 
lized  to  supplement  the  neural  signals  detected  by  the  elec¬ 
trode.  With  five  months  of  practice,  the  individual  learned  to 
control  cursor  movements.  Results  for  three  tasks  were  re¬ 
ported.  These  tasks  involved  moving  the  cursor  across  the 
screen,  placing  the  cursor  on  an  icon  or  button,  and  selecting 
the  icon  or  button  by  holding  the  cursor  still  for  a  predefined 
period  (or  activating  a  predefined  EMG  signal).  Through  these 
tasks,  this  individual  was  able  to  invoke  synthesized  speech  for 
predefined  messages,  spell  names,  and  answer  questions. 
Moore  and  Kennedy  (2000)  identified  several  directions  for  ad¬ 
ditional  research  including  signal  understanding,  patient  train¬ 
ing,  navigation  paradigms,  control  of  neural  prosthetics,  and 
communication.

Carroll,  Schlag,  Kirikci,  and  Moore  (2002)  focused  on  the 
communications  needs  of  individuals  who  are  locked-in.  The  au¬ 
thors  analyzed  log  sheets,  conducted  interviews,  and  visited  users 
to  identify  user  needs.  The  style  and  content  of  interactions  var¬ 
ied  according  to  the  role  of  the  person  with  whom  the  individ¬ 
ual  was  communicating.  Family  members  engaged  in  fewer  in¬ 
teractions  that  allowed  the  individual  to  participate,  while  nurses 
continually  questioned  the  individual  about  various  issues,  in¬ 
cluding  their  comfort  and  needs.  This  finding  suggests  that  the 
support  offered  by  the  system  may  vary  based  on  the  relation¬ 
ship  between  the  individual  and  her  communication  partner. 

Felzer and Freisleben (2002) developed the “Hands-free”
Wheelchair  Control  System  (HaWCoS)  that  allows  individuals  to 
control  an  electrically  powered  wheelchair  without  using  their 
hands. The system uses muscle contractions, a kind of EMG signal
associated with any arbitrary muscle, as input signals to communicate
with the wheelchair. Three experiments were conducted,
comparing one prototype of the HaWCoS system with a
traditional  joystick  control.  Although  requiring  more  time  than 
the  traditional  joystick,  the  HaWCoS  system  allowed  users  to 
control  the  wheelchair  under  a  hands-free  condition,  providing 
an  alternative  for  people  with  severe  PI. 

Borisoff,  Mason,  Bashashati,  and  Birch  (2004)  investigated 
Low  Frequency  Asynchronous  Switch  Design  (LF-ASD)  as  a  di¬ 
rect  Brain-Computer  Interface  (BCI)  technology  for  people 
with severe PI. Earlier research on LF-ASD produced encouraging
results, but error rates were too high for practical use.
These  authors  implemented  energy  normalization  and  feature 
space  dimensionality  reduction  features,  which  normalize  the 
low-frequency  signals  and  increase  accuracy.  The  revised  sys¬ 
tem  was  evaluated  by  four  individuals  with  high-level  SCI  as 
well  as  four  individuals  without  PI.  Accuracy  improved  signifi¬ 
cantly  compared  to  earlier  results,  with  true  positive  rates  in¬ 
creased  by  approximately  33%  and  false  positive  rates  by  1%  to 
2%.  Individuals  with  PI  achieved  the  same  accuracy  as  those 
without  PI,  which  was  not  possible  when  using  a  traditional 
keyboard  and  mouse. 

Moore  and  Dua  (2004)  developed  a  biometric  input  device 
using  galvanic  skin  response  (GSR),  which  is  a  change  in  electric 
conductivity  of  the  skin  caused  by  increased  activity  of  the  sweat 
glands.  Various  emotional  responses  such  as  fear,  excitement,  or 
anxiety  can  affect  the  GSR  level.  A  locked-in  individual  who  had 
lost  the  ability  to  use  a  brain-computer  interface  tested  the  sys¬ 
tem  during  a  one-year  period.  The  results  showed  that  there 
was  a  significant  difference  in  accuracy  between  baseline  and 
controlled  states.  After  several  days,  an  accuracy  rate  of  approx¬ 
imately  62%  was  achieved.  The  experiments  demonstrated  that 
GSR  has  the  potential  to  be  used  by  people  with  severe  PI. 

Cursor-Control  Technologies 

Cursor  control  and  text  generation  are  perhaps  the  two  most 
common  activities  when  interacting  with  computers.  In  fact, 
cursor  control  can  form  the  foundation  for  text  generation  and 
therefore  may  be  the  most  fundamental  task  in  which  individu¬ 
als  with  PI  engage.  Numerous  technologies  have  been  explored 
for  controlling  the  cursor  location  and  many  of  the  articles  sum¬ 
marized  earlier  focused  on  cursor  control  tasks.  While  new  eye¬ 
tracking  technologies  are  being  explored,  commercial  systems 
are  readily  available  and  can  prove  useful  for  individuals  with 
PI.  The  issues  involved  in  using  these  technologies  are  not  dis¬ 
cussed  in  this  chapter,  but  Sibert  and  Jacob  (2000)  provided 
useful  pointers  to  the  recent  literature  on  this  topic.  In  the  fol¬ 
lowing  sections,  technologies  are  discussed  that  allow  individu¬ 
als  to  control  the  cursor  using  their  heads,  tongues,  eyes  and 
eyelids,  or  speech. 

Head.  When  an  individual  has  limited  use  of  their  hands 
and  arms,  other  options  must  be  found  for  interacting  with 
computers.  One  option  is  to  control  the  cursor  location  using 
head  movements.  In  this  section,  several  articles  are  discussed 
that  explore  the  issues  involved  in  interacting  with  computers 
via  head  movements. 

Radwin, Vanderheiden, and Lin (1990) discussed the results
of  a  study  that  compared  a  standard  mouse  to  a  lightweight, 
ultrasonic,  head-controlled  pointing  device.  Ten  participants 
with  no  PI  completed  a  variety  of  tasks  that  required  the  selec¬ 
tion  of  circular  targets.  Distance,  target  size,  and  device  all  sig¬ 
nificantly  affected  movement  times.  Interestingly,  target  size  had 
a  greater  effect  for  the  head-controlled  device,  with  movement 
times  decreasing  more  dramatically  than  when  the  mouse  was 
used.  The  direction  of  the  movement  also  affected  movement 
times,  with  the  horizontal  movements  resulting  in  the  shortest 
times  for  the  mouse  and  vertical  movements  resulting  in  the 
shortest  times  for  the  head-controlled  device.  Results  were  also 
reported  for  two  individuals  with  CP  including  detailed  results 
for  one  of  these  individuals  that  highlighted  the  importance  of 
carefully  assessing  the  abilities  of  individuals  with  PI  when  de¬ 
signing  computer  systems  for  their  use.  For  example,  horizontal 
movements  were  efficient  for  participants  with  no  PI,  but  the  in¬ 
dividual  with  CP  experienced  more  difficulty  moving  the  cursor 
to  the  right  than  any  other  direction. 

Lin, Radwin, and Vanderheiden (1992) reported on a similar
study that focused on the relationship between control-display
gain and performance with both a mouse and a head-controlled
device. Ten individuals with no PI participated in this
study.  Movement  times  and  deviation  of  the  cursor  from  the 
optimal  path  were  reported.  Extensive  analyses  of  the  results 
highlighted  the  importance  of  target  size,  movement  distance, 
and  gain.  Gain  affected  movement  times,  but  both  target  size 
and  movement  distance  were  more  influential.  A  U-shaped 
curve  was  observed  for  both  the  mouse  and  head-controlled 
device.  The  reader  is  referred  to  the  original  article,  as  well  as 
Schaab,  Radwin,  Vanderheiden,  and  Hansen  (1996),  for  addi¬ 
tional  information  regarding  the  relationship  between  control- 
display  gain  and  performance  with  head-controlled  devices. 

Malkewitz  (1998)  described  a  system  that  uses  head  move¬ 
ments  to  control  cursor  location  and  speech  to  click  the 
mouse  buttons,  activate  hotkeys,  and  emulate  the  keyboard. 
The  goal  was  to  provide  access  to  standard  applications  by  em¬ 
ulating  standard  input  devices  (e.g.,  keyboard  and  mouse). 
Speech  recognition  was  restricted  to  a  predefined  set  of  com¬ 
mands  and  the  ability  to  enter  individual  letters.  Full  dictation 
was not supported for various reasons. While preliminary results
were encouraging, the author noted the need for additional
research.

Evans,  Drew,  and  Blenkhorn  (2000)  described  the  design  of 
a  head-operated  joystick.  The  authors  briefly  summarized  the 
results  of  an  evaluation  involving  40  participants  including  nine 
individuals  with  undocumented  impairments.  It  is  reported  that 
all  40  individuals  used  the  device  successfully,  but  detailed  mea¬ 
sures  were  not  provided.  The  nine  individuals  with  impairments 
also  used  a  commercially  available  device  that  emulated  a 
mouse.  Interestingly,  all  nine  preferred  the  joystick  emulation  to 
the  mouse  emulation  provided  by  the  commercial  device.  It  is 
speculated  that  this  preference  is  due  at  least  in  part  to  the  fact 
that  users  can  rest  their  heads  in  a  neutral  position  once  the  cur¬ 
sor  is  placed  in  the  correct  location  when  using  a  device  that 
emulates  a  joystick. 

LoPresti,  Brienza,  Angelo,  Gilbertson,  and  Sakai  (2000)  stud¬ 
ied  the  relationship  between  neck  range  of  motion  and  the  use 
of head-controlled pointing devices. Fifteen individuals with no
PI as well as ten individuals with various health conditions that
resulted  in  PI  that  affected  neck  movements  participated  in  the 
study.  A  head-mounted  display  was  used  as  opposed  to  a  tradi¬ 
tional  desktop  monitor.  The  results  confirmed  that  the  individ¬ 
uals  with  PI  were  less  accurate  and  took  longer  when  completing 
icon  selection  tasks.  They  were  also  less  accurate  when  per¬ 
forming  target-tracking  tasks.  Further,  their  performance  was 
much more variable than that of the participants with no PI. As
with several other studies, the authors found that models developed
based upon individuals with no PI do not accurately represent
the performance of individuals with PI. A more detailed
analysis  indicated  that  vertical  movements  were  faster  than  hor¬ 
izontal  movements  and  horizontal  movements  were  faster  than 
diagonal  movements.  LoPresti  (2001)  reported  similar  results. 

De  Silva,  Lyons,  Kawato,  and  Tetsutani  (2003)  developed  a 
vision-based face-tracking system for cursor control. The cursor
position was controlled by head movements. The mouse button
was activated by the individual opening his mouth. Eight
users without impairments tested the system. The face-tracking
system allowed for an index of performance of 2.0 bits/second,
which is higher than that of a joystick, but lower than those of
a trackball and touchpad. The subjective satisfaction measures showed
that  the  system  was  rated  average  for  ease  of  use.  The  lowest  rat¬ 
ings  were  given  for  smoothness  and  accuracy  of  cursor  control. 
The  system  has  the  potential  to  support  hands-free  typing. 
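
An index of performance in bits per second is conventionally derived from Fitts' law. The sketch below assumes the Shannon formulation of the index of difficulty; the chapter does not state which formulation De Silva et al. used, and the function names and trial values are hypothetical.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits (assumed)."""
    return math.log2(distance / width + 1.0)


def index_of_performance(distance: float, width: float, movement_time_s: float) -> float:
    """Bits per second for one pointing movement: difficulty / movement time."""
    return index_of_difficulty(distance, width) / movement_time_s


# Hypothetical trial: a 30-pixel-wide target 210 pixels away, reached in 1.5 s.
# The difficulty is log2(210 / 30 + 1) = 3 bits.
print(index_of_performance(210, 30, 1.5))
```

Under this formulation, smaller or more distant targets carry more bits, so a device that takes longer to reach them yields a lower index of performance.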

Tongue.  Salem  and  Zhai  (1997)  provided  a  brief  descrip¬ 
tion  of  a  system  that  allows  the  cursor  location  to  be  controlled 
using  a  tongue-operated  isometric  joystick  like  those  available 
on  many  portable  computers.  A  Trackpoint  joystick  was  mounted 
in  a  mouthpiece,  similar  to  those  used  by  athletes,  that  was  fit¬ 
ted  to  the  individual’s  upper  teeth.  Two  individuals  participated 
in  a  pilot  study.  Neither  had  experience  with  an  isometric  point¬ 
ing  device.  Both  used  the  tongue-controlled  joystick  as  well  as 
a  standard  finger-controlled  joystick.  Initial  performance  was 
substantially  better  when  using  their  fingers,  but  with  practice 
the  gap  between  the  two  devices  narrowed.  During  the  last  trial, 
the  tongue-based  device  was  5%  slower  than  the  finger-based 
device  for  one  participant,  but  it  was  57%  slower  for  the  other 
participant. 

Eye  and  eyelid.  Shaw,  Loomis,  and  Crisman  (1995)  pre¬ 
sented  a  system  designed  to  allow  individuals  to  control  vari¬ 
ous  devices,  including  computers,  by  opening  and  closing  their 
eyes.  Two  individuals  utilized  the  system.  The  first  (participant 
A)  had  a  high-level  SCI  resulting  in  no  residual  use  of  his  arms  or 
hands.  The  second  (participant  B)  had  a  stroke  and  could  only 
control  a  single  eyelid.  With  six  hours  of  practice,  participant  A 
was  able  to  move  from  interacting  with  simulation  software  to 
controlling  a  powered  wheelchair.  He  found  this  device  to  be 
less  obtrusive  than  the  chin-control  he  normally  used  to  control 
his  wheelchair.  After  12  hours  of  practice,  participant  B  was  able 
to  navigate  the  maze  presented  by  simulation  software  and 
made  progress  toward  controlling  a  powered  wheelchair.  Given 
the  minimal  motor  control  this  individual  exhibited,  this  was 
considered  a  noteworthy  accomplishment. 

Bates  and  Istance  (2002)  studied  a  zooming  interface  to 
enhance  the  performance  of  eye-controlled  pointing  devices. 
Selecting small targets was challenging due to the way the human
eye functions. The authors addressed this issue by introducing
automatic zooming to make targets larger and easier to
select. Six individuals without PI took part in the study. The results
confirmed that target size is an important factor for the effectiveness
of eye-based interaction. The zooming interface was
significantly faster and more accurate compared to the original
solution without zooming. However, the zooming interface was
considered  less  comfortable  than  the  head  mouse. 

Hornof,  Cavender,  and  Hoselton  (2004)  developed  a  system 
for  children  with  motor  impairments  to  draw  pictures  and  select 
targets  using  eye  movements.  The  eye  system  allowed  users  to 
switch  between  looking  and  drawing  by  focusing  on  a  prede¬ 
fined  part  of  the  screen.  Eight  participants  without  PI  evaluated 
the  system.  Participants  generally  found  the  system  easy  to 
learn.  The  quality  of  the  pictures  drawn  showed  that  the  sys¬ 
tem  could  allow  children  to  draw  pictures  with  their  eyes,  al¬ 
though  it  is  not  an  easy  task. 

Speech.  Manaris  and  Harkreader  (1998)  and  Manaris, 
MacGyvers, and Lagoudakis (2002) discussed a “listening keyboard”
that allowed users with PI to access all of the functionality
of a traditional keyboard and mouse. A usability study with
three  individuals  with  PI  explored  the  effectiveness  of  the  lis¬ 
tening  keyboard  through  a  “Wizard-of-Oz”  design.  Results  from 
a  postexperiment  questionnaire  suggested  that  participants 
were  comfortable  with  the  concept  of  a  “listening  keyboard” 
and  that  it  would  be  preferable  to  mouthstick  or  handstick  sys¬ 
tems.  The  participants  were  able  to  complete  tasks  faster  with 
lower  error  rates  when  using  the  listening  keyboard  compared 
to  the  handstick.  However,  these  results  should  be  interpreted 
with  caution  since  the  number  of  participants  was  limited  and 
the  “Wizard-of-Oz”  methodology  results  in  unrealistically  low 
recognition  error  rates. 

Karimullah  and  Sears  (2002)  focused  on  continuous  speech- 
based  navigation  techniques  and  studied  a  standard  speech- 
based  mouse  cursor  and  a  predictive  cursor.  The  predictive  cur¬ 
sor  was  proposed  to  address  the  delay  related  to  speech 
processing.  As  the  cursor  moves,  the  predictive  cursor  appears  in 
front  of  the  actual  cursor,  indicating  the  position  at  which  the 
actual  cursor  will  be  located  if  the  command  to  stop  the  cursor 
were  issued.  Twenty-eight  participants  without  PI  evaluated  the 
efficacy  of  the  two  solutions.  The  difficulty  users  experienced 
selecting  targets  was  affected  by  the  speed  at  which  the  cursor 
was  moving,  target  size,  and  the  delay  between  when  the  com¬ 
mand  was  issued  and  when  it  was  processed,  but  no  differences 
were  detected  between  the  standard  and  predictive  cursors. 
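
The idea behind the predictive cursor can be sketched as a simple extrapolation: the predicted position is the current position plus the distance the cursor would travel during the processing delay. This is only an illustration that assumes constant velocity and a known, fixed delay; the function name and values are hypothetical and are not taken from Karimullah and Sears.

```python
def predicted_stop_position(position, velocity, delay_s):
    """Where the cursor would stop if the stop command were spoken now.

    position: (x, y) in pixels; velocity: (vx, vy) in pixels per second;
    delay_s: assumed total speech-processing delay in seconds.
    """
    x, y = position
    vx, vy = velocity
    return (x + vx * delay_s, y + vy * delay_s)


# Hypothetical case: cursor at (100, 200) moving right at 80 pixels/second,
# with a 0.5-second delay before the stop command takes effect.
print(predicted_stop_position((100, 200), (80, 0), 0.5))
```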

Building on the results of the previous study, Sears, Lin, and
Karimullah  (2002)  conducted  an  experiment  to  explore  why  the 
predictive  cursor  did  not  prove  beneficial.  This  study  investi¬ 
gated  three  sources  of  delay  associated  with  speech  input:  the 
time  required  for  the  user  to  initiate  a  response,  the  time  re¬ 
quired  to  speak  the  response,  and  the  time  required  for  the  sys¬ 
tem  to  produce  a  response.  Fifteen  individuals  without  PI  par¬ 
ticipated  in  this  study.  Based  on  the  results  of  this  study,  the 
authors  proposed  a  variable  speed  cursor  to  improve  the  per¬ 
formance  as  this  would  allow  users  to  decrease  the  speed  of 
the  cursor  when  selecting  small  targets.  This  is  expected  to  ad¬ 
dress  the  problems  that  users  experience  and  significantly  re¬ 
duce  failure  rates. 


Dai,  Goldman,  Sears,  and  Lozier  (2004)  focused  on  discrete 
speech-based  navigation  techniques,  evaluating  two  grid-based 
solutions, which allow users to select targets by recursively
positioning the cursor within a smaller area of the screen. Both solutions
divided the screen into nine regions. One solution provided
a single cursor in the center region. The other provided
nine cursors, each located in one region. Twenty-four participants
without PI tested the two techniques. The nine-cursor
solution allowed users to select targets of varying sizes, distances,
and directions significantly faster than the one-cursor solution.
Through  comparisons  with  earlier  studies,  the  authors  sug¬ 
gested  that  grid-based  solutions  might  prove  advantageous 
when  compared  to  other  speech-based  navigation  solutions 
such  as  the  continuous  cursor. 
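
The recursive narrowing used by grid-based navigation can be illustrated with a short sketch. It assumes a 3 x 3 grid whose regions are numbered 1 to 9 in row-major order and a cursor placed at the center of the selected region; the numbering scheme and function names are assumptions, not details reported by Dai et al.

```python
def select_cell(region, choice):
    """Return the sub-region picked by a spoken digit 1..9 (row-major, assumed).

    region is (left, top, width, height) in pixels.
    """
    left, top, width, height = region
    row, col = divmod(choice - 1, 3)
    cell_w, cell_h = width / 3, height / 3
    return (left + col * cell_w, top + row * cell_h, cell_w, cell_h)


def grid_target(screen, choices):
    """Apply each spoken choice in turn and return the final cursor position."""
    region = screen
    for choice in choices:
        region = select_cell(region, choice)
    left, top, width, height = region
    return (left + width / 2, top + height / 2)


# Hypothetical 900 x 900 screen: "5" selects the middle region,
# then "1" selects the top-left region inside it.
print(grid_target((0, 0, 900, 900), [5, 1]))
```

Each spoken choice shrinks the active region to one ninth of its area, so a small number of commands is enough to reach small targets.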

Text  Entry  Technologies 

Computers  are  frequently  used  to  generate  text.  While  the  key¬ 
board  is  the  most  frequently  used  alternative,  other  technolo¬ 
gies  can  also  be  used.  Some  individuals  with  PI  select  letters 
from  on-screen  keyboards,  others  use  speech  recognition,  and 
still  others  interact  with  the  standard  keyboard.  When  using 
an  on-screen  keyboard,  text  entry  is  effectively  turned  into  a 
cursor-control  task.  As  a  result,  all  of  the  cursor-control  tech¬ 
nologies  discussed  earlier  can  also  be  used  for  text  entry.  The 
article  by  Sears,  Karat,  Oseitutu,  Karimullah,  and  Feng  (2001) 
discussed  the  use  of  speech  recognition  for  text  entry,  but  was 
included  earlier  due  to  its  focus  on  individuals  with  SCI.  Arti¬ 
cles  in  earlier  sections  could  also  be  included  here,  but  are  dis¬ 
cussed  separately  due  to  their  focus  on  individuals  with  spe¬ 
cific  impairments.  The  articles  included  later  discuss 
technologies  that  can  be  used  for  text  entry  without  focusing  on 
any health condition or PI.

Keyboard-based  text  entry.  Matias,  MacKenzie,  and 
Buxton  (1993)  described  a  one-handed  keyboard  based  upon 
the  traditional  QWERTY  design.  This  basic  design  allows  users 
to  access  all  of  the  keys  on  one  side  of  the  keyboard  at  a  time, 
switching  between  sides  by  pressing  and  holding  the  spacebar. 
A  study  with  ten  participants  with  no  PI  was  reported.  Each  par¬ 
ticipant  used  his  or  her  nondominant  hand  when  typing  with 
one hand. Speed and error rates improved significantly over ten sessions.
By the end of ten sessions, participants were typing
34.7 wpm with an error rate of 7.44% with the one-handed keyboard
as compared to 64.9 wpm and 4.20% errors with the two-handed
keyboard. While this system could prove useful for individuals
with PI that hinder the use of one hand, it is important
to  remember  that  the  efficacy  of  this  design  for  individuals  with 
limited  experience  using  the  QWERTY  keyboard  is  uncertain. 
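
The one-handed design can be sketched as a key-remapping rule. The example below assumes that holding the spacebar maps each key to its mirror image on the opposite half of the same QWERTY row; the row layout, function name, and event format are hypothetical simplifications.

```python
# Three letter rows of a QWERTY keyboard (punctuation included so that
# each row has ten keys and mirrors cleanly).
ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]

# Map every key to its mirror image in the same row (q <-> p, f <-> j, ...).
MIRROR = {}
for row in ROWS:
    for key, mirrored in zip(row, reversed(row)):
        MIRROR[key] = mirrored


def decode(events):
    """Translate (key, space_held) pairs into the characters that are typed."""
    return "".join(MIRROR[key] if space_held else key for key, space_held in events)


# Typing "the" with the left hand only: "h" is produced by pressing its
# mirror key "g" while the spacebar is held.
print(decode([("t", False), ("g", True), ("e", False)]))
```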

Many  researchers  have  investigated  techniques  that  allow 
users  to  generate  text  using  fewer  keystrokes  than  are  normally 
required.  These  techniques  typically  predict  words,  given  just  a 
couple  of  letters,  using  statistics  that  describe  how  frequently 
various  words  are  used.  Some  systems  also  use  statistics  regard¬ 
ing  how  frequently  various  words  follow  words  that  have  already 
been  entered.  Other  techniques  present  multilevel  interfaces 
in  which  users  repeatedly  select  categories  until  the  desired 
word  is  available.  Once  the  word  is  selected,  an  appropriate 
ending (e.g., “s,” “ing”) can be added. Demasco and McCoy
(1992)  presented  a  theoretical  discussion  of  a  word-based  vir¬ 
tual  keyboard  that  uses  a  level-based  interface.  McCoy,  De¬ 
masco,  Jones,  Pennington,  Vanderheyden,  and  Zickus  (1994) 
discussed  a  word-based  text-entry  technique  by  which  users 
specify  uninflected  content  words  and  the  system  adds  appro¬ 
priate  endings  to  the  words  as  well  as  additional  words  (e.g., 
articles,  prepositions)  to  complete  the  sentence.  Garay-Vitoria 
and  Gonzalez-Abascal  (1997)  discussed  the  potential  of  syntac¬ 
tic  analysis  of  previously  entered  words  to  enhance  word  pre¬ 
diction  results.  Boissiere  and  Dours  (2000)  described  a  system 
that  predicts  word  endings  without  assistance  from  the  user. 
Unlike  many  of  the  articles  discussed  earlier,  all  of  these  arti¬ 
cles  take  either  a  theoretical  or  a  systems  approach  to  the  prob¬ 
lem.  Consequently,  some  of  the  systems  have  not  been  imple¬ 
mented  and  those  that  have  been  implemented  have  not  been 
evaluated  by  the  intended  end  users. 
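
The simplest form of frequency-based word prediction described above can be sketched as follows. This is a minimal illustration that ranks candidate words by how often they appear in a sample text; it does not reproduce any of the cited systems, and the corpus and function names are hypothetical.

```python
from collections import Counter


def build_model(words):
    """Count how often each word occurs in a sample of text."""
    return Counter(word.lower() for word in words)


def predict(model, prefix, n=3):
    """Return up to n words starting with prefix, most frequent first.

    Ties are broken alphabetically so that the result is deterministic.
    """
    candidates = [(count, word) for word, count in model.items()
                  if word.startswith(prefix)]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in candidates[:n]]


# Hypothetical sample text used to estimate word frequencies.
corpus = "the cat sat on the mat then the cat ran to that thin mat".split()
model = build_model(corpus)
print(predict(model, "th"))
```

A system that also uses statistics about which words follow previously entered words would condition these counts on the preceding word rather than on the prefix alone.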

Trewin  (2002)  studied  overlap  errors,  one  of  the  common 
typing  errors  in  which  two  keys  are  pressed  at  once,  to  improve 
typing  accuracy  for  people  with  PI.  While  some  users  with  PI  use 
keyguards  to  reduce  overlap  errors,  the  author  developed  a 
software  filter  to  identify  and  eliminate  overlap  errors.  The  soft¬ 
ware  used  keystroke  timing  and  language-based  techniques  to 
identify  and  automatically  correct  overlap  errors.  Twenty  par¬ 
ticipants  with  PI  tested  the  software.  The  most  effective  filter 
was  based  on  keystroke  timing  characteristics.  This  filter  identi¬ 
fied  97%  of  the  errors  and  misclassified  46%  of  the  deliberate 
overlaps.  The  accuracy  of  the  filters  was  dependent  on  the  typ¬ 
ing  styles  of  the  users. 
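
A timing-based overlap filter could take a form similar to the sketch below. The rule and the 30-millisecond threshold are assumptions chosen for illustration only; the chapter does not describe the timing characteristics that Trewin's filter actually used.

```python
from dataclasses import dataclass


@dataclass
class KeyPress:
    key: str
    down_ms: int  # time the key went down
    up_ms: int    # time the key was released


def filter_overlaps(presses, max_gap_ms=30):
    """Drop presses that look like accidental overlaps.

    A press is treated as an error when it starts before the previously kept
    key is released AND within max_gap_ms of that key going down (assumed
    rule). Presses must be ordered by down_ms.
    """
    kept = []
    for press in presses:
        if kept:
            previous = kept[-1]
            overlapping = press.down_ms < previous.up_ms
            near_simultaneous = press.down_ms - previous.down_ms <= max_gap_ms
            if overlapping and near_simultaneous:
                continue  # likely a neighboring key struck by accident
        kept.append(press)
    return kept


# Hypothetical input: "j" is struck together with "h" (accidental), while
# "o" overlaps "l" only because of fast, deliberate typing.
presses = [KeyPress("h", 0, 90), KeyPress("j", 10, 95), KeyPress("e", 200, 280),
           KeyPress("l", 400, 520), KeyPress("o", 480, 560)]
print("".join(p.key for p in filter_overlaps(presses)))
```

Because the threshold that separates accidental from deliberate overlaps depends on how an individual types, a fixed value such as the one assumed here would be expected to misclassify some keystrokes.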

Willis,  Pain,  Trewin,  and  Clark  (2002)  developed  a  system 
that  automatically  reconstructs  full  sentences  from  abbreviated 
text,  allowing  people  with  PI  to  type  less  text.  Ten  individuals 
without  PI  participated  in  an  evaluation  study.  Four  texts  were 
used  in  the  study,  each  of  500  letters.  Each  participant  com¬ 
pleted  four  abbreviation  tasks  based  on  one  text  and  one  re¬ 
construction  task  based  on  a  different  text.  Both  texts  were  ran¬ 
domly  chosen  from  the  four  texts.  Several  abbreviation  patterns 
were  identified,  such  as  “double  letter”  being  transformed  to 
single  letter  and  “ch”  being  transformed  to  “k.”  These  patterns 
provided  insights  that  guided  the  development  of  an  abbrevia¬ 
tion  expansion  system  that  matches  users’  natural  abbrevia¬ 
tion  styles. 

Trewin,  Slobogin,  and  Power  (2002)  developed  a  keyboard 
optimizer  capable  of  adapting  the  keyboard  to  meet  the  users’ 
needs,  with  a  focus  on  users  with  mild  to  moderate  PI  that  neg¬ 
atively  affects  their  ability  to  use  a  keyboard.  Given  a  sample  of 
an  individual’s  typing  behaviors,  the  optimizer  generates  a  cus¬ 
tomized  profile  that  includes  details  about  how  the  individual 
types  that  can  allow  the  keyboard  to  be  configured  to  improve 
typing  outcomes  for  the  specific  individual.  The  system  was 
evaluated  by  eight  users  with  PI  that  affected  their  typing. 
Twenty-one  configuration  modifications  were  suggested  by  the 
optimizer,  with  20  being  judged  by  the  clinicians  as  being  ap¬ 
propriate.  Participants  liked  the  system  and  found  it  easy  to  use. 
Trewin  (2004)  further  improved  the  underlying  models  and 
built  a  dynamic  keyboard.  Ten  individuals  with  young-onset 
Parkinson’s  Disease  provided  input  to  the  system.  The  auto¬ 
matic  adjustment  of  the  key  repeat  delay  feature  was  accept¬ 
able  to  the  users. 


Wobbrock,  Myers,  Aung,  and  LoPresti  (2004)  developed  two 
integrated  gestural  text-entry  methods  designed  for  use  from 
power  wheelchairs.  Both  devices  can  be  integrated  into  exist¬ 
ing  controls  on  power  wheelchairs  and  are  small,  light,  inex¬ 
pensive,  and  require  minimal  configuration.  One  method  uses  a 
joystick  and  the  other  a  touchpad.  Both  methods  use  a  stylus- 
based  unistroke  text-entry  method  called  EdgeWrite  designed 
for  people  with  tremors.  EdgeWrite  relies  on  physical  edges  and 
corners  to  provide  stability  during  motion.  Users  move  a  stylus,
finger,  or  joystick  along  the  physical  edges  and  into  the  corners
of  a  square  bounding  the  input  area.  The  system  recognizes
the  input  based  on  the  order  in  which  the  corners  are  hit.  Seven
power  wheelchair  users  joined  the  participatory  design  process 
and  evaluated  the  two  techniques.  Although  data  were  too  lim¬ 
ited  for  a  complete  statistical  analysis,  the  results  showed  po¬ 
tential  for  the  target  population.  The  touchpad  application  was 
faster  than  the  joystick  application,  allowing  users  to  generate 
an  average  of  one  word  per  minute. 
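
Recognition by corner order can be sketched in a few lines. The code below is an illustration under stated assumptions: the corner names, the `margin` parameter, and in particular the `GESTURES` table are hypothetical and do not reproduce the actual EdgeWrite alphabet.

```python
def nearest_corner(x, y, size=1.0):
    """Name the corner of a size-by-size square closest to (x, y).

    The origin is the top-left corner, with y increasing downward.
    """
    vertical = "T" if y < size / 2 else "B"
    horizontal = "L" if x < size / 2 else "R"
    return vertical + horizontal


def corner_sequence(points, size=1.0, margin=0.2):
    """Return the order in which corners are hit along a stroke."""
    sequence = []
    for x, y in points:
        near_x = x <= margin * size or x >= (1 - margin) * size
        near_y = y <= margin * size or y >= (1 - margin) * size
        if not (near_x and near_y):
            continue  # sample lies along an edge or in the middle, not in a corner
        corner = nearest_corner(x, y, size)
        if not sequence or sequence[-1] != corner:
            sequence.append(corner)  # record each corner once per visit
    return tuple(sequence)


# Hypothetical gesture table for illustration; not the real EdgeWrite alphabet.
GESTURES = {
    ("TL", "BL", "BR"): "l",
    ("TR", "TL", "BL", "BR"): "c",
}


def recognize(points):
    """Look up the character for a stroke, or None if the sequence is unknown."""
    return GESTURES.get(corner_sequence(points))
```

Because only the sequence of corners matters, the path taken between corners can wobble without changing the result, which is the property that makes this style of input tolerant of tremor.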

Speech  recognition  based  text  entry.  Goette  (1998) 
conducted  a  field  study  to  identify  factors  that  influence  the  suc¬ 
cessful  adoption  of  speech  recognition  software  for  both  dicta¬ 
tion-oriented  and  environmental  control  tasks.  Individuals  with 
various  health  conditions  (e.g.,  MS,  MD,  SP,  arthritis)  partici¬ 
pated.  Most  individuals  who  stopped  using  the  software  (53%) 
did  so  within  three  months.  Interestingly,  those  individuals  who 
successfully  adopted  the  software  had  higher  expectations  for 
the  system  and  expected  greater  benefits,  but  also  believed  the 
system  would  be  easy  to  use  as  compared  to  those  individuals 
who  stopped  using  the  software.  When  the  software  was  suc¬ 
cessfully  adopted,  it  was  used  for  a  wide  variety  of  computer- 
based  tasks  rather  than  a  few  isolated  activities.  Four  guidelines 
were  proposed  for  successful  outcomes:  (a)  managing  expecta¬ 
tions  by  understanding  the  potential  benefits  and  limitations, 
(b)  selecting  the  correct  system  for  the  tasks  to  be  accom¬ 
plished,  (c)  obtaining  thorough  training,  and  (d)  trying  the  sys¬ 
tem  for  an  extended  period  before  purchasing  it. 

Pieper  and  Kobsa  (1999)  developed  a  speech  system  to  allow 
severely  impaired  users  who  are  bedridden  to  interact  with  the 
computer.  The  authors  implemented  several  modifications  to 
a  commercially  available  system  to  suit  the  usage  environment.
The  output  of  the  computer  was  displayed  on  the  ceiling,
and  the  speech  recognition  system  was  configured  to  accommodate
the  use  of  respiratory  devices.  One  patient  who  had  had  ALS  for  five
years  used  the  system  for  seven  months  to  write  poetry,  diaries,
and  letters.  The  results  showed  that  the  projected  video  display
was  acceptable  to  the  user.  Overall,  the  user  was  satisfied  with 
the  system. 

Manasse,  Hux,  and  Rankin-Erickson  (2000)  studied  the  use 
of  speech  recognition  to  help  a  19-year-old  survivor  of  traumatic 
brain  injury  (TBI).  This  study  focused  on  text  generation  tasks 
using  Dragon  NaturallySpeaking.  This  individual  experienced  a 
recognition  error  rate  of  approximately  20%.  The  participant 
quickly  and  easily  mastered  navigation  and  error-correction 
commands.  Interestingly,  the  participant  generated  more  text 
using  the  standard  keyboard  than  she  did  using  speech  recog¬ 
nition,  but  she  generated  more  complex  sentences  when  using 
the  speech-recognition  system.  Overall,  minimal  qualitative  dif¬ 
ferences  were  observed  in  the  resulting  text. 

Bain,  Basson,  and  Wald  (2002)  focused  on  using  speech
recognition  to  generate  lecture  notes  for  students  in  the  class¬ 
rooms.  Soft-  and  hardcopy  notes  can  provide  an  alternative  to 
traditional  note  taking  for  people  with  PI  or  perceptual  impair¬ 
ments.  More  recently,  Hewitt,  Lyon,  Britton,  and  Mellor  (2005) 
developed  the  “SpeakView”  system,  which  provides  speech-to- 
text  displays  for  meetings  and  in  classrooms.  Key  issues  that 
were  addressed  by  the  “SpeakView”  system  included  automatic
punctuation,  the  style  and  acceptability  of  the  captions,
usability,  and  training.  Sixty-nine  students  evaluated  the
system  by  observing  two  presentations  and  answering  prede¬ 
fined  questions.  Captions  produced  by  “SpeakView”  were  pro¬ 
vided  for  one  of  the  presentations.  The  use  of  captions  did  not 
influence  the  number  of  questions  answered  correctly  by  the 
students,  but  students  did  prefer  presentations  with  captions. 
Students  indicated  that  the  captions  made  the  talk  easier  to  un¬ 
derstand  and  that  they  felt  the  captions  helped  them  answer 
more  questions  correctly.  This  system,  if  fully  implemented  with 
acceptable  accuracy,  has  the  potential  to  benefit  individuals  with 
PI  or  hearing  impairments  who  find  taking  notes  difficult. 

Koester  (2004)  investigated  the  use  of  automatic  speech- 
recognition  technology  by  experienced  users.  Twenty-four  users 
with  PI  that  hindered  the  use  of  the  keyboard  and  mouse,  with 
six  months  of  experience  with  ASR,  participated  in  this  study. 
Each  participant  completed  a  series  of  predefined  tasks  using 
two  input  methods:  speech  and  nonspeech.  Participants  re¬ 
sponded  to  a  survey  about  their  usage  patterns.  The  results 
showed  that  speech  was  significantly  slower  for  command  input, 
but  there  was  no  significant  difference  in  text  entry  rates  be¬ 
tween  speech  and  nonspeech  input.  There  was  a  wide  variation 
among  text  entry  rates,  ranging  from  3-32  words  per  minute. 
There  was  also  a  wide  variation  in  recognition  accuracy,  ranging 
from  72%-94%.  The  study  confirmed  that  recognition  errors  are
a  major  limitation  for  speech-based  input,  with  users  spending 
an  average  of  56%  of  their  time  on  error  correction.  Interestingly, 
the  participants  who  achieved  the  best  performance  were  those 
who  tended  to  employ  the  prescribed  correction  strategies. 

Eye-gaze-based  typing.  Lankford  (2000)  developed  an 
eye-gaze  interface  that  can  support  both  text  entry  and  cursor 
control  tasks.  The  system  used  dwell  time  to  provide  the  user 
with  access  to  various  mouse  clicking  actions.  However,  because 
of  the  limitations  of  the  human  eye,  gaze-based  clicking  was  not 
able  to  reliably  control  the  Windows  environment,  particularly 
for  users  with  PI.  To  improve  the  accuracy  of  gaze-based  clicking,
a  zooming  interface  was  developed  to  enlarge  the  objects
when  needed.  Key  selection  was  accomplished  by  prolonged 
fixation  on  the  keyboard  key.  The  activation  time  could  be  cus¬ 
tomized  to  suit  individual  users.  A  list  of  possible  words  to  insert 
into  a  document,  based  on  what  the  user  had  already  typed,  was 
provided  to  speed  up  typing  and  reduce  errors.  No  user  study 
has  been  reported  concerning  the  effectiveness  of  this  system. 
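
Dwell-based key selection with a customizable activation time can be sketched as follows. This is a minimal illustration, not Lankford's implementation: the sample format `(timestamp_ms, key)` and the default of 800 ms are assumptions made here.

```python
def dwell_select(samples, dwell_ms=800):
    """Return the keys selected by prolonged fixation.

    samples: iterable of (timestamp_ms, key) pairs ordered by time, where
    key is the on-screen key under the gaze point, or None if the gaze is
    not on any key. dwell_ms is the activation time, which can be tuned
    per user.
    """
    selected = []
    current, start, fired = None, None, False
    for t, key in samples:
        if key != current:
            # Gaze moved to a different key (or off the keyboard): restart the timer.
            current, start, fired = key, t, False
        elif key is not None and not fired and t - start >= dwell_ms:
            selected.append(key)
            fired = True  # the gaze must leave the key before it can fire again
    return selected
```

A shorter `dwell_ms` speeds up typing but raises the risk of unintended selections while the user is merely looking at a key, which is why the activation time is worth exposing as a per-user setting.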

Hansen,  Itoh,  Tørning,  Aoki,  and  Johanson  (2004)  investigated
the  effectiveness  of  gaze  typing  compared  to  hand  typing
and  head  tracking.  Twenty-seven  participants  without  PI  evalu¬ 
ated  the  three  modalities  with  two  interfaces  in  different  lan¬ 
guages.  Each  participant  entered  predefined  text.  Participants 
achieved  a  typing  speed  of  6.22  words  per  minute.  The  eye-typing
system  resulted  in  an  error  rate  of  6%  with  the  Japanese  interface
and  28%  with  the  Danish  interface.  Significant  learning
effects  were  observed  for  both  interfaces.  Overall,  gaze  typing 
was  the  least  efficient  and  least  satisfying  input  modality  among 
the  three  alternatives.  The  potential  of  gaze-based  typing
and  its  advantages  compared  to  other  input  modalities  need  to
be  investigated  by  including  users  with  PI,  especially  those  with 
severe  PI  for  whom  gaze  typing  may  be  one  of  the  few  alterna¬ 
tives  that  are  available. 


ADOPTION  OR  ABANDONMENT 
OF  ASSISTIVE  TECHNOLOGIES 


It  has  been  estimated  that  over  1,000  assistive  technology  prod¬ 
ucts  come  to  the  market  each  year  (Scherer  &  Galvin,  1996).  Un¬ 
fortunately,  the  adoption  rates  of  assistive  technology  products 
are  highly  disappointing.  Many  new  technologies  never  reach 
the  majority  of  the  intended  users  due  to  a  lack  of  marketing 
activities  or  cost.  Even  when  the  technologies  reach  the  in¬ 
tended  users,  many  are  abandoned  within  a  short  period  of 
time.  Understanding  why  users  adopt  or  abandon  assistive  tech¬ 
nologies  is  critical  for  the  design,  implementation,  and  market¬ 
ing  of  these  technologies.  A  number  of  studies  that  can  provide 
valuable  insights  to  researchers,  designers,  as  well  as  healthcare 
providers  are  summarized  below.

Scherer  (1996)  studied  adoption  rates  of  general  assistive 
technologies  and  found  that  many  people  abandon  those  tech¬ 
nologies.  Abandonment  rates  ranged  from  8%  for  life-support 
devices  to  75%  for  hearing  aids,  depending  on  the  type  of  de¬ 
vice.  On  average,  about  one  third  of  all  assistive  technologies 
were  abandoned.  However,  it  was  also  observed  that  many  peo¬ 
ple  who  continue  using  the  technology  are  unhappy  or  uncom¬ 
fortable  with  the  experience,  suggesting  that  the  underlying 
problems  are  even  more  significant  than  the  abandonment  rates 
would  suggest.  It  was  observed  that  most  abandonment  occurs 
within  the  first  year,  especially  the  first  three  months,  or  after 
five  years  of  use  (normally  because  insurance  policies  only  sup¬ 
port  the  first  five  years  of  use).  This  suggests  that  users  learn  rel¬ 
atively  quickly  whether  the  specific  assistive  technology  works 
for  them.  If  not,  they  abandon  it  quickly.  The  author  believes 
that  the  most  significant  factor  that  contributes  to  the  abandon¬ 
ment  of  assistive  technology  is  that  the  device  does  not  meet 
the  users’  needs  or  expectations. 

King  (1999)  discussed  the  factors  leading  to  the  failure  of  as¬ 
sistive  technologies,  highlighting  three  major  categories  of  fail¬ 
ures.  The  first  category  relates  to  the  people  surrounding  the 
user.  For  the  people  surrounding  the  user,  correctly  evaluating 
and  communicating  the  users’  condition  is  critical  since  many  as¬ 
sistive  technologies  fail  because  the  users’  real  needs  are  not 
known  by  the  designers.  The  second  category  relates  to  the 
users  themselves.  From  the  perspective  of  the  assistive  technol¬ 
ogy  users,  design  efforts  should  focus  on  matching  the  device’s 
features  to  the  user’s  needs,  instead  of  the  user  to  the  device. 
The  users’  age,  gender,  and  literacy  can  all  play  important  roles 
in  the  adoption  process.  The  third  category  relates  to  the  assistive
technology  device  itself.  Insufficient  consideration
of  human  factors  during  the  design  process
has  caused  many  assistive  technologies  to  fail.  Considering  ease 
of  use  and  portability,  the  designers  should  focus  on  lighter  and
smaller  solutions  that  provide  easy  access  to  the  power  supply.  In 
addition,  assistive  devices  need  to  be  durable  and  repairable. 

Kintsch  and  DePaula  (2002)  proposed  a  framework  for  facili¬ 
tating  the  successful  adoption  of  assistive  technologies.  The 
framework  proposed  four  groups  of  participants  in  the  adoption 
process:  users,  caregivers,  designers,  and  assistive  technology 
specialists.  All  four  groups  of  participants  need  to  be  actively  in¬ 
volved  to  support  the  adoption  of  assistive  technologies.  In  the 
development  phase,  developers  need  to  customize  the  product 
to  the  users’  needs  and  abilities,  make  products  simple  and 
durable,  and  take  into  consideration  user  preferences.  In  the  se¬ 
lection  phase,  products  need  to  be  assessed  carefully  by  assistive 
technology  specialists.  The  effectiveness  of  the  product  for  the 
specific  user  must  be  evaluated,  currently  through  a  trial  and  re¬ 
assessment  process.  In  the  learning  phase,  caregivers  need  to  be 
trained  with  the  product  first  so  they  can  teach  and  support  the 
user.  In  the  integration  phase,  which  is  when  the  device  becomes 
integrated  into  daily  life,  special  patience  is  needed  since  tasks 
may  initially  require  more  time  as  the  user  gains  experience. 

Michaels  and  McDermott  (2003)  focused  on  the  adoption  of 
assistive  technologies  for  students  in  the  special  education  pro¬ 
grams.  The  authors  conducted  interviews  with  employees  from 
356  special  education  programs  and  collected  143  surveys  from 
special  education  teachers.  The  results  show  that  the  major  barriers
to  assistive  technology  adoption  in  special  education  programs
include  funding,  time  limitations,  lack  of  faculty  knowledge
or  consistency  in  assistive  technology  focus,  perceptions
that  assistive  technology  is  only  important  for  students  with  cer¬ 
tain  types  of  disabilities,  and  lack  of  short-  and  long-term  plans. 

Goette  (2000)  studied  the  use  of  speech-recognition  tech¬ 
nology  in  organizations  and  the  key  factors  that  contributed  to 
successful  adoption.  A  total  of  40  individuals  with  PI  or  sensory 
disabilities  who  were  using  or  had  used  speech  recognition 
technologies  were  interviewed.  Twenty-three  of  the  participants 
had  successfully  adopted  the  speech-recognition  technology, 
while  17  participants  had  abandoned  it.  The  results  suggest  that 
participants  who  successfully  adopted  the  technology  had  real¬ 
istic  expectations  about  the  adoption  process.  Importantly,  they 
expected  to  devote  more  time  and  effort  than  stated  in  the  user 
manual  to  become  proficient  with  the  technology.  For  those 
users  who  had  abandoned  the  technology,  unrealistic  employer 
expectations  were  one  of  the  key  reasons  for  abandonment.


CONCLUSIONS 


The  most  obvious  conclusion  is  that  additional  research  is  nec¬ 
essary.  PI  vary  dramatically  in  terms  of  severity,  temporal  vari¬ 
ability,  and  the  body  parts  that  are  affected.  Numerous  tech¬ 
nologies  can  be  used  in  a  variety  of  configurations.  Given  this 
variability  and  the  limited  number  of  studies  reported  in  the  lit¬ 
erature,  it  is  clear  that  only  a  fraction  of  the  important  questions 
have  been  investigated.  Unfortunately,  conducting  informative 
studies  involving  individuals  with  PI  can  be  difficult.  Unlike  the  pool
of  computer  users  with  no  PI,  the  pool  of  potential  participants  is
limited.  Even  when  appropriate  participants  can  be  found,  ad¬ 
ditional  factors  can  make  such  studies  difficult. 


While  additional  research  is  needed,  the  existing  literature 
does  provide  insights  that  can  prove  useful  to  both  practitioners 
and  researchers.  Individually,  the  articles  highlight  potential 
benefits  and  limitations  of  various  technologies  for  specific 
groups  of  individuals  as  well  as  new  technologies  that  may 
prove  useful  for  specific  tasks  or  individuals  in  the  future.  When
viewed  as  a  whole,  several  lessons  that  extend  beyond  the  tech¬ 
nology  or  specific  user  groups  studied  become  clear.  We  con¬ 
clude  by  highlighting  five  such  lessons. 

PI  Does  Not  Imply  Disability 

Impairments  can,  but  do  not  always,  result  in  disabilities  (see  Fig. 
42.1).  Several  of  the  studies  discussed  earlier  demonstrate  this 
fact.  Casali  (1992)  included  three  groups  of  participants:
individuals  with  PI  resulting  in  low  motor  skills,  individuals  with  PI  resulting
in  high  motor  skills,  and  individuals  with  no  PI.  The  high-motor-skills  group  was
able  to  complete  cursor  manipulation  tasks  just  as  quickly  as  the 
group  with  no  PI.  This  suggests  that  the  high-motor-skills  group 
did  not  experience  any  disability  because  of  their  PI.  Similarly, 
Sears,  Karat,  Oseitutu,  Karimullah,  and  Feng  (2001)  described  a 
study  in  which  individuals  with  high-level  spinal-cord  injuries 
were  able  to  compose  text  just  as  quickly  as  individuals  without 
PI.  Several  other  studies  provide  examples  where  technology  can 
either  reduce  or  eliminate  disabilities  resulting  from  specific  PI 
(e.g.,  Keates,  Langdon,  Clarkson,  &  Robinson,  2000).  Overall, 
these  results  clearly  indicate  that  it  is  possible,  under  the  right 
circumstances,  to  design  technology  such  that  specific  PI  will  not 
interfere  with  certain  computing  activities. 

PI  Affect  Cognitive,  Perceptual,  and  Motor  Activities 

PI  are  expected  to  affect  an  individual’s  ability  to  complete  basic 
motor  activities,  but  are  not  necessarily  expected  to  affect  perfor¬ 
mance  on  cognitive  and  perceptual  activities.  Many  of  the  articles 
discussed  earlier  confirm  differences  in  performance  for  motor 
activities  (e.g.,  LoPresti,  Brienza,  Angelo,  Gilbertson,  &  Sakai, 
2000;  Radwin,  Vanderheiden,  &  Lin,  1990).  More  importantly,
the  unexpected  effect  of  PI  on  cognitive  and  perceptual  activities
becomes  apparent  by  analyzing  both  fundamental  interactions 
and  the  high-level  strategies  adopted  by  individuals  with  PI. 

Sears,  Karat,  Oseitutu,  Karimullah,  and  Feng  (2001)  found 
that  individuals  with  and  without  PI  responded  differently  when 
using  the  same  speech  recognition  system  to  accomplish  a  va¬ 
riety  of  tasks.  Individuals  with  PI  interrupted  their  dictation 
more  often,  spent  a  greater  percentage  of  their  time  dictating  as 
opposed  to  issuing  commands,  and  navigated  shorter  distances. 
The  results  reported  by  Koester  and  Levine  (1994)  also  highlight 
differences  between  individuals  with  and  without  PI  with  regard 
to  both  basic  cognitive,  perceptual,  and  motor  activities  as  well 
as  higher-level  strategies.  These  studies  confirm  the  importance 
of  integrating  individuals  with  PI  into  the  process  when  design¬ 
ing  new  computer  systems.  Existing  models  of  computer  user 
behavior  are  based  on  data  from  individuals  without  PI,  but  it  is 
clear  that  new  models  that  integrate  data  from  individuals  with 
PI  are  necessary  (Koester  &  Levine,  1994).  These  new  models 
must  address  differences  not  only  for  motor  activities,  but  for 
cognitive  and  perceptual  activities  as  well. 

Basic  Actions  Can  Be  Difficult

A  number  of  articles  have  highlighted  the  difficulty  individuals 
with  PI  may  experience  with  even  the  most  basic  interaction  ac¬ 
tivities.  Trewin  and  Pain  (1999)  highlighted  a  variety  of  difficul¬ 
ties  with  activities  as  fundamental  as  pressing  keys  on  a  key¬ 
board  or  pointing  at  and  clicking  on  objects  with  a  mouse.  Casali 
(1992)  also  illustrated  the  difficulty  individuals  with  PI  could 
have  with  activities  that  require  the  mouse  button  to  be  held 
down  while  dragging  the  mouse.  Unfortunately,  these  difficul¬ 
ties  become  even  more  significant  as  the  target  of  the  action 
gets  smaller  (Casali,  1992).  For  example,  Radwin,  Vanderheiden, 
and  Lin  (1990)  showed  that  performance  with  head-controlled 
devices  was  affected  by  target  size  even  more  than  performance  with  the  mouse.

Standardized  Descriptions  of  Health  Conditions  and  PI 

Perhaps  the  greatest  hindrance  for  both  practitioners  and  re¬ 
searchers  is  the  lack  of  accepted  standards  for  describing  the 
capabilities  and  limitations  of  the  individuals  who  will  make  use 
of  the  new  solutions  that  are  developed.  For  studies  including 
traditional  computer  users  without  PI,  we  typically  provide  basic 
demographics  such  as  age,  gender,  and  computer  experience. 
For  these  users,  we  typically  assume  “normal”  perceptual,  cog¬ 
nitive,  and  motor  abilities.  Occasionally,  researchers  even  may 
report  that  participants  all  had  “normal”  or  “corrected  to  nor¬ 
mal”  vision  or  hearing  if  these  abilities  are  particularly  relevant 
for  a  given  study.  For  individuals  with  PI  (or  cognitive  or  perceptual
impairments,  for  that  matter),  we  need  to  describe  the
individual’s  perceptual,  cognitive,  and  motor  abilities.  Unfortu¬ 
nately,  existing  studies  rarely  provide  sufficient  details  regarding 
health  conditions  or  associated  PI.  Many  articles  simply  list  a 
health  condition  for  each  participant  (e.g.,  CP,  choreoathetoid
CP,  incomplete  SCI  at  C6,  Friedreich’s  ataxia).  While  this  provides
a  general  sense  as  to  the  PI  that  may  exist,  it  does  not  provide  a 
sufficiently  detailed  understanding  of  the  participants’  abilities 
and  limitations.

When  a  health  condition  results  in  a  well-defined  set  of  PI,  a
precise  description  of  the  underlying  health  condition  may  be
sufficient.  For  example,  an  SCI  at  or  above  C5  with  an  ASIA  score
of  A  or  B  results  in  paralysis  of  the  muscles  used  to  control  the 
arms  and  legs.  Given  this  health  condition,  we  know  that  an  in¬ 
dividual  would  not  be  able  to  use  her  hands  or  arms  when  in¬ 
teracting  with  a  computer.  In  contrast,  an  SCI  at  or  above  C5, 
with  an  ASIA  score  of  C  or  D,  results  in  much  more  ambiguous 
PI.  Similarly,  indicating  that  an  individual  has  CP  provides  gen¬ 
eral  insights,  but  the  resulting  PI  can  vary  dramatically.  When  a 
health  condition  does  not  map  to  a  precise  set  of  PI  that  are  sta¬ 
tic,  it  is  critical  to  describe  both  the  health  condition  and  the 
resulting  PI  (and  any  other  impairments  that  may  affect  inter¬ 
actions  with  computers). 

Health  conditions  are  best  described  by  physicians.  Physical 
examinations  can  often  provide  the  necessary  information,  but 
access  to  medical  records  may  be  sufficient.  Ideally,  a  standard¬ 
ized  scale  will  be  used  to  describe  the  health  condition,  such  as 
the  ASIA  scale  that  is  used  to  describe  the  completeness  of  a 
spinal  cord  injury.  However,  such  assessments  focus  on  the  status
of  the  individual’s  health,  not  necessarily  on  their  physical  abilities.
Understanding  an  individual’s  health  can  provide  insights
into  changes  that  may  occur  in  the  future.  Unfortunately,
most  of  the  time  health  conditions  do  not  map  directly  to  a  well- 
defined  set  of  PI. 

In  these  situations,  an  occupational  therapist  can  become  in¬ 
volved  in  assessing  each  participant’s  PI.  Occupational  thera¬ 
pists  can  evaluate  specific  skills  or  abilities,  including  sensory 
motor,  cognitive,  and  motor  skills  as  necessary.  The  primary  fo¬ 
cus  should  be  on  those  skills  and  abilities  that  may  affect  the 
individual’s  ability  to  interact  with  computers.  While  numerous 
standardized  tests  exist,  many  of  these  tests  focus  on  activities 
that  are  not  directly  relevant  to  the  use  of  information  tech¬ 
nologies.  Assessments  by  occupational  therapists  focus  on  the 
impairments  an  individual  experiences  that  may  affect  an  indi¬ 
vidual’s  ability  to  perform  certain  activities  without  regard  to  the 
underlying  health  condition.  These  assessments  focus  on  an  in¬ 
dividual’s  current  status  and  therefore  do  not  address  changes 
that  may  occur  due  to  the  progression  of  a  disease.  New  ap¬ 
proaches  for  assessing  and  documenting  the  physical  capabili¬ 
ties  of  individuals,  that  are  relevant  for  interactions  with  infor¬ 
mation  technologies,  could  prove  useful  in  a  variety  of  contexts. 
Effective  solutions  could  be  used  by  rehabilitation  specialists  as 
they  identify  appropriate  technologies  for  patients  as  well  as 
researchers  seeking  to  provide  more  comprehensive  descrip¬ 
tions  of  the  participants  in  their  research  studies. 

Successful  Adoption  of  Assistive 
Technologies  Is  Critical 

Assistive  technologies  are  critical  for  helping  users  with  PI  in¬ 
teract  with  computer  systems,  but  current  adoption  rates  are 
highly  disappointing.  Scherer  (1996)  indicated  that  abandon¬ 
ment  rates  for  assistive  technologies  range  from  8%  to  75%,  with 
approximately  one  third  of  all  assistive  technologies  being  aban¬ 
doned.  It  is  also  clear  that  the  majority  of  new  solutions  are 
abandoned  relatively  soon  after  the  initial  interaction.  King 
(1999)  suggested  that  successful  adoption  of  assistive  technolo¬ 
gies  requires  sufficient  consideration  of  human  factors  when  de¬ 
signing  devices  and  active  involvement  of  both  the  end-user  and 
the  people  surrounding  the  user,  such  as  the  caregivers  and 
healthcare  providers.  Kintsch  and  DePaula  (2002)  also  empha¬ 
sized  the  importance  of  active  involvement  by  the  end-users, 
caregivers,  designers,  and  assistive  technology  specialists.  To  de¬ 
sign  and  distribute  effective  assistive  technologies  that  users  will 
readily  adopt,  active  user  involvement  and  sufficient  consideration
of  human  factors  are  required.


FUTURE  DIRECTIONS 


The  existing  literature 

•  Highlighted  the  potential  benefits  and  limitations  of  various 
technologies  for  specific  groups  of  individuals; 

•  Provided  examples  of  situations  in  which  PI  did  not  result  in 
disabilities; 

•  Illustrated  the  potential  impact  of  PI  on  cognitive,  perceptual, 
and  motor  activities; 

•  Confirmed  the  difficulties  individuals  with  PI  may  experience
with  basic  actions  such  as  pressing  keys  and  clicking  the 
mouse  button,  as  well  as  more  complex  actions; 

•  Reinforced  the  need  for  standardized  descriptions  of  health
conditions  and  PI; 

•  Highlighted  how  infrequently  assistive  technologies  are  suc¬ 
cessfully  adopted  as  well  as  some  of  the  underlying  problems 
that  lead  to  abandonment. 

This  same  literature  confirmed  the  potential  of  various  new 
technologies  while  highlighting  the  need  for  additional  re¬ 
search.  Several  studies  illustrated  the  use  of  body  parts  that  are 
normally  ignored  when  designing  computer  interfaces  (e.g., 
Shaw,  Loomis,  &  Crisman,  1995;  Salem  &  Zhai,  1997).  Others  explored
the  potential  of  using  electrophysiological  data  to  control
computers  (e.g.,  Kennedy,  Bakay,  Moore,  Adams,  &  Goldwaithe,
2000;  Kübler,  Kotchoubey,  Hinterberger,  Ghanayim,
Perelmouter,  Schauer,  et  al.,  1999;  Doherty,  Cockton,  Bloor,  & 
Benigno,  2000).  These  studies  were  exploratory,  but  they  high¬ 
light  the  potential  of  these  new  technologies,  especially  for  in¬ 
dividuals  with  severe  PI. 

We  must  continue  to  explore  technologies  that  support  al¬ 
ternative  methods  of  interacting  with  computers,  but  we  also 
need  a  better  understanding  of  both  the  potential  and  limita¬ 
tions  of  existing  technologies.  Existing  studies  provide  examples 
where  individuals  with  PI  were  just  as  fast  and  accurate  as  indi¬ 
viduals  with  no  PI,  but  important  differences  were  also  apparent 
for  satisfaction  ratings  and  the  strategies  that  the  individuals 
with  PI  employed.  Other  studies  highlighted  situations  in  which 
PI  resulted  in  longer  task  completion  times,  higher  error  rates, 
or  more  subtle  differences  that  only  became  apparent  when 
performance  was  analyzed  in  detail.  Future  studies  should  attend
to  changes  in  cognitive,  perceptual,  and  motor  activities,
which  may  reveal  themselves  through  task  completion  times, 
error  rates,  satisfaction  ratings,  or  through  changes  in  the  strate¬ 
gies  that  individuals  adopt  when  completing  a  task.  To  allow 
both  researchers  and  practitioners  to  more  effectively  interpret 
and  generalize  the  results  of  such  studies,  we  must  carefully 
document  both  the  health  conditions  and  the  PI  of  study  par¬ 
ticipants.  Ideally,  new  standardized  methods  for  describing  the 
physical  capabilities  of  study  participants  will  be  developed  and 
adopted. 

While  research  into  the  efficacy  of  various  technologies  for  in¬ 
dividuals  with  PI  is  important,  studies  that  assess  the  outcomes 
when  assistive  technologies  are  actually  used  are  also  critical 
(Fuhrer,  2001).  Research  studies  highlight  the  potential  of  a  tech¬ 
nology,  but  do  not  guarantee  the  success  of  a  technology  when 
used  in  the  field.  Outcomes  research  will  ultimately  determine 
the  success  or  failure  of  the  technologies  developed  by  the  HCI 
community  for  individuals  with  PI.  As  a  result,  it  is  important  for 
HCI  researchers  and  system  developers  to  work  closely  with 
those  rehabilitation  professionals  who  evaluate  the  needs  of  in¬ 
dividuals  with  PI,  determine  which  technologies  they  should  be 
using,  and  assess  the  success  or  failure  of  these  technologies. 


INTRODUCTION


The  introduction  of  early  computers  facilitated  new  ways  for 
individuals  with  visual  impairment  to  access  information  elec¬ 
tronically:  magnified,  in  Braille,  or  aurally  via  the  conversion  of 
digital information. However, the introduction of GUIs that present
digital information via visual metaphors and icons contributed
to the digital divide, which can hamper the productivity
of this population. In many cases, even access to documents
and forms can be difficult or impossible when they are available
through the direct manipulation paradigm (Fortuin & Omata,
2004).  The  exclusive  reliance  of  GUIs  on  the  visual  interaction 
paradigm  therefore  threatens  to  limit  accessibility  for  anyone 
whose  visual  channel  is  compromised  (Dix,  Finlay,  Abowd,  & 
Beale,  1998).  This  chapter  provides  readers  with  (a)  an  intro¬ 
duction  to  the  visual  sensory  channel;  (b)  a  review  of  research 
approaches,  models,  and  theories  that  are  relevant  to  HCI 
and  visual  impairment;  and  (c)  a  discussion  of  forms  of  visual 
dysfunction  in  the  research,  design,  and  evaluation  of  human- 
computer  systems. 

The interaction strategies and related interaction barriers for
individuals with visual impairments have received growing
attention in the past 15 years in an attempt to inform judicious,
inclusive design for accessible information technologies (e.g.,
Assistive Technologies for Independent Aging: Opportunities and
Challenges, 2004; Arditi, 2002; Brewster, Wright, & Edwards,
1994; Craven, 2003; Fortuin & Omata, 2004; Fraser &
Gutwin,  2000;  Gaver,  1989;  Jacko,  1999;  Jacko,  Barnard,  et  al., 
2004;  Jacko,  Barreto,  Scott,  Chu,  et  al.,  2002;  Jacko,  Barreto, 
Scott,  Rosa,  &  Pappas,  2000;  Jacko,  Moloney,  et  al.,  2005;  Jacko, 
Rosa,  Scott,  Pappas,  &  Dixon,  1999;  Jacko  &  Sears,  1998).  Visual 
impairments  (that  do  not  lead  to  blindness)  can  create  barriers 
to  distinguishing  fine  details  of  iconic  screen  targets  and  to 
tracking  the  highly  dynamic  nature  of  the  pointer  used  to  ma¬ 
nipulate  these  icons  (Fraser  &  Gutwin,  2000).  This  is  largely  at¬ 
tributed  to  difficulty  in  manipulating  objects  with  the  pointer 
due  to  reduced  visual  acuity  and  visual  field. 

In  terms  of  the  visual  sensory  channel,  it  is  known  that  user 
behavior  is  strongly  influenced  by  the  nature  and  amount  of 
residual  vision  the  user  experiences  in  combination  with  com¬ 
puter  interface  characteristics.  As  an  extreme  example,  a  blind 
user  without  any  functional  vision  will  use  fundamentally  dif¬ 
ferent  coping  skills  to  navigate  an  interface  as  compared  to  an 
individual  with  clouded  vision  due  to  cataracts  (Jacko  &  Sears, 
1998).  Harper,  Goble,  and  Stevens  (2001)  emphasized  that  the 
differences  in  orientation,  navigation,  travel,  and  mobility  of  vi¬ 
sually  impaired  versus  sighted  individuals  should  be  considered 
in  the  design  of  technology  because  there  are  differences  in  the 
mental  map  and  cognitive  processes  that  occur  across  the  spec¬ 
trum  of  visual  abilities. 

The impetus for this chapter is twofold. First, the number of
individuals  who  report  low  vision  is  anticipated  to  rise  sharply 
with  the  aging  baby  boomers  (who  are,  on  average,  living  longer) 
as  they  experience  age-related  changes  to  their  functional  vision 
(e.g.,  reduced  visual  acuity,  presbyopia,  contrast  sensitivity,  color 
sensitivity,  depth  perception,  and  glare  sensitivity).  They  are  in¬ 
creasingly  predisposed  to  acquire  ocular  diseases  associated  with 
older age (e.g., macular degeneration, diabetic retinopathy,
glaucoma, and cataracts; for a review, see Orr, 1998; Schieber,
1994). Second, the digital divide imposed on the population
with  visual  impairments  has  been  measured  in  terms  of  technol¬ 
ogy  access  and  unemployment  (Gerber  &  Kirchner,  2001).  Look¬ 
ing  beyond  the  United  States,  as  the  need  for  information  in¬ 
creases  globally  so  does  the  diversity  of  the  people  requiring 
access.  As  a  result,  a  potentially  large  number  of  users  may  be  dis¬ 
advantaged  with  respect  to  gaining  access  to  a  variety  of  types 
of  information  without  adequate  accommodations. 

The  framework  for  the  structure  of  this  chapter  results  from 
an  HCI  approach  first  introduced  by  Jacko  and  Vitense  (2001), 
and further clarified by Jacko, Vitense, and Scott (2002). Initial
work  in  this  research  area  included  a  comprehensive  review  of 
the  literature  to  facilitate  the  development  of  a  categorization 
scheme  to  account  for  categories  of  impairment.  From  this  lit¬ 
erature  review,  five  major  categories  emerged:  (a)  hearing  im¬ 
pairments,  (b)  mental  impairments,  (c)  physical  impairments, 
(d)  speech  impairments,  and  (e)  visual  impairments.  Fig.  43.1 
illustrates that each of the five overarching categories is composed
of a collection of clinical diagnoses unique to that category
(depicted in Fig. 43.1 by A1, ..., An; B1, ..., Bn; C1, ..., Cn;
D1, ..., Dn; and E1, ..., En). Each diagnosis, in turn, influences certain
functional capabilities that are critical to the access of information
technologies (depicted in Fig. 43.1 by Y1, ..., Yn). A subset of
these  functional  capabilities  can  be  directly  linked  to  specific 
classes  of  technologies  (shown  at  the  bottom  of  the  diagram). 
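
The three-layer structure described above (diagnoses, functional capabilities, technology classes) can be sketched as a pair of lookup tables. This is only an illustrative reading of the framework, not a published implementation: the class name `AccessFramework`, the method `impeded_technologies`, and every label in the example data are hypothetical and do not correspond to entries in Fig. 43.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessFramework:
    """Diagnosis -> functional capabilities -> technology classes."""

    # Functional capabilities degraded by each clinical diagnosis.
    diagnosis_to_capabilities: dict
    # Technology classes whose use depends on each functional capability.
    capability_to_technologies: dict

    def impeded_technologies(self, diagnosis):
        """Return the technology classes a diagnosis may impede access to."""
        impeded = set()
        for capability in self.diagnosis_to_capabilities.get(diagnosis, ()):
            impeded |= set(self.capability_to_technologies.get(capability, ()))
        return impeded


# Hypothetical labels, chosen only for illustration.
framework = AccessFramework(
    diagnosis_to_capabilities={
        "diagnosis_E": {"capability_1", "capability_2"},
    },
    capability_to_technologies={
        "capability_1": {"tech_1", "tech_2", "tech_3", "tech_4"},
        # capability_2 is linked to no technology class in this sketch.
    },
)

print(sorted(framework.impeded_technologies("diagnosis_E")))
```

Keeping the two mappings separate mirrors the point that only a subset of functional capabilities is linked to technology classes, so a decrement in an unlinked capability contributes nothing to the result.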

FIGURE 43.1. Framework for the Integration of Clinical Diagnoses,
Functional Capabilities, and Access to Classes of Technologies
(Adapted from Jacko, Vitense, & Scott, 2002).

43. Perceptual Impairments: New Advancements Promoting Technological Access • 855

While this framework is applicable to the five identified categories
of impairment, its discussion and demonstrated utility
in the scope of this chapter will address only visual impairment.
Consider, for example, a person who has been diagnosed with
a specific type of visual impairment represented as Ej in Fig.
43.1. This visual impairment results in measurable decrements
to certain aspects of this person’s functional visual capabilities,
Y-j and Yg. Observe from Fig. 43.1 that the decrement to the
functional capability represented by Yg does not impede a person’s
access to any of the technology classes depicted in Fig. 43.1. In
contrast, the decrement to the functional visual capability
represented by Y-j in Fig. 43.1 impedes this person’s ability to
successfully access information using all four examples of
information technology. From this conceptual representation, it is apparent that
much  more  knowledge  is  needed  for  researchers  to  possess  an 
accurate  depiction  of  the  empirical  relationships  that  exist  be¬ 
tween  diagnoses,  functional  capabilities,  and  access  to  specific 
classes  of  technologies.  More  specifically,  emerging  from  this 
conceptual  framework  are  several  key  research  areas  in  need  of 
investigation: 

1.  It  is  critical  to  establish  empirical  links  between  clinical  diag¬ 
noses  and  sets  of  functional  capabilities. 

2.  It  is  necessary  to  define  the  set  of  functional  capabilities 
required  to  access  information  technologies. 

3.  It  is  essential  to  establish  empirical  bases  for  the  influence 
of  specific  functional  capabilities  on  access  to  specific  classes 
of  technologies. 

While  the  framework  illustrates  five  categories  of  impair¬ 
ment,  this  chapter  will  focus  on  visual  impairment  and  func¬ 
tional  visual  capability.  Auditory/speech,  cognitive  impairments 
and  physical  (motor)  impairments  are  covered  in  this  handbook 
in  accompanying  chapters. 

Following  this  framework,  this  chapter  introduces  readers  to 
the  clinical  definition  of  visual  impairments  and  diagnoses,  in 
which  visual  function  is  first  addressed,  followed  by  a  discussion 
of  the  leading  causes  of  vision  loss  in  the  United  States  and  be¬ 
yond.  The  section  also  provides  a  discussion  of  specific  visual 
functions.  Then,  the  chapter  highlights  recent  advancements 
in  HCI  research,  which  effectively  link  the  three  major  areas  of 
the  framework:  clinical  diagnosis,  functional  impairment,  and 
interaction  across  a  variety  of  classes  of  technology  and  impair¬ 
ments.  This  research  is  done  in  pursuit  of  HCI  solutions  for  the 
visually  impaired  including  perceptual  interfaces,  multimedia  in¬ 
terfaces,  multimodal  interfaces,  and  adaptive  interfaces.  Finally, 
examples  of  technological  advancements  for  this  population  are 
presented.  While  these  technological  advancements  are  dis¬ 
cussed  briefly  in  this  chapter  within  the  context  of  visual  im¬ 
pairments,  it  should  be  noted  that  additional  information  on 
HCI  and  perceptual-motor  interactions,  hearing  and  speech  im¬ 
pairments,  and  motor  impairments  can  be  found  in  their  re¬ 
spective  chapters. 


VISUAL  FUNCTION 


Definitions 

When  considering  the  impact  of  visual  loss  on  an  individual’s 
ability  to  use  a  computer  effectively,  it  is  first  necessary  to  un¬ 
derstand  various  dimensions  of  visual  performance.  Many  dif¬ 
ferent  terms  have  been  used  to  refer  to  abnormalities  in  visual 
function, including disorder, impairment, disability, and handicap
(Colenbrander, 1977). Although these terms are often used as synonyms,
they have distinct meanings. For instance, while disorder and
impairment  describe  aspects  of  an  organ’s  condition,  the  term 
disability  describes  aspects  of  a  patient’s  condition. 

Disorder refers to an abnormality in the anatomy or physiology
of an organ and, in the case of a visual disorder, may occur
anywhere in the visual system. Examples of visual disorders include
corneal scar, cataract, macular degeneration, optic atrophy,
or  occipital  stroke.  It  is  important  to  recognize  that  knowing  the 
specific  visual  disorder  provides  no  information  concerning  the 
functional  capacity  of  the  eye. 

Impairment  refers  to  a  functional  abnormality  in  the  organ. 
Thus,  varying  degrees  of  visual  impairment  can  be  measured  in 
terms  of  specific  visual  functions,  such  as  visual  acuity,  contrast 
sensitivity,  visual  field,  or  color  vision.  While  such  impairment 
measures  demonstrate  how  well  the  eye  functions,  they  do  not 
reveal  the  impact  of  the  visual  disorder  on  the  patient’s  ability  to 
perform  everyday  activities.  For  example,  a  physician  may  state 
that  the  patient’s  visual  acuity  has  dropped  by  four  lines  on  the 
eye  chart,  while  the  patient  reports  an  inability  to  see  well 
enough  to  use  a  computer. 

Disability  refers  to  the  ability  of  a  patient  (rather  than  an  or¬ 
gan)  to  perform  tasks,  such  as  daily  living  skills,  vocational  skills, 
reading,  writing,  mobility  skills,  and  so  on.  Since  disability  im¬ 
plies  a  broader  perspective  (the  focus  is  on  the  person  as  a 
whole  rather  than  on  a  specific  organ),  it  is  no  longer  entirely  vi¬ 
sion  specific.  For  instance,  while  computer  skills  may  be  re¬ 
duced  due  to  vision  loss,  they  may  also  suffer  due  to  such  con¬ 
ditions  as  arthritis.  It  is  the  combination  of  visual  and  nonvisual 
skills that determines the abilities or disabilities of an individual.
Vision substitution techniques (such as the use of a white cane
and  increased  reliance  on  memory  and  on  hearing)  may  be 
helpful  in  improving  the  ability  of  an  individual  to  perform  spe¬ 
cific  tasks. 

A  disorder  may  cause  impairments,  and  impairments  may 
cause  disability.  However,  these  links  are  not  rigid.  An  analysis  of 
these  various  dimensions  of  vision  loss  permits  identification 
of  interventions  at  each  link,  which  may  improve  the  functional 
status  and  quality  of  life  of  an  individual  with  visual  loss  (Fig. 
43.1;  Fletcher,  1999).  For  example,  if  one  were  interested  in  im¬ 
proving  the  ability  of  an  individual  with  vision  loss  to  use  a  com¬ 
puter  effectively,  possible  interventions  include  medical  and  sur¬ 
gical  intervention  to  impact  the  visual  disorder  and  impairment, 
visual  aids  and  adaptive  devices  to  impact  visual  impairment, 
and  social  interventions,  training,  counseling,  and  education  to 
impact  visual  disability.  The  design  of  a  computer  interface  that 
enhances  the  ability  of  an  individual  with  vision  loss  to  perceive 
graphical  and  textual  information  would  have  a  beneficial  ef¬ 
fect  on  the  degree  of  impairment  and  resulting  level  of  disability 
of  that  individual. 

Epidemiology 

Low  vision  has  been  defined  as  a  permanent  visual  impairment 
that  is  not  correctable  with  glasses,  contact  lenses,  or  surgical  in¬ 
tervention,  and  which  interferes  with  normal  everyday  func¬ 
tioning  (Mehr  &  Freid,  1975).  Specifically,  low  vision  is  defined 
as  a  best-corrected  visual  acuity  worse  than  20/40  in  the  better- 
seeing  eye  (The  Eye  Diseases  Prevalence  Research  Group, 
2004a).  Blindness  is  defined  as  a  best-corrected  visual  acuity  of 
20/200  or  worse  according  to  the  United  States  definition  and 
is  defined  as  a  best-corrected  visual  acuity  of  worse  than  20/400 
according  to  the  World  Health  Organization  definition  (The  Eye 
Diseases Prevalence Research Group, 2004a). Based on demographics
from the 2000 U.S. Census, an estimated 937,000
(0.78%) Americans older than 40 years were blind (United States
definition); an additional 2.4 million Americans (1.98%) had low
vision  (The  Eye  Diseases  Prevalence  Research  Group,  2004a). 
Thus,  blindness  or  low  vision  affects  approximately  1  in  28 
Americans  older  than  40  years  (The  Eye  Diseases  Prevalence  Re¬ 
search  Group,  2004a).  Largely  due  to  the  aging  of  the  U.S.  pop¬ 
ulation,  it  is  estimated  that  the  number  of  blind  persons  older 
than  40  years  in  the  United  States  will  increase  by  approximately 
70%  to  1.6  million  (prevalence  of  1.1%)  by  the  year  2020  (The 
Eye  Diseases  Prevalence  Research  Group,  2004a).  Vision  loss 
has  been  ranked  third,  behind  arthritis  and  heart  disease, 
among  conditions  that  cause  persons  older  than  70  years  to 
need  assistance  in  activities  of  daily  living  (LaPlante,  1988). 
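
The acuity thresholds quoted above can be written as a small classifier. This is a minimal sketch that encodes only the three cutoffs stated in the text (worse than 20/40 for low vision, 20/200 or worse for blindness under the United States definition, worse than 20/400 under the World Health Organization definition). The function names and the label for acuities at or better than 20/40 are assumptions made for illustration, and the sketch ignores visual field criteria.

```python
from fractions import Fraction


def snellen_to_fraction(notation):
    """Convert a Snellen string such as '20/200' to an exact fraction."""
    numerator, denominator = notation.split("/")
    return Fraction(int(numerator), int(denominator))


def classify_acuity(better_eye_acuity, blindness_definition="US"):
    """Classify best-corrected acuity in the better-seeing eye."""
    acuity = snellen_to_fraction(better_eye_acuity)
    if blindness_definition == "US":
        is_blind = acuity <= Fraction(20, 200)  # 20/200 or worse
    elif blindness_definition == "WHO":
        is_blind = acuity < Fraction(20, 400)   # worse than 20/400
    else:
        raise ValueError("blindness_definition must be 'US' or 'WHO'")
    if is_blind:
        return "blind"
    if acuity < Fraction(20, 40):               # worse than 20/40
        return "low vision"
    return "above low-vision threshold"


for notation in ("20/30", "20/60", "20/200", "20/500"):
    print(notation, classify_acuity(notation), classify_acuity(notation, "WHO"))
```

Exact fractions are used so that boundary cases such as 20/200 compare correctly without floating-point rounding.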

Selected  Visual  Disorders 

In  the  United  States,  the  most  common  causes  of  decreased  vi¬ 
sion  are  age-related  macular  degeneration  (AMD),  diabetic 
retinopathy,  and  glaucoma. 

Age-related  macular  degeneration.  AMD  is  the  lead¬ 
ing  cause  of  irreversible  visual  loss  in  the  western  world  in  indi¬ 
viduals  over  60  years  of  age.  The  macula  is  the  part  of  the  retina 
that  is  responsible  for  central  vision.  AMD  affects  more  than 
1.75  million  individuals  in  the  United  States,  and  due  to  the 
rapid  aging  of  the  U.S.  population,  this  number  is  estimated  to 
increase  to  almost  3  million  by  the  year  2020  (The  Eye  Diseases 
Prevalence  Research  Group,  2004b).  The  prevalence  of  severe 
visual  loss  due  to  AMD  increases  with  age.  In  the  United  States, 
at  least  10%  of  persons  between  the  ages  of  65  and  75  years  have 
lost  some  central  vision  due  to  AMD;  among  individuals  over 
the  age  of  75  years,  30%  have  vision  loss  due  to  AMD. 

Risk  factors  for  this  disease  and  its  progression  include  age, 
sunlight  exposure,  smoking,  ocular  and  skin  pigmentation,  ele¬ 
vated  blood  pressure,  and  elevated  serum  cholesterol  levels. 
The  role  of  nutrition  has  not  been  fully  identified  as  a  risk  factor, 
but  a  diet  low  in  antioxidants  and  lutein  may  be  a  contributing 
factor. 

AMD  is  a  bilateral  disease  in  which  visual  loss  in  the  first  eye 
usually  occurs  at  about  65  years  of  age;  the  second  eye  becomes 
involved  at  the  rate  of  approximately  10%  per  year.  The  two 
main  types  of  AMD  are  atrophic  and  exudative.  The  atrophic 
(dry)  form  of  the  disease  is  generally  a  slowly  progressive  dis¬ 
ease  that  accounts  for  approximately  90%  of  cases.  It  is  charac¬ 
terized  by  the  deposition  of  abnormal  material  beneath  the 
retina  (drusen)  and  by  degeneration  and  atrophy  of  the  central 
retina  (also  known  as  the  macula);  patients  typically  note  slowly 
progressive  central  visual  loss.  Although  much  less  common, 
the  exudative  (wet)  form  of  the  disease  is  responsible  for  about 
88%  of  legal  blindness  attributed  to  AMD.  This  form  of  the  dis¬ 
ease,  which  often  occurs  in  association  with  atrophic  AMD,  is 
characterized  by  the  growth  of  abnormal  blood  vessels  beneath 
the  central  retina  (macula);  these  abnormal  blood  vessels  ele¬ 
vate  and  distort  the  retina  and  may  leak  fluid  and  blood  beneath 
or into the retina. Vision loss may be sudden in onset and rapidly
progressive (in contrast to the atrophic form of the disease,
where vision loss generally occurs progressively over several
months or years). AMD can cause profound loss of central vision,
but the disease generally does not affect peripheral vision,
and  therefore,  patients  typically  retain  their  abilities  to  ambulate 
independently. 
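
To give a feel for what a second-eye involvement rate of approximately 10% per year implies over time, the sketch below computes a cumulative probability. It assumes the rate is constant and applies independently in each year; the chapter does not state this, so the figures are illustrative arithmetic rather than clinical estimates. The function name is hypothetical.

```python
def cumulative_second_eye_risk(years, annual_rate=0.10):
    """Probability that the second eye is involved within `years`.

    Assumes a constant, independent annual rate (an assumption made
    for this sketch, not a claim from the chapter).
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1 - (1 - annual_rate) ** years


for years in (1, 3, 5, 10):
    print(years, round(cumulative_second_eye_risk(years), 3))
```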

Currently,  no  proven  treatment  reverses  the  retinal  damage 
that  has  already  occurred  due  to  AMD.  In  order  to  try  to  prevent 
further  vision  loss,  recommendations  made  to  patients  include 
eye  protection  against  ultraviolet  light  exposure  (sunglasses  with 
ultraviolet  light  protection),  no  smoking,  optimal  control  of 
blood  pressure  and  serum  cholesterol  level,  and  a  diet  rich  in 
dark  green  leafy  and  orange  vegetables  (antioxidants  are  be¬ 
lieved  to  reduce  the  damaging  effects  of  light  on  the  retina 
through  their  reducing  and  free-radical  scavenging  actions; 
lutein  is  a  macular  pigment).  The  Age-Related  Eye  Disease  Study 
(AREDS)  was  a  randomized,  controlled  clinical  trial  that  demon¬ 
strated  a  statistically  significant  reduction  in  rates  of  at  least  mod¬ 
erate  visual  acuity  loss  in  persons  with  moderate  AMD  who  re¬ 
ceived  supplementation  with  vitamins  C  and  E,  beta  carotene, 
and  zinc  compared  to  persons  with  moderate  AMD  assigned  to 
placebo  (Age-Related  Eye  Disease  Study  Research  Group,  2001). 
Laser  photocoagulation  and  photodynamic  treatment  of  the  ab¬ 
normal  blood  vessels  found  in  patients  with  exudative  macular 
degeneration  may  help  to  prevent  severe  vision  loss  in  some 
cases.  Intravitreal  injections  of  pegaptanib,  a  pegylated  modi¬ 
fied  oligonucleotide  that  binds  to  extracellular  vascular  en¬ 
dothelial  growth  factor  (VEGF)  isoform  165  (the  isoform  widely 
considered  to  be  the  primary  pathologic  form  of  VEGF),  are 
associated  with  a  higher  likelihood  of  visual  preservation  and 
a  slowing  of  visual  loss  among  patients  with  exudative  AMD 
(Gragoudas,  Adamis,  Cunningham,  Feinsod,  &  Guyer,  2004). 
Surgical  rotation  of  the  retina  away  from  the  area  of  abnormal 
blood  vessels  has  also  been  effective  in  some  cases.  Other  treat¬ 
ment  modalities  currently  under  investigation  include  such 
drugs  as  corticosteroids,  other  anti-VEGF  agents,  and  combina¬ 
tion  treatments. 

Diabetic  retinopathy.  Approximately  16  million  Amer¬ 
icans  suffer  from  diabetes  mellitus,  most  of  whom  will  develop 
diabetic  retinopathy  within  20  years  of  their  diagnoses.  In  fact, 
after  20  years  of  diabetes,  nearly  99%  of  those  with  insulin- 
dependent  diabetes  mellitus  and  60%  with  non-insulin-dependent 
diabetes  mellitus  have  some  degree  of  diabetic  retinopathy.  Di¬ 
abetic  retinopathy  is  the  leading  cause  of  legal  blindness  in 
Americans  aged  20  to  65  years,  with  10,000  new  cases  of  blind¬ 
ness  annually.  One  million  Americans  have  proliferative  diabetic 
retinopathy,  and  500,000  have  macular  edema.  Among  an  esti¬ 
mated  10.2  million  American  adults  aged  40  years  and  older 
known  to  have  diabetes  mellitus,  the  estimated  crude  preva¬ 
lence  rates  for  retinopathy  and  vision-threatening  retinopathy 
are  40.3%  and  8.2%,  respectively  (The  Eye  Diseases  Prevalence 
Research  Group,  2004b).  The  estimated  U.S.  general  popula¬ 
tion  prevalence  rates  for  retinopathy  and  vision-threatening 
retinopathy  are  3.4%  (4.1  million  persons)  and  0.75%  (899,000 
persons),  respectively  (The  Eye  Diseases  Prevalence  Research 
Group,  2004c). 

The  major  risk  factor  for  diabetic  retinopathy  is  duration  of 
diabetes;  it  is  estimated  that  at  15  years,  80%  of  diabetics  will 
have  background  retinopathy  and  that,  of  these,  5%  to  10%  will 
progress  to  proliferative  changes.  Other  risk  factors  include 
long-term diabetic control (as reflected in serum levels of glycosylated
hemoglobin), hypertension, smoking, and elevated
serum  cholesterol. 
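
The two percentages in the preceding paragraph are nested: the 5% to 10% who progress are a share of the 80% with background retinopathy, not of all diabetics. A short worked calculation, using exact fractions and hypothetical variable names, makes the combined proportion explicit. The source gives no time frame for the progression step, so none is assumed here.

```python
from fractions import Fraction

background_at_15_years = Fraction(80, 100)  # share with background retinopathy
progression_low = Fraction(5, 100)          # lower bound, of those with background
progression_high = Fraction(10, 100)        # upper bound, of those with background

# Share of all diabetics expected to reach proliferative changes.
overall_low = background_at_15_years * progression_low
overall_high = background_at_15_years * progression_high

print(f"{float(overall_low):.0%} to {float(overall_high):.0%} of all diabetics")
```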

There  are  two  main  types  of  diabetic  retinopathy:  nonprolif¬ 
erative  and  proliferative.  Nonproliferative  diabetic  retinopathy 
refers  to  retinal  microvascular  changes  that  are  limited  to  the 
confines  of  the  retina  and  include  such  findings  as  micro¬ 
aneurysms,  dot  and  blot  intraretinal  hemorrhages,  retinal  edema, 
hard  exudates,  dilation  and  bleeding  of  retinal  veins,  intraretinal 
microvascular  abnormalities,  nerve  fiber  layer  infarcts,  arteriolar 
abnormalities,  and  focal  areas  of  capillary  nonperfusion.  Non¬ 
proliferative  diabetic  retinopathy  can  affect  visual  function 
through  two  mechanisms:  intraretinal  capillary  closure  resulting 
in  macular  ischemia,  and  increased  retinal  vascular  permeability 
resulting  in  macular  edema.  Clinically  significant  macular  edema 
is  defined  as  any  one  of  the  following:  (a)  retinal  edema  located 
at or within 500 µm of the center of the macula; (b) hard exudates
at or within 500 µm of the center if associated with thickening
of adjacent retina; and (c) a zone of retinal thickening
larger  than  one  optic  disc  area  if  located  within  one  disc  diame¬ 
ter  of  the  center  of  the  macula. 
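
The three-part definition of clinically significant macular edema is an "any one of" rule, which can be sketched as a boolean check. The field names, units, and the convention that `None` means a finding is absent are assumptions introduced for this illustration; the thresholds follow the wording of the paragraph above and the sketch is not a diagnostic tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MacularFindings:
    # Distances are measured from the center of the macula.
    # None means the finding is absent.
    edema_distance_um: Optional[float] = None
    exudate_distance_um: Optional[float] = None
    exudates_with_adjacent_thickening: bool = False
    thickening_zone_disc_areas: float = 0.0
    thickening_zone_distance_disc_diameters: Optional[float] = None


def is_clinically_significant(f):
    """Return True if any one of criteria (a), (b), or (c) is met."""
    criterion_a = f.edema_distance_um is not None and f.edema_distance_um <= 500
    criterion_b = (
        f.exudate_distance_um is not None
        and f.exudate_distance_um <= 500
        and f.exudates_with_adjacent_thickening
    )
    criterion_c = (
        f.thickening_zone_distance_disc_diameters is not None
        and f.thickening_zone_distance_disc_diameters <= 1
        and f.thickening_zone_disc_areas > 1
    )
    return criterion_a or criterion_b or criterion_c


print(is_clinically_significant(MacularFindings(edema_distance_um=300)))
print(is_clinically_significant(MacularFindings(exudate_distance_um=400)))
```

The second call shows why criterion (b) needs both conditions: exudates near the center do not qualify on their own without thickening of the adjacent retina.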

Proliferative diabetic retinopathy is characterized by extraretinal
fibrovascular proliferation; that is, fibrovascular
changes  that  extend  beyond  the  confines  of  the  retina  and  into 
the  vitreous  cavity.  Fibrovascular  proliferation  in  proliferative  di¬ 
abetic  retinopathy  may  lead  to  tractional  retinal  detachment  and 
vitreous hemorrhage. High-risk proliferative diabetic retinopathy
is defined by any combination of three of the four following
retinopathy risk factors: (a) presence of vitreous or preretinal
hemorrhage; (b) presence of new vessels; (c) location of
new  vessels  on  or  near  the  optic  disc;  and  (d)  moderate  to  se¬ 
vere  extent  of  new  vessels  (Diabetic  Retinopathy  Study  Research 
Group,  1979). 
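
The "three of the four" rule for high-risk proliferative diabetic retinopathy amounts to counting how many risk factors are present. The sketch below reads the rule as "at least three", which is an interpretation of the wording above; the function and parameter names are hypothetical.

```python
def is_high_risk_pdr(
    vitreous_or_preretinal_hemorrhage,
    new_vessels_present,
    new_vessels_on_or_near_disc,
    new_vessels_moderate_to_severe,
):
    """Return True when at least three of the four risk factors are present."""
    factors = (
        vitreous_or_preretinal_hemorrhage,
        new_vessels_present,
        new_vessels_on_or_near_disc,
        new_vessels_moderate_to_severe,
    )
    return sum(bool(factor) for factor in factors) >= 3


# New vessels near the disc with hemorrhage, but of mild extent.
print(is_high_risk_pdr(True, True, True, False))
# New vessels elsewhere, mild extent, no hemorrhage.
print(is_high_risk_pdr(False, True, False, False))
```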

Management  of  diabetic  retinopathy  includes  referring  the 
patient  to  an  internist  for  optimal  glucose  and  blood  pressure 
control.  In  the  Early  Treatment  Diabetic  Retinopathy  Study,  fo¬ 
cal  or  grid  laser  photocoagulation  treatment  for  clinically  sig¬ 
nificant  macular  edema  reduced  the  risk  of  moderate  visual 
loss,  increased  the  chance  of  visual  improvement,  and  was  as¬ 
sociated  with  only  mild  loss  of  visual  field  (Early  Treatment  Di¬ 
abetic  Retinopathy  Study  Research  Group,  1995).  Intravitreal 
triamcinolone acetonide and other intravitreal anti-VEGF
agents  are  currently  under  investigation  for  the  treatment  of  di¬ 
abetic  macular  edema.  Panretinal  laser  photocoagulation  treat¬ 
ment  of  eyes  with  high-risk  proliferative  diabetic  retinopathy 
reduced  the  risk  of  severe  visual  loss  by  50%  compared  to  un¬ 
treated  control  eyes  (Diabetic  Retinopathy  Study  Research 
Group,  1981).  Surgery  is  often  indicated  for  nonclearing  vitre¬ 
ous  hemorrhage  and  for  tractional  retinal  detachment  involv¬ 
ing  or  threatening  the  macula. 

Glaucoma.  Primary  open  angle  glaucoma  (POAG)  is  the 
most  prevalent  type  of  glaucoma,  affecting  1.3%  to  2.1%  of  the 
general  population  over  the  age  of  40  years  in  the  United  States. 
In  the  United  States,  the  disease  is  the  leading  cause  of  irre¬ 
versible  blindness  among  Blacks  and  the  third  leading  cause 
among Whites (following AMD and diabetic retinopathy), and
is responsible for 12% of legal blindness. Risk factors for the disease
include increasing age (especially greater than 40 years),
African ethnicity, positive family history of glaucoma, diabetes
mellitus,  and  myopia  (nearsightedness). 

POAG  is  a  chronic,  slowly  progressive  optic  neuropathy  char¬ 
acterized  by  atrophy  of  the  optic  nerve  and  loss  of  peripheral 
vision.  Central  vision  is  typically  not  affected  until  late  in  the  dis¬ 
ease.  Because  central  vision  is  relatively  unaffected  until  late  in 
the  disease,  visual  loss  generally  progresses  without  symptoms 
and  may  remain  undiagnosed  for  quite  some  time.  While  usually 
bilateral,  the  disease  may  be  quite  asymmetrical.  POAG  is  asso¬ 
ciated  with  increased  intraocular  pressure,  but  normal-tension 
glaucoma  may  cause  glaucomatous  vision  loss  in  patients  with 
normal  intraocular  pressure.  Thus,  normal  eye  pressure  does 
not  rule  out  the  presence  of  glaucoma. 

Treatment  of  POAG  includes  topical  or  systemic  medications, 
laser,  or  surgery  to  lower  the  intraocular  pressure  to  a  level  at 
which  optic  nerve  damage  no  longer  occurs.  Visual  field  testing 
is  performed  regularly  in  order  to  evaluate  for  progressive  loss 
of  peripheral  vision,  and  the  optic  nerve  is  examined  regularly 
to  evaluate  for  evidence  of  progressive  optic  atrophy  (clinical 
signs  of  glaucoma  in  the  optic  disc  include  asymmetry  of  the 
neuroretinal  rim,  focal  thinning  of  the  neuroretinal  rim,  optic 
disc  hemorrhage,  and  any  acquired  change  in  the  disc  rim  ap¬ 
pearance  or  the  surrounding  retinal  nerve  fiber  layer). 

Specific  Visual  Functions 

Visual  acuity.  Visual  acuity  is  the  most  common  measure 
of  central  visual  function  and  refers  to  the  smallest  object  re¬ 
solvable  by  the  eye  at  a  given  distance.  It  is  defined  as  the  reci¬ 
procal  of  the  smallest  object  size  that  can  be  recognized.  Visual 
acuity  is  expressed  as  a  fraction  in  which  the  numerator  is  the 
distance  at  which  the  patient  recognizes  the  object  and  the  de¬ 
nominator  is  the  distance  at  which  a  standard  eye  recognizes 
the object. For instance, a visual acuity of 20/60 means that the
patient  needs  an  object  three  times  larger  or  three  times  closer 
than  a  standard  eye  requires.  The  traditional  visual  acuity  chart 
presents  symbols  of  decreasing  size  with  fixed  high  contrast. 
The  visual  acuity  chart  used  most  often  in  the  clinical  setting  is 
the Snellen acuity chart, which is composed of certain letters
of the alphabet; the size of the letters is constant on a given
line of the eye chart and decreases from line to line down
the chart. In accurate Snellen notation, the numerator indicates
the  test  distance  and  the  denominator  indicates  the  letter  size 
seen  by  the  patient. 
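
The arithmetic behind Snellen notation can be made concrete with a short calculation. The sketch below computes the decimal acuity and the size factor described above (20/60 implies an object three times larger). It also computes logMAR, the base-10 logarithm of the minimum angle of resolution, which is a standard conversion not discussed in this chapter and is included only as an assumption about what a reader might want next. The function names are hypothetical.

```python
import math


def decimal_acuity(numerator, denominator):
    """Snellen fraction expressed as a decimal (20/20 -> 1.0)."""
    return numerator / denominator


def size_factor(numerator, denominator):
    """How many times larger an object must be than a standard eye requires."""
    return denominator / numerator


def logmar(numerator, denominator):
    """log10 of the minimum angle of resolution (20/20 -> 0.0)."""
    return math.log10(denominator / numerator)


print(decimal_acuity(20, 60))
print(size_factor(20, 60))
print(round(logmar(20, 60), 2))
```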

Contrast  sensitivity.  Contrast  sensitivity  refers  to  the 
ability  of  the  patient  to  detect  differences  in  contrast  and  is  de¬ 
fined  as  the  reciprocal  of  the  lowest  contrast  that  can  be  de¬ 
tected.  This  may  be  measured  with  the  Pelli-Robson  chart,  in 
which  letters  decrease  in  contrast  rather  than  size,  or  the  Bailey- 
Lovie  chart,  in  which  letters  of  a  fixed  low  contrast  are  varied  in 
size.  Contrast  sensitivity  is  considered  a  more  sensitive  indicator 
of  visual  function  than  Snellen  acuity  and  may  provide  earlier 
detection  of  such  pathology  as  retinal  and  optic  nerve  disease. 
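
Because contrast sensitivity is defined as a reciprocal, a lower detectable contrast corresponds to a higher sensitivity. The following sketch shows the conversion and its logarithm. Expressing contrast as a proportion between 0 and 1 and reporting a log value are assumptions of this example, as are the function names.

```python
import math


def contrast_sensitivity(threshold_contrast):
    """Reciprocal of the lowest detectable contrast (a proportion in (0, 1])."""
    if not 0 < threshold_contrast <= 1:
        raise ValueError("threshold_contrast must be in the interval (0, 1]")
    return 1 / threshold_contrast


def log_contrast_sensitivity(threshold_contrast):
    """Base-10 logarithm of contrast sensitivity."""
    return math.log10(contrast_sensitivity(threshold_contrast))


# A viewer who needs 50% contrast versus one who detects 1% contrast.
for threshold in (0.5, 0.01):
    print(threshold, contrast_sensitivity(threshold),
          round(log_contrast_sensitivity(threshold), 2))
```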

Visual  field.  Visual  field  is  classically  defined  as  a  three- 
dimensional  graphic  representation  of  differential  light  sensitiv¬ 
ity  at  different  positions  in  space.  Perimetry  refers  to  the  clinical 
assessment of the visual field. Typically, visual field is assessed
with  kinetic  or  static  perimetry.  During  kinetic  perimetry  a  test 
object  of  fixed  intensity  is  moved  along  several  meridians  to¬ 
ward  fixation  and  points  where  the  object  is  first  perceived  are 
plotted  in  a  circle.  During  static  perimetry,  a  stationary  test  ob¬ 
ject  is  increased  in  intensity  from  below  threshold  until  per¬ 
ceived  by  the  patient,  and  threshold  values  yield  a  graphic  pro¬ 
file  section.  While  peripheral  visual  field  loss  often  produces 
difficulty  for  patients  in  orientation  and  mobility  functions,  mac¬ 
ular  field  loss  (either  centrally  or  paracentrally)  often  causes 
difficulties  with  reading.  For  instance,  the  presence  of  central  or 
paracentral  visual  field  loss  is  a  more  powerful  predictor  of  read¬ 
ing  speed  than  is  visual  acuity  (Fletcher,  Schuchard,  Livingstone, 
Crane,  &  Hu,  1994). 
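
The static perimetry procedure described above, in which a stationary stimulus is brightened from below threshold until it is perceived, can be sketched as a simple ascending search repeated at several positions. Everything in this example is hypothetical: the intensity units are arbitrary, the simulated observer is a deterministic stand-in for a patient response, and clinical perimeters use more elaborate threshold strategies than this.

```python
def ascending_threshold(perceives, start, step, ceiling):
    """Raise intensity from `start` by `step` until `perceives` returns True.

    Returns the first perceived intensity, or None if nothing up to
    `ceiling` is perceived.
    """
    intensity = start
    while intensity <= ceiling:
        if perceives(intensity):
            return intensity
        intensity += step
    return None


def simulated_observer(true_threshold):
    """Deterministic stand-in for a patient: sees stimuli at or above threshold."""
    return lambda intensity: intensity >= true_threshold


# Hypothetical true thresholds (arbitrary units) at positions along one meridian.
true_thresholds = {-20: 14, -10: 9, 0: 5, 10: 9, 20: 31}

profile = {
    position: ascending_threshold(simulated_observer(threshold), 0, 2, 40)
    for position, threshold in true_thresholds.items()
}
print(profile)
```

Plotting the measured values against position would give the kind of profile section mentioned in the text, with elevated thresholds marking regions of reduced sensitivity.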

Color  vision.  Evaluation  of  color  vision  may  be  per¬ 
formed  using  pseudoisochromatic  color  plates,  which  are  quick 
and commonly available; they consist of circles in various colors
such that a person with normal color vision function can distinguish
a number from the background pattern of circles. Ishihara
or Hardy-Rand-Rittler pseudoisochromatic color plates are
designed  to  screen  for  congenital  red/green  color  deficiencies, 
while  Lanthony  tritan  plates  may  be  used  to  detect  blue/yellow 
defects,  which  are  frequently  present  in  acquired  disease.  With 
the  Farnsworth-Munsell  100-hue  test,  the  patient  must  order 
84  colored  disks;  the  time-consuming  nature  of  this  test  limits 
its clinical use. The Farnsworth Panel D-15 is a shorter and more
practical  version  (using  15  disks),  but  is  less  sensitive.  Most 
color-vision  defects  are  nonspecific. 


Visual  Function  and  Age 

This section provides an overview of how levels of visual function
vary with age, and to what degree. Aging typically brings
natural declines in a person’s sensory abilities. As such,
the process of aging is accompanied by changes to the eye, including
the retina and visual nervous system, that can impact
functional vision (Schieber, 1994). Additionally, older adults are
more  likely  to  acquire  ocular  conditions  that  can  compromise 
visual  functioning  beyond  normally  anticipated  changes,  such  as 
macular  degeneration,  diabetic  retinopathy,  and  cataracts.  Age- 
related  vision  loss  commonly  impinges  on  the  ability  to  com¬ 
plete  near  vision  tasks  such  as  reading  and  using  the  computer 
(Arditi,  2004).  An  understanding  of  these  functional  declines 
provides  direction  for  strategies  aimed  to  mitigate  the  negative 
impact of these changes. HCI designers, developers, and usability
specialists should be fully aware of these needs, as
this growing user population will become an increasing
priority as population demographics shift.

Aspects  of  visual  function  that  are  known  to  normally  decline 
as part of the aging process include:

•  Visual  acuity 

•  Visual  field 

•  Contrast  sensitivity 

•  Color  perception 

•  Floaters

•  Dry eyes

•  Increased  need  for  light 

•  Difficulty  with  glare 

•  Dark/light  adaptation 

•  Reduced  depth  perception  (Orr,  1998) 

Beyond these factors, eye movement efficiency and accuracy
are observed to decline with old age. Older adults are known
to be less accurate and/or slower in locating a target in
peripheral vision (see also Kline, D. W., & Scialfa, 1997; Lee, Legge,
& Ortiz, 2003). Age-related differences have also been observed
in the effectiveness with which older adults visually track targets
moving at higher velocities. In both cases, these trends are typically
aggravated by the presence of distracting stimuli (visual,
auditory, tactile, etc.) in the background or foreground that contribute
to the complexity. The perception of moving stimuli, for
older adults, is both less effective and less efficient in tasks
aimed at the detection of small target movement/change, such
as those found on dials and controls (Kline, D. W., & Scialfa,
1997). Furthermore, deficits in the central, paracentral, and peripheral
visual field can pose different demands on vision, resulting
in different search strategies and related eye movements
(Coeckelbergh, Cornelissen, Brouwer, & Kooijman, 2002).

Older  adults  tend  to  exhibit  a  greater  degree  of  difficulty  with 
visual  search  tasks,  especially  when  the  number  of  items  to  be 
searched increases (Kline, D. W., & Scialfa, 1997). Older adults
have  a  propensity  for  longer  visual  reaction  times,  especially  in 
cases  where  attention  is  divided.  Furthermore,  this  population 
segment  experiences  difficulties  ignoring  extraneous  informa¬ 
tion,  or  background  noise  (Schieber,  1994).  Research  also  sug¬ 
gests  that  visual  search  is  slower  and  less  effective  for  older 
adults  due  to  a  shrinking  of  the  useful  field  of  view  to  which  at¬ 
tention  can  be  simultaneously  allocated.  The  size  of  the  useful 
field  of  view,  for  older  adults,  is  especially  susceptible  to  con¬ 
text  related  factors,  such  as  complexity  and  cognitive  task  load 
(Schieber,  1994). 


Summary 

Studies  have  demonstrated  that  ophthalmic  patients  are  at  high 
risk  for  decreased  functional  status  and  quality  of  life  (Parrish 
et  al.,  1997;  Scott,  Schein,  Feuer,  Folstein,  &  Bandeen-Roche, 
2001; Scott, Schein, West, Bandeen-Roche, Enger, & Folstein,
1994). Patients’ functional status and quality of life may be
improved  by  interventions  that  increase  visual  function,  such 
as  surgery  to  repair  retinal  detachment  or  remove  epiretinal 
membrane  (Scott,  Smiddy,  Feuer,  &  Merikansky,  1998)  and 
surgery to remove cataract (Applegate et al., 1987; Brenner, Curbow,
Javitt, Legro, & Sommer, 1993; Donderi & Murphy, 1983;
Javitt,  Brenner,  Curbow,  Legro,  &  Street,  1993;  Steinberg  et  al., 
1994).  In  addition,  functional  status  and  quality  of  life  may  be 
improved  by  interventions,  such  as  low  vision  devices  and  ser¬ 
vices,  which  permit  patients  to  use  their  remaining  vision  more 
effectively  (Scott,  Smiddy,  Schiffman,  Feuer,  &  Pappas,  1999). 
Moreover, as the proportion of older adults grows, the
number of individuals experiencing some degree of the vision dysfunction
that normally occurs with age has created an increased
demand for information technology that affords use despite the
dysfunction. Prior studies have demonstrated the effect of low
vision  interventions  on  objective  task-specific  measures  of  func¬ 
tional  abilities  such  as  reading  speed,  reading  duration,  and  abil¬ 
ity  to  read  a  certain  print  size  (Nilsson,  U.  L.,  1990;  Nilsson,  U.  L., 
&  Nilsson,  S.  E.  G.,  1986;  Rosenberg,  Faye,  Fischer,  &  Budicks, 
1989;  Sloan,  1968). 

However,  historically  there  has  been  little  data  available  con¬ 
cerning  the  abilities  of  people  with  visual  impairments  to  use 
computers.  It  has  been  even  more  challenging  to  find  data  con¬ 
cerning  how  modifications  of  graphical  user  interface  features 
may  increase  accessibility  of  computers  to  patients  with  visual 
impairments. An exception to this is the research agenda established
by Jacko and colleagues (see Jacko, Moloney, et al.,
2005; Jacko, Scott, et al., 2003; Scott et al., 2002a, b; Scott, Jacko,
et al., 2006; and Table 43.1 for examples), which has yielded systematically
derived HCI performance thresholds for users according
to their ocular profile (diagnoses and functional ability).
Table  43.1  summarizes  the  research  products  resulting  from  this 
effort,  chronologically  organized  by  the  ocular  pathology  in¬ 
vestigated,  the  age  of  the  users  involved,  the  GUI  interaction  in¬ 
vestigated  and  the  corresponding  specific  interface  feature  in 
question. The rapid proliferation of
visual displays beyond the desktop to handheld and wearable
computers and other mobile devices, such as cellular telephones,
increases the importance of graphical user interface innovations
that facilitate efficient and rewarding usage by people
with visual impairments.


HIGHLIGHTING  TECHNOLOGICAL 
ADVANCEMENTS  IN  HCI  RESEARCH 


With  nearly  every  aspect  of  today’s  society  involving  some  type 
of  computer  technology,  there  is  an  ever-growing  need  to  un¬ 
derstand  how  individuals  with  visual  impairments  can  access 
technology.  Currently,  without  special  modifications,  the  typical 
PC  poses  several  challenges  to  users  who  experience  limited 
bandwidth to their visual sensory channels, as well as other perceptual
impairments. As a result, the HCI research community
is placing great emphasis on the design of universally
accessible technologies. According to Stephanidis, Salvendy,
Akoumianakis, Bevan, et al. (1998), “Universal access in the
Information  Society  signifies  the  right  of  all  citizens  to  obtain 
equitable  access  to,  and  maintain  effective  interaction  with,  a 
community-wide  pool  of  information  resources  and  artifacts” 
(p.  6).  Accessibility  has  been  a  term  traditionally  associated 
with  elderly  individuals,  individuals  with  disabilities  and  others 
who  possess  special  needs  (Stephanidis,  Salvendy,  Akoumi¬ 
anakis,  Arnold,  et  al.,  1999).  However,  because  of  the  current 
influx  of  new  technologies  into  the  market,  the  population  of 
users  who  may  possess  special  needs  is  growing.  As  a  result,  ac¬ 
cessibility  has  taken  on  a  more  comprehensive  connotation. 
This  connotation  implies  that  all  individuals  with  varying  lev¬ 
els  of  abilities,  skills,  requirements,  and  preferences  be  able  to 
access information technologies (Stephanidis, Salvendy, Akoumianakis,
Arnold, et al., 1999). Universal access also implies
more than just adding features to existing technologies. Rather,
the concept of universal access emphasizes that accessibility be
incorporated directly into the design (Stephanidis, Salvendy,
Akoumianakis, Bevan, et al., 1998). Perceptual and adaptive interfaces
are two ideal examples of how universal accessibility
can be achieved.

Perceptual  Interfaces 

The  concept  of  perceptual  design  describes  a  perspective  of 
design  that  defines  interactions  in  terms  of  human  perceptual 
capabilities.  In  a  sense,  it  strives  to  humanize  interaction.  The 
design  of  perceptual  interfaces  adheres  to  the  idea  that  lessons 
learned  from  psychological  research  about  perception  can  be 
applied  to  interface  design  (Reeves  &  Nass,  2000).  In  adopting 
the  concept  of  perceptual  design,  several  opportunities  surface 
for the creation of innovative perceptual user interfaces. Interactions
with these interfaces can be described in terms of three
particular human perceptual capabilities: chemical senses (e.g.,
taste and olfaction), cutaneous senses (e.g., skin and receptors),
and vision and hearing (Reeves & Nass, 2000). Although commonly
used computer technology limits the effectiveness of
chemical senses, the technology can be extended to incorporate
the cutaneous, visual, and hearing senses.

In  terms  of  vision,  research  has  focused  on  topics,  such  as  vi¬ 
sual  mechanics,  color,  brightness  and  contrast,  objects  and 
forms,  depth,  size,  and  movement.  Hearing  research  includes 
psychophysical  factors  such  as  loudness,  pitch,  timbre,  and 
sound  localization;  physiological  mechanisms  such  as  the  audi¬ 
tory  components  of  the  ear,  and  the  neural  activity  associated 
with  hearing;  and  the  perception  of  speech  such  as  units  of 
speech  and  the  mechanics  of  word  recognition  (Reeves  &  Nass, 
2000). With respect to the cutaneous senses, augmented GUIs
with haptic feedback have existed since the early 1990s.
Akamatsu and Sato conducted the first research with a haptic
mouse that produced haptic feedback via fingertips and force
feedback via controlled friction (as cited in Oakley, McGee,
Brewster, & Gray, 2000). Engel, Goossens, and Haakma found
that directional two-degrees-of-freedom force feedback improved
speed and error rates in a targeting task (as cited in Oakley,
McGee, Brewster, & Gray, 2000).

The  strength  of  perceptual  user  interfaces  comes  from  the 
ability  of  designers  to  combine  an  understanding  of  natural  hu¬ 
man  capabilities  with  computer  input/output  devices,  and  ma¬ 
chine  perception  and  reasoning  (Turk  &  Robertson,  2000).  Gen¬ 
eral  examples  of  how  capabilities  can  be  combined  with 
technology  include  speech  and  sound  recognition  and  genera¬ 
tion,  computer  vision,  graphical  animation,  touch-based  sensing 
and  feedback,  and  user  modeling  (Turk  &  Robertson,  2000). 

From  an  applied  research  standpoint,  the  concepts  of  per¬ 
ceptual  interfaces  are  housed  within  multimedia  and  multi¬ 
modal  interfaces.  Both  multimedia  and  multimodal  interfaces 
offer  increased  accessibility  to  technologies  for  individuals  with 
perceptual  impairments.  Distinctions  can  be  drawn  between 
perceptual,  multimedia,  and  multimodal  interfaces,  shown  in 
Fig.  43.2.  Perceptual  interfaces  prescribe  human-like  percep¬ 
tual  capabilities  to  the  computer.  Multimedia  and  multimodal 
interfaces  can  be  considered  applied  extensions  of  this  concept. 
Multimedia interfaces elicit perceptual and cognitive skills to
interpret information presented to the user, whereas multimodal
interfaces use multiple modalities for the HCI. Multimedia interfaces
focus on the media, while multimodal interfaces focus
on the human perceptual channels (Turk & Robertson, 2000).
The strengths and capabilities of multimedia and multimodal interfaces
with respect to individuals with perceptual impairments
are described in more depth in the next two sections.


FIGURE 43.2. Perceptual, multimodal, and multimedia interfaces
(flow of information).


Multimedia interfaces. Multimedia interfaces have
grown from the need to display diverse forms of information in
a flexible and interactive way. Multimedia can be simply defined
as computer-controlled interactive presentations (Chignell
& Waterworth, 1997). The broadness of this definition directly
corresponds to the broadness of the field of multimedia research.
Chapter 12 in this Handbook contains a comprehensive
discussion of multimedia, including topics dealing with the
cognitive implications related to multimedia, selecting media
for the message, navigation and interaction, and evaluation of
multimedia. However, for the purpose of this chapter, the discussion
of multimedia is limited to a brief overview of types and
potential strengths of multimedia as they relate to enhancing
accessibility to information technologies for individuals with
perceptual impairments.

There are three approaches to multimedia: performance,
presentation, and document (Chignell & Waterworth, 1997). In
the performance approach, multimedia is a kind of theatrical
play that is conveyed through “actors.” The timing of the actors’
performances is orchestrated in an effort to entertain and educate
(Waterworth & Chignell, 1997). Presentation multimedia is
a modern version of slide shows, where video clips and animation
enhance a sequence of slides. The goal of the presentation
approach is to convey ideas to the user (Chignell & Waterworth,
1997). Lastly, the document approach focuses on text and ideas.
It can be thought of as an enhanced document that elaborates
ideas in the text. All of these approaches provide additional opportunities
to convey perceptual information to the user.

Multimedia allows communication between users and computers
in a sensory manner. As such, the essential aspect of
designing multimedia interfaces is selecting the media and
modalities. Information can be taken from one modality and
presented in another modality (Chignell & Waterworth, 1997).
For instance, data presented visually could be converted and
displayed audibly. Likewise, multimedia could be used to enhance
a GUI for a user with a hearing impairment. Information
that would be commonly conveyed via auditory feedback could
be provided visually. An example of this would be to have an
icon display when an error has been committed, rather than the
traditional beep sounding. Other examples of how multiple
modalities can enhance accessibility to information technologies
are discussed in the multimodal section.

The potential strength of multimedia interfaces comes from
their ability to use images, text, and animation to connect with
users. However, in order to facilitate that type of exchange,
much consideration must be paid to users’ sensations and perceptions
via the auditory, speech, and visual channels. For instance,
when designing graphical images, basic knowledge of
color vision is needed to ensure that colors can be discriminated,
foreground can be separated from background, highlights
will attract attention, and grouping of objects is apparent
(Gillan, 1998). An understanding of human sensation and perception
is necessary if perceptual interfaces are to reach their full
potential. When perceptual interfaces are used, however, they
provide individuals with perceptual impairments, such as those
related to the visual channel, alternative modalities to access
technology. Multimodal interfaces are a second type of perceptual
interface that provides this same benefit.


Multimodal  interfaces.  Multimodal  interfaces,  as  they 
are  discussed  in  this  chapter,  are  interfaces  that  support  a  wide 
range  of  perceptual  capabilities  (e.g.,  auditory,  speech,  and  vi¬ 
sual)  as  a  means  to  facilitate  human  interaction  with  computers. 
Multimodal  interfaces  are  discussed  in  a  much  broader  sense 
(e.g.,  development  of  multimodal  processing)  in  another  chap¬ 
ter  in  this  handbook. 

With  the  growing  complexity  of  technology  and  applications, 
a  single  modality  no  longer  permits  users  to  interact  effectively 
across  all  tasks  and  environments  (Oviatt,  Cohen,  Suhm,  et  al., 
2002).  The  strength  of  a  multimodal  design  is  its  ability  to  allow 
users  the  freedom  to  use  a  combination  of  modalities  or  the 
best  modality  for  their  needs.  These  interfaces  make  the  most 
effective  use  of  the  variety  of  human  sensory  channels,  alone 
and  in  combination.  Ultimately,  multimodal  interfaces  offer  ex¬ 
panded  accessibility  of  computing  and  promote  new  forms  of 
computing  not  previously  available  to  individuals  with  percep¬ 
tual  impairments. 

The  development  and  application  of  multimodal  interfaces 
for making technologies accessible to users with visual impairments
is growing. Some forms of unimodal, bimodal, and trimodal
feedback within multimodal interfaces have been investigated,
and their advantages have been documented for users
with visual impairments (Jacko, Emery, et al., 2004; Jacko,
Barnard, et al., 2004; Jacko, Moloney, et al., 2005; Jacko, Scott,
Sainfort, et al., 2003). Since most information presented on a
GUI  is  visual,  there  is  great  interest  in  the  research  community 
to  find  alternative  ways  of  displaying  this  information.  One  of 
the  common  approaches  to  conveying  visual  information  in  a 
nonvisual  way  is  through  the  use  of  the  tactile  modality.  Spe¬ 
cific  research  has  been  conducted  in  the  realm  of  tactile  displays 
with  respect  to  visual  impairments.  Not  only  can  tactile  displays 
provide  information  regarding  a  graphic’s  identity  but  also  the 
depth,  location,  and  perception  of  its  purpose.  The  use  of  tactile 
systems  can  also  provide  navigational  information.  Research  has 
shown that tactile output of directional information offers support
to the blind as they explore images (Kurze, 1998). Research
has also been conducted in the area of movable dynamic tactile
displays that present information to one or several fingertips in
a Braille-type manner. The Braille dots move as a wave of raised
and lowered pins (Fricke & Baehring, 1994). Along the
same  line  of  research,  the  use  of  a  bidimensional  single  cell 
Braille  display  combined  with  a  standard  Braille  cell  has  been 
evaluated.  Although  initial  research  found  this  new  combined 
device  is  not  an  improvement  over  a  standard  stand-alone 
Braille  display,  continued  research  is  yielding  improvement 
(Ramstein,  1996).  Tactile  output  via  force  feedback  has  been 
looked  to  as  a  means  of  conveying  numerical  information.  Yu, 
Ramloll,  and  Brewster  (2000)  worked  on  a  system  that  converts 
data  typically  displayed  visually  into  haptic  and  auditory  out¬ 
put.  Since  data  visualization  techniques  are  not  appropriate  for 
blind  people  or  people  with  visual  impairments,  this  system 
translated  graphs  into  friction  and  textured  surfaces  along  with 
auditory  feedback. 
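
The general idea of presenting visually displayed data through the auditory channel can be illustrated with a minimal Python sketch that maps each data value in a series to a tone frequency. It is not the system of Yu, Ramloll, and Brewster (2000): the function name, the linear value-to-pitch mapping, and the 220 to 880 Hz range are assumptions chosen here purely for illustration.

```python
def values_to_frequencies(values, low_hz=220.0, high_hz=880.0):
    """Map a data series to tone frequencies (a simple sonification).

    The smallest value maps to `low_hz` and the largest to `high_hz`,
    with linear interpolation in between (an assumed mapping).
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series carries no contour; use the middle of the range.
        return [(low_hz + high_hz) / 2 for _ in values]
    span = hi - lo
    return [low_hz + (v - lo) / span * (high_hz - low_hz) for v in values]
```

A rising line graph would then be heard as a rising sequence of tones. The resulting frequencies could be passed to any tone generator, and a haptic channel could reuse the same normalization to drive friction or texture parameters.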

Another common approach to conveying visual information
in a nonvisual way is through the use of the speech modality.
Speech  recognition  systems  serve  as  an  alternative  modality  for 
users  and  computers  to  interact.  These  systems  recognize  hu¬ 
man  speech  and  translate  it  into  commands  or  words  under¬ 
stood  by  the  computer.  Chapter  34  in  this  handbook  offers  an 
introduction  to  the  technology  of  speech  recognition  systems 
and  design  issues  associated  with  incorporating  speech  recog¬ 
nition  into  applications.  However,  this  chapter’s  discussion  of 
speech  recognition  systems  concentrates  on  its  use  related  to 
individuals  with  visual  impairments.  The  integration  of  speech 
input  and  output  into  applications  offers  an  alternative  to  purely 
graphical  environments  (Yankelovich,  Levow,  &  Marx,  1995). 
This  type  of  technology-driven  design  allows  applications  to  be 
suited  to  a  wide  variety  of  individuals  with  disabilities  (Danis  & 
Karat,  1995).  For  instance,  individuals  with  visual  impairments 
can  use  a  computer  solely  by  voice  activation. 

Speech recognition systems are traditionally associated with
the concept of dictation. Dragon Systems, Inc.’s
Dragon Naturally Speaking is one line of speech recognition
products for dictation. Some specific packages are geared toward
particular professions, such as medicine and law (Cunningham
& Coombs, 1997). Other common dictation systems
are IBM ViaVoice, Lernout & Hauspie Voice Xpress, and Philips
Speech Processing FreeSpeech98. Along the lines of recognition
engines, Verbex Voice Systems and SRI Corp.’s DECIPHER offer
continuous speech recognition technology. Microsoft’s Whisper
provides speaker-independent speech recognition with online
adaptation, noise robustness, and dynamic vocabularies and
grammars (Huang, Acero, Alleva, et al., 1995). Speech-driven
menu navigation systems, for instance Command Corp. Inc.’s IN
CUBE Voice Command for window navigation, have also been
developed (Karshmer, Brawner, & Reiswig, 1994; Karshmer, Ogden,
Brawner, Kaugars, & Reiswig, 1994).

Often,  circumstances  may  affect  more  than  one  sensory 
channel,  and  individuals  may  experience  multiple  impairments. 
In  addition  to  providing  heightened  challenges  to  interaction, 
this impacts the efficacy of perceptual interfaces in meeting
users’ needs. For example, many disabilities are associated to
some degree with speech degeneration, and therefore speech
recognition systems may not fully accommodate a user’s needs
(Rampp, 1979). Older adults commonly experience multiple impairments,
as all the sensory channels normally decline with
age, and they are more susceptible to conditions, such as stroke,
which can cause reduced speech, vision, and motor skills. For
cases  in  which  speech  is  degraded,  speech  recognition  systems 
must  be  sensitive  enough  to  adapt  to  impaired  speech.  Regard¬ 
less  of  what  caused  the  speech  impairment,  devices  and  tech¬ 
niques  can  be  applied  to  augment  the  communicative  abilities 
of  individuals  who  experience  difficulty  speaking  in  an  under¬ 
standable  manner. 

Augmentative  and  alternative  communication  (AAC)  is  a  field 
of  study  concerned  with  providing  such  devices  and  techniques. 
McCoy,  Demasco,  Pennington,  and  Badman  (1997)  described  a 
prototype  system  aimed  at  users  with  cognitive  impairments. 
This  prototype  is  designed  to  aid  communication  and  provide 
language  intervention  benefits  across  several  user  populations. 
An  iconic  language  approach  has  been  applied  to  aid  individuals 
with  significant  speech  and  multiple  impairments  (SSMI).  Re¬ 
search  has  been  conducted  on  the  use  of  icon  language  design 
based  on  the  theory  of  icon  algebra  (Chang,  1990)  and  the  the¬ 
ory  of  conceptual  dependency  (Schank,  1972).  These  method¬ 
ologies  are  then  used  in  interactive  interface  design  (Albacete, 
Chang,  &  Polese,  1994).  Individuals  with  SSMI  also  commonly 
have  difficulty  with  word  processing.  Using  an  animated  graph¬ 
ical  display,  phoneme  probabilities  of  speech  can  be  more  easily 
isolated  and  recognized.  This  allows  users  the  opportunity  to  in¬ 
terpret  their  speech  rather  than  forcing  the  computer  to  do  it 
automatically  (Roy  &  Pentland,  1998).  The  more  feedback  and 
modalities  users  are  provided,  the  more  efficiently  they  can 
interact  with  speech  recognition  by  assisting  the  computer  to  in¬ 
terpret  their  speech  correctly.  Speech  recognition  systems  must 
also  take  into  account  the  possible  extent  of  cognitive  burden 
placed  on  the  user.  If  voice  is  used  to  navigate  through  a  GUI, 
certain  prosodic  features  (e.g.,  pauses)  of  the  user’s  speech, 
resulting  from  a  high  cognitive  load,  may  affect  performance 
(Baca,  1998).  Additional  challenges  associated  with  the  imple¬ 
mentation  of  speech  recognition  systems  are  discussed  in  this 
handbook  as  well.  Overall,  these  systems  need  to  account  for 
situations  when  users  with  visual  impairment  experience  vary¬ 
ing  levels  of  hearing,  speech,  and  cognitive  abilities. 

The  auditory  channel  serves  not  only  to  convey  information, 
but  also  to  receive  information.  The  use  of  auditory  feedback  is 
extremely  useful  to  many  computer  users.  Ongoing  research  has 
looked  at  the  use  of  bidirectional  sound  as  a  standard  element  of 
an interface environment. A prototype named the Voice Enabled
Reading Assistant (VERA), written using an Aural-Oral User Interface
(A-OUI) model, was developed to provide bidirectional
sound. The A-OUI model captures qualities and functions of plain
text files needed for user interfaces to present information to the
auditory, visual, and tactile senses. The VERA prototype
can be applied to many types of office and Internet text and data.
Similar speech-enabled products are Emacspeak and IBM VoiceType
Simply Speaking Gold (Ryder & Ghose, 1999).

Much  work  with  respect  to  audio  feedback  has  focused  on 
conveying information from a GUI. For instance, research conducted
by Darvishi and colleagues (Darvishi, Guggiana, et al.,
1994; Darvishi, Munteanu, et al., 1994) looked at mapping
GUIs into auditory domains through impact sounds based on
physical modeling. Ultimately, these sounds are used to convey
information to the user regarding the objects in a GUI environment.
Audio feedback has also been used to provide visually
impaired users with a sense of depth perception, by
varying the location of the sound sources in a three-dimensional
environment. Because depth perception is a function of
vision, any cues that can be conveyed through other modalities
are vital. This is particularly crucial when working with
three-dimensional computer applications, such as a CAD package
(Mereu & Kazman, 1996). The use of auditory feedback
has also been studied with regard to enhancing synthesized
speech output. Through the development of spatial audio
processing systems, a greater benefit of synthesized speech was
achieved (Crispien, Würz, & Weber, 1994). IBM’s Screen
Reader/2 also uses synthesized speech to make GUIs accessible
to  those  who  are  visually  impaired  or  blind,  by  converting 
screen  information  to  speech  or  Braille.  The  users  are  contin¬ 
uously  kept  informed  of  screen  activity  and  cursor  movement. 
Reading  typed  characters,  words  and  sentences  are  features 
that  can  be  made  automatic.  This  software  aids  in  the  use  of 
windows,  menus,  dialog  boxes  and  other  controls  (Thatcher, 
1994).  Microsoft  Corporation’s  Whistler,  a  trainable,  text-to- 
speech  system  that  produces  synthetic  speech,  sounds  very 
natural  by  reproducing  the  characteristics  of  original  speech 
(Huang,  Acero,  Adcock,  et  al.,  1996).  WRITE:OUTLOUD  and 
OutSPOKEN  are  two  additional  text-to-speech  products  com¬ 
monly  used  (Friedlander,  1997).  Regardless  of  what  product 
is  used,  text-to-speech  software  enables  information  to  be  col¬ 
lected  and  utilized  in  a  more  rapid  manner. 
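
Returning to the depth cue discussed earlier in this section, in which the location of a sound source is varied in a three-dimensional environment, the following Python sketch shows one simple way such a cue might be computed. It is an assumption-laden illustration rather than the method of Mereu and Kazman (1996): the function name, the listener-at-origin coordinate convention, the inverse-distance attenuation, and the constant-power stereo panning are all choices made here for the example.

```python
import math


def spatial_cue(x, y, z, ref_distance=1.0):
    """Return (left_gain, right_gain) for a source at (x, y, z).

    Assumed convention: the listener is at the origin facing +z, with +x
    to the right. Farther sources are quieter (inverse-distance rule) and
    lateral offset is rendered with constant-power stereo panning.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    # Sources closer than ref_distance are not boosted above full gain.
    gain = ref_distance / max(distance, ref_distance)
    # Horizontal angle: 0 is straight ahead, positive is to the right.
    azimuth = math.atan2(x, z)
    # Clamp to [-1, 1]; sources behind the listener pan fully to one side.
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    angle = (pan + 1.0) * math.pi / 4
    return gain * math.cos(angle), gain * math.sin(angle)
```

In this sketch, moving an object deeper into the scene lowers both channel gains, while moving it sideways shifts the balance between the channels, so a listener receives separate cues for depth and lateral position.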

In  addition  to  the  benefits  already  discussed  that  enable 
users  with  visual  impairments  to  access  technology,  multimodal 
interfaces also provide superior support for error handling. More
specifically, multimodal interfaces have been shown to
avoid errors and to recover more efficiently from errors
when they do occur. This is a very important element of universal
accessibility. Enabling users to use a technology includes
not only being able to access the technology but also the ability
to use it in an efficient manner. Reducing or avoiding errors is a
major key to improving efficiency. A multimodal interface provides
better error handling than a unimodal interface for several
reasons. The following factors help explain why
a multimodal interface may be better at error avoidance.

•  Users  will  select  the  input  mode  considered  less  error  prone 
for  a  particular  context,  which  is  assumed  to  lead  to  error 
avoidance. 

•  The  ability  to  use  several  modalities  permits  users  the  flexi¬ 
bility  to  leverage  their  strengths  by  using  the  most  appropri¬ 
ate  modality.  It  is  possible  that  the  more  comfortable  the 
users,  the  less  likely  they  are  to  make  errors. 

•  Users tend to switch modes after system errors. For instance,
users are likely to switch input modes when they encounter
a system recognition error.

•  Users  are  less  frustrated  with  error  when  interacting  within  a 
multimodal  interface,  even  when  errors  are  as  frequent  as  in  a 
unimodal  interface.  The  reduction  in  frustration  may  be  the  re¬ 
sult  of  the  user  feeling  as  thought  they  have  more  control  over 
the  system  because  they  can  switch  modes  (Oviatt,  1999). 


Ultimately, multimodal systems aid in reducing errors (error
handling) because they offer parallel or duplicated functionality
that allows users to accomplish a task using one of several
modalities (Oviatt & Cohen, 2000). These benefits stem from the
user-centered and system-centered design perspectives from
which multimodal interfaces are built (Oviatt & Cohen, 2000;
Oviatt, Cohen, Suhm, et al., 2002). Like multimodal interfaces,
adaptive interfaces also benefit from a user-centered perspective.
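
The idea of parallel or duplicated functionality can be illustrated with a minimal sketch. It assumes a hypothetical set of input modalities, each of which either recognizes the user's intent or reports a recognition error; the names Modality and accomplish_task, and the simulated recognizers, are illustrative assumptions and do not come from any of the systems cited above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Modality:
    """One hypothetical input channel, e.g. speech, pen, or keyboard."""
    name: str
    # Returns the recognized command, or None on a recognition error.
    recognize: Callable[[str], Optional[str]]


def accomplish_task(intent: str, modalities: Sequence[Modality]) -> tuple[Optional[str], list[str]]:
    """Try each modality in the user's preferred order until one succeeds.

    Returns the recognized command (or None if every modality failed)
    together with the names of the modalities that were tried.
    """
    tried: list[str] = []
    for modality in modalities:
        tried.append(modality.name)
        result = modality.recognize(intent)
        if result is not None:
            return result, tried
    return None, tried


# Simulated recognizers (assumptions for illustration only):
# speech fails on an out-of-vocabulary word, keyboard always succeeds.
speech = Modality("speech", lambda text: None if "Mercator" in text else text)
keyboard = Modality("keyboard", lambda text: text)

command, tried = accomplish_task("open Mercator", [speech, keyboard])
print(command, tried)
```

In this sketch the task still completes after the speech recognizer fails, because the same function is reachable through a second modality. A unimodal interface corresponds to passing a single modality, in which case the same error leaves the task unaccomplished.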

Adaptive  Interfaces 

Adaptive  interfaces  have  great  potential  for  accommodating  a 
wide  range  of  users  in  a  variety  of  work  contexts.  As  a  result, 
much  research  has  been  conducted  on  the  design  and  imple¬ 
mentation  of  adaptive  interfaces.  An  example  of  early  research 
in  the  field  of  adaptive  GUI  is  illustrated  by  the  work  of  Mynatt 
and  Weber  (1994)  with  the  Mercator  project,  and  the  GUIB  (Tex¬ 
tual  and  Graphical  User  Interfaces  for  Blind  People)  project 
(Petrie,  Morley  &  Weber,  1995).  Both  of  these  projects  focused 
on  making  environment-level  adaptations  to  GUIs  in  order  to 
make  them  more  accessible  (Stephanidis,  2001a).  Mercator  in¬ 
terfaces  model  the  graphical  objects  and  their  hierarchical  rela¬ 
tionships.  The  model  serves  to  predict  a  user’s  interaction  (Ed¬ 
wards,  W  K,  &  Mynatt,  1994;  Edwards,  W  K,  Mynatt,  &  Stockton, 
1994).  The  goal  is  to  provide  visually  impaired  users  with  an  in¬ 
terface  that  is  more  accessible.  By  better  understanding  how 
low  vision  users  interact  with  a  computer,  interfaces  can  be  de¬ 
signed  more  effectively.  Outputs  such  as  synthetic  and  digitized 
speech,  refreshable  Braille,  and  nonspeech  sounds  also  make  an 
interface easier to use. Auditory icons and earcons are two predominant
areas of nonspeech sound research. Auditory icons,
developed  by  Gaver  (1989),  are  everyday  sounds  that  occur  in 
the  world  mapped  to  the  computer  world.  Gaver  first  applied 
the  concept  of  auditory  icons  to  SonicFinder.  Earcons  are  a  sec¬ 
ond  method  for  employing  nonspeech  audio  in  GUIs.  Earcons, 
developed  by  Blattner,  Sumikawa,  and  Greenberg  (1989),  are 
audio  messages  that  provide  the  user  with  information  about 
objects or operations of the computer. The first version of GUIB
adapted the GUI by combining Braille, speech, and nonspeech
audio to construct a nonvisual interface so that users
who are blind can access the GUI (Emiliani, 2001).

Configurable  interface  designs  based  on  user  models  have 
been  heavily  researched.  The  user  models  have  been  defined 
with  respect  to  visual,  cognitive,  motor,  and  other  abilities.  With 
these  models,  custom  computer  systems,  including  both  hard¬ 
ware  and  software,  can  be  created  (McMillan  &  Wisniewski,  1994). 
Semantic  abstraction  of  user  interaction,  named  abstract  widgets, 
is  another  modeling  approach  that  provides  great  flexibility.  Ab¬ 
stract  widgets  separate  the  user  interface  from  the  application 
functionality.  This  allows  users  to  interact  with  interfaces  as  they 
choose,  independently  from  their  environment  (Kawai,  Aida,  & 
Saito, 1996). Another adaptation strategy relies on adaptation
determinants, constituents, goals, and rules. This strategy is based
on the premise that these attributes, which categorize adaptation,
can be used to formulate adaptation rules. These adaptation rules,
in turn, assist the development of intelligent user interfaces. This
approach can be customized to the requirements of different
application domains and user groups (Stephanidis, Karagiannidis, &
Koumpis, 1997).
Approaches such as Pervasive Accessible Technology (PAT) allow
individuals with disabilities to use standard interface devices
that adapt to the user in order to communicate with information
technology infrastructures (Paciello, 1996). The implementation of
a User Interface Management System (UIMS) model provides the
versatility needed to adapt interfaces to an individual’s disability.
The selection of input devices, presentation of information on the
screen, and choice of selection/activation method can all be adapted
to fit specific user needs (Buhler, Heck, & Wallbruch, 1994). Current
research focuses on operationally reliable software infrastructures
that support alternative physical realizations through the abstraction
of objects. More specifically, systems such as ActiveX by Microsoft
and JavaBeans by SunSoft represent component-ware technology
(Stephanidis, 2001b). These systems represent a mainstream effort to
provide technological structures that offer more adequate support for
accessibility. They are also two examples of currently available tools
that can be used to generate code for the adaptation of various
interface components.
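
As a rough illustration of how a user model can drive interface adaptation, the following sketch maps a few user attributes to interface settings through simple rules. The attribute names, scales, and thresholds are all assumptions chosen for illustration; they are not taken from the UIMS, PAT, or adaptation-rule work cited above.

```python
from dataclasses import dataclass


@dataclass
class UserModel:
    """Hypothetical user model; attributes and scales are assumptions."""
    visual_acuity: float    # 1.0 = typical acuity, lower = reduced
    reads_braille: bool
    motor_precision: float  # 1.0 = typical precision, lower = reduced


@dataclass
class InterfaceConfig:
    """Interface settings that the adaptation rules may change."""
    font_scale: float = 1.0
    speech_output: bool = False
    braille_output: bool = False
    input_device: str = "mouse"


def adapt(user: UserModel) -> InterfaceConfig:
    """Apply illustrative adaptation rules to produce a configuration."""
    config = InterfaceConfig()
    # Rule 1: enlarge text for reduced acuity, capped at 4x (arbitrary cap).
    if user.visual_acuity < 0.5:
        config.font_scale = min(4.0, 1.0 / max(user.visual_acuity, 0.05))
    # Rule 2: add nonvisual output when very little vision remains.
    if user.visual_acuity < 0.1:
        config.speech_output = True
        config.braille_output = user.reads_braille
    # Rule 3: prefer keyboard selection when pointing is difficult.
    if user.motor_precision < 0.5:
        config.input_device = "keyboard"
    return config


print(adapt(UserModel(visual_acuity=0.25, reads_braille=False, motor_precision=1.0)))
```

Keeping the rules separate from the user model and from the resulting configuration mirrors the separation of interface and application functionality described above: the same rules can be reused for different users, and the rules can be replaced without changing the application.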

Some  hypermedia  systems  research  has  focused  on  adaptive 
hypermedia  applications.  These  adaptive  hypermedia  systems 
keep  track  of  evolving  aspects  of  the  user,  such  as  preferences 
and  domain  knowledge.  This  information  is  stored  to  create  a 
user model, which in turn is the basis for user interface adaptation
(De Bra, Houben, & Wu, 1999). Adaptive hypermedia can
also be designed based on task models. In the latter case,
task models serve as the basis from which hypermedia systems are
developed. These task models support the design and development
of hypermedia. Different task models are associated with
different types of users (Paterno & Mancini, 1998). Task models
reflect the user’s view of the activities to be performed. In addition
to personalizing the content of hypermedia systems, adaptive
navigation support has also been researched. Prototype systems
have been developed to demonstrate how different navigational
possibilities can be made available based on a user
model (Pilar da Silva, Van Durm, Duval, & Olivie, 1998).

Another example of adapting interfaces to users is the use of
the EZ Access protocols developed by the Trace Center
at the University of Wisconsin-Madison. EZ Access protocols
are  a  set  of  techniques  that  modify  an  interface  to  fit  a  specific 
user’s  need.  These  protocols  work  across  disabilities  and  with 
a  range  of  products.  The  most  common  use  of  EZ  Access  pro¬ 
tocols  is  in  touchscreen  kiosks  (Vanderheiden,  1998). 

While research and development in the areas of perceptual
and adaptive interfaces, like the examples discussed, have
gained substantial attention in generating potential solutions for
individuals with visual sensory impairments, the majority of
work and realized products are positioned in two areas. The
most readily available products and the majority of the research
have been somewhat narrowly focused on (a) the magnification
of screen elements (Fraser & Gutwin, 2000) and (b) the accessibility
of text (Craven, 2003). Comparatively less has been accomplished
for other critical aspects of the graphical user interface than for
the magnification and vocalization of text. While the solutions
generated have afforded access both to individuals who are blind
and to those with a range of visual impairment, they tend to be
one-size-fits-all solutions that either entirely abandon the visual
sensory channel (as in the case of Braille interfaces or screen
readers) or focus only on the augmentation of the visual channel
(e.g., magnification, increased contrast, increased text size); very
few are multimodal or perceptual. Table 43.2 summarizes a selection of
popular assistive devices. Following this table, a discussion of research
that has evaluated the efficacy of such approaches to accessibility
is presented.


TABLE 43.2. Popular Assistive Devices

Competitor: Screen Reader
Product Overview: Software program that reads to the user elements that
appear on an interface via synthesized voice. The program reads left to
right, starting at the very top of the screen. When an image is
encountered, the program reads the associated ALT text.
Key Limitation(s): Solution abandons any remaining vision the user has,
using only their auditory ability. Efficacy depends on the organization
of the interface (e.g., anything not modeled in a left-to-right
organization is not compatible).

Competitor: Braille Display
Product Overview: Similar to the screen readers, but gives the reader the
information via tactile cues (Braille characters).
Key Limitation(s): Same limitations as a screen reader, plus the user has
to learn Braille, which is not likely if they have residual vision and
are losing vision later in life.

Competitor: Screen Magnifiers
Product Overview: Physical device or software program that enlarges the
entire screen image. Software programs, such as Zoomtext, developed by
Ai Squared, allow for "adaptable" magnification of the interface, and
some versions incorporate a screen reader.
Key Limitation(s): A "one-size-fits-all" solution that is not adaptable
between users. For people with visual impairments, magnification is not
always the most effective strategy (especially with obstructed visual
fields).

Kline and Glinert (1995) presented UnWindows V1, a set of
interface  tools  to  support  selective  magnification  of  window 
area,  and  tracking  the  location  of  the  mouse  pointer  on  the  dis¬ 
play  screen.  The  authors  noted,  “Magnification  is  one  method 
commonly  employed  to  help  low  vision  users  deal  with  the 
small  type  fonts,  illustrations,  and  icons  present  in  much  of  to¬ 
day’s  printed  media  and  computer  displays”  (Kline  &  Glinert, 
1995, p. 2). Key components of the UnWindows system included
(a) a dynamic magnifier to compensate for the loss of
global context imposed by static magnification and changing
display content and (b) visual and aural feedback to aid
users in locating the mouse pointer. Kline and Glinert placed
emphasis on the problematic nature of visual tracking in the
presence of a screen densely populated with icons and windows.
Interestingly, they received mixed reactions to their interface
from users with and without visual impairment, especially in
terms of the auditory feedback provided whenever the mouse
pointer entered a new window (users found this annoying).
While no formal empirical testing was performed in relation to
UnWindows, questions surface as to the effectiveness of nonvisual,
multimodal feedback in a complex display.

In a usability review of technologies available at the time for
adapting GUIs for use by individuals who were blind or had low vision,
magnification, synthetic speech, and Braille were assessed for
their ability to provide the respective users 100% access to GUIs
in nine test areas (Becker & Lundman, 1998). These test areas included:

•  Installing  and  configuring  the  device/software 

•  Uninstalling  the  device/software 

•  Performance  reliability/stability 

•  Program  manager  to  read  and  manipulate  windows,  menus 
and  icons 

•  Word  processing  based  tasks  such  as  opening  and  saving  files, 
reading  and  editing  text,  text  attributes,  and  toolbars 

• Spreadsheet tasks such as reading cells, tables, and figures, and
editing data and formulas

•  Internet  use,  including  dialing  up,  accessing  World  Wide  Web 
pages,  navigating  with  link  buttons,  sending  e-mail  and  read¬ 
ing  graphics 

•  Screen  searching,  such  as  searching  for  characters,  strings, 
formats,  and  icons 

•  Operating  start  menu,  exploring  and  controlling  settings 
(Becker  &  Lundman,  1998) 

For the assessment of seven magnification programs (synthetic
speech and Braille displays are not relevant to individuals
with visual impairments who have residual vision), the evaluators
comprised a system engineer, an ergonomic engineer,
a computer science expert, and three individuals with visual
impairments.

The use of magnification as a strategy to afford access to
GUIs proved somewhat successful, providing 89% or higher access
to GUIs, except for Internet use (84% access) and screen
searching (0%; Fortuin & Omata, 2004). It was concluded that
the essential problem for the design of interactive systems for
users with visual impairments is to determine (a) what the
users need and (b) how to represent the requested information
based on key psychological and physical attributes of the
user. Ineffective assistive technologies leave users without
usable contextual cues to provide feedback in the case of errors,
which translates to a large imposed workload and frustration
for the user (Fortuin & Omata, 2004).

In a case study of an English teacher who was having difficulty
reading student papers, typing, and proofreading, Whittaker
(1998) discovered that magnification was not affording
optimal performance. Typically, the author found, users
with visual acuity of 20/40 or better would respond well to simple
optical magnification. The author investigated other aspects of
visual functioning and found that the individual had severely
diminished contrast sensitivity (13% contrast threshold, with 2%
representing normal sensitivity). Furthermore, this individual’s
visual field was 20 degrees horizontally (180 degrees is normal).
Magnification was likely reducing the number of letters viewable
simultaneously in the presence of scotoma within the visual
field. The author warned that magnifiers and large monitors
are not always the most effective solution for users with
impaired vision.
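
The trade-off between magnification and the number of letters viewable at once can be made concrete with a small sketch. It assumes a simplified geometric model in which every character has the same width and the viewable region has a fixed width in pixels; the function name and the example values are illustrative assumptions, not measurements from the case study.

```python
def visible_characters(viewport_width_px: int, char_width_px: float, magnification: float) -> int:
    """Number of whole characters that fit on one line of the viewable region.

    Simplified model (an assumption): fixed-width characters whose width
    scales linearly with the magnification factor.
    """
    if viewport_width_px < 0 or char_width_px <= 0 or magnification <= 0:
        raise ValueError("widths and magnification must be positive")
    return int(viewport_width_px // (char_width_px * magnification))


# Each doubling of magnification halves the letters viewable simultaneously.
for factor in (1, 2, 4, 8):
    print(factor, visible_characters(1024, 8.0, factor))
```

Under this model, a user whose usable visual field covers only part of the screen effectively has a smaller viewable width, so the same magnification leaves even fewer letters visible at once, which is consistent with the author's warning about magnifiers.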

Arditi  (2004)  addressed  the  reading  difficulties  of  individu¬ 
als  with  low  vision.  According  to  the  author,  successfully  over¬ 
coming  this  difficulty  is  accomplished  through  the  exploitation 
of  remaining  vision.  The  easiest  way  to  do  this  is  through  mag¬ 
nification,  but  as  shown  in  this  study,  it  is  not  a  one  size  fits  all 
solution.  Several  parameters  of  the  font,  including  height, 
stroke, spacing, and serif size, must be selected in a combination
that best suits a given user. Arditi presented the prototype
and initial user testing of computer-based software that lets a
user customize fonts for maximum legibility. The users studied
were able to adjust fonts to a usable, legible level, which
positively affected reading times and reading acuity.

An in-depth review of accessibility tools aimed at improving
interactions of computer users with low vision informed the
design of MouseLupe (Silva, Regina, & Bellon, 2002). MouseLupe,
inspired by the problematic nature of screen magnification
software, simulates a magnifying glass, enabling users to magnify
selected portions of text or display graphics. The authors suggested
that magnification improves the readability of smaller
text but occludes the visible area of the document. Furthermore,
graphics that contain text (like most icons), a critical element
of the graphical user interface, are difficult to read when enlarged
(for a comprehensive review of magnification tools, see
Silva, Regina, & Bellon, 2002).

Several  researchers  have  considered  the  effect  of  visual  im¬ 
pairments  on  web  browsing  (Arditi,  2003;  Craven,  2003;  Harper, 
Goble,  &  Stevens,  2001;  Silva,  Regina,  &  Bellon,  2002).  Harper, 
Goble,  and  Stevens  (2001)  addressed  this  problem  in  terms  of 
“Web  Mobility.”  These  authors  provided  guidelines  for  move¬ 
ment  through  and  around  complex  hypermedia  environments, 
such as the web, for users with visual impairments. The problem,
according to these authors, is that visual impairment inhibits
individuals’ ability to efficiently assimilate page structure
and visual cues, which leads to the following problems:

•  Failure  to  get  a  feel  for  the  content  on  the  website 

• Failure to have a sense for the magnitude of the display or
where in a website the interaction takes place

• Disorientation

• Obstacles and distracters such as spacer images, tables, and
large images

•  Too  much  complex  detail  that  cannot  be  resolved 

•  Frustration 

Arditi  (2003)  observed  the  problems  of  web  browsing  in 
terms  of  the  allocation  of  screen  space  resources.  According 
to  the  author,  conflicts  arise  in  the  implementation  of  web 
browsing  solutions  for  individuals  with  low  vision  including: 
(a)  high  magnification  requirements;  (b)  variable  typography 
color,  size,  and  contrasts  of  the  content  presented;  (c)  embed¬ 
ded  text  messages  to  augment  web  images;  and  (d)  accessible 
web  browsing  controls  (icons,  buttons,  menus).  The  author 
presented  a  novel  approach  for  effectively  using  screen  re¬ 
sources,  providing  evidence  that  the  strategic  layout  of  a  dis¬ 
play  is  a  critical  factor  to  successful  interaction.  The  layout  of 
screen  elements  was  interpreted  as  more  critical  than  magnifi¬ 
cation  of  the  screen  elements. 

Craven  (2003)  questioned  the  accessibility  of  electronic  li¬ 
brary  resources  on  the  World  Wide  Web  for  individuals  with  vi¬ 
sual  impairments.  The  results  of  her  study  with  20  sighted  and 
20  visually  impaired  users  revealed  the  browsing  times  of  those 
individuals  with  visual  impairments  were  significantly  greater, 
depending  on  the  design  of  the  website  (layout  complexity  and 
distracters). Navigation time for the group of users with visual
impairments was significantly longer due not only to visual functioning
but also to artifacts of assistive technology use in navigation
(magnification and screen readers).

Summary 

Through  technological  advances  in  HCI  research,  the  concepts 
of  perceptual  and  adaptive  interfaces  have  emerged.  These  two 
categories  of  technologies  provide  vast  opportunities  for  indi¬ 
viduals  with  perceptual  impairments  to  fully  access  electronic 
information. More specifically, multimedia and multimodal systems
have been shown to accommodate users of varying abilities.
However, while these are the most promising approaches
for this population, there still exists a chasm between research
and practice. Unfortunately, it is rare to see the knowledge
generated through HCI research actually implemented in
commercially available products. Much work still needs to be
done, both in terms of advancing knowledge through empirical
HCI research and in terms of increasing awareness. Translational
efforts need to be made to help disseminate this new
knowledge into the design of technologies and devices, so that
individuals with perceptual (and other forms of) impairment can
better leverage the increasingly ubiquitous technological tools
used in everyday life. An evidence-based approach to improved
design of technology, especially for individuals with sensory or
perceptual impairments, is needed to help ensure that the new
knowledge continually generated through research is ultimately
used to improve human interaction with technology.


CONCLUSIONS 


To provide a context for the topic of perceptual impairments
and computing technologies, Fig. 43.1 demonstrated that, in order
to achieve universal access across classes of computing technologies,
researchers must be prepared to address several very
challenging issues:

1.  Establish  empirical  links  between  clinical  diagnoses  and  sets 
of  functional  capabilities 

2.  Define  the  set  of  functional  capabilities  required  to  access  in¬ 
formation  technologies 

3.  Establish  empirical  bases  for  the  influence  of  specific  func¬ 
tional  capabilities  on  access  to  specific  classes  of  technologies 

This chapter aimed to establish a basis for addressing such
issues by examining, specifically, visual impairment, several specific
diagnoses, and the resulting functional abilities. Finally,
groundbreaking advancements in perceptual interfaces, multimodal
interfaces, multimedia interfaces, and adaptive interfaces
were discussed, which can be applied across a variety of classes of
technology in order to enhance the perceptual experience of
people who possess such impairments.


INTRODUCTION


Over  the  past  decade,  the  topic  of  universal  accessibility  has 
received  close  attention  in  the  field  of  system  design.  While 
considerable  progress  has  been  made  in  addressing  the  needs 
of the physically disabled and aging, the cognitively disabled
have largely been overlooked (Gribbons, 1992; Newell,
Carmichael, Gregor, & Alm, 2003). With nearly 45 million
adult  Americans  suffering  from  the  debilitating  effects  of  illit¬ 
eracy,  the  Universal  Accessibility  movement  has  clearly  not 
been  universal.  While  many  have  called  for  action  (Dickinson, 
Eisma,  &  Gregor,  2003;  Shneiderman,  2000),  an  extensive  re¬ 
view  of  the  leading  human  factors  and  HCI  journals  revealed  a 
small  number  of  studies  focused  on  this  critical  issue.  The  topic 
of  the  less  literate  and  learning  disabled  is  also  glaringly  omit¬ 
ted  from  the  leading  HCI  and  human  factors  textbooks. 

The  research  that  does  exist  in  literacy  and  learning  disability 
studies  has  been  confined,  for  the  most  part,  to  the  fields  of 
educational  psychology,  instructional  design,  and  health-care 
informatics.  Here,  a  rich  research  tradition  dates  back  many 
decades.  However,  most  of  these  findings  are  limited  to  the  de¬ 
sign  of  the  classroom  experience,  standalone  computer-based 
training  systems,  and  paper-based  communication  products. 
Research  in  noneducational  settings  for  general  interface  and  in¬ 
teraction  design  is  severely  limited.  Further,  most  of  this  work 
focuses  on  children  and  not  the  adult  population. 

The  ubiquitous  nature  of  the  Internet  has  forced  some  progress 
on accommodations for the cognitively disabled. The Web Content
Accessibility Guidelines 1.0 (1999) released by the W3C urge
designers to accommodate this population. Unfortunately, many
emerging guidelines tend to be vague and nonspecific.

This  chapter  will  address  the  challenge  of  accommodating 
functionally  illiterate  users  and/or  users  with  learning  disabili¬ 
ties  in  system  design.  It  will  explore  the  nature  of  this  population 
and  the  magnitude  of  the  problem,  the  relationship  between  lit¬ 
eracy  and  learning  disability,  characteristics  of  the  disability,  and 
recommendations  for  design  practice  and  usability  testing.  Ulti¬ 
mately,  it  is  hoped  that  this  effort  will  encourage  further  research 
in  this  critical  area  and  the  integration  of  that  work  in  our  pro¬ 
fession’s  educational  programs,  guidelines,  and  best  practices. 

Defining  the  Population 

Part  of  the  challenge  of  addressing  this  population  is  building 
consensus  on  the  most  effective  definition  of  this  group.  Com¬ 
mon  labels  include  “functionally  illiterate,”  “cognitively  dis¬ 
abled,”  and  “the  learning  disabled”  (a  subset  of  the  cognitively 
disabled  group).  An  underlying  assumption  in  this  work  is  that 
the  three  categories  are  not  mutually  exclusive.  For  the  pur¬ 
poses  of  this  chapter,  discussion  will  be  limited  to  functionally  il¬ 
literate  and  learning-disabled  populations.  Functional  illiteracy,  a 
lack  of  document  and  quantitative  literacy  needed  to  function  in 
modern  society,  was  selected  since  it  provides  the  most  com¬ 
prehensive  picture  of  the  total  population.  This  perspective 
brings  to  the  discussion  a  very  detailed  set  of  demographics  and 
statistics. Unfortunately, this perspective is weak on underlying
causes and effective accommodations. Fortunately, the learning-disabled
perspective, while focused most often on the
school-age  population,  provides  an  extensive  research  base  that 
defines  the  characteristics  of  this  population,  the  underlying 
source  of  the  disability,  and  possible  accommodations.  Learn¬ 
ing  disability  is  an  umbrella  term  used  to  describe  a  wide  range 
of  disorders  in  information  processing  and  it  is  generally  be¬ 
lieved  that  learning  disabilities  are  linked  to  a  dysfunction  in  the 
central  nervous  system.  The  larger  cognitively  disabled  category 
was  not  chosen  since  it  includes  a  much  broader  class  of  dis¬ 
abilities  including  Down  syndrome,  autism,  emotional  disabili¬ 
ties,  and  Alzheimer’s  disease.  Naturally,  these  more  severe  dis¬ 
abilities  require  more  extreme  accommodations. 

The  causes  of  functional  illiteracy  are  complex  and  often  in¬ 
extricably  intertwined;  these  causes  include  a  lack  of  educa¬ 
tional  attainment,  dyslexia,  other  learning  disabilities,  and  social 
deprivation.  Further  complicating  matters,  many  adults  with  low 
literacy  skills  were  never  diagnosed  with  a  learning  disability  as 
children  and  entered  adulthood  suffering  the  effects  of  the  dis¬ 
ability  without  knowing  the  cause.  Consequently,  it  is  all  but  im¬ 
possible  to  characterize  whether  the  cause  of  adult  illiteracy  is 
a  product  of  low  educational  attainment,  an  underlying  disabil¬ 
ity,  or  whether  an  undiagnosed  disability  was  the  cause  of  an 
individual dropping out of the educational system. While it may
never  be  possible  to  determine  how  many  in  the  lower-literacy 
population  have  learning  disabilities,  it  is  a  safe  assumption  that 
the  contribution  to  the  population  is  significant. 


Population  Size 

The  current  state  of  functional  illiteracy  in  the  United  States  pre¬ 
sents  a  shameful  picture.  According  to  the  1992  National  Adult 
Literacy  Survey,  some  23%,  or  nearly  45  million  of  approximately 
200  million  adult  Americans,  function  at  the  lowest  level  of 
literacy,  “Level  1.”  Those  at  Level  1  literacy  are,  for  the  most  part, 
able  only  to  read  a  simple  form  or  understand  rudimentary  in¬ 
formation  in  a  short  news  article;  others  are  not  capable  of  even 
this (Kirsch, Jungeblut, Jenkins, & Kolstad, 1993). Within the
context  of  these  figures,  it  is  surprising  that  greater  attention  has 
not  been  directed  to  this  population,  since  its  numbers  are 
larger  than  all  other  disability  groups  combined. 

The  other  literacy  levels,  two  through  five,  represent  pro¬ 
gressively  higher  levels  of  literacy  skills.  Not  surprisingly,  one 
contributor  to  functional  illiteracy  is  low  educational  attainment. 
The National Adult Literacy Survey states that “adults with relatively
few years of education were more likely to perform in the
lower literacy levels than those who completed high school or
received some postsecondary education” (Kirsch, Jungeblut,
Jenkins, & Kolstad, 1993). The National Center for Educational
Statistics (2001) reported the following:

“Between  1972  and  1985,  high  school  completion  rates  climbed  by 
2.6  percentage  points  (from  82.8  percent  in  1972  to  85.4  percent  in 
1985);  since  1985,  the  rate  has  shown  no  consistent  trend  and  has  fluc¬ 
tuated  between  85  and  87  percent.  This  net  increase  of  about  3  per¬ 
centage  points  over  29  years  represents  slow  progress  toward  improving 
the  national  high  school  completion  rates”  (High  School  Completion 
Rates, Executive Summary). Kirsch, Jungeblut, Jenkins, & Kolstad, 1993.

Given this slow progress, it is highly probable that educational
attainment will continue to be a key contributor to lower
literacy skills among adults for the foreseeable future.

The consensus of most literacy experts is that people at Levels
1 and 2, the lowest levels, lack the essential skills to function
successfully in society. In 2003, this survey was readministered
and the results showed no significant improvement in
prose and document literacy, with some improvement in quantitative
literacy. Clearly, the data from these two studies show that the
problem is large and that gains in literacy, like increases in high
school completion rates, will come slowly.

Unlike  the  measure  provided  by  the  National  Adult  Literacy 
Survey,  there  is  no  large-scale  study  measuring  the  prevalence 
of learning disabilities in the general population. There is, however,
a range of estimates from a variety of sources. The National
Dyslexia  Association  (n.d.)  estimates  that  70-80%  of  the  popula¬ 
tion  of  students  receiving  special  education  services  has  deficits 
in reading and that 15-20% of the general population has
language-based learning disabilities (How Common Are Language-Based
Learning Disabilities?). The Interagency Committee on Learning Disabilities
believes  that  5-10%  of  the  population  is  affected  by  learning  dis¬ 
abilities.  The  President’s  Committee  on  Employment  for  People 
with  Disabilities  found  that  10-14%  of  the  adults  in  the  work¬ 
force  have  learning  disabilities.  Finally,  the  National  Institutes  of 
Health  estimated  that  15%  of  the  population  in  the  United  States 
has  some  type  of  learning  disability  (National  Institute  for  Liter¬ 
acy,  n.d.).  Despite  the  probable  link  between  those  with  learning 
disabilities  and  the  larger  lower-literacy  population,  no  attempt 
has  been  made  to  validate  this  connection. 

Not  surprisingly,  functional  illiteracy  and  illiteracy  are  not 
problems  confined  to  the  United  States.  There  are  an  estimated 
876  million  illiterate  adults  in  the  world,  which  represents  nearly 
a  quarter  of  the  world’s  population.  In  developed  nations,  it  is 
estimated  that  there  are  approximately  100  million  functionally 
illiterate people (Kickbusch, 2001). In India alone, nearly 45% of
the  adult  population  is  illiterate  according  to  the  2001  Indian 
Census  (Huenerfauth,  2002).  These  statistics  and  others  clearly 
convey  the  global  scale  of  this  problem. 

Making  the  Case  for  Including  Functional  Illiteracy 

The  lack  of  research  addressing  literacy  and  learning  disability 
in  the  HCI  discipline  was  understandable  10  or  15  years  ago 
when  this  population  was  highly  unlikely  to  interact  with  tech¬ 
nology  outside  of  an  educational  setting.  With  the  ubiquitous 
nature  of  information  technology  in  modern  society  (for  ex¬ 
ample,  the  Internet,  computers  in  all  aspects  of  work,  public 
kiosks,  ATMs,  electronic  voting  machines,  e-health,  and  con¬ 
sumer  electronics)  this  research  gap  is  now  indefensible.  Ac¬ 
cessible  technology  is  no  longer  a  luxury  or  an  option;  rather 
it  is  a  prerequisite  to  gaining  access  to  information  that  affects 
one’s  quality  of  life,  participating  in  government,  and  con¬ 
tributing  to  the  economy.  A  report  from  the  Department  of 
Commerce (2002) stated that “[b]etween December 1998 and
September 2001, Internet use by individuals in the lowest-income
households (lower socio-economic groups represent a
disproportionately high percentage of the lower literacy
population) increased at a 25 percent annual growth rate” (p. 56).
The  study  also  reported  that  “as  of  September  2001,  about 
65  million  of  the  115  million  adults  who  were  employed  and 
25  or  over  use  a  computer  at  work”  (p.  57).  While  the  percent¬ 
age  of  workers  using  computers  was  significantly  higher  for 
“professional”  positions,  20%  of  those  working  as  operators, 
fabricators,  laborers,  farmers,  and  fishermen  use  a  computer  as 
part  of  their  work  (p.  58).  Lastly,  shifts  in  the  economy  from 
“hands-on  work”  to  “information-and-technology  work”  will 
seriously  disadvantage  lower-literacy  groups  unless  there  is  a 
corresponding  increase  in  functional  literacy  skills. 

If  the  size  of  this  population  alone  were  not  enough  to  mo¬ 
bilize  action,  the  cognitively  disabled  are  a  protected  group  un¬ 
der  the  Americans  with  Disabilities  Act.  Under  the  act,  employ¬ 
ers  are  required  to  provide  workplace  accommodations  for 
employees  who  disclose  their  disability.  That  last  stipulation  of 
the  act  could  very  well  be  one  of  the  reasons  this  population  has 
been  marginalized.  Low  literacy  carries  a  stigma  for  most  people 
and  creates  a  reluctance  to  disclose — no  disclosure,  no  require¬ 
ment  for  accommodation. 

Functional  illiteracy  and  healthcare.  Perhaps  the 
most  poignant  case  of  a  digital  divide  between  the  literate  haves 
and  the  functionally  illiterate  have-nots  is  in  the  healthcare  com¬ 
munity.  After  a  long  delay,  this  community  has  made  tremendous 
progress  in  implementing  information  technologies  to  enhance 
the  quality  of  patient  care,  particularly  in  disseminating  health- 
related  information,  managing  patient  care,  accessing  patient  in¬ 
formation,  and  promoting  health  services.  As  a  result,  the  trend 
for  seeking  health-related  information  online  is  increasing  dra¬ 
matically.  An  estimated  70  million  Americans  have  sought  health 
information  online  (Cain,  Mittman,  Sarasohn-Kahn,  &  Wayne, 
2000)  from  the  nearly  10,000  or  more  health-related  websites 
(Benton  Foundation,  1999).  Birru  et  al.  (2004)  estimated  that  be¬ 
tween  40%  and  54%  of  patients  use  the  Internet  to  search  for  in¬ 
formation  on  ailments  and  treatments.  This  development  would 
be  exciting,  given  the  documented  contributions  of  information 
to  improved  health,  if  it  were  not  for  the  fact  that  equal  access  to 
this  information  does  not  exist  for  all  Americans.  Similar  to  the 
progression  that  played  out  in  the  larger  development  commu¬ 
nity,  initial  design  efforts  in  the  healthcare  sector  have  focused 
first  on  the  “typical”  fully  functioning  user  and  then  on  the  aging 
and  physically  disabled.  Most  healthcare  materials — web-based  or 
paper — are  written  well  above  the  literacy  level  of  the  average 
American, approximately a 10th-grade level or greater (Doak,
Doak, & Root, 1996; Birru et al., 2004; Weiner et al., 2004).

The  healthcare  sector  has  both  a  moral  and  economic  re¬ 
sponsibility  to  address  the  needs  of  lower  literacy  populations. 
Morally,  it  is  only  right  that  all  citizens  benefit  equally  from  the 
value  information  technology  brings  to  the  quality  of  healthcare 
services.  Cashen,  Dykes,  and  Gerber  (2004)  provided  support 
for  this  position  by  reporting  that  literacy  is  a  better  predictor  of 
health  status  than  age,  income,  employment  status,  education, 
or race. There is clearly an economic motivation as well. A
conservative estimate places excess healthcare costs tied to low
literacy at $73 billion a year (Rudd, Moeykens, & Colton, 1999).
It  is  not  surprising,  then,  that  low-literacy  adults  are  twice  as 
likely  to  be  hospitalized  as  their  functionally  literate  counter¬ 
parts  (Birru  et  al.,  2004). 

In recent years the definition of the digital divide has shifted
from economics to functional literacy. The availability of low-cost
equipment and easy access to the Internet in public facilities
has lessened the economic barriers. According to Shneiderman
(2000), poor interface and interaction design remains
one of the defining variables between the technology haves
and have-nots. And while we might hold out hope that we may
one  day  eradicate  illiteracy  at  the  source,  the  fact  that  learning 
disabilities  are  lifelong  and  that  fewer  than  one  in  eight  low- 
literacy  workers  receives  literacy  training  in  the  workplace 
(Sum,  1999,  p.  156)  strongly  suggests  that  the  burden  falls  on 
better  design  and  appropriate  accommodations. 


CHARACTERISTICS  OF  THE  POPULATION 


Functional  illiteracy  is  often  referred  to  as  the  “invisible”  dis¬ 
ability.  In  many  cases,  the  individual  is  unaware  of  the  disability 
or the individual “hides” the disability. Findings from the National
Adult Literacy Survey (1993) suggest that as many as “66 to 75
percent  of  the  adults  in  the  lowest  [literacy]  level  and  93  to  97 
percent  in  the  second  lowest  level  described  themselves  as  be¬ 
ing  able  to  read  or  write  English  ‘well’  or  ‘very  well’  ”  (The  Lit¬ 
eracy  Skills  of  American  Adults).  In  another  study  by  Moon, 
Cheng, Patel, and Scheidt (1998), 70% of the participants reported
they read “really well,” while in actuality their reading
scores  reflected  a  7th-  to  8th-grade  ability.  Also  contributing  to 
the  “invisibility”  of  this  disability  is  the  stigma  attached  to  func¬ 
tional  illiteracy  and  learning  disabilities  that  increases  an  indi¬ 
vidual’s  reluctance  to  disclose  the  problem. 

The  learning  disability  label  encompasses  a  wide  range  of  in¬ 
formation  processing  disorders,  such  as  dyslexia  (language), 
dyscalculia  (mathematics),  and  dysgraphia  (handwriting).  They 
are  thought  to  be  neurobiological  in  origin  and  present  them¬ 
selves  in  various  combinations  and  levels  of  severity.  In  addition, 
an  individual  with  a  deficit  in  one  area  may  have  strengths  in 
other  areas.  Understanding  the  heterogeneous  nature  of  the 
learning-disabled  population  is  critical  to  formulating  appropri¬ 
ate  accommodations.  Learning  disabilities  generally  persist  over 
a  lifetime,  but  the  manner  in  which  the  disability  presents  itself 
will  change  with  life  stages  and  accommodation  strategies 
adopted  by  an  individual  (Gerber,  1998).  The  lifelong  persis¬ 
tence  of  the  disability,  combined  with  the  aforementioned  lack 
of  adult-literacy  training,  suggests  this  problem  will  not  go  away. 

Overall,  learning  disabilities  affect  an  individual’s  ability  to 
develop  and  use  reading,  writing,  reasoning,  and  mathematical 
skills  (Karande,  Sawant,  Kulkarni,  Galvankar,  &  Sholapurwala, 
2005).  In  the  absence  of  these  fully  developed  skills,  the  indi¬ 
vidual’s  ability  to  access,  process,  and  retain  (learn)  informa¬ 
tion  is  severely  constrained.  In  addition,  the  many  challenges 
and  failures  experienced  by  the  learning  disabled  over  a  lifetime 
often  result  in  a  poor  concept  of  self-worth  and  low  self-esteem 
(Gerber,  1998). 

One  point  at  which  some  in  the  learning-disabled  population 
break  with  the  larger  functionally  illiterate  community  is  at  the 
level of underlying intelligence. Many individuals with one or
more learning disabilities have normal or above-normal intelligence
(Rowland, 2004; Doak et al., 1996; Gerber, 1998). Consequently,
once appropriate accommodations are provided to mitigate
the effects of the disability, adequate intelligence exists to
accomplish  most  cognitive  tasks. 

Functional  illiteracy  and  aging.  One  remaining  char¬ 
acteristic  of  this  population  is  the  disproportionately  high  rates 
of  lower  literacy  levels  in  the  aging  population.  Kirsch  et  al. 
(1993)  reported,  “Older  adults  were  more  likely  than  middle- 
aged  and  younger  adults  to  demonstrate  limited  literacy  skills. 
For  example,  adults  over  the  age  of  65  have  average  literacy 
scores  that  range  from  56  to  61  points  (or  more  than  one  level) 
below  those  of  adults  40-54  years  of  age”  (p.  5)  (The  Literacy 
Skills  of  America’s  Adults).  In  the  most  recent  national  literacy 
survey,  14%  of  the  adult  population  is  below  basic  prose  liter¬ 
acy  skills,  with  26%  of  this  group  65  years  of  age  or  older  (NAAL, 
2003).  The  cause  for  the  disproportionate  representation  of 
older  adults  is  likely  a  combination  of  a  number  of  factors  in¬ 
cluding  higher  school  dropout  rates  for  the  older  population, 
lack  of  special  education  services  when  they  were  in  school,  and 
the  general  decline  in  cognitive  abilities  associated  with  aging. 
Regardless  of  the  cause,  design  accommodations  that  benefit 
lower-literacy  populations  are  likely  to  have  the  added  benefit  of 
supporting  the  aging  population  as  well. 

The  next  section  will  move  from  this  general  overview  to  a 
detailed  discussion  of  the  functional  characteristics  of  the  learn¬ 
ing  disabled.  A  sizable  portion  of  the  research  that  fuels  this  dis¬ 
cussion  is  from  the  study  of  dyslexia.  The  rationale  for  this  focus 
is  twofold:  (a)  dyslexia  is  the  most  common  of  the  learning  dis¬ 
abilities,  affecting  nearly  80%  of  the  learning-disabled  popula¬ 
tion  (Karande  et  al.,  2005);  and  (b)  deficits  associated  with 
dyslexia  are  most  likely  to  impact  information  processing  as  it 
relates  to  HCI. 


Functional  Characteristics  of  the  Learning  Disabled 

Because  of  the  tremendous  diversity  in  the  learning  disability 
community  and  the  high  degree  of  overlap  among  discrete  dis¬ 
abilities,  it  is  easy  to  become  overwhelmed  by  the  complexity 
of  the  area.  A  more  manageable  strategy  is  to  identify  the  func¬ 
tional  characteristics  shared  across  disabilities,  with  close  atten¬ 
tion  to  those  that  affect  a  particular  interaction  environment 
(Brown  &  Lawton,  2001;  Bohman  &  Anderson,  2005).  Figure 
44.1  summarizes  a  number  of  proposed  groupings. 

After  carefully  examining  these  characteristics,  I  simplified 
this  list  into  four  dynamically  interconnected  categories: 

1.  Reading 

2.  Memory 

3.  Metacognitive 

4.  Search  and  navigation 

These categories were selected because of their likely effect
on the computer-interaction environment. In addition, many
of the characteristics in Figure 44.1 are directly linked to deficits
in  a  common  underlying  process,  such  as  metacognition  or 
working  memory.  These  four  categories,  as  highlighted  in  the 
discussion that follows, are not mutually exclusive. For example,
skilled reading draws on memory and requires the deployment
of metacognitive strategies. Similarly, search and navigation
place a load on working memory, draw on long-term memory,
and require metacognitive monitoring. The following section
will review each of these categories more closely, and the final
section will offer recommendations for accommodating each
of these categories in system design.

Source                        Deficiency

Rowland (2005)                Perception and processing
                              Problem solving
                              Low resilience to frustration (not persistent)
                              Memory
                              Attention

Jiwnani (2001)                Difficulty with sequential operations
                              Difficulty with complex, cluttered displays
                                and controls
                              Difficulty choosing from large sets
                              Timed responses, particularly when they
                                involve text

Bohman and Anderson (2005)    Memory
                              Problem-solving
                              Attention
                              Reading, linguistic, and verbal
                                comprehension
                              Math comprehension
                              Visual comprehension

Cromley (2005)                Poor decoding
                              Limited background knowledge
                              Low vocabulary
                              Dysfunctional beliefs about reading
                              Low-strategy use
                              Motivational barriers
                              Working memory issues

Kolatch (2000)                Trouble with abstract reasoning and
                                organization
                              Short and long-term memory loss
                              Word finding and syntax development
                                difficulties
                              Diminished capacity to organize, assimilate,
                                and retain information (web section)
                                (Introduction)

WebAIM                        Difficulty interpreting what is heard and read
                              Difficulty connecting information

The International             Difficulties with accurate and/or fluent word
Dyslexia Association            recognition
                              Poor spelling and decoding abilities
                              Deficit in the phonological component of
                                language
                              Problems in reading comprehension and
                                reduced reading experience that can impede
                                growth of vocabulary and background
                                knowledge

The TRACE Center (2003)       Memory: difficulty recognizing and retrieving
                                information
                              Perception: difficulty taking in, attending to,
                                and discriminating sensory information
                              Difficulties in problem solving
                              Difficulty evaluating outcomes
                              Difficulty generalizing previously learned
                                information

FIG. 44.1. Functional characteristics of the learning disabled.


Reading

Given  the  prominent  role  of  language  in  everyday  communica¬ 
tion  as  well  as  the  more  demanding  software  and  web  envi¬ 
ronments,  reading  is  one  of  the  first  barriers  encountered  by 
the  functionally  illiterate.  The  alarming  rate  of  poor  prose  and 
document-literacy  skill  was  noted  earlier.  Also  noted  was  the 
fact  that  poor  reading  is  often  tied  to  an  underlying  learning  dis¬ 
ability,  inadequate  schooling,  or  lack  of  exposure  to  reading.  As 
in  so  many  other  cases  in  this  area,  an  individual  can  suffer  from 
all  three  conditions,  two  of  them,  or  just  one. 

In  dyslexic  children,  the  learning  disability  affects  reading  at 
the most basic levels of phonological processing (Snowling,
Defty, & Goulandris, 1996). Deficiencies at this basic level
lead  to  further  difficulties  acquiring  the  complex  skill  of  reading, 
building  a  rich  vocabulary,  and  decreasing  word-retrieval  times. 
Most  significantly,  disabled  readers  allocate  a  disproportionately 
high  amount  of  their  attention  to  decoding  letters  and  words,  a 
process  that  is  quickly  automated  by  the  nondisabled  reader 
(van  Gelderen  et  al.,  2004).  For  the  skilled  reader,  the  process  of 
automation  frees  the  attention  resource  to  focus  on  compre¬ 
hension,  the  most  critical  component  of  the  reading  act.  Skilled 
reading  requires  the  management  of  a  complex  series  of  parallel 
actions,  such  as  recognizing  words,  connecting  those  words  to 
what  came  before,  using  those  words  to  anticipate  what  will 
come  next,  and  relating  this  combined  experience  to  what  one 
already  knows.  For  the  learning  disabled,  focusing  most  of  the 
available  attention  on  recognizing  letters  and  decoding  words 
severely  compromises  comprehension.  Birru  et  al.  (2004)  re¬ 
ported  that  even  when  poor  readers  were  able  to  read  a  pas¬ 
sage,  their  inability  to  express  answers  related  to  the  passage  in 
their  own  words  suggested  minimal  comprehension.  Naturally, 
because  reading  is  such  a  frustrating  experience  for  the  learning 
disabled,  they  are  also  less  likely  to  read  on  a  frequent  basis, 
lowering  the  likelihood  they  will  increase  their  reading  skill 
through  practice  (Stanovich  &  West,  1989). 

Given  the  difficulty  experienced  by  this  population  at  the 
most  basic  level  of  decoding,  it  becomes  clear  that  the  problem 
is  greatly  exacerbated  by  materials  written  at  a  grade  level  be¬ 
yond  their  ability.  When  information  is  presented  at  a  level  be¬ 
yond  the  capability  of  the  reader,  mental  workload  is  increased 
significantly  and  comprehension  is  greatly  diminished.  From 
an  accessibility  perspective,  grade-level  readability  is  a  persistent 
problem  throughout  the  web  development  community.  A  re¬ 
cent  study  by  West  (2003)  found  that: 

“89%  of  government  websites  are  not  easily  accessible  to  the  citizenry 
because  the  site  read  at  higher  than  an  eighth  grade  level  of  literacy. 
Fully  two-thirds  of  all  sites  have  language  consistent  with  a  12th  grade 
reading  level,  which  is  higher  than  the  average  American  [half  of  all 
Americans read no higher than the eighth grade level].” (p. 3)

In  another  review  conducted  by  Croft  and  Peterson  (2002)  the 
investigators  found  the  mean  readability  score  of  145  asthma-re¬ 
lated  websites  was  above  the  10th  grade,  with  27  of  the  websites 
at  the  maximum  12th-grade  level.  Finally,  Graber,  D’Alessandro, 
and  Johnson-West  (2002)  conducted  a  study  of  the  readability 
level  of  privacy  policies  on  Internet  health  websites.  The  inves¬ 
tigators  found  that  the  average  readability  of  the  statement  was 
at a level equal to that of a second-year college student, far
outside the reach of the average American.

Despite  the  presence  of  a  reading  disability,  most  of  this  pop¬ 
ulation  possesses  average  or  above  average  intelligence.  As  a 
result,  if  materials  are  presented  at  the  appropriate  grade  level, 
the  learning-disabled  reader  is  capable  of  understanding.  In 
separate  studies  conducted  by  Nielsen  (2005)  and  Birru  et  al. 
(2004),  poor  readers  performed  well  when  interacting  with 
easier-to-read  materials. 

While  there  remains  considerable  disagreement  on  how  best 
to  measure  grade-level  reading,  certain  measures  and  tests  have 
gained wide acceptance. One such test, the Flesch-Kincaid Grade
Level readability test, measures readability by combining the
average sentence length (number of words divided by the number
of sentences) and the average number of syllables per word
(number of syllables divided by the number of words) in a
weighted sum. The result is expressed as an approximate grade level. While
this  is  a  far  from  perfect  measure,  it  does  provide  a  useful  bench¬ 
mark  of  readability  when  evaluating  the  suitability  of  text  for  a 
given population. Unfortunately, this test, along with others such
as the SMOG Readability Formula and the Fry Graph Reading
Index, applies only to prose and not to lists, labels, and headings.
With this type of text, the use of familiar words improves
accessibility. Designers can consult reference works, such as the Word
Frequency Book (Carroll, 1972), to identify the most commonly
used words in the English language. A more complex form of
language support can be found on the CAST website
(http://www.cast.org/), where the sponsors provide a language tool
to support low-literacy users visiting the CAST site.
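
As a rough illustration of how a grade-level score of this kind can be
computed, the sketch below implements the published Flesch-Kincaid
Grade Level formula in Python. The constants (0.39, 11.8, and 15.59)
are those of the standard formula. The syllable counter is a crude
vowel-group heuristic assumed here purely for illustration, and the
function names are hypothetical; established readability tools count
syllables more carefully and may report slightly different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting runs of vowels (a rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assumption: a final "e" is usually silent ("make"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of a passage of prose."""
    # Sentences are approximated by splitting on ., !, and ?.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Weighted sum of the two averages, shifted onto a grade-level scale.
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "Take one pill each day. Call your doctor if you feel sick."
    print(round(flesch_kincaid_grade(sample), 1))
```

A designer might run interface copy or patient instructions through
such a function and flag any passage that scores above the target
grade level. Scores for very short passages are unstable, and, as noted
above, a measure of this kind is meaningful only for running prose.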

Because  the  functionally  illiterate  commonly  experience  dif¬ 
ficulty  with  reading,  illustrations,  audio  output,  and  video  are  of¬ 
ten  offered  as  effective  accommodations.  While  illustrations  can 
assist  the  less-able  reader  (Doak  et  al.,  1996;  Weiner  et  al.,  2004), 
positive  effects  vary  with  the  reader  and  nature  of  the  informa¬ 
tion.  According  to  Beveridge  and  Griffiths  (1987),  illustrations 
contributed  positively  to  reading  performance  in  easier-to-read 
passages.  In  contrast,  they  also  noted  that  illustrations  degraded 
reading  performance  in  the  most  difficult  passages.  In  a  study 
related  to  the  supplemental  use  of  animation,  Larsen  (1995)  re¬ 
ported  learning-disabled  children  had  trouble  due  to  the  dis¬ 
traction  of  the  animation.  Similarly,  Jiwnani  (2001)  reported  sys¬ 
tem  designers  must  be  careful  using  auditory  output  since  it  can 
confuse  the  learning  disabled.  The  pace  of  the  output  must  be 
slow,  free  of  background  noise,  and  repeatable  under  the  con¬ 
trol  of  the  user. 

Clearly, interface and interaction designers must choose
words, sentence length, and sentence structure with due
consideration for the readability level of the audience. Readability
can  be  seen  as  the  first  barrier  in  software  or  web  design  en¬ 
countered  by  the  functionally  illiterate  population.  If  this  pop¬ 
ulation  is  unable  to  process  the  underlying  language  used  in  the 
display,  all  other  accommodations  are  meaningless. 

Memory 

Long-term memory. The long-term and working memory
systems each have implications for the functionally illiterate
and learning-disabled population. As noted in the previous
section, this population is less likely to read as often and,
consequently, fails to benefit from a major source of acquiring
domain knowledge. Without well-established domain models,
interacting with web-based information or a software system is
considerably more demanding. Cognitive science has long
recognized that long-term memories supplement and extend the
limited  capacity  of  working  memory.  This  topic  will  be  explored 
further  in  the  navigation-and-search  section  that  follows. 

Working  memory.  Cognitive  science  has  also  recognized 
the  limited  capacity  of  working  memory  and  the  major  bottleneck 
it  represents  in  information  processing  and  human-computer 
interaction.  While  learning  disabilities  are  very  diverse  in  their 
causes  and  in  how  they  manifest  themselves,  one  consistent 
variable  in  all  learning  disabilities  is  working  memory  capacity. 
The  dynamic  interaction  between  memory  capacity  and  the 
time  information  can  be  actively  maintained  also  imposes  severe 
constraints  on  the  learning  disabled.  Debate  has  raged  for  years 
whether  the  learning  disabled  suffer  from  diminished  working 
memory  capacity  or  whether  the  complex  dynamics  of  the  un¬ 
derlying  cognitive  deficiencies  place  an  excessive  burden  on  a 
“normal”  capacity  (Daneman  &  Carpenter,  1980;  Swanson, 
1993;  Vellutino  et  al.,  1996;  Ransby  &  Swanson,  2003).  From  the 
perspective  of  design  accommodations,  the  precise  source  of 
the  problem  is  insignificant. 

Clearly,  the  attention  devoted  to  low-level  decoding  places  a 
heavy  demand  on  working  memory  capacity.  In  addition,  the 
increased  likelihood  of  weaker  mental  models  minimizes  the 
opportunity  to  “offload”  processing  burden  from  the  working 
memory.  Further,  the  load  imposed  by  metacognition,  search, 
and  navigation — as  discussed  in  the  sections  that  follow — 
places  additional  demands  on  this  precious  resource.  Finally, 
the  high  level  of  anxiety  and  frustration  experienced  by  this 
population  also  operates  in  working  memory  and  will  nega¬ 
tively  affect  performance  on  an  information-processing  task 
(Lee,  1999).  Consequently,  this  issue,  more  than  any  other, 
guides  many  of  the  design  and  support  recommendations  in 
the final section. For example, limited working memory
resources account for the problems the learning disabled
experience with sequenced operations in software or in retracing
their  paths  of  travel  navigating  a  website.  Interface  and  inter¬ 
action  designs  must  compensate  for  the  limitations  of  working 
memory  or  suffer  the  consequences  of  lower  performance,  in¬ 
creased  errors,  or  abandonment  of  the  task. 

While  the  underlying  learning  disability  can  make  the  acqui¬ 
sition  of  expertise  more  difficult,  it  would  be  wrong  to  think  this 
population  does  not  bring  some  level  of  learning  and  a  variety  of 
conceptual  models  to  an  interaction  experience.  As  with  so 
many  other  variables  in  accessibility  design,  we  must  recognize 
that  the  goals  and  models  of  the  lower-literacy  population  may 
not  align  with  those  of  the  general  population.  Dickinson, 
Eisma,  and  Gregor  (2003)  reported  that  “users  made  remarks 
which  indicated  that  they  were  not  trying  to  understand  the  sys¬ 
tem  or  find  generic  rules  that  could  help  them  to  use  it  better” 
(p.  63).  In  other  words,  they  were  not  attempting  to  understand 
the  model  underlying  the  system;  they  were  simply  trying  to  get 
something  accomplished.  System  design  should  focus  on  that 
goal and avoid excessive functionality and extraneous information.
From a development perspective, it is always easy to imagine
occasions when one user or another might need this feature
or  that  piece  of  information.  Unfortunately,  the  lower-literacy 
population  comes  to  the  system  with  a  greatly  limited  need,  for 
both  functions  and  information.  Accessible  design  must  con¬ 
sider  these  needs.  Interface  clutter  increases  mental  workload 
on  users  as  they  attempt  to  locate  information  and  features  that 
align with their needs, a difficult challenge given their
attentional impairments.

Metacognitive 

Another  critical  deficiency  in  most  dyslexics  is  in  metacognition, 
the  process  of  thinking  about  thinking.  Metacognition  can  take 
many forms, including strategic planning, monitoring,
self-appraisal, document-processing strategies, and reading strategies.
Metacognition  operates  actively  in  working  memory,  parallel  to 
the  main  information-processing  task.  Consequently,  this  activity 
places  yet  another  demand  on  the  memory  resource.  Also,  because 
these  strategies  are  learned  behaviors,  it  suggests  an  interaction 
with  long-term  memory.  Research  has  shown  that  the  learning  dis¬ 
abled  are  generally  deficient  in  the  metacognitive  area.  While 
nondisabled  learners  acquire  most  of  these  strategies  without  for¬ 
mal  instruction,  studies  have  shown  these  strategies  can  effectively 
be taught to the learning disabled (Collins, Dickson, Simmons, &
Kameenui, 1998). Again, we must be reminded that skilled readers
become  skilled  through  years  of  practice  with  different  reading 
materials  that  were  processed  to  support  diverse  purposes.  Be¬ 
cause  proficient  readers  and  learners  use  these  strategies  on  a  fre¬ 
quent  basis,  they  migrate,  over  time,  to  the  level  of  an  automatic 
skill.  In  contrast,  the  absence  of  persistent  exposure  to  reading 
places  the  learning  disabled  at  a  further  disadvantage. 

Cromley  (2005)  indicated  that  the  learning  disabled  are  less 
likely  to  know  when  they  do  not  understand  and  are  less  likely  to 
reread,  synthesize,  generate  questions,  or  make  predictions. 
They  are  also  more  likely  to  “satisfice”  and  accept  partial  or  in¬ 
correct  information  rather  than  persevering  to  gain  a  more  com¬ 
plete  understanding.  A  distinguishing  attribute  of  good  readers 
is  their  ability  to  monitor  the  information-processing  activity;  in 
other  words,  they  ask  themselves:  “Do  I  understand  what  I  am 
reading?  Do  I  see  the  relevance  of  the  information  to  the  task? 
What  do  I  expect  to  follow  in  the  next  passage?  How  does  this  re¬ 
late  to  what  I  already  know?”  Corley  and  Taymans  (2002)  orga¬ 
nized  this  activity  in  three  parts:  setting  goals  before  the  task  and 
establishing  a  plan;  monitoring  comprehension  and  under¬ 
standing  while  engaging  the  task;  and  evaluating  one’s  learning 
after  the  task  and  making  adaptations  when  faced  with  similar 
tasks. Danielson (2002), referencing users navigating the web,
suggested that once goals are set, strategies are selected for
achieving them. Further, strategies are constantly assessed
and adjusted many times in a successful interaction experience.
Given  the  mental  demand  of  these  activities,  monitoring  and 
adjusting  activities  pose  a  challenge  for  the  disabled.  Finally,  Britt 
and Gabrys (2002) indicated that document literacy requires four
metacognitive skills: sourcing (evaluating the credibility of
the source), corroboration (seeking independent confirmation),
integration (creating a mental representation of information), and
search (a skill woven throughout the other three areas). The inability
of the learning disabled to engage in self-evaluating activities
is evident in the work of Birru et al. (2004). Although most
of  the  participants  in  their  study  failed  to  answer  the  questions 
correctly,  seven  out  of  eight  reported  feeling  “very  comfortable” 
or  “comfortable”  with  their  Internet  search  experience.  On  the 
sourcing  and  corroboration  skills,  five  out  of  eight  used  infor¬ 
mation  provided  on  sponsored  sites,  yet  still  reported  it  was 
“very  easy”  to  find  trustworthy  information  on  the  Internet. 

In  short,  metacognition  requires  a  tremendous  amount  of 
cognitive  activity,  activity  that  occurs  in  parallel  with  the  primary 
decoding  task.  All  of  this  activity  occurs  in  the  executive  control 
component  of  working  memory.  As  noted  in  the  preceding  sec¬ 
tion,  this  is  a  limited  resource  for  the  learning-disabled  reader 
who  devotes  much  of  this  resource  to  low-level  decoding. 

We  take  for  granted  that  nondisabled  users  engage  in  all  of 
these  activities  effectively  and  effortlessly.  Design  support  for  this 
variable  amounts  to  providing  constant  feedback  to  the  disabled 
user,  explicitly  connecting  the  information  to  the  task,  establish¬ 
ing  checkpoints  where  the  system  asks  questions  of  the  reader 
to  monitor  understanding,  and  engineering  performance  sup¬ 
port  in  the  form  of  cognitive  scaffolding.  On  the  matter  of  feed¬ 
back,  for  example,  Dickinson  et  al.  (2003)  noted  that  longer  tasks 
are  tolerated  by  the  learning  disabled  so  long  as  they  receive  fre¬ 
quent  feedback  on  what  is  happening  and  confirmation  they  are 
proceeding  correctly.  In  the  absence  of  such  feedback,  the  poor 
reader  becomes  anxious,  a  state  that  further  degrades  perfor¬ 
mance  since  anxiety  occupies  the  working  memory  space. 

Navigation  and  Search 

The  learning  disabled  have  traditionally  experienced  tremen¬ 
dous  difficulty  reading  paper-based  materials.  They  experience 
this difficulty despite the fact that the linear format found in most
paper-based  materials  offers  considerable  support  for  the  reader. 
In  these  materials,  the  author  assumes  the  burden  for  commu¬ 
nicating  organization  through  meaningful  structures,  logical  se¬ 
quences,  helpful  transitions,  and  explicit  connections.  In  con¬ 
trast,  the  nonlinear  web  requires  the  reader  to  infer  organization 
and  build  a  coherent  model  of  the  subject  matter  as  the  infor¬ 
mation  unfolds  in  less  predictable  ways  (Britt  &  Gabrys,  2002). 
McDonald  and  Stevenson  (1998)  identified  disorientation  in  the 
nonlinear  environment  as  one  of  the  most  challenging  elements 
of  the  web.  While  disorientation  is  an  obstacle  that  can  be  over¬ 
come  by  able  users,  this  barrier  is  often  insurmountable  for  the 
learning  disabled.  This  population,  at  best,  experiences  difficulty 
building  a  mental  model  of  the  information,  monitoring  where 
they  are  in  the  experience,  or  retracing  their  path.  Each  of  these 
activities  draws  on  their  overtaxed  working  memory. 

As  one  examines  the  factors  affecting  readers’  ability  to  navi¬ 
gate,  a  number  of  variables  emerge: 

•  Basic  navigation  skills  related  to  the  interface 

•  Topology  formats:  nonlinear,  hierarchical,  and  mixed  topologies 

•  Depth  versus  breadth  of  the  structure 

•  Navigational  aids 

Interaction skills. While there is limited research examining
the effects of learning disabilities on navigation, the work
of Zarcadoolas, Blanco, Boyer, and Pleasant (2002) suggested
learning-disabled  users  can  quickly  learn  and  retain  basic  web- 
interaction  skills.  These  skills  include  the  behavior  of  links  and 
the  act  of  scrolling.  Participants  in  this  study  did  require  training 
and  prompting  to  use  graphic  links.  Overall,  the  positive  reac¬ 
tion  to  the  linking  convention  is  encouraging  since  Zarcadoolas 
et  al.  also  found  the  participants,  when  given  a  choice,  preferred 
following  links  rather  than  using  search.  Worth  noting,  Birru 
et  al.  (2004)  found  that  users  in  their  study,  although  comfort¬ 
able  with  the  linking  convention,  would  seldom  click  on  more 
than  one  or  two  links  to  answer  questions.  Finally,  although  ba¬ 
sic  scrolling  behavior  was  understood,  none  of  the  participants 
in  the  Zarcadoolas  et  al.  study  scrolled  to  view  additional  infor¬ 
mation.  Without  additional  corroborating  evidence  it  is  difficult 
to  generalize  from  these  limited  studies.  However,  the  behav¬ 
iors  exhibited  in  these  studies  warrant  close  scrutiny  as  we  con¬ 
tinue  to  study  and  observe  this  class  of  users. 

In  contrast  to  the  relative  ease  with  which  this  population 
learned  basic  interaction  behaviors,  a  more  significant  challenge 
is  faced  navigating  the  larger  system.  Danielson  (2002)  sug¬ 
gested  users  begin  the  navigation  task  with  a  decision  about 
whether  they  are  looking  for  a  specific  item,  a  group  of  items,  or 
general  information  about  the  contents  of  a  domain  area.  As 
noted  in  the  metacognitive  area,  the  learning  disabled  are  gen¬ 
erally  weak  at  planning  tasks,  which  immediately  places  them 
at  a  disadvantage  in  the  navigation  area.  Unfortunately,  there  is 
little  or  no  research  that  directly  examines  the  learning-disabled 
population  and  the  structural  issues  of  nonlinear  versus  hierar¬ 
chical,  or  depth  versus  breadth.  There  is,  however,  considerable 
speculation  on  these  issues  since  performance  differences  for 
general-population  users  are  typically  discussed  within  the  con¬ 
text  of  working  memory  capacity. 

Topology.  The  work  of  McDonald  and  Stevenson  (1998) 
shed  some  light  on  the  issue  of  the  efficacy  of  hierarchical,  non¬ 
linear,  and  mixed  (hierarchical  with  referential  links)  topologies. 
Most  significant  perhaps  for  the  purposes  of  this  discussion, 
they  found  novices  benefited  most  from  the  mixed  structure 
since  it  offered  a  balance  of  freedom  and  control.  The  referen¬ 
tial  links  supported  exploration  of  a  site  without  the  support  of 
a  well-formed  mental  model,  while  the  hierarchical  framework 
served  to  constrain  movements  and  minimize  disorientation. 
While  the  strict  hierarchical  structure  provided  the  greatest  con¬ 
trol  and  guidance  for  participants,  it  proved  inefficient  when 
participants  wanted  to  make  distal  movements  in  the  structure. 
Finally,  the  nonlinear  structure  simply  provided  too  many  op¬ 
tions  and  placed  the  heaviest  demands  on  expertise  and  work¬ 
ing  memory.  In  an  earlier  study,  McDonald  and  Stevenson 
(1996)  found  users  stopped  reading  too  soon  when  faced  with 
too  many  decisions  related  to  “what”  and  “how  much”  to  read. 
In  each  of  these  situations,  it  is  easy  to  speculate  that  the  prob¬ 
lems  faced  by  the  participants  in  these  studies  would  only  be 
exacerbated  for  the  learning  disabled,  given  their  underlying 
deficiencies  in  working  memory  and  metacognition. 

Depth  vs.  breadth.  The  efficacy  of  breadth  versus  depth 
in  the  hierarchy  is  also  open  to  speculation.  As  with  topology, 
the  depth  issue  is  framed  by  limitations  of  working  memory  and 
the users’ ability to distribute the navigation workload to their
own conceptual models of the domain. In this discussion, we
see  greater  diversity  of  opinion.  In  the  world  of  web  design,  best 
practice favors breadth over depth, resulting in a greater number
of choices at the highest level. A study by Larson and Czerwinski
(1998) suggested that a medium depth produced the
best  search  performance  over  either  the  broader  or  the  deeper 
options.  Benard  (n.d.)  suggested  depth  versus  breadth  is  not 
the best framing mechanism. Instead, he suggested the shape of
the hierarchy is the best predictor of search performance, finding
a broad first level, a narrow middle level, and a broad base to
be  most  efficient.  While  not  based  on  a  controlled  study,  Kolatch 
(2000)  proposed  a  simpler  top-level  interface,  offering  fewer 
choices  and  a  deeper  structure  for  cognitively  disabled  users. 
Kolatch  was  careful  to  point  out  that  this  contradicts  most  re¬ 
search  on  the  topic  (web  section  Analyses).  Until  new  research 
suggests  otherwise,  the  balanced  structure  proposed  by  Larson 
and  Czerwinski  (1998)  provides  guidance  consistent  with  our 
understanding  of  the  adverse  effects  of  too  many  choices  and 
the  mental  load  imposed  by  deep  structures. 
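
The breadth-versus-depth trade-off discussed above can be made concrete with simple arithmetic. The Python sketch below is illustrative only: the function name `hierarchy_profile` and the three example shapes are assumptions, not taken from the studies cited, and the count of choices scanned is a crude proxy for mental load rather than a model of search performance.

```python
from math import prod


def hierarchy_profile(breadths):
    """Summarize a menu hierarchy given the number of choices at each level.

    breadths: list of ints, one per level, top level first.
    Returns (leaf_pages, depth, choices_scanned), where choices_scanned is
    the number of options a user looks over along one complete path if
    every option on every level is read.
    """
    return prod(breadths), len(breadths), sum(breadths)


# Three hypothetical ways to organize the same 512 pages.
for name, shape in [("broad", [16, 32]), ("medium", [8, 8, 8]), ("deep", [2] * 9)]:
    pages, depth, scanned = hierarchy_profile(shape)
    print(f"{name:6s} pages={pages} clicks={depth} choices scanned={scanned}")
```

Under these assumptions, the deep shape keeps each screen simple but asks the user to hold nine decisions in sequence, while the broad shape needs only two clicks but presents many more options per screen. The medium shape sits between the two extremes.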

Navigation  aids.  Danielson  (2002)  suggested  that  a  vari¬ 
ety  of  visual  navigational  aids  can  mitigate  the  aforementioned 
problems  encountered  in  navigation.  Navigational  aids  fall  into 
two  categories:  index  and  table  of  contents  lists,  and  site  maps. 
Danielson  noted  that  graphical  site  maps  are  superior  to  lists 
and  table  of  contents  at  conveying  the  relationships  between 
distal  nodes  on  the  site.  The  study  also  showed  that  the  site 
maps  benefited  nonknowledgeable  users  more  than  the  knowl¬ 
edgeable.  Finally,  when  compared  to  a  control  group  without 
site  maps,  Danielson  reported  that  subjects  with  maps  were  less 
likely  to  abandon  the  task,  reported  information-seeking  confi¬ 
dence,  moved  deeper  into  the  site,  and  made  more  movements 
outside  the  hierarchy.  Given  the  previously  defined  tendencies 
of  the  learning-disabled  population,  it’s  highly  probable  that 
they  would  benefit  from  the  assistance  offered  by  site  maps. 
Supporting this theory, Mirchandani (2003) found detailed site
maps helped the disabled user find information within a click or
two. See the ADA Insights website (http://www.adainsights.org)
for an example of a site map designed to support disabled users.
As  a  visualization,  the  benefit  of  the  site  map  is  its  ability  to  shift 
the  cognitive  load  from  working  memory  to  the  visual  and 
thereby transform the load-intensive cognitive task to a less
labor-intensive perceptual task. A more extensive navigational
support tool can be found on the CAST website (http://www.cast.org/).
The tool displays a taxonomical view of the site, the user’s
current  location,  the  user’s  navigational  path,  quick  links  to  fre¬ 
quently  visited  areas,  and  a  notepad  that  allows  the  creation  and 
storage  of  notes.  Combined,  this  support  helps  compensate  for 
deficiencies  in  monitoring  skills. 
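
Two of the supports described above, the user's current location and the navigational path, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class `PathHistory` and its method names are hypothetical and do not describe how the CAST tool or any other cited system is built.

```python
class PathHistory:
    """Track where the user is and how they got there (hypothetical helper)."""

    def __init__(self, home):
        self.home = home
        self.path = [home]

    def visit(self, page):
        """Record a move to a new page."""
        self.path.append(page)

    def back(self):
        """Step back one page; never step back past the start page."""
        if len(self.path) > 1:
            self.path.pop()
        return self.path[-1]

    def return_to_start(self):
        """Jump back to the start page and clear the trail."""
        self.path = [self.home]
        return self.home

    def breadcrumb(self):
        """Text trail that can stay on screen, with the current page bracketed."""
        return " > ".join(self.path[:-1] + [f"[{self.path[-1]}]"])


history = PathHistory("Home")
history.visit("Health Topics")
history.visit("Diabetes")
print(history.breadcrumb())  # Home > Health Topics > [Diabetes]
```

Keeping the trail permanently visible means the system, rather than the user's working memory, carries the record of where the user has been.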

Search.  In  contrast  to  navigation,  research  is  slowly  emerg¬ 
ing  related  to  the  learning  disabled  and  the  search  task.  Not 
surprisingly,  this  research  reports  the  learning  disabled  experi¬ 
ence  tremendous  difficulty  with  search  tasks.  At  the  most  basic 
level,  the  poor  spelling  displayed  by  many  in  the  lower-literacy 
population  creates  a  major  barrier  to  successful  searches.  Search 
engines,  such  as  Google,  that  offer  alternative  spellings  are  best. 
Weak  self-monitoring  skills  also  decrease  the  likelihood  that  the 
user  will  recognize  that  a  mistake  was  made.  Even  when  search 
terms were spelled correctly, Birru et al. (2004) found that subjects
rarely retyped search terms to locate more relevant items,
a serious problem since they also found the low-literacy adults
in  this  study  rarely  used  optimal  search  terms  as  they  attempted 
to  retrieve  health-related  information.  Birru  et  al.  (2004)  con¬ 
cluded  by  highlighting  the  benefits  of  a  categorizing  search  en¬ 
gine  as  one  means  of  minimizing  the  adverse  effects  of  poor 
search  and  long  lists  of  results.  This  engine  would  sort  results  by 
category  and  minimize  the  need  to  evaluate  long,  unsorted  lists. 
This  is  a  good  example  of  a  technology  assuming  the  burden  for 
a  weak  metacognitive  skill  in  the  disabled  population. 
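
The two search supports described above, alternative spellings and results sorted by category, can be sketched with the Python standard library. The vocabulary, the similarity cutoff, and the function names below are assumptions made for illustration. They do not describe how Google or the engine discussed by Birru et al. works.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical vocabulary of productive search terms for a health site.
VOCABULARY = ["arthritis", "asthma", "cholesterol", "diabetes"]


def suggest_spelling(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return up to three known terms that closely resemble what was typed."""
    return get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)


def group_by_category(results):
    """Group (title, category) pairs so results can be shown under headings."""
    grouped = defaultdict(list)
    for title, category in results:
        grouped[category].append(title)
    return dict(grouped)


print(suggest_spelling("diabetis"))  # ['diabetes']
```

Offering the suggestion automatically matters because, as noted above, a user with weak self-monitoring skills may not recognize that a mistake was made.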


DESIGN  ACCOMMODATIONS 


There  are  three  approaches  to  accommodating  the  needs  of 
the  learning-disabled  population.  Each  of  these  recommenda¬ 
tions  is  based  on  the  previously  discussed  characteristics  of  this 
population. 

1.  Assistive  technologies:  Building  technologies  customized  to 
compensate  for  the  disability; 

2.  Layered  design:  Designating  special  sections  of  the  website 
or  software  for  a  disabled  population; 

3.  Universal  design:  Adopting  design  practices  that  enable  the 
learning-disabled  population  while  benefiting  the  larger  pop¬ 
ulation  as  well. 

The  following  sections  will  briefly  review  the  first  two  cate¬ 
gories  and  provide  more  detailed  recommendations  for  the 
third  category,  universal  design.  This  emphasis  is  based  on  three 
considerations.  First,  the  initial  two  categories  require  greater 
effort  and  increase  development  costs.  Second,  they  benefit  a 
smaller  portion  of  the  population,  typically  the  most  severely 
disabled.  Third,  universal  design  principles  are  less  likely  to  in¬ 
crease  development  time  or  costs  and  will  likely  benefit  a  larger 
number  of  the  disabled  and  fully  functioning  populations. 

Assistive  technologies.  A  number  of  studies  have  demon¬ 
strated  the  benefits  of  assistive  technologies  for  the  disabled 
(Brown,  1992;  Cole  &  Dehdashti,  1998;  Newell  &  Gregor,  2000; 
McGrenere  et  al.,  2003).  A  typical  application  in  this  genre  is  a 
stand-alone  computer-based  educational  system,  designed  to  as¬ 
sist  severely  disabled  children.  Other,  less  intrusive  accommo¬ 
dations,  including  software  agents  and  embedded  scaffolding, 
have  been  explored  by  Shaw,  Johnson,  and  Ganeshan  (1999); 
Quintana, Krajcik, and Soloway (2003); and Shneiderman (2003).

Scaffolding  and  agents  offer  performance  support  to  the  dis¬ 
abled  user  in  the  early  stages  of  interaction  with  a  new  product. 
Shneiderman  (2000)  described  this  approach  as  “evolutionary 
learning.”  As  the  user  becomes  more  proficient  with  the  re¬ 
quirements  of  the  system  and  task,  the  scaffolding  is  slowly  torn 
down,  either  under  the  control  of  the  user  or  through  intelligent 
software  agents.  Most  typically,  the  agent  replaces  or  supports  a 
deficient metacognitive process by assisting with planning, monitoring,
self-appraisal, strategy setting, and feedback. Quite simply,
this burden is shifted from the users’ working memory to
the system. Shneiderman (2000) and Quintana, Krajcik, and
Soloway (2002) highlighted the efficacy of this approach for
supporting  less-knowledgeable  and  disabled  populations. 

Layered  design.  The  layered  design  strategy  requires  the 
design  of  two  versions  of  the  system:  one  for  the  functionally  lit¬ 
erate  users  and  one  for  the  functionally  illiterate.  While  it  is  gen¬ 
erally  preferred  not  to  separate  disabled  and  fully  functioning 
populations,  layered  design  is  simply  an  extension  of  a  practice 
that  has  been  used  for  years  for  separating  domain  experts  from 
nonexperts  or  native  speakers  from  non-native  speakers.  In  this 
case,  one  area  of  the  site  would  be  written  at  the  population  av¬ 
erage  8th-grade  level,  while  another  section  of  the  site  would  be 
written  at  the  5th-grade  level  or  lower.  Lobach,  Arbanas,  Mishra, 
Campbell, and Wildemuth (2004) described the benefits of a
two-tier  literacy  system  in  their  study.  Sites  such  as  this  may  in¬ 
clude  other  assistance,  such  as  agents,  site  maps,  illustrations, 
and  the  like. 

Universal  design.  In  contrast  to  the  first  two  approaches, 
universal  design  accommodations  are  seamlessly  embedded  in 
the  system  interface  and  interaction  design.  This  class  of  ac¬ 
commodations  is  less  likely  to  affect  development  costs  or  cre¬ 
ate  an  obstacle  for  the  nondisabled.  Consequently,  these  ac¬ 
commodations  are  more  likely  to  be  embraced  by  the  wider 
development  and  user  communities.  Further,  it  is  widely  ac¬ 
cepted  in  the  accessibility  community  that  these  accommodations 
improve  the  usability  of  the  product  for  all  users  (Shneiderman, 
2000;  Dickinson,  Eisma,  &  Gregor,  2003).  The  following  recom¬ 
mendations  are  organized  in  the  previously  defined  categories, 
although  support  for  a  given  recommendation  is  often  found 
in multiple categories. Finally, recommendations marked
with an asterisk (*) require additional research
and warrant close monitoring. All others enjoy wide support
in the literature.

Reading 

When  designing  the  language  component  of  any  system,  sup¬ 
port  the  following: 

•  Maintain  an  appropriate  reading  level  (use  readability  formu¬ 
las  to  establish  a  benchmark;  measure  word  difficulty  and 
sentence  length); 

•  Use  active  voice; 

•  Use  the  Word  Frequency  Book  to  identify  common  words; 

•  Place  information  and  instruction  in  context; 

•  Employ  lists; 

•  Chunk  information; 

•  Present  content  in  sequence; 

•  Repeat  information  from  screen  to  screen  (do  not  assume 
the  user  will  carry  over); 

•  Maintain  consistency  in  language  and  procedures; 

•  Communicate  directly  and  concretely; 

•  Emphasize  actions  users  must  complete; 

•  Highlight  critical  information,  information  structure,  or  new 
information; 

• Use familiar terms, and avoid acronyms and jargon;

• Use visual and auditory prompts;

•  Provide  definitions  of  critical  terms  through  direct  linking  to 
glossary; 

•  Use  illustrations  to  complement  text,  communicate  struc¬ 
ture,  and  emphasize  connections; 

•  Highlight  (using  circles,  arrows,  and  the  like)  critical  areas  of 
an  illustration  and  explicitly  connect  to  the  text; 

•  Avoid  the  gratuitous  use  of  animations  and  other  movement;* 

•  Use  12-point  type; 

•  Use  familiar  typefaces  (there  is  conflicting  research  regarding 
the  efficacy  of  serif  versus  sans  serif);* 

• Avoid tight letter-spacing (makes low-level decoding more
difficult);

•  Pace  auditory  output  slowly,  and  allow  user  control  to  repeat 
output; 

•  Avoid  background  noise  with  auditory  output; 

•  Provide  “looser”  versus  tighter  line  spacing; 

• Support easy, user-controlled style changes;

•  Make  it  “look  easy”  by  lowering  information  density; 

•  Limit  use  of  italics  and  uppercase  (effects  are  much  more  se¬ 
vere  than  for  the  general  population); 

•  Maintain  higher  contrast;* 

•  Avoid  light  text  on  a  dark  background; 

•  Use  ragged  right  formats  to  preserve  consistent  word  space. 
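
The first recommendation in this list, benchmarking reading level from word difficulty and sentence length, can be sketched in Python. The chapter does not name a specific formula, so the use of the Flesch-Kincaid grade-level formula here is an assumption, as are the function names. The syllable counter is a rough vowel-run heuristic, so scores should be read as approximate.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of vowels (an approximation)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Estimate the U.S. school grade level of a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    # Sentence length and syllables per word stand in for word difficulty.
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


plain = "Take one pill each day. Take it with food."
dense = "Administer the medication once daily in conjunction with nourishment."
print(round(flesch_kincaid_grade(plain), 1), round(flesch_kincaid_grade(dense), 1))
```

A score of this kind is only a benchmark. It says nothing about active voice, context, or familiarity of terms, which the remaining items in the list address.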

Memory 

In  an  effort  to  decrease  mental  workload,  support  the  following: 

•  Maintain  consistency  in  all  aspects  of  the  design; 

•  Leverage  existing  knowledge,  behaviors,  and  tasks; 

•  Avoid  splitting  attention  between  two  tasks; 

•  Focus  on  the  user  goals; 

•  Limit  information  and  features  to  what  is  really  needed; 

•  Limit  chunking  complexity  for  audio  output  (particularly 
IVR); 

•  Focus  on  behaviors  and  tasks  rather  than  facts; 

•  Partition  tasks  in  reasonably  sized  groups; 

•  Avoid  the  need  to  retain  information  over  long  tasks  or  across 
multiple  screens; 

•  Support  mental  calculations,  decisions,  and  comparisons; 

•  Complete  mathematical  calculations; 

•  Limit  choices; 

•  Provide  a  list  of  options  for  entry  fields; 

•  Complete  information  automatically  in  forms  and  fields 
whenever  possible; 

• Employ advance organizers;

•  Use  mnemonics; 

•  Minimize  screen  clutter; 

•  Provide  extra  time  for  tasks; 

•  Eliminate  the  anxiety  of  timeouts. 


Metacognitive 

In  an  effort  to  strengthen  or  replace  deficient  metacognitive 
processes,  support  the  following: 

•  Maintain  a  consistent  design; 

•  Convey  associations  between  new  information  or  process 
and  that  which  is  known; 

•  Communicate  goal  or  purpose  of  the  site  immediately; 

•  Communicate  prerequisite  knowledge  for  the  task  and  pro¬ 
vide  convenient  links; 

•  Communicate  required  sequences  or  organizational  structures; 

•  Allow  the  user  to  interact  with  the  information; 

•  Use  checklists  to  support  self-monitoring; 

•  Minimize  choices  that  require  fine  discriminations  or  close 
monitoring; 

•  Design  for  immediate/early  success  (which  lowers  anxiety 
and  builds  confidence); 

• Align with user goals, and provide reward or convey value
proposition (motivation);

•  Support  cooperative  work  activities; 

•  Provide  reminders; 

•  Provide  source  information  for  material  presented; 

•  Minimize  embedded  links; 

•  Avoid  taking  user  to  other  pages  for  ancillary  information; 

•  Employ  error  prevention  and  recovery  support; 

•  Query  when  choices  or  decisions  are  required; 

•  Review  information  entered  or  validate  the  successful  com¬ 
pletion  of  a  process; 

•  Use  auditory  and  visual  cues  to  mark  stages  of  a  work  cycle, 
helping  the  user  self-monitor. 

Navigation  and  Search 

In  an  effort  to  improve  the  effectiveness  of  search  and  naviga¬ 
tion,  support  the  following: 

•  Make  information  and  features  supporting  goals  readily  ac¬ 
cessible  to  minimize  navigation  and  search; 

•  Provide  persistent  presentation  of  path  history; 

• Offer persistent opportunity to exit, back up, or return to start;

•  Use  redundancy:  repetition,  signal  design,  and  channel; 

•  Provide  status  indicators; 

•  Minimize  scrolling; 

•  Label  all  links; 

•  Provide  linked  paths  for  probable  scenarios; 

•  Maintain  alerts  onscreen  until  dismissed  by  user; 

•  Place  information  or  process  in  context; 

•  Partition  information  into  categories  defined  by  clear  rules; 

•  Use  site  maps,  tables  of  contents,  and  indexes; 

• Use a topology that is primarily hierarchical with referential
links to areas aligned with goals;*

• Maintain a medium breadth and depth: not too broad, not
too deep;*

•  Use  clear,  thematic  labels  at  each  level; 

•  Provide  productive  terms  for  search; 

•  Offer  suggested  spellings; 

•  Use  categorizing  search  engines; 

•  Provide  performance  support  for  evaluating  search  results 
(corroboration  and  sourcing). 


Guidance  for  Usability  Testing 

To  produce  the  best  data  and  protect  the  well-being  of  partici¬ 
pants,  support  the  following: 

•  Avoid  embarrassment  or  humiliation  (anxiety  lowers  per¬ 
formance); 

•  Consider  ethical  obligations  when  screening  for  lower- 
literacy  samples  (full-disclosure  issues,  informed  consent, 
and  IRB); 

•  Minimize  anxiety  in  testing  situation; 

• Test in context (actual work environment);

•  Avoid  using  the  term  “testing”  since  this  population  equates 
the  term  with  failure; 

•  Break  tasks  in  the  test  script  into  small,  yet  logical,  units; 

•  Shorten  test  times  to  minimize  effects  of  fatigue  on  perfor¬ 
mance  and  to  compensate  for  attention  deficit; 

•  Use  direct  interaction  protocol  since  users  may  not  freely 
think  aloud  (inadequate  working  memory  to  support  both 
think-aloud  and  to  acclimate  to  the  task);* 

• Accommodate slower processing speeds when setting performance
levels (e.g., task times);

• Expect participants to report exaggerated levels of performance
or system quality in self-reporting as a means of covering
poor literacy skills.


CONCLUSION 


Ten  years  ago,  our  concern  for  the  effects  of  lower  literacy  was 
confined  exclusively  to  the  processing  of  print  media.  With  the 
explosion  of  information  technology  in  the  workplace  and  the 
role  of  the  Internet  as  the  repository  for  information  of  all  types, 
literacy  has  become  a  major  barrier  for  millions  of  Americans 
as  they  compete  for  work  and  attempt  to  improve  the  quality 
of  their  lives.  While  the  problems  encountered  by  lower-literacy 
populations  with  print  media  are  significant,  the  problems  be¬ 
come  greatly  exacerbated  with  the  increased  mental  workload 
imposed  by  computer  systems. 

As  advocated  in  this  chapter,  many  of  the  barriers  to  accessi¬ 
bility  for  this  population  require  simple  modifications  in  the 
interface  and  interaction  design.  Accessible  design  is  an  art,  in 
which  the  needs  of  one  group  are  carefully  balanced  against 
those  of  another,  and  the  designer  recognizes  that  “accessible  for 
most”  is  more  achievable  than  “accessible  for  all.”  As  one  reviews 
the  introductory  list  of  design  accommodations  required  by  the 
lower-literacy  population,  HCI  professionals  will  recognize  most, 
if  not  all,  of  these  variables  as  ones  considered  in  each  and  every 
interface  design.  The  difference — and  a  significant  one — is  that 
fully  capable  users  have  flexible  learning,  problem  solving,  and 
interaction  skills  that  allow  them  to  adjust  to  less-than-perfect 
design  specifications.  The  learning  disabled,  in  contrast,  lack 
cognitive  flexibility  and  suffer  varying  degrees  of  performance 
degradation  in  nonoptimum  design  environments.  By  increasing 
our  understanding  of  this  sizable  population  and  embracing  the 
design practices outlined here, we may finally ensure that universal
accessibility is truly universal.


INTRODUCTION


Traditionally, human interaction with computers has relied
most  heavily  on  visual  perception  and  motor  ability.  Graphical 
user  interfaces  (GUIs)  have  dominated  computer  displays,  both 
large  and  small,  while  mouse  and  keyboard  devices  have  domi¬ 
nated  computer  input.  Increasingly,  however,  newer  interfaces 
are using sound and speech. Attention-grabbing multimedia
computer presentations and the increasing use of conversational
speech interfaces are examples.

As  exciting  as  these  new  technologies  may  be,  they  have  the 
potential  to  disenfranchise  deaf  and  hard  of  hearing  users.  This 
chapter  will  discuss  interface  technologies  as  they  relate  to  the 
needs  of  users  unable  to  hear  auditory  information.  In  addition, 
language  considerations  associated  with  limited  hearing  will  be 
discussed. 

This  chapter  begins  with  a  discussion  of  hearing  loss,  fol¬ 
lowed  by  issues  of  language  acquisition  as  they  relate  to  hear¬ 
ing  loss.  Auditory  user  interfaces  that  present  difficulties  will  be 
discussed  next,  along  with  information  about  interface  alterna¬ 
tives  that  enable  access  for  deaf  and  hard  of  hearing  users.  The 
chapter  will  conclude  with  a  brief  overview  of  technologies  that 
have  been  developed  to  assist  with  communication. 


HEARING  LOSS 


According  to  the  National  Institute  on  Deafness  and  Other  Com¬ 
munication  Disorders  (NIDCD),  about  28  million  people  in  the 
United States have some degree of hearing loss (National Institutes
of Health [NIH], 2004; see also Mitchell, 2005, 2006). This
is  a  sizable  population  to  be  considered  in  the  design  of  com¬ 
puter  interfaces.  Numbers  alone,  however,  obscure  some  sig¬ 
nificant  differences  among  the  individuals  who  experience  hear¬ 
ing  loss.  Degree  and  type  of  loss,  age  of  onset  of  loss,  as  well  as 
family,  educational,  and  societal  influences  will  all  contribute  to 
the  experience  and  abilities  of  an  individual  who  has  a  func¬ 
tional  hearing  loss. 

Some  individuals  will  have  relatively  little  hearing  loss,  while 
others  will  experience  a  profound  loss.  Individuals  who  are  hard 
of  hearing  will  generally  have  some  hearing.  The  ability  that  an 
individual  user  will  have  to  make  use  of  their  residual  hearing, 
however,  is  not  a  straightforward  calculation  of  dB  loss.  People 
with  hearing  loss  will  experience  difficulty  with  pitch,  timbre, 
and  loudness,  but,  critically,  will  also  experience  difficulty  with 
speech  perception.  Factors  such  as  type  of  loss  (e.g.,  conduc¬ 
tive,  sensorineural,  mixed,  or  central)  will  have  a  major  effect  on 
the  user  experience.  People  with  sensorineural  hearing  loss 
(such  as  resulting  from  lengthy  exposure  to  loud  noises  or  as  a 
result  of  aging)  generally  will  have  more  difficulty  perceiving 
speech than people with conductive hearing losses that result
from difficulties in middle ear functioning. The extent to which
an  individual  makes  use  of  this  hearing  for  communication  and 
whether  the  individual  can  hear  computer  sounds,  however, 
varies  greatly. 

In  addition,  the  way  in  which  an  individual  having  a  hearing 
loss  interacts  with  their  environment  may  also  be  influenced  by 
societal  factors.  Whether  individuals  identify  themselves  as  Deaf 
(with  an  uppercase  “D”  as  a  member  of  the  Deaf  Community), 
deaf,  or  hard  of  hearing  is  indicative  of  cultural  identity  (Lane, 
1992;  Padden  &  Humphries,  1988,  2005).^  Membership  in  the 
Deaf  Community  is  determined  more  by  shared  language  and
worldviews  than  by  results  of  audiometric  tests.  For  example,
people  may  lose  their  hearing  with  age.  These  people,
while  deaf  by  the  audiometric  definition,  would  not  share  the 
culture  of  the  Deaf  Community. 

In  the  United  States,  the  language  of  the  Deaf  Community  is 
American  Sign  Language  (ASL).  Other  countries  and  locales 
have  their  own  native  sign  languages  shared  by  members  of 
Deaf  Communities  in  those  areas.  Interestingly,  these  signed 
languages  are  not  based  on  the  spoken  languages  of  the  region. 
People  are  often  surprised  to  learn  that  ASL  is  more  similar  to 
French  Sign  Language,  from  which  it  originated  (Lane,  1984), 
than  it  is  to  British  Sign  Language.  Beginning  with  the  seminal 
work  of  Ursula  Bellugi  and  colleagues  in  the  1970s,  linguists, 
cognitive  psychologists,  and  brain  researchers  have  studied  na¬ 
tive  sign  languages  and  their  users  for  clues  as  to  the  origin  of 
language  and  the  biological  nature  of  language  (Emmorey  & 
Lane,  2000;  Klima  &  Bellugi,  1979;  Erard,  2005). 

Cochlear  implants  are  a  medical  intervention  that  has  re¬ 
ceived  much  attention  in  the  last  couple  of  decades.  The  deci¬ 
sion  as  to  whether  or  not  to  have  a  cochlear  implant  is  often  a 
complex  one,  as  was  explored  in  the  movie  Sound  and  Fury 
(Aronson,  2000;  see  also  Hyde  &  Power,  2006).  From  the  stand¬ 
point  of  a  user,  an  implant  is  not  the  same  as  perfect  hearing, 
but  does  allow  the  user  to  hear  sounds  and,  with  training,  may 
greatly  aid  in  the  perception  of  speech  (Chorost,  2005). 


HEARING  AND  LANGUAGE 


Some,  but  not  all,  deaf  and  hard  of  hearing  individuals  use  sign 
language.  The  type  of  signed  language  depends  on  the  user’s 
life  experiences.  Deaf  children  born  to  deaf  parents,  regardless 
of  severity  of  hearing  loss,  will  generally  acquire  a  sign  language, 
such  as  ASL,  natively  as  hearing  children  of  hearing  parents  ac¬ 
quire  spoken  language.  People  who  lose  their  hearing  late  in  life 
generally  will  not  master  a  sign  language.  In  between,  there  are 
many  variations.  Deaf  signers  may  be  exposed  to  a  native  sign 
language  as  adults  and  as  a  result,  may  only  acquire  partial  mas¬ 
tery  (Newport,  1990).  Other  deaf  signers  will  be  exposed  pri¬ 
marily  to  manual  forms  of  English  (or  other  spoken  languages), 
rather  than  natural  sign  languages.  Many  schools  that  use  sign 


^Terms  such  as  hearing-impaired  and  deaf-mute  are  generally  considered  to  have  negative  connotations.  For  a  discussion  of  this,  see  “What  is 
Wrong  with  the  Use  of  these  Terms:  ‘Deaf-mute,’  ‘Deaf  and  dumb,’  or  ‘Hearing-impaired’?”  by  the  National  Association  of  the  Deaf  (Available  at 
http://www.nad.org/site/pp.asp?c=foINKQMBF&b=103786).  The  style  manual  of  the  American  Psychological  Association  (APA)  recommends  use
of  nondiscriminatory  language  in  all  publications  (see  http://www.apastyle.org/disabilities.html).

language  in  the  classroom  do  not  use  a  natural  sign  language,
but  rather  a  representation  of  the  spoken  language  that  is 
signed.  Forms  of  signed  English  borrow  signs  from  ASL,  but 
these  signs  are  produced  in  English  word  order  rather  than 
using  ASL  sentence  structures  (for  an  extended  discussion, 
see  Lane,  Hoffmeister,  &  Bahan,  1996).  Children  attending 
schools  that  use  some  form  of  signed  English  may  not  be  exposed
to  ASL.  Moreover,  students  educated  using  an  oral  approach,
in  which  speech  is  the  primary  means  of  communication  in  the
classroom,  also  may  not  be  exposed  to  ASL.

A  person  who  is  profoundly  deaf  from  birth  may  have  diffi¬ 
culty  acquiring  mastery  of  the  spoken  language,  be  it  presented 
auditorily  or  in  print.  It  is  not  surprising  that  someone  who  has 
never  heard  speech  will  have  difficulty  perceiving  or  producing 
it.  In  a  large-scale  study  of  deaf  and  hard  of  hearing  children 
attending  schools  using  an  oral  approach  to  education,  Con¬ 
rad  (1979)  reported  that  profoundly  deaf  children  rarely  ac¬ 
quired  sufficient  lipreading  skills  to  allow  easy  participation  in 
conversations.  He  found,  on  average,  that  these  children  (with 
hearing  loss  greater  than  85  dB)  could  only  comprehend  about 
25%  to  28%  of  the  words  through  lipreading  that  they  could 
comprehend  through  reading.  Even  among  these  orally  edu¬ 
cated  students,  fewer  than  20%  had  speech  that  was  rated  even 
fairly  easy  to  understand.  While  the  statistics  are  better  for  hard 
of  hearing  children,  Conrad  found  that  even  these  students 
(with  hearing  loss  less  than  65  dB)  could  only  comprehend 
about  36%  of  the  words  through  lipreading  that  they  could 
comprehend  through  reading.  Nearly  85%  of  these  hard  of 
hearing  students,  however,  had  speech  that  was  rated  at  least 
fairly  easy  to  understand. 

Perhaps  more  surprising  may  be  the  fact  that  many  deaf  and 
hard  of  hearing  individuals  have  difficulty  with  reading  (Conrad, 
1979;  Gallaudet  Research  Institute,  2003).  To  understand  this, 
it  is  necessary  to  realize  that  reading  is  based  on  the  underlying 
spoken  language.  Spoken  languages  are  composed  of  sounds, 
linguistically  defined  as  “phonemes.”  These  phonemes  corre¬ 
spond,  albeit  not  always  in  a  one-to-one  relationship  in  English, 
to  letters  or  letter  combinations.  This  is  true  of  all  alphabetic 
languages.  Learning  to  read  is  generally  considered  to  be  learn¬ 
ing  to  map  the  print  onto  the  spoken  language  the  person  al¬ 
ready  knows  (Brady  &  Shankweiler,  1991).  In  addition,  speech 
plays  a  critical  role  in  the  short-term  memory  processes  that 
serve  understanding  of  grammar  and  text  comprehension 
(Lichtenstein,  1998).  In  the  case  of  deaf  readers,  however,  it  can¬ 
not  be  taken  for  granted  that  reading  will  build  on  a  firm  un¬ 
derstanding  of  the  structure  of  the  spoken  language,  but  there 
are  no  absolutes.  Some  prelingually,  profoundly  deaf  children 
become  excellent  readers,  while  some  with  lesser  degrees  of 
hearing  loss  do  not  (Conrad,  1979;  Hanson,  1989). 

Interestingly,  research  has  recently  turned  to  an  examination 
of  signed  languages  as  an  influence  on  the  development  of
skilled  reading.  It  has  long  been  known  that  deaf  children  of 
deaf  parents,  on  average,  acquire  greater  mastery  of  reading  and 
writing  than  deaf  children  of  hearing  parents.  Is  this  due  to  early 
exposure  to  sign  language?  While  a  number  of  both  intellectual 
and  societal  factors  have  been  considered  as  contributors  to  this 
disparity,  interest  has  focused  on  the  issue  of  language.  Evidence
is  now  emerging  as  to  the  important  role  that  early  mastery
of  a  sign  language  can  play  in  second  language  learning  for
deaf  students  as  they  acquire  reading  and  writing  skill  (Padden 
&  Hanson,  2000;  Padden  &  Ramsey,  2000). 

In  short,  for  any  deaf  or  hard  of  hearing  individual,  language 
experience  cannot  be  assumed.  That  individual  may  or  may  not 
sign,  speak  clearly  or  lipread  well,  or  have  reading  skills  consis¬ 
tent  with  those  of  the  hearing  population.  This  knowledge  has 
implications  for  designers  who  seek  to  address  the  needs  of 
deaf  and  hard  of  hearing  users. 


DISPLAY  TECHNOLOGIES 


In  many  ways,  computers  and  other  technologies  have  proven  to 
be  of  great  benefit  to  deaf  and  hard  of  hearing  users.  The  largely 
visual  nature  of  information  on  the  Internet  makes  this  informa¬ 
tion  accessible  to  deaf  and  hard  of  hearing  users.  Instant  mes¬ 
saging  and  e-mail  facilitate  communication  with  deaf  and  hearing 
family,  friends,  and  coworkers,  while  network-connected  wire¬ 
less  devices  such  as  PDAs  and  pagers  are  seen  as  lifelines  that  can 
be  used  in  place  of  cell  phones.  As  would  be  suspected  from  the 
previous  discussion,  however,  effective  interfaces  for  these  and 
other  technologies  for  use  by  deaf  and  hard  of  hearing  individu¬ 
als  must  take  into  account  both  sensory  and  language  consider¬ 
ations.  In  particular,  the  increasing  reliance  on  sound  and  speech 
interfaces  to  convey  information  can  have  serious  consequences 
for  individuals  who  have  a  hearing  loss. 

Audible  Signals 

Sounds  have  become  increasingly  popular  in  computer  inter¬ 
faces.  They  have  long  been  used  to  convey  information  about 
new  messages  and  have  become  popular  as  problem  alerts, 
such  as  when  an  error  has  been  committed.  These  sound  events 
are  considered  attention-grabbing  events  for  users  whose  vi¬ 
sual  attention  may  otherwise  be  engaged.  For  any  user  who  has 
a  hearing  loss,  however,  sounds  will  be  a  problem. 

A  number  of  considerations  can  help  provide  the  necessary  vi¬ 
sual  support  for  a  user  who  is  deaf  or  hard  of  hearing  (Vander- 
heiden,  1994).  It  is  necessary  to  provide  visual  forms  for  all  audi¬ 
tory  information.  Critically,  these  visual  cues  should  be  sufficiently 
noticeable  so  that  they  catch  the  attention  of  a  person  who  may 
not  be  looking  directly  at  the  computer  screen.  Operating  sys¬ 
tems  have  features  that  can  provide  such  visual  alerts.  For  exam¬ 
ple,  Windows®  has  accessibility  features  that  allow  users  to  set 
up  their  system  to  have  captions  or  visual  warnings  displayed  for 
sound  events.  These  are  helpful,  although  they  may  not  give  the 
full  range  of  information  carried  by  a  sound  event.  For  example, 
the  meaning  of  a  sound  event  may  differ  based  on  the  tone  of 
the  signal  or  when  the  sound  is  produced.  While  visual  captions 
and  warnings  may  alert  a  deaf  user  that  a  sound  event  has  oc¬ 
curred,  they  will  be  unable  to  convey  these  more  subtle  distinc¬ 
tions.  It  is  important  for  designers  to  give  careful  consideration  to 
sound  events  to  ensure  that  crucial  information  is  available  by  a 
nonauditory  means  for  deaf  and  hard  of  hearing  users. 
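
One way to act on this guidance is to route every sound event through a single dispatcher that refuses to emit audio unless a visual equivalent with the same meaning is attached. The following Python sketch is illustrative only: `SoundEvent`, `AlertDispatcher`, the field names, and the sample events are hypothetical and are not drawn from any operating system's accessibility API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SoundEvent:
    """A hypothetical sound event paired with its visual equivalent."""
    name: str             # internal identifier, e.g. "new_mail"
    audio_file: str       # sound played for hearing users
    caption: str          # text that conveys the same meaning as the sound
    urgent: bool = False  # urgent events also flash the window


@dataclass
class AlertDispatcher:
    """Produces a visual alert for every sound event it is asked to play."""
    log: list = field(default_factory=list)

    def notify(self, event: SoundEvent) -> dict:
        if not event.caption.strip():
            # Reject sound-only events: the meaning must be available visually.
            raise ValueError(f"{event.name!r} has no visual equivalent")
        alert = {
            "audio": event.audio_file,
            "caption": event.caption,
            # A flash is intended to be noticed even in peripheral vision.
            "flash_window": event.urgent,
        }
        self.log.append(alert)
        return alert


dispatcher = AlertDispatcher()
alert = dispatcher.notify(
    SoundEvent("disk_error", "error.wav", "Error: the disk is full", urgent=True)
)
print(alert["caption"], alert["flash_window"])
```

Because the caption carries the meaning of the event rather than merely reporting that a sound occurred, distinctions that would otherwise be encoded in the tone of the signal can be stated in words.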

Multimodal  interfaces,  as  the  name  implies,  are  designed  to 
support  a  range  of  perceptual  capabilities.  In  theory,  this  would
seem  ideal  for  users  who  are  deaf  or  hard  of  hearing  as  visual
alternatives  to  auditory  materials  should  be  available.  Multimodal
interfaces,  however,  do  not  always  present  all  information
on  both  modalities.  The  emphasis  in  many  multimodal  inter¬ 
faces  is  representing  information  via  speech  that  would  other¬ 
wise  be  conveyed  by  print  or  some  other  visual  means.  To  the 
detriment  of  deaf  and  hard  of  hearing  users,  often,  less  atten¬ 
tion  is  given  to  ensuring  that  all  auditory  material  be  visually 
conveyed  as  well. 

Multimedia  Interfaces 

Multimedia  uses  a  combination  of  text,  sound,  pictures,  animation,
and  video  to  present  information  (see  Sutcliffe,  chapter  20,
“Multimedia  User  Interface  Design,”  this  volume).  Traditionally, 
games  and  educational  software  have  exploited  the  richness  of 
multimedia,  but  the  advent  of  high-speed  Internet  communi¬ 
cations  has  enabled  the  use  of  multimedia  for  a  number  of  en¬ 
gaging  applications  on  the  web.  For  the  present  discussion,  con¬ 
sideration  will  be  given  to  multimedia  presentations  as  they  may 
impact  access  for  deaf  and  hard  of  hearing  users. 

Multimedia  material  is  inherently  sensory.  The  technology 
offers  eye-catching  visual  displays  and  attention-grabbing  sound 
effects.  To  the  extent  that  the  information  conveyed  visually 
and  auditorily  is  the  same,  information  can  be  reinforced  for
users  who  have  both  channels  available  to  them.  To  the  extent 
that  different  information  is  presented  in  the  two  channels, 
however,  users  who  have  a  functional  loss  of  one  of  the  chan¬ 
nels  will  not  have  full  access  to  that  information.  As  it  relates 
to  the  present  discussion,  this  means  that  any  information  that 
is  carried  solely  by  sound  or  speech  will  be  unavailable  to  deaf 
and  hard  of  hearing  users.  Accessibility  guidelines  for  multi- 
media  products  and  web  pages  require  equivalent  visual  pre¬ 
sentations  (e.g.,  see  Brewer  &  Dardailler,  1999;  U.S.  General 
Services  Administration,  n.d.). 

Consider  the  growing  popularity  of  video  on  the  web.  Video 
material  is  an  extremely  effective  way  to  convey  information  and 
younger  generations  of  computer  users  have  come  to  expect 
video  to  be  part  of  their  computer  experience.  For  deaf  and 
hard  of  hearing  users,  the  voice-over  that  is  common  in  video 
will  be  inaccessible.  Words  spoken  by  persons  in  view  of  the 
camera  also  are  largely  inaccessible.  Even  people  skilled  at 
lipreading  cannot  lipread  video  conversations  that  have  poor 
lighting  or  poor  resolution,  or  speakers  who  turn  away  from  the 
camera.  Additionally,  the  video  may  present  music  or  sound 
events  (e.g.,  doorbells  or  animal  noises)  that  contribute  significantly
to  events  on  the  video.  These,  too,  are  unavailable  to
deaf  and  hard  of  hearing  users.  Technologies  such  as  caption¬ 
ing  and  sign-language  translation  exist  to  provide  alternative 
presentations  of  sound  events  and  speech.  Designers  wishing  to 
make  their  applications  universally  accessible  should  consider 
these  alternatives  and  incorporate  them  into  their  applications. 

Captioning.  Captioning  provides  a  print  alternative  to 
speech  and  sound  events.  It  is  much  like  subtitling,  except  that 
it  is  specifically  designed  for  deaf  and  hard  of  hearing  users  and, 
thus,  will  include  comments  in  the  captioning  about  sounds 
(e.g.,  “<music  playing>”  or  “<sounds  of  child  crying>”)  that 


may  not  be  included  in  subtitles  of  foreign  language  videos. 
Captioning  of  certain  television  programming  is  mandated  in 
the  United  States  by  the  Federal  Communications  Commission 
and  is  considered  to  be  beneficial  to  a  large  number  of  users,  not
only  those  with  hearing  loss.  It  has  been  shown,  for  example, 
to  improve  reading  abilities  of  children  and  to  benefit  second 
language  learners.  Designers  who  use  multimedia  materials 
have  an  obligation  to  their  full  audience  to  provide  captioning  of 
audio  and  video  materials.  A  listing  of  resources  for  captioning
software  is  available  at  the  Closed  Captioning  website  (n.d.).
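
To make the difference between captions and subtitles concrete, the following Python sketch serializes a list of cues to the WebVTT text format, marking non-speech sound events so that they read as descriptions. The `Cue` class and the sample cues are hypothetical. Sound descriptions are wrapped in square brackets rather than the angle brackets used in the examples above, on the assumption that angle brackets would be read as markup by a WebVTT parser.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float            # seconds from the start of the video
    end: float
    text: str
    is_sound: bool = False  # True for non-speech events such as music


def timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues in time order, bracketing sound-event descriptions."""
    blocks = ["WEBVTT"]
    for cue in sorted(cues, key=lambda c: c.start):
        text = f"[{cue.text}]" if cue.is_sound else cue.text
        blocks.append(f"{timestamp(cue.start)} --> {timestamp(cue.end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


print(to_webvtt([
    Cue(0.0, 2.0, "music playing", is_sound=True),
    Cue(2.0, 4.5, "Welcome to the tutorial."),
]))
```

A subtitle track for a foreign-language video would typically contain only the second cue; the first is what makes the track a caption track.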

As  might  be  anticipated  from  the  previous  review  of  reading 
levels  of  deaf  and  hard  of  hearing  users,  there  has  been  some 
controversy  about  what  language  level  should  be  used  for  captioning.
Simply  put,  the  issue  is  whether  captions  should  be  verbatim
transcripts  or  simplified  versions.  Verbatim  captioning  is  generally
preferred  by  users  themselves  (NIH,  2002)  and  is  often  cited  as 
having  the  potential  to  improve  reading  skills  (Steinfeld,  2001). 

Signing.  For  sign  language  users,  there  are  sign  language 
alternatives  to  captioning.  These  sign  interfaces  have  been  de¬ 
fined  as  ways  of  representing  signed  languages  on  a  computer 
such  that  signing  can  be  stored,  displayed,  and  manipulated  to 
facilitate  computer  interaction  (Frishberg,  Corazza,  Day,  Wilcox, 
&  Schulmeister,  1993).  It  should  be  noted  that  these  interfaces 
have  often  been  employed  not  only  for  making  audio  and  speech 
materials  accessible  to  deaf  and  hard  of  hearing  users,  but  they 
have  also  been  used  for  language  learning  by  both  deaf  and 
hearing  people.  A  number  of  software  applications  have  been 
developed  that  use  sign  language  as  a  means  of  teaching  read¬ 
ing  skills  and  writing  to  deaf  signers  or  teaching  sign  language 
skills  to  individuals  wishing  to  learn  to  sign. 

Importantly,  software  applications  that  purport  to  use  “sign 
language”  or  ASL  differ  in  significant  ways  in  the  language  that  is 
being  used.  Notably,  many  of  these  applications  use  finger¬ 
spelling.  Fingerspelling  is  not  a  natural  signed  language,  but 
rather  is  a  derivative  of  print.  Specifically,  in  fingerspelling,  there 
is  one  handshape  for  each  letter  of  the  alphabet  and  words  are 
spelled  out  letter  by  letter  on  the  hands.  Thus,  the  word  “language”
would  be  spelled  out  by  eight  successive  handshapes
spelling  L-A-N-G-U-A-G-E.  Other  sign  language  interfaces  use  a 
form  of  signed  English,  rather  than  the  native  sign  language.  For
young  children  and  others  not  fluent  in  English,  interfaces  that 
use  fingerspelling  or  signed  English  transliterations  of  text  may 
not  meet  their  needs. 
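
The sense in which fingerspelling is a derivative of print can be seen in how little a program needs to know to produce it. In the Python sketch below, the function name and the use of uppercase letters as handshape labels are assumptions made for illustration. The mapping is purely letter by letter and uses no knowledge of any signed language.

```python
def fingerspell(word: str) -> list[str]:
    """Return one handshape label per letter of an English word.

    Each label is simply the uppercase letter; a display layer would
    map it to an image or animation of the corresponding handshape.
    """
    return [ch.upper() for ch in word if ch.isascii() and ch.isalpha()]


print("-".join(fingerspell("language")))  # L-A-N-G-U-A-G-E
```

A reader who does not already know the English word "language" gains nothing from this output, which is why such interfaces may not serve users who are not fluent in English.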

In  many  cases,  the  goal  when  using  sign  interfaces  is  to  pro¬ 
vide  ASL  translations.  Given  the  present  state  of  the  art,  auto¬ 
matic  translation  from  English  to  ASL  is  not  possible;  however, 
many  current  efforts  are  directed  at  facilitating  this  translation, 
acknowledging  the  need  for  translation  into  ASL.  In  what  fol¬ 
lows,  a  few  of  the  options  for  sign  language  presentation  of  au¬ 
dio  and  multimedia  will  be  discussed. 

Ideal  in  many  respects  would  be  to  have  a  live  person  signing 
an  ASL  version  of  print  and  multimedia  materials.  Hanson  and 
Padden  (1989,  1990)  used  videodisc  technology  in  the  earliest 
computer-based  attempt  at  a  bilingual  ASL/English  approach  to 
reading  and  writing  instruction  for  signing  children.  The  work 
combined  ASL  video  and  the  translated  English  text  on  one
screen.  In  this  and  other  language  learning  situations  where  the
users  are  young  children  still  developing  both  ASL  and  English
skills,  this  use  of  live  signing  has  been  particularly  effective
(Frishberg  et  al.,  1993).

The  advent  of  high-speed  networks  and  Internet  video  has 
created  opportunities  for  web  applications  that  use  live  signing 
interfaces.  For  example,  classroom  applications  have  been  de¬ 
veloped  (e.g.,  see  King,  2000;  Laurent  Clerc  National  Deaf  Edu¬ 
cation  Center,  n.d.;  for  a  demo,  see  “Sample  Web  Page,”  Gal- 
laudet  University,  n.d.).  Additionally,  video  blogs  have  created 
the  opportunity  for  blogs  to  be  signed  by  the  blogger,  rather 
than  written  (Lamberton,  2005),  and  video  e-mail  allows  sign¬ 
ers  to  communicate  through  signing  by  creating  video  record¬ 
ings  to  be  transmitted  as  e-mail  messages  (e.g.,  see  “Road  Run¬ 
ner  Video  Mail,”  n.d.;  also,  “Vibe  Video  Mail,”  n.d.). 

Although  ideal  from  the  language  perspective,  other  con¬ 
straints  may  argue  against  the  use  of  live  signing  in  an  interface. 
Lack  of  access  to  high-speed  networks  for  video  transmission 
may  be  an  issue,  but  often  the  problem  is  the  desire  for  auto¬ 
matic  translation  of  software  and  web  content  that  live  signing 
does  not  provide.  Live  signing  requires  the  prior  recording  of 
the  signed  material.  Once  recorded,  changes  to  the  video  re¬ 
quire  a  new  recording.  Because  of  this,  live  signing  is  not  prac¬ 
tical  in  applications  that  require  that  sign  versions  of  audio  or 
multimedia  be  created  in  real  time.  Short  of  having  an  inter¬ 
preter  doing  the  translations,  live  signing  is  not  possible  in  these 
situations. 

One  approach  to  automatic  sign  presentation  is  what  might 
be  called  “concatenated  signing”  (e.g.,  see  “iCommunicator,” 
n.d.;  “Signtel,”  n.d.).  With  this,  a  software  program  is  used  to 
create  word  strings  or  sentences  by  concatenating  signs  pro¬ 
duced  by  a  live  signer.  The  program  starts  with  a  vocabulary  of 
stored  individual  signs.  These  signs  can  be  strung  together  to 
form  phrases  and  sentences.  To  prevent  a  jerky  appearance  that 
would  occur  through  the  simple  production  of  a  list  of  signs, 
algorithms  are  used  to  smooth  the  transitions  between  these 
signs.  While  these  transitions  are  not  as  natural  as  live  signing, 
the  signing  is  legible  and  suggests  an  interesting  approach  to 
sign  language  interfaces. 
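
The general idea of concatenated signing can be sketched as follows. This Python example is a minimal illustration, not a description of how the cited products work: the `SignClip` structure, the two-dimensional hand positions, and the use of linear interpolation for smoothing are all assumptions.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class SignClip:
    gloss: str    # label for the stored sign, e.g. "HELLO"
    start: Point  # hand position in the clip's first frame
    end: Point    # hand position in the clip's last frame


def transition(a: Point, b: Point, steps: int) -> list[Point]:
    """Linearly interpolate hand positions strictly between a and b."""
    return [
        (a[0] + (b[0] - a[0]) * i / (steps + 1),
         a[1] + (b[1] - a[1]) * i / (steps + 1))
        for i in range(1, steps + 1)
    ]


def concatenate(glosses: list[str], vocab: dict[str, SignClip], steps: int = 3):
    """Build a playlist of stored clips joined by interpolated transitions."""
    playlist = []
    previous = None
    for gloss in glosses:
        clip = vocab[gloss]  # raises KeyError if the sign is not stored
        if previous is not None:
            frames = transition(previous.end, clip.start, steps)
            playlist.append(("transition", frames))
        playlist.append(("clip", clip.gloss))
        previous = clip
    return playlist


vocab = {
    "HELLO": SignClip("HELLO", (0.0, 0.0), (4.0, 0.0)),
    "FRIEND": SignClip("FRIEND", (8.0, 4.0), (8.0, 8.0)),
}
for item in concatenate(["HELLO", "FRIEND"], vocab):
    print(item)
```

Without the transition frames, the hand would jump from the end position of one clip to the start position of the next, producing the jerky appearance described above.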

A  technique  that  has  generated  much  interest  in  recent 
years  is  signing  avatars  (e.g.,  see  Cox  et  al.,  2002;  Karpouzis, 
Caridakis,  Fotinea,  &  Efthimiou,  in  press;  Kennaway,  2002; 
Sims,  2004).  These  avatars  use  virtual-reality  techniques  to  pro¬ 
duce  animated  signing.  The  specific  techniques  differ  for  the 
various  avatars,  but  central  to  all  is  that  they  display  computer 
generated  signing.  Some  systems  are  able  to  display  not  just 
hands,  but  full  signers,  so  that  facial  expressions  as  well  as  hand 
movements  are  shown.  See,  for  example,  the  SigningAvatar® 
shown  in  Fig.  45.1  that  illustrates  the  sophistication  of  these 
animations. 

Signing  avatars  have  been  used  in  education  and  have  po¬ 
tential  for  applications  such  as  translation  of  web  pages,  televi¬ 
sion  programs,  and  conversational  dialog.  Chief  among  the 
virtues  is  that,  unlike  natural  (live)  sign  language  applications 
that  are  limited  to  prerecorded  materials,  avatars  can  generate 
signed  versions  of  English  words  “on  the  fly.”  They  also  have  the 
advantage  of  not  requiring  large  downloads  for  web  usage. 

While  the  avatars  can  sign  ASL  when  preprogrammed,  the
language  translation  work  needed  for  automatic  translation  into
ASL  (or  other  signed  language)  is  not  ready  to  support  this  rendering
on  the  fly.  A  current  research  focus  in  avatar  work  is  on
natural  language  translation  and  exploring  techniques  to  display
native  sign  languages  by  avatars  (e.g.,  see  Huenerfauth,  2005).

FIGURE  45.1.  An  ASL  sign  avatar.  The  full  animation  of  this
avatar  can  be  viewed  on  http://www.vcom3d.com/ASL.htm.
©  Vcom3D,  Inc.,  2004.  All  rights  reserved.


INPUT  TECHNOLOGIES 


Standard  keyboard  and  mouse  input  technologies  offer  no  bar¬ 
riers  to  users  who  have  a  hearing  loss.  The  emergence  of  new 
interfaces,  however,  presents  alternatives  that  will  impact  the 
way  in  which  we  are  able  to  interact  with  computers.  We  con¬ 
sider  here  both  speech  and  gestural  interfaces  as  they  may  in¬ 
fluence  interactions  for  deaf  and  hard  of  hearing  users. 

Speech  Interfaces 

Conversational  speech  interfaces  have  appeal  as  a  natural 
means  of  interacting  with  computers  (see  Karat,  Vergo,  Na- 
hamoo,  Lai,  &  Yankelovich,  chapter  19,  “Conversational  Speech 
Interfaces  and  Technologies,”  this  volume).  These  interactions 
can  range  from  simple,  even  one-word  commands  to  full  dic¬ 
tation  of  documents  and  user  collaborations.  For  deaf  and  hard 
of  hearing  individuals,  such  interactions  require  consideration 
of  user  needs.  First,  these  interactions  can  involve  speech  out¬ 
put  which,  as  already  discussed,  requires  a  visual  display  alter¬ 
native.  Second  are  problems  with  speech  input.  Some  deaf  and 
hard  of  hearing  individuals  utilize  speech  and  would  be  inter¬ 
ested  in  taking  advantage  of  speech  input.  Speech  recognition, 
however,  may  be  more  problematic  for  these  speakers  than  for 
hearing  speakers. 

As  discussed  earlier,  the  speech  of  deaf  and  hard  of  hearing
individuals  is  not  always  highly  intelligible  to  hearing  listeners.
Speech  recognition  engines  can  be  trained  to  the  voice  and
pronunciations  of  an  individual  speaker,  however,  so  couldn’t  a
recognizer  be  trained  to  understand  the  speech  of  an  individual
deaf  speaker,  even  if  the
speech  is  not  completely  intelligible  to  listeners?  The  difficulty  is
that  recognizers  require  consistent  speech.  Research  has  shown 
that  the  speech  of  deaf  and  hard  of  hearing  speakers  often  is 
more  variable  than  the  speech  of  hearing  speakers  (McGarr, 
1987;  McGarr  &  Lofqvist,  1988).  For  example,  there  is  more 
acoustic  variation  in  the  pronunciation  of  a  single  phoneme  by 
a  deaf  or  hard  of  hearing  speaker  than  there  is  in  the  pronunci¬ 
ation  of  that  phoneme  by  a  hearing  speaker.  Hearing  listeners 
are  very  tolerant  of  this  variability;  speech  recognizers  are  less 
so.  Thus,  deaf  and  hard  of  hearing  users  who  may  wish  to  use
their  speech  for  input  may  well  find  recognition  less  accurate 
than  it  is  for  hearing  speakers. 

Speech  interfaces  are  often  seen  as  useful  alternatives  to  vi¬ 
sual  interfaces  in  situations  when  computers  or  keyboards  are 
not  available  (see  Karat  et  al.,  chapter  19,  “Conversational  Speech 
Interfaces  and  Technologies,”  this  volume).  For  hearing  users, 
this  alternative  of  a  speech  interface  may  be  highly  desirable. 
Many  deaf  and  hard  of  hearing  users,  however,  will  not  be  inter¬ 
ested  in  speech  interfaces;  others  may  experience  difficulties  in 
using  them.  Although  situational  demands  may  sometimes  dic¬ 
tate  the  use  of  conversational  interfaces,  care  must  be  taken  that 
outside  of  the  situational  context,  there  exists  a  means  for  deaf 
and  hard  of  hearing  users  to  access  the  same  information  that 
hearing  users  access. 

Sign  Interfaces 

Designed  specifically  for  deaf  and  hard  of  hearing  users,  sign 
recognition  is  a  specific  and  complex  subset  of  gesture  recog¬ 
nition  technologies  having  the  goal  of  automatically  converting 
signed  language  to  text  or  speech.  These  technologies  have 
been  investigated  for  a  number  of  years,  primarily  to  address  the 
need  to  facilitate  communication  between  signers  and  nonsign¬ 
ers.  They  also  have  the  potential  to  provide  an  alternative  means 
of  natural  language  input  to  computers. 

The  task  of  recognizing  full  ASL  or  other  natural  sign  lan¬ 
guages  is  a  difficult  problem.  The  first  reason  is  that  an  individual 
sign  varies  depending  upon  its  context.  Take,  for  example,  the 
sign  for  the  word  GIVE.  This  sign  has  a  specific  shape  combined 
with  a  variable  movement.  The  movement  reflects  who  is  giving 
and  to  whom.  I  GIVE-TO  YOU,  YOU  GIVE-TO  ME,  I  GIVE-TO 
HIM,  HE  GIVES-TO  ME,  and  I  GIVE-TO  ALL-OF-YOU  each  have  a 
different  movement  that  indicates  subject  and  object.  A  second 
reason  for  difficulty  in  sign-language  recognition  is  that  several 
pieces  of  linguistic  information  are  produced  in  parallel.  For  ex¬ 
ample,  facial  expression  carries  critical  grammatical  information. 
Thus,  a  full  language  recognizer  needs  to  recognize  not  only  the 
hand  gestures  of  ASL,  but  must  also  recognize  certain  facial  in¬ 
formation  relevant  to  the  grammar.  Such  facial  elements  include 
eyebrow  position,  eye  gaze,  and  mouth  movements. 
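
The GIVE example suggests why a recognizer cannot treat a sign as a single fixed template. The Python sketch below models the sign as a constant handshape plus a movement path that depends on subject and object. The locus names and the `SignToken` structure are hypothetical simplifications and are not a linguistic description of ASL.

```python
from dataclasses import dataclass

# Hypothetical labels for locations in signing space, one per referent.
LOCI = {"I": "signer", "YOU": "addressee", "HE": "side", "ALL-OF-YOU": "arc"}


@dataclass(frozen=True)
class SignToken:
    handshape: str         # constant across all forms of the verb
    path: tuple[str, str]  # movement from subject locus to object locus


def give(subject: str, obj: str) -> SignToken:
    """Return the form of GIVE for a given subject and object."""
    return SignToken("GIVE", (LOCI[subject], LOCI[obj]))


print(give("I", "YOU"))
print(give("YOU", "I"))
```

The two tokens share a handshape but differ in path, so a system that classifies handshape alone would report the same sign for both and lose the information about who is giving to whom.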

As  with  interfaces  that  produce  signs,  many  systems  that  pur¬ 
port  to  perform  sign-language  recognition  deal  with  recognition 
of  fingerspelling  handshapes  rather  than  recognition  of  ASL  sen¬ 
tences  or  even  ASL  signs.  Using  fingerspelling  handshapes 
rather  than  ASL  signs  certainly  constrains  the  size  of  the  problem.
Because  the  fingerspelled  English  alphabet  has  only  26  handshapes,
this  represents  a  much  more  manageable  problem
space  than  the  recognition  of  full  ASL  signing.  While  it  presents 


an  alternative  to  keyboard  typing  of  words,  it  doesn’t  provide 
the  type  of  natural  language  interaction  with  computers  that  is 
afforded  by  speech  interfaces. 

Technologies  for  recognizing  signs  have  tended  to  use  either
instruments  worn  by  the  signer  or  computer  vision  techniques.
The  first  of  these  approaches  has  the  signer  wear  a  specially  designed
glove  or  sensors  placed  on  their  joints  that  allow  a  computer
to  track  movement  (e.g.,  see  Braffort,  1996;  Fang,  Gao,  &
Zhao,  2003;  Hernandez-Rebollar,  Lindeman,  &  Kyriakopoulos, 
2002;  Kadous,  1996;  Wang,  Gao,  &  Ma,  2002).  In  contrast,  com¬ 
puter  vision  techniques  use  cameras  to  provide  input  to  a  com¬ 
puter  about  a  signer’s  movements  and  facial  gestures  (e.g.,  see 
Brashear,  Starner,  Lukowicz,  &  Junker,  2003;  Kadous,  1996;  Lee
et  al.,  2005;  Vogler  &  Metaxas,  2001).  These  inputs  are  then  analyzed
using  a  variety  of  techniques  such  as  neural  networks  or
Hidden  Markov  Models  that  then  recognize  the  sign. 
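
As a rough illustration of the Hidden Markov Model approach, the Python sketch below scores a sequence of quantized sensor readings against one small model per sign and reports the best-scoring sign. It is a toy under stated assumptions: the two signs "UP" and "DOWN", the two-symbol observation alphabet (0 for hand low, 1 for hand high), and all probabilities are invented for the example, and the cited systems are not claimed to work this way in detail.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of a non-empty observation sequence under a discrete HMM.

    start[s]    : probability of beginning in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits observation symbol o
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Return the name of the sign model that best explains obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Two-state, left-to-right models: each sign starts in state 0 and may
# advance to state 1, where it remains.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "UP": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.1, 0.9]]),    # low, then high
    "DOWN": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.9, 0.1]]),  # high, then low
}

print(recognize([0, 0, 1, 1], MODELS))
```

The output of such a recognizer is a sign label, not a translation, which is the limitation taken up in the next paragraph.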

Critically,  however,  this  recognition  does  not  do  language 
translation.  Thus,  the  output  will  be  a  one-to-one  mapping  of  a 
sign  into  print  or  speech.  As  mentioned  with  the  avatar  work 
presented  earlier,  the  ASL/English  rules  that  would  be  needed 
for  such  translation  are  not  developed  to  a  state  where  auto¬ 
matic  translation  can  occur;  however,  advances  are  being  made. 
Recently,  for  example,  Hernandez-Rebollar  (2005)  presented 
work  designed  to  translate  signed  input  into  English  phrases. 
In  that  work,  the  translation  was  enabled  by  having  a  limited 
number  of  phrases  that  the  system  could  recognize,  thus  con¬ 
straining  the  problem  space. 
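
The distinction between one-to-one mapping and phrase-constrained translation can be sketched in a few lines of Python. The glosses, the phrasebook entries, and the function names below are hypothetical and are chosen only to contrast the two outputs; they do not reproduce the vocabulary of the system cited above.

```python
# A deliberately small, hypothetical phrasebook keyed by gloss sequence.
PHRASEBOOK = {
    ("STORE", "I", "GO"): "I am going to the store.",
    ("YOU", "NAME", "WHAT"): "What is your name?",
}


def gloss_by_gloss(glosses):
    """One-to-one mapping: each recognized sign becomes one English word."""
    return " ".join(g.lower() for g in glosses)


def translate(glosses):
    """Return the English phrase for a known sequence, or None otherwise."""
    return PHRASEBOOK.get(tuple(glosses))


recognized = ["STORE", "I", "GO"]
print(gloss_by_gloss(recognized))
print(translate(recognized))
```

Limiting the system to a fixed set of phrases sidesteps the missing translation rules, at the cost of returning nothing for any sequence outside the phrasebook.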

The  current  state  of  the  art  for  sign  recognition  is  not  as  ad¬ 
vanced  as  speech  recognition.  As  researchers  continue  to  work 
on  the  problem,  however,  advances  can  be  expected.  For  signers,
it  might  be  the  case  that  conversational  sign  interactions  will
one  day  be  possible,  much  as  speech  recognition  now  allows 
speakers  to  benefit  from  conversational  speech  interactions. 


TECHNOLOGY  AND  COMMUNICATION 


Deaf and hard of hearing individuals not only have difficulties
with computer interfaces, but also experience significant difficulties
in certain communication situations. Telephone conversations,
as well as one-on-one conversations, group discussions,
and presentations or classroom lectures, are all
problematic.  No  discussion  of  HCI  for  deaf  and  hard  of  hear¬ 
ing  individuals  would  be  complete  without  at  least  a  brief 
mention  of  technologies  that  have  been  developed  to  aid  in 
these  situations. 

Telephones  have  long  been  a  source  of  difficulty  for  deaf  and 
hard of hearing people. Alexander Graham Bell was a teacher
of deaf students and was married to a woman who was deaf. His
interest in finding improved ways to communicate with deaf
speakers led to his invention of the telephone. Ironically, however,
over the years, telephones created a number of barriers for
deaf  and  hard  of  hearing  individuals  in  the  workplace  and  other 
situations.  To  overcome  these  barriers,  a  number  of  assistive  de¬ 
vices  have  been  developed  (Lazzaro,  1993).  Amplification  and 
adapters  exist  for  many  phones  that  will  allow  hard  of  hearing 
individuals or people with cochlear implants to hear phone conversations.
Teletype devices (TTYs and TDDs) as well as some
computer applications allow deaf users to type conversations
that are carried over phone lines.

Operator-assisted  relay  services  have  been  established  to  en¬ 
able  conversations  between  deaf  and  hearing  individuals.  The 
relay  personnel  serve  as  a  bridge  between  the  two  conversation 
participants,  translating  typed  information  into  speech  for  the 
hearing  participant  and  translating  speech  into  written  text  for 
the  deaf  participant.  Hard  of  hearing  speakers  can  speak  directly 
with  others  using  a  captioning  service  that  provides,  nearly  in 
real  time,  a  printed  transcript  of  the  conversation  to  support 
hard  of  hearing  users  (CapTel,  2005).  Various  means  of  enabling 
signed  conversations  over  the  telephone  have  been  explored 
over  the  years,  but  the  advent  of  video  phones  has  now  made 
feasible  the  option  of  signed  phone  conversations. 

The  current  prevalence  of  conference  calls  in  the  workplace 
has  created  a  new  set  of  problems  for  deaf  and  hard  of  hearing 
workers.  Similarly,  classroom  situations  or  lecture  presentations 
also  create  significant  difficulties  for  deaf  and  hard  of  hearing  at¬ 
tendees.  Even  individuals  skilled  in  lipreading  have  difficulty  in 
these  situations  because  the  speaker’s  face  is  rarely  visible  with 
sufficient  resolution  for  lipreading.  Sign  language  interpreting 
and  captioning  are  two  means  of  providing  accommodation  for 
participants  unable  to  hear  or  understand  the  speech  in  these 
situations. 

Sign  language  interpreters  allow  signers  to  participate  in 
meetings by providing real-time translation of the spoken conversation
into sign. This is a two-way translation service, such
that the signing participant is also able to participate by having
the interpreter speak their signed utterances. Remote interpreting
is a technology that addresses the problem of a shortage
of  skilled  interpreters,  particularly  in  some  locations.  The  inter¬ 
esting  aspect  of  this  is  that  the  interpreter  need  not  be  present 
at  the  location  of  any  of  the  conference  participants.  The  deaf  or 
hard  of  hearing  participant  views  the  remotely  located  inter¬ 
preter  on  a  computer  display  or  TV  screen.  The  interpreter  lis¬ 
tens  to  hearing  participants  by  telephone  and  provides  sign- 
language  interpreting  for  the  deaf  participant.  The  interpreter 
also  voices  what  the  deaf  person  signs  for  hearing  participants. 
This remote interpreting can be used much like the telephone
relay service, with participants in different locations (Video Relay
Service, or VRS), or in situations where the participants are
located in the same room (Video Remote Interpreting, or VRI).

Communication Access Real-time Translation (CART) provides
real-time captioning, enabling discussions and presentations
to be transcribed into text for deaf and hard of hearing participants.
Typically, this is done by a person who creates the
captions.  As  with  sign  language  interpreting,  the  captioner  can 
be  physically  present  or  remotely  located,  listening  to  the  con¬ 
versation  by  phone  and  transmitting  the  text  via  a  network  to  a 
computer  screen  or  projected  display.  The  deaf  participant 
types  their  questions  or  comments.  That  input  is  then  read 
aloud  for  the  other  participants,  either  by  a  captioner  or  using 
computer text-to-speech technology (e.g., see Caption First,
n.d.; Viable Technologies, n.d.).

The  ability  to  automatically  transcribe  speech  was  envisioned 
more  than  100  years  ago  as  a  technology  that  held  great  promise 
for deaf and hard of hearing people, even though such technology
was hardly imaginable at the time (Fay, 1883). As speech-recognition
systems have matured over the last quarter century,
a number of applications designed for deaf users have been
explored  (Bain,  Basson,  Faisman,  &  Kanevsky,  2005;  Stinson  & 
Stuckless,  1998). 

The  Liberated  Learning  Project  is  one  example  of  using  au¬ 
tomatic  speech  recognition  in  the  classroom  (Bain,  Basson,  & 
Wald,  2002).  In  this  effort,  the  classroom  teacher  speaks  into  a 
microphone  that  transmits  his  or  her  speech  to  a  computer  to 
perform the recognition. The transcript of the lecture is displayed
in close to real time on a screen at the front of the classroom.
Fig. 45.2 shows a classroom situation from the Liberated
Learning Project.


FIGURE 45.2. An example classroom lecture transcribed in real
time. From Bain et al. (2005); republished with permission.

While recognition technologies have improved over the
years, they still do not attain 100% correct performance for dictation.
To provide students with accurate transcripts for lecture
notes,  the  professors  who  participate  in  the  Liberated  Learning 
Project  review  the  transcript  for  inaccuracies  and  make  correc¬ 
tions  after  a  lecture  is  completed.  The  corrected  transcripts  are 
then  made  available  to  students  (Bain  et  al.,  2002).  Other  efforts 
that  use  automatic  speech  recognition  for  captioning  use  dif¬ 
ferent  methods  for  correcting  errors  in  transcripts.  These  other 
methods  include  having  a  trained  speaker  “shadow”  the  speech 
to produce more reliable recognition, and/or having a person
correct errors in real time (e.g., see Bain et al., 2005; Robson,
2001;  Viable  Technologies,  n.d.). 
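
One common way to quantify how far a recognized transcript falls short of 100% correct is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognized text into the corrected text, divided by the length of the corrected text. The chapter does not name this metric, so the Python sketch below is offered only as an assumed illustration of how a raw transcript might be compared with a corrected one.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

Here `reference` would be the transcript after human correction and `hypothesis` the raw recognizer output; a value of 0 means the two agree word for word.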


CONCLUSIONS 


Computer  technologies  are  very  important  in  the  lives  of  many 
deaf  and  hard  of  hearing  people.  These  technologies  can  ease 
communication  between  coworkers,  friends,  family  members, 
neighbors, and a variety of services. These technologies have
also given deaf students, employees, and individuals
full access to rich multimedia services, shopping,
news, and general information. In short, barriers long encountered
by deaf and hard of hearing people are now being overcome
through the use of technology.

Deaf and hard of hearing users represent a population that,
in  itself,  is  diverse  in  both  hearing  and  language  experiences  and 
skills.  Applications  that  offer  flexibility  of  language  (e.g.,  caption¬ 
ing,  signed  English,  or  ASL)  will  be  most  accessible  to  users  who 
have a hearing loss. At a minimum, however, developers and designers
need to ensure that information is not carried by the auditory
channel alone. Paramount is the need to ensure that any
audible information, be it a sound alert, speech prompt, or other
auditory event, has a visual counterpart. Such considerations
are  not  only  good  design,  but  are  also  mandated  by  a  growing 
number  of  regulations  worldwide.  The  ability  of  everyone  to  par¬ 
ticipate  in  our  increasingly  technological  society  is  crucial. 
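
The requirement that no information be carried by sound alone can be enforced in code rather than left to convention. The Python sketch below is one assumed way to do so; the `Alert` type, the `present` function, and the callback parameters are hypothetical names introduced for illustration and do not correspond to any particular toolkit.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    sound: str        # hypothetical sound identifier; "" means silent
    visual_text: str  # text always shown to the user

    def __post_init__(self):
        # Refuse to construct an alert that would be conveyed by audio only.
        if self.sound and not self.visual_text.strip():
            raise ValueError("an audible alert needs a visual counterpart")

def present(alert, play_sound, show_text):
    """Show the visual form first, then play the sound if there is one."""
    show_text(alert.visual_text)
    if alert.sound:
        play_sound(alert.sound)
```

Because the check sits in the constructor, an audio-only alert fails as soon as it is created instead of silently reaching users who cannot hear it.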


ACKNOWLEDGMENTS 


I’d like to thank Al Noll, Debra Noll, and John Richards for valuable
comments on early drafts of this chapter.


References 


Aronson, J. (2000). Sound and Fury [Motion picture]. United States:
New  Video  Group. 

Bain,  K.,  Basson,  S.,  &  Wald,  M.  (2002).  Speech  recognition  in  univer¬ 
sity  classrooms:  Liberated  learning  project.  In  Proceedings  of  the  5th 
International  ACM  SIGCAPH  Conference  on  Assistive  Technologies 
(ASSETS’02) (pp. 192-196). New York: ACM.

Bain,  K.,  Basson,  S.,  Faisman,  A.,  &  Kanevsky,  D.  (2005).  Accessibility, 
transcription,  and  access  everywhere.  IBM  Systems  Journal,  44(3), 
589-605. 

Brady,  S.,  &  Shankweiler,  D.  (1991).  Phonological  processes  in  literacy: 
A tribute to Isabelle Y. Liberman. Hillsdale, NJ: Lawrence Erlbaum
Associates. 

Braffort,  A.  (1996).  A  gesture  recognition  architecture  for  sign  language. 
In  Proceedings  of  the  2nd  International  ACM  SIGCAPH  Conference 
on  Assistive  Technologies  (ASSETS’96),  102-109. 

Brashear, H., Starner, T., Lukowicz, P., & Junker, H. (2003). Using multiple
sensors  for  mobile  sign  language  recognition.  In  Proceedings  of  the 
7th  IEEE  International  Symposium  on  Wearable  Computers,  45-52. 

Brewer, J., & Dardailler, D. (1999). Web Content Accessibility Guidelines
1.0. Retrieved December 7, 2005, from University of Wisconsin-Madison,
Trace Research and Development Center website:
http://www.w3.org/TR/WAI-WEBCONTENT/

CapTel.  (n.d.).  Introducing  the  Captioned  Telephone.  Retrieved  Decem¬ 
ber  7,  2005,  from  http://www.captionedtelephone.com/about 
captel.phtml 

Caption  First,  (n.d.).  Overview  of  our  services.  Retrieved  January  20, 
2006,  from  http://www.captionfirst.com/overview.htm 

Chorost,  M.  (2005).  Rebuilt:  How  becoming  part  computer  made  me 
more human. New York, NY: Houghton Mifflin Company.

Closed Captioning Web. (n.d.). Closed Captioning Web: Software links.
Retrieved December 7, 2005, from http://www.captions.org/softlinks.cfm

Conrad,  R.  (1979).  The  deaf  schoolchild.  London:  Harper  &  Row. 


Cox,  S.,  Lincoln,  M.,  Tryggvason,  J.,  Nakisa,  M.,  Wells,  M.,  Tutt,  M.,  et  al. 
(2002).  TESSA,  a  system  to  aid  communication  with  deaf  people.  In 
Proceedings  of  the  5th  International  ACM  SIGCAPH  Conference  on 
Assistive  Technologies  (ASSETS’02).  (pp.  205-212).  New  York:  ACM. 

Emmorey, K., & Lane, H. (2000). The signs of language revisited: An
anthology to honor Ursula Bellugi and Edward Klima. Mahwah, NJ:
Lawrence Erlbaum Associates.

Erard,  M.  (2005).  The  birth  of  a  language.  New  Scientist,  188(2522), 
46-49. 

Fang, G., Gao, W., & Zhao, D. (2003). Large vocabulary sign language
recognition based on hierarchical decision trees. In Proceedings of the
5th International Conference on Multimodal Interfaces (pp. 125-131).
New York: ACM.

Fay, E. A. (1883). The glossograph. American Annals of the Deaf, 28,
67-69. 

Frishberg,  N.,  Corazza,  S.,  Day,  L.,  Wilcox,  S.,  &  Schulmeister,  R.  (1993). 
Sign language interfaces. In Proceedings of the ACM SIGCHI Conference
on Human Factors in Computing Systems (pp. 194-197).
New  York:  ACM. 

Gallaudet  Research  Institute.  (2003).  Literacy  and  deaf  students. 
Retrieved  December  7,  2005,  from  http://gri.gallaudet.edu/Literacy/ 

Gallaudet University. (n.d.). Sample web page with embedded real
media video. Retrieved January 20, 2006, from
http://academic.gallaudet.edu/pages/iced2000/real/stellaluna_smil.html

Hanson, V. L. (1989). Phonology and reading: Evidence from profoundly
deaf readers. In D. Shankweiler & I. Y. Liberman (Eds.), Phonology
and reading disability: Solving the reading puzzle (pp. 69-89). Ann
Arbor: University of Michigan Press.

Hanson, V. L., & Padden, C. A. (1989). The use of interactive video for
bilingual ASL/English instruction of deaf children. American Annals
of the Deaf, 134, 209-213.

Hanson, V. L., & Padden, C. A. (1990). Bilingual ASL/English instruction of
deaf children. In D. Nix & R. Spiro (Eds.), Cognition, education, and
multimedia: Exploring ideas in high-technology (pp. 49-63). Hillsdale,
NJ: Lawrence Erlbaum Associates.

Hernandez-Rebollar, J. L. (2005). Gesture-driven American Sign Language
phraselator. In Proceedings of the 7th International ACM Conference
on Multimodal Interfaces (ICMI’05) (pp. 288-292). New York:
ACM Press.

Hernandez-Rebollar, J. L., Lindeman, R. W., & Kyriakopoulos, N. (2002).
A multi-class pattern recognition system for practical fingerspelling
translation. In Proceedings of the 4th IEEE International Conference
on Multimodal Interfaces (ICMI’02) (pp. 185-190). Washington,
DC: IEEE Computer Society.

Huenerfauth,  M.  (2005).  Representing  coordination  and  non-coordina¬ 
tion  in  an  American  Sign  Language  animation.  In  Proceedings  of  the 
7th International ACM SIGACCESS Conference on Assistive Technologies
(ASSETS’05) (pp. 44-51). New York: ACM Press.

Hyde, M., & Power, D. (2006). Some ethical dimensions of cochlear
implantation for deaf children and their families. Journal of Deaf
Studies and Deaf Education, 11(1), 102-111.

iCommunicator. (n.d.). iCommunicator. Retrieved January 4, 2006,
from http://www.myicommunicator.com/

Kadous, M. W. (1996). Machine recognition of Auslan signs using Powergloves:
Towards large-lexicon recognition of sign languages. In L. Messing
(Ed.), Proceedings of WIGLS, The Workshop on the Integration of
Gestures in Language and Speech, Wilmington, Delaware, 165-174.

Karpouzis, K., Caridakis, G., Fotinea, S.-E., & Efthimiou, E. (in press).
Educational  resources  and  implementation  of  a  Greek  sign  language 
synthesis  architecture.  Computers  in  Education. 

Kennaway, R. (2002). Synthetic animation of deaf signing gestures. Gesture
and Sign Language in Human-Computer Interaction: International
Gesture Workshop (GW 2001, LNAI 2298), 146-157.

King, C. (2000). Online learning at Gallaudet University. Retrieved January
20, 2006, from Gallaudet University website:
http://academic.gallaudet.edu/pages/iced2000/iced2000_GUonlinelearning.PDF

Klima,  E.  S.,  &  Bellugi,  U.  (1979).  The  signs  of  language.  Cambridge,  MA: 
Harvard  University  Press. 

Lamberton, J. (2005). Jason Lamberton’s video blog. Retrieved January
20, 2006, from
http://video.google.com/videoplay?docid=6012463606293405795&q=gallaudet+university

Lane, H. (1984). When the mind hears: A history of the deaf. New York:
Random House.

Lane, H. (1992). The mask of benevolence: Disabling the deaf community.
New York: Alfred A. Knopf.

Lane,  H.,  Hoffmeister,  R.,  &  Bahan,  B.  (1996).  A  journey  into  the  deaf- 
world.  San  Diego,  CA:  Dawn  Sign  Press. 

Laurent  Clerc  National  Deaf  Education  Center,  Gallaudet  University, 
(n.d.).  Shared  reading  project:  Chapter  by  chapter — The  Thinking 
Reader. Retrieved January 20, 2006, from
http://clerccenter.gallaudet.edu/Literacy/programs/chapter.html

Lazzaro, J. J. (1993). Adaptive technologies for learning and work environments.
Chicago, IL: American Library Association.

Lee, S., Henderson, V., Hamilton, H., Starner, T., Brashear, H., & Hamilton,
S. (2005). A gesture-based American Sign Language game for
deaf children. In Proceedings of the ACM Conference on Human
Factors in Computing Systems (CHI ’05) (pp. 1589-1592). New
York: ACM Press.

Lichtenstein,  E.  H.  (1998).  The  relationships  between  reading  processes 
and  English  skills  of  deaf  college  students.  Journal  of  Deaf  Studies 
and  Deaf  Education,  3(2),  80-134. 

McGarr,  N.  S.  (1987).  Communication  skills  of  hearing-impaired  chil¬ 
dren  in  schools  for  the  deaf.  ASHA  Monographs,  Oct(26),  91-107. 

McGarr, N. S., & Lofqvist, A. (1988). Laryngeal kinematics in voiceless
obstruents produced by hearing-impaired speakers. Journal of
Speech and Hearing Research, 31(2), 234-239.

Mitchell, R. E. (2005, February 15). Can you tell me how many deaf people
there are in the United States? Retrieved December 7, 2005, from
Gallaudet Research Institute, Graduate School and Professional Programs
website: http://gri.gallaudet.edu/Demographics/deaf-US.php

Mitchell, R. E. (2006). How many deaf people are there in the United
States? Estimates from the Survey of Income and Program Participation.
Journal of Deaf Studies and Deaf Education, 11(1), 112-119.

National Institutes of Health, National Institute on Deafness and Other
Communication Disorders. (2002). Captions for deaf and hard of
hearing viewers. NIH Publication No. 00-4834. Retrieved December 7,
2005, from http://www.nidcd.nih.gov/health/hearing/caption.asp#edit

National Institutes of Health, National Institute on Deafness and Other
Communication Disorders. (2004). Statistics about hearing disorders,
ear infections, and deafness. Retrieved December 7, 2005,
from http://www.nidcd.nih.gov/health/statistics/hearing.asp

Newport, E. (1990). Maturational constraints on language learning.
Cognitive Science, 14, 11-28.

Padden, C. A., & Hanson, V. L. (2000). Search for the missing link: The
development  of  skilled  reading  in  deaf  children.  In  K.  Emmorey  &  H. 
Lane  (Eds.),  The  signs  of  language  revisited:  An  anthology  to  honor 
Ursula  Bellugi  and  Edward  Klima  (pp.  435-447).  Mahwah,  NJ: 
Lawrence  Erlbaum  Associates. 

Padden,  C.,  &  Humphries,  T.  (1988).  Deaf  in  America:  Voices  from  a 
culture.  Cambridge,  MA:  Harvard  University  Press. 

Padden,  C.,  &  Humphries,  T.  (2005).  Inside  deaf  culture.  Cambridge, 
MA:  Harvard  University  Press. 

Padden, C., & Ramsey, C. (2000). American Sign Language and reading
ability in deaf children. In C. Chamberlain, J. Morford, & R. Mayberry
(Eds.), Language acquisition by eye (pp. 165-189). Mahwah, NJ:
Lawrence  Erlbaum  Associates. 

Road runner video mail. (n.d.). Retrieved January 20, 2006, from
http://vmail.vibephone.com/vm/vm_player?vmfile=4dbc51d033b4cdbac583bf8ec3ef9cd3&vmpid=1007

Robson, G. (2001). Can realtime captioning be done using realtime
voice recognition systems? Retrieved December 7, 2005, from
http://www.robson.org/capfaq/online.html#VoiceRecognition

Signtel, Inc. (n.d.). Signtel. Retrieved December 7, 2005, from
http://www.signtelinc.com/main.htm

Sims, E. (2004). Using emerging visualization technologies to provide
sign language access to the Web. Retrieved January 20, 2006, from
http://www.w3.org/WAI/RD/2003/12/Visualization/VCom3D.html

Steinfeld, A. (2001). The case for real time captioning in classrooms.
Retrieved December 7, 2005, from http://www.cartinfo.org/steinfeld.html

Stinson, M., & Stuckless, R. (1998). Recent developments in speech-to-print
transcription systems for deaf students. In A. Weisel (Ed.),
Issues unresolved: New perspectives on language and deaf education
(pp. 126-132). Washington, DC: Gallaudet University Press.

U.S. General Services Administration, Office of Governmentwide Policy,
IT Accessibility & Workforce Division. (n.d.). Section 508.
Retrieved January 20, 2006, from http://section508.gov/

Vanderheiden, G. C. (1994). Application software design guidelines:
Increasing the accessibility of application software for people with
disabilities and older users. Retrieved January 20, 2006, from University
of Wisconsin-Madison, Trace Research and Development Center
website: http://trace.wisc.edu/docs/software_guidelines/software.htm

Viable Technologies. (n.d.). Frequently asked questions. Retrieved
December 7, 2005, from http://www.viabletechnologies.com/faq.php

Vibe video mail. (n.d.). Retrieved January 20, 2006, from
http://www.vibephone.com/vsg/htdocs/products/video-mail/index.jsp

Vogler, C., & Metaxas, D. (2001). Framework for recognizing the simultaneous
aspects of American Sign Language. Computer Vision and
Image Understanding, 81, 358-384.

Wang, C., Gao, W., & Ma, J. (2002). A real-time large vocabulary recognition
system for Chinese Sign Language. In I. Wachsmuth and T. Sowa
(Eds.), International Gesture Workshop, Springer Lecture Notes in
Artificial Intelligence (Vol. 2298, pp. 86-95). New York, NY: Springer-Verlag.


Part 

-VI- 

SECTION  A 

THE  DEVELOPMENT  PROCESS 
REQUIREMENTS SPECIFICATION


46


USER  EXPERIENCE  AND  HCI 

Mike Kuniavsky

ThingM  Corporation 


Introduction  . 898 

The  Boundaries  of  User  Experience . 898 

UX  Is  Context . 898 

Garrett’s  Elements  . 898 

The  Organizational  Experience . 899 

The  1927  Ford  Model  T  . 900 

A  Children’s  Art  Product  Manufacturer  Website . 900 

The User View . 901

The  User  Experience  of  Products . 901 

Affect  . 901 

Value . 901 

The  User  Experience  of  Organizations . 902 

Brand . 902 

Relationships  . 904 

Examining  the  User  Experience  . 905 

Identifying  Organizational  Goals  . 905 

Identify  stakeholders  . 905 

Collect  stakeholder  goals . 906 

Prioritize  organizational  goals . 906 

A  rapid  technique:  project  history . 906 

Field  Observation . 907 

Find  key  informants,  schedule  research  . 908 

Narrow  the  focus . 908 

User  interactive  observation  . 909 

Use multiple researchers and analyze collaboratively . 909

Validation . 909 


Focus  Groups . 909 

Prepare . 910 

Make  a  schedule . 910 

Pick  an  audience  . 910 

Develop  discussion  topics . 911 

Write  a  discussion  guide  . 911 

Analyze  results . 911 

Manage  With  User  Experience . 911 

Agile  User  Experience  Development  . 912 

Iterative  development  . 912 

Risk-driven  and  client-driven . 912 

Timeboxing  . 913 

Adaptive development and evolutionary requirements . 913

Introducing User Experience Into an Existing Process . 913

Get  a  senior  manager  champion . 913 

Work  within  existing  processes . 913 

Make  small,  but  well-publicized  changes  . 914 

Make developers’ lives easier with user experience . 914

Conclusion . 914 

References  . 915 




898  •  KUNIAVSKY 


INTRODUCTION 


The  goal  for  this  chapter  is  to  introduce  concepts  and  tech¬ 
niques  that  help  structure  the  application  of  HCI  in  a  real-world 
environment  by  examining  the  larger  context  in  which  HCI  hap¬ 
pens  and  by  using  that  context  as  the  basis  for  the  design  of  user 
experiences. 

Understanding  the  broader  factors  that  influence  the  user 
experience  is  as  important  for  creating  successful  HCI  systems 
as  thoroughly  understanding  the  cognitive  science  behind  the 
user’s  actions.  Company  goals,  economic  relationships,  emo¬ 
tional  responses,  and  social  interactions  can  overwhelm  behav¬ 
ioral  and  perceptual  responses  of  consumers.  Although  inten¬ 
sive  research  is  currently  investigating  some  of  these  ideas,  the 
majority  of  firsthand  experience  of  and  thinking  about  design¬ 
ing  experiences  under  such  pressures  has  happened  in  the  con¬ 
sumer  marketplace  as  documented  in  popular  business  and 
marketing  literature.  In  bringing  these  ideas  and  experiences 
to  this  volume,  I  hope  to  introduce  the  process  of  HCI  as  part  of 
a  broader  activity:  specifically,  the  development  and  creation  of 
user  experience  in  a  consumer  economy. 


THE  BOUNDARIES  OF  USER  EXPERIENCE 


The definition of user experience (UX) and its relationship to
HCI are complex. Both fields share boundaries with a number of
other fields, and with each other. On one hand, either field can resemble
anthropology, cognitive psychology, industrial design, or
computer  science  in  practice.  On  the  other,  customer  relation¬ 
ship  management  and  marketing  play  a  large  role  in  actual  day- 
to-day experiences with products and services. Consulting for a
broad range of organizations on projects ranging from consumer
products for broad audiences to highly focused products
for internal use has shaped my thinking about the definition of the
term. User experience is a set of broader considerations than
HCI.  It  aggregates  and  contextualizes  HCI  by  incorporating  the 
concerns  of  both  end  users  and  organizations.  In  other  words, 
the  user  experience  consists  of  all  of  the  factors  that  influence 
the relationship between the end user and an organization, especially
when a product¹ mediates that relationship.

UX  Is  Context 

From  the  users’  perspective,  their  experiences  are  continuous. 
The  products,  their  immediate  environments,  and  their  lives  all 
interact  and  feed  back  on  one  another.  On  the  most  basic  level, 
what  someone  understands  about  a  product  affects  what  he  or 
she finds attractive about the product, and what is attractive affects
his or her willingness to understand it. How much depends
on the rest of the context, but it is a mistake to think that
only the look or the functionality matter. It all matters, and research
and iterative design determine to what degree.


TABLE 46.1. Organizations Involved in End-User Relationships

Product                         Organizations Involved
Traditional technology product  Sales/Repair
Traditional desktop software    Sales, Support
Website                         Internet Service Provider, Website Provider
Mobile                          Handset manufacturer, Network provider,
                                Application provider, Content provider


Many  seemingly  stand-alone  products  now  are  merely  ways 
to  access  services  provided  by  organizations.  End  users’  rela¬ 
tionships  to  an  experience  and  the  organizations  creating  the 
experience  intertwine  more  than  ever  (see  Table  46.1).  In  the 
days  of  traditional  industrial  manufacturing  (roughly  before 
1970),  end  users  of  a  product  may  have  only  had  one  interaction 
with  an  organization:  the  store  from  which  they  bought  it, 
which  may  have  also  provided  support  and  repair  services.  Pack¬ 
aged  software  included  three  or  more:  the  store  that  sold  the 
hardware,  the  store  that  sold  the  software,  and  the  providers  of 
technical support. With the introduction of web-based software
interactions, the number of organizations increased, with the
addition of an ISP and website provider. Modern mobile-phone-based
applications may involve even more: a handset manufacturer,
an operating system developer, a network provider, an ap¬
plication  developer,  and  a  content  provider.  All  of  these  organi¬ 
zations  contribute  to  the  end-user  experience,  often  without  a 
lot  of  coordination  between  them. 

HCI  is  part  of  a  technology  creation  process.  Like  any  tech¬ 
nology  creation  process,  doing  it  right  requires  not  only  au¬ 
tomating  a  certain  set  of  tasks,  but  also  inventing  tools  that  intro¬ 
duce  new  possibilities  for  both  the  people  using  them  and  the 
organizations  creating  them.  In  such  a  multilayered  environment, 
product  development  can  go  in  many  directions,  and  research 
can  be  conducted  almost  ad  infinitum.  However,  in  the  end,  lim¬ 
ited  resources  require  choosing  one  promising  direction. 

User  experience  design  and  research  is  a  pragmatic  pursuit. 
Its  goal  should  be  the  understanding  of  the  experience  of  tech¬ 
nology  users  and  technology-producing  organizations  to  man¬ 
age  the  risks  of  technology  creation  and  increase  the  chances 
of  success. 

Garrett’s  Elements 

Garrett  (2000,  2002)  developed  a  model  (see  Fig.  46.1)  for  un¬ 
derstanding  how  various  aspects  of  product  design  interact  to 
create  a  whole  user  experience. 

Garrett  (2000,  2002)  focused  on  web  design,  but  his  model 
extended  to  most  other  kinds  of  user  experience.  It  described 


¹I define product broadly. A product represents the interface between an organization and end users. It could be a physical object, a service, a system,
software, or a combination of these. For example, an ATM consists of three elements: the machine itself, the card used to access it, and the service
that it enables access to. However, it is a single product, especially from the perspective of the end user. More often, it is a single definable entity,
but seemingly stand-alone artifacts regularly turn out to belong to a system of interlocking, interdependent elements.


[Text of the figure, dated 30 March 2000]

A basic duality: The Web was originally conceived as a hypertextual information space;
but the development of increasingly sophisticated front- and back-end technologies has
fostered its use as a remote software interface. This dual nature has led to much confusion,
as user experience practitioners have attempted to adapt their terminology to cases beyond
the scope of its original application. The goal of this document is to define some of these
terms within their appropriate contexts, and to clarify the underlying relationships among
these various elements.

Axis labels: Concrete / Completion (top); Abstract / Conception (bottom)

Web as software interface (task-oriented)

Visual Design: graphic treatment of interface elements (the "look" in "look-and-feel")
Interface Design: as in traditional HCI; design of interface elements to facilitate
user interaction with functionality
Information Design: in the Tuftean sense: designing the presentation of information
to facilitate understanding
Interaction Design: development of application flows to facilitate user tasks,
defining how the user interacts with site functionality
Functional Specifications: "feature set"; detailed descriptions of functionality the site
must include in order to meet user needs
User Needs: externally derived goals for the site; identified through user research,
ethno/techno/psychographics, etc.
Site Objectives: business, creative, or other internally derived goals for the site

Web as hypertext system (information-oriented)

Visual Design: visual treatment of text, graphic page elements and navigational
components
Navigation Design: design of interface elements to facilitate the user's movement
through the information architecture
Information Design: in the Tuftean sense: designing the presentation of information
to facilitate understanding
Information Architecture: structural design of the information space to facilitate
intuitive access to content
Content Requirements: definition of content elements required in the site
in order to meet user needs
User Needs: externally derived goals for the site; identified through user research,
ethno/techno/psychographics, etc.
Site Objectives: business, creative, or other internally derived goals for the site


This  picture  is  incomplete:  The  model  outlined  here  does  not  account  for  secondary  considerations  (such  as  those  arising  during  technical  or  content  development)
that  may  influence  decisions  during  user  experience  development.  Also,  this  model  does  not  describe  a  development  process,  nor  does  it  define  roles  within  a 
user  experience  development  team.  Rather,  it  seeks  to  define  the  key  considerations  that  go  into  the  development  of  user  experience  on  the  Web  today. 

©  2000  Jesse  James  Garrett  http://www.jjg.net/ia/


FIGURE  46.1.  Garrett's  elements  of  user  experience  diagram  (Garrett,  2000).


the  dependencies  connecting  abstract  business  and  user  goals 
to  visual  design  through  a  set  of  intermediate  steps.  These  steps 
describe  the  information  that  a  product  provides  and  describe 
how  people  interact  with  that  information.  Productivity  prod¬ 
ucts  (the  left-hand  column,  defined  as  “web  as  software  inter¬ 
face”)  emphasize  the  content  less  than  the  interaction,  while 
information  products  (“web  as  hypertext  system”)  emphasize 
the  content  more  than  the  interaction. 

The  diagram  defines  stages  in  understanding  and  managing 
this  process,  and  emphasizes  that  factors  that  are  unrelated  to 
ergonomics  or  functionality  constrain  end-user  experience.  It 
implies  that  good  HCI  is  a  subset  of  good  product  development, 
and  inseparable  from  the  larger  context.  The  outer  layers  in  Gar¬ 
rett’s  diagram  hide  the  inner  ones  from  both  users  and  from  the 
organization  at  large.  Users  only  see  the  visual  design  layer, 
while  organizations  only  see  the  website  objectives  layer. 

However,  the  user  experience  depends  on  a  cascading  se¬ 
quence  of  assumptions  and  decisions.  These  are  constrained  by 


economic  factors  imposed  by  the  organization  and  psychological 
or  sociological  factors  imposed  by  users  and  society.  These  eco¬ 
nomic,  psychological,  and  sociological  factors  tell  at  least  half  of 
the  story  of  the  complete  user  experience.  They  define  the  context
in  which  decisions  are  made  and  the  product  is  actually  experienced,
and  that  is  the  context  in  which  it  should  be  designed.


THE  ORGANIZATIONAL  EXPERIENCE 


End  users  are  not  the  only  customers  of  a  given  piece  of  tech¬ 
nology.  Technology  creation  solves  two  sets  of  problems:  one 
for  the  people  using  it,  and  another  for  the  organization  creat¬ 
ing  it.  HCI  research  and  design  often  assumes  that  an  organiza¬ 
tion’s  goal  is  to  provide  optimal  end-user  experiences,  but  many 
other  factors  drive  organizational  motivations.  Organizations’ 
needs  and  desires^  frame  and  prioritize  product  research  and 


^Hassenzahl  (2003)  used  pragmatic  and  hedonic  product  attributes  to  discuss  roughly  these  same  concepts.  His  terms  refer  to  individuals’  per¬ 
spectives  in  the  abstract,  but  I  prefer  to  use  needs  and  desires  because  these  terms  better  frame  discussions  from  users’  perspectives  and  work  better 
when  discussing  the  parallel  between  an  organization’s  perspective  and  that  of  the  user  of  its  product.

development  as  much  as  users’  abilities  and  goals,  which  are  the
traditional  realm  of  HCI. 

An  organization  creates  a  product  because  it  desires  some¬ 
thing  from  a  user  base.  The  difficulty  is  that  the  user  base  often 
desires  something  different.  The  resolution  of  these  two  dis¬ 
junctive  desires  deeply  affects  the  final  user  experience.  For  this 
reason,  user-experience  design  and  research  starts  with  organi¬ 
zational  strategy. 

The  following  example  from  industrial  design  anticipated  many  of
today’s  HCI  and  user  experience  issues  by  about  80  years.

The  1927  Ford  Model  T 

The  Ford  Model  T  was  an  incredibly  successful  car,  and  the  first 
killer  app  of  the  20th  century.  Throughout  the  19  years  it  was 
manufactured,  its  design  remained  unchanged,  except  for  one 
thing:  Every  year,  it  was  cheaper  than  the  year  before.  From  the 
perspective  of  Model  T  users,  it  was  a  great  vehicle:  reliable,  pre¬ 
dictable,  and  inexpensive.  However,  by  the  mid-1920s,  it  was  not 
selling  well  relative  to  many  of  its  competitors,  and  Ford  dis¬ 
continued  it  in  1927  (Wikipedia,  2005a).  Henry  Ford  refused  to 
value  anything  but  efficiency  (for  his  company  and  its  cus¬ 
tomers)  in  his  products.  However,  by  the  mid-1920s  Ford’s  com¬ 
petitors  were  selling  more  cars  by  evolving  the  look  and  feel 
every  year  (styling  in  automotive  terminology).  The  goal  went
beyond  making  cars  more  efficient  or  cheaper  to  making  them 
look  different.  Having  realized  that  people  treated  cars  as  ex¬ 
pressions  of  identity,  the  competitors  included  styling  as  a  key 
part  of  the  user  experience. 

Ford  had  many  options  they  could  have  pursued  in  response 
to  the  economic  pressure  put  on  them  by  the  profits  lost  to  com¬ 
petitors.  They  could  have  restructured  their  manufacturing 
processes  to  make  Model  Ts  even  cheaper.  They  could  have  low¬ 
ered  the  quality  of  their  product  to  increase  their  margins;  they 
could  have  embarked  on  a  research  and  development  program 
to  merge  their  car,  tractor,  and  airplane  products,  so  they  would 
only  produce  one  product.  They  could  have  laid  off  workers  and
decreased  the  number  of  cars  they  were  producing,  and  so  on.
Each  plan  would  have  differently  affected  the  driver  experience.
Ford’s  decision  was  to  stop  making  the  Model  T  and  introduce
the  1928  Model  A,  a  car  with  competitive  styling  (available  in  four 
colors,  none  of  them  the  black  of  the  Model  T;  Wikipedia, 
2005b).  Ford’s  industrial  designers  then  updated  the  styling  of 
their  cars  on  a  regular  basis  as  their  competitors  did. 

Beginning  user  experience  evaluation  by  analyzing  the  spon¬ 
soring  organization’s  motivations  regularly  reveals  the  issues 
that  pervade  the  assumptions  behind  the  product.  Introducing 
subtle  changes  in  core  assumptions,  as  Henry  Ford’s  son  Edsel 
(then  the  President  of  Ford)  did  in  1927,  can  change  the  expe¬ 
rience  of  the  entire  product  without  having  to  rethink  the  whole 
user  interface  (because  the  problem  may  not  be  in  the  inter¬ 
face  at  all). 

A  Children's  Art  Product  Manufacturer  Website 

A  maker  of  children’s  art  products  wants  a  new  information  ar¬ 
chitecture  for  their  website.  The  website  has  three  audiences 


(children,  educators,  and  parents  and  grandparents),  and  more 
than  200  different  kinds  of  content.  With  such  a  depth  of  infor¬ 
mation  and  such  a  broad  audience,  there  is  no  obviously  canon¬ 
ical  way  to  structure  the  content.  The  historical  function  of  the 
website  as  a  sales  channel  directed  toward  parents  and  educa¬ 
tors  guided  all  of  the  initial  architecture  choices.  However,  in¬ 
terviews  with  company  executives  responsible  for  the  website 
revealed  that  these  assumptions  were  either  inaccurate  or  in¬ 
appropriately  emphasized.  Most  mistaken  was  the  belief  that 
the  site  had  to  be  a  revenue  source.  In  fact,  the  Chief  Financial 
Officer  (CFO)  flatly  stated  that  the  website’s  goal  is  to  spread  the 
company’s  brand  identity  as  broadly  as  possible  among  their  primary
audience.  In  its  current  incarnation,  the  website  met  neither  the
goals  of  the  original  development  team  nor  its  actual  goal  as  a 
brand  vehicle. 

Throughout  the  product’s  development  lifecycle,  internal 
expectations  and  assumptions  guide  the  experience  it  creates  in 
subtle  ways.  In  this  example,  the  information  architecture  for  a 
website  was  distorted  by  the  explicitly  stated  goal  of  revenue 
production,  even  though  the  organization’s  leaders  had  changed 
their  goals.  When  expectations  contain  internal  conflicts,  they 
produce  contradictory  and  confusing  interaction. 

Organizations  have  to  put  themselves  first,  even  when  creating
products  for  end  users.  For  example,  Southwest  Airlines’
policy  allows  customers  to  apply  the  price  of  an  unused  ticket
to  another  ticket.  However,  the  company  profits  if  people  do 
not  take  them  up  on  the  offer.  Thus,  it  is  not  in  the  best  inter¬ 
est  of  the  company  to  make  it  easy  to  perform  the  transaction. 
As  of  October  2005,  Southwest.com  allows  the  user  to  trans¬ 
fer  funds  from  an  unused  ticket  only  if  they  have  the  exact  con¬ 
firmation  number  of  the  unused  ticket  and  the  exact  spelling 
of  the  name  associated  with  it — even  if  they  have  an  account 
on  the  Southwest  website  and  the  system  database  can  pull  up 
all  of  the  other  account  information.  The  website  interface
makes  transferring  funds  difficult  because  the  interface  ulti¬ 
mately  serves  the  company’s  financial  interests,  not  those  of 
the  customer. 

User  experience  defines  the  boundaries  of  product  devel¬ 
opment  through  stakeholder  needs  and  end-user  goals.  These 
needs  and  goals  are  not  just  management  requests  or  customer 
complaints.  They  represent  the  core  of  how  the  organization 
defines  success  and  what  end  users  expect  the  product  will  do 
for  them. 

Applying  the  tools  of  user  experience  research  and  design 
to  the  organization  is  tricky.  Looking  closely  at  organizational 
assumptions  and  expectations  steps  right  into  in-house  poli¬ 
tics — that  aspect  of  collaborative  work  that  everyone  would 
prefer  did  not  exist — and  can  create  interpersonal  tension. 
However,  unstated  internal  priorities  often  inhibit  successful 
user  experience  design  more  than  any  external  factor,  so  they 
are  important  to  investigate.  Fixing  office  politics  is  outside  this 
chapter’s  scope  and  most  readers’  job  descriptions,  but  explic¬ 
itly  clarifying  an  organization’s  priorities  is  well  within  the  ca¬ 
pability  of  an  HCI  professional.  In  fact,  it  is  critical.  As  we  have 
seen,  confusing,  conflicting,  and  ambiguous  organizational 
agendas  produce  conflicting  product  requirements,  which  in 
turn  produce  difficult-to-use  interfaces.  Knowing  organizational
needs  helps  balance  the  needs  of  users  and  organizations
in  design.


THE  USER  VIEW


As  stated  above,  factors  that  affect  an  end  user’s  experience 
are  not  just  those  that  determine  the  efficiency  of  the  interface 
in  enabling  task  completion.  Functionality  is,  of  course,  critical 
to  the  continued  product  viability — it  needs  actually  to  do 
something — but  viability  is  more  than  functionality.  We  all  will¬ 
ingly  enter  into  experiences  (buy  products,  use  services,  etc.) 
that  are  far  from  functionally  optimal,  and  yet  we  leave  satis¬ 
fied.  Agarwal  and  Karahanna  (2000)  defined  the  concept  of 
“cognitive  absorption,”  which  seems  like  a  good  way  to  de¬ 
scribe  the  main  goal  of  product  designers  and  developers,  as 
“A  state  of  deep  involvement  .  .  .  exhibited  through  temporal
dissociation,  focused  immersion,  heightened  enjoyment,  con¬ 
trol,  and  curiosity.” 

Few  products  regularly  produce  cognitive  absorption.  In  or¬ 
der  to  understand  why,  it  is  valuable  to  define  some  other  terms 
describing  important  aspects  of  the  user  experience  from  the 
user’s  perspective. 

Ortony,  Norman,  and  Revelle  (2005,  p.  174)  proposed  a
model  that  describes  “an  organism’s”  (e.g.,  a  person’s)  psycho¬ 
logical  function  in  the  world.  The  model’s  four  (continually  in¬ 
terrelating)  parts  are: 

•  Affect — what  the  organism  feels 

•  Motivation — what  the  organism  needs  and  wants 

•  Cognition — what  the  organism  knows,  thinks,  and  believes

•  Behavior — what  the  organism  does 

Product  design  implicitly  takes  all  of  these  factors^  into  con¬ 
sideration,  but  explicit  examination  of  them  is  rare.  Marketing 
researchers  investigate  motivation;  interaction  designers  use 
their  knowledge  of  cognition;  usability  research  focuses  on  be¬ 
havior;  and  visual  or  identity  designers  and  advertising  agen¬ 
cies  try  to  influence  motivation  through  affect.  However,  that  is 
an  ideal  situation.  In  reality,  the  practice  of  understanding  and 
structuring  a  unified  experience  is  so  new  that  design  generally 
runs  on  gut-level  intuition,  and  everyone  is  guessing  at  every¬ 
thing.  Gut-level  decision  making  is  not  necessarily  bad.  Humans 
are  often  good  at  predicting  other  humans’  reactions — except 
when  intuition  totally  fails. 

The  User  Experience  of  Products 

Affect.  According  to  Ortony  et  al.  (2005),  emotional  re¬ 
sponse,  or  affect,  is  a  complex  interaction  of  immediate  reac¬ 
tions  modulated  by  experience  with  previous  situations  and 
cognitive  prediction  of  future  states,  all  of  which  happens 
rapidly  and  simultaneously.  Immediate  feelings,  emotions,  and 
moods  are  different  states  operating  at  different  levels  of  gran¬ 
ularity.  They  are  also  critical  to  people’s  experience  of  a  product. 
When  people  fall  in  love  at  first  sight  with  a  product  or  a  place,


their  successive  experiences  will  not  be  moderate.  The  emo¬ 
tions  may  lead  them  to  overlook  interaction  problems  or  poor 
functionality.  Later,  the  emotional  state  may  wear  off,  the  hon¬ 
eymoon  ends,  and  the  inadequacies  of  the  product  turn  joy  into 
disillusionment. 

Davis  (1989,  p.  320)  showed  that  “both perceived  usefulness 
and  perceived  ease  of  use  were  significantly  correlated  with  self 
reported  indicants  of  system  use.”  People’s  emotional  relation¬ 
ships  to  products  before  they  had  the  opportunities  to  use 
them  affected  how  they  used  them  later.  Zhang  and  Li  (2005, 
p.  106)  extended  Davis’  research  by  applying  a  more  primal  con¬ 
cept,  affective  quality.  They  investigated  the  perceived  affective 
quality  of  software  products  and  concluded,  “a  user’s  immediate 
and  reflexive  affective  reaction  to  [information  technology]  has 
a  positive  impact  on  his  or  her  consequent  cognition-oriented 
evaluations  of  the  [technology].”

Furthermore,  Nass  and  Reeves  (1996)  described  in  detail 
how  people  exhibited  many  of  the  same  emotional  responses  to 
computers,  televisions,  and  films  as  they  did  to  other  humans, 
significantly  changing  their  expectations  and  behaviors  toward 
the  technology  as  a  result.

What  constitutes  affective  quality  (which  is  measured  in
terms  of  valence  and  activation,  i.e.,  the  direction  and  magnitude
of  the  emotional  response)  for  technological  products
is  still  under  investigation.  However,  evaluating  and  designing
the  complete  user  experience  clearly  requires  close
consideration  of  the  experience’s  affective  aspects.

Value.  People  act  for  a  reason.  They  engage  with  products 
or  experiences  for  some  reason  (or  reasons),  they  keep  using 
them  for  other  reasons,  and  they  stop  for  others  still.  In  the 
largest  context,  Maslow’s  (1943)  hierarchy  of  needs  serves  as 
one  model  of  how  what  people  value  in  their  lives  motivates 
their  actions.  Ortony  et  al.  (2005)  described  how  one  kind  of
motivation,  curiosity,  could  arise  from  an  emotional  response  to
an  environment.  “Animals’  motivation  systems  [let]  the  resting
point  of  affect  be  slightly  positive  so  that  when  there  is  nothing 
that  needs  to  be  done,  the  animal  is  led  to  explore  the  environ¬ 
ment”  (Ortony  et  al.,  2005,  p.  194). 

However,  pure  curiosity  rarely  leads  people  to  have  new  experiences
or  to  continue  well-known  experiences.  When  using  a
household  appliance,  for  example,  curiosity  rarely  drives  peo¬ 
ple’s  behavior.  From  a  product  developer’s  perspective,  a  good 
approximation  of  motivation  is  what  creates  value  for  the  end 
user.  Value  consists  of  two  elements:

•  The  product’s  perceived  potential  for  changing  a  customer/user’s  life

•  How  well  it  satisfies  that  potential 

Perceived  potential  consists  of  three  elements:  functional, 
economic,  and  psychological  (Sawhney,  2003).  The  functional 
aspect  is  the  prospective  user’s  expectation  about  whether  the 
product  will  be  able  to  solve  a  real-world  problem  the  person 


^These  terms  are  a  framework  for  the  subsequent  discussion.  They  are  not  defined  in  the  same  rigorous,  technical  way  that  Ortony  et  al.  (2005)
defined  them  in  their  work.  The  definitions  provided  here  are  the  more  common  dictionary  definitions,  which  are  a  superset  of  how  Ortony  et  al.
defined  them.

is  having.  “Will  the  disk  utility  program  recover  my  thesis?”  or
“Will  the  personal  video  recorder  let  me  watch  The  Simpsons 
at  3  A.M.?” 

The  economic  aspect  consists  of  the  cost-benefit  analysis 
that  a  prospective  buyer  of  a  product  does  when  considering 
whether  purchasing  the  product  will  be  worth  the  opportunity 
cost  of  spending  money  on  it.  This  is  the  literal,  most  traditional, 
definition  of  value.  “Will  this  CRM  system  let  me  shave  25%  off 
of  my  expenses?” 

The  psychological  aspect  contains  all  of  the  hopes  that  someone
has  for  how  owning  or  using  a  thing  will  change  his  or  her
life.  It  is  both  the  most  difficult  to  understand  and  potentially
the  most  important.  It  holds  all  of  the  emotional  attachment,
all  of  the  social  pressure,  and  all  of  the  personal  desires
that  make  up  someone’s  self-image  as  they  contemplate
buying,  and  then  using,  a  product.  Some  consumer  objects,
such  as  the  Nokia  7280  phone  (see  Fig.  46.2),  evoke  much  more
about  their  values  than  they  communicate  about  their  functionality.
Designed  as  fashion  items,  they  function  much  as  garments  do:
They  explicitly  project  an  image  of  their  users  to  both  others
and  the  users  themselves.

However,  these  same  ideas  apply  to  ostensibly  purely  func¬ 
tional  products.  Every  underused  enterprise  software  product  is 
the  result  of  a  perceived  value  that  did  not  match  the  reality  of 
the  situation  on  the  ground,  often  for  reasons  that  were  nei¬ 
ther  functional  nor  economic. 

The  design  of  the  user  experience  is  the  practice  of  creating 
products  that  satisfy  perceived  value. 

What  brings  people  value  changes  with  context:  At
different  places  and  times,  people  will  have  different  values.
There  is  a  lifecycle  to  expectations  dictated  by  habituation.  As
people  grow  accustomed  to  a  product’s  functionality,  its
novelty  wears  off.  For  a  long  time,  the  Model  T  satisfied  what
consumers  wanted  in  a  car.  For  19  years,  Henry  Ford  thought 
only  the  price  of  the  car  had  to  change,  but  consumers  clearly 
thought  differently.  As  the  automobile’s  functionality  became 
commonplace,  people’s  relationship  to  it  changed.  They  began 
to  focus  on  the  psychological  needs  it  satisfied  and  to  see  it  less 
as  a  tool  they  were  using  and  more  as  part  of  who  they  were. 
People  desire  variety  (Postrel,  2003),  and  the  black  Model  T  no 
longer  satisfied.  Car  buyers  were  willing  to  pay  extra  for  a  dif¬ 
ferent  user  experience,  but  Ford  did  not  recognize  this  until  it 
was  almost  too  late. 


FIGURE  46.2.  Nokia  7280  phone. 


Blindness  to  the  larger  user  experience  also  exists  in  the  de¬ 
velopment  of  software  products.  The  business  press  regularly 
describes  the  struggle  between  well-established  companies  and 
their  younger  competitors.  Such  stories  typically  describe  a 
company  with  an  older  product  whose  target  audience  no 
longer  sees  value  in  the  user  experience  their  product
provides.  The  older  company  clearly  produced  good  user  value 
at  one  point,  or  else  they  would  not  have  had  the  success  that 
allowed  them  to  be  in  a  threatened  leadership  position.  Their 
products  changed  their  audience’s  expectations,  but  then  the 
company  failed  to  notice  when  expectations  moved  on.  For  ex¬ 
ample,  Yahoo!  search  technology  was  lagging  in  the  early  2000s 
when  compared  to  Google’s.  At  one  point,  Yahoo!  was  a  domi¬ 
nant  player  in  the  search  market,  but  by  2005,  they  got  to  the 
point  where,  “The  company  is  doing  everything  that  the  fertile 
imaginations  of  their  software  engineers  can  muster  in  order  to 
persuade  people  to  search  with  them  first”  (Andelman,  2005).

Likewise,  organizations  often  produce  products  for
which  the  market  is  not  yet  ready.  As  of  2005,  a  number  of  large
organizations  have  invested  in  entertainment  PCs,  which  look
like  stereo  equipment,  and  associated  products,  such  as  media
servers,  but,  to  the  puzzlement  of  the  companies  making  these
products,  such  products  have  historically  lacked  widespread
adoption  (Buckman,  2002).  These  products’  unpopularity
may  have  nothing  to  do  with  the  feature  set  or  its  presentation.
The  makers  of  these  products  should  not  necessarily
have  been  doing  any  more  usability  testing  or  focus  groups.  The 
interface  for  the  TiVo  (2005)  personal  video  recorder  was  widely 
praised  by  both  interaction  designers  and  users,  but  it  took  the 
company  eight  years  to  achieve  profitability.  It  may  be  that  pa¬ 
tience  is  an  ingredient  in  the  user  experience  of  these  products 
before  they  appear  worthwhile  to  a  broad  audience. 

As  Sawhney  (2003)  described,  the  process  of  creating  cus¬ 
tomer  value  in  technology  products  requires  understanding  the 
interaction  of  all  the  elements  that  make  the  product  desirable: 

According  to  HP,  the  benefits  of  the  iPaq  are  its  powerful  processor, 
bright  screen,  expandability  and  flexibility — a  statement  of  functional 
value.  But  to  close  a  sale,  HP  must  also  demonstrate  economic  value 
with  quantified  estimates  of  improved  productivity  for  end  users  as  well 
as  application  developers.  And  HP  must  convince  customers  of  the 
emotional  benefits  of  choosing  a  device  platform  that  is  backed  by  rep¬ 
utable  and  financially  solid  companies  such  as  HP  and  Microsoft. 
(Sawhney,  2003) 

Creating  a  user  experience  requires  understanding  this  entan¬ 
glement  of  ideas  as  well  as  HP  did  in  creating  the  iPaq. 

The  User  Experience  of  Organizations 

Brand.  Brand  identity  generally  refers  to  the  combination  of 
all  the  implicit  values  an  organization  communicates  about  itself, 
as  understood  by  the  consumers  of  that  organization’s  products  or 
services.  Symbols  such  as  logos  and  slogans  evoke  brand  identity, 
but  the  actual  identity  is  the  set  of  values  that  people  project  onto 
an  organization,  and  by  extension,  onto  its  products  based  on  per¬ 
sonal  experiences  with  that  organization  and  its  advertising.  In 
terms  of  the  user  experience,  brand  identity  creates  expectations 
for  the  value  that  an  organization’s  products  will  provide  to  the
end  user.  As  such,  it  is  an  important  component  in  setting  people’s
expectations  for  how  to  approach  a  product  and  what  the  product 
will  do  for  them  economically,  functionally,  or  psychologically. 

Brands  live  in  the  minds  and  expectations  of  the  buyers  and 
users  of  an  organization’s  products  and  services.  A  logo  can 
evoke  a  set  of  feelings  and  expectations  for  the  value  that  a 
product  will  give  someone,  but  it  is  not  the  actual  value.  The 
product  still  has  to  provide  the  value,  although  often  that  value 
is  not  in  terms  of  the  actual  functionality,  but  rather  in  the  emo¬ 
tional  satisfaction  that  owning,  using,  or  being  seen  with  a  prod¬ 
uct  brings.  This  aspirational  component  of  a  brand  is  the  emo¬ 
tional  value  the  audience  perceives  the  product  will  deliver.  In 
that  sense,  it  is  the  perceived  affective  quality  of  all  of  the  prod¬ 
ucts  produced  by  an  organization. 

Products  that  do  not  meet  brand  expectations  can  either 
disappoint  or  confuse  users.  During  the  dot-com  boom  of  the 
late  1990s,  many  companies  attempted  business  models  that 
took  their  brands  well  outside  of  people’s  existing  expectations 
for  them. 

For  example,  when  Intel,  a  chipmaker,  partnered  with  toy 
manufacturer  Mattel,  it  seemed  like  a  good  way  to  merge  cutting-edge
technology  with  toy  manufacturing.  The  partnership  produced
several  products  under  the  Intel  Play  brand  (Fig.  46.4).
However,  sales  of  the  toys  did  not  meet  expectations,  and  the 
partnership  was  dissolved.  As  with  any  enterprise,  the  circum¬ 
stances  were  complex,  but  one  of  the  potential  problems  may 
have  been  that  the  Intel  brand  strongly  connoted  an  entirely  dif¬ 
ferent  set  of  values  than  was  appropriate  for  the  sale  of  toys.  As 
manufactured  and  sold  by  Digital  Blue  (Fig.  46.3),  an  educa¬ 
tional  toy  company  founded  to  market  and  develop  the  prod¬ 
ucts  from  the  failed  venture,  the  products  developed  by  Intel 
Play  are  seeing  financial  success.  This  shows  that  the  entire  hi¬ 
erarchy  of  Garrett’s  (2000)  Elements  can  be  satisfied  on  a  func¬ 
tional  level,  but  if  the  total  user  experience  does  not  fulfill  the 
user’s  larger  expectations,  products  can  still  fail. 


Good  experiences  while  using  a  product  will  affect  people’s 
perceptions  of  the  organization  that  produced  it,  which  in  turn 
affects  their  expectations  for  the  functionality  of  other  prod¬ 
ucts  that  the  company  produces.  Bad  experiences  with  a  ser¬ 
vice,  such  as  documented  in  Rafaeli  and  Vilnai-Yavetz  (2004),  can 
lead  to  a  wholesale  dissatisfaction  with  other  products  that  the 
organization  produces,  irrespective  of  those  products’  immedi¬ 
ate  user  experience. 

From  an  HCI  perspective,  understanding  and  incorporating 
brand  identity  into  the  experience  is  important.  As  Saffer  (2002) 
said,  “Navigation,  nomenclature,  and  content  presentation  must 
also  reflect  the  company’s  brand.  The  most  elegant  visual  de¬ 
sign  in  the  world  isn’t  going  to  overcome  inappropriate  inter¬ 
action  design.” 

For  example,  knowing  that  the  children’s  art  product  manufacturer
(previously  mentioned)  was  more  interested  in  communicating
the  company  brand  than  producing  revenue  changed
the  direction  of  the  user  experience  dramatically.  A  website  intended
to  sell  products  efficiently  is  designed  to  be
purely  functional,  whereas  one  intended  to  evoke  a  sense  of
playfulness,  whimsy,  and  creativity  (the  psychological  values
the  company  in  question  tried  to  communicate)  is  much  different.
Compare  the  McMaster-Carr  website,  which  has  been  a
very  successful  sales  website  (Spool,  2005),  to  the  site  for  the 
Lego  toy  company  (see  Fig.  46.5  and  Fig.  46.6). 

The  interaction  design,  the  organization  of  the  content,  the 
kind  of  content  presented,  and  the  visual  design  of  individual 
interface  elements  of  the  two  websites  differ  not  just  because 
the  audience  differs  or  the  products  differ  (though  those  dif¬ 
ferences  are  undeniably  important)  but  also  because  the  mes¬ 
sage  they  want  to  communicate  differs.  Compare  the  Carhartt 
clothing  company’s  website  in  the  United  States  to  that  in  Europe
(see  Fig.  46.7  and  Fig.  46.8).  In  the  United  States,  Carhartt
is  branded  primarily  as  a  workwear  manufacturer,  while  in  Europe,
it  is  a  fashion  brand  for  urban  youth.


FIGURE  46.3.  The  Digital  Blue  Digital  Movie  Creator  II.


FIGURE  46.4.  The  Intel  Play  Digital  Movie  Creator.


FIGURE  46.5.  McMaster-Carr  website  homepages.


FIGURE  46.6.  Lego  website  homepages.


FIGURE  46.7.  Carhartt  U.S.  website  homepages.

Relationships.  In  today’s  world,  we  rarely  interact  with 
an  organization  only  once.  The  process  of  buying,  owning,  using, 
and  maintaining  a  product,  whether  software  or  an  appliance, 
consists  of  many  interactions  with  an  organization.  Customer  re¬ 
lationship  management  (Wikipedia,  2005c)  and  customer  ex¬ 
perience  management  (Wikipedia,  2005d)  practices  define 
these  interactions  as  contact  points  or  touch  points  (Schmitt, 
2003).  These  practices  aim  to  analyze  and  design  positive  expe¬ 
riences  during  these  interactions.  In  fact,  some  theories  (Pine 
&  Gilmore,  1999)  claim  these  interactions  are  even  more  im¬ 
portant  than  the  products  that  spark  them. 

The  mobile  phone  is  an  example  of  the  numerous  customer 
relationships involved in owning and using a contemporary
product. Although technically a computer, a mobile phone is
not  just  a  computational  tool.  Its  functionality  as  a  tool  and  as  a 
communication  medium  completely  depends  on  the  services 
accessible  through  a  handset.  In  a  sense,  it  is  the  physical  man¬ 
ifestation  of  a  set  of  virtual,  continually  shifting  services  (as  evi¬ 
denced  by  the  complexity  of  subscription  plans).  Without  the 
services,  a  phone  handset  is  useless.  However,  the  network 
does  not  just  provide  transparent  connectivity;  the  ecology  of 
organizations  involved  in  delivering  the  mobile  user  experience 
is  fragmented  (Fig.  46.9),  and  none  of  the  players  is  wholly 
responsible  for  the  HCI. 

Mobile  user-experience  design  processes  require  an  under¬ 
standing  of  the  relationship  between  various  organizations  and 
the  way  in  which  users  will  interact  with  them.  Knowing  these 
contact  points  can  focus  and  prioritize  the  HCI  research  and 
design. Arnall (2001) cited constraints imposed by network
performance, billing, and hardware limitations in creating an
SMS-based service. Creating a satisfying user experience required
understanding both the user’s experience with each of those contact
points and the integration of all of them. For example, the design
of the service had to include both interaction and financial
incentives for people to sign up for the service (the signup process
was made quick, and the service was initially free).


FIGURE 46.9. The mobile data value web (European Information
Technology Observatory, 2002).

The exact nature of contact points will vary based on the details
of the service or product under consideration, but contact points
typically involve:

•  Customer  service 

•  Billing 

•  Sales 

•  Account  management 

•  Marketing 

To  some  extent,  this  has  always  been  true  in  all  HCI  devel¬ 
opment,  but  it  has  not  been  a  prime  focus  of  the  research  and 
design  process.  In  an  ecology  of  many  interacting  services,  such 
as  described  above,  ignoring  the  other  players  in  the  environ¬ 
ment  is  no  longer  optional.  When  such  a  service  provides  a  so¬ 
lution  to  an  end  user,  the  solution  cannot  just  be  evaluated 
through the completion of a narrow set of tasks. It needs to be
analyzed in terms of the improvement it makes in the life of the person
who uses it. People must find value throughout their interaction
with it, whether through the out-of-box experience (Wikipedia,
2005e) in unpacking the product, or how they feel as they are
using  it,  or  their  interactions  with  the  product  and  the  organi¬ 
zation  during  a  technical  support  call. 

Industrial  designers  and  architects  have  addressed  these  is¬ 
sues  for  a  long  time,  recognizing  that  the  evolving  roles  their 
products  play  in  people’s  lives  are  not  always  possible  to  pre¬ 
dict  or  design  to  the  last  detail.  They  have  focused  on  creat¬ 
ing  user  experiences  that  offer  multiple  channels  of  value 
(rarely in monetary terms, but by a combination of affective
and functional ideas). Salespeople and marketers have approached
the experience from the other direction. They try
to  identify  the  interactions  people  have  with  an  organization, 
to  understand  the  value  (in  monetary  terms)  of  those  interac¬ 
tions,  and  to  maximize  their  monetary  values  or  to  minimize 
their  expenses. 

Computer  interfaces  straddle  both  sides  of  the  equation, 
providing immediate value for end users and, especially in a
dynamic networked environment such as that provided by mobile
phones, ubiquitous computing, or the web, value for organizations
(whether monetary or, as in the case of governments or nonprofit
organizations, through other metrics that include social goods).
Integrating an analysis of the relationship between people and
organizations as mediated by the interface is a key component of
providing value to both groups.


EXAMINING  THE  USER  EXPERIENCE 


Approaching  the  investigation  of  such  difficult  to  quantify  ideas 
as  affect  and  value  is  no  small  task.  Organizations  may  be  unable 
to  articulate  their  intentions  or  values.  Differentiating  end  users’ 
needs  from  their  desires  and  their  actual  behavior  from  hope¬ 
ful  visions  is  difficult.  Further,  the  ambiguous  nature  of  the  col¬ 
lected  data  makes  interpretations  vary  across  interpreters.  Ex¬ 
tracting  quantitative  information  about  a  broad  group  of  people 
takes  an  investment  of  extraordinary  resources. 

However,  the  difficulty  of  collecting  this  information  should 
not discourage you from trying to collect it. In order to reduce
the risk of failure (though, sadly, probably not increase the
likelihood of success), a model of the whole user experience such as
Garrett’s (2000) is valuable.

This  section  describes  in  detail  several  techniques  for  under¬ 
standing  the  organizational  and  user  needs  for  the  user  experi¬ 
ence.  They  are  by  no  means  exhaustive,  but  they  are  included 
as  examples  of  how  to  approach  a  user-experience  research  proj¬ 
ect,  rather  than  focusing  on  fragmented  tasks,  and  how  to  prag¬ 
matically  apply  the  theory  of  the  previous  sections. 

Identifying  Organizational  Goals 

There  are  three  steps  to  understanding  organizational  goals  for 
a  product: 

1.  Identifying  stakeholders 

2.  Collecting  stakeholder  goals 

3.  Prioritizing  among  the  goals 

Identify stakeholders. Start by identifying the groups who
most often own the product (or who most often care about
the product). Make a list of all of the departments that are
affected by the product’s success or failure and of who in each
department is most responsible for it. If no single person is
responsible for the product in a given department, find the person
who dealt with it most recently. Odds are that this person regularly
deals with it or can tell you who does. Product managers generally
know which groups and individuals have the biggest stake in the
project, and the list will likely contain:

•  Engineering¹

•  Design 

•  Marketing 

Other groups can have stakes in the product’s success, depending
on the size and structure of the organization. There may be a
significant managerial presence if the product is a major
moneymaker (or loser) or if it is brand new. Each of these groups
has a different perspective on the product.

For example, a fictitious list of stakeholders (Table 46.2) for
a web-based data warehousing application contains representatives
from identity design and marketing in addition to the people
who actually build the product.

Collect  stakeholder  goals.  Once  you  have  your  list  of 
stakeholders,  find  out  what  they  consider  the  most  important 
issues.  You  can  do  this  either  by  getting  all  of  the  stakeholders 
together  and  spending  an  afternoon  setting  organization-wide 
priorities  for  the  product  or  by  speaking  to  each  person  inde¬ 
pendently.  Individual  interviews  are  often  necessary  with  exec¬ 
utives,  and  it  is  critical  that  they  are  involved  in  this  process. 
Ask each person (or department):

1.  In  terms  of  what  you  do  on  a  day-to-day  basis,  what  are  the 
goals  of  this  product? 

2.  Are  there  ways  that  it  is  not  meeting  those  goals?  If  so,  what 
are  they? 

3.  Are  there  questions  you  want  to  have  answered  about  it?  If 
so,  what  are  they? 

Every  group  will  have  different  goals  and  will  measure  suc¬ 
cess  differently.  Programmers  may  measure  success  by  the  num¬ 
ber  of  bugs  per  thousand  lines  of  code.  Identity  design  may 
have  internal  reviews  that  evaluate  how  well  the  product  inte¬ 
grates  with  the  corporate  brand.  Customer  support  will  want 
to  minimize  the  number  of  questions  they  have  to  field.  Sales 
will  always  want  to  bring  in  more  revenue. 

Once  you  have  spoken  to  the  departmental  representatives, 
make  a  list  of  the  goals  and  desires.  At  this  point,  you  will  prob¬ 
ably  see  that  some  of  the  goals  are  contradictory.  It  is  too  early 
to  attempt  to  resolve  the  contradictions,  but  investigating  the 
relationship  between  them  may  be  an  important  near-term  goal 
for  the  project. 

Prioritize organizational goals. Based on your interviews,
you will have some idea of the corporate priorities with
respect to the goals you have defined. Some things are important
because the organization believes they prevent people from
using a key feature. Others may be important because they
differentiate the product from its competitors. Still others might
be less important because they create a drain on resources or
are currently a topic of debate within the company.


TABLE 46.2. List of Potential Stakeholders of a
Fictitious Data Warehousing Application

Alison, VP of Product Development
Erik, Interaction Design
Michel, Marketing
Claire, Database Administration
Ed, Customer Support
Leif, QA
Joan, Identity Design


TABLE 46.3. A List of Goals and Questions of a
Fictitious Data Warehousing Application

Who                      Goals and Questions

Alison, VP Product       Fewer complaints from major clients
Development              Match data retrieval features offered by
                           competitor

Erik, Interaction        Help construct more sophisticated reports,
Design                     since the current interface does not reveal
                           full report engine
                         Why do so many people start and then abandon
                           the query wizard?

Michel, Marketing        To show tight integration of the new report
                           generator with the query system

Claire, Database         Is there a way to keep people from clicking
Administration             the search all button? It hammers the
                           database every time.

Ed, Customer             Reduce support calls about report generator
Support                  Shift more support from the phone to email

Leif, QA                 Identify query wizard JavaScript errors to
                           address user complaints

Joan, Identity           Make the look and feel of the acquired report
Design                     generator match that of the query interface

There  are  many  prioritization  methods.  Sometimes,  just  mak¬ 
ing  a  list  is  sufficient,  but  using  a  technique  that  abstracts  key  fac¬ 
tors  can  be  useful.  Table  46.4  explains  one  modified  from  the  to¬ 
tal  quality  management  industrial  manufacturing  discipline. 

Using  this  technique,  the  questions  in  Table  46.3  could  be 
prioritized,  as  in  Table  46.5. 

Often, when goals are prioritized systematically, it is easy to see
why product development happens in the way it does. In the lists,
unstated company priorities come out, as do agendas that are
orthogonal to the organization’s actual needs. In retrospect, it is
possible to see how decisions go against the product’s and the
organization’s needs and how teams’ abilities produce the conditions
that generate bad user experiences. Most importantly, tables such
as these allow you to prioritize what you learn about user needs.
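
The arithmetic of the prioritization technique in Table 46.4 is simple enough to sketch in a few lines of Python. The following is a minimal illustration rather than part of the technique itself: the Goal class and the prioritize function are hypothetical names, and the ratings are the ones listed in Table 46.5. It multiplies the three ratings for each goal and sorts the goals by the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """One worksheet row; every rating uses the one-to-five scale."""
    text: str
    desire: int  # 5 = must have, 1 = nice to have
    risk: int    # 5 = bad problem, 1 = annoyance
    ease: int    # 5 = easy to address, 1 = very difficult

    def __post_init__(self):
        # Reject ratings outside the scale before they distort the ranking.
        for name in ("desire", "risk", "ease"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def priority(self) -> int:
        # Multiplying (rather than adding) amplifies differences.
        return self.desire * self.risk * self.ease


def prioritize(goals):
    """Return the goals ordered from highest to lowest priority."""
    return sorted(goals, key=lambda goal: goal.priority, reverse=True)


# Ratings copied from Table 46.5.
GOALS = [
    Goal("Match data retrieval features offered by competitor", 4, 3, 2),
    Goal("Why do so many people start and then abandon the query wizard?", 4, 5, 4),
    Goal("Show tight integration of the new report generator", 3, 3, 4),
    Goal("Keep people from clicking the search all button", 5, 5, 3),
    Goal("Reduce support calls about report generator", 2, 4, 2),
    Goal("Identify query wizard JavaScript errors", 3, 2, 5),
    Goal("Make the report generator match the query interface", 5, 2, 4),
]

if __name__ == "__main__":
    for goal in prioritize(GOALS):
        print(f"{goal.priority:>3}  {goal.text}")
```

Run as a script, this lists the query wizard question (80) first and the report generator support calls (16) last, in line with the totals in Table 46.5. Because the factors are multiplied, a single low rating pulls an item down sharply, so the ordering is best treated as a starting point for discussion rather than a verdict.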

¹These terms are used here broadly. Engineering typically consists
of programmers in a software or web environment but can include
electrical and mechanical engineers in a hardware-development
project. Likewise, design can include information architects,
industrial designers, interaction designers, and visual designers.


TABLE 46.4. A Prioritization Technique

1. Make a column next to your list of questions and label it “Desire.”
Go down the list and negotiate with the group a rating for each item
on a scale of one to five. Five means the feature affected is a must
have, critical to the success of the product, and one means it is a
nice to have, but not essential.

2. Next, make a second column and label it “Risk.” This will reflect how
bad the problem is. Write a number on a one to five scale here, too.
Five represents bad problems (ones that either directly affect the
bottom line right now or represent major malfunctions), and one
refers to problems that are annoyances or information that would
be good to know.

3. Finally, make a column and label it “Ease.” This is how easy your
team feels it will be to address the problem. Five means that it is
easy to do, and one means that it is very difficult.

4. Multiply the three entries in the columns, and write the result next to
them in a fourth column called “Priority.” This combines and amplifies
the factors. Ordering the list by the last column gives you a starting
order in which to investigate the product’s user experience.


TABLE 46.5. The Prioritization Technique from Table 46.4
Applied to the Questions in Table 46.3

Goal                                      Desire  Risk  Ease  Total

Match data retrieval features                4      3     2     24
  offered by competitor

Why do so many people start and              4      5     4     80
  then abandon the query wizard?

To show tight integration of the             3      3     4     36
  new report generator with the
  query system

Is there a way to keep people from           5      5     3     75
  clicking the search all button?
  It hammers the database
  every time.

Reduce support calls about                   2      4     2     16
  report generator

Identify query wizard JavaScript             3      2     5     30
  errors to address user complaints

Make the look and feel of the                5      2     4     40
  report generator match the
  query interface


A rapid technique: project history. It is not always possible
to perform a rigorous investigation of an organization’s
needs. A fast way to understand the organization’s goals is to
create a quick history of the project. The sequence of events
that led to the current situation reveals a set of problems and
solutions, which in turn reveal the organization’s needs
and values. The process is straightforward in principle, although
the answers to basic questions can reveal complexities in priority
and interest that a simple narrative explanation of the current
situation does not. The goal is to encourage the key stakeholders
responsible for a project to describe the sequence that led to the
current situation. Getting a project history can be as simple as
asking them the following questions:

•  Why did you decide to do this?

•  Why  did  you  decide  to  do  it  now? 

•  Who  initiated  the  project? 

•  What  was  the  organizational  pressure  that  suggested  it? 


The idea is to ask these questions (which are just a variant on
the standard who/what/when/why interrogatives) recursively. In
other words, for every answer, it is possible to ask the same
questions to get an even older, and maybe deeper, set of
motivations. Some techniques recommend doing this a certain number
of times (four seems to be common), but going deeply on a
couple of key ideas is usually enough to understand the deeper
motivations and constraints underlying the current situation.
One variant that has proved useful is to ask to bring anyone who
is mentioned into the conversation. “Oh, so Lucie suggested that
PCB  designers  weren’t  using  the  spec  sheets,  which  is  why  we 
are  trying  to  make  them  more  prominent.  Could  we  talk  to  her 
about  how  she  determined  that  they  were  not  using  them 
enough?”  It  could  be  that  Lucie  has  stacks  of  e-mails  from  cus¬ 
tomer  service  in  which  people  ask  for  information  that  is  readily 
available,  or  maybe  she  just  has  a  hunch.  In  the  former  case, 
the  information  in  the  e-mail  could  be  valuable  in  determining 
users’  expectations  from  the  service;  in  the  latter  case,  under¬ 
standing  Lucie’s  motivations  provides  information  about  how 
she  measures  success  or  envisions  the  purpose  of  the  service. 


Field  Observation 

Norman  (1998)  said,  “The  goal  is  to  make  the  people  who  are 
being  observed  become  participants  in  the  discovery  process 
of  learning  just  what  their  real  needs  are — not  the  artificial 
needs  proscribed  by  the  way  they  do  things  today,  but  what  the 
goals  are,  what  they  are  striving  for.  This  is  the  role  of  rapid 
ethnography.” 

A  highly  effective  and  increasingly  popular  method  of  ex¬ 
ploring  the  user  experience  comes  from  field-research  tech¬ 
niques  based  on  methods  pioneered  by  anthropology,  ethnog¬ 
raphy,  and  ethnomethodology.  Examining  work  and  life  context 
produces  a  richer  understanding  of  the  relationships  between 
preference,  behavior,  problems,  and  values.  Laboratory  and  sur¬ 
vey  methods  extract  people  from  their  environments  to  focus 
on  individual  tasks  or  perspectives  or  aggregate  responses  from 
many  people.  Field  observation’s  goal  is  to  gain  insight  into  the 
total  relationship  between  the  elements  of  the  user  experience 
as  experienced  and  understood  in  the  context  of  use. 

Rather  than  trying  to  validate  theories  in  a  controlled  setting, 
these ethnography-derived methods, including contextual inquiry
(Beyer & Holtzblatt, 1998), derive insight through direct
observation  of  people  in  their  actual  environment  with  (ideally) 
little  presumption  about  their  behavior  and  needs. 

Direct  observation  removes  much  of  the  bias  that  creeps  into 
research  when  people  or  tasks  are  isolated.  Outside  of  the  envi¬ 
ronment  that  triggers  them,  our  explanations  of  desires,  values, 
reactions  and  behaviors,  especially  in  routine  events,  lose  criti¬ 
cal  details  by  our  tendency  to  simplify,  idealize,  and  project.  Ex¬ 
ploring  the  context  of  activities  can  identify  people’s  larger  goals 
through the small details. For example, when someone leaves a
note on a kitchen counter, the goal is not just to leave the
message, but rather to communicate something specific to a member
of the household (even him- or herself). The message may
be a to-do list, a reminder, or an alert (Elliot, Neustaedter, &
Greenberg, 2005), and its location communicates how to interpret
the message. When domestic communications are discussed
outside the context of the daily routine, critical details such as
spatial placement, time of day, materials used, or triggering event
can be lost.

Direct  observation  identifies  emotional  reactions  that  would 
be otherwise difficult to capture. For example, Vredenburg,
Righi, and Isensee (2001) described a situation where a t-shirt
included in the packing material of an IBM RS/6000 computer
led to surprise and delight from users (signs of a good user
experience) as they simply unpacked the box:

Users  opened  the  product  box  to  find  a  t-shirt,  a  mouse  pad,  a  copy  of 
Wired  magazine,  and  games  that  showcased  the  3D  graphics  capabilities 
of the system such as Quake. This approach to design worked
beautifully. It became cool to have an RS/6000. One of the most common
questions asked by customers in the feedback survey was “Where can I get
another t-shirt?” (p. 34)

This was an unexpected observation that was not part of a
focused program of ethnographic observation of people’s
experiences unpacking RS/6000 computers, but it is representative
of the kinds of things such observation produces. In
another instance, Berg, Taylor, and Harper (2003) observed the
following  relationships  between  UK  teenagers  and  their  mobile 
phones: 

[The]  text  messages  that  were  exchanged  were  sometimes  described 
as  objects  that  evoked  particular  memories.  The  messages  were  the 
embodiment  of  something  personal  that  could  be  stored,  retrieved, 
reread  and  shared,  becoming  tangible  mementos  for  individuals  and 
groups.  Thus,  the  phone  appeared  to  provide  a  means  to  participate 
in  social  exchange  in  so  far  as  it  enabled  particular  objects  to  take  on 
symbolic meaning and for the objects to be seen as meaningful
between people. (p. 434)

Such insights map directly to user experience design (as the
authors then proceeded to do). They allow technology to
enable specific, observed behaviors in the context in which they
occur, rather than hypothetical behaviors and assumed needs.

Field research methods for user-experience design are typically
not as detailed, data-heavy, or analytically rigorous as
formal ethnography (Bentley et al., 1992). These techniques
focus on pragmatic, on-the-ground observation and interpretation
within the context of a development and production process.
They use standardized methods and seek to identify contact
points, activity sequences, artifacts, and values in the context of
work practices. Beyer and Holtzblatt’s (1998) contextual inquiry
is probably the most prevalent of these techniques. Table 46.6
lists a set of steps for conducting field research, generalized
from rapid ethnography (Millen, 2000).


TABLE  46.6.  Steps  for  Conducting  Field  Research, 
Adapted  from  Millen  (2000) 

1.  Find key informants

2.  Narrow  the  focus 

3.  Use  interactive  observation 

4.  Use  multiple  researchers  and  analyze  collaboratively 

5.  Validate  conclusions 


Find key informants, schedule research. Millen (2000)
recommended identifying informants and asking them
to  serve  as  guides  through  a  field  observation.  He  suggested 
that  guides  should  be  “people  with  access  to  a  broad  range  of 
people  and  activities  and  be  able  to  discuss  in  advance  where  in¬ 
teresting  behaviors  are  most  likely  to  be  observed  or  where  ac¬ 
tivities  that  reveal  social  tension  are  most  likely  to  be  found” 
(Millen,  2000,  p.  282).  For  example,  when  observing  technol¬ 
ogy  in  a  hospital,  it  pays  to  talk  to  a  nurse  who  works  there,  or 
if  investigating  hobbyist  PC  case  modification  (casemod)  cul¬ 
ture,  it’s  valuable  to  have  a  member  of  a  club  of  modders  intro¬ 
duce  you  to  the  hobby  and  the  players  in  it. 

When  choosing  informants,  you  should  pick  at  least  five  peo¬ 
ple  or  groups  who  resemble  the  people  who  will  use  your  prod¬ 
uct  or  who  will  provide  key  insights.  Overall,  they  should  have 
the  same  profile  as  the  eventual  target  audience,  though  fringe 
members  of  a  group  may  be  good  informants  and  provide  in¬ 
formation  or  exhibit  behavior  that  typical  group  members  will 
have  internalized. 

The  breadth  and  depth  of  research  will  determine  the  extent 
of  the  study  undertaken:  long-term  planning  generally  requires 
deeper  insight  and,  thus,  more  and  longer  observation  than 
short-term  problem  solving,  for  example.  A  typical  month-long 
research  schedule  (Table  46.7)  generally  involves  two  to  five 
hours  per  observation  or  interview  period,  followed  by  two  to 
three  hours  of  group  analysis  per  hour  of  observation. 
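
As a rough planning aid, the schedule and the hour estimates above can be turned into a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, each informant is assumed to get exactly one observation session, and the analysis figure is elapsed group time rather than person-hours. The week offsets are those of Table 46.7.

```python
from datetime import date, timedelta

# Week offsets relative to the first observation day t (Table 46.7).
MILESTONES = [
    (-2, "Organize and schedule participants"),
    (0, "Begin observation; begin analysis scheduling"),
    (1, "Complete observation; review videotapes and notes"),
    (2, "Begin analysis"),
    (3, "Complete analysis"),
]


def schedule(first_observation):
    """Map each milestone to a calendar date."""
    return [
        (first_observation + timedelta(weeks=offset), activity)
        for offset, activity in MILESTONES
    ]


def effort_hours(sessions, session_hours=(2, 5), analysis_per_hour=(2, 3)):
    """Return (low, high) hour ranges for observation, analysis, and total.

    Defaults follow the text: two to five hours per session and two to
    three hours of group analysis per hour of observation.
    """
    obs_low = sessions * session_hours[0]
    obs_high = sessions * session_hours[1]
    ana_low = obs_low * analysis_per_hour[0]
    ana_high = obs_high * analysis_per_hour[1]
    return {
        "observation": (obs_low, obs_high),
        "analysis": (ana_low, ana_high),
        "total": (obs_low + ana_low, obs_high + ana_high),
    }


if __name__ == "__main__":
    for day, activity in schedule(date(2024, 1, 15)):
        print(day.isoformat(), activity)
    print(effort_hours(5))
```

With five observation sessions, the estimate comes to 10 to 25 hours of observation and 20 to 75 hours of group analysis. The width of that range is a reminder that the analysis, not the fieldwork, usually dominates the schedule.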

Narrow  the  focus.  The  goal  of  traditional  ethnographies 
is  to  understand  as  much  as  possible  about  the  entire  context 
in  which  a  group  of  individuals  acts,  without  judgment.  In  con¬ 
trast,  most  commercial  research  projects  begin  with  an  idea 
about  what  problems  need  solving  and  an  idea  about  how  to 
solve  them.  Field  observation  clarifies  and  focuses  these  ideas 
by  discovering  the  situations  in  which  these  problems  occur  and 
how  people  deal  with  them.  In  addition,  unlike  an  evaluative 
technique such as usability testing, it is observational and
typically uncovers unexpected directions. Thus, it is best done
before the process of creating solutions has begun, when there is
still  time  to  iterate  on  research.  This  is  usually  at  the  beginning 
of  the  development  cycle. 

However,  in  the  interest  of  maximizing  immediate  results, 
the  project  typically  concentrates  on  the  fields  of  activity  that 
will  likely  produce  results  that  designers  can  incorporate  into 
the  user  experience.  Narrowing  focus  means  identifying  the  im¬ 
portant  aspects  of  your  audience’s  work  or  life  practice,  while 
leaving open the option to challenge assumptions. One technique
is for researchers to familiarize themselves closely with
the terminology, tools, and techniques their audiences are likely
to use. An informant can walk the researchers through some
concepts before the research goals are formalized. The sportscaster
method, where one informant explains what another one
is doing, is another useful technique. For example, walking
through a shopping district with a fashion-conscious teenage
commentator can reveal a lot about where to look for interesting
behaviors, rather than starting from scratch.

With this information in mind, it’s possible to narrowly define
the aspect of the practice that you can ask questions about
and observe.


TABLE 46.7. Typical Field Research Schedule

Timing         Activity

t - 2 weeks    Organize and schedule participants.

t              Begin observation. Begin analysis-scheduling process
               for development team.

t + 1 week     Complete observation. Review videotapes and notes.
               Complete analysis scheduling.

t + 2 weeks    Begin analysis.

t + 3 weeks    Complete analysis.

Use interactive observation. This is the key to the
technique, and it requires going to where people are engaged
in the kind of activity for which you are designing the experience
and asking them to teach you about their activities. Most of the
time should be spent observing what the participants are doing,
what tools they are using, and how they are using them. One
effective technique is to take on the role of an apprentice and
ask them to give a running description of what they are doing.
As in an expert-apprentice relationship, this should be
enough to describe the practice to the apprentice but not
enough to interrupt the flow of the work. As an apprentice, you
may occasionally ask for explanations, clarifications, or
walkthroughs of actions, but do not let your questions drive the
discussion.

Observations can be in the form of structured interviews,
with prewritten discussion guides. This is useful in answering
specific questions, but risks missing key challenges to
assumptions. Other kinds of tools can elicit specific kinds of
information (Beyer & Holtzblatt, 1998; Millen, 2000), or aid in
constructing models later (Wixon et al., 2002). An informant can
use a paper model of a shop floor, for example, to describe activity
in a factory more easily than would be possible in the loud
environment of the factory itself.

Collect  as  much  documentation  of  the  practice  as  possible. 
Digital  and  video  cameras,  liberally  used,  provide  both  material 
for  analysis  and  illustrations  for  presentation.  Collect  physical 
artifacts,  when  possible.  For  example,  a  group  of  researchers 
studying  patterns  of  technology  use  in  urban  German  areas  took 
400  photographs  in  a  span  of  three  hours  and  brought  back 
posters,  local  handicrafts,  and  a  pipe  from  a  construction  site. 

Use multiple researchers and analyze collaboratively.
Collecting and analyzing data simultaneously can provide
efficiency, though it introduces more potential biases to the
interpretation of the observations (Madison, 2005). Techniques
for group qualitative data analysis range from traditional
transcript coding methods (U.S. General Accounting Office, 1996)
to contextual inquiry’s formal methods (Beyer & Holtzblatt,
1998) for constructing multifaceted models of users’ work
practices. Affinity diagrams are a particularly popular method.
Table 46.8 describes the steps in the construction of an affinity
diagram. It takes about one day.

This  rather  mechanistic  process  yields  good  first-cut  results 
about  the  breadth  of  the  user  experience,  and  frames  subse¬ 
quent  investigation. 


TABLE  46.8.  Affinity  Diagram  Construction, 

Adapted  from  Beyer  and  Holzblatt  ( 1 998) 

1 .  Extract  50-100  notes  from  each  interview.  Notes  are  singular 
observations  about  tools,  sequences,  interactions — anything. 
Randomize  them. 

2.  Get  a  group  of  people  together  in  a  room  with  a  blank  wall  or  a  big 
whiteboard.  Have  them  block  out  the  whole  day  for  the  work. 

3.  Divide  the  group  into  pairs  of  analysts.  Give  each  pair  an  equal 
number  of  notes. 

4.  Write  one  note  on  a  Post-it  and  put  it  on  the  wall/window/board. 

5. Tell the group to put notes that relate to that note around it, one at
a time. It does not matter how the notes relate as long as the group
feels they relate.

6.  If  no  more  notes  relate  to  a  given  note  cluster,  write  a  label 
summarizing  and  naming  the  cluster  (use  a  different  color  so  it's 
easy  to  identify  the  labels). 

7.  Repeat  the  process  with  the  other  notes,  labeling  groups  as  they  occur. 

8.  Generally,  it  is  useful  to  break  up  groups  of  more  than  four  notes  into 
smaller  clusters.  However,  there  is  no  upper  bound  on  how  many 
notes  may  be  in  a  group  if  there  is  no  obvious  way  to  break  it  up. 

9.  As  the  groups  accumulate,  Beyer  and  Holzblatt  (1998)  recommended 
using  pink  notes  to  label  groups  of  blue  notes  and  green  notes  to 
label  groups  of  pink  notes. 
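
The outcome of the wall exercise in Table 46.8 is a small hierarchy of labeled groups. As a rough sketch only, and assuming a team wants to transcribe the finished wall into a file for later analysis, that hierarchy could be recorded as below. All names here (`Group`, `oversized`, `count_notes`, `MAX_NOTES_PER_GROUP`) are hypothetical and are not part of the method described by Beyer and Holzblatt.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: Table 46.8 describes a paper-and-wall
# process, and this is only one possible way to record its result.
MAX_NOTES_PER_GROUP = 4  # step 8: larger groups are candidates for splitting


@dataclass
class Group:
    label: str  # the summary written on a differently colored note
    notes: list[str] = field(default_factory=list)  # raw observations
    subgroups: list["Group"] = field(default_factory=list)  # lower-level groups


def oversized(group: Group) -> list[str]:
    """Return labels of groups holding more notes than the step-8 guideline."""
    found = [group.label] if len(group.notes) > MAX_NOTES_PER_GROUP else []
    for sub in group.subgroups:
        found.extend(oversized(sub))
    return found


def count_notes(group: Group) -> int:
    """Count every raw note under a group, at any depth."""
    return len(group.notes) + sum(count_notes(s) for s in group.subgroups)


# Example wall: one higher-level label over two first-level groups.
wall = Group(
    "Staying in touch",
    subgroups=[
        Group("Texting habits", notes=["n1", "n2", "n3"]),
        Group("Phone calls", notes=["a", "b", "c", "d", "e"]),
    ],
)
print(oversized(wall))
print(count_notes(wall))
```

Step 8 sets no hard upper bound, so a check like `oversized` only flags groups worth a second look; it does not force a split.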


Validation.  A  key  part  of  modeling  is  to  evaluate  the  qual¬ 
ity  of  the  model  with  the  people  whose  lives  it  models.  An  im¬ 
mediate  follow-up  interview  with  in-depth  questions  can  clarify 
a  lot.  Certain  situations  may  not  have  been  appropriate  to  in¬ 
terrupt  (e.g.,  if  you  are  observing  a  surgeon  or  a  stock  trader, 
that  may  apply  to  the  whole  observation  period),  whereas  oth¬ 
ers  may  have  brought  up  questions  that  would  have  interrupted 
the task flow. Conducting this interview while the participant’s
memory of the event is still fresh will produce the best results.
“You’ll  never  understand  what’s  really  going  on  until  you’ve 
talked  to  people  about  what  they  are  doing.  The  [follow-up]  in¬ 
terview  .  .  .  gives  you  the  rationale  to  make  sense  of  things  that 
might  otherwise  seem  odd  or  insignificant”  (Bellotti,  1999). 

Focus  Groups^ 

People’s  affective  responses  and  values  are  hard  to  observe  ob¬ 
jectively,  and  getting  a  subjective  read  is  often  all  that  is  possible. 
Focus  groups  are  structured  group  interviews  that  quickly  and 
inexpensively  reveal  a  target  audience’s  desires,  experiences, 
priorities, and values. Sometimes vilified by their associations
with dishonest marketing, they do not deserve their notoriety.
They are neither the panacea for curing bad products nor
pseudoscientific voodoo to justify irrational decision making. When
moderated well, carefully analyzed, and appropriately presented,
they are an excellent technique for uncovering what people
think about a given topic and, especially, how they think about
it. A focus group reveals people’s perceptions of their values:
what  they  feel  right  now  and  how  they  see  that  in  relation  to 
themselves.  Those  are  crucial  in  understanding  how  an  experi¬ 
ence  will  affect  them. 


^Much of this chapter is adapted from Kuniavsky, M. (2003).
In product development, focus groups are most useful early
in  the  development  cycle,  when  they  generate  ideas,  prioritize 
features,  and  provide  insight  into  people’s  values  and  expecta¬ 
tions.  They  can  reveal  the  features  people  value  highest  and  why 
they  value  them,  though  not  whether  they  will  actually  use 
them.  As  a  competitive  research  tool,  they  uncover  what  people 
value  in  competitors’  products  and  where  those  products  fail. 
As  Krueger  (1988,  p.  83)  said,  “The  purpose  of  focus  groups  is 
not  to  infer,  but  to  understand,  not  to  generalize  but  to  deter¬ 
mine  a  range,  not  to  make  statements  about  the  population  but 
to  provide  insights  about  how  people  perceive  a  situation.” 

A  focus  group  series  is  a  sequence  of  tightly  moderated 
group  discussions  among  people  taken  from  a  thin  slice  of  a 
product’s  target  audience.  The  goal  is  to  encourage  the  partici¬ 
pants  to  feel  comfortable  revealing  their  thoughts  and  feelings 
by  putting  them  in  a  group  of  people  who  are  like  them,  or 
share  an  interest  or  an  experience  that  directly  relates  to  a  prod¬ 
uct  or  an  idea. 

Prepare. Focus group preparation requires several things:

•  A  schedule.  The  best  results  come  from  situations  where 
there  has  been  enough  time  to  examine  the  contingencies. 
A  good  schedule  provides  sufficient  time  for  everything,  es¬ 
pecially  recruiting  and  guide  writing,  and  enough  slop  to 
make  a  mistake  or  two. 

• The target audience. Whom will you invite to participate?
Specifically,  you  need  to  know  the  subset  of  the  target  audi¬ 
ence  that  is  likely  to  give  you  the  best  feedback. 

•  The  research  scope.  Focus  group  series  can  have  a  few  groups 
of  a  handful  of  people  or  as  many  as  a  dozen  groups  with  ten 
or  more  participants  apiece.  The  number  of  groups  and  peo¬ 
ple  will  depend  on  the  complexity  of  your  questions,  the 
depth  to  which  you  want  to  explore  the  answers,  and  the  cer¬ 
tainty  with  which  you  want  to  know  these  answers.  More 
than  four  groups  per  audience  are  rarely  necessary,  but  two 
are  generally  not  enough. 

•  Specific  research  topics.  Not  all  groups  feel  equally  comfort¬ 
able  talking  about  all  subjects  and  not  all  subjects  lend  them¬ 
selves  to  group  discussion.  Carefully  chosen  topics  and  a 
thought-through  discussion  guide  yield  the  most  information 
without  sacrificing  the  depth  of  research  or  the  clarity  of  the 
results. 


TABLE 46.9. Typical Focus Group Research Schedule

Timing        Activity

t - 2 weeks   Determine audience and scope, start recruiting
              immediately

t - 2 weeks   Determine broad topics to be investigated, start
              writing guide

t - 1 week    Write first version of discussion guide, discuss exact
              topic wording with development team, check on
              recruiting

t - 3 days    Write second version of discussion guide with timing,
              discuss with development team, recruiting should
              be completed

t - 2 days    Complete guide, schedule run through, set up, and
              check all equipment

t - 1 day     Run through in the morning, check times, and adjust
              guide questions as appropriate
              Do final recruiting check

t             Conduct groups (usually one to three days, depending
              on scheduling)
              Discuss with observers, collect copies of all notes

t + 1 day     Relax. Do something else

t + 3 days    Watch all tapes, take notes

t + 1 week    Combine notes, write analysis
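
The offsets in Table 46.9 can be turned into calendar dates once the first session day t is known. The following is a minimal sketch under stated assumptions: a week is treated as seven calendar days, weekends are not skipped, and the offsets after t are counted from the first session day even when the groups run for several days. The names `SCHEDULE` and `plan` are illustrative only, and the activity text is abbreviated from the table.

```python
from datetime import date, timedelta

# Offsets in days relative to the first session day t, following Table 46.9.
# Assumption of this sketch: "1 week" means 7 calendar days.
SCHEDULE = [
    (-14, "Determine audience and scope; start recruiting"),
    (-14, "Determine broad topics; start writing guide"),
    (-7, "Write first version of discussion guide; check on recruiting"),
    (-3, "Write second version of guide with timing; recruiting complete"),
    (-2, "Complete guide; schedule run-through; check equipment"),
    (-1, "Run through; adjust guide questions; final recruiting check"),
    (0, "Conduct groups; discuss with observers; collect notes"),
    (1, "Relax"),
    (3, "Watch all tapes; take notes"),
    (7, "Combine notes; write analysis"),
]


def plan(first_session: date) -> list[tuple[date, str]]:
    """Map each Table 46.9 activity to a calendar date."""
    return [(first_session + timedelta(days=off), act) for off, act in SCHEDULE]


# Example: sessions begin on 18 March 2024 (an arbitrary date).
for day, activity in plan(date(2024, 3, 18)):
    print(day.isoformat(), activity)
```

The first and last entries fall 21 days apart, which matches the roughly three-week duration the chapter gives for a focus group series.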

Make a schedule. A typical schedule for a focus group
series takes about three weeks from beginning to end, as
detailed in Table 46.9, and should provide sufficient time for
recruiting and writing the discussion guide.

Pick an audience. From your ideal target audience, you
should choose a subset or several subsets that are likely to give
you the most useful feedback. The right group will vary from
situation to situation. First, you need a solid profile of your target
audience, complete with a thorough understanding of their
demographic/technological makeup. For example, if you are just
looking to find out what existing users value about your
service, you want to pick the people who represent the largest
subset of your actual audience. If you are looking to find whether
a new audience will be interested in what you are developing,
you will need a clear specification of who the potential users
are and what factors uniquely differentiate them from others.
For example, when introducing a new product for use after
a car accident, it is hard to get people to predict what they are
going to need; however, talking to people who were in car
accidents recently may yield an evaluation of what could have been
useful. A sample profile is in Table 46.10.

The perspective of the members of the subgroups defines
similarity. A group of audiophiles will likely be comfortable
together regardless of age, whereas 20-year-old and 35-year-old
urban restaurant goers probably have perspectives that differ
enough to require multiple groups. If you feel that certain groups
of people would not feel comfortable with each other, then do
not put them together. Income, race, sex, class, age, job, and
computer experience can play roles in how people interact in a
group situation and how they react to a given user experience.


TABLE 46.10. Sample Audience Profile for Focus Groups
Participant Recruiting

Age: 20 to 55

Gender: Separate groups for men and women

Income: Household income over $70,000/year

Computer use: Computer at home or work

Internet use: Internet at home or work. One or more years’ experience.
Five to ten hours per week for personal use (shopping, reading news,
banking, etc.)

Mobile use: Own a mobile phone, used nonvoice mobile services
(played a game, SMS, etc.) one or more times in previous six months

Behavior: Were in a noninjury auto accident in the previous 9-12
months, as driver
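
A profile such as Table 46.10 is typically applied as a screener during recruiting. As a hedged illustration, the criteria could be written as a single check, assuming recruiting answers have been collected into a record. The `Candidate` fields and the `qualifies` function are hypothetical names, and reading "five to ten hours per week" as an inclusive range is an interpretation made for this sketch. Gender is left out of the check because the profile uses it to assign people to separate groups, not to exclude them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record of one person's screener answers."""
    age: int
    household_income: int  # US dollars per year
    has_computer: bool  # at home or work
    has_internet: bool  # at home or work
    years_online: float
    personal_web_hours_per_week: float
    owns_mobile: bool
    used_nonvoice_mobile_recently: bool  # within the previous six months
    # Months since a noninjury accident as driver; None if there was none.
    months_since_accident: Optional[float]


def qualifies(c: Candidate) -> bool:
    """Apply the Table 46.10 criteria (gender only affects group assignment)."""
    return (
        20 <= c.age <= 55
        and c.household_income > 70_000
        and c.has_computer
        and c.has_internet
        and c.years_online >= 1
        and 5 <= c.personal_web_hours_per_week <= 10
        and c.owns_mobile
        and c.used_nonvoice_mobile_recently
        and c.months_since_accident is not None
        and 9 <= c.months_since_accident <= 12
    )


# Example candidates; the values are invented for illustration.
ok = Candidate(34, 85_000, True, True, 6, 7, True, True, 10)
too_old = Candidate(60, 85_000, True, True, 6, 7, True, True, 10)
print(qualifies(ok), qualifies(too_old))
```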
Develop discussion topics. For an average focus group,
you  should  have  three  to  five  main  topics  to  investigate.  You 
should  phrase  topics  in  terms  of  the  project  as  a  whole.  “Un¬ 
derstanding  the  mental  model  people  use  when  researching  in¬ 
surance”  could  be  a  goal  for  an  insurance  brokerage  website, 
while  a  service  that  recommended  home  building  contractors 
could  be  interested  in  “Knowing  at  which  point  people  turn  to 
an  external  service  when  doing  home  repair.” 

Focus  these  objectives  enough  that  a  group  could  adequately 
discuss  each  one  in  about  10  minutes.  Do  not  phrase  them  as 
questions  or  issues  that  other  methods  (such  as  a  survey)  can 
better answer. A survey could produce “A list of our competitors”
better  than  focus  groups,  whereas  “The  factors  that  make  Sony’s 
camera  phone  experience  more  compelling  than  ours”  is  more 
appropriate. 

Write  a  discussion  guide.  The  discussion  guide  is  a 
script  for  the  moderator  to  follow.  It  creates  a  consistent  frame¬ 
work  for  the  focus  group  series  by  asking  the  same  questions 
in  the  same  order  with  much  the  same  context.  This  allows  a 
discussion  to  bring  out  the  subtleties  of  the  participants’  views 
without  shortchanging  any  of  the  topics. 

Focus  group  discussion  questions  should  be: 

•  Carefully  ordered.  Questions  get  the  participants  thinking 
about  certain  issues  and  remembering  certain  events.  A  care¬ 
ful  sequence  of  questions  takes  advantage  of  their  frame  of 
mind  to  make  the  flow  of  the  group  discussion  feel  more  nat¬ 
ural,  which  in  turn  helps  the  participants  to  maintain  a  cre¬ 
ative  flow  of  ideas  and  produce  better  insights.  In  general, 
questions  should  flow  from  the  most  general  to  the  most  spe¬ 
cific,  with  each  question  narrowing  the  discussion  a  bit. 
There  should  be  planned  transitions  between  topics  unless 
introducing  a  brand  new  topic. 

•  Nondirected.  Questions  should  not  imply  an  answer  or 
present  a  value  judgment.  They  should  allow  participants 
to  fill  in  their  own  thoughts  and  values.  For  example,  ask¬ 
ing,  “Which  do  you  think  is  a  better  search  service,  Google 
or Yahoo?” assumes that the participant feels there are
advantages of one over the other. Instead, frame questions
neutrally: “Are there any things you like about using the
Google  search  service?  Are  there  things  you  like  about 
Yahoo?  What  are  they?  Can  you  compare  them?  How  do 
they  compare?” 

•  Open-ended.  Avoid  constraining  answers  to  fixed  responses. 
Longer,  more  open  responses  tell  a  greater  part  of  the  story 
and  tend  to  be  less  ambiguous  than  shorter  responses. 
Rather  than  phrasing  a  question  in  the  form  “Which  of  these 
camera  functions  are  most  important  to  you?”  you  could  ask 
“Which functions do you use? How often?”

•  Focused  on  specifics.  Conversely,  encourage  participants  to 
be  specific  in  their  answers.  Krueger  (1988)  recommended 
breaking  down  why  questions  into  multiple  what  questions, 
explicitly  asking  for  the  influences  that  informed  their  deci¬ 
sion  and  the  attributes  of  their  decision.  For  example,  “How 
did  you  decide  to  go  shopping  for  a  new  phone  plan?”  and 
“What factors went into picking this carrier?” will provide
better insight than “Why did you pick Cingular?”


• Personal. Out of politeness, people tend to generalize
their experiences to the public at large or to some
hypothetical audience of which they are not part. Since you want
to know individual views, values, and experiences, emphasize
individual experiences. Formulate questions so that they
concentrate on people’s current and past behavior and opinions,
without  presenting  the  option  to  project.  Thus,  “If  you  had  to 
redo  your  kitchen  right  now,  which  of  these  features  would 
you  use  to  find  a  home  contractor?”  is  preferable  to  “Which  of 
these  features  do  you  think  are  useful?” 

Granted, fulfilling all of these criteria with every question is
often difficult (writing questions that are simultaneously specific
and open-ended is particularly tricky), but treat them as
guidelines to follow whenever possible.

Analyze  results.  There  are  as  many  ways  of  analyzing  fo¬ 
cus  group  information  as  there  are  analysts.  Since  the  informa¬ 
tion  is,  by  definition,  qualitative  and  contextual,  the  focus  of  the 
analysis  will  depend  on  the  purpose  of  the  group. 

One  method  consists  of  the  following  steps: 

•  Quickly  capture  initial  hypotheses.  Immediately  after  the 
end  of  the  focus  groups,  walk  through  the  discussion  guide 
section by section and ask the moderator and observers to
recall their thoughts about it: What was unexpected? What was
expected but did not happen? What attitudes did people
display? What values did they espouse? What interesting
statements did they make (and why were they interesting)? What
trends did they observe? Which participants provided
interesting feedback? What were the problems with the group?

•  Record  the  groups,  and  watch  the  recordings  to  verify  hy¬ 
potheses.  Merely  remembering  a  situation  can  miss  subtle  be¬ 
haviors.  Words  are  misquoted.  Observers  fall  into  groupthink. 
Reviewing  the  discussions  clarifies  ambiguities  and  reveals 
shades  of  meaning. 


MANAGE  WITH  USER  EXPERIENCE 


When introducing a new technology into a marketplace, there is
a  risk  of  failure  and  of  losing  money  and  time  on  the  investment 
in  developing,  marketing,  and  distributing  the  technology.  It  is 
of  course  possible  to  create  successful  technology  without  hav¬ 
ing  a  model  of  the  end  user’s  or  organization’s  needs  and  de¬ 
sires.  However,  such  successes  are  essentially  the  product  of 
lucky accidents. When such success happens, the organization
has  to  identify  the  elements  that  led  to  it.  By  this  point,  how¬ 
ever,  the  product’s  success  has  permanently  changed  the  mar¬ 
ket  and  the  organization,  and  identifying  what  made  it  success¬ 
ful  is  difficult.  This  supply-first  model  depends  on  predictable 
markets  and  is  passing.  In  the  current  environment,  “business 
[needs to be] an adaptive system for responding to
unanticipated requests in unpredictable environments” (Haeckel, 1999,
p. 10).

Working from the user experience is essentially a demand-first
philosophy, continually redefining product scope to reduce
the chances of failure and increase the chances of repeated
successes. In other words, in an adaptive organization, the
organization adapts to the user experience.

Organizations  make  technology  for  some  reason,  so  user  ex¬ 
perience  is  implicitly  included  in  all  technology  creation.  How¬ 
ever,  explicitly  basing  every  stage  of  technology  development 
on  user  experience  models  is  a  relatively  new  concept.  Though 
many organizations claim to be customer centered, in practice
few product management practices actually make the customer the
center of all of their activities. Most concentrate the examination of user
and  organizational  needs  at  the  beginning  of  a  project  (often 
called  the  “requirements  gathering”  phase)  or  at  the  end  (the 
“evaluation”  phase).  Those  are  not  the  only  options.  Projects 
starting  from  scratch,  where  the  technology  is  new  and  the  de¬ 
velopment  team  is  flexible,  can  use  agile  software  development 
methods  that  introduce  user-experience  knowledge  through¬ 
out  the  development  process.  However,  mature  products  with 
long-established  processes,  attitudes,  and  methods  often  make 
starting  from  scratch  impossible.  This  situation  requires  a  differ¬ 
ent  mix  of  techniques. 

Agile  User  Experience  Development 

Henry Ford called his 1908 car the “Model T” because there was
a Model S before it, and a Model R before that, all of the way
back to the first Model A in 1903 (the 1928 Model A was a
conscious rebranding to evoke a new philosophy of building cars).
In other words, Henry Ford failed 20 times over the course of
five years at making a successful passenger car. He iterated,
based on feedback, on the idea until he found the correct
combination of factors.

Iteration  based  on  feedback  is  the  core  philosophy  behind  a 
family  of  software  management  practices  called  “agile  software 
development.”  Agile  development  does  not  require  detailed  re¬ 
search  and  design  specifications  up  front  or  paper  trails  and 
sign-offs  throughout.  Instead,  agile  methods  focus  on  exten¬ 
sive  communication,  rapid  iteration,  and  continuously  collect¬ 
ing  information  and  adjusting  to  it,  rather  than  trying  to  plan  the 
entire  process. 

As  Highsmith  (2002)  described: 

Agility  isn’t  a  one-shot  deal  that  can  be  checked  off  the  organizational 
initiative  list.  Agility  is  a  way  of  life,  a  constantly  emerging  and  changing 
response to business turbulence. Critics may counter, “Agility is merely
waiting for bad things to happen, then responding. It is a fancy name for
lack of planning and ad hoc-ism.” But Agile organizations still plan; they
just understand the limits of planning. (p. 16)

Agile  methodologies,  which  include  Extreme  Programming 
(Beck  &  Andres,  1999),  Scrum  (Schwaber  &  Beedle,  2001),  and 
Crystal  Clear  (Cockburn,  2004),  do  not  explicitly  incorporate 
collecting  and  interpreting  user-experience  knowledge;  rather, 
they  continuously  adapt  to  all  new  information. 

Larman  (2003)  defined  a  set  of  core  agile  practices,  summa¬ 
rized  in  Table  46.11. 

TABLE 46.11. Agile Development Practices,
Adapted from Larman (2003)

Iterative development
Risk-driven and client-driven
Timeboxing
Adaptive development
Evolutionary requirements


Iterative development. Larman (2003, p. 9) said, “An
approach to building software (or anything) in which the overall
lifecycle is composed of several iterations in sequence. Each
iteration is a self-contained mini-project composed of activities
such as requirements analysis, design, programming, and test.”

Iterations  are  typically  from  one  to  four  weeks,  with  the 
goal  of  delivering  “a  stable,  integrated,  and  tested  partially 
complete  system”  (Larman,  2003,  p.  10)  with  every  iteration. 
In  other  words,  an  always-functioning  system  acquires  func¬ 
tionality,  in  contrast  to  a  collection  of  parts  assembled  at  the 
end.  The  user  experience  frames  the  scope  of  each  activity  and 
sets  priorities  between  them.  The  increments  of  functionality 
come  from  knowing  the  organization’s  goals  for  the  product, 
from  knowing  the  needs  and  values  of  the  end-user  audience, 
and  from  a  negotiated  balance  between  the  two. 

The elements that matter most to end users and that also satisfy
the long-term goals of the organization can be addressed first. For
example, one project’s first iteration focused on the interface
for letting users retrieve information, even though there was
no back-end database. That interface was the core to meeting
the user and business goals and had to be right.

Iteration  does  not  need  to  start  with  functionality.  Before 
programmers  write  any  code  or  interface  designers  create  screen 
designs,  initial  iterations  can  explore  the  audience’s  values  and 
reactions  with  lightweight  prototypes.  For  example,  industrial 
design  as  practiced  by  design  firm  IDEO  (Kelley  &  Littman, 
2001)  is  highly  iterative.  Key  interactions  (as  determined  by  re¬ 
search  into  end  user  and  company  goals)  are  prototyped,  eval¬ 
uated,  and  refined  repeatedly  before  engineering  technology 
begins.  At  the  Rhode  Island  School  of  Design,  “looks-like”  and 
“works-like”  prototypes  differentiate  the  user  experience  from 
technological  capabilities  (Cottam,  2004).  Returning  to  an  earlier 
example,  exploratory  iterations  on  the  children’s  art  product 
website  with  end-user  participation  could  have  revealed  that 
content  meant  for  educators  confused  parents  and  children,  or 
they  could  have  revealed  that  educator  content  gave  the  web¬ 
site  added  authority. 

Risk-driven  and  client-driven.  Larman  (2003,  p.  12) 
said,  “Risk-driven  iterative  development  chooses  the  riskiest, 
most  difficult  elements  for  the  early  iterations  [and]  the  choice 
of  features  for  the  next  iteration  comes  from  the  client — what¬ 
ever  they  perceive  as  the  highest  business  value  to  them.” 

Treating  both  the  end  user  experience  of  products  and  the 
user  experience  of  organizations  as  parts  of  the  same  idea  helps 
select  among  potential  technological  solutions.  For  example,  a 
company  making  software  for  transportation  logistics  spent  a 
calendar year, many developer years, and millions of dollars
developing a complex feature that allows the system to send a
signal when cargo enters or leaves a certain geographic area. It
sounds  like  a  good  idea,  but  it  took  much  longer  than  initially 
estimated and several years after its launch, customers had not
broadly  adopted  it.  User  research  with  prototypes  would  have 
revealed  that  the  technology  did  not  fit  work  practices  and  that 
the  business  relationships  did  not  support  the  information  in 
the form provided. In other words, it does not solve a
problem that people feel they have, and their business systems
(including their business software) cannot use the information.
Although organizational desire was high, for customers the risk
of not doing it was low; thus, the choice was neither risk-driven
nor client-driven.

Timeboxing. Agile methodologies depend on being able
to  query  a  customer  who  interprets  the  user  experience  for  the 
developers  and  makes  priority  trade-offs  (for  example,  when 
something  turns  out  to  be  harder  than  previously  imagined). 
“Fixing  the  iteration  end  date  and  not  allowing  it  to  change” 
(Larman,  2003,  p.  13)  allows  scheduled  user  research  and  or¬ 
ganizational  priority  review.  A  regular  research  and  release  plan 
allows for much easier integration of the results of user
research into the development process. Such timeboxing allows the
customer  to  plan  for  user  research  so  that  the  results  of  the  re¬ 
search  become  available  when  questions  arise.  For  example, 
an  Internet  search  engine  with  one-week  iterations  had  a  three- 
week  research  cycle.  The  research  answered  user  experience 
questions  posed  by  the  developers,  and  upcoming  features 
were  prototyped  and  tested  before  expending  any  program¬ 
ming  resources. 

Adaptive  development  and  evolutionary  require¬ 
ments.  All  of  these  practices  boil  down  to  two  key  concepts: 
development  practices  dynamically  adapt  to  new  information 
and  requirements  for  the  product  change  as  knowledge  about 
the  user  experience  increases. 


Introducing  User  Experience  Into  an  Existing  Process 

We have to return to our entrepreneurial roots. (Highsmith, 2002)

There had been a growing sense among the Directors of the Product
Management Group that there was a diminishing atmosphere of
innovation within the group. (Fraser, 2005)

[To launch a brand new product] every mindset, timeline, and
assumption had to be challenged. (Pellican & Homier, 2005)

Organizations  regularly  have  crises  where  their  leadership 
feels  they  have  lost  the  ability  to  innovate.  Rediscovering  inno¬ 
vation  is  not  impossible,  merely  difficult.  Understanding  the 
user  experience  is  returning  to  an  organization’s  entrepreneur¬ 
ial,  innovative  roots.  Once,  the  organization  had  insight  into 
the  end-user  needs  and  was  able  to  balance  those  with  its  needs 
to become successful. It lost the ability to be innovative when
it  lost  that  perspective. 

No  existing  organization  can  change  all  its  practices  over¬ 
night.  People  need  to  be  convinced  at  both  the  organizational 
and individual levels that there is value in change. On a new
project, it is easier to introduce new ways of creating, but everyone
hates change when there is momentum. Forcing people to
change  in  that  situation  almost  never  works,  but  introducing 


practices  that  make  people’s  lives  better  and  move  the  overall 
practice  in  the  right  direction  sometimes  works  (sadly,  there  are 
no  guarantees). 

Development  practices  that  expose  the  organization  to  user 
experience  ideas  lay  the  foundation  for  a  gradual  shift  in  per¬ 
spective.  The  following  practices  are  not  in  any  order.  They  are 
ideas  that  seem  to  help  organizations  change  from  the  inside  out. 

Get  a  senior  manager  champion.  The  preconditions 
to  a  successful  change  are  the  recognition  of  a  need  to  change 
and  the  authority  to  make  changes.  Those  with  ultimate  re¬ 
sponsibility  for  the  success  of  a  product  need  to  know  that  user- 
experience  research  is  a  key  business  process.  Embracing  the 
idea  that  the  members  of  an  organization  are  not  representative 
of  its  audience  has  to  start  from  the  top,  with  the  recognition 
by  someone  in  a  high-level  position  that  they  cannot  manage 
by merely extrapolating from their experience. Making senior
managers watch end-user research is an effective and dramatic
exercise  that  demonstrates  the  difference  between  the  percep¬ 
tion  of  a  product  inside  and  outside  an  organization.  Watching 
someone  struggle  with  a  flagship  application  wins  people  over 
to  the  idea  that  maybe  they  do  not  know  everything  about  how 
their  end  users  view  the  world  and  the  product.  However,  even 
without  this,  enlightened  managers  realize  there  is  value  (both 
for  the  organization  as  a  whole  and  for  their  careers)  in  re¬ 
searching  and  codifying  the  user  experience.  Such  managers 
make  excellent  champions  within  the  organization.  They  are  ex¬ 
perts  in  the  organization’s  needs  and  can  serve  as  guides  and 
translators,  communicating  issues  to  other  managers,  framing 
ways to justify a practice that has no immediate return on
investment,  providing  a  voice  of  authority,  and  making  re¬ 
sources  available  for  the  pursuit  of  such  research.  Few  projects 
document  their  reliance  on  this  relationship,  but  it  is  critical  in 
nearly  every  successful  organizational  change.  Hollinger  (2005) 
said,  “We  requested  management  buy-in  early  on  to  be  able  to 
treat  user  experience  defects  the  same  as  any  other  product  de¬ 
fects.  .  .  .  This  explicit  support  from  a  senior  manager  on  the 
project  was  critical  to  our  success  in  this  area.” 

Work  within  existing  processes.  It  is  easy  to  dismiss 
new  ideas  as  unworkable  within  the  structure  of  an  existing 
practice.  Contextualize  new  practices  in  terms  of  the  existing 
development process. For example, one UI design team first
introduced a new practice (the keeping of user-experience
scorecards), but when the practice’s value became clear, the
traditional process owners took over:

Although  the  design  team  produced  the  scorecards,  the  release  man¬ 
agement  team  eventually  took  over  “enforcement”,  pushing  teams  to 
turn  their  yellow  and  red  bubbles  to  green.  This  “mainstreaming”  of 
the reviews resulted in significantly more user experience bugs being
fixed.  (Hollinger,  2005,  p.  9) 

This  allowed  the  team  to  gradually  introduce  the  practice  while 
simultaneously  showing  its  value.  As  the  user  experience  spans 
technical  and  organizational  practices,  integrating  it  into  both  of 
those  worlds  produces  the  broadest  effects.  It  can  start  by  involv¬ 
ing  representatives  from  business  units  in  the  HCI  research 
process. Leadley, Pao, and Douglas (2005, p. 6) said, “You have to
allot  time  and  budget  for  the  Business  Analysts  to  attend  user  in¬ 
terviews  and  usability  testing.” 

Later,  it  can  progress  to  a  more  integrated  approach: 

Getting  the  entire  marketing,  engineering,  and  quality  assurance  team 
to  watch  customer-interview  tapes  and  call  the  customers  themselves 
was  a  wonderful  achievement.  All  teams  now  require  this  to  be  part  of 
a  new  product  process.  (Pellican  &  Homier,  2005,  p.  13) 

Another  way  is  to  use  familiar  tools  to  represent  user  expe¬ 
rience  ideas.  For  example,  Hollinger  (2005)  treated  unmet  user 
experience  needs  as  “bugs”  and  used  the  internal  bug-tracking 
system  to  keep  track  of  them. 
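
As a rough illustration of how unmet user experience needs might be filed as ordinary defects and rolled up into a scorecard, the following Python sketch may help. It is not Hollinger's actual system: the record fields, the two severity levels, and the rule that maps open bugs to a red, yellow, or green bubble are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Hypothetical two-level severity scale for UX bugs."""
    MINOR = 1
    MAJOR = 2


@dataclass
class UXBug:
    """An unmet user-experience need, filed like any other defect."""
    bug_id: int
    feature: str
    summary: str
    severity: Severity
    fixed: bool = False


def scorecard_color(bugs, feature):
    """Roll a feature's open UX bugs up into a red/yellow/green bubble.

    Assumed rule: any open major bug is red, any other open bug is
    yellow, and no open bugs is green.
    """
    open_bugs = [b for b in bugs if b.feature == feature and not b.fixed]
    if any(b.severity is Severity.MAJOR for b in open_bugs):
        return "red"
    if open_bugs:
        return "yellow"
    return "green"


bugs = [
    UXBug(1, "checkout", "Error message does not say how to recover",
          Severity.MAJOR),
    UXBug(2, "checkout", "Inconsistent button labels", Severity.MINOR),
    UXBug(3, "search", "No feedback while results load", Severity.MINOR,
          fixed=True),
]

for feature in ("checkout", "search"):
    print(feature, scorecard_color(bugs, feature))
```

Because the records live in the same kind of structure as functional defects, the release process can track them with the tools and reports it already uses.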

Make small, but well-publicized changes. Persistent
internal marketing is crucial to wide-scale adoption of user
experience ideas. Beginning with small projects, such as usability
tests of existing projects, every report, presentation, and
discussion can highlight insights gleaned from user experience
research and analysis, linked to organizational goals. For example,
one group chose to share their findings with the executive
staff. Hollinger (2005) said, “The scorecards were presented to
vice presidents in the development organization at release
status meetings, adding legitimacy to user experience being an
integral part of release quality decisions” (p. 9).

Internal  marketing  should  be  an  integral  part  of  a  plan  for 
changing  a  development  culture.  In  a  plan  developed  by  Leadley 
et  al.  (2005;  see  Table  46.12,  p.  4),  most  of  the  effort  is  devoted 
to  marketing  within  the  company  using  a  number  of  methods. 

Such  an  extended  effort  takes  patience,  persistence,  and  re¬ 
sources,  as  acknowledged  by  Leadley  et  al.  (2005,  p.  5):  “The 
‘sales’  effort  by  way  of  ‘dog  and  pony  shows’  absorbed  as  much 
of  our  time  as  defining  standards,  building  elements  and  de¬ 
signing  our  website!” 

However,  internal  marketing  is  inexpensive  relative  to  failed 
products,  and  the  potential  benefits  of  these  methods  can  be 
justified  by  comparing  to  one  failed  product  launch,  six  months 
of  delayed  adoption,  or  another  appropriate  metric. 

Make  developers’  lives  easier  with  user  experience. 

Technology  development  always  happens  under  severe  time 
constraints,  and  time-pressured  developers  resist  additional 
work.  It  is  one  thing  to  communicate  that  user-experience  re¬ 
search  and  analysis  increases  chances  of  long-term  success,  but 
demonstrating  how  it  reduces  work  is  even  better. 

One  effective  way  to  win  over  developers  is  to  give  them  more 
freedom  and  reduce  the  amount  of  paperwork  in  their  lives.  Re¬ 
placing  documents  with  tools  that  embody  good  practices  in 
code  means  that  developers  do  not  feel  pressured  to  memorize 
complex  standards  or  reinvent  techniques.  Apple  Computer 
(n.d.)  successfully  enables  developers  to  create  consistent  inter¬ 
faces  by  backing  up  rules  with  a  toolbox  of  interface  elements  and 
development  tools  that  make  it  easier  to  follow  the  rules  than  to 
break them. Similarly, PBS created a set of “widgets” (Public
Broadcasting System, 2004) that make it easier for member stations
to conform to a uniform organizational standard. Leadley
et al. (2005) and Kuniavsky and Raghavan (2005) created templates
that included the code to generate interaction elements
that conformed to end-user and organizational goals. “A UI Library
with 9 templates and 40 elements, a high-level methodology,
and a guide for how to use our system. . . . For both templates
and elements we provided the HTML code and an
HTML-rendered version of the item” (Leadley et al., 2005, p. 8).


TABLE 46.12. Timeline of Allianz User-Centered
Practice Introduction (Leadley et al., 2005)

Year   Process                                  Number of Projects Involved

2001   Personas                                 4
       Dog & pony shows

2002   Usability testing                        10
       UX Training
       1st & 3rd Thursdays
       Dog & pony shows

2003   Usability testing                        15
       ISP Benchmarking
       Case studies
       Prototyping
       UX Training
       ISP Training
       1st & 3rd Thursdays

2004   Contextual inquiry                       17
       Usability testing
       UX Training
       UX Thursdays
       SIG Presentations

2005   Contributions to Software                19
         Development Lifecycle
       Governance Standards
       Accountability of Enterprise usability
       Brand standards
       UX Thursdays


CONCLUSION 


When an organization creates technology, it embodies in a product
its idea for a solution to an end-user problem, with the goal
that this will ultimately help the organization itself. HCI is how
the end user interacts with the product, but this symbiotic relationship
between the end users and the organization lies at
the core of how that interaction is structured. Understanding
the user experience, therefore, is the process of understanding
the end-user needs and the organization needs with the goal of
maximizing the benefit to both.

This  is  true  regardless  of  whether  the  product  is  destined 
for  a  broad  consumer  market  or  an  internal  tool.  Unfortunately, 
most  methods  still  treat  the  interaction  of  humans  with  com¬ 
puters  and  the  interaction  of  the  product  with  the  organization 
as  different.  In  fact,  developing  the  user  experience  is  the  whole 
of  technology  creation.  Emotional,  social,  and  organizational 
needs  make  up  the  fabric  in  which  HCI  exists.  Without  them,
there  would  be  no  computers  and  no  reason  for  humans  to 
interact  with  them. 




References 


Agarwal, R., & Karahanna, E. (2000). Time flies when you are having fun:
Cognitive absorption and beliefs about information technology
usage. MIS Quarterly, 24(4), 665-694.

Andelman, D. A. (2005, October 5). A better way to search? Retrieved
April 5, 2001, from Forbes.com

Apple Computer. (n.d.). Interface builder documentation. Retrieved
December 1, 2004, from http://developer.apple.com/tools/interface-
builder.html

Arnall, T. (2001). Mobile interaction design case study. Retrieved
October 30, 2005, from
http://www.elasticspace.com/2001/06/mobile-interaction-design-case-study

Beck, K., & Andres, C. (2004). Extreme programming explained:
Embrace change (2nd ed.). Reading, MA: Addison-Wesley.

Bellotti, V. (1999). Personal communication.

Bentley, R., Hughes, J. A., Randall, D., Rodden, T., Sawyer, P., Shapiro, D.,
et al. (1992, November). Ethnographically informed systems design
for air traffic control. Proceedings of the Conference on Computer
Supported Cooperative Work, Toronto, Canada, 123-129.

Berg, S., Taylor, A. S., & Harper, R. (2003, April). Mobile phones for the
next generation: Device designs for teenagers. Proceedings of the
Conference on Human Factors and Computing Systems CHI 2003,
Fort Lauderdale, FL, pp. 433-440.

Beyer, H., & Holtzblatt, K. (1998). Contextual design: Defining
customer-centered systems. San Francisco: Morgan Kaufmann.

Buckman, R. (2002, June 14). Microsoft’s cable-TV miscues turned into
a costly lesson. Wall Street Journal.

Cockburn, A. (2004). Crystal clear: A human-powered methodology for
small teams. Reading, MA: Addison-Wesley.

Cottam, M. (2004, April). Reuse for new uses. Presentation at ICSID
Forum, CHI 2004, Vienna, Austria.

Davis, F. (1989). Perceived usefulness, perceived ease of use, and user
acceptance of information technology. MIS Quarterly, 13(3), 319-340.

Elliot, K., Neustaedter, C., & Greenberg, S. (2005). Time, ownership and
awareness: The value of contextual locations in the home.
Proceedings of the Ubicomp 2005 conference, Tokyo, Japan, pp. 251-268.

European  Information  Technology  Observatory.  (2002).  Yearbook  (10th 
ed.).  Frankfurt:  Author. 

Fraser, J. (2005, November). Inspired innovation: How Corel is drawing
upon employees’ ideas for user focused innovation. Proceedings of
DUX05: Conference on Designing for User experience, San Francisco,
CA, Electronic proceedings, ISBN 1-59593-250-X.

Garrett, J. J. (2000). Elements of user experience. Retrieved October
10, 2005, from http://www.jjg.net/elements/

Garrett, J. J. (2002). Elements of user experience. Indianapolis, IN: New
Riders Press.

Haeckel, S. (1999). Adaptive enterprise: Creating and leading sense-and-
respond organizations. Boston, MA: Harvard Business School Press.

Hassenzahl, M. (2003). The Thing and I: Understanding the relationship
between user and product. In M. A. Blythe, A. F. Monk, K. Overbeeke,
& P. C. Wright (Eds.), Funology: From usability to enjoyment.
Dordrecht, the Netherlands: Kluwer Academic.

Highsmith, J. (2002). Agile software development ecosystems. Reading,
MA: Addison-Wesley.

Hollinger, M. (2005, November). A process for incorporating heuristic
evaluation into a software release. Proceedings of DUX05: Conference
on Designing for User experience, San Francisco, CA,
Electronic proceedings, ISBN 1-59593-250-X.

Kelley, T., & Littman, J. (2001). The art of innovation: Lessons in creativity
from IDEO, America’s leading design firm. New York: Doubleday.

Krueger, R. A. (1988). Focus groups: A practical guide for applied
research. Newbury Park, CA: Sage Publications.


Kuniavsky,  M.  (2003).  Observing  the  user  experience:  A  practitioner’s 
guide  to  user  research.  San  Francisco,  CA:  Morgan  Kaufmann. 

Kuniavsky, M., & Raghavan, S. (2005, November). Guidelines are a tool:
Building a design knowledge management system for programmers.
Proceedings of DUX05: Conference on Designing for User experience,
San Francisco, CA, Electronic proceedings, ISBN 1-59593-250-X.

Larman, C. (2003). Agile and iterative development: A manager’s
guide. Reading, MA: Addison-Wesley.

Leadley, B., Pao, H., & Douglas, S. (2005, November). Creating a user
experience culture at a nonsoftware company. Proceedings of
DUX05: Conference on Designing for User experience, San Francisco,
CA, Electronic proceedings, ISBN 1-59593-250-X.

Madison,  D.  S.  (2005).  Critical  ethnography:  Method,  ethics,  and  per¬ 
formance.  Thousand  Oaks,  CA:  Sage  Publications. 

Maslow,  A.  H.  (1943).  A  theory  of  human  motivation.  Psychological 
Review,  50,  370-396. 

Millen, D. (2000, August). Rapid ethnography: Time deepening strategies
for HCI field research. Proceedings of DIS 2000: Designing
Interactive Systems, New York, NY, pp. 280-286.

Nass,  C.,  &  Reeves,  B.  (1996).  The  media  equation:  How  people  treat 
computers,  televisions,  and  new  media  as  real  people  and  places. 
Cambridge,  MA:  Cambridge  University  Press. 

Norman,  D.  A.  (1998).  The  invisible  computer.  Cambridge,  MA:  MIT 
Press. 

Ortony, A., Norman, D. A., & Revelle, W. (2005). Affect and proto-affect in
effective functioning. In J. M. Fellous & M. A. Arbib (Eds.), Who needs
emotions? The brain meets the machine (pp. 173-202). New York:
Oxford University Press.

Pellican, S., & Homier, M. (2005, November). Customer driven innovation:
Quicken® rental property manager. Proceedings of DUX05:
Conference on Designing for User experience, San Francisco, CA,
Electronic proceedings, ISBN 1-59593-250-X.

Pine, B. J., & Gilmore, J. H. (1999). The experience economy. Boston,
MA: Harvard Business School Press.

Postrel, V. (2003). The substance of style: How the rise of aesthetic value
is remaking commerce, culture and consciousness. New York:
HarperCollins.

Public Broadcasting System. (2004). Best practices for PBS member
stations. Retrieved December 1, 2005, from
http://www.pbs.org/remotecontrol/bestpractices/

Rafaeli, A., & Vilnai-Yavetz, I. (2004). Emotion as a connection of physical
artifacts and organizations. Organization Science, 15(6).

Saffer, D. (2002, June 3). Building brand into structure. Boxes and
Arrows. Retrieved October 28, 2005, from
http://www.boxesandarrows.com/archives/building_brand_into_structure.php

Sawhney, M. (2003, July 1). Fundamentals of Value. CIO Magazine.

Schmitt, B. H. (2003). Customer experience management: A revolutionary
approach to connecting with your customers. San Francisco,
CA: John Wiley & Sons.

Schwaber, K., & Beedle, M. (2001). Agile software development with
SCRUM. Upper Saddle River, NJ: Prentice Hall.

Spool, J. (2005, August 10). Re: [Sigia-L] Length of nav Labels. Message
posted to Sigia-L electronic mailing list, archived at
http://www.info-arch.org/lists/sigia-l/0508/0157.html

TiVo. (2005, August 24). Press release.

U.S. General Accounting Office. (1996). Content analysis: A methodology
for structuring and analyzing written material (GAO/PEMD-
10.3.1). Washington, DC: Author.

Vredenburg,  R.,  Isensee,  S.,  &  Righi,  C.  (2001).  User-centered  design:  An 
integrated  approach.  Upper  Saddle  River,  NJ:  Prentice  Hall. 




Wikipedia. (2005a). Ford Model T. Retrieved October 10, 2005, from
http://en.wikipedia.org/wiki/Ford_Model-T

Wikipedia. (2005b). Ford Model A. Retrieved October 10, 2005, from
http://en.wikipedia.org/wiki/Ford_Model_A

Wikipedia. (2005c). Customer relationship management. Retrieved
October 29, 2005, from
http://en.wikipedia.org/wiki/Customer_relationship_management

Wikipedia. (2005d). Customer experience management. Retrieved
October 29, 2005, from
http://en.wikipedia.org/wiki/Customer_experience_management

Wikipedia. (2005e). Out of box experience. Retrieved October 30, 2005,
from http://en.wikipedia.org/wiki/Out-of-box_experience.

Wixon, D. R., Ramey, J., Holtzblatt, K., Beyer, H., Hackos, J., Rosenbaum,
S., et al. (2002, April). Usability in practice: Field methods, evolution
and revolution. Proceedings CHI 2002: Proceedings of the Conference
on Human Factors and Computing Systems, Minneapolis,
MN, pp. 880-884.

Zhang, P., & Li, N. (2005). The importance of affective quality.
Communications of the ACM, 48(9), 105-108.


REQUIREMENTS  SPECIFICATIONS  WITHIN 
THE  USABILITY  ENGINEERING  LIFECYCLE 

Deborah  J.  Mayhew,  PhD 

Deborah  J.  Mayhew  &  Associates


Introducing Requirements Specifications
    Phase 1: Requirements Analysis
        User profile
        Contextual task analysis
        Usability goal setting
        Platform capabilities/constraints
        General design guidelines
    Phase 2: Design/Testing/Development
        Level 1 Design
            Work reengineering
            Conceptual model design
            Conceptual model mockups
            Iterative conceptual model evaluation
        Level 2 Design
            Screen design standards
            Screen design standards prototyping
            Iterative screen design standards evaluation
            Style guide development
        Level 3 Design
            Detailed user interface design
            Iterative detailed user interface design evaluation
    Phase 3: Installation
        User feedback
Focusing on Requirements Analysis
    The user profile
    Contextual task analysis
Motivating and Justifying Requirements Specification
    User Requirements
    Work Environment Requirements
    Task Requirements
References




INTRODUCING  REQUIREMENTS 
SPECIFICATIONS 


Three  key  ingredients  are  necessary  to  ensure  that  usability  is 
achieved  during  product  development: 

•  Application  of  established  design  principles  and  guidelines; 

•  A  structured  methodology  for  design; 

•  Managerial  and  organizational  techniques. 

At  this  point  in  the  history  of  the  field  of  human-computer 
interaction,  there  are  well-established  software  design  princi¬ 
ples  and  guidelines  available,  based  on  objective  research  and 
reported  in  the  literature.  Many  of  these  principles  and  guide¬ 
lines  are  enumerated  throughout  different  chapters  in  this 
book. Development organizations need staff who are
fluent in these design guidelines to participate in design efforts, so
that this general accumulated knowledge will find its way into their
products.

However,  just  having  a  design  guru  on  board  does  not  guar¬ 
antee  that  design  principles  and  guidelines  will  find  their  way 
into  products.  Design  is  complex,  and  there  simply  is  no  cook¬ 
book  approach  to  design  that  can  rely  on  general  principles  and 
guidelines  alone.  Development  organizations  also  need  struc¬ 
tured  methods  for  achieving  usability  in  their  products. 

Similarly, a well-structured and documented design methodology
must be introduced and managed; it does not happen by
itself. Thus, managerial and organizational techniques must be
applied to ensure that the design methodology is followed and
includes the application of well-established design principles.

Even  when  good  management  practices  are  being  applied, 
either  of  the  remaining  two  ingredients  alone — design  guide¬ 
lines  or  design  methods — is  necessary,  but  not  sufficient.  Opti¬ 
mal  design  cannot  be  accomplished  by  the  systematic  applica¬ 
tion  of  generic  guidelines  alone,  because  every  product  and  its 
intended  set  of  users  are  unique.  Design  guidelines  must  be  tai¬ 
lored  for  and  validated  against  product-unique  requirements, 
and  this  is  what  the  structured  methods  accomplish. 

Conversely,  applying  structured  methods  without  also  draw¬ 
ing  on  well-established  design  principles  and  guidelines  is  inef¬ 
ficient  at  best  and  may  simply  fail  at  worst.  Without  the  benefit 
of  the  initial  guidance  of  sound  design  principles  during  first 
passes  at  design,  a  particular  project  with  its  limited  resources 
may  simply  never  stumble  upon  a  design  approach  that  works. 
For  example,  formal  usability  testing  is  a  valuable  and  objective 
method  for  uncovering  usability  problems.  However,  without  a 
clear  understanding  of  basic  design  principles  and  guidelines,  as 
well  as  unique  requirements  data,  solving  those  problems  after 
they  have  been  identified  will  not  be  easy  or  likely. 

Previous  sections  and  chapters  in  this  book  refer  to  a  broad 
variety  of  design  principles  and  guidelines.  This  section  (section 
A  in  Part  IV)  addresses  the  need  for  methodology  and  provides 
chapters  that  address  a  variety  of  techniques  that  can  be  applied 
during  the  development  process  to  achieve  usability  in  product 
design.  This  chapter  sets  the  stage  for  this  section  and,  in  par¬ 
ticular,  for  the  subsection  on  one  stage  of  the  development 
process:  requirements  specification. 


The  Usability  Engineering  Lifecycle  (Mayhew,  1999)  docu¬ 
mented  a  structured  and  systematic  approach  to  addressing  us¬ 
ability  within  the  product-development  process.  It  consists  of  a 
set  of  usability  engineering  tasks  applied  in  a  particular  order 
at  specified  points  in  an  overall  product-development  lifecycle. 

Several  types  of  tasks  are  included  in  the  Usability  Engineer¬ 
ing  Lifecycle: 

•  Structured  usability-requirements  analysis  tasks; 

•  An  explicit  usability  goal-setting  task,  driven  directly  from 
requirements-analysis  data; 

•  Tasks  supporting  a  structured,  top-down  approach  to  user 
interface  design  that  is  driven  directly  from  usability  goals 
and  other  requirements  data; 

•  Objective  usability  evaluation  tasks  for  iterating  design  to¬ 
ward  usability  goals. 

Figure 47.1 represents the Usability Engineering Lifecycle in summary
visual form. The overall lifecycle is cast in three phases:
Requirements Analysis, Design/Testing/Development, and Installation.
Specific usability-engineering tasks within each phase are
presented in boxes, and arrows show the basic order in which
tasks should be carried out. Much of the sequencing of tasks is
iterative, and the specific places where iterations would most typically
occur are illustrated by arrows returning to earlier points in
the lifecycle. Brief descriptions of each lifecycle task follow.

Phase  1:  Requirements  Analysis

User  profile.  A  description  of  the  specific  user  character¬ 
istics  relevant  to  user  interface  design  (e.g.,  computer  literacy, 
expected  frequency  of  use,  level  of  job  experience)  is  obtained 
for  the  intended  user  population.  This  will  drive  tailored  user  in¬ 
terface  design  decisions  and  identify  major  user  categories  for 
study  in  the  Contextual  Task  Analysis  task.  This  volume  includes 
a  set  of  chapters  (Part  V)  dedicated  to  exploring  the  unique 
needs  and  requirements  of  different  types  of  user  populations. 

Contextual task analysis. A study of users’ current
tasks, workflow patterns, and conceptual frameworks is made,
resulting in a description of current tasks and workflow, and an
understanding and specification of underlying user goals. These
will be used to set usability goals and drive Work Reengineering
and user interface design. This section of this volume includes
several chapters on topics related to task analysis (Chapters 48,
49, 50).

Usability  goal  setting.  Specific  qualitative  goals  reflect¬ 
ing  usability  requirements  are  developed,  extracted  from  the 
User  Profile  and  Contextual  Task  Analysis.  In  addition,  quantita¬ 
tive  goals  (based  on  a  subset  of  high-priority  qualitative  goals) 
are  developed,  defining  minimal  acceptable  user  performance 
and  satisfaction  criteria.  These  usability  goals  focus  later  design 
efforts  and  form  the  basis  for  later  iterative  usability  evaluation. 
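
One way to picture how quantitative usability goals can serve as the basis for iterative evaluation is the following minimal Python sketch. It is only an illustration under stated assumptions: the goal names, the threshold values, and the measurements from a hypothetical round of usability testing are invented for this example and do not come from the Usability Engineering Lifecycle itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantitativeGoal:
    """A minimal acceptable performance or satisfaction criterion."""
    name: str
    minimum_acceptable: float
    higher_is_better: bool = True  # False for measures such as task time

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.minimum_acceptable
        return measured <= self.minimum_acceptable


def unmet_goals(goals, measurements):
    """Return the names of goals that the measured results do not satisfy.

    A goal with no measurement yet is reported as unmet.
    """
    unmet = []
    for goal in goals:
        measured = measurements.get(goal.name)
        if measured is None or not goal.is_met(measured):
            unmet.append(goal.name)
    return unmet


# Hypothetical goals derived from high-priority qualitative goals.
goals = [
    QuantitativeGoal("task_completion_rate", 0.90),
    QuantitativeGoal("mean_time_on_task_s", 120.0, higher_is_better=False),
    QuantitativeGoal("mean_satisfaction_1_to_5", 4.0),
]

# Hypothetical results from one round of usability testing.
round_1 = {
    "task_completion_rate": 0.93,
    "mean_time_on_task_s": 140.0,
    "mean_satisfaction_1_to_5": 4.2,
}

print(unmet_goals(goals, round_1))
```

In this sketch, a design iteration would continue while the list of unmet goals is non-empty, which mirrors the idea that the goals both focus design effort and define when evaluation can stop.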

Platform capabilities/constraints. The user interface
capabilities and constraints (e.g., windowing, direct manipulation,
screen size, color, etc.) inherent in the technology platform
chosen for the product (e.g., Apple Macintosh, MS Windows,
product-unique platforms) are determined and documented.
These will define the scope of possibilities for user interface
design. This volume includes chapters which describe
platform capabilities that support many relatively new and
innovative types of user interfaces, including multimedia (chapter
20), adaptive interfaces and agents (chapter 22), mobile devices
and ubiquitous computing (chapter 23), tangible interfaces
(chapter 24), and wearable computers (chapter 15).

FIGURE 47.1. The Usability Engineering Life Cycle. (Illustration taken from Mayhew, 1999.)

General design guidelines. Relevant general user
interface design guidelines available in the usability engineering
literature are gathered and reviewed. They will be applied during
the design process to come, along with all other project-
specific information gathered in the previous tasks.

Phase  2:  Design/Testing/Development 
Level  1  Design 

Work reengineering. Based on all requirements-analysis
data and the usability goals extracted from it, user tasks are
redesigned at the level of organization and workflow to streamline
work and exploit the capabilities of automation. No user
interface design is involved in this task, just abstract organization of
functionality and workflow design. This task is sometimes
referred to as “Information Architecture.”

Conceptual  model  design.  Based  on  all  the  previous 
tasks,  initial  high-level  design  alternatives  are  generated.  At  this 
level,  navigational  pathways  and  major  displays  are  identified, 
and  rules  for  the  consistent  presentation  of  work  products, 
processes,  and  actions  are  established.  Screen  design  detail  is 
not  addressed  at  this  design  level. 

Conceptual model mockups. Paper-and-pencil or prototype
mockups of high-level design ideas generated in the previous
task are prepared, representing ideas about high-level functional
organization and Conceptual Model Design (see Beaudouin-Lafon
and Mackay in this Handbook for a discussion of prototyping
tools and techniques). Detailed screen design and complete
functional design are not in focus here.

Iterative conceptual model evaluation. The mockups
are evaluated and modified through iterative evaluation
techniques such as formal usability testing, in which real,
representative end users attempt to perform real, representative
tasks with minimal training and intervention, imagining that the
mockups are a real product user interface (see section C, Part VI
in this volume for chapters on testing and evaluation). This and
the previous two tasks are conducted in iterative cycles until all
major usability “bugs” are identified and engineered out of Level 1
(e.g., Conceptual Model) design. Once a Conceptual Model is
relatively stable, system architecture design can commence.

Level  2  Design 

Screen  design  standards.  A  set  of  product-specific 
standards  and  conventions  for  all  aspects  of  detailed  screen  de¬ 
sign  is  developed,  based  on  any  industry  and/or  corporate  stan¬ 
dards  that  have  been  mandated  (e.g.,  Microsoft  Windows,  Apple 
Macintosh,  etc.),  the  data  generated  in  the  Requirements  Analy¬ 
sis  phase,  and  the  product-unique  Conceptual  Model  Design  ar¬ 
rived  at  during  Level  1  Design. 

Screen  design  standards  prototyping.  The  Screen 
Design  Standards  (as  well  as  the  Conceptual  Model  Design)  are 
applied  to  design  the  detailed  user  interface  to  selected  sub¬ 
sets  of  product  functionality.  This  design  is  implemented  as  a 
running  prototype. 

Iterative screen design standards evaluation. An
evaluation technique, such as formal usability testing, is carried
out on the Screen Design Standards prototype, and then
redesign/reevaluation iterations are performed to refine and validate
a robust set of Screen Design Standards. Iterations are continued
until all major usability bugs are eliminated and usability
goals seem within reach.

Style guide development. At the end of the design/evaluate
iterations in Design Levels 1 and 2, you have a validated and
stabilized Conceptual Model Design, and a validated and stabilized
set of standards and conventions for all aspects of detailed
Screen Design. These are captured in the document called the
product Style Guide, which already documents the results of
requirements-analysis tasks. During Detailed User Interface Design,
following the Conceptual Model Design and Screen Design
Standards in the product Style Guide will ensure quality, coherence,
and consistency, the foundations of usability.

Level  3  Design 

Detailed  user  interface  design.  Detailed  design  of  the 
complete  product  user  interface  is  carried  out  based  on  the  re¬ 
fined  and  validated  Conceptual  Model  and  Screen  Design  Stan¬ 
dards  documented  in  the  product  Style  Guide.  This  design  then 
drives  product  development. 

Iterative  detailed  user  interface  design  evaluation. 

A  technique  such  as  formal  usability  testing  is  continued  dur¬ 
ing  product  development  to  expand  evaluation  to  previously 
unassessed  subsets  of  functionality  and  categories  of  users,  and 
also  to  continue  to  refine  the  user  interface  and  validate  it 
against  usability  goals. 

Phase  3:  Installation 

User  feedback.  After  the  product  has  been  installed  and 
in  production  for  some  time,  feedback  is  gathered  to  feed  into 
design  enhancements,  design  of  new  releases,  and/or  design  of 
new  but  related  products. 


FOCUSING  ON  REQUIREMENTS  ANALYSIS


This section of the Handbook provides in-depth coverage of
the different phases (and tasks within those phases) of the
Usability Engineering Lifecycle. In particular, this subsection,
Requirements Specification, provides chapters describing and
discussing different tasks and techniques for requirements
specification in some depth and from different perspectives. The
goal of this chapter is to reinforce the importance of this phase
of the lifecycle and to provide real-world examples of the benefits
of conducting the kinds of techniques discussed in the later
chapters of this subsection.

Three  main  things  must  be  studied  and  understood  to  tailor 
design  to  support  unique  requirements: 

•  The  users 

•  The  users’  tasks 

•  The  users’  work  environment 

In  The  Usability  Engineering  Lifecycle,  the  first  is  addressed 
in  the  task  called  the  User  Profile,  and  the  second  two  are  ad¬ 
dressed  in  the  task  called  Contextual  Task  Analysis. 

The user profile. There is no single best user interface
style or approach for all types of users. Specific interface design
alternatives that optimize the performance of some types of users
may actually degrade the performance of other types of users. For
example, an infrequent, casual user needs an easy-to-learn and
easy-to-remember interface, but a high-frequency expert user
needs an efficient, powerful, and flexible interface. These are not
necessarily the same thing. Similarly, a highly skilled typist might
perform better with a keyboard-oriented interface, whereas a
low-skill typist might do better with a graphical user interface in
which point-and-select interaction replaces keyboarding.

Many  user  populations  have  very  specific  special  needs  that 
impact  their  ability  to  interact  with  software  applications.  Chap¬ 
ters  in  this  section  cover  the  special  characteristics  of  the  elderly 
(see  chapter  39),  children  (see  chapter  40),  cross-cultural  audi¬ 
ences  (see  chapter  18),  and  users  with  various  kinds  of  physi¬ 
cal  and  perceptual  disabilities  (see  chapters  42-43). 

Unless  designers  know  the  specific  characteristics  of  a  popula¬ 
tion  of  users  (e.g.,  expected  frequency  of  use,  level  of  typing  skill, 
special  needs  and  constraints,  etc.),  they  cannot  make  optimal 
user  interface  design  decisions  for  them.  The  purpose  of  a  User 
Profile  is  thus  to  establish  the  general  requirements  of  a  category 
of  users  in  terms  of  overall  interface  style  and  approach. 
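
The mapping from user characteristics to a general interface style can be sketched in code. The following is only a minimal illustration: the `UserProfile` fields, the "low"/"high" values, and the `interface_requirements` function are hypothetical names introduced here, and the rules simply restate the examples given above (frequency of use, typing skill, special needs).

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """One category of users (hypothetical fields, for illustration only)."""
    category: str
    frequency_of_use: str          # "low" or "high"
    typing_skill: str              # "low" or "high"
    special_needs: list = field(default_factory=list)


def interface_requirements(profile):
    """Restate the text's examples as rules: profile -> general requirements."""
    reqs = []
    if profile.frequency_of_use == "low":
        # Infrequent, casual users need to learn and relearn quickly.
        reqs += ["easy to learn", "easy to remember"]
    else:
        # High-frequency expert users need speed and power.
        reqs += ["efficient", "powerful", "flexible"]
    if profile.typing_skill == "high":
        reqs.append("keyboard-oriented interaction")
    else:
        reqs.append("point-and-select interaction")
    for need in profile.special_needs:
        reqs.append(f"accommodate: {need}")
    return reqs


casual = UserProfile("casual user", "low", "low")
expert = UserProfile("expert user", "high", "high", ["low vision"])
print(interface_requirements(casual))
print(interface_requirements(expert))
```

A real User Profile would rest on questionnaire data and many more characteristics; the point of the sketch is only that different categories of users yield different requirements.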

One  relatively  new  technique  for  gathering,  documenting,  and 
applying  user  profile  data  in  user  interface  design  is  the  use  of 
“personas”  (Pruitt  &  Adlin,  2006;  see  also  Adlin  &  Pruitt  in  this 
volume).  Personas  are  realistic  and  detailed  descriptions  of 
imaginary  people  who  represent  all  the  characteristics,  skill  sets, 
goals,  top  tasks,  responsibilities,  tools,  pain  points,  and  so  on  of  a 
category  of  users.  They  are  used  to  drive  discussion  and  design 
all  throughout  the  product-development  lifecycle,  from  product 
conception  to  user  acceptance  testing.  They  help  keep  the  focus 
on  users’  needs  and  goals,  and  give  stakeholders  a  common
language  with  which  to  consider,  discuss,  and  evaluate
product-design  ideas.  An  older,  but  still  effective,  way  to  keep  the  user
perspective  in  product  design  is  participatory  design,  which  builds
users  right  into  the  design  team  (see  chapter  54  in  this  volume).

Besides  understanding  individual  users  of  different  cate¬ 
gories,  understanding  the  characteristics  of  groups  and  com¬ 
munities  is  another  aspect  of  user  profiling  that  becomes  im¬ 
portant  in  the  design  of  groupware  and  products  aimed  at 
online  communities  and  computer-supported  cooperative 
work.  This  aspect  of  requirements  analysis  is  addressed  by  Ol¬ 
son  and  Olson  (chapter  27)  in  this  volume. 

The  User  Profile  task  fits  into  the  overall  Usability  Engineering
Lifecycle  as  follows:

•  The  User  Profile  task  is  the  first  task  in  the  Usability  Engi¬ 
neering  Lifecycle; 

•  The  User  Profile  task  will  feed  directly  into  the  Contextual  Task 
Analysis  task  by  identifying  categories  of  users  whose  tasks 
and  work  environment  must  be  studied  in  that  later  task; 

•  The  User  Profile  task  will  feed  directly  into  the  Usability  Goal 
Setting  task  in  that  usability  goals  are  in  part  driven  directly  by 
user  characteristics  (e.g.,  a  low  frequency  of  use  indicates  a 
need  for  ease-of-learning  and  remembering).  Thus,  different 
usability  goals  will  be  extracted  from  the  profiles  of  different 
categories  of  users; 

•  Ultimately,  the  User  Profile  task  will  have  a  direct  impact  on 
all  design  tasks,  which  are  focused  on  realizing  usability  goals, 
in  turn  based  in  part  on  User  Profiles; 

•  The  User  Profile  task  will  also  drive  the  selection  of  usability- 
evaluation  issues  and  test  users; 

•  Output  from  the  User  Profile  task  will  be  documented  in  the 
product  Style  Guide. 


The  User  Profile  task  fits  into  the  underlying  software  devel¬ 
opment  methodology  as  follows: 

•  The  User  Profile  task  can  occur  in  parallel  with,  overlapping 
with,  or  following  the  development  of  the  Requirements 
Model  in  the  Analysis  Phase  in  Object-Oriented  Software 
Engineering  (OOSE),  or  function  and  data  modeling  in  the 
requirements  phase  of  a  traditional  rapid  prototyping 
methodology.  It  could  either  define  Actors  for  the  Require¬ 
ments  Model  or  take  the  definition  of  user  categories  from 
the  Requirements  Model  as  its  starting  point; 

•  The  User  Profile  task  (along  with  all  other  Usability  Engineer¬ 
ing  Lifecycle  Requirements  Analysis  tasks)  should  occur  prior 
to  the  development  of  the  Analysis  Model  in  the  Analysis 
phase  of  OOSE  (or  the  application  architecture  design  in  a 
traditional  rapid  prototyping  methodology). 

Contextual  task  analysis.  The  purpose  of  the  Contextual
Task  Analysis  task  is  to  obtain  a  user-centered  model  of  work
as  it  is  currently  performed  (i.e.,  to  understand  how  users
currently  think  about,  talk  about,  and  do  their  work  in  their  actual
work  environment).  Ultimately,  the  reason  for  this  is  that  when
designing  a  new  product  and  its  user  interface,  it  is  important 
to  find  an  optimal  compromise  or  tradeoff  between  three  goals: 

•  Realizing  the  power  and  efficiency  that  automation  makes 
possible; 

•  Reengineering  work  processes  to  more  effectively  support 
identified  business  goals; 

•  Minimizing  retraining  by  having  the  new  product  tap  as  much 
as  possible  into  users’  existing  task  knowledge,  and  maximiz¬ 
ing  efficiency  and  effectiveness  by  accommodating  human 
cognitive  constraints  and  capabilities  within  the  context  of 
their  actual  tasks. 

Traditionally,  in  software-development  methodologies,  the 
most  focus  is  put  on  the  first  goal,  and  some  of  the  second  goal 
and  most  of  the  third  are  lost,  because  designers  and  develop¬ 
ers  never  really  understand  the  users’  work  and  their  current 
work  models.  A  great  deal  of  work  reengineering  occurs  in  ap¬ 
plication  design,  and  only  some  of  it  serves  the  first  two  goals. 
Much  of  it  is  unnecessary  and  not  useful,  and  it  results  in  un¬ 
necessary  training  and  usage  burdens  on  the  user.  The  third 
goal — minimizing  the  training  overhead  and  maximizing  effi¬ 
ciency  and  effectiveness — cannot  be  factored  in  unless  design¬ 
ers  have  a  clear  picture  of  users’  current  work  models:  how  they 
do  their  work  in  the  realities  of  their  everyday  work  environ¬ 
ment,  and  how  they  think  about  it  and  talk  about  it. 

Traditional  systems  analysis  usually  (but  not  always)  results  in 
the  inclusion  of  all  required  data  and  low-level  functions,  and 
structures  them  in  a  robust  implementation  architecture.  Without
a  truly  user-centered  approach,  however,  it  often  fails  to  organize
and  present  that  data  and  functionality  in  a  manner  that
supports  and  optimizes  the  work  performance  of  real  users  in 
their  real  work  environment.  This  missing  piece  is  the  whole 
point  of  a  Contextual  Task  Analysis. 

Thus,  the  purpose  of  this  task  is  to  supplement  more  tradi¬ 
tional  types  of  systems  analyses  to  define  usability  requirements 
and  to  point  toward  ways  of  meeting  those  requirements.  Then, 
in  later  tasks,  these  requirements  can  be  applied  directly  to
making  user  interface  design  decisions. 

Contextual  Task  Analysis  consists  of  the  following  basic  steps: 

•  Gathering  background  information  about  the  work  being 
automated; 

•  Collecting  and  analyzing  data  from  observations  of  and  in¬ 
terviews  with  users  as  they  do  real  work  in  their  actual  work 
environment; 

•  Constructing  and  validating  models  of  how  users  currently 
think  about  and  do  their  work. 

A  central,  key  step  in  the  Contextual  Task  Analysis  task  is  the 
second  step,  sometimes  referred  to  as  “contextual  observa¬ 
tions/interviews.”  Here,  the  idea  is  that  analysts  must  observe 
and  interview  users  in  their  real-life  work  context  to  understand 
their  work  and  discover  their  work  models.  Only  then  can  de¬ 
signers  structure  and  present  functionality  in  an  application 
user  interface  in  a  way  that  taps  into  users’  current  work  models 
and  optimally  supports  their  tasks. 

An  abstract  modeling  of  users’  tasks,  which  is  the  focus  of 
more  traditional  types  of  business  and  systems  analysis,  does 
not  typically  take  into  consideration  key  aspects  of  actual  work- 
flow  and  key  aspects  of  the  users  and  their  work  environment, 
and  simply  does  not  support  this  goal.  Another  way  of  putting 
this  is  that  traditional  systems  analysis  models  work  in  the  ab¬ 
stract,  without  considering  the  basic  capabilities  and  constraints 
of  human  information  processing,  the  particular  characteristics 
of  the  intended  user  population,  the  unique  characteristics  of 
the  work  environment,  and  how  users  themselves  model,  carry 
out,  and  talk  about  their  tasks.  In  addition,  the  usual  approach 
in  a  traditional  business/systems  analysis  is  to  decompose  high- 
level  units  of  work  into  low-level  functions.  In  the  process, 
much  of  how  the  work  is  actually  carried  out  is  lost.  The  result
of  this  analysis  approach  is  often  systems  that  provide  all  necessary
functionality,  but  in  an  organization  and  presentation  that
simply  does  not  support  the  natural  flow  of  work  or  address
the  current  “points  of  pain”  for  individual  users.

Based  on  an  analysis  of  direct  observations,  models  can  be 
constructed  that  represent  not  the  work  in  the  abstract  from  a 
systems  point  of  view,  but  instead  represent  how  users  currently 
think  about,  talk  about,  and  actually  carry  out  their  work  (i.e.,
models  reflecting  the  users’  point  of  view).  These  models  do  not 
get  directly  designed  into  the  application  or  its  interface.  They 
feed  into  only  one  of  the  goals  referred  to  earlier,  that  of  tap¬ 
ping  into  existing  user  knowledge,  habit,  and  capabilities,  and 
they  are  juggled  with  the  other  two  goals  of  supporting  general 
business  goals  and  exploiting  the  power  of  automation.  This  juggling
happens  in  a  later  task  in  The  Usability  Engineering  Lifecycle
called  “Work  Reengineering,”  but  also  often  referred  to  as
“Information  Architecture.”

One  generic  aspect  of  tasks  that  has  become  increasingly
important  with  the  explosion  of  e-commerce  is  the  issue  of
security  and  privacy.  Understanding  users’  requirements  for  and
expectations  of  privacy,  and  making  users  feel  confident  in  the 
security  and  privacy  of  personal  information  they  are  asked  to 
divulge  in  the  course  of  various  kinds  of  online  transactions,  has 
become  a  significant  part  of  the  necessary  research  into  user
tasks,  and  the  design  of  interactions  to  support  them.  This
important  new  aspect  of  task  analysis  is  covered  by  Karat,  Karat,
and  Brodie  (chapter  32)  in  this  volume. 

The  Contextual  Task  Analysis  task  fits  into  the  overall  Usability
Engineering  Lifecycle  as  follows:

•  The  User  Profile  task  will  feed  directly  into  the  Contextual 
Task  Analysis  task  by  identifying  categories  of  users  (e.g.,  ac¬ 
tors)  whose  tasks  must  be  studied; 

•  The  Contextual  Task  Analysis  task  will  feed  directly  into  the 
Usability  Goal  Setting  task  by  helping  to  identify  different  primary
goals  for  different  task  types  (use  cases),  and  by  identifying
bottlenecks  and  weaknesses  in  current  work  processes
that  can  be  reduced  through  good  user  interface  design; 

•  The  Contextual  Task  Analysis  task  will  feed  directly  into  the 
Work  Reengineering  task.  Current  user  work  models  are 
reengineered  only  as  much  as  necessary  to  exploit  the  power 
of  automation  and  contribute  to  explicit  business  goals.  Cur¬ 
rent  user  knowledge  and  experience  are  exploited  as  much 
as  possible  to  facilitate  ease  of  learning  and  use; 

•  The  Contextual  Task  Analysis  task  will  be  documented  in  the 
product  Style  Guide; 

•  Ultimately,  the  Contextual  Task  Analysis  task  will  have  a  direct 
impact  on  all  design  tasks,  and  on  the  selection  of  usability 
testing  and  evaluation  issues,  as  well  as  on  the  design  of 
usability  testing  materials. 

Contextual  Task  Analysis  fits  into  the  underlying  software  de¬ 
velopment  methodology  as  follows: 

•  The  Contextual  Task  Analysis  task  can  occur  in  parallel  with, 
overlapping  with,  or  following  the  development  of  the  Re¬ 
quirements  Model  in  the  Analysis  Phase  in  OOSE  (or  function 
and  data  modeling  in  the  requirements  phase  of  a  traditional 
rapid  prototyping  methodology).  It  could  either  identify  Use 
Cases  for  the  Requirements  Model,  or  take  the  definition  of 
Use  Cases  from  the  Requirements  Model  as  its  starting  point 
for  constructing  Task  Scenarios; 

•  The  Contextual  Task  Analysis  task  (along  with  all  other  Us¬ 
ability  Engineering  Lifecycle  Requirements  Analysis  tasks) 
should  occur  prior  to  the  development  of  the  Analysis  Model 
in  the  Analysis  phase  of  OOSE  (or  the  application  architec¬ 
ture  design  in  a  traditional  rapid-prototyping  methodology). 

Currently,  in  the  field  of  Usability  Engineering,  there  is  no 
well-established,  universally  applied,  general,  practical,  and 
highly  structured  technique  for  performing  a  Contextual  Task 
Analysis  for  the  purpose  of  driving  the  user  interface  design  for 
a  product  that  has  already  been  identified  and  scoped.  It  is  more 
of  an  art  than  a  science  at  this  point,  with  each  usability  practi¬ 
tioner  using  his  or  her  own  informal  approach.  Recently,  how¬ 
ever,  some  structured  techniques  have  begun  to  emerge,  and 
experience  with  them  has  been  reported  in  the  literature  (e.g., 
Beyer  &  Holtzblatt,  1998;  Hackos  &  Redish,  1998).  Later  chap¬ 
ters  in  this  book  represent  the  latest  forays  into  the  thinking  of 
these  and  other  experts  on  this  important  task  in  the  Usability 
Engineering  Lifecycle. 

MOTIVATING  AND  JUSTIFYING
REQUIREMENTS  SPECIFICATION 


While  the  previous  section  set  the  stage  for  Requirements  Spec¬ 
ifications  within  the  broader  perspective  of  the  whole  Usability 
Engineering  Lifecycle  and  underlying  development  process,  this 
section  provides  motivation  and  justification  for  investing  in 
Requirements  Specifications  tasks  and  activities,  in  particular 
through  the  reporting  of  “war  stories”  from  my  own  experience
and  that  of  other  practitioners.  The  war  stories  are  divided  into
those  relating  to  user,  environment,  and  task  requirements. 

User  Requirements 

Following  are  two  examples  from  my  own  experience  of  the  im¬ 
portance  of  knowing  your  users. 

For  two  of  my  clients,  I  first  interviewed  project  team  mem¬ 
bers  (developers)  to  get  a  general  sense  of  the  user  population. 
My  purpose  was  to  solicit  input  to  the  design  of  a  User  Profile 
questionnaire  that  I  would  later  employ  to  solicit  profile  infor¬ 
mation  directly  from  the  users  themselves. 

In  one  case,  the  project  team  was  convinced  that  their  users 
would  have  a  generally  low  level  of  familiarity  with  the  Microsoft 
Windows  platform  they  planned  for  their  product.  They  were 
thus  prepared  to  depart  significantly  from  the  Windows  platform 
user  interface  standards  in  their  product  user  interface.  The  User 
Profile  questionnaire,  however,  revealed  a  generally  high  level 
of  Windows  experience.  This,  and  the  fact  that  the  Windows  user 
interface  standard  was  a  good  fit  for  the  application  functionality, 
led  me  to  strongly  advise  them  to  adopt  the  Windows  standards 
as  closely  as  possible.  They  were  still  interested  in  creating  their 
own  unique  user  interface,  but  early  testing  of  several  alternative 
designs  that  varied  in  how  faithfully  they  followed  Windows  stan¬ 
dards  clearly  showed  that  users  learned  much  more  quickly  the 
more  consistent  the  design  was  with  Windows  standards. 

On  the  other  project,  the  development  team  felt  quite  con¬ 
fident  that  users  would  generally  have  high  levels  of  computer 
literacy.  An  extensive  User  Profile  questionnaire  of  more  than 
800  users,  however,  revealed  that  only  a  very  small  percentage 
of  potential  users  had  any  experience  with  computer  software  at 
all,  let  alone  the  Windows  user  interface  standards.  In  this  case, 
based  on  this  User  Profile  data,  we  designed  (and  validated 
through  testing)  a  highly  simplified  user  interface  that  departed 
significantly  from  the  Windows  standards. 

In  both  of  these  cases,  two  things  are  clear: 

•  Project  team  members  often  have  serious  misconceptions
about  the  key  characteristics  of  their  users;

•  These  misconceptions  could  lead  teams  to  design  inappro¬ 
priate  user  interfaces  for  those  users. 
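
Checking a team's assumption against questionnaire data amounts to a simple tally. The sketch below is illustrative only: the response values are invented for the example (they are not the data from either project), and `experience_shares` and `assumption_holds` are hypothetical helper names.

```python
from collections import Counter


def experience_shares(responses):
    """Return the fraction of respondents reporting each experience level."""
    counts = Counter(responses)
    total = len(responses)
    return {level: counts[level] / total for level in counts}


def assumption_holds(responses, assumed_level, min_share=0.5):
    """True if at least `min_share` of respondents report `assumed_level`."""
    return experience_shares(responses).get(assumed_level, 0.0) >= min_share


# Invented answers to a single question, "Prior experience with software?"
responses = ["none"] * 7 + ["some"] * 2 + ["extensive"] * 1

print(experience_shares(responses))
# The team's belief ("users are generally experienced") is not supported here.
print(assumption_holds(responses, "extensive"))
```

The threshold `min_share` is an arbitrary choice for the illustration; what counts as "generally high" experience is a judgment the analyst has to make.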

Work  Environment  Requirements 

It  is  also  important  to  understand  the  environment  in  which 
users  will  be  utilizing  a  product  to  carry  out  their  tasks,  because
this  environment  will  place  constraints  on  how  they  work  and
how  well  they  work. 

By  analogy,  suppose  a  screwdriver  is  being  designed  and  all 
that  is  known  is  the  size  of  the  screw  head  it  must  fit.  So,  some¬ 
thing  like  a  traditional  screwdriver  is  designed,  with  the  cor¬ 
rectly  sized  blade.  But,  suppose  it  then  turns  out  that  the  user 
needs  to  apply  the  screw  from  the  inside  of  a  narrow  pipe  to 
assemble  some  piece  of  equipment.  Clearly,  a  traditional  screw¬ 
driver  will  be  useless  in  this  work  context. 

Similarly,  suppose  a  software  application  is  being  designed 
for  a  set  of  users,  but  the  designers  have  never  gone  to  the 
users’  actual  work  environment.  They  assume  a  traditional 
office-like  environment  and  design  software  that  will  work  on  a 
traditional  workstation.  But,  suppose  it  then  turns  out  that  in 
the  actual  work  environment,  users  are  constantly  in  motion, 
moving  all  around  the  environment  to  get  different  parts  of  an 
overall  job  done.  Software  designed  for  a  traditional  workstation
will  simply  not  work  in  this  environment.  What  is  required  instead
is  software  that  runs  on  a  smaller,  more  portable  device  that
users  can  carry  with  them,  such  as  the  units  carried  by  UPS
delivery  staff.

In  another  example,  suppose  designers  have  never  visited 
the  user’s  workplace,  and  they  assume  users  all  work  in  closed 
offices.  So,  a  system  with  voice  input  and  output  is  designed. 
But,  it  then  turns  out  that  users  work  in  one  big  open  area  with 
desks  located  right  next  to  one  another.  The  noise  from  all 
those  talking  people  and  workstations  will  create  an  impossi¬ 
ble  work  environment,  and  most  voice-recognition  systems  sim¬ 
ply  do  not  work  with  acceptable  accuracy  in  a  noisy  environ¬ 
ment.  The  point  is,  there  are  many  aspects  of  the  actual  work 
environment  that  will  determine  how  well  a  tool  will  work  in 
that  environment,  and  so  the  environment  itself  must  be  stud¬ 
ied  and  the  tool  tailored  to  it. 

A  real  example  of  the  importance  of  understanding  the 
users’  work  environment  comes  from  a  project  I  worked  on 
with  a  large  metropolitan  police  department.  Requirements- 
analysis  activities  revealed  that  in  the  typical  police  station,  the 
appearance  of  the  interior  is  dark,  run-down,  and  cluttered;  the 
lighting  is  harsh  and  artificial;  and  the  air  is  close  and  sometimes 
very  hot.  The  noise  level  can  be  high,  the  work  areas  are 
cramped  and  cluttered,  and  the  overall  atmosphere  is  tense  and 
high-pressured  at  best,  chaotic  and  sometimes  riotous  at  worst. 
These  conditions  most  likely  have  a  general  impact  on  morale, 
and  certainly  will  have  an  impact  on  cognitive  functioning  that, 
in  turn,  will  impact  productivity  and  effectiveness.  A  user  inter¬ 
face  must  be  carefully  designed  to  support  the  natural  and  pos¬ 
sibly  extreme  degradations  of  human  performance  under  these 
conditions. 

In  addition,  it  was  observed  that  in  the  noisy,  stressful,  and 
distracting  work  environment  in  a  typical  police  station,  users 
will  frequently  be  interrupted  while  performing  tasks,  some¬ 
times  by  other  competing  tasks,  sometimes  by  unexpected 
events,  unpredictable  prisoners,  and  so  forth.  A  user  interface  in 
such  an  environment  must  constantly  maintain  enough  context 
information  on  the  screen  so  that  when  users’  attention  is  tem¬ 
porarily  but  frequently  drawn  away  from  their  tasks,  they  can 
quickly  get  reoriented  and  continue  their  task  without  errors, 
and  not  have  to  back  up  or  repeat  any  work.
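
One way to picture this requirement is as state that the interface keeps and always displays. The following is a minimal sketch under stated assumptions: `TaskContext`, its fields, and the step names are hypothetical and do not describe the system actually built for the police department.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    """What an interrupted user needs to see in order to resume a task."""
    task_name: str
    steps: list
    current_step: int = 0
    entered: dict = field(default_factory=dict)

    def record(self, field_name, value):
        """Remember a value the user has already entered."""
        self.entered[field_name] = value

    def advance(self):
        """Move to the next step, stopping at the last one."""
        if self.current_step < len(self.steps) - 1:
            self.current_step += 1

    def context_line(self):
        """A one-line summary meant to stay on screen at all times."""
        step = self.steps[self.current_step]
        done = ", ".join(f"{k}={v}" for k, v in self.entered.items()) or "nothing yet"
        return (f"{self.task_name} | step {self.current_step + 1} of "
                f"{len(self.steps)}: {step} | entered: {done}")


ctx = TaskContext("Property report", ["Describe item", "Record owner", "Review"])
ctx.record("item", "bicycle")
ctx.advance()
print(ctx.context_line())
# prints "Property report | step 2 of 3: Record owner | entered: item=bicycle"
```

Because the summary is derived from stored state rather than from the user's memory, a user returning from an interruption can see where the task stands without backing up.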

Task  Requirements

Besides  the  users  themselves  and  the  environments  they  work 
in,  the  tasks  they  do  also  have  their  own  inherent  requirements. 

One  compelling  example  of  the  need  for  a  thorough  under¬ 
standing  of  users’  tasks  to  achieve  usable  design  comes  from 
one  of  the  earliest  books  on  computer-human  interaction  (Ru¬ 
binstein  &  Hersh,  1984,  p.  26). 

For  example,  recently  a  water  district  in  Maine  installed  an 
online  billing  system  so  that  when  a  customer  called  in,  the 
clerk  could  quickly  retrieve  a  record  of  usage  and  billing  for  that 
customer.  The  managers  noticed  that  the  clerks  were  less  than 
happy  with  the  new  system.  A  little  investigation  revealed  that 
with  the  manual  system,  the  clerks  would  write  pertinent  infor¬ 
mation  in  the  margins  of  the  records,  such  as  that  one  person 
pays  on  the  15th  of  the  month  rather  than  on  the  first,  or  that 
the  meter  reader  should  contact  the  neighbor  across  the  street 
for  access  to  a  house.  This  informal  information  was  critical  to 
the  smooth  operation  of  the  office,  but  because  there  was  no 
provision  for  it  in  an  official  field  of  the  payment-record  form, 
the  information  was  lost  in  the  conversion  to  the  online  sys¬ 
tem.  Understanding  that  there  is  informal  as  well  as  official  in¬ 
formation  and  recognizing  the  importance  of  the  former  would 
have  reduced  the  disruption  in  work  style. 
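
In data-model terms, the lesson of this story is that a record needs a place for informal information alongside its official fields. The sketch below is an assumption-laden illustration, not a description of the water district's system: `CustomerRecord` and all of its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    """Official billing fields plus free-form "margin notes" (all hypothetical)."""
    account_id: str
    name: str
    usage_gallons: int
    amount_due: float
    notes: list = field(default_factory=list)   # informal information

    def add_note(self, text):
        """Keep what a clerk would have written in the margin."""
        self.notes.append(text)

    def summary(self):
        """Show the notes together with the official data, as on paper."""
        lines = [f"{self.account_id} {self.name}: "
                 f"{self.usage_gallons} gal, due ${self.amount_due:.2f}"]
        lines += [f"  note: {n}" for n in self.notes]
        return "\n".join(lines)


record = CustomerRecord("A-100", "Example Customer", 4200, 31.5)
record.add_note("Pays on the 15th of the month rather than the first")
record.add_note("Meter reader: contact neighbor across the street for access")
print(record.summary())
```

Displaying the notes with the record, rather than on a separate screen, mirrors how the clerks saw them on the paper forms.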

Another  good  example  of  what  can  be  missed  by  not  under¬ 
standing  users’  tasks  is  found  in  Coble,  Karat,  and  Kahn  (1997). 
This  report  described  the  use  of  task-analysis  techniques  to 
study  the  functional  requirements  of  physicians  for  a  clinical 
workstation: 

Before  the  .  .  .  session  started,  a  physician  explained  the  purpose  and 
details  of  each  section  in  his  office  chart.  .  .  .  Later,  when  he  was  doing 
actual  work  in  context,  the  person  performing  the  .  .  .  session  noticed 
that  the  note  he  was  looking  at  was  written  with  red  ink.  She  probed 
and  the  physician  said  it  told  him  the  previous  encounter  with  this  pa¬ 
tient  was  a  hospital  visit.  That  fact  told  him  he  needed  to  review  the  hos¬ 
pital  discharge  summary  and  hospital  laboratory  results  before  enter¬ 
ing  the  patient’s  exam  room.  The  physician  was  surprised  that  he  had 
not  mentioned  that  need  before.  It  was  so  ingrained  in  how  he  worked 
that  he  did  not  even  process  that  highly  relevant  detail  consciously  any¬ 
more  (p.  231).  [Italics  mine.] 

In  another  example  from  my  own  experience,  I  was  once 
performing  a  requirements  analysis  in  a  police  department  and 
observing  police  officers  using  a  system  of  standardized  paper 
forms  to  document  property.  I  observed  infrequent  users  of  the 
forms  struggling  with  them;  there  were  many  complex  forms 
and  many  undocumented  rules  about  how  to  fill  them  out.  Later 
I  observed  very  frequent  users  using  them.  The  frequent  users 
tended  to  ignore  the  forms  initially  and  take  freeform  notes  de¬ 
scribing  the  property  they  had  to  document  in  a  format  very  dif¬ 
ferent  from  that  required  on  the  forms.  They  then  transcribed 
their  own  notes  onto  the  forms  as  required.  It  was  clear  from 
this  observation  (and  from  follow-up  interviews  with  the  fre¬ 
quent  users)  that  the  forms  were  not  designed  in  a  way  that  sup¬ 
ported  the  users’  task,  and  I  learned  a  lot  from  how  frequent 
users  worked  around  the  forms  with  their  own  format  that 
helped  in  designing  better  online  forms.  A  traditional  systems 
analysis  would  typically  not  involve  studying  users  actually  using 
paper  forms  in  their  actual  work,  but  would  have  instead  taken
the  paper  forms  themselves  as  a  description  of  the  work  to  be
automated.  Problems  with  the  forms  would  have  been  missed 
entirely  and  perpetuated  in  the  online  version  of  the  task. 

There  are  always  a  great  many  things  like  this  that  users  simply
will  not  think  to  report  during  an  offsite  interview  or  “focus
group,”  and  that  emerge  only  during  in-context  observations  of  people
doing  real  work  in  their  workplaces.  Such  things  can  have
major  implications  for  product  user  interface  design. 

Here  is  another  example,  this  time  from  the  context  of  a  cus¬ 
tomer  service  application  in  an  insurance  company.  In  the  navi¬ 
gational  model  illustrated  in  Fig.  47.2,  the  free  text  in  normal  in¬ 
tensity  represents  the  basic  user  tasks  identified  from  initial 
in-context  observations  with  users.  The  bold  text  in  boxes  rep¬ 
resents  the  hierarchy  of  groups  that  the  design  team  developed, 
with  labels  they  created — again,  all  based  on  initial  task  analysis 
activities.  (This  hierarchy  represents  only  a  partial  set  of  the  func¬ 
tionality  required  by  a  Customer  Support  Rep.  For  simplicity,  not 
all  groups  or  low-level  tasks  are  included  in  the  example.) 

In  contrast  to  the  designer-generated  model  in  Fig.  47.2,  Fig. 
47.3  shows  a  user-generated  model.  Again,  the  free  text  in  nor¬ 
mal  intensity  represents  the  low-level  tasks  presented  to  the 
user,  each  one  on  a  separate  index  card.  This  time  the  bold  text 
in  boxes  represents  the  groups  that  one  user  formed  from  the 
low-level  tasks,  with  labels  they  suggested. 

The  differences  between  the  designer-generated  model  and 
the  user-generated  model  of  these  tasks  are  significant. 

For  example,  the  design  team  organized  all  types  of  Changes
(e.g.,  Address,  Beneficiary)  under  a  single  category  (“Change
Requests”)  within  the  category  Sales  Support,  whereas  this  user
distinguished  between  those  changes  that  have  to  do  with  customer
information  and  those  that  have  to  do  with  policy  details,
but  located  both  under  the  category  Customer  Support  rather
than  Sales  Support.  As  it  turned  out,  Customer  Support  staff  can
make  simple  changes  to  customer  information  on  their  own,
but  more  complex  paperwork  and  approvals  are  required  for
policy  changes.  Thus,  these  are  two  distinct  categories  of  tasks
to  users.  Even  though  requests  for  these  types  of  changes  often
come  from  customers  via  sales  agents,  this  user  still  regarded
them  as  really  reflecting  Customer  Support.

FIGURE  47.2.  A  Designer’s  Navigational  Model.  Admin  =
Administration;  App  =  application;  Mgmnt  =  management.

FIGURE  47.3.  A  User’s  Navigational  Model.  Admin  =
Administration;  App  =  application.

Similarly,  the  design  team  lumped  all  Office  Administration 
tasks  in  one  group,  whereas  this  user  distinguished  between 
daily  and  occasional  office  tasks.  Different  people  get  assigned 
to  these  types  of  tasks,  so,  again,  this  is  a  meaningful  distinc¬ 
tion  to  users. 

The  designers  imagined  a  category  unto  itself  called  Follow-Up,
in  which  incomplete  tasks  of  all  types  (e.g.,  Incomplete
App)  are  all  located,  whereas  this  user  considered  follow-up 
activities  as  belonging  with  the  original  task  types  with  which 
they  are  associated. 

Finally,  note  that  the  user  regarded  New  Sales  (data  entry  of 
new  policy  applications)  as  its  own  category,  whereas  the  de¬ 
signers  had  seen  this  as  a  subcategory  of  Sales  Support — a  very 
different  perception. 

One  important  thing  to  note  in  comparing  this  user’s  model 
of  tasks  with  the  designer’s  first  pass  at  a  model  of  tasks  is  that, 
for  any  set  of  low-level  tasks,  there  may  be  a  large  number  of 
very  different  “logical”  task  organizations  possible,  but  there  will 
only  be  a  small  number  of  rather  similar  ones  that  will  make 
sense  to  users,  given  their  actual  work.  A  traditional  systems 
analysis  will  decompose  high-level  functional  requirements  to 
all  the  corresponding  low-level  tasks,  but  it  typically  does  not 
ensure  that  the  task  organization  most  consistent  with  the 
users’  models  of  their  tasks  will  be  presented  to  the  user  in  the 
user  interface  to  a  product  intended  to  support  those  tasks. 
That  is  one  of  the  main  goals  of  a  task  analysis. 

Also  note  that  this  user’s  model  is  most  likely  not  exactly  the 
same  as  other  users’  models,  although  in  most  cases,  there  will
be  a  lot  of  similarity  across  individual  users’  models.  It  is  necessary
to  consolidate  all  the  sampled  users’  task  models  and  generate
one  consolidated  model  (in  the  same  format  as  the  individual
users’  models,  as  illustrated  in  Fig.  47.3)  that  captures  all
the  across-user  commonalities  as  well  as  possible.
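
The text does not prescribe a consolidation procedure. One common approach, shown here only as a sketch and not as the author's method, is to count how often each pair of task cards was placed in the same group across users and then merge the pairs that enough users agreed on. The function names, the 0.5 threshold, and the three sample sorts are assumptions made for illustration; the card names are borrowed from the example above.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """For each pair of cards, count the users who put them in the same group."""
    counts = Counter()
    for groups in sorts:                       # one user's card sort
        for group in groups:                   # one group of card names
            for a, b in combinations(sorted(set(group)), 2):
                counts[(a, b)] += 1
    return counts


def consolidate(sorts, threshold=0.5):
    """Merge cards that at least `threshold` of users grouped together."""
    cards = sorted({c for groups in sorts for g in groups for c in g})
    parent = {c: c for c in cards}             # union-find forest

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]      # path compression
            c = parent[c]
        return c

    for (a, b), n in co_occurrence(sorts).items():
        if n / len(sorts) >= threshold:
            parent[find(a)] = find(b)

    merged = {}
    for c in cards:
        merged.setdefault(find(c), []).append(c)
    return sorted(merged.values())


# Three invented sorts of four task cards.
sorts = [
    [["Address change", "Beneficiary change"],
     ["New policy application", "Incomplete App"]],
    [["Address change"], ["Beneficiary change"],
     ["New policy application", "Incomplete App"]],
    [["Address change", "Beneficiary change"],
     ["New policy application"], ["Incomplete App"]],
]
print(consolidate(sorts))
```

This merging is transitive (if A goes with B and B with C, all three end up together), so a high threshold is safer with larger samples. The sketch also leaves out the group labels users suggested and any hierarchy among groups, both of which a full consolidated model would need.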

One  last  example  from  my  recent  experience  illustrates  the 
importance  of  studying  users’  work  in  depth  to  design  the  user 
interface  to  a  software  application  that  supports  that  work. 

In  this  example,  the  application  was  a  database  system  in¬ 
tended  to  support  users  in  a  government  agency.  The  mission  of 
the  agency  was  to  uncover  and  prosecute  criminal  behavior  of 
a  specific  type.  The  job  of  the  users  of  the  intended  system  was 
to  manage  publicity  for  their  agency  in  the  media.  These  users 
needed  to  interact  with  the  media  to  enhance  reporting  on
criminal  cases,  using  the  media  to  help  discourage  criminal
behavior  and  encourage  public  cooperation  in  reporting  and 
investigating  criminal  behavior. 

The  database  of  information  being  provided  to  these  users 
included  information  on  criminal  cases  in  progress  relevant  to 
this  agency,  prior  press  releases  and  news  articles  on  relevant 
criminal  cases,  a  “phonebook”  of  contact  information  for  re¬ 
porters  and  other  contacts  in  the  media,  records  of  past  com¬ 
munications  with  the  media,  and  so  forth.  A  card-sorting  exer¬ 
cise  revealed  that  users  considered  these  categories  of  data 
types — Cases,  Documents,  Media  Contact  Information,  Media 
Communications — to  be  distinct  and  meaningful  categories 
that  would  provide  a  good  foundation  for  the  navigational 
structure  of  the  application.  However,  what  was  not  revealed  by 
the  card  sorting  exercise,  but  did  become  clear  through  other 
methods  of  studying  users’  work  in  context,  was  that  when 
users  searched  for  and  found  a  particular  item  in  one  of  these 
categories,  such  as  a  case,  they  then  typically  wanted  to  auto¬ 
matically  see  items  in  other  categories  related  to  the  item  they 
had  initially  looked  up. 

Based  on  the  card-sort  data,  I  had  initially  designed  a  navi¬ 
gational  structure  in  which  the  user  first  selected  a  category  of 
data,  and  then  searched  for  items  in  that  category.  Assuming 
from  the  card-sort  data  that  a  search  for  one  type  of  data  was 
independent  from  searching  for  another  type,  I  designed  the  in¬ 
teraction  such  that  searching  for  something  within  any  category 
of  data  was  independent  of  searching  for  something  within  any 
other  category  of  data.  That  is,  if  the  user  searched  for  and 
found  a  case,  and  then  navigated  to  the  category  of  Documents, 
or  to  the  category  of  Media  Contact  Information,  the  initial  de¬ 
sign  assumed  that  the  user  wanted  to  start  an  independent,  un¬ 
related  new  search  in  that  category.  However,  further  discussion 
with  users  revealed  that  in  fact  when  the  user  searched  for  and 
found  a  case,  it  was  most  likely  that  he  or  she  would  then  want 
immediate  access  to  documents  related  to  that  case,  or  contact 
information related to that case. Similarly, if the user initially
searched for a particular reporter’s contact information, then he
or she would most likely want to see all documents related to
this reporter, all cases related to this reporter, or both.
This required a very different navigational and interaction
design. Even though card-sorting data had been collected, without
the ongoing input of users during early design, it would
have been very easy to design the application in such a way as
to make the users’ most typical type of task very tedious and
cumbersome.
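
The revised interaction can be pictured as a small data model in which finding an item in one category immediately surfaces the related items in the other categories. The Python sketch below is only an illustration of that idea under stated assumptions: the identifiers, the `LINKS` list, and the `build_index` and `related_items` helpers are all hypothetical and are not taken from the actual application.

```python
from collections import defaultdict

# Hypothetical links between items, each written as (category, identifier).
# The category names follow the card-sort results; the identifiers are invented.
LINKS = [
    (("Cases", "case-17"), ("Documents", "press-release-3")),
    (("Cases", "case-17"), ("Media Contact Information", "reporter-9")),
    (("Media Contact Information", "reporter-9"), ("Media Communications", "call-2")),
]


def build_index(links):
    """Index links in both directions so any item can be the starting point."""
    index = defaultdict(set)
    for a, b in links:
        index[a].add(b)
        index[b].add(a)
    return index


def related_items(index, item):
    """Group the items directly linked to `item` by their category."""
    grouped = defaultdict(list)
    for category, identifier in sorted(index.get(item, ())):
        grouped[category].append(identifier)
    return dict(grouped)


INDEX = build_index(LINKS)
```

With this structure, looking up a case via `related_items(INDEX, ("Cases", "case-17"))` returns its linked documents and media contacts in one step, so the user does not have to start an unrelated search in each category.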

Clearly, it is possible, even likely, to design a user interface
that does not support users, their tasks, or their work environment
if an in-depth requirements analysis is not conducted
prior to design and incorporated into design. The chapters that
follow describe and discuss a variety of techniques for gathering
these all-important requirements specifications.


References 


Beyer,  H.,  &  Holtzblatt,  K.  (1998).  Contextual  design.  San  Francisco: 
Morgan  Kaufmann  Publishers,  Inc. 

Coble,  J.  M.,  Karat,  J.,  &  Kahn,  M.  G.  (1997).  Maintaining  a  focus  on  user 
requirements  throughout  the  development  of  clinical  workstation 
software.  CHI  ’97  Proceedings. 

Hackos,  J.  T.,  &  Redish,  J.  C.  (1998).  User  and  task  analysis  for  interface 
design.  New  York:  John  Wiley  &  Sons,  Inc. 


Mayhew,  D.  J.  (1999).  The  usability  engineering  lifecycle.  San  Francisco: 
Morgan  Kaufmann  Publishers. 

Pruitt,  J.,  &  Adlin,  T.  (2006).  The  persona  lifecycle:  Keeping  people  in 
mind  throughout  product  design.  San  Francisco:  Morgan  Kaufmann 
Publishers/Elsevier. 

Rubinstein, R., & Hersh, H. (1984). The human factor: Designing
computer systems for people. Burlington, MA: Digital Press.


48


TASK  ANALYSIS 


Catherine  Courage 

Salesforce.com 

Janice (Ginny) Redish

Redish & Associates, Inc.


Dennis  Wixon 

Microsoft 


Introduction
Defining Task Analysis
    Task Analysis in This Chapter
    Considering Four Principles That Underlie Task Analysis
        Task Analysis Is an Integral Part of a Broader Analysis
        Task Analysis Includes Understanding Users’ Goals
            Considering Norman’s entire action cycle as task analysis
        Task Analysis Is Relevant at All Stages of the Process
        Practical Reality Impinges on What We Actually Do
Selling Task Analysis
    Dispelling the Myths That Lead to Resistance
        We do market research; We do not need task analysis
        Task analysis is too time consuming and costly
    Using Other Opportunities to Sell Task Analysis
    Making Task Analysis Part of the Standard Development Process
Planning For a Task Analysis (Issues to Consider)
    Background Research
    Getting Into the Project Plan
    Getting Sign-off and Meeting Schedules
    Providing Useful and Usable Data
    Deciding What the Project Team Needs
        Where is the product in its overall life cycle?
        How much time do you have to conduct your task analysis?
        How broad or specialized is the user population for the product?
        How widespread geographically, culturally, and linguistically
          is the user population for the product?
        How detailed must we specify the tasks?
        Is this a special type of product for which traditional
          task analysis may not be useful?
    Deciding on an Appropriate Level of Granularity
    Deciding Where to Start
    Gathering Reusable Data
    Going to Different Users at Different Times
Collecting Task Analysis Data
    Preparing Philosophically to Work with Users
        Treat the user with respect
        Understand that you are a visitor (guest) in the users’ environment
        Be in learning mode
        Let the user be a mentor
        Be a partner in the task of understanding the user’s world
    Selecting Users and Environments
        Make a convenience sample as representative as possible
        Do not limit task analysis to development partners
    Conducting Site Visits
    Observing Users
    Interviewing Users
    Doing Rapid Task Analysis
        Narrow the scope
        Using multiple teams
        Schedule your sessions around activity peaks
    Bringing Users to You (Field Studies in the Lab)
        Using the lab for procedural task analysis
        Using the lab for quantitative task analysis
    Conducting Remote Task Analysis
Conducting International Task Analysis Studies
    Selling an International Study
    Preparing For Your International Task Analysis
        Translators
        Recruiting
        Logistics
        Background research


INTRODUCTION 


Successful  design  comes  from  a  marriage  of  users’  goals  and 
(usually)  new  technologies.  Successful  design  does  not  neces¬ 
sarily  perpetuate  users’  current  ways  of  working,  but  it  is  built 
on  a  deep  understanding  of  those  ways  and  of  how  a  new  de¬ 
sign  will  change  them. 

In  this  chapter,  we  explore  modern  interpretations  and  uses 
of  task  analysis.  The  first  section  defines  the  meaning  of  task 
analysis  in  this  chapter. 

The  five  following  sections  serve  as  a  practical  guide. 


DEFINING  TASK  ANALYSIS 


Task  analysis  has  different  meanings  to  different  authors. 
Rather  than  delve  into  all  of  the  different  interpretations,  we 
give a single definition here. (For a thorough comparison of task
analysis  definitions,  see  Redish  &  Wixon,  2003.) 

Task  Analysis  in  This  Chapter 

Task analysis means understanding users’ work. Thus, task
analysis encompasses all sorts of techniques, including naturalistic
observations, interviews, shadowing users, doing “day in the
life” studies, conducting ethnographic interviews, and observing
and listening to users doing specific tasks. It includes gathering
information that leads to insights about users’ lives at
work, to scenarios and use cases, and sometimes to detailed
flowcharts of work processes or specific procedures.

A  major  emphasis  of  task  analysis  is  predesign,  and  three  types 
of  analysis — user,  task,  and  environmental — are  necessary  input 
to  designing  any  product.  Therefore,  task  analysis  is  an  integral 
part  of  a  triangle  that  covers  users,  tasks,  and  environments.  As 
described in more detail in the following sections, task analysis
goes  hand  in  hand  with  understanding  users  (user  analysis)  and 
understanding  the  users’  physical,  technological,  cultural,  social, 
and  political  environments  (environmental  analysis). 

Users  are  critical  to  all  three  types  of  analysis.  Task  analysis 
requires  watching,  listening  to,  and  talking  with  users.  Other 
people,  such  as  managers  and  supervisors,  and  other  informa¬ 
tion  sources,  such  as  print  or  online  documentation,  are  only 
secondarily  useful  for  task  analysis.  Relying  on  them  may  lead  to 
a  false  understanding. 

We believe that task analysis does not stop with design (cf.
Kirwan  &  Ainsworth,  1992).  At  every  stage  of  the  design  and  de¬ 
velopment  process,  task  analysis  is  critical.  It  is  the  major  input 
to  use  cases  and  design  specifications,  and  it  helps  us  under¬ 
stand  how  the  emerging  product  affects  users.  It  is  the  key  to 
evaluating  designs  as  scenarios  for  heuristic  evaluations  and  for 
usability  testing.  Task  analysis  must  be  the  organizing  principle 
for  documentation  and  training. 

We recognize that efficiency-oriented, detailed task analyses,
such as TAG and GOMS, have a place in evaluating some
products,  especially  those  for  which  efficiency  on  the  order  of 
seconds  saved  is  important  (for  example,  see  Gray,  John,  &  At¬ 
wood,  1993).  However,  that  type  of  task  analysis  is  not  the  fo¬ 
cus  of  this  chapter.  The  focus  here  is  a  broad  understanding  of 
the  world  in  which  the  new  product  will  be  used. 

Considering  Four  Principles  That  Underlie  Task  Analysis 

The  practical  advice  for  task  analysis  later  in  this  chapter  is  based 
on  four  principles: 


1.  Task analysis is an integral part of a broader analysis that
includes understanding users and their environments.

2.  Task analysis includes understanding users’ goals.

3.  Although the focus, methods, granularity, and presentation
of information may differ at different times, task analysis is
relevant at all stages of the design and development process.

4.  The practical reality is that task analysis for a given project
depends on many factors.

Task Analysis Is an Integral Part of a Broader Analysis

The first principle is that task analysis by itself is not enough
to give you the understanding that you need to design or evaluate
a product. The methodology you need brings together
information about three interwoven elements: users, tasks, and
environments.

Fig. 48.1 shows how all three of these analyses must come together
as you triangulate what you learn to gain the understanding
you need.

FIGURE 48.1. Triangulating information about users, tasks, and
users’ environments.

Task analysis is one corner of the triangle of understanding
that you need. The other two are

•  Users

-  Who are they?

-  What characteristics are relevant to what you are designing?

-  What do they know about the technology?

-  What do they know about the domain?

-  How motivated are they?

-  What mental models do they have of the activities your
product covers?

•  Users’ environments

-  Physical situation in which the work occurs

-  Technology available to the user (What you might need to
find out about could range from what modem speed users
have to how often power is interrupted to the cost of upgrading
equipment to finding an opportunity for your product
because the users’ environments favor low technology.)

-  Social, cultural, and linguistic considerations (What will
make the new product acceptable in the users’ world? How
will the new product change the users’ world? How will
you help the users make the transition from the old world
to the new world?)


Task Analysis Includes Understanding Users’ Goals

The  second  principle  is  that  a  task  is  what  someone  does  to 
achieve  a  goal. 

Considering Norman’s entire action cycle as task
analysis.  As  Norman  (1988)  explained,  “To  get  something 
done,  you  have  to  start  with  some  notion  of  what  is  wanted — 
the  goal  that  is  to  be  achieved”  (p.  46).  It  is  true,  as  Norman 
also  said,  that  we  cannot  always  articulate  our  goals  clearly.  How¬ 
ever,  in  general,  we  do  start  with  goals,  such  as 

•  Making  the  family  happy  by  getting  dinner  on  the  table 

•  Getting  a  draft  of  a  paper  to  a  coauthor  in  a  different  city 

Norman  (1988)  also  described  how  we  go  about  trying  to 
meet  our  goals,  or  how  we  act.  He  gives  us  the  seven-stage  cycle 
shown  in  Fig.  48.2. 

We can think of forming the intention as deciding what tasks
to do to meet a goal. We can think of specifying an action as
figuring out what steps and decisions to take to carry out the task.
Executing the action would then be actually doing the steps and
carrying out the decisions that we have decided are the way to
complete the task.

We can think of perceiving the state of the world as seeing
the results of the steps and decisions that we took and
interpreting the state of the world as relating the results to our
understanding of how to carry out the task. Evaluating the
outcome would then be judging whether we have successfully met
the original goal.

The Action Cycle

1.  Forming the goal

2.  Forming the intention

3.  Specifying an action

4.  Executing the action

5.  Perceiving the state of the world

6.  Interpreting the state of the world

7.  Evaluating the outcome


FIGURE 48.2. Donald Norman’s view of how people get things
done (1988, p. 48).

Task Analysis Is Relevant at All Stages of the Process

Our  third  principle  is  that  task  analysis  belongs  everywhere 
in  the  process  of  planning,  designing,  developing,  and  evalu¬ 
ating  a  product.  Task  analysis,  like  so  much  else  in  the  user- 
centered  design  process,  should  be  done  iteratively.  The  focus, 
methods,  granularity,  and  presentation  may  change  over  time  as 
different  questions  and  different  types  and  levels  of  information 
become  more  or  less  relevant  (for  a  detailed  discussion  of  task 
analysis at different stages, see Redish & Wixon, 2003).

Table  48.1  shows  the  types  of  questions  that  task  analysis 
might  help  answer  at  different  times  in  the  product  life  cycle. 

Practical  Reality  Impinges  on  What  We  Actually  Do 

In  the  fast-paced  world  of  software  and  web  design,  in  reality, 
what  we  can  do  for  a  task  analysis  (or  any  other  aspect  of  user- 
centered  design)  depends  on  many  factors.  These  factors  include: 

•  Time 

•  Resources 

•  People 

•  Availability of users to observe and talk with

•  Travel  restrictions 

This chapter is meant to help you consider and decide on the
best approaches for whatever situation you find yourself in.
First, we discuss ways to convince clients and others in your
company of the importance of usability techniques like task
analysis.


TABLE 48.1. Task Analysis Questions at Different Times

Each stage is followed by examples of questions that task analysis
should be used to answer.

•  Strategic Planning

-  Why would someone or some organization choose to use this
product?

-  What goals in their world would this product help to meet?

-  What benefits are most meaningful and valuable to users?

-  How will this product be perceived by different cultures?

•  Predesign

-  What are the alternatives currently available and
technologically possible that would address the why questions
listed above?

-  How do users achieve relevant goals today?

-  What works well and what does not?

-  How could our product make that easier?

•  Information Architecture

-  What do users know and what are their environments?

-  How do users organize their world?

-  What vocabulary do users use today for their goals and tasks?

-  How can we incorporate that vocabulary?

•  Concept Design

-  What metaphors are familiar to users?

•  Interface Design

-  What do users know about interface conventions?

-  How does the task flow of the new product match users’
expectations from their current work?

-  If we are changing users’ task flows, how can we build in help
for transitions?

•  Early Prototypes

-  What tasks should we provide to heuristic reviewers and
usability testers?

•  Development

-  What does the user know that would address the problems we
have uncovered?

-  What changes should we make to the interface and information
to better match users’ expectations and work?

•  Post-release

-  How well does this release match the user/business needs that
we uncovered in the original strategic planning phase?

-  Are users now better able to achieve their goals than they were
before they had this product?


SELLING  TASK  ANALYSIS 


In  the  field  of  usability  and  design,  usability  testing  has  become 
a  well-understood  part  of  the  user-centered  design  process. 
This  is  a  great  advance  in  product  development.  However, 
equally  important  upfront  activities,  such  as  task  analysis,  have 
not  yet  made  the  same  strides. 

Usability  testing  is  more  easily  justified  and  understood  be¬ 
cause  it  is  tied  directly  to  the  product  or  a  prototype  of  the 
product  and  it  often  produces  quantifiable  metrics.  In  addition, 
numerous  published  case  studies  demonstrate  the  value  of  us¬ 
ability  testing. 

Even  product  teams  that  are  very  receptive  to  usability  test¬ 
ing  may  be  resistant  to  task  analysis  and  other  requirements  ac¬ 
tivities.  You  may  need  to  be  prepared  with  evidence  of  the  mer¬ 
its  of  task  analysis  and  other  upfront  activities.  Selling  task 
analysis  is  critical;  it  does  not  matter  how  well  you  understand 
how  to  do  an  activity  if  you  are  unable  to  conduct  it. 

Dispelling the Myths That Lead to Resistance

Resistance  comes  primarily  from  a  lack  of  understanding.  Team 
members  may  respond  to  requests  to  do  task  analysis  with  state¬ 
ments  like  these: 

•  We  do  market  research;  we  do  not  need  task  analysis 

•  Task  analysis  is  too  time  consuming  and  costly 

We  do  market  research;  We  do  not  need  task  analy¬ 
sis.  A  product  team  may  say,  “Marketing  has  talked  to  the 
users  of  our  product.”  This  may  be  true,  but  the  goals  of  mar¬ 
ket  research  and  task  analysis  differ.  Marketing  focuses  on  busi¬ 
ness  decisions  and  forecasting.  They  want  to  understand  what 
will  make  the  product  sell.  User  research,  including  task  analy¬ 
sis,  focuses  on  understanding  how  the  product  will  be  used,  so 
that  the  product  can  be  designed  in  a  way  that  is  useful  and  easy 
to  use  for  the  user. 

Market  research  tends  to  focus  on  breadth,  while  task  analy¬ 
sis  focuses  on  depth.  For  example,  while  market  research  may 
tell  you  that  users  are  interested  in  a  particular  feature,  task 
analysis  tells  you  how  they  would  use  the  feature. 

It  is  also  important  to  note  that  market  research  may  not  al¬ 
ways  collect  data  from  true  end  users.  In  many  cases,  the  person 
who  buys  the  product  (marketing’s  target)  is  not  the  same  as 
the  person  who  uses  it  (task  analysis’  target).  A  good  example 
is  software  for  running  a  call  center.  The  CIO  or  senior  level 
management  drives  the  decision  as  to  what  call  center  software 
to  buy,  but  they  will  never  use  it. 

The  two  types  of  research  differ.  Both  are  equally  important 
and  have  their  place  in  product  development.  They  can  even 
be  very  complementary.  However,  their  goals,  and  thus  their 
methods,  differ. 

Task  analysis  is  too  time  consuming  and  costly.  Time 
and resources are two key constraints on product development.
Product  teams  are  often  hesitant  to  invest  in  activities  that  will  im¬ 
pact  time  and  resources  if  they  are  uncertain  of  their  true  value. 

However,  the  truth  is  that  the  return  on  investment  of  task 
analysis  is  high.  Task  analysis  may  cost  time  and  money  upfront, 
but  it  will  ultimately  save  time  and  money,  and  even  cause  your 
company  to  earn  more  money  in  the  end. 

Task  analysis  saves  money  during  design.  Understanding 
how  users  work  makes  the  design  stage  move  much  more 
quickly.  You  also  avoid  the  many  design  bottlenecks  that  come 
from  each  team  member  feeling  that  his  or  her  viewpoint  rep¬ 
resents  the  users’  viewpoints. 

In  addition,  task  analysis  saves  money  during  development. 
User  requirements  can  help  to  keep  feature  creep  in  check.  Re¬ 
moving  this  disruption  for  development  can  be  a  huge  time  and 
money  saver. 

Task  analysis  also  saves  money  during  usability  testing.  You 
will  find  that  your  product  tests  better  and  that  fewer  design  it¬ 
erations  are  required. 

Because  a  product  that  meets  users’  needs  decreases  sup¬ 
port  calls  and  training  costs,  task  analysis  saves  money  after 
sales.  This  is  a  benefit  for  your  company  and  the  companies  and 
individuals  who  buy  your  product. 


Task  analysis  can  increase  your  company’s  profits.  Customer 
satisfaction  is  high  when  products  are  easy  to  use.  Giving  users 
a  feature  that  they  cannot  use  has  no  value.  Task  analysis  can 
create  loyal  return  customers  who  learn  to  expect  a  satisfying 
experience  from  your  products. 

Furthermore,  not  all  task  analysis  is  time  and  resource  in¬ 
tensive.  People  sometimes  equate  task  analysis  with  true  ethno¬ 
graphic  research,  which  can  take  months  to  collect  and  analyze 
data.  However,  most  task  analysis  is  not  done  that  way  today. 
Task  analysis  methodologies  have  adapted  to  product  cycles  and 
to  the  desire  to  be  the  first  to  market.  Some  types  of  task  analy¬ 
sis  activities  can  be  planned,  carried  out,  and  analyzed  in  a  week. 
(We  discuss  these  rapid  methods  later  in  this  chapter.) 

Using  Other  Opportunities  to  Sell  Task  Analysis 

If  your  explanations  and  attempts  at  education  seem  to  fall  on 
deaf  ears,  do  not  despair.  You  will  have  plenty  of  opportunities 
to raise the topic again because task analysis is relevant throughout
design and development. For example, if your product team
is spinning its wheels and having difficulty making design
decisions, suggest task analysis again.

In  the  worst-case  scenario,  where  you  just  cannot  convince 
the  team  to  incorporate  task  analysis,  they  may  learn  the  les¬ 
son  the  hard  way — in  the  usability  lab.  A  product  that  was  de¬ 
veloped  without  the  benefits  of  user  research  will  often  fail  in 
usability  testing.  The  team  may  realize  at  this  point  that  task 
analysis  might  have  prevented  the  failure,  and  they  will  likely 
reconsider  next  time. 

Making  Task  Analysis  Part  of  the  Standard 
Development  Process 

Ideally,  you  want  to  make  task  analysis  a  part  of  your  standard 
product  development  process.  There  are  many  ways  to  do  this. 

•  Once  you  have  some  data,  spread  the  word. 

•  Advertise  your  success  stories. 

•  Hold  brown  bags  to  demonstrate  the  success  of  your 
research. 

•  Post posters of your results in hallways where those
passing by can see your work.

•  Identify  leaders  in  your  organization  and  make  them  aware  of 
the  impacts  of  task  analysis.  They  can  provide  top-down  in¬ 
fluence  to  get  other  teams  onboard.  Soon  teams  will  be  com¬ 
ing  to  you  looking  for  your  support. 


PLANNING  FOR  A  TASK  ANALYSIS 
(ISSUES  TO  CONSIDER) 


Background  Research 

If  there  is  an  existing  product,  learn  as  much  as  you  can  about 
that  product  before  you  begin  your  task  analysis.  One  of  the 
best ways to do this is to use the product. You may not be a true
end user, but becoming familiar with the user interface and the
process is beneficial. When you are observing users, you will
have a level of familiarity with the product, which can allow the
session to run with fewer interruptions.

Server  log  files  can  be  a  very  valuable  initial  source  of  task 
information.  They  are  a  record  of  file  requests  to  a  server  and 
the  server  responses  to  those  requests.  Information  from  these 
logs  can  help  determine  what  you  want  to  observe  in  your  task 
analysis.  From  these  records,  you  can  learn  about 

•  Visitors: Who is coming to your site, and when do they return?

•  Click paths: Where do users enter your website? How are
they navigating from page to page? What is the sequence of
the pages visited?

•  Time per page: Where was time spent across the pages visited?
Where are users spending the most (or least) of their time?

•  Exit point: Where are users leaving the website?

•  Actions completed: Are users completing tasks (e.g., making a
purchase or downloading software)?
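
As a rough illustration of how the questions above might be answered from log data, the following Python sketch derives click paths, entry and exit pages, and approximate time per page. It assumes a simplified, hypothetical record format of (visitor identifier, timestamp, page); real server logs vary by server and would first need to be parsed and grouped into visits. The sample records and the `summarize` function are invented for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, simplified log records: (visitor_id, ISO timestamp, page).
# Real server logs need their own parsing and visit grouping first.
SAMPLE_LOG = [
    ("v1", "2024-01-01T10:00:00", "/home"),
    ("v1", "2024-01-01T10:00:30", "/products"),
    ("v1", "2024-01-01T10:02:30", "/checkout"),
    ("v2", "2024-01-01T11:00:00", "/products"),
    ("v2", "2024-01-01T11:00:10", "/home"),
]


def summarize(records):
    """Return click paths, entry counts, exit counts, and seconds per page."""
    visits = defaultdict(list)
    for visitor, stamp, page in records:
        visits[visitor].append((datetime.fromisoformat(stamp), page))

    paths, entries, exits = {}, Counter(), Counter()
    seconds_on_page = Counter()
    for visitor, hits in visits.items():
        hits.sort()  # order each visitor's requests by time
        pages = [page for _, page in hits]
        paths[visitor] = pages
        entries[pages[0]] += 1
        exits[pages[-1]] += 1
        # Time on a page is estimated as the gap until the next request;
        # the last page of a visit has no following request, so it is skipped.
        for (start, page), (end, _) in zip(hits, hits[1:]):
            seconds_on_page[page] += (end - start).total_seconds()
    return paths, entries, exits, seconds_on_page
```

Calling `summarize(SAMPLE_LOG)` gives one click path per visitor plus counters that can be sorted to find the most common entry pages, exit pages, and the pages where time accumulates. Such summaries only suggest where to look; they do not explain why users behave as they do, which is what the observations in the task analysis are for.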

Information can also be gathered from a variety of other
sources. These include the product team, marketing, customer
support cases, early adopter feedback, and competitor products (for
a detailed discussion, see Courage & Baxter, 2004, chapter 2).


Getting  Into  the  Project  Plan 

A critical aspect of being able to do task analysis throughout a
project is getting this and all other usability activities into the
project plan. The extent to which this can happen depends, of
course, on buy-in from the project team, but also on other
factors, such as whether a project plan exists and in how much
detail it is specified. The more strongly a project team relies on a
formal project plan, the more critical it is to get usability activities,
such as task analysis, into the plan. Time, resources, and respect
from managers and developers for the information may
depend on being part of the formal plan.

Another  approach  that  many  usability  specialists  follow  is  to 
create  a  usability  project  plan  that  parallels  the  system  design 
and  development  project  plan.  That  is  fine  if  the  system  people 
understand  and  respect  the  parallelism  of  the  plans. 


Getting  Sign-off  and  Meeting  Schedules 

Whether  the  usability  plan  is  part  of  the  overall  project  plan  or 
of  a  parallel  track,  it  is  important  to  get  sign-off  from  the  rest  of 
the  design  and  development  team  with  respect  to 

•  Activities  that  the  usability  team  will  do 

•  Resources  needed  for  those  activities 

•  Information  that  will  be  brought  back  from  those  activities 

•  Deliverables  (formal  or  informal)  that  will  come  from  that 
information 

•  Dates  for  those  deliverables 


As  usability  specialists,  we  must  acknowledge  that,  at  least 
in  the  United  States,  most  software  and  web  development  proj¬ 
ects  are  schedule-driven.  Task  analysis  and  other  usability  activ¬ 
ities  must  deliver  information  in  a  timely  fashion  at  the  right  mo¬ 
ments  in  the  project  schedule. 

Getting  into  the  project  plan  early  can  help  set  schedules  that 
allow  time  and  resources  to  do  task  analysis  and  other  usability 
activities.  Another  way  that  some  usability  specialists  accomplish 
this  is  to  take  responsibility  for  elements  of  the  project  plan,  such 
as  the  user  interface  specification,  and  use  that  as  the  way  to  in¬ 
tegrate  usability  data  into  product  design  and  development.  A 
project plan is an excellent tool, but we are aware that project
schedules  often  change.  You  will  be  dependent  on  others  for 
your  activities,  so  it  is  important  to  track  the  schedule  of  activities 
that  you  may  not  be  a  part  of,  but  that  may  impact  your  schedule. 
Flexibility  in  your  project  plan  is  critical.  In  some  cases,  you  may 
find  that  you  have  two  extra  weeks,  but  more  likely,  you  will  find 
that  you  have  lost  time  and  you  need  to  adjust  your  usability  plan. 

Providing  Useful  and  Usable  Data 

As  usability  specialists,  we  help  project  teams  only  if  we  pro¬ 
vide  useful  and  usable  data  when  they  need  it.  Therefore,  as 
noted,  staying  closely  aligned  with  a  project  plan  and  schedule 
is  critical. 

What  else,  besides  timeliness,  makes  data  useful  and  usable? 

•  Data  the  project  team  needs.  As  usability  specialists,  we 
should  approach  any  project  looking  for  where  the  team 
needs  data  about  users  (whether  they  know  it  or  not)  and 
plan  to  collect,  interpret,  and  present  that  data  in  a  way  that 
the  team  can  understand  and  directly  use.  Often,  the  chal¬ 
lenge  is  getting  the  team  to  allow  you  to  collect  the  data,  but 
in  other  instances,  they  may  want  to  collect  too  much  data. 
They  may  have  questions  that  are  much  too  specific  given  the 
stage  of  development.  This  is  fine  if  the  data  can  be  used  later, 
but  if  there  is  no  rationale  for  how  the  data  will  impact  the 
product  (now  or  later),  it  is  best  to  invest  your  time  elsewhere. 

•  Data  that  is  credible.  As  we  have  said,  time  pressures  and  lim¬ 
ited  resources  often  curtail  the  extent  of  any  usability  activity, 
including  task  analysis.  In  general,  we  follow  the  maxim  that 
some  data  about  users  and  their  work  is  better  than  no  data. 
Practical  experience  tells  us  that  is  true — as  long  as  the  data 
is  accurate  and  representative.  Limited  data  can  still  be  good 
data.  Good  data  is  collected  in  a  systematic,  careful,  rigorous 
way  from  appropriate  users.  To  do  this  successfully,  you  must 
become  familiar  with  the  product  and  domain.  This  is  critical 
for  credible  data  collection.  Domain  familiarity  will  allow  you 
to  identify  the  appropriate  type  of  task  analysis  and  to  formu¬ 
late  the  most  appropriate  questions  and  observations.  The 
team  will  also  take  you  more  seriously  and  view  you  as  a  true 
member  of  the  team  if  you  understand  the  domain. 

Deciding  What  the  Project  Team  Needs 

The  task  analysis  that  you  want  to  do  will  depend  in  part  on  the 
type  of  product  or  website  that  you  are  working  on.  Consider 
these six factors as you think about the project for which you are
planning  task  analyses  and  other  usability  activities. 

Where  is  the  product  in  its  overall  life  cycle?  For  ex¬ 
ample,  is  the  team 

•  Upgrading  an  existing  product  without  changing  the  medium 
(a  new  software  release  or  a  revision  of  a  website)? 

•  Changing  business  processes  or  medium  (going  from  a  legacy — 
DOS-based  or  green-screen  product — to  a  GUI  or  to  the  web)? 

•  Developing  something  totally  new? 

How  much  time  do  you  have  to  conduct  your  task 
analysis? 

• Has the team always been on board with task analysis? Did
they defer to you to determine the appropriate activity and
timeline?

•  Has  the  team  suddenly  discovered  that  they  need  some  data 
quickly and they have not allotted time for it in their schedule?

• Does the team believe that there is no time in the schedule for
task  analysis? 

How  broad  or  specialized  is  the  user  population  for 
the  product?  Is  the  product  for  a 

•  Very  broad  public  market? 

•  Niche  business  market  where  you  can  easily  define  the  user 
population  and  access  to  them  is  through  account  executives 
(or  similar)? 

•  Special  audience  (e.g.,  children,  the  elderly,  or  persons  with 
disabilities)? 

How widespread geographically, culturally, and linguistically
is the user population for the product? Is the product

•  Global?  How  far?  How  many  countries?  Cultures?  Languages? 

•  Local?  (Some  products  are  still  used  only  in  a  particular  coun¬ 
try — but  even  within  one  country,  you  are  likely  to  find  dif¬ 
ferences  in  culture  and  vocabulary.) 

In how much detail must we specify the tasks? Is the product

•  Safety-critical,  where  tasks  are  very  specific  and  must  be  done  in 
specified  ways?  Do  users  receive  training  until  they  prove  their 
competence  in  completing  the  tasks  accurately  and  efficiently? 

•  Used  by  many  different  types  of  people  for  different  tasks, 
which  they  may  do  in  different  ways? 

Is  this  a  special  type  of  product  for  which  traditional 
task analysis may not be useful? Some applications do
not fit the traditional approach for task analysis. These are primarily
applications that people use for fun, such as games.

Traditional  task  analysis  would  not  aid  much  in  the  design 
and  development  of  these  applications. 

As a result, game designers do not typically do task analysis.
They are more interested in mood, theme, story, drama, progression,
surprise, pacing, and the physical correlates of these
experiences. The conceptual space in which they operate and
their ways of thinking about how to create a fun experience are
fundamentally different from, perhaps even antithetical to, the
analytic approach that is typical of task analysis.

Deciding  on  an  Appropriate  Level  of  Granularity 

Another  aspect  to  consider  as  you  plan  a  task  analysis  is  the 
types  of  analysis  to  do.  Understanding  users’  goals  and  their 
work  can  be  done  at  several  different  levels.  You  might  be  in¬ 
terested  in  one  or  more  of  these  types  of  analysis: 

• Analysis of a person’s typical day or week (“a day in the
life of” or “an evening at home with”): This is probably most
needed for the early stages of strategic planning and predesign.

• Job analysis (all the goals and tasks that someone does in a
specific role, whether daily, monthly, or over longer periods): Again,
this is probably most needed for the early stages of strategic
planning and predesign.

• Workflow analysis (process analysis, cross-user analysis,
how work moves from person to person): This may be useful
in strategic planning and predesign, especially if you have
ways of improving workflow among users. It is also needed in
concept design and interface design.

• High-level task analysis (the work needed to accomplish a
large goal broken down into subgoals and major tasks): This
is needed in information architecture, concept design, and
interface design.

• Procedural analysis (the specific steps and decisions the
user takes to accomplish a task): This is needed at the
interface design stage and beyond.

Deciding  Where  to  Start 

You  must  first  understand  how  far  along  in  the  process  the  proj¬ 
ect  is.  Unfortunately,  by  the  time  usability  specialists  know  about 
the  project,  the  strategic  planning  and  predesign  stages  may  al¬ 
ready  be  considered  closed.  The  project  may  be  at  consider¬ 
able  risk  if  the  strategic  planning  and  predesign  questions  were 
never answered or were answered based on speculation or internal
discussions without users. However, it may be unproductive
to spend time and effort collecting data that speaks to those
questions  if  no  one  is  willing  to  listen  to  the  answers  you  bring 
back.  If  that  is  the  case,  a  more  productive  use  of  the  limited 
time  and  resources  for  task  analysis  and  other  usability  activi¬ 
ties  would  be  to  understand  where  the  project  is  and  how  to 
influence  it  from  that  point  forward.  In  some  cases,  the  team  re¬ 
alizes  (often  with  your  help)  that  they  do  need  to  take  a  step 
back  to  the  earlier  stages  and  collect  user  data  in  order  to  pre¬ 
vent  the  product  from  going  down  the  wrong  path. 

Gathering  Reusable  Data 

Time,  resources,  and  costs  are  likely  to  limit  the  number  of 
times  you  can  return  to  users  for  task  analysis.  In  addition,  users 
do not think in terms of the information needs of a project team
or the different stages of design and development of products
they  might  use  in  the  future. 

One good approach might be to collect extensive data about
users’ work in a relatively holistic way, such as capturing that
data  on  video  or  audio  or  in  extensive  notes  so  that  you  can  re¬ 
turn  to  the  data — rather  than  to  the  users — with  different  ques¬ 
tions  in  mind  at  different  times. 

To do this, an open-ended field study method combined
with detailed information gathering is best (Wixon, 1995). Also,
having  a  relatively  detailed  log  of  the  raw  data  is  necessary 
(videotape,  audio  tape,  or  verbatim  transcripts). 

Going  to  Different  Users  at  Different  Times 

Although  the  number  of  times  you  may  go  out  to  users  is  likely 
to  be  limited,  we  have  also  said  that  you  are  likely  to  want  to  do 
site  visits  at  different  times  for  different  questions.  Even  if  you 
gather  extensive  data  holistically  on  early  site  visits  and  return  to 
the  data,  you  may  not  have  the  information  that  you  need.  In 
that  case,  go  out  again,  and  go  to  different  users. 

You  can  use  each  set  of  site  visits  not  only  to  answer  the 
specific  immediate  issues  and  questions  but  also  to  enrich  the 
team’s  general  understanding  of  users,  their  work,  and  their 
environments.  Although  your  immediate  focus  may  be  a  specific 
why  or  what  or  how  question,  always  drill  down  so  that  you  are, 
in  fact,  seeing  the  why  behind  the  what  or  the  what  and  how 
behind  the  why. 


COLLECTING  TASK  ANALYSIS  DATA 


While  traditional  task  analysis  focused  on  time  and  motion  stud¬ 
ies,  modern  task  analysis  relies  more  on  ethnography  and  cog¬ 
nitive  psychology,  and  that  is  the  focus  we  take  in  this  chapter. 
(For a more detailed discussion of task analysis roots in ethnography
and cognitive psychology, see Redish & Wixon, 2003.)

Preparing  Philosophically  to  Work  with  Users 

From  ethnography  and  cognitive  psychology,  we  also  take  our 
ideas  of  how  to  work  with  users  when  we  are  doing  a  task  analy¬ 
sis.  Anyone  who  works  with  users  to  gather  data  for  a  task  analy¬ 
sis  should  abide  by  these  five  principles  (inspired  by  and  adapted 
from  the  work  of  Wixon,  Holtzblatt,  &  Knox,  1990;  Whiteside, 
Bennett, & Holtzblatt, 1988; Wixon & Jones, 1996; Wixon &
Ramey, 1996; Holtzblatt & Beyer, 1993; Beyer & Holtzblatt, 1995).

Treat  the  user  with  respect.  You  are  there  to  under¬ 
stand  who  users  are,  what  they  do,  how  they  do  it,  and  what 
they  value.  No  matter  how  differently  they  do  their  work  from 
what  you  expected,  you  must  respect  them,  their  cultures,  their 
actions, and their decisions. Inform users that they are the experts
and that you are there to learn from them. In a working environment,
you must emphasize to the users that you are there
to learn how they work so that you can make their jobs easier.
Users may assume that you are there to evaluate them or to develop
a product that will replace their jobs. You must address
concerns like these up front so that the experience is comfortable,
rather than threatening, to the users.

Understand  that  you  are  a  visitor  (guest)  in  the 
users’  environment.  You  are  there  to  learn  about  their  re¬ 
alities,  not  to  bring  your  reality  to  them. 

Be  in  learning  mode.  You  are  not  there  to  train,  teach, 
demonstrate,  or  sell  a  new  product.  This  part  is  often  difficult 
for  engineers,  training  specialists,  marketing  specialists,  and  oth¬ 
ers  when  they  first  start  doing  site  visits.  They  are  used  to  go¬ 
ing  to  customers  for  these  other  reasons.  Before  going  out  on 
a  task-analysis  site  visit,  they  must  understand  the  difference  in 
the  reasons  for  the  visit,  and  they  must  become  comfortable 
with  the  appropriate  frame  of  mind  and  behavior. 

Let  the  user  be  a  mentor.  If  you  are  there  to  learn,  you 
must  watch  and  listen  more  than  you  act  or  talk.  Especially  in 
work  contexts,  one  way  to  do  this  is  to  suggest  to  the  user  a  re¬ 
lationship  of  mentor  and  trainee — user  as  mentor,  site  visit  ob¬ 
server  as  trainee. 

Users  are  often  unsure  of  how  to  deal  with  the  strangers  in 
their  midst.  They,  too,  have  more  often  been  visited  by  people 
training  them,  demonstrating  something  to  them,  or  market¬ 
ing  something  to  them.  Helping  them  to  get  into  an  appropriate 
relationship  is  useful,  and  most  users  are  happy  to  become  the 
trainer  or  demonstrator. 

Be  a  partner  in  the  task  of  understanding  the  user’s 
world.  Use  a  site  visit  to  test  out  your  assumptions.  Clarify 
your  understanding  of  what  the  user  is  doing  and  saying  as  you 
go  along.  Capture  the  users’  words.  Do  not  translate  as  you 
take  notes. 

Selecting  Users  and  Environments 

Companies  have  sometimes  commissioned  large-scale  task 
analyses  as  part  of  their  long-term  strategic  planning;  that  is, 
outside of the development cycle of a particular product or version
(e.g., Redish & James, 1996; Dray & Mrazek, 1996). That is
terrific  when  it  happens. 

However,  most  task  analyses  today  are  small-scale  studies, 
especially  when  they  are  part  of  a  product-oriented  project.  The 
rapid  pace  of  design  and  development  often  does  not  leave  time 
for  site  visits  to  more  than  a  few  places.  The  scale  is  similar  to 
that  used  in  iterative  usability  testing — six  to  eight  users  per 
study.  As  with  usability  testing,  it  is  better  to  do  a  few  site  visits 
each  time  for  different  purposes  at  different  times  in  the  project 
than  to  do  many  up  front  and  then  not  have  resources  to  go 
out  for  other  reasons  later.  Also,  as  with  iterative  usability  test¬ 
ing,  even  though  each  study  is  small,  over  the  course  of  the 
project,  you  may  see  many  different  sites  and  users  (for  details 
on  how  to  recruit  the  right  users,  see  Courage  &  Baxter,  2004). 

Make  a  convenience  sample  as  representative  as 
possible. A task analysis study usually uses a convenience
sample. However, care should be taken to make the sample as
representative  as  possible  within  the  constraints  of  time  and  re¬ 
sources  (see  Hackos  &  Redish,  1998,  especially  chapters  2  and 
7  on  defining  your  users  and  then  selecting  appropriate  ones  for 
a  user  and  task  analysis). 

Because  it  is  almost  impossible  to  find  one  user  who  truly 
represents  the  entire  spectrum  of  users,  spending  some  time 
with  each  of  a  few  users  is  better  than  spending  all  the  allotted 
time  with  just  one  user.  Characteristics  on  which  you  might 
want  to  base  representation  include 

•  Size  of  the  user’s  company  (which  may  affect  task  specializa¬ 
tion  among  users,  amount  of  support  available  to  users,  tech¬ 
nology  available  to  users,  etc.) 

•  Experience  in  the  domain  and  with  the  medium  being  con¬ 
templated  for  the  product 

•  Gender,  age,  and  background  (which  may  correlate  with  mo¬ 
tivation  to  learn  new  ways  of  working  and  with  interests  and 
values  related  to  technology) 

•  Location  and  culture  if  this  is  a  product  for  a  global  market 

Do  not  limit  task  analysis  to  development  partners. 

Many  companies  today  have  development  partners  who  agree 
to  work  with  them  throughout  the  project.  Development  part¬ 
ners  are  usually  major  customers  who  get  to  influence  the  new 
product  in  exchange  for  allowing  their  people  to  participate  in 
customer  activities  and  beta  tests. 

Working  with  development  partners  is  a  great  idea.  However, 
in  many  cases,  development  partners  are  at  the  tail  end  of  the 
distributions for both size and sophistication within the company’s
market. What works for them may not work well for the
company’s  many  other  customers. 

If  the  project  you  are  working  on  has  development  partners, 
but  the  company  also  wants  the  product  to  be  used  in  many 
smaller  and  less  sophisticated  places,  push  to  include  others  in 
usability  activities  like  user  and  task  analysis.  In  addition  to  mak¬ 
ing  project  managers  realize  how  unrepresentative  users  in  the 
development  partners  may  be,  you  can  also  often  make  a  cost- 
benefit  argument  based  on  plans  for  implementing  the  prod¬ 
uct  in  different  environments. 


Conducting  Site  Visits 

What  happens  when  you  actually  meet  the  users?  How  should 
you  act  when  working  with  users? 

The  two  main  techniques  for  collecting  task  analysis  data  are 
observations  and  interviews.  In  a  typical  site  visit,  you  combine 
both;  and  when  possible,  you  do  them  together — conversing 
with  the  user  about  the  work  as  you  observe  and  listen  to  the 
user  doing  the  work. 

In  certain  situations,  you  cannot  converse  with  the  user 
while  the  user  is  working.  This  may  happen  when 

•  The  situation  is  safety-critical  (e.g.,  with  air  traffic  controllers; 
Means,  1993) 

•  The  user  is  interacting  with  a  client  (e.g.,  in  some  situations 
with  retail  salespeople  or  travel  agents;  Redish  &  James,  1996) 


•  The  users  do  not  want  to  be  interrupted  in  their  work  (e.g., 
with  radiologists;  Ramey,  Rowberg,  &  Robinson,  1996) 

In  those  cases,  you  may  be  able  to  talk  about  the  work  as 
soon  as  the  user  finishes  the  task  (immediate  recall)  or  at  a  later 
time  using  information  from  videotape  or  other  artifacts  to  stim¬ 
ulate  recall  (cued  recall).  In  situations  that  depend  on  the  user 
interacting  with  a  customer,  you  might  have  one  member  of  the 
site-visit  team  act  as  customer,  thus  having  the  user  do  a  real 
task  but  giving  you  the  opportunity  to  talk  with  the  user  during 
the  task  (for  these  and  other  techniques  for  task  analysis,  see 
Courage  &  Baxter,  2004;  for  specifics  on  observing  and  inter¬ 
viewing  during  site  visits,  see  Hackos  &  Redish,  1998). 

Observing users. What you take notes on and the level of
detail of your notes depend on the stage the project is at and
therefore  the  questions  and  issues  you  are  addressing  in  this 
particular  study. 

In  each  site  visit,  you  may  want  to  spend  all  of  your  time  with 
one  user,  the  person  responsible  for  the  tasks  you  are  investi¬ 
gating.  You  may  want  to  spend  time  with  different  users  at  the 
same  site  if  several  people  at  the  site  do  the  same  tasks,  espe¬ 
cially  if  they  have  different  levels  of  domain  knowledge,  product 
experience,  or  technical  skill. 

You  may  also  want  to  spend  time  with  different  users  if  you 
are interested in workflow analysis or process analysis (how work
flows  across  users  who  do  different  parts  of  a  process).  You  may 
want  to  spend  time  with  the  people  who  do  each  part  of  the 
process.  If  you  have  already  watched  them  individually  and 
drawn  flowcharts  of  what  you  see  as  their  processes,  you  may 
want  to  bring  them  together  in  a  conference  room  to  go  over 
your  process  flow  and  verify  it,  discuss  it,  or  ask  questions  you 
have  about  it. 

If  you  want  to  get  details  of  steps  and  decisions  to  flowchart 
a  procedure,  you  may  want  to  slow  the  users  down  and  discuss 
each  step  and  decision,  asking  questions  about  other  situations 
and  how  they  might  change  the  steps  or  decisions  and  about 
the  frequency  of  different  situations. 

In  all  cases,  as  you  observe  and  listen,  you  must  remember 
the  five  points  we  made  earlier  in  the  section  on  preparing 
philosophically  to  work  with  users  as  well  as  these  four: 

1.  Be  friendly  but  neutral. 

Be  aware  of  how  you  give  messages  with  your  body  as  well  as 
with  your  words. 

2.  Be  aware  of  assumptions  that  you  brought  with  you. 

As  you  watch  and  listen,  you  are  verifying  those  assumptions 
or  changing  your  understanding  of  the  user’s  reality. 

3.  Ask  questions  to  clarify. 

Restate  what  you  think  you  heard.  Your  goal  must  be  to  bring 
back  the  user’s  understanding  of  the  work.  It  is  very  easy  to 
think  that  you  understand  when  you  are  filtering  what  you 
see  and  hear  through  your  own  view  of  the  work. 

4.  Note  the  user’s  words. 

One  goal  of  a  task  analysis  is  to  understand  the  user’s  vocab¬ 
ulary  for  objects  and  actions.  Again,  it  is  very  easy  to  translate 
what  you  hear  into  the  development  team’s  words  without 
realizing  that  you  are  doing  it — if  you  are  not  attuned  to  the 
need  to  capture  the  user’s  words. 

Interviewing users. Most of the talking in a task analysis
should be a conversation with the user about the work during
the work. However, you may want to ask questions that are
more  general. 

In  general,  interviewing  after  you  have  observed  is  better 
than  before.  You  will  have  the  context  of  what  you  saw  and  heard 
as  a  frame  for  the  other  questions.  You  will  likely  have  a  set  of 
predetermined  questions  to  ask  the  users,  as  well  as  questions 
that  arose  while  in  the  environment.  Having  some  preplanned 
questions  is  a  good  idea,  as  that  allows  you  to  have  some  con¬ 
sistent  data  across  all  of  your  participants.  This  is  especially  help¬ 
ful  if  there  are  multiple  researchers  conducting  the  study. 

The  three  most  important  guidelines  for  successful  inter¬ 
viewing  are: 

1.  Ask  questions  in  a  neutral  manner. 

Do  not  lead  the  user.  Allow  the  user  to  show  you  their  work 
and  draw  your  questions  from  what  they  show  you.  You  may 
also  give  them  some  broad  areas  of  inquiry  or  focus  (Wixon, 
Holtzblatt,  et  al.,  1990).  Try  to  put  the  questions  in  a  behavioral 
context  rather  than  as  a  simple  matter  of  like  or  dislike.  If  you 
say,  “We  are  thinking  of  adding  a  feature  to  the  product.  Would 
you  like  that?”  you  are  suggesting  that  the  user  say,  “Yes,  I’d  like 
that.”  If  you  say,  instead,  “If  the  product  had  a  feature,  would 
you  use  it?”  and  follow  that  up  with  “When  and  how  would 
you  use  it?”  you  are  likely  to  get  a  more  informative  answer. 

2.  Keep  your  questions  clear  and  concise. 

Avoid  lengthy  multipart  questions  or  questions  with  unclear 
language  or  with  your  company  or  product’s  jargon.  If  users 
do not understand exactly what you are asking, they may misinterpret
your question and respond inappropriately.
They may feel awkward and embarrassed because they
are unsure what you are looking for (for more details about
question writing, see Courage & Baxter, 2004, pp. 262-270).

3.  Listen  far  more  than  you  speak. 

You  can  keep  a  conversation  going  with  prompts  and  probes 
that  send  the  message  that  you  are  listening  and  want  the 
user  to  talk  more  (for  more  on  interviewing  and  listening 
skills  for  user  and  task  analysis,  see  Hackos  &  Redish,  1998, 
pp.  279-291). 

Whenever  possible,  make  an  interview  behavioral  rather 
than  attitudinal.  For  example,  if  you  have  brought  along  proto¬ 
type  screens  of  the  new  product  to  show  as  part  of  your  site 
visit,  do  not  just  show  them  and  ask  what  users  think.  Make  that 
into  a  miniature  usability  test,  even  if  it  is  in  a  conference  room 
and  not  at  the  user’s  desk.  Ask  the  user  to  walk  through  the 
screens  doing  a  realistic  scenario. 

With  interviews,  as  with  observations,  clarify  what  you  are 
hearing.  Restate  so  that  you  know  you  are  getting  the  user’s  un¬ 
derstanding.  Try  not  to  filter  through  your  preconceived  no¬ 
tions.  Do  not  translate  the  user’s  words  into  your  company  or 
product’s  jargon. 

Doing  Rapid  Task  Analysis 

As we have said, sometimes the team needed the data two
weeks ago; sometimes, they just do not have the time or resources
for a multiweek task analysis. Several rapid data collection
methods have emerged for these situations. We discuss three
here:  narrowing  the  scope,  using  multiple  teams,  and  schedul¬ 
ing  your  sessions  around  activity  peaks. 

Note  that  rapid  task  analysis  is  not  right  for  all  situations.  If 
your  team  is  designing  or  redesigning  a  very  complex  system, 
there  may  be  so  much  to  understand  that  even  a  condensed 
method  will  be  time  consuming. 

Narrow  the  scope.  One  way  to  do  rapid  task  analysis  is  to 
reduce  the  scope  of  your  study.  Instead  of  trying  to  understand 
everything  about  the  users  and  all  of  their  tasks,  focus  on  col¬ 
lecting  data  that  you  will  be  able  to  use  immediately  to  make  de¬ 
cisions.  Narrowing  the  scope  is  an  obvious  choice  when  you 
have  specific  issues  or  questions  that  need  to  be  resolved.  Do 
not  focus  on  the  big  picture;  focus  on  those  specific  issues. 

Breadth  is  great  if  you  can  afford  it,  but  data  analysis  will  be 
costly  simply  because  you  have  more  of  it.  Searching  for  gems 
within pages of notes and video may be too time consuming. Instead,
focus on actionable data, which results from a defined
research  question. 

Use multiple teams. Another way to do rapid task
analysis  is  to  have  multiple  teams  collect  the  data  to  shorten 
the  amount  of  time  required.  This  is  a  great  way  to  engage  the 
entire  project  team.  It  also  may  be  necessary  in  situations  where 
your  study  is  international  and  involves  different  languages  and 
cultures. 

Engaging  the  entire  team  can  have  a  powerful  impact.  Other 
team  members  get  to  see  the  users’  environments  firsthand,  and 
they  then  often  feel  ownership  in  the  study.  In  addition,  multiple 
views  of  the  same  task  can  provide  a  rich  data  set  and  better 
understanding  of  what  is  happening  in  the  users’  environments. 

Involving multiple data collectors is ideal if you are working
with a team that understands task analysis. You may have conducted
an activity for them in the past where they were very receptive
to the data. Familiarity with task analysis is very helpful
in fielding multiple data collection teams.

If  you  take  the  team  data-collection  approach,  you  must  train 
the  observers.  You  need  to  make  sure  that  everyone  is  on  the 
same  page  about  the  types  of  data  that  will  be  observed  and  that 
will  be  collected.  You  want  to  do  your  best  to  make  sure  that 
data  is  collected  in  a  consistent  manner.  Templates  are  ideal  in 
this  situation.  We  recommend  these  templates: 

•  A  protocol  that  gives  the  procedures  and  steps  that  every¬ 
one  is  to  follow  when  with  the  users.  A  protocol  helps  to  en¬ 
sure  that  all  the  sessions  are  conducted  in  a  consistent  manner. 
A  useful  protocol  includes  information  on  how  to  introduce 
the  session  to  the  participant,  how  long  each  part  of  the  ses¬ 
sion  should  last,  how  to  end  the  session,  and  so  on. 

•  A  list  of  questions  to  ask  either  during  or  after  the  session. 

•  An  observation  guide  that  includes  the  issues  on  which  ob¬ 
servers  should  focus.  This  helps  guide  the  observations  to  en¬ 
sure  that  everyone  is  observing  the  same  types  of  activities. 
It  is  not  a  specific  list  of  questions  to  ask  the  user. 

•  A  template  for  recording  data.  This  may  be  in  spreadsheet  or 
document  format.  A  template  for  recording  data  helps  cue 
observers as to what behaviors they should be observing and
provides  a  consistent  data  collection  method.  It  makes  ana¬ 
lyzing  data  much  easier. 

•  A  template  for  reporting  on  the  session.  This  is  a  standard 
worksheet  that  each  observer  completes  at  the  end  of  each 
session  to  highlight  some  of  the  key  observations  or  discov¬ 
eries  made.  It  can  be  very  useful  when  the  team  first  regroups 
to  discuss  the  data.  In  some  cases,  teams  create  templates 
based  on  the  form  of  analysis  required  by  the  development 
process. For example, at DEC, where Quality Function Deployment
(QFD) was extensively used, teams would prepare a
QFD  template  and  fill  in  the  template  from  daily  debriefs.  The 
result  was  an  extensive  analysis  in  a  useful  form  on  the  day 
they  returned  from  an  extensive  visit. 

If  the  data  is  being  collected  over  a  number  of  days,  check 
in  with  your  team  at  the  end  of  each  day  to  talk  about  key  ob¬ 
servations.  This  can  be  in  person  or  over  the  phone.  It  is  great  to 
discuss these things when the data is fresh. A daily check-in also
gives  you  an  opportunity  to  resolve  any  procedural  issues  that 
observers  may  be  having. 

Schedule your sessions around activity peaks. If
you schedule your visits during times of peak activity, you may
observe  more  activity  during  a  shorter  window  of  time.  While 
this  technique  is  useful  in  any  task  analysis,  it  may  be  particularly 
helpful  if  you  are  doing  rapid  task  analysis. 

Our  general  recommendation  that  a  team  of  two  go  to  make 
observations  may  be  critical  if  you  plan  to  go  during  a  time  of 
peak  activity.  In  a  time  of  peak  activity,  it  is  especially  difficult 
for  a  single  person  to  keep  up  with  the  potentially  rapid  pace.  In 
our  two-person  teams,  one  observer  focuses  on  asking  ques¬ 
tions,  while  the  other  takes  notes. 

Be  aware,  however,  that  many  users  may  not  want  you  to 
come  at  times  of  peak  activity  precisely  because  they  are  so  busy 
and  stressed.  They  may  feel  that  they  cannot  cope  with  any  dis¬ 
tractions  or  additions  to  the  environment.  If  seeing  times  of 
peak  activity  is  crucial  to  your  issues,  you  may  need  to  convince 
users  that  you  will  observe  unobtrusively  and  not  get  in  the  way. 
You may need to forgo some of the questions and interview
time  that  you  would  like  to  have. 


Bringing  Users  to  You  (Field  Studies  in  the  Lab) 

The  best  task  analysis,  of  course,  is  done  in  the  users’  context. 
However,  time  constraints,  travel  restrictions,  or  security  re¬ 
strictions  may  make  it  impossible  for  you  to  go  to  the  users. 
You  can  still  do  task  analysis.  Have  the  users  bring  their  tasks 
to  you. 

Although you will not get a true environmental analysis (seeing
the users work in their own settings), you can still get a lot
of data about users and tasks. When you do task analysis in the
lab, make sure you ask the users to bring artifacts relevant to the
task being discussed, such as objects participants use or objects
resulting from the tasks (Butler, 1996). Artifacts could be a
calendar, a procedure manual, a physical form, a computer-generated
report, and so on.


Using  the  lab  for  procedural  task  analysis.  For  de¬ 
tailed  procedural  task  analysis  (getting  down  the  steps  and  de¬ 
cisions  in  completing  a  specific  task),  the  lab  may  be  the  right 
environment.  If  you  want  to  capture  details,  you  probably  want 
a  videotape  record.  It  is  easier  to  videotape  in  the  lab  than 
to  cart  equipment  to  users’  sites,  set  it  up,  and  take  it  down — 
although  we  do  recommend  videotaping  for  most  task  analysis 
site  visits. 

Procedural  task  analysis  can  be  done  either  individually  or 
with  a  group  of  users.  In  an  individual  task  analysis,  you  work  with 
one  user  at  a  time  and  delve  into  exactly  how  that  user  completes 
the  process.  In  a  group  task  analysis  (GTA),  you  focus  on  four  to 
six  users  together  and  gather  the  task  flow  from  the  group. 

Group  task  analysis  has  several  advantages,  including  these 
three  (for  more  on  group  task  analysis,  see  also  Courage  & 
Baxter,  2004): 

•  Seeing  the  details.  Describing  a  task  flow  can  be  difficult.  It 
is  easy  to  overlook  routine  steps  or  to  forget  about  some  of 
the  details.  Like  focus  groups,  a  GTA  generates  group  discus¬ 
sion  and  a  synergy  among  the  participants.  Because  the  users 
are  comparing  how  they  each  complete  the  task,  details  tend 
to  emerge  in  the  discussion. 

•  Coming  to  agreement.  One  of  the  goals  of  a  group  task  analy¬ 
sis  is  for  the  participants  to  come  to  agreement  on  a  single  task 
flow  that  represents  a  common  way  that  they  can  work  to  com¬ 
plete  the  task.  When  conducting  individual  task  analysis  ses¬ 
sions,  the  researcher  must  combine  the  data  and  make  com¬ 
promises,  rather  than  having  the  users  make  these  decisions. 

• Saving time. You can often develop a rich task flow with a
group of users in 2 hours, where you would spend 8 to
10 hours with individual users. This makes group task
analysis another good technique for rapid task analysis.

Using the lab for quantitative task analysis. If you
want quantitative measures, such as time to complete a task
under ideal conditions, the lab may again be the best place to do
the  study.  Someone  might  ask,  in  that  case,  whether  you  are 
doing  a  field  study  or  a  usability  test,  but  in  fact,  the  line  be¬ 
tween  the  two  techniques  is  quite  blurry.  Both  can  be  done  in 
the  field  or  in  the  lab,  on  old  products  or  new  prototypes,  qual¬ 
itatively  or  quantitatively,  and  on  the  user’s  own  work  or  on  sce¬ 
narios  given  to  the  user.  Task  analysis  is  a  major  part  of  both 
field  studies  and  usability  testing. 

Conducting  Remote  Task  Analysis 

If  your  focus  is  software  or  the  web,  you  can  do  remote  task 
analysis to collect useful data. You can use a screen-sharing
application to see the user’s desktop as the user works. As with
lab studies, you cannot observe the user’s physical
environment. However, one advantage you have over a lab study
is  that  the  users  are  still  in  their  own  environments.  You  can  ask 
users  to  describe  their  environments,  and  some  may  even  be 
willing  to  send  photos,  which  can  give  you  useful  insights. 

Remote  task  analysis  is  sometimes  your  only  option,  espe¬ 
cially  in  international  studies.  You  may  not  have  the  resources 
for global travel. Remote studies of this kind work best when you
already have some experience or familiarity with the culture.

A  remote  task  analysis  can  also  be  very  useful  to  supplement 
onsite  sessions.  If  you  have  only  the  resources  to  visit  one  or 
two  users,  do  so;  then,  supplement  your  data  through  remote 
sessions  with  other  users. 


CONDUCTING  INTERNATIONAL 
TASK  ANALYSIS  STUDIES 


We  have  put  international  task  analysis  in  a  separate  section  to 
give  special  attention  to  the  unique  considerations  you  must 
attend  to  at  each  stage:  selling,  preparing,  and  collecting  the 
data.  (We  will  also  give  special  techniques  for  presenting  inter¬ 
national  data  later  in  the  chapter.) 

Selling  an  International  Study 

Many  of  the  arguments  for  international  task  analysis  are  the 
same  as  those  for  local  studies.  However,  as  Siegel  and  Dray 
(2005)  pointed  out,  the  costs,  risks,  and  benefits  of  international 
research  are  often  an  order  of  magnitude  greater  than  for  local 
studies.  For  example,  a  product  team’s  assumption  that  they 
represent the user is particularly dangerous when the users are
from  a  different  country  and  culture.  In  addition,  if  the  team  as¬ 
sumes  that  the  product’s  functionality  and  design  are  so  com¬ 
pelling  that  users  all  over  the  world  will  adopt  it,  they  are  likely 
to  find  that  is  simply  not  the  case.  What  is  compelling  in  one  cul¬ 
ture  may  not  be  in  other  cultures. 

International  research  is  often  most  effective  and  important 
when  the  company  is  taking  a  strategic-level  initiative  and  go¬ 
ing  after  a  market  in  a  new  location.  That  is  where  many  com¬ 
panies  are  today,  particularly  those  on  the  Internet,  as  there  is 
far  more  growth  potential  outside  North  America  and  Western 
Europe  (Siegel  &  Dray,  2005).  If  international  adoption  is  part 
of  the  business  objectives  for  the  product  and  will  be  a  factor  in 
determining  the  product’s  success — that  is,  if  your  company 
is  serious  about  selling  internationally,  then  they  absolutely 
must  be  equally  serious  about  understanding  the  users  in 
those  markets. 

Many  product  teams  think  that  if  they  have  a  team  member 
from  the  geographic  or  cultural  area  of  interest  they  do  not  have 
to  do  any  user  research,  including  any  task  analysis.  This  is  not 
true.  A  representative  from  a  major  user  group  can  be  very  help¬ 
ful,  but  Siegel  and  Dray  (2005)  gave  several  reasons  why  it  is 
not  enough.  First,  products  often  require  specific  domain 
knowledge.  If  the  representative  does  not  have  this  domain 
knowledge,  the  team  may  be  misguided.  Second,  just  because 
a  person  was  raised  in  a  particular  country  or  culture  does  not 
mean  that  person  can  speak  about  the  impacts  of  the  culture  on 
software  designed  for  a  specific  profession  within  that  culture. 
Third,  a  person  who  has  been  out  of  the  country  or  culture  of 
interest  very  likely  has  lost  at  least  some  cultural  awareness — 
and  may  be  out  of  touch  with  the  current  cultural  situation. 
Fourth,  the  fact  that  the  person  relocated  may  make  that  person 
different  from  those  who  chose  to  stay. 


Another  invalid  assumption  is  that  simply  translating  the 
product  into  the  local  language  is  enough.  Text  is  only  one  as¬ 
pect  of  an  interface.  Localization  takes  the  effort  a  step  further. 
The  goal  of  localization  is  to  ensure  that  all  information  is 
adapted  to  a  particular  location  or  culture  (Uren,  1997).  This 
can  include  such  information  as  currency,  date  formats,  col¬ 
ors,  icons,  and  so  on.  Task  analysis  is  ideal  to  get  this  level  of 
understanding.  It  helps  you  understand  both  similarities  and 
differences. 
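
To make the date-format point concrete, here is a minimal Python sketch. The pattern table and the function name `format_date` are our own illustrative assumptions, not part of any localization standard or library; a real product would normally rely on a dedicated localization library and on what task analysis reveals about local practice.

```python
from datetime import date

# Hypothetical, hand-written table of date conventions for illustration only.
# Real products would normally take these from a localization library.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",  # month first
    "de_DE": "%d.%m.%Y",  # day first, dots as separators
    "ja_JP": "%Y/%m/%d",  # year first
}


def format_date(d, locale_code):
    """Format a date using the (assumed) convention for the given locale code."""
    return d.strftime(DATE_PATTERNS[locale_code])


if __name__ == "__main__":
    sample = date(2007, 3, 4)
    for code in DATE_PATTERNS:
        # The same calendar day reads very differently in each location.
        print(code, format_date(sample, code))
```

The same day appears as 03/04/2007 under one convention and 04.03.2007 under another, which is exactly the kind of difference that translation of text alone does not address.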

Preparing  For  Your  International  Task  Analysis 

Preparation  is  a  key  element  of  every  successful  study.  Interna¬ 
tional  studies  add  several  significant  considerations  including 
translators,  recruiting,  logistics,  and  background  research.  After 
discussing  these  four  considerations,  we  give  some  hints  on 
preparing  for  a  rapid  international  task  analysis. 

Translators.  If  the  language  of  your  study  participants  is 
not your native language, you will want to employ a translator.
This person will need to translate all of your task analysis
materials beforehand and then to translate in real time during
the session (simultaneous translation). Simultaneous
translators are often known as interpreters, and interpreting is
a specialized and expensive skill, as it requires a great deal of
concentration and focus over extended periods. You will also
need to allow time to train the translator in the goals,
objectives, and procedures of your study to ensure that as little as
possible gets lost in translation. (This is, of course, the ideal
approach.  We  present  some  time  and  money  saving  alterna¬ 
tives  in  the  section  on  rapid  international  task  analysis  a  little 
later  in  the  chapter.) 

Recruiting.  Recruiting  for  an  international  study  is  not 
the  same  as  recruiting  for  a  local  study.  Factors  that  often  differ 
include  appropriate  ways  to  contact  users,  recruitment  and  in¬ 
centive  costs,  no  show  rates,  the  times  that  people  are  willing  to 
participate,  and  holidays.  Be  sure  to  account  for  these  differ¬ 
ences  in  your  plan.  If  possible,  it  is  wise  to  engage  a  local  re¬ 
cruiting  company.  If  your  company  is  international,  engage  the 
people  in  the  office  of  the  country  you  will  be  visiting  to  help 
you  find  users  and  to  learn  about  the  culture  in  advance  (Siegel 
&  Dray,  2005). 

Logistics.  If  you  are  visiting  multiple  locations,  time  your 
visits  so  that  you  can  minimize  your  travel  time  and  costs.  You 
do  not  want  to  fly  roundtrip  to  Hong  Kong  from  New  York  and 
then  two  weeks  later  fly  to  Shanghai  from  New  York.  Also,  be 
sure  to  give  yourself  some  time  to  adjust  to  new  time  zones. 
Task  analysis  activities  require  you  to  be  very  alert  and  attentive. 
You  will  not  be  of  much  value  if  you  are  completely  exhausted. 

Also  consider  your  equipment.  Will  it  work  in  the  country  to 
which  you  are  traveling?  Will  it  be  too  much  with  which  to 
travel?  If  you  need  to  scale  back,  you  should  consider  this  early 
on.  Should  you  rent  the  necessary  equipment?  If  yes,  from 
where  should  you  rent  it?  Thinking  through  all  of  the  details 
ahead  of  time  will  save  headaches  as  your  site  visit  approaches 
and  while  you  are  on  site. 

Background research. Do as much background research
as you can before you go. This will help to prepare you
for some of the things you may encounter. For example, read
travel  guides  and  books  on  cultural  etiquette.  Read  articles  in  re¬ 
search  papers,  professional  magazines,  and  on  the  web.  Exam¬ 
ine  competitor  products  that  may  be  thriving  in  the  culture 
you  are  going  to  visit.  Use  this  information  to  guide  you  as  you 
create  your  task  analysis  plan. 

This  is  a  lot  of  preparation  to  conduct  international  task  analy¬ 
sis.  The  good  news  is  the  preparation  is  worthwhile.  As  you 
come  to  understand  a  culture  and  make  connections  with  peo¬ 
ple  from  that  location,  future  studies  will  be  easier  and  less  costly. 
The  understanding  that  you  get  from  an  international  task  analy¬ 
sis  will  pay  off  richly  in  the  product’s  success  in  that  culture. 

Doing  rapid  international  task  analysis.  At  times, 
time  and  resources  will  not  allow  you  to  do  all  the  preparation 
we  advise  or  to  conduct  an  international  study  in  the  ideal  way. 
Earlier  in  the  chapter,  we  discussed  conducting  remote  studies. 
Two  other  approaches  for  those  situations  are  to  hire  local  con¬ 
sultants  and  to  conduct  the  study  in  your  language.  Both  ap¬ 
proaches  are  less  than  ideal  and  work  best  when  you  already 
have  some  experience  or  familiarity  with  the  culture. 

Hire  local  consultants.  Having  someone  local  conduct 
the  study  can  be  a  tremendous  time  and  resource  saver.  They 
can  conduct  the  study  in  the  local  language  for  you  and  send  a 
translated  report.  Professional  associations,  such  as  Computer- 
Human  Interaction  (CHI),  Human  Factors  and  Ergonomics  So¬ 
ciety  (HFES),  and  Usability  Professionals’  Association  (UPA)  can 
be  good  starting  points  to  look  for  these  people.  It  may  be  ben¬ 
eficial  to  have  the  data  interpreted  by  someone  from  the  culture 
of  interest  and  avoid  misinterpretations  that  may  occur  due  to 
your  lack  of  understanding  of  the  culture. 

One  disadvantage  of  this  approach  is  that  it  can  be  difficult  to 
find  professionals  who  truly  understand  task  analysis.  You  want 
to  make  sure  that  they  understand  the  method  and  goals  of 
your  study.  A  second  disadvantage  is  that  seeing  is  very  power¬ 
ful.  A  tremendous  amount  is  learned  about  a  culture  by  being 
immersed  in  it.  If  you  do  not  go,  you  miss  that  immersion.  A 
third  disadvantage  is  that  if  someone  who  lives  in  that  culture 
conducts  the  study,  they  may  take  some  of  its  subtleties  for 
granted  and  not  report  on  them  even  though  they  are  critical 
for  your  team  to  know.  A  fourth  disadvantage  is  that  if  you  are 
conducting  research  in  multiple  countries  with  different  con¬ 
sulting  teams,  it  becomes  harder  to  compare  and  contrast 
across  cultures  because  different  people  who  may  actually  never 
speak  with  one  another  collected  the  data. 

Conduct  the  study  in  your  language.  Another  way  to 
save  time  and  money  is  to  conduct  the  study  in  the  language  of 
the  product  team.  This  eliminates  the  need  for  a  translator.  It  of¬ 
ten  works  best  in  countries  where  the  team’s  language  is  a  very 
common second language. It is particularly tempting to do this if
English  is  the  team’s  language  because  in  many  countries  it  can 
be  relatively  easy  to  find  English-speaking  users. 

Much  success  can  result  from  this  method.  However,  you 
must  consider  the  fact  that  people’s  perception  of  how  fluent 
they  are  may  not  reflect  reality.  You  should  always  conduct  a 


screening  call  with  these  users  before  visiting  them  to  assess 
their  language  skills.  Another  consideration  is  that  choosing 
users  who  speak  your  language  restricts  the  pool  of  participants, 
and  as  a  result,  the  participants  you  work  with  may  not  be  rep¬ 
resentative  of  the  user  population  of  interest. 

Collecting  the  International  Task  Analysis  Data 

As  you  collect  data  internationally,  you  should  add  two  critical 
considerations  to  your  repertoire:  being  culturally  sensitive  and 
being  aware  of  elements  that  differ  in  different  countries  and 
cultures. 

Being culturally sensitive. Research shows that methods
based on Western thought can lead to cultural conflict
and misinterpretation of data, particularly in Asian cultures
(Chavan, 2005; Ann, 2004).

For  example,  Chavan  (2005)  explained  that,  in  India,  users 
often  hesitate  to  say  that  something  is  bad  or  to  identify  issues 
with  a  product.  Even  if  an  Indian  user  does  not  highlight  any 
problems,  you  cannot  assume  that  no  problems  exist  for  that 
user.  Users  in  India  may  also  be  more  likely  than  users  else¬ 
where  to  feel  that  they  are  being  evaluated  just  because  some¬ 
one  is  observing  them. 

For  another  example,  Ann  (2004)  said  that,  in  China,  rela¬ 
tionships  are  very  important.  Friendship  is  a  prerequisite  to 
dealing  in  business  or  finance.  This  may  make  it  difficult  to  re¬ 
cruit  participants  unless  they  come  from  friends’  referrals.  If  you 
are  not  connected  via  a  friend,  users  may  not  want  you  in  their 
homes.  Even  if  you  are  able  to  enter  a  home  without  such  a  con¬ 
nection,  you  may  not  hear  the  users’  true  feelings,  as  they  may 
take  great  care  not  to  offend  you.  If  you  do  not  continue  the 
friendship  after  the  study,  Chinese  participants  may  feel  used 
and decline to participate in future studies. Furthermore, a
Western researcher and participant relationship would not be
well received in China. The Western approach in which you ask
questions  in  a  very  scientific  manner  and  reask  them  in  different 
ways  to  probe  deeper  may  lead  to  distrust  rather  than  to  useful 
information.  Less  intrusive  styles  may  be  more  appropriate. 

Therefore,  choosing  a  culturally  sensitive  approach  to  task 
analysis  for  the  location  you  will  be  visiting  is  critical.  This  is 
where local consultants or people in your company at that
location can be invaluable. Whether you conduct the study
yourself or have them do it, they can help you refine your
approach so that the experience is pleasant and comfortable
for the users and so that you collect valuable and accurate
data.

Being  aware  of  elements  that  differ  in  different  coun¬ 
tries  and  cultures.  As  Siegel  and  Dray  (2004)  recommended, 
you  should  be  aware  of  the  many  elements  that  change  between 
countries  and  cultures  so  that  you  can  collect  data  on  whichever 
of  these  are  relevant  to  your  product.  Here  are  just  a  few  of  the 
elements  to  consider: 

• Purchasing dynamics and financial transactions (e.g., forms
of payment, willingness to disclose financial information to
others or online)

• Social structure and service expectations (e.g., society classes,
willingness to do it yourself)

•  Mental  models  of  geography  (e.g.,  address  formats,  maps) 

•  Use  of  physical  space  (e.g.,  size  of  workspaces) 

•  Climate  and  environmental  conditions  (e.g.,  how  do  envi¬ 
ronmental  conditions  impact  the  user  of  the  product) 


ANALYZING  AND  PRESENTING  THE  DATA 


Once  you  have  collected  the  data,  of  course,  you  must  analyze  it 
and  present  it.  Data  is  of  no  value  if  you  do  not  communicate 
what  you  have  learned  to  the  people  who  need  the  information. 
To  make  the  data  useful,  you  must  bring  the  data  together,  think 
about  what  you  have  learned,  and  draw  out  implications. 

Analyzing  the  Data 

Consider  these  four  principles  as  you  plan  to  analyze  the  data: 

Involve  the  design  team.  Making  the  effort  to  involve 
the rest of the team pays off handsomely. Involving other team
members  ensures  that  they  have  a  stake  in  the  results.  It  also 
allows  them  to  work  with  the  raw  data,  which  helps  them  inter¬ 
nalize  the  work  of  the  user  more  completely,  even  if  they  did  not 
get  to  participate  in  the  site  visits.  When  you  involve  the  team  in 
the  analysis,  you  build  a  shared  understanding  of  what  was  seen 
and  heard  at  the  customer  or  user  sites. 

It  also  ensures  that  the  questions  the  team  has  get  answered. 
Often,  teams  refine  and  redirect  their  thinking  as  they  go 
through an analysis. They may drill down more deeply into the
data, completely change the questions they ask of the data,
or change their thinking about the direction they had
planned for the product.

Working  as  a  team  can  also  help  you  generate  actionable 
items  that  result  from  the  study.  You  can  work  together  to  pri¬ 
oritize  the  findings.  This  is  ideal  to  do  as  a  team  because  you  can 
discuss  limitations,  such  as  time,  and  determine  what  is  truly 
feasible. 

Make  it  traceable.  Any  analysis  should  include  refer¬ 
ences  back  to  the  raw  data.  There  are  many  advantages.  First 
and  foremost,  keeping  the  link  ensures  the  integrity  of  the 
analysis.  It  is  important  to  be  able  to  say  to  those  who  were  not 
involved  in  collecting  and  analyzing  the  data  that  all  conclusions 
are  traceable  back  to  statements  by  users  or  direct  observation 
of  their  behaviors.  Second,  as  we  noted  in  discussing  the  first 
principle,  interpretations  may  change  as  analysis  progresses. 
During  that  process,  it  is  important  to  be  able  to  revisit  the  data 
and  recall  the  context  of  the  behavior  or  the  comments.  If  you 
set  up  ways  of  tracking  the  data  through  analysis,  the  extra  time 
and  effort  need  not  be  substantial. 
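
One lightweight way to keep that link is to store each conclusion together with the observations that support it. The Python sketch below is only an illustration under our own assumptions: the class names `Observation` and `Finding`, the helper `untraceable`, and the reference format are hypothetical, not a prescribed tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Observation:
    """One piece of raw data: a user statement or an observed behavior."""
    ref: str   # hypothetical reference code, e.g. site visit and note number
    text: str


@dataclass
class Finding:
    """A conclusion from the analysis, linked back to its supporting data."""
    statement: str
    evidence: List[Observation] = field(default_factory=list)


def untraceable(findings):
    """Return the statements of findings that cite no raw data at all."""
    return [f.statement for f in findings if not f.evidence]


if __name__ == "__main__":
    note = Observation("visit-2/note-14",
                       "Forwards landline to mobile phone before leaving.")
    findings = [
        Finding("Users cannot afford to miss calls while away.", [note]),
        Finding("Users want a bigger screen."),  # no supporting data yet
    ]
    print(untraceable(findings))
```

A check like `untraceable` lets the team see at a glance which conclusions still need to be tied to a user statement or an observed behavior before they are presented.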

Make  it  visible  and  accessible.  The  analysis  may  be 
complex  and  detailed,  but  a  report  laden  with  text  will  almost 
certainly go unread. There is no time in the rush of a project for
people to read. Therefore, many teams choose to display their
analyses on a wall or in a war room, where team members can
review and add comments to the display (e.g., Simpson, 1998).

An  alternative  is  a  hyperlinked  document  in  which  higher- 
level  conclusions  are  linked  to  more  specific  analyses.  Often,  the 
analysis  is  graphical  so  that  designers  can  stand  back  and  see 
patterns  in  the  data.  Some  people  create  multilevel  documents 
with  a  one-page  summary  supported  by  a  3-page  overview, 
which  in  turn  is  supported  by  a  50-page  report. 

Match the form to the questions, the stage, and the
team’s needs. The cardinal rule of all documentation is to
give  users  what  they  need  in  the  form  they  need  it  when  they 
need  it.  That  is  why  most  technical  communicators  have  moved 
from  writing  extensive  tomes  that  people  do  not  open  to  help¬ 
ing  teams  bring  communication  into  the  interface. 

The  same  principle  applies  to  the  internal  working  of  any 
project  team.  The  best  form  in  which  to  represent  the  data  de¬ 
pends  on  many  factors,  including  the  questions  that  were  asked, 
the  stage  in  the  project’s  life  cycle  (that  is,  how  the  information 
will  be  used),  the  time  in  which  the  information  is  needed,  and 
the  team  and  company  culture. 

Keep  in  mind  the  essential  purpose  of  any  analysis.  Analysis 
provides  an  anchor  from  which  designs  can  be  generated  and 
against  which  they  can  be  evaluated.  The  analysis  is  not  an  end 
in  itself;  you  must  keep  the  design  team  engaged  with  the  analy¬ 
sis  and  with  the  representations  of  the  analysis.  In  the  next 
section,  we  describe  a  few  of  the  possible  representations. 

Presenting  the  Data 

Here are 13 ways to present task analysis data. (See also Hackos
& Redish, 1998; Courage & Baxter, 2004, who described data
analysis for a variety of task analysis techniques; and Miles &
Huberman, 1994, who described the rich variety of ways to organize
qualitative data for interpretation.)

Affinity  diagrams.  Affinity  diagrams  are  hierarchical  pic¬ 
tures  of  user  data.  They  are  produced  inductively  by  grouping 
similar  data  elements  together  into  categories  and  then  group¬ 
ing  the  categories  together.  Fig.  48.3  is  an  abstract  illustration 
of  an  affinity  diagram  in  progress. 
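
The resulting hierarchy can be recorded in a simple nested structure once the team has done the grouping. The Python sketch below is a hypothetical illustration: the function names `build_affinity` and `outline` and the tuple layout are our assumptions, and the code only stores and prints labels that people have already assigned. It does not perform the inductive grouping itself.

```python
from collections import defaultdict


def build_affinity(notes):
    """Build a theme -> group -> notes hierarchy.

    `notes` is an iterable of (ref, text, group, theme) tuples, where the
    group and theme labels were chosen by the team during the affinity session.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for ref, text, group, theme in notes:
        tree[theme][group].append((ref, text))
    return tree


def outline(tree):
    """Render the hierarchy as numbered lines, keeping each note's reference."""
    lines = []
    for i, (theme, groups) in enumerate(tree.items(), start=1):
        lines.append(f"{i}. {theme}")
        for j, (group, items) in enumerate(groups.items(), start=1):
            lines.append(f"{i}.{j}. {group}")
            for ref, text in items:
                lines.append(f"  - {ref}: {text}")
    return lines


if __name__ == "__main__":
    sample = [
        ("P 622", "Turns off the phone so the answering machine goes online.",
         "Landline must be forwarded", "Keep in touch"),
        ("P 349", "Docks his phone on the dash.",
         "Cell phone within easy reach", "Keep in touch"),
    ]
    print("\n".join(outline(build_affinity(sample))))
```

Keeping the reference code with every note means that the printed outline stays traceable to the raw data.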

Affinity diagrams derive much of their value from the process
that produces them, that is, a deep engagement with the data
combined with recurring reflections on the generalization that
best captures a number of data elements. Teams often produce
collateral elements while creating an affinity, such as design
ideas and additional questions, that are captured and then used
in design or further data gathering.


FIGURE 48.3. An abstract illustration of an affinity diagram in
progress.


Fig. 48.4 is an example of a small part of the result of doing
an affinity diagram. Note how the team has kept the reference to
the data with each bullet point (that is, each note in the affinity
diagram).


1. I must FIND, MAINTAIN, and COMMUNICATE WITH customers, and sometimes others.

1.1. I need to keep in touch with clients, colleagues, and family.

1.1.1. I must make sure my landline is forwarded - can't miss calls while I'm out.

• P 622 SW 6.2 Seq (Getting ready to leave): Turns off the phone so
that the answering machine goes online.

• P 508 SW 5 Seq NOTE: He has a landline, but has forwarded it to his
mobile phone so that he always can get calls when he’s away from
the office and also so that he doesn’t forget to forward the phone
when he leaves the office.

1.1.2. My cell phone is essential during the business day - it must be within easy
reach.

• P 349 US 3 Notes He docks his phone on the dash where he can
reach it easily.

• P 442 SW 4 Flow NOTE: The phone is everywhere in this flow
diagram - as much as he is hitting the pavement, he is also hitting
the wire.

• P 552 SW 5 Notes If I leave my phone at home, I’ve ruined my day. I
usually have at least 1 visit, and I need the phone [to help if I get lost,
to confirm the visit, and to keep track of other business when I’m
traveling].

• P 628 SW 6.1 Seq 6.1 has her bag back there in the trunk too, but
her mobile phone is in her hand. [Can’t tell if earphone is on; it’s not
by the time we get out of the car in any case.]

• P 834 SW S Seq In the car he puts the mobile phone between his
legs, resting on the seat. BREAKDOWN: I normally have the mobile
phone put away in my pocket (or somewhere) and an earpiece in my
ear so I don’t have to mess with it while I drive. [ASSUME: There’s
usually a switch next to the hanging microphone to let you answer
calls.] But this borrowed phone doesn't have an earpiece.


FIGURE 48.4. A partial example of results from an affinity diagram.


Artifacts. Some types of tasks are deeply intertwined with
their artifacts. For example, the task of making appointments is
necessarily interwoven with calendars that show dates and
times. As a result, it is often best to begin with existing artifacts
(e.g., Ellen’s appointment book) and to organize data around
that representation.

The team may or may not decide that the representation is in
fact the best metaphor to bring into the new product. Even if
they do not, the artifacts and representation are often the best
way to get an initial understanding for a task analysis. Fig. 48.5
shows an example of an artifact that you might want to collect
if you were creating a program to help people keep track of
time, dates, meetings, tasks, and so on.


FIGURE 48.5. An example of an artifact.


Flow diagrams. Flow diagrams answer questions about
how information or artifacts flow through a system (process
analysis). They illustrate the dependency between system
elements or states of the system and what needs to be transferred or
moved from one part to another. They also show how roles are
divided within an organization as data moves from one person
or department to another or between the organization
and outsiders.

Fig. 48.6 is a small part of a flow diagram. The different
patterns on the underlying circles represent different people
(managers, nonmanagerial professionals, consumers, etc.) who are
involved in this flow. Note once again that the references to the
raw data are kept with the flow diagram so that the team can
return to the data to understand more about the user and the
context for each data point.


Personas.  Personas  are  composite  archetypes  that  repre¬ 
sent  the  primary  and  secondary  users  of  the  product.  A  persona 
description  often  describes  the  user’s  activities,  knowledge,  and 
tasks  in  some  depth.  Thus,  personas  may  integrate  the  two  ele¬ 
ments  of  users  and  tasks  from  the  triangle  presented  in  Fig. 
48.1.  In  fact,  some  people  also  include  information  about  the 
persona’s  environments  (physical,  social,  cultural,  technologi¬ 
cal),  thus  capturing  all  the  triangulated  data  that  we  discussed  at 
the  beginning  of  the  chapter. 

Rich  persona  descriptions  that  encompass  user,  task,  and  en¬ 
vironment  information  are  particularly  useful  for  commercial 
products,  which  often  begin  with  market  segmentation  that 
classifies  and  describes  potential  customers.  The  task  analysis 
builds  on  this  data  by  characterizing  these  users  more  precisely 
(Lee & Mikkelson, 2000).

Because  personas  instantiate  users  as  actual  people,  they 
tend  to  be  memorable.  You  can  ask  “How  would  Julie  do  her 
work  if  we  design  the  product  this  way?”  (see  Cooper,  1999). 
Personas  can  come  to  team  meetings  as  life-size  cardboard  fig¬ 
ures  (Tahir,  1996),  on  posters,  or  on  placemats.  They  can  be¬ 
come  part  of  the  product  team’s  e-mail  group  with  someone  in 
charge  of  representing  them  (Pruitt  &  Adlin,  2005). 

Fig.  48.7  is  one  of  many  ways  of  showing  a  persona.  The 
real  example  would  include  photographs  or  sketches  where 
indicated. 

Scenarios.  A  scenario  is  a  short  story  of  a  specific  situa¬ 
tion  that  is  real  and  relevant  to  a  user.  A  scenario  gives  the  team 
the  user’s  goal  and  specific  needs.  It  often  also  gives  the  team 
the  user’s  names  for  objects  and  attributes  of  those  objects.  It 
may give the team information on what the user values (e.g., is
price more important than choice in renting a car?).

FIGURE 48.6. Part of a flow diagram to present task analysis data.


Julie Morris
Travel Agent, Globe Travel, Kansas City

[Picture of Julie at work]

□ Specialty: Corporate Travel
□ Experience: 5 years as travel agent; 5 years with PRODUCT
□ Technology: PC computer only at work; no computer at home;
  keyboard savvy; limited mouse experience; computer has Windows
  but she spends most of day in transaction-code based program
□ Values: speed of transactions; finding bargains for her clients;
  knowing lowest fare

Julie's work environment

[Picture of Julie's desk and the office. Note the sticky notes all
around Julie's computer monitor.]

□ small office; three travel agents
□ all help each other, but others specialize in leisure and
  international, respectively
□ work on salary, not commission
□ on fifth floor of office building so get little walk-in traffic
□ but keep their desks moderately clean as Julie's partner, Marla,
  who handles mostly leisure travel, has many clients who make
  appointments to come in
□ pleasant atmosphere; travel brochures; nice extra chairs for clients
□ printer and fax machine are in small room in back to keep noise
  level down (because they are on the phone so much) - so printing
  or faxing means getting up from desk

About Julie

[Picture of Julie and her family skiing (from her desk at work)]

□ 35 years old
□ travel agent for 5 years; was in retail sales before
□ enjoys the phone contact with clients; likes helping people
□ loves to travel, loves perks of being a travel agent
□ likes the job but it is just a job
□ took original required training but has not been to training
  class since
□ no time for new training; after work she has family and other
  social obligations
□ is comfortable with the computer, but thinks she doesn't know a
  lot about computers; doesn't like it when something goes wrong,
  for example, with printer or fax - doesn't feel she could fix it

How Julie works

□ clients call; usually know travel dates, destinations, other needs
  (common to corporate travel - note: leisure travel is different)
□ Julie makes notes on paper in spiral notebook she keeps on desk -
  says it is useful to go back to if issue comes up about not getting
  arrangements right; she has notes of what person asked for
□ knows transaction codes for air travel to most major US cities;
  fast with getting availability
□ if booking a car for something special - drop off in another city;
  getting it someplace other than airport; need something special
  like ski rack, etc. will call car companies
□ also books hotel online for clients that repeat common visits; but
  often calls hotels because needs information about price, room,
  special requests that aren't easy to do online
□ asks other agents for help with transaction codes and her
  colleagues are helpful if they know - but often other person is also
  on the phone, so Julie can't ask and has to experiment and figure
  it out herself - or tell client she'll do research and call back
□ says that a major portion of her time is spent printing tickets and
  making sure that the right ticket gets in the right envelope and to
  the right client


FIGURE  48.7.  An  example  of  one  type  of  persona  (user  profile). 
Client calls the travel agent. Client wants to go to Phoenix for a
weekend  to  see  her  friend,  but  she  is  very  concerned  about  the  price. 
She  would  like  to  go  sometime  this  month.  She  can  be  flexible  about 
when  to  go  and  come  back.  She  wants  to  know  what  the  best  fare  is, 
if  it  is  available  this  month,  and  what  flights  she  can  get  for  that  fare. 


FIGURE  48.8.  An  example  of  a  scenario. 

Each  situation  that  you  observe  on  a  site  visit  is  a  scenario. 
You  can  also  collect  scenarios  by  interviewing  users  through  the 
critical  incident  technique  (Flanagan,  1954)  in  which  you  ask 
users  to  recall  a  specific  incident  and  then  to  tell  you  about  it. 

Fig.  48.8  is  an  example  of  a  scenario  from  a  study  of  travel 
agents  (Redish  &  James,  1996). 

You  can  elaborate  a  scenario  with  the  sequence  diagram 
(flowchart)  of  the  procedure  the  user  went  through  to  accom¬ 
plish  the  scenario.  If  accomplishing  the  scenario  is  difficult,  and 
the  scenario  is  important,  creating  a  more  efficient  procedure 
could  become  a  requirement  for  the  new  product. 

Sequence  diagrams.  Flow  diagrams  track  work  through 
a  system  or  across  people.  Sequence  diagrams  use  time  to  track 
the  actions  and  decisions  that  a  user  takes.  Sequence  diagrams 
(procedural  analysis)  show  what  users  do  and  when  and  how 
they  do  it.  This  type  and  level  of  information  is  critical  for  inter¬ 
face  architecture  and  design  because  it  gives  us  the  functions, 
objects,  and  attributes  of  a  system  (e.g.,  menu  items  and  dialog 
box  design)  and  the  navigation  for  a  website. 

By  laying  out  the  sequence  diagram  that  represents  what 
users  do  today,  you  can  often  see  ways  to  make  the  product 
help users be more efficient and effective. The sequence
diagram of the reality of what you find in predesign observations
is often elaborate and messy. It may be important for the
team  to  see  that  reality  as  they  work  towards  a  more  useful 
product. 

Fig.  48.9  is  an  example  of  part  of  a  sequence  diagram. 

Tables.  Tables  are  an  excellent  way  to  show  comparisons 
and  so  are  useful  for  presenting  many  types  of  analysis.  Techni¬ 
cal  communicators,  for  example,  have  traditionally  used  a  user/ 
task  matrix  to  understand  which  tasks  are  done  by  which  types 
of  users.  The  user/task  matrix  becomes  a  major  input  to  a  com¬ 
munication  plan — to  answer  the  question  of  what  tasks  to  in¬ 
clude  in  documentation  for  people  in  different  roles  (e.g.,  sys¬ 
tem  administrators,  end  users).  Tables  can  be  used  to  show  the 
relationships  between  any  two  (or  more)  classes  of  data.  See 
Table  48.2  for  an  example  of  a  table  with  three  classes  of  data. 
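
The arithmetic behind a weighted matrix such as Table 48.2 can be sketched in a few lines of Python. This is a hedged illustration, not the table authors' tool: it assumes that each row total is market size times task importance times the sum of the feature ratings, a reading that reproduces the two Techno Bob totals printed in the table. The function names `row_total` and `feature_sums` are hypothetical.

```python
# Feature ratings follow the scale printed with Table 48.2:
# -1 = will confuse users, 0 = users will not use it,
#  1 = users mildly positive, 2 = users strongly positive.
FEATURES = [
    "Tool bar customization",
    "Better drag and drop",
    "Progress meters",
    "Interruption protection recovery",
]


def row_total(market_size, task_importance, ratings):
    """Sum of products for one user/task row (assumed weighting rule)."""
    return sum(market_size * task_importance * r for r in ratings)


def feature_sums(rows):
    """Weighted score per feature, accumulated over all rows."""
    sums = [0] * len(FEATURES)
    for market_size, task_importance, ratings in rows:
        for i, rating in enumerate(ratings):
            sums[i] += market_size * task_importance * rating
    return sums


# Techno Bob's two rows: (market size, task importance, feature ratings).
BOB_ROWS = [
    (1, 3, [2, 1, 0, 0]),  # Customize my screen
    (1, 3, [0, 0, 1, 2]),  # Down load information
]

for market_size, importance, ratings in BOB_ROWS:
    print(row_total(market_size, importance, ratings))
print(dict(zip(FEATURES, feature_sums(BOB_ROWS))))
```

Row totals indicate how well a user segment's task is served by the candidate features, and the per-feature sums indicate which feature contributes most across the rows that are included.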

User  needs  tables.  Kujala,  Kauppinen,  and  Rekola  (2001) 
developed  what  they  call  a  “user  needs  table”  as  a  way  to  pre¬ 
sent  a  current  task  sequence  along  with  the  problems  and  pos¬ 
sibilities  that  the  designers  should  think  about  for  each  step  in 
the  sequence.  They  hypothesized  that  it  would  be  easier  to  use 
findings  about  users’  needs  in  design  if  the  findings  were  con¬ 
nected  to  the  task  sequence  that  forms  the  basis  of  use  cases. 
In  their  studies,  designers  found  this  type  of  presentation  very 
useful  in  moving  from  data  to  requirements.  They  also  found 
that  writing  use  cases  from  this  type  of  presentation  keeps  the 
use  case  in  the  user’s  language  and  keeps  the  use  case  focused 
on the user’s point of view. Table 48.3 shows an example of a
user needs table.

FIGURE 48.9. Part of a sequence diagram.

Mood  Boards  for  quick  data  dissemination.  Mood 
Boards  (Foucault,  2005)  are  an  ideal  way  to  present  preliminary 
research  results  when  your  team  is  under  a  tight  time  line  and 
you  want  to  get  them  some  data  quickly  or  when  you  want  to 
keep  the  momentum  from  the  study  going. 

Mood  Boards  are  a  collection  of  raw  data  and  artifacts  pre¬ 
sented  with  the  goal  of  inspiring  the  product  team  early  in  the 
development  cycle.  They  often  take  the  form  of  a  large  poster 
showing  photographs,  sketches,  screenshots,  participant  quotes, 
and artifacts from the field, such as forms, printouts, and Post-it
notes. You can use Mood Boards to highlight some of the interesting
or unexpected findings from the study.

TABLE 48.2. Matrix for Weighing Market Size, Users' Tasks, and Potential Product Functions

Relative market size and task relative importance: Hi = 3, Med = 2, Low = 1.
Feature ratings: -1 = will confuse users; 0 = users will not use it;
1 = users mildly positive; 2 = users strongly positive.
Total = sum of products (how well market is served by these features).

Users          Market  Hi level task          Task        Tool Bar       Better drag  Progress  Interruption         Total
               size                           importance  Customization  and drop     meters    protection recovery
Techno Bob     1       Customize my screen    3           2              1            0         0                    9
                       Down load information  3           0              0            1         2                    9
Newbie Ed      1       Customize my screen    1           -1             2            0         0                    1
                       Down load information  3           0              2            3         0                    12
Practical Sue  3       Customize my screen    2           2              2            1         0                    36
                       Down load information  2           0              2            1         2                    30
Sum                                                       17             35           27        18

You can put together a Mood Board quickly because the data
is not yet refined or synthesized. Mood Boards can also be ideal
for international studies because they can convey differences
between cultures and can give designers a feeling for the
aesthetics of the other culture. Fig. 48.10 is an example of a Mood
Board from a study that was part of a project to develop
technology for the elderly in the United States. Mood Boards were
created for areas where there were product possibilities. These
included urban elderly, elderly fitness, and fraud against elderly.

FIGURE 48.10. An example of a mood board representing urban
elderly. Courtesy of Intel Corporation - Digital Home Group.

[Mood board: Urban Elderly. Legible text on the board:]

•  A recent study of Boston's elderly population found that 15%
of the city’s older residents don't have anyone they can call in an
emergency and 10% never leave their homes. (AARP Bulletin)

•  Urban Archaeology: Enable elders to generate a digital city
history to supplement physical cityscapes.

•  Urban elderly face dramatically higher mortality rates than
their suburban and rural counterparts due to the stresses of a city -
pollution, crime, poverty, social isolation. In some cities, 55+
residents are twice as likely to die than peers in nearby suburbs.

•  (65+) About 50% live in suburbs of major cities, and 27% live
in major cities themselves. Only 23% of oldsters live outside of
metropolitan areas.

•  Lack of affordable health care may be the main culprit leading
to premature death among urban elders. Older people living in
cities may not have social and medical contacts that enable early
detection of disease and decline.

•  Urban Presence: Reintegrate elders into urban communities by
allowing them to watch over cities through digital windows. Create
digital opportunities for digital knowledge sharing & guidance.

•  Internet Access: Urban elders have far fewer barriers to Internet
access than their rural counterparts. Two-thirds of urban elders say
they could access the Internet if they wanted to, compared to only
half of rural elders.

•  Values: Urban elders, like rural elders, value relaxed social
lives, beautiful surroundings, and access to basic needs including
health care and safety.

Live  from  the  field:  Another  technique  for  quick  data 
dissemination.  Another  very  effective  rapid  communication 
method  is  to  send  information  back  to  the  team  from  the  field 
in real time (Lovejoy & Steele, 2005). This method is particularly
useful  for  international  studies  where  the  product  team  is  of¬ 
ten  quite  curious  and  excited  about  the  study.  In  addition,  if  you 
are  going  to  be  traveling  for  an  extended  period,  you  can  get 
some  data  to  the  team  so  they  do  not  have  to  wait  for  weeks  to 
hear  what  you  are  learning.  Photos,  stories,  snippets  of  data, 
and  insights  can  help  the  team  to  feel  a  part  of  the  study. 

Lovejoy and Steele (2005), for example, collected photos and
narrated  the  photos  with  data  to  create  what  they  call  photo  sto¬ 
ries.  Other  useful  ways  to  communicate  findings  quickly  include 
blogs,  e-mails,  and  websites. 

Culture cards. The last three data presentation methods
focus on conveying the culture of your study participants.
These techniques were developed by Intel (Foucault, 2005) to
convey findings from an observational study.

Culture  cards  are  a  deck  of  20  to  30  physical  cards  that  are 
used  to  communicate  key  findings  and  images  from  different 
cultures.  Information  such  as  photographs,  brief  research  find¬ 
ings,  user  descriptions,  or  users’  needs  is  printed  on  materials  of 
cultural  significance,  such  as  greeting  cards,  textbook  paper,  or 
handmade  paper  from  the  area.  Fig.  48.11  shows  examples  of 
the types of findings that may be noted on one side of a culture
card. These cards were created for a project on pregnancy and
parenting  conducted  by  Intel.  On  the  flip  side  of  such  cards,  you 
might  find  photographs  that  are  representative  of  the  findings. 

The intent is to be motivational: to inspire the product team
to create culturally sensitive and appropriate design solutions.

FIGURE 48.11. Findings displayed on the back of the culture cards.
Courtesy of Intel Corporation - Digital Home Group.

Card (First Steps Research): [Aesthetics] I just love baby things -
they're so small and cute. It's so much fun picking out all of the
things for the nursery.

Card (First Steps Research): [Get the baby to sleep] Can you make my
baby sleep during the night? I'd pay a lot of money for something
like that.


TABLE 48.3. An Example of a User Needs Table,
from Kujala et al. (2001; used with permission)

Task Sequence:

Step 1: When trapped in an elevator, passenger makes an
emergency alarm.

Step 2: Unoccupied service centre operator receives the
emergency alarm call and asks for information (description of
the failure).

Step 3: Service centre operator completes transmission of
information to the system and sends it to the area service person.

Step 4: Service center operator calls the service person and
reads the description of the failure.

Problems and Possibilities:

•  Passengers want to get out of the elevator as soon as possible.
•  All kinds of passengers must be able to make an alarm call
(blind, foreigners, etc.).
•  Sometimes passengers may make false alarms unintentionally.
•  Passengers may be in panic.
•  Passengers need instant confirmation that they have created a
connection to the service centre operator and that they are going
to get help.
•  Different versions and types of remote monitoring systems.
•  Passenger is the only information source.
•  Service centre operator does not notice the emergency alarm call.
•  Laborious phase for the service center operator.
•  Simultaneous calls must be differentiated.
•  Service person cannot see all the information.
•  Inadequate information from a site system.
•  Possibility: Instructions as to how to operate the system.
•  Possibility: Possibility to open phone line from Call Center to
the elevator.
•  Extra work for the service center operator.


FIGURE 48.12. Photo of a culture scape. Trying to immerse the
user in the Asian culture by having the team remove their shoes
and by decorating the conference room with artifacts. Courtesy
of Intel Corporation - Digital Home Group.

Culture cards can provide a useful overview of key findings and
serve  as  a  launch  pad  for  brainstorming  and  design  sessions. 
Two  key  advantages  are  that  they  break  the  data  into  digestible 
chunks  and  that  they  cue  the  designer  to  appropriate  aesthet¬ 
ics  for  each  culture. 

Culture  Scapes.  When  communicating  results  from  an 
observational  study,  it  is  challenging  to  convey  the  true  feelings 
and  sensations  generated  by  the  environment.  Smells,  sounds, 
textures,  and  flavors  are  all  important  aspects  of  the  user’s  en¬ 
vironment,  but  they  can  be  difficult  to  relay  to  the  team  (Foucault, 
2005).  Images  and  words  are  often  not  enough.  Fig.  48.12  is  an 
example  of  a  Culture  Scape. 

Culture  Scapes  are  designed  to  allow  the  team  to  experience 
more  richly  the  culture  by  engaging  multiple  senses.  Foucault 
(2005)  successfully  used  Culture  Scapes  when  relaying  findings 
from  a  mobile  technology  study  in  Singapore.  During  the  study, 
the  observers  were  struck  by  the  intense  heat  and  humidity  and 
how  it  related  to  their  observations.  To  convey  these  environ¬ 
mental  realities,  they  presented  their  findings  in  a  room  that  was 
90  degrees  Fahrenheit  with  85%  humidity.  Because  of  the  sim¬ 
ulated  environment,  team  members  got  the  importance  of  these 
environmental  factors  and  their  critical  impacts  on  design.  If  you 
plan  to  use  Culture  Scapes,  it  is  important  while  you  are  in  the 
field  to  consider  what  sensory  experiences  are  most  important 
and  how  you  will  convey  them  when  you  return. 

Culture  capsules.  Culture  capsules  showcase  unfamiliar 
elements  of  a  culture  by  physically  recreating  a  space  from  the 
observed  environment  (Foucault,  Russell,  &  Bell,  2004).  This 
presentation  method  is  of  particular  importance  when  space  is 
the  primary  consideration  in  design.  Photographs  taken  in  the 
field  guide  these  recreations,  which  may  also  include  artifacts 
from  the  physical  environment  being  recreated.  For  example, 
Foucault et al. (2004) used this technique to highlight the space
constraints in a typical Chinese home so the team would
understand the limited space available in a typical home office in
China. By replicating the physical spaces they had seen, the
researchers made the necessary information about space
constraints obvious to the product team. The space also served well
as a brainstorming area and a project showcase.


CONCLUSION 


Many  usability  practitioners  do  task  analysis  in  one  form  or  an¬ 
other,  although  often  they  do  not  recognize  it  as  such,  much  as 
Moliere’s  character  found  the  fact  that  he  had  been  speaking 
prose  all  his  life  a  revelation. 

Some  may  hesitate  to  plan  for  task  analysis  because  they 
think  of  it  as  too  complex  and  time  consuming  to  apply  in  real- 
world  situations  where  time  is  always  too  short.  In  this  chapter, 
we  have  tried  to  show  that  task  analysis  is  a  family  of  flexible  and 
scalable  processes  that  can  fit  well  in  almost  any  development 
environment. 

Some  may  have  thought  of  task  analysis  only  in  terms  of  highly 
structured  ways  of  capturing  minute  details  of  specific  proce¬ 
dures  relevant  to  evaluating  already  created  designs  or  devel¬ 
oping  training  or  documentation  for  already  determined  sys¬ 
tems.  In  this  chapter,  we  have  tried  to  show  that  while  such  uses 
of  task  analysis  continue,  the  more  common  use  today  is  in  de¬ 
veloping  a  very  broad  understanding  of  users’  work  and  is  of 
great  use  from  the  earliest  strategic  planning  stages  through  all 
the  phases  of  predesign.  Task  analysis  as  laid  out  in  this  chapter 
continues  to  be  useful  throughout  the  process,  being  used  at 
later  stages  to  develop  scenarios  for  user-oriented  evaluations  as 
well  as  in  inspection  methods. 

Task  analysis  is  a  way  to  involve  the  entire  team  in  understand¬ 
ing  users.  It  provides  ways  to  organize  the  mountain  of  unstruc¬ 
tured  data  that  often  comes  from  field  studies  or  site  visits.  It  is 
an  essential  part  of  the  process  of  creating  any  product  (software, 
hardware,  website,  or  document)  because  products  are  tools  for 
users  to  accomplish  goals;  products  are  all  about  doing  tasks. 
INTRODUCTION


Contextual  Design  (CD;  for  an  introduction  to  all  the  steps  of 
CD,  see  Beyer  &  Holtzblatt,  1998)  is  a  customer-centered  design 
process  that  takes  a  cross-functional  team  from  collecting  data 
about  users  in  the  field,  through  interpretation  and  consolida¬ 
tion  of  that  data,  to  the  design  of  product  concepts  and  a  tested 
product  structure.  Over  the  last  18  years,  the  industry  has  moved 
from  using  an  engineering-driven  requirements  and  design 
process  to  a  more  user-centered  process.  Many  now  recognize 
that  the  best  way  to  define  the  right  product,  business  process, 
website,  or  other  system  is  to  involve  users  and  user  data  at 
every  step.  CD  has  been  used  in  companies  and  taught  in  uni¬ 
versities  all  over  the  world,  along  with  other  user-centered  de¬ 
sign  processes.  Today,  enlightened  product  companies  and  IT 
organizations  know  that  successful  projects  and  user  adoption 
come  from  using  user-centered  design  processes. 

But  along  with  a  commitment  to  working  with  users  has  come 
a  demand  to  reduce  the  time  needed  for  up-front  design.  Teams 
want  to  include  user  data  without  adding  significant  time  to  the 
process.  Organizations  have  existing  methodologies  and  want 
user-centered  design  techniques  to  be  included  within  these  ex¬ 
isting  practices.  Opposition  to  bringing  users  into  the  center  of  the 
design  process  often  centers  on  the  lack  of  time  in  the  schedule. 

CD  is  a  framework  for  structuring  front-end  design  to  ensure 
that  user  data  drives  the  overall  system  definition  as  well  as  the 
user  interface.  As  such,  a  CD  project  may  range  from  weeks  to 
months  depending  on  the  scope  and  goals  of  the  project.  In  our 
experience,  complaints  about  time  are  less  about  overall  clock 
time  and  more  about  how  user  data  fits  into  the  existing  habits, 
processes,  job  descriptions,  and  schedules  of  a  company.  To 
help  companies  see  how  to  use  the  steps  of  CD  rapidly  we  have 
put  together  a  structure  for  how  to  use  the  techniques  in  one  to 
seven  weeks.  Rapid  CD  tries  to  answer  the  questions  so  often 
asked  about  the  full  CD  process: 

•  Do I have to do all those steps? What can I skip? When can I
skip it?


•  How  does  CD  fit  with  my  existing  design  process?  Can  I  use 
CD  techniques  to  get  customer  data  and  then  use  the  steps  I 
am  accustomed  to  using? 

•  I  have  only  two  people  on  the  team.  Can  I  still  do  it? 

•  What  can  I  do  in  a  few  weeks? 

Any  requirements  gathering  and  design  process  takes  time. 
For  the  CD  process,  the  real  speed  (clock  time)  to  project  com¬ 
pletion  depends  on  the  following  variables: 

•  The  number  of  customer  visits  that  you  choose  to  perform 

•  The  number  of  people  who  can  work  simultaneously  on  the 
project  or  the  helpers  you  can  get  at  key  points 

•  The  dedication  of  the  people  assigned  to  the  project  and 
whether  they  can  work  full-time 

•  The  size  of  the  problem — the  more  complex  the  business 
process,  the  more  complex  the  product  and  therefore  the 
longer  it  will  take  to  define  or  redesign 

•  The number of stakeholders who have to be satisfied,
coordinated with, and communicated to: the more buy-in you
need, the longer it will take

With  a  small  enough  project  and  focused,  trained,  and  dedi¬ 
cated  resources,  teams  can  run  projects  that  infuse  customer 
data  into  the  design  in  a  few  weeks.  This  chapter  provides  an 
overview  of  the  steps  of  Rapid  CD  and  key  issues  in  its  use. 


RAPID  CONTEXTUAL  DESIGN  OVERVIEW 


We  have  identified  several  viable  Rapid  CD  design  processes  to 
guide  teams  in  choosing  a  process  right  for  their  projects.  Table 
49.1  provides  an  overview  of  the  steps  used  in  each  of  these 
processes. 

•  Lightning  Fast.  Characterize  your  user  population  and 
vision  a  solution  in  one  to  four  weeks. 


•  Lightning Fast+. Characterize your user population, vision
a solution, mock it up in paper, and test it in four to eight
weeks.

•  Focused Rapid CD. Characterize your user population,
collect and consolidate sequences for a task analysis, vision a
solution, work out the details by storyboarding, mock it up in
paper, and test it in 6 to 10 weeks.

TABLE 49.1. Comparison of Rapid CD Processes

Steps compared: Contextual Interviews with Interpretation;
Sequence Model with Consolidation; Affinity Diagrams; Wall Walk &
Visioning; Storyboarding; Paper Mock-up Interviews with
Interpretation.

Rapid CD Process   Duration     Contextual Interviews   Paper Mock-up Interviews
Lightning Fast     1-4 weeks    4-12 users
Lightning Fast+    4-8 weeks    6-12 users              4-9 users
Focused            6-10 weeks   8-12 users              6-12 users

Rapid  CD  processes  assume  a  two-person  team  with  occa¬ 
sional  helpers  at  key  points.  No  matter  the  time  or  resource  con¬ 
straints,  there  is  always  time  to  get  user  data  into  the  process — 
if  the  organization  is  committed  to  it.  These  Rapid  CD  project 
types  provide  some  guidelines  for  structuring  quick  processes 
for  any  project.  Successful  projects  always  need  to  have  strong 
project  management  and  must  be  run  on  day-by-day  schedules 
(for  detailed  descriptions  of  each  step,  schedules  and  guidelines 
for  project  management,  see  Holtzblatt,  Wendell,  &  Wood,  2005; 
see  Table  49.2  for  a  sample  schedule  for  a  one-week  project). 
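
As a rough planning aid, the three Rapid CD processes and their time ranges can be written down as data and filtered against a time budget. This is a sketch only: the `RapidCDProcess` class and the `processes_that_fit` rule (a process fits if its minimum duration is within the weeks available) are assumptions made for illustration, while the names, durations, and activities come from the descriptions above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RapidCDProcess:
    """One Rapid CD process with its duration range in weeks."""
    name: str
    min_weeks: int
    max_weeks: int
    activities: Tuple[str, ...]


PROCESSES = [
    RapidCDProcess(
        "Lightning Fast", 1, 4,
        ("characterize user population", "vision a solution"),
    ),
    RapidCDProcess(
        "Lightning Fast+", 4, 8,
        ("characterize user population", "vision a solution",
         "paper mock-up", "test"),
    ),
    RapidCDProcess(
        "Focused Rapid CD", 6, 10,
        ("characterize user population", "consolidate sequences",
         "vision a solution", "storyboard", "paper mock-up", "test"),
    ),
]


def processes_that_fit(weeks_available: int) -> List[str]:
    """Names of processes whose minimum duration fits the budget (assumed rule)."""
    return [p.name for p in PROCESSES if p.min_weeks <= weeks_available]


print(processes_that_fit(5))
```

A time budget alone does not settle the choice; scope, the number of job roles, and the availability of helpers would still need to be weighed by the team.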


Setting  Project  Scope:  What  Makes  a  Good 
Rapid  CD  Project? 

Rapid  CD  techniques  can  be  used  on  any  project  to  gather  cus¬ 
tomer  data.  However,  some  projects  are  especially  well  suited 
for  Rapid  CD.  The  basic  rule  of  thumb  is  to  have  a  small,  tight 
project  scope  targeting  a  few  clearly  defined  job  roles.  Following 
are  some  example  projects  that  work  well: 

•  Usability  fixes,  low-hanging  fruit,  quick  fixes.  Some¬ 
times  a  project  or  system  is  just  looking  for  the  top  fixes  to 
take  a  significant  step  to  improve  the  product  or  system.  You 
can  observe  a  cross  section  of  key  users  interacting  with  the 
existing  product,  system,  or  website.  From  this  data,  top  rec¬ 
ommendations  easily  can  be  generated. 

•  Market  or  population  characterization  for  new  system 
concepts.  The  goal  of  this  kind  of  project  is  to  understand 
the  users  and  generate  ideas  for  new  system  directions.  If 


your  new  market  or  business  process  has  no  more  than  four 
job  roles  in  the  target  population  or  process,  you  can  use 
Rapid  CD  to  characterize  the  needs  of  the  population. 

•  Website  evaluation  and  redesign.  You  can  use  Rapid  CD 
to  redesign  your  current  website  to  provide  your  users  better 
access  to  information  and  services  if  your  target  user  popu¬ 
lation  covers  one  to  three  major  job  roles.  Rapid  CD  can  eval¬ 
uate  how  people  currently  are  using  your  website  and  other 
related  sites  and  help  set  new  direction. 

•  Next-generation system. If you want to redesign an
existing product (not a product suite) and your product
targets one or two major job roles, Rapid CD will help you to
identify new functions, to grow the product, and to improve
the user experience.

•  Supporting a coherent task. If you want to build a system
to support a coherent task that may be accomplished by one
to four collaborating job roles, Rapid CD can help. This task
may be part of an existing system, a larger business process,
a task within a larger ERP environment, a portal, a product,
or a website.

•  Persona development. If you want to develop personas
for your website, product, or process and you have one to
four target job types, Rapid CD can provide you with the data
you need to find your user types and write rich personas.

Rapid  CD  can  work  for  these  projects  because  the  scope  is 
tight.  This  makes  Rapid  CD  a  good  companion  for  Agile  devel¬ 
opment  techniques  where  the  scope  is  already  tight. 

The  wider  the  scope  is,  the  longer  the  process  is,  and  the 
more  resources  are  required.  Therefore,  a  wider  scope  like  a 
large  business  process  or  suite  of  products  is  not  appropriate  for 
Rapid  CD  techniques  when  addressed  together.  However,  when 
these  projects  are  broken  into  tasks  or  individual  products  and 
addressed  successively,  you  can  use  Rapid  CD  techniques  for 
successful  redesign.  If  you  use  the  same  project  team  for  the 
whole  succession  of  projects,  you  will  end  up  with  a  coherent 
system.  If  you  try  to  work  multiple  teams  in  parallel,  you  may  in¬ 
appropriately  break  up  the  work  or  fragment  the  design. 


TABLE 49.2. Sample Lightning Fast Schedule for Six Customers, Requiring Five to Six Days to Gather, Interpret,
and Organize the Data and Brainstorm Solutions. Preparation for Visits and Presentation of Findings Is Not Included

Monday
  AM: Two parallel customer interviews. Return immediately for
      interpretation.
  PM: Two back-to-back interpretation sessions. Bring in a third
      person for perspective and buy-in. Finish both interpretations.

Tuesday
  AM: Same as Monday.
  PM: Same as Monday. Discuss finding and reset focus to ensure the
      quality of the data.

Wednesday
  AM: Same as Monday.
  PM: Same as Monday. Finish all interpretations by end of day.

Thursday
  All Day: Build the affinity of approximately 300-400 affinity
      notes. Include at least 2-4 more people to help. With no
      helpers, 2 people will need Friday to finish. This will extend
      the schedule by 1 day.

Friday
  All Day: Finish any clean-up of the affinity by 11:00 a.m. Walk
      the affinity to generate design ideas. Brainstorm
      recommendations as a group. Capture recommendations online
      during the session. Formal preparation of recommendations is
      not included.


The  Steps  of  Rapid  Contextual  Design 

The  Contextual  Interview — Getting  the  Right  Data 

In  order  to  design  a  product  that  meets  customers’  real 
needs,  designers  must  understand  the  customers  and  their 
practices.¹ Yet designers are not usually familiar with or experienced
in the user activities they are supporting. If they operate
from  their  gut  feelings,  they  rely  on  their  own  experiences  as 
users.  Generally,  designers  are  more  tolerant  of  technology  than 
average  users,  so  they  are  not  representative  of  end  users. 

On the other hand, requirements gathering is not simply a
matter of asking people what they need in a system, as if
requirements were pebbles to be picked up from the
beach. A product is always part of a larger practice. It is used in
the context of other tools and manual processes. Product design
is fundamentally about the redesign of work or life practice,
given technological possibility. In addition, practice cannot be
designed well if it is not understood in detail.

You  cannot  simply  ask  people  for  design  requirements,  in 
part  because  they  do  not  understand  what  technology  is  capa¬ 
ble  of,  but  more  because  they  are  not  aware  of  what  they  really 
do.  Because  the  everyday  things  people  do  become  habitual 
and  unconscious,  people  are  usually  unable  to  articulate  their 
practices.  People  are  conscious  of  general  directions,  such  as 
identifying  critical  problems,  and  they  can  say  what  makes  them 
angry  at  the  systems.  However,  they  cannot  provide  day-to-day 
details  about  what  they  are  doing.  This  low-level  detail  of  every¬ 
day  practice  grounds  designers  in  what  is  needed  before  they  in¬ 
vent  how  technology  might  augment  the  process. 

The challenge of getting this design data is finding a technique
to get at data that is unconscious and tacit. The first
step of CD is Contextual Inquiry, our field data gathering technique,
in which designers go out into the field, observe people,
and talk with them about their work or lives as it happens.
If designers watch people while they work, the people
do not have to articulate their practices. If people give
blow-by-blow retrospective accounts of things that happened in the
recent past, they can stick with the details of specific cases, using
artifacts and reenactments to remind them of what happened.
Field data overcomes the difficulties of discovering tacit
information.

In  CD,  the  cross-functional  design  team  conducts  one-on- 
one  field  interviews  with  customers  in  their  workplaces  (or  life 
spaces)  focusing  on  the  aspects  of  the  practice  that  matter  for 
the  project  scope.  The  Contextual  Interview  lasts  about  two 
hours  and  is  based  on  four  principles  that  guide  how  to  run  the 
interview: 

•  Context — While  people  are  working,  gather  data  in  the 
workplace  and  focus  on  the  activities  they  are  doing. 

•  Partnership — Collaborate  with  customers  to  understand 
their  work;  let  them  lead  the  interview  by  doing  their  work. 
Do  not  come  with  planned  questions. 


•  Interpretation — Determine  the  meaning  of  the  customer’s 
words  and  actions  together  by  sharing  your  interpretations 
and letting them tune your meaning. When immersed in
their real lives and real work, people will not let you
misconstrue their lives.

•  Focus — Steer  the  conversation  to  meaningful  topics  by  pay¬ 
ing  attention  to  what  falls  within  project  scope  and  ignoring 
things  that  are  outside  of  it.  Let  users  know  the  focus  so  they 
can  steer,  too. 

The  Contextual  Interview  starts  like  a  conventional  interview, 
but  after  a  brief  overview  of  the  practice,  it  transitions  to  ongo¬ 
ing  observation  and  discussion  with  the  user  about  that  part  of 
the  practice  that  is  relevant  to  the  design  focus.  The  interviewer 
watches  the  customer  for  overt  actions,  verbal  clues,  and  body 
language.  By  sharing  surprises  and  understandings  with  users  in 
the  moment,  users  and  designers  can  enter  into  a  conversation 
about  what  is  happening,  why,  and  the  implications  for  any  sup¬ 
porting  system.  As  much  as  possible,  the  interviewer  keeps  the 
customer  grounded  in  current  activity,  but  can  also  use  artifacts 
to  trigger  memories  of  recent  activities. 

The  fundamental  intent  of  the  Contextual  Interview  is  to  help 
designers  get  design  data:  low  level,  detailed  data  about  the 
structure  of  the  practice  and  the  use  of  technology  within  that 
practice.  At  InContext,  we  believe  that  the  Contextual  Interview, 
which  is  based  on  observing  people  in  the  context  of  their  prac¬ 
tices  while  they  do  their  normal  activities,  is  the  best  way  to  get 
this  necessary  design  data.  This  data  is  critical — whatever  else  we 
trade  off  in  Rapid  CD,  we  never  trade  off  the  field  interviews. 

Choosing customers depends on project scope. The
number of people that should be interviewed is directly related
to  the  project  scope.  The  wider  the  scope,  the  more  people 
need  to  be  interviewed  to  cover  that  scope.  Our  rule  of  thumb 
for  a  small  scope,  such  as  top  10  problems,  usability  improve¬ 
ment,  next  product  release,  or  checking  a  planned  design,  is 
6  to  10  customers  from  three  to  five  businesses  covering  one 
or  two  roles.  The  more  roles  you  need  to  cover  and  the  more 
contexts  (type  of  business,  characteristic  of  person,  geographic 
location),  the  more  people  you  need  to  interview. 

When analyzing the project, first identify the job titles or
roles  targeted  and  the  people  that  support  the  activities  to  be 
studied.  A  Rapid  CD  project  can  handle  one  to  four  roles,  which 
means  keeping  the  scope  of  the  practice  tight.  Remember,  the 
work or life practice users engage in is what counts: if one person
is called a system administrator and the other a database administrator,
but they do the same work as it relates to the project,
it does not matter that their job titles are different. We
interview a minimum of three people per role; four is better.

If  the  product  is  to  support  people  across  different  contexts, 
for  example  industries,  what  matters  is  whether  the  contexts 
imply  a  different  practice.  In  real  estate,  for  example,  the  work  is 
structured differently within the industry: a group of small, distributed
agencies, a large corporate real estate company, and
an in-house real estate representative. In each situation, the
communication, sharing, and work management are likely to differ,
creating three different practice patterns. To cover real estate,
you need to collect data in all three contexts. We try to interview
at least two businesses or independent sets of people per
practice pattern.

¹Some products, systems, and websites support the way people work, keep businesses running, or help users find needed information. Other products,
systems, and websites address games, other entertainment, or consumer information to support life decisions. To gather data for these consumer
products we have to look at people’s life practice. To simplify language, this chapter will use practice to mean both work and life practice.

If,  for  your  industries,  there  are  no  changes  to  practice  pat¬ 
terns,  then  simply  touch  multiple  industries  without  worrying 
about  overlap.  In  this  case,  wide  diversity  is  best. 

The  goal  in  selecting  users  to  interview  is  to  get  enough  rep¬ 
etition  in  the  practice  so  that  each  role  and  contextual  variable 
has  three  or  four  interviews  that  represent  it,  remembering  that 
any  one  person  may  represent  several  of  the  contextual  vari¬ 
ables.  As  long  as  you  have  overlap,  you  will  be  able  to  find  the 
common  structure  and  key  variations  in  the  practice.  Remem¬ 
ber,  paper  prototype  interviews  will  expand  the  number  of  con¬ 
texts  and  roles  represented  in  the  whole  project. 
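
The coverage goal described above can be checked mechanically once a recruiting plan exists. The following is a minimal Python sketch, not part of the Rapid CD method itself: the field names (`role`, `contexts`), the sample plan, and the threshold of three interviews are illustrative assumptions based on the rule of thumb in this section.

```python
from collections import Counter

MIN_INTERVIEWS = 3  # assumed floor per role or contextual variable


def coverage(interviewees):
    """Count how many interviewees represent each role and context value."""
    counts = Counter()
    for person in interviewees:
        counts[("role", person["role"])] += 1
        # One person may represent several contextual variables at once.
        for variable, value in person["contexts"].items():
            counts[(variable, value)] += 1
    return counts


def gaps(interviewees, minimum=MIN_INTERVIEWS):
    """Return the roles and context values with fewer than `minimum` interviews."""
    return sorted(key for key, n in coverage(interviewees).items() if n < minimum)


# Hypothetical recruiting plan for the real estate example.
plan = [
    {"role": "agent", "contexts": {"practice": "small agency"}},
    {"role": "agent", "contexts": {"practice": "small agency"}},
    {"role": "agent", "contexts": {"practice": "corporate"}},
    {"role": "agent", "contexts": {"practice": "corporate"}},
    {"role": "broker", "contexts": {"practice": "small agency"}},
    {"role": "broker", "contexts": {"practice": "corporate"}},
    {"role": "broker", "contexts": {"practice": "in-house"}},
]

for variable, value in gaps(plan):
    print(f"Under-covered: {variable} = {value}")
```

In this made-up plan both roles and two of the practice patterns reach the floor, while the in-house pattern has a single interview, so the sketch flags it as the place to recruit more users.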

Using  content  experts.  A  user  advocate  is  not  the  user. 
The  core  to  your  success  is  to  get  to  the  real  users  of  your  pro¬ 
posed  system.  Some  organizations  resist  letting  teams  go  to 
the  user  because  they  think  they  have  in-house  expertise.  Some 
companies  hire  business  analysts  who  used  to  be  users  in  the 
hope  that  they  will  provide  the  requirements  knowledge  needed. 
In  addition,  some  methods,  such  as  XP,  advocate  that  a  cus¬ 
tomer  representative  is  all  you  need  to  iterate  the  design. 

But a representative user is not the same as a set of current actual
users, and never can be. No one person can embody
all the users of a real system. Although interviewing several
users  reveals  a  common  practice,  no  one  user  ever  represents 
that  practice  in  its  entirety.  Furthermore,  the  more  the  user  ad¬ 
vocate  becomes  part  of  the  engineering  organization,  the  less 
useful  they  are  as  a  user  surrogate.  They  learn  too  much  about 
the  technology,  and  they  become  invested  in  the  team’s  thinking. 

More  importantly,  user  advocates  cannot  articulate  their 
practices  any  more  than  users  can.  So  involve  user  advocates  in 
your  process,  put  them  on  the  team,  and  let  them  tell  you  their 
entering  assumptions  about  the  practices  and  problems  to  help 
set  focus  for  the  interviews.  However,  do  not  use  them  as  sub¬ 
stitutes  for  real  user  data. 

User  advocates  do  not  speed  the  process — they  just  create 
another  in-house  person  with  whom  to  argue.  Only  data  col¬ 
lected  through  an  agreed  upon  external  process  will  get  respect 
for  the  findings  across  the  organization.  You  speed  the  process 
by  collecting  the  right  data  and  generating  real  findings  through 
an  agreed  upon  method. 

Interpretation Sessions and Work Modeling — Creating a Shared Understanding

Contextual  Interviews  produce  large  amounts  of  customer 
data,  all  of  which  must  be  shared  among  the  core  design  team 
and with the larger, cross-functional team of user interface designers,
engineers, documentation people, internal business
users, and marketers. Traditional methods of sharing through
presentations, in reports, or by e-mail do not allow people to
truly process the information or bring their perspectives into a
shared understanding. CD overcomes this by involving the team
in interactive sessions to review, analyze, and manipulate the
customer data. For Rapid CD, your two-person core team should
be cross-functional: for example, a user researcher and an interaction
designer; a product manager and a developer; or an
analyst and an interaction designer.

The data will be richer and buy-in will be stronger
if other stakeholders who will be building, marketing, or
using  the  system  participate  in  the  interpretation  session.  This 
is  a  two-hour  commitment,  a  reasonable  request  for  help  in 
most  organizations.  We  therefore  recommend  a  three-person 
team  for  interpretation  sessions  in  Rapid  CD. 

While  we  audio  record  interviews  for  backup,  we  do  not  tran¬ 
scribe  the  recordings  or  do  videotape  analysis.  Both  transcrip¬ 
tion  and  video  analysis  take  too  long,  and  video  analysis  limits 
the  perspective  to  one  person.  Instead,  within  48  hours  of  each 
customer  interview,  the  team  gathers  in  the  design  room,  where 
the  interviewer  tells  the  story  of  the  interview  from  handwritten 
notes  and  memory. 

Team members ask questions about the interview, drawing
out the details of this retrospective account. One team member is the
recorder, typing notes online (or into CDTools²). The other
team member, who did the interview, retells the story being captured.
The helper listens and points out key issues to be captured.
Each team member brings a different perspective to the
data, while open discussion enables the team to arrive at a
shared understanding. Participants ask questions, triggering the
interviewer’s memory and eliciting more data than would be
available from a designer working alone. When the discussion
sparks design ideas, they are captured in the notes.

The notes are displayed on a monitor or computer projector so
that  everyone  can  see  them.  These  notes  capture  the  key  prac¬ 
tice  issues,  cultural  observations,  breakdowns,  successes,  task 
patterns,  design  ideas,  and  any  other  interpretation  or  issue  that 
has  relevance  to  the  project  scope.  Later  these  notes  are  trans¬ 
ferred  to  Post-its  and  used  to  build  the  affinity  diagram. 

For  some  Rapid  CD  projects,  all  you  need  are  these  notes  to 
characterize  the  problems  and  issues  of  the  customer  popula¬ 
tion.  But,  if  you  want  a  task  analysis,  plan  to  do  storyboarding,  or 
need  to  map  a  work  process,  we  recommend  you  also  capture 
the  CD  sequence  model.  In  Focused  Rapid  CD,  the  team  cap¬ 
tures  the  sequence  model  showing  each  step  required  to  per¬ 
form  a  task  in  order. 

Rapid  CD  does  not  use  any  of  the  other  four  work  models 
from  the  larger  CD  process.  Capturing  and  consolidating  work 
models  characterizes  the  population  much  more  completely 
but  takes  time  to  build.  Rapid  CD  trades  off  this  richness  for 
time.  Because  the  scope  of  a  Rapid  CD  project  is  small  and  tight, 
it is already reasonably tactical, making the richness of additional
data nice to have but not necessary.

The  interpretation  session  is  a  structured  way  to  ensure  that 
all relevant information from the interview is captured for use in
the design process and shared with key team members. It is the
context for both understanding and seeing the structure of the
data and starting a real design conversation about how to address
the users’ needs with technology.

²CDTools assists a Contextual Design team in gathering and organizing CD data. It provides features to track users interviewed, capture interpretation
notes, capture an affinity diagram, and publish the organized affinity to the Web.

Working  as  a  distributed  team.  More  and  more  com¬ 
panies  are  expecting  their  teams  to  be  distributed.  Distributed 
teams have the advantage that data can be collected from different
locations with less travel, which saves time and cost. However,
any distributed team has to be more organized, rely on
the latest collaborative technology, and work harder to
develop a shared understanding.

The  interpretation  sessions  of  Rapid  CD  are  easy  to  do  in  a  dis¬ 
tributed  fashion.  If  you  have  collaboration  or  virtual  meeting  soft¬ 
ware,  you  can  use  that  to  run  a  distributed  meeting.  Simply  display 
CDTools  or  your  word  processor  in  a  collaboration  space  so  all  can 
see  the  notes  as  they  are  typed.  Get  everyone  on  the  phone  and 
start  the  session  as  usual.  As  the  interviewer  tells  the  story,  the 
note  taker  captures  the  issues  and  others  can  call  out  design  ideas 
and  ask  questions  to  get  at  the  details  of  what  happened. 

If  you  are  capturing  sequence  models,  you  may  want  to  con¬ 
sider  capturing  those  online  in  a  word  processor,  spreadsheet, 
or  drawing  tool,  switching  the  shared  display  from  the  interpre¬ 
tation  notes  to  the  sequence  model  as  needed.  Periodically 
stopping  to  check  the  sequence  models  will  add  to  your  inter¬ 
pretation  session  time,  but  is  a  critical  step  to  be  sure  that  the 
data  is  being  captured  correctly. 

The  key  to  success  is  keeping  everyone  engaged.  Be  sure 
that  everyone  has  a  role  and  is  actively  participating.  If  someone 
has  been  quiet  for  a  period,  check  in  with  him  or  her.  Consider 
having  both  sides  of  the  phone  responsible  for  some  type  of 
data  capture  to  ensure  involvement. 

Finally,  do  not  start  distributed.  Do  your  first  interviews  on 
the  road  together  or  in  one  location.  Work  out  your  process 
and  your  initial  understanding  of  the  users’  activities,  issues, 
and  practices  together.  This  becomes  the  bedrock  of  your 
shared  understanding.  After  that,  the  distributed  session  works 
very  well. 

Consolidation — Creating  One  Picture  of  the  Customer 

Consolidating  the  notes  and  the  sequence  models  creates  a 
coherent  representation  of  a  market’s  or  a  user  population’s 
needs  and  activities.  Then  the  design  can  address  a  whole  pop¬ 
ulation,  not  just  one  individual.  The  most  fundamental  goal  of 
CD is to get the team to design from data instead of from the “I.”
If  you  walk  in  the  hall  and  listen  to  designers  talking,  you  will 
hear  comments  such  as  “I  like  this  feature”  or  “The  interface  will 
work  best  this  way.”  It  is  rare  to  hear,  “Our  user  data  says  that 
the  work  is  structured  like  this,  so  we  need  this  function.”  It  is 
natural  for  people  who  design  to  make  a  system  hang  together 
in  a  way  that  makes  sense  to  them.  But  they  are  not  the  users, 
and  increasingly,  they  are  not  in  any  way  doing  the  activities  of 
the  people  for  whom  they  are  designing.  Getting  designers, 
marketers,  and  business  analysts  out  to  the  field  and  into  inter¬ 
pretation  sessions  moves  them  away  from  design  from  personal 
preference. However, we do not want them to become attached
to the particular users they met to the exclusion of the rest.


Product  and  system  design  must  address  a  whole  market  or 
user  population.  It  must  take  into  consideration  the  issues  of 
the  population  as  a  whole,  the  structure  of  work,  and  the  varia¬ 
tions  natural  to  that  work.  The  core  intent  of  data  consolida¬ 
tion  is  to  find  the  issues  and  the  activity  structure  and  create  a 
coherent  way  to  see  it  and  talk  about  it. 

In  Rapid  CD,  we  consolidate  the  notes  from  the  interpreta¬ 
tion  session  into  the  affinity  diagram  and  the  sequences  into 
consolidated  sequence  models. 

The affinity diagram (see Fig. 49.1) brings issues and insights
across all customers together into a wall-sized hierarchical diagram.
The team prints the notes captured in the interpretation
session onto sticky-note labels. The team then organizes
all this data into groups, finding common underlying
themes that cross the customer population. The process exposes
and makes concrete common issues, distinctions, practice
patterns, and needs without losing individual variation.
Walking the affinity diagram allows designers to respond with
design ideas that extend and transform the work. They write
these on Post-its and stick them right to the data that stimulated
the idea. This encourages a culture of design from data over a
focus on cool ideas generated from the “I.”

FIGURE 49.1. The affinity diagram.

The  affinity  is  built  from  the  bottom  up,  which  allows  the  in¬ 
dividual  notes  that  come  together  to  suggest  groupings  instead 
of  trying  to  force  them  into  predefined  categories.  Groups  are 
labeled  using  the  voice  of  the  customer — saying  what  they  do 
and  how  they  think.  In  Rapid  CD,  a  team  of  two  can  build  an 
affinity diagram in one to three days. This is the place where we
recommend using helpers. Again, a one-day commitment can
speed the process and expose stakeholders to the raw data,
building buy-in to the process and the data.
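
For teams that record the finished wall electronically, the hierarchy described above can be held in a simple nested structure. This is a hedged sketch rather than a description of CDTools or any real tool: the tuple layout, function names, group labels, and user codes are all hypothetical. The grouping itself is still done by people at the wall; the code only records where each note ended up.

```python
from collections import defaultdict


def build_affinity(placements):
    """Nest notes under group labels, and group labels under themes.

    `placements` holds (theme, group_label, user_code, note) tuples that
    record where the team placed each interpretation note on the wall.
    """
    wall = defaultdict(lambda: defaultdict(list))
    for theme, group_label, user_code, note in placements:
        wall[theme][group_label].append((user_code, note))
    return wall


def users_per_group(wall):
    """Count the distinct users whose notes support each group."""
    return {
        (theme, label): len({user for user, _ in notes})
        for theme, groups in wall.items()
        for label, notes in groups.items()
    }


# Hypothetical notes; labels are written in the voice of the customer.
placements = [
    ("Keeping track of requests", "I keep my own list", "U1",
     "Writes each request on a paper pad"),
    ("Keeping track of requests", "I keep my own list", "U3",
     "Maintains a private spreadsheet of open requests"),
    ("Keeping track of requests", "I cannot see status", "U2",
     "Phones the office to ask where a request stands"),
]

wall = build_affinity(placements)
for (theme, label), n in users_per_group(wall).items():
    print(f"{theme} / {label}: {n} user(s)")
```

Counting distinct users per group is one way to see which themes cross the customer population and which rest on a single interview.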

The  consolidated  sequence  model  (see  Fig.  49.2)  shows  the 
detailed  activity  structure  that  the  system  will  support  or  re¬ 
place.  This  is  similar  to  traditional  task  analysis,  showing  each 
step,  triggers  for  the  steps,  different  strategies  for  achieving 
each  intent,  and  breakdowns  in  the  ongoing  work.  Work  prac¬ 
tice  redesign  is  ultimately  about  redesigning  the  steps  in  the 
sequences.  Whether  the  redesign  eliminates  or  changes  the 
steps or eliminates the whole sequence, knowing the steps and
intents keeps the team honest. More often than not, technology
introduces new problems into the practice by failing to consider
the fundamental intents that people are trying to achieve. Redesign
will better support the practice if it accounts for each
intent, trigger, and step. This does not mean leaving the steps of
the activity the same; it means that the team has seen the current
activity and has fully considered what will happen
to it in the new, redesigned practice. The consolidated sequence
is critical as a guide for storyboarding or for streamlining any
business processes.

FIGURE 49.2. The consolidated sequence model.

Consolidate  sequences  by  collecting  together  sequences 
with  similar  overall  intents.  Then,  group  the  steps  into  chunks 
that  represent  activities  and  consolidate  across  like  activities. 
Create  abstract  steps  that  represent  the  detailed  individual  steps 
in  each  actual  sequence  model.  Consolidate  only  the  tasks  that 
the  system  will  support  or  that  you  need  to  understand  in  detail. 
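
The consolidation steps above can be sketched in code, assuming the team has already decided which activity each observed step belongs to and how to word its abstract step. That judgment is the real work and is not automated here. The dictionary layout, the function name, and the real estate steps are hypothetical illustrations.

```python
def consolidate(sequences):
    """Merge sequences that share an intent into one outline per intent.

    Each sequence is a dict with an overall `intent` and a list of
    (activity, abstract_step) pairs in the order they were observed.
    """
    consolidated = {}
    for seq in sequences:
        activities = consolidated.setdefault(seq["intent"], {})
        for activity, abstract_step in seq["steps"]:
            steps = activities.setdefault(activity, [])
            # Keep each abstract step once, in first-seen order.
            if abstract_step not in steps:
                steps.append(abstract_step)
    return consolidated


# Two hypothetical users pursuing the same overall intent.
sequences = [
    {"user": "U1", "intent": "Schedule a showing", "steps": [
        ("Find a time", "Check own calendar"),
        ("Find a time", "Contact client"),
        ("Confirm", "Send confirmation"),
    ]},
    {"user": "U2", "intent": "Schedule a showing", "steps": [
        ("Find a time", "Contact client"),
        ("Find a time", "Check office calendar"),
        ("Confirm", "Send confirmation"),
    ]},
]

for intent, activities in consolidate(sequences).items():
    print(intent)
    for activity, steps in activities.items():
        print(f"  {activity}: {', '.join(steps)}")
```

A fuller model would also carry triggers, alternative strategies, and breakdowns for each step; they are left out to keep the sketch short.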

How can so little data characterize a whole market?
Consolidated data is built from 8 to 30 field interviews and can
characterize  markets  of  millions  of  people.  Consolidated  data 
from  8  to  12  people  can  identify  a  large  percent  of  the  key  issues 
that  will  eventually  be  identified  in  the  population.  For  exam¬ 
ple,  even  on  large  projects  we  routinely  consolidate  after  10  to 
15  interviews  and  then  grow  the  models  and  affinity  from  there 
as  more  data  is  collected.  An  early  affinity  represents  the  key 
areas  and  distinctions  that  will  grow  in  detail  as  more  data  is 
added,  and  reveals  holes  in  the  data  to  guide  further  data  col¬ 
lection.  Similarly,  additional  data  adds  depth  and  detail  to  the  se¬ 
quence  model,  but  the  basic  structure,  central  to  the  project 
focus,  is  identifiable  early.  Therefore,  a  Rapid  CD  project  with 
8  to  12  key  users  will  reveal  the  vast  majority  of  the  issues  that 
you  need  to  consider  for  your  project. 

Early  in  usability  testing,  people  were  not  comfortable  with 
test  results  from  small  numbers  of  users.  Whatever  the  formal 
arguments,  years  later  we  have  learned  empirically  that  small 
numbers  are  enough.  CD  gets  its  power  from  designing  from  an 
understanding  of  practice  structure  without  losing  variation. 
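
One common form of the formal argument treats each interview as an independent chance to observe a given issue. The sketch below illustrates that assumption only; it is not a model taken from this chapter. The function name and the per-interview probability of 0.25 are hypothetical, and real issues are neither independent nor equally likely to surface.

```python
def chance_seen_at_least_once(p, interviews):
    """Probability that an issue appears in at least one interview,
    assuming each interview reveals it independently with probability p."""
    return 1.0 - (1.0 - p) ** interviews


# Assumed value: an issue that surfaces in one interview out of four.
for n in (3, 6, 8, 12):
    print(n, round(chance_seen_at_least_once(0.25, n), 2))
```

Under these assumptions the chance of seeing such an issue at least once climbs quickly with the first several interviews and then flattens, which is consistent with the empirical observation that small numbers are enough.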

Consolidation  helps  people  see  and  find  the  structure  in  the 
practice,  and  this  drives  successful  design.  Here  is  a  way  to  think 
about  it. 

We are all different, but we are alike. Everybody looks different;
humans have great variation. People come from different ethnic
groups and cultures and were raised with different child-rearing practices.
Everyone chooses different clothes, hobbies, careers, and lifestyles.
At one level, we are all unique. However, at the same time we are alike, each
having one head and one body with two arms and two legs. That is why
clothing manufacturers can make off-the-rack clothing;
making a few adjustments for different body sizes is enough.
Structurally, the variation between people is small and the structure
of our bodies is common.

There  are  only  so  many  ways  to  do  anything.  Any  product  and 
system  design  is  really  a  very  narrow  focus  on  the  human  experi¬ 
ence.  Within  one  kind  of  work  or  activity,  only  so  much  variation 
is possible. The roles we play, the intents and goals we have, and
the way we do things are similar. Variation, once you start looking
for structural elements, is small. We have found, for example, that
there are only two to four strategies for any primary task in a practice.
If those are the key strategies for that practice, the question
is not which we support, but how we support them all.

Designs that target a certain domain of activity are created
for  a  certain  set  of  people  doing  a  fixed  set  of  tasks,  situated  in 
the  larger  culture,  using  the  same  set  of  tools,  trying  to  achieve 
the  same  kind  of  goals.  Under  these  constrained  conditions, 
practices  from  person  to  person  will  have  similar  patterns.  After 
you  have  collected  data  from  three  to  six  people  doing  the  same 
thing,  the  activity  patterns  and  issues  start  overlapping. 

Personas  give  voice  to  the  consolidated  data.  When 
your  cross-functional  team  has  worked  together  to  gather,  in¬ 
terpret,  and  consolidate  customer  data,  you  have  developed  a 
shared  understanding  of  the  users  and  their  issues.  You  have 
not  just  built  the  data — you  have  experienced  it  in  a  way  that 
others  have  not.  Therefore,  you  may  find  the  consolidated  mod¬ 
els  and  affinity  to  be  evocative  representations  of  the  customer 
data  you  experienced,  but  others  do  not  share  that  experience 
and  cannot  assimilate  these  models  as  easily. 

Most  teams  need  to  communicate  their  understandings  of 
user  needs  and  their  design  plans  to  stakeholders — management, 
customer  organizations,  product  groups,  and  so  forth.  These 
stakeholders, with no background in CD and no experience with
the data, do not carry the memories of the real users from the
field interviews. When they walk the data they can see the issues,
needs, and breakdowns, but the data is not as poignant or personal
to them as it is to those who know the users firsthand.

Personas can help bring users alive and focus the stakeholders
on the relevant issues when they are built from rich contextual
data. Popularized by Cooper (1997), personas describe
typical users of the proposed system as though they were real
people. Their use is becoming more widespread, though with
mixed success. According to Manning (2003), “A persona that’s
not backed by rich contextual data isn’t valid, which accounts
for much of the mixed success.” However, when backed by rich
contextual data, personas can help developers and designers not
involved in the data collection focus on the needs and characteristics
of their users (Manning, 2003). Anyone, trained in CD or
not, can read a persona vignette and gain a sense of the typical user
they are trying to support.

We  build  personas  from  the  data  collected  in  Rapid  CD  to  help 
focus  on  the  characters  we  are  visioning  about  in  the  next  step,  to 
help  stakeholders  segment  their  market  according  to  practice  in¬ 
stead  of  typical  demographics,  to  clarify  branding  and  prioritization, 
and  to  bring  the  users  and  their  needs  to  life  for  developers. 

To  build  a  persona,  the  team  looks  for  core  practice  roles 
among  the  users,  each  characterizing  a  different  way  of  doing 
the  work.  Expect  anywhere  from  two  to  six  of  these  in  a  Rapid 
CD  project.  Then  for  each  core  practice  role,  find  the  base  user: 
the  user  that  most  exemplifies  this  role.  Look  at  others  who  also 
manifest  this  role  and  borrow  other  relevant  tasks,  values,  and 
life story elements to create your archetypical role. This collection
of exemplary facts then becomes the basis for your story
of the named persona. With the rich user data in the affinity already
grouped into issues, it is easy to harvest the data for the
key elements that differentiate the personas. Write a
paragraph about the persona’s project-related life; get a representative
picture; list the typical tasks, roles, goals, and other
practice characteristics relevant to your project focus, and you
have built a persona based on reliable customer data that you
know represents your user population.


Visioning a New Work Practice

Visioning is about invention. However, design of technology
is first design of the story showing how manual practices, human
interactions, and other tools come together with your product
or  system  to  better  support  the  whole  practice.  Visioning  is  the 
CD  technique  to  help  teams  tell  that  story.  Visioning  is  a  vehi¬ 
cle  to  identify  needed  function  in  the  context  of  the  larger  prac¬ 
tice.  Visioning  ensures  that  teams  put  off  lower-level  decisions 
about  implementation,  platform,  and  user  interface  until  they 
have  a  clear  picture  of  how  their  solution  will  fit  into  the  whole 
of  the  practice.  Teams  commonly  focus  too  much  on  low-level 
details  instead  of  the  full  sociotechnical  system.  This  is  one 
cause  of  breaking  the  way  users  perform  activities  and  of  fail¬ 
ing  to  create  something  the  market  wants.  Therefore,  the  pri¬ 
mary  intent  of  visioning  is  to  redesign  the  practice,  not  to  design 
a  user  interface.  Because  a  visioning  session  is  a  group  activity,  it 
fosters  a  shared  understanding  among  team  members  and  helps 
them  use  their  different  points  of  view  to  push  creativity. 

In  Rapid  CD,  the  core  team  visions  a  solution,  but  this  is  an¬ 
other  time  for  bringing  in  stakeholders  and  helpers.  A  good  vi¬ 
sioning  session  has  from  four  to  six  participants.  If  helpers  have 
been  involved  in  the  interpretation  session  and  affinity  build¬ 
ing  they  may  want  to  participate  in  the  visioning  session.  Again, 
this  is  a  one-  or  two-day  commitment,  which  allows  a  wider 
team  to  participate  in  creating  the  solution  direction. 

The  first  step  to  a  visioning  session  is  to  walk  the  data  model 
by  model,  immersing  the  team  in  customer  data  so  their  inven¬ 
tions  will  be  grounded  in  the  users’  work.  During  the  walk,  team 
members  compare  ideas  and  begin  to  get  a  shared  idea  of  how 
to  respond  to  the  data.  Our  rule  is  that  no  one  gets  to  vision 
unless  he  or  she  has  participated  in  walking  the  data.  Without 
this  rule,  the  process  is  no  longer  data  driven;  anyone  can  walk 
in  and  offer  their  pet  design  ideas.  Simply  walking  the  customer 
data  naturally  selects  and  tailors  preexisting  ideas  to  fit  the 
needs  of  the  population.  Since  we  will  evaluate  the  visions 
based  in  part  on  fit  to  the  data,  knowing  the  data  is  important. 

During  the  visioning  session,  the  team  will  pick  a  starting 
point  and  build  a  story  of  the  new  practice.  One  person  is  as¬ 
signed  to  be  the  pen,  or  the  person  who  draws  the  story  on  a 
flip  chart,  fitting  ideas  called  out  by  the  team  members  into  the 
story  as  it  unfolds.  The  story  describes  the  new  practice,  show¬ 
ing  people,  roles,  systems,  and  anything  else  the  vision  requires. 
The  team  does  not  worry  about  practicality  at  this  point;  all 
ideas  are  included.  Creating  several  visions  allows  the  team  to 
consider  alternative  solutions. 

After  a  set  of  visions  is  created,  the  team  evaluates  each  vi¬ 
sion  in  turn,  listing  both  the  positive  and  negative  points  of  the 
vision  from  the  point  of  view  of  customer  value,  engineering 
effort,  technical  possibility,  and  corporate  value.  The  negative 
points  are  not  thrown  out  but  used  to  stimulate  creative  design 
ideas  to  overcome  objections.  When  complete,  the  best  parts  of 
each  vision  and  the  solutions  to  objections  are  brought  together 
into  one,  synthesized  practice  redesign  solution  (see  Fig.  49.3). 
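
The evaluation step above can be pictured as simple bookkeeping. The following is a minimal sketch, not part of the CD method itself: the names `Point`, `Vision`, and `synthesize` are hypothetical, and the only things taken from the text are the four viewpoints and the rule that negative points are kept and answered with design ideas rather than discarded.

```python
from dataclasses import dataclass, field

# The four viewpoints the text names for evaluating a vision.
VIEWPOINTS = (
    "customer value",
    "engineering effort",
    "technical possibility",
    "corporate value",
)


@dataclass
class Point:
    viewpoint: str      # one of VIEWPOINTS
    note: str           # what the team observed about the vision
    positive: bool      # True for a strength, False for an objection
    fix: str = ""       # design idea that overcomes an objection, if any


@dataclass
class Vision:
    name: str
    points: list = field(default_factory=list)


def synthesize(visions):
    """Gather strengths and objection fixes from every vision.

    Returns (kept, unresolved): ideas to carry into the single
    synthesized vision, and objections that still need a design idea.
    """
    kept, unresolved = [], []
    for vision in visions:
        for point in vision.points:
            if point.viewpoint not in VIEWPOINTS:
                raise ValueError(f"unknown viewpoint: {point.viewpoint}")
            if point.positive:
                kept.append((vision.name, point.note))
            elif point.fix:
                kept.append((vision.name, point.fix))
            else:
                unresolved.append((vision.name, point.note))
    return kept, unresolved
```

In a real session this record lives on flip charts; the sketch only illustrates that an objection without a fix remains open work for the team rather than a reason to drop the vision.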

You  cannot  invent  from  customer  data.  Every  team 
we  work  with  raises  this  claim.  Customer  data  tell  you  what  is, 
not  what  could  be.  How  can  you  see  the  future  by  looking  at  the 
past?  Here  is  what  we  say. 




FIGURE  49.3.  A  complete,  synthesized  vision. 


Every  invention  supports  a  real  need,  otherwise  why  would 
anyone  want  it?  Invention  is  a  response  to  some  life  or  work 
practice  by  a  designer  or  technologist  who,  seeing  a  need  and 
knowing  the  technology,  imagines  a  new  possibility.  Edison  did 
not  invent  the  idea  of  light;  he  saw  candles  and  gas  and  invented 
light  bulbs.  Bricklin  did  not  invent  spreadsheets;  he  saw  how 
paper  spreadsheets  are  used  and  knew  what  technology  could
do.  The  developers  of  WordPerfect  worked  in  the  basement  below
a  secretarial  pool.  Nobody  invents  entirely  new  things  that
fulfill  no  need  and  contribute  in  no  way  to  human  practice;  they 
invent  new  ways  to  fill  existing  needs  and  overcome  existing  lim¬ 
itations.  As  people  incorporate  these  products  and  new  ways  of 
working  into  their  lives,  they  reinvent  them  by  adopting  and  adapting
the  new  way  of  working.  If  designers  are  out  there  seeing
people  living  their  lives  with,  without,  and  in  spite  of  technology,
they  can  see  future  directions  for  technology.

A  vision  is  only  as  good  as  the  team’s  combined  skill.  Customer
data  is  the  context  that  stimulates  the  direction  of  invention.
However,  no  invention  happens  without  an  understanding  of  the
materials  of  invention:  technology,  design,  and  practice  patterns.
The  visioning  team  needs  to  include  people  who  understand 
the  possibilities  and  constraints  of  the  technology.  If  the  team 
is  supposed  to  design  Web  pages  and  none  of  them  has  ever
designed  a  Web  page,  they  will  not  be  able  to  use  Web  technol¬ 
ogy  to  design.  When  the  people  who  always  designed  main¬ 
frames  were  told  to  design  windowing  interfaces,  they  repli¬ 
cated  the  mainframe  interface  in  the  windowing  interface.  This 
is  why  we  recommend  that  design  teams  include  people  with  di¬ 
verse  backgrounds  representing  all  the  materials  of  design  and 
the  different  functions  of  the  organization.  Only  then  will  an 
innovative  design,  right  for  the  business  to  ship,  emerge. 

We  do  not  ask  customers  what  to  make;  we  understand  what 
they  do.  Customers  who  are  not  aware  of  the  details  of  their  prac¬ 
tices,  who  do  not  know  the  latest  technologies,  and  who  do  not 
know  what  your  business  is  capable  of  cannot  tell  you  what  to  in¬ 
vent.  We  do  not  ask  them.  Instead,  we  understand  what  they  are 
doing  and  capture  it  systematically.  We  immerse  a  design  team — 
who  does  understand  technology,  the  practice,  and  the  busi¬ 
ness — in  that  data  and  let  them  vision.  However,  we  are  not  done. 
The  vision  has  to  be  right  for  people,  so  we  take  it  out  and  test  it. 
We  let  people  test  drive  the  future  in  our  paper  mock  ups  and  let 
their  tacit  knowledge  of  their  lives  shape  and  direct  our  vision. 


Defining  Your  Rapid  CD  Process:  The  Half-Way  Point

The  vision,  produced  by  a  cross-functional  team  immersed  in
customer  data,  is  the  jumping  off  point  for  concurrent  design:

•  Interaction  design:  The  vision  guides  the  detailed  design 
to  produce  the  overall  user  experience  plan  and  final  inter¬ 
action  design. 

•  Engineering:  The  vision  contains  implementation  assump¬ 
tions  and  challenges  that  must  be  looked  at  for  viability  be¬ 
fore  the  company  can  commit  to  the  vision.  The  data  pro¬ 
vides  the  basis  for  as-is  use  cases  that  can  be  used  to 
characterize  the  current  practice. 

•  Marketing:  The  vision  is  the  story  of  the  new  practice — the 
basis  for  communication  to  customers.  Sales  and  marketing 
are  always  based  on  storytelling:  how  the  new  product  will 
benefit  the  customer.  Sharing  the  story  in  user  groups,  focus 
groups,  and  individual  conversations  is  a  good  way  to  gauge 
sales  point  (the  excitement  that  the  story  will  generate).

•  Business  planning:  The  vision  can  be  used  to  drive  market¬ 
ing  surveys  and  investigations  to  flesh  out  the  business  case. 

•  Business  (or  enterprise)  process  design:  The  data  itself 
reveals  root  causes  to  problems  and  drives  which  processes 
to  study  more  and  measure.  It  also  provides  a  representation 
of  the  as-is  process  to  drive  process  mapping  and  as-is  use 
case  development.  The  vision  represents  the  new  sociotechnical
system  that  could  be  put  in  place  to  streamline  the
process.  Rather  than  starting  by  improving  the  processes  and
then  designing  the  system  to  support  them,  the  vision  redesigns
the  process  and  the  technology  as  one  integrated  whole.  This 
speeds  the  design  process  and  ensures  that  the  systems  and 
process  support  each  other  in  creating  a  new  practice  that 
people  are  willing  to  adopt. 

•  Testing:  The  consolidated  models  drive  test  case  development. 

•  Documentation:  The  vision  communicates  what  the  product
or  system  is  so  that  writers  can  start  the  introduction  to  the
user’s  manual.

Because  the  vision  is  the  center  of  the  design,  revealing  di¬ 
rection  for  a  system  or  product  based  on  real  user  data,  some 
companies  may  wish  to  transition  to  their  existing  processes  af¬ 
ter  this  step.  Lightning  Fast  CD  gets  customer  data  into  the 
product  or  system-concepting  process  and  relies  on  the  orga¬ 
nization’s  existing  process  to  define  and  test  the  actual  system. 

Lightning  Fast  CD  is  also  a  good  starting  point  for  an  organi¬ 
zation — in  only  a  few  weeks,  you  can  gather  and  organize  data 
to  reveal  problems,  issues,  root  causes,  and  opportunities.  The 
vision  step  helps  teams  organize  their  responses  to  this  data  to 
create  a  direction  for  their  company  or  project.  This  direction 
can  be  tactical  or  identify  new  product  direction  depending  on 
the  project  scope.  As  such,  Lightning  Fast  CD  infuses  user  data
into  a  design  process  simply  and  easily  without  challenging
other  existing  practices  that  might  be  better  defined,  already
successful,  or  simply  rooted  in  history  and  so  hard  to  change. 

Lightning  Fast  CD  also  supports  some  of  the  new  trends  in 
design.  For  Agile  techniques,  the  vision  implies  core  function 
and  implicit  prioritization,  which  can  be  broken  into  stories  for 
an  Agile  planning  session.  If  a  company  simply  wants  to  develop 
personas,  the  consolidated  data  provides  the  information 
needed  to  flesh  out  the  characters  your  organization  needs  to 
support.  If  you  add  visioning  to  this,  you  have  stories  of  how 
these  characters  might  operate  with  additional  technology  and 
service  support.  If  you  turn  the  vision  into  high-level  future  sce¬ 
narios,  you  have  a  powerful  communication  package  to  guide 
your  developers  and  stakeholders. 

Storyboarding:  Working  Out  the  Details

Focused  Rapid  CD,  which  collects  and  consolidates  sequence
models,  works  out  the  details  of  the  vision  with  storyboarding 
techniques.  Too  often  when  people  design,  they  break  the  ex¬ 
isting  practice  because  they  jump  from  their  big  idea  to  low-level 
user  interface  and  implementation  design.  As  soon  as  designers 
start  focusing  on  technology,  technology  and  its  problems  be¬ 
come  their  central  design  concerns.  How  technology  supports 
work  or  life  practice  is  subordinated.  This  also  happens  when 
people  design  from  idealized  models  of  how  processes  work. 

The  steps  and  strategies  of  a  practice,  being  tacit,  are  easy  to 
overlook.  If  we  are  out  in  the  field  watching  them,  we  can  see 
them.  This  is  what  is  captured  in  the  sequence  models.  Story¬ 
boarding  keeps  the  team  honest  and  the  design  clean.  Guided 
by  the  affinity  diagram  and  consolidated  sequences,  the  vision  is
made  real  in  storyboards.  Storyboards  ensure  that  the  team  does 
not  overlook  any  intents  and  steps  that  are  critical  to  the  practice. 
Even  if  the  practice  is  changed,  we  have  to  think  through  the  details
of  how  it  will  be  changed  to  ensure  that  adoption  is  easy.

Storyboards  are  like  freeze-frame  movies  of  the  new  practice
(see  Fig.  49.4).  Like  storyboarding  in  film,  the  team  draws
step-by-step  pictures  of  how  people  will  perform  activities  in 
their  new  worlds.  Storyboards  include  manual  steps,  rough  user 
interface  components,  system  activity  and  automation,  and 
even  documentation  use. 

Because  they  focus  on  practice  redesign,  storyboards  pre¬ 
vent  the  design  team  from  prematurely  delving  into  too  much 
detail.  They  are  guided  by  customer  data,  and  after  each  task 
has  been  thought  through  and  sketched,  the  team  reviews  it  to 
ensure  that  it  remains  true  to  the  customer  data.  This  does  not 
mean  that  no  invention  happens,  but  that  the  team  must  ac¬ 
count  for  the  steps  and  other  data  elements  of  the  affinity.  They 
must  look  at  them  and  make  a  conscious  decision  about  how 
to  handle  them.  They  might  change  all  the  steps  and  even  elim¬ 
inate  whole  sequences;  as  long  as  people  can  still  achieve  their 
fundamental  intents,  the  change  will  work.  When  teams  forget 
or  ignore  the  user’s  intent,  the  design  is  in  trouble. 

How  is  storyboarding  different  from  building  sce¬ 
narios?  The  storyboard  is  fundamentally  a  future  scenario 
guided  by  the  vision  and  reined  in  by  the  data.  Every  system  im¬ 
plies  multiple  storyboards  of  use.  Just  as  a  kitchen  supports 
multiple  scenarios  of  cooking  (the  quick  hot  dog,  the  Thanks¬ 
giving  dinner,  the  breakfast  for  the  toddler),  a  system  supports 
multiple  scenarios  of  use.  A  storyboard  is  a  high-level  use  case. 

Any  vision  implies  multiple  scenarios  to  fulfill  the  vision.  No 
system  should  be  built  around  one  storyboard  or  one  future 
scenario.  Gathering  a  set  of  people  together  in  a  room  to  invent
the  future  with  future  scenarios  returns  them  to  design
from  the  I.  Where  do  the  cases  come  from  that  the  system  must
handle?  If  not  from  customer  data,  they  must  be  generated  by 
the  people  in  the  room.  If  they  come  from  idealized  process 
flows,  they  are  probably  unrealistically  linear,  ignoring  the  mul¬ 
tiple  strategies  and  complexities  of  real  work  processes.  Good 
systems  are  not  linear;  they  do  not  force  or  follow  a  single  linear 
path.  Driving  scenarios  from  consolidated  data  ensures  that  dif¬ 
ferent  strategies  for  doing  a  task  are  considered  and  that  the  de¬ 
sign  supports  the  different  routes  that  real  people  use  to 
achieve  their  intent  and  the  different  roles  that  people  play 
when  completing  activities. 

By  working  with  multiple  teams,  we  have  learned  that  the 
best  system  design  comes  from  pictorial  representations  of  the 
future  practice  to  be  designed.  Text  versions  of  a  scenario  or 
simple  user  interface  drawings  alone  fail  to  consider  all  the  di¬ 
mensions  of  the  new  practice.  Use  case  and  object  modelers 
are  trained  to  use  the  text-intensive  use  case  characterization  of 
the  redesigned  work.  We  find  that  this  ultimately  focuses  the 
team  too  much  on  system  activities  and  business  rules  and  not 
enough  on  necessary  human  processes  and  user  interface  func¬ 
tion.  User  interface  designers  who  are  concerned  primarily 
with  the  interaction  design  tend  to  overlook  the  system  steps 
and  overfocus  on  UI  detail.  They  get  stuck  prematurely  in  de¬ 
tailed  discussions  of  layout  that  will  become  irrelevant  when 
the  implications  of  multiple  scenarios  of  use  are  worked 
through. 

FIGURE  49.4.  Storyboards  are  like  freeze-frame  movies.

Storyboarding  consolidated  sequences  ensures  that  you
maintain  the  breadth  of  real  activity  without  having  to  make  up
cases.  If  the  team  is  missing  sequence  data  that  supports  the  vision,
they  should  collect  it.  Now,  knowing  what  data  is  needed,
data  collection  can  be  very  focused.  Get  sequence  data  from  two 
additional  interviews  and  consolidate  just  those  two  sequences. 
If  you  can  get  only  one  sequence,  analyze  it  as  you  would  for  se¬ 
quence  consolidation.  Having  a  guide  to  the  design  is  always  bet¬ 
ter  than  making  it  up.  Repeatedly,  we  have  seen  that  teams  trying 
to  design  future  practice  without  data  get  lost  and  confused  and 
are  prone  to  overdesign.  Data  keeps  teams  focused  and  centered 
so  invention  is  reasonable  and  likely  to  succeed. 

The  User  Environment  Design  Option.  For  Rapid  CD 
we  recommend  that  you  skip  creating  a  formal  User  Environ¬ 
ment  Design  (UED)  and  go  straight  to  the  user  interface  struc¬ 
ture.  For  complex  projects  and  systems,  for  doing  a  competi¬ 
tive  analysis,  or  for  analyzing  a  planned  design  to  find  potential 
problems,  you  should  consider  using  the  UED  formalism. 

A  good  product,  system,  or  Web  page  must  have  the  appro¬ 
priate  function  and  structure  to  support  a  natural  workflow 
within  it.  System  design  really  has  three  layers.  The  user  inter¬ 
face  accesses  the  function,  structure,  and  flow  necessary  to  sup¬ 
port  the  user’s  redesigned  work.  The  implementation  makes 
that  function,  structure,  and  flow  happen.  The  core  of  a  product 
is  that  middle  layer:  the  explicit  work  the  system  is  performing. 
Just  as  architects  draw  floor  plans  to  see  the  structure  and  flow 
of  a  house,  designers  need  to  see  the  floor  plan  of  their  new  sys¬ 
tems.  Hidden  within  the  storyboards  are  the  implications  for  the 
system  floor  plan — the  UED. 

The  UED  formalism  represents  a  set  of  focus  areas  or  places 
in  the  system  that  provides  support  for  coherent  activities.  A 
place  might  be  a  window,  Web  page,  dialogue  box,  or  pane.  The 
UED  shows  each  part  of  the  system — how  it  supports  the  cus¬ 
tomer’s  work,  exactly  what  function  is  available  in  that  part,  and 
how  the  customer  gets  to  and  from  other  parts  of  the  system — 
without  tying  this  structure  to  any  particular  user  interface  or 
implementation  design.  The  function  in  the  UED  drives  func¬ 
tional  specification  and  implementation  level  use  cases.  The 
function  in  each  focus  area  becomes  the  specification  for  the 
part  of  the  user  interface  that  will  support  that  function. 
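
To make the formalism concrete, here is a minimal sketch of a UED held as data. It assumes only what the paragraph above states: each focus area has a purpose, a set of functions, and links to other focus areas. The class `FocusArea`, the helper `unreachable`, and the example area names are hypothetical illustrations, not part of the UED notation.

```python
from dataclasses import dataclass, field


@dataclass
class FocusArea:
    name: str
    purpose: str                                    # the coherent activity this place supports
    functions: list = field(default_factory=list)   # function available in this place
    links: list = field(default_factory=list)       # names of focus areas reachable from here


def unreachable(areas, start):
    """Return the names of focus areas the user cannot get to from `start`."""
    by_name = {area.name: area for area in areas}
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in by_name:
            continue
        seen.add(name)
        stack.extend(by_name[name].links)
    return sorted(set(by_name) - seen)


# Hypothetical three-place system used only for illustration.
ued = [
    FocusArea("Inbox", "triage incoming requests", ["sort", "open request"], ["Compose"]),
    FocusArea("Compose", "write a response", ["edit", "send"], ["Inbox"]),
    FocusArea("Archive", "look up past requests", ["search"], ["Inbox"]),
]
```

Calling `unreachable(ued, "Inbox")` on this example flags `Archive`, a place that offers function but that no other place links to. Walking the floor plan in this way is one possible check on structure and flow before any user interface is drawn.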

In  creating  a  UED,  the  team  walks  the  storyboards  and  de¬ 
rives  the  implications  of  what  the  system  needs  to  provide.  As 
the  implications  of  storyboard  after  storyboard  are  rolled  into 
the  UED,  the  team  starts  to  see  the  best  way  to  structure  the  sys¬ 
tem.  This  system  structure  now  represents  the  system  that,  if 
built,  will  actualize  the  vision  as  it  has  been  worked  out  in  the 
storyboards.  Whether  this  system  will  be  valued  by  the  users  has
yet  to  be  tested.  Any  design  team  can  anticipate  needed  func¬ 
tion  only  up  to  a  point.  Therefore,  after  working  out  the  pre¬ 
liminary  UED,  the  team  mocks  up  each  focus  area  in  paper  and 
tests  it  in  mock-up  interviews.  Through  iteration  with  the  users, 
the  UED  stabilizes,  the  user  interface  paradigm  stabilizes,  and 
the  lower  level  requirements  are  solidified. 


User  Interface  Design  and  Mockup 

After  visioning  or  developing  a  list  of  recommendations,  the 
team  will  work  out  their  first  cut  at  how  to  present  their  redesign 
and  new  function  to  the  users.  In  Lightning  Fast+,  the  team  uses
their  existing  methods  to  come  up  with  a  first  cut  at  a  user
interface.  In  Focused  Rapid  CD,  the  team  works  out  the  new 
practice  and  system  in  storyboards.  Now  the  team  needs  to  ab¬ 
stract  out  the  elements  of  the  design  that  will  define  the  system 
and  the  user  interface  (UI)  from  the  storyboards.  Whichever
process  you  choose,  the  team  will  have  to  define  how  to  lay  out
the  needed  function  and  information  into  a  set  of  rough  screens 
representing  the  core  places  in  the  system.  These  places  in  the 
user  interface  may  augment  an  existing  interface,  add  new  places 
and  features  implied  by  the  vision,  or  represent  the  changes  that 
were  worked  out  in  more  detail  within  the  storyboards. 

However,  this  function  and  initial  layout  are  as  yet  untested  with
users.  To  ensure  that  you  have  the  right  solution,  testing  and  iteration
are  essential.  Lightning  Fast  CD  gives  you  the  data  you
need  to  generate  a  concept  but  Lightning  Fast+  and  Focused 
CD  ensure  that  you  make  the  actual  design  and  overall  user  ex¬ 
perience  work  for  the  people  who  will  use  the  system. 

To  test  your  design  concepts  and  clarify  your  functions,  we  rec¬ 
ommend  constructing  a  paper  prototype  and  testing  it  with  your 
user  population  in  paper  prototype  field  interviews.  Users  do  not 
understand  models  or  even  storyboards  when  presented  with 
them.  However,  users  can  talk  user  interface  talk  or  form  factor 
talk  for  physical  products.  So  to  test  ideas,  we  move  quickly 
through  storyboarding  and  laying  out  the  rough  user  interface  to 
get  back  to  the  users  with  our  ideas  represented  as  user  interfaces. 

Build  the  paper  prototype  using  normal  stationery  supplies 
(Fig.  49.5).  Card  stock  provides  a  stable  background  to  simu¬ 
late  the  screen.  Post-its  effectively  simulate  anything  that  might 
be  moved  during  an  interview,  such  as  pull-down  menus  or  but¬ 
tons.  Sample  content  should  be  put  on  a  removable  sheet  so 
that  users  can  replace  it  with  their  own  real  content  during  the 
interview.  Web  content  should  be  laid  out  for  example  products
or  content  types  to  provide  a  context  for  discussing  the  struc¬ 
ture  of  the  content  and  its  layout.  For  hardware  designs,  use 
other  kinds  of  props  to  simulate  the  hardware.  The  final  prototype
is  rough  and  hand  drawn,  but  it  represents  both  the  system’s
structure  and  its  behavior.

Mock-up  interviews  help  designers  understand  why  design
elements  work  or  fail  and  identify  new  function.  These  inter¬ 
views  are  based  on  the  principles  of  Contextual  Inquiry  given 
earlier.  We  test  the  paper  prototype  with  users  in  their  contexts 
to  keep  them  grounded  in  their  real  practices.  Users  interact 
with  the  prototype  by  writing  in  their  contents  and  manipulat¬ 
ing  and  modifying  the  prototype.  The  partnership  is  one  of 
codesign.  As  the  user  works  with  the  prototype  following  a  task 
they  need  to  do  or  did  in  the  recent  past,  the  user  and  designer 
uncover  problems  and  adjust  the  prototype  to  fix  them.  To¬ 
gether  the  user  and  interviewer  interpret  what  is  going  on  in  the 
usage  and  come  up  with  alternative  designs. 

After  the  design  has  been  tested  with  three  or  four  users,  we 
redesign  to  reflect  the  feedback.  Multiple  rounds  of  interviews 
and  iterations  allow  testing  in  increasing  levels  of  detail,  first 
addressing  structural  issues,  then  user  interface  theme  and  lay¬ 
out  issues,  and  finally  detailed  user  interaction  issues. 

If  your  time  is  short,  you  might  want  to  take  out  your  com¬ 
ponents  for  testing  in  stages.  Start  testing  the  central  or  critical 
components  and  grow  the  mock  up  over  your  rounds.  Just  be 
sure  to  test  each  component  at  least  twice.  You  may  need  to  add 
additional  rounds  if  you  test  component  by  component.  Be 
careful  that  you  do  test  the  system  altogether  or  you  might  get 
good  components  that  do  not  play  well  together. 

Why  build  initial  prototypes  on  paper:  paper
versus  online  testing.  Designers  like  tools.  We  are  often
asked  why  we  use  paper  mock  ups  instead  of  online  prototypes.
We  create  mock  ups  in  paper  because  they  are  fast  to
assemble  and  they  allow  the  user  and  interviewer  to  make 
changes  to  the  interface  in  the  moment.  No  tool  can  assemble 
an  interface  in  one  or  two  days.  More  importantly,  hand  drawn 
paper  prototypes  make  it  clear  to  the  user  that  icons,  layout, 
and  other  interface  details  are  not  central  to  the  purpose  of  the  interviews;
this  keeps  the  user  focused  on  testing  structure  and
function.  The  rougher  the  mock  up,  the  easier  it  is  to  test  struc¬ 
ture  and  function. 

As  the  interface  is  tested,  it  starts  to  stabilize.  At  round  three 
of  the  mock-up  interviews,  we  begin  to  solidify  the  interaction 
design  and  build  our  mock  ups  in  wire  frames.  However,  we  still 
print  and  cut  the  wire  frames  up  to  allow  for  real  codesign  with 
the  user  and  quick  changes  in  the  moment. 

Toward  the  end  of  the  prototype  rounds,  we  use  running 
prototypes  to  test  low-level  usability  function  that  simply  cannot 
be  tested  on  paper;  for  example,  how  well  the  user  can  manip¬ 
ulate  a  handle  for  an  object  in  a  drawing  tool.  We  also  use  high 
fidelity  completed  Web  page  designs  displayed  online  to  test  al¬ 
ternative  visual  designs.  This  helps  us  determine  optimal  use 
of  font,  color,  and  spacing  to  promote  scanning,  to  test  various 
layouts  for  relationship  and  prominence  of  elements  and  flow 
from  one  area  to  another,  to  test  the  feel  of  navigating  from 
place  to  place,  to  test  the  physical  experience  of  using  a  device 
such  as  a  cell  phone,  and  to  test  those  interactions  that  cannot 
be  tested  on  paper. 

However,  these  design  reviews,  as  we  call  them,  are  not  test¬ 
ing  function;  their  intent  is  to  test  how  well  the  visual  design  and 
layout  support  the  design.  Moreover,  any  of  this  testing  comes 
after  the  mock-up  testing  has  already  solidified  the  system  structure
and  function.  To  get  the  system  right  (the  right  function,
structure,  and  flow),  start  with  paper  and  keep  the  user  involved
in  the  testing.  Then,  once  you  know  you  have  the  right  overall 
direction,  you  can  move  to  wire  frames  and  online  prototypes. 

How  are  mock-up  interviews  different  from  demos, 
rapid  prototyping,  or  usability  tests?  The  primary  issue 
in  answering  this  question  is  where  the  design  being  tested
came  from.  Any  time  you  iterate  an  existing  design,  you  can  expect
no  more  than  a  15  to  20%  change  to  the  structure  that  is
being  presented.  People  will  have  an  opinion  of  what  you  are 
displaying  before  them;  it  is  much  harder  to  find  out  that  what 
you  are  showing  them  is  not  fundamentally  valued.  The  ques¬ 
tion  is  what  kind  of  data  you  can  get  in  each  context: 

•  Demos  at  a  user  conference  or  in  a  focus  group  assume  that 
people  are  aware  of  their  practices,  that  out  of  the  context 
of  real  use  they  can  predict  and  project  what  they  would  need 
or  want,  and  that  their  knowledge  of  their  activities  is  not 
tacit  but  known  such  that  they  can  bring  it  to  bear  in  a  dis¬ 
cussion.  We  do  not  find  this  to  be  true.  Demos  and  focus 
group  discussions  are  better  tests  of  sales  point  than  func¬ 
tion.  They  measure  excitement  with  the  product  and  clarity 
of  the  sales  story. 

•  Rapid  prototyping  allows  users’  tacit  knowledge  to  shape  and 
tune  a  system  that  is  being  presented.  Since  the  context  of 
the  discussion  is  to  try  to  tune  and  shape  what  is  already  de¬ 
signed,  the  fundamental  structure  of  the  system  and  its  value 
is  rarely  challenged.  Some  designers  or  technologists  decided 
that  the  presented  function  was  needed  and  should  be  struc¬ 
tured  in  a  particular  way.  CD  does  rapid  prototyping  but  only 
after  systematically  using  customer  data  to  produce  the  ba¬ 
sic  design  to  be  tested.  In  this  way,  customer  data  shapes  the 
whole  of  the  design,  not  just  the  final  testing. 

•  Usability  testing  can  be  another  kind  of  rapid  prototyping  and 
as  such  suffers  from  the  same  challenges  if  the  tested  system 
is  not  originally  designed  from  customer  data.  This  is  why 
usability  professionals  are  at  the  forefront  of  pushing  processes
like  CD  into  their  organizations.  They  know  that  usability
has  to  be  built  into  the  product  during  the  definition
phase,  not  left  to  the  end  of  the  process.  At  some  point,  paper
mock-up  interviews  are  no  longer  good  enough  to  test
fine  interaction  with  a  tool.  Lab  testing  is  good  for  checking 
user  interface  manipulation  on  a  running  prototype.  Even  for 
these  we  recommend  using  Contextual  Inquiry  techniques  in 
the  lab  if  the  product  is  unstable  or  taking  it  out  on  laptops 
for  testing  with  users’  real  tasks. 


INTEGRATING  CONTEXTUAL  TECHNIQUES 
INTO  STANDARD  PROCESSES 


Some  companies  have  software  methodologies  within  which  any 
user-centered  design  process  must  fit.  In  general,  CD,  specifically 
Rapid  CD,  can  fit  into  any  software  or  system  development 
methodology.  Most  methodologies  define  a  series  of  stages, 
each  with  deliverables  and  milestones.  Few  define  specific  ways
of  gathering  requirements,  so  CD  can  easily  be  fit  within  the
requirements  step  of  these  methodologies.  Methodologies  dif¬ 
fer  in  the  deliverables  they  require  for  their  milestones.  How¬ 
ever,  the  natural  data  and  design  artifacts  of  CD  easily  can  be 
used  as  inputs  into  any  methodology’s  format. 

In  this  section,  we  discuss  how  Rapid  CD  supports  different 
methodologies. 

General  Corporate  Methodologies 

Corporate  methodologies  follow  a  set  of  typical  steps  outlined 
below.  Focused  Rapid  CD  can  support  any  general  corporate 
methodology. 

•  Business  case.  Define  the  marketing  or  work  improvement 
rationale  for  building  a  new  or  revised  product  or  system.  Use 
any  Rapid  CD  technique  up  to  the  vision  step  to  character¬ 
ize  the  market  and  generate  product  concepts  that  can  then 
be  investigated  and  validated  with  focus  groups,  surveys,  and 
other  market  intelligence. 

•  Requirements  gathering.  Gather  user  needs  and  overall 
business  needs  to  guide  building  the  product  or  system.  Rapid 
CD  techniques  will  get  the  detailed  data  on  issues  and  tasks 
needed  to  develop  a  system  and  generate  system  require¬ 
ments  at  the  level  of  the  UI  and  the  function.  Some  companies 
would  be  satisfied  with  requirements  extracted  from  untested 
storyboards.  We  recommend  that  you  finalize  your  require¬ 
ments  after  you  define  and  test  your  user  interface. 

•  Design.  Detailed  design  of  the  product  or  system,  both  the 
user  interface  and  underlying  technology,  making  sure  it  can 
be  implemented.  In  this  step,  you  create  the  implementation 
structure  that  will  support  the  function  and  UI  defined 
through  your  Lightning  Fast+  or  Focused  Rapid  CD  processes.
The  key  to  this  step  is  using  paper  prototypes  to  finalize
the  lower  level  requirements  and  interaction  design.
There  is  usually  some  negotiation  with  development  at  this 
point  to  fit  implementation  constraints.  Any  changes  at  the 
user-visible  level  should  be  checked  and,  where  necessary, 
tested  with  users. 

•  Implementation.  Code  and  test  the  user  interface  and  the 
overall  system.  Here  you  test  the  running  system  first  with  run¬ 
ning  prototypes  and  then  during  Alpha  and  Beta  releases  using 
Contextual  Inquiry  to  gather  low  level  and  tactical  changes. 

If  you  must  comply  with  a  corporate  methodology  such  as 
that  outlined  above,  choose  Focused  Rapid  CD  and  map  the 
steps  to  your  deliverables.  Use  the  data  and  artifacts  of  the  CD 
process  to  inform  the  documents  and  specifications  you  need 
to  make  for  your  corporate  processes.  If  you  are  tasked  just  with 
developing  a  business  case,  then  Lightning  Fast  Rapid  CD  can 
get  you  started. 

Rational  Unified  Process 

The  traditional  Rational  Unified  Process  (RUP)  approach  (see 
Cockburn,  2001)  is  very  similar  to  (indeed,  is  an  outgrowth  of) 
the  preceding  corporate  process.  RUP  phases  include  business
modeling,  requirements  analysis,  design,  implementation,  test,
and  deployment.  Use  the  Focused  Rapid  CD  process  to  sup¬ 
port  RUP  as  follows: 

•  Business  modeling.  Use  the  first  half  of  Focused  Rapid  CD 
up  to  visioning.  The  goal  is  to  develop  an  understanding  of 
how  the  business  operates  now.  This  may  be  captured  in  busi¬ 
ness  process  diagrams  or  as-is  use  cases.  Rapid  CD  can  be  a 
critical  tool  in  discovering  the  root  causes  of  problems  and  the 
informal  ad  hoc  processes  that  make  the  business  process 
work.  Consolidated  sequences  are  as-is  use  cases. 

•  Requirements.  Use  the  second  half  of  Focused  Rapid  CD  to 
collect  any  additional  task  data  needed  to  redesign  the  tasks 
to  be  supported  by  the  new  system.  Use  cases  are  the  key 
representation  of  user  behavior  in  the  new  system.  Story¬ 
boards  are  the  to-be  models  and  feed  use  case  development. 

•  Analysis  and  design.  Use  storyboarding  from  a  Focused 
Rapid  CD  process  to  feed  the  standard  RUP  process  to  de¬ 
velop  object  models  directly  from  use  cases.  Consider  devel¬ 
oping  a  UED.  It  can  be  an  important  way  to  capture  the  pro¬ 
posed  system  and  its  relationships. 


Agile  or  Extreme  Programming 

Extreme  programming  (XP;  see  Beck,  2000)  is  one  of  several 
Agile  programming  methods,  which  focus  on  developing
your  system  in  a  series  of  simple  iterations,  each  of  which  de¬ 
livers  value  to  the  customer.  Rather  than  spending  a  lot  of  time 
in  up-front  planning  and  modeling,  the  XP  process  works 
closely  with  customer  stakeholders  to  define  each  iteration  and 
then  builds  it,  ships  it  to  the  customer  for  feedback,  iterates  it, 
and  moves  on  to  the  next  piece.  The  system  is  built  up  quickly 
through  rapid  iteration. 

A  key  role  on  an  XP  team  is  that  of  the  customer,  who  defines 
what  needs  to  be  done,  prioritizes  tasks,  and  works  closely  with 
the  developers  to  work  out  each  detail  of  the  system.  All  the 
problems  discussed  earlier  apply  to  this  role:  customers  are  not 
aware  of  their  practices,  no  one  person  can  represent  all  users, 
and  people  who  no  longer  do  the  activities  are  not  representa¬ 
tive  users.  So  use  the  Rapid  CD  process  to  give  your  customer 
role  the  right  information  to  guide  the  team  well. 

XP  organizes  development  around  rapid  iterations  using  re¬ 
lease  planning  and  iteration  planning  to  decide  what  to  build. 
User  stories  define  how  users  will  work  in  the  new  system.  We 
recommend  Rapid  CD  for  XP,  making  it  a  more  user-centered 
design  process. 

•  Release  planning.  Rapid  CD  provides  the  detailed  informa¬ 
tion  to  decide  what  should  go  into  a  release.  The  XP  philoso¬ 
phy  is  not  to  plan  too  far  ahead  because  things  will  change 
anyway;  accordingly,  use  a  Lightning  Fast  style  project  to  raise
key  issues.  Use  Focused  Rapid  CD  through  storyboarding  to
get  sequences  to  guide  the  development  of  user  stories.  Bring 
all  data  into  the  release  planning  meeting  and  work  out  with 
your  customer  what  to  implement  for  this  release. 

•  Iteration  planning.  Use  CD  field  interviews  to  track  the 
success  of  each  iteration  with  your  actual  users.  Identify  problems
for  each  iteration  as  you  go,  providing  continual  refinement
to  your  project  direction.  Use  the  iteration  planning
meetings  to  look  at  issues  being  raised  by  prior  iterations,
and  agree  with  your  customers  on  prioritizing  solutions.  Use
paper  prototyping  to  iterate  your  design  with  users  in  paper
or  wireframes  before  planning  it  into  an  iteration.

Rapid  CD  maps  reasonably  well  into  any  formal  methodology. 
Rapid  CD  artifacts  and  data  can  easily  supplement  or  feed  deliv¬ 
erables  required  for  any  of  the  standard  methodologies.  Any 
methodology  can  use  the  consolidated  customer  data  to  drive 
brainstorming  of  recommendations,  or  the  visioning  process  to 
generate  a  more  systemic  response  to  the  data.  Visioning  is  close 
enough  to  brainstorming,  a  known  technique  in  any  company, 
that  it  should  be  easy  to  adopt.

Similarly,  paper  prototyping  has  begun  to  be  widely  accepted 
as  a  means  of  testing  design  ideas  and  refining  them.  Although 
we  recommend  doing  it  in  the  field — with  real  user  cases  and 
not  the  canned  test  scenarios  of  some  usability  testing — paper 
prototyping  as  a  technique  of  getting  feedback  from  users  is  crit¬ 
ical  for  working  out  the  details  of  a  particular  design.  Sometimes 
this  is  the  best  first  step  in  getting  user  data  into  a  design  and
letting  developers  and  designers  alike  see  how  their  systems  are
being  received. 

Rapid  CD  can  also  become  the  basis  for  the  development  of 
personae  to  augment  the  design  process.  Usability  testing  is 
built  into  the  process  such  that  good  user  experience  is  a  nat¬ 
ural  outcome  of  any  Rapid  CD  process.  If  companies  just  want 
to  get  started  infusing  user  data  into  their  processes,  they  can 
use  a  Lightning  Fast  Rapid  CD.  In  only  a  few  weeks,  you  can  be¬ 
gin  to  understand  your  market  or  user  population  and  gener¬ 
ate  solution  concepts. 


CONCLUSION 


The  core  of  Rapid  CD  is  getting  customer  data  into  the  minds  of 
product  managers,  designers,  and  developers.  It  provides  the 
data  that  is  needed  to  guide  business  decisions,  prioritize  re¬ 
quirements,  identify  how  to  streamline  work,  be  clear  on  what 
will  be  of  value  to  the  user,  and  produce  high  quality  user  ex¬ 
perience.  Becoming  user  centered  is  accessible  today  to  any 
company  that  really  wants  to  be  user  centered.  Time  is  not  an 
issue — being  user  centered  is  a  business  decision. 


INTRODUCTION


In  recent  years,  academic  and  professional  researchers  and  de¬ 
signers  working  in  the  field  of  Human-Computer  Interaction 
(HCI)  have  looked  to  ethnography  to  provide  a  perspective  on 
relations  between  humans  and  the  artifacts  and  solutions  they 
design  and  use.^  Within  the  field  of  HCI  there  are  different
views  among  researchers  and  practitioners  on  just  what  con¬ 
stitutes  an  ethnographic  inquiry.  For  some,  ethnography  is  sim¬ 
ply  a  fashionable  term  for  any  form  of  qualitative  research.  For 
others  it  is  less  about  method  and  more  about  the  lens  through 
which  human  activities  are  viewed.  In  this  chapter  we  will  at¬ 
tempt  to  position  the  ethnographic  approach  within  historical 
and  contemporary  contexts,  outline  its  guiding  principles,  de¬ 
tail  the  primary  methods  and  techniques  used  in  ethnographi- 
cally  informed  design  practice,  and  provide  case  examples  of 
ethnography  in  action. 

This  chapter  provides  an  introduction  to  ethnography,  pri¬ 
marily  as  it  relates  to  studies  in  HCI.  We  will  touch  only  briefly 
on  some  of  the  more  controversial  topics  current  within  the 
field  of  ethnographic  research  that  have  enlivened  mainstream
academic  discourse  in  recent  years.  We  will  point  the  reader  to 
books  and  articles  where  these  topics  are  discussed  in  more 
detail.  Our  primary  aims  in  this  chapter  are  to  provide  aca¬ 
demics  and  professionals  in  the  field  of  HCI  with  a  working  un¬ 
derstanding  of  ethnography,  an  appreciation  for  its  value  in  de¬ 
signing  new  technologies  and  practices,  and  a  discerning  eye 
when  it  comes  to  reviewing  and  evaluating  ethnographically  in¬ 
formed  design  studies. 


THE  RELEVANCE  OF 
ETHNOGRAPHY  FOR  DESIGN 


The  turn  to  ethnography  as  a  resource  for  design  can  be  traced 
back  to  the  early  1980s  when  computer  technologies  were  mov¬ 
ing  out  of  the  research  labs  and  engineering  environments  and 
into  mainstream  office  settings,  call  centers,  manufacturing 
floors,  and  educational  institutions.  There  was  the  realization 
that  the  designers  and  developers  of  these  technologies  could 
no  longer  rely  exclusively  on  their  own  experiences  as  a  guide 
for  the  user  requirements  of  these  new  systems. Instead,  de¬ 
signers  and  developers  needed  a  way  to  gain  an  understanding 
of  the  everyday  realities  of  people  working  within  these  diverse 
settings  (Blomberg,  Giacomi,  Mosher,  &  Swenton-Wall,  1991).  In 
many  organizations,  market  research  groups  were  being  asked 
to  provide  perspectives  on  the  people  and  practices  that  made 
up  these  varied  settings.  However,  the  techniques  most  com¬ 
monly  used  by  market  research  groups  at  the  time  (e.g.,  attitude 
surveys,  focus  groups,  telephone  interviews,  etc.)  were  not  well
suited  for  developing  an  actionable  understanding  of  what  people
actually  do  day-to-day  that  could  inform  the  design  of  new
products  and  interactive  solutions. 

Anthropologists  and  other  social  scientists  had  long  recog¬ 
nized  that  what  people  say  and  what  they  do  can  vary  signifi¬ 
cantly,  making  reliance  on  surveys,  focus  groups,  and  telephone 
interviews  insufficient  for  the  task.  Designers  and  developers 
needed  a  way  to  get  a  firsthand  view  of  the  on  the  ground  real¬ 
ities — the  “here  and  now” — of  everyday  life  in  these  diverse  set¬ 
tings.  At  this  time  in  the  early  1980s,  social  scientists  working  at 
the  Xerox  Palo  Alto  Research  Center  were  beginning  to  explore 
ways  of  bringing  insights  from  ethnographic  research  into  a  pro¬ 
ductive  relationship  with  the  design  of  new  technologies  (e.g., 
Blomberg,  1987,  1988,  1995;  Suchman,  1983;  Suchman  et  al.,
1999).  Not  long  after,  other  research  labs  (e.g.,  Hewlett-Packard, 
Apple  Computer,  and  NYNEX)  followed  suit  (e.g.,  Nardi  & 
Miller,  1990;  Sachs,  1995).  Today  many  industrial  research  and 
development  labs  in  the  United  States  have  anthropologists  and 
other  social  scientists  with  ethnographic  expertise  on  staff  (e.g., 
IBM,  Intel,  Kodak,  Microsoft,  Motorola,  General  Motors,  and 
Xerox,  to  name  but  a  few).

Ethnographically  informed  design  practices  also  began  to 
take  hold  in  design  firms  and  consulting  companies  during  the 
early  1990s  (e.g.,  IDEO,  Fitch,  and  the  Doblin  group).  These 
early  explorations  culminated  in  1993  with  the  founding  of 
E-Lab,  a  research-and-design  company  that  distinguished  itself 
from  other  design  firms  at  the  time  by  creating  an  equal  part¬ 
nership  between  research  and  design  (Wasson,  2000).  Ethno¬ 
graphic  methods  were  at  the  center  of  E-Lab’s  research  ap¬ 
proach,  with  a  commitment  to  base  design  recommendations 
on  insights  from  ethnographic  research  (Robinson,  1994). 

Furthermore,  in  the  mid-1980s  the  growth  in  networked 
applications  and  devices,  made  possible  through  the  availabil¬ 
ity  of  local  area  networks  (LANs)  and  early  Internet  imple¬ 
mentations,  created  awareness  among  designers  and  devel¬ 
opers  that  they  would  need  to  focus  beyond  the  support  of 
single,  isolated  users  interacting  with  information  technologies.
What  would  be  needed  was  a  way  of  exploring  the  information
and  communication  practices  of  people  interacting
with  one  another,  both  face-to-face  and  through  mediating 
technologies.  Information  technologies  were  increasingly  be¬ 
coming  communication-and-collaboration  technologies  that 
consequently  demanded  an  examination  of  social  interaction 
across  time  and  space.  In  response,  a  group  of  computer  sci¬ 
entists,  human-factors  engineers,  and  social  scientists,  some¬ 
what  dissatisfied  with  the  dominant  perspectives  within  HCI  at 
the  time,^  founded  the  field  of  Computer  Supported  Cooperative
Work  (CSCW)  (e.g.,  Greif,  1988;  Schmidt  &  Bannon,  1992).
A  group  of  sociologists  at  Lancaster  University  and  researchers
at  the  Xerox  Research  Center  in  Cambridge,  England
played  a  prominent  role  in  helping  to  shape  the  ethnographic 
research  agenda  within  CSCW  (e.g.,  Bentley  et  al.,  1992;  Hughes,
Randall,  &  Shapiro,  1993;  Rodden  &  Anderson,  1994;  Hughes,
Rodden,  &  Anderson,  1995).


^Ethnographic  research  is  often  just  one  of  many  approaches  used  to  inform  design.  Usability  studies,  surveys,  business  case  analysis,  scenario  planning,
future  workshops,  and  social  network  analysis  are  a  few  of  the  approaches  that  are  used  in  conjunction  with  ethnography.

^The  dominant  perspectives  at  the  time  emphasized  technological  possibilities  over  the  uses  and  users  of  technology,  the  interface  requirements  of
standalone  applications  over  networked  devices,  and  human  psychology  and  cognition  over  social  interaction.  However,  by  the  late  1990s  ethnographically
informed  design  attained  a  prominent  place  in  HCI  research  and  today  there  is  considerable  overlap  between  the  fields  of  CSCW  and  HCI.

Finally,  the  explosion  of  the  Internet  in  the  late  1990s  accel¬ 
erated  the  move  of  information  technologies  out  of  the  work¬ 
place  and  into  homes,  recreational  environments  and  other 
non-work-related  settings.  This  has  redoubled  interest  in  the 
ethnographic  perspective  as  a  valuable  tool  in  the  design  of  new 
technologies  and  technology  mediated  services.  This  has  pre¬ 
sented  a  new  set  of  challenges  for  designers  as  they  were  asked 
to  design  and  build  applications  that  leveraged  powerful,  digital 
technologies  for  use  by  people  of  all  ages,  engaged  in  myriad 
nonwork  related  activities  in  diverse  contexts.  Although  the 
clamor  for  all  that  is  the  Internet  has  somewhat  subsided,  the 
legacy  of  that  period  is  that  researchers  and  designers  who 
learned  their  craft  during  the  Internet  boom  years  have  gone 
on  to  positions  in  academia  and  industry,  in  both  boutique  de¬ 
sign  firms  and  major  companies,  and  in  a  variety  of  industries 
including  advertising,  marketing,  product  development,  and 
IT  services.  In  late  2005  many  in  the  ethnographic  design  com¬ 
munity  assembled  at  an  industry  sponsored  conference,  EPIC 
(Ethnographic  Praxis  in  Industry  Conference).  The  conference 
brought  together  a  diverse  group  of  researchers  working  in  areas 
such  as  product  design,  workplace  studies,  and  business  ethnography
to  define  the  scope  of  a  collective  agenda  and  to  strengthen
professional  ties  and  research  connections  (Anderson  &  Lovejoy, 
2005).  This  conference  was  a  powerful  testament  to  the  contin¬ 
uing  value  of  focusing  on  people’s  everyday  realities  and  experi¬ 
ences — the  here  and  now — when  designing  innovative  tech¬ 
nologies,  experiences,  and  services. 


THE  ROOTS  OF  ETHNOGRAPHY 


Ethnography  has  its  historical  roots  in  anthropology,  but  today 
is  an  approach  found  in  nearly  all  of  the  traditional  and  applied
social  sciences,  and  in  interdisciplinary  fields  such  as  Human-
Computer  Interaction  and  Human  Factors  Engineering.  In  anthropology,
ethnography  developed  as  a  way  to  explore  the  everyday
realities  of  people  living  in  small-scale,  non-Western  societies
and  to  make  understandings  of  those  realities  available  to  oth¬ 
ers.  The  approach  relied  on  the  ability  of  all  humans  to  figure 
out  what’s  going  on  through  participation  in  social  life.  The 
techniques  of  ethnography  bear  a  close  resemblance  to  the  rou¬ 
tine  ways  people  make  sense  of  the  world  in  everyday  life  (e.g., 
by  observing  what  others  do,  participating  in  activities,  and  talk¬ 
ing  with  others).  The  research  techniques  and  strategies  of 
ethnography  developed  and  evolved  over  the  years  to  provide 
ways  for  the  ethnographer  to  “be  present”  for  the  mundane,  the 
exceptional,  and  the  extraordinary  events  in  people’s  lives. 

Over  the  years  within  the  field  of  anthropology  both  the  focus
on  non-Western  peoples  and  the  implicit  assumptions  made
about  non-Western  societies  (e.g.,  that  they  are  bounded,  closed,
and  somewhat  static)  have  changed.  Today,  the  ethnographic 
approach  is  not  limited  to  investigations  of  small-scale  societies, 
but  instead  is  applied  to  the  study  of  people  and  social  groups 
in  specific  settings  within  large  industrialized  societies,  such  as 
workplaces,  senior  centers,  and  schools,  and  specific  activities 
such  as  leisure  travel,  financial  investing,  teaching,  and  energy
consumption,  to  name  but  a  few.  Consequently,  new  techniques
and  perspectives  have  been  developed  and  incorporated  into 
anthropology  and  ethnographic  inquiry.  However,  a  few  basic 
principles  discussed  later  have  continued  to  inform  and  guide 
ethnographic  practice. 


PRINCIPLES  OF  ETHNOGRAPHY 


Natural  Settings 

Ethnography  is  anchored  in  the  underlying  assumption  that  to 
gain  an  understanding  of  a  world  you  know  little  about  you 
must  encounter  it  firsthand.  As  such,  ethnographic  studies  al¬ 
ways  include  gathering  information  in  the  settings  in  which  the 
activities  of  interest  normally  occur.  This  does  not  mean  that 
ethnographic  studies  never  involve  techniques  that  remove 
people  from  those  everyday  settings  or  that  introduce  artifacts 
or  activities  that  would  not  be  present  otherwise.  The  insistence 
on  studying  activities  in  their  everyday  settings  is  motivated  by 
the  recognition  that  people  have  only  a  limited  ability  to  de¬ 
scribe  what  they  do  and  how  they  do  it  without  access  to  the 
social  and  material  aspects  of  their  environments.  Furthermore,
the  ability  to  fully  articulate  what  we  do  is  limited  due  to  the 
tacit  nature  of  the  principles  that  guide  our  actions  (Polanyi, 
1966).  Finally,  some  aspects  of  people’s  experiences  can  only 
be  studied  by  observing  and  recording  the  ongoing  flow  of 
activities  as  they  occur  (e.g.,  people’s  patterned  movements 
through  settings  such  as  retail  stores  or  airports,  moment-by- 
moment  shifts  in  scheduling,  etc.). 

Holistic 

Related  to  the  emphasis  on  natural  settings  is  the  view  that  ac¬ 
tivities  must  be  understood  within  the  larger  context  in  which 
they  occur.  Historically  within  anthropology  the  notion  of  holism 
focused  attention  on  the  fact  that  societies  were  more  than  the 
sum  of  their  parts  (however  these  parts  were  specified).  The
particular  aspects  of  a  society  (e.g.,  the  court  system)  could  only 
be  understood  in  relation  to  the  other  aspects  of  the  society 
(e.g.,  kinship  system,  belief  systems).  Today,  because  ethnogra¬ 
phy  is  less  often  applied  to  the  study  of  entire  societies,  the  no¬ 
tion  of  holism  has  a  somewhat  different  emphasis.  Holism  holds 
that  studying  an  activity  in  isolation,  without  reference  to  the 
other  activities  with  which  it  is  connected  in  time  and  space, 
provides  only  a  limited  and  potentially  misleading  understand¬ 
ing  of  that  activity.  So,  for  example,  it  would  be  of  dubious  value 
to  investigate  online  search  strategies  without  understanding 
how  these  strategies  fit  into  the  larger  set  of  activities  in  which 
search  is  but  one  component  (e.g.,  in  the  context  of  online  trad¬ 
ing,  shopping,  or  report  writing). 

Descriptive 

Ethnographic  accounts  have  always  provided  a  descriptive  understanding
of  people’s  everyday  activities.  Ethnographers  are
concerned  first  and  foremost  with  understanding  events  and  activities
as  they  occur,  without  evaluating  the  efficacy  of  people’s
everyday  practices.  This  is  not  to  say  that  ethnographic  accounts
cannot  or  should  not  be  used  to  suggest  how  things  could  be
different  or  to  point  out  inequities  in  current  ways  of  doing
things.  Indeed,  as  applied  in  the  domain  of  human-computer  interaction,
ethnography  is  often  aimed  at  identifying  opportunities
for  enhancing  experiences.  However,  there  is  a  strong  conviction
that  to  suggest  changes  or  to  evaluate  a  situation,  one
first  needs  to  understand  it  as  it  is.  The  work  practice  and  technology
group  at  Xerox  PARC  developed  a  slogan  to  express
this  conviction  that  innovation  requires  an  understanding  of  the
present  (Fig.  50.1).


Innovation

Imagination  of  what  could  be
based  in  a  knowledge  of  what  is

FIGURE  50.1.  Innovation.

As  such,  ethnographic  accounts  strive  first  and  foremost  to 
provide  descriptive  and  not  prescriptive  understandings  of  peo¬ 
ple’s  everyday  lives.  In  recent  years  there  have  been  many  chal¬ 
lenges  to  the  idea  that  a  purely  descriptive  understanding  is 
possible.  Critics  point  out  that  every  account  is  shaped  by  the 
perspective  of  the  researcher,  the  goals  of  the  project,  and  the 
dynamics  of  the  relationship  between  the  investigator  and  those 
studied,  to  name  but  a  few  factors  that  shape  ethnographic  accounts.
While  it  is  hard  to  argue  with  this  position,  in  our  view
the  value  of  ethnography  for  design  is  not  diminished  by  the 
recognition  that  our  accounts  are  always  located  and  partial. 

Members’  Point  of  View 

As  already  alluded  to,  ethnographers  are  interested  in  gaining 
an  insider’s  view  of  a  situation.  They  want  to  see  the  world  from 
the  perspective  of  the  people  studied  and  describe  behaviors  in 
terms  relevant  and  meaningful  to  the  study  participants.  As  such, 
ethnographers  are  interested  in  the  ways  people  categorize  their 
world  and  in  the  specific  language  people  use  to  talk  about 
things.  This  perspective  is  sometimes  at  odds  with  the  require¬ 
ments  of  quantitative  survey  research  in  which  the  relevant  cat¬ 
egories  must  be  known  in  advance  of  the  study  and  in  which  the 
categories  and  the  language  used  cannot  vary  across  participant 
groups.  In  such  a  quantitative  categorical  approach,  the  terms 
and  categories  used  are  likely  to  be  those  of  the  research  community
and  not  those  of  the  study  participants,  which  can  undermine
the  validity  of  the  results  (see  the  section  on  ethnographic
methods  for  further  discussion  of  this  topic).


THE  POSTMODERN  INFLECTION 


The  scientific  paradigm  within  which  ethnography  evolved  has 
come  under  serious  questioning  over  the  last  quarter-century  as 
social  studies  of  science  have  shown  how  scientific  knowledge 
production  is  shaped  by  the  larger  social  context  in  which  scientific
inquiries  take  place  (Latour,  1987;  Latour  &  Woolgar,
1986;  Pickering,  1980).  As  part  of  this  critical  discourse,  ethnographic
accounts  have  been  challenged  for  their  veracity.  Likewise,
the  authority  of  the  ethnographic  voice  has  been  questioned
(Clifford,  1988;  Clifford  &  Marcus,  1986;  Marcus  &
Fischer,  1986).  These  challenges  have  come  from  a  number  of 
fronts,  most  significantly  from  study  participants  who  increas¬ 
ingly  are  able  to  read  ethnographic  accounts  (Said,  1978)  and 
from  feminists  who  saw  in  many  ethnographic  accounts  a  Western,
male  bias  (Harding,  1986;  Smith,  1987;  Wolf,  1992;  Yanagisako
&  Delaney,  1995).  These  challenges  have  made  researchers
from  all  fields  of  inquiry  more  aware  of  how  their  research  is 
shaped  by  the  particular  time  and  place  in  which  it  occurs.  It  is 
our  view  that  knowledge  of  the  world  is  always  mediated  by  pre¬ 
suppositions,  be  they  cultural,  theoretical,  or  practical,  and  as 
such  no  ethnographic  account  is  value-free.  But  we  also  con¬ 
tend  that  this  does  not  diminish  the  value  and  efficacy  of  an 
ethnographic  approach  as  a  resource  for  designing  new  tech¬ 
nologies,  experiences,  and  services.  Maintaining  the  illusion  of  a 
theoretically  neutral  and  value-free  absolute  “truth”  is  not  nec¬ 
essary  to  establish  the  efficacy  of  ethnographic  research  in  de¬ 
sign.  By  striving  to  describe  and  understand  how  people  oper¬ 
ate  in  and  construe  their  everyday  “realities,”  ethnography  can 
provide  useful  frameworks  and  roadmaps  to  guide  the  design  of 
“people-centered”  solutions. 


ETHICAL  ISSUES 


As  will  be  discussed  in  more  detail  later,  ethnographic  research 
requires  developing  the  trust  and  participation  of  the  people 
studied.  Without  this  trust,  participants  will  be  reluctant  to  allow
researchers  into  their  homes,  boardrooms,  and  classrooms,
and  they  will  not  openly  share  their  everyday  experiences  and 
concerns.  Anthropologists  have  long  realized  that  such  a  privi¬ 
leged,  trusted  position  requires  reciprocity — if  you  allow  me 
access  to  your  world,  I  will  protect  your  interests.  This  bargain 
has  not  always  been  easy  for  ethnographers  to  keep.  Over  the 
years  there  have  been  examples  of  ethnographic  research, 
where,  wittingly  or  not,  the  situation  of  the  people  studied  has 
been  compromised.^ 

^To  mitigate  such  negative  impacts  the  American  Anthropological  Association  has  developed  a  code  of  ethics  that  provides  guidance  for  people  engaged
in  ethnographic  research.  This  code  outlines  the  appropriate  disclosures  and  protections  that  should  be  given  to  study  participants.
Fluehr-Lobban  (1991)  provides  a  discussion  of  ethical  issues  in  anthropological  research.

In  the  context  in  which  ethnographic  research  is  being  used
to  inform  the  design  of  new  technologies,  technologies  that
will  change  people’s  lives,  it  is  critical  that  the  ethnographer
reflect  on  the  impact  this  research  could  have  on  study  participants.
Of  course,  it  is  not  possible  to  control  all  the  ways  findings
from  ethnographic  research  will  be  used,  nor  how  technologies 
informed  by  these  studies  will  be  integrated  into  people’s  lives. 
But  the  ethnographer  can  work  to  protect  study  participants 
from  immediate  harm  (e.g.,  that  caused  by  divulging  a  worker’s 
identity  to  management)  and  can  inform  study  participants  of 
possible  longer-term  negative  impacts  (e.g.,  job  losses  brought 
about  by  introduction  of  new  technologies).  As  ethnographic  re¬ 
search  has  moved  into  new  contexts  (e.g.,  HCI,  organizational 
development),  it  has  been  necessary  to  think  creatively  about 
how  our  ethical  guidelines  map  to  these  new  conditions.  How¬ 
ever,  we  cannot  lose  sight  of  the  importance  of  protecting  the 
interests  of  those  who  have  agreed  to  participate  in  our  studies,
be  they  workers  in  organizations,  traders  on  Wall  Street,  or
mothers  of  special  needs  children. 


ETHNOGRAPHIC  METHODS 


The  ethnographic  method  is  not  simply  a  toolbox  of  techniques, 
but  a  way  of  looking  at  a  problem,  a  “theoretically  informed  prac¬ 
tice”  (Comaroff  &  Comaroff,  1992,  quoted  in  Agar,  1996:7).  The 
methods  and  techniques  outlined  later  in  this  chapter  have  been 
developed  over  the  years  to  enable  the  development  of  a  descriptive
and  holistic  view  of  activities  as  they  occur  in  their  everyday
setting  from  the  point  of  view  of  study  participants.  We  are
not  attempting  to  be  exhaustive  in  our  presentation,  nor  do  we 
want  to  suggest  that  there  is  a  fixed  set  of  canonical  ethnographic 
methods  and  techniques.  We  encourage  researchers  to  continue 
developing  new  techniques  as  the  circumstances  require  (e.g., 
studying  “virtual”  communities,  globally  distributed  work  groups, 
technologically  mediated  interactions).  What  we  believe  remains
constant  in  the  ethnographic  approach  is  a  commitment  to  de¬ 
scribe  the  everyday  experiences  of  people  as  they  occur. 

Research  Planning 

One  of  the  keys  to  a  successful  research  project  is  the  creation 
of  a  plan  of  action  to  guide  the  research  and  support  changes 
and  adjustments  that  inevitably  must  be  made  as  the  project 
proceeds.  Research  planning  can  be  divided  into  three  general 
stages:  formulating  research  objectives,  devising  a  strategy  for 
selecting  study  participants,  and  selecting  appropriate  research 
techniques  and  approaches. 

Research  objectives  follow  from  the  specific  questions  to  be 
addressed  by  the  research.  It  can  be  useful  to  develop  an  explicit
statement  that  clearly  articulates  the  objectives  of  a  given  study.
This  statement  acts  as  a  beacon  to  help  keep  the  research  on 
track  through  the  many  twists  and  turns  of  a  project.  For  exam¬ 
ple,  if  the  research  aims  to  inform  the  development  of  a  soft¬ 
ware  application  that  will  help  doctors  manage  patients’  records, 
the  research  statement  could  be  something  as  simple  as  “un¬ 
derstand  how  doctors  manage  patient  records  through  all  the 
phases  of  treatment  and  in  the  varied  settings  in  which  they 
practice  medicine.”  Over  the  course  of  a  project,  the  research 
objectives  statement  (along  with  the  research  design  and  plan) 
may  change  as  a  project  team  coalesces  and  learns  about  the  ex¬ 
periences  of  the  people  in  the  particular  domain  of  interest. 

Study  Participants 

Once  the  research  objectives  have  been  identified,  a  strategy  for 
selecting  study  participants  (sometimes  referred  to  as  a  “sam¬ 
pling  strategy”)  is  devised  that  answers  two  primary  questions: 
what  types  of  participants  best  suit  the  research  objectives,  and 
how  many  participants  should  be  included  in  the  study  to 
achieve  the  research  objectives?  The  strategy  for  selecting  study 
participants  is  influenced  by  the  research  focus  (e.g.,  shopping 
behavior  vs.  work-group  collaboration)  and  may  include  select¬ 
ing  at  different  levels  of  abstraction  (e.g.,  which  organizations, 
which  workgroups,  and  which  individual  employees).  In  addition,
as  Cohen  (2005)  has  cautioned,  in  making  these  choices  we  should
be  attentive  to  those  whom  we  intentionally  or  inadvertently
exclude  from  our  studies  and  who  are  thereby  rendered
invisible  to  our  research  lens.

Several  types  of  sampling  strategies  are  employed  by  social 
science  researchers,  which  fall  under  two  main  categories: 
probability  and  nonprobability  (Bernard,  1995).⁴  Our  focus  in
this  chapter  is  on  nonprobability  sampling,  as  that  is  the  most
commonly  employed  in  ethnographic  research.⁵  The  nature
of  ethnographic  work,  as  well  as  recruiting  constraints,  often
demands  selecting  participants  based  on  criteria  other  than
strict  probability.

Four  types  of  sampling  fall  under  the  rubric  of  nonprobabil¬ 
ity:  quota,  purposive,  convenience,  and  snowball  (Bernard, 
1995).  When  sampling  by  quota,  the  researcher  specifies  which
groups  are  of  interest  (e.g.,  women,  teenagers,  truck  drivers, 
people  who  use  software  X,  organizations  with  fewer  than  100 
employees,  etc.)  and  how  many  will  be  needed  in  each  group. 
The  number  of  groups  chosen  will  depend  on  the  research  ob¬ 
jectives  and  the  amount  of  time  available,  but  the  basic  idea  is  to 
cover  the  range  of  possible  variation  one  would  expect  across 
the  target  population.  Practically  speaking,  when  identifying  the
variables  or  factors  that  should  be  considered  in  sampling  to  enable
visibility  into  possible  variations  in  experiences  and  practices,
the  ethnographer  will  often  presumptively  identify  “differences
that  may  make  a  difference”  in  the  experiential  domain  of  inquiry.
For  example,  if  the  focus  is  on  how  people  think  about
and  manage  their  personal  finances,  the  researcher  might
deliberately  sample  people  with  varied  financial  situations
in  addition  to  varied  life/career  stages  or  family  situations.
To  ensure  the  desired  variability  is  covered,  particularly
when  the  researcher  is  dependent  on  others  to  provide  access
to  or  recruit  the  participants  for  a  study,  it  is  useful  to  create  a
“screener,”⁶  a  questionnaire-like  instrument  designed  to  identify
characteristics  that  are  appropriate  for  a  given  project.
Quota  sampling  is  only  possible  when  the  desired  participants⁷
are  easy  to  identify  in  advance  and  recruit.  If  it  is  not  possible
or  desirable  to  specify  how  many  participants  will  be  in  each
sampled  group,  a  purposive  sampling  strategy  may  be  called  for.
This  sampling  strategy  is  based  on  the  same  principles  as  quota
sampling,  but  the  number  of  participants  for  each  group  is  not
specified.


⁴The  intent  behind  probability  sampling,  or  statistical  sampling,  is  to  generalize  from  the  research  sample  to  a  larger  population  with  a  specified  degree
of  accuracy,  measured  in  terms  of  probability.  All  types  of  probability  sampling  require  a  randomly  selected  and  relatively  large  sample  size.

⁵Using  non-probability  samples  does  not  mean  we  cannot  make  general  statements.  If  participants  are  chosen  carefully,  one  can  obtain  reliable  data
with  as  few  as  four  or  five  participants  (Nielsen  &  Landauer,  1993;  Romney  et  al.,  1987).  Additionally,  a  recent  case  study  demonstrates  that  smaller,
non-randomly  selected  samples  can  produce  the  same  results  as  large-scale  survey  research  for  as  little  as  1/100  of  the  cost  (Green,  2001).  A  nonprobability
strategy  also  does  not  preclude  conducting  a  statistical  analysis  or  measuring  differences  between  individuals  or  groups  using  nonparametric
statistics,  such  as  Fisher’s  Exact  Test  or  nonparametric  correlation  measures.  The  limitation  of  such  samples  is  that  they  cannot  be  used  to  make  claims  about
larger  user  populations  within  a  specified  degree  of  probability.
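
As a rough illustration of how screener responses might be matched against quotas, the following Python sketch accepts respondents in arrival order until each group of interest is full. The group names, the quota sizes, the `life_stage` field, and the `fill_quotas` helper are all hypothetical and chosen only for this example; a real screener would capture whatever characteristics matter for the project at hand.

```python
from collections import Counter

# Hypothetical quotas: how many participants are wanted in each group.
QUOTAS = {"early career": 2, "mid career": 2, "retired": 1}


def fill_quotas(screener_responses, quotas):
    """Accept respondents in arrival order until each group's quota is met."""
    accepted = []
    counts = Counter()
    for person in screener_responses:
        group = person["life_stage"]
        # Skip people outside the groups of interest or in a full group.
        if group in quotas and counts[group] < quotas[group]:
            accepted.append(person)
            counts[group] += 1
    return accepted


# Made-up screener responses, for illustration only.
responses = [
    {"name": "P1", "life_stage": "early career"},
    {"name": "P2", "life_stage": "early career"},
    {"name": "P3", "life_stage": "early career"},
    {"name": "P4", "life_stage": "retired"},
    {"name": "P5", "life_stage": "mid career"},
    {"name": "P6", "life_stage": "student"},
]

selected = fill_quotas(responses, QUOTAS)
print([p["name"] for p in selected])
```

A purposive strategy could reuse the same group filter while dropping the per-group caps, since the number of participants in each group is then left open.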

Convenience  and  snowball  sampling  rely  on  a  “sample  as  you 
go”  strategy.  This  is  required  in  situations  in  which  you  don’t 
know  in  advance  who  will  be  available  to  participate  or  which  in¬ 
dividuals  or  groups  should  participate.  Convenience  sampling 
entails  selecting  people  who  are  available,  meet  the  require¬ 
ments  of  the  research,  and  are  willing  to  participate.  One  might 
use  this  strategy,  for  example,  to  observe  and  interview  people 
as  they  shop  in  a  grocery  store. 

Snowball  sampling  relies  on  participants  referring  others 
whom  they  think  would  be  good  candidates  for  the  research, 
or  on  researchers  identifying  individuals  or  groups  to  be
included  in  the  study  as  the  research  proceeds.  Because  this
method  utilizes  existing  social  networks,  it  is  especially  valuable
when  desired  participants  are  initially  inaccessible  or  reluctant
to  participate  (e.g.,  CEOs,  drug  users,  club  members)  or  when
the  relevant  population  cannot  be  known  in  advance.⁸
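
The referral logic behind snowball sampling can be sketched as a breadth-first walk over a referral network. This is only an illustrative model under simplifying assumptions: the `referrals` mapping, the participant names, and the `max_participants` cap are invented here, and in practice referrals are discovered during fieldwork rather than known up front.

```python
from collections import deque


def snowball(seeds, referrals, max_participants):
    """Follow referrals breadth-first, starting from the seed participants."""
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_participants:
        person = queue.popleft()
        recruited.append(person)
        # Each recruited participant may refer further candidates.
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return recruited


# Hypothetical referral network: who suggests whom.
referrals = {
    "Ana": ["Ben", "Chi"],
    "Ben": ["Dee"],
    "Chi": ["Ana", "Eli"],
}

wave = snowball(["Ana"], referrals, max_participants=4)
print(wave)
```

The `seen` set keeps a participant from being approached twice when two people refer the same candidate.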

Gaining  Access 

One  of  the  challenges  for  ethnographic  research  is  gaining  ac¬ 
cess  to  field  sites  and  study  participants.  Access  to  institutional 
settings  often  requires  getting  permission  from  management 
to  observe  and  interview  employees,  or  from  school  officials 
and  parents  to  spend  time  in  classrooms.  In  some  cases,  written 
permission  that  specifies  certain  terms  and  conditions  (e.g., 
how  confidential  information  will  be  protected)  is  required  be¬ 
fore  researchers  are  allowed  onsite.  In  other  cases,  recruiting 
agencies  may  be  used  to  identify  participants  and  financial  in¬ 
centives  may  be  offered  for  participating  in  the  study.  The  time 
(and  skill)  required  to  establish  these  initial  relationships  and 
agreements  should  not  be  underestimated.⁹


Observation 

As  discussed  earlier,  ethnographers  are  interested  in  under¬ 
standing  human  behavior  in  the  contexts  in  which  it  naturally 
occurs,  making  observation  one  of  the  hallmark  methods  of  the 
approach.  In  academic  settings,  it  has  been  common  for  an¬ 
thropologists  to  spend  a  full  year  at  a  given  field  site.  While  this 
continues  to  be  the  case  for  more  traditional  ethnographic  stud¬ 
ies,  shifts  in  research  focus  (e.g.,  away  from  studies  of  entire 
societies),  and  in  study  locations  (e.g.,  away  from  isolated,
hard-to-reach  settings)  have  resulted  in  more  varied  research  designs,
which  may  involve  shorter,  intermittent  fieldwork  periods  in 
one  or  more  distributed  locations.  Moreover,  in  some  applied 
settings  (e.g.,  enterprise  work  environments)  the  time  available 
for  field  observation  may  be  constrained,  sometimes  allowing 
for  no  more  than  a  few  days  in  any  one  setting. 


WHY  OBSERVE? 


One  of  the  fundamental  axioms  in  the  social  sciences,  and  an¬ 
thropology  in  particular,  is  that  what  people  say  they  do  and 
what  they  actually  do  are  frequently  quite  different.  Studies  have 
shown  verbal  reports  to  be  inconsistent  with  observed  behavior 
in  a  number  of  areas,  including  (among  many  other  examples) 
shopping  behavior  (Rathje  &  Murphy,  1991),  child  rearing  (Whiting
&  Whiting,  1970),  recycling  (Corral-Verdugo,  1997),  and
health  habits  (Rich,  Lamola,  Amory,  &  Schneider,  2000).

The  discrepancies  between  verbal  reports  and  behavior  can 
be  due  to  a  variety  of  factors.  People  may  be  concerned  with 
their  image  and  so  report,  consciously  or  not,  behavior  that  is 
more  socially  acceptable.  Along  these  same  lines,  participants 
may  respond  to  questions  in  a  particular  way  in  an  attempt  to 
please  the  researcher.  Another  source  of  disparity  between  be¬ 
havior  and  verbal  reports  is  that  people  are  often  not  aware  of 
their  actual  behavior  because  it  is  so  habitual.  Such  tacit  knowl¬ 
edge  is  often  not  easily  accessible  through  interview  techniques 
alone  (D’Andrade,  1995).

The  limitation  of  human  memory  is  another  reason  why
interview  data  can  differ  from  observations.  When  participants
are  asked  about  past  events  or  recurring  patterns  of  behavior,  their
memory  may  be  selective  and  skew  responses  in  any  number  of
directions,  sometimes  in  predictable  patterns  (Bernard,  1995).

The  complexity  of  social  life  is  another  reason  individual
accounts  of  an  event  may  miss  certain  relevant  details.  The
environments  in  which  humans  interact  are  extremely  dynamic  and
complex,  composed  of  social  relationships,  artifacts,  and  physical
spaces,  which  can  make  it  difficult  for  individuals  to  fully  envision,
let  alone  articulate  after  the  fact,  what  is  going  on.


⁶Screeners  are  an  essential  tool  if  using  an  external  recruiting  agency  to  locate  study  participants.

⁷For  sampling  purposes  participants  need  not  be  individuals,  but  could  be  families,  households,  work  groups,  or  other  naturally  occurring  entities.

⁸Johnson  (1990)  provides  a  more  detailed  discussion  of  sampling  in  ethnography.

⁹Anthropologists  have  been  accused  in  the  past  of  only  studying  the  disempowered  and  disenfranchised  because  these  individuals  were  less  likely
to  feel  powerful  enough  to  refuse  participation  in  ethnographic  studies.  Although  this  is  important  in  all  contexts,  when  studying  people  with  more
power  and  ability  to  say  no  (Nader,  1974),  it  is  often  necessary  to  demonstrate  how  their  participation  will  be  of  benefit  to  them,  their  community
or  workplace,  or  the  wider  society.


THE  RESEARCHER'S  OBSERVATIONAL  ROLE


When  it  comes  to  observation,  there  are  varying  degrees  to 
which  the  researcher  can  become  integrated  into  the  scene. 
At  one  end  of  the  spectrum  the  researcher  may  become  an 
observer-participant.  In  this  role,  one  attempts  to  be  as  unob¬ 
trusive  as  possible,  quietly  observing  events  from  a  discreet,  yet 
strategic,  position.  At  the  other  end  of  the  spectrum  is  the 
participant-observer.  In  this  situation,  the  researcher  is  actively 
involved  in  the  events  observed  (e.g.,  a  researcher  who  goes 
through  the  training  to  be  a  machine  operator  in  an  industrial 
environment). 

There  are  pros  and  cons  associated  with  each  type  of  role. 
While  being  fully  integrated  into  the  action  provides  a  researcher 
with  firsthand  experience  of  an  event,  taking  good  notes  in  this 
context  is  difficult  at  best.  A  great  deal  of  energy  is  spent  trying 
to  fit  in  rather  than  on  attempting  to  make  sense  of  the  events 
in  the  context  of  the  research  objectives.  In  such  cases,  one 
must  rely  on  memory  of  the  events  when  writing  up  field  notes 
after  the  fact.  Taking  a  more  observational  role  affords  a  wider 
perspective  on  events  and  the  time  to  record  and  reflect  on 
events  as  they  unfold.  On  the  downside,  it  precludes  the  op¬ 
portunity  to  experience  the  activity  firsthand.  In  many  research 
situations,  the  ethnographer’s  position  moves  between  these 
two  extremes,  sometimes  occupying  a  hybrid  position  of  both 
partial  participant  and  outside  observer. 


STRUCTURING  FIELD  OBSERVATIONS 


Before  setting  out  to  observe,  decisions  need  to  be  made  about 
what,  where  and  when  to  observe  (Whiting  &  Whiting,  1970). 
One  might  decide  to  observe  individuals  as  they  go  about  their 
work  and  daily  routines  (person  focused),  a  technique  some¬ 
times  referred  to  as  “shadowing”  (Wasson,  2000).  The  researcher 
might  also  decide  to  focus  on  a  specific  event,  such  as  a  meeting 
or  software  education  class  (event  focused),  or  observe  the  activities
that  occur  over  time  in  a  given  area,  like  an  office  or  store
(place  focused).  One  can  even  shift  the  subject  of  observation 
to  an  artifact,  such  as  a  document,  and  record  its  transformation 
as  it  moves  from  person  to  person  or  along  a  development  path 
(object  focused). 


VIDEOTAPING 


Given  the  complexity  of  human  behavior  it  is  impossible  to  no¬ 
tice  and  record  in  real  time  everything  of  interest  to  the  re¬ 
searcher.  This  is  one  reason  video  cameras  have  become  in¬ 
creasingly  popular  in  fieldwork.  Video  records  can  be  used  as  a 
reference  to  supplement  field  notes.  The  ethnographer  also  has
the  advantage  of  being  able  to  watch  events  multiple  times  and
change  levels  of  analysis  or  observational  focus  with  subsequent 
viewings  (e.g.,  interaction  between  people  vs.  the  movement 
of  one  individual  in  and  out  of  a  scene). 

Videotaping  also  allows  people  not  primarily  involved  in  the
fieldwork  to  participate  in  the  analysis  and  opens  up  the  range
of  perspectives  that  can  be  brought  to  bear  on  the  analysis  (e.g.,
Blomberg  and  Trigg  (2000)  used  video  collection  tapes  in  interactions
with  product  developers;  also  see  Brun-Cottan  &  Wall,
1995;  Karasti,  2001;  Suchman  &  Trigg,  1991).

Video  cameras  can  also  be  used  to  record  events  in  the  ab¬ 
sence  of  the  researcher.  Not  only  does  this  free  the  researcher 
to  be  involved  in  other  activities,  but  the  camera  also  can  be  a 
silent  presence¹⁰  in  situations  where  an  outsider  (even  a  well-trained
participant  observer)  would  be  seen  as  intrusive  (e.g.,
childbirth,  counselor-student  interactions,  boardroom  deliberations,
etc.).  This,  however,  requires  devoting  time  later  to  review
videotapes  and  incorporate  relevant  information  into  the
analysis.¹¹

Interviewing 

Interviewing  is  a  central  tool  of  ethnographic  research  (Gubrium 
&  Holstein,  2002).  Conducted  and  interpreted  in  light  of  the  po¬ 
tential  differences  between  what  people  say  and  do,  interviews 
are  critical  in  developing  understandings  of  members’  perspec¬ 
tives.  Interviews  can  be  placed  on  a  continuum  from  unstruc¬ 
tured  to  structured,  with  at  one  extreme  the  casual  conversation 
and  at  the  other  a  formal  structured  interview. 

Ethnographic  interviews  are  most  often  open-ended,  partic¬ 
ularly  during  the  early  stages  of  fieldwork  when  the  ethnogra¬ 
pher  is  just  beginning  to  get  a  perspective  on  the  activities  and 
people  studied.  The  more  unstructured  format  gives  the  re¬ 
searcher  the  freedom  to  alter  the  line  of  questioning  as  the  in¬ 
terview  unfolds.  The  researcher  essentially  is  learning  what 
questions  are  important  to  ask.  Unstructured,  however,  does 
not  mean  haphazard  or  lacking  purpose.  The  researcher  will 
know  the  research  objectives  and  the  topics  to  be  explored 
when  entering  the  field,  and  will  usually  have  an  interview  pro¬ 
tocol  to  serve  as  a  (flexible)  guide  for  the  interview.  While  the 
protocol  provides  a  basic  framework  for  an  unstructured  inter¬ 
view,  the  participant  plays  a  major  role  in  the  direction  the  in¬ 
terview  takes.  As  Bernard  (1995)  wrote,  the  idea  is  to  “get  an 
informant  on  to  a  topic  of  interest  and  get  out  of  the  way.”  When
the  interview  moves  to  a  topic  of  particular  interest,  the  re¬ 
searcher  can  then  probe  deeper  to  elicit  more  details.  Indeed, 
interviewing  is  something  of  an  art,  and  one  of  the  key  skills 
an  ethnographer  learns  is  the  art  of  “interrupting  gracefully” 
(Whyte,  1960). 

In  an  open-ended  interview  it  is  important  to  avoid  using  an
interrogation  style  of  questioning  (e.g.,  “yes  or  no”  questions),
which  is  designed  to  uncover  the  “facts.”  This  defeats  the  purpose
of  keeping  the  interview  open  to  allow  for  a  wide  range  of  responses
and  for  the  participant  to  express  his  experiences  in  his
own  way  and  in  his  own  words.  Using  too  structured  a  format
constrains  the  range  of  possible  answers,  increases  the  chances
of  missing  critical  pieces  of  information,  and  increases  the  risk
that  discoveries  will  be  limited  by  the  ethnographer’s  preexisting
concepts,  assumptions,  and  hypotheses.  It  is  critical  to  provide
opportunities  for  participants  to  convey  their  stories  and
perspectives  in  their  own  way  and  for  the  researcher  to  be  surprised
by  what  people  say  and  do.


¹⁰However,  the  express  permission  of  the  participants  in  the  interaction  is  needed  in  these  cases  as  well.

¹¹A  variety  of  software  applications  now  exist  that  can  help  the  researcher  manage  and  analyze  data  recorded  on  video.  Caveat,  for  example,  allows  the
researcher  to  select  and  annotate  images/events  of  particular  interest.  A  more  sophisticated  (though  less  user  friendly)  program  is  Observational
Coding  System  (OCS),  which  provides  for  a  more  quantitative  analysis.

As  a  project  progresses  and  patterns  begin  to  emerge,  inter¬ 
views  can  become  more  structured  and  the  line  of  questioning 
less  broad.  The  researcher  begins  to  narrow  in  on  topics  that  are 
particularly  informative  and  relevant  to  the  research  objectives. 
Questions  on  the  protocol  become  more  focused  and  specific  as 
answers  to  previous  questions  guide  the  follow-up  questioning. 

Once  the  range  of  responses  is  known  and  the  data  begins  to 
show  patterns  and  themes,  the  researcher  may  want  to  struc¬ 
ture  interviews  further.  A  host  of  structured  techniques  exist. 
Some  are  designed  to  identify  the  ways  people  organize  infor¬ 
mation  within  a  specified  domain,  such  as  free  listing,  card  sorts, 
triad  tests,  and  paired  comparisons  (Romney,  Batchelder,  &
Weller,  1986;  Weller  &  Romney,  1988).  Other  techniques,  such
as  questionnaires  and  surveys,¹²  are  used  to  assess  variations
between  two  or  more  groups  or  to  establish  how  representative
the  findings  are  for  a  larger  population.  The  main  idea  behind 
these  techniques  is  to  keep  the  form  and  content  of  the  ques¬ 
tions  consistent  for  each  respondent,  thus  allowing  for  differ¬ 
ences  among  the  sample  population  to  be  ascertained.  Con¬ 
ducting  structured  interviews  at  the  end  of  an  ethnographic 
study  has  the  advantage  of  allowing  the  question  structure  and 
language  to  reflect  the  way  participants  talk  about  and  organize 
experiences,  thus  increasing  the  validity  of  the  survey  findings. 
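
To make one of the structured techniques concrete, the sketch below tallies free-list responses: for each item it reports the share of respondents who mentioned it and its average position in their lists. This is just one simple way such data might be summarized, not a method prescribed by the chapter; the `free_list_summary` helper and the sample lists are invented for illustration.

```python
from collections import Counter


def free_list_summary(lists):
    """Return (item, share of respondents naming it, mean list position)."""
    mentions = Counter()
    position_sums = Counter()
    for items in lists:
        seen = set()
        for position, item in enumerate(items, start=1):
            if item in seen:
                continue  # count only the first mention within one list
            seen.add(item)
            mentions[item] += 1
            position_sums[item] += position
    rows = [
        (item, mentions[item] / len(lists), position_sums[item] / mentions[item])
        for item in mentions
    ]
    # Most widely mentioned first; ties broken by earlier mean position.
    return sorted(rows, key=lambda row: (-row[1], row[2]))


# Made-up answers to "What ways do you contact coworkers?"
lists = [
    ["email", "phone", "chat"],
    ["phone", "email"],
    ["email", "fax"],
]

summary = free_list_summary(lists)
for item, share, mean_position in summary:
    print(f"{item}: named by {share:.0%}, mean position {mean_position:.2f}")
```

Items that many participants name early in their lists are candidates for the wording of later, more structured questions.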


SOCIAL  NETWORK  ANALYSIS 


A  research  technique  that  has  received  a  great  deal  of  attention
lately  is  social  network  analysis  (Cross  &  Parker,  2004;  Kilduff
&  Tsai,  2003;  Scott,  2000).  While  the  data  for  social  network
analysis  (SNA)  can  derive  from  records  of  activities  (e.g.,  e-mail
message  exchanges,  coauthorship,  membership  in  organizations),
often  the  data  used  in  SNA  are  based  on  responses
to  survey  questions  (e.g.,  “How  frequently  do  you
interact  via  e-mail  with  the  following  people?”).  However,
interpreting  the  results  of  SNA  often  relies  on  other  sources  of
information  (e.g.,  ethnographic  research).  For  example,  although
an  SNA  may  show  that  there  is  little  communication  between
two  groups  of  workers,  it  cannot  reveal  whether  this  communication
pattern  is  limiting  the  effectiveness  of  the  two  groups
or  is  an  indication  that  there  is  little  need  for  the  two  groups
to  interact.  Interpreting  the  patterns  highlighted  in  social  network
analysis  requires  other  ways  of  gaining  an  understanding
of  organizational  dynamics.
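
A minimal sketch of the kind of pattern an SNA might surface from activity records is shown below: it counts e-mail ties between pairs of people and computes the share of message volume that crosses group boundaries. The message log, the group assignments, and both helper functions are hypothetical. As the text stresses, a low cross-group share says nothing by itself about whether the groups need to interact more.

```python
from collections import Counter


def tie_counts(messages):
    """Count messages exchanged by each unordered pair of people."""
    ties = Counter()
    for sender, recipient in messages:
        if sender != recipient:
            ties[frozenset((sender, recipient))] += 1
    return ties


def cross_group_share(ties, group_of):
    """Share of all message volume that crosses group boundaries."""
    total = sum(ties.values())
    cross = sum(
        count
        for pair, count in ties.items()
        if len({group_of[person] for person in pair}) > 1
    )
    return cross / total if total else 0.0


# Made-up e-mail log as (sender, recipient) pairs.
messages = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "cat"),
    ("dan", "eve"), ("eve", "dan"), ("cat", "dan"),
]
group_of = {
    "ann": "sales", "bob": "sales", "cat": "sales",
    "dan": "support", "eve": "support",
}

ties = tie_counts(messages)
share = cross_group_share(ties, group_of)
print(f"Cross-group share of messages: {share:.2f}")
```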


THE  INTERVIEW  AS  A  COMMUNICATIVE  EVENT 


The  interview  has  become  somewhat  ubiquitous  in  Western  societies
and  is  viewed  as  a  reliable  means  of  acquiring  information
of  all  kinds  (e.g.,  attitudes  toward  tax  increases,  the  value
placed  on  education,  preferences  for  certain  products,  basic  de¬ 
mographic  data,  etc.).  However,  as  Briggs  (1983)  pointed  out, 
what  is  said  in  an  interview  should  not  be  thought  of  as  “a  re¬ 
flection  of  what  is  ‘out  there’”  but  instead  must  be  viewed  “as  an 
interpretation  which  is  jointly  produced  by  the  interviewer  and 
respondent.”  This  view  compels  us  to  regard  the  interview  as  a 
communicative  event  in  which  the  structure  and  context  of  the
interaction  condition  what  the  researcher  learns.  This  is  no  less
the  case  in  highly  structured  interviews  (see  Jordan  &  Suchman 
(1990)  and  Moore  (2004)  for  a  critical  analysis  of  the  ecological 
validity  of  survey  research).  Briggs  recommends  that  we  adopt  a 
wider  range  of  communicative  styles  in  our  interactions  with 
study  participants,  particularly  styles  that  are  indigenous  to  the 
study  population. 


INTERVIEWING  RULES  OF  THUMB 


While  there  are  no  hard  and  fast  rules  for  interviewing,  a  few
general  guidelines  will  help  facilitate  the  interview  process  and
increase  the  chances  of  obtaining  useful  information.  Some
points  to  remember:

•  Interview  people  in  everyday,  familiar  settings.  Not  only  does 
this  make  the  participants  more  comfortable,  it  allows  them 
to  reference  artifacts  in  the  environment  that  play  an  inte¬ 
gral  part  in  their  activities.  Moreover,  a  familiar  environment 
is  full  of  perceptual  cues  that  can  help  jog  the  not-so-perfect 
human  memory. 

•  Establish  and  maintain  good  rapport  with  participants,  even 
if  it  slows  the  interview  process. 

•  Don’t  underestimate  the  value  of  casual  conversation.  Some 
of  the  most  insightful  information  comes  from  informal  con¬ 
versations  when  social  barriers  are  lowered. 

•  Assume  the  respondent  is  the  expert  and  the  researcher  the 
apprentice.  This  not  only  shows  the  participant  respect,  but 
also  gives  them  confidence  and  facilitates  conversation.  Even 
if  the  interviewer  happens  to  be  more  knowledgeable  on  a 
particular  subject,  the  goal  of  an  ethnographic  interview  is 
to  understand  the  participant’s  perspective. 

•  Use  lack  of  knowledge  as  a  discovery  tool.  Participants  will 
always  know  more  about  their  own  experiences  than  the  interviewer
will.  Accordingly,  don’t  interrupt  unnecessarily,
complete  a  participant’s  sentences,  or  answer  the  questions  for  them.
Again,  the  idea  is  to  learn  about  the  respondent’s  point  of
view,  not  the  researcher’s.  In  this  context,  the  researcher’s
“inevitable  ignorance”  about  the  experiences  of  another  person
can  be  a  powerful  tool.


•  When  conducting  an  open-ended  interview,  avoid  asking  “yes
or  no”  questions.  Responses  to  these  questions  provide  less
information  than  questions  beginning  with  “what”  or  “how.”

•  Be  flexible  enough  to  adapt  the  line  of  questioning  when  necessary.
Human  behavior  is  complex  and  full  of  surprises.


¹²A  good  introductory  book  on  surveys  is  How  to  Conduct  Your  Own  Survey  (Salant  &  Dillman,  1994).  Readers  interested  in  a  more  advanced  treatment
of  the  subject  are  referred  to  Babbie  (1990).


CONNECTIONS  BETWEEN  OBSERVATION  AND  INTERVIEWS


As  noted  earlier,  one  of  the  defining  qualities  of  ethnography 
is  its  emphasis  on  holism.  To  obtain  this  holistic  view,  combin¬ 
ing  different  sources  of  data  is  useful  (Agar,  1996).  Observation 
alone  is  seldom  enough  to  adequately  address  research  objec¬ 
tives.  As  such,  observation  is  invariably  coupled  with  interview¬ 
ing.  Interviews  can  extend  and  deepen  one’s  understanding  of 
what  has  already  been  observed.  Similarly,  interviews  can  be 
conducted  prior  to  observing,  giving  the  researcher  a  better 
idea  about  what  is  most  appropriate  to  observe. 

Interviews  can  also  be  conducted  in  the  context  of  ongoing 
activities,  sometimes  referred  to  as  “contextual”  or  “in  situ”  in¬ 
terviewing.  Instead  of  setting  aside  a  specific  time  and  place  for 
an  interview,  the  researcher  creates  an  opportunity  to  ask  ques¬ 
tions  as  participants  go  about  their  daily  activities.  The  strategy 
can  be  extremely  useful  in  getting  answers  to  questions  that  are 
prompted  by  observation  of  ongoing  activities. 

Self-Reporting  Techniques 

In  cases  where  the  domain  of  interest  transpires  over  a  long  pe¬ 
riod,  or  in  which  direct  observation  is  not  practically  feasible,  self- 
reporting  techniques  can  be  very  valuable.  This  methodology  is 
especially  good  at  revealing  patterns  in  behavior  or  obtaining  data 
that  is  otherwise  inaccessible  (Whyte,  1984).  A  number  of  self-reporting
techniques  exist  which  vary  in  terms  of  form,  focus,  structure,
and  mechanism  of  self-reporting.  Common  techniques
range  from  simple  written  diaries  to  visual  storybooks,  and  more 
recently  to  Internet-based  (and  often  multimedia)  “blogs.” 


DIARIES 


Traditional  diaries  consist  of  written  records,  which  might  in¬ 
clude  personal  thoughts  or  descriptions  of  specific  behaviors 
or  accounts  of  events  in  which  an  individual  participates.  The  fo¬ 
cus,  format,  and  degree  of  structure  of  diaries  used  in  ethno¬ 
graphic  research  vary  depending  upon  the  research  objectives, 
ranging  from  structured  activity  logs  which  invite  the  participant 
to  capture  and  describe  specific  aspects  of  her  experiences  for 
each  entry,  to  relatively  unstructured  forms  in  which  diarists 
are  provided  only  with  general  instructions.  Study  participants
might  be  asked  to  keep  diaries  regarding  the  specific  contexts,
foci,  modalities,  and  outcomes  of  their  interactions,  or  they
might  simply  be  asked  to  describe  their  experiences  over
time  while  using  a  specific  product.

How  diaries  are  analyzed  depends  on  the  research  objectives 
and  resource  constraints.  If  time  permits,  follow-up  discussions 
with  participants  to  clarify  points  or  gain  a  deeper  understand¬ 
ing  of  the  meaning  behind  the  words  can  be  useful.  The  texts 
can  also  be  coded  for  themes,  key  words,  or  phrases,  and  patterns
examined  across  individuals  or  between  groups.¹³
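
Coding diary texts for themes and comparing groups can be sketched with a small keyword codebook. Everything in this example is assumed for illustration: the themes, the cue phrases, the group labels, and the diary excerpts. Real coding usually involves human judgment and iteration on the codebook, so this shows only the bookkeeping.

```python
from collections import Counter, defaultdict

# Hypothetical codebook: theme -> cue phrases that signal it.
CODEBOOK = {
    "frustration": ["annoyed", "frustrat", "gave up"],
    "workaround": ["instead", "workaround", "by hand"],
}


def code_entry(text, codebook):
    """Return the set of themes whose cue phrases appear in the entry."""
    lowered = text.lower()
    return {
        theme
        for theme, cues in codebook.items()
        if any(cue in lowered for cue in cues)
    }


def themes_by_group(entries, codebook):
    """Count, per participant group, how many entries touch each theme."""
    counts = defaultdict(Counter)
    for group, text in entries:
        for theme in code_entry(text, codebook):
            counts[group][theme] += 1
    return counts


# Made-up diary excerpts tagged with a participant group.
entries = [
    ("novice", "Got annoyed with the export step and gave up."),
    ("novice", "Typed the totals in by hand instead."),
    ("expert", "Used my usual workaround for the export."),
]

counts = themes_by_group(entries, CODEBOOK)
for group, themes in counts.items():
    print(group, dict(themes))
```

Follow-up discussions with participants remain the way to check whether a matched phrase actually carries the meaning the code assumes.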


VISUAL  STORIES 


Visual  stories  are  essentially  pictorial  diaries  that  employ  images 
in  addition  to  text  in  order  to  document  experiences.  They  can 
be  particularly  valuable  when  working  with  non-literate  partici¬ 
pants,  such  as  children,  or  in  situations  where  words  alone  are 
inadequate  to  capture  the  essence  of  the  subject  (Johnson, 
Ironsmith,  Whitcher,  Poteat,  &  Snow,  1997).  Much  like  more  traditional
text-based  diaries,  visual  diaries  can  be  employed  and
structured  in  any  of  a  number  of  ways.  Wasson  (2000),  for  ex¬ 
ample,  described  giving  participants  a  written  guide  directing 
them  to  take  photographs  of  their  interaction  with  a  product 
under  study.  They  were  then  asked  to  organize  the  developed 
photos  into  a  story  that  made  sense  to  them,  and  researchers 
conducted  follow-up  interviews  over  the  telephone. 

A  more  open-ended  framework  can  also  be  informative.  In¬ 
terested  in  cultural  differences  between  Italian  and  American 
fishermen,  Johnson  and  Griffith  (1998)  instructed  participants 
from  both  groups  to  take  photographs  of  whatever  they  wanted. 
After  developing  the  film,  Johnson  coded  the  pictures  based  on 
their  content  and  found  significant  thematic  differences  be¬ 
tween  the  groups,  which  added  to  his  understanding  of  differ¬ 
ences  in  cultural  values  of  the  two  groups  of  fishermen. 

A  more  recent  derivation  of  the  visual  story  utilizes  a  video 
camera  which  allows  the  participant  to  provide  a  running  nar¬ 
rative  alongside  the  visual  content.  Being  able  to  experience  the 
two  sources  of  information  simultaneously  provides  the  re¬ 
searcher  with  a  rich  record  of  an  activity.  Blomberg,  Suchman, 
and  Trigg  (1996)  used  a  video-story  approach  in  their  study  of 
the  document  practices  of  lawyers.  They  set  up  a  stationary 
video  camera  in  the  law  office  of  a  study  participant  and  asked 
him  to  turn  on  the  camera  whenever  he  had  occasion  to  re¬ 
trieve  documents  from  his  file  cabinet.  The  running  narration 
recorded  on  videotape  provided  insights  into  the  everyday  use 
of  the  file  cabinet  that  helped  inform  the  design  of  an  electronic 
file  cabinet. 


^^With varying degrees of success, text analysis software has been used to help with large data sets. Some noteworthy programs: Ethnograph, NUD*IST, E-Z-Text, and NVivo.


WEBLOGS (e.g., "BLOGS")


As Internet technology has evolved along with the accessibility
and ease of digital media capture and online sharing, forms of
self-reporting are evolving rapidly. Most recently, this is evidenced
by the proliferation of Weblogs (better known as “blogs”), in which
a website is used to “post” online entries that
may  include  textual  narratives,  digital  photos,  or  digital  video/ 
audio.  Although  not  developed  specifically  to  support  ethno¬ 
graphic  inquiries,  blogs  can  be  a  potentially  very  valuable  re¬ 
search  tool.  Blogs  may  be  particularly  useful  as  a  way  for  partic¬ 
ipants  to  self-report  their  use  of  online  tools  in  the  context 
of  their  online  activities.  Blogs  also  enable  researchers  to  re¬ 
view  participant  “posts”  as  they  occur  as  well  as  to  engage  in 
asynchronous  online  exchanges  and  dialogues  with  participant 
“bloggers”.  Indeed,  as  blogs  increasingly  are  used  in  ethno¬ 
graphic  research,  they  may  blur  the  boundary  between  self¬ 
documentation  and  interviews,  resulting  in  a  blend  of  online 
self-reporting  and  intermittent  online  “conversations”  via 
threaded  participant/researcher  posts. 

Remote “Virtual” Observation

Continuing  technological  developments — in  video,  audio,  wire¬ 
less,  network  applications,  tracking  capabilities,  and  pervasive 
computing — have  created  new  opportunities  to  “observe”  and 
collect  rich  and  dynamic  information  across  geographies  in  real 
time  as  well  as  asynchronously.  These  technologies  increasingly 
enable  ethnographers  to  “virtually”  observe  in  a  wide  variety  of 
contexts.  Using  digital  video  and  audio,  people’s  behaviors  can 
be  tracked  and  analyzed  as  they  interact  with  computer  sup¬ 
ported  products  and  Internet-based  networks.^"*  Indeed,  these 
technologies  (along  with  the  use  of  other  digital  tools  such  as 
blogs)  enable  what  some  have  begun  to  refer  to  as  “digital 
ethnography”  (Masten  &  Plowman,  2003). 

The  pervasiveness  of  the  “web  cam”  is  perhaps  the  simplest 
illustration  of  how  technology  has  expanded  the  observational 
capabilities  of  ethnographers.  Internet-enabled  digital  video 
cameras  can  stream  video  in  real  time  and  can  be  remotely  con¬ 
trolled.  This  digital  video  and  audio  can  be  viewed  by  multiple 
people  across  geographies  either  in  real  time  or  by  accessing 
video  archives.  Such  techniques  and  information  sources  can  be 
particularly  useful  for  geographically  distributed  research  and 
design  teams. 

In  addition,  computer  and  online  sensing,  tracking,  and  an¬ 
alytic  technologies  that  monitor,  gather,  collect,  and  integrate 
information on people’s computer-mediated activities can be a
useful  source  of  information  for  ethnographers.  Although  early 
tracking  and  analytic  technologies  required  complex  sifting  and 
analysis of massive amounts of data to find meaningful nuggets,
more  recent  tools  enable  sophisticated  tracking  of  individual 
paths  and  activities  as  well  as  the  ability  to  model  online  behav¬ 
ior.  For  example,  “scenario”-based  behavioral  models  (e.g.,  of 
online  shopping,  exploratory  behavior,  task  completion,  etc.) 
which  define  hypothesized  patterns  or  sequences  (“funnels”)  of 
online  behavior  can  be  used  as  an  analytic  lens  to  understand  in¬ 
dividual  or  group  online  behaviors.  To  date,  these  tools  have 
been used primarily to measure aggregate completion of online
tasks (e.g., online shopping, self-service) and to identify obstacles
to user success (e.g., usability issues). However, over
time  and  in  conjunction  with  other  sources  of  data  and  infor¬ 
mation  they  may  become  useful  tools  for  ethnographers  inter¬ 
ested  in  patterns  of  online  behavior  and  technology  adoption. 
This  may  become  particularly  important  as  ethnographers  at¬ 
tempt  to  understand  the  formation  and  interactions  of  distrib¬ 
uted  virtual  communities  (e.g.,  Rheingold,  2000). 

The  potential  for  using  (and  misusing)  these  sources  of  in¬ 
formation  will  likely  increase  exponentially  as  pervasive  com¬ 
puting  increasingly  enables  the  identification  of  (and  response 
to)  individuals  across  multiple  physical  and  digital  environments 
and the tracking of their activities. The collection and use of digitally
enabled behavioral observations obviously need to be
carefully  constrained  by  ethical  considerations,  particularly  the 
respect  for  privacy  and  informed  consent.  In  addition,  as  with 
any  behavioral  observation,  it  is  critical  to  understand  the  con¬ 
text  in  order  to  interpret  the  meaning  and  significance  of  the  be¬ 
havior.  In  this  respect,  tracking  computer-mediated  behaviors 
by  itself  is  insufficient  and  may  simply  result  in  the  collection  of 
massive  amounts  of  relatively  meaningless  data.  However,  if 
used in conjunction with other sources of information (e.g., self-reports
that illuminate people’s intentions and meanings), patterns
in digital behavior can illuminate aspects of behavior that
are  difficult  or  impossible  for  a  human  researcher  to  observe. 

For example, it has become increasingly common for teams designing
online services and tools to examine individual and aggregate
patterns of online behavior (as reflected in web server logs
or “client side” logs that are generated as a function of what users
do online), both to identify usability issues and to examine
patterns of technology, product, and service adoption over time
(Kantner, 2001).
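
As a rough illustration of the “funnel” idea described earlier, the following Python sketch counts how many users reach each step of a hypothesized sequence of online behavior. The step names, the log records, and the function names are all assumptions invented for this example; real web server or client-side logs would first need to be parsed and grouped by user or session, and counts of this kind say nothing about why a person stopped where they did.

```python
from collections import defaultdict

# Hypothetical funnel: an ordered sequence of steps a shopper is expected to take.
FUNNEL = ["view_product", "add_to_cart", "checkout", "purchase"]

# Hypothetical, hand-made log records: (user_id, timestamp, event).
LOG = [
    ("u1", 1, "view_product"), ("u1", 2, "add_to_cart"),
    ("u1", 3, "checkout"), ("u1", 4, "purchase"),
    ("u2", 1, "view_product"), ("u2", 2, "add_to_cart"),
    ("u3", 1, "view_product"), ("u3", 2, "search"),
    ("u3", 3, "add_to_cart"), ("u3", 4, "checkout"),
]


def furthest_step(events, funnel):
    """Return how many funnel steps occur, in order, within one user's events."""
    reached = 0
    for event in events:
        if reached < len(funnel) and event == funnel[reached]:
            reached += 1
    return reached


def funnel_report(log, funnel):
    """Count, for each funnel step, how many users reached it."""
    per_user = defaultdict(list)
    # Order each user's events by timestamp before matching against the funnel.
    for user, _timestamp, event in sorted(log, key=lambda r: (r[0], r[1])):
        per_user[user].append(event)
    counts = [0] * len(funnel)
    for events in per_user.values():
        for i in range(furthest_step(events, funnel)):
            counts[i] += 1
    return dict(zip(funnel, counts))


report = funnel_report(LOG, FUNNEL)
for step, count in report.items():
    print(f"{step}: {count} user(s)")
```

Events outside the funnel (such as the invented "search" event) are simply skipped, so the sketch tolerates detours. A drop between two steps only flags where to look; interviews or self-reports are still needed to interpret it.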

Artifact  Analysis 

Ethnographers  have  long  had  an  interest  in  the  material  world 
of  the  people  they  study.  The  artifacts  people  make  and  use  can 
tell  us  a  great  deal  about  how  people  live  their  lives. Artifact 
analysis  can  be  an  important  part  of  contemporary  ethnographic 
studies  (e.g.,  Rathje  &  Murphy,  1991).  For  example,  conducting 
an  artifact  analysis  of  the  stuff  on  people’s  desks  can  say  a  great 
deal  about  the  people’s  work  practices.  Similarly,  studying  the 
contents  of  an  automobile’s  “glove  box”  can  tell  a  great  deal 
about  how  the  car  is  used.  Depending  on  the  kinds  of  research 
questions  asked,  it  may  be  useful  to  include  the  collection  and 
analysis  of  specific  artifacts. 

^^The ability to virtually observe and track behaviors presents many ethical issues that cannot and should not be ignored. It is critical that ethnographers establish guidelines and protections if they engage in electronic, digitally enabled observations.

^^Archaeologists rely almost exclusively on the artifacts that remain in archaeological sites for their interpretations of the behavior and social organization of past human societies.

Recordkeeping

Although the authority of the ethnographic voice derives in part
from the fact that the ethnographer is present and witness to
events of interest, the ethnographer should not rely exclusively
on  experiential  memory  of  these  events.  In  all  ethnographic  re¬ 
search  it  is  essential  to  keep  good  records.  Field  notes  should  be 
taken  either  during  or  soon  after  observing  or  interviewing.  The 
specific  nature  of  the  notes  will  depend  on  the  research  ques¬ 
tions  addressed,  the  research  methods  used,  and  whether  audio 
or  video  records  supplement  note  taking.  Field  notes  should  at 
least  include  the  date  and  time  when  the  event  or  interview 
took  place,  the  location,  and  who  was  present.  Beyond  that, 
notes  can  vary  widely,  but  it  is  often  useful  to  indicate  differ¬ 
ences  between  descriptions  of  what  is  observed,  verbatim 
records  of  what  is  said,  personal  interpretations  or  reflections, 
and  systematic  indications  of  the  flow  of  observed  events  and 
activities.  When  working  with  a  team  of  researchers,  field  notes 
need  to  be  understandable  to  other  team  members.  This  is  of¬ 
ten  a  good  standard  for  the  specificity  of  field  notes  even  when 
working  alone.  If  such  a  standard  is  maintained,  it  will  be  more 
likely  that  the  notes  will  be  useful  to  the  researcher  months  and 
even  years  later,  in  the  event  reanalysis  or  a  comparative  study  is 
undertaken. 
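
For teams that keep field notes electronically, the minimum contents listed above can be sketched as a small record structure. The following Python sketch is only one possible arrangement: the class names, field names, and sample entries are assumptions invented for illustration, not a format prescribed here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class EntryKind(Enum):
    """Hypothetical labels for the kinds of content a note can hold."""
    DESCRIPTION = "description"        # what was observed
    VERBATIM = "verbatim"              # what was said, word for word
    INTERPRETATION = "interpretation"  # the researcher's own reflections
    FLOW = "flow"                      # the sequence of events and activities


@dataclass
class Entry:
    kind: EntryKind
    text: str


@dataclass
class FieldNote:
    when: datetime       # date and time of the event or interview
    location: str        # where it took place
    present: list[str]   # who was present
    entries: list[Entry] = field(default_factory=list)

    def add(self, kind: EntryKind, text: str) -> None:
        """Append one labeled entry to the note."""
        self.entries.append(Entry(kind, text))

    def of_kind(self, kind: EntryKind) -> list[str]:
        """Return the text of every entry with the given label."""
        return [e.text for e in self.entries if e.kind is kind]


# Invented sample data, for illustration only.
note = FieldNote(
    when=datetime(2005, 3, 14, 9, 30),
    location="participant's office",
    present=["researcher", "participant"],
)
note.add(EntryKind.DESCRIPTION, "Participant opens the file cabinet and scans the folder tabs.")
note.add(EntryKind.VERBATIM, "I never remember which drawer this is in.")
note.add(EntryKind.INTERPRETATION, "Retrieval seems to rely on spatial memory more than labels.")

print(note.of_kind(EntryKind.VERBATIM))
```

Labeling each entry keeps observation, quotation, and interpretation separable, which makes the notes easier for other team members to read and easier to revisit in a later reanalysis.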

Qualitative  and  Quantitative  Data 

In  a  previous  section  we  touched  upon  the  complementary  na¬ 
ture  of  observational  and  interview  techniques  and  the  benefit 
of  combining  these  two  approaches.  Triangulation  of  data  can 
serve  to  connect  quantitative  and  qualitative  data  as  well.  Some¬ 
times,  prior  to  the  start  of  a  project  the  only  data  available  is 
quantitative,  sometimes  in  the  form  of  survey  data  focused  on 
population  characteristics.  Qualitative  data  derived  from  ethno¬ 
graphic  research  can  complement  quantitative  research  by  pro¬ 
viding  a  meaningful  context  for  interpreting  the  quantitative  re¬ 
sults.  Qualitative  techniques  allow  researchers  to  dig  deeper 
after  a  survey  has  been  tabulated,  and  aid  in  interpreting  and  ex¬ 
plaining  trends  that  the  quantitative  data  might  reveal  (Guest, 
2000).  In  addition,  qualitative  data  can  inform  the  content  and 
language  of  more  structured  questions,  thus  making  them  more 
meaningful  and  relevant  to  the  participants. 

Ethnography  in  a  Global  Context 

While  ethnography  has  its  roots  in  the  study  of  small-scale,  non- 
Western  societies,  the  application  of  ethnography  in  the  design 
of  products  and  services  has  focused  primarily  on  groups  and 
individuals located in the developed regions of the world (e.g.,
North  America  and  Europe).  Two  recent  developments  have  led 
to  a  shift  in  the  center  of  design  activity.  One  is  the  emergence 
of  the  economies  of  less-developed  countries,  particularly 
Brazil,  Russia,  India,  and  China,  the  so-called  BRIC  countries, 
which  are  rapidly  becoming  major  markets  for  products  and  ser¬ 
vices.  An  interest  in  serving  these  growing  markets  has  led  some 
firms  to  invest  in  designing  products  and  services  specifically  for 
them  by  directly  engaging  designers,  developers,  and  potential 
users  from  these  developing  regions  (e.g.,  HP  and  Intel).  The 
second  development  is  the  rapid  increase  in  the  use  of  Internet- 
enabled  information  technologies  that  connect  workers,  con¬ 
sumers,  citizens,  and  organizations  distributed  around  the  world. 


SHIFTING  DESIGN  FOCUS 


As  new  markets  open  up  around  the  globe,  many  businesses 
and  organizations  see  an  opportunity  to  create  products  and 
services specifically for these markets, recognizing that the products
and services suited for the developed West may not be appropriate
for these other regions. As such, these companies have
established  design  initiatives  focused  on  and  located  in  coun¬ 
tries  like  India  and  China.  In  some  respects  ethnography  has 
come  full  circle  in  its  application  to  design,  contributing  to  un¬ 
derstandings  of  the  local  contexts  of  people  living  in  culturally 
and  linguistically  diverse  settings  (the  sites  in  which  ethno¬ 
graphic  practice  first  developed).  More  than  ever  ethnographic 
principles  and  practices  are  applicable  and  necessary  as  the  cen¬ 
ter  of  design  activity  moves  outside  the  developed  West. 


GLOBALLY  DISTRIBUTED  INTERACTIONS 


The  methods  and  techniques  of  ethnographic  research  must 
contend with the increasing number of interactions both at work
and  in  domestic  spheres  that  take  place  “virtually”  between  peo¬ 
ple  separated  in  space  and  time.  This  is  a  challenge  for  ethno¬ 
graphic  techniques  that  were  developed  to  study  communities 
of  people  who  interact  face-to-face.  Our  techniques  and  ap¬ 
proaches  must  be  adapted  both  practically  and  analytically  to  this 
new  context  where  many  more  interactions  are  mediated  by  in¬ 
formation  technologies  (e.g.,  instant  messaging,  blogs,  e-mail, 
telephone,  web  conferencing,  shared  digital  workspaces,  and 
repositories)  that  transform  traditional  notions  of  place  and  real¬ 
time  interactions.  In  many  enterprises  work  teams  are  made  up 
of  people  who  are  not  co-located,  many  of  whom  are  highly  mo¬ 
bile  in  their  work  activities,  requiring  interactions  to  take  place 
through  conference  calls,  instant  messaging,  and  e-mail.  Fur¬ 
thermore,  in  some  regions  of  the  world,  people  travel  significant 
distances  for  jobs  and  other  opportunities.  In  these  cases,  inter¬ 
actions  with  friends  and  family  as  well  as  with  others  living  away 
from  their  native  communities,  are  enabled  by  communication 
technologies  (Horst  &  Miller,  2005;  Green,  Harvey  &  Knox, 
2005).  Various  strategies  have  been  developed  to  study  distrib¬ 
uted,  multisited  groups  including  team  ethnography  (placing 
researchers  in  multiple  locations),  perspectival  ethnography 
(focusing  on  the  view  from  one  of  the  local  sites),  and  virtual 
observations  (observing  digitally  mediated  interactions). 


MAKING  ETHNOGRAPHY  MATTER: 
COMMUNICATING  AND  APPLYING 
ETHNOGRAPHIC  INSIGHTS  TO  DESIGN 


This  section  outlines  some  of  the  ways  in  which  the  insights  de¬ 
rived  from  ethnographic  work  can  be  represented  and  commu¬ 
nicated  in  order  to  effectively  inspire  and  guide  the  design  of 
products  and  services.  These  ways  of  representing  and  commu¬ 
nicating  what  is  learned  are  intended  as  examples  of  how  ethno¬ 
graphic  work  can  be  made  relevant  for  design.  However,  before 
we outline some of these representational forms and practices
we  should  consider  the  possible  foci  of  our  design  activities. 

Designing  What? 

The  application  of  ethnography  to  support  a  design  agenda  was 
directed  initially  toward  informing  the  design  of  technologies, 
tools  and  products.  However,  more  recently  attention  has  ex¬ 
panded  to  include  the  use  of  ethnography  to  inform  the  design 
of  experiences  (Pine  &  Gilmore,  1999),  services  (Thomke, 
2003),  organizational  processes,  and  business  strategies  and 
models.  The  establishment  of  a  design  school  (the  “d-school”  as 
it  is  called)  at  Stanford  University  in  2005,  dedicated  to  teach¬ 
ing  “design  thinking  ...  to  solve  big  problems  in  a  human  cen¬ 
tered  way,”  points  to  the  expanded  role  ethnographic  research 
can play in informing design, beyond the design of products
(http://www.stanford.edu/group/dschool/big_picture/our_vision.html).
Ethnographers are now involved in projects and
contexts  that  span  a  range  of  problems  from  helping  to  design 
the  next  e-mail  application,  to  designing  tools  to  support  system 
administrators,  to  new  business  models  to  reach  small  and 
medium  businesses  with  IT  services,  to  new  customer  services 
for  retail  banking.  Moreover,  the  adoption  of  a  technology  is 
usually  associated  with  changes  in  user  experiences,  individual 
work  practices,  and/or  organizational  models  and  processes 
which  can  also  be  within  the  scope  of  the  design  focus. 


PRODUCTS 


The  application  of  ethnography  to  product  design  has  received 
the  most  attention  in  the  literature  partly  because  many  of  the  pi¬ 
oneers  in  the  field  worked  in  corporate  research  organizations 
of major technology companies (e.g., Xerox, Apple, and HP). In
addition, early commercial applications of ethnographically
informed design often focused on the design of consumer products,
from cleaning products to automobiles to toys (Elab, Doblin
Group, Sonic Rim). It is not surprising therefore that many view
product  design,  whether  high-tech  products  like  PDAs  and  on¬ 
line  calendar  applications  or  everyday  consumer  products  like 
breakfast  cereals  and  cold  remedies,  as  the  primary  application 
of  ethnographic  research  (Squires  &  Byrne,  2002). 


EXPERIENCES 


The publication of The Experience Economy (Pine & Gilmore,
1999)  marked  a  shift  in  design  focus  to  include  the  experiences 
that  products  and  other  artifacts  enabled.  Pine  and  Gilmore  ar¬ 
gued  that  the  real  challenge  for  businesses  is  creating  engaging 
experiences  for  both  consumers  and  corporate  customers.  The 
admonishment by a number of business gurus to pay more attention
to the customer in the design of products also contributed
to this expanded focus. Customers, it turned out, cared
less  about  the  products  themselves  and  more  about  what  the 
products  enabled  them  to  do  or  experience.  Businesses  became 
concerned  with  delivering  quality  experiences  in  which  the 
products  took  on  more  of  a  supporting  role.  The  canonical  ex¬ 
ample  often  cited  for  this  shift  to  an  experience  economy  is  Star- 
bucks,  where  what  is  being  sold  is  not  simply  a  cup  of  coffee, 
but  the  experience  of  buying  and  consuming  the  coffee  at  Star- 
bucks,  including  the  elaborate  choices  available,  the  wireless  ac¬ 
cess  provided,  the  exclusive  access  to  trendy  music,  and  so  on. 


SERVICES 


The  service  sector  has  come  to  dominate  much  of  the  world 
economy  and  increasingly  new  services  are  the  site  of  significant 
change  in  the  way  we  work  and  play  (e.g.,  online  dating  ser¬ 
vices,  GM’s  OnStar,  business  process  outsourcing).  While  many 
innovative  services  are  made  possible  by  new  technologies  that 
provide  the  platforms^®  on  which  new  service  relationships  are 
built,  the  value  being  exchanged  is  the  service  provided,  not  the 
technology  per  se.  Ethnographically  informed  design  strategies 
are  now  being  applied  to  service  design  (Thomke,  2003). 


ORGANIZATIONAL  PROCESSES 


Workflow  systems  have  become  ubiquitous  within  many  orga¬ 
nizations,  orchestrating  everything  from  employee  travel-re¬ 
imbursement  processes  to  customer  online-purchasing  proce¬ 
dures.  With  this  comes  the  opportunity  to  inform  the  design  of 
these  technology-enabled  organizational  processes  through  the 
study  of  existing  work  practices  and  processes.  Here  again  the  de¬ 
sign  focus  is  not  so  much  on  the  underlying  technologies  (e.g., 
SAP, Siebel) that manage the workflow, but on the processes
themselves.  This  is  not  to  say  that  these  studies  will  have  no  im¬ 
pact  on  the  underlying  technologies — for  example,  making  them 
more  flexible  or  end  user  configurable.  But  the  design  focus  is  on 
the  workflow  requirements,  how  people  will  interact  with  these 
systems  and  will  be  supported  in  executing  processes. 


BUSINESS  STRATEGIES  AND  MODELS 


Ethnographic  research  is  also  playing  a  role  in  the  design  of  busi¬ 
ness  strategies  and  models.  Organizations  are  realizing  that  their 
competitive  advantage  is  only  partly  related  to  the  quality  of  their 
products  and  services.  Equally  important  are  the  business  strate¬ 
gies,  including  channels  to  the  market,  relationships  with  busi¬ 
ness  partners,  and  the  composition  of  employees.  Many  new 
business  models  have  emerged  in  the  last  decade  that  capture 
new  revenue  streams  such  as  advertising  (e.g.,  Google,  Yahoo), 
selling  software  as  a  service  (e.g.,  salesforce.com),  and  facilitating 
networks of sellers and buyers or customers and providers (e.g.,
eBay, regional IT distributors). Ethnographic research can inform,
and is informing, these new business models.

^^See, for example, Service Oriented Architectures (SOA).

Representations  and  Models 

Whether  the  focus  is  on  designing  products,  experiences,  ser¬ 
vices,  processes,  or  business  strategies,  the  researcher  must  find 
ways  to  ensure  that  ethnographically  derived  insights  effectively 
inform  design  innovations  and  decisions.  Researchers  can  help 
make  connections  between  ethnography  and  design  in  many 
ways.  At  the  most  basic  level,  this  is  achieved  through  active  en¬ 
gagement,  integration,  and  collaboration  of  researchers  and  de¬ 
signers.^^  Subsequent  to  conducting  ethnographic  inquiries,  re¬ 
searchers  can  engage  with  design  teams  by  acting  as  user  proxies 
(e.g.,  helping  to  formulate  and/or  review  design  concepts  in  sce¬ 
nario-based  reviews,  providing  feedback  regarding  relevant  user 
expectations  and  behaviors  as  they  relate  to  design  concepts 
and  decisions,  etc.).  Conversely,  the  active  and  direct  involve¬ 
ment  of  designers  in  key  elements  of  ethnographic  fieldwork 
(e.g.,  participating  in  observations  and  interviews,  collabora¬ 
tive  analysis  sessions,  reviewing  video  and  audio  recordings  and 
user  artifacts,  etc.)  can  enrich  their  understanding  of  the  people 
who  will  interact  with  and  use  the  solutions  they  design. 

Although  these  forms  of  engagement  are  valuable,  they  limit 
the ability of teams to take full advantage of ethnographically derived
understandings. Their impact is restricted to the
scope of the direct interactions between ethnographers and designers.
This can be particularly limiting when designing multifaceted
solutions or working with large and/or distributed design
and development teams.


THE  VALUE  AND  FUNCTION  OF 
REPRESENTATIONS  AND  MODELS 


To  increase  the  value  and  impact  of  ethnographic  research,  ex¬ 
plicit  representations  or  models  can  be  created  which  distill 
and  communicate  essential  insights  about  people’s  experi¬ 
ences  in  forms  that  can  be  applied  to  design  problems  and  de¬ 
cisions.  Although  the  definition  of  model  can  be  the  subject 
of  debate  (as  can  the  distinction  between  representation  and 
model), for our purposes we are using the term to refer to an explicit,
simplified representation of how people organize and
construct  experiences  and  operate  in  relevant  domains.  The 
important  point  here  is  that  well-constructed  representations 
which  communicate  effectively  can  help  connect  everyday  pat¬ 
terns  of  activity  and  experience  with  design  solutions.  More 
specifically,  representations  and  models  are  tools  that  can 
serve  a  number  of  purposes  including:  enhancing  the  work¬ 
ing  models  of  designers/developers,  supporting  innovation 
and  creativity,  evaluating  and  prioritizing  ideas  and  concepts, 
and  providing  guiding  principles  and  shared  reference  points 
for  design  teams. 


ENHANCING  THE  WORKING  MODELS 
OF  DEVELOPERS 


In  order  to  design  a  technology  solution  for  people,  designers 
must  have  at  least  an  implicit  working  view  of  the  people  who 
will  interact  with  the  solution.  Such  working  frameworks  and 
perspectives  may  include  assumptions  about  a  range  of  essen¬ 
tial  characteristics  of  the  people  who  will  engage  with  the  solu¬ 
tion  and  the  contexts  in  which  they  will  do  so  (Newman,  1998). 
Indeed,  some  would  argue  that  successful  design  requires  a 
high  degree  of  “empathy”  with  the  target  population  (e.g., 
Leonard  &  Rayport,  1997;  Koskinen,  Battarbee,  &  Mattelmaki, 
2005).  Implicit  and/or  explicit  assumptions  or  knowledge  about 
“users”  may  be  formed  through  some  combination  of  direct  ex¬ 
perience  (e.g.,  interacting  with  and/or  observing  people  in  the 
target  population  in  controlled  or  noncontrolled  settings)  and 
secondary  learning  (talking  with  others  about  the  target  group, 
viewing  videotapes  of  target  activities,  reading,  analogy  to  other 
directly  experienced  groups,  etc.).  However  formed,  the  work¬ 
ing  “models”  of  designers/developers  may  be  of  varying  levels  of 
complexity,  robustness,  coherence,  consistency,  and  viability. 
The  broad,  deep,  and  contextualized  understanding  provided 
by  ethnographic  research  can  enrich  the  design  team’s  implicit 
working  models. 


SUPPORTING  INNOVATION 


The  design  of  technology  solutions  for  people  obviously  poses 
a range of potential creative challenges at varying levels of complexity.
What problems should be solved? What should be built?
What  kinds  of  experiences  should  the  technology  solution  sup¬ 
port  or  enable?  What  features  and  functions  would  be  useful, 
compelling,  and  satisfying  for  a  particular  group  of  people  in  a 
particular domain/context? How can existing or emerging technological
capabilities be used to enhance a particular group’s experience, or
to solve a particular problem? Even if there are clear parameters
defining the functionality that will be built (e.g., a set of
“requirements”),  design  teams  must  still  generate  a  compelling, 
easy  to  use,  useful,  and  satisfying  way  of  delivering  that  func¬ 
tionality.  By  providing  an  understanding  of  the  human  domain 
(patterns  of  relationship,  systems  of  meaning,  organizational 
structure,  guiding  principles  or  rules,  etc.),  ethnography  can 
promote  creativity  that  matters  (Robinson  &  Hackett,  1997) — 
relevant  innovations  that  create  new,  realizable  opportunities. 


^^As noted earlier, the ethnographer should develop an understanding of the types of design decisions that the design team will need to make and a sense of what they need to know to inform those decisions.


EVALUATING AND PRIORITIZING IDEAS


Design teams not only face the challenge of generating innovative
ideas and concepts, but also the equally important task of
evaluating and prioritizing ideas and options that arise from various
sources (e.g., business stakeholders, end users, development
teams). Although there are obviously many evaluative
methods  (e.g.,  scenario-based  user  testing,  etc.),  models  derived 
from  ethnographic  research  and  analysis  (e.g.,  scenario  models, 
mental  models,  interaction/social  network  models,  etc.)  can  pro¬ 
vide  a  critical  lens  through  which  development  teams  can  evalu¬ 
ate  and  prioritize  ideas  based  on  how  they  may  fit  into,  not  fit 
into,  or  change  people’s  experiences.  The  need  for  evaluation 
and prioritization may occur at various points throughout the development
process, ranging from decisions about features and
functions to broad directions for design concepts.


GUIDING  PRINCIPLES  AND  SHARED 
REFERENCE  POINTS 


The  learning  derived  from  ethnographic  analysis,  particularly 
when  represented  as  explicit  representations  and  models,  can 
serve  as  an  experiential  guidepost  for  individual  designers  and 
design  teams  throughout  the  development  process.  Even 
though  these  representations  do  not  prescribe  or  specify  what 
should  be  done,  they  can  aid  developers  by  focusing  attention 
on  essential  aspects  of  an  experience,  highlighting  variations  in 
the  experiences,  and  limiting  exploration  of  experiential  “dead 
ends.”  In  other  words,  they  can  provide  a  general  structure  and 
direction  within  which  a  team  can  develop  a  shared  under¬ 
standing  and  focus  its  creative  energies. 


TYPES  OF  REPRESENTATIONS  AND  MODELS 


Representations  and  models  can  vary,  ranging  from  personas 
and scenarios to more abstract mental models and more complex
work-practice models. The number, type, and form of models
vary as a function of what is being designed, the audience,
and the constraints on the design process (e.g., K. Holtzblatt,
Chapter  X;  Redish  &  Wixon  Chapter  X).  For  example,  teams  de¬ 
signing  organizational  tools  may  find  it  useful  to  model  work  en¬ 
vironments  and  detailed  task  sequences;  teams  designing  learn¬ 
ing  tools  and  programs  may  want  to  represent  particular  skill 
domains,  as  well  as  learning  processes. 

Practitioners  have  developed  a  variety  of  representations  and 
models  to  inform  the  design-and-development  process.  For  ex¬ 
ample,  Beyer  and  Holtzblatt  (1998)  described  a  set  of  five  work 
models  (flow  model,  cultural  model,  sequence  model,  physical 
model,  and  artifact  model)  to  reflect  different  aspects  of  a  work 
domain.  Pruitt  and  Grudin  (2003)  articulated  the  value  (and 
risks)  of  personas  to  inform  the  design  process,  while  Carroll 
(2000)  described  the  value  of  scenarios. 


SAMPLE  REPRESENTATIONS  AND  MODELS 


The varying scope, form, complexity, and function of different
types of models are illustrated in the following examples.


EXPERIENCE  MODELS 


FIGURE 50.2. Financial development zones model.

The model presented in Fig. 50.2 is one of several developed
in the context of ethnographic research and analysis for a financial
services company serving individual investors. This company
aimed to develop web applications that would facilitate
customers’ active engagement in the investment process with
particular financial instruments. The model was intended to articulate
and visualize a financial development process as well as
the varied meanings of “money.” This particular model highlighted
the role of “practice” in developing the confidence and
knowledge  to  become  engaged  in  the  investment  process,  and 
the  iterative/recurrent  nature  of  the  process,  as  people  learned 
to  deal  with  new  financial  instruments  and  domains  (e.g.,  secu¬ 
rities,  bonds,  options,  etc.).  Moreover,  it  illustrated  the  distinc¬ 
tions  that  people  make  between  “real,”  “play,”  and  “founda¬ 
tional”  money  and  the  relationship  between  these  categories, 
investment  behavior,  and  financial  development.  To  oversim¬ 
plify  a  bit,  people  are  more  fully  engaged  and  active  in  the  in¬ 
vestment  process  when  they  view  the  assets/investments  as 
“real”  (e.g.,  money  that  is  used  to  address  their  current  and 
emerging  needs,  pay  bills,  etc.)  rather  than  as  “play”  (e.g.,  stock 
options  that  are  perceived  as  intangible  and  somewhat  imagi¬ 
nary)  or  “foundational”  (e.g.,  savings  for  the  future  that  are  left 
“untouched”).  As  people  have  an  opportunity  to  “practice”  and 
develop  their  knowledge,  they  may  move  from  construing  a  par¬ 
ticular  financial  instrument  or  activity  as  “play”  to  “real.”  These 
notions  suggested  that  web  applications  in  this  domain  should 
not  be  focused  on  simply  providing  a  wealth  of  financial  infor¬ 
mation  or  a  plethora  of  tools.  Instead,  these  patterns  helped  to 
foster  the  generation  of  numerous  ideas  about  ways  to  engage 
people  in  playful  learning  in  the  financial  domain,  with  the  aim 
of  facilitating  the  financial  development  process. 


PROCESS  MODELS 


Process  models  attempt  to  represent  how  a  dynamic  experience 
“works”  and/or  unfolds  over  time.  They  can  range  in  focus  from 
relatively  circumscribed  task-flow  models  that  outline  how  an 
individual  completes  a  specific  task,  to  broader  characteriza¬ 
tions  of  more  holistic  change  processes  (e.g.,  healthcare  behav¬ 
ior  change,  technology  adoption,  etc.).  For  example,  a  health 
services  company  aimed  to  develop  an  “electronic  medical 
record system” (combining client-server applications with
web-based “portals”). This system would, among other things,
increase the efficiency and effectiveness of their medical practice,
enable  patients  to  view  their  health  records  online,  and  ulti¬ 
mately  empower  patients  and  foster  a  proactive  approach  to 
wellness  and  healthcare  (both  by  clinicians  and  patients). 

At  the  outset  of  the  engagement,  the  health  services  company 
had  generated  a  rather  long  requirements  list  (several  hundred 
features  and  functions)  and  a  particular  view  of  the  structure 
and  function  of  the  web  components  of  the  system.  It  was  clear 
that  the  budget  for  this  initiative  was  not  sufficient  to  build  a  sys¬ 
tem  that  met  all  of  the  initial  “requirements.”  Perhaps  more  im¬ 
portantly,  it  was  unclear  which  components  would  ultimately 
add  the  most  value  for  the  various  stakeholders  (clinicians,  pa¬ 
tients,  the  business  owners,  etc.).  Ethnographic  research  exam¬ 
ining  the  experiences  of  and  relationships  between  clinicians 
and  patients  in  context  (in  clinic  settings  and  in  homes)  pro¬ 
vided  the  means  of  prioritizing  and  evaluating  potential  fea¬ 
tures,  functions,  and  design  concepts. 

FIGURE 50.3. Stages of readiness model.

Experience models of varying levels of complexity regarding
the health management process were developed. For example,
one of the simpler models (see Fig. 50.3) described how individuals,
in the process of adopting an active/proactive stance in
relation to health issues, move through varying “stages of readiness.”
A more comprehensive, integrative model highlighted
the  ways  in  which  various  factors  interact  in  influencing  a  per¬ 
son  to  take  action  in  addressing  a  health  issue  and  mapped  the 
role  of  various  healthcare  related  activities  (e.g.,  monitoring, 
motivating,  learning,  sharing,  building  rapport)  in  various  stages 
of  readiness.  The  combination  of  these  models  enabled  the 
team  to  identify  the  most  important  opportunities  for  facilitat¬ 
ing  progression  towards  a  proactive  orientation  to  health,  and 
provided  guidance  in  identifying  ways  to  provide  messages  and 
experiences  tailored  to  a  person’s  stage  and  readiness. 


PERSONAS 


One of the primary challenges in developing interactive systems
is to design them so that they meet the needs of varying users,
who may play different roles, engage in varied tasks, have different
motivations and strategies, and so forth. Profiles or personas
are abstract representations of the users of a solution
(Pruitt & Grudin, 2003), which may be informed through ethnographic
studies. Personas can help development teams understand
and anticipate how certain types of people may experience
and interact with technology solutions. For example, Fig.
50.4 shows a simple persona developed to guide the design of
interactive tools promoting the adoption of various financial
and health benefit programs in a large enterprise. Note that the
persona focuses on characteristics (attitudes, life stages, scenarios,
etc.) that are most relevant to the person’s experiences in managing
financial and health-related concerns.

The value of personas can be enhanced by making them visible
and dynamically present for design and development teams
(e.g., posters displayed in project rooms, multimedia representations
that are reviewed with development teams, role-playing
scenarios and walkthroughs based on profiles, etc.). Rich and
dynamic representations of essential characteristics of individuals
can serve as a common frame of reference for communication
and a tangible reminder to development teams regarding
the people for whom they are designing the system. Moreover,
personas can be used systematically in a range of ways to help
teams make design decisions. For example, Pruitt and Grudin
(2003) described specific techniques they have used to systematically
apply personas to aid in feature prioritization decisions.

Frank

Retirement orientation

• “I’d love to retire early. I’m contributing 8% to my Savings
  Plan to get the full company match. That plus my pension
  should make that possible.”
• “What will my life be like when I retire?”

Insurance orientation

• “I have enough insurance to pay off my mortgage and put my
  kids through school if anything happened to me.”
• “I might have an accident at work ... maybe I need to buy
  more insurance for that?”

FIGURE 50.4. Financial and health benefit program design persona.


SCENARIOS 


Scenarios are another way ethnographic research findings can
be portrayed (Carroll, 2000; Nardi, 1992; Sonderegger, Manning,
Charron, & Roshan, 2000; Rosson & Carroll Chapter X). Scenarios
illustrate how experiences and actions unfold in specific contexts
or situations (Fig. 50.5) and can be documented in various
forms ranging from narratives to annotated visual flow diagrams.
They may highlight interactions (with computer systems,
people, business entities, etc.), decision processes, activity sequences,
influencing factors, and so forth. They also may illustrate
the different ways in which varied groups or types of people
experience and navigate through similar situations. Analysis
of scenarios can foster the identification of areas of difficulty
(“pain points”) and experiential gaps (or opportunities) that
may be addressed or enhanced through various design solutions.
When integrated with personas, they can illustrate how
different target audiences navigate through the same situation,
which in turn can suggest ways in which solutions can and
should be adapted for varying target audiences.


MOCK-UPS  AND  PROTOTYPES 


Representational  artifacts,  be  they  paper  prototypes,  mock-ups, 
or  working  prototypes,  can  play  an  important  mediating  role  in 
connecting  use  requirements  and  design  possibilities.  When  in¬ 
formed  by  studies  of  practice,  these  design  representations  re¬ 
specify  practices  and  activities  in  ways  that  are  recognizable  to 
practitioners.  The  prototypes  go  beyond  simple  demonstra¬ 
tions  of  functionality  to  incorporate  materials  from  the  partici¬ 
pants’  site,  embody  envisioned  new  technological  possibilities, 
convey  design  ideas  in  relation  to  existing  practices,  and  reveal 
requirements  for  new  practices.  Prototyping  practices  as  such 
recover  and  invent  use  requirements  and  technological  possi¬ 
bilities  that  make  sense  each  in  relation  to  the  other  (Suchman, 
Blomberg, & Trigg, 2002). In addition, these representational
artifacts  facilitate  the  communication  of  what  has  been  learned 
about  technologies-in-use  to  the  larger  research  and  technol¬ 
ogy-development  communities. 

In an ethnographic study of engineering practice at a state
Department of Highways, design prototypes critically deepened
the researchers’ understanding of the requirements of the work
of document filing and retrieval (the focus of the study). At each
step, from early design discussions with practitioners, to the creation
of paper “mock-ups” of possible interfaces to the online
project files, and finally to installing a running system at the
worksite, the researchers became more aware of the work’s exigencies.
For example, in recognition of some of the difficulties
that engineers experienced with their filing system, various alternative
document-coding strategies that augmented the existing
filing system were designed. Through successive rounds,
in which engineers were asked to code documents using
mocked-up coding forms (both paper-based and online), the researchers’
understanding of the requirements of the work deepened.
Eventually, the search and browsing interfaces evolved to
be more finely tuned to the requirements of the engineers’
work (e.g., Trigg, Blomberg, & Suchman, 1999).

FIGURE 50.5. Scenario flow model.


CAVEAT  REGARDING  REPRESENTATIONS 
AND  MODELS 


Although  representations  and  models  are  valuable  tools  for  con¬ 
necting  ethnographic  understanding  and  design,  they  can  also 
have  negative  effects.  Although  grounded  in  observations  and 
other  forms  of  ethnographic  inquiry,  models  are  always  a  selec¬ 
tive  interpretation  and  construction  of  experience.  Thus,  while 
representations  and  models  can  focus  attention  on  and  illumi¬ 
nate  important  aspects  of  experience,  they  can  also  become  rei¬ 
fied  stereotypes  and  constraints  that  inhibit  design  possibili¬ 
ties.  Ongoing  inquiry,  a  critical  perspective,  and  a  willingness  to 
evolve  the  representations  in  the  face  of  new  learning  are  es¬ 
sential  to  maintain  the  viability  and  value  of  models  for  design. 


RELATION TO OTHER QUALITATIVE
APPROACHES  AND  PERSPECTIVES 


The  ethnographic  approach  has  strong  connections  to  and 
affinities  with  other  approaches  that  have  contributed  to  the  de¬ 
velopment  of  the  field  of  human-computer  interaction,  namely 
distributed  cognition,  activity  theory,  ethnomethodology,  and 
participatory  design.  There  is  not  space  here  to  go  into  depth 
on  any  of  these  approaches.  Our  aim  is  simply  to  highlight  rela¬ 
tions  between  these  approaches  and  ethnography,  and  provide 
a  way  to  distinguish  between  them. 

Distributed  cognition  (sometimes  referred  to  as  social  or  sit¬ 
uated  cognition)  was  first  introduced  to  the  HCI  community  by 
Lave  (1988)  and  Hutchins  (1995).  Distributed  cognition  located 
cognition in social and material processes. When it was introduced,
it challenged the dominant paradigm within HCI, that
cognition  primarily  involved  the  psychological  and  mental 
processes  of  individuals.  The  connection  between  distributed 
cognition  and  ethnography  is  not  only  in  the  insistence  that  our 
understanding  of  human  activity  be  located  outside  individual 
mental  processes,  in  human  interaction,  but  also  in  the  convic¬ 
tion  that  to  gain  an  understanding  of  human  activity,  ethno¬ 
graphic,  field-based  methodologies  are  required. 


Activity  theory  also  shares  with  ethnography  a  commitment  to 
field-based  research  methodologies.  In  addition,  there  is  the 
shared  view  that  behavior  (activity)  should  be  a  primary  focus  of 
investigation and theorizing, and a recognition that objects (artifacts)
are key components in descriptive and explanatory accounts
of human experience (e.g., Engeström, 2000; Nardi, 1996).

Ethnomethodology  is  often  used  interchangeably  with 
ethnography  in  HCI  literature.  This  is  not  only  because  the 
terms  are  etymologically  similar,  but  also  because  many  of  the 
social  scientists  contributing  to  the  field  of  HCI  have  adopted  an 
ethnomethodological approach (e.g., Bentley et al., 1992; Button
& Harper, 1996; Crabtree, 2000; Hughes et al., 1993, 1994,
1995) with its focus on locally and interactionally produced
accountable phenomena. Ethnomethodology’s particular set
of commitments (e.g., Heritage, 1984) is not shared, however,
by everyone working within the ethnographic paradigm.

Participatory design does not have its roots in qualitative
social science research, but instead developed as a political and
social movement, and as a design approach committed to directly
involving end users in the design of new technologies
(see Muller, this volume; also Schuler & Namioka, 1993; Kensing
& Blomberg, 1988). Within the HCI context, participatory design
has shed some of its political and social-action underpinnings,
and often is viewed primarily as a set of methods and
techniques for involving users in design. Its connection to
ethnography is in the commitment to involve study participants
in the research, and in the value placed on participants’ knowledge
of their own practices. Also, in recent years, those working
in the field of participatory design have incorporated ethnographic
techniques (e.g., Crabtree, 1998; Kensing, Simonsen, &
Bødker, 1999) as a way of jointly constructing with participants
knowledge of local practices.


ETHNOGRAPHY  IN  ACTION 


Case Study 1: Designing a Program and Website to Change Healthcare Behaviors

A  large  global  company,  providing  health  insurance  coverage  to 
over  60,000  of  its  employees  in  the  United  States,  developed  a 
multifaceted  program  to  reduce  its  healthcare  costs  and  opti¬ 
mize  the  health  and  productivity  of  its  workforce.  The  major 
goals were to provide reliable healthcare information and to promote
better healthcare decisions. The program provided a number
of online and offline resources for employees (e.g., a 24-hour
medical hotline, a research team that would gather and
summarize treatment outcome research findings for severe medical
conditions, online access to a leading-edge medical information/content
website, etc.). The company initially promoted
the  program  through  a  series  of  face-to-face  workshops  designed 
to  convey  the  limitations  of  standard  medical  practice,  encour¬ 
age  a  consumer-oriented  approach  to  healthcare,  and  make  peo¬ 
ple  aware  of  resources  provided  by  the  company. 

After the initial launch, the team became concerned that the
health program resources, including the website, were being
underutilized by employees, limiting the potential impact and
value for both employees and the company. In this context, the
team initiated a study to evaluate the current program and website
as well as to establish clear user models and strategic
frameworks to guide website/program redesign efforts.

To meet the project objectives, the research team conducted
ethnographic inquiries combined with scenario-based exercises.
In order to extend the participant sample as well as to deal with
practical constraints (very limited time and resources), the team
conducted some of the interviews and assessments remotely, via
telephone and web-conferencing tools. Ethnographic inquiries
focused on understanding the varied ways that people managed
their healthcare (and/or the healthcare of family members), including
their overall orientations to health and wellness, relationships
and interactions with healthcare providers (and other
family members), and their healthcare-decision-making processes.
The latter included understanding the online and offline
resources and tools that people used and the major healthcare
scenarios they addressed. After exploring and profiling participants’
healthcare experiences, the team asked participants to work
through an actual healthcare decision scenario, while inviting
them to engage with the program resources and website.

FIGURE 50.6. Healthcare decision-making model.

Based on these inquiries, the research team developed a
number of experience models, including a set of personas highlighting
key variations in healthcare orientation and behavior
that the program/website design team would have to accommodate;
a simple typology of health-related scenarios (e.g.,
managing severe and chronic medical conditions, dealing with
common everyday healthcare issues, and “wellness”/risk reduction);
and scenario flow models (Fig. 50.5) documenting how varied
types of people made decisions (Fig. 50.6) and used a range of
resources to address key health scenarios.

These models, along with other resources, generated numerous
insights about limitations of the current website and program,
opportunities for program/website enhancement, and design
recommendations. For example, user profiles and scenario
models  showed  how  the  program  was  fragmented  and  did  not 
effectively  align  with  people’s  key  health  scenarios,  forcing  an 
individual  to  painfully  sift  through  resource  information  and  de¬ 
scriptions  to  figure  out  which  resources  might  be  most  relevant 
and  useful  in  a  specific  scenario.  In  addition,  the  program  and 
the  website  did  not  adequately  address  “wellness”/risk  reduction 
scenarios, which represented a significant concern for almost all
employee  segments  and  presented  an  important  opportunity  for 
the  company  to  promote  a  proactive  and  preventative  approach 
to  healthcare. 

In order to connect the user insights with the program/website
design, the team articulated a number of design principles
and a specific scenario-based design framework (Fig.
50.7). This framework highlighted the value of organizing the
website (and other program elements) based on key healthcare
scenarios, aligning and prioritizing resources and inviting
specific modes of action that were most important in each
scenario, and enabling relevant “cross-scenario” awareness
and behavior that would provide value to users and support
program objectives (e.g., a person who came to the website
to learn about current research findings on the effectiveness of
an experimental treatment for diabetes might also be invited
to explore the value of changes in diet or exercise to manage
diabetes, etc.). In addition, the framework highlighted the importance
of embedding strategic messages regarding healthcare
(e.g., importance of evidence-based medicine, proactively
taking charge of one’s health and healthcare, etc.) and implicit
invitations to change healthcare behaviors throughout the
site design.

[Figure 50.7 labels: three scenario rows (Common Health, Severe
Conditions, Wellness), each with Questions-Challenges-Decisions-Goals,
Modes of Action, and Targeted resources, info, tools; stages
Contemplate, Find, Learn, Decide, Act; other labels: Change;
Program & Site Adoption; Optimizing Health Care, Benefits, &
Wellness; Related Programs.]

FIGURE 50.7. Scenario-based design framework.

984 • BLOMBERG AND BURRELL

The  ethnographic  research  led  the  team  to  rethink  a  num¬ 
ber  of  major  assumptions,  which  in  turn  led  to  redesign  of  pro¬ 
gram  strategies,  resources,  and  the  website.  From  a  program 
perspective,  the  research  highlighted  the  fact  that  the  vast  ma¬ 
jority  of  employees  had  already  adopted  many  consumer  atti¬ 
tudes  and  behaviors  and  were  leveraging  a  number  of  trusted 
health  resources  (in  contrast  to  initial  assumptions  of  limited 
“consumerism”).  This  led  the  team  to  reconsider  the  position¬ 
ing  of  specific  program  resources,  shift  strategic  messaging, 
and  generate  novel  program  strategies  including  behavioral 
“rewards” programs that supported proactive and preventative
behaviors. The initial research inquiry also led the team to
implement a continuous assessment program to
monitor program impact and changes in employee experiences
and behaviors.

Case  Study  2:  Department  of  Highways 

The headquarters of a state Department of Highways was the
site for a collaborative research and design effort with engineers
charged with the design of a bridge, scheduled for completion
by the year 2002. (For more on the project with the Department
of Highways, see Suchman, 1999, 2000.) The project aimed to
design an electronic document-management system that was
informed by an understanding of the everyday requirements of
engineering work at the Department of Highways. The project
began with onsite interviews and observations of engineering
practice, with a focus on the document-related work practices.

Based on an initial understanding of the document-management
requirements of the work, as part of the design process
several alternate paper-based document-coding forms were
designed. After several iterations, a coding form was settled
upon that was then incorporated into the electronic document-management
system, both as a form to be scanned into a document
database and as a model for an online coding form. The
evolution of the coding form was informed by the prompted
use of the form by engineers at the Department of Highways
(Fig. 50.8).

FIGURE 50.8. Engineer using mock-up of coding form to code
documents.

One of the key insights that came from the ethnographic
study was the need to design continuing connections between
the digital and physical document worlds. This included locating
familiar ways of organizing documents in the new electronic
system, and taking advantage of visual memory in document
search and browsing by displaying page images of the documents
and not just the text (Fig. 50.9).

The insights gained from the ethnographic study also pointed
to challenges that would face engineering teams adopting the
new system. First, because members of project teams would no
longer be the sole interface to the documents in the project files,
team members would need to consider who might view the documents
and for what purposes before deciding to add a document
to the database. This was not necessary when the project
files were paper-based because the physical location of the documents,
in the engineering team’s work area, restricted access.
Electronic access now meant that users of the system could be
located anywhere within the Department of Highways, making
explicit access controls necessary. In addition, it would be crucial
that an ongoing relation between the paper and digital document
renderings be maintained, as engineers found it most useful
to work with the printouts of large engineering documents.
The online renderings were not particularly useful by themselves.
The research and design team was able to anticipate these
work practice issues, make the highways engineers aware of
them, and suggest possible ways they could be addressed.


CONCLUSION 


Ethnographic  studies  have  become  an  important  tool  for  de¬ 
signers  and  development  teams  designing  new  information  and 
communication  technologies.  Today  in  academic,  institutional, 
and  corporate  settings  there  is  the  realization  that  understand¬ 
ing  the  everyday  realities  of  people  living  and  working  in  a  wide 
range  of  environments  and  engaged  in  a  myriad  of  activities 
is  essential  for  creating  technologies  and  services  that  provide 
engaging and productive experiences for their users. Emerging
from recent research and practical experience is the recognition
that representational tools (models, personas, scenarios,
mock-ups and prototypes, etc.) and design-and-development
practices (collaborative data analysis, video review sessions,
etc.) are necessary for connecting ethnographic studies and
technology design. Insights from ethnographic studies do not
map directly onto design specifications or straightforwardly
generate “user” requirements. Instead, ethnographic studies
must be connected and integrated with design agendas and
practices. Those wishing to leverage the potential of ethnographic
studies should not only understand what motivates the
approach and is at its foundation (e.g., natural settings, holistic,
descriptive, members’ point of view), but also should recognize
the importance of creating the conditions in which design
can take advantage of ethnographic insights.


INTRODUCTION


Personas  are  a  new  and  relatively  popular  technique  used  to  help 
bring  users  into  the  forefront  of  people’s  minds  during  the  diffi¬ 
cult  process  of  developing  products  and  services.  Yet,  for  many 
practitioners, the persona method remains largely ill-defined and
mysterious.  In  this  chapter,  we  present  a  simple  framework  for 
approaching  the  technique — the  persona  lifecycle — which 
sheds  light  on  how  personas  fit  into  a  standard  development  cy¬ 
cle.  We  then  provide  a  little  detail  on  two  critical  aspects  of  the 
method:  creating  and  using  personas.  Our  goal  is  to  enable  the 
reader  to  quickly  and  easily  get  started  with  the  technique. 


IT  IS  HARD  TO  BE  USER-CENTERED 


In  the  best  of  all  worlds,  everyone  working  on  a  product  would 
always  be  thinking  of  the  needs  of  every  person  who  will  ever 
use  the  product.  Real  information  about  users  would  inform 
every  decision  and  the  resulting  product  would  perfectly  sat¬ 
isfy  everyone  who  uses  it.  In  practice,  it  is  hard  enough  to  get 
everyone  working  on  a  product  to  think  about  users  at  all  (see 
Grudin,  1990, 1993,  for  a  discussion  of  obstacles  and  constraints 
in  product  development  organizations).  To  deliver  on  the  prom¬ 
ise  and  benefits  of  user-centered  design,  we  have  to  find  cre¬ 
ative  ways  to  inject  accurate  information  about  real  users  into 
the  chaotic  world  of  product  development. 

It’s  a  rare  product  indeed  that  does  everything  you  want  it  to 
do  in  the  way  you  want  to  do  it.  Why?  Because,  despite  the  fact 
that  building  products  based  on  what  real  people  need  and  want 
seems  obvious,  putting  users  (or  rather,  information  about 
users)  truly  at  the  center  of  the  design  and  development  process 
is extremely difficult. Why is it so hard to be user-centered?

First,  being  user-centered  is  just  not  natural.  Our  more  nat¬ 
ural  tendency  is  to  be  self-centered — to  design  a  product  based 
on our own wants and needs (sometimes even if we are not actually
a user of the product). Self-centered design is perhaps
better  than  technology-centered  design  (another  common  in¬ 
clination),  but  most  of  the  time,  the  people  on  your  product  de¬ 
velopment  team  are  not  representative  of  the  target  audience 
for  your  product.  Thus,  as  many  in  the  user-centered  design 
field  have  long  evangelized  (see,  e.g.,  Nielsen,  1993),  it  is  im¬ 
portant  to  understand  the  user. 

Second,  users  are  complicated  and  varied.  It  takes  great  ef¬ 
fort  to  understand  their  needs,  desires,  preferences,  and  be¬ 
haviors.  In  addition,  pleasing  some  of  them  does  not  necessar¬ 
ily  result  in  pleasing  others. 

Third, those doing the user and market research to understand
who the users are and how they vary (and others who are
just more in touch with your users, such as the sales team and the
support team) are not typically the people who actually design and
build  the  product.  Those  who  collect  the  data  try  to  communi¬ 
cate  the  information  it  contains  by  creating  presentations  or  re¬ 
ports,  which  are  helpful  only  if  they  are  absorbed  by  the  right 
people  at  the  right  times.  If  the  important  information  about 
users  is  not  available  at  the  right  time,  or  is  difficult  to  under¬ 
stand  or  to  remember,  product  teams  forge  ahead  to  design  and 
build  features  they  think  the  users  would  like,  or  more  likely, 
what  is  easiest  and  least  costly  to  build. 

The  Word  User  Is  Part  of  the  Problem 

When user-centered design was a new idea, simply injecting the
word  user  into  a  design  and  development  process  was  power¬ 
ful;  it  challenged  the  status  quo.  Unfortunately,  incorporating 
the  word  user  into  everyday  corporate  discourse  is  not  enough 
to  foster  effective  user-centered  design. 

Everyone  (we  hope)  assumes  that  they  are  building  products 
with  users  in  mind;  in  many  organizations,  anyone  asked  would 
probably  answer,  “Yes,  I  think  about  the  user  a  lot.”  People  who 
talk  about  the  “user,”  however,  are  almost  never  asked  to  further 
define the term, and it is a sure bet that each person in the organization
would describe the users in a different way. If everyone
in the organization doesn’t have a clear and consistent understanding
of whom they are building the product for, the
product can fail. It is our contention that the word user cannot
provide  the  clarity  that  is  required. 

“User”  is  a  catchall  and  ultimately  a  mean-nothing  word.  It  reflects  a 
technology-centric,  rather  than  a  people-centric,  view  of  the  web.  To 
call  someone  a  user  is  largely  meaningless.  The  phrase  “user-friendly” 
should  never  have  had  to  be  invented.  It  implies  that  technology  is 
inherently  hostile  and  that  a  new  discipline — usability — had  to  be 
invented  to  make  it  friendlier.  After  all,  we  don’t  refer  to  cars  as  “driver- 
friendly.” 

We  don’t  refer  to  bicycles  as  “cyclist-friendly.”  We  don’t  refer  to  chairs 
as  “bum-friendly.” 

— Gerry  McGovern,  Consultant,  gerrymcgovern.com.  (From  “Don’t 
call  people  users.”  April  1,  2002.) 

Personas add the detail and specificity that product development
teams need in order to create user-centered
products.

Personas Move Us Beyond the Word “User”

Personas  are  fictitious,  specific,  concrete  representations  of  tar¬ 
get  users.  Personas  put  a  face  on  the  user;  a  memorable,  engag¬ 
ing,  and  actionable  image  to  serve  as  a  design  target.  The  term 
personas  was  originally  adopted  and  popularized  by  Alan 
Cooper in his 1999 book, The Inmates Are Running the Asylum:
Why High Tech Products Drive Us Crazy and How To Restore
The  Sanity  (see  also  Cooper  &  Reimann,  2003). 

Personas  were  born  out  of  a  short  tradition  in  the  User- 
Centered  Design  (UCD)  community  toward  user  and  system 
modeling  and  out  of  a  somewhat  longer  tradition  in  marketing 
around  market  definition  and  customer  representation.  Perhaps 
the earliest example comes from industrial designer Henry Dreyfuss, who
wrote about “Joe and Josephine,” the heroes of his book Designing
for People (1955/2003), which were created to inform
the  physical  design  of  objects  and  spaces.  Geoffrey  Moore,  in  his 
book  Crossing  the  Chasm  (1991),  described  the  notion  of  “tar¬ 
get  customer  characterizations” — images  of  customers  meant  to 
replace or supplement impersonal and abstract market segments
(see, for example, Sissors, 1966, or more recently, Weinstein,
1998). John Carroll has been a longstanding proponent
of  scenario-based  design,  which  typically  includes  terse  repre¬ 
sentations  of  users  called  actors  or  agents  (see  also,  Jacobson, 
1995;  Jacobson,  Christerson,  Jonsson,  &  Overgaard,  1992,  re¬ 
garding  “actors”  in  “use  cases”).  Carroll  (1995,  2000a,  2000b) 
argued  that  scenarios  help  designers  and  analysts  focus  on  as¬ 
sumptions  about  people  and  tasks. 

These  and  many  other  representations  of  users  or  customers 
(e.g.,  see  Constantine  &  Lockwood,  2001,  2002;  Mello,  2003;  Up¬ 
shaw,  1995)  are  typically  not  well-rounded  or  complete  descrip¬ 
tions,  but  instead  are  confined  to  a  few  key  details  and  specific 
contexts.  Moreover,  they  do  not  seem  real;  they  are  devoid  of 
personality.  Personas,  on  the  other  hand,  seem  like  real  people. 
As such, personas carry information about users to your product
team in a way that other representations cannot. Personas enable
us to move beyond our habit of referring to “users” and instead
allow us to find a better way to communicate about and
focus on the real people we want to use our products. Personas
humanize  vast  and  disparate  data  sources,  capitalize  on  our  abil¬ 
ity  to  remember  details  about  individual  people,  and,  in  so  do¬ 
ing,  provide  a  usable  and  useful  alternative  to  referring  to  “the 
user.”  In  other  words,  personas  create  a  concrete,  focused,  and 
stable definition of a specific audience.

While rigorous evaluation of these claims has not been provided
to date, some case studies and anecdotal evidence support
them to varying degrees (e.g., see Freed, 2004; Grudin &
Pruitt,  2002;  Hourihan,  2002;  Levinson,  2003;  see  also  the  nu¬ 
merous  sidebar  stories  in  Pruitt  &  Adlin,  2006).  Based  on  our 
own experience with the approach, we believe that when created
with data and used thoughtfully during the product development
process, personas:

•  Make  assumptions  and  knowledge  about  users  explicit,  cre¬ 
ating  a  common  language  with  which  to  meaningfully  talk 
about  users, 

•  Allow  you  to  focus  on  and  design  for  a  small  set  of  specific 
users  (who  aren’t  necessarily  like  you),  helping  you  make 
better  decisions,  and 

•  Engender  interest  and  empathy  toward  users. 


The  Genesis  of  Personas  in  Product  Design: 

Cooper  Takes  “Play  Acting”  One  Step  Further 

Kim  Goodwin,  Vice  President 
and General Manager, Cooper

With the publication of The Inmates Are Running the Asylum
in 1999, Alan Cooper introduced the world to personas as a
practical interaction design tool; however, Alan and the folks
at his leading design consultancy, Cooper, had already been
using  personas  for  years. 

In 1983, Alan was working as a solo software inventor.
While working on a project management program he called
“Plan*It,”  Alan  realized  he  needed  to  understand  more 
about  how  project  managers  thought,  so  he  interviewed  a 
handful  of  people.  A  woman  named  Kathy,  who  seemed  the 
most  typical,  was  the  basis  for  the  first  persona-like  model 
in Alan’s head. While waiting for his program to compile,
Alan  would  play-act  a  project  manager  very  much  like  Kathy, 
using  the  way  she  thought  and  worked  to  make  decisions 
about  the  design  of  the  application.  Alan  eventually  sold 
Plan*It  to  Computer  Associates,  who  sold  it  to  the  public  as 
SuperProject.  After  that  success,  Alan  went  on  to  use  this 
technique on other projects, including the visual programming
language that became Visual Basic.

Later,  when  Alan  ventured  into  consulting,  he  found  that 
he  could  not  just  do  what  seemed  right,  because  he  first  had 
to  persuade  other  people.  For  this  reason,  on  a  1995  project 
with  Sagent  Technologies,  Alan  created  Chuck,  Cynthia,  and 
Rob — the  first  real,  Goal-Directed  personas.  After  some  initial 
resistance, they worked as Alan had intended: they provided
a way to keep everyone focused on what users really wanted
to do, rather than on all the things they might do.^

Since  then,  Cooper  designers  have  refined  and  formal¬ 
ized  the  methods  for  researching,  creating,  and  applying 
personas  over  the  course  of  hundreds  of  projects.  Many 
practitioners  have  begun  using  personas — some  with  excel¬ 
lent  results,  others  not.  The  most  common  reason  for  fail¬ 
ure?  People  miss  the  thing  that  makes  personas  so  uniquely 
effective:  they’re  based  on  a  qualitative  understanding  of 
how  real  people  behave  and — equally  important — why  they 
behave  that  way. 

Today’s  best  personas  are  well  researched,  focused  on 
behavior,  and  documented  as  a  short  story  writer  would  de¬ 
scribe  a  beloved  character:  with  sympathy,  respect,  and  just 
enough  back  story  to  help  you  understand  what  makes  them 
tick.^  Effective  personas  are  based  on  the  kind  of  informa¬ 
tion  you  can’t  get  from  demographics,  survey  data,  or  sup¬ 
positions,  but  only  from  observing  and  interviewing  individ¬ 
ual  people  in  their  own  environments.  That  qualitative, 
first-hand  information  is  not  only  essential  to  design — it  is 
essential  to  persuasion.  If  you  cannot  effectively  persuade 
the  programmers  to  build  it,  the  executives  to  fund  it,  and 
the  marketers  and  sales  people  to  sell  it,  then  the  best  de¬ 
sign  in  the  world  is  a  failure.  Personas  are  not  just  effective 
because  they  are  accurate  representations  of  human  behav¬ 
ior.  They  are  effective  because  they  help  both  designers  and 
stakeholders  understand  user  needs  at  a  gut  level,  which — 
in  spite  of  all  you  hear  about  ROI — is  where  most  business 
decisions  are  made. 


MAKING  PERSONAS  PRACTICAL: 
THE  CREATION  OF  THE  PERSONA 
LIFECYCLE  METHOD 


If  personas  are  such  a  good  thing,  why  isn’t  everyone  using 
them?  Perhaps  one  answer  is  that  creating  and  using  personas  is 
easier said than done. When we first started talking to persona
practitioners  in  2000,  we  noticed  that  many  people  in  the  UCD 
community  could  see  the  value  of  personas  in  their  own  work 
and  to  their  organizations.  Those  that  tried  to  create  and  use 
personas,  however,  were  running  into  a  somewhat  consistent 
set of problems in their persona efforts. While the idea of creating
a set of target users is reasonably straightforward, the actual
process to create and use personas can be quite complex.
We heard many of the same questions repeatedly:

•  How  do  you  decide  whether  personas  are  the  right  thing  to 
do  in  your  organization? 

• How do you incorporate data into personas? What kinds of
data  work? 


•  How  do  you  know  if  personas  are  worth  the  effort  it  takes  to 
create  them? 

•  How  do  you  communicate  personas  once  they  are  created? 

•  How  do  you  use  personas  to  design  great  products? 

• What do you do with personas once a project is finished? Can
you  reuse  personas? 

As  we  continued  our  research  over  the  following  years,  we 
discovered  that  many  practitioners  were  having  less-than-stellar 
experiences  with  personas.  Those  that  were  able  to  create  data- 
driven  personas  were  finding  that,  if  they  are  not  well  commu¬ 
nicated  and  managed,  even  well-crafted  personas  are  easy  for 
designers  and  developers  to  ignore.  At  worst,  poorly  executed 
persona  efforts  yield  no  increase  in  user-focus  and  leach  time 
and  resources  from  other  UCD  techniques  and  other  methods 
that can improve product quality. We found four common reasons
for the failure of persona efforts:

1.  The  effort  was  not  accepted  or  supported  by  the  leadership 
team, 

2.  The  personas  were  not  credible  and  not  associated  with 
methodological  rigor  and/or  data, 

3.  The  personas,  and  the  techniques  for  using  personas,  were 
not  well  communicated, 

4.  The  product  design  and  development  team  did  not  under¬ 
stand  how  to  use  the  personas. 

Once  we  fully  understood  the  questions  and  common 
causes  of  failure,  we  focused  our  attention  on  finding  solutions 
based  on  the  input  and  insights  of  dozens  of  persona  practi¬ 
tioners.  The  persona  lifecycle  was  the  result. 


THE  PERSONA  LIFECYCLE 


The  persona  lifecycle  is  a  metaphorical  framework  that  breaks 
down  persona  creation  and  use  into  sequential  phases  that  map 
on  to  the  life  stages  of  human  reproduction  and  development. 
This  framework  is  composed  of  five  phases:  (1)  Family  Planning, 
(2) Conception & Gestation, (3) Birth & Maturation, (4) Adulthood,
and (5) Lifetime Achievement & Retirement (see Fig. 51.1,
below). The phases of the persona lifecycle framework bring
structure  to  the  potentially  complicated  process  of  persona  cre¬ 
ation  and  highlight  critical  (yet  often  overlooked  or  ignored)  as¬ 
pects  of  persona  use. 

As  the  name  indicates,  the  persona  lifecycle  is  a  cyclical, 
mostly serial process model. As the illustration in Fig. 51.1 shows,
each stage builds on the one before it, culminating, but not ending, at
Adulthood. Notice that the illustration also shows that the final
stage, Lifetime Achievement & Retirement, is not immediately
followed  by  the  first  stage.  This  is  because  different  persona  ef¬ 
forts  culminate  and  restart  in  different  ways;  personas  can  be 
reused,  reincarnated,  or  retired,  depending  on  the  project. 


^For  more  on  the  origin  of  personas,  see  Alan  Cooper’s  article  on  www.cooper.com. 

^For more on creating better personas, see Kim Goodwin’s “Perfecting Your Personas” and “Getting from Research to Personas” on www.cooper.com.

FIGURE 51.1. The five phases of the persona lifecycle. This diagram
is designed to show both the order of the phases (from
Family Planning through Conception & Gestation, Birth & Maturation,
Adulthood, and finally, Lifetime Achievement & Retirement)
and the relative amount of effort related to each phase.


If  your  company  has  already  adopted  some  user-centered  de¬ 
sign  methods,  you  should  find  that  the  phases  of  the  persona 
lifecycle  augment  your  existing  process  and  help  you  get  in¬ 
volved  earlier  in  the  product  development  cycle.  Keep  in  mind 
that  the  persona  lifecycle  is  not  meant  to  replace  other  user- 
centered  design  tools  and  is  not  a  complete  user-centered  prod¬ 
uct  design  method  on  its  own.  Rather,  the  persona  lifecycle  is  an 
organized  collection  of  processes  and  tools  that  will  comple¬ 
ment  other  familiar  methods.  You  will  use  personas  to  enhance 
these  methods,  particularly  whenever  there  is  a  need  for  user 
definition  and  reference. 

In  the  remainder  of  this  chapter,  we  will  discuss  all  of  the 
phases  of  the  persona  lifecycle,  focusing  primarily  on  Conception 
&  Gestation  and  Adulthood  because  they  are  the  least  under¬ 
stood  and,  in  many  ways,  the  most  critical  phases  in  a  successful 
persona  effort.  Conception  &  Gestation  and  Adulthood  contain 
information  on  how  to  create  personas  from  data  (and/or  as¬ 
sumptions,  as  necessary)  and  how  to  use  completed  personas  in 
the  planning,  design,  evaluation,  and  release  of  your  products. 


Phase 1: Persona Family Planning

Successful  persona  efforts  are  ones  that  are  designed  to  solve 
specific  problems  for  specific  organizations  and  products.  To 
be  successful,  you  have  to  know  what  kinds  of  problems  you 
want  to  solve  and  decide  whether  or  not  personas  are  the  best 
way  to  solve  them.  Persona  family  planning  is  the  strategy  and 
planning  phase  that  precedes  the  creation  of  the  personas. 
Three  major  activities  occur  during  the  family  planning  phase: 

1.  Researching  your  own  organization  (which  we  call  “organi¬ 
zational  introspection”), 

2.  Creating  a  “persona  core  team,” 

3. Collecting data.


Organizational  Introspection 

Before  beginning  any  persona  effort,  you  (perhaps  with  the 
help  of  the  persona  core  team  you  assemble)  should  do  some 
careful  thinking  about  the  particular  problems  you  want  to  solve 
with  your  persona  effort.  While  personas  can  help  in  many  ways 
over  the  entire  design  and  development  cycle,  they  cannot  solve 
every  problem,  and  they  are  not  guaranteed  to  be  accepted  by 
the  people  you  feel  need  them  most. 

Successful  personas  are  those  that  meet  the  needs  of  their 
users  and  are  built  to  fit  seamlessly  into  their  host  environ¬ 
ments.  In  the  case  of  personas,  the  users  are  your  colleagues, 
and  the  environment  is  your  workplace  with  its  existing  design 
and  development  process. 

Ironically,  it  is  easy  to  forget  to  turn  our  analytic  eyes  on  our 
own  users,  the  people  on  our  teams  and  in  our  organizations 
who  use  the  “products”  we  produce  (e.g.,  research  reports,  sto¬ 
ryboards,  scenarios,  prototypes,  and  other  artifacts).  We  forget 
to  carefully  consider  who  our  teammates  are,  their  roles  and 
responsibilities  and  goals,  what  is  working  for  them  currently 
and  what  is  not.  We  launch  our  bright,  fresh,  exciting,  user- 
focused  ideas  into  teams  who  are  interested  and  curious,  but 
who  ultimately  just  need  to  get  their  jobs  done.  As  far  as  most 
product  teams  are  concerned,  they  already  know  the  fastest  and 
most  effective  way  to  do  their  jobs;  when  push  comes  to  shove 
and  deadlines  loom  closer,  your  colleagues  will  inevitably  re¬ 
vert  to  tried-and-true  work  habits. 

Organizational  introspection  is,  in  simple  terms,  working  to 
answer  the  following  questions: 

• What resources do we have for personas and other UCD
activities?

• What product problems do we want to solve with personas?

• What process problems do we want to solve with personas?

• How can we ensure that the personas will be accepted and
used by our colleagues?

Answering  these  questions  now  will  allow  you  to  decide 
whether  personas  will  be  appropriate  and  helpful.  The  answers 
will  help  you  to  create  reasonable  goals  for  the  persona  effort 
and  predict  the  challenges  you  are  likely  to  encounter  as  you 
create,  introduce,  facilitate,  and  maintain  your  personas.  Armed 
with  that  information,  you  will  create  a  plan  for  your  persona 
effort  that  will  target  the  application  of  your  personas  to  appro¬ 
priate  aspects  of  your  development  process.  Additionally,  this 
information  will  help  you  determine  what  measures  of  success 
(and  ROI)  will  be  needed  at  the  end  of  the  project. 

Creating  a  Persona  Core  Team 

Personas  are  a  simple  idea,  but  they  are  inevitably  more  work 
than  anyone  expects.  Successful  persona  efforts  involve  a  core 
team  of  between  2  and  10  people  who  are  willing  to  find  time 
in  their  calendars  to  dedicate  to  both  persona  creation  and  to 
helping  evangelize  the  value  and  uses  of  the  personas  to  the  rest 
of  the  organization. 

People who are invited to join core design teams do not necessarily
have to understand personas to be helpful; people who
are sensitive to the need for user focus in a company will make
excellent core team members.

Your  goal  is  not  to  create  a  team  that  will  duplicate  research 
or  communication  efforts;  rather,  it  is  to  consolidate  some  as¬ 
pects  of  these  efforts  such  that  they  all  contribute  to  the  creation 
of  personas.  Plan  to  include  people  who  are  already  involved  in 
user  research,  market  research,  business  analysis,  task  analysis,  or 
any  other  user-  or  customer-focused  research  or  profiling  activ¬ 
ity.  If  you  have  colleagues  in  any  of  these  specialties,  you  should 
put  them  on  the  “short  list”  for  inclusion  on  the  core  team: 

• Information architects, interaction designers, and HCI
specialists.

• Usability specialists, user researchers, and ethnographers.

• Technical writers and documentation specialists, and training
specialists.

• Market researchers, business analysts, and product managers.

These  colleagues  are  likely  to  understand  the  value  of  per¬ 
sonas,  both  for  the  organization  and  for  their  own  projects. 
They  bring  with  them  a  deep  commitment  to  user-centered  de¬ 
sign,  experience  studying,  analyzing,  and  designing  solutions  for 
target  users,  and  an  interest  in  new  methods  to  bring  user-focus 
into  the  entire  organization. 

Identifying  Data  Sources  and  Collecting  Data 

From  one-on-one  interviews  to  widely  published  reports, 
thousands  of  data  resources  are  available  to  persona  practition¬ 
ers.  Generally  speaking,  personas  are  best  created  from  a  com¬ 
bination  of  qualitative  and  quantitative  research.  The  amount 
and  types  of  data  you  will  collect  will  depend  on  how  much  time 
and  money  you  have  to  spend  and  your  own  evaluation  of  how 
much  data  will  be  necessary  to  create  “good”  personas  for  your 
project.  If  warranted  and  possible,  you  will  do  some  original 
user  research  of  your  own,  though  this  endeavor  tends  to  be  the 
most  time  consuming  of  all. 

Most  companies  have  a  great  deal  of  data  “hanging  around.” 
Market  research  reports,  customer  segmentation  studies,  cus¬ 
tomer  service  logs,  web  usage  logs  and  statistics — all  of  these 
sources  can  be  reused  in  your  persona  effort.  In  fact,  we  rec¬ 
ommend  that  you  postpone  future  data  collection  efforts  until 
after  you  analyze  the  data  you  already  have  during  the  Concep¬ 
tion  &  Gestation  phase  of  the  persona  lifecycle.  This  process  will 
help  you  review  your  existing  data  sources  from  a  new  perspec¬ 
tive,  and  you  will  be  able  to  target  future  data  collection  projects 
to  “fill  in  the  holes”  of  your  current  understanding  of  your  users. 
Look  for  both  internal  sources  (e.g.,  market  research  reports 
and interviews with product support specialists and other subject-matter
experts) and external data sources (e.g., public websites
that provide statistical and demographic information).

As you identify and collect data sources, we also recommend
that you identify and evaluate current assumptions about
users that exist within your organization. The only assumptions
that can harm your product are the ones you are not aware of.
We strongly believe that personas are much more credible and
helpful if they incorporate and refer to real world data. If data
is simply unavailable, or you have no time to collect and analyze
the available data, creating assumption or ad hoc personas
is  extremely  worthwhile.  One  of  the  major  organizational  ben¬ 
efits  of  personas  is  their  ability  to  focus  everyone  on  a  shared 
understanding  of  who  the  user  is  and  what  he  or  she  really 
needs  out  of  the  product  you  are  designing  and  building.  The 
key word here is “shared”: everyone in your organization will inevitably
build an internal understanding of the users of your
product  no  matter  what  you  do.  If,  through  your  efforts,  they 
all  picture  the  same  users — even  if  this  picture  is  built  on  as¬ 
sumptions — your  product  will  benefit. 

Family  planning  ends  at  the  point  at  which  you  have  estab¬ 
lished  that  (a)  personas  are  right  for  your  organization  and  current 
project,  (b)  you  have  buy-in  from  key  individuals,  (c)  you  have  a 
persona  core  team  in  place,  (d)  you  have  a  solid  plan  for  the  rest 
of  the  persona  effort  that  suits  your  product  team’s  needs,  and 
(e)  your  initial  research  and  data  gathering  are  complete. 


Using  Assumption  Personas  to  Help  a 
Multidisciplinary  Team  See  the  Need  for  Personas 

Graham  Jenkin,  User-centered  Design  and  Research 
Executive,  Bank  of  America  ecommerce/ATM 

In  most  companies,  a  range  of  team  members  must  come  to¬ 
gether  to  define  and  build  customer  experiences:  product 
managers,  marketers,  engineers,  researchers,  and  user  expe¬ 
rience  professionals.  While  the  need  for  personas  may  be  ob¬ 
vious  to  user-experience  professionals,  other  team  members 
may  require  some  creative  persuasion. 

At  Bank  of  America,  the  eCommerce  User-Centered  De¬ 
sign  &  Research  (UCDR)  team  used  “assumption”  personas 
to  obtain  cross-team  support  for  a  dedicated  persona  devel¬ 
opment  effort  for  its  Online  Banking  product. 

Online  Banking  team  members  from  UCDR,  Marketing, 
and  Product  Management  were  interviewed  and  asked  to 
imagine  one  consistent  customer  profile  and  to  make  as¬ 
sumptions  about  that  customer’s  relationship  to  the  bank, 
usage  patterns,  motivations,  goals,  needs,  frustrations,  and 
attitudes  toward  technology. 

The  variation  in  the  team  members’  assumptions  was  strik¬ 
ing.  Some  assumed  that  Casey — the  assumption  persona,  a 
young,  urban  professional — had  strong  loyalty  to  the  Bank  of 
America  brand,  while  others  assumed  that  she  had  no  loyalty 
at  all.  Some  assumed  that  Casey  used  Online  Banking  for  day- 
to-day  financial  triage,  while  others  assumed  she  used  it  for 
“big  picture”  planning.  Some  assumed  that  Casey  was  “geeky 
and  wired,”  while  others  assumed  she  preferred  to  wait  for 
family  and  friends  to  introduce  her  to  new  technology. 

Of  course,  no  one  was  right.  Team  member  assumptions 
were  exactly  that — assumptions  based  on  intuition  and  lack¬ 
ing  in  data. 

When the interview results were shared, there was no
doubt about the next action. Team members could not agree on
who  precisely  Casey  was,  but  they  were  unified  in  agreeing 
to develop and use a single set of personas based on real customer
data.

Phase 2: Persona Conception & Gestation

Persona  Conception  &  Gestation  is  the  process  of  translating 
raw  data  into  information  and  information  into  personas.  A  lot 
of  the  work  during  the  Conception  &  Gestation  phase  centers 
on  collaboratively  filtering  data  and  organizing  information — 
both  information  that  arises  out  of  the  data  collected  in  family 
planning  and  information  that  arises  from  other  sources,  such 
as  inherent  knowledge  of  (a)  how  people  behave,  (b)  your  busi¬ 
ness  or  product  strategy,  (c)  the  competitive  marketplace,  and 
(d)  technological  affordances  related  to  your  product  domain. 

In  this  section,  we  will  summarize  the  process  we  recommend 
for  getting  from  raw  data  to  completed  personas.  This  process  is 
designed  to  work  best  as  a  series  of  collaborative  meetings  with 
your  core  team.  The  first  three  steps  cover  persona  conception; 
steps  four  through  six  describe  persona  gestation. 

Step 1: Identify and Assimilate Assumptions

Start  with  assumptions?  After  all  this  emphasis  on  data?  You 
bet.  Your  completed  personas  will  incorporate  lots  of  data — 
but  you  will  be  introducing  them  into  an  organization  that  is 
overflowing  with  assumptions.  If  you  can,  identify  categories  of 
users  that  are  important  to  your  business  and  product  domain. 

How  does  your  company  refer  to  users  today?  Every  com¬ 
pany  tends  to  already  have  a  set  of  terms  they  use  to  describe 
the  relationships  and/or  differences  between  their  users.  These 
terms  are  usually  collections  of  defined  user  roles,  user  goals, 
and/or  user  segments. 

Examples  of  users  defined  in  terms  of  segments  (e.g.,  shared 
demographics or psychographics):

•  A  large  software  manufacturer  may  think  about  their  users 
as  small  businesses,  medium  businesses,  large  businesses, 
or  home  office  users. 

•  A  travel  agency  might  be  used  to  thinking  about  their  users  as 
recreational  travelers,  families,  or  business  travelers. 

Examples of users defined in terms of roles (e.g., relationships
they have with a system):

•  A  company  that  makes  online  presentation  software  might 
think  about  their  users  as  audience  members,  presenters, 
presentation  designers,  and  technical  support  professionals. 

Examples  of  users  defined  in  terms  of  goals  (e.g.,  what  they 
are  trying  to  achieve,  in  their  own  terms): 

•  An  online  bank  might  think  that  the  most  interesting  differ¬ 
ences  between  their  customers  are  that  they  have  different 
goals:  “I  have  to  feel  my  money  is  safe,”  “I  have  to  be  able  to 
access  my  account  from  anywhere,”  or  “I  don’t  care  about 
anything  except  low  fees.” 

The  goal  is  simply  to  identify  the  ways  people  in  your  orga¬ 
nization  already  talk  about  categories  of  users.  Identifying  these 
categories  now  (even  if  they  are  based  solely  on  assumptions) 
will help you structure your data processing and build a bridge
between the ways people think of users today and the data-driven
personas you will create. When you are ready to communicate
and use your personas, you will find it much easier to do
so  if  you  can  describe  them  in  language  that  is  already  familiar — 
even  in  the  case  where  your  data  suggests  that  the  initial  cate¬ 
gories  should  be  replaced  by  different  ones. 

Get the rest of the assumptions out on the table. Sit
down with your core team and lots and lots of yellow sticky
notes.  Spend  30  minutes  or  so  getting  as  many  assumptions 
down  on  sticky  notes  as  you  can.  Write  one  assumption  on  each 
sticky  note.  These  assumptions  should  describe  what  you  think 
your  company’s  users  are  like  as  individuals.  For  the  core  team 
working  at  an  online  presentation  software  company,  users 
might  include: 

•  A  46-year-old  salesman  who  lives  on  the  East  Coast  and  has  a 
hard  time  staying  up-to-date  with  the  latest  product  innova¬ 
tions  taking  place  at  the  San  Francisco  home  office. 

•  A  CEO  who  has  to  give  a  keynote  presentation  at  a  large  con¬ 
ference,  but  has  to  do  so  remotely. 

•  An  investor  relations  specialist  who  has  to  present  financial 
data  to  analysts  and  stakeholders  at  least  once  a  quarter. 

•  A  marketing  manager  who  has  to  make  sure  that  the  key 
marketing  messages  for  her  company  are  distributed  to  the 
right  people  at  the  right  times. 

After  everyone  has  created  as  many  sticky  notes  as  they  can,  it 
is  time  to  find  patterns  in  the  assumptions.  To  do  this,  you  will 
conduct  an  affinity  exercise.  Write  the  major  categories  of  users 
(the  ones  identified  in  Step  1)  far  apart  from  each  other  on  a 
large  sheet  of  paper.  Ask  everyone  to  place  their  sticky  notes  on 
the  paper  so  that  sticky  notes  with  similar  or  related  assumptions 
are  near  the  appropriate  category  and  near  each  other,  and  dis¬ 
similar  assumptions  are  far  apart.  This  exercise  should  be  a  noisy, 
collaborative  experience  as  members  of  the  core  team  discuss 
placement  and  groupings  of  their  sticky  notes. 

If  some  sticky  notes  do  not  relate  to  any  of  the  listed  cate¬ 
gories,  create  new  areas  on  your  large  sheet  of  paper  for  them. 
As  the  exercise  progresses,  groupings  of  related  sticky  notes 
start  to  form  “clusters.”  When  the  team  feels  that  they  are  fin¬ 
ished  clustering,  ask  them  to  label  the  clusters.  For  example, 
one  cluster  could  be  labeled  “Marketing  People,”  and  another, 
elsewhere  on  the  big  sheet  of  paper,  could  be  labeled  “Investor 
Relations  People.” 

Discuss  what  you  have  found  as  a  group.  Are  there  any  sur¬ 
prises?  This  is  a  great  time  to  think  about  the  kinds  of  informa¬ 
tion  you  would  like  to  find  in  your  data,  or  collect  directly  from 
users.  Your  data  will  validate  and  enrich  the  categories,  or  it  will 
provide  solid  information  to  show  that  the  existing  categories 
are  inappropriate.  It  will  also  allow  you  to  define  important  sub¬ 
categories  of  users  that  should  also  be  expressed  in  personas. 

Step  2:  Process  Your  Data 

It is time to process the raw data and extract information relevant
to your user- and product-domains, and then identify
themes and relationships. While you can process your data in
many different ways to create personas, we strongly recommend
that you conduct an affinity exercise like the one you have
just done with assumptions. Before you get started, assign a
number to each of your data sources.

FIGURE 51.2. An example of clustered assumptions (light gray stickies) and factoids (dark gray
stickies) with cluster labels (large stickies) and the initial categories of users. Notice that there
are clusters that contain only assumptions or factoids in addition to clusters that contain both.

Identify  key  data  points  (factoids)  in  the  data  sources. 

The  first  step  in  processing  the  data  is  to  consume  and  filter  the 
information that is in each of the data sources. You will want to do
this  because  not  every  data  point  in  a  given  study/report  is  rele¬ 
vant  to  the  definition  of  your  target  audience  or  the  design  of 
your  product.  Whether  it  is  done  before  or  during  the  meeting, 
ask  your  core  team  members  to  highlight  findings  that  they  think 
are  key  to  understanding  your  target  audience,  or  that  are  highly 
insightful  toward  defining  aspects  of  your  product.  In  other 
words,  you  want  them  to  look  for  findings  that  are  relevant  to 
your  market,  industry  or  domain.  Highlight  any  facts  that  seem 
important to your product’s audience or to the product itself.

Each  important  factoid  should  be  copied  or  cut  out  of  the 
original  document.  Remember  to  note  the  source  and  page 
number  on  each  factoid  so  you  can  trace  them  back  to  the  orig¬ 
inal  sources  later.  We  recommend  that  you  use  blue  sticky  notes 
for  this  exercise.  Whatever  color  you  use,  don’t  use  the  same 
color  that  you  used  for  the  assumption  stickies;  you  will  need  to 
distinguish  between  the  assumptions  and  the  factoids. 

Assimilate the factoids. Now the interaction (and fun)
begins.  Everyone  should  add  their  factoids  to  the  clusters  that 
were  created  in  the  assumption  exercise.  Don’t  be  afraid  to 
move  any  of  the  stickies  around  as  you  do  this — you  can  move 
assumptions,  factoids,  or  even  entire  clusters  as  you  assimilate. 

As  you  continue  assimilating,  keep  your  eyes  out  for  large 
clusters,  or  “puddles,”  of  8-10  or  more  sticky  notes.  Large  clus¬ 
ters  can  usually  be  broken  down  further,  and  you  should  do  this 
if  you  can.  Usually,  we  recommend  that  everyone  find  spots  for 
their stickies and then pair up to review the clusters together.
During this “clean up” phase, you will be able to break up puddles,
add descriptive labels to clusters, and generally ensure that
the  clustering  makes  sense.  Stop  when  the  activity  dies  down 
and  few  stickies  are  still  being  moved. 

Note  that  you  will  have  clusters  that  contain  both  assump¬ 
tions  and  factoids,  but  you  will  probably  also  find  clusters  of  as¬ 
sumptions  (with  no  factoids)  and  clusters  of  factoids  (with  no 
assumptions). This is illustrated in Fig. 51.2. All three types of
clusters  are  helpful.  Clusters  of  assumptions  without  factoids 
will  help  you  identify  topic  areas  in  which  you  need  to  collect 
more  data.  Clusters  of  factoids  without  assumptions  tell  you  that 
there  are  aspects  of  your  customers  that  you  have  not  thought  a 
lot  about  yet. 
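If you transcribe the finished board into a spreadsheet or script, the three cluster types can be flagged automatically. The following is a minimal sketch, not part of the method itself: the `board` layout, the example notes, and the puddle threshold of 8 are assumptions chosen for illustration.

```python
# Hypothetical transcription of a board: cluster label -> list of stickies.
# Each sticky is (kind, text), where kind is "assumption" or "factoid".
board = {
    "Marketing People": [
        ("assumption", "Marketing managers present to the press often"),
        ("factoid", "70% of marketing professionals host at least 10 "
                    "conference calls a month"),
    ],
    "Remote Presenters": [
        ("assumption", "CEOs sometimes have to give keynotes remotely"),
    ],
    "Information Distribution": [
        ("factoid", "65% of marketers distribute updated information to "
                    "over 100 people a month"),
    ],
}


def classify_cluster(notes):
    """Label a cluster by the kinds of stickies it contains."""
    kinds = {kind for kind, _ in notes}
    if not kinds:
        return "empty"
    if kinds == {"assumption"}:
        return "assumptions only"   # topic area that needs more data
    if kinds == {"factoid"}:
        return "factoids only"      # aspect of users not yet thought about
    return "mixed"


def find_puddles(board, threshold=8):
    """Return labels of clusters large enough to consider splitting.

    The threshold is an assumption taken from the low end of "8-10 or more".
    """
    return [label for label, notes in board.items() if len(notes) >= threshold]


report = {label: classify_cluster(notes) for label, notes in board.items()}
for label, verdict in report.items():
    print(f"{label}: {verdict}")
```

A report like this only supplements the discussion at the wall; the team still decides what each cluster means.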


Another  Perspective 

The  Cooper  Method:  Collect  Data  Directly  From 
Users  and  Identify  Patterns  to  Create  Personas 

Kim  Goodwin,  Vice  President 
and General Manager, Cooper

Some  people  have  great  success  with  personas,  while  others 
do  not.  Why  is  that?  First,  it  is  important  to  understand  that 
even  the  best  personas  will  not  solve  all  of  your  problems — 
scenarios,  design  judgment,  and  visualization  skill  are  equally 
important.  Assuming  you  have  all  of  those  things,  the  key  to 
success  with  personas  is  to  do  the  right  kind  of  research  and 
to  make  sure  your  personas  truly  reflect  your  findings. 

1.  Start  with  the  right  kind  of  research 

Personas  that  are  made  up  without  data  are  not  really  per¬ 
sonas,  and  although  they  can  still  be  useful  thought  exer¬ 
cises,  they  are  far  less  effective  than  real  personas  that  are 
based primarily on ethnographic user data. Ethnographic
techniques  are  valuable  because  they  assume  that  an  interview 
subject’s  attitudes  and  behaviors  are  so  habitual  as  to  be  un¬ 
conscious.  Rather  than  asking  users  what  they  want,  it  is  more 
effective  to  focus  on  what  users  do,  what  frustrates  them,  and 
what  gives  them  satisfaction.  By  combining  interviewing  with 
direct  observation — preferably  in  the  actual  usage  context — 
you  can  get  a  lot  of  data  very  quickly.  Observation  also  helps 
minimize  dependence  on  users’  self-reported  behavior,  which 
is  often  inaccurate. 

At  Cooper,  we  send  the  designers  out  to  do  the  research, 
so  they  see  the  problems  firsthand  and  develop  empathy  with 
the users. We may occasionally have three interviewers, but two
of  them  are  consistent  across  all  of  the  interviews;  this  makes 
synthesis  much  easier  later  on.  We  spend  45  minutes  to  an  hour 
with  individual  people.  It  would  be  easy  to  write  a  whole  chap¬ 
ter  on  research  techniques  alone,  but  there  are  a  few  funda¬ 
mental  points.  First,  ask  a  very  broad  question  (such  as  “Could 
you  think  of  a  typical  work  day  recently,  and  walk  me  through 
it?”). This raises a number of issues the interviewers can pursue
without  needing  to  ask  leading  questions.  Another  important 
technique  is  to  stay  case-focused;  in  other  words,  ask  for  spe¬ 
cific  instances  rather  than  generalizations.  This  will  get  you 
more  detailed  and  more  accurate  information.  Also,  be  sure  to 
look  at  and  ask  questions  about  artifacts  or  aspects  of  the  envi¬ 
ronment.  Finally,  focus  on  actual  behavior  and  frustrations, 
rather  than  asking  the  users  to  design  the  product  for  you. 

Contrary  to  some  expectations,  this  kind  of  research  can  be 
done  in  anywhere  from  a  few  days  to  a  few  weeks.  You  will 
know  you  can  stop  interviewing  when  you  can  predict  how 
each  user  will  respond;  this  means  patterns  are  beginning  to 
emerge.  If  you  have  the  time  and  budget,  you  can  verify  your 
findings  with  quantitative  surveys  or  other  techniques,  but 
these  cannot  replace  direct  observation. 

2.  Identify  behavioral  patterns  from  the  data 

Once  you  finish  interviewing,  list  all  of  the  behavioral  vari¬ 
ables  for  each  user  role  (e.g.,  ways  in  which  interviewee  be¬ 
havior  differed).  In  an  online  shopping  domain,  for  example, 
you  might  have  variables  such  as  frequency  of  shopping,  de¬ 
gree  of  enjoyment,  and  price  versus  service  orientation.  You 
may  find  that  some  demographic  variables — such  as  age  or 
technical  skill — seem  to  affect  behavior.  Be  wary  of  focusing 
on demographics during persona creation, since behavioral
variables will have far more impact on the design. Note that if
you  are  doing  an  enterprise  application,  each  role  will  have 
its  own  set  of  behavioral  and  demographic  variables.  Although 
the  number  of  variables  will  differ  from  project  to  project,  it 
is  typical  to  find  20  or  so  variables  per  role.  Map  each  inter¬ 
viewee  against  the  appropriate  set  of  variables,  and  then  look 
for  people  who  clump  together  across  a  large  number  of  the 
variables.  When  you  have  found  a  set  of  people  clustering 
across  six  or  eight  variables,  there  is  a  good  chance  that  you 
have  found  a  major  behavior  pattern  that  will  form  the  basis  of 
a  persona. 
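The mapping step above can be sketched in code. This is an illustration under stated assumptions, not Cooper's actual tooling: each interviewee is placed at one of three positions (1 = low, 2 = middle, 3 = high) on each variable, only the first three variable names come from the text, and the interviewee data are invented. The sketch reports pairs of interviewees who share a position on at least six variables.

```python
from itertools import combinations

# First three variables are from the shopping example; the rest are hypothetical.
VARIABLES = [
    "shopping_frequency", "enjoyment", "price_vs_service",
    "planning_horizon", "brand_loyalty", "research_depth",
    "device_preference", "return_behavior",
]

# Invented positions (1 = low, 2 = middle, 3 = high), one per variable.
interviewees = {
    "P1": [3, 3, 1, 2, 3, 1, 3, 2],
    "P2": [3, 3, 1, 2, 3, 1, 2, 1],
    "P3": [1, 1, 3, 2, 1, 3, 1, 3],
    "P4": [1, 1, 3, 3, 1, 3, 1, 3],
}


def shared_variables(a, b):
    """Names of the variables on which two interviewees hold the same position."""
    return [name for name, x, y in zip(VARIABLES, a, b) if x == y]


def find_clumps(people, min_shared=6):
    """Pairs of interviewees who coincide on at least `min_shared` variables."""
    clumps = []
    for (name_a, a), (name_b, b) in combinations(sorted(people.items()), 2):
        shared = shared_variables(a, b)
        if len(shared) >= min_shared:
            clumps.append((name_a, name_b, len(shared)))
    return clumps


for name_a, name_b, count in find_clumps(interviewees):
    print(f"{name_a} and {name_b} coincide on {count} of {len(VARIABLES)} variables")
```

Exact matching on a three-point scale is a simplification; in practice the placement of each interviewee is a judgment call, and the clumps are a prompt for discussion rather than a result.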

3.  Turn  your  patterns  into  persona  descriptions 

For  each  pattern,  add  details  based  on  your  data.  Describe 
the  current  potential  usage  environment,  typical  workday  (or 
other  relevant  time  period),  current  solutions  and  frustra¬ 
tions,  relevant  relationships  with  others,  and  goals.  Avoid  the 
temptation  to  add  a  lot  of  irrelevant  personal  detail;  if  you  are 
designing  an  email  tool,  it  does  not  matter  that  your  persona 
wants  to  be  an  actress.  One  or  two  bits  of  personality  can  bring 
an  otherwise  dull  persona  to  life,  but  too  much  biography  will 
be  distracting  and  will  make  the  persona  less  credible  as  an  an¬ 
alytical  tool.  If  every  aspect  of  the  description  cannot  be  tied 
back  to  real  data,  it  is  not  a  persona — it  is  a  creative  writing 
project  that  should  not  be  used  to  make  critical  design  and 
business  decisions.  Describe  each  persona  in  a  one  or  two 
page  narrative  that  helps  stakeholders  understand  what  makes 
them  tick. 

4. Use the personas to drive scenarios, requirements, design . . . and communication

Use  your  personas  to  develop  scenarios;  put  the  personas 
in  realistic  future  situations  and  envision  how  they  would  like 
a  magic  black  box  of  a  product  to  work.  These  scenarios  will 
lead  you  to  a  set  of  needs  you  can  discuss  with  stakeholders. 
You  can  describe  requirements  from  the  personas’  points  of 
view,  which  leads  to  less  resistance  than  requirements  that 
come  from  you.  Once  there  is  agreement  on  the  require¬ 
ments,  additional  scenarios  (along  with  good  design  skills!) 
will help create the conceptual framework for the design.
When you illustrate the design direction in a scenario, stakeholders
are more likely to see the value of the solution. Personas
will help you all the way to pixels and specifications, and
even through implementation.


Step 3: Identify Subcategories of Users and Create Skeletons

Look  at  the  data  clustered  under  each  of  your  user  cate¬ 
gories.  As  a  team,  evaluate  and  discuss  the  possibility  that  each 
category  should  be  divided  into  two  or  more  subcategories. 
Consider  roles,  goals,  and  segments  in  this  assessment.  As  you 
identify  subcategories,  you  can  write  them  on  a  whiteboard;  you 
may  also  find  it  helpful  to  transfer  the  subcategory  names  onto 
sticky  notes  and  place  them  appropriately  in  your  assimilated 
data. In doing this exercise, you are simply exploring the possible
groups of users that have emerged from your data. Try to
identify  “differences  that  make  a  difference”  within  each  cate¬ 
gory,  based  both  on  the  clustered  assumptions  and  the  clus¬ 
tered  factoids. 

Create  skeletons.  Once  you  have  identified  and  agreed 
upon  the  categories  and  subcategories  of  users,  you  are  ready  to 
create  skeletons.  Skeletons  are  very  brief,  usually  bulleted  lists 
of  distinguishing  data  ranges  for  each  subcategory  of  user. 
Skeletons help your core team transition from thinking about
categories of users to focusing on specific details; they also allow
your team to present the key findings of the assimilation exercise
to stakeholders.

Skeleton: Marketing Manager

Role: Presenter

Communication requirements of job/goals

• Primary function is to develop a good story about her
company and product and to communicate that story as
widely as possible (1, 6)

• Marketing Managers have to communicate frequently with
industry analysts, members of the press, partners, distributors,

Frequency of presentations

• 65% of marketers said that their jobs require them to
distribute updated information to over 100 people a month. (8, 15)

Presentation methods used today

• 70% of marketing professionals host at least 10 conference
calls a month. (8, 12)

• “. . . most marketing professionals rely heavily on
presentation software to get their jobs done.” (2, 4)

FIGURE 51.3. An example of a skeleton created out of assimilated
data. Note that the skeleton includes factoids grouped
according to topic; it does not include any narrative details or
"personalized" information. The numbers refer to the data source
number and page number.

Create  one  skeleton  for  each  of  the  subcategories  you  iden¬ 
tified.  On  each  skeleton,  list  the  cluster  labels  that  relate  to  that 
subcategory;  these  cluster  labels  will  become  headings  in  your 
skeleton  (see  Fig.  51.3).  Because  you  will  be  comparing  skele¬ 
tons  to  and  prioritizing  skeletons  against  each  other,  it  is  im¬ 
portant  that  each  one  contain  at  least  somewhat  comparable 
information.  Consider  including  common  characteristics  or 
headings  across  all  of  your  skeletons.  (If  you  do  this,  you  may 
find  that  you  are  missing  information  for  some  skeletons.  In 
those  cases,  either  leave  that  information  blank,  perhaps  mark¬ 
ing  it  as  “need  data,”  or  make  an  informed  estimation  about 
what  it  might  be.  If  you  do  the  latter,  be  sure  to  indicate  that  it  is 
an  assumption  to  be  followed  up  on.) 
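If you keep skeletons in electronic form, a small data structure makes the "need data" gaps easy to spot across skeletons. The sketch below is one possible representation, not a prescribed format: the class names, the `gaps` helper, and the empty investor relations skeleton are assumptions, while the marketing manager factoids and their (source, page) numbers are taken from Fig. 51.3.

```python
from dataclasses import dataclass, field


@dataclass
class Factoid:
    text: str
    source: int  # data source number assigned before processing
    page: int    # page within that source


@dataclass
class Skeleton:
    name: str
    sections: dict = field(default_factory=dict)  # heading -> list of Factoid


# Common headings shared by every skeleton so they can be compared.
COMMON_HEADINGS = [
    "Communication requirements of job/goals",
    "Frequency of presentations",
    "Presentation methods used today",
]

marketing = Skeleton("Marketing Manager", {
    "Communication requirements of job/goals": [
        Factoid("Primary function is to develop a good story about her "
                "company and product", 1, 6),
    ],
    "Frequency of presentations": [
        Factoid("65% of marketers distribute updated information to over "
                "100 people a month", 8, 15),
    ],
    "Presentation methods used today": [
        Factoid("70% of marketing professionals host at least 10 "
                "conference calls a month", 8, 12),
    ],
})

# Hypothetical second skeleton with no factoids assimilated yet.
investor_relations = Skeleton("Investor Relations Specialist")


def gaps(skeleton, headings=COMMON_HEADINGS):
    """Headings for which the skeleton has no factoids ("need data")."""
    return [h for h in headings if not skeleton.sections.get(h)]


for skeleton in (marketing, investor_relations):
    missing = gaps(skeleton)
    print(skeleton.name, "-", "complete" if not missing else f"need data: {missing}")
```

Keeping the source and page number on every factoid preserves the traceability recommended in Step 2.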

Feel  free  to  create  as  many  skeletons  as  you  and  your  team 
feel  are  necessary  to  “cover”  the  discoveries  you  made  during 
the  clustering  processes.  You  will  have  the  opportunity  to  com¬ 
bine  and  prioritize  them  in  the  next  steps. 


Sometimes  Categories  of  Users  (and  Personas) 
Should  be  Based  on  Verbs,  Not  Nouns 

Karen  McGrane,  Vice  President,  User  Experience, 
Avenue  A  |  Razorfish 

Though  the  method  of  developing  personas  is  relatively  new, 
it seems a given that a persona maps to a person. The approach
taken by user-centered designers focuses on understanding
the needs, goals, and mindsets of a given individual.
A  persona  usually  describes  a  single,  stable  individual  whose 
interaction  with  a  company,  product,  or  website  stays  con¬ 
sistent  throughout  the  experience. 

What  happens  when  the  goals  and  mindsets  of  an  indi¬ 
vidual  user  can  change  rapidly  over  the  course  of  their  rela¬ 
tionship  with  a  company — or  even  over  the  course  of  a  sin¬ 
gle  session?  When  we  observed  people  visiting  the  New  York 
Times  website,  we  realized  that  individual  users  did  not  fall 
neatly  into  a  single  mode  of  use.  People  tended  to  switch 
their  goals  over  time.  People  used  the  site  differently  in  the 
morning,  when  they  wanted  to  read  headlines  and  catch  up 
on  the  day’s  news,  as  compared  to  the  afternoon,  when  they 
might  be  looking  to  take  a  break  over  lunch  or  between 
meetings.  We  often  observed  people  change  modes  in  the 
course  of  a  single  session,  moving  fluidly  from  a  news-read¬ 
ing  mode,  scanning  headlines,  into  a  planning  mode,  where 
they would research restaurants and read movie reviews.

When it came time to document our personas, we realized
that the standard way of documenting them would be
limiting.  We  did  not  want  to  imply  that  “Ketan,  a  54-year-old 
technology  consultant”  was  our  only  “news  junkie,”  or  that 
“Lisa,  a  32-year-old  account  executive”  would  always  oper¬ 
ate  in  a  “planning  mode.”  Making  a  one-to-one  connection 
between  the  person  and  the  action  seemed  inaccurate,  since 
we  knew  each  person  would  most  certainly  take  multiple 
actions during their use of the site.

What’s more, putting the emphasis on the noun (the persona)
rather than the verb (the activity) did not really help
the New York Times achieve their business goals. While they
absolutely  wanted  to  create  a  usable  experience  for  a  visitor 
operating  within  a  given  mode,  they  also  wanted  to  know 
what  would  prompt  someone  to  change  modes,  since  shift¬ 
ing  modes  extends  the  experience  to  more  page  views  and 
more  frequent  visits.  As  experience  designers,  we  concluded 
that we didn’t need to focus on the person(a). We needed to
focus  on  the  activity — the  verb,  the  action,  the  mode  of  use 
(see  Fig.  51.4). 

Clients are eager to understand their customers better, and
many  focus  on  personas  as  the  tool  that  will  help  them  do  that. 
In  the  future,  I  will  encourage  my  clients  to  think  about  other 
means  of  understanding  customer  behavior.  I  think  one  of  the 
most  important  ways  they  can  do  this — outside  of  personas — 
is  to  learn  more  about  customer  goals  and  modes  of  use. 


Step  4:  Prioritize  the  Skeletons 

Once  you  have  a  set  of  skeletons,  it’s  time  to  review  them  in 
preparation  to  get  feedback  from  all  stakeholders.  When  you 
meet  with  stakeholders,  you  will  evaluate  the  importance  of 
each  skeleton  to  your  business  and  product  strategy,  and  prior¬ 
itize  the  skeletons  accordingly.  The  stakeholders  will  help  to 
identify  a  subset  of  skeletons  to  develop  into  personas. 

FIGURE 51.4. One of the "verb-based" persona posters for the New York Times project. Note
that the poster is focused on the word Planner and not on a specific persona name. In fact, the
"Planner" poster includes details about four individual personas.

Before you meet with the stakeholders, it is a good idea to do
some initial prioritization of the skeletons. The core team
should carefully review the skeletons you have created and
make sure you agree that each one truly does reflect a “difference
that makes a difference” between subcategories of users as
identified earlier. In many cases, it will be possible to reduce the
number of skeletons by combining several of them. Once you
have a set that you feel should not be further reduced, prioritize
them in a way that makes sense to the core team. This will give
the stakeholders a “strawman” to work with, which is easier
than asking them to prioritize a series of skeletons from scratch.
As you do this, think about the following questions:

• Is this category or subcategory important to our product (relevant,
unique, illuminating)?

•  How  important  is  it  to  our  business? 

•  Are  there  any  groups  missing? 

•  Are  some  of  the  categories  almost  right,  but  a  few  of  the  char¬ 
acteristics  are  “off”  and  need  tweaking? 

Note  that  you  should  prioritize  skeletons  within  each  major 
category  of  users.  For  example,  prioritize  all  the  “business  trav¬ 
eler”  skeletons  relative  to  one  another,  and  the  “families”  skele¬ 
tons  separately.  Eventually,  you  will  want  to  know  the  priorities 
of  your  personas  across  these  categories,  but  first,  you  will  need 
to  narrow  in  on  the  correct  skeletons  within  each  category. 

Now  schedule  a  meeting  with  stakeholders  empowered  to 
make  decisions  about  the  strategic  focus  of  the  company.  If 
stakeholders  are  not  aware  of  the  data  and  general  process  that 
led  to  these  skeletons,  present  that  information  before  intro¬ 
ducing  the  skeletons  to  them.  Introduce  the  skeletons  in  one 
category  at  a  time,  and  ask  them  to  assign  priorities  based  on: 

•  Frequency  of  use:  How  often  would  each  skeleton  use  your 
product?  Daily  users  would  likely  be  more  important  regard¬ 
ing  design  decisions  than  those  that  only  use  your  product 
once  a  month. 

•  Size  of  market:  Roughly  how  many  people  does  each  skele¬ 
ton  represent?  Larger  markets  are  usually  more  important 
than smaller ones. Do you plan to aim your new product at a
new market? In that case, you might consider the importance
of  a  small  market  with  growth  potential. 

•  Historic  or  potential  revenue:  How  much  purchasing 
power  does  each  skeleton  encompass?  If  this  is  a  new  prod¬ 
uct,  you  may  have  to  estimate  this  amount  (e.g.,  through 
trade journals, market trends, market research, and understanding
spending behaviors in different but related markets).
In  many  cases,  users  might  not  directly  make  the  purchase; 
someone  else  buys  such  products  for  them.  Still,  they  may  in¬ 
fluence  those  purchase  decisions. 

• Strategic importance: Decide who is your most strategically
important audience. Is it (a) those who make the most
support  calls,  (b)  those  that  rely  on  your  product  for  critical 
activities,  (c)  those  that  use  your  competitor’s  product,  or 
(d)  those  that  don’t  use  yours  or  anyone’s  product  yet?  Are 
you  trying  to  expand  or  grow  your  market?  If  this  is  your  pri¬ 
mary  goal,  do  your  skeletons  include  nonusers,  technology 
pioneers, or trend setters? Which target audiences will help
your  team  innovate  or  stretch? 

Prioritization  can  be  difficult,  and  your  first  meetings  will 
likely  end  in  requests  for  more  data.  For  example,  stakeholders 
might  ask  you  to  find  data  on  the  market  size  for  each  of  the 
skeletons  before  they  feel  comfortable  prioritizing.  If  prioritiza¬ 
tion  feels  impossible,  it  might  mean  that  the  business  goals  are 
not  clear.  This  can  lead  to  some  delicate  conversations.  In  gen¬ 
eral,  remind  the  stakeholders  that  you  want  to  prioritize  the 
skeletons  so  that  they  accurately  reflect  the  company’s  vision  for 
itself  and  the  product.  The  stakeholders  must  be  involved  in  the 
prioritization  of  the  skeletons,  because  they  are  the  ones  who 
are  setting  the  course  for  the  business.  Be  willing  to  go  back  and 
try  to  find  more  data  if  necessary. 
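Some teams find it useful to record the stakeholders' ratings on the four criteria and compute a rough ordering as a starting point for discussion. The following sketch assumes a simple weighted sum of 1-5 ratings; the weights, ratings, and skeleton names are all hypothetical, and the ranking is done within each category, as recommended above.

```python
# Hypothetical equal weights for the four criteria discussed above.
WEIGHTS = {
    "frequency_of_use": 1.0,
    "market_size": 1.0,
    "revenue": 1.0,
    "strategic_importance": 1.0,
}

# Invented skeletons with 1-5 ratings agreed on in a stakeholder meeting.
skeletons = [
    {"name": "Frequent flyer", "category": "business traveler",
     "ratings": {"frequency_of_use": 5, "market_size": 3,
                 "revenue": 4, "strategic_importance": 4}},
    {"name": "Occasional conference attendee", "category": "business traveler",
     "ratings": {"frequency_of_use": 2, "market_size": 4,
                 "revenue": 3, "strategic_importance": 2}},
    {"name": "Family with young children", "category": "families",
     "ratings": {"frequency_of_use": 2, "market_size": 5,
                 "revenue": 3, "strategic_importance": 3}},
    {"name": "Multi-generation trip planner", "category": "families",
     "ratings": {"frequency_of_use": 1, "market_size": 2,
                 "revenue": 4, "strategic_importance": 5}},
]


def score(ratings, weights=WEIGHTS):
    """Weighted sum of the ratings for one skeleton."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def prioritize(skeletons):
    """Rank skeleton names within each category, highest score first."""
    by_category = {}
    for skeleton in skeletons:
        by_category.setdefault(skeleton["category"], []).append(skeleton)
    return {
        category: [s["name"] for s in
                   sorted(items, key=lambda s: score(s["ratings"]), reverse=True)]
        for category, items in by_category.items()
    }


for category, names in prioritize(skeletons).items():
    print(category, "->", names)
```

A score like this is only a strawman. If the stakeholders disagree with the ordering, that disagreement is usually more informative than the number, and adjusting the weights is one way to make the underlying business priorities explicit.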

Step  5:  Develop  Selected  Skeletons  into  Personas 

You now have a reduced set of basic skeletons. Your task
at this point is to enrich these skeletons to become personas
by adding additional data, as well as concrete and individualized
details to give them personality and context. You’ll also include
some storytelling elements and photos to make them
come to life.

As  you  build  on  your  skeletons,  all  of  the  details  of  your  per¬ 
sonas  will  be  encapsulated  in  a  “foundation”  document  (see 
Fig.  51.5).  Foundation  documents  contain  the  complete  defini¬ 
tion  of  a  given  persona,  but  they  do  not  have  to  be  long  or  dif¬ 
ficult  to  create.  Depending  on  your  goals  and  the  needs  of  your 
team,  your  foundation  document  could  range  from  a  single 
page  to  a  long  document.  Creating  a  foundation  document  for 
each  persona  will  provide  you  and  your  team  with  a  single  re¬ 
source  you  can  harvest  as  necessary  while  creating  your  persona 
communication  materials.  At  the  very  least,  complete  personas 
must  include  core  information  essential  to  defining  the  per¬ 
sona — the  goals,  roles,  behaviors,  segment,  environment  and 
typical  activities  that  make  the  persona  solid,  rich  and  unique — 
and  more  importantly,  relevant  to  the  design  of  your  product. 

If  you  are  extremely  time-  and  resource-constrained,  you  can 
start  with  brief  one-page  or  resume-style  foundation  documents. 
Then,  as  you  find  the  time,  you  can  always  come  back  and  add  to 
the  information  in  these  short  foundation  documents. 

When  you  are  deciding  which  characteristics  to  include  in 
your  foundation  documents,  think  about  the  types  of  informa¬ 
tion  that  will  be  most  helpful  to  your  core  team  and  to  the  de¬ 
velopment  team.  We  recommend  you  include  at  least  rudimen¬ 
tary  information  in  each  of  the  following  categories: 

• Photo(s)

•  Name 

•  Age 

•  Goals/motives/concerns 

•  Personal  details/family  life 

•  Income/Spending  habits 

•  Work/job  details 

•  Use  Environment/artifacts 

•  Activities/use  scenario 

•  Knowledge/skills/abilities 

•  Likes/dislikes 

•  Quotes 

•  Market  Size/Influence 

To  create  the  content  of  these  sections,  you  will  turn  the 
characteristics  in  your  skeleton  personas  into  very  specific, 
more  concrete  values.  These  specific  details  turn  your  collection 
of  facts  into  representations  of  people.  For  example, 

•  High-Tech,  publicly  traded  companies  becomes  “PrinterCo.” 

•  Works  in  a  communications  role  becomes  Marketing  Manager. 

•  70%  Female  becomes  Megan,  Dianne,  Irene,  and  so  forth. 

•  Lives  in  a  major  metropolitan  city  becomes  Chicago,  LA,  or 
Houston. 

As you replace factoids with specific details to enrich your
persona, copy the factoid or set of factoids into a comment or
a footnote in your foundation document. A lofty but worthy goal
is to have every statement in your foundation document supported
by user data. You will not likely achieve this, but the attempt
helps you to think critically about your details, and highlights
places where you might want to do further research.
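If the foundation document is kept electronically, the link between each specific detail and its supporting factoids can be checked mechanically. This is a minimal sketch under assumed names: the list-of-dictionaries layout and the `unsupported` helper are illustrative, and the last statement is an invented detail included to show what an unsupported entry looks like.

```python
# Each entry pairs a persona statement with the factoids it was derived from.
foundation_document = [
    {"statement": "Megan is a marketing manager at PrinterCo.",
     "support": ["Works in a communications role",
                 "High-tech, publicly traded companies"]},
    {"statement": "Megan lives in Chicago.",
     "support": ["Lives in a major metropolitan city"]},
    # Invented detail with no factoid behind it (hypothetical example).
    {"statement": "Megan cycles to work.",
     "support": []},
]


def unsupported(document):
    """Statements that cannot yet be traced back to any factoid."""
    return [entry["statement"] for entry in document if not entry["support"]]


for statement in unsupported(foundation_document):
    print("No supporting data:", statement)
```

Entries flagged this way are candidates for further research, or for removal if they add nothing to the design.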

Moving  toward  precision  means  moving  away  from 
accuracy.  In  many  cases,  the  accuracy  of  your  data  lies  in  its 
ranges  (not  just  central  tendencies,  but  descriptors  of  variance, 
percentages,  and  skew),  and  by  selecting  precise  descriptors, 
you  are  going  to  lose  some  of  that  accuracy.  For  example,  if  a 
category  includes  males  and  females,  you  cannot  create  a  sin¬ 
gle  individual  who  “represents”  the  entire  category.  Rather  than 
trying  to  represent  every  nuance  of  the  entire  category,  try  to 
pick  values  that  are  reasonable,  believable,  and  meaningful. 

Think  of  your  data,  and  your  categories  and  subcategories 
of  users,  as  describing  neighborhoods  of  related  users  of  your 
product.  As  you  create  your  personas,  you  are  describing  a  sin¬ 
gle,  specific  “resident”  of  each  neighborhood;  as  in  real  life, 
each  resident  inhabits  his  or  her  neighborhood,  but  no  one  res¬ 
ident  can  represent  all  the  qualities  of  all  the  people  in  the 
neighborhood. 

Incorporate  narrative  and  storytelling  elements. 

Enriching  your  terse  skeletons  into  personas  that  are  realistic 
and  engaging  requires  some  storytelling.  To  do  this  well,  con¬ 
sider  that  you  are  trying  to  “tell  the  story”  of  the  data  in  your 
foundation  documents  with  narrative.  How  do  your  personas 
sound  and  act?  What  can  they  do  or  not  do?  Turn  your  factoids 
and  specific  details  into  a  running  story — a  sequence  of  actions 
and  events  with  interaction  and  a  plot.  Demonstrate  their  in¬ 
teractions  with  people,  objects,  and  systems.  In  the  best  case, 
these  stories  are  modeled  on  real,  representative  cases  revealed 
through  qualitative  research. 

Narratives  in  persona  documents  are  usually  written  in  third 
person,  active  voice.  Here  is  an  example  of  a  beginning  descrip¬ 
tive  overview  for  a  marketing  manager  named  Megan  written 
as  a  narrative: 

Megan  is  the  product  marketing  manager  for  PrinterCo,  a  leading 
printer  manufacturer.  Part  of  Megan’s  job  is  to  create  and  foster  the 
company’s  image  as  a  cutting-edge  technology  company.  Her  primary 
function  is  to  develop  a  good  story  about  her  company  and  product, 
and  to  communicate  that  story  as  widely  as  possible.  She  spends  her 
time  thinking  about  how  to  educate  people  about  PrinterCo’s  products 
and  to  build  relationships  between  customers  and  PrinterCo. 

Be  careful  when  evoking  stereotypes  or  any  information  that 
could elicit a strong personal response. When in doubt, choose
to  include  details  that  help  others  see  your  persona  as  a  real 
person,  with  particular  and  understandable  goals,  needs,  and  in¬ 
terests.  Allow  realism  to  win  out  over  political  correctness;  avoid 
casting  strongly  against  expectations  if  it  will  undermine  credi¬ 
bility.  Break  the  mold  if  it  helps  get  people  on  board  with  your 
effort.  Alan  Cooper  (1999)  addressed  this  issue  by  stating,  “all 
things  being  equal,  I  will  use  people  of  different  races,  genders, 
nationalities,  and  colors”  (p.  128). 

Illustrate  your  personas.  Each  persona  needs  a  face, 
or a photo or set of photos, to make them real. We believe photos
or illustrations are critical; they help your team believe in the
personas and understand that each persona describes a single
person.  The  choice  of  what  specific  photos  to  use  is  a  hard  one. 
These  illustrations  of  your  personas  are  extremely  influential 
and  can  significantly  affect  how  your  personas  are  perceived. 

A  photo  is  more  than  just  a  face;  the  model’s  clothing,  ex¬ 
pression,  activity,  and  general  appearance  along  with  the  setting 
and  background  will  all  communicate  or  dictate  some  of  the 
characteristics  of  your  persona.  You  can  either  take  advantage  of 
this  fact  or  continually  fight  it.  In  general: 

•  Avoid  “slick”  stock  photos.  Photos  of  models  look  like  photos 
of  models — not  of  real,  everyday  people.  A  great  alternative  is 
to  take  photos  of  friends-of-friends.  Do  not  use  photos  of  a 
person  that  anyone  on  the  product  team  knows.  If  you  do  take 
your  own  photos,  take  many!  You  can  use  different  photos  at 
different  times  to  keep  people  interested  in  the  personas. 

•  If  you  cannot  take  your  own  photos,  look  for  photos  online. 
Flickr.com  is  a  great  example  of  an  online  photo-sharing  site, 
and  you  can  look  in  the  Flickr  “Creative  Commons”  area  for 
work  by  photographers  who  are  willing  to  let  others  use  their 
photos  royalty-free.  Do  a  search  for  “portraits”  to  find  hun¬ 
dreds  of  potential  personas! 

If  possible,  include  multiple  photos  in  your  foundation  doc¬ 
uments  to  illustrate  your  persona. 


Step  6:  Validate  Your  Personas 

Once you have added details, it is important to double-check
to make sure your final personas still reflect your data. Your goal
is to ensure that you did not stray too far away from your data
when you made their characteristics specific and concrete and
added elements of storytelling. While it is true that personas
cannot and do not need to be completely accurate, you do want
to ensure that they reflect the essential information about your
target users that you found in your data. If you built assumption
personas, you should ensure that the personas you created
really do capture the assumptions in your organization.

About once a month she has the need to communicate to a larger dispersed audience,
which she currently does by sending her audience members a PowerPoint presentation
and scheduling a conference call. On the call, she tells everyone when to flip to the next
slide.

Megan is beginning to feel that these conference call presentations aren't working very
well for her for several reasons:

* She needs to be able to reach larger audiences and her conference call solution is too
expensive

* She wants more control over the presentation; she doesn't like sending out her
PowerPoint deck

* Sometimes people want information she didn't predict they would want, and therefore

FIGURE 51.5. A portion of an example persona foundation document.
Note that the callouts on the side of each page are “factoid”
references.

To  validate  your  personas,  you  can  do  one  or  more  of  the 
following: 

•  Review  your  personas  against  the  original  data  sources. 

•  Have  experts,  those  that  are  closest  to  your  users,  review 
your  personas. 

•  Have  representative  users  of  each  persona  review  “their”  per¬ 
sona. 

•  Conduct  “reality  check”  site  visits. 

•  Conduct  large  sample  surveys  or  interviews  and  apply  statis¬ 
tical  analysis  or  modeling. 

These  five  approaches  are  not  mutually  exclusive,  nor  are 
they  the  only  means  to  validate  your  personas.  Treat  the  valida¬ 
tion  process  as  an  opportunity  to  gather  even  more  data  about 
your  users,  incorporating  significant  findings  back  into  the  per¬ 
sona  definitions. 
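
As a rough illustration of the first approach, reviewing personas against the original data sources, the check below assumes that each persona detail is stored together with the factoid references that support it. The data layout, the function name unsupported_details, and the factoid IDs are hypothetical and are not part of the method described in this chapter; treat this as a sketch under those assumptions rather than a prescribed tool.

```python
def unsupported_details(persona_details, known_factoids):
    """Return persona details that cite no factoid, or cite one missing from the data.

    persona_details: dict mapping a detail (str) to a list of factoid IDs (hypothetical layout).
    known_factoids:  dict mapping factoid ID to the factoid text from your data sources.
    """
    problems = []
    for detail, cited_ids in persona_details.items():
        missing = [fid for fid in cited_ids if fid not in known_factoids]
        if not cited_ids:
            problems.append((detail, "no supporting factoid cited"))
        elif missing:
            problems.append((detail, "unknown factoid(s): " + ", ".join(missing)))
    return problems


if __name__ == "__main__":
    # Factoid IDs are invented for illustration; the wording echoes the Megan example.
    factoids = {
        "F12": "Presents to a larger dispersed audience about once a month",
        "F31": "Conference call solution is too expensive",
    }
    megan = {
        "Presents to a dispersed audience monthly": ["F12"],
        "Finds conference calls too expensive": ["F31"],
        "Prefers video to slides": [],  # invented detail with no data behind it
    }
    for detail, reason in unsupported_details(megan, factoids):
        print(detail, "->", reason)
```

Any detail this flags is a candidate for either removal or further research, which fits the advice to treat validation as an opportunity to gather more data.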

Completed  personas  do  not  mark  the  end  of  user 
research.  When  you  finish  the  creation  of  your  personas,  you 
may  be  tempted  to  think  that  you  do  not  need  to  further  un¬ 
derstand  (do  research  on)  or  involve  real  users  in  the  develop¬ 
ment  of  your  product.  From  our  perspective,  this  could  not  be 
further  from  the  truth.  We  believe  personas  are  a  great  starting 
point  for  understanding  and  incorporating  user  information  in 
the  development  cycle. 

Phase 3: Persona Birth & Maturation

Personas  are  typically  created  and  communicated  as  static  doc¬ 
uments  or  posters  that  provide  a  snapshot  of  interesting  and  rel¬ 
evant  information  about  users.  In  fact,  the  foundation  docu¬ 
ment  we  describe  above  is  a  case  in  point  (see  Fig.  51.5).  These 
artifacts  have  proven  helpful,  largely  because  they  help  make  in¬ 
formation  about  users  highly  accessible,  engaging,  and  memo¬ 
rable  to  people  making  decisions.  Such  representations  of 
users,  however,  are  not  “alive.”  They  are  depicted  as  motion¬ 
less  portraits,  usually  contained  within  a  single,  finite  document 
and presented as such. There is no room for growth or development.
That is, unlike a character in a book or film, such descriptions
do not evolve. Moreover, the team using them is supposed
to “get to know them” almost instantly. When we get to
know  a  friend,  neighbor,  colleague,  or  even  a  character  in  a  fa¬ 
vorite  book  or  TV  show,  we  build  up  an  understanding  of  them, 
a  relationship  with  them.  Once  we  know  people,  we  are  able  to 
understand  why  they  do  what  they  do,  what  they  want,  and 
what  they  need.  Engendering  this  level  of  understanding  is  the 
next  frontier  for  user  representation. 

We  believe  you  must  enable  the  personas  to  “come  to  life,” 
allowing  them  to  be  alive  and  to  develop  in  the  minds  of  the 
people  using  them.  Toward  this  end,  we  propose  that  persona 
practitioners  must 

•  Embrace  the  challenge  of  communicating  information  about 
users through narrative and storytelling.

•  Incorporate a variety of formats and media to communicate
the  essential  persona  characteristics. 

•  Maintain  a  lifecycle  perspective  when  educating  colleagues 
about  the  personas. 

•  Allow  the  people  using  the  personas  to  extrapolate  from  and 
extend  them. 

In  other  words,  personas  should  be  more  than  a  static  col¬ 
lection  of  facts.  Personas  are  compelling  stories  that  unfold 
over  time.  To  be  very  clear,  we  are  not  suggesting  that  per¬ 
sonas  change  drastically  over  time,  take  on  new  characteristics, 
or  develop  new  skills;  they  are  not  to  be  moving  targets.  We 
believe  that  successful  personas  and  persona  efforts  are  built 
progressively;  just  as  we  get  to  know  people  in  our  lives,  we 
must  get  to  know  personas  (and  the  data  they  contain)  by  de¬ 
veloping  a  relationship  with  them.  No  single  document,  read 
in  a  few  minutes  or  posted  on  a  wall,  can  promote  the  kind  of 
rich,  evolving  relationship  with  information  about  users  that 
is  the  cornerstone  of  good  product  development.  No  single 
document  can  contain  the  wave  of  scenarios  and  stories  that 
your  personas  will  inspire.  Personas  must  be  aided,  if  they  are 
to  live  in  the  minds  of  your  colleagues.  As  long  as  the  per¬ 
sonas  are  well  built,  data-driven,  and  thoughtfully  communi¬ 
cated,  the  product  team  can  use  the  personas  that  come  to 
exist  to  generate  new  insights  and  seek  out  the  right  details 
when  they  need  them. 

Like  parents  sending  young  children  off  to  school,  you  and 
your  core  team  will  send  your  personas  into  your  organization 
to  interact  with  other  people.  The  personas  are  fully  formed  but 
may  continue  to  evolve  as  your  team  becomes  familiar  with 
them.  Problems  at  this  phase  can  lead  to  a  lack  of  acceptance 
or  visibility  or  personas  that  “die  on  the  vine”  and  disappear 
from  the  project.  More  subtly,  your  personas  may  come  to  be 
misconstrued  and  misinterpreted.  Successful  persona  Birth  & 
Maturation  requires  a  strong,  clear  focus  on  communication  to 
ensure  that  your  personas  are  not  just  known  and  understood, 
but  adopted,  remembered,  and  used  by  the  product  team.  The 
Birth  &  Maturation  process  includes 

•  Creating  a  persona  campaign  plan  to  organize  your  work  in 
Birth  &  Maturation  and  Adulthood. 

•  Introducing  the  personas  (and  the  persona  method)  to  the 
product  team. 

•  Ensuring  that  the  personas  are  understood,  revered,  and 
likely  to  be  used  (e.g.,  creating  artifacts  to  progressively  dis¬ 
close  persona  details). 

•  Managing  the  minor  changes  to  the  persona  descriptions  that 
become  necessary  after  the  personas  are  introduced. 

At  this  phase,  you  must  be  prepared  to  answer  the  difficult 
questions  that  will  inevitably  come  up  as  you  introduce  the 
personas.  You  will  have  to  be  prepared  to  discuss  (a)  the  pro¬ 
cess  you  used  to  create  the  personas,  (b)  their  utility,  (c)  the 
ways  you  would  like  the  product  team  to  use  the  personas,  and 
(d)  the  ways  you  intend  to  measure  the  value  of  the  persona 
effort.  The  work  you  did  in  the  family  planning  phase  will  come 
in  quite  handy  as  you  prepare  your  answers  to  these  inevitable 
questions. 


Using  Personas  to  Build  Teams 

Aviva Rosenstein, Manager, Design Research, Yahoo! Media Group

The  Yahoo  Photos  team  supports  an  application  for  sharing, 
printing  and  organizing  personal  photos.  This  team  has  been 
in  the  vanguard  in  adopting  Agile  development  approaches 
at  Yahoo  as  an  alternative  to  the  traditional  waterfall  devel¬ 
opment  process.  To  make  this  transition  to  an  Agile  ap¬ 
proach  successful,  we  needed  to  find  ways  to  bring  engi¬ 
neers  and  designers  together  into  a  single,  cohesive  team. 

When  introducing  the  persona  set  to  the  project  team,  I 
used  our  design  personas  to  create  connections  between  the 
engineering  and  design  staff  while,  at  the  same  time,  famil¬ 
iarizing  the  entire  team  with  our  target  customers,  by  creat¬ 
ing  a  simple  game  requiring  knowledge  of  each  of  the  per¬ 
sonas’  characteristics.  The  design  team  scheduled  a  happy 
hour  to  introduce  the  personas,  created  handouts  and 
posters  with  information  about  each  persona,  and  invited 
the  entire  product  team,  including  all  the  designers,  devel¬ 
opers  and  product  managers  assigned  to  the  property,  to 
come  play  the  game  (see  Fig.  51.6). 

The  exercise  was  a  success;  everyone  had  to  study  the 
persona  artifacts  to  determine  what  questions  they  could 
ask,  which  gave  them  a  head  start  in  internalizing  our  per¬ 
sonas.  In  addition,  the  game  provided  an  enjoyable  context 
for  designers  and  engineers  to  cross  social  barriers  and  begin 
interacting  with  each  other. 

The  persona  artifacts  served  as  “boundary  objects”  (Star 
&  Griesemer,  1989),  a  way  of  bridging  communication  gaps 
between  disparate  functional  and  organizational  worlds.  The 
exercise  itself  broke  down  the  barriers  that  existed  between 
the  engineering  and  design  groups  and  allowed  the  design 
team  to  get  the  rest  of  the  developers  to  empathize  with  the 
needs  of  our  customers,  but  the  actual  impact  of  the  persona 
artifacts  went  even  further.  After  this  exercise,  team  mem¬ 
bers  commonly  and  naturally  began  to  use  our  personas  to 
refer  to  actual  customers  and  their  needs,  in  storyboarding, 
use cases, and requirements documentation. Project management
used the personas to communicate value propositions
with executive staff. Marketing used the persona artifacts
to communicate target audience characteristics to our
outside  PR  agency.  Design  management  used  the  handouts 
to  communicate  requirements  to  an  outside  design  firm  re¬ 
tained  to  develop  a  specific  part  of  the  service.  It  gave  all  of 
the  teams  a  shared  language  for  talking  about  our  business. 
In  addition,  introducing  the  persona  set  in  an  informal  and 
engaging  way  encouraged  adoption  among  the  various  dis¬ 
ciplines  contributing  to  the  overall  success  of  the  project. 


Persona  Artifacts 

You can use a variety of methods to communicate personas
to the members of your product team, including websites, posters,
illustrations, Word documents, Visio diagrams, live actors, and
videos. Remember that the artifacts and materials you create to
communicate information about personas are very important:
they are the user interface for your personas and the data behind
them. Well-thought-out and well-designed persona materials
can add credibility to your entire persona effort and help
enormously with your persona communication campaign.

FIGURE 51.6. The Yahoo! Photo persona artifacts helped
bridge a communication gap between functional and organizational
worlds.

We  define  three  major  categories  of  persona  artifacts: 

1. Buzz Generators. These artifacts should be designed to
build up anticipation about the introduction of the personas.
They are usually posters with relatively little information on
them. Buzz generators can give hints about the fact that personas
are coming (e.g., a poster that says “do you know who
your users are?”) or begin to introduce the personas themselves
(e.g., a wanted-style poster showing just a persona’s
photo, name, and perhaps role: “Meet Barry the Business
Traveler”).

2. Comparison Facilitators. Comparison facilitators are helpful
after everyone has been introduced to the personas. These
artifacts  should  be  designed  to  help  people  understand  key 
differences  between  the  personas.  For  example,  you  could 
create  a  table-style  poster  that  lists  the  personas  across  the 
top  and  highlights  different  goals,  technical  abilities,  chal¬ 
lenges,  and  so  forth  across  each  row. 

3.  Enrichers.  As  the  design  and  development  process  contin¬ 
ues,  you  can  use  enricher  artifacts  to  refresh  everyone  on 
specific  aspects  of  each  persona.  For  example,  when  the 
team  begins  working  on  security  features,  you  could  create 
enricher  artifacts  that  describe  the  security  challenges  for 
each  of  the  personas.  You  might  also  send  out  a  monthly  e- 
mail  “from”  each  persona  to  update  the  team  on  their  goals 
and  needs.  You  can  also  create  fun  artifacts  as  enrichers;  for 
example,  you  can  hand  out  candy  bars  that  have  new  “per¬ 
sona”  labels  listing  salient  facts  about  each  persona.  Anything 
that  helps  keep  the  personas  fresh  and  alive  in  the  minds  of 
the  team  is  a  good  thing! 
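
The table-style comparison facilitator described above (personas across the top, one attribute per row) can be drafted as plain text before anyone designs a poster. The sketch below is only an illustration: the function name comparison_table, the attribute names, and the details given for Barry are assumptions made for this example, while Megan's entries paraphrase the example persona in this chapter.

```python
def comparison_table(personas, attributes):
    """Build a plain-text table: personas across the top, one attribute per row.

    personas:   dict mapping persona name to a dict of attribute -> short text.
    attributes: list of attribute names, in the order the rows should appear.
    """
    names = list(personas)
    header = ["Attribute"] + names
    rows = [header]
    for attribute in attributes:
        # A missing attribute is shown explicitly so gaps in the data stay visible.
        rows.append([attribute] + [personas[name].get(attribute, "(unknown)") for name in names])
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = []
    for index, row in enumerate(rows):
        lines.append(" | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip())
        if index == 0:
            lines.append("-+-".join("-" * width for width in widths))
    return "\n".join(lines)


if __name__ == "__main__":
    personas = {
        "Megan": {
            "Role": "Product marketing manager",
            "Goal": "Reach larger audiences",
            "Challenge": "Conference calls are too expensive",
        },
        # Barry's details are invented purely to show a second column.
        "Barry": {
            "Role": "Business traveler",
            "Goal": "Present while on the road",
        },
    }
    print(comparison_table(personas, ["Role", "Goal", "Challenge"]))
```

Keeping the attribute list short follows the same advice given for the posters themselves: a comparison that tries to show everything is harder to read than one that highlights a few key differences.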

Note  that  it  does  not  take  a  lot  of  these  artifacts  to  have  an  ef¬ 
fective  communication  campaign.  Be  very  strategic  and  frugal  in 
your  choices.  Approach  the  creation  and  distribution  of  your 
persona  artifacts  carefully: 

Agree on the specific goal of the artifact. Why are you
creating  this  specific  artifact?  The  goal  will  probably  be  related 
to  one  of  the  three  categories  of  artifacts  we  previously  de¬ 
scribed. 

Agree on the audience, timing, and distribution
method for the artifact. Your persona artifacts should eventually
be everywhere around your office (in hallways, coffee
room, doors, meeting rooms, stakeholders/leaders’ offices, etc.),
but they should appear progressively. For every artifact, consider
who is going to see it, when (in the development cycle) they are
going to see it, and how the environment will affect their ability
to digest the information. For example, you might decide to create
different “buzz generator” posters for the developers’ hallway
versus the marketers’ hallway. If you work in a place that does not
allow posters and such to be displayed around the building, create
artifacts that can be handed out to individuals, carried around,
or placed on desktops.

Agree  on  the  information  elements  that  should  (and 
should  not)  be  included  on  the  artifact.  By  the  time  you 
are  ready  to  create  persona  artifacts,  you  will  have  quite  a  bit  of 
information  about  each  persona  at  your  disposal.  The  informa¬ 
tion  you  have  will  all  seem  highly  relevant  and  deeply  interre¬ 
lated,  and  it  can  therefore  be  difficult  to  comb  out  small  snip¬ 
pets  to  include  on  individual  artifacts.  Remember  that  the  easiest 
way to create a useless persona artifact is to overload it with
information. For example, you might decide to create “wanted”
posters  to  create  buzz  and  to  convey  the  name,  role,  and  picture 
for  each  of  your  primary  personas.  It  will  be  tempting  to  include 
a  quote  and  maybe  a  few  bulleted  details  with  additional  infor¬ 
mation.  Remember  your  priorities;  if  you  really  do  want  to  build 
buzz  and  interest,  consider  limiting  the  poster  to  just  a  photo, 
a  name,  and  a  role.  When  in  doubt,  always  opt  for  less  informa¬ 
tion  and  leave  your  audience  craving  more. 

Agree  on  the  relative  priorities  of  the  information 
elements  on  the  artifact.  Once  you  decide  which  informa¬ 
tion  elements  should  be  included  on  an  artifact,  prioritize  these 
elements  according  to  how  important  it  is  that  the  element  is 
read  and  understood.  For  example,  on  the  “wanted”  posters, 
the  photo  and  name  should  probably  be  very  large  and  eye¬ 
catching.  In  contrast,  a  comparison  poster  you  distribute  a  few 
weeks  or  months  later  should  include  names  and  roles,  but 
these  are  probably  not  as  important  as  comparative  information 
about  each  persona’s  goals,  abilities,  desires,  and  so  forth.  See 
Fig.  51.7  for  example  posters  in  each  of  the  three  categories  of 
artifacts. 

If  you  have  limited  resources  (e.g.,  very  little  money  to  use 
on  persona  artifacts),  think  carefully  about  the  artifacts  you  will 
need  now  and  try  to  predict  what  you  will  need  later.  Do  not  use 
your  entire  budget  on  artifacts  you  will  distribute  early;  remem¬ 
ber  that  you  still  face  the  challenge  of  keeping  the  personas  alive 
and  useful  throughout  the  Adulthood  phase. 

Phase  4:  Persona  Adulthood 

Personas  are  “all  grown  up”  in  the  adulthood  phase,  and  they 
have  a  job  to  do.  You  have  now  introduced  the  personas  to  the 
product  team  and  worked  to  clarify  the  role  and  importance  of 
the  personas.  You  have  encouraged  the  product  team  to  em¬ 
brace  the  personas  and  the  information  they  contain,  and  now 
it  is  time  to  help  everyone  use  the  personas  to  inform  the  de¬ 
sign  and  development  of  the  product. 

Adult  personas  are  ready  to  be  put  to  work  in  a  variety  of 
ways.  Personas  can  help  by  answering  difficult  questions  and 
by  focusing  activities  in  a  way  that  takes  the  guesswork  out  of 
making  customer-driven  decisions.  Personas  can  participate  in 
your  product  planning,  design,  and  development  process  by 

•  Being  present  at  your  meetings  and  representing  the  voice  of 
your  customer  throughout  the  development  cycle. 

•  Providing consistency through a common reference
point  across  your  organization,  even  in  a  highly  chaotic,  fast- 
moving,  ever-changing  environment. 

•  Providing  a  way  for  all  of  the  product  teams  to  touch  base 
using  a  common  language,  and  to  ensure  that  everyone  is 
staying focused on creating a good experience for the right
audiences. 

Personas can only be involved and helpful in these ways if
they inhabit your workspaces and attend your meetings. Even
though they are not real people, personas can become the most
powerful voices in the room. Large wall posters, which introduce
your personas, should be placed in every meeting room as well
as other common spaces where team members discuss product
design, features, and overall user experience. The important
thing is that the personas’ presence is felt. You want your teammates
to refer to the personas in their everyday work. Hundreds
of tiny decisions are made each day, and you want these decisions
to be made in consideration of your target audiences.

FIGURE 51.7. Examples of a buzz generator, an enricher, and a
comparison facilitator poster for the online presentation software
personas.

The  more  the  persona  names  are  used,  the  more  likely  it  is 
that  everyone  in  your  organization  will  accept  the  fact  that  the 
personas  and  the  user  data  they  represent  are  here  to  stay.  The 
more  the  persona  names  replace  the  word  user  in  documents 
and  conversation,  the  more  likely  it  is  that  the  persona  data  will 
shape  your  product.  We  suggest  that  you  ban  the  word  user 
across your organization. Find the owner of each and every document
that will help to define or describe your product and encourage
them to use the names of the personas instead of the
word user. You have succeeded if you can search the Product
Vision  Statement,  Business  Plan,  Technical  Spec,  Marketing  and 
Messaging  Plans  and  find  no  instance  of  the  word  user. 
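
One way to run that search is with a short script. The sketch below assumes the planning documents have been exported to plain-text files in a single folder; the function names, the folder layout, and the "*.txt" pattern are illustrative assumptions, not part of the persona method itself.

```python
import re
from pathlib import Path

# Matches "user", "users", and "user's" as whole words, case-insensitively.
# It deliberately does not match longer words such as "username" or "superuser".
USER_WORD = re.compile(r"\busers?\b", re.IGNORECASE)


def find_user_mentions(text):
    """Return (line_number, line) pairs for lines that contain the word 'user'."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if USER_WORD.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_documents(folder, pattern="*.txt"):
    """Scan plain-text files in a folder; return {file name: hits} for files with hits."""
    report = {}
    for path in sorted(Path(folder).glob(pattern)):
        hits = find_user_mentions(path.read_text(encoding="utf-8"))
        if hits:
            report[path.name] = hits
    return report


if __name__ == "__main__":
    # "exported_docs" is a hypothetical folder name; point this at your own exports.
    for name, hits in scan_documents("exported_docs").items():
        for number, line in hits:
            print(f"{name}:{number}: {line}")
```

An empty report means the documents refer to your personas by name throughout; each reported line is a place where a persona name could replace the generic term.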

Use Personas for Product Planning, Design,
Evaluation, and Release

One  obvious  way  to  use  personas  is  to  “test”  design  ideas  by 
asking  questions  such  as,  “Will  Sally  want  to  use  this  feature?” 
We  have  found  that  this  is  only  one  of  the  many  ways  personas 
can  be  involved  in  the  product  design  and  development 
process.  The  effective  persona  practitioner  must  understand  the 
many  other  ways  personas  can  be  involved  in  existing  processes, 
and  ensure  that  the  personas  work  hard  in  an  organization  dur¬ 
ing  the  entire  development  process. 

Personas  can  be  used  to  (a)  inform  quality  assurance  test 
cases, (b) recruit usability test participants, (c) make high-level
and  detailed  feature  priority  decisions,  and  (d)  communicate 
product  direction.  Personas  can  also  inform  marketing,  adver¬ 
tising,  and  sales  strategy. 

Personas  and  product  planning.  Personas  can  help 
your  team  envision  what  your  new  product  should  do,  under¬ 
stand  the  context  into  which  you  will  launch  your  product  and 
the  kinds  of  problems  it  needs  to  solve  if  it  is  going  to  be  suc¬ 
cessful.  Now  that  you’ve  created  your  personas,  you  can  ask  the 
personas  to  “tell  you  their  stories;”  the  needs,  goals,  and  con¬ 
texts  you  so  carefully  included  in  your  persona  descriptions  will 
now  allow  you  to  generate  helpful  stories  about  the  way  your 
product  will  be  used  and  the  actions  (and  reactions)  it  should 
elicit. 

Personas  can  help  in  the  product  planning  process  by  helping 
you  both  discover  important  features  and  evaluate  the  relative 
values  of  each  feature.  You  can  use  personas  to  help  you  under¬ 
stand  and  capture  your  user  and  system  requirements  through 

•  Persona  narratives  and  storytelling. 

•  Persona-focused  competitive  reviews. 

•  Persona-focused  feature  brainstorming. 

•  Persona-focused  evaluation  of  proposed  features. 

All  of  these  involve  taking  on  the  perspective  of  your  per¬ 
sonas  to  review  the  competition,  your  ideas  for  features,  and  so 
forth.  For  example,  you  can  do  a  persona-focused  competitive 
review  by  finding  out  with  which  existing  products  your  new 
product  will  compete.  Your  marketing  team  has  probably  al¬ 
ready  done  this,  and  they  are  a  good  source  for  help  with  this 
exercise.  If  you  can,  buy  a  copy  of  each  competing  product.  (If 
the  products  are  prohibitively  expensive,  you  can  do  this  exer¬ 
cise  using  the  marketing  or  collateral  materials  instead  of  the 
actual  products;  this  will  give  you  insight  into  the  reaction  of 
your  potential  customers  to  the  messages  your  competitors 
have  deemed  to  be  important.)  Once  you  have  access  to  the 
products,  it  is  relatively  easy  to  look  at  them  from  your  per¬ 
sonas’  perspectives. 

Gather the core team and ask one member to “walk through”
the competitive products from the perspective of one of your
personas. As you observe your “persona” walking through the
product, you will find aspects of the product that work well and
some that do not. If members of your product design team are
present, they will come up with ideas for functionality that you
must address in your product and ideas for brand new features.

Personas  and  product  design.  Once  your  organization 
has  a  vision  and  overall  development  plan  in  place,  it  is  time  to  de¬ 
sign  the  elements  of  your  product.  Your  personas  helped  you  un¬ 
derstand  the  big  picture,  and  now  they  can  help  you  make  deci¬ 
sions  about  specific  features  and  design  elements — that  is,  your 
personas can show you how these features appear and behave.

We  have  found  one  process  to  be  particularly  helpful  in 
translating  insights  derived  from  personas  into  ideas  for  product 
designs.  It  is  called  Design  Mapping.  We  will  describe  this 
method  in  depth  in  the  next  section  of  this  chapter  so  that  you 
can try it out for yourself.

Personas  and  product  design  evaluation.  As  your 
team  settles  in  on  the  features  and  specific  solutions  that  it 
needs  to  embrace  to  create  a  successful  product,  your  personas 
can  help  in  honing  the  implementation  of  these  features  toward 
the  very  best  design. 

You  can  use  your  personas  to  help  with  evaluation  of  your 
features  and  solutions  through 

•  Cognitive  walkthroughs  and  design  reviews  with  personas. 

•  User  testing  and  ongoing  user  research  with  persona  profiles. 

•  Quality  assurance  (QA)  testing  and  bug  bashes. 

The best way to incorporate personas into the above
processes is to have the owner of each process (for example,
the QA manager) meet with the persona core team. As a rule,
meet  with  process  owners  earlier  than  you  think  is  necessary. 
This  will  give  you  time  to  figure  out  ways  to  incorporate  per¬ 
sonas  into  the  various  evaluation  processes  your  company  uses 
before  it  releases  any  product. 

Personas  and  product  release.  Now  that  your  product 
is  getting  close  to  being  complete,  it  is  time  to  turn  your  atten¬ 
tion  toward  details  that  are  not  directly  related  to  product  de¬ 
velopment.  You  have  put  a  great  deal  of  effort  into  creating  and 
using  your  personas  to  design  and  build  your  product.  Now  that 
the  product  is  almost  complete,  your  personas  (and  all  of  the 
persona-related  materials  and  tools  you  have  created)  can  be  ex¬ 
tremely  helpful  to  those  responsible  for  documenting,  support¬ 
ing,  and  selling  your  product. 

If  you  have  used  the  personas  throughout  the  design  and  de¬ 
velopment  process,  you  will  have  many  documents  that  talk 
about how the product should work from the point of view of
the  personas.  These  documents  will  be  invaluable  to  the  docu¬ 
mentation,  training,  and  support  professionals  in  your  organi¬ 
zation.  Marketing  and  sales  professionals  will  also  be  able  to  use 
the  personas  (and  related  deliverables)  to  help  craft  materials  to 
support  their  own  work. 

Many  persona  practitioners  have  told  us  that  the  personas 
seem  to  “move  out”  of  the  design  and  development  offices  and 
“move  in”  to  the  documentation,  support,  and  sales  offices  dur¬ 
ing  product  release. 


1008  •  ADLIN  AND  PRUITT 


A Great Tool for Persona Adulthood: Using Design
Maps to Get From Personas to Product Designs

Design  Mapping  is  a  process  that  results  in  a  large  flowchart  cre¬ 
ated  out  of  sticky  notes,  and  depicts  an  individual’s  end-to-end 
experience  of  using  a  tool  or  accomplishing  a  goal.  Maps  are  ar¬ 
tifacts  that  help  you  understand  and  communicate  information 
about  the  ways  people  achieve  their  goals  and  the  ways  they 
could  achieve  their  goals  with  new  tools.  Maps  are  similar  to 
other  participatory  design  tools,  but  are  useful  in  ways  that  we 
have  not  found  other  tools  to  be.  Design  Maps  tell  stories  about 
the  experiences  of  personas  in  the  future. 

Maps  are  helpful  information  gathering  and  design  tools  be¬ 
cause  they  are  easy  to  create,  iterate,  and  read.  If  kept  in  public 
spaces,  they  can  become  an  accessible  source  of  insights  into 
user  experiences  (either  as  they  exist  today  or  as  they  are  envi¬ 
sioned  to  become).  Unlike  prose  documents  or  complex  flow¬ 
charts,  Maps  make  it  easy  to  quickly  extract  and  understand  the 
end-to-end  user  experience  and/or  focus  on  details  of  interest. 

Design  Maps  can  help  you  test  new  experiences  before  you 
build  a  new  product.  Moreover,  Design  Maps  will  help  you  trans¬ 
late  what  you  know  about  your  personas  into  designs  for  new 
experiences  that  your  product  could  support.  Once  you  design 
the  experience  you  want  to  create,  it  is  relatively  easy  to  create 
features  to  support  that  experience. 

What  Are  Design  Maps?

Design  Maps  tell  stories  that  look  into  the  future;  these  sto¬ 
ries  describe  how  your  personas  will  behave  once  your  new 
product  is  built.  Those  familiar  with  scenario-based  design  will 
recognize  that  Design  Maps  have  a  distinct  similarity  to  scenarios 
(see  Carroll,  1995).  Scenarios  are  short  prose  stories  that  describe
how  aspects  of  your  product  will  be,  or  should  be,  used.
Design  Maps  are  both  a  special  type  of  scenario  and  a  process 
by  which  to  create  scenarios  and  modify  them.  Design  Maps  are  a
flow-chart  version  of  many  scenarios  “strung  together”  to  cre¬ 
ate  a  big  picture  of  the  experience  your  product  will  support.  De¬ 
sign  Maps  are  inexpensive  (both  in  terms  of  time  and  materials) 
and  they  are  most  helpful  when  built  before  paper  prototypes, 
and  certainly  before  any  code  is  written  (see  Fig.  51.8). 

Which  Processes  Should  I  Design  Map?

Design  Maps  depict  end-to-end  experiences  (not  specific  fea¬ 
tures  or  widgets).  You  can  create  Design  Maps  to  explore  any 
experience  you  want  to  create  as  it  relates  to  any  aspect  of  your 
new  product’s  design.  The  Design  Maps  you  create  should  ex¬ 
plore  the  ways  that  your  personas  achieve  the  goals  that  you 
have  established  for  them.  Remember  that  their  roles  and  goals 
may  change  in  your  new  designs. 

Design  Map  for  the  “big  picture.”  A  “big  picture”  Map
shows  the  entire  experience  end-to-end  and  therefore  describes 
activities  in  very  broad  terms.  Think  of  this  “overview”  Design 
Map  as  analogous  to  a  map  of  the  United  States  with  a  line  drawn 
on  it  to  show  the  route  of  a  cross-country  driving  trip;  the 
overview  Map  should  give  the  reader  a  general  sense  of  direction 
and  the  order  of  progression,  but  should  not  contain  details. 

Design  Maps  for  achieving  major  milestones.  “Major 
milestone”  Maps  should  “fit  into”  the  overall  Map,  but  should 
explore  individual  goals  and  tasks  more  specifically.  In  the  cross¬ 
country  trip  example,  a  “major  milestone”  Map  would  be  the 
equivalent  of  a  highway  map  showing  the  roads  you  used  to  tra¬ 
verse  a  single  state. 

FIGURE  51.8.  A  Design  Map  exploring  how  Megan  starts  an  online  presentation.  Design  Maps  are
created  by  your  team  and  explore  the  new  experiences  you  are  going  to  build  into  your  product.

Design  Maps  for  critical  details.  “Critical  details”  Maps
should  fit  into  the  milestones  Maps  much  as  the  milestones  fit 
into  the  big  picture.  These  Maps  explore  very  specific  details  of 
particular  tasks,  the  way  an  enlargement  of  a  downtown  area 
shows  the  specific  details  of  the  ways  streets  crisscross  a  city. 

How  to  Create  a  Design  Map 

Maps  show  steps  in  a  process  or  experience  sequentially, 
with  any  questions,  comments,  or  ideas  regarding  these  steps 
arrayed  underneath.  Finished  Design  Maps  are  large  sheets  of 
paper  covered  with  color-coded  Post-It™  (or  “sticky”)  notes  that 
describe  the  user  experiences  related  to  your  product.  Design 
Maps  are  created  by  the  design  team  without  the  participation 
of  users. 

The  elements  of  a  Map:  titles,  steps,  questions,  com¬ 
ments,  and  design  ideas.  All  Maps  have  five  basic  build¬ 
ing  blocks:  (a)  titles,  (b)  steps,  (c)  questions,  (d)  comments,  and 
(e)  design  ideas.  Titles  should  identify  the  goals  or  tasks  the 
Map  depicts.  Steps  should  be  arrayed  horizontally,  with  related 
comments,  questions,  and  design  ideas  arranged  under  the 
steps  they  reference.  You  can  read  across  the  row  of  steps  to  get 
a  sense  of  the  process  from  end  to  end  (e.g.,  the  steps  in  a  task 
taken  to  reach  a  goal),  or  you  can  focus  on  a  subset  of  the  steps 
and  read  down  the  columns  to  understand  related  questions 
and  ideas  (see  Fig.  51.9). 

Steps  (blue  sticky  notes):  These  are  the  “verbs”  or  the
“backbone”  of  the  process.  The  facilitator  of  the  Mapping  exercise
places  steps  horizontally  across  the  Map.  A  good  way  to
elicit  steps  is  to  ask,  “What  will  the  persona  do  next?”  Steps  are
the  building  blocks  of  tasks. 

Comments  (green  sticky  notes):  Comments  are  qualify¬ 
ing  statements  about  steps.  They  are  the  most  flexible  elements 
on  a  Map.  Comments  can  describe  behaviors,  habits,  awareness 
or  lack  of  awareness  of  features  or  alternative  actions,  or  even 
qualities  of  objects.  If  you  hear  an  important  piece  of  informa¬ 
tion,  but  it  is  not  a  step,  question,  or  design  idea,  record  it  as  a 
green  comment. 

For  example,  in  our  Megan  Delivers  the  Presentation  Map,  the 
comment  “Megan  has  already  uploaded  all  of  her  slides”  is  a  note 
about  her  actions  that  could  be  significant  with  respect  to  the  rest 
of  the  experience.  The  comment  is  not  a  step,  but  it  relates  to  the 
step  listed  above:  “Megan  logs  on  to  the  presentation  system.”  The 
comment,  in  this  example,  serves  to  remind  the  facilitator  and 
Map  readers  that  we  are  assuming  that  the  slides  are  uploaded  in 
a  separate,  previous  series  of  actions  in  this  particular  Design  Map. 
If  this  comment  were  to  change  (in  this  example,  if  we  were  to 
change  or  design  the  process  so  that  the  slides  are  uploaded  im¬ 
mediately  before  the  presentation),  it  could  affect  all  of  the  steps, 
comments,  questions,  and  ideas  in  the  remainder  of  the  Map. 

Questions  (yellow  sticky  notes):  Yellow  “questions”  are 
the  most  useful  interview  management  tool  of  the  Mapper. 
When  you  first  start  Mapping  any  process,  you  will  identify  many 
questions,  some  indicating  areas  where  you  need  clarification 
and  some  that  express  your  Mapping  participants’  issues.  In 
fact,  you  will  probably  encounter  so  many  questions  that  the 
sheer  volume  and  importance  of  them  will  threaten  to  derail 
your  attempt  to  Map  the  entire  process.  Listing  the  questions  on 
the  Map  allows  you  to  record  and  move  past  them  quickly  so 
that  you  can  capture  as  much  of  the  process  as  possible  without 
being  derailed.  Once  you  create  a  Map  that  captures  most  of  an
end-to-end  process,  you  can  loop  back  and  track  down  answers
to  the  questions  you  have  identified.

FIGURE  51.9.  Title,  steps,  questions,  comments,  and  design  ideas  in  a  Design  Map.

Create  a  yellow  question  when:

•  You  have  questions  about  the  process. 

•  You  are  not  quite  sure  what  will,  should,  or  could  happen 
next. 

•  Anyone  participating  in  the  Mapping  session  begins  to  bela¬ 
bor  a  point. 

Design  Ideas  (pink  sticky  notes):  As  you  create  your 
Map,  you  will  inevitably  think  of,  or  be  presented  with,  an  as¬ 
sortment  of  ideas  for  specific  features,  widgets,  or  even  things 
like  marketing  messages.  Your  Mapping  goal  is  to  create  a  solid 
picture  of  the  end-to-end  experience  you  want  to  create  for 
your  persona.  While  you  do  not  want  to  allow  your  Mapping  ses¬ 
sion  to  turn  into  a  discussion  of  specific  new  features,  you  also 
should  not  discard  good  ideas  just  because  they  come  up  at  the 
“wrong  time.”  For  example,  in  Fig.  51.8,  one  pink  sticky  note 
says,  “Let’s  create  a  way  for  her  to  flip  through  her  slides  (and 
change  them?)  without  any  audience  members  seeing  this 
process.”  This  is  an  interesting  idea  and  worth  capturing.  Ask 
anyone  who  comes  up  with  a  design  idea  to  record  it  on  a  pink 
sticky  note,  place  it  on  the  Map,  and  move  on. 
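
The building blocks described above can also be sketched as a small data structure, which may be useful if you later transcribe a paper Map into an electronic form. This is only an illustrative sketch, not part of the Design Mapping method itself: the class and method names (`DesignMap`, `Column`, `read_across`, `open_questions`) are assumptions made for this example, and the question and the second step in the sample Map are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class NoteKind(Enum):
    """Note types, with the sticky-note color the chapter assigns to each."""
    STEP = "blue"
    COMMENT = "green"
    QUESTION = "yellow"
    DESIGN_IDEA = "pink"


@dataclass
class Note:
    kind: NoteKind
    text: str


@dataclass
class Column:
    """One step plus the comments, questions, and ideas arranged under it."""
    step: str
    notes: list[Note] = field(default_factory=list)

    def add(self, kind: NoteKind, text: str) -> None:
        # Steps head a column; only the other note kinds go underneath.
        if kind is NoteKind.STEP:
            raise ValueError("a step starts a new column; use DesignMap.add_step")
        self.notes.append(Note(kind, text))


@dataclass
class DesignMap:
    title: str
    columns: list[Column] = field(default_factory=list)

    def add_step(self, text: str) -> Column:
        """Place a new step at the right-hand end of the Map."""
        column = Column(step=text)
        self.columns.append(column)
        return column

    def read_across(self) -> list[str]:
        """The row of steps, giving the end-to-end process."""
        return [column.step for column in self.columns]

    def open_questions(self) -> list[tuple[str, str]]:
        """Every yellow question, paired with the step it sits under."""
        return [
            (column.step, note.text)
            for column in self.columns
            for note in column.notes
            if note.kind is NoteKind.QUESTION
        ]


# Sample Map: the first step and its comment come from the chapter's example;
# the question and the second step are hypothetical.
megan_map = DesignMap(title="Megan Delivers the Presentation")
log_on = megan_map.add_step("Megan logs on to the presentation system")
log_on.add(NoteKind.COMMENT, "Megan has already uploaded all of her slides")
log_on.add(NoteKind.QUESTION, "Is Ivan shown as logged on at this point?")
review = megan_map.add_step("Megan reviews her slides")
review.add(NoteKind.DESIGN_IDEA, "Let her flip through slides privately")

print(megan_map.read_across())
print(megan_map.open_questions())
```

Reading across the columns gives the process from end to end; collecting the yellow notes gives the list of questions to follow up on after the session.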

Facilitating  a  Design  Mapping  session.  Encourage 
Mapping  participants  to  focus  on  the  experience,  not  on  the 
tool.  The  goal  is  not  to  have  a  Map  that  tells  you,  “The  serial 
number  registration  tracking  database  will  feed  the  score 
records  to  the  page  via  ASP”  but  one  that  says,  “Megan  can  see
that  Ivan  is  already  logged  on.” 

During  Design  Mapping  sessions,  remind  your  team  to  con¬ 
sider  the  following  questions: 

•  Do  the  tasks  assigned  to  personas  in  the  Design  Map  corre¬ 
spond  to  your  personas’  skills?  For  example,  if  Sam  the  Sales 
Rep  is  expected  to  spend  an  hour  doing  technical  prepara¬ 
tion  so  that  he  can  attend  the  presentation,  he  is  not  being 
well  served  by  the  new  design.  If  Megan’s  Design  Map  allows 
her  to  answer  questions  by  presenting  new  content  on  the  fly, 
you  are  on  the  right  track. 

•  Does  the  new  process  being  constructed  in  the  Map  offer  un¬ 
deniable  advantages  to  the  personas  over  the  old  way  of  do¬ 
ing  things? 

•  Are  we  assuming  things  have  to  be  done  a  certain  way  just  be¬ 
cause  it  is  the  way  they  are  done  now? 

As  you  move  through  a  Mapping  session,  remember  to  table 
questions  that  might  sidetrack  your  work  by  providing  everyone 
with  yellow  sticky  notes  and  encouraging  participants  to  write 
down  difficult  questions  and  issues  and  post  them  on  the  map. 
During  your  Design  Mapping  session,  you  might  hear  a  com¬ 
ment  such  as,  “Well,  if  we’re  assuming  Sam  can  access  the  In¬ 
ternet  using  a  broadband  connection  from  wherever  he  is,  even 
if  he’s  on  the  road,  we  can  assume  he  won’t  have  any  trouble 
viewing  the  presentation.  It  would  also  be  great  if  we  could  assume
he  has  all  the  media  players  installed,  and  the  latest  version
of  the  browsers.  This  makes  things  easier  for  us.”  This  is  a
good  opportunity  to  refer  to  your  personas.  Do  you  have  infor¬ 
mation  about  Sam’s  technical  setup?  How  likely  is  it  that  his 
computer  will  have  the  latest  media  players?  Is  this  likely  to  be 
something  that  is  done  for  him,  or  something  he  has  to  figure 
out  for  himself?  Your  personas  will  be  able  to  immediately  an¬ 
swer  some  of  these  questions;  other  questions  will  have  to  go 
onto  the  Map  to  be  answered  later.  In  this  particular  example,
you  might  decide  to  create  another  Map  that  assumes  that  Sam 
does  not  have  everything  he  needs  loaded  on  his  machine. 
What  process  are  you  going  to  design  to  make  it  easy  for  him 
to  prepare  for  and  attend  a  presentation  in  this  case? 

Sometimes  you  will  want  to  move  quickly,  placing  blue  steps 
across  the  top  of  the  Map  and  filling  in  details  later.  Other  times 
your  team  might  find  it  most  effective  to  hash  out  the  details  un¬ 
der  each  step  before  moving  to  the  next  one.  In  either  case,  you 
will  want  to  limit  Mapping  sessions  to  two  to  three  hours  each. 

After  each  session,  follow  up  on  any  questions  or  issues 
raised  and  add  answers  to  the  Map.  It  can  be  useful  to  convert 
the  sticky-note  paper  versions  of  your  Maps  into  electronic  versions
in  Microsoft  Visio  or  a  similar  tool.  This  makes  it  easy  for
participants  to  review  progress  and  quickly  scan  for  new  mater¬ 
ial.  The  electronic  versions  are  useful  for  printing  in  various  for¬ 
mats  and  sending  to  stakeholders  for  review  at  a  distance. 


Use  Design  Maps  to  Create  Wireframes 

Wireframes  tend  to  evolve  naturally  from  Design  Maps.  Once 
you  and  your  team  have  agreed  on  the  experience  you  want  to 
facilitate  for  your  personas,  it  is  relatively  easy  to  use  the  steps, 
assumptions,  questions,  and  design  ideas  in  the  Maps  to  create 
wireframes  of  the  product’s  UI. 

With  your  team,  identify  the  columns  of  the  Design  Map 
that  “go  together”  and  should  be  grouped  on  a  single  interface 
(see  Fig.  51.10).  At  this  point,  the  user  interface  designer  or 
graphic  designer  should  be  heavily  involved.  Consider  what  in¬ 
formation  you  are  collecting  from  the  personas  and  when  you 
are  collecting  it  so  that  you  can  plan  to  display  it  on  the  UI  at 
the  right  times  (e.g.,  if  you  have  not  yet  asked  the  persona  for 
his  or  her  name,  you  cannot  create  a  wireframe  for  a  personalized
interface).
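
The timing constraint in the example above (information must be collected before a screen can display it) can be checked mechanically once the columns of a Map have been grouped into screens. The following is a minimal sketch under stated assumptions: the `MapColumn` fields `collects` and `displays`, the helper names, and the sample steps for Sam are all hypothetical and are not taken from the chapter's figures.

```python
from dataclasses import dataclass, field


@dataclass
class MapColumn:
    step: str
    # Information the persona provides at this step (hypothetical field).
    collects: set[str] = field(default_factory=set)
    # Information the interface needs to show at this step (hypothetical field).
    displays: set[str] = field(default_factory=set)


def group_into_screens(columns, sizes):
    """Split the ordered columns into consecutive groups, one per screen.

    sizes[i] is the number of adjacent columns placed on screen i.
    """
    if sum(sizes) != len(columns):
        raise ValueError("screen sizes must account for every column exactly once")
    screens, start = [], 0
    for size in sizes:
        screens.append(columns[start:start + size])
        start += size
    return screens


def missing_information(screens):
    """Return (screen_index, item) pairs for items shown before being collected."""
    collected = set()
    problems = []
    for index, screen in enumerate(screens):
        for column in screen:
            for item in sorted(column.displays - collected):
                problems.append((index, item))
            collected |= column.collects
    return problems


# Hypothetical steps: the first screen tries to greet Sam by name
# before any step has asked him for it.
columns = [
    MapColumn("Sam opens the invitation", displays={"name"}),
    MapColumn("Sam enters his name", collects={"name"}),
    MapColumn("Sam sees a personalized welcome", displays={"name"}),
]
screens = group_into_screens(columns, sizes=[1, 2])
problems = missing_information(screens)
print(problems)
```

A non-empty result points at the screens whose wireframes cannot yet be personalized, which is a cue either to reorder the steps in the Map or to drop the personalization from that screen.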


The  Benefits  of  Design  Maps

Perhaps  the  most  important  benefit  of  Design  Maps  (and 
wireframes)  is  that  they  hold  and  communicate  a  shared  vision 
of  the  project  for  your  entire  team.  Seeing  the  “big  picture”  early 
on  in  the  project  helps  motivate  everyone  toward  the  common 
goal  of  making  it  real.  Design  Maps  enable  your  entire  team  to 
“see”  the  product  from  the  personas’  points  of  view,  giving  the 
architects  of  the  product  the  opportunity  to  understand  and  em¬ 
pathize  with  users.  A  deeper  understanding  of  the  planned  prod¬ 
uct  and  personas  early  on  in  the  development  process  can  en¬ 
hance  each  team  member’s  work  on  your  product,  whether  they 
are  coding,  marketing,  managing,  testing,  funding  or  selling  it. 

FIGURE  51.10.  Wireframes  evolve  naturally  from  Design  Maps.  You  and  your  Mapping  team  can
evaluate  which  process  steps  belong  together  and  design  the  interface  flow  accordingly.


Design  Maps  (and  their  associated  wireframes)  are  perfect
documents  to  work  from  when  communicating  project
plans  to  stakeholders.  Once  your  Design  Maps  are  completed, 
you  can  use  them  to  perform  design  walk-throughs  of  your 
product  with  other  team  members.  Have  someone  read 
through  the  Map  and  another  person  check  the  prototype  or 
the  product  to  make  sure  the  Mapped  process  is  supported  by 
the  product’s  design. 


As  your  development  team  begins  coding  your  new  product, 
check  in  frequently  to  evaluate  what  they  are  building  against 
what  you  thought  they  were  going  to  build.  Use  the  Design 
Maps  to  “test”  the  emerging  product:  does  the  product  being 
built  actually  support  the  experience  you  have  designed?  The 
Design  Maps  you  created  define  a  task  flow;  when  the  product 
changes  during  development,  use  the  Maps  to  evaluate  whether 
or  not  the  changes  support  the  tasks  you  identified. 


Design  Mapping  Illuminates — and  Fixes! — Holes 
in  Other  Kinds  of  Specification  Documents 

Raina  Brody  &  Sylvia  Olveda, 

Usability  Specialists,  Amazon.com 

The  Amazon  usability  team  works  with  many  different  project 
teams  from  all  over  the  company.  For  one  of  our  projects,  we 
were  working  with  a  team  that  was  designing  and  developing 
a  new  feature  for  the  website.  The  team  had  already  created 
three  primary  personas  and  a  very  large  document  full  of  use 
cases,  which  specified  many  features  and  user  interface  ele¬ 
ments.  The  use  cases  and  user  interface  specifications  all  re¬ 
ferred  to  their  personas  by  name.  In  short,  it  looked  like  they 
had  done  a  lot  of  good,  user-centered  work  and  were  pretty
far  along  the  path  toward  creating  a  good  solution.  We  asked
the  designer  and  the  design  manager  why  they  felt  they 
needed  our  help. 

They  replied,  “All  of  this  stuff  looks  really  good,  but  there
are  enormous  holes  in  the  use  cases  when  it  comes  to  the  ac¬ 
tual  UI,  the  experience  flow,  and  the  technical  implementa¬ 
tion/specifications.  Some  of  the  technical  services  and  features 
these  use  cases  call  for  don’t  exist  and  simply  can’t  be  built — 
and  the  rest  of  the  team  doesn’t  seem  to  get  this.”  We  decided 
to  lead  a  few  Design  Mapping  sessions  to  help  them  identify 
and  document  the  holes  and  to  communicate  their  concerns 
to  the  rest  of  the  team. 

We  set  up  a  meeting  with  the  project  team  (including  mem¬ 
bers  of  the  business  team  and  design  and  development  teams) 
and  asked  them  to  pick  a  couple  of  the  most  important  use 
cases.  They  were  a  bit  hesitant  at  first,  because  some  members
felt  that  they  had  already  done  all  of  this  work  in  the  use
case  document.  After  they  described  their  primary  use  cases,  we
started  the  Design  Mapping  exercise.  Very  soon,  it  became 
clear  that  there  were  holes  in  the  existing  documentation.  For 
example,  some  of  the  original  use  cases  assumed  the  existence 
of  data  types  and  search  services  that  simply  did  not  exist,  and 
that  were  not  scoped  as  part  of  the  project  schedule.  The  De¬ 
sign  Map  helped  us  discover  that  very  important  elements  in 
their  use  cases  had  not  yet  been  thought  about. 

The  Design  Mapping  process  really  started  going  well  when 
the  project  team  members  realized  that  we,  the  usability  team, 
were  not  trying  to  design  their  product  for  them.  Instead,  we 
were  serving  as  facilitators  to  help  them  come  up  with  and  dis¬ 
cuss  their  ideas  from  a  customer  experience  and  technical  fea¬ 
sibility  perspective.  Of  course,  it  was  our  job  to  ask  (and  some¬ 
times  answer)  some  difficult  questions  about  whether 
customers  would  be  able  to  discover  and/or  understand  some 
of  the  user  interface  elements  the  team  came  up  with. 

The  Mapping  process  generated  excitement  from  the  pro¬ 
ject  team  and  some  of  the  best  ideas  came  from  developers. 
After  our  first  meetings,  the  Mapping  sessions  went  well  and 
went  very  fast.  The  team  agreed  that  the  Design  Mapping 
process  helped  them  identify  what  was  hard,  what  was  easy, 
and  what  they  could  really  do.  The  team  also  realized  that 
while  the  original  use  case  document  contained  a  lot  of  good 
ideas,  the  complexities  of  the  actual  implementation  were  not 
made  as  apparent  as  they  were  with  the  Mapping  session. 

At  the  end  of  the  project,  we  asked  the  team  how  the  De¬ 
sign  Mapping  and  personas  helped  with  the  project.  The  an¬ 
swers  we  got  were  very  encouraging: 

•  From  a  key  developer:  “Design  mapping  helped  everyone 
on  the  team  to  really  focus  on  the  project  from  our  cus¬ 
tomer’s  point-of-view.  It  took  a  lot  of  disparate  opinions  and 
ideas  about  what  we  wanted  to  build  and  provided  a  com¬ 
mon  focus.  It  was  responsible  for  some  very  productive 
brainstorming.” 

•  From  the  Technical  Program  Manager:  “The  personas  and 
Mapping  process  are  invaluable  tools  for  designing  a  product
that  is  customer-centric;  together  they  have  the  power  to 
overcome  designs  driven  by  emotion  and  bias:  two  things 
commonly  found  in  every  product  design  cycle.” 


•  From  the  Product  Manager:  “The  Design  Mapping  we  did  at 
the  beginning  of  our  project  has  proven  to  be  an  invalu¬ 
able  tool  to  keep  the  team  focused  on  what  is  best  for  our 
target  customers.  It  was  also  an  extremely  helpful  team¬ 
building  exercise.” 

•  From  the  Team  Manager:  “The  design  mapping  process  helps 
us  develop  the  right  features  for  the  right  customers.  Proj¬ 
ect  teams  are  hesitant  at  first  because  it  seems  like  a  signifi¬ 
cant  time  investment,  but  once  they  go  through  the  process, 
they  find  it  invaluable.  It  actually  saves  time  in  the  long  run  by 
making  sure  teams  make  better  decisions  up  front.” 

We  learned  some  valuable  lessons  from  this  Mapping  exercise: 

•  If  you  are  Mapping  with  a  team  that  has  never  tried  it  before, 
do  not  start  by  Mapping  a  controversial  feature  or  experi¬ 
ence.  Instead,  start  with  one  of  the  features  or  experiences 
that  seems  well  defined  and  clear.  This  will  help  the  team
understand  the  Mapping  process  and  will  show  them  that 
you  do  not  intend  to  take  over  the  design  of  their  product. 

•  Encourage  teams  to  do  use  cases  after  they  do  Design 
Maps.  Design  Maps  generate  lots  of  use  cases,  and  they 
have  the  added  benefit  of  clearly  illustrating  how  use  cases 
fit  together  in  an  end-to-end  customer  or  user  experience. 

•  Present  Design  Mapping  as  a  technique  to  facilitate  the 
product  team’s  design  process.  It  is  important  that  the  team 
still  feel  in  control,  even  in  a  situation  where  you  are  leading 
a  Mapping  exercise. 

•  Create  a  Mapping  plan  after  your  first  Mapping  meeting.  Af¬ 
ter  you  do  one  Map,  it  will  be  easy  to  identify  other  user 
experiences  you  want  to  Map.  Revisit  your  Mapping  sched¬ 
ule  after  you  do  the  first  two  or  three  Maps  to  integrate  new 
ideas  for  Maps  and  a  new  perspective  on  how  long  the  Map¬ 
ping  process  will  take. 

•  Think  about  who  you  want  to  invite  and  who  you  need  to 
invite  to  Mapping  sessions.  We  always  facilitated,  and  in¬ 
vited  at  least  one  designer,  the  technical  program  or  project 
manager,  between  one  and  three  members  of  the  technical/ 
development  team,  and  the  product  manager. 

Word  is  spreading  fast,  and  now  we  have  so  many  requests
to  facilitate  Design  Mapping  sessions  we  can  hardly  keep  up
with  them.


Phase  5:  Persona  Lifetime  Achievement  &  Retirement 

Once  the  project  or  product  is  completed,  it  is  time  to  think 
about  what  has  been  accomplished,  and  also  to  prepare  for  the 
next  project.  You  will  want  to  assess  how  effective  the  persona 
method  was  for  your  team  and  product  development  process.  If 
you  are  beginning  to  think  about  the  next  product  (or  next  ver¬ 
sion  of  the  product  you  have  just  released)  you  will  need  to  de¬ 
cide  whether,  and  how,  you  will  reuse  your  existing  personas 
and  the  information  they  contain. 


The  end  of  a  product  cycle  is  a  good  time  to  assess  the  ef¬ 
fectiveness  of  personas  for  the  team  and  to  take  stock  of  lessons 
learned  for  the  next  time.  How  did  the  development  team  ac¬ 
cept  the  method?  Were  your  personas  useful?  To  what  extent 
were  they  accurate  and  precise?  We  will  provide  suggestions  and 
tools  you  can  use  to  validate  the  use  of  personas  in  the  devel¬ 
opment  process  and  to  determine  if  the  persona  effort  was 
worth  the  effort  and  resources  it  required.  Did  personas  change 
the  product?  Did  they  change  your  design  and  development 
process?  User-centered  designers  are  constantly  under  pressure 
to  validate  the  worth  and  ROI  of  their  activities,  and  personas
can  be  useful  tools  for  measuring  the  success  of  both  the  prod¬ 
uct  and  of  the  UCD  activities  as  a  whole. 

Recall  the  questions  we  recommended  you  answer  in  Phase 
1,  Family  Planning,  related  to  “organizational  introspection.” 
Assessing  return  on  investment  is,  in  simple  terms,  working  to 
answer  those  original  questions  (listed  in  the  left  column  of  Fig. 
51.11,  below)  in  terms  that  are  meaningful  now  that  you  are 
done  with  the  effort  (e.g.,  those  questions  listed  in  the  right  col¬ 
umn  of  Fig.  51.11). 

Retirement 

Depending  on  the  nature  of  your  products,  you  might  be 
able  to  reuse  the  personas  or  “reincarnate”  some  of  your  per¬ 
sona  data  in  new  personas.  That  is,  you  will  need  to  decide  what 
to  do  with  your  “old”  personas  as  you  prepare  for  your  next  pro¬ 
ject.  Do  your  personas  retire?  Do  they  change  over  time?  Do 
they  buy  your  product  and  start  using  it?  Can  other  product 
teams  utilize  your  personas  or  some  portion  of  the  information 
in  them  (i.e.,  are  they  reincarnated)?

In  most  cases,  you  will  decide  to  use  some  combination  of 
direct  reuse  (using  them  again  without  alteration),  reincarnation
(reusing  some  of  their  content  and  related  data),  or  retirement
(discarding  or  completely  replacing  some  of  the  personas). 
Note  that,  if  your  persona  effort  has  been  a  success,  retirement 
and  reincarnation  can  be  a  bit  tricky;  to  have  “room”  for  the  next 
set  of  personas,  you  will  need  to  help  your  organization  let  go  of 
the  personas  they  have  come  to  know  so  well. 

Before  you  decide  what  to  do  with  your  personas,  you  need 
to  revisit  the  data  sources  you  used  to  create  them.  If  you  are 
about  to  start  work  on  the  next  version  of  the  product  you  have 
just  released,  it  is  likely  that  many  (but  probably  not  all)  of  your 
data  sources  are  still  relevant  and  you  can  reuse  entire  personas 
or  some  of  the  information  in  the  personas.  If  you’re  moving  on
to  create  a  completely  different  product  or  if  there  have  been
major  shifts  in  strategy,  then  perhaps  only  a  few  of  the  data
sources  may  still  be  relevant  (e.g.,  those  that  relate  to  your  company
or  to  the  general  product  space  in  which  you  work).

Questions  You  Should  Ask  During  Family  Planning  |  Questions  You  Will  Ask  During  Lifetime  Achievement
-----------------------------------------------------|------------------------------------------------------
What  resources  do  we  have  for  personas  and  other  UCD  activities?  |  How  much  did  the  persona  effort  actually  cost?
What  product  problems  do  we  want  to  solve  with  personas?  |  Has  the  product  improved?  How  much,  and  in  what  ways?
What  process  problems  do  we  want  to  solve  with  personas?  |  Has  the  process  improved?  In  what  ways?
How  can  we  ensure  that  the  personas  will  be  accepted  and  used  by  our  colleagues?  |  Were  personas  perceived  as  helpful?  Has  the  company's  focus  on  users  improved?  In  what  ways?

FIGURE  51.11.  The  work  you  did  during  Family  Planning  is  highly
related  to  what  you  will  need  to  do  in  Lifetime  Achievement.

Reusing  Your  Personas 

If  you’re  building  a  new  version  of  your  product,  or  a  new 
product  for  the  same  audience,  you  might  find  that  many  of 
your  personas  can  be  reused.  Your  personas  could  be  reused 
by  the  same  team  that  used  them  originally,  by  a  new  design  and 
development  team,  or  perhaps  by  a  team  in  some  other  part  of 
your  company,  like  marketing,  sales,  or  product  support. 

When  you  created  your  personas,  you  (a)  assimilated  your
data,  (b)  created  persona  skeletons,  (c)  prioritized  the  skele¬ 
tons,  and  (d)  built  some  or  all  of  the  skeletons  into  full  per¬ 
sonas.  When  you  move  on  to  the  next  version  of  your  product, 
you  can  reevaluate  the  primary  versus  secondary  classification 
for  each  of  your  original  personas.  You  might  decide  to  demote 
one  of  your  primary  personas  and  promote  one  of  your  sec¬ 
ondary  personas;  this  promotion/demotion  is  especially  useful 
if  you  are  building  a  new  version  of  the  same  project  but  your 
company  has  decided  to  focus  on  a  slightly  different  user  base. 
Additionally,  you  can  revisit  some  of  the  persona  skeletons  you 
created  but  never  developed  for  the  first  project;  it  is  possible 
that  one  of  these  sketches  would  be  just  right  for  the  new  project.
If  so,  you  have  a  tremendous  head  start  and  can  simply  build 
up  the  sketch  into  a  full  persona. 

Reincarnating  Personas 

If  some  of  the  data  in  your  personas  is  still  relevant,  but  the 
personas  you  originally  created  are  not,  you  can  create  “rein¬ 
carnated”  personas  by  reusing  “old”  data  in  new  personas.  If  the 
products  you  develop  serve  users  in  a  specific  market  segment 
or  industry,  you  will  find  many  data  sources  that  stay  relevant  no 
matter  what  project  you  are  working  on.  If  the  data  sources  are 
still  relevant,  but  the  particular  personas  you  have  created  are 
not,  it  is  important  to  do  some  research  and  find  some  addi¬ 
tional  sources. 

Once  you  have  collected  the  appropriate  set  of  data  (which 
will  include  sources  you’ve  already  used),  you  can  revisit  the 
processes  described  in  the  Conception  &  Gestation  section  and 
reassimilate  data  points  according  to  the  issues  you’re  finding 
related  to  your  new  project. 

Retiring  Your  Personas 

You  might  decide  that  you  do  not  want  to  reuse  or  reincar¬ 
nate  your  personas  or  their  underlying  data  at  all.  There  are 
many  reasons  to  retire  personas  before  moving  on  to  the  next 
project: 

•  The  current  project  is  significantly  different  from  the  last  project.

•  The  users’  goals  have  changed. 

•  Your  company  adopted  a  new  strategy  or  is  targeting  a  dif¬ 
ferent  user  base. 

•  There  are  significant  changes  to  the  environment  in  which
your  product  will  be  used,  such  as  new  technologies  or  new
competitive  products  that  have  “changed  the  landscape”  (e.g.,
the  advent  of  streaming  media  and  broadband  in  the  home,
or  Bluetooth  technology  for  computing).

When  you  determine  that  a  persona  or  set  of  personas  is  no 
longer  relevant,  it  is  a  good  idea  to  officially  retire  the
persona(s)  before  moving  on.

Why  “officially”?  Why  not  just  take  down  the  posters  and  start 
working  on  a  new  set  of  personas?  Because  if  you  have  done 
your  job  well,  you  have  made  the  personas  incredibly  memo¬ 
rable,  and  all  your  work  has  paid  off;  people  in  your  organiza¬ 
tion  have  absorbed  various  amounts  of  information  about  the 
primary  personas  and  are  accustomed  to  thinking  about  them. 
If  you  try  to  introduce  a  new  set  of  personas  “on  top  of”  an  old 
set,  you  run  the  risk  of  confusing  your  team,  which  will  destroy 
the  clarity  that  personas  are  supposed  to  provide. 

You  can’t  reach  into  your  colleagues’  heads  and  erase  every¬ 
thing  they  know  about  the  personas  you’ve  been  using,  but  you 
have  to  find  a  way  to  help  them  move  past  the  old  personas  and 
let  go  of  the  (no  longer  relevant)  information  they  contain.  This 
can  be  as  simple  as  an  e-mail  announcement  that  the  old  set  of 
personas  is  retiring  and  why.  If  you  are  moving  on  to  a  totally  dif¬ 
ferent  product  or  a  new  strategy,  most  of  your  colleagues  would 
know  about  the  switch  and  the  retirement  announcement  will 
make  sense  to  them.  Use  this  as  an  opportunity  to  invite  feed¬ 
back  from  the  team  on  the  ways  personas  helped  or  didn’t  help 
them  do  their  jobs;  you  can  use  this  feedback  to  tweak  your  cus¬ 
tomized  persona  lifecycle  the  next  time  around. 

Moving  on  to  the  Next  Persona  Effort 

Finally,  it  is  worth  noting  that  the  activities  we  recommend  at 
the  end  of  one  persona  project  function  to  prepare  you  for  the 
next  persona  project;  regaining  control  over  the  personas  and 
evaluating  the  success  of  the  effort  will  help  you  be  even  more 
successful  in  your  next  effort.  The  Lifetime  Achievement,  Reuse, 
&  Retirement  phase  provides  an  excellent  opportunity  to  touch 
base  with  your  core  team  members  and  with  other  stakeholders 
to  talk  about  how  things  went.  As  you  dive  back  into  Family 
Planning,  you  will  want  to  predict  the  new  issues  you  may  en¬ 
counter.  This  final  lifecycle  phase  is  a  great  time  to  have  a  post¬ 
mortem  to  talk  about  what  improved,  stayed  the  same,  or  wors¬ 
ened  during  or  due  to  your  persona  effort. 


SUMMARY 


Personas  can  be  an  invaluable  asset  to  product  design  and  de¬ 
velopment  teams,  but  they  must  be  created  and  used  with  care. 
The  best  advice  we  can  give  you  as  you  embark  on  your  own 
persona  efforts  is  to  keep  in  mind  the  following  axioms  that  are 
at  the  heart  of  the  persona  lifecycle  approach: 

•  Building  personas  from  assumptions  is  good;  building  per¬ 
sonas  from  data  is  much,  much  better. 


•  Personas  are  a  highly  memorable,  inherently  usable  commu¬ 
nication  tool  if  they  are  communicated  well. 

•  Personas  can  be  initiated  by  executives  or  first  used  as  part 
of  a  bottom-up,  grass  roots  experiment,  but  eventually  need 
support  at  all  levels  of  an  organization. 

•  Personas  are  not  a  stand-alone  user-centered  design  process, 
but  should  be  integrated  into  existing  processes  and  used  to 
augment  existing  tools. 

•  Personas  can  help  bring  a  user-centered  focus  into  even  a 
stubbornly  technology-focused  organization. 

•  Effective  persona  efforts  require  organizational  introspection 
and  strategic  thinking. 

•  Personas  can  be  created  fast  and  show  their  value  quickly,  but 
if  you  want  to  get  the  full  value  of  personas,  you  will  have  to 
commit  to  a  significant  investment  of  time  and  resources. 

In  short,  “the  devil  is  in  the  details”  when  it  comes  to  launch¬ 
ing  a  persona  effort  within  an  organization.  Perhaps  this  is  the 
reason  that  persona  efforts  are  inevitably  more  work  than  you 
think  they  are  going  to  be.  If  you  can  dedicate  some  time,  and 
can get some help from at least one colleague, the persona lifecycle
will enable you to integrate persona creation and use
methods  into  your  existing  processes.  Equally  importantly,  it  will 
provide  you  with  data-driven,  user-focused  reasons  to  change 
aspects of existing processes that are not working well for your team
and  are  resulting  in  problematic  products. 


Personas in the Present — And Looking
Toward  the  Future 

Kim  Goodwin,  Vice  President 
and General Manager, Cooper

As  for  the  future  of  personas,  I  have  heard  a  number  of  peo¬ 
ple  say  that  personas  are  just  the  latest  fad,  and  no  one  will 
care  about  them  in  a  few  years.  It  is  true  that  some  people 
are  using  things  they  call  “personas”  just  because  they  are 
the  latest  craze,  and  no  doubt,  the  popularity  of  personas 
will  fade  quickly  with  those  people  who  find  that  a  photo,  a 
name,  and  a  collection  of  personal  tidbits  do  not  accomplish 
much.  People  who  look  to  personas  to  solve  all  of  their  prob¬ 
lems  will  also  be  disappointed;  however,  those  practitioners 
who  take  the  time  to  master  personas — which  are  easy  to 
understand  but  not  so  easy  to  develop  and  use — will  find 
that  personas  are  one  tool  they  will  never  put  down. 

In the last few years, Cooper designers have been using
personas  for  more  than  product  definition  and  interaction 
design  guidance.  We  find  that  experience  goals  (how  a  per¬ 
sona  wants  to  feel,  as  opposed  to  the  end  goals  we  usually 
focus  on  with  personas)  are  invaluable  in  guiding  the  aes¬ 
thetic  aspects  of  the  visual,  auditory,  and  physical  design. 
Does  your  online  banking  persona  want  to  feel  her  money 
is  secure?  A  lime  green  background  and  Comic  Sans  typeface 
are  probably  poor  choices.  Does  a  critically  ill  person  want  to 
be  reminded  of  his  condition?  No,  so  perhaps  his  home 
healthcare device should not look as if it belongs in a hospital.
We have used personas for designing business processes,
organizations,  and  training  curricula  as  well. 

Personas  have  been  adopted  by  certain  parts  of  compa¬ 
nies,  but  not  yet  by  others.  In  many  cases,  we  see  companies 
that  use  personas  for  one  product  but  not  others,  or  who  rely 
heavily  on  personas  to  design  their  websites,  but  do  not  yet 
use  them  to  inform  the  design  of  other  customer-facing  sys¬ 
tems.  I  believe  the  lack  of  adoption  is  not  due  to  a  failing  of 
the  method,  but  to  a  common  failing  of  organizations:  silos. 
The  website  or  product  design  teams  are  generally  isolated 
from  other  parts  of  a  company,  so  those  other  groups  are  un¬ 
aware  of  the  advances  those  teams  are  making.  In  addition, 
there  is  seldom  anyone  with  overall  responsibility  for  a  cus¬ 
tomer’s  or  user’s  experience  with  that  company.  To  realize 
the  true  potential  of  their  investment  in  personas,  companies 
will  need  to  look  for  other  opportunities  to  use  them,  even 
outside  of  electronic  systems.  For  example,  a  couple  of  years 
ago,  I  spoke  with  a  human  resources  publication  interested  in 
applying  employee  personas  to  benefits  plans. 

There  is  no  question  in  my  mind:  personas  are  here  to 
stay,  and  we  will  see  them  showing  up  in  more  and  more 
places.  After  all,  personas  are  about  understanding  our  fel¬ 
low  human  beings,  and  that  will  never  go  out  of  style. 


Areas  for  Further  Investigation 

While we are obviously fans of the persona approach, personas
are  not  proven.  Indeed,  our  experiences  with  them  have  proven 
to  us  that  personas  have  their  limits;  personas  are  not  a  panacea, 
and they are not right for every situation or organization. The
details of these shortcomings are not well understood at this
time,  though  some  research  does  exist  (see,  e.g.,  Blomquist  & 
Arvola, 2002; Rönkkö, Hellman, Kilander, & Dittrich, 2004).
Clearly,  personas  do  not  help  a  development  team  build  better 
products  merely  by  their  presence.  As  we  argue  here  (and  in  our 
book, The Persona Lifecycle [Pruitt & Adlin, 2006]), personas
must  be  assimilated  by  the  product  team  and  explicitly  used  in 
design  and  development  activities.  Nevertheless,  what  uses  of 
personas have the most value? What characteristics of personas
are  the  most  critical?  We  have  provided  some  process  and  guid¬ 
ance  to  help  practitioners  do  personas  successfully,  but  at  pre¬ 
sent,  this  is  based  on  practice  and  experience,  not  science.  We 
need  several  rigorous  evaluations  of  the  method  to  better  un¬ 
derstand  its  benefits  and  limitations. 

The  process  we  recommend  for  persona  creation,  while 
doable  and  generally  effective,  is  not  particularly  precise.  Given 
the  same  data  as  input,  different  groups  of  people  will  likely 
create  somewhat  different  personas.  Sinha  (2003)  and  others 
(including  us)  have  proposed  quantitative  analysis  methods  as 
the basis for persona creation to enhance repeatability and accuracy.
While this is certainly a plausible move in the right direction,
we do not believe it solves the problem: it is the application of
qualitative data that helps make a persona into a persona.
Further, the notion that personas can be abstracted into
elemental  characteristics  or  directly  reused  across  multiple 
product  lines  is  contentious.  More  investigation  is  needed  here. 

Finally, there are some cases where development teams have
attempted  to  create  representations  of  things  other  than  people 
(e.g.,  organizational  archetypes — a  persona  of  a  company)  to  be 
used,  like  personas,  in  the  development  of  products  or  services. 
We  believe  that  such  representations,  while  potentially  useful,  do 
not  have  the  power  and  impact  that  representations  of  people 
have.  They  are  not  as  memorable  or  provocative  as  personas — 
we  relate  to  personas  in  ways  that  we  cannot  relate  to  other  ab¬ 
stractions.  We  suggest  this  as  an  interesting  line  of  research. 
INTRODUCTION


“A  good  design  is  better  than  you  think.” 

(Heftman,  as  cited  in  Raskin,  2000). 

Design  is  about  making  choices.  In  many  fields  that  require 
creativity  and  engineering  skill,  such  as  architecture  and  auto¬ 
mobile  design,  prototypes  both  inform  the  design  process  and 
help  designers  select  the  best  solution. 

This  chapter  describes  tools  and  techniques  for  using  pro¬ 
totypes  to  design  interactive  systems.  The  goal  is  to  illustrate 
how  they  can  help  designers  generate  and  share  new  ideas,  get 
feedback  from  users  or  customers,  choose  among  design  alter¬ 
natives  and  articulate  reasons  for  their  final  choices. 

We  begin  with  our  definition  of  a  prototype  and  then  dis¬ 
cuss  prototypes  as  design  artifacts,  introducing  four  dimensions 
for  analyzing  them.  We  then  discuss  the  role  of  prototyping 
within  the  design  process,  in  particular  the  concept  of  a  design 
space  and  of  how  it  is  expanded  and  contracted  by  generating 
and  selecting  design  ideas.  The  following  sections  describe  spe¬ 
cific  prototyping  approaches:  rapid  prototyping,  both  offline 
and  online,  for  early  stages  of  design,  and  iterative  and  evolu¬ 
tionary  prototyping,  which  use  online  development  tools.  Fi¬ 
nally,  we  address  the  specific  issue  of  prototyping  mixed  reality 
and  pervasive  computing  systems. 


What  Is  a  Prototype? 

We  define  a  prototype  as  a  concrete  representation  of  part  or 
all  of  an  interactive  system.  A  prototype  is  a  tangible  artifact, 
not  an  abstract  description  that  requires  interpretation.  Design¬ 
ers,  as  well  as  managers,  developers,  customers  and  end  users, 
can  use  these  artifacts  to  envision  and  to  reflect  upon  the  final 
system. 

Note  that  other  fields  may  define  prototype  differently.  For 
example,  an  architectural  prototype  is  a  scaled-down  model  of 
the  final  building.  This  is  not  possible  for  prototypes  of  interac¬ 
tive  systems:  The  designer  may  limit  the  amount  of  information 
the  prototype  can  handle,  but  the  actual  user  interface  must  be 
presented  at  full  scale.  Thus,  a  prototype  interface  to  a  data¬ 
base  may  handle  only  a  small  subset  of  the  final  database  but 
must  still  present  a  full-size  display  and  interaction  techniques. 
Fashion  designers  create  another  type  of  prototype,  a  full-scale, 
one-of-a-kind  model,  such  as  a  handmade  dress  sample.  Al¬ 
though  in  haute  couture,  this  prototype  may  also  be  the  final 
product,  the  ready-to-wear  market  requires  additional  design 
phases  to  create  a  design  that  can  be  mass-produced  in  a  range 
of  sizes.  Some  interactive  system  prototypes  begin  as  one-of-a- 
kind  models  that  are  then  distributed  widely  (since  the  cost  of 
duplicating  software  is  so  low).  However,  most  successful  soft¬ 
ware  prototypes  evolve  into  the  final  product  and  then  continue 
to  evolve  as  new  versions  of  the  software  are  released. 

Hardware  and  software  engineers  often  create  prototypes 
to  study  the  feasibility  of  a  technical  process.  They  conduct  sys¬ 
tematic,  scientific  evaluations  with  respect  to  predefined  bench¬ 
marks  and,  by  systematically  varying  parameters,  fine  tune  the 
system. Designers in creative fields, such as typography or graphic
design, create prototypes to express ideas and reflect on them.
This  approach  is  intuitive,  oriented  more  to  discovery  and  gen¬ 
eration  of  new  ideas  than  to  evaluation  of  existing  ideas. 

HCI  is  a  multidisciplinary  field  that  combines  elements  of  sci¬ 
ence,  engineering,  and  design  (Mackay  &  Fayard,  1997;  Dijkstra- 
Erikson  et  al.,  2001).  Prototyping  is  primarily  a  design  activity,  al¬ 
though  we  use  software  engineering  to  ensure  that  software 
prototypes  evolve  into  technically  sound  working  systems  and 
we  use  scientific  methods  to  study  the  effectiveness  of  particu¬ 
lar  designs. 


PROTOTYPES  AS  DESIGN  ARTIFACTS 


We can look at prototypes either as concrete artifacts in their own
right or as important components of the design process. When
viewed  as  artifacts,  successful  prototypes  have  several  charac¬ 
teristics:  They  support  creativity,  helping  the  developer  to  cap¬ 
ture  and  generate  ideas,  facilitate  the  exploration  of  a  design 
space,  and  uncover  relevant  information  about  users  and  their 
work  practices.  They  encourage  communication,  helping  de¬ 
signers,  engineers,  managers,  software  developers,  customers 
and  users  to  discuss  options  and  interact  with  each  other.  They 
also  permit  early  evaluation  since  they  can  be  tested  in  various 
ways,  including  traditional  usability  studies  and  informal  user 
feedback,  throughout  the  design  process. 

We  can  analyze  prototypes  and  prototyping  techniques  along 
four  dimensions: 

•  Representation  describes  the  form  of  the  prototype,  such  as 
sets  of  paper  sketches  or  computer  simulations; 

•  Precision  describes  the  level  of  detail  at  which  the  prototype  is 
to  be  evaluated,  such  as  informal  and  rough  or  highly  polished; 

•  Interactivity  describes  the  extent  to  which  the  user  can  ac¬ 
tually  interact  with  the  prototype,  such  as  “watch  only”  or 
fully  interactive; 

•  Evolution  describes  the  expected  life  cycle  of  the  prototype, 
such  as  throwaway  or  iterative. 
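
One way to keep these four dimensions in view when planning a prototype is to record them explicitly. The following is a minimal sketch in Python, not an established tool: the class names, field names, and the particular values of each enumeration are assumptions chosen for illustration, drawing on the terms used in this chapter.

```python
from dataclasses import dataclass
from enum import Enum


class Representation(Enum):
    OFFLINE = "offline"  # paper sketches, storyboards, mock ups, video
    ONLINE = "online"    # runs on a computer


class Precision(Enum):
    ROUGH = "rough"        # informal; many details deliberately left open
    POLISHED = "polished"  # most details are set and open to evaluation


class Interactivity(Enum):
    FIXED = "fixed"            # watch only
    FIXED_PATH = "fixed-path"  # limited, prespecified interaction
    OPEN = "open"              # large sets of interactions


class Evolution(Enum):
    RAPID = "rapid"                # thrown away once it has served its purpose
    ITERATIVE = "iterative"        # refined over several design iterations
    EVOLUTIONARY = "evolutionary"  # becomes part of the final system


@dataclass(frozen=True)
class Prototype:
    """Hypothetical record describing one prototype along the four dimensions."""
    name: str
    representation: Representation
    precision: Precision
    interactivity: Interactivity
    evolution: Evolution

    def is_throwaway(self) -> bool:
        return self.evolution is Evolution.RAPID


# Example: a paper storyboard walked through by hand.
storyboard = Prototype(
    name="paper storyboard",
    representation=Representation.OFFLINE,
    precision=Precision.ROUGH,
    interactivity=Interactivity.FIXED_PATH,
    evolution=Evolution.RAPID,
)
```

Treating the dimensions as independent fields reflects the point that they can vary separately: a rough prototype can still be highly interactive.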

Representation 

Prototypes  serve  different  purposes  and  thus  take  different 
forms.  A  series  of  quick  sketches  on  paper  can  be  considered  a 
prototype;  so  can  a  detailed  computer  simulation.  Both  are  use¬ 
ful;  both  help  the  designer  in  different  ways.  We  distinguish  be¬ 
tween  two  basic  forms  of  representation:  offline  and  online. 

Offline  prototypes  (also  called  “paper  prototypes”)  do  not 
require  a  computer.  They  include  paper  sketches,  illustrated 
storyboards,  cardboard  mock  ups  and  videos.  The  most  salient 
characteristic of offline prototypes (of interactive systems) is
that  they  are  created  quickly,  usually  in  the  early  stages  of  de¬ 
sign,  and  they  are  usually  thrown  away  when  they  have  served 
their  purposes. 

Online  prototypes  (also  called  “software  prototypes”)  run  on 
a  computer.  They  include  computer  animations,  interactive 
video  presentations,  programs  written  with  scripting  languages, 
and  applications  developed  with  interface  builders.  The  cost  of 
producing online prototypes is usually higher, and they may require
skilled  programmers  to  implement  advanced  interaction  and 
visualization  techniques  or  to  meet  tight  performance  con¬ 
straints.  Software  prototypes  are  usually  more  effective  in  the 
later  stages  of  design,  when  the  basic  design  strategy  has  been 
decided. 

In  our  experience,  programmers  often  argue  in  favor  of 
software  prototypes  even  at  the  earliest  stages  of  design.  Be¬ 
cause  they  are  already  familiar  with  a  programming  language, 
these  programmers  believe  it  will  be  faster  and  more  useful 
to  write  code  than  to  “waste  time”  creating  paper  prototypes.  In 
20  years  of  prototyping,  in  both  research  and  industrial  settings, 
we  have  yet  to  find  a  situation  in  which  this  is  true. 

First,  offline  prototypes  are  very  inexpensive  and  quick. 
These  permit  a  very  rapid  iteration  cycle  and  help  prevent  the 
designer  from  becoming  overly  attached  to  the  first  possible 
solution.  Offline  prototypes  make  it  easier  to  explore  the  design 
space,  examining  a  variety  of  design  alternatives  and  choosing 
the  most  effective  solution.  Online  prototypes  introduce  an  in¬ 
termediary  between  the  idea  and  the  implementation,  slowing 
down  the  design  cycle. 

Second,  offline  prototypes  are  less  likely  to  constrain  how 
the  designer  thinks.  Every  programming  language  or  develop¬ 
ment  environment  imposes  constraints  on  the  interface,  limit¬ 
ing  creativity  and  restricting  the  number  of  ideas  considered.  If 
a  particular  tool  makes  it  easy  to  create  scroll-bars  and  pull¬ 
down  menus  and  difficult  to  create  a  zoomable  interface,  the  de¬ 
signer  is  likely  to  limit  the  interface  accordingly.  Considering  a 
wider  range  of  alternatives,  even  if  the  developer  ends  up  us¬ 
ing  a  standard  set  of  interface  widgets,  usually  results  in  a  more 
effective  design. 

Finally,  and  perhaps  most  importantly,  offline  prototypes  can 
be  created  by  a  wide  range  of  people,  not  just  programmers. 
Thus  all  types  of  designers,  technical  or  otherwise,  as  well  as 
users,  managers,  and  other  interested  parties,  can  all  contribute 
on  an  equal  basis.  Unlike  programming  software,  modifying  a 
storyboard  or  cardboard  mock  up  requires  no  particular  tech¬ 
nical  skill.  Collaborating  on  paper  prototypes  not  only  increases 
participation  in  the  design  process,  but  also  improves  commu¬ 
nication  among  team  members  and  increases  the  likelihood  that 
the  final  design  solution  will  be  well  accepted. 

Although  we  believe  strongly  in  offline  prototypes,  they  are 
not  a  panacea.  In  some  situations,  they  are  insufficient  to  eval¬ 
uate  fully  a  particular  design  idea.  For  example,  interfaces  re¬ 
quiring  rapid  feedback  to  users  or  complex,  dynamic  visualiza¬ 
tions  usually  require  software  prototypes.  However,  particularly 
when  using  video  and  Wizard  of  Oz  techniques,  offline  proto¬ 
types  can  be  used  to  create  very  sophisticated  representations 
of  the  system. 

Prototyping  is  an  iterative  process  and  all  prototypes  pro¬ 
vide  information  about  some  aspects  while  ignoring  others. 
The  designer  must  consider  the  purpose  of  the  prototype 
(Houde  &  Hill,  1997)  at  each  stage  of  the  design  process  and 
choose  the  representation  that  is  best  suited  to  the  current  de¬ 
sign  question. 


Precision 

Prototypes  are  explicit  representations  that  help  designers,  en¬ 
gineers  and  users  reason  about  the  system  being  built.  By  their 
nature,  prototypes  require  details.  A  verbal  description  such  as 
“the  user  opens  the  file”  or  “the  system  displays  the  results”  pro¬ 
vides  no  information  about  what  the  user  actually  does  or  sees. 
Prototypes  force  designers  to  show  the  interaction:  just  how 
does  the  user  open  the  file  and  what  are  the  specific  results  that 
appear  on  the  screen? 

Precision  refers  to  the  relevance  of  details  with  respect  to  the 
purpose  of  the  prototype.^  For  example,  when  sketching  a  dia¬ 
log box, the designer specifies its size, the position of each field
and the title of each label. However, not all these details are
relevant  to  the  goal  of  the  prototype:  it  may  be  necessary  to 
show  where  the  labels  are,  but  too  early  to  choose  the  text.  The 
designer  can  convey  this  by  writing  nonsense  words  or  draw¬ 
ing  squiggles,  which  shows  the  need  for  labels  without  specify¬ 
ing  their  actual  content. 

Although  it  may  seem  contradictory,  a  detailed  representa¬ 
tion  need  not  be  precise.  This  is  an  important  characteristic  of 
prototypes:  those  parts  of  the  prototype  that  are  not  precise  are 
those  open  for  future  discussion  or  for  exploration  of  the  design 
space.  Yet  they  need  to  be  incarnated  in  some  form  so  the  pro¬ 
totype  can  be  evaluated  and  iterated. 

The  level  of  precision  usually  increases  as  successive  proto¬ 
types  are  developed  and  more  and  more  details  are  set.  The 
forms  of  the  prototypes  reflect  their  level  of  precision:  sketches 
tend  not  to  be  precise,  whereas  computer  simulations  are  usu¬ 
ally  very  precise.  Graphic  designers  often  prefer  using  hand 
sketches  for  early  prototypes  because  the  drawing  style  can  di¬ 
rectly  reflect  what  is  precise  and  what  is  not:  the  wiggly  shape  of 
an  object  or  a  squiggle  that  represents  a  label  are  directly  per¬ 
ceived  as  imprecise.  This  is  more  difficult  to  achieve  with  an  on¬ 
line  drawing  tool  or  a  user  interface  builder. 

The  form  of  the  prototype  must  be  adapted  to  the  desired 
level  of  precision.  Precision  defines  the  tension  between  what 
the  prototype  states  (relevant  details)  and  what  the  prototype 
leaves  open  (irrelevant  details).  What  the  prototype  states  is 
subject  to  evaluation;  what  the  prototype  leaves  open  is  sub¬ 
ject  to  more  discussion  and  design  space  exploration. 

Interactivity 

An  important  characteristic  of  HCI  systems  is  that  they  are  inter¬ 
active:  users  both  respond  to  them  and  act  upon  them.  Unfor¬ 
tunately,  designing  effective  interaction  is  difficult:  many  inter¬ 
active  systems  (including  many  websites)  have  a  good  look  but 
a  poor  feel.  HCI  designers  can  draw  from  a  long  tradition  in  vi¬ 
sual  design  for  the  former,  but  have  relatively  little  experience 
with  how  interactive  software  systems  should  be  used:  personal 
computers have only been commonplace for a couple of decades.
Another  problem  is  that  the  quality  of  interaction  is  tightly  linked 
to the end users and a deep understanding of their work practices:
a word processor designed for a professional typographer
requires a different interaction design than one designed for secretaries,
even though ostensibly they serve similar purposes.
Designers must take the context of use into account when designing
the details of the interaction (Beaudouin-Lafon, 2004).

^Note that the terms low-fidelity and high-fidelity prototypes are often used in the literature. We prefer the term precision because it refers to the
content of the prototype itself, not its relationship to the final, as-yet-undefined system.

A  critical  role  for  an  interactive  system  prototype  is  to  illustrate 
how  the  user  will  interact  with  the  system.  While  this  may  seem 
more  natural  with  online  prototypes,  in  fact  it  is  often  easier  to  ex¬ 
plore  different  interaction  strategies  with  offline  prototypes.  Note 
that  interactivity  and  precision  are  orthogonal  dimensions.  One 
can  create  an  imprecise  prototype  that  is  highly  interactive,  such 
as  a  series  of  paper  screen  images  in  which  one  person  acts  as  the 
user  and  the  other  plays  the  system.  One  may  create  a  very  pre¬ 
cise  but  noninteractive  prototype,  such  as  a  detailed  animation 
that  shows  feedback  from  a  specific  action  by  a  user. 

Prototypes  can  support  interaction  in  various  ways.  For  off¬ 
line  prototypes,  one  person  (often  with  help  from  others)  plays 
the  role  of  the  interactive  system,  presenting  information  and 
responding  to  the  actions  of  another  person  playing  the  role  of 
the  user.  For  online  prototypes,  parts  of  the  software  are  im¬ 
plemented,  while  others  are  played  by  a  person.  (This  approach, 
called  the  “Wizard  of  Oz”  after  the  character  in  the  1939  movie 
of  the  same  name,  is  explained  in  a  later  section.)  The  key  is  that 
the  prototype  feels  interactive  to  the  user. 
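
The division of labor in an online Wizard of Oz prototype can be sketched in a few lines. In this hypothetical Python example, commands that have been implemented are handled by code and everything else is forwarded to a function standing in for the hidden human operator. The function names and the command vocabulary are assumptions made purely for illustration.

```python
from typing import Callable, Dict


def make_prototype(
    implemented: Dict[str, Callable[[str], str]],
    wizard: Callable[[str, str], str],
) -> Callable[[str, str], str]:
    """Return a handler that answers user commands.

    Commands found in `implemented` are run by software; anything else is
    passed to `wizard`, which stands in for the human playing the system.
    """
    def handle(command: str, argument: str) -> str:
        if command in implemented:
            return implemented[command](argument)
        return wizard(command, argument)

    return handle


def console_wizard(command: str, argument: str) -> str:
    # In a live session the wizard would type the reply at a second
    # terminal, out of the user's sight.
    return input(f"[wizard] user asked to {command} {argument!r}; reply: ")


def scripted_wizard(command: str, argument: str) -> str:
    # Canned stand-in for the human, useful for rehearsing a session.
    return f"(wizard) {command}: {argument}"


def open_file(name: str) -> str:
    return f"Opened {name}"


# Only "open" is implemented; the wizard covers every other command.
prototype = make_prototype({"open": open_file}, scripted_wizard)
```

From the user's side both kinds of response arrive through the same `prototype` call, which is what lets the prototype feel interactive even though part of it is simulated.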

Prototypes  can  support  different  levels  of  interaction.  Fixed 
prototypes,  such  as  video  clips  or  precomputed  animations,  are 
noninteractive:  the  user  cannot  interact,  or  pretend  to  interact, 
with them. Fixed prototypes are often used to illustrate or test scenarios.
Fixed-path prototypes support limited interaction. The ex¬
treme  case  is  a  fixed  prototype  in  which  each  step  is  triggered  by 
a  prespecified  user  action.  For  example,  the  person  controlling 
the  prototype  might  present  the  user  with  a  screen  containing  a 
menu. When the user points to the desired item, the controller presents
the corresponding screen showing a dialogue box. When
the user points to the word “OK,” the controller presents the screen
that  shows  the  effect  of  the  command.  Even  though  the  position 
of  the  click  is  irrelevant  (it  is  used  as  a  trigger),  the  person  playing 
the  role  of  the  user  gets  the  feel  of  the  interaction.  Of  course, 
this  type  of  prototype  can  be  much  more  sophisticated,  with  mul¬ 
tiple  options  at  each  step.  Fixed-path  prototypes  are  very  effec¬ 
tive  with  scenarios  and  can  be  used  for  horizontal  and  task-based 
prototypes (see section on prototyping strategies below).
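
A fixed-path prototype of this kind amounts to a small state machine: each screen lists the user actions that trigger a move to the next screen, and any other action leaves the screen unchanged. The Python sketch below mirrors the menu example above. The class name, screen names, and action labels are hypothetical.

```python
class FixedPathPrototype:
    """Screens linked by prespecified user actions (triggers)."""

    def __init__(self, transitions, start):
        # transitions maps (current_screen, user_action) -> next_screen
        self.transitions = transitions
        self.screen = start

    def act(self, action):
        """Advance if the action is a prespecified trigger; otherwise stay put."""
        self.screen = self.transitions.get((self.screen, action), self.screen)
        return self.screen


menu_demo = FixedPathPrototype(
    transitions={
        ("menu", "point at item"): "dialogue box",
        ("dialogue box", "point at OK"): "result",
    },
    start="menu",
)
```

Adding more entries to `transitions` for the same screen gives the more sophisticated variant with multiple options at each step.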

Open  prototypes  support  large  sets  of  interactions.  Such 
prototypes  work  like  the  real  system,  with  some  limitations. 
They  usually  cover  only  part  of  the  system  (see  vertical  proto¬ 
types)  and  often  have  limited  error  handling  or  reduced  per¬ 
formance  relative  to  that  of  the  final  system. 

Prototypes  may  thus  illustrate  or  test  different  levels  of  in¬ 
teractivity.  Fixed  prototypes  simply  illustrate  what  the  interac¬ 
tion  might  look  like.  Fixed-path  prototypes  provide  designers 
and  users  with  the  experience  of  what  the  interaction  might  feel 
like,  but  only  in  prespecified  situations.  Open  prototypes  allow 
designers  and  users  to  explore  a  wide  range  of  possible  forms  of 
interaction  with  the  system. 

Evolution 

Prototypes  have  different  life  spans:  rapid  prototypes  are  cre¬ 
ated for a specific purpose and then thrown away, iterative prototypes
evolve, either to work out some details (increasing their
precision)  or  to  explore  various  alternatives,  and  evolutionary 
prototypes  are  designed  to  become  part  of  the  final  system. 

Rapid  prototypes  are  especially  important  in  the  early  stages 
of  design.  They  must  be  inexpensive  and  easy  to  produce,  since 
the  goal  is  to  quickly  explore  a  wide  variety  of  possible  types  of 
interaction  and  then  throw  them  away.  Note  that  rapid  proto¬ 
types  may  be  offline  or  online.  Creating  precise  software  proto¬ 
types,  even  if  they  must  be  reimplemented  in  the  final  version  of 
the  system,  is  important  for  detecting  and  fixing  interaction 
problems.  A  later  section  presents  specific  prototyping  tech¬ 
niques,  both  offline  and  online. 

Iterative  prototypes  are  developed  as  a  reflection  of  a  design 
in  progress,  with  the  explicit  goal  of  evolving  through  several 
design  iterations.  Designing  prototypes  that  support  evolution 
is  sometimes  difficult.  There  is  a  tension  between  evolving  to¬ 
ward  the  final  solution  and  exploring  an  unexpected  design  di¬ 
rection,  which  may  be  adopted  or  thrown  away  completely. 
Each  iteration  should  inform  some  aspect  of  the  design.  Some 
iterations  explore  different  variations  of  the  same  theme.  Others 
may  systematically  increase  precision,  working  out  the  finer  de¬ 
tails  of  the  interaction.  A  later  section  describes  tools  and  tech¬ 
niques  for  creating  iterative  prototypes. 

Evolutionary  prototypes  are  a  special  case  of  iterative  proto¬ 
types  in  which  the  prototype  evolves  into  part  or  all  of  the  final 
system  (Fig.  52.1).  Obviously,  this  only  applies  to  software  pro¬ 
totypes.  Extreme  Programming  (Beck,  2000)  advocates  this  ap¬ 
proach,  tightly  coupling  design  and  implementation  and  build¬ 
ing  the  system  through  constant  evolution  of  its  components. 
Evolutionary  prototypes  require  more  planning  and  practice 
than  the  approaches  above  because  the  prototypes  are  both 
representations  of  the  final  system  and  the  final  system  itself, 
making  it  more  difficult  to  explore  alternative  designs.  We  ad¬ 
vocate  a  combined  approach,  beginning  with  rapid  prototypes 
and  then  using  iterative  or  evolutionary  prototypes  according  to 
the  needs  of  the  project.  A  later  section  describes  how  to  cre¬ 
ate  iterative  and  evolutionary  prototypes,  using  appropriate 
development  tools. 


PROTOTYPES  AND  THE  DESIGN  PROCESS 


In  the  previous  section,  we  looked  at  prototypes  as  artifacts,  i.e., 
the  results  of  a  design  process.  Prototypes  can  also  be  seen  as 
artifacts  for  design,  as  an  integral  part  of  the  design  process.  Pro¬ 
totyping  helps  designers  think:  prototypes  are  the  tools  they 
use  to  solve  design  problems.  In  this  section,  we  focus  on  pro¬ 
totyping  as  a  process  and  its  relationship  to  the  overall  design 
process. 

User-Centered  Design 

The  field  of  HCI  is  both  user  centered  (Norman  &  Draper,  1986) 
and  iterative.  User-centered  design  places  the  user  at  the  cen¬ 
ter  of  the  design  process,  from  the  initial  analysis  of  user  re¬ 
quirements  to  testing  and  evaluation.  Prototypes  support  this 
goal by allowing users to see and experience the final system long
before it is built. Designers can identify functional requirements,
usability problems and performance issues early and improve
the design accordingly.

FIGURE 52.1. Evolutionary prototypes of the Apple Lisa: July 1979 (left), October 1980 (right)
(Perkins et al., 1997) (© ACM, with permission).

Iterative  design  involves  multiple  design-implement-test 
loops,¹ enabling the designer to generate different ideas and
successively  improve  upon  them.  Prototypes  support  this  goal 
by  allowing  designers  to  evaluate  concrete  representations  of 
design  ideas  and  select  the  best. 

Prototypes  reveal  the  strengths  as  well  as  the  weaknesses  of 
a  design.  Unlike  pure  ideas,  abstract  models  or  other  represen¬ 
tations,  they  can  be  contextualized  to  help  understand  how  the 
real  system  would  be  used  in  a  real  setting.  Because  prototypes 
are  concrete  and  detailed,  designers  can  explore  different  real- 
world  scenarios  and  users  can  evaluate  them  with  respect  to 
their  current  needs.  Prototypes  can  be  compared  directly  with 
other,  existing  systems,  and  designers  can  learn  about  the  con¬ 
text  of  use  and  the  work  practices  of  the  end  users.  Prototypes 
can  help  designers  reanalyze  the  user’s  needs  during  the  design 
process:  not  abstractly,  as  with  traditional  requirements  analysis, 
but  in  the  context  of  the  system  being  built. 

Participatory  Design 

Participatory design is a form of user-centered design that actively
involves the user in all phases of the design process (see
Greenbaum & Kyng, 1991). Users are not simply consulted at
the  beginning  and  called  in  to  evaluate  the  system  at  the  end; 
they  are  treated  as  partners  throughout.  This  early  and  active 
involvement  of  users  helps  designers  avoid  unpromising  design 
paths  and  develop  a  deeper  understanding  of  the  actual  de¬ 
sign  problem.  Obtaining  user  feedback  at  each  phase  of  the 
process  also  changes  the  nature  of  the  final  evaluation,  which  is 
used  to  fine  tune  the  interface  rather  than  discover  major  us¬ 
ability  problems. 

A  common  misconception  about  participatory  design  is  that 
designers  are  expected  to  abdicate  their  responsibilities  as  de¬ 


signers,  leaving  the  design  to  the  end  user.  In  fact,  the  goal  is  for 
designers  and  users  to  work  together,  each  contributing  their 
strengths  to  clarify  the  design  problem  as  well  as  explore  de¬ 
sign  solutions.  Designers  must  understand  what  users  can  and 
cannot  contribute.  Usually,  users  are  best  at  understanding  the 
context  in  which  the  system  will  be  used  and  subtle  aspects  of 
the  problems  that  must  be  solved.  Innovative  ideas  can  come 
from  both  users  and  designers,  but  the  designer  is  responsible 
for  considering  a  wide  range  of  options  that  might  not  be 
known  to  the  user  and  balancing  the  trade-offs  among  them. 

Because  prototypes  are  shared,  concrete  artifacts,  they  serve 
as  an  effective  medium  for  communication  within  the  design 
team  as  well  as  with  users.  We  have  found  that  collaborating  on 
prototype  design  is  an  effective  way  to  involve  users  in  partici¬ 
patory  design.  Prototypes  help  users  articulate  their  needs  and 
reflect  on  the  efficacy  of  design  solutions  proposed  by  designers. 

Exploring  the  Design  Space 

Design  is  not  a  natural  science:  the  goal  is  not  to  describe  and 
understand  existing  phenomena  but  to  create  something  new. 
Of  course,  designers  benefit  from  scientific  research  findings, 
and  they  may  use  scientific  methods  to  evaluate  interactive  sys¬ 
tems.  However,  designers  also  require  specific  techniques  for 
generating  new  ideas  and  balancing  complex  sets  of  trade-offs 
to  help  them  develop  and  refine  design  ideas. 

Designers  from  fields  such  as  architecture  and  graphic  design 
have  developed  the  concept  of  a  design  space,  which  constrains 
design  possibilities  along  some  dimensions,  while  leaving  others 
open  for  creative  exploration.  Ideas  for  the  design  space  come 
from  many  sources:  existing  systems,  other  designs,  other  de¬ 
signers,  external  inspiration,  and  accidents  that  prompt  new  ideas. 
Designers  are  responsible  for  creating  a  design  space  specific  to 
a particular design problem. They explore this design space, expanding
and contracting it as they add and eliminate ideas. The
process is iterative: more cyclic than reductionist. That is, the designer
does not begin with a rough idea and successively add
more precise details until the final solution is reached.
Instead, he or she begins with a design problem, which imposes
a set of constraints, and then generates a set of ideas to form the
initial design space. He or she then explores this design space,
preferably with the user, and selects a particular design direction
to pursue. This closes off part of the design space, but opens up
new dimensions that can be explored. The designer generates additional
ideas along these dimensions, explores the expanded design
space, and then makes new design choices. Design principles
(e.g., Beaudouin-Lafon & Mackay, 2000) help this process by guiding
it both in the exploration and choice phases. The process continues,
in a cyclic expansion and contraction of the design space,
until a satisfying solution is reached or time has run out.

¹ Software engineers refer to this as the Spiral model (Boehm, 1988).

All  designers  work  with  constraints:  not  just  limited  budgets 
and  programming  resources,  but  also  design  constraints.  These 
are  not  necessarily  bad:  one  cannot  be  creative  along  all  dimen¬ 
sions  at  once.  However,  some  constraints  are  unnecessary,  de¬ 
rived  from  poor  framing  of  the  original  design  problem.  If  we 
consider  a  design  space  as  a  set  of  ideas  and  a  set  of  constraints, 
the  designer  has  two  options.  She  can  modify  ideas  within  the 
specified  constraints  or  modify  the  constraints  to  enable  new 
sets  of  ideas.  Unlike  traditional  engineering,  which  treats  the  de¬ 
sign  problem  as  a  given,  designers  are  encouraged  to  challenge, 
and  if  necessary,  change  the  initial  design  problem.  If  she 
reaches  an  impasse,  the  designer  can  either  generate  new  ideas 
or  redefine  the  problem  (and  thus  change  the  constraints). 
Some  of  the  most  effective  design  solutions  derive  from  a  more 
careful understanding and reframing of the design brief.

Note  that  all  members  of  the  design  team,  including  users, 
may  contribute  ideas  to  the  design  space  and  help  select  design 
directions  from  within  it.  However,  it  is  essential  that  these  two 
activities  are  kept  separate.  Expanding  the  design  space  requires 
creativity  and  openness  to  new  ideas.  During  this  phase,  every¬ 
one  should  avoid  criticizing  ideas  and  concentrate  on  generating 
as  many  as  possible.  Clever  ideas,  half-finished  ideas,  silly  ideas, 
and impractical ideas all contribute to the richness of the design
space  and  improve  the  quality  of  the  final  solution.  In  contrast, 
contracting  the  design  space  requires  critical  evaluation  of  ideas. 
During  this  phase,  everyone  should  consider  the  constraints  and 
weigh  the  trade-offs.  Each  major  design  decision  must  eliminate 
part  of  the  design  space:  rejecting  ideas  is  necessary  in  order  to 
experiment  and  refine  others  and  make  progress  in  the  design 
process.  Choosing  a  particular  design  direction  should  spark 
new  sets  of  ideas,  and  those  new  ideas  are  likely  to  pose  new 
design  problems.  In  summary,  exploring  a  design  space  is  the 
process  of  moving  back  and  forth  between  creativity  and  choice. 

Prototypes  aid  designers  in  both  aspects  of  working  with  a 
design  space:  generating  concrete  representations  of  new  ideas 
and  clarifying  specific  design  directions.  The  next  two  sections 
describe  techniques  that  have  proven  most  useful  in  our  own 
prototyping  work,  both  for  research  and  product  development. 

Expanding  the  Design  Space:  Generating  Ideas 

The  most  well-known  idea  generation  technique  is  brainstorm¬ 
ing,  introduced  by  Osborn  (1957).  His  goal  was  to  create  synergy 


within  the  members  of  a  group:  ideas  suggested  by  one  partici¬ 
pant  would  spark  ideas  in  other  participants.  Subsequent  studies 
(Collaros  &  Anderson,  1969;  Diehl  &  Stroebe,  1987)  challenged 
the  effectiveness  of  group  brainstorming,  finding  that  aggregates 
of  individuals  could  produce  the  same  number  of  ideas  as 
groups.  They  found  certain  effects,  such  as  production  blocking, 
free  riding,  and  evaluation  apprehension,  were  sufficient  to  out¬ 
weigh  the  benefits  of  synergy  in  brainstorming  groups.  Since 
then,  many  researchers  have  explored  different  strategies  for  ad¬ 
dressing  these  limitations.  For  our  purpose,  the  quantity  of  ideas 
is not the only important measure: the relationships among
members  of  the  group  are  also  important.  As  de  Vreede  et  al. 
(2000)  pointed  out,  one  should  also  consider  elaboration  of 
ideas,  as  group  members  react  to  each  other’s  ideas. 

We  have  found  that  brainstorming,  including  a  number  of 
variants,  is  an  important  group-building  exercise  for  participa¬ 
tory  design.  Of  course,  designers  may  brainstorm  ideas  by  them¬ 
selves.  However,  brainstorming  in  a  group  is  more  enjoyable 
and, if it is a recurring part of the design process, plays an important
role in helping group members share and develop ideas
together. 

The  simplest  form  of  brainstorming  involves  a  small  group  of 
people.  The  goal  is  to  generate  as  many  ideas  as  possible  on  a 
prespecified  topic:  quantity,  not  quality,  is  important.  Brain¬ 
storming  sessions  have  two  phases:  the  first  for  generating  ideas 
and  the  second  for  reflecting  upon  them.  The  initial  phase 
should  last  no  more  than  an  hour.  One  person  should  moderate 
the  session,  keeping  time,  ensuring  that  everyone  participates, 
and  preventing  people  from  critiquing  each  other’s  ideas.  Dis¬ 
cussion  should  be  limited  to  clarifying  the  meaning  of  a  partic¬ 
ular  idea.  A  second  person  records  every  idea,  usually  on  a 
flipchart  or  transparency  on  an  overhead  projector.  After  a  short 
break,  participants  are  asked  to  reread  all  the  ideas  and  each 
person  marks  their  three  favorite  ideas. 
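
The reflection phase described above, in which each participant marks three favorite ideas, can be tallied in a few lines. The following is only a minimal sketch: the function name `tally_votes`, the participant names, and the ideas are hypothetical and do not come from any session reported here.

```python
from collections import Counter


def tally_votes(ballots, votes_per_person=3):
    """Count how many participants marked each idea as a favorite.

    ballots maps a participant name to the list of ideas they marked.
    (Hypothetical data layout, chosen for illustration only.)
    """
    counts = Counter()
    for participant, favorites in ballots.items():
        unique = set(favorites)  # ignore accidental duplicate marks
        if len(unique) > votes_per_person:
            raise ValueError(
                f"{participant} marked more than {votes_per_person} ideas"
            )
        counts.update(unique)
    # Most votes first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative ballots from an imaginary three-person session.
ballots = {
    "ana": ["gesture menu", "voice notes", "paper overlay"],
    "ben": ["gesture menu", "paper overlay", "tilt scroll"],
    "chloe": ["gesture menu", "voice notes", "tilt scroll"],
}
ranking = tally_votes(ballots)
for idea, votes in ranking:
    print(f"{votes}  {idea}")
```

The ranking is a starting point for discussion, not a decision rule: ideas with few votes may still deserve a place in the design space.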

One  variation  is  designed  to  ensure  that  everyone  con¬ 
tributes,  not  just  those  who  are  verbally  dominant.  Participants 
write  their  ideas  on  individual  cards  or  Post-it  notes  for  a  pre¬ 
specified  period.  The  moderator  then  reads  each  idea  aloud. 
Authors  are  encouraged  to  elaborate  (but  not  justify)  their 
ideas,  which  are  then  posted  on  a  whiteboard  or  flipchart. 
Group  members  may  continue  to  generate  new  ideas,  inspired 
by  the  others  they  hear. 

We  use  a  variant  of  brainstorming,  called  “video  brainstorm¬ 
ing”  (Mackay,  2000),  as  a  very  fast  technique  for  prototyping  in¬ 
teraction:  instead  of  simply  writing  or  drawing  their  ideas,  par¬ 
ticipants  act  them  out  in  front  of  a  video  camera  (Fig.  52.2).  The 
goal  is  the  same  as  other  brainstorming  exercises,  i.e.,  to  create 
as  many  new  ideas  as  possible,  without  critiquing  them.  How¬ 
ever,  the  use  of  video,  combined  with  paper  or  cardboard  mock 
ups,  encourages  participants  to  experience  the  details  of  the 
interaction  and  to  understand  each  idea  from  the  perspective  of 
the  user,  while  preserving  a  tangible  record  of  the  idea. 

Each  video  brainstorming  idea  should  take  two  to  five  min¬ 
utes  to  generate  and  capture,  allowing  participants  to  simulate  a 
wide  variety  of  ideas  very  quickly.  The  resulting  video  clips  pro¬ 
vide  illustrations  of  each  idea  that  are  easier  to  understand  (and 
remember) than handwritten notes. (We find that raw notes
from brainstorming sessions are not very useful after a few
weeks because the participants no longer remember the context
in which the ideas were created, whereas video brainstorming
clips are useful years later.)

FIGURE 52.2. Video brainstorming an animated character: One participant uses an overhead
projector to project an image on the wall and responds to the actions of a second participant,
who plays the role of the user. Here, the animated character, a very rough sketch on a transparency,
responds when the user waves and moves its eyes to follow the user.

Video  brainstorming  requires  thinking  more  deeply  about 
each  idea  than  in  traditional  oral  brainstorming.  It  is  possible 
to  stay  vague  and  general  when  describing  an  interaction  in 
words  or  even  with  a  sketch,  but  acting  out  the  interaction  in 
front  of  the  camera  forces  the  author  of  the  idea  (and  the  other 
participants)  to  consider  seriously  the  details  of  how  a  real  user 
would  actually  interact  with  the  idea.  Video  brainstorming  also 
encourages  designers  and  users  to  think  about  new  ideas  in  the 
context  in  which  they  will  be  used.  We  also  find  that  video  clips 
from  a  video  brainstorming  session,  even  though  rough,  are 
much  easier  for  the  design  team  to  interpret  than  written  ideas 
from  a  standard  brainstorming  session. 

We  generally  run  a  standard  brainstorming  session,  either 
orally  or  with  cards,  prior  to  a  video  brainstorming  session,  in 
order  to  maximize  the  number  of  ideas  to  be  explored.  Partici¬ 
pants  then  take  their  favorite  ideas  from  the  previous  session 
and  develop  them  further  as  video  brainstorms.  Each  person  is 
asked  to  direct  at  least  two  ideas,  incorporating  the  hands  or 
voices  of  other  members  of  the  group.  We  find  that,  unlike  stan¬ 
dard  brainstorming,  video  brainstorming  encourages  even  the 
quietest  team  members  to  participate. 

Contracting  the  Design  Space:  Selecting  Alternatives 

After  expanding  the  design  space  by  creating  new  ideas,  design¬ 
ers  must  stop  and  reflect  upon  the  choices  available  to  them.  Af¬ 
ter  exploring  the  design  space,  designers  must  evaluate  their 
options  and  make  concrete  design  decisions:  choosing  some 
ideas,  specifically  rejecting  others,  and  leaving  other  aspects  of 
the  design  open  to  further  idea  generation  activities.  Rejecting 
good,  potentially  effective  ideas  is  difficult,  but  necessary  to 
make  progress. 


Prototypes  often  make  it  easier  to  evaluate  design  ideas  from 
the  user’s  perspective.  They  provide  concrete  representations 
that  can  be  compared.  Many  of  the  evaluation  techniques  de¬ 
scribed  elsewhere  in  this  handbook  could  be  applied  to  proto¬ 
types,  to  help  focus  the  design  space.  The  simplest  situation  is 
when  the  designer  must  choose  among  several  discrete,  inde¬ 
pendent  options.  Running  a  simple  experiment,  using  tech¬ 
niques  borrowed  from  psychology,  allows  the  designer  to  com¬ 
pare  how  users  respond  to  each  of  the  alternatives.  The  designer 
builds  a  prototype,  with  either  fully  implemented  or  simulated 
versions  of  each  option.  The  next  step  is  to  construct  tasks  or 
activities  that  are  typical  of  how  the  system  would  be  used,  and 
ask  people  from  the  user  population  to  try  each  of  the  options 
under  controlled  conditions.  It  is  important  to  keep  everything 
the  same,  except  for  the  options  being  tested. 

Designers  should  base  their  evaluations  on  both  quantitative 
measures,  such  as  speed  or  error  rate,  and  qualitative  measures, 
such  as  the  user’s  subjective  impressions  of  each  option.  Ideally, 
of  course,  one  design  alternative  will  be  clearly  faster,  prone  to 
fewer  errors,  and  preferred  by  the  majority  of  users.  More  often, 
the  results  are  ambiguous,  and  the  designer  must  consider  other 
factors  when  making  the  design  choice.  (Interestingly,  running 
small  experiments  often  highlights  other  design  problems  and 
may  help  the  designer  reformulate  the  design  problem  or  change 
the  design  space.) 
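
As a rough illustration of comparing discrete options on the quantitative and qualitative measures just mentioned, the sketch below summarizes trial records per design option. The record fields (`option`, `time_s`, `errors`, `rating`) and all of the numbers are invented for the example; a real study would also need an appropriate statistical test.

```python
from statistics import mean


def summarize(trials):
    """Group trial records by design option and compute simple summaries."""
    by_option = {}
    for trial in trials:
        by_option.setdefault(trial["option"], []).append(trial)

    summary = {}
    for option, rows in by_option.items():
        summary[option] = {
            # Speed: mean task completion time in seconds.
            "mean_time_s": mean(r["time_s"] for r in rows),
            # Error rate: share of trials with at least one error.
            "error_rate": sum(r["errors"] > 0 for r in rows) / len(rows),
            # Subjective impression: mean rating on a 1-5 scale.
            "mean_rating": mean(r["rating"] for r in rows),
        }
    return summary


# Hypothetical data: three users tried both options on the same task.
trials = [
    {"option": "A", "time_s": 12.0, "errors": 0, "rating": 4},
    {"option": "A", "time_s": 14.0, "errors": 1, "rating": 3},
    {"option": "A", "time_s": 13.0, "errors": 0, "rating": 4},
    {"option": "B", "time_s": 10.0, "errors": 1, "rating": 3},
    {"option": "B", "time_s": 11.0, "errors": 0, "rating": 5},
    {"option": "B", "time_s": 15.0, "errors": 2, "rating": 4},
]
summary = summarize(trials)
for option, stats in sorted(summary.items()):
    print(option, stats)
```

In this made-up data set, option B is faster and rated slightly higher but produces errors in more trials, which is the kind of ambiguous outcome that forces the designer to weigh other factors.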

The more difficult (and common) situation is when the designer
faces a complex, interacting set of design alternatives, in
which  each  design  decision  affects  a  number  of  others.  Design¬ 
ers  can  use  heuristic  evaluation  techniques,  which  rely  on  our 
understanding  of  human  cognition,  memory,  and  sensory-per¬ 
ception.  They  can  also  evaluate  their  designs  with  respect  to 
ergonomic  criteria  or  design  principles  (Beaudouin-Lafon  & 
Mackay, 2000).

Another strategy is to create one or more scenarios that illustrate
how the combined set of features will be used in a realistic
setting. The scenario must identify who is involved, where the
activities  take  place,  and  what  the  user  does  over  a  specified 
period.  Good  scenarios  involve  more  than  a  string  of  in¬ 
dependent  tasks;  they  should  incorporate  real-world  activities, 
including  common  or  repeated  tasks,  successful  activities,  break¬ 
downs,  and  errors,  with  both  typical  and  unusual  events.  The 
designer  then  creates  a  prototype  that  simulates  or  implements 
the  aspects  of  the  system  necessary  to  illustrate  each  set  of  de¬ 
sign  alternatives.  Such  prototypes  can  be  tested  by  asking  users 
to  walk  through  the  same  scenario  several  times,  once  for  each 
design  alternative.  As  with  experiments  and  usability  studies,  de¬ 
signers  can  record  both  quantitative  and  qualitative  data,  de¬ 
pending  on  the  level  of  the  prototypes  being  tested. 

The  previous  section  described  an  idea-generation  technique 
called  “video  brainstorming,”  which  allows  designers  to  generate 
a  variety  of  ideas  about  how  to  interact  with  the  future  system. 
We  call  the  corresponding  technique  for  focusing  in  on  a  design 
“video  prototyping.”  Video  prototyping  can  incorporate  any  of 
the  rapid-prototyping  techniques  (offline  or  online)  described  in 
a  later  section.  Video  prototypes  are  quick  to  build,  force  de¬ 
signers  to  consider  the  details  of  how  users  will  react  to  the  de¬ 
sign  in  the  context  in  which  it  will  be  used,  and  provide  an  inex¬ 
pensive  method  of  comparing  complex  sets  of  design  decisions. 

To  an  outside  observer,  video  brainstorming  and  video  pro¬ 
totyping  techniques  look  very  similar:  both  involve  small  de¬ 
sign  groups  working  together,  creating  rapid  prototypes,  and  in¬ 
teracting  with  them  in  front  of  a  video  camera.  Both  result  in 
video  illustrations  that  make  abstract  ideas  concrete  and  help 
team  members  communicate  with  each  other.  The  critical  dif¬ 
ference  is  that  video  brainstorming  expands  the  design  space, 
by  creating  a  number  of  unconnected  collections  of  individual 
ideas,  whereas  video  prototyping  contracts  the  design  space, 
by  showing  how  a  specific  collection  of  design  choices  work 
together  in  a  single  design  proposal. 

Prototyping  Strategies 

Designers  must  decide  what  role  prototypes  should  play  with 
respect  to  the  final  system  and  in  which  order  to  create  different 
aspects  of  the  prototype.  The  next  section  presents  four  strate¬ 
gies:  horizontal,  vertical,  task  oriented,  and  scenario  based, 
which  focus  on  different  design  concerns.  These  strategies  can 
use  any  of  the  prototyping  techniques  covered  in  the  following 
sections. 

Horizontal  prototypes.  The  purpose  of  a  horizontal  pro¬ 
totype  is  to  develop  one  entire  layer  of  the  design  at  the  same 
time.  This  type  of  prototyping  is  most  common  with  large  soft¬ 
ware  development  teams,  where  designers  with  different  skill 
sets  address  different  layers  of  the  software  architecture.  Hori¬ 
zontal  prototypes  of  the  user  interface  are  useful  to  get  an  over¬ 
all  picture  of  the  system  from  the  user’s  perspective  and  address 
issues  such  as  consistency  (similar  functions  are  accessible 
through similar user commands), coverage (all required functions
are  supported),  and  redundancy  (the  same  function  is/is  not 
accessible  through  different  user  commands). 
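
Two of the issues named above, coverage and redundancy, lend themselves to a mechanical check once a horizontal prototype lists its commands. The sketch below assumes a hypothetical mapping from user commands to the system functions they trigger; the command and function names are invented. Consistency is harder to automate and is not addressed here.

```python
def check_coverage_and_redundancy(command_map, required_functions):
    """Report functions with no command and functions with several commands.

    command_map maps a user command to the system function it triggers.
    (Hypothetical structure, used only for illustration.)
    """
    reachable = {}
    for command, function in command_map.items():
        reachable.setdefault(function, []).append(command)

    # Coverage: every required function should be reachable by some command.
    missing = sorted(set(required_functions) - set(reachable))
    # Redundancy: functions reachable through more than one command.
    redundant = {
        function: sorted(commands)
        for function, commands in reachable.items()
        if len(commands) > 1
    }
    return missing, redundant


# Invented command set for an imaginary document editor prototype.
command_map = {
    "File > Save": "save_document",
    "Ctrl+S": "save_document",
    "File > Open": "open_document",
    "Edit > Undo": "undo",
}
required = ["save_document", "open_document", "undo", "print_document"]

missing, redundant = check_coverage_and_redundancy(command_map, required)
print("Not covered:", missing)
print("Redundant access:", redundant)
```

Whether redundancy is a flaw or a feature (for example, a menu item plus a keyboard shortcut) remains a design decision; the check only makes it visible.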

User  interface  horizontal  prototypes  can  begin  with  rapid 
prototypes  and  progress  through  to  working  code.  Software 


prototypes  can  be  built  with  an  interface  builder  without  creat¬ 
ing  any  of  the  underlying  functionality,  making  it  possible  to  test 
how  the  user  will  interact  with  the  user  interface  without  wor¬ 
rying  about  how  the  rest  of  the  architecture  works.  However, 
some  level  of  scaffolding  or  simulation  of  the  rest  of  the  appli¬ 
cation  is  often  necessary;  otherwise,  the  prototype  cannot  be 
evaluated properly. Consequently, software horizontal prototypes
tend to be evolutionary, i.e., they are progressively transformed
into the final system.

Vertical  prototypes.  The  purpose  of  a  vertical  proto¬ 
type  is  to  ensure  that  the  designer  can  implement  the  full, 
working system, from the user interface layer down to the underlying
system layer. Vertical prototypes are often built to assess
the feasibility of a feature described in a horizontal, task-oriented,
or scenario-based prototype. For example, when we
developed the notion of magnetic guidelines in the CPN2000
system to facilitate the alignment of graphical objects (Beaudouin-Lafon
& Mackay, 2000; Beaudouin-Lafon & Lassen, 2000),
we implemented a vertical prototype to test not only the interaction
technique but also the layout algorithm and the performance.
We knew that we could only include the particular
interaction technique if we could implement a sufficiently fast
response.

Vertical prototypes are generally high-precision software prototypes
because their goal is to validate ideas at the system
level.  They  are  often  thrown  away  because  they  are  generally 
created  early  in  the  project,  before  the  overall  architecture  has 
been  decided,  and  they  focus  on  only  one  design  question.  For 
example,  a  vertical  prototype  of  a  spelling  checker  for  a  text  ed¬ 
itor  does  not  require  text-editing  functions  to  be  implemented 
and  tested.  However,  the  final  version  will  need  to  be  integrated 
into  the  rest  of  the  system,  which  may  involve  considerable 
architectural  or  interface  changes. 

Task-oriented  prototypes.  Many  user  interface  design¬ 
ers  begin  with  a  task  analysis  to  identify  the  individual  tasks  that 
the  user  must  accomplish  with  the  system.  Each  task  requires  a 
corresponding  set  of  functionality  from  the  system.  Task-based 
prototypes  are  organized  as  a  series  of  tasks,  which  allows  both 
designers  and  users  to  test  each  task  independently,  systemati¬ 
cally  working  through  the  entire  system. 

Task-oriented  prototypes  include  only  the  functions  neces¬ 
sary  to  implement  the  specified  set  of  tasks.  They  combine  the 
breadth  of  horizontal  prototypes,  to  cover  the  functions  re¬ 
quired  by  those  tasks,  with  the  depth  of  vertical  prototypes,  en¬ 
abling  detailed  analysis  of  how  the  tasks  can  be  supported.  De¬ 
pending  on  the  goal  of  the  prototype,  both  offline  and  online 
representations  can  be  used  for  task-oriented  prototypes. 

Scenario-based  prototypes.  Scenario-based  prototypes 
are  similar  to  task-oriented  ones,  except  that  they  do  not  stress 
individual,  independent  tasks,  but  rather  follow  a  more  realis¬ 
tic  scenario  of  how  the  system  would  be  used  in  a  real-world  set¬ 
ting.  Scenarios  are  stories  that  describe  a  sequence  of  events 
and how the user reacts. A good scenario includes both common
and unusual situations, and should explore patterns of activity
over time. Bødker (1995) developed a checklist to ensure
that  no  important  issues  have  been  left  out. 
We find it useful to begin with anecdotes derived from observations
of or interviews with real users. Ideally, some of those
users should participate in the creation of specific use scenarios, and other
users should critique them based on how realistic they are. Use
scenarios  are  then  turned  into  design  scenarios,  in  which  the 
same  situations  are  described  but  with  the  functionality  of  the 
proposed  system.  Design  scenarios  are  used,  among  other 
things,  to  create  scenario-based  video  prototypes  or  software 
prototypes. As with task-based prototypes, the developer needs to
write  only  the  software  necessary  to  illustrate  the  components 
of  the  design  scenario.  The  goal  is  to  create  a  situation  in  which 
the  user  can  experience  what  the  system  would  be  like  in  a  re¬ 
alistic  use  context,  even  if  it  addresses  only  a  subset  of  the 
planned  functionality. 

The  following  section  describes  a  variety  of  rapid  prototyp¬ 
ing  techniques  that  can  be  used  in  any  of  these  four  prototyping 
strategies.  We  begin  with  offline  rapid  prototyping  techniques, 
followed  by  online  prototyping  techniques. 


RAPID  PROTOTYPING 


The  goal  of  rapid  prototyping  is  to  develop  prototypes  very 
quickly,  in  a  fraction  of  the  time  it  would  take  to  develop  a  work¬ 
ing  system.  By  shortening  the  prototype-evaluation  cycle,  the 
design  team  can  evaluate  more  alternatives  and  iterate  the  de¬ 
sign  several  times,  improving  the  likelihood  of  finding  a  solution 
that  successfully  meets  the  user’s  needs.  Rapid  prototypes  also 
serve to cut off unpromising design directions, saving time and
money.  It  is  far  easier  to  reject  an  idea  based  on  a  rapid  proto¬ 
type  than  a  more  fully  developed  one,  and  one  reduces  the 
chance  of  spending  a  great  deal  of  time  and  effort  on  a  design 
that  ultimately  does  not  work. 

How  rapid  is  rapid  depends  on  the  context  of  the  particular 
project  and  the  stage  in  the  design  process.  Early  prototypes, 
such  as  sketches,  can  be  created  in  a  few  minutes.  Later  in  the 
design  cycle,  a  prototype  produced  in  less  than  a  week  may  still 
be  considered  rapid  if  the  final  system  is  expected  to  take 
months  or  years  to  build.  Precision,  interactivity,  and  evolution 
all  affect  the  time  it  takes  to  create  a  prototype.  Not  surprisingly, 
a  precise  and  interactive  prototype  takes  more  time  to  build 
than  an  imprecise  or  fixed  one. 

The  techniques  presented  in  this  section  are  organized  from 
most  rapid  to  least  rapid,  according  to  the  representation  di¬ 
mension  previously  introduced.  Offline  techniques  are  gener¬ 
ally  more  rapid  than  online  techniques.  However,  creating  suc¬ 
cessive  iterations  of  an  online  prototype  may  end  up  being 
faster  than  creating  new  offline  prototypes. 

Offline  Rapid  Prototyping  Techniques 

Offline  prototyping  techniques  range  from  simple  to  very  elab¬ 
orate.  Because  they  do  not  involve  software,  they  are  usually 
considered  a  tool  for  thinking  through  the  design  issues,  to  be 
thrown  away  when  they  are  no  longer  needed.  This  section  de¬ 
scribes  simple  paper  and  pencil  sketches,  three-dimensional 
mock ups, Wizard of Oz simulations, and video prototypes.


Paper  and  pencil.  The  fastest  form  of  prototyping  in¬ 
volves  paper,  transparencies  and  Post-it  notes  to  represent 
aspects  of  an  interactive  system  (e.g.,  Muller,  1991).  By  play¬ 
ing  the  roles  of  both  the  user  and  the  system,  designers  can 
get  a  quick  idea  of  a  wide  variety  of  different  layout  and  inter¬ 
action  alternatives,  in  a  very  short  period  of  time  (Rettig,  1994; 
Snyder,  2003). 

Designers  can  create  a  variety  of  low-cost  special  effects.  For 
example,  a  tiny  triangle  drawn  at  the  end  of  a  long  strip  cut  from 
an  overhead  transparency  makes  a  handy  mouse  pointer,  which 
can  be  moved  by  a  colleague  in  response  to  the  user’s  actions. 
Post-it  notes,  with  prepared  lists,  can  provide  pop-up  menus.  An 
overhead  projector  pointed  at  a  whiteboard  makes  it  easy  to 
project  transparencies  (hand  drawn  or  preprinted,  overlaid  on 
each  other  as  necessary)  to  create  an  interactive  display  on  the 
wall.  The  user  can  interact  by  pointing  (Fig.  52.3)  or  drawing 
on  the  whiteboard.  One  or  more  people  can  watch  the  user  and 
move  the  transparencies  in  response  to  her  actions.  Everyone  in 
the  room  gets  an  immediate  impression  of  how  the  eventual 
interface  might  look  and  feel. 

Note  that  most  paper  prototypes  begin  with  quick  sketches 
on  paper,  then  progress  to  more  carefully  drawn  screen  images 
made  with  a  computer  (Fig.  52.4).  In  the  early  stages,  the  goal 
is  to  generate  a  wide  range  of  ideas  and  expand  the  design 
space,  not  to  determine  the  final  solution.  Paper  and  pencil  pro¬ 
totypes  are  an  excellent  starting  point  for  horizontal,  task-based, 
and  scenario-based  prototyping  strategies. 

Mock  ups.  Architects  use  mock  ups  or  scaled  prototypes 
to  provide  three-dimensional  illustrations  of  future  buildings. 
Mock  ups  are  also  useful  for  interactive  system  designers,  helping 
them  move  beyond  two-dimensional  images  drawn  on  paper  or 
transparencies  (see  Bpdker,  Ehn,  Knudsen,  Kyng,  &  Madsen, 
1988).  Generally  made  of  cardboard,  foamcore,  or  other  found 
materials  (Frishberg,  2006),  mock  ups  are  physical  prototypes  of 


FIGURE  52.3.  Hand-drawn  transparencies  can  be  projected 
onto  a  wall,  creating  an  interface  a  user  can  respond  to. 

FIGURE  52.4.  Several  people  work  together  to  simulate  interacting
with  this  paper  prototype.  One  person  moves  a  transparency
with  a  mouse  pointer  while  another  moves  the  diagram
accordingly.


FIGURE  52.5.  Mock-up  of  a  hand-held  display  with  carrying  handle.

the  new  system.  Fig.  52.5  shows  an  example  of  a  handheld  mock 
up  showing  the  interface  to  a  new  handheld  device.  The  mock  up 
provides  a  deeper  understanding  of  how  the  interaction  will  work 
in  real-world  situations  than  a  set  of  screen  images. 

Mock  ups  allow  the  designer  to  concentrate  on  the  physical 
design  of  the  device,  such  as  the  position  of  buttons  or  the 
screen.  The  designer  can  also  create  several  mock  ups  and  com¬ 
pare  input  or  output  options,  such  as  buttons  versus  trackballs. 
Designers  and  users  should  run  through  different  scenarios, 
identifying  potential  problems  with  the  interface  or  generating 
ideas  for  new  functionality.  Mock  ups  can  also  help  the  designer 
envision  how  an  interactive  system  will  be  incorporated  into  a 
physical  space  (Fig.  52.6). 

Wizard  of  Oz.  Sometimes  it  is  useful  to  give  users  the  im¬ 
pression  that  they  are  working  with  a  real  system,  even  before 
it  exists.  Kelley  (1983)  dubbed  this  technique  the  Wizard  of  Oz, 


FIGURE  52.6.  Scaled  mock-up  of  an  air  traffic  control  table, 
connected  to  a  wall  display. 


based  on  the  scene  in  the  1939  movie  of  the  same  name.  The 
heroine,  Dorothy,  and  her  companions  ask  the  mysterious  Wizard
of  Oz  for  help.  When  they  enter  the  room,  they  see  an  enormous
green  human  head  breathing  smoke  and  speaking  with  a
deep,  impressive  voice.  When  they  return  later,  they  again  see 
the  Wizard.  This  time,  Dorothy’s  small  dog  pulls  back  a  curtain, 
revealing  a  frail  old  man  pulling  levers  and  making  the  mechan¬ 
ical  Wizard  of  Oz  speak.  They  realize  that  the  impressive  being 
before  them  is  not  a  wizard  at  all,  but  simply  an  interactive  illu¬ 
sion  created  by  the  old  man. 

The  software  version  of  the  Wizard  of  Oz  operates  on  the 
same  principle.  A  user  sits  at  a  screen  and  interacts  with  what  appears
to  be  a  working  program.  Hidden  elsewhere,  the  software
designer  (the  Wizard)  watches  what  the  user  does  and,  by  re¬ 
sponding  to  different  user  actions,  creates  the  illusion  of  a  work¬ 
ing  software  program.  In  some  cases,  the  user  is  unaware  that 
a  person,  rather  than  a  computer,  is  operating  the  system. 

The  Wizard  of  Oz  technique  lets  users  interact  with  partially
functional  computer  systems.  Whenever  they  encounter
something  that  has  not  been  implemented  (or  there  is  a  bug), 
a  human  developer  who  is  watching  the  interaction  overrides 
the  prototype  system  and  plays  the  role  destined  to  eventu¬ 
ally  be  played  by  the  computer.  A  combination  of  video  and 
software  can  work  well,  depending  upon  what  needs  to  be 
simulated. 
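
The override mechanism described above can be sketched in a few lines of code. The following Python sketch is illustrative only, not a description of any particular Wizard of Oz tool: the names `WizardOfOzPrototype`, `implemented`, `scripted_wizard`, and `broken_save` are hypothetical, and the hidden human operator is modeled as a plain callback.

```python
from typing import Callable, Dict, List, Tuple


class WizardOfOzPrototype:
    """Partially functional prototype with a human fallback (hypothetical names)."""

    def __init__(self, wizard: Callable[[str], str]) -> None:
        # Commands the prototype can already handle on its own.
        self.implemented: Dict[str, Callable[[], str]] = {}
        # Stand-in for the hidden human operator.
        self.wizard = wizard
        # (command, responder) pairs, useful when reviewing a session.
        self.log: List[Tuple[str, str]] = []

    def handle(self, command: str) -> str:
        handler = self.implemented.get(command)
        if handler is not None:
            try:
                response = handler()
                self.log.append((command, "system"))
                return response
            except Exception:
                # A bug in the prototype: fall through to the wizard.
                pass
        # Not implemented (or broken): the wizard plays the computer's role.
        response = self.wizard(command)
        self.log.append((command, "wizard"))
        return response


def scripted_wizard(command: str) -> str:
    # In a real session a person would choose or type this response.
    return f"[wizard] simulated result for '{command}'"


def broken_save() -> str:
    # Deliberately buggy handler, to show the override on errors.
    raise RuntimeError("save is not implemented correctly yet")


proto = WizardOfOzPrototype(scripted_wizard)
proto.implemented["open"] = lambda: "file opened"
proto.implemented["save"] = broken_save
```

Calling `proto.handle("open")` is answered by the prototype itself, while `proto.handle("save")` and `proto.handle("search")` are routed to the wizard. The log records who answered each command, which can help the team decide what to implement next.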

The  Wizard  of  Oz  technique  was  initially  used  to  develop  nat¬ 
ural  language  interfaces  (e.g.,  Chapanis,  1982;  Good  et  al., 
1984).  Since  then,  the  technique  has  been  used  in  a  wide  variety 
of  situations,  particularly  those  in  which  rapid  responses  from 
users  are  not  critical.  Wizard  of  Oz  simulations  may  consist  of 
paper  prototypes,  fully  implemented  systems,  and  everything  in 
between. 

Video  prototyping.  Video  prototypes  (Mackay,  1988)  use
video  to  illustrate  how  users  will  interact  with  the  new  system. 
As  explained  in  an  earlier  section,  they  differ  from  video  brainstorming
in  that  the  goal  is  to  refine  a  single  design,  not  generate
new  ideas.  Video  prototypes  may  build  upon  paper  and  pencil
prototypes  and  cardboard  mock  ups  and  can  use  existing
software  and  images  of  real-world  settings.

FIGURE  52.7.  A  storyboard  for  a  tangible  interface  that  enables  users  to  establish  and  manage
their  connections  to  a  small  group  of  friends  and  family.

We  begin  our  video  prototyping  exercises  by  reviewing  rele¬ 
vant  data  about  users  and  their  work  practices,  and  then  review 
ideas  we  video  brainstormed.  The  next  step  is  to  create  a  use 
scenario,  describing  the  user  at  work.  Once  the  scenario  is  de¬ 
scribed  in  words,  the  designer  develops  a  storyboard.  Similar 
to  a  comic  book,  the  storyboard  shows  a  sequence  of  rough 
sketches  of  each  action  or  event,  with  accompanying  actions 
and/or  dialogue  (or  subtitles),  with  related  annotations  that  ex¬ 
plain  what  is  happening  in  the  scene  or  the  type  of  shot  (Fig. 
52.7).  A  paragraph  of  text  in  a  scenario  corresponds  to  about  a 
page  of  a  storyboard. 

Storyboards  help  designers  refine  their  ideas,  generate 
“what-if”  scenarios  for  different  approaches  to  a  story,  and  communicate
with  the  other  people  who  are  involved  in  the  design
process.  Some  storyboards  may  be  informal  sketches  of  ideas, 
with  only  partial  information.  Others  follow  a  predefined  format 
and  are  used  to  direct  the  production  and  editing  of  a  video  pro¬ 
totype.  Designers  should  jot  down  notes  on  storyboards  as  they 
think  through  the  details  of  the  interaction. 

Paper  storyboards  can  be  used  as  is  to  communicate  with 
other  members  of  the  design  team.  Designers  and  users  can 
discuss  the  proposed  system  and  alternative  ideas  for  interact¬ 
ing  with  it  (Fig.  52.8).  Simple  videos  of  each  successive  frame, 
with  a  voice  over  to  explain  what  happens,  can  also  be  effec¬ 
tive.  We  usually  use  storyboards  to  help  us  shoot  video  proto¬ 
types,  which  illustrate  how  a  new  system  will  look  to  a  user  in  a 
real-world  setting.  We  find  that  placing  the  elements  of  a  story¬ 
board  on  separate  cards  and  arranging  them  (Mackay  &  Pagani, 
1994)  helps  the  designer  experiment  with  different  linear  se¬ 
quences  and  insert  or  delete  video  clips.  Continuing  to  the  next 
step,  i.e.  creating  a  video  prototype  based  on  the  storyboard, 
forces  designers  to  consider  the  design  details  in  even  greater 
depth. 


FIGURE  52.8.  Video  Prototyping:  The  CPN  design  team  reviews
their  observations  of  CPN  developers  and  then  discusses  several
design  alternatives.  They  work  out  a  scenario  and  storyboard  it,
then  shoot  a  video  prototype  that  reflects  their  design.

Storyboards,  even  very  informal  ones,  are  essential  guides 
for  shooting  video  prototypes.  To  avoid  spending  time  in  post¬ 
production,  we  use  a  technique  called  “editing-in-the-camera” 
(see  Mackay,  2000)  in  which  each  video  clip,  guided  by  the  sto¬ 
ryboard,  is  shot  in  the  order  that  it  will  be  viewed.  With  a  well- 
designed  storyboard,  this  is  just  as  easy  and  results  in  a  fin¬ 
ished  video  prototype  at  the  end  of  a  one-hour  session.  Note 
that  today’s  digital  video  cameras  include  editing  features  in 
the  camera,  which  introduces  the  temptation  to  make  editing 
changes  on  the  fly.  Our  students  who  do  this  consistently  take 
more  time  than  their  colleagues  who  do  not,  usually  with  worse 
results.  In  general,  we  recommend  avoiding  post  hoc  editing 
and  just  following  the  storyboard. 

We  use  title  cards,  as  in  a  silent  movie,  to  separate  the  clips.
This  both  simplifies  shooting  and  makes  it  easier  for  the  viewer 
to  follow  the  story.  Title  cards  also  provide  the  only  acceptable 
way  to  edit  while  you  are  shooting:  if  you  make  an  error,  you 
should  address  it  immediately  by  rewinding  to  the  last  title  card 
and  continue  shooting  from  there. 

Video  prototypes  take  several  forms.  In  some,  a  narrator  ex¬ 
plains  each  event  and  several  people  on  the  sidelines  may  be 
necessary  to  move  images  and  illustrate  the  interaction.  In  oth¬ 
ers,  actors  simply  perform  the  movements  and  the  viewer  is  ex¬ 
pected  to  understand  the  interaction  without  a  voice  over.  You 
can  easily  create  simple  special  effects.  For  example,  time-lapse 
photography  allows  you  to  have  images  appear  and  disappear 
based  on  the  user’s  interaction.  To  do  this,  record  a  clip  of  a
user  pressing  a  button,  press  “pause”  on  the  camera,  add  a  pop-up
menu,  then  restart  the  camera,  to  create  the  illusion  of  immediate
system  feedback.

Video  prototypes  should  begin  with  a  title,  followed  by  an  es¬ 
tablishing  shot  that  shows  the  user  in  the  context  defined  by  the 
scenario.  Next,  create  a  series  of  close-up  and  midrange  shots, 
interspersed  with  title  cards  to  explain  the  story.  Place  a  final  ti¬ 
tle  card  with  credits  at  the  end.  We  print  blank  title  cards  on 
colored  paper  to  make  it  easier  to  search  for  the  sections  of  the 
video  later:  When  you  fast-forward  through  video,  a  solid  blue 
or  red  frame  clearly  stands  out  (for  detailed  examples  of  video- 
based  prototyping  techniques,  see  Mackay,  2002).

Video  prototypes  are  fixed  prototypes.  However,  it  is  also
possible  to  use  video  as  an  open  prototype,  in  which  users  can 
interact  with  the  prototype  in  an  open-ended  way.  Video  thus 
becomes  a  tool  for  sketching  and  visualizing  interactions.  For 
example,  we  sometimes  use  a  live  video  camera  as  a  Wizard  of 
Oz  tool  and  capture  the  interaction  with  a  second  video  camera. 
The  Wizard  should  have  access  to  a  set  of  prototyping  materi¬ 
als  representing  screen  objects.  Other  team  members  stand  by 
ready  to  help  move  objects  as  needed.  The  live  camera  is 
pointed  at  the  Wizard’s  work  area,  with  either  a  paper  prototype 
or  a  partially  working  software  simulation.  The  resulting  image 
is  projected  onto  a  screen  or  monitor  in  front  of  the  user.  One 
or  more  people  should  be  situated  so  that  they  can  observe  the 
actions  of  the  user  and  manipulate  the  projected  video  image 
accordingly.  This  is  most  effective  if  the  Wizard  is  well  prepared 
for  a  variety  of  events  and  can  present  semiautomated  informa¬ 
tion.  The  user  interacts  with  the  objects  he  or  she  sees  on  the 
screen  and  the  Wizard  moves  the  relevant  materials  in  direct 
response  to  each  user  action.  The  other  camera  records  the  in¬ 
teraction  between  the  user  and  the  simulated  software  system 
on  the  screen,  to  create  either  a  video  brainstorm  (for  a  quick 
idea)  or  a  fully  storyboarded  video  prototype. 

Fig.  52.9  shows  a  Wizard  of  Oz  simulation  with  a  live  video 
camera,  video  projector,  whiteboard,  overhead  projector,  and 
transparencies.  The  setup  allows  two  people  to  experience  how 
they  would  communicate  via  a  new  interactive  communication 
system.  One  video  camera  films  the  blond  woman,  who  can  see 
and  talk  to  the  brunette  woman.  Her  image  is  projected  live
onto  the  left  side  of  the  wall.  An  overhead  projector  displays 
hand  drawn  transparencies,  manipulated  by  two  other  people, 
in  response  to  gestures  made  by  the  brunette  woman.  The  entire
interaction  is  videotaped  by  a  second  video  camera.  Note
that  participants  at  a  workshop  on  user  interfaces  for  air-traffic
control  created  this  video:  none  of  the  participants  had  ever
used  video  prototyping  techniques,  but  they  were  able  to  set  up
this  Wizard  of  Oz  style  environment  and  use  it  to  generate  new
interaction  ideas  in  less  than  30  minutes.

FIGURE  52.9.  Complex  Wizard  of  Oz  simulation,  with  projected
image  from  a  live  video  camera  and  transparencies  projected
from  an  overhead  projector.

Combining  Wizard  of  Oz  and  video  is  a  particularly  power¬ 
ful  prototyping  technique  because  it  gives  the  person  playing 
the  user  a  real  sense  of  what  it  might  actually  feel  like  to  inter¬ 
act  with  the  proposed  tool,  long  before  it  has  been  imple¬ 
mented.  Seeing  a  video  clip  of  someone  else  interacting  with  a 
simulated  tool  is  more  effective  than  simply  hearing  about  it, 
but  interacting  with  it  directly  is  more  powerful  still.  Note  that 
video  should  be  used  with  caution,  particularly  when  video 
prototypes  are  taken  out  of  their  original  design  setting  (for 
a  more  detailed  discussion  of  the  ethical  issues  involved,  see 
Mackay,  1995).

Video  prototyping  may  act  as  a  form  of  specification  for  developers,
enabling  them  to  build  the  precise  interface,  both  visually
and  interactively,  created  by  the  design  team.  This  is  particularly
useful  when  moving  from  offline  to  online  prototypes,
which  we  now  describe. 

Online  Rapid  Prototyping  Techniques 

The  goal  of  online  rapid  prototyping  is  to  create  higher  preci¬ 
sion  prototypes  than  can  be  achieved  with  offline  techniques. 
Such  prototypes  may  prove  useful  to  better  communicate  ideas 
to  clients,  managers,  developers  and  end  users.  They  are  also 
useful  for  the  design  team  to  fine  tune  the  details  of  a  layout  or 
an  interaction.  They  may  exhibit  problems  in  the  design  that 
were  not  apparent  in  less  precise  prototypes.  Finally,  they  may 
be  used  early  on  in  the  design  process  for  low  precision  proto¬ 
types  that  would  be  difficult  to  create  offline,  such  as  when  very 
dynamic  interactions  or  visualizations  are  needed. 

The  techniques  presented  in  this  section  are  sorted  by  in¬ 
teractivity.  We  start  with  noninteractive  simulations,  such  as  an¬ 
imations,  followed  by  interactive  simulations  that  provide  fixed 
or  multiple-path  interactions.  We  finish  with  scripting  languages,
which  support  open  interactions.

Noninteractive  simulations.  A  noninteractive  simulation
is  a  computer-generated  animation  that  represents  what  a
person  would  see  of  the  system  if  he  or  she  were  watching  over 
the  user’s  shoulder.  Noninteractive  simulations  are  usually  cre¬ 
ated  when  offline  prototypes,  including  video,  fail  to  capture  a 
particular  aspect  of  the  interaction  and  it  is  important  to  have  a 
quick  prototype  to  evaluate  the  idea.  It  is  usually  best  to  start  by 
creating  a  storyboard  to  describe  the  animation,  especially  if  the 
developer  of  the  prototype  is  not  a  member  of  the  design  team. 

One  of  the  most  widely  used  tools  for  noninteractive  simulations
is  Macromedia  Director™.  The  designer  defines  graphical
objects  called  “sprites,”  and  defines  paths  along  which  to
animate  them.  The  succession  of  events,  such  as  when  sprites 
appear  and  disappear,  is  determined  with  a  time  line.  Sprites  are 
usually  created  with  drawing  tools,  such  as  Adobe  Illustrator  or 
Deneba  Canvas,  painting  tools,  such  as  Adobe  Photoshop,  or 
even  scanned  images.  Director  is  a  very  powerful  tool,  and  experienced
developers  can  use  it  to  create  sophisticated  interactive
simulations  quickly.  (However,  it  is  still  faster  to  create  noninteractive
simulations.)  Other  similar  tools  exist  on  the  market,
such  as  Adobe  After  Effects  and  Macromedia  Flash  (Fig.  52.10).

FIGURE  52.10.  A  non-interactive  simulation  of  a  desktop  interface
created  with  Macromedia  Flash.  The  time-line  (top)  displays
the  active  sprites  while  the  main  window  (bottom)  shows  the  animation.
(©  O.  Beaudoux,  with  permission)

FIGURE  52.11.  (Curbow,  with  permission)

Fig.  52.11  shows  a  set  of  animation  movies  created  by  Curbow
to  explore  the  notion  of  accountability  in  computer  systems
(Dourish,  1997).  These  prototypes  explore  new  ways  to  inform
the  user  of  the  progress  of  a  file  copy  operation.  They  were
created  with  Macromind  Director  by  combining  custom-made 
sprites  with  sprites  extracted  from  snapshots  of  the  Macintosh 
Finder.  The  simulation  features  cursor  motion,  icons  being 
dragged,  windows  opening  and  closing,  and  so  forth.  The  result
is  a  realistic  prototype,  created  in  just  a  few  hours,  that  shows
how  the  interface  looks  and  behaves.  Note  that  the
simulation  also  features  text  annotations  to  explain  each  step, 
which  helps  document  the  prototype. 

Noninteractive  animations  can  be  created  with  any  tool  that 
generates  images.  For  example,  many  web  designers  use  Adobe 
Photoshop  to  create  simulations  of  their  websites.  Photoshop 
images  are  composed  of  various  layers  that  overlap  like  trans¬ 
parencies.  The  visibility  and  relative  position  of  each  layer  can 
be  controlled  independently.  Designers  can  quickly  add  or 
delete  visual  elements,  simply  by  changing  the  characteristics  of 
the  relevant  layer.  This  permits  quick  comparisons  of  alternative 
designs  and  helps  visualize  multiple  pages  that  share  a  common 
layout  or  banner.  Skilled  Photoshop  users  find  this  approach 
much  faster  than  most  web-authoring  tools. 

We  used  this  technique  in  the  CPN2000  project  (Mackay, 
Ratzer,  &  Janecek,  2000)  to  prototype  the  use  of  transparency. 
After  several  prototyping  sessions  with  transparencies  and  over¬ 
head  projectors,  we  moved  to  the  computer  to  understand  the 
differences  between  the  physical  transparencies  and  the  trans¬ 
parent  effect  as  it  would  be  rendered  on  a  computer  screen.  We 
later  developed  an  interactive  prototype  with  OpenGL,  which 
required  an  order  of  magnitude  more  time  to  implement  than 
the  Photoshop  mock  up. 

Even  a  spreadsheet  program  can  be  used  for  prototyping: 
Berger  (2006)  described  the  use  of  Microsoft  Excel  to  prototype 
form-based  interfaces.  First,  a  template  is  created  that  contains 
a  number  of  reusable  elements  by  taking  advantage  of  the  work¬ 
book  feature  of  Excel,  where  multiple  pages  can  be  presented 
with  a  tabbed  interface.  Then,  prototypes  are  created  by  copy¬ 
ing  and  pasting  items  from  the  template  into  a  blank  page,  tak¬ 
ing  advantage  of  the  table  structure  of  the  spreadsheet  to  create 
grid  layouts. 

Interactive  simulations.  Designers  can  also  use  tools 
such  as  Adobe  Photoshop  to  create  Wizard  of  Oz  simulations. 
For  example,  the  effect  of  dragging  an  icon  with  the  mouse  can 
be  obtained  by  placing  the  icon  of  a  file  in  one  layer  and  the  icon 
of  the  cursor  in  another  layer  and  by  moving  either  or  both  layers.
The  visibility  of  layers,  as  well  as  other  attributes,  can  also
create  more  complex  effects.  As  with  Wizard  of  Oz  and  other  paper
prototyping  techniques,  the  behavior  of  the  interface  is  generated
by  the  person  who  is  operating  the  Photoshop  interface.

More  specialized  tools,  such  as  HyperCard  and  Macromedia 
Director,  can  be  used  to  create  simulations  with  which  the  user 
can  directly  interact.  HyperCard  (Goodman,  1987)  was  one  of 
the  most  successful  early  prototyping  tools.  It  was  an  authoring 
environment  based  on  a  stack  metaphor:  a  stack  contains  a  set 
of  cards  that  share  a  background,  including  fields  and  buttons. 
Each  card  can  also  have  its  own  unique  contents,  including 
fields  and  buttons  (Fig.  52.12).  Using  a  scripting  language,
stacks,  cards,  fields,  and  buttons  can  react  to  user  events,  such 
as  clicking  a  button,  as  well  as  system  events,  for  example,  when 
a  new  card  is  displayed  or  about  to  disappear. 

Interfaces  can  be  prototyped  quickly  with  this  approach,  by 
drawing  different  states  in  successive  cards  and  using  buttons  to 
switch  from  one  card  to  the  next.  Multiple-path  interactions  can 
be  programmed  by  using  several  buttons  on  each  card.  Interac¬ 
tions  that  are  more  open  require  advanced  use  of  the  scripting 
language,  but  are  easy  to  master  with  a  little  practice. 
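
The card-and-button approach amounts to a small state graph: each card is one interface state, and each button names the card to show next. The Python sketch below illustrates that idea only. It is not HyperCard's scripting language, it leaves out shared backgrounds and fields, and the names `Card`, `Stack`, and the example card names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Card:
    """One interface state; each button label maps to a target card name."""
    name: str
    buttons: Dict[str, str] = field(default_factory=dict)


class Stack:
    """A set of cards plus the card currently displayed."""

    def __init__(self, cards: List[Card], start: str) -> None:
        self.cards = {card.name: card for card in cards}
        self.current = start

    def click(self, button: str) -> str:
        """Follow a button to its target card; unknown buttons do nothing."""
        target = self.cards[self.current].buttons.get(button)
        if target is not None:
            self.current = target
        return self.current


# Several buttons per card give a multiple-path interaction.
stack = Stack(
    [
        Card("main", {"Print...": "print_dialog", "Quit": "goodbye"}),
        Card("print_dialog", {"OK": "printing", "Cancel": "main"}),
        Card("printing", {"Done": "main"}),
        Card("goodbye"),
    ],
    start="main",
)
```

With a single button per card this reduces to a fixed-path walkthrough. Adding buttons adds paths without changing any of the supporting code.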

Macromind  Director  uses  a  different  metaphor,  attaching  be¬ 
haviors  to  sprites  and  to  frames  of  the  animation.  For  example, 
a  button  can  be  defined  by  attaching  a  behavior  to  the  sprite 
representing  that  button.  When  the  sprite  is  clicked,  the  ani¬ 
mation  jumps  to  a  different  sequence.  This  is  usually  coupled 
with  a  behavior  attached  to  the  frame  containing  the  button  that 
loops  the  animation  on  the  same  frame.  As  a  result,  nothing 
happens  until  the  user  clicks  the  button,  at  which  point  the  an¬ 
imation  skips  to  a  sequence  where,  for  example,  a  dialogue  box 
opens.  The  same  technique  can  be  used  to  make  the  OK  and 
Cancel  buttons  of  the  dialogue  box  interactive.  Typically,  the 
Cancel  button  would  skip  to  the  original  frame  while  the  OK 
button  would  skip  to  a  third  sequence.  Director  comes  with  a 
large  library  of  behaviors  to  describe  such  interactions,  so  that 
prototypes  can  be  created  completely  interactively.  New  behav¬ 
iors  can  also  be  defined  with  a  scripting  language  called  Lingo. 
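
The frame-and-sprite mechanism can be modeled as a playhead moving along a time line. The Python sketch below is a simplified illustration, not Lingo or Director's actual behavior library: `Timeline`, `hold_on`, `add_button`, and the sequence labels are hypothetical names chosen for this example.

```python
from typing import Dict, Optional


class Timeline:
    """Minimal playhead model: frames are numbered, labels name sequences."""

    def __init__(self, length: int, labels: Dict[str, int]) -> None:
        self.length = length
        self.labels = labels      # sequence name -> first frame of the sequence
        self.loop_frames = set()  # frames carrying a "loop on this frame" behavior
        self.sprites: Dict[int, Dict[str, str]] = {}  # frame -> sprite -> label
        self.frame = 0

    def hold_on(self, frame: int) -> None:
        """Attach a frame behavior that keeps the playhead on this frame."""
        self.loop_frames.add(frame)

    def add_button(self, frame: int, sprite: str, target_label: str) -> None:
        """Attach a sprite behavior that jumps to a labeled sequence when clicked."""
        self.sprites.setdefault(frame, {})[sprite] = target_label

    def tick(self, clicked: Optional[str] = None) -> int:
        """Advance the animation by one step, optionally with a clicked sprite."""
        target = self.sprites.get(self.frame, {}).get(clicked) if clicked else None
        if target is not None:
            self.frame = self.labels[target]  # sprite behavior: jump
        elif self.frame in self.loop_frames:
            pass                              # frame behavior: stay in place
        elif self.frame < self.length - 1:
            self.frame += 1                   # normal playback
        return self.frame


timeline = Timeline(length=30, labels={"main": 0, "dialog": 10, "confirmed": 20})
timeline.hold_on(0)
timeline.add_button(0, "open_button", "dialog")
timeline.hold_on(10)
timeline.add_button(10, "ok", "confirmed")
timeline.add_button(10, "cancel", "main")
```

Nothing happens on the held frames until a sprite is clicked. Cancel returns to the original frame, and OK continues to a third sequence that then plays normally.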

Scripting  languages.  Scripting  languages  are  the  most  ad¬ 
vanced  rapid  prototyping  tools.  As  with  the  interactive-simulation 
tools  described  above,  the  distinction  between  rapid  prototyp¬ 
ing  tools  and  development  tools  is  not  always  clear.  Scripting  lan¬ 
guages  make  it  easy  to  quickly  develop  throwaway  prototypes 
(a  few  hours  to  a  few  days),  which  may  or  may  not  be  used  in  the 
final  system,  for  performance  or  other  technical  reasons. 

FIGURE  52.12.  A  HyperCard  card  (right)  is  the  combination  of  a  background  (left)  and  the  card's
content  (center)  (©  Apple  Computer,  with  permission)

A  scripting  language  is  a  programming  language  that  is  both
lightweight  and  easy  to  learn.  Most  scripting  languages  are  interpreted
or  semicompiled;  that  is,  the  user  does  not
need  to  go  through  a  compile-link-run  cycle  each  time  the  script
(program)  is  changed.  Scripting  languages  can  be  forgiving:
they  are  not  strongly  typed  and  nonfatal  errors  are  ignored  unless
explicitly  trapped  by  the  programmer.  Scripting  languages
are  often  used  to  write  small  applications  for  specific  purposes 
and  can  serve  as  glue  between  preexisting  applications  or  soft¬ 
ware  components. 

Tcl  (Ousterhout,  1993)  is  particularly  suitable  to  develop  user 
interface  prototypes  (or  small  to  medium-size  applications)  be¬ 
cause  of  its  Tk  user  interface  toolkit.  Tcl  was  inspired  by  the  syn¬ 
tax  of  the  Unix  shell  and  makes  it  very  easy  to  interface  existing 
applications  by  turning  the  application  programming  interface 
(API)  into  a  set  of  commands  that  can  be  called  directly  from  a 
Tcl  script.  Tk  features  all  the  traditional  interactive  objects 
(called  “widgets”)  of  a  UI  toolkit:  buttons,  menus,  scrollbars, 
lists,  dialogue  boxes,  and  so  forth.  A  widget  is  typically  created 
with  only  one  line.  For  example, 

    button .dialogbox.ok -text OK -command {destroy .dialogbox}

This  command  creates  a  button,  called  “.dialogbox.ok,” 
whose  label  is  “OK.”  It  deletes  its  parent  window  “.dialogbox”
when  the  button  is  pressed.  A  traditional  programming  language
and  toolkit  would  take  5  to  20  lines  to  create  the  same  button. 

Tk  also  has  two  advanced,  heavily  parameterized  widgets:  the
text  widget  and  the  canvas  widget.  The  text  widget  can  be  used 
to  prototype  text-based  interfaces.  Any  character  in  the  text  can 
react  to  user  input  using  tags.  For  example,  it  is  possible  to  turn 
a  string  of  characters  into  a  hypertext  link.  In  Beaudouin-Lafon 
(2000),  the  text  widget  was  used  to  prototype  a  new  method  for 
finding  and  replacing  text.  When  entering  the  search  string,  all 
occurrences  of  the  string  are  highlighted  in  the  text  (Fig.  52.13). 
Once  a  replace  string  has  been  entered,  clicking  an  occurrence 
replaces  it  (the  highlighting  changes  from  yellow  to  red).  Click¬ 
ing  a  replaced  occurrence  returns  it  to  its  original  value.  This  ex¬ 
ample  also  uses  the  canvas  widget  to  create  a  custom  scrollbar 
that  displays  the  positions  and  status  of  the  occurrences. 
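
The logic behind this find-and-replace interaction can be sketched independently of the toolkit. The Python model below is an assumption-laden illustration, not the code of the cited prototype: the class `SearchReplace` and its methods are hypothetical, and the highlight colors are returned as plain strings instead of being applied through Tk text tags.

```python
from typing import List


class SearchReplace:
    """Model of click-to-toggle replacement (hypothetical names throughout)."""

    def __init__(self, text: str, search: str, replace: str) -> None:
        if not search:
            raise ValueError("search string must not be empty")
        self.search = search
        self.replace = replace
        # Split once: the pieces between occurrences never change.
        self.pieces: List[str] = text.split(search)
        # One flag per occurrence: False = original, True = replaced.
        self.replaced: List[bool] = [False] * (len(self.pieces) - 1)

    def click(self, index: int) -> None:
        """Clicking an occurrence toggles between original and replacement."""
        self.replaced[index] = not self.replaced[index]

    def render(self) -> str:
        """Rebuild the text from the fixed pieces and the per-occurrence flags."""
        out = [self.pieces[0]]
        for flag, piece in zip(self.replaced, self.pieces[1:]):
            out.append(self.replace if flag else self.search)
            out.append(piece)
        return "".join(out)

    def highlights(self) -> List[str]:
        """Highlight color per occurrence: yellow = found, red = replaced."""
        return ["red" if flag else "yellow" for flag in self.replaced]


doc = SearchReplace("the tool uses a tool palette", "tool", "instrument")
doc.click(1)  # replace only the second occurrence
```

Because each occurrence keeps its own flag, a replacement can be undone by clicking it again. The list returned by `highlights()` is the kind of status information a custom scrollbar could display.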

The  Tk  canvas  widget  is  a  drawing  surface  that  can  contain  ar¬ 
bitrary  objects:  lines,  rectangles,  ovals,  polygons,  text  strings,  and 
widgets.  Tags  allow  behaviors  (i.e.,  scripts)  to  be  attached  to  these
objects  and  called  when  the  user  acts  on  them.  For  example,  an  object
that  can  be  dragged  will  be  assigned  a  tag  with  three  behaviors:
button-press,  mouse-move,  and  button-up.  Because  of  the  flexibility  of
the  canvas,  advanced  visualization  and  interaction  techniques  can 
be  implemented  more  quickly  and  easily  than  with  other  tools. 
For  example,  Fig.  52.14  shows  a  prototype  exploring  new  ideas  to
manage  overlapping  windows  on  the  screen  (Beaudouin-Lafon, 
2001).  Windows  can  be  stacked  and  slightly  rotated  so  that  it  is 
easier  to  recognize  them,  and  they  can  be  folded  so  it  is  possible 
to  see  what  is  underneath  without  having  to  move  the  window. 
Even  though  the  prototype  is  not  perfect  (e.g.,  folding  a  window
that  contains  text  is  not  properly  supported),  it  was  instrumental
in  identifying  a  number  of  problems  with  the  interaction  techniques
and  finding  appropriate  solutions  through  iterative  design.
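
The three behaviors that make an object draggable (button-press, mouse-move, and button-up) can be sketched as three small handlers sharing one piece of state. The Python model below is only an illustration under stated assumptions: `Item` and `DragBehavior` are hypothetical names, and in a real Tk prototype these handlers would be scripts bound to a canvas tag and not methods called directly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Item:
    """A draggable object, reduced to its position."""
    x: float
    y: float


class DragBehavior:
    """Three handlers that together implement dragging (illustrative names)."""

    def __init__(self, item: Item) -> None:
        self.item = item
        # Last pointer position during a drag; None when not dragging.
        self.last: Optional[Tuple[float, float]] = None

    def on_button_press(self, x: float, y: float) -> None:
        # Start the drag by remembering where the pointer was.
        self.last = (x, y)

    def on_mouse_move(self, x: float, y: float) -> None:
        if self.last is None:
            return  # moves without a prior press are ignored
        # Move the item by the pointer's displacement since the last event.
        self.item.x += x - self.last[0]
        self.item.y += y - self.last[1]
        self.last = (x, y)

    def on_button_up(self, x: float, y: float) -> None:
        # Apply the final displacement, then end the drag.
        self.on_mouse_move(x, y)
        self.last = None


icon = Item(10, 10)
drag = DragBehavior(icon)
```

Moving the item by the pointer's displacement, instead of jumping it to the pointer position, keeps the object from snapping under the cursor when the user grabs it off-center.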

Tcl  and  Tk  can  also  be  used  with  other  programming  languages.
For  example,  Pad++  (Bederson  &  Meyer,  1998)  is  implemented
as  an  extension  to  Tcl/Tk:  the  zoomable  interface  is
implemented  in  C  for  performance  and  is  accessible  from  Tk  as
a  new  widget.  This  makes  it  easy  to  prototype  interfaces  that
use  zooming.  It  is  also  a  way  to  develop  evolutionary  prototypes:
a  first  prototype  is  implemented  completely  in  Tcl,  then
parts  of  it  are  reimplemented  in  a  compiled  language  to  improve
performance.  Ultimately,  the  complete  system  may  be  implemented
in  another  language,  although  it  is  more  likely  that
some  parts  will  remain  in  Tcl.

FIGURE  52.13.  Using  the  Tk  text  and  canvas  widgets  to  prototype
a  novel  search  and  replace  interaction  technique  (Beaudouin-Lafon,
2000).

FIGURE  52.14.  Using  the  Tk  canvas  widget  to  prototype  a  novel
window  manager  (Beaudouin-Lafon,  2001).

Software  prototypes  can  also  be  used  in  conjunction  with 
hardware  prototypes.  Fig.  52.15  shows  an  example  of  a  hard¬ 
ware  prototype  that  captures  handwritten  text  from  a  paper 
flight  strip  (using  a  combination  of  a  graphics  tablet  and  a  custom-designed
system  for  detecting  the  position  of  the  paper
strip  holder).  We  used  Tcl/Tk,  in  conjunction  with  C++,  to  present
information  on  a  RADAR  screen  (tied  to  an  existing  air  traffic
control  simulator)  and  to  provide  feedback  on  a  touch-sensitive
display  next  to  the  paper  flight  strips  (Cameleon,  Mackay
et  al.,  1998).  The  user  can  write  in  the  ordinary  way  on  the  paper
flight  strip,  and  the  system  interprets  the  gestures  according  to 
the  location  of  the  writing  on  the  strip.  For  example,  a  change  in 
flight  level  is  automatically  sent  to  another  controller  for  confir¬ 
mation  and  a  physical  tap  on  the  strip’s  ID  lights  up  the  corre¬ 
sponding  aircraft  on  the  RADAR  screen.  A  later  section  of  this 
chapter  expands  this  approach  to  the  development  of  proto¬ 
types  of  mixed  reality  and  pervasive  computing  systems. 

FIGURE  52.15.  Cameleon's  augmented  stripboard  (left)  is  a  working  hardware  prototype  that
identifies  and  captures  handwriting  from  paper  flight  strips.  Members  of  the  design  team  test  the
system  (right),  which  combines  both  hardware  and  software  prototypes  into  a  single  interactive
simulation.


ITERATIVE  AND  EVOLUTIONARY  PROTOTYPES 


Prototypes  may  also  be  developed  with  traditional  software  de¬ 
velopment  tools.  In  particular,  high-precision  prototypes  usually 
require  a  level  of  performance  that  cannot  be  achieved  with  the 
rapid  online  prototyping  techniques  previously  described.  Sim¬ 
ilarly,  evolutionary  prototypes  intended  to  evolve  into  the  final 
product  require  more  traditional  software  development  tools. 
Finally,  even  shipped  products  are  not  final,  since  subsequent 
releases  can  be  viewed  as  initial  designs  for  prototyping  the  next 
release. 

Development  tools  for  interactive  systems  have  been  in  use 
for  over  20  years  and  are  constantly  being  refined.  Several  stud¬ 
ies  have  shown  that  the  part  of  the  development  cost  of  an  ap¬ 
plication  spent  on  the  user  interface  is  50%  to  80%  of  the  total 
cost  of  the  project  (Myers  &  Rosson,  1992).  The  goal  of  devel¬ 
opment  tools  is  to  shift  this  balance  by  reducing  production  and 
maintenance  costs.  Another  goal  of  development  tools  is  to  an¬ 
ticipate  the  evolution  of  the  system  over  successive  releases  and 
support  iterative  design. 

The lowest level tools are graphical libraries, which provide
hardware independence for painting pixels on a screen and handling
user input, and window systems, which provide an abstraction
(the window) to structure the screen into several virtual terminals.
User interface toolkits structure an interface as a tree of
interactive objects called “widgets,” while user interface builders
provide  an  interactive  application  to  create  and  edit  those  widget 
trees.  Application  frameworks  build  on  toolkits  and  UI  builders 
to  facilitate  creation  of  typical  functions  such  as  cut/copy/paste, 
undo,  help  and  interfaces  based  on  editing  multiple  documents 
in  separate  windows.  Model-based  tools  semiautomatically  de¬ 
rive  an  interface  from  a  specification  of  the  domain  objects  and 
functions to be supported. Finally, user interface development
environments (UIDEs) provide an integrated collection of tools
for  the  development  of  interactive  software. 

Before  we  describe  some  of  these  categories  in  more  detail,  it 
is  important  to  understand  how  they  can  be  used  for  prototyp¬ 
ing.  It  is  not  always  best  to  use  the  highest  level  available  tool 
for  prototyping.  High-level  tools  are  most  valuable  in  the  long 
term  because  they  make  it  easier  to  maintain  the  system,  port  it 
to  various  platforms,  or  localize  it  to  different  languages.  These 
issues  are  irrelevant  for  vertical  and  throwaway  prototypes,  so  a 
high-level  tool  may  prove  less  effective  than  a  lower  level  one. 

The  main  disadvantage  of  higher  level  tools  is  that  they  con¬ 
strain  or  stereotype  the  types  of  interfaces  they  can  implement. 
User  interface  toolkits  usually  contain  a  limited  set  of  widgets, 
and  it  is  expensive  to  create  new  ones.  If  the  design  must  incor¬ 
porate  new  interaction  techniques,  such  as  bimanual  interaction 
(Kurtenbach,  Fitzmaurice,  Baudel,  &  Buxton,  1997)  or  zoomable 
interfaces  (Bederson  &  Hollan,  1994),  a  user  interface  toolkit  will 
hinder  rather  than  help  prototype  development.  Similarly,  ap¬ 
plication  frameworks  assume  a  stereotyped  application  with  a 
menu  bar,  several  toolbars,  a  set  of  windows  holding  documents, 
and  so  forth.  Such  a  framework  would  be  inappropriate  for  de¬ 
veloping  a  game  or  a  multimedia  educational  CD-ROM  that  re¬ 
quires  a  fluid,  dynamic,  and  original  user  interface. 

Finally,  developers  need  to  truly  master  these  tools,  espe¬ 
cially  when  prototyping  in  support  of  a  design  team.  Success  de¬ 
pends  on  the  programmer’s  ability  to  change  the  details  quickly 
as  well  as  the  overall  structure  of  the  prototype.  A  developer  will 
be  more  productive  when  using  a  familiar  tool  than  if  forced  to 
use  a  more  powerful  but  unknown  tool. 

Since  a  complete  tour  of  development  tools  for  interactive 
systems  is  beyond  the  scope  of  this  chapter,  we  focus  on  those 
tools  that  can  be  used  most  effectively  for  prototyping:  user 
interface  toolkits,  user  interface  builders,  and  user  interface  de¬ 
velopment  environments. 
User Interface Toolkits

User  interface  toolkits  are  probably  the  most  widely  used  tool 
nowadays  to  implement  applications.  All  three  major  platforms 
(Unix/Linux,  MacOS,  and  Windows)  come  with  at  least  one 
standard  UI  toolkit.  The  main  abstraction  provided  by  a  UI 
toolkit  is  the  widget.  A  widget  is  a  software  object  that  has  three 
facets  that  closely  match  the  MVC  model  (Krasner  &  Pope, 
1988):  a  presentation,  a  behavior,  and  an  application  interface. 

The  presentation  defines  the  graphical  aspect  of  the  widget. 
The  overall  presentation  of  an  interface  is  created  by  assembling 
widgets  into  a  tree.  Widgets  such  as  buttons  are  the  leaves  of  the 
tree.  Composite  widgets  constitute  the  nodes  of  the  tree  and 
control  the  layout  of  their  children.  The  behavior  of  a  widget  de¬ 
fines  the  interaction  methods  it  supports:  a  button  can  be 
pressed,  a  scrollbar  can  be  scrolled,  and  a  text  field  can  be 
edited.  The  application  interface  defines  how  a  widget  commu¬ 
nicates  the  results  of  the  user  interaction  to  the  rest  of  the  ap¬ 
plication.  It  is  usually  based  on  a  notification  mechanism. 
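
As a minimal sketch of these three facets, assuming a hypothetical toolkit (the class and method names below are ours and do not belong to any real toolkit), a widget tree with a notification mechanism might look as follows:

```python
class Widget:
    """A widget with a presentation, a behavior, and an application interface."""

    def __init__(self, name):
        self.name = name
        self.children = []   # composite widgets hold and lay out children
        self.listeners = []  # application interface: notification callbacks

    def add(self, child):
        self.children.append(child)
        return child

    def on_activate(self, callback):
        """Register a callback to be notified of user interaction."""
        self.listeners.append(callback)

    def notify(self, event):
        for callback in self.listeners:
            callback(self, event)

    def render(self, depth=0):
        """Presentation: return an indented text outline of the subtree."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


class Button(Widget):
    def press(self):
        """Behavior: a button can be pressed, which notifies the application."""
        self.notify("pressed")


# Assemble a small tree: the button is a leaf, the others are composites.
window = Widget("window")
toolbar = window.add(Widget("toolbar"))
ok = toolbar.add(Button("ok-button"))

log = []
ok.on_activate(lambda widget, event: log.append((widget.name, event)))
ok.press()
```

The application only sees the notification; it does not need to know how the button is drawn or how the press was detected.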

One limitation of widgets is that their behaviors are limited
to the widget itself. Interaction techniques that involve multiple
widgets, such as drag-and-drop, cannot be supported by the
widgets’  behaviors  alone  and  require  separate  support  in  the 
UI  toolkit.  Some  interaction  techniques,  such  as  toolglasses  or 
magic  lenses  (Bier,  Stone,  Pier,  Buxton,  &  De  Rose,  1993),  break 
the  widget  model  both  with  respect  to  the  presentation  and  the 
behavior  and  cannot  be  supported  by  traditional  toolkits.  In 
general,  prototyping  new  interaction  techniques  requires  either 
implementing  them  within  new  widget  classes,  which  is  not  al¬ 
ways  possible,  or  not  using  a  toolkit  at  all.  Implementing  a  new 
widget  class  is  typically  more  complicated  than  implementing 
the  new  technique  outside  the  toolkit,  for  example,  with  a 
graphical  library,  and  is  rarely  justified  for  prototyping.  Many 
toolkits  provide  a  blank  widget,  such  as  the  Canvas  in  Tk  or 
JFrame  in  Java  Swing  (Eckstein,  Toy  &  Wood,  1998),  which  can 
be used by the application to implement its own presentation
and behavior. This is usually a good alternative to implementing
a  new  widget  class,  even  for  production  code. 

User interface toolkits have been an active area of research
over the past 15 years. InterViews (Linton, Vlissides, & Calder,
1989) inspired many modern toolkits and user interface builders.
Some recent research toolkits that can be used for prototyping
include SubArctic (Hudson, Mankoff, & Smith, 2005) and Satin
(Hong & Landay, 2000). The latter is dedicated to ink-based
interaction and is used with the Silk and Denim UIDEs described
below.

A number of toolkits have also shifted away from the widget
model to address other aspects of user interaction. For example,
GroupKit (Roseman & Greenberg, 1996, 1999) was designed for
groupware, Jazz (Bederson, Meyer, & Good, 2000) for zoomable
interfaces, the Visualization (Schroeder, Martin, & Lorensen,
1997) and InfoVis (Fekete, 2004) toolkits for visualization,
Inventor (Strass, 1993) for 3-D graphics, and Metisse (Chapuis &
Roussel, 2005) for window management (Fig. 52.16).

Creating  an  application  or  a  prototype  with  a  UI  toolkit  re¬ 
quires  solid  knowledge  of  the  toolkit  and  experience  with  pro¬ 
gramming  interactive  applications.  In  order  to  control  the  com¬ 
plexity  of  the  interrelations  between  independent  pieces  of 
code  (creation  of  widgets,  callbacks,  global  variables,  etc.),  it  is 
important  to  use  well-known  design  patterns  (Gamma,  Helm, 
Johnson,  &  Vlissides,  1995)  such  as  Command,  Chain  of  respon¬ 
sibility,  Mediator,  and  Observer.  Otherwise  the  code  quickly  be¬ 
comes  unmanageable  and,  in  the  case  of  a  prototype,  unsuitable 
to  design  space  exploration. 
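
To illustrate why such patterns keep prototype code manageable, here is a minimal sketch of the Command pattern with an undo history. The class names and the dictionary used as the application model are assumptions made for this example.

```python
_MISSING = object()  # marks "the key did not exist before the command ran"


class SetValue:
    """Command: sets a value in the model and remembers how to undo it."""

    def __init__(self, model, key, value):
        self.model, self.key, self.value = model, key, value
        self.previous = _MISSING

    def execute(self):
        self.previous = self.model.get(self.key, _MISSING)
        self.model[self.key] = self.value

    def undo(self):
        if self.previous is _MISSING:
            del self.model[self.key]
        else:
            self.model[self.key] = self.previous


class History:
    """Runs commands and undoes them in reverse order."""

    def __init__(self):
        self.done = []

    def run(self, command):
        command.execute()
        self.done.append(command)

    def undo(self):
        if self.done:
            self.done.pop().undo()


model = {}
history = History()
history.run(SetValue(model, "title", "Draft 1"))
history.run(SetValue(model, "title", "Draft 2"))
history.undo()  # the title is "Draft 1" again
```

Because each user action is an object, a callback only has to create a command and hand it to the history, so changing what a button does rarely requires touching the undo logic.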

User  Interface  Builders 

User interface builders leverage user interface toolkits by allowing
the developer of an interactive system to create the presentation
of the user interface, such as the tree of widgets, interactively
with a graphical editor. The editor typically features a
palette of widgets that the user can use to draw the interface in
the same way as a graphical editor is used to create diagrams
with lines, circles, and rectangles. The presentation attributes of
each widget can be edited interactively, as well as the overall
layout. This saves a lot of time that would otherwise be spent
writing and fine-tuning rather dull code that creates widgets and
specifies their attributes. It also makes it extremely easy to
explore and test design alternatives.

FIGURE 52.16. Two prototypes of a window system implemented with Metisse. The one on the
right implements folding windows. Unlike in Figure 15 above, here it works with real applications.
(© O. Chapuis, with permission)

Apple’s  Interface  Builder  (Fig.  52.17)  is  a  descendant  of  the 
NeXT  interface  builder  (NeXT  Corporation,  1991).  The  palette 
to  the  right  contains  the  available  widgets.  The  user  interface  of 
the  application  was  created  by  dragging  these  widgets  into  the 
application  window  at  the  top  left.  The  bottom  left  window 
contains  icons  representing  the  application  objects.  By  drag¬ 
ging  connectors  between  widgets  and  these  objects,  a  signifi¬ 
cant  part  of  the  behavior  of  the  interface  can  be  created  inter¬ 
actively.  The  user  interface  can  be  tested  at  any  time  by  switching 
the  builder  to  test  mode,  making  it  easy  to  verify  that  it  behaves 
as  expected.  The  same  application  built  directly  with  the  un¬ 
derlying  toolkit  would  require  dozens  of  lines  of  code  and  sig¬ 
nificant  debugging. 

User  interface  builders  are  widely  used  to  develop  proto¬ 
types,  as  well  as  final  applications.  They  are  easy  to  use,  they 
make  it  easy  to  change  the  look  of  the  interface,  and  they  hide 
a  lot  of  the  complexity  of  creating  user  interfaces  with  UI  tool¬ 
kits.  However,  despite  their  name,  they  do  not  cover  the  whole 
user  interface.  Therefore  they  still  require  a  significant  amount 
of  programming,  a  good  knowledge  of  the  underlying  toolkit 
and  an  understanding  of  their  limits,  especially  when  prototyp¬ 
ing  novel  visualization  and  interaction  techniques. 


User  Interface  Development  Environments 

A  number  of  high-level  tools  exist  for  creating  interactive  appli¬ 
cations.  They  are  often  based  on  user  interface  toolkits,  and  they 
sometimes  include  an  interface  builder.  These  tools  are  often 
referred  to  as  “user  interface  development  environments.” 

The  simplest  of  these  tools  are  application  frameworks, 
which  address  stereotyped  applications.  For  example,  many  ap¬ 
plications  have  a  standard  form  where  windows  represent  doc¬ 
uments  that  can  be  edited  with  menu  commands  and  tools  from 
palettes;  each  document  may  be  saved  into  a  disk  file;  standard 
functions  such  as  copy/paste,  undo,  and  help  are  supported.  Im¬ 
plementing  such  stereotyped  applications  with  a  UI  toolkit  or 
UI  builder  requires  replicating  a  significant  amount  of  code  to 
implement  the  general  logic  of  the  application  and  the  basics 
of  the  standard  functions.  Application  frameworks  address 
this  issue  by  providing  a  shell  that  the  developer  fills  in  with 
the  functional  core  and  the  actual  presentation  of  the  nonstan¬ 
dard  parts  of  the  interface.  Most  frameworks  have  been  inspired 
by  MacApp,  a  framework  developed  in  the  1980s  to  develop  ap¬ 
plications  for  the  Macintosh  (Apple  Computer,  1996).  Some 
frameworks are more specialized than MacApp. For example,
Unidraw (Vlissides & Linton, 1990) is a framework for creating
graphical editors in domains such as technical and artistic
drawing, music composition, or circuit design. By addressing a
smaller  set  of  applications,  such  a  framework  can  provide  more 
support  and  significantly  reduce  implementation  time. 
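
The notion of a shell that the developer fills in can be sketched as a base class with hooks. This is a toy illustration, not MacApp or Unidraw: the class names, the two hooks, and the choice of copy/paste as the standard function are assumptions made for the example.

```python
class DocumentApp:
    """Framework shell: standard functions built on developer-supplied hooks."""

    def __init__(self):
        self.clipboard = None
        self.dirty = False  # True once the document has unsaved changes

    # Hooks that the developer must fill in.
    def get_selection(self):
        raise NotImplementedError

    def insert(self, data):
        raise NotImplementedError

    # Standard functions provided "for free" by the shell.
    def copy(self):
        self.clipboard = self.get_selection()

    def paste(self):
        if self.clipboard is not None:
            self.insert(self.clipboard)
            self.dirty = True


class NoteEditor(DocumentApp):
    """Application-specific part: a list of text lines with one selected."""

    def __init__(self, lines):
        super().__init__()
        self.lines = list(lines)
        self.selected = None  # index of the selected line, if any

    def get_selection(self):
        if self.selected is None:
            return None
        return self.lines[self.selected]

    def insert(self, data):
        self.lines.append(data)


editor = NoteEditor(["alpha", "beta"])
editor.selected = 1
editor.copy()
editor.paste()
```

The editor class contains no clipboard logic at all, which is the saving a framework offers. It is also the constraint: an application whose interaction does not fit these hooks gains little from the shell.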

Mastering an application framework takes time. It requires
knowledge of the underlying toolkit and the design patterns
used in the framework, and a good understanding of the design
philosophy of the framework. A framework is useful because
it provides a number of functions “for free,” but at the
same time it constrains the design space that can be explored.
Frameworks can prove effective for prototyping if their limits
are well understood by the design team.

FIGURE 52.18. The Garnet toolkit and tools (Myers et al., 1990).

UIDEs assemble a set of tools into an environment
where different aspects of an interactive system can be
specified  and  generated  separately.  For  example,  Garnet  (Fig. 
52.18)  and  its  successor  Amulet  (Myers,  Giuse,  et  al.,  1990;  My¬ 
ers,  McDaniel,  et  al.,  1997)  provide  a  comprehensive  set  of  tools, 
including  a  traditional  user  interface  builder,  a  semiautomatic 
tool  for  generating  dialogue  boxes,  a  user  interface  builder  based 
on  a  demonstration  approach,  and  so  forth. 

Some UIDEs include tools that are specifically designed
for prototyping. For example, Silk (Landay & Myers, 2001) is
a tool aimed at the early stages of design, when interfaces are
sketched rather than prototyped in software. The user sketches
a user interface directly on the screen (Fig. 52.19). Using gesture
recognition, Silk interprets the marks as widgets, annotations,
and so forth. Even in its sketched form, the user interface is
functional: buttons can be pressed, tools can be selected in a
toolbar, and so on. The sketch can also be turned into an actual
interface, e.g., using the Motif toolkit. Finally, storyboards can
be created to describe and test sequences of interactions.
Monet (Li & Landay, 2005) expands this approach by supporting
the specification of animations and continuous interaction such
as drag-and-drop. Silk and Monet therefore combine some aspects
of offline and online prototyping techniques, trying to get
the best of both worlds. Another example is Denim (Lin, Newman,
Hong, & Landay, 2000), which addresses the prototyping
of websites. Like Silk, it uses a sketch-based interface to allow
designers to quickly enter ideas about the overall design of the
website as well as individual pages and test the navigation. This
illustrates a current trend in research where online tools attempt
to support not only the development of the final system,
but the whole design process. The following section will also
illustrate this trend in the new and very active areas of mixed
reality and pervasive computing.

FIGURE 52.19. A sketch created with Silk (top left) and its automatic transformation into a Motif
user interface (top right). A storyboard (bottom) used to test sequences of interactions, here a
button that rotates an object. (© J. Landay, with permission)


PROTOTYPING  MIXED  REALITY  AND 
PERVASIVE  COMPUTING  SYSTEMS 


While most examples in the previous sections concerned traditional
graphical user interfaces (GUIs), there is an increasing
need  to  address  the  design  and  prototyping  of  systems  that 
combine  the  physical  and  online  worlds.  The  current  trend  to¬ 
wards  pervasive  computing  and  mixed  reality  started  with 
Weiser’s  (1991)  seminal  article  on  ubiquitous  computing  and 
was  carried  on  by  augmented  reality  (Wellner,  Mackay  &  Gold, 
1993)  and  tangible  interfaces  (Holmquist,  Schmidt,  &  Ullmer, 
2004).  A  common  theme  of  these  approaches  is  to  combine 
interaction  with  the  real  world  and  interaction  with  online  in¬ 
formation,  taking  advantage  of  humans’  abilities  to  interact  with 
physical  artifacts. 

While mixed reality emphasizes the role of physical objects,
pervasive computing emphasizes the role of the physical space.
Both raise new and difficult issues when trying to actually design
and build such systems. At the prototyping stage, they typically
require a larger effort than when designing a GUI. First, the
range of possible interactions is much broader and typically
includes gesture, eye-gaze, or speech. These techniques are
already difficult to incorporate into traditional interfaces; in a
mixed reality or pervasive computing environment, the user is
much less constrained, making recognition and interpretation
of users’ actions much harder. Designers must then address
recognition errors and, more generally, the effects of context on
the sensing techniques. Second, the range of artifacts to design
is larger than with desktop interfaces, as it includes the physical
artifacts that the users may manipulate; in pervasive computing,
the role of user location, the issues of wireless network
coverage and sensor range, and the difficulty of providing feedback
where needed are all new challenges that need to be addressed.
Finally, these systems are not necessarily used for tasks where
speed of execution and productivity are the primary measures
of success. Whether they are used in the home, for tourism
(such as tour guides), for assisted living, or for plain entertainment,
the wider and less well-mapped design space typically requires
extensive prototyping work.

For  example,  Boucher  and  Gaver  (2006)  described  their  ex¬ 
periences  in  designing  the  drift  table,  a  coffee  table  with  a  small 
screen  in  the  center  displaying  an  aerial  view  of  England  and  load 
sensors  measuring  the  weight  of  objects  on  the  table.  Accord¬ 
ing  to  the  measured  weight  and  its  estimated  location,  the  aer¬ 
ial  view  slowly  drifts  in  that  direction,  as  if  navigating  with  a  hot 
air balloon. While the resulting design is conceptually simple, it
required many prototypes, both to explore what it should do
and  how  it  could  be  built.  The  final  prototype  (Fig.  52.20)  had 
to  be  aesthetically  pleasing  as  well  as  fully  functional  for  testing 
in  people’s  homes.  This  example  shows  that  the  higher  preci¬ 
sion  prototypes  had  to  address  interaction,  function,  and  looks 
simultaneously.  This  proved  particularly  challenging  but  is  com¬ 
mon  when  prototyping  these  types  of  systems. 
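
A software prototype of the drifting behavior could start from a calculation like the one below. This is not Boucher and Gaver's implementation: the four-corner sensor layout, the coordinate system, and the rule that speed grows with total weight are all assumptions made for this sketch.

```python
# Assumed layout: one load sensor under each corner of the table, with the
# table center at (0, 0) and the table edges at +/- 0.5.
SENSOR_POSITIONS = {
    "front_left": (-0.5, -0.5),
    "front_right": (0.5, -0.5),
    "back_left": (-0.5, 0.5),
    "back_right": (0.5, 0.5),
}


def drift_velocity(readings, gain=1.0):
    """Return an (x, y) drift velocity from per-sensor weight readings.

    The direction points toward the estimated location of the load, and
    the speed grows with the total weight. Both rules are illustrative.
    """
    total = sum(readings.values())
    if total <= 0:
        return (0.0, 0.0)
    # The weighted average of sensor positions estimates where the load sits.
    cx = sum(SENSOR_POSITIONS[k][0] * w for k, w in readings.items()) / total
    cy = sum(SENSOR_POSITIONS[k][1] * w for k, w in readings.items()) / total
    return (gain * total * cx, gain * total * cy)
```

A load placed on the right-hand side of the table gives a positive x component, so the view would drift to the right. A load in the center gives no drift regardless of its weight.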

The  prototyping  strategies  that  we  have  described,  such  as 
horizontal,  vertical,  and  scenario-based  prototypes,  are  still  valid 
for  prototyping  mixed  reality  and  pervasive  computing  systems, 
as well as some of the prototyping techniques, such as offline
prototyping with mock-ups and video prototyping. Online
prototyping, however, requires specific tools, and iterative and
incremental prototyping even more so. The rest of this section
explores  some  of  the  existing  tools. 


FIGURE 52.20. The drift table (© W. Gaver, with permission)

Phidgets (Greenberg & Fitchett, 2001) are physical sensors
and  actuators  that  can  be  controlled  from  a  computer  through  a 
USB  connection.  A  wide  range  of  sensors  is  available  (light,  mo¬ 
tion,  distance,  pressure,  humidity,  etc.),  as  well  as  input  devices 
(buttons,  sliders),  LEDs,  LCD  displays,  motors,  and  RFID  tag 
readers.  By  hiding  their  implementations  and  exposing  their 
functionalities  through  a  standard  software  interface,  phidgets 
can  greatly  facilitate  the  design  and  prototyping  of  tangible 
interfaces.  For  example,  they  can  be  embedded  into  a  physical 
artifact  and  programmed  from  the  computer  to  prototype  a  mo¬ 
bile  device.  Since  all  phidgets  can  be  represented  on  the  com¬ 
puter  screen,  they  can  also  be  used  for  Wizard  of  Oz  settings, 
where  the  Wizard  can  directly  control  the  output  phidgets  such 
as  LEDs  and  motors. 
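
The value of a standard software interface for a Wizard of Oz setting can be sketched as follows. The classes below are simulated stand-ins written for this example; they are not the Phidgets API, and every name in them is hypothetical.

```python
class SimulatedLED:
    """Stand-in for an output device; a real one would drive hardware."""

    def __init__(self):
        self.lit = False

    def set(self, on):
        self.lit = bool(on)


class SimulatedLightSensor:
    """Stand-in for an input device that reports a light level from 0 to 1."""

    def __init__(self, level=0.0):
        self.level = level

    def read(self):
        return self.level


def automatic_policy(sensor, led, threshold=0.3):
    """Prototype behavior: switch the LED on when it gets dark."""
    led.set(sensor.read() < threshold)


def wizard_policy(led, wizard_says_on):
    """Wizard of Oz: a hidden operator decides the output directly."""
    led.set(wizard_says_on)
```

Because both policies drive the LED through the same `set` call, the design team can start with the wizard and later swap in the automatic rule without changing the rest of the prototype.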

Papier-Mache  (Klemmer,  Li,  Lin,  &  Landay,  2004)  is  a  toolkit 
for  creating  tangible  interfaces  based  on  vision,  RFID  tags,  and 
barcodes.  It  is  more  adapted  to  the  development  of  augmented 
paper  applications  than  Phidgets  and  provides  a  richer  devel¬ 
opment  environment.  The  DART  toolkit  (MacIntyre,  Gandy, 
Dow,  &  Bolter,  2004)  takes  a  similar  approach,  but  targets  aug¬ 
mented  and  mixed  reality  applications.  The  development  envi¬ 
ronment  is  based  on  Macromedia  Director,  with  the  explicit  goal 
of  being  familiar  to  designers.  In  the  context  of  prototyping, 
both  toolkits  can  be  used  for  applications  other  than  those  that 
they  explicitly  target.  For  example,  DART  can  superimpose  a 
computer-generated  model  with  live  or  recorded  video,  which 
can  be  used  to  put  an  image  or  3-D  model  of  a  device  being  de¬ 
signed  in  the  real  world. 

Topiary  (Li,  Hong,  &  Landay,  2004)  is  a  tool  for  prototyping 
pervasive  applications  that  use  the  locations  of  people,  places, 
and objects. Unlike the above tools, its goal is not to create
real-world prototypes for testing a system in situ. Rather, it is an
online tool for creating and testing scenarios on the
screen,  using  maps  representing  the  environment  and  icons 
representing  the  people,  places,  and  objects  of  interest.  Once  a 
scenario  has  been  created,  the  tool  allows  the  user  to  experi¬ 
ence  it  either  from  a  bird’s  eye  view  or  from  the  perspective  of 
a  particular  user  by  displaying  the  user’s  PDA.  Such  scenario- 
based  prototyping  can  be  very  useful  in  the  early  stages  of  de¬ 
sign,  when  exploring  ideas. 
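
The kind of location-based scenario described here can be approximated with very little code. The sketch below is not Topiary: the map coordinates, the proximity rule, and the text shown on the device are assumptions made for this example.

```python
import math

# Assumed map: named places at (x, y) coordinates in arbitrary units.
PLACES = {"library": (0.0, 0.0), "cafe": (10.0, 0.0)}


def nearby_places(position, radius=2.0):
    """Return the sorted names of places within `radius` of a position."""
    return sorted(
        name
        for name, place in PLACES.items()
        if math.dist(position, place) <= radius
    )


def play_scenario(path, radius=2.0):
    """Replay a person's path and return the device screen at each step."""
    screens = []
    for position in path:
        places = nearby_places(position, radius)
        screens.append("Near: " + ", ".join(places) if places else "")
    return screens
```

Replaying the path `[(0, 1), (5, 0), (9, 0)]` gives the view from the perspective of one user. The path itself corresponds to the bird's eye view of the same scenario.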

Finally, a CAPpella (Dey, Hamid, Beckmann, Li, & Hsu, 2004)
is a tool to help end users create or customize context-aware
applications. It uses programming by demonstration so that users
only have to show examples of sensor data and the system’s
reaction. While the system targets end users, it can also be used by
designers  to  create  prototypes  of  context-aware  applications.  By 
hiding  the  complexity  of  programming  recognizers,  it  allows 
for  a  fast  exploration  of  alternatives. 
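
The principle of programming by demonstration can be illustrated with a deliberately simple recognizer. The sketch below uses a nearest-neighbor rule that we chose for brevity; it is not the recognizer used by a CAPpella, and the sensor vectors and reaction names are invented for the example.

```python
def train(demonstrations):
    """Store the (sensor_vector, reaction) pairs shown by the user."""
    return list(demonstrations)


def react(model, reading):
    """Return the reaction whose demonstrated reading is closest to `reading`."""

    def squared_distance(example):
        vector, _ = example
        return sum((a - b) ** 2 for a, b in zip(vector, reading))

    _, reaction = min(model, key=squared_distance)
    return reaction


# Each demonstration pairs sensor data (here: sound level, light level)
# with the reaction the system should produce.
model = train([
    ((0.9, 0.1), "lights_on"),
    ((0.1, 0.8), "lights_off"),
])
```

With this model, `react(model, (0.8, 0.2))` selects the reaction of the first demonstration. Adding or replacing a demonstration changes the behavior without any recognizer code being rewritten, which is what makes the approach attractive for exploring alternatives.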

In  summary,  while  mixed  reality  and  pervasive  computing 
create  new  challenges  for  designers,  a  number  of  tools  are  start¬ 
ing  to  appear  to  help  with  the  design  process  in  general  and 
prototyping  in  particular.  Interestingly,  these  tools  can  also  be 
used  to  prototype  more  traditional  applications.  For  example, 
phidgets  can  be  used  to  control  an  online  application  in  a  Wiz¬ 
ard  of  Oz  setting,  DART  can  be  used  to  create  scenarios  that  mix 
videos  with  models  of  future  artifacts,  and  a  CAPpella  can  be 
used  to  experiment  with  recognition-based  applications.  As 
mixed reality and pervasive computing systems become more
widely available, they will no doubt provide more tools to be
used  by  designers  in  unexpected  ways  to  help  with  the  proto¬ 
typing  of  interactive  systems  at  large. 


CONCLUSION 


Prototyping  is  an  essential  component  of  interactive  system  de¬ 
sign.  Prototypes  may  take  many  forms,  from  rough  sketches  to 
detailed  working  prototypes.  They  provide  concrete  represen¬ 
tations  of  design  ideas  and  give  designers,  users,  developers  and 
managers  an  early  glimpse  into  how  the  new  system  will  look 
and  feel.  Prototypes  increase  creativity,  allow  early  evaluation 
of  design  ideas,  help  designers  think  through  and  solve  design 
problems,  and  support  communication  within  multidisciplinary 
design  teams. 

Prototypes,  because  they  are  concrete  and  not  abstract,  pro¬ 
vide  a  rich  medium  for  exploring  a  design  space.  They  suggest 
alternate  design  paths  and  reveal  important  details  about  par¬ 
ticular  design  decisions.  They  force  designers  to  be  creative 
and  to  articulate  their  design  decisions.  Prototypes  embody  de¬ 
sign  ideas  and  encourage  designers  to  confront  their  differ¬ 
ences  of  opinion.  The  precise  aspects  of  a  prototype  offer  spe¬ 
cific  design  solutions:  designers  can  then  decide  to  generate 
and  compare  alternatives.  The  imprecise  or  incomplete  aspects 
of  a  prototype  highlight  the  areas  that  must  be  refined  or  re¬ 
quire  additional  ideas. 

We  defined  prototypes  and  then  discussed  them  as  design 
artifacts.  We  introduced  four  dimensions  by  which  they  can  be 
analyzed:  representation,  precision,  interactivity,  and  evolution. 
We  then  discussed  the  role  of  prototyping  within  the  design 
process  and  explained  the  concept  of  creating,  exploring,  and 
modifying  a  design  space.  We  briefly  described  techniques  for 
generating  new  ideas  to  expand  the  design  space  and  tech¬ 
niques  for  choosing  among  design  alternatives  to  contract  the 
design  space. 

We  described  a  variety  of  rapid  prototyping  techniques  for 
exploring  ideas  quickly  and  inexpensively  in  the  early  stages  of 
design,  including  offline  techniques  (from  paper  and  pencil  to 
video)  and  online  techniques  (from  fixed  to  interactive  simula¬ 
tions).  We  then  described  iterative  and  evolutionary  prototyping 
techniques  for  working  out  the  details  of  the  online  interaction, 
including  development  tools  and  software  environments.  Fi¬ 
nally,  we  addressed  the  emerging  fields  of  mixed  reality  and 
pervasive  computing  and  described  the  new  challenges  they 
raise  for  interactive  systems  design,  as  well  as  some  of  the  tools 
available  for  prototyping  them. 

We  view  design  as  an  active  process  of  working  with  a  de¬ 
sign  space,  expanding  it  by  generating  new  ideas  and  contract¬ 
ing  as  design  choices  are  made.  Prototypes  are  flexible  tools  that 
help  designers  envision  this  design  space,  reflect  upon  it,  and 
test  their  design  decisions.  Prototypes  are  diverse  and  can  fit 
within  any  part  of  the  design  process,  from  the  earliest  ideas  to 
the  final  details  of  the  design.  Perhaps  most  important,  proto¬ 
types  provide  one  of  the  most  effective  means  for  designers  to 
communicate  with  each  other,  as  well  as  with  users,  developers, 
and  managers,  throughout  the  design  process. 
INTRODUCTION


Scenario-based  design  is  a  family  of  techniques  in  which  the  use 
of  a  future  system  is  concretely  described  at  an  early  point  in  the 
development  process.  Narrative  descriptions  of  envisioned  us¬ 
age  episodes — user  interaction  scenarios — are  then  employed 
in  a  variety  of  ways  to  guide  the  development  of  the  system  that 
will  enable  these  use  experiences. 

Like  other  user-centered  approaches,  scenario-based  design 
changes  the  focus  of  design  work  from  defining  system  opera¬ 
tions  (e.g.,  functional  specification)  to  describing  how  people  will 
use  a  system  to  accomplish  work  tasks  and  other  activities.  How¬ 
ever,  unlike  approaches  that  consider  human  behavior  and  ex¬ 
perience  through  formal  analysis  and  modeling  of  well-specified 
tasks,  scenario-based  design  offers  relatively  lightweight  methods 
for  envisioning  future  use  possibilities. 

A  user  interaction  scenario  is  a  “sketch  of  use.”  It  is  intended 
to  vividly  capture  the  essence  of  an  interaction  design,  much  as 
a  two-dimensional  paper-and-pencil  sketch  captures  the  essence 
of a physical design. Like any story, a scenario consists of a
setting, or situation state; one or more actors with personal
motivations, knowledge, and capabilities; and various tools and
objects that the actors encounter and manipulate. The narrative
describes  a  sequence  of  actions  and  events  that  lead  to  an  out¬ 
come.  These  actions  and  events  are  related  in  a  usage  context 
that  includes  the  goals,  plans,  and  reactions  of  the  people  taking 
part  in  the  episode. 

Table  53.1  presents  two  brief  scenarios  in  which  Sharon,  a 
high-school  junior,  attends  an  online  science  fair  at  which  some 
of her friends are exhibiting their projects. In both scenarios,
Sharon’s goal is to chat with her friends and visit their
exhibits.  The  scenarios  contrast  two  ways  that  such  a  goal  might 
be  supported  by  computer  network  technologies.  Each  is  a  po¬ 
tential  “solution”  to  Sharon’s  goals,  but  the  user  experience 
varies  from  asynchronous  text-based  reading  and  posting  to  a 
forum,  to  a  real-time  graphical  simulation  of  a  science  fair  that  is 
in  progress. 

Designers  can  quickly  construct  scenarios  like  these  in  or¬ 
der  to  make  envisioned  possibilities  more  concrete.  The  exam¬ 
ple  contrasts  two  contemporary  approaches  to  online  inter¬ 
actions,  but  not  as  an  abstraction  or  a  list  of  features  or 
functions.  It  contrasts  two  episodes  of  human-computer  inter¬ 
action  and  personal  experience. 

User  interaction  scenarios  can  be  successively  detailed  to  dis¬ 
cover  and  address  finer-grained  design  issues.  They  serve  as 
grist  for  group  brainstorming,  developing  further  alternatives, 
or  raising  questions  about  the  assumptions  behind  the  scenar¬ 
ios.  They  can  be  used  to  analyze  software  requirements,  as  a 
partial  specification  of  functionality,  and  to  guide  the  design  of 
user  interface  layouts  and  controls.  They  can  also  be  used  to 
identify  and  plan  evaluation  tasks  for  usability  tests. 


WHY  SCENARIO-BASED  DESIGN? 


One  reason  that  scenarios  have  become  so  popular  in  inter¬ 
active  system  design  is  that  they  enable  rapid  communication 


TABLE  53.1.  Two  Contrasting  Scenarios  for  a  High-School
Student  Attending  an  Online  Science  Fair 

A.  Online  Science  Fair  in  a  Web  Forum 

Sharon’s  friends  Alicia  and  Delia  have  submitted  projects  to  the  online 
science  fair,  and  they  are  counting  on  her  to  visit  and  talk  to  them  while 
they  “host”  their  exhibits  tonight.  At  7:30  p.m.,  she  clicks  on  the  URL  in  the
e-mailed  announcement  to  open  a  web  browser  on  the  science  fair’s 
home  page.  She  sees  a  welcome  notice  and  a  categorical  listing  of  pro¬ 
jects  at  the  fair  (biology,  forestry,  astronomy,  etc.).  She  knows  that  Ali¬ 
cia’s  project  is  on  black  holes,  so  she  quickly  scans  the  astronomy  list 
to  find  her  project  and  follows  the  link.  She  goes  to  another  web  page  and 
admires  Alicia’s  overview  poster.  She  sees  a  number  of  links  leading  to  the 
normal  parts  of  a  science  project  (problem,  significance,  methods,  etc.). 
She  also  sees  a  link  named  “Talk  to  Me”  and  follows  this,  hoping  to  get
in  touch  with  Alicia.  She  sees  a  list  of  comments  about  the  exhibits,  from 
visitors  as  well  as  responses  from  Alicia.  Alicia’s  most  recent  response 
was  at  7:29  p.m.,  so  Sharon  thinks  she  may  still  be  online.  She  posts
a  message  (“Very  flashy  poster!”)  and  is  happy  to  get  an  immediate
response  from  Alicia  asking  what  else  she  has  seen  at  the  fair. 


B.  Online  Science  Fair  in  a  Collaborative  Virtual  Environment 

Sharon’s  friends  Alicia  and  Delia  have  submitted  projects  to  the  online
science  fair,  and  they  are  counting  on  her  to  visit  and  talk  to  them  while
they  “host”  their  exhibits  tonight.  At  7:30  p.m.,  she  clicks  on  the  URL  in  the
e-mailed  announcement,  which  launches  a  collaborative  virtual  environment.
The  welcome  screen  shows  a  two-dimensional  view  of  the  middle-school
gymnasium  (she  recognizes  her  favorite  bleachers).  She  sees  a
bunch  of  little  poster  icons  grouped  in  different  areas  of  the  gym,  with
banners  announcing  different  science  categories  (biology,  forestry,  astronomy,
etc.).  She  also  sees  a  bunch  of  little  faces  moving  around  and  recognizes
a  couple  of  her  other  friends,  Brian  and  Corey.  As  she  slides  her
cursor  over  the  little  poster  icons,  the  project  titles  and  authors  are  displayed,
along  with  an  invitation  to  zoom  in.  She  knows  that  Alicia’s  project
is  on  black  holes,  so  she  moves  to  the  astronomy  grouping,  finds
Alicia’s  project,  and  zooms  in.  She  finds  herself  looking  at  a  virtual  poster,
with  a  bunch  of  poster  sections  that  name  the  traditional  project  components
(problem,  significance,  methods,  etc.);  there  is  also  a  comment
board.  She  also  sees  an  icon  with  Alicia’s  face  “standing”  nearby.  When
she  double-clicks  on  Alicia,  a  text  chat  automatically  opens,  and  she
types  a  quick  “Hello.”  Alicia  replies  and  Sharon  starts  by  teasing  her
about  whether  Brian  has  come  to  visit.


about  usage  possibilities  and  concerns  among  many  different 
stakeholders.  It  is  easy  to  write  a  simple  scenario  such  as  those 
in  Table  53.1,  and  it  takes  only  a  little  more  effort  to  enrich  it  with
a  rough  sketch  or  storyboard.  When  designers  are  working 
through  ideas,  they  want  to  make  progress  quickly,  so  that  they 
can  obtain  feedback  and  continue  to  refine  their  ideas.  Scenar¬ 
ios  are  one  way  to  do  this. 

The  design  of  an  interactive  system  is  an  ill-defined  problem. 
Such  problems  tend  to  evoke  a  problem-solving  strategy  termed 
solution-first  (Cross,  2001).  In  the  solution-first  strategy,  de¬ 
signers  generate  and  analyze  a  candidate  solution  as  a  means 
of  clarifying  the  problem  state,  the  allowable  moves,  and  the 
goal.  They  exploit  the  concreteness  of  their  own  solution  pro¬ 
posals  to  evoke  further  requirements  for  analysis. 

A  solution-first  approach  to  design  is  energizing,  effective, 
and  efficient;  it  explains  the  popularity  of  contemporary  system 
development  approaches  like  rapid  prototyping  (Wasserman  & 
Shewmaker,  1982)  and  extreme  programming  (Beck,  1999).  But 
this  general  strategy  also  entrains  well-known  hazards  (Cross,
2001) — designers  tend  to  generate  solutions  too  quickly,  before 
they  analyze  what  is  already  known  about  the  problem  and  pos¬ 
sible  moves.  Once  an  approach  is  envisioned,  they  may  have 
trouble  abandoning  it  when  it  is  no  longer  appropriate.  Design¬ 
ers  may  too  readily  reuse  pieces  of  a  solution  they  have  used 
earlier,  pieces  that  are  familiar  and  accessible,  but  perhaps  not 
appropriate.  They  may  not  analyze  their  own  solutions  very 
well,  or  they  may  consider  too  few  alternatives  when  exploring 
the  problem  space.  In  the  next  three  sections  we  consider  how 
scenario-based  design  may  help  to  minimize  these  hazards  of 
solution-first  problem  solving  (see  Table  53.2). 

Scenarios  are  Concrete  but  Rough 

Design  analysis  is  always  indeterminate,  because  the  act  of  de¬ 
sign  changes  the  world  within  which  people  act  and  experience. 
Requirements  always  change  (Brooks,  1995).  When  designs  in¬ 
corporate  technologies  that  are  evolving  rapidly,  requirements 
change  even  more  rapidly.  The  more  successful,  the  more 
widely  adopted,  and  the  more  impact  a  design  has,  the  less  pos¬ 
sible  it  will  be  to  determine  its  correct  design  requirements.  And 
in  any  case,  refinements  in  software  technology  and  new  per¬ 
ceived  opportunities  and  requirements  propel  a  new  generation 
of  designs  every  two  to  three  years. 

Design  representations  that  are  at  once  concrete  but  flexi¬ 
ble  help  to  manage  ambiguous  and  dynamic  situations.  Analysts 
must  be  concrete  to  avoid  being  swallowed  by  indeterminacy; 
they  must  be  flexible  to  avoid  being  captured  by  false  steps.  Sys¬ 
tematic  decomposition  is  a  traditional  approach  to  managing 
ambiguity,  but  it  does  not  promote  flexibility.  Instead,  design¬ 
ers  end  up  with  a  set  of  concrete  subsolutions,  each  of  which  is 
fully  specified.  Unfortunately,  by  the  time  the  set  of  subsolutions 
is  specified,  the  requirements  often  have  changed. 

User  interaction  scenarios  reconcile  concreteness  and  flexi¬ 
bility.  A  scenario  envisions  a  concrete  design  solution,  but  it  can 
be  couched  at  many  levels  of  detail.  Initially  a  scenario  may  be 
extremely  rough.  It  specifies  a  possible  design  by  indicating  the 
tasks  users  may  carry  out,  but  without  committing  to  lower-level 
details  describing  how  the  tasks  will  be  carried  out,  or  how  the 
system  will  present  the  functionality  for  the  tasks.  The  examples 
in  Table  53.1  are  at  an  intermediate  level,  with  some  information 
regarding  task  flow,  but  few  details  about  specific  user-system 
interactions. 

Concrete  material  is  interpreted  more  easily  and  more  thor¬ 
oughly  than  abstract  material.  For  example,  people  remember 


a  prototypical  example  far  better  than  they  remember  the  ab¬ 
stract  category  to  which  it  belongs  (Medin  &  Schaffer,  1978; 
Rosch,  Mervis,  Gray,  Johnson,  &  Boyes-Braem,  1976).  Incom¬ 
plete  material  tends  to  be  elaborated  with  respect  to  personal 
knowledge  when  it  is  encountered.  This  process  of  elaboration 
creates  more  robust  and  accessible  memories,  relative  to  mem¬ 
ories  for  more  complete  material  (Wertheimer,  1938).  The  com¬ 
bination  of  concreteness  and  incompleteness  in  scenarios  en¬ 
gages  a  powerful  variety  of  constructive  cognitive  processes. 

The  fluidity  of  design  situations  demands  provisional  solu¬ 
tions  and  tentative  commitments.  Yet  if  every  design  decision 
is  suspended,  the  result  will  be  a  design  space,  not  a  design.  A 
scenario  is  a  concrete  design  proposal  that  a  designer  can  eval¬ 
uate  and  refine,  but  it  is  also  rough,  so  that  it  can  be  easily  al¬ 
tered,  and  many  details  can  be  deferred. 

Scenarios  Maintain  an  Orientation 
to  People  and  Their  Needs 

Designers  need  constraints;  too  many  things  might  be  de¬ 
signed.  The  current  state  of  technology  development  makes 
some  solutions  impossible  and  others  irresistible.  On  the  one 
hand,  designers  cannot  use  technology  that  does  not  yet  exist. 
On  the  other  hand,  designers  are  caught  up  in  a  technological 
Zeitgeist  that  biases  them  toward  making  use  of  the  latest  gad¬ 
gets  and  gizmos.  They  are  likely  to  be  biased  toward  familiar 
technologies,  even  when  they  are  aware  of  limitations  in  these 
technologies. 

Scenarios  are  work-oriented  design  objects.  They  describe 
systems  in  terms  of  the  work  that  users  will  try  to  do  when  they 
use  those  systems,  ensuring  that  design  will  remain  focused  on 
the  needs  and  concerns  of  users  (Carroll  &  Rosson,  1990).  Sce¬ 
narios  address  what  has  been  called  the  “representational  bias” 
in  human  cognition:  people  overestimate  the  relevance  of  things 
that  are  familiar  to  them  (Kahneman  &  Tversky,  1972;  Tversky  &
Kahneman,  1974).  For  instance,  designers  in  a  web-development 
company  with  years  of  experience  in  forms-based  transactions 
will  see  this  interaction  paradigm  as  a  solution  to  problems  that 
might  be  better  served  by  real-time  interaction  techniques.  It  is 
difficult  to  move  beyond  the  familiar,  but  generating  and  sharing 
a  vivid  representation  of  exceptions  to  the  status  quo  can  promote
innovative  thinking.  Scenarios  describing  unusual  but
critical  circumstances  can  provide  such  a  perspective. 

The  reuse  of  familiar  ideas  is  just  one  type  of  constraint  that 
designers  may  apply  in  their  solution-first  problem  solving. 
Other  constraints  may  arise  from  the  organizational  structures 


TABLE  53.2.  Concerns  Stemming  from  the  Solution-First  Approach  to  Design,
and  Aspects  of  Scenario-Based  Design  that  Address  Each  Concern

Hazard  of  the  Solution-First  Approach:  Designers  want  to  select  a  solution
approach  quickly,  which  may  lead  to  premature  commitment  to  their  first
design  ideas.
How  Scenario-Based  Design  Can  Help:  Because  they  are  concrete  but  rough,
scenarios  support  visible  progress,  but  also  relax  commitment  to  the  ideas
expressed  in  the  scenarios.

Hazard  of  the  Solution-First  Approach:  Designers  attempt  to  quickly  simplify
the  problem  space  with  external  constraints,  such  as  the  reusing  of  familiar
solutions.
How  Scenario-Based  Design  Can  Help:  Because  they  emphasize  people  and
their  experiences,  scenarios  direct  attention  to  the  use-appropriateness  of
design  ideas.

Hazard  of  the  Solution-First  Approach:  Designers  are  intent  on  elaborating
their  current  design  proposal,  resulting  in  inadequate  analysis  of  other
ideas  or  alternatives.
How  Scenario-Based  Design  Can  Help:  Because  they  are  evocative  and  by
nature  incomplete,  scenarios  promote  empathy  and  raise  usage  questions
at  many  levels.
within  which  the  design  work  is  embedded.  Design  projects  are
often  chartered  with  a  priori  commitments  to  follow  a  system¬ 
atic  decomposition  process.  This  makes  them  easy  to  manage, 
but  unlikely  to  succeed  with  respect  to  discovering  the  real  requirements
of  users  and  clients.  Schedules  and  resources  are  of¬
ten  assigned  in  ways  that  create  ongoing  conflicts  between  sys¬ 
tem  designers  and  usability  engineers.  Usability  engineers  need 
to  evaluate  scenarios  and  prototypes  at  every  stage  of  system 
development,  but  if  schedules  and  resources  do  not  provide 
for  this,  this  work  can  conflict  with  software  construction  and 
refinement. 

Constraints  such  as  these  can  distract  designers  with  ancil¬ 
lary  factors  so  that  they  lose  sight  of  what  is  essential  in  the  de¬ 
sign  project — namely,  the  needs  and  concerns  of  users.  The  de¬ 
signer  can  become  “unsituated”  with  respect  to  the  real  design 
situation,  which  is  not  the  marketing  manager’s  projections,  or 
the  instructional  designer’s  list  of  steps,  or  the  software  engi¬ 
neer’s  system  decomposition.  The  real  design  situation  is  the 
situation  that  will  be  experienced  by  the  user,  and  designers 
need  to  stay  focused  on  that. 

Scenarios  can  be  made  even  more  effective  as  work-oriented 
design  objects  when  users  are  directly  involved  in  creating  them. 
Ackoff  (1979)  argued  that  the  indeterminacy  of  design  situations 
makes  it  imperative  that  all  stakeholders  participate  directly.  Sce¬ 
narios  support  a  simple  and  natural  process  of  participatory  de¬ 
sign,  in  which  prospective  users  begin  by  enacting  or  relating 
episodes  of  current  activities,  then  work  iteratively  with  design¬ 
ers  to  transform  and  enrich  these  scenarios  with  the  opportuni¬ 
ties  provided  by  new  technologies  (Carroll,  Chin,  Rosson,  & 
Neale,  2000;  Chin,  Rosson,  &  Carroll,  1997). 

Scenarios  are  Evocative,  Raising  Questions 
at  Many  Levels 

There  is  a  fundamental  tension  between  thinking  and  doing: 
thinking  impedes  progress  in  doing,  and  doing  obstructs  think¬ 
ing.  Sometimes  this  conflict  is  quite  sharp,  as  when  one  must 
stop  and  think  before  taking  another  step.  But  frequently  it  is 
more  a  matter  of  trading  priorities.  Designers  are  intelligent 
people  performing  complex  and  open-ended  tasks.  They  want 
to  reflect  on  their  activities,  and  they  routinely  do  reflect  on 
their  activities.  However,  people  take  pride  not  only  in  what 
they  know  and  learn,  but  also  in  what  they  can  do  and  in  what 
they  actually  produce. 

Donald  Schon  (1983;  1987)  discussed  this  conflict  exten¬ 
sively  in  his  books  on  reflective  practice.  For  example,  he  ana¬ 
lyzed  a  coach  reacting  to  an  architecture  student’s  design  concept 
for  a  school  building,  which  included  a  spiral  ramp  intended  to 
maintain  openness  while  breaking  up  lines  of  sight  (she  calls  the 
idea  “a  Guggenheim”).  Schon  wrote  that  “when  I  visited  open
schools,  the  one  thing  they  complained  about  was  the  ware¬ 
house  quality — of  being  able  to  see  for  miles.  It  [the  ramp] 
would  visually  and  acoustically  break  up  the  volume”  (Schon, 
1987,  p.  129). 

In  this  episode,  the  coach  feels  that  the  student  needs  to  ex¬ 
plore  and  develop  her  concept  more  thoroughly,  noting  that  a 
ramp  has  thickness  and  that  this  will  limit  her  plans  to  use  the 
space  underneath  the  ramp;  he  urges  her  to  draw  sections.  How¬ 


ever,  he  does  not  justify  this  advice;  as  Schon  puts  it,  he  does  not 
reveal  “the  meanings  underlying  his  questions”  (p.  132).  Schon 
regards  this  as  a  hopeless  confrontation  in  which  no  progress 
can  be  made  on  the  particular  design  project,  or  on  the  larger 
project  of  understanding  how  to  design.  Both  the  student  and 
the  coach  are  willing  to  act  publicly  and  to  share  actions,  but  they 
do  not  reflect  enough  on  their  own  and  one  another’s  values  and 
objectives,  and  on  their  interpersonal  dynamics. 

Reflection  is  not  always  comfortable;  it  forces  one  to  con¬ 
sider  one’s  own  competence,  to  open  oneself  to  the  possibility 
of  being  wrong.  Nonetheless,  designers  create  many  opportu¬ 
nities  for  reflection — for  example,  organizing  design-review 
meetings,  or  building  prototypes  for  formative  evaluation.  Such 
activities  promote  identification  and  integration  of  different  per¬ 
spectives;  they  raise  concrete  and  detailed  design  issues  to 
guide  further  work.  In  this  way  they  help  designers  to  reflect 
on  the  work  they  have  already  done.  But  they  do  not  evoke  reflection
in  the  context  of  doing  design.  Design  reviews  and  formative
evaluations  are  ancillary  activities  that  must  be  coordinated
with  design  itself.

Scenarios  help  designers  to  reflect  on  their  ideas  in  the  con¬ 
text  of  doing  design.  The  narrative  is  written  to  evoke  an  image 
of  people  doing  things,  pursuing  goals,  and  using  technology 
in  support  of  these  goals.  The  story  enables  readers  to  em¬ 
pathize  with  the  people  in  the  situation,  which  in  turn  leads  to 
questions  about  motivations,  intentions,  reactions,  and  satis¬ 
faction.  For  example,  in  the  second  scenario  from  Table  53.1,  is 
it  valuable  to  Sharon  to  opportunistically  notice  other  friends 
when  she  arrives  at  the  fair?  What  effect  does  her  recognition
of  her  school  gymnasium  have  on  her  success  or  experience  at 
navigating  the  online  fair? 

Scenarios  promote  reflection  and  analysis  in  part  because 
the  human  mind  is  adept  at  overloading  meaning  in  narrative 
structures,  both  in  generation  and  interpretation,  as  illustrated 
by  the  remarkable  examples  of  dreams  (Freud,  1900),  myths 
(Levi-Strauss,  1967),  and  folktales  (Propp,  1928/1958).  It  is  well 
known  that  when  people  communicate,  they  rely  on  the  given- 
new  contract  (Haviland  &  Clark,  1974);  they  assume  or  allude  to 
relevant  background  information,  then  present  what  is  novel. 
This  normative  structure  eases  both  the  generation  and  inter¬ 
pretation  of  narratives. 

Schon  (1983)  described  design  as  a  “conversation”  with  a 
situation  comprised  of  many  interdependent  elements.  The  de¬ 
signer  makes  moves  and  then  “listens”  to  the  design  situation  to 
understand  their  consequences:  “In  the  designer’s  conversation 
with  the  materials  of  his  design,  he  can  never  make  a  move 
which  has  only  the  effects  intended  for  it.  His  materials  are  con¬ 
tinually  talking  back  to  him,  causing  him  to  apprehend  unan¬ 
ticipated  problems  and  potentials”  (p.  101). 

When  a  move  produces  unexpected  consequences,  and  par¬ 
ticularly  when  it  produces  undesirable  consequences,  the  de¬ 
signer  articulates  “the  theory  implicit  in  the  move,  criticizes  it, 
restructures  it,  and  tests  the  new  theory  by  inventing  a  move 
consistent  with  it”  (p.  155). 

Scenarios  often  include  implicit  information  about  design 
consequences.  Returning  to  the  scenarios  in  Table  53.1,  the  ex¬ 
hibit-discussion  forum  in  the  first  scenario  allows  Sharon  to  ar¬ 
rive  at  an  exhibit  “at  any  time,”  find  out  what  has  been  said,  and 
make  a  contribution.  In  the  second  scenario,  it  is  Sharon’s 
chance  noticing  of  Brian  and  Corey  that  sets  up  her  teasing
conversation  with  Alicia.  At  the  same  time  these  features  have 
less  positive  consequences,  for  example  the  need  to  browse  and 
read  each  of  a  discussion’s  comments  in  turn,  or  the  require¬ 
ment  that  she  pan  and  zoom  to  get  to  the  details  about  a  pro¬ 
ject.  These  tradeoffs  are  important  to  the  scenarios,  but  often 
it  is  enough  to  imply  them  (this  is  an  aspect  of  the  roughness 
property  discussed  earlier).

There  are  times,  however,  when  it  is  useful  to  reflect  more 
systematically  on  these  relationships,  to  make  them  explicit. 
In  another  situation  Sharon  may  find  the  archived  discussion 
too  long  or  disorganized  to  browse,  or  she  may  be  distracted  by 
friends  in  the  gymnasium  and  never  make  it  to  Alicia’s  project. 
These  alternative  scenarios  present  a  failure  with  respect  to  the 
starting  goal.  To  understand,  address,  and  track  the  variety  of 
both  desirable  and  undesirable  consequences  of  the  original 
design  move,  the  designer  might  want  to  make  explicit
the  relevant  causal  relationships  in  a  scenario.  Doing  so
provides  another  perspective  on  the  envisioned  situation  (see 
Table  53.3). 

Scenarios  and  analyses  such  as  those  shown  in  Table  53.3  can 
help  designers  move  more  deliberately  toward  specific  conse¬ 
quences.  For  example,  designers  might  decide  to  provide  for 
subcategories  within  major  categories  in  the  topic  listing,  so  that 
the  convenience  of  the  forum  interaction  is  obtained,  but  so 
that  it  also  scales  well  to  large  numbers  of  projects.  Alternatively, 
the  opportunistic  encounters  with  other  residents  might  be 
considered  desirable  enough  that  efforts  would  be  made  to 
provide  “presence”  information  in  an  asynchronous  web  forum. 
As  each  elaboration  is  envisioned  and  proposed,  it  too  is  ex¬ 
plored  for  further  consequences  and  interdependencies. 

Scenarios  of  use  are  multifarious  design  objects;  they  can  de¬ 
scribe  designs  at  multiple  levels  of  detail  and  with  respect  to 
multiple  perspectives.  In  this  way  they  can  help  designers  re¬ 
flect  on  several  aspects  of  a  problem  situation  simultaneously. 

TABLE  53.3.  Tradeoff  Analysis  of  Features  in  the  Scenarios
from  Table  53.1.  Each  Feature  is  Considered  for  Possible
Positive  and  Negative  Consequences  for  Users  in  the  Source
Scenario  or  Closely  Related  Alternatives.  Each  Combination
of  Feature  with  Its  Consequences  is  Called  a  “Claim”

A.  Online  Science  Fair  on  a  Web  Forum

Presenting  online  projects  as  a  categorical  list  in  a  Web  browser:

  +  enables  convenient  browsing  of  projects,  topic  by  topic
  +  leverages  existing  hypermedia  browsing  and  navigation  skills
  -  may  be  difficult  to  scan  or  navigate  when  the  list  is  very  long  or
     complex

B.  Online  Science  Fair  in  a  Collaborative  Virtual  Environment

Representing  online  projects  and  visitors  as  icons  in  an  image  of  the
gymnasium:

  +  suggests  that  viewers  can  scan,  move  to,  and  “approach”  people
     and  projects
  +  reinforces  a  school-centered  context  for  thinking  about  the
     science  fair
  -  may  be  perceptually  complex;  for  example,  it  may  include  visual
     details  that  have  no  relevance  to  navigation  or  interaction
  -  may  use  image  panning  and  zooming,  not  familiar  interaction
     skills  for  some  users
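
A claim pairs one design feature with its hypothesized positive and negative consequences, so a claims analysis can also be kept as a small set of structured records. The following Python sketch is only an illustration of that idea under our own assumptions: the names Claim, render, and open_downsides are hypothetical and are not part of the framework described in this chapter. The claim contents are copied from Table 53.3.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Claim:
    """One design feature plus its hypothesized consequences (hypothetical record type)."""
    scenario: str
    feature: str
    upsides: List[str] = field(default_factory=list)
    downsides: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the claim in the +/- layout used in Table 53.3."""
        lines = [f"{self.feature}:"]
        lines += [f"  + {u}" for u in self.upsides]
        lines += [f"  - {d}" for d in self.downsides]
        return "\n".join(lines)


forum_claim = Claim(
    scenario="A. Online Science Fair on a Web Forum",
    feature="Presenting online projects as a categorical list in a Web browser",
    upsides=[
        "enables convenient browsing of projects, topic by topic",
        "leverages existing hypermedia browsing and navigation skills",
    ],
    downsides=[
        "may be difficult to scan or navigate when the list is very long or complex",
    ],
)

cve_claim = Claim(
    scenario="B. Online Science Fair in a Collaborative Virtual Environment",
    feature="Representing online projects and visitors as icons in an image of the gymnasium",
    upsides=[
        "suggests that viewers can scan, move to, and approach people and projects",
        "reinforces a school-centered context for thinking about the science fair",
    ],
    downsides=[
        "may be perceptually complex",
        "may use image panning and zooming, not familiar interaction skills for some users",
    ],
)


def open_downsides(claims: List[Claim]) -> List[Tuple[str, str]]:
    """List every negative consequence with its feature, as candidates for redesign."""
    return [(c.feature, d) for c in claims for d in c.downsides]


if __name__ == "__main__":
    print(forum_claim.render())
    for feature, downside in open_downsides([forum_claim, cve_claim]):
        print(f"To address: {downside} ({feature})")
```

Keeping upsides and downsides on the same record is a deliberate choice in this sketch: a designer who revises a feature to remove a downside can see at once which upsides the revision might put at risk.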


The  scenarios  in  Table  53.1  provide  a  high-level  task  view,  but 
they  can  be  elaborated  to  convey  the  moment-to-moment 
thoughts  and  experiences  of  the  actors  in  order  to  provide  a 
more  detailed  cognitive  view,  or  to  explore  actors’  moment-to- 
moment  actions  to  provide  a  more  detailed  functional  view. 
They  might  also  be  elaborated  in  terms  of  the  hardware  or  soft¬ 
ware  components  needed  for  implementing  the  envisioned 
functionality  (Rosson  &  Carroll,  1995;  Wirfs-Brock,  1995).  Each
of  these  variations  in  resolution  and  perspective  is  a  permuta¬ 
tion  of  a  single  underlying  scenario.  The  permutations  are  inte¬ 
grated  through  their  roles  as  complementary  views  of  the  same 
design  object. 

Using  scenarios  in  this  way  makes  them  a  more  powerful  de¬ 
sign  representation.  They  allow  the  designer  the  flexibility  to  de¬ 
velop  and  analyze  key  usage  scenarios  in  great  detail — for  ex¬ 
ample,  to  describe  core  application  functionality — while  merely 
sketching  less-critical  scenarios.  At  the  same  time,  designers  are 
able  to  switch  among  multiple  perspectives,  for  example  directly 
integrating  usability  views  with  software  views.  Such  a  flexible 
and  integrative  design  object  can  help  designers  manage  the 
many  interdependent  consequences  implied  by  design  moves 
(Rosson  &  Carroll,  2000,  2000a). 

Potential  Pitfalls  in  Scenario-Based  Design 

While  scenario  envisioning  and  analysis  can  address  many  of  the
concerns  of  solution-first  design,  work  with  scenarios  brings  its 
own  downsides.  The  concreteness  and  work-orientation  that 
make  scenarios  effective  as  user-centered  design  representa¬ 
tions  may  also  introduce  their  own  bias  into  the  process,  as  de¬ 
signers  come  to  identify  with  and  optimize  their  ideas  for  par¬ 
ticular  actors  or  activity  contexts;  the  very  characteristics  that 
make  a  story  realistic  and  evocative  may  also  lead  designers  to 
adopt  too  narrow  a  view.  The  analysis  of  consequences  implied 
by  a  scenario  (as  in  Table  53.3)  helps  to  create  a  more  balanced 
view  of  current  design  ideas,  but  at  the  same  time  may  lead  de¬ 
signers  to  focus  on  incremental  changes  to  their  ideas  (e.g.,  to 
address  a  perceived  downside)  rather  than  to  consider  more 
innovative  or  transformational  ideas.  Also,  like  any  design  pro¬ 
cess  that  is  centered  on  representations  that  are  essentially  tex¬ 
tual,  a  scenario-based  process  (with  its  associated  analysis  and  il¬ 
lustration)  might  lead  to  an  unwelcome  documentation  burden, 
as  designers  are  faced  with  maintaining  and  evolving  a  poten¬ 
tially  large  set  of  design  scenarios  at  varying  stages  of  refine¬ 
ment.  In  the  next  two  sections  we  consider  frameworks,  design 
tools,  and  other  methods  that  are  aimed  at  reaping  the  bene¬ 
fits  of  scenario-based  design  while  also  helping  to  manage  costs 
such  as  these. 


SCENARIO-BASED  USABILITY  ENGINEERING 


The  concrete  and  work-oriented  character  of  scenarios  makes 
them  an  effective  representation  for  human-centered  design  ac¬ 
tivities,  particularly  when  these  activities  include  participation 
by  end-users  or  other  stakeholders  (Carroll  et  al.,  2000;  Chin, 
Rosson,  &  Carroll,  1997;  Muller,  1992;  Muller,  Tudor,  Wildman,
White,  Root,  &  Dayton,  1995;  Karat,  1995;  Karat  &  Bennett,
1991;  Rosson  &  Carroll,  2002).  Scenarios  can  be  quickly  devel¬ 
oped,  shared,  and  revised;  they  are  easily  enriched  with 
sketches,  storyboards,  or  other  mockups  (Erickson,  1995;  Kyng, 
1995).  A  scenario  of  use  can  be  directed  at  many  concerns  in 
system  development,  including  documentation  design  and  ob¬ 
ject-oriented  software  design  (Carroll,  1995,  2000).  Given  these 
many  virtues,  it  is  no  surprise  that  scenarios  are  pervasive  in 
software  design  and  development  (Rosson,  Maass,  &  Kellogg, 
1989;  Weidenhaupt,  Pohl,  Jarke,  &  Haumer,  1998).  In  this  sec¬ 
tion  we  expand  on  the  view  that  scenarios  are  a  user-centered 
design  representation.  We  describe  a  programmatic  framework 
for  employing  scenarios  of  use  in  interactive  system  design  (Carroll,
2000;  for  a  more  comprehensive  presentation  of  the  mate¬
rial  in  this  section,  see  Rosson  &  Carroll,  2002). 

The  framework  summarized  in  this  section  incorporates  sce¬ 
nario-based  analysis  and  design  into  all  phases  of  system  devel¬ 
opment,  from  problem  analysis  through  usability  evaluation  and 
iterative  development.  We  build  on  the  general  rationale  for  sce¬ 
nario-based  design  described  earlier,  but  at  the  same  time  show 
how  to  make  the  impacts  of  scenario-based  reasoning  compre¬ 
hensive  and  systematic.  The  overall  process  is  one  of  usability 
engineering,  where  the  scenarios  support  continual  assessment 
and  elaboration  of  the  system’s  usefulness,  ease  of  use,  and  user 
satisfaction.  The  aim  is  to  develop  a  rich  understanding  of  cur¬ 
rent  activities  and  work  practices,  and  to  use  this  understanding 
as  a  basis  for  activity  transformation. 

Figure  53.1  provides  an  overview  of  the  scenario-based  us¬ 
ability  engineering  framework  (SBUE).  We  assume  that  system 
development  begins  with  an  initial  vision  or  charter,  even 
though  this  may  be  quite  sketchy  or  nonbinding.  This  vision 
motivates  a  period  of  intense  analysis  during  which  the  current 
situation  is  examined  for  problems  and  opportunities  that  might 
FIGURE  53.1.  An  overview  of  the  scenario-based  design  (SBD)
framework.  Scenarios  serve  as  a  central  representation  through¬ 
out  the  development  cycle,  first  describing  the  goals  and  con¬ 
cerns  of  current  practices  in  the  problem  domain,  and  then  being 
successively  transformed  and  refined  through  an  iterative  design 
and  evaluation  process  (from  Rosson  &  Carroll,  2002). 


be  addressed  by  available  technologies.  The  analysts’  under¬ 
standing  of  the  current  situation  is  communicated  in  problem 
scenarios  and  claims.  Problem  scenarios  describe  prototypical 
stakeholders  engaged  in  meaningful  activities;  the  claims  enu¬ 
merate  features  of  the  current  situation  that  are  understood  to 
have  important  consequences — both  positive  and  negative — for 
the  scenario  actors. 

The  problem  scenarios  are  transformed  and  elaborated 
through  several  phases  of  iterative  design.  Design  envisioning  is 
inspired  by  metaphors  and  technology  options,  but  at  the  same 
time  is  constrained  by  the  designers’  knowledge  of  interactive 
system  design.  Each  set  of  scenarios  is  complemented  by  claims 
that  analyze  the  possible  positive  and  negative  consequences 
of  key  design  features.  Claims  analysis  leads  designers  to  reflect 
on  the  usage  implications  of  their  design  ideas  while  the  ideas 
are  being  developed. 

SBUE  is  mediated  by  evaluation  activities  throughout  devel¬ 
opment.  Each  narrative  serves  as  a  test  case  for  analytic  evalua¬ 
tion;  each  claim  hypothesizes  usability  outcomes  for  one  or 
more  test  cases.  Design  scenarios  are  also  evaluated  more  di¬ 
rectly  through  empirical  usability  studies.  In  these  the  claims 
analysis  is  used  for  mediated  evaluation,  wherein  the  hypothe¬ 
sized  usage  impacts  are  operationalized  and  tested  explicitly 
(Scriven,  1967).  The  empirical  findings  are  interpreted  with  re¬ 
spect  to  the  ongoing  claims  analysis,  refining  or  redirecting  the 
design  efforts.  We  turn  now  to  a  brief  example  illustrating  the 
key  elements  of  the  framework. 
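
Mediated evaluation, as described above, turns each hypothesized consequence into something that can be observed and checked. The Python sketch below shows one way this could be recorded. It is an assumption-laden illustration, not the procedure used in SBUE: the Hypothesis type, the evaluate function, the 30-second threshold, and the sample timings are all hypothetical placeholders, not data from any study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class Hypothesis:
    """A claim consequence operationalized as a measurable prediction (hypothetical type)."""
    claim: str                                # consequence taken from the claims analysis
    measure: str                              # what the usability study observes
    supported: Callable[[List[float]], bool]  # decision rule applied to the observations


def evaluate(h: Hypothesis, observations: List[float]) -> str:
    """Report whether the observations support the hypothesized consequence."""
    verdict = "supported" if h.supported(observations) else "not supported"
    return f"{h.claim} [{h.measure}]: {verdict}"


# Downside taken from the web-forum claim; the threshold is an arbitrary placeholder.
long_list = Hypothesis(
    claim="categorical list may be difficult to scan when it is very long",
    measure="seconds to locate a named project",
    supported=lambda times: mean(times) > 30.0,
)

if __name__ == "__main__":
    # Placeholder timings for illustration only, not empirical results.
    print(evaluate(long_list, [41.0, 28.5, 37.2]))
```

A result of "supported" for a downside would send the designers back to the claim to reconsider the feature, while "not supported" suggests the concern may not need design attention at the tested scale.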


Problem  Analysis 

A  challenge  for  any  software  development  project  is  identifying 
the  complete  and  correct  set  of  requirements  (Brooks,  1995). 
Many  system  requirements  are  functional,  addressed  by  the  ac¬ 
tual  services  and  information  provided  by  the  final  system. 
Other  requirements  are  nonfunctional,  for  example  the  mea¬ 
sured  quality  of  the  software  implementation  or  user  interac¬ 
tions,  or  pragmatic  features  of  the  system  development  process 
like  schedule,  cost,  or  delivery  platform  (Rosson  &  Carroll, 
2000b;  Sommerville,  1992;  Sutcliffe  &  Minocha,  1998).  In  SBUE
we  express  an  initial  high-level  view  of  requirements  as  a  root 
concept  (Table  53.4).  The  root  concept  enumerates  key  aspects 
of  the  team’s  starting  vision;  it  is  used  to  guide  further  analysis 
and  elaboration  of  system  requirements. 

Table  53.4  contains  a  root  concept  for  the  Virtual  Science  Fair
(VSF)  example  introduced  earlier;  we  will  use  the  VSF  problem 
to  illustrate  the  SBUE  framework  (see  Rosson  &  Carroll,  2002, 
for  a  comprehensive  analysis).  The  starting  vision  and  rationale 
in  this  case  are  quite  straightforward:  there  are  obvious  advan¬ 
tages  to  meeting  with  advisors  and  visitors  in  person,  but  the 
constraint  of  “same-time,  same-place”  limits  the  frequency  and/ 
or  length  of  such  meetings.  Moving  some  of  the  VSF  activities 
online  increases  interaction  opportunity  and  flexibility.  A  side  ef¬ 
fect  is  that  online  content  and  communication  can  be  stored 
and  used  for  other  purposes. 

TABLE  53.4.  A  Root  Concept  for  a  Virtual  Science  Fair

Component
    Contributions  to  the  Root  Concept

High-level  vision
    •  Project  authors  and  visitors  interact  about  science
       exhibits  anytime,  anywhere

Basic  rationale
    •  Network-based  interaction  overcomes  barriers  of
       place  and  time
    •  Digital  media  are  convenient  for  posting,
       organizing,  and  sharing  multimedia  content

Stakeholder  group:  Exhibit  author
    •  Flexible  tools  guide  construction  and  exhibiting
       of  projects,  and  support  interaction

Stakeholder  group:  Student  advisor
    •  Ongoing  access  and  commenting  during  project
       construction

Stakeholder  group:  Fair  visitor
    •  Self-paced  exploration  of  projects  and  interaction
       with  others  present  concurrently

Starting  assumptions
    •  Open-ended  participatory  design  process  with
       students,  advisors,  and  visitors
    •  Target  users  have  pervasive  access  to  personal
       computers  and  network  connections
    •  Science  fair  hosted  and  advertised  via  existing
       community  computing  infrastructure

The  root  concept  also  documents  the  team’s  shared  beliefs
about  the  project’s  major  stakeholders.  A  stakeholder  is  any
person  or  organization  who  will  be  affected  (either  positively
or  negatively)  by  the  system  (Checkland,  1981;  Muller,  1992).  It
is  important  to  take  a  broad  view  of  stakeholders,  particularly
early  in  problem  analysis,  so  that  appropriate  individuals  and 
groups  will  be  represented  in  the  analysis  activities.  In  the  ex¬ 
ample,  we  consider  several  different  sorts  of  people  likely  to  be 
involved  with  VSF,  so  that  we  can  consider  their  distinct  goals 
with  respect  to  system  use:  a  student  who  may  find  the  online 
tools  helpful  in  planning,  constructing,  and  talking  with  others 
about  her  exhibit;  a  student’s  advisor  who  might  have  more  reg¬ 
ular  opportunities  to  provide  feedback  and  advice;  visitors  to 
VSF  who  now  have  the  option  of  visiting  “from  home,”  and  who 
may  benefit  from  other  visitors’  interactions  with  the  exhibits 
and  exhibit  authors. 

Although  the  emphasis  of  SBUE  is  on  analyzing  and  devel¬ 
oping  usable  functionality,  there  may  be  a  range  of  nonfunc¬ 
tional  concerns  that  will  constrain  development.  These  are  doc¬ 
umented  as  starting  assumptions  in  the  root  concept.  For  our 
example  project,  we  assume  that  the  design  of  VSF  will  involve 
considerable  participation  by  stakeholders,  that  the  participat¬ 
ing  students,  teachers,  and  community  members  have  the  com¬ 
puting  resources  needed  to  access  the  system,  and  that  the  fair 
will  be  integrated  into  a  town’s  existing  community-computing 
infrastructure. 

The  root  concept  lays  the  groundwork  for  an  analysis  of  how 
science  fairs  currently  take  place.  This  might  involve  field¬ 
work — for  example,  a  visit  to  a  science  lab  where  projects  are 
being  constructed.  It  may  be  more  modest,  perhaps  a  survey  of 
participating  students  aimed  at  eliciting  descriptions  of  activi¬ 
ties,  or  a  series  of  semi-structured  interviews  with  students, 
their  advisors,  and  their  parents.  One  important  source  of
information  about  the  problem  domain  is  the  set  of  artifacts  used
in  an  activity:  for  instance,  a  flyer  advertising  a  science  fair,  a  poster
created  to  display  a  project,  a  judge’s  notebook,  or  a  scoring 
sheet  used  to  compare  exhibits.  Such  artifacts  are  an  excellent 
source  of  implicit  information  about  stakeholders’  values  and 
activities  (Bødker,  1992;  Norman,  1988;  Rosson  &  Carroll,  2002).


Field  studies  of  current  practices  generate  rich  and  diverse 
data  about  needs  and  opportunities.  To  direct  these  data  pro¬ 
ductively  toward  design  efforts,  a  more  abstract  representation 
of  themes  and  relations  is  required.  An  affinity  diagram  (group 
members  posting  and  organizing  individual  observations;  Beyer 
&  Holtzblatt,  1998)  is  very  helpful  in  discovering  these  themes. 
Other  useful  techniques  include  diagrams  of  the  stakeholder  re¬ 
lationships,  hierarchical  task  analysis  of  central  activities,  and 
summaries  of  how  collected  artifacts  support  group  activities 
(see  Chapter  2  in  Rosson  &  Carroll,  2002). 
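
The  tallying  step  behind  an  affinity  diagram  can  be  sketched  in  a  few  lines  of  Python.  This  is  only  an  illustration:  it  assumes  the  team  has  already  attached  a  theme  label  to  each  observation  while  sorting  notes  (the  human  part  of  the  technique),  and  the  observations,  theme  names,  and  the  function  group_by_theme  are  hypothetical,  loosely  based  on  the  science  fair  example  rather  than  on  real  field  data.

```python
from collections import defaultdict


def group_by_theme(observations):
    """Group (theme, note) pairs into a theme -> notes mapping, largest themes first."""
    groups = defaultdict(list)
    for theme, note in observations:
        groups[theme].append(note)
    # Sort by number of notes (descending), then by theme name for a stable order.
    return dict(sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])))


# Hypothetical field notes; each theme label was assigned by the team while sorting.
observations = [
    ("scheduling", "Fair runs at one place and one time"),
    ("social", "Students visit mainly to see friends"),
    ("scheduling", "Visitors must set aside time and travel to the gym"),
    ("presentation", "Exhibitors adapt their summary to the audience"),
    ("social", "Newcomers pause before joining a conversation"),
]

themes = group_by_theme(observations)
for theme, notes in themes.items():
    print(f"{theme} ({len(notes)})")
    for note in notes:
        print(f"  - {note}")
```

Themes  with  the  most  supporting  notes  are  listed  first,  which  gives  the  team  a  rough  agenda  for  discussion;  the  counts  do  not  replace  the  group’s  own  judgment  about  which  themes  matter.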

In  SBUE  the  result  of  problem  analysis  is  a  set  of  problem  sce¬ 
narios  and  claims.  A  problem  scenario  is  a  narrative  of  current 
practice  that  offers  a  synthetic  view  of  the  actors,  themes,  rela¬ 
tionships,  and  artifacts  discovered  in  the  fieldwork.  These  sce¬ 
narios  are  not  design-neutral,  however.  Even  during  early  analysis 
activities,  we  assume  that  the  team  has  a  vision  of  how  technology 
might  enhance  current  practice.  The  fieldwork  and  related  analyses
will  inevitably  be  colored  by  this  vision  (Carroll,  Karat,  Alpert,
Van  Deusen,  &  Rosson,  1994).  If  the  team  fails  to  establish  a  vision,
or  creates  inconsistent  or  contradictory  visions,  this  too  will
influence  requirements  analysis,  but  in  a  less  positive  fashion. 

An  effective  technique  for  generating  problem  scenarios  is  to 
first  describe  a  set  of  hypothetical  stakeholders,  individuals  who 
will  represent  the  different  sorts  of  people  studied  during  the 
fieldwork.  It  is  important  to  create  a  rich  but  realistic  view  of 
these  individuals,  because  they  form  the  basis  for  describing 
and  later  transforming  current  activities  and  experiences.  Our 
examples  focus  on  the  experiences  of  Sharon,  conceived  as  a 
typical  high-school  junior  with  a  large  set  of  friends  whom  she
tries  to  support  in  their  various  activities,  and  with  whom  she 
likes  to  socialize  whenever  she  gets  the  chance. 

The  scenario  in  Table  53.5  conveys  some  aspects  of  current 
science  fairs;  it  enacts  a  likely  visit  activity  and  simultaneously 
communicates  findings  from  fieldwork.  Problem  scenarios  like 
this  may  be  based  directly  on  an  observed  episode,  or  they  may 
be  entirely  synthetic.  The  goal  is  to  introduce  and  contextual¬ 
ize  the  themes  and  relationships  that  will  guide  subsequent 
design  work.  This  particular  story  combines  our  view  of  high- 
school  students  with  issues  related  to  real-world  science  fairs — 
for  example,  the  fact  that  they  are  scheduled  for  a  particular 
place  at  a  particular  time,  the  protocol  for  arriving,  orienting 
oneself,  being  greeted  and  integrated  into  a  conversation,  par¬ 
ticipants’  ability  to  adapt  the  discussion  according  to  the  audi¬ 
ence,  and  so  on. 

The  themes  and  relations  implicit  in  a  scenario  can  be 
made  more  explicit  and  open  for  discussion  by  analyzing  them 
in  claims  (Table  53.6).  Problem  claims  are  analyzed  by  identi¬ 
fying  features  of  the  scenario  that  have  notable  consequences 
for  the  actors’  experience.  This  is  an  instance  of  analytic  eval¬ 
uation  and  as  such  is  clearly  guided  by  the  expectations  and  bi¬ 
ases  of  the  evaluator.  A  more  systematic  evaluation  can  be  ob¬ 
tained  by  asking  questions  of  the  scenario  that  are  guided  by 
cognitive  or  motivational  theories  of  human  behavior  (Carroll 
&  Rosson,  1992;  Polson,  Lewis,  Rieman,  &  Wharton,  1992).  The
first  claim  captures  a  key  aspect  of  Sharon’s  experience  at  the
fair:  meeting  in  person  with  her  friends  provides  rich  communication
bandwidth.  Both  verbal  and  nonverbal  content  can  be  shared,  which
helps  Sharon  know  when  and  how  to  participate  in  the  conversation.

TABLE  53.5.  A  Problem  Scenario  Describing  Sharon’s  Visit  to  a  Traditional  Science  Fair

Sharon  Visits  Alicia's  Project  at  the  Science  Fair 

Sharon  is  a  junior  at  Blacksburg  High  School.  Even  though  she  has  a 
biology  exam  tomorrow,  she  knows  that  her  friends  Alicia  and  Delia  are 
expecting  her  to  come  to  their  science  fair,  so  she  decides  to  drive  over 
to  the  high  school  and  stay  up  late  to  study  when  she  gets  back.  She 
remembers  that  Alicia  has  been  working  on  her  “Black  Holes”  poster
for  the  past  couple  of  weeks,  and  Sharon  is  curious  to  see  how  it  came 
out.  She  also  figures  that  some  of  her  other  friends  will  be  there  and  that 
she  can  hang  out  with  them  for  a  while. 

The  fair  is  scheduled  for  7  p.m.  at  the  high-school  gym.  Sharon  hates  to
be  the  first  person  at  events,  so  she  waits  until  7:30  to  go.  The  fair  is
already  pretty  crowded;  she  notes  that  most  of  the  people  milling
around  are  parents,  but  she  eventually  spies  her  friends  Brian  and  Corey
across  the  gym  talking  to  Alicia  under  the  “Astronomy”  banner.  She  can
see  that  they  are  having  an  animated  discussion  (Alicia  is  laughing),
so  she  heads  that  way.  Sharon  is  not  sure  what  the  others  have  been
discussing,  so  she  just  waves  a  hello  and  pauses  to  catch  the  drift  of  the
conversation.  In  a  few  seconds,  Alicia  turns  to  say  “Hi”  and  asks  her  to
vote  on  the  flashiest  project.  Sharon  admits  that  she  hasn’t  had  a
chance  to  look  at  any  others,  but  that  Alicia’s  black-and-white  visuals
are  very  dramatic.  She  guesses  from  Corey’s  raised  eyebrows  and  Alicia’s
giggle  that  they  must  have  been  debating  Alicia’s  choice  of  color
scheme.  Just  then  Brian’s  mom  arrives;  Sharon  is  impressed  at  how
quickly  Alicia  drops  her  teasing  tone  and  conveys  a  professional
summary  of  her  project’s  goals  and  activities.


Once  an  activity  feature  has  been  marked  with  a  claim,  the 
analysis  can  be  extended  through  “what  if”  reasoning,  to  ex¬ 
plore  other  possible  positive  or  negative  consequences.  For  in¬ 
stance,  what  if  Sharon  doesn’t  know  Brian  and  Corey?  What  if 
Alicia  is  so  occupied  with  the  boys  that  she  never  greets  or  includes
Sharon?  Scenarios  naturally  evoke  and  support  this  sort  of  questioning;
the  concrete  setting  of  the  scenario  invites  reasoning  about
partially  overlapping  alternatives  and  outcomes.

TABLE  53.6.  Two  Claims  Analyzed  from  the  Science
Fair  Problem  Scenario.  The  Feature  of  Interest
Appears  in  the  Shaded  Area;  Hypothesized  Positive
Consequences  are  Prefaced  With  a  Plus  Sign  and
Negative  Consequences  With  a  Minus  Sign

Face-to-Face  Interaction  with  Project  Exhibitors  at  a  Science  Fair  .  .  .

+  ensures  that  both  nonverbal  and  verbal  communication  contribute
   to  the  interaction

+  leverages  many  years  of  experience  with  people’s  perception  and
   communication  conventions

-  may  introduce  distracting  or  irrelevant  personal  information  about
   partners

-  inhibits  parallel-communication  activities  (e.g.,  among  multiple
   parties  at  once)


Using  the  High-School  Gymnasium  to  Host  the  Science  Fair  .  .  .

+  promotes  a  feeling  of  familiarity  and  intimacy  among  students  from
   the  school

+  increases  the  chance  that  visitors  will  be  able  to  envision  and  find
   their  way  to  the  fair

-  requires  visitors  to  set  aside  time  and  to  travel  to  the  site  for
   interaction

-  prevents  other  school  events  from  happening  in  parallel  (e.g.,  a
   basketball  game)

An  important  characteristic  of  claims  analysis  is  that  it  includes
both  positive  and  negative  consequences.  During  problem  analysis,
there  is  a  tendency  to  focus  on  the  difficulties  or  concerns  of
current  practice,  for  example  as  observed  in  activity  breakdowns
or  contradictions  (Bødker,  1992;  Kuutti,  1995;
Kyng,  1995;  Nardi,  1996).  In  contrast,  designers  tend  to  focus  on
the  likely  benefits  of  new  features.  Claims  analysis  imposes  a 
balanced  view  of  problems  and  opportunities.  With  respect  to 
understanding  users’  requirements,  taking  a  balanced  view  like 
this  makes  us  aware  of  aspects  of  the  current  situation  that  are 
already  working  well.  With  respect  to  design  envisioning,  it 
forces  us  to  consider  side  effects  or  other  undesired  impacts  of 
changes  to  a  situation. 
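
One  way  to  keep  this  balance  visible  is  to  record  each  claim  as  a  small  data  structure.  The  sketch  below  is  an  illustration  under  stated  assumptions,  not  part  of  the  SBUE  framework  itself:  the  class  name  Claim,  its  fields,  and  the  is_balanced  check  are  hypothetical,  and  the  draft  claim  at  the  end  is  invented  for  the  example.  The  first  claim  reuses  the  face-to-face  interaction  entries  from  Table  53.6.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """One claim: a feature of a scenario plus its hypothesized consequences."""
    feature: str
    positives: List[str] = field(default_factory=list)
    negatives: List[str] = field(default_factory=list)

    def is_balanced(self) -> bool:
        # "Balanced" here means at least one upside and one downside are recorded.
        return bool(self.positives) and bool(self.negatives)

    def render(self) -> str:
        # Mirror the plus/minus layout used in the claims tables.
        lines = [self.feature + " ..."]
        lines += ["+ " + p for p in self.positives]
        lines += ["- " + n for n in self.negatives]
        return "\n".join(lines)


face_to_face = Claim(
    feature="Face-to-face interaction with project exhibitors at a science fair",
    positives=[
        "ensures that both nonverbal and verbal communication contribute to the interaction",
        "leverages many years of experience with people's perception and communication conventions",
    ],
    negatives=[
        "may introduce distracting or irrelevant personal information about partners",
        "inhibits parallel-communication activities (e.g., among multiple parties at once)",
    ],
)

# A one-sided draft claim (hypothetical), flagged so the team revisits it.
draft = Claim(feature="Online exhibits", positives=["can be visited from home"])

print(face_to_face.render())
print(face_to_face.is_balanced(), draft.is_balanced())
```

A  check  like  is_balanced  does  not  judge  the  quality  of  the  reasoning;  it  only  flags  claims  where  the  analyst  has  listed  benefits  without  downsides,  or  the  reverse.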

Problem  scenarios  and  claims  are  a  central  outcome  of  prob¬ 
lem  analysis  in  SBUE.  Note  that  problem  scenarios  do  not  con¬ 
vey  requirements  in  the  traditional  sense  of  the  term — that  is, 
they  are  not  a  specification  of  features  required  of  a  system.  In¬ 
stead  they  suggest  criteria  for  design  requirements;  whatever 
solutions  designers  propose  should  address  the  positive  and 
negative  user  impacts  expressed  in  the  scenarios  and  claims.  For 
instance,  a  virtual  science  fair  might  be  “required”  to  reinforce 
the  advantages  of  in-person  interaction  as  much  as  possible,  but 
at  the  same  time  address  the  disadvantages  of  physical  meet¬ 
ings.  This  is  quite  different  from  specifying  that  it  will  have  a 
synchronous  communication  channel.  Individual  features  of  the 
solution  will  be  identified,  elaborated,  evaluated,  revised,  or  dis¬ 
carded  in  an  extended  iterative  process. 

Activity  Design 

Design  requirements  emerge  and  are  refined  throughout  sys¬ 
tem  development  (Brooks,  1995).  But  at  some  point  a  team 
understands  enough  about  project  stakeholders  and  their 
needs  that  they  can  make  specific  design  proposals.  Indeed 
some  projects  are  so  overdetermined  that  system  functions  are 
specified  in  advance,  and  problem  analysis  consists  simply  of 
adapting  these  requirements  to  users’  characteristics,  work  set¬ 
tings,  and  preferences.  In  SBUE,  the  initial  step  toward  speci¬ 
fying  a  design  solution  is  made  by  envisioning  how  current  ac¬ 
tivities  might  be  enhanced  or  even  completely  transformed  by 
available  technologies.  We  deliberately  minimize  attention  to 
the  concrete  steps  of  user  interaction  at  this  point,  emphasizing
the  basic  goals  and  motivations  of  the  new  activities  (see  also
essential  use  cases  as  described  by  Constantine  &  Lockwood,  1999).

SBUE  is  activity-centered — we  analyze  current  practice  at  the 
level  of  meaningful  activities,  and  build  from  this  to  new  activities
(Bertelsen  &  Bødker,  2003;  Kuutti  &  Arvonen,  1992;  Norman,
2005).  A  danger  in  this  is  that  the  designers  will  focus  too
much  on  how  goals  are  pursued  in  the  current  situation,  and  on 
understanding  and  responding  to  people’s  current  expectations 
about  their  tasks  and  about  technology.  Thus  to  encourage  con¬ 
sideration  of  new  options  and  insights,  we  deliberately  expand 
the  “design  space”  when  envisioning  new  activities.  By  design 
space,  we  mean  the  array  of  possible  concepts  and  technologies 
that  might  be  relevant  to  the  problem  domain,  along  with  some
analysis  of  what  these  options  might  bring  to  the  design  solution
(MacLean,  Young,  &  Moran,  1989;  Moran  &  Carroll,  1996).

Table  53.7  exemplifies  two  techniques  useful  for  exploring 
design  alternatives.  The  upper  part  of  the  table  shows  how  dif¬ 
ferent  conceptual  metaphors  evoke  contrasting  views  of  stake¬ 
holder  activities.  Metaphors  are  often  used  deliberately  in  user 
interface  design,  with  the  hope  that  users  will  recruit  them  in 
reasoning  by  analogy  about  what  a  system  does  or  how  it  works 
(Carroll  &  Thomas,  1982;  Carroll,  Mack,  &  Kellogg,  1988;  Mad¬ 
sen,  1994).  Here  we  emphasize  the  role  of  metaphors  in  helping 
designers  “think  outside  the  box”  (Erickson,  1990;  VerPlank, 
1988).  Addressing  real-world  activities  and  concerns  is  crucial  to 
effective  system  design,  but  metaphoric  thinking  often  pro¬ 
motes  the  insights  of  truly  creative  design. 

An  analysis  of  available  information  technology  provides  a 
complement  to  the  metaphoric  thinking.  In  a  sense,  the  tech¬ 
nology  provides  another  set  of  metaphors  for  thinking  about 
the  activities  to  be  supported;  but  in  this  case  the  analogy  is 
with  classes  of  software  and  devices  that  already  exist  (e.g., 
web  information  systems,  e-mail,  or  database  packages).  At  the 
same  time,  a  technology-oriented  analysis  such  as  this  directs 
the  design  back  to  many  of  the  pragmatic  concerns  of  software 
development,  by  enumerating  possible  technologies  and  how 
they  might  play  a  role  in  the  solution.  This  analysis  may  also  be
strongly  influenced  by  the  project’s  starting  assumptions  (see  Table
53.4)  if,  for  instance,  the  development  organization  has  already
developed  a  shared  electronic  whiteboard  tool,  or  has  considerable
experience  building  web  information  systems.

The  exploration  of  metaphors  and  technology  does  not  gen¬ 
erate  a  new  activity  design.  Rather  it  provides  a  set  of  lenses  for 
discussion.  The  team  might  consider  what  it  would  be  like  if
the  virtual  science  fair  was  designed  to  be  like  a  cocktail  party
versus  a  lecture;  they  can  argue  about  the  relative  advantages
of  using  a  structured  groupware  framework  versus  an  open-ended
website.  These  divergent  discussions  form  a  backdrop  for  the
convergent  process  of  scenario  writing.

TABLE  53.7.  Using  Metaphors  and  Existing  Information
Technology  to  Reason  About  Activities

Activity-Design  Features  Suggested  by  Metaphors
for  a  Virtual  Science  Fair

Reading  at  the  library
    Self-paced,  individual  access  to  structured  information

Hearing  a  lecture
    Large  audience;  prepared  materials;  one-way  communication

Visiting  a  museum
    Array  of  artifacts;  small  groups  or  individuals  examine,  discuss

Going  to  a  cocktail  party
    Friends  forming  subgroups;  social  exchange  and  mingling

Activity-Design  Features  Suggested  by  Information  Technology
for  a  Virtual  Science  Fair

A  hierarchy  of  web  pages
    Mix  of  text  and  graphics,  category  names,  and  links

An  e-mail  distribution  list  (listserv)
    One-way  “push”  communication,  possibly  large  audience

A  shared  electronic  whiteboard
    Informal  sketches  with  multiple  people  working  at  once

Meeting  groupware
    Explicit  agenda,  support  for  floor  control,  meeting  records

The  writing  of  activity  scenarios  is  a  creative  process  influ¬ 
enced  by  many  factors.  One  or  more  problem  scenarios  may  be 
used  as  a  starting  point — these  provide  a  realistic  context  and  set 
of  goals  for  exhibiting  at  or  visiting  a  science  fair.  The  associated 
claims  guide  design  reasoning — the  general  heuristic  is  to  main¬ 
tain  or  even  enhance  the  positive  consequences  for  the  actors 
while  minimizing  or  removing  the  negatives  (Carroll  &  Rosson, 
1992).  The  metaphors  and  technology  exploration  provide  solu¬ 
tion  ideas — how  our  concept  of  project  exhibits  or  conversations 
with  visitors  might  be  expanded  or  transformed,  and  how  existing 
technology  might  support  these  activities.  Of  course,  the  designers’
knowledge  of  human-computer  interaction  and  of  interactive
system  design  more  broadly  provides  important  guidance:  for
example,  knowing  the  relative  affordances  of  different  computer- 
mediated  communication  channels,  understanding  the  motiva¬ 
tional  challenges  of  discretionary-use  software,  and  so  on. 
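
The  general  heuristic  of  keeping  the  positives  and  reducing  the  negatives  can  be  written  down  as  a  simple  transformation  from  a  problem  claim  to  a  list  of  design  goals.  The  following  Python  sketch  is  a  hypothetical  illustration  of  that  bookkeeping:  the  function  design_goals  and  the  stance  labels  are  assumptions  made  for  this  example,  while  the  consequences  are  taken  from  the  gymnasium  claim  in  Table  53.6.

```python
def design_goals(positives, negatives):
    """Turn one problem claim's consequences into (stance, consequence) design goals."""
    # Positive consequences of current practice should survive the redesign.
    goals = [("maintain or enhance", p) for p in positives]
    # Negative consequences are what the new activity design should address.
    goals += [("minimize or remove", n) for n in negatives]
    return goals


# Consequences of "Using the high-school gymnasium to host the science fair".
gym_positives = [
    "promotes a feeling of familiarity and intimacy among students from the school",
    "increases the chance that visitors will be able to envision and find their way to the fair",
]
gym_negatives = [
    "requires visitors to set aside time and to travel to the site for interaction",
    "prevents other school events from happening in parallel (e.g., a basketball game)",
]

goals = design_goals(gym_positives, gym_negatives)
for stance, consequence in goals:
    print(f"{stance}: {consequence}")
```

The  resulting  list  is  a  checklist  for  reviewing  candidate  activity  scenarios,  not  a  specification;  deciding  how  a  design  preserves  familiarity  or  removes  the  need  to  travel  remains  the  creative  work  of  scenario  writing.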

Two  contrasting  activity  scenarios  for  Sharon’s  visit  to  VSF  appear
in  Table  53.8.  Both  activity  designs  address  the  goals  of  the
actors  in  the  problem  scenario:  visiting  a  friend’s  exhibit  and 
chatting  with  her  and  with  other  friends.  Both  respond  to  claims 
analyzed  for  the  problem  scenario — for  example,  attempting  to 
maintain  the  familiarity  and  intimacy  of  a  real-world  meeting  at 
the  high  school,  while  making  participation  more  flexible,  and 
enabling  parallel  conversations.  However,  they  address  these 
goals  and  concerns  with  rather  different  views  of  what  consti¬ 
tutes  an  online  exhibit  and  what  network-based  technology 
might  be  used  to  support  it. 

The  first  example  was  influenced  by  the  lecture  and  library 
metaphors.  New  exhibit  materials,  and  comments  related  to  ex¬ 
isting  exhibits,  are  automatically  distributed  to  the  visitor;  this 
is  analogous  to  sitting  in  a  room,  watching  a  lecture,  and  hearing 
questions  and  answers.  The  online  material  is  organized  into 
topical  categories  that  can  be  browsed  in  a  self-paced  fashion, 
just  as  Sharon  might  browse  stacks  of  books  in  a  library.  These 
metaphors  are  easily  supported  by  a  combination  of  e-mail  and 
web  pages. 

The  second  scenario  shows  the  influence  of  the  cocktail
party,  museum,  and  library  metaphors.  It  emphasizes  social  exchange
and  informal  conversation,  as  well  as  responses  to  an  assortment
of  VSF  artifacts  in  the  space.  Sharon  is  able  to  see
which  of  her  friends  are  around  (predicted  in  advance  by  her  IM
buddy  list),  and  is  able  to  locate  them  (along  with  other  visitors)
when  she  goes  to  the  VSF.  She  can  view  the  exhibit,  scan  the
comment  board,  and  catch  up  on  the  group  chat  history,  all  in
parallel.  Her  friends  are  referring  to  specific  features  of  Alicia’s
exhibit,  and  Sharon  is  immediately  able  to  interpret  their  meaning
and  jump  into  the  conversation.

Both  scenarios  have  attractive  consequences  for  Sharon  and 
her  friends.  Both  are  possible  solution  approaches,  so  how  do 
the  designers  choose?  Again,  many  pragmatic  factors  contribute 
to  this  decision — the  kinds  of  software  currently  in  use,  the 
team’s  design  expertise,  software  development  timeline  and  re¬ 
sources,  organizational  priorities,  and  so  on.  But  assuming  that 
both  solutions  are  genuine  possibilities,  the  designers  must  also 
evaluate  them  with  respect  to  their  usage  implications.  One  way 
to  do  this  is  with  participatory  design  sessions  (Carroll  et  al.,
2000;  Chin,  Rosson,  &  Carroll,  1997;  Muller,  1992;  Kyng,  1995) 


TABLE  53.8.  Two  Alternative  Activity  Designs 
for  the  Online  Club  Meeting 

A.  Sharon  Visits  the  VSF  Web  Forum 

Sharon  is  a  junior  at  Blacksburg  High  School.  Even  though  she  has  a  biology
exam  tomorrow,  she  knows  that  her  friends  Alicia  and  Delia  have
been  working  hard  to  develop  their  online  exhibits  for  the  science  fair,
and  even  though  she  has  browsed  their  projects  a  few  times  over  the  past
couple  of  weeks,  the  judging  will  be  tonight.  She  decides  to  take  a  study
break  to  visit  and  see  how  things  are  going,  fully  expecting  that  a  few  of  her
other  friends  will  be  online  as  well.

When  she  logs  on,  her  IM  buddy  list  shows  that  both  Alicia  and
Delia  are  online,  as  are  their  friends  Brian  and  Corey.  She  uses  the  VSF
announcement  e-mail  to  get  to  the  VSF  website.  Along  with  a  welcome
notice  there  are  science  project  categories  and  an  invitation  to  browse
the  exhibits.  She  knows  that  Alicia’s  project  is  on  black  holes,  so  she  finds
“Astronomy”  and  follows  the  link.  Another  web  page  appears  with  a  listing
of  individual  projects,  allowing  her  to  choose  her  friend’s  project.
Alicia’s  exhibit  is  organized  by  the  traditional  pieces  of  a  science  project
(problem,  significance,  methods,  etc.).  There  is  also  a  discussion  forum,
and  when  she  opens  it  she  can  see  comments  from  visitors;  some  have
signed  their  names,  and  others  are  anonymous.  She  notices  one  from
Brian  teasing  Alicia  about  her  dramatic  black-and-white  color  scheme.
Alicia  has  been  responding  to  most  comments;  her  most  recent  entry,  just
a  couple  of  minutes  ago,  makes  Sharon  think  she  might  be  watching  the
discussion,  so  she  adds  a  (signed)  comment:  “Very  dramatic  visuals!”
Alicia  responds  immediately  and  opens  an  IM  chat.

B.  Sharon  Visits  the  VSF  in  a  Collaborative  Virtual  Environment 

Sharon  is  a  junior  at  Blacksburg  High  School.  Even  though  she  has  a  biology
exam  tomorrow,  she  knows  that  her  friends  Alicia  and  Delia  have
been  working  hard  to  develop  their  online  exhibits  for  the  science  fair,
and  even  though  she  has  browsed  their  projects  a  few  times  over  the  past
couple  of  weeks,  the  judging  will  be  tonight.  She  decides  to  take  a  study
break  to  visit  and  see  how  things  are  going,  fully  expecting  that  a  few  of  her
other  friends  will  be  online  as  well.

When  she  logs  on,  her  IM  buddy  list  shows  that  both  Alicia  and  Delia
are  online,  as  are  their  friends  Brian  and  Corey.  She  uses  the  VSF
announcement  e-mail  to  get  to  the  VSF  and  arrives  at  a  familiar  spot,
the  middle-school  gymnasium  (she  can  imagine  her  favorite  spot  in  the
bleachers).  All  of  the  exhibits  are  arranged  in  different  areas  of  the  gym
according  to  their  science  categories  (biology,  forestry,  astronomy,  etc.).
She  can  see  who  is  currently  at  the  fair,  and  discovers  that  Brian  and
Corey  are  over  near  the  astronomy  table.  She  knows  that  Alicia’s  project  is
on  black  holes,  so  she  figures  they  are  probably  at  her  exhibit  right  now
and  moves  toward  the  table;  when  she  gets  close  enough  she  is  able  to
see  the  exhibits  on  the  table,  including  Alicia’s.  When  she  arrives  next  to
Alicia’s  project  she  sees  the  online  poster  Alicia  has  been  working  on  over
the  past  few  weeks;  each  section  contains  a  traditional  project  component
(problem,  significance,  methods,  etc.).  She  also  sees  a  comment
board  and  notices  that  Alicia  has  had  quite  a  few  conversations  with
visitors  already  tonight,  including  some  from  Brian  and  Corey.  As  she
expected,  Alicia  is  also  there  with  Brian  and  Corey,  engaged  in  a  group
chat.  Knowing  Brian,  she  guesses  that  he  is  teasing  Alicia  about  her
black-and-white  color  scheme  and  quickly  comes  to  Alicia’s  defense.


TABLE  53.9.  Activity  Claims  that  Draw  Out  Some
of  the  Contrasts  Inhering  in  the  Two  Designs

A.  Sharon  Visits  the  VSF  Web  Forum

Science  exhibits  presented  as  separate  pages  organized  by  topic  and
submitters  .  .  .

+  leverage  people’s  familiarity  with  categorical  hierarchies

+  emphasize  the  central  and  permanent  recognition  of  individual
   contributions  to  the  archive

-  may  be  tedious  or  complex,  especially  if  it  involves  browsing
   extensive  stored  archives

-  may  discourage  people  from  contributing  more  transient  and
   informal  content  to  an  organized  archive


B.  Sharon  Visits  the  VSF  in  a  Collaborative  Virtual  Environment

Science  exhibits  visited  as  objects  in  a  space  inhabited  by  online
participants  .  .  .

+  leverages  people’s  familiarity  with  real-world  conversational
   strategies

+  encourages  a  combination  of  exhibit-specific  and  ad  hoc  informal
   exchange

-  requires  that  conversation  participants  be  present  at  the
   same  time

-  may  be  difficult  for  newcomers  to  interrupt  an  ongoing  conversation


that  focus  on  how  well  the  alternatives  suit  stakeholders’  needs.
As  with  the  problem  scenarios,  in  SBUE  we  evaluate  design  scenarios
with  claims  analyses,  considering  the  hypothetical  positive
and  negative  consequences  of  design  features  through
“what  if”  discussions.

A  sample  claim  from  each  scenario  appears  in  Table  53.9.  In
this  illustrative  example,  the  activity  scenarios  were  written  to
be  similar  in  many  respects,  but  the  claims  capture  a  design  contrast
built  into  the  alternative  designs.  The  web  forum  offers  a
convenient  hierarchical  listing  of  exhibit  categories  and  page-by-page
browsing  of  projects,  whereas  the  collaborative  space
holds  simulated  “exhibits”  that  people  can  approach,  view,  and
discuss  in  real  time.  The  analysis  helps  the  designers  to  see  the
relative  advantages  and  disadvantages  of  an  organized  asynchronous
browsing-and-commenting  experience  and  a  more  ad
hoc  synchronous  exchange.  Such  an  analysis  may  or  may  not
be  enough  to  select  one  alternative  over  the  other.  But  at  the
least  it  begins  to  identify  user  experience  concerns  that  provide
input  for  continuing  design  discussions.

It  is  important  to  note  how  much  progress  can  be  made  even
at  this  very  early  level  of  envisioning  activities.  The  narratives  in
Table  53.8  are  quite  concrete  and  evocative;  designers  or  their
clients  can  readily  understand  what  is  being  proposed  and  begin
to  consider  the  relative  pros  and  cons  of  the  design  ideas.
Although  the  influence  of  the  metaphors  and  information  technology
is  obvious,  few  details  are  provided  about  how  various
sorts  of  information  will  be  presented  or  how  participants
will  access  and  interact  with  this  information.  The  scenarios  are
“just  talk”;  indeed  if  they  are  shared  and  discussed  over  an
interactive  medium,  they  can  easily  be  extended  or  revised  as
part  of  a  real-time  design  review  and  discussion.

User-Interaction  Design

As  design  continues,  tentative  decisions  are  made  about  the  design
direction.  For  this  continuing  example,  we  will  elaborate
the  second  of  the  two  activity  scenarios  in  Table  53.8.  We  have
opted  for  the  collaborative  environment  design  because  it
strongly  reinforces  important  strengths  of  current  real-world
practice  (e.g.,  familiarity,  intimacy,  well-learned  conversation
skills;  see  Table  53.6),  while  at  the  same  time  addressing  some  of
the  problems  identified  in  requirements  analysis  (e.g.,  the  need
to  travel  to  a  specific  place,  the  possible  distraction  of  irrelevant
personal  information).  We  now  begin  to  elaborate  the  underlying
activity  with  information  and  interaction  details,  that  is,  the
user  interface.

As  for  activity  design,  we  first  explore  the  design  space  with 
metaphors  and  technology  options.  The  metaphors  applied  to 
information  design  are  likely  to  overlap  with  those  used  in  activ¬ 
ity  design,  but  the  emphasis  is  shifted  to  what  the  users  will  see 
and  understand  about  the  system;  the  concerns  of  this  phase  are 
similar  to  those  in  the  “gulfs  of  evaluation  and  execution”  dis¬ 
cussed  by  Norman  (1986).  For  instance,  the  three  metaphors  of 
library,  museum,  and  cocktail  party  suggest  these  design  ideas: 

•  library: information is organized like pages in books with
titles; objects are organized alphabetically or by category on
shelves; there are desks and chairs for reading and working;
people walk around to find, pick up, and browse objects;
there may be an organized index to search for specific types
of objects; visitors may be able to check out objects for more
extended interaction

•  museum: the space is broken into relatively small
topic-specific rooms; objects of interest are mounted on walls or
tables; there is space for groups of interested viewers;
descriptive titles and text explain display objects; people may be
able to use an audiotape to guide their visit

•  cocktail party: there are various “conversation areas,”
perhaps including a table and chairs; visitors self-organize into
groups and emit a “buzz” of conversation; new arrivals are
greeted with waves or smiles

Technology  options  explored  at  this  phase  might  include  hy¬ 
perlinks  (icons  or  other  controls  are  used  to  navigate  or  access 
content  objects),  multimedia,  MOOs  (object-oriented  multiuser 
environments  based  on  a  spatial  model),  other  graphical  tech¬ 
niques  for  rendering  a  room  (a  static  photograph,  a  panorama, 
a  three-dimensional  model),  as  well  as  techniques  for  repre¬ 
senting  the  people  and  objects  in  the  space  (avatars,  buddy  lists, 
texture-mapped  forms  and  objects). 

Ideas  such  as  these  are  discussed  and  combined  with  the  de¬ 
sign  team’s  experience  in  user-interaction  design.  Information 
and  interaction  design  possibilities  can  be  “tried  out”  in  the  con¬ 
text  of  various  activity  design  scenarios,  with  attention  directed 
at  claims  analyzed  in  earlier  phases  as  well  as  new  features  and 
consequences.  Does  the  design  enhance  intimacy  among  club 
members?  Will  newcomers  feel  welcome?  Will  they  be  able  to 
“catch  up”  and  participate?  Does  the  design  support  parallel  ac¬ 
tivities?  Design  inquiry  such  as  this — and  the  scenarios  that  pro¬ 
vide  a  real-world  context  for  the  reflective  process — is  a  hall¬ 
mark  of  scenario-based  design. 

Two  alternative  user  interaction  designs  appear  in  Table 
53.10.  For  simplicity  we  focus  on  just  the  central  actions  at  Ali¬ 
cia’s  exhibit.  Both  scenarios  offer  a  view  into  a  virtual  exhibit 
that  includes  multimedia  content  and  project-specific  com¬ 
ments.  Both  assume  an  information  model  that  is  spatial,  ap¬ 
plying  the  concepts  of  rooms,  furniture,  and  so  on  as  a  perva¬ 
sive  metaphor.  Both  also  borrow  from  the  other  more  specific 
metaphors  described  earlier — for  example,  there  is  a  table  that 


TABLE  53.10.  Alternative  User  Interaction  Designs 
for  the  Online  Club  Meeting  Scenario 

A.  Sharon  Visits  VSF  in  the  Community  MOO 

[Sharon's background and goal to visit the VSF . . .]

When Sharon logs on, her IM buddy list shows that both Alicia and
Delia are online, as are their friends Brian and Corey. She clicks on the
VSF announcement in her e-mail to open VSF. A welcome description and
snapshot of the middle-school gymnasium appear (she can imagine her
favorite row and seat in the bleachers). The description introduces a
number of exhibit tables, including "Astronomy" and "Biology," which is
where her friends' exhibits are on display. The description also tells her
who is currently visiting the fair: as expected, Alicia, Delia, Brian, and
Corey are all in the space somewhere.

Sharon knows that Alicia's project is on black holes, so she uses the
"walk to" command to approach the Astronomy table; the VSF lists
the exhibits on the table, including "Alicia on Black Holes." Sharon uses
the walk command to move next to Alicia's project, where she is shown
an image of a poster with names for the different sections (problem,
significance, methods, etc.). She is also told that there is a comment
board, and when she asks to "review comments" she can see that Alicia has
had quite a few conversations with visitors already tonight, including a
few from Brian and Corey. As she expected, Alicia is at the exhibit with
Brian and Corey, using the exhibit-specific text chat. She can see that
Brian is teasing Alicia about her black-and-white color scheme and quickly
comes to Alicia's defense.

B.  Sharon  Visits  VSF  in  the  Collaborative  Environment 

[Sharon's background and goal to visit the VSF . . .]

When Sharon logs on, her IM buddy list shows that both Alicia and
Delia are online, as are their friends Brian and Corey. She clicks on the
VSF announcement in her e-mail to open VSF. She sees a panoramic view
of the middle-school gymnasium with a little rotation widget that lets her
look around (she finds her favorite row and seat in the bleachers). She
can also see some exhibit tables with banners over them identifying them
as Astronomy and Biology and so on (she remembers that Alicia is on the
"Astronomy" table and Delia in "Biology"). She can also see a bunch of
little red dots indicating visitors; she wonders why some are brighter than
others. She guesses that the three dots over by "Astronomy" might be
Alicia and some friends; sure enough, when she mouses over the dots the
names "Alicia," "Corey," and "Brian" appear. So she pans and zooms to
move in closer to the table.

As she zooms in, she first sees about a dozen miniature exhibits
appear on the table; she can tell which one is Alicia's without even
mousing over because she recognizes the dramatic black-and-white "cover
sheet" Alicia has been working on for the past few weeks. She pans and
zooms to get in close to the project; the cover sheet is replaced by a
rectangular exhibit containing several visuals representing different parts
of the exhibit, including a box where people can leave comments. But
Sharon's attention is drawn instead to the group chat that Alicia is
having with her friends. Each person there is represented by their VSF
avatar (Brian's shows his face in a grimace; typical Brian), and their chat
input shows up in bubbles as they "speak." But each bubble also has a
"More . . ." link, and when Sharon clicks it she can see the chat
conversation, recorded in reverse order so that she can quickly scan and
get up to speed. She sees that Brian is teasing Alicia about her dramatic
cover page, and quickly comes to her defense.


holds  project  exhibits,  the  exhibits  are  posters  that  “point”  to  a 
variety  of  other  information,  the  participants  form  groupings 
and  are  in  conversation,  and  so  on.  However,  the  technology 
supporting each scenario is quite different. In the first case the
visitors are interacting within a traditional text-based MOO,
whereas the second presents a graphical rendition of the
underlying spatial model.

Either  of  the  designs  could  be  used  to  present  and  interact 
with  an  online  science  fair,  but  the  two  proposals  have  differ¬ 
ent  implications  for  how  participants  will  experience  the  space. 
In  the  first  case,  visitors’  attention  is  directed  toward  a  poten¬ 
tially  long  and  complex  stream  of  descriptive  text  that  “explains” 
the  interspersed  images.  The  experience  is  rather  like  reading 
a  book  or  a  play,  with  different  people  and  objects  providing  the 
content,  but  much  of  the  mental  experience  under  the  control 
of  the  reader.  In  contrast,  the  graphical  view  offers  a  concrete 
rendition  of  the  space,  and  visitors  may  be  attracted  to  one  or 
more  activities  taking  place  in  parallel,  in  this  case  a  real-time 
chat  among  friends. 

Some  of  these  general  implications  are  captured  by  the  claims 
in  Table  53.11.  As  for  the  earlier  design  claims,  the  usage  conse¬ 
quences  analyzed  do  not  mandate  one  choice  over  another,  but 
rather  provoke  discussion  of  contrasting  pros  and  cons.  For  ex¬ 
ample,  even  though  the  more  graphical  environment  may  seem 
more  familiar  or  inviting,  we  note  an  important  negative  conse¬ 
quence,  namely  that  the  concrete  view  of  the  gym,  the  tables, 
and so on may dampen the creativity or fantasy of visitors’
experiences. This may become an issue as design continues; for
example, we may search for ways to suggest a hybrid approach,
inviting both real world and fantasy content (Cherny, 1995).
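
One way to keep such claims available for later discussion is to record them as simple structured data. The following is a minimal sketch in Python, not part of SBUE itself: the `Claim` class and the `open_issues` helper are hypothetical names introduced only for illustration, and the claim wording is abbreviated from Table 53.11.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A design feature with its hypothesized usage consequences."""
    feature: str
    upsides: list[str] = field(default_factory=list)    # "+" entries
    downsides: list[str] = field(default_factory=list)  # "-" entries


def open_issues(claims: list[Claim]) -> list[tuple[str, str]]:
    """Pair every downside with its feature, as an agenda for discussion."""
    return [(c.feature, d) for c in claims for d in c.downsides]


# Claim wording abbreviated from Table 53.11 (illustrative only).
text_moo = Claim(
    feature="Textual descriptions of people and objects",
    upsides=[
        "focus attention on a single source of information",
        "leverage enjoyment of fantasy-producing textual imagery",
    ],
    downsides=[
        "complex interleaving of descriptions and communications",
        "hard to integrate into a consistent spatial model",
    ],
)

graphical_2d = Claim(
    feature="Two-dimensional graphical depictions of people and objects",
    upsides=[
        "leverage familiarity with real-world places and objects",
        "promote parallel attention to distributed activities",
        "enable visual information to organize behavior",
    ],
    downsides=[
        "may be overwhelming due to many parallel possibilities",
        "may be disorienting due to discontinuities in navigation",
    ],
)

for feature, issue in open_issues([text_moo, graphical_2d]):
    print(f"{feature}: {issue}")
```

Listing only the downsides is a design choice of this sketch; it reflects the point that claims provoke discussion of trade-offs rather than mandate a choice.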

Ultimately,  user  interaction  design  comprises  all  aspects  of 
how  the  activities’  objects  and  actions  are  rendered  and  exe¬ 
cuted  during  users’  activities.  In  many  design  projects,  this  may 
include  special  attention  to  the  needs  of  new  or  inexperienced 
users. For instance, suppose that this was Sharon’s first visit to
VSF. How might she know that the tables held exhibits, and that
she should move in closer to “see” them? In the graphical case,

TABLE 53.11. Claims Contrasting Usage Implications
of the Alternative Information Designs

A. Sharon Visits the VSF in the Community MOO

Textual descriptions of people and objects present in an online space . . .

+  focus visitors' attention on a single source of information about
the situation and events

+  leverage visitors' experience and enjoyment with fantasy-
producing textual imagery

—  may  feature  complex  interleaving  of  many  sorts  of  descriptions 
and  communications 

—  may  be  difficult  to  integrate  text-based  descriptions  into  a 
consistent  spatial  model 


B.  Sharon  Visits  the  VSF  in  a  2D  Graphical  Environment 

Two-dimensional  graphical  depictions  of  people  and  objects  present  in 
an  online  space.  .  . 

+  leverage  club  members’  familiarity  and  habits  with  real-world 
places  and  objects 

+  promote  parallel  recognition  and  attention  to  spatially  distributed 
information  and  activities 

+  enable  the  use  of  visual  information  to  organize  behavior 
(e.g.,  locations  in  room  or  on  a  table)

—  may  be  overwhelming  due  to  the  many  possibilities  presented  in 
parallel 

—  may  be  confusing  or  disorienting  due  to  discontinuities  in  spatial 
navigation  (relative  to  the  real  world) 


the  visual  cues  provided,  along  with  general  experience  in  using 
graphical  user  interfaces,  might  be  enough  to  cue  the  behavior 
described  in  the  scenario.  But  would  she  know  to  follow  the 
“More  .  .  .”  link  to  access  the  chat  history?  It  is  often  useful  to 
create  “user  documentation”  versions  of  design  scenarios, 
where  an  actor(s)  is  assumed  to  have  little  or  no  experience  us¬ 
ing  the  system.  In  SBUE,  documentation  design  (help  texts  or 
other  guidance)  is  also  inspired  by  appropriate  metaphors  (e.g., 
a coach, a policeman) and technology (e.g., online tutorials,
animated demonstrations). More detailed examples of scenario-based
documentation design can be found in our usability-engineering
textbook (see Chapter 8 in Rosson & Carroll).

In the VSF design example presented here, we considered
and incorporated standard user interaction technology: the
familiar WIMP paradigm of windows, icons, menus, and pointing.
However, SBUE can also be used to envision and analyze
the implications of more novel user-interaction paradigms and
devices. For example, we might consider a role for intelligent
agents as part of a new VSF scenario, and contrast this to a
scenario involving a community-generated FAQ (frequently asked
questions) repository. Or we might explore the implications of
gesture  or  speech  recognition  in  lieu  of  (or  as  a  complement  to) 
keyboard  and  mouse  input.  A  key  advantage  of  exploring  these 
ideas  within  a  scenario  context  is  that  designers  are  less  likely 
to  be  caught  up  in  the  new  technologies  for  their  own  sake; 
SBUE  encourages  designers  to  try  out  novel  ideas  in  usage  situ¬ 
ations  that  at  least  are  believable,  and  that  are  analyzed  explicitly 
with  respect  to  usability  consequences. 

Usability  Evaluation 

Like  most  user-centered  design  frameworks,  SBUE  assumes 
that  usability  evaluation  begins  early  and  continues  throughout 
the  design  and  development  process.  Any  representation  of  a 
design  can  be  evaluated.  Figure  53.1  highlighted  the  evaluation 
that  takes  place  after  detailed  user-interaction  scenarios  have 
been  developed.  This  may  involve  relatively  formal  usability 
testing — representative  users  who  are  recruited  to  carry  out 
representative  tasks  on  early  operational  prototypes.  Such  eval¬ 
uation  requires  sufficient  progress  on  a  design  to  enable  con¬ 
struction  of  a  prototype,  though  of  course  user-interaction  pro¬ 
totypes  can  be  built  with  special-purpose  languages  or  tools,  or 
may  even  be  low-fidelity  prototypes  constructed  of  cardboard 
and  paper  (Muller,  1991;  Virzi,  Sokolov,  &  Karis,  1996). 

Starting  in  the  very  early  phases  of  design,  user  feedback  may 
be  obtained  in  informal  participatory  design  sessions  (Chin, 
Rosson,  &  Carroll,  1997;  Muller,  1992).  Users  can  be  included  in 
discussion  and  envisioning  of  activity  scenarios.  The  design  ideas 
are  also  subjected  constantly  to  analytic  evaluation  through 
claims  analysis  and  other  design-review  activities  (e.g.,  usability 
inspections  or  cognitive  walkthrough;  Nielsen,  1995;  Nielsen  & 
Mack, 1994; Polson et al., 1992). All of these activities yield
formative-evaluation feedback that guides changes and expansion
of  the  design  vision. 

SBUE  builds  on  Scriven’s  (1967)  concept  of  mediated  eval¬ 
uation.  In  mediated  evaluation,  empirical  data  are  collected 
(what  Scriven  terms  “pay-off”  evaluation),  but  the  materials  and 
methods of the empirical tests are guided by prior analytic
evaluation. The analytic evaluation may have many different
components, for example, an expert-based inspection or perhaps a
cognitive model constructed for a particularly complex or critical
interaction sequence (Gray, John, & Atwood, 1992; Kieras,
1997). In SBUE, the primary analysis that mediates empirical
testing  is  claims  analysis.  Claims  written  during  scenario  gener¬ 
ation  and  discussion  document  the  usability  issues  thought  to 
be  most  likely  to  influence  users’  success,  failure,  and  satisfac¬ 
tion;  thus  the  claims  are  used  as  a  skeleton  for  constructing  and 
administering  user  tests.  In  fact,  one  view  of  a  claims  set  is  as  a 
series  of  usability  hypotheses  that  can  be  assessed  empirically; 
claims  also  suggest  why  a  design  feature  may  have  a  particular 
impact  on  users’  experiences. 

Scenarios  and  their  associated  claims  can  be  combined  to 
create  usability  specifications  (Carroll  &  Rosson,  1985;  Good, 
Spine,  Whiteside,  &  George,  1986).  A  usability  specification  is  a 
representative  task  context  that  has  been  analyzed  into  critical 
subtasks,  with  each  subtask  assigned  target  usability  outcomes. 
In  SBUE,  the  design  scenarios  provide  a  realistic  task  context, 
and  the  associated  claims  provide  an  analysis  of  the  scenario 
that  is  parameterized  with  expected  or  desired  usability  out¬ 
comes.  When  the  prototype  has  sufficient  functionality  that  it 
can  be  tested  with  representative  users,  the  specified  tasks  are 
tested  and  the  results  compared  to  the  target  outcomes. 

A sample usability specification developed from the VSF
scenario appears in Table 53.12. The scenario serves as a usability
specification  in  two  ways.  Early  (or  at  any  point)  in  design,  rep¬ 
resentative  users  may  be  asked  to  simply  explore  a  rough  pro¬ 
totype  while  pursuing  the  open-ended  goals  stated  in  the  task 
context.  Because  the  actual  experience  of  users  would  vary  con¬ 
siderably  in  this  case,  it  does  not  make  sense  to  establish  per¬ 
formance  outcomes;  indeed,  formative  evaluations  often  in¬ 
clude  instructions  to  think  aloud,  which  can  slow  and  perhaps 
even  interfere  with  task  performance  (Mack,  Lewis,  &  Carroll, 
1983).  Nonetheless,  the  system  is  considered  successful  in  these 
cases  if  it  satisfies  the  goals  of  a  user  enacting  the  scenario;  col¬ 
lecting  general  ease  of  use  or  satisfaction  ratings  at  the  end  of 
each  episode  can  provide  a  simple  measure  of  this. 

When  the  prototype  is  robust  enough  to  measure  subtask 
times,  more  detailed  usability  specifications  guide  empirical 
testing.  In  the  example,  a  set  of  five  simple  subtasks  has  been 
analyzed  from  the  user-interaction  scenario.  These  tasks  are  di¬ 


rectly  related  to  claims  that  have  been  developed  for  key  de¬ 
sign  features  (only  a  couple  of  these  were  documented  here). 
Performance  measures  are  established,  based  either  on  the  de¬ 
signers’  own  (expert)  experiences  with  the  prototype,  or  on 
benchmark data collected from comparable systems. Satisfaction
measures are constructed to assess one or more of the specific
concerns raised in the claims. For example, a negative
consequence of the graphical overview was the parallel complexity
of  all  the  tables,  exhibits,  and  attendees.  The  usability  specifica¬ 
tion  tracks  this  issue  by  requiring  that  users’  perception  of  the 
fair  “complexity”  is  at  an  acceptably  low  level  (measured  by  a 
Likert-type  rating  scale  with  a  range  of  1-5).  The  satisfaction 
qualities  specified  for  the  other  subtasks  are  similarly  derived 
from  advantages  or  disadvantages  hypothesized  by  claims. 
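
Checking test results against such a specification can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a procedure prescribed by SBUE: the class and function names are hypothetical, the observations are invented sample data, results are aggregated by a simple mean, and the direction of each rating scale (for example, that a lower "complexity" rating is better) is an assumption made explicit in the code. The targets are taken from Subtasks 2 and 3 of Table 53.12.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SubtaskSpec:
    name: str
    max_seconds: float
    max_errors: int
    rating_name: str
    rating_target: float
    lower_is_better: bool = False  # assumption: direction of the rating scale


@dataclass
class Observation:
    seconds: float
    errors: int
    rating: float  # 1-5 Likert-type rating


def meets_spec(spec: SubtaskSpec, observations: list[Observation]) -> dict[str, bool]:
    """Compare mean observed results for one subtask against its targets."""
    avg_rating = mean(o.rating for o in observations)
    if spec.lower_is_better:
        rating_ok = avg_rating <= spec.rating_target
    else:
        rating_ok = avg_rating >= spec.rating_target
    return {
        "time": mean(o.seconds for o in observations) <= spec.max_seconds,
        "errors": mean(o.errors for o in observations) <= spec.max_errors,
        spec.rating_name: rating_ok,
    }


# Targets from Table 53.12; the observations below are hypothetical.
layout = SubtaskSpec("Identify general layout of the VSF", 5, 0,
                     "complexity", 1.0, lower_is_better=True)
locate = SubtaskSpec("Identify location of Alicia at the VSF", 20, 1,
                     "feedback", 4.0)

layout_result = meets_spec(layout, [Observation(4, 0, 1), Observation(7, 0, 2)])
locate_result = meets_spec(locate, [Observation(18, 0, 4),
                                    Observation(25, 1, 5),
                                    Observation(14, 1, 4)])
print(layout.name, layout_result)
print(locate.name, locate_result)
```

Reporting each target separately, rather than a single pass or fail, keeps the link between a missed target and the claim that motivated it.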

Usability  specifications  developed  in  this  way  have  two  im¬ 
portant  roles  in  evaluation.  First,  they  provide  concrete  usability 
objectives that can serve as a management tool in system
development: if a product team accepts these targets, then the
team’s  usability  engineers  are  able  to  insist  that  redesign  and  im¬ 
provement  continue  until  they  are  met  (Carroll  &  Rosson,  1985; 
Good  et  al.,  1986). 

Second,  the  specifications  tie  the  results  of  empirical  evalu¬ 
ation  directly  to  the  usability  issues  raised  during  design.  For 
instance,  our  user  interaction  scenario  specified  that  Sharon 
locates her friend by mousing over the red dots in the VSF
overview image. A positive consequence is that awareness of
other people at the VSF is enhanced; a negative consequence
is  that  the  display  becomes  more  crowded.  The  time  it  takes  to 
locate  Alicia  is  specified  as  one  measure  of  the  feature’s  impact, 
but  this  performance  target  is  complemented  by  users’  subjec¬ 
tive  reactions  to  the  feature.  For  example,  users  might  indicate 
a level of agreement to a statement such as: “The red dots
indicating people’s location at the VSF provided feedback about
current  activities.”  Problematic  results  with  respect  to  either  of 
these  usability  targets  raise  specific  issues  for  redesign. 

As  development  continues,  more  complete  prototypes  or 
initial  working  systems  are  constructed.  At  this  point,  new  usage 
scenarios  are  introduced,  with  new  actors,  goals,  and  activity 
contexts.  These  scenarios  are  also  subjected  to  claims  analysis, 
and  the  resulting  claims  used  to  develop  additional  usability¬ 
testing  materials.  This  is  an  important  step  in  the  evaluation,  be¬ 
cause  it  ensures  that  the  design  has  not  been  optimized  for  the 


TABLE 53.12. Usability Specification Developed from VSF User Interaction Scenario

Task Context: Sharon is a busy high-school student. While studying for a biology exam, she takes a quick study break to visit the VSF advertised in
the school's recent e-mail, and where she knows her friends Alicia and Delia are exhibiting their science projects. When she opens the VSF, she finds
Alicia's exhibit on the "Astronomy" table; she sees that Alicia is also there with her other friends Brian and Corey. She heads over to chat with them about
Alicia's exhibit.

Overall Scenario Outcome: Average rating of at least 4.0 (of 5) on ease of use and satisfaction

Subtask                                                 Performance Targets   Satisfaction Targets
Subtask 1: Navigate to VSF                              10 seconds, 0 errors  4 on convenience
Subtask 2: Identify general layout of the VSF           5 seconds, 0 errors   1 on complexity
Subtask 3: Identify location of Alicia at the VSF       20 seconds, 1 error   4 on feedback
Subtask 4: Navigate to Alicia's exhibit on Black Holes  30 seconds, 1 error   1 on directness
Subtask 5: View stored chat history                     5 seconds, 0 errors   4.5 on inference

set  of  design  scenarios  that  has  guided  development  thus  far. 
Near  the  end  of  the  development  cycle  (or  at  predefined  mile¬ 
stones),  a  carefully  orchestrated  summative  evaluation  is  carried 
out,  with  the  goal  of  assessing  how  well  the  system  performs 
with  respect  to  its  usability  specifications. 

Scenario-Based  Iterative  Design 

The SBUE framework is highly iterative. Although we have
emphasized the importance of grounding new activities in
thorough analysis of current practice, we assume that many new
possibilities will not be realized until design ideas are made concrete
and exposed to actual use. At times entirely new activities may be
envisioned,  when  breakdowns  in  the  technical  or  social  environ¬ 
ment  demand  a  radical  transformation  of  current  practice.  But 
even  for  design  problems  needing  innovative  responses,  an  SBUE 
process  is  valuable — the  problem  scenarios  can  help  designers  to 
predict  how  users  may  need  to  evolve  their  goals  and  expectations 
if  they  are  to  adapt  to  the  more  novel  activities  (Orlikowski,  1992). 

More  commonly,  software  development  involves  a  more 
gradual  coevolution  of  computing  technology  and  people’s  ac¬ 
tivities:  technology  raises  new  possibilities  for  action,  and  cre¬ 
ative  responses  to  such  possibilities  lead  to  new  requirements 
for  support  (Carroll,  Kellogg,  &  Rosson,  1991).  In  these  coevo¬ 
lutionary  settings,  user-interaction  scenarios  provide  a  central, 
activity-centered  thread  in  iterative  design,  serving  as  interme¬ 
diate  design  products  that  can  be  generated,  shared,  and  revised 
within  an  overarching  envisioning-evaluation-refinement  cycle. 


APPLICATIONS OF SCENARIO-BASED DESIGN

FIGURE 53.2. Scenarios have diverse uses throughout the system
development lifecycle.


Even  if  scenarios  are  not  developed  and  incrementally  re¬ 
fined  as  recommended  in  the  SBUE  framework,  they  may  be 
used  at  many  points  along  the  way.  For  instance,  task-based  user
documentation  is  often  structured  by  scenarios.  Minimalist  help 
and  training  provide  many  examples  of  this,  such  as  a  “training 
wheels”  system  that  blocks  functions  that  are  not  relevant  to  a 
paradigmatic  novice-use  scenario  (Carroll  &  Carrithers,  1983), 
or  a  “view  matcher”  that  guides  new  programmers  through  a 
predefined  scenario  of  debugging  and  modification  (Carroll, 
Singer,  Bellamy,  &  Alpert,  1990;  Carroll  &  Rosson,  1991;  Rosson 
&  Carroll,  1996). 


As  exemplified  by  books  discussing  scenario-based  design  meth¬ 
ods  (e.g.,  Alexander  &  Maiden,  2004;  Carroll,  1995,  2000),  one 
strength  of  scenarios  is  their  many  potential  roles  in  system  de¬ 
velopment.  Scenarios  are  accessible  to  many  process  stakehold¬ 
ers,  and  as  such  can  be  a  tremendous  aid  to  project  communica¬ 
tion  and  coordination.  Scenarios  can  be  represented  at  varying 
levels  of  specificity  and  formality,  making  them  an  attractive 
medium  for  gradual  evolution  and  specification  of  a  system’s  func¬ 
tions.  They  can  also  capture  important  aspects  of  the  nonfunc¬ 
tional  constraints  or  requirements  for  a  project.  In  this  section  we 
briefly  survey  research  and  practice  that  is  exploring  different  ap¬ 
plications  of  scenarios  to  the  problems  of  software  development. 

Scenarios  in  the  System-Development  Lifecycle 

Our example has focused on the familiar processes of
requirements analysis, design, and usability evaluation, but
scenario-based methods support many diverse goals in system
development (Carroll, 1997; see Fig. 53.2). Product planners present
day-in-the-life  scenarios  to  managers  as  design  visions  (Dubberly  & 
Mitsch,  1992);  requirements  engineers  gather  workplace  scenarios 
through  direct  observation  and  interviews,  and  analyze  scenarios 
as  primary  data  (Anton,  McCracken,  &  Potts,  1994;  Holbrook, 
1990;  Hsia,  Samuel,  Gao,  Kung,  Toyoshima,  &  Chen,  1994;  Kaindl, 
1997;  Kuutti,  1995;  Potts,  1995). 


Personas and envisionment. The concept of a persona (the
envisionment and elaboration of a hypothetical target user
with his or her own personality, needs, and preferences)
has been popularized by Alan Cooper and his colleagues
(Cooper,  1998;  Cooper  &  Reimann,  2003).  Personas  are  often 
developed  during  the  problem  analysis  or  requirements  speci¬ 
fication  phase  of  a  project,  as  a  means  for  understanding,  ex¬ 
pressing,  and  working  with  the  goals  and  implied  requirements 
of  different  target  users.  Prospective  users  are  conveyed  through 
detailed  composite  user  archetypes  (often  represented  visually, 
including  personality  characteristics),  and  with  a  context  sce¬ 
nario  that  helps  to  anticipate  user  needs  and  expectations,  so 
that  the  designers  working  with  the  personas  can  have  as  di¬ 
rect  an  experience  as  possible  with  potential  users.  Although  it 
is  not  yet  clear  exactly  what  elements  of  a  persona  are  most  im¬ 
portant  in  guiding  or  inspiring  design,  research  does  suggest 
that  software  designers  find  personas  to  be  of  use  in  the  early 
phase  of  design  when  they  are  trying  to  understand  users’  needs 
(Vasara,  2003). 

The  use  of  personas  overlaps  considerably  with  our  own 
SBUE  framework,  although  we  use  the  phrase  “hypothetical 
stakeholder”  to  emphasize  that  these  characters  are  imaginary, 
and  that  they  should  include  a  diverse  set  of  perspectives  and 
roles  in  the  problem  situation  under  consideration.  But  as  for 
personas,  SBUE  encourages  designers  to  become  familiar  with 
and empathize with envisioned characters (with their personal
attributes) who are used repeatedly in the iterative process of
scenario analysis, envisionment, and refinement.

Use  cases  and  object-oriented  design.  Scenarios  have 
also  come  to  play  a  central  role  in  object-oriented  software  en¬ 
gineering  (Alexander  &  Maiden,  2004;  Jacobson,  1995;  Jacobson, 
Booch,  &  Rumbaugh,  1998;  Rubin  &  Goldberg,  1992;  Wirfs- 
Brock, Wilkerson, & Wiener, 1990). A use case is a scenario
written from a functional point of view, enumerating all of the
possible user actions and system reactions that are required to meet
a proposed system function (Jacobson et al., 1992). Use cases can
then  be  analyzed  with  respect  to  their  requirements  for  system 
objects  and  interrelationships.  Wirfs-Brock  (1995)  described  a 
variant  of  use  case  analysis  in  which  she  develops  a  “user-system 
conversation” — using  a  two-column  format,  a  scenario  is  de¬ 
composed  into  a  linear  sequence  of  inputs  from  the  user  and  the 
corresponding  processing  and/or  output  generated  by  the  sys¬ 
tem.  Kaindl  (2000)  extended  this  analysis  by  annotating  how  sce¬ 
nario  steps  implement  required  user  goals  or  system  functions. 
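
The two-column "user-system conversation" described above can be sketched as a small data structure. This is a minimal illustration only: the class name, the helper functions, the VSF steps, and the goal labels are assumptions invented for the example, not material from Wirfs-Brock (1995) or Kaindl (2000).

```python
from dataclasses import dataclass, field


@dataclass
class ConversationStep:
    """One row of a two-column user-system conversation."""
    user_input: str          # left column: what the user does
    system_response: str     # right column: processing and/or output
    goals: list = field(default_factory=list)  # annotation: goals or functions served


def format_conversation(steps):
    """Render the steps as a two-column, plain-text table."""
    width = max(len(s.user_input) for s in steps)
    lines = [f"{'User':<{width}} | System"]
    for s in steps:
        lines.append(f"{s.user_input:<{width}} | {s.system_response}")
    return "\n".join(lines)


def steps_for_goal(steps, goal):
    """Return the steps annotated as implementing a given goal."""
    return [s for s in steps if goal in s.goals]


# Hypothetical scenario steps, invented for illustration.
visit = [
    ConversationStep("Opens the science fair", "Displays the exhibit map",
                     ["find exhibit"]),
    ConversationStep("Selects an exhibit", "Shows the exhibit and who is present",
                     ["find exhibit", "meet others"]),
    ConversationStep("Types a comment", "Posts the comment to the exhibit",
                     ["meet others"]),
]

print(format_conversation(visit))
```

Keeping the goal annotations on each step makes it possible to ask which parts of the conversation serve a given goal, for example `steps_for_goal(visit, "meet others")`.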

Scenarios  are  promising  as  a  mediating  representation  for 
analyzing  interactions  between  human-centered  and  software- 
centered  object-oriented  design  issues  (Rosson  &  Carroll,  1993, 
1995, 2001). As we have seen, scenarios can be decomposed with
respect  to  the  software  objects  needed  to  support  the  narrated 
user  interactions.  These  software  objects  can  then  be  further 
analyzed  with  respect  to  their  system  responsibilities,  identifying 
the  information  or  services  that  should  be  contributed  by  each 
computational  entity  (Wirfs-Brock  &  Wilkerson,  1989;  Beck  & 
Cunningham,  1989;  Rosson  &  Gold,  1989).  This  analysis  (often 
termed  “responsibility-driven  design,”  Wirfs-Brock  et  al.,  1990) 
may  lead  to  new  ideas  about  system  functionality,  for  example 
initiatives  or  actions  taken  by  a  software  object  on  behalf  of  the 
user  or  another  object.  Scenarios  and  claims  analysis  are  useful  in 
describing  these  new  ideas  and  considering  their  usability  im¬ 
plications  in  the  context  of  use  (Rosson,  1999;  Rosson  &  Carroll, 
1995;  Rosson  &  Carroll,  2001).  For  instance,  the  VSF  object  might 
be  given  the  responsibility  to  notify  potential  attendees  when 
their  acquaintances  have  arrived.  A  notification  service  such  as 
this  would  help  to  keep  attendees  informed,  but  individuals  may 
find  the  notices  annoying  if  they  appear  frequently,  or  they  may 
visit  only  when  such  a  notice  arrives. 
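
The notification example can be sketched in code as a software object that carries the responsibility, paired with a claim that records its hypothesized upsides and downsides. This is a rough sketch under stated assumptions: the `ScienceFair` and `Claim` classes, their methods, and the attendee "Alicia" are hypothetical names introduced for illustration, not part of the VSF design itself.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A design feature with its hypothesized upsides and downsides."""
    feature: str
    upsides: list = field(default_factory=list)
    downsides: list = field(default_factory=list)


class ScienceFair:
    """Hypothetical VSF object that is responsible for arrival notices."""

    def __init__(self):
        self.acquaintances = {}   # person -> set of people they know
        self.outbox = []          # (recipient, message) pairs, in order sent

    def add_acquaintance(self, person, other):
        self.acquaintances.setdefault(person, set()).add(other)

    def arrive(self, visitor):
        """Record an arrival and notify everyone who knows the visitor."""
        for person, known in sorted(self.acquaintances.items()):
            if visitor in known:
                self.outbox.append((person, f"{visitor} has arrived"))


notification_claim = Claim(
    feature="VSF notifies potential attendees when acquaintances arrive",
    upsides=["keeps attendees informed"],
    downsides=[
        "frequent notices may be annoying",
        "people may visit only when a notice arrives",
    ],
)

fair = ScienceFair()
fair.add_acquaintance("Sharon", "Alicia")
fair.arrive("Alicia")
print(fair.outbox)
```

Placing the notification logic in the fair object, rather than in each attendee, mirrors the idea that the object acts on behalf of the user; the claim keeps the usability tradeoffs attached to that decision.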

Analyzing  nonfunctional  requirements.  The  general 
accessibility  of  scenarios  makes  them  an  excellent  medium  for 
raising  and  discussing  a  variety  of  competing  concerns.  Software 
engineers  are  concerned  about  issues  such  as  code  reuse,  pro¬ 
gramming  language  or  platform,  and  so  on;  management  is  con¬ 
cerned  with  project  resources,  scheduling,  and  so  on;  a  mar¬ 
keting  team  focuses  on  issues  such  as  the  existing  customer 
base  and  the  product  cost.  These  diverse  concerns  are  non¬ 
functional  requirements  on  system  development — concerns 
about  “how”  a  system  will  be  developed,  fielded,  and  main¬ 
tained  rather  than  “what”  a  system  will  provide  (Sommerville, 
1992).  Usability  goals  are  often  specified  as  nonfunctional  re¬ 
quirements,  in  that  they  typically  focus  on  the  quality  of  the  sys¬ 
tem  rather  than  its  core  functions  (Mylopoulos,  Chung,  &  Nixon, 
1992). The low cost of development, content flexibility, and natural
language format of scenarios and claims make them excellent
candidates for contrasting and discussing a range of such issues
throughout the software-development lifecycle (Sutcliffe &
Minocha, 1998). For instance, in an extension of the SBUE framework,
Rosson and Carroll (2002) illustrated how a single scenario
(e.g., a scenario of Sharon visiting the VSF) might be retold
to include interoperability concerns related to different PC operating
systems, or 24/7 accessibility concerns related to server
configurations. As in SBUE, tradeoffs can be framed to raise,
discuss, and prioritize competing concerns.

Taking  a  more  formal  approach,  a  scenario  can  be  modeled 
as  a  sequence  of  task  steps  and  this  sequence  can  be  analyzed 
with  respect  to  users’  ability  to  perform  the  task  under  differ¬ 
ent  circumstances  (Alexander  &  Maiden,  2004).  For  example, 
Gregoriades  and  Sutcliffe  (2005)  illustrated  the  use  of  Bayesian 
Belief  Network  models  for  assessing  the  reliability  of  human 
users  to  carry  out  a  scenario’s  steps  under  varying  environmen¬ 
tal  pressures.  By  noting  when  and  where  the  predicted  reliability 
fails  to  pass  a  benchmark  threshold,  the  analysts  can  identify 
problematic  aspects  of  a  proposed  scenario,  as  well  as  contrast 
scenarios from alternative designs. The goals of this approach
overlap with tradeoff analysis, but it provides a more systematic
and mathematical representation of the design issues most likely
to need attention.
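
The idea of checking predicted reliability against a benchmark can be illustrated with a deliberately simplified model. Note that this is not a Bayesian Belief Network: as a stand-in, each step gets a baseline reliability that is scaled by a fixed multiplier for every active environmental pressure, and steps are assumed independent. All names and numbers are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStep:
    name: str
    base_reliability: float                          # success probability with no pressure
    sensitivity: dict = field(default_factory=dict)  # pressure -> multiplier in (0, 1]


def step_reliability(step, pressures):
    """Baseline reliability scaled by a multiplier for each active pressure."""
    r = step.base_reliability
    for p in pressures:
        r *= step.sensitivity.get(p, 1.0)
    return r


def weak_steps(steps, pressures, threshold):
    """Names of steps whose predicted reliability falls below the threshold."""
    return [s.name for s in steps if step_reliability(s, pressures) < threshold]


def scenario_reliability(steps, pressures):
    """Probability that every step succeeds, assuming independent steps."""
    total = 1.0
    for s in steps:
        total *= step_reliability(s, pressures)
    return total


# Invented numbers, for illustration only.
steps = [
    TaskStep("read alert", 0.99, {"noise": 0.95}),
    TaskStep("enter code", 0.95, {"time pressure": 0.80}),
    TaskStep("confirm", 0.98),
]

print(weak_steps(steps, [], 0.9))
print(weak_steps(steps, ["time pressure"], 0.9))
```

Running the same check for scenarios drawn from alternative designs gives a crude basis for comparison; a real belief network would additionally model dependencies among the influencing factors.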

Evaluating  systems  in  use.  In  SBUE  a  software  devel¬ 
opment  process  begins  with  fieldwork  that  ultimately  is  sum¬ 
marized  through  a  set  of  stakeholder  profiles  and  diagrams, 
themes,  scenarios,  and  claims  analysis  (see  Chapter  2  in 
Rosson  &  Carroll,  2002).  However,  researchers  are  currently 
exploring  a  variant  of  this  approach  in  which  scenarios  are 
used  as  an  elicitation  technique  for  capturing  current  practices 
and  reactions  during  the  adoption  and  acceptance  phase  of  a 
fielded  system.  Haynes,  Purao  and  Skattebo  (2004)  reported  a 
field  study  of  a  collaborative  system  for  product  lifecycle  man¬ 
agement  in  the  U.S.  Marine  Corps.  They  interviewed  26  cur¬ 
rent  users  of  the  system,  selecting  individuals  representing  the 
various  organizational  roles  associated  with  system  use  (e.g., 
engineers,  scientists,  staff  members).  Their  interview  script 
was  scenario-based:  after  eliciting  information  about  the  in¬ 
terviewee’s  work  context,  they  asked  for  personal  scenarios 
of  system  use.  Later  they  transcribed  the  interviews  and  coded 
the  scenarios  to  identify  system  features  and  users’  apparent 
concerns  about  these  features  (coded  as  claims  with  upsides 
and  downsides).  The  findings  were  summarized  and  shared 
through  focus  groups  to  refine  the  evaluation.  The  technique 
allowed  them  to  collect  and  summarize  a  number  of  related 
concerns,  but  also  to  connect  (and  on  occasion  illustrate) 
these  concerns  with  the  organizational  roles  and  usage  con¬ 
text  represented  by  the  user  scenarios. 
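
The coding step described above, in which concerns are recorded as claim upsides and downsides and tied back to organizational roles, might be organized along the following lines. This is a hedged sketch: the record layout, the `summarize` function, and every entry in `coded` are hypothetical placeholders, not data or procedures from the Haynes et al. study.

```python
from collections import defaultdict

# Each coded record: (organizational role, system feature, valence, concern).
# All entries are invented placeholders, not data from the study.
coded = [
    ("engineer", "document search", "downside", "results are hard to filter"),
    ("engineer", "version history", "upside", "easy to trace changes"),
    ("scientist", "document search", "downside", "results are hard to filter"),
    ("staff member", "version history", "upside", "easy to trace changes"),
]


def summarize(records):
    """Group concerns by feature and valence, keeping the roles that raised them."""
    summary = defaultdict(
        lambda: {"upside": defaultdict(set), "downside": defaultdict(set)}
    )
    for role, feature, valence, concern in records:
        summary[feature][valence][concern].add(role)
    return summary


summary = summarize(coded)
for feature, claim in sorted(summary.items()):
    print(feature)
    for valence in ("upside", "downside"):
        for concern, roles in sorted(claim[valence].items()):
            print(f"  {valence}: {concern} ({', '.join(sorted(roles))})")
```

Keeping the set of roles alongside each concern preserves the link between a claim and the usage context in which it was raised.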

As  this  example  suggests,  it  is  quite  possible  to  leverage  the 
expressive  and  communicative  power  of  scenarios  in  a  variety  of 
ways,  even  when  a  user-centered  analyst  enters  a  software  de¬ 
velopment  process  that  is  well  underway.  Because  scenarios  are 
focused  on  the  specifics  of  a  usage  context  and  associated  user 
experience,  they  are  a  natural  medium  for  “war  stories”  about 
how  things  are  or  are  not  working  (Orr,  1986).  When  comple¬ 
mented  by  tradeoff  analysis  (e.g.,  documented  as  claims),  such 
scenarios  can  become  a  powerful  medium  for  discussing  the 
need for change, whether aimed at the nature of people’s goals
and activities or at the technology that supports them.

Scenario-Based Design Rationale as Theory

To  this  point  we  have  focused  on  the  instrumental  role  of  sce¬ 
narios  and  tradeoff  analyses  in  supporting  the  analysis,  design, 
and  evaluation  activities  of  interactive  software  development. 
However,  another  potential  role  for  scenario-based  design  is 
building  and  expressing  a  science  base  for  user-centered  design. 
In  this  role,  a  system’s  scenario-based  analyses  comprise  design 
rationale  (DR)  that  can  be  recruited  for  explanation,  general¬ 
ization,  and  reuse — contributing  to  the  theory  base  for  user  in¬ 
teraction  design  (Sutcliffe,  2002). 

Carroll  and  Rosson  (2003)  used  a  community  MOO  as  a  de¬ 
sign  case  study  that  illustrates  the  three  scientific  foundations  of 
scenario-based  DR:  action  science,  ecological  science,  and  syn¬ 
thetic  science.  The  MOO’s  scenarios  and  associated  claims  sup¬ 
ported  action  science  during  the  envisioning  process,  when 
novel  usage  features  were  imagined  in  response  to  the  real 
world  needs  and  preferences  of  users.  The  DR  supported  eco¬ 
logical  science  by  surveying  and  documenting  salient  causal  fea¬ 
tures  of  real  world  usage  contexts;  these  surveys  can  guide  de¬ 
sign  efforts  in  similar  activities  or  domains  in  the  future.  Finally, 
the contributions to synthetic science came through the association
of interdisciplinary evidence and explanations with the
claims’ upsides and downsides.

A  more  complete  analysis  of  scenarios  and  claims  as  theoret¬ 
ical  material  can  be  found  in  Sutcliffe’s  volume  on  Domain  The¬ 
ory  (Sutcliffe,  2002).  He  argued  that  claims  in  particular  are  an 
ideal abstraction for stating reusable knowledge about interaction
design that is grounded in the relevant usage context. He
went on to provide a template for developing and organizing
claim-based  knowledge,  and  argued  through  demonstration  that 
such  a  representation  is  reusable  from  one  design  project  to  an¬ 
other.  This  potential  for  claims  as  reusable  design  knowledge  has 
been  explored  more  thoroughly  by  researchers  developing  the 
LINK-UP  claims  library  and  scenario-based  development  tool 
(Chewar,  Bachetti,  McCrickard,  &  Booker,  2005;  Payne  et  al., 
2003).  These  researchers  have  elaborated  the  claim  format  pro¬ 
posed  by  Sutcliffe  and  have  been  using  it  to  collect  and  organize 
scientific  knowledge  about  notification  system  design.  They  are 
currently  exploring  the  effectiveness  of  the  tools  in  teaching  us¬ 
ability  engineering  concepts  and  skills  to  undergraduate  and 
graduate students (Chewar & McCrickard, 2003).

Case-Based  Learning  about  Scenario-Based  Design 

Another  indirect  benefit  of  scenario-based  development  is  that 
the  resulting  scenarios,  claims,  and  related  design  documents 
can  support  a  variety  of  case-based  learning  activities.  As  a  con¬ 
crete  story,  even  a  single  scenario  can  be  evocative  enough  to 
illustrate  and  discuss  the  usage  issues  raised  by  its  tradeoff 
analysis.  A  set  of  interrelated  and  successively  refined  scenarios 
and  tradeoffs  presents  an  open-ended  and  rich  information 
structure  for  a  variety  of  active  learning  experiences  (Carroll  & 
Rosson,  2005a,  2005b,  2005c). 

Over  the  past  few  years,  we  have  been  collaborating  with 
other  HCI  faculty  and  practitioners  to  build  a  case  library  of  sce¬ 
nario-based  usability  engineering  projects  (ucs.ist.psu.edu),  and 
incorporating case-based learning activities into our graduate
and undergraduate teaching. The students carry out a number
of  in-class  or  homework  assignments  that  involve  case  analysis 
over  the  course  of  a  semester — for  example,  analyzing  the  im¬ 
plications  of  “perturbing”  a  case  study  in  specific  ways,  or  trac¬ 
ing  the  impacts  of  a  tradeoff  analyzed  early  in  development 
through  the  lifecycle.  They  also  analyze,  design,  and  prototype 
their  own  interactive  projects,  documenting  their  work  as  a 
miniature  case  study.  The  case-based  experiences  serve  as  a  sort 
of  surrogate  for  apprenticeship  learning  in  the  real  world,  as  the 
students  are  able  to  explore  and  question  the  reasoning  and 
decisions  of  actual  development  teams.  At  the  same  time,  the 
cases  are  an  excellent  vehicle  for  encountering  and  weighing 
the  many  competing  concerns  that  development  teams  must 
address  during  interactive  system  development. 


CURRENT  CHALLENGES 


When  we  design  interactive  systems,  we  design  usage  experi¬ 
ences.  We  create  possibilities  for  learning,  work,  and  leisure,  for 
interaction  and  information.  Scenarios — descriptions  of  mean¬ 
ingful  usage  episodes — are  popular  representational  tools  for 
making  use.  They  help  designers  to  understand  and  to  create 
computer  systems  and  applications  as  artifacts  of  human  activity, 
as  things  to  learn  from,  as  tools  to  use  in  one’s  work,  as  media 
for  interacting  with  other  people. 

Scenario-based  design  offers  significant  and  unique  lever¬ 
age  on  some  of  the  most  characteristic  and  vexing  challenges 
of  design  work;  scenarios  are  at  once  concrete  and  flexible, 
helping  developers  manage  the  fluidity  of  design  situations.  Sce¬ 
narios  emphasize  the  context  of  work  in  the  real  world;  this  en¬ 
sures  that  design  ideas  are  constantly  evaluated  in  the  context  of 
real-world  activities,  minimizing  the  risk  of  introducing  features 
that  satisfy  other  external  constraints.  The  work-oriented  char¬ 
acter  of  scenarios  also  promotes  work-oriented  communication 
among  stakeholders,  helping  to  make  design  activities  more  ac¬ 
cessible  to  many  sources  of  expertise.  Finally,  scenarios  are 
evocative,  raising  questions  at  many  levels,  not  only  about  the 
needs  of  the  people  in  a  scenario  as  written,  but  also  about  vari¬ 
ants  illustrating  design  tradeoffs. 

Scenario-based  methods  are  not  a  panacea.  A  project  team 
which  complains,  “We  wrote  scenarios,  but  our  system  still 
stinks!”  must  also  report  how  their  scenarios  were  developed, 
who  reviewed  them,  and  what  roles  they  played  in  system  de¬ 
velopment.  If  a  user-interaction  scenario  is  not  grounded  in 
what  is  known  about  human  cognition,  social  behavior,  and 
work  practices,  it  may  well  be  inspiring  and  evocative,  but  it  may 
mislead  the  team  into  building  the  wrong  system  (Carroll  et  al., 
1998).  Scenarios  are  not  a  substitute  for  hard  work. 

At  the  same  time,  any  work  on  user-interaction  scenarios  di¬ 
rects  a  project  team  to  the  needs  and  concerns  of  the  people 
who  will  use  a  system.  It  is  in  this  sense  that  scenarios  can  pro¬ 
vide  a  very  lightweight  approach  to  human-centered  design. 
Simply  writing  down  and  discussing  a  few  key  expectations  about 
users’  goals  and  experiences  will  enhance  a  shared  vision  of  the 
problems  and  opportunities  facing  system  users.  Adopting  a 
more systematic framework such as described here adds control
and organization to the creative process of design, and at the
same time generates work products (scenarios and claims) that
can serve as enduring design rationale during system maintenance
and evolution (McKerlie & MacLean, 1994; Moran & Carroll,
1996).

Where  are  scenarios  taking  us?  The  current  state  of  the  art 
in  the  design  of  interactive  systems  is  fragmented.  Scenarios 
are  used  for  particular  purposes  throughout  system  develop¬ 
ment,  but  there  is  no  comprehensive  process  (Carroll,  1995; 
Jarke,  Bui,  &  Carroll,  1998;  Weidenhaupt  et  al.,  1998).  Scenario 
practices  have  emerged  piecemeal,  as  local  innovations,  lead¬ 
ing  to  a  considerable  variety  of  scenario  types  specialized  for 
particular  purposes  (Campbell,  1992;  Young  &  Barnard,  1987).  A 
detailed  textual  narrative  of  observed  workplace  practices  and 
interactions,  a  use-case  analysis  of  an  object-oriented  domain 
model,  a  day-in-the-life  video  envisionment  of  a  future  product, 
and the instructions for test subjects in an evaluation experiment
could all be scenarios. Recognizing this, and cross-leveraging
the many different views of scenarios, is a potential strength
of  scenario-based  design.  But  much  work  remains  in  developing 
overarching  frameworks  and  methods  that  exploit  this  potential 
advantage. 

It  is  important  for  us  to  be  ambitious,  skeptical,  and  analytic 
about  scenarios  and  scenario-based  design.  Forty  years  ago, 
Herman  Kahn  (1962)  expressed  puzzlement  that  scenarios  were 
not  more  widely  used  in  strategic  planning.  In  the  1990s,  sce¬ 
narios  have  become  so  pervasive  in  interactive  system  design 
that  younger  designers  may  wonder  what  the  alternative  is  to 
scenario-based  design!  But  there  is  yet  some  strangeness  to  sce¬ 
narios.  We  are  not  much  farther  than  Kahn  was  in  understand¬ 
ing  how  scenarios  work  as  tools  for  planning  and  design,  or  in 
understanding  how  to  fully  exploit  their  unique  strengths  as 
aids to thought.


INTRODUCTION


This  chapter  surveys  methods,  techniques,  and  practices  in  Par¬ 
ticipatory  Design  (PD)  that  can  lead  to  hybrid  experiences — that 
is,  practices  that  take  place  neither  in  the  workers’  domain,  nor 
in  the  software  professionals’  domain,  but  in  an  “in-between” 
region  that  shares  attributes  of  both  the  workers’  space  and  the 
software  professionals’  space.  Recent  work  in  cultural  theory 
claims  that  this  “in-between”  region,  or  “third  space,”  is  a  fertile 
environment  in  which  participants  can  combine  diverse  knowl¬ 
edges  with  new  insights  and  plans  for  action,  to  inform  the  needs 
of  their  organizations,  institutions,  products,  and  services.  Im¬ 
portant  attributes  of  third  space  experiences  include  challenging 
assumptions,  learning  reciprocally,  and  creating  new  ideas, 
which  emerge  through  negotiation  and  cocreation  of  identities, 
working  languages,  understandings,  and  relationships,  and 
polyvocal  (many-voiced)  discussions  across  and  through  differ¬ 
ences.  The  chapter  focuses  on  participatory  practices  that 
share  these  attributes,  including  (a)  site-selection  of  PD  work; 
(b)  workshops;  (c)  story  collecting  and  story  telling  through  text, 
photography,  and  drama;  (d)  games  for  analysis  and  design;  and 
(e)  the  cocreation  of  descriptive  and  functional  prototypes. 

Just Add Users and Stir?

In a discussion of integrating women’s perspectives into a male-dominated
curriculum, Bunch (1987) noted that “you can’t just
add women and stir” (p. 140). It takes work, and new ways
of  thinking,  and  new  kinds  and  methods  of  openness,  to  bring 
substantively  new  voices  into  a  conversation.  Similarly,  to  bring 
users’  knowledges  and  perspectives  directly  into  computer 
specification  and  design,  it  is  necessary  to  do  more  than  “just 
add  users  and  stir.”  This  chapter  surveys  methods  that  go  be¬ 
yond  merely  adding  users — methods  to  create  new  settings  and 
experiences  that  can  assist  computer  professionals  to  work  in 
partnership  with  diverse  users  in  improving  both  computer 
technology  and  the  understandings  that  make  computer  tech¬ 
nologies  successful  in  real  use. 

Participatory  design  (PD)  is  a  set  of  theories,  practices,  and 
studies  related  to  end  users  as  full  participants  in  activities  lead¬ 
ing  to  software  and  hardware  computer  products  and  computer- 
based  activities  (Greenbaum  &  Kyng,  1991;  Muller  &  Kuhn,  1993; 
Schuler  &  Namioka,  1993).  The  field  is  extraordinarily  diverse, 
drawing  on  fields  such  as  (a)  user-centered  design,  (b)  graphic 
design,  (c)  software  engineering,  (d)  architecture,  (e)  public  pol¬ 
icy,  (f)  psychology,  (g)  anthropology,  (h)  sociology,  (i)  labor  stud¬ 
ies,  (j)  communication  studies,  and  (k)  political  science.  This  di¬ 
versity  has  not  lent  itself  to  a  single  theory  or  paradigm  of  study 
or  approach  to  practice  (Slater,  1998).  Researchers  and  practi¬ 
tioners  are  brought  together — but  are  not  necessarily  brought 
into  unity — by  a  pervasive  concern  for  the  knowledges,  voices, 
and/or  rights  of  end  users,  often  within  the  context  of  software 
design  and  development,  or  of  other  institutional  settings  (e.g., 
workers in companies, corporations, universities, hospitals,
governments). Many researchers and practitioners in PD (but not
all)  are  motivated  in  part  by  a  belief  in  the  value  of  democracy 
to  civic,  educational,  and  commercial  settings — a  value  that  can 
be  seen  in  the  strengthening  of  disempowered  groups  (includ¬ 
ing  workers),  in  the  improvement  of  internal  processes,  and  in 
the  combination  of  diverse  knowledges  to  make  better  services 
and  products. 

PD  began  in  an  explicitly  political  context,  as  part  of  the  Scan¬ 
dinavian  workplace  democracy  movement  (e.g.,  Nygaard,  1975; 
Bjerknes,  Ehn,  &  Kyng,  1987;  Ehn  &  Kyng,  1987;  Floyd,  Mehl, 
Reisin,  Schmidt,  &  Wolf,  1989;  more  recently,  see  Bjerknes  & 
Bratteteig,  1995;  Beck,  1996;  Kyng  &  Matthiessen,  1997;  Aarhus 
Conference,  2005;  Winner,  1994).  Early  work  took  the  form  of 
experiments  conducted  by  university  researchers  in  alliances 
with  organized  labor  (for  historical  overviews,  see  Ehn,  1993; 
Levinger,  1998). 

Subsequent  work  focused  on  combining  complex  and  dis¬ 
tinct  knowledges  for  realistic  design  problems.  Fowles  (2000), 
for  example,  wrote  of  transforming  the  “symmetry  of  ignorance” 
(mutual  incomprehension  between  designers  and  users)  into  a 
complementary  “symmetry  of  knowledge”  through  symmetries 
of participation and symmetries of learning. Similarly, Holmström
(1995) analyzed a “gap in rationalities” among developers
and  users.  I  wrote  about  the  need  for  translations  among  the  co¬ 
equal  worlds  of  users  and  of  software  professionals,  and  the 
need  to  foster  a  polyvocal  polity  in  which  these  various  inter¬ 
ested  parties  could  coconstruct  new  concepts,  meanings,  and 
alliances  (Muller,  1997a,  1997b).  Suchman  (2002)  described  her 
historical practice of PD as “working for the presence of multiple
voices not only in knowledge production, but in the production
of technologies as knowledges objectified in a particular
way.” Bødker and Buur (2002) noted the need to support the
“many-voiced  nature  of  design.”  These  acknowledgements  of 
the  integrity  and  rationality  of  multiple  voices  and  multiple 
knowledges  (e.g.,  users  and  software  professionals)  are  a  crucial 
aspect  of  the  argument  of  this  chapter,  concerning  the  creation 
of  hybrid  spaces  between  and  among  those  diverse  perspec¬ 
tives. 

Recently, PD has achieved a status as a useful commercial
tool in some settings (e.g., McLagan & Nel, 1995), with several
major and influential consultancies forming their business identities
around participatory methods,¹ and an increasing number
of textbooks for design or IT governance based on participatory
principles (Beyer & Holtzblatt, 1998; Bødker, Kensing, & Simonsen,
2004). This overall corporate and managerial “mainstreaming”
of PD has been greeted by some with enthusiasm,
and by others with dismay. Participatory work in the United
States has sometimes been criticized as too friendly to management.
Participatory work on the Pacific Rim (e.g., Noro & Imada,
1991) appears to have grown out of the quality movement, and
focuses much more on solving problems, and much less on
changing workplace power relations.

A  more  recent  trend  has  been  the  maturing  of  lifecycle  ap¬ 
proaches  to  participatory  work.  Early  and  somewhat  experi¬ 
mental  lifecycle  models  were  offered  by  Mumford  (1983)  and 
Floyd (1993), anticipated in some ways by Checkland (1981).
Two more mature approaches have been offered by Beyer and
Holtzblatt (1998) and Bødker and colleagues (2004).

¹In the interest of fairness to other consultancies, I will not provide the names of commercial ventures.

This  chapter  primarily  addresses  methods,  techniques,  and 
practices  in  participatory  design,  with  modest  anchoring  of  those 
practices  in  theory.  I  will  not  repeat  our  recent  encyclopedic  sur¬ 
vey  of  participatory  practices  (Muller,  Hallewell  Haslwanter,  & 
Dayton,  1997).  Rather,  I  will  pursue  a  trend  within  those  prac¬ 
tices  that  has  shown  the  most  growth  during  the  past  years,  and 
I  will  motivate  my  interest  in  that  trend  through  recent  advances 
in  the  theory  of  cultural  studies.  I  will  focus  on  participatory 
practices  that  fall  in  the  hybrid  realm  between  the  two  distinct 
work  domains  of  (a)  software  professionals  and  (b)  end  users. 

I  should  also  say  that  my  concern  is  for  methods  that  have 
been  shown  to  work  in  real  situations — for  example,  those  that 
address  real  problems  in  work  life,  education,  home  life,  leisure, 
and  so  forth — in  which  the  outcomes  were  of  consequence,  and 
in  which  the  participants  could  freely  choose  whether  to  be  in¬ 
volved  in  the  work.  I  have,  therefore,  omitted  many  promising 
methods  that  have  so  far  been  explored  only  as  in-laboratory  uni¬ 
versity  exercises,  apparently  as  part  of  assigned  coursework.  I 
look forward to more realistic explorations of these new methods,
and  I  hope  to  include  them  in  later  revisions  of  this  survey. 

Major  Bibliographic  Sources  for  Participatory  Design 

Theory, practice, and experience in participatory design have
been  published  in  a  series  of  conference  proceedings  and  sev¬ 
eral  major  books. 

Conference  Series 

Five  important  conference  series  have  made  major  contri¬ 
butions  to  PD: 

1.  Critical  Computing.  Four  conferences  have  been  held,  at 
10-year  intervals,  in  the  Critical  Computing  series,  most  re¬ 
cently  in  2005  (Aarhus  Conference,  2005).  Major  papers  from 
the  conferences  have  appeared  as  two  influential  books 
(Bjerknes  et  al.,  1987;  Kyng  &  Matthiessen,  1997). 

2. IRIS Conference (Information Systems Research in
Scandinavia). The annual IRIS conference series often includes
sessions and individual contributions on
participatory topics. Proceedings may be available through
the IRIS Association, or online.²

3. Participatory Design Conference. The Participatory
Design Conference has met in even-numbered years since
1990. Proceedings are published by Computer Professionals
for Social Responsibility (CPSR).³ Selected papers from several
conferences have appeared in edited volumes or special
journal issues (e.g., Kensing & Blomberg, 1998; Muller &
Kuhn, 1993; Schuler & Namioka, 1993). Papers from recent
conference years are available through the ACM Digital
Library.⁴

4. IFIP Conferences. A number of conferences and workshops
(sponsored by IFIP Technical Committee [TC] 9) have
focused on selected topics within participatory design (e.g.,
Briefs, Ciborra, & Schneider [1983]; Clement, Kolm, & Wagner
[1994]; Docherty, Fuchs-Kittowski, Kolm, & Matthiessen
[1987]; Gartner & Wagner [1995]; and van den Besselaar,
Clement, & Jaervinen [1991]).⁵

5. Nordic Conferences on Human-Computer Interaction.
The NORDCHI conference series (sometimes also called
NORDICHI) meets in even-numbered years, with a strong
emphasis on participatory work within a broader Scandinavian
context (Nordichi, 2006). Papers from 2002 and 2004 are
available through the ACM Digital Library.

Major papers, panels, and tutorials on participatory design
have also appeared in the CHI, CSCW, ECSCW, and DIS conference
series, beginning as early as 1988 (proceedings available
through the Association for Computing Machinery⁶), and in
proceedings of the Usability Professionals’ Association⁷ conference
series, of the INTERACT conference series, and of the Human
Factors and Ergonomics Society conference series. Several
papers at the Co-Designing 2000 Conference⁸ addressed participatory
themes (Scrivener, Ball, & Woodcock, 2000).

Books

In  addition  to  the  books  cited  above,  major  collections  of 
papers  and/or  chapters  related  to  participatory  design  ap¬ 
peared  in  Carroll’s  (1995)  volume  on  scenarios  in  user  inter¬ 
action  (see  also  Carroll,  2000),  Greenbaum  and  Kyng’s  (1991) 
Design at Work, and Wixon and Ramey’s (1996) collection of
papers on field-oriented methods. Individual books that
have been influential in the field include Bødker’s (1990) application
of activity theory to issues of participation, Ehn’s
(1988) account of work-oriented design, Suchman’s (1987) discussion
of situated action, and Beyer and Holtzblatt’s (1998)
presentation of contextual inquiry and contextual design (see
also Holtzblatt, chapter 49, this volume). A recent volume by
Bødker and colleagues (2004) may broaden the impact of PD
among  information  technology  departments.  Earlier  influen¬ 
tial  works  include  a  series  of  books  on  socio-technical  theory 
and  practice  by  Mumford  (e.g.,  1983;  Mumford  &  Henshall, 
1979/1983),  as  well  as  Checkland’s  (1981)  soft  systems  method¬ 
ology.  Noro  and  Imada  (1991)  developed  a  hybrid  ergonomic 


²http://iris.informatik.gu.se/

³www.cpsr.org

⁴http://portal.acm.org/dl.cfm

⁵http://www.ifip.or.at/. For TC 9, see http://www.ifip.or.at/bulletin/bulltcs/memtc09.htm.

⁶www.acm.org

⁷www.upassoc.org

⁸http://vide.coventry.ac.uk/codesigning/
approach, involving participation and quality programs, which
has been influential around the Pacific Rim. For a historical PD
bibliography, see the CPSR website.

Journals

Three journals have carried the greatest number of PD
papers: (a) Scandinavian Journal of Information Systems,⁹
(b) Computer Supported Cooperative Work: The Journal of Collaborative
Computing,¹⁰ and (c) Human Computer Interaction.¹¹

Websites

Computer  Professionals  for  Social  Responsibility  maintains 
a  set  of  PD  resources  at  http://www.cpsr.org/prevsite/program/ 
workplace/PD.html#resources. 


HYBRIDITY  AND  THE  THIRD  SPACE 


This  chapter  is  concerned  with  participatory  methods  that  oc¬ 
cur  in  the  hybrid  space  between  software  professionals  and  end 
users.  Why  is  this  hybrid  space  important? 

Bhabha  (1994)  made  an  influential  argument  that  the  border 
or  boundary  region  between  two  domains,  or  two  spaces,  is  of¬ 
ten  a  region  of  overlap  or  hybridity — for  instance,  a  “third  space” 
that  contains  an  unpredictable  and  changing  combination  of  at¬ 
tributes  of  each  of  the  two  bordering  spaces.  His  area  of  con¬ 
cern  was  colonization,  in  which  some  native  people  find  them¬ 
selves  caught  in  between  their  own  traditional  culture  and  the 
newly  imposed  culture  of  the  colonizers  (see  also  Dingawaney  & 
Maier,  1994;  Karttunen,  1994).  Their  continual  negotiation  and 
creation  of  their  identities,  as  efforts  of  survival,  creates  a  new 
hybrid  or  third  culture  (Bhabha,  1994;  see  also  Lyotard,  1984) 
and  even  a  third  language  (Anzaldua,  1999;  Bachmann-Medick, 
1996).  In  such  a  hybrid  space,  enhanced  knowledge  exchange  is 
possible,  precisely  because  of  those  questions,  challenges,  rein¬ 
terpretations,  and  renegotiations  (Bachmann-Medick,  1996). 
These  dialogues  across  differences — and,  more  importantly, 
within  differences — are  stronger  when  engaged  in  by  groups, 
emphasizing  not  only  a  shift  from  assumptions  to  reflections,  but 
also  from  individuals  to  collectives  (Carrillo,  2000). 

Bhabha’s  (1994)  conception  has  become  highly  influential. 
Bachmann-Medick  (1996)  applied  the  concepts  to  translation 
theory.  Grenfell  (1998)  interpreted  concepts  of  hybridity  in  a 
study  of  living-at-the-border  in  multicultural  education  settings. 
Evanoff  (2000)  surveyed  a  number  of  theoretical  applications 
of  hybridity,  from  evolutionary  biology  to  constructivist  per¬ 
spectives  in  sociology  to  democratic  responses  to  intercultural 
ethical  disagreements.  He  explored  formulations  from  multiple 
disciplines,  involving  “third  culture”  in  intercultural  ethics, 
“third perspective” involving “dynamic inbetweenness” in Asian-Western
exchanges, and a psychological “third area” in the development of a
“multicultural personality.”

A summary of the claims relating to third spaces (or hybridity)
appears in Table 54.1.

TABLE 54.1. Summary of Claims Relating to Third Spaces

Overlap between two (or more) different regions or fields (inbetweenness)
Marginal to reference fields
Novel to reference fields
Not “owned” by any reference field
Partaking of selected attributes of reference fields
Potential site of conflicts between/among reference fields
Questioning and challenging of assumptions
Mutual learning
Synthesis of new ideas
Negotiation and (co-)creation of ...
    Identities
    Working language
    Working assumptions and dynamics
    Understandings
    Relationships
    Collective actions
Dialogues across and within differences (disciplines)
Polyvocality
What is considered to be data?
What are the rules of evidence?
How are conclusions drawn?
Reduced emphasis on authority; increased emphasis on interpretation
Reduced emphasis on individualism; increased emphasis on collectivism
Heterogeneity as the norm

Hybridity and HCI

Within  HCI,  Suchman  (2002)  recently  renewed  her  call  for  dia¬ 
logue  across  boundaries  between  the  partial  perspectives  of  end 
users  and  developers.  Suchman  argued  for  boundary-crossing 
and  mutual  learning  between  these  different  standpoints,  and 
appealed  in  part  to  recent  developments  in  feminist  episte¬ 
mologies  which  argue  that  objectivity  is  the  constructive  out¬ 
come  of  an  on-going  dialogue  among  multiple  perspectives. 

The  approach  in  this  chapter  begins  with  a  similar  recogni¬ 
tion  of  diverse  perspectives.  Unlike  Suchman’s  (2002)  emphasis 
on  the  boundary  between  these  perspectives,  however,  this 
chapter  is  concerned  with  creating  regions  of  overlap  where  the 
perspectives  can  come  into  mutual  knowledge  and,  potentially, 
alliance — with  the  creation  of  the  hybrid  spaces  in  which  objec¬ 
tivity  can  emerge  through  constructive  discussion,  dialogue,  ne¬ 
gotiation,  and  mutual  learning.  Similarly,  this  chapter  pursues  a 
different solution from the located accountability recommended
by Suchman, who sees each participant as located within a particular
perspective and interest, for example, “Organizations comprise
multiple constituencies each with their own professional identities
and views of others.” By contrast, the methods in this chapter allow
for the creation of new perspectives and new locations, and they
acknowledge the possibility that each participant can make different
choices at different moments about where to locate her or his
perspective, standpoint, and thus, accountability. In keeping with the
origins of PD in class struggle (e.g., Ehn & Kyng, 1987), Suchman
focuses on opposing interests that meet across a designated divide.
This chapter instead pursues the polyvocal polity that I proposed
(Muller, 1997a) and the need identified by Bødker and Buur (2002; see
also Buur & Bødker, 2000) to create a “meeting ground” for a
“widen[ed] ... circle of participants” that can “support the many
voices being brought forth in order to create the new, and to find
ways of supporting this multivoicedness.”

⁹http://www.cs.auc.dk/~sjis/
¹⁰http://www.wkap.nl/journalhome.htm/
¹¹http://hci-journal.com/

There have been many calls within HCI for mutual or reciprocal
learning in hybrid spaces (e.g., Bødker, Ehn, Kyng, Kammersgaard, &
Sundblad, 1987; Bødker, Knudsen, Kyng, Ehn, & Madsen, 1988; Druin,
1999; Druin et al., 2000; Ehn & Sjögren, 1991; Floyd, 1987; Kensing &
Madsen, 1991; Lanzara, 1983; Mogensen & Trigg, 1992; Muller, 1997a;
Muller, Wildman, & White, 1994; Mumford, 1983; Törpel & Poschen, 2002;
Tscheligi et al., 1995). Beeson and Miskelly (2000) appealed to the
notion of hybridity (“heterotopia”) in describing workers who, like
colonized peoples, deal
“in  a  space  which  is  not  their  own”  (p.  2),  taking  limited  and 
opportunistic  actions  to  preserve  “plurality,  dissent,  and  moral 
space”  (p.  1).  Maher,  Simoff,  &  Gabriel  (2000)  described  the  cre¬ 
ation  of  virtual  design  spaces  for  sharing  diverse  perspectives. 
Merkel and colleagues (2004) described a need for “a new set of
skills and competencies that go beyond technical design skills ...
to create conditions that encourage a collaborative design process
and active reflection ... for working with groups ... that push on
the traditional boundaries between users and designers” (pp. 7-8).
In an early formulation, Lanzara (1983) suggested that:

[A] large part of the design process, especially in large-scale projects and
organizations involving several actors, is not dedicated to analytical
work to achieve a solution but mostly to efforts at reconciling conflicting
[conceptual] frames or at translating one frame into another. Much
work of the designer is ... concerned with ... defining collectively what
is the relevant problem, how to see it.

Tscheligi  and  colleagues  (1995),  in  a  panel  on  prototyping, 
considered  that  the  “products”  of  prototyping  include  not  only 
artifacts,  but  also  understandings,  communications,  and  rela¬ 
tionships — a  theme  that  was  echoed  in  a  more  recent  panel  on 
modeling (Kaindl, Constantine, Karat, & Muller, 2001). Fanderclai
(1995, 1996) captured a strong sense of possible new dynamics
and new learnings in a hybrid online space. Finally,
Thackara  (2000)  based  part  of  his  plenary  address  at  CHI  2000 
on  the  concept  of  the  third  space,  providing  a  needed  hybrid¬ 
ity  to  HCI  studies. 


Participatory  Design  as  the  Third  Space  in  HCI 

In this chapter, I extend the HCI analyses surveyed in the preceding
paragraphs, and apply Bhabha’s (1994) perspective to the HCI problem
of methods to bridge between two spaces:
the  world  of  the  software  professionals,  and  the  world  of  the 
end  users  (see  also  Muller,  1997a,  1997b).  As  noted  by  Such- 
man  (2002),  each  world  has  its  own  knowledges  and  practices; 
each  world  has  well-defined  boundaries.  Movement  from  one 
world  to  the  other  is  known  to  be  difficult.  We  can  see  this  dif¬ 
ficulty  manifested  in  our  elaborate  methods  for  requirements 
analysis,  design,  and  evaluation — and  in  the  frequent  failures 
to  achieve  products  and  services  that  meet  users’  needs  and/or 
are  successful  in  the  marketplace. 

Traditional  scientific  practice  in  HCI  has  focused  on  instru¬ 
ments  and  interventions  that  can  aid  in  transferring  information 
between  the  users’  world  and  the  software  world.  Most  of  the 
traditional  methods  are  relatively  one-directional;  for  example, 
we  analyze  the  requirements  from  the  users,  we  deliver  a  sys¬ 
tem  to  the  users,  and  we  collect  usability  data  from  the  users. 
While  there  are  many  specific  practices  for  performing  these 
operations,  relatively  few  of  them  involve  two-way  discussions, 
and fewer still afford opportunities for the software professionals
to be surprised, that is, to learn something that we didn’t know we
needed to know.

The  PD  tradition  has,  from  the  outset,  emphasized  mutuality 
and reciprocity, often in a hybrid space that enabled new
relationships and understandings. Bødker and colleagues (1988)
made  specific  references  to  “the  mutual  validation  of  diverse 
perspectives.”  Floyd  (1987)  analyzed  software  practices  into  two 
paradigms,  which  she  termed  product-oriented  (focused  on  the 
computer  artifact  as  an  end  in  itself)  and  process-oriented  (fo¬ 
cused  on  the  human  work  process,  with  the  computer  artifact 
as  means  to  a  human  goal).  In  her  advocacy  of  balancing  these 
two  paradigms,  Floyd  noted  that  the  process-oriented  paradigm 
required  mutual  learning  among  users  and  developers  (see  also 
Segall & Snelling, 1996). Most PD theories and practices require
the combination of multiple perspectives, in part because complex
human problems require multiple disciplines (e.g., software expertise
and work-domain expertise) for good solutions (e.g., Fowles, 2000;
Holmström, 1995), and in part because the workplace democracy
tradition reminds us that all of
the  interested  parties  (in  the  States,  we  would  say  “stakehold¬ 
ers”)  should  have  a  voice  in  constructing  solutions  (e.g.,  Ehn  & 
Kyng,  1987).  In  a  related  development,  there  are  increasing  calls 
for  critical  reflection  in  design,  based  on  combining  perspec¬ 
tives  across  disciplines,  including  the  recent  Aarhus  Conference 
on  Critical  Computing  (Aarhus  Conference,  2005). 

Participatory  Design  Contains  Its  Own  Third  Space 

The  preceding  argument — that  PD  serves  as  a  kind  of  third 
space  to  HCI — might  be  interesting,  but  is  hardly  worth  a  chap¬ 
ter  in  a  handbook.  I  now  turn  to  the  question  of  hybridity  in 
methods within the field of PD itself.

In their “tools for the toolbox” approach, Kensing and Munk-Madsen
(1993) developed a taxonomy to analyze about thirty participatory
methods (see also Kensing, Simonsen, & Bødker, 1996; and, in
independent convergences on the same attribute, see Gjersvik & Hepsø,
1998; Luck, 2000; Reid & Reed, 2000). The first dimension of their
taxonomy contrasted abstract methods (suitable for a software
professional’s organization) with concrete methods (suitable for work
with end users). Muller and colleagues (Muller, White, & Wildman, 1993; Muller
et  al.,  1997)  elaborated  on  this  taxonomic  dimension  by  asking 
whose  work  domain  served  as  the  basis  for  the  method  (in  the 
States,  we  would  call  this  a  matter  of  “turf,”  as  in  “on  whose  turf 
did the work take place?”). At the abstract end of the continuum,
the users have to enter the world of the software professionals in
order to participate; for example, rapid prototyping (Grønbæk, 1989)
and quality improvement (Braa, 1996). At the concrete end of the
continuum, the software professionals have to enter the world of the
users in order to participate; for example, ethnography (Blomberg,
Giacomi, Mosher, & Swenton-Wall, 1993; Crabtree, 1998; Orr & Crowfoot,
1992; Suchman & Trigg, 1991; see also Blomberg et al., chapter • •,
this volume), on-going tailoring during usage (Henderson & Kyng, 1991;
MacLean, Carter, Lövstrand, & Moran, 1990), and end-user “design” by
purchasing software for small companies (Krabbel & Wetzel, 1998;
Robertson, 1996, 1998).

For  the  purposes  of  this  chapter,  we  can  now  ask,  “What 
about  the  practices  that  did  not  occur  at  the  abstract  or  con¬ 
crete  end-points  of  the  continuum?  What  about  the  practices 
in  between?”  These  practices  turn  out  to  occur  in  an  uncer¬ 
tain,  ambiguous,  overlapping  disciplinary  domain  that  does  not 
“belong”  to  either  the  software  professionals  or  the  end  users 
(e.g.,  these  practices  occur  in  neither  the  users’  turf  nor  the 
software  professionals’  turf).  The  practices  in  between  the  ex¬ 
tremes  are  hybrid  practices,  and  constitute  the  third  space  of 
participatory  design.  As  we  explore  hybrid  methods  that  occur 
in  this  third  space,  we  can  look  for  HCI  analogies  of  the  attrib¬ 
utes  and  advantages  that  were  listed  for  Third  Space  studies  in 
Table  54.1. 


THIRD  SPACE:  NEGOTIATION,  SHARED 
CONSTRUCTION,  AND  COLLECTIVE 
DISCOVERY  IN  PD  AND  HCI 


In  the  remaining  sections  of  the  chapter,  I  will  describe  a  diver¬ 
sity  of  participatory  design  techniques,  methods,  and  practices 
that  provide  hybrid  experiences  or  that  operate  in  intermediate, 
third spaces in HCI. Because my theme is hybridity, I have organized
these descriptions in terms of strategies and moves that
introduce  novelty,  ambiguity,  and  renewed  awareness  of  possi¬ 
bilities,  occurring  at  the  margins  of  existing  fields  or  disciplines 
(see  Table  54.1).  In  several  cases,  a  single  report  may  fall  into 
several categories. For example, Ehn and Sjögren (1991) conducted
a workshop (see “Workshops” in this chapter) in which
a  storytelling  method  (see  “Stories”  in  this  chapter)  provided  a 
space  in  which  people  negotiated  the  naming  and  defining  of 
workplace  activities  (see  “Language”  in  this  chapter).  I  hope  that 
the  strategies  and  moves  of  the  PD  practitioners  and  re¬ 
searchers  will  become  clear,  despite  the  multiple  views  onto  in¬ 
dividual  reports. 


SPACES  AND  PLACES 


Sitings 

One  of  the  simplest  parameters  that  can  be  manipulated  to 
influence  hybridity  is  the  site  of  the  work.  At  first,  this  appears 
to  be  a  simple  issue.  As  Robins  (1999)  said,  “There  are  two  ap¬ 
proaches  to  participatory  design:  (1).  Bring  the  designers  to 
the  workplace.  (2).  Bring  the  workers  to  the  design  room” 
(p.  •  •  •).  This  binary  choice  reflects  the  taxonomic  distinc¬ 
tions  that  I  previously  reviewed;  however,  even  within  the  bi¬ 
nary  choice,  the  selection  of  the  site  can  be  important.  Fowles 
(2000),  in  a  discussion  of  participatory  architectural  practice, 
provided an insight that can apply as well for HCI: “If possible[,]
design workshops should be located in the locality of
the  participating  group  and  in  the  School  of  Architecture. 
Bringing  the  public  into  the  School  helps  to  de-mystify  the 
profession,  and  taking  students  in  the  community  furthers 
their  understanding  of  the  problem  and  its  context”  (p.  65). 
Pedersen  and  Buur  (2000),  in  their  work  on  industrial  sites, 
agreed (italics in the original):

When  collaborating  with  users  in  our  design  environment  (e.g.,  a  meet¬ 
ing  space  at  the  company),  we  can  invite  a  number  of  users  from  differ¬ 
ent  plants  and  learn  from  hearing  them  exchange  work  experiences  .  .  . 
Being  in  a  foreign  environment  (and  with  other  users),  users  will  tend  to 
take  a  more  general  view  of  things. 

When  collaborating  with  users  in  their  work  context,  users  tend  to 
feel more at ease as they are on their home ground — we are the visitors.
Tools  and  environment  are  physically  present  and  easy  to  refer  to.  This 
makes  for  a  conversation  grounded  in  concrete  and  specific  work 
experiences. 

The  idea  was  born  to  create  a  type  of  design  event  with  activities  in 
both  environments  and  with  two  sets  of  resources  to  support  design 
collaboration. 

In  our  study  of  telephone  operators’  work,  we  held  our  ses¬ 
sions  at  operator  service  offices  and  in  research  offices  (Muller 
et  al.,  1995a).  The  work  site  meetings  had  the  advantages  of  easy 
access  to  equipment  on  which  we  could  demonstrate  or  exper¬ 
iment.  During  those  meetings,  we  had  a  sense  of  being  strongly 
tied  to  practice.  The  research  site  meetings  were  less  tied  to 
specific  practices,  and  had  a  tendency  to  lead  to  more  innova¬ 
tive  ideas.  Perhaps  more  subtly,  the  two  different  sites  enfran¬ 
chised  different  marginal  participants.  At  the  work  site,  it  was 
easy  to  bring  in  additional  work-domain  experts  (mostly  trainers 
and  procedures  experts)  who  became  adjunct  members  of 
the  core  analysis  team  for  the  duration  of  those  meetings,  and 
became  resources  for  the  core  team  afterwards.  At  the  research 
site,  it  was  easy  to  bring  in  more  technology  experts,  as  well  as 
the  graduate  students  who  later  performed  data  analysis.  The 
research  site  meetings  became  an  occasion  of  enfranchisement, 
contribution,  and  early  commitment  for  these  additional  actors. 
Both  core  and  adjunct  members  became  authors  of  our  report 
(Muller  et  al.,  1995a). 


¹²Their second dimension was of less interest for the purposes of this chapter.

Brandt and Grunnet (2000) also considered site selection in
their  Smart  Tool  and  Dynabook  projects,  which  were  concerned 
with  working  conditions  in  the  office  and  in  the  home,  respec¬ 
tively.  In  the  Smart  Tool  case,  they  conducted  dramatic  scenar¬ 
ios  in  the  project  designers’  environment.  In  the  Dynabook 
case,  they  asked  people  at  home  to  create  and  enact  scenarios 
in  their  own  living  areas. 

Third  space.  In  terms  of  hybridity,  the  selection  of  site 
can  be  a  deliberate  strategy  to  introduce  new  experiences  and 
perspectives  to  one  or  more  parties  in  the  design  process — a 
de-centering  move  that  can  bring  people  into  positions  of  am¬ 
biguity,  renegotiation  of  assumptions,  and  increased  exposure 
to  heterogeneity.  Returning  to  Bhabha’s  original  argument,  site 
selection  initially  appears  to  be  a  matter  of  moving  across  the 
boundary  between  different  work  cultures,  rather  than  living 
within the boundary. The use of common design practices
across  sites,  however,  makes  those  practices  (and  the  mem¬ 
bership  of  the  design  group)  into  a  kind  of  movable  third 
space.  The  practices  and  the  group  membership  become  sta¬ 
ble  features  that  persist  across  multiple  sites.  At  the  same  time, 
the  practices,  and  even  the  membership,  grow  and  evolve  with 
exposure  to  new  sites  and  new  understandings.  In  these  ways, 
the  practices  become  an  evolutionary  embodiment  of  the 
knowledge  of  the  learnings  of  the  group  (e.g.,  Floyd,  1987; 
Muller,  1997a). 

Claimed  benefits.  What  have  practitioners  gained 
through  site  selection,  within  this  deliberately  hybrid-oriented 
work  area?  Several  themes  emerge: 

•  Improved  learning  and  understanding.  Fowles  (2000) 
described  a  move  from  a  “symmetry  of  ignorance”  toward  a 
“symmetry  of  knowledge”  as  diverse  parties  educated  one  an¬ 
other  through  a  “symmetry  of  learning” — and  even  a  kind  of 
“transformation”  through  exposure  to  new  ideas  (see  also 
Carmien  et  al.,  2003).  Brandt  and  Grunnet  (2000),  Pedersen 
and  Buur  (2000),  and  Muller  and  colleagues  (1995b)  also 
claimed  that  the  selection  of  site  led  to  the  strengthening  of 
the  voices  that  were  comfortable  at  each  site. 

•  Greater ownership. Pedersen and Buur (2000) noted that
their procedures strengthened user involvement in their project.
Fowles (2000) and Muller and colleagues (1995b; see
also  Muller  et  al.,  1994)  make  specific  reference  to  increases 
in  commitment  and  ownership  of  the  evolving  knowledge 
and  design  of  the  group. 

Workshops

Workshops  may  serve  as  another  alternative  to  the  two  “stan¬ 
dard”  sites  that  most  of  us  think  about.  In  PD,  workshops  are 
usually  held  to  help  diverse  parties  (“interested  parties”  or 
“stakeholders”)  communicate  and  commit  to  shared  goals, 
strategies,  and  outcomes  (e.g.,  analyses,  designs,  and  evalua¬ 
tions,  as  well  as  workplace-change  objectives).  Workshops  are 
often  held  at  sites  that  are  in  a  sense  neutral — they  are  not  part 
of  the  software  professionals’  workplace,  and  they  are  not  part 
of  the  workers’  workplace. 


More  importantly,  workshops  usually  introduce  novel  proce¬ 
dures  that  are  not  part  of  conventional  working  practices.  These 
novel  procedures  take  people  outside  of  their  familiar  knowl¬ 
edges  and  activities,  and  must  be  negotiated  and  collectively  de¬ 
fined  by  the  participants.  Workshops  are  thus  a  kind  of  hybrid  or 
third  space,  in  which  diverse  parties  communicate  in  a  mutual¬ 
ity  of  unfamiliarity  and  must  create  shared  knowledges  and 
even  the  procedures  for  developing  those  shared  knowledges. 

The best-known workshop format in PD is the Future Workshop
(e.g., Kensing & Madsen, 1991; see also Bødker et al., 2004;
McPhail, Costantino, Bruckmann, Barclay, & Clement, 1998; Mørch,
Engen, & Åsand, 2004). Based in German civic planning
(Jungk & Müllert, 1987), a Future Workshop proceeds
through  three  stages:  (a)  Critiquing  the  present,  (b)  Envision¬ 
ing  the  future,  and  (c)  Implementing,  or  moving  from  the  pre¬ 
sent  to  the  future.  These  three  activities  involve  participants  in 
new  perspectives  on  their  work,  and  help  to  develop  new  con¬ 
cepts  and  new  initiatives. 

A number of workshops have focused on simple materials
and informal diagrams, rather than on formal notations. Bødker
and colleagues (2004) noted that, “The tools are simple diagrams
or drawings with no special formalisms ... because staff
members participating in the workshops, as well as those to
whom the results are later presented, typically have no experience
with technical descriptions using [Information Technology]-originated
formalisms” (p. 252).

Sanders  (2000)  described  a  family  of  “generative  tools,”  ac¬ 
tivities  that  are  selectively  combined  into  Strategic  Design  Work¬ 
shops,  under  an  overall  conceptual  strategy  that  combines  mar¬ 
ket  research  (“what  people  say”),  ethnography  (“what  people 
do”),  and  participatory  design  (“what  people  make”).  Activities 
include  the  construction  of  collages  focused  on  thinking  (e.g., 
“how  do  you  expect  your  work  to  change  in  the  future?”),  map¬ 
ping  (e.g.,  laying  out  an  envisioned  work  area  on  paper),  feeling 
(“use  pictures  and  words  to  show  a  health-related  experience  in 
your  past”),  and  story  telling  (see  “Stories”  and  “Making  De¬ 
scriptive  Artifacts,”  following).  Dandavate,  Steiner,  and  William 
(2000)  provided  a  case  study  of  Sanders’  method. 

In  a  different  setting,  Buur,  Binder,  &  Brandt  (2000)  devel¬ 
oped  a  workshop  in  which  workers  carried  a  mock-up  of  a  pro¬ 
posed  new  device  (see  the  following  section,  “Making  Non- 
Functional  Artifacts”)  through  an  industrial  plant,  recording 
how  it  would  be  used.  They  then  acted  out  a  five-minute  video 
scenario  (see  “Dramas,”  following),  which  they  subsequently 
presented to other, similar worker teams in a workshop. Hultcrantz
and Ibrahim (2002) used a similar method to concretize
workshops  similar  to  focus  groups  that  were  held  with  family 
members  in  their  own  homes.  Pedell  (2004)  described  a  lower- 
tech  storyboarding  workshop  format  in  which  people  created 
narratives  using  photographs,  putting  them  in  sequences  and  in 
many cases altering them (typically through the addition of speech
bubbles  to  show  what  people  were  thinking  or  doing).  Monk 
and  Howard  (1998)  used  a  similar  method,  with  less  emphasis 
on  photographs,  to  develop  a  “rich  picture”  of  a  work  domain. 

Cameron  (1998),  too,  faced  a  different  setting  and  problem, 
and  chose  a  workshop  solution.  This  project  dealt  with  safety  is¬ 
sues  in  urban  design  in  Baltimore  and — like  the  METRAC  pro¬ 
gram  in  Toronto  (Nisonen,  1994) — invited  community  members 
to contribute their domain expertise as people who lived with
safety issues on an everyday basis. Cameron provided a manual,
based on a professionally developed set of safety guidelines.
Community  members  became  community  organizers,  bringing 
the  project  topic  and  the  proposed  guidelines  to  their  own  con¬ 
stituencies.  Two  additional  workshops  refined  the  safety  audit 
information  from  the  constituencies,  selected  priority  issues  to 
fix,  and  adopted  an  action  plan.  Cameron  observed  that: 

One  of  the  successful  aspects  of  the  Design  for  Safety  workshop  is  that 
it  provided  a  forum  for  a  diverse  group  of  people  to  productively  discuss 
common  problems  and  work  through  shared  solutions  and  consensus. 
The  workshops  also  showed  that  crime  and  safety  were  not  solely  the 
responsibility  of  the  police,  but  that  public  works  employees,  traffic  en¬ 
gineers,  and  especially  residents  must  work  together  to  envision  as  well 
as  carry  out  the  plan  .  . .  Requiring  that  residents  share  the  workshop  in¬ 
formation  at  community  association  meetings  further  assisted  the  trans¬ 
fer  of  responsibility  from  the  workshop  into  the  neighborhood. 

Several  other  groups  have  developed  repertoires  of  multi¬ 
ple  workshops,  from  which  they  can  select  the  type  of  work¬ 
shop  that  is  needed  for  a  particular  situation,  site,  or  problem. 
Svanæs and Seland (2004) described six workshops; I list four
formats  that  they  considered  successful  here: 

1.  Workshop  1.  Theatre,  modeling  clay,  “design  by  accident,” 
and  improvisation  with  teenagers  to  explore  “our  mobile 
future” 

2.  Workshop  2.  Theatre,  brainstorming,  and  improvisation 
with  a  much  more  structured  set  of  props  (no  modeling 
clay)  for  a  different  telecommunications  project 

3.  Workshops  4  and  5.  Theatre  with  audience-critique  of  per¬ 
formance  (similar  to  Boal’s  Theatre  of  the  Oppressed,  pre¬ 
viously  described),  sometimes  using  structured  props  as  well 
as  “designing  on  the  spot”  for  new  concepts,  for  a  hospital 
communication  project 

4.  Workshop  6.  Videotaped  field  data  as  a  point  of  common 
reference,  before  theatrical  work  similar  to  workshops  4  and  5. 

Finally, Bødker and colleagues (2004) described a repertoire
of workshops. One subset of workshops was differentiated
largely in terms of the artifact that was cocreated by the
participants, such as freehand drawing (see also Monk & Howard,
1998), collages (see also Pedell, 2004; Sanders, 2000), affinity
diagrams (see also Beyer & Holtzblatt, 1998), and timelines.
Dray (1992) also used a free-hand drawing technique, but in a
round-robin brainstorming BrainDraw format in which n participants
collaboratively drew n drawings, rotating the drawings
throughout the group so that each drawing contained ideas created
by each of the members of the group.

Less familiar artifacts were also used to define and differentiate
workshops in the Bødker and colleagues (2004) survey.
Dead  Sea  scrolls  are  textual  descriptions  of  the  history  of  a  busi¬ 
ness  process.  Roll  lists  are  brief  textual  descriptions  of  all  of  the 
interested  parties  related  to  a  business  activity  or  a  technology 
artifact.  Mapping  (also  called  “mind  mapping” — see  e.g.,  T.  Buzan 
& B. Buzan, 1996, for nonworkshop use of this technique) is the
description of a problem area, business process, function, or
other  matter  of  interest  in  terms  of  a  number  of  briefly  stated 
concepts,  connected  by  lines  or  arcs.  A  special  version  of  map¬ 
ping  constructs  a  “communication  map”  among  persons  or 
roles.  Finally,  Prompted  Reflections  can  be  used  similarly  to 
Dray’s (1992) BrainDraw technique, to bring people with different
design concepts into communication with one another.

Third  space.  The  various  workshop  approaches  have  sev¬ 
eral  commonalities.  Each  workshop  brings  together  diverse  par¬ 
ticipants  to  do  common  work,  to  produce  common  outcomes 
(especially Bødker et al., 2004), and to develop a plan of joint
action (especially Kensing & Madsen, 1991; Bødker et al., 2004;
McPhail et al., 1998; Mørch et al., 2004). They are thus
opportunities that require mutual education, negotiation, creation of
understanding,  and  development  of  shared  commitments.  Each 
workshop  takes  place  in  an  atmosphere  and  (often)  in  a  site  that 
is  not  “native”  to  any  of  the  participants.  Thus,  all  of  the  partici¬ 
pants  are  at  a  disadvantage  of  being  outside  of  their  own  famil¬ 
iar  settings,  and  they  must  work  together  to  define  their  new  cir¬ 
cumstances  and  relationships.  The  combination  of  diverse 
voices  leads  to  syntheses  of  perspectives  and  knowledges. 

Claimed  benefits.  Advantages  claimed  for  these  experi¬ 
ences  in  hybridity  include: 

•  Development  of  new  concepts  that  have  direct,  practical 
value  for  product  design  (Dandavate,  Steiner,  &  William, 
2000;  Kensing  &  Madsen,  1991;  Sanders,  2000)  or  for  com¬ 
munity  action  (Cameron,  1998). 

•  Engagement  of  the  interested  parties  (“stakeholders”)  in 
the  process  and  outcome  of  the  workshop. 

•  Combinations  of  different  people’s  ideas  into  unified 
concepts. 

•  Production  of  artifacts  that  are  the  expected  and  useful 
“inputs”  to  the  next  stage  of  the  development  process  (Bødker
et  al.,  2004;  Svanaes  &  Seland,  2004).


NARRATIVE  STRUCTURES 


Stories 

Stories  and  story  telling  have  played  a  major  role  in  ethnographic 
work  since  before  there  was  a  field  called  “HCI”  (for  review,  see 
Crabtree,  1998;  Suchman  &  Trigg,  1991;  see  also  Blomberg  & 
Burrell,  chapter  50,  this  volume).  Stories  have  also  had  an  im¬ 
portant  history  in  HCI  (see  Carroll,  1995;  Erickson,  1996;  Muller, 
1999a).  I  will  not  attempt  to  review  these  areas.  Rather,  I  will  fo¬ 
cus  on  those  aspects  of  story  collecting  and  story  telling  that  in¬ 
volve  the  construction  of  third  spaces  and  hybridity. 

Stories  in  participatory  work  may  function  in  at  least  three 
ways.^^  First,  they  may  be  used  as  triggers  for  conversation, 
analysis,  or  feedback  (Salvador  &  Howells,  1998;  Salvador  & 


^^For  a  survey  of  story  genres  that  may  be  used  in  participatory  work,  see  Karasti,  Baker,  &  Bowker  (2002). 
Sato,  1998,  1999).  Second,  they  may  be  told  by  end  users  as  part
of  their  contribution  to  the  knowledges  required  for  under¬ 
standing  product  or  service  opportunities  and  for  specifying 
what  products  or  services  should  do  (Brandt  &  Grunnet,  2000; 
Lafreniere,  1996;  Muller,  2001;  Muller  et  al.,  1995b;  Noble  & 
Robinson,  2000;  Patton,  2000;  Sanders,  2000;  Tschudy  et  al., 
1996).  Third,  they  may  be  used  by  design  teams  to  present  their
concept  of  what  a  designed  service  or  product  will  do,  how  it 
will  be  used,  and  what  changes  will  occur  as  a  result  (Druin, 
1999;  Druin  et  al.,  2000;  Ehn  &  Kyng,  1991;  Ehn  &  Sjogren,  1986, 
1991;  Gruen,  2001;  Muller  et  al.,  1994;  Sanders,  2000). 

Beeson  and  Miskelly  (1998,  2000)  used  hypermedia  tech¬ 
nologies  to  enable  communities  to  tell  their  own  stories,  with 
the  intention  that  “plurality,  dissent,  and  moral  space  can  be 
preserved”  (Beeson  &  Miskelly,  2000,  p.  1).  They  were  con¬ 
cerned  to  allow  multiple  authors  to  reuse  community  materials 
selectively,  telling  different  stories  within  a  common  context. 
The  different  accounts  were  organized  according  to  themes, 
and  laid  out  spatially  on  the  image  of  a  fictitious  island  for  navi¬ 
gation  by  end  users. 

Their  work  entered  several  areas  or  aspects  of  hybridity.  First, 
the  authors  of  the  stories  (e.g.,  community  members)  were  us¬ 
ing  hypermedia  technology  for  the  first  time,  and  were  thus  in 
the  role  of  learners,  even  while  they  were  the  owners  of  the 
stories,  and  were  thus  in  the  role  of  experts.  Second,  the  au¬ 
thors  wrote  from  their  own  perspectives,  which  were  some¬ 
times  in  strong  conflict  with  one  another.  Third,  the  authors 
could  make  use  of  one  another’s  materials,  effectively  moving 
away  from  single-author  narratives  and  into  a  kind  of  collabora¬ 
tive  collage  of  materials,  which  conveyed  interlinked  stories. 
Fourth,  just  as  the  community  members  were  negotiating  and 
defining  their  roles  as  learner-experts,  the  software  professionals/ 
researchers  were  negotiating  and  defining  their  roles  as  experts- 
facilitators-students.  Torpel  and  Poschen  (2002)  described  a  re¬ 
lated  method  of  Narrative  Transformation,  emphasizing  work¬ 
ers’  roles  as  story  creators,  story  analysts,  and  originators  of  new 
concepts  that  could  be  pursued  through  other  methods  in  this 
chapter  (e.g.,  low-tech  prototyping). 

A  second  line  of  practice  and  research  has  emphasized  end 
users  telling  their  stories  using  a  system  of  paper-and-pencil, 
card-like  templates.  The  earliest  version  was  the  Collaborative 
Analysis  of  Requirements  and  Design  (CARD)  technique  of  Tudor, 
Muller,  Dayton,  and  Root  (1993),  later  developed  into  a  more 
general  tool  in  Muller  and  colleagues  (1995b)  and  further  re¬ 
fined  in  Muller  (2001).  Lafreniere  (1996)  developed  a  related 
practice,  Collaborative  Users’  Task  Analysis  (CUTA),  repairing
some  of  the  deficits  of  CARD  for  his  settings.  Tschudy  et  al. 
(1996)  developed  their  own  highly  visual  version,  PictureCARD, 
for  a  setting  in  which  they  had  no  language  in  common  with 
the  users  whose  stories  they  wished  to  understand. 

The  card-based  practices  used  pieces  of  cardboard  about 
the  size  of  playing  cards.  Each  card  represented  a  component 
of  the  user’s  work  or  life  activities,  including  user  interface 
events  (e.g.,  screen  shots),  social  events  (conversations,  meet¬ 
ings)  and  cognitive,  motivational,  and  affective  events  (e.g.,  the 
application  of  skill,  the  formation  of  goals  or  strategies,  sur¬ 
prises  and  breakdowns,  evaluations  of  work  practices).  The 
cards  were  used  by  diverse  teams  in  analysis,  design,  and  evaluation
of  work  and  technology.  Because  the  cards  were  novel
objects  to  all  the  participants,  they  occasioned  third-space  questionings
and  negotiations,  resulting  in  new  shared  understandings
and  co-constructions.  Often,  teams  used  the  cards  to  prepare
a  kind  of  story  board,  narrating  the  flow  of  work  and
technology  use  and  annotating  or  innovating  cards  to  describe 
that  work.  The  resulting  posters  formed  narratives  of  the  work 
that  were  demonstrated  to  be  understandable  to  end  users, 
corporate  officers,  and  software  professionals,  and  which  led  to 
insights  and  decisions  of  large  commercial  value  (see  Sanders, 
2000,  for  a  differently  constructed  example  of  storyboard 
posters  to  describe  work).

Druin  (1999;  Druin  et  al.,  2000)  pursued  a  third  line  of  story¬ 
telling  research  and  practice,  with  children  as  design  partners  in 
a  team  that  also  included  computer  scientists,  graphic  designers, 
and  psychologists  (for  other  participatory  work  with  children, 
see  Sanders,  2000;  Sanders  &  Nutter,  1994).  Their  purpose  was 
to  envision  new  technologies  and  practices  in  children’s  use  of 
computers  and  related  devices.  They  used  both  online  story¬ 
boarding  techniques  and  the  construction  of  prototypes  of 
spaces  in  which  the  jointly  authored  stories  could  be  performed. 
This  work  kept  everyone  learning  from  everyone  else — children 
learning  about  technologies  and  the  storyboarding  environment, 
adults  learning  about  children’s  views  and  other  adults’  exper¬ 
tises,  and  everyone  negotiating  the  meaning  of  new  technologi¬ 
cal  and  narrative  ideas,  as  well  as  their  implementations. 

So  far,  this  section  has  addressed  primarily  the  acquisition 
of  stories;  however,  stories  are  also  for  telling  to  others.  Sanders 
(2000)  described  the  construction  of  storyboards  based  on 
users’  experiences.  Gruen  (2000,  2001)  described  guidelines 
and  practices  through  which  a  diverse  team  could  begin  with  a 
concept,  and  then  could  craft  a  convincing  and  engaging  story 
around  it.  Sanders’  and  Gruen’s  procedures  led  to  hybrid  expe¬ 
riences,  in  the  sense  that  few  software  professionals  or  end 
users  think  in  terms  of  story  construction  or  rubrics  for  effective 
fictions.  Irestig  and  Timpka  (2002)  described  a  method  for  shar¬ 
ing  stories  from  small  working  groups  with  a  larger  audience  of 
decision  makers. 

Third  space.  Story  collecting  and  story  telling  generally  re¬ 
quire  a  kind  of  third  space  in  which  to  occur.  Beeson  and 
Miskelly  (1998,  2000)  were  specifically  concerned  to  create  a 
new  space  for  story  writing  and  story  reading,  and  to  maintain 
some  of  the  most  important  aspects  of  third  spaces  in  that  new 
space — e.g.,  preservation  and  expression  of  new  meanings,  re¬ 
lationships,  conflicts,  multiple  perspectives,  and  “heterotopia.” 
The  three  card-based  practices  used  unfamiliar  media  (the
cards),  and  made  those  media  central  to  the  team’s  activities, 
thus  requiring  conscious  attention  to  shared  conceptualizing 
and  defining  of  those  media,  as  well  as  the  creation  of  new  me¬ 
dia  when  needed.  Druin  and  colleagues  (2000)  created  new 
software  environments  and  new  devices  to  craft  and  imple¬ 
ment  stories  of  futuristic  technologies.  Finally,  Gruen  (2000, 
2001)  engaged  diverse  teams  in  new  roles  as  story  writers, 
guided  by  expert-derived  guidelines,  in  the  writing  of  profes¬ 
sionally  structured  and  professionally  paced  stories  for  organi¬ 
zational  or  commercial  use. 

Claimed  benefits.  The  story-collecting  and  storytelling 
practices  are  diverse,  and  serve  multiple  purposes.  A  brief  summary
of  the  claims  of  their  value  to  projects  and  products  is
as  follows: 

•  Articulation  and  preservation  of  a  diverse  community’s 
views  (Beeson  &  Miskelly,  1998,  2000). 

•  Practical  application  to  work  analysis,  task  analysis,  new 
technology  innovation,  and  usability  evaluation  in  commer¬ 
cially  important  products  and  services  (Gruen,  2000,  2001; 
Lafreniere,  1996;  Muller,  2001;  Muller  et  al.,  1995b;  Sanders, 
2000;  Tudor  et  al.,  1993;  Tschudy  et  al.,  1996). 

•  Co-creation  of  new  ideas  and  children’s  articulation  and 
self-advocacy  (Druin,  1999;  Druin  et  al.,  2000).

Photographs 

Stories  can  be  told  in  many  ways.  One  approach  that  has 
informed  recent  PD  work  is  end-user  photography.  Patton 
(2000)  noted  that  both  (a)  taking  pictures  and  (b)  organizing 
pictures  into  albums  are,  of  course,  familiar  activities  to  most 
people  in  affluent  countries.  These  activities  allow  end  users  to 
enter  into  a  kind  of  native  ethnography,  documenting  their 
own  lives.  In  keeping  with  the  issues  raised  in  the  preceding 
“Stories”  section,  it  is  important  that  the  informants  themselves 
(the  end  users)  control  both  the  camera  and  the  selection  of 
images  (see  Bolton,  1989,  for  a  set  of  discussions  of  the  uses 
and  abuses  of  documentary  photography).  They  thus  become 
both  authors  and  subjects  of  photographic  accounts  of  their 
activities.  This  dual  role  leads  to  one  kind  of  hybridity,  in  which 
the  photographic  activities  partake  of  both  the  world  of  com¬ 
mon  social  life,  and  the  world  of  documenting  and  reporting 
on  working  conditions. 

In  an  exploration  of  products  for  mobile  knowledge  workers, 
Dandavate  and  colleagues  (2000)  similarly  asked  their  informants
to  take  pictures  as  part  of  a  documentation  of  their  working
lives.  In  their  study,  informants  were  also  invited  to  construct
collages  of  their  working  lives,  selectively  reusing  the
photographs  (among  other  graphical  items)  in  those  collages. 
The  collages  were,  in  effect,  one  type  of  interpretation  by  the 
photographers  of  their  own  photographs.  Similarly  to  Patton’s 
work,  Dandavate  and  colleagues  asked  their  informants  to  go 
out  of  their  conventional  professional  roles  as  office  workers 
(but  well  within  their  roles  as  members  of  an  affluent  culture)  in 
the  activity  of  taking  the  photographs.  Dandavate  and  col¬ 
leagues  then  asked  their  informants  to  go  even  further  out  of 
role,  through  the  construction  of  the  collages  based  on  their 
photographs  and  the  interpretation  of  the  collages.  The  activi¬ 
ties  were  thus  marginal,  partaking  of  attributes  of  informal  life 
and  professional  life,  of  familiar  and  unfamiliar  activities.  They 
concluded  that  the  photographic  work  led  to  new  learnings  and 
understandings  that  had  not  been  accessible  through  observa¬ 
tional  studies,  as  well  as  a  stronger  sense  of  ownership  by  their 
informants  in  the  outcome  of  the  study. 


Noble  and  Robinson  (2000)  formed  an  alliance  between  an 
undergraduate  design  class  at  Massey  University  and  a  union  of 
low-status  service  workers,  developing  photodocumentaries  of 
service  work.  The  photographs  served  as  a  kind  of  hybrid 
boundary  object  (Star  &  Griesemer,  1989);  for  the  students,  the 
photographs  were  composed  artifacts  of  design,  while  for  the 
union  members,  the  photographs  were  common  and  casually 
produced  snapshots.  Discussions  between  union  members  and 
students  were  rich,  conflicted,  and  productive,  as  they  negoti¬ 
ated  the  status  and  meaning  of  these  hybrid  objects.  These  dis¬ 
cussions — and  the  exhibits  and  posters  that  they  produced  (e.g., 
the  collective  actions  of  the  students  and  the  union  members) — 
could  not  have  been  successful  without  mutual  learning  and 
construction  of  new  understandings.  Photodocumentaries  were 
used  by  Kwok  (2004)  as  a  means  of  providing  familiar,  concrete 
artifacts  to  enable  design  collaborations.  Mattelmäki  and  Battarbee
(2002;  see  also  Hulkko,  Mattelmäki,  Virtanen,  &  Keinonen,
2004)  used  photodocumentaries  as  one  component  of  a  set
of  user-composed  diary  techniques,  with  subsequent  user-created
collages  to  serve  as  a  rich  source  of  discussions.^"*

Third  space.  End-user  photography  is  an  interesting  case 
of  hybridity  and  the  production  of  third  spaces.  Photography  is 
a  good  example  of  an  “in-between”  medium — one  that  is  part  of 
many  people’s  informal  lives  (Dandavate  et  al.,  2000;  Noble  & 
Robinson,  2000;  Patton,  2000),  but  that  is  also  an  intensively 
studied  medium  of  communication  and  argumentation  (Bolton, 
1989;  Noble  &  Robinson,  2000).  Photography  occurs  at  the  mar¬ 
gin  of  most  people’s  work,  and  yet  can  easily  be  incorporated 
into  their  work. 

The  resulting  photographs  in  these  projects  have  attributes 
of  their  dual  worlds — they  are  partially  informal  and  quotidian, 
and  partially  formal  and  documentary.  Discussions  around  the 
photographs,  and  combination  of  the  photographs  into  photo¬ 
narratives  (Kwok,  2004;  Patton,  2000)  or  collages  (Dandavate 
et  al.,  2000;  Hulkko  et  al.,  2004;  Mattelmäki  &  Battarbee,  2002)
can  lead  to  mutual  learning  and  new  ideas,  particularly  through 
the  inclusion  of  the  voices  of  the  photographers,  the  viewers, 
and  especially  the  people  depicted  in  the  photographs  (Noble  & 
Robinson,  2000;  see  also  discussion  of  Isomursu,  Kuutti,  &
Väinämö,  2004,  following).

Claimed  benefits.  The  use  of  end-user  photographs  ap¬ 
pears  to  be  new  and  experimental,  and  there  are  few  strongly 
supported  claims  of  benefits.  Informal  claims  of  success  and 
contribution  include  the  following: 

•  Richer,  contextualized  communication  medium  between
end  users  and  designers.  (In  some  cases,  the  designers
were  not,  themselves,  software  professionals.)

•  Stronger  engagement  of  designers  with  end-users’  worlds.

•  Enhanced  sharing  of  views  and  needs  among  end  users,
leading  to  stronger  articulation  by  them  as  a  collective  voice.


^“*It  is  noteworthy  that,  in  the  studies  reviewed  here,  the  informants  made  their  own  decisions  about  what  was  important,  and  therefore  what  they
should  photograph.  For  a  discussion  of  issues  in  more  conventional,  researcher-directed  photographic  diary  studies,  see  Carter  and  Mankoff,  2005. 

Dramas  and  Videos

Drama  provides  another  way  to  tell  stories — in  the  form  of 
theatre  or  of  video.  One  of  the  important  tensions  with  regard 
to  drama  in  PD  is  the  question  of  whether  the  drama  is  considered
a  finished  piece,  or  a  changeable  work-in-progress.

Many  PD  drama-practitioners  make  reference  to  Boal’s  The¬ 
atre  of  the  Oppressed  (Boal,  1974/1992).  Boal  described  the¬ 
atrical  techniques  whose  purpose  was  explicitly  to  help  a  group 
or  a  community  find  its  voice(s)  and  articulate  its  position(s). 
The  most  influential  of  Boal’s  ideas  was  his  Forum  Theatre,  in 
which  a  group  of  nonprofessional  actors  performs  a  skit  in  front 
of  an  audience  of  interested  parties.  The  outcome  of  the  skit  is 
consistent  with  current  events  and  trends — often  to  the  dissat¬ 
isfaction  of  the  audience.  The  audience  is  then  invited  to  be¬ 
come  authors  and  directors  of  the  drama,  changing  it  until  they 
approve  of  the  outcome. 

A  second  technique  of  interest  involves  the  staging  of  a 
tableau  (or  a  “frozen  image,”  in  Brandt  &  Grunnet,  2000),  in 
which  a  group  of  nonprofessional  actors  positions  its  members 
as  if  they  had  been  stopped  in  the  middle  of  a  play.  Each  mem¬ 
ber  can  tell  what  s/he  is  doing,  thinking,  planning,  and  hoping. 

Forum  Theatre  was  used  informally  in  the  UTOPIA  project 
and  other  early  Scandinavian  research  efforts  (Ehn  &  Kyng, 
1991;  Ehn  &  Sjogren,  1991),  addressing  the  question  of  new 
technologies  in  newspaper  production.  Changes  in  work  pat¬ 
terns  and  work-group  relations  were  acted  out  by  software  pro¬ 
fessionals  in  the  end-users’  workplace,  using  cardboard  and  ply¬ 
wood  prototypes,  in  anticipation  of  new  technologies.  The 
workers  served  as  the  audience,  and  critiqued  the  envisioned 
work  activities  and  working  arrangements.  The  drama  was  car¬ 
ried  out  iteratively,  with  changes,  until  it  was  more  supportive  of 
the  skilled  work  of  the  people  in  the  affected  job  titles.  The  re¬ 
searchers  made  repeated  visits  with  more  detailed  prototypes, 
again  using  the  vehicle  of  a  changeable  drama,  to  continue  the 
design  dialogue  with  the  workers.  This  work  was  widely  cred¬ 
ited  with  protecting  skilled  work  from  inappropriate  automa¬ 
tion,  leading  to  a  product  that  increased  productivity  while  tak¬ 
ing  full  advantage  of  workers’  skills. 

Brandt  and  Grunnet  (2000)  made  a  more  formal  use  of 
Boal’s  Forum  Theatre  and  “frozen  images”  in  the  two  projects
just  described  (“Sitings”).  Working  with  refrigeration  technicians
in  the  Smart  Tool  project,  they  and  the  technicians  enacted
work  dramas  and  tableaux  around  four  fictitious  workers,
leading  to  insights  about  the  technicians’  work  and  the
technological  possibilities  for  enhanced  support  of  that  work.
Here  is  Brandt  and  Grunnet’s  description  of  one  use  of  Forum
Theatre: 

[T]he  stage  was  constructed  of  cardboard  boxes  which  in  a  stylized  way
served  as  .  .  .  the  different  locations  in  the  scenario.  At  first  the  service 
mechanics  sat  as  an  audience  and  watched  the  play.  After  the  first  show¬ 
ing  of  the  “performance”  the  refrigeration  technicians  were  asked  to 
comment  and  discuss  the  dramatized  scenario  critically  .  .  . 

The  role  of  the  refrigeration  technicians  changed  from  being  a  pas¬ 
sive  audience  into  being  directors  with  an  expert  knowledge.  The  users 
recognized  the  situations  shown  in  the  dramatized  scenario  .  . .  Because 
of  the  openness  of  the  scenario  there  was  a  lot  of  “holes”  to  be  filled  out. 
For  instance,  one  .  .  .  technician  explained  that  he  preferred  to  solve 
the  problems  himself  instead  of  calling  his  boss.  This  information  meant
that  the  Smart  Tool  should  be  able  to  help  him  solve  his  problems  while
being  in  his  car  .  .  .  Another  [technician]  wanted  to  have  personal  in¬ 
formation  that  his  boss  was  not  allowed  .  .  .  [to]  access  .  .  .  (p.  14) 

Incidents  were  analyzed  through  tableaux.  The  designers  po¬ 
sitioned  themselves  in  the  “frozen  image”  of  the  work  situation, 
and  then  led  a  discussion  of  (a)  the  work  activities  that  were 
captured  in  the  stopped  action,  and  (b)  the  work  relations  in 
which  each  particular  tableau  was  embedded. 

Muller  et  al.  (1994)  presented  a  related  tutorial  demonstra¬ 
tion  piece  called  Interface  Theatre,  with  the  stated  goal  of  en¬ 
gaging  a  very  large  number  of  interested  parties  in  a  review  of 
requirements  and  designs  (e.g.,  in  an  auditorium).  In  Interface 
Theatre,  software  professionals  acted  out  a  user  interface  “look 
and  feel”  using  a  theatrical  stage  as  the  screen,  with  each  actor 
playing  the  role  of  a  concrete  interface  component  (e.g.,  Kim 
the  Cursor,  Marty  the  Menubar,  Dana  the  Dialoguebox). 

Pedersen  and  Buur  (2000;  see  also  Buur  et  al.,  2000),  fol¬ 
lowing  previous  work  of  Binder  (1999),  collaborated  with  in¬ 
dustrial  workers  to  make  videos  showing  proposed  new  work 
practices  and  technologies.  After  a  collaborative  analysis  of  the 
work  (see  “Games,”  following),  workers  acted  out  their  new 
ideas  and  took  control  of  which  action  sequences  were  cap¬ 
tured  on  video  for  subsequent  explanation  to  other  workers 
and  management  (see  also  Bjorgvinsson  &  Hillgren,  2004; 
Mørch  et  al.,  2004).  Isomursu  and  colleagues  (2004)  used  more
informal  user-produced  videos  based  on  cell-phone  video 
recordings.  These  included  not  only  lay-ethnographic  records 
of  usage,  but  also  user-originated  dramas  to  illustrate  hypothesized
or  desired  aspects  of  usage.  In  the  Situated  and  Participative
Enactment  of  Scenarios  method,  G.  Iacucci,  C.  Iacucci,
and  Kuutti  (2002)  described  a  projective  series  of  improvisations
with  an  innovative  technology  idea — the  “magic  thing” —
in  users’  homes  or  workplaces  (G.  Iacucci  &  Kuutti,  2002;  Kuutti,
G.  Iacucci,  &  C.  Iacucci,  2002;  see  also  Buur  &  Bødker,  2002;
Bødker  &  Buur,  2002).

Finally,  Salvador  and  Sato  (1998, 1999)  used  acted-out  dramas 
as  triggers  for  questions  in  a  setting  similar  to  a  focus  group,  and 
Howard,  Carroll,  Murphy,  &  Peck  (2002)  described  the  role  of 
professional  actors  and  directors  in  dramatizing  attributes  of 
proposed  new  products. 

While  all  of  these  practices  are  loosely  tied  together  through
the  use  of  drama,  there  are  important  contrasts.  One  important 
dimension  of  difference  is  the  extent  to  which  the  drama  is  im¬ 
provised  in  the  situation,  or  scripted  in  advance.  Boal’s  (1974/ 
1992)  techniques  make  a  crucial  use  of  improvisation  by  the 
user-audience,  to  change  the  action  and  outcome  of  the  drama. 
This  theme  is  most  clearly  seen  in  the  work  of  Brandt  and  Grun¬ 
net  (2000),  Ehn  and  Sjogren  (1986,  1991),  and  Muller  and  col¬ 
leagues  (1994).  At  the  opposite  extreme  are  video  documen¬ 
taries,  which  of  course  are  difficult  to  change  as  a  result  of 
discussion  and  constructive  insight. 

Third  space.  Taken  as  a  somewhat  diverse  participatory 
genre,  the  dramatic  approaches  provide  many  of  the  aspects  of 
hybridity  reviewed  in  the  cultural  studies  introduction  to  this 
chapter.  Drama  brings  a  strong  overlap  of  the  world  of  end  users 
and  the  world  of  software  developers,  showing  concrete  pro¬ 
jections  of  ideas  from  one  world  into  the  other  world — and,  in 
most  uses,  allowing  modification  of  those  ideas.  Drama  is  marginal
to  the  work  domains  of  most  software  professionals  and
most  end  users,  and  thus  moves  all  parties  into  an  ambiguous
area  where  they  must  negotiate  meaning  and  collaboratively 
construct  their  understandings.  Agreements,  conflicts,  and  new 
ideas  can  emerge  as  their  multiple  voices  and  perspectives  are 
articulated  through  this  rich  communication  medium. 

Claimed  benefits.  Similar  to  end-user  photography,  most 
of  the  theatrical  work  has  the  feel  of  experimentation.  It  is  diffi¬ 
cult  to  find  clear  statements  of  advantages  or  benefits  of  these 
practices  (see  “Conclusions,”  following).  In  general,  practition¬ 
ers  and  researchers  made  the  following  claims: 

•  Building  bridges  between  the  worlds  of  software  profes¬ 
sionals  and  users. 

•  Enhancing  communication  through  the  use  of  embodied 
(e.g.,  acted-out)  experience  and  through  contextualized 
narratives. 

•  Engaging  small  and  large  audiences  through  direct  or 
actor-mediated  participation  in  shaping  the  drama  (influenc¬ 
ing  the  usage  and  design  of  the  technology). 

•  Increasing  designers’  empathy  for  users  and  their  work. 

•  Simulating  use  of  not-yet-developed  tools  and  tech¬ 
nologies  (“dream  tools,”  Brandt  &  Grunnet,  2000)  to  explore 
new  possibilities. 

•  Fuller  understanding  by  focus  group  members,  leading  to
a  more  informed  discussion. 


GAMES 


From  theory  to  practice,  the  concept  of  games  has  had  an  important
influence  in  participatory  methods  and  techniques.
Ehn’s  theoretical  work  emphasized  the  negotiation  of  language 
games  in  the  course  of  bringing  diverse  perspectives  together  in 
participatory  design  (Ehn,  1988;  for  applications  of  this  theory, 
see  Ehn  &  Kyng,  1991;  Ehn  &  Sjogren,  1986,  1991).  In  this  view, 
part  of  the  work  of  a  heterogeneous  group  is  to  understand 
how  to  communicate  with  one  another;  of  course,  communi¬ 
cation  isn’t  really  possible  on  a  strict  vocabulary  basis,  but  re¬ 
quires  an  understanding  of  the  perspectives  and  disciplinary 
cultures  behind  the  words  (Bachmann-Medick,  1996;  Muller, 
1997a,  1997b,  1999b).  Thus,  the  work  of  heterogeneous  teams 
is,  in  part,  the  “mutual  validation  of  diverse  perspectives”  that
Bødker  and  colleagues  (1988)  advocated.

Games  have  also  been  an  important  concept  in  designing
practices,  with  the  convergent  strategies  of  enhanced  teamwork
and  democratic  work  practices  within  the  team.^^  We  explained
the  concepts  as  follows  (Muller  et  al.,  1994): 

When  properly  chosen,  games  can  serve  as  levelers,  in  at  least  two  ways. 
First,  games  are  generally  outside  of  most  workers’  jobs  and  tasks.  They
are  therefore  less  likely  to  appear  to  be  “owned”  by  one  worker,  at  the
expense  of  the  alienation  of  the  non-owners.  Second,  .  .  .  [PD]  games 
.  .  .  are  likely  to  be  novel  to  most  or  all  of  the  participants.  Design  group 
members  are  more  likely  to  learn  games  at  the  same  rate,  without  large 
differences  in  learning  due  to  rank,  authority,  or  background.  .  .  .  This 
in  turn  can  lead  to  greater  sharing  of  ideas. 

In  addition,  games  .  .  .  can  help  groups  of  people  to  cohere  together 
[and]  communicate  better.  One  of  the  purposes  of  games  is  enjoy¬ 
ment — of  self  and  others — and  this  can  both  leaven  a  project  and  build 
commitment  among  project  personnel.  (pp.  62-63)

Derived  from  Ehn’s  (1988)  theoretical  foundation,  Ehn  and 
Sjogren  (1986,  1991;  see  also  Bødker,  Grønbæk,  &  Kyng,  1993)
adopted  a  “design-by-playing”  approach,  introducing  several
games  into  PD  practice: 

•  Carpentopoly,  a  board  game  concerned  with  business  issues 
in  the  carpentry  industry. 

•  Specification  Game,  a  scenario-based  game  based  on  a  set 
of  “situation  cards,”  each  of  which  described  a  workplace  sit¬ 
uation.  Players  (members  of  the  heterogeneous  analysis/ 
design  team)  took  turns  drawing  a  card  and  leading  the  dis¬ 
cussion  of  the  work  situation  described  on  the  card. 

•  Layout  Kit,  a  game  of  floor-plans  and  equipment  symbols, 
for  a  workers’  view  of  how  the  shop  floor  should  be  redesigned
(see  also  Bødker  &  Buur,  2002;  Morgan,  Joroff,
Porter,  &  Schon,  1998;  Kjær  &  Madsen,  1995;  and  most  recently
Brandt  &  Messeter,  2004,  reviewed  below).

•  Organization  Kit  and  Desktop  Publishing  Game,  a  part 
of  the  UTOPIA  project  (Ehn  &  Kyng,  1991),  in  which  cards  il¬ 
lustrating  components  of  work  or  outcomes  of  work  were 
placed  on  posters,  with  annotations. 

Pedersen  and  Buur  (2000)  extended  the  Layout  Kit  in  new 
ways.  Collaborating  with  workers  at  Danfoss,  they  jointly  cre¬ 
ated  a  board  game  for  laying  out  new  technologies  in  an  indus¬ 
trial  plant: 

A  map  of  the  plant  layout  served  as  the  game  board.  .  .  .  Foam  pieces 
in  different  colors  and  shapes  worked  as  game  pieces  for  the  team  to  at¬ 
tach  meaning  to.  .  .  .  Often,  in  the  beginning  of  the  game,  the  place¬ 
ment  of  the  piece  was  only  accepted  when  touched  by  almost  every¬ 
body.  .  .  .  The  participants  were  forced  to  justify  the  placement,  which 
fostered  a  fruitful  dialogue  about  goals,  intentions,  benefits,  and  ef¬ 
fects.  People  were  asking  each  other  such  things  as  .  .  .  “what  if  we 
change  this?”,  “on  our  plant  we  do  this,  because  .  .  .”,  “would  you  ben¬ 
efit  from  this?”  (pp.  96-98) 

The  games  became  the  foundation  of  the  videos  produced  in 
collaboration  with  the  workers  (described  above  in  “Dramas”). 

Buur  and  colleagues  (2000)  extended  the  Specification 
Game,  making  a  game  from  the  outcome  of  a  participatory 
ethnographic  analysis  of  work  at  an  industrial  plant.  They  first 
collected  video  observations  from  work  activities,  and  devel¬ 
oped  a  set  of  60-70  video  excerpts  for  further  discussion.  They 
next  constructed  a  set  of  cards,  one  for  each  video  excerpt,  with 


^^For  an  example  of  games  used  to  teach  design  experiences  among  students,  see  Iversen  and  Buur  (2002).
a  still-frame  image  from  the  video  displayed  on  each  card.  Game
participants  then  grouped  these  60-70  cards  into  thematic  clus¬ 
ters,  organized  their  clusters,  and  analyzed  the  subsets  of  ac¬ 
tions  in  each  cluster  (for  a  related  nongame  technique,  see  affin¬ 
ity  diagramming  in  Beyer  &  Holtzblatt,  1998). 

We  took  the  concept  of  games  in  a  different  direction,  for  use 
in  non-Scandinavian  workplaces,  by  introducing  several  new 
games  (Muller  et  al.,  1994): 

•  CARD,  a  card  game  for  laying  out  and/or  critiquing  an  exist¬ 
ing  or  proposed  work/activity  flow  (see  “Stories,”  previously). 

•  PICTIVE,  a  paper-and-pencil  game  for  detailed  screen  design 
(Muller  et  al.,  1995b). 

•  Icon  Design  Game,  a  guessing  game  for  innovating  new 
ideas  for  icons  (this  game  assumes  subsequent  refinement  by 
a  graphic  designer). 

•  Interface  Theatre,  for  design  reviews  with  very  large  groups 
of  interested  parties  (see  “Dramas,”  above). 

Our  games  emphasized  hands-on,  highly  conversational  ap¬ 
proaches  to  discussing  both  the  user  interface  concept  itself  and 
the  work  processes  that  it  was  intended  to  support.  We  at¬ 
tempted  to  foster  an  informal  and  even  playful  tone,  for  the  rea¬ 
sons  sketched  in  the  earlier  quotation. 

Recently,  Brandt  and  Messeter  (2004;  see  also  Johansson, 
Frost,  Brandt,  Binder,  &  Messeter,  2002)  developed  a  strong 
sequence  of  games.  Their  User  Game  is  based  on  the  video¬ 
collage  methods  of  Buur  and  colleagues  (2000),  combining  brief 
video  clips  into  person  or  role  descriptions,  which  are  then  la¬ 
beled  evocatively  by  the  participants.  The  second  game  in  their 
sequence, the Landscape Game, places those user constructs
into the work environment (as a board game). The Technology
Game adds simple shapes that stand for technologies, again
playing those shapes onto the work environment in the Landscape
Game. Finally, the Scenario Game moves back to the real
world, enacting possibilities based on new ideas from the preceding
three games. The enactments may be video recorded,
both for documentary purposes and to generate further video
material for another cycle of the four games.

Third  space.  Each  of  these  ten  games  took  all  of  its  play¬ 
ers  outside  of  their  familiar  disciplines  and  familiar  working  prac¬ 
tices,  but  strategically  reduced  the  anxiety  and  uncertainty  of  the 
situation  by  using  the  social  scaffolding  of  games.  Each  game  re¬ 
quired  its  players  to  work  together  through  mutual  learning  to 
understand  and  define  the  contents  of  the  game,  and  to  interpret 
those  contents  to  one  another  in  terms  of  multiple  perspectives 
and  disciplines.  The  conventional  authority  of  the  software  pro¬ 
fessionals  was  thus  replaced  with  a  shared  interpretation  based 
on  contributions  from  multiple  disciplines  and  perspectives. 

Claimed  benefits.  Participatory  design  work  with  games 
has  been  claimed  to  lead  to  the  following  benefits: 

•  Enhanced  communication  through  the  combination  of 
diverse  perspectives. 

•  Enhanced  teamwork  through  shared  enjoyment  of  work¬ 
ing  in  a  game-like  setting. 


• Greater freedom to experiment and explore new ideas
through flexible rules and redefinition of rules during the
game.

•  Improved  articulation  of  the  perspectives,  knowledges, 
and  requirements  of  workers. 

•  New  insights  leading  to  important  new  analyses  and  designs 
with  documented  commercial  value. 


CONSTRUCTIONS 


Preceding  sections  have  considered  hybridity  in  participatory 
activities,  such  as  sitings,  workshops,  stories,  photography,  dra¬ 
mas,  and  games.  This  section  continues  the  survey  of  participa¬ 
tory  practices  that  bring  users  and  software  professionals  into 
unfamiliar  and  ambiguous  “third  space”  settings.  In  this  section, 
I  focus  on  the  collaborative  construction  of  various  concrete 
artifacts: 

•  Physical  reflections  of  a  cocreated  language  of  analysis 
and  design. 

•  Descriptions  of  work  in  unfamiliar  media. 

•  Low-tech  prototypes  for  analysis  and  design. 

•  High-tech  prototypes  for  design  and  evaluation. 

Language 

The  preceding  section  noted  Ehn’s  (1988)  theoretical  work  on 
PD as language games. Ehn’s interest converges with Bhabha’s
(1994) “third space” argument: Part of the characterization of
hybridity  was  the  negotiation  and  cocreation  of  working  lan¬ 
guage  and  meaning.  This  section  takes  Ehn’s  position  seriously 
and  considers  the  role  of  language  creation  in  participatory 
practices  that  lead  to  hybridity. 

Several  projects  have  made  physical  objects  into  a  kind  of 
vocabulary  for  work  analysis,  design,  or  evaluation.  The  cards 
described  in  the  preceding  section  (“Games”)  are  examples 
(Buur  et  al.,  2000;  Ehn  &  Sjogren,  1986,  1991;  Lafreniere,  1996; 
Muller,  2001;  Muller  et  al.,  1995b;  Tschudy  et  al.,  1994).  In  each 
of  these  methods,  the  cards  became  a  kind  of  “common  lan¬ 
guage”  (e.g.,  Muller  et  al.,  1995b)  through  which  the  design 
team  communicated  (a)  with  one  another,  and  (b)  with  their 
labor  and  management  clients. 

In  two  of  the  methods,  the  cards  themselves  were  acknowl¬ 
edged  to  be  incomplete,  and  part  of  the  work  of  the  team  was  to 
develop  and  refine  the  cards  so  as  to  reflect  their  growing  un¬ 
derstanding  and  their  new  insights  (Lafreniere,  1996;  Muller, 
2001).  Team  members  (users  and  others)  were  encouraged  to 
disregard,  if  appropriate,  the  template  of  information  on  each 
card,  up  to  and  including  the  decision  to  turn  the  card  over  and 
write  on  its  blank  back.  In  subsequent  sessions,  the  concepts 
that  were  written  on  the  blank  backs  of  cards  usually  became 
new  kinds  of  cards.  The  working  vocabulary  of  the  team  thus 
grew  as  the  shared  understanding  of  the  team  grew.  This  exten¬ 
sibility  of  the  set  of  cards  was  observed  in  nearly  all  sessions,  but 
was  particularly  important  in  sessions  that  were  envisioning 
future technologies or future work practices. The cards thus became
a point of hybridity, where assumptions were questioned
and  challenged,  where  extensive  and  polyvocal  dialogue  was 
required  for  the  team  to  assign  meaning  to  the  cards,  where 
conflicts  were  revealed  and  resolved,  and  where  the  team  had  to 
construct  its  understanding  and  its  language. 

Similarly,  the  board  games  of  Ehn  and  Sjogren,  and  especially 
of  Pedersen  and  Buur  (2000),  used  deliberately  ambiguous  play¬ 
ing  pieces.  The  analysis  team  had  to  assign  meaning  to  the 
pieces,  and  did  so  in  a  collaborative  way. 

Chin,  Schuchardt,  Myers,  and  Gracio  (2000),  working  with  a 
community  of  physical  scientists  who  were  not  software  pro¬ 
fessionals,  introduced  software-like  flowcharts  to  their  clients 
(see  Kensing  &  Munk-Madsen,  1993,  for  a  discussion  of  the  re¬ 
lationship  between  concrete  tools  and  abstract  tools).  This  work 
shared,  with  the  other  work  reviewed  in  this  section,  aspects  of 
symbol-ambiguity  and  language  cocreation: 

To  attune  scientists  to  the  construction  of  workflow  diagrams,  we  pro¬ 
vided  them  a  simple,  informal  example  of  how  a  meteorologist  might  di¬ 
agram  his  [sic]  work  in  collecting  and  reporting  weather  conditions.  .  .  . 
Although  we  used  circles  and  arrows  in  our  example,  we  did  not  impose 
any  specific  symbology  or  rules  on  the  scientists’  construction  of  work- 
flow  diagrams.  ...  At  times,  the  scientists  did  struggle  in  developing 
some  diagrams,  but  the  labor  was  mostly  centered  on  the  elucidation 
of  the  research  processes  rather  than  the  mechanics  of  diagramming. 
(p. 32)

Third  space.  Common  to  all  of  these  projects  was  the 
cocreation  of  a  physically  represented  language,  both  within  the 
team  and  from  the  team  to  its  clients  and  stakeholders.  This  kind 
of  lay  linguistic  work  requires  mutual  education  and  mutual  val¬ 
idation  for  the  new  language  components  to  have  meaning  to 
all  of  the  parties.  These  negotiations  of  multiple  knowledges  are 
at  the  heart  of  the  “third  space”  proposal  of  Bhabha  (1994). 

Claimed  benefits.  Most  of  these  projects  involved  a  num¬ 
ber  of  activities,  and  a  number  of  aspects  of  hybridity.  It  is  diffi¬ 
cult  to  determine  how  much  of  their  successes  were  due  specif¬ 
ically  to  the  language-related  components.  Benefits  that  may 
have  resulted  from  the  negotiation  and  cocreation  of  language 
include  the  following: 

•  Enhanced  understandings  of  one  another’s  perspectives 
and  needs. 

•  Critical  examinations  of  assumptions  underlying  the 
ways  that  each  party  expressed  its  perspective. 

•  Shared  ownership  of  the  language  and  its  physical  mani¬ 
festation  (cards,  flowcharts,  game  pieces). 

•  Improved  communication  within  the  team  and  from  the 
team  to  interested  outsiders  (clients,  stakeholders). 

Making  Descriptive  Artifacts 

Another  way  of  moving  end  users  into  unfamiliar  and  hence  re¬ 
flective  experiences  is  to  ask  them  to  use  “projective”  or  artistic 
methods  to  report  on  their  experiences  and  needs.  In  one 
sense, these methods produce another kind of language of expression,
and therefore might have been included in the preceding
section. Because the outcomes are so distinctively different
from the language-oriented work of the preceding section,
I thought it best to review this work in its own section.

Sanders has employed user-created collage in her participatory
practice for a number of years (Sanders, 2000; see also Dandavate
et al., 2000; Sanders & Branaghan, 1998; Sanders & Nutter,
1994). The choice of collage is, of course, strategic; relatively
few  people  make  collages  as  part  of  their  work  activities,  and 
relatively  few  people  interpret  their  collages  to  one  another  as 
part  of  their  work  conversations.  Yet  the  content  of  the  collages 
is  strongly  anchored  in  what  people  know.  The  collages  thus  be¬ 
come  marginal  constructions,  not  part  of  any  defined  workplace 
field  or  discipline,  but  informed  by  familiar  knowledges.  The 
novelty  of  the  collage  encourages  the  challenging  of  assump¬ 
tions,  and  the  interpretation  and  presentation  of  collages  en¬ 
courages  mutual  learning  across  the  diversity  of  experiences 
and  knowledges  of  the  participants. 

For  completeness,  I  make  reference  to  the  work  of  Noble 
and  Robinson  (2000)  on  collaborative  creation  of  photo  docu¬ 
mentaries,  and  of  Patton  (2000)  on  end-user  creation  of  photo 
collages,  reviewed  in  the  earlier  section  on  “Photographs.”  Their 
work  also  produced  descriptive  artifacts  that  took  users  and 
their  collaborators  into  unfamiliar  areas. 

Third  space.  These  methods  have  in  common  the  use  of 
a  nonstandard  medium  for  making  users’  needs  known,  and  for 
developing  new  insights  in  a  workplace  setting.  The  making  of 
collages  may  be  new  for  many  participants.  They  are  thus  in  a 
kind  of  “third  space,”  between  their  work  culture  and  the  artis¬ 
tic  or  expressive  culture  of  collages,  and  they  have  to  reflect  on 
the  differences  as  they  construct  their  approach  to  making  col¬ 
lages  of  their  own  experiences. 

It  is  not  clear,  in  Sanders’  work,  whether  the  collage  work  is 
done  collaboratively  among  end  users,  or  whether  each  collage 
is a solitary production. If the collage-creation is done collaboratively,
then it might give rise to some of the other attributes
of hybridity in Table 54.1 (e.g., challenging assumptions, cocreation
of meanings and collective actions, dialogues).

Claimed  benefits.  Basing  her  claims  on  years  of  practice 
with  collages  and  related  practices,  Sanders  (2000)  claimed  the 
following  benefits: 

•  Using  visual  ways  of  sensing,  knowing,  remembering,  and 
expressing. 

• Giving access and expression to the emotional side of
experience.

•  Acknowledging  the  subjective  perspective  in  people’s 
experiences  with  technologies. 

•  Revealing  unique  personal  histories  that  contribute  to 
the  ways  that  people  shape  and  respond  to  technologies. 

Low-Tech Prototypes

Beaudouin-Lafon and Mackay (chapter 52, this volume) have
provided a chapter on prototyping, including participatory
prototyping; therefore, I have written a very
brief account in this chapter so as not to duplicate their efforts.

Low-tech  prototypes  may  lead  to  “third  space”  experiences 
because  they  bring  people  into  new  relationships  with  tech¬ 
nologies — relationships  that  are  “new”  in  at  least  two  impor¬ 
tant  ways.  First,  the  end  users  are  often  being  asked  to  think 
about  technologies  or  applications  that  they  have  not  previously 
experienced.  Second,  in  participatory  work  with  low-tech  pro¬ 
totypes,  end  users  are  being  asked  to  use  the  low-tech  materi¬ 
als  to  reshape  the  technologies — a  “design-by-doing”  approach 
(Bødker et al., 1993). In this way, participatory work with low-tech
prototypes involves much more user contribution and user
initiative  than  the  more  conventional  use  of  “paper  prototypes” 
as  surrogates  for  working  systems  in  usability  testing  (e.g.,  Daly- 
Jones,  Bevan,  &  Thomas,  1999;  Rettig,  1994). 

The  UTOPIA  project  provided  impressive  demonstrations 
of  the  power  of  low-tech  cardboard  and  plywood  prototypes  to 
help  a  diverse  group  to  think  about  new  technologies,  office  lay¬ 
outs,  and  new  working  relations  that  might  result  from  them 
(Bødker et al., 1987, 1988, 1993; Ehn & Kyng, 1991; for other uses
of low-tech, substitutive prototypes, see Mørch et al., 2004).
Subsequent  projects  to  translate  this  work  to  North  America 
led  to  the  PICTIVE  method  of  paper-and-pencil  constructions  of 
user  interface  designs  by  heterogeneous  design  teams  (Muller 
et  al.,  1995b);  prototyping  of  consumer  appliances  using  foam- 
core  and  hook-and-loop  attachments  (Sanders  &  Nutter,  1994); 
and  a  more  experimental  simulation  of  email,  using  paper  air¬ 
planes  (Dykstra  &  Carasik,  1991). 

Third  space.  Low-tech  prototyping  has  a  reputation  for 
bringing  new  insights  through  the  combination  of  diverse 
perspectives.  The  UTOPIA  project  is  widely  credited  with  mu¬ 
tual  education  among  shop-floor  print  workers  and  computer 
systems  researchers.  Our  experiences  with  PICTIVE  almost  al¬ 
ways  involved  mutual  education.  Understanding  and  chang¬ 
ing  the  artifact  become  important  arenas  for  people  to  ex¬ 
plore  their  understandings  of  one  another’s  positions,  to 
question  one  another’s  approaches,  to  discover  and  resolve 
conflicts,  to  engage  in  combinations  of  views  leading  to  plans 
for  collective  action,  and  to  accommodate  heterogeneity  of 
views  and  interests. 

Claimed  benefits.  The  low-tech  participatory  prototyp¬ 
ing  approaches  have  been  extraordinarily  influential,  with  adop¬ 
tion  on  four  continents.  Claimed  benefits  include 

•  Enhanced  communication  and  understanding  through 
grounding  discussions  in  concrete  artifacts. 

• Enhanced incorporation of new and emergent ideas
through the ability of participants to express their ideas directly
via the low-tech materials, and through the construction
of artifacts that can be used in other techniques, especially
drama and video documentaries.

•  Enhanced  working  relations  through  a  sense  of  shared 
ownership  of  the  resulting  design. 

• Practical application with measured successes in using
low-tech  design  approaches  to  real  product  challenges,  achiev¬ 
ing  consequential  business  goals. 


Evolutionary  Prototyping  and  Cooperative  Prototyping 

This  last  section  on  participatory  methods  is  concerned  with 
software  prototyping.  As  noted  above,  I  am  relying  on  Beau- 
douin-Lafon  and  Mackay’s  chapter  15  in  this  volume  to  cover 
prototyping  in  greater  depth  and  breadth.  I  include  this  brief 
overview  for  completeness  of  my  chapter’s  survey  of  hybridity 
in  participatory  practices. 

Bødker and Grønbæk (1991) and Madsen and Aiken (1993)
explored  the  potential  of  cooperative  prototyping  in  several 
projects,  using  different  technology  infrastructures.  In  general, 
they  found  that  this  approach  led  to  enhanced  communication 
with  end  users,  improved  incorporation  of  end-user  insights 
into  the  prototypes,  and  stronger  collective  ownership  and  col¬ 
lective  action  planning  by  the  team.  They  also  observed  time- 
consuming  breakdowns  in  the  design  process  itself,  when  new 
ideas  required  significant  programming  effort. 

In a different prototyping approach, a system is delivered to
its end users as a series of iterative prototypes, each of which gradually
adds functionality (e.g., Anderson & Crocca, 1993; Bertelsen,
1996; Trigg, 2000). What appears to be critical is that the
prototype functions as a crucial artifact in the end-users’ work,
such as (a) a resource of documents for librarians (Anderson &
Crocca, 1993), (b) an online event checklist that served as the
crucial coordination point for the work of diverse contributions
(Bertelsen, 1996), or (c) a database supporting funding work in
a nonprofit organization (Trigg, 2000). Trigg (2000) provided a
series  of  observations  and  tactical  recommendations  about  how 
to  engage  the  users  in  the  evaluations  that  both  they  and  the 
software  professionals  had  agreed  were  needed. 

Third  space.  This  very  brief  survey  of  cooperative  proto¬ 
typing  and  “iterative  delivery”  approaches  shows  several  aspects 
of  hybridity.  In  the  case  of  cooperative  prototyping,  the  coop¬ 
erative  work  may  be  done  in  a  physical  third  space  that  is 
neither  the  end-users’  office  nor  the  software  developers’  of¬ 
fice  (see  “Sitings,”  previously).  In  the  case  of  the  delivery  of  it¬ 
erated  prototypes,  each  prototype  is  presented  in  the  end- 
users’  setting,  but  is  unusual  and  only  partially  functional,  and 
thus occasions reflection about its nature, its role in the end-users’
work, and, ultimately, the work itself. In both cases, the invitation
(or perhaps the necessity) of the end-users’ actions to
help  shape  the  technology  becomes  an  important  means  of  re¬ 
focusing  their  attention,  as  well  as  the  attention  of  the  software 
developers.  The  ensuing  conversations  are  concerned  with  the 
interlinked  feasibility  of  changes  to  technology  and  to  work 
practices,  with  attributes  of  hybridity  including  polyvocal  dia¬ 
logues,  challenging  one  another’s  assumptions,  and  develop¬ 
ing  plans  for  collective  actions. 

Claimed  benefits.  Some  of  the  virtues  of  the  low-tech 
prototyping  approaches  have  also  been  claimed  for  the  coop¬ 
erative  prototyping  and  “iterative  delivery”  approaches: 

• Enhanced communication and understanding through
grounding discussions in concrete artifacts.

• Enhanced working relations through a sense of shared
ownership of the resulting design.

Additional claims for software-based prototypes include:

• Earlier understanding of constraints posed by the practical
limitations of software.

• Improved contextual grounding of the design in the
end-users’ work practices.


CONCLUSION 


My  theme  has  been  hybridity  and  the  ways  in  which  selected 
methods  in  participatory  design  may  bring  useful  attributes  of 
hybridity or third space approaches into HCI work. I considered
eight trends in PD: (a) selection of sites of shared work,
(b) workshops, (c) stories, (d) end-user photography, (e) dramas,
(f) creation of shared languages, (g) descriptive artifacts
(low-tech prototypes), and (h) working prototypes. I explored
how each of these categories of practice may contribute
to  hybridity,  and  what  advantages  may  result.  The  deliberate  and 
selective  use  of  hybridity  has  led  to  powerful  methods  in  PD  for 
increasing  communication  effectiveness,  team  coherence,  inno¬ 
vation,  and  quality  of  outcome.  Hybridity  is  thus  at  the  heart  of 
PD,  fostering  the  critical  discussions  and  reflections  necessary  to 
challenge  assumptions  and  to  create  new  knowledges,  working 
practices,  and  technologies.  When  we  consider  HCI  as  a  set  of 
disciplines  that  lie  between  the  space  of  work  and  the  space  of 
software  development,  we  see  that  the  hybrid  third  spaces  de¬ 
veloped  within  PD  have  much  to  offer  HCI  in  general. 


Table  54.2  summarizes  the  discussion  of  hybridity  in  PD,  us¬ 
ing  the  criteria  derived  from  cultural  studies  (Table  54.1)  and  the 
experiences  described  in  the  eight  areas  of  practice.  Table  54.2 
shows  different  patterns  of  hybridity  for  different  methods, 
techniques,  and  practices. 

Certain  attributes  are  relatively  common  across  practices, 
such  as  inbetweenness,  questioning  assumptions,  negotiation, 
and  heterogeneity  as  the  norm.  Other  attributes  are  relatively 
rare, such as considerations of what constitutes legitimate data
for  analysis  or  design,  how  those  data  are  analyzed  as  evidence, 
and  how  conclusions  are  drawn  in  each  of  the  several  fields  that 
are  represented  in  a  team.  These  are  difficult  questions  in  the 
study  of  disciplinarity  (Chandler,  Davidson,  &  Harootunian, 
1994;  Klein,  1996),  so  it  is  perhaps  not  surprising  that  there  is 
relatively  weak  support  for  their  exploration  in  participatory 
practices.  For  projects  in  which  these  are  pivotal  questions,  we 
may  need  new  methods  that  leverage  hybridity  in  new  ways.  I 
hope  that  this  survey  of  PD  practices  for  creating  third  spaces 
will  lead  to  new  practices  that  strengthen  these  missing  attrib¬ 
utes.  Conversely,  I  hope  that  new  work  in  PD  and  HCI  can  help 
to  ground  some  of  the  cultural  studies  discussions  in  new  ways. 

This  chapter  would  not  be  complete  without  a  list  of  un¬ 
solved  problems  in  participatory  design: 

•  Participation  by  non-organized  workforce.  The  field  of 
PD  has  long  been  concerned  about  how  to  engage  in  mean¬ 
ingful  participative  activities  with  workers  or  others  who  are 
not  organized  into  a  group  with  collective  bargaining  power 
or  other  collective  representation  (e.g.,  Greenbaum,  1993, 


TABLE 54.2. Hybridity in Participatory Practices

Attribute                           Sitings  Workshops  Stories  Photos  Dramas  Games  Language  Descriptive  Prototypes
Overlap/Inbetweenness               ?        +          -        +       +       +      +         +            +
Marginality                         +        +          -        ?       +       +      ?         +            ?
Novelty                             +        +          ?        ?       +       +      +         +            +
Uncertain/shared "ownership"        ?        +          ?        -       +       +      +         -            -
Selected attributes                 +        ?          +        +       -       +      +         -            +
Conflicts                           +        +          +        -       +       -      +         -            +
Questioning assumptions             +        ?          +        +       +       +      +         ?            +
Mutual learning                     +        +          +        +       +       +      +         ?            +
Synthesis of new ideas              ?        +          +        +       +       +      ?         +            +
Negotiation/(co-)creation           +        +          +        +       +       +      +         +            +
Identities                          -        -          +        +       -       ?      ?         +            ?
Working language                    -        ?          +        +       -       +      +         +            +
Working assumptions and dynamics    +        ?          +        +       +       +      +         ?            +
Understandings                      +        +          +        +       +       +      +         +            +
Relationships                       ?        +          +        +       -       +      ?         +            ?
Collective actions                  ?        +          ?        +       ?       ?      ?         +            +
Dialogues                           +        +          +        +       +       +      +         +            +
Polyvocality                        +        +          +        +       +       +      +         +            +
What is considered to be data?      -        -          -        +       -       -      +         +            -
What are the rules of evidence?     -        -          -        +       -       -      +         +            -
How are conclusions drawn?          -        -          -        ?       -       -      +         -            -
↓ authority, ↑ interpretation       +        ?          +        +       +       +      +         ?            +
↓ individualism, ↑ collectivism     ?        +          ?        +       ?       +      ?         ?            +
Heterogeneity as the norm           +        +          +        +       -       +      +         +            +

Key: + practice includes this attribute of hybridity
     - practice does not include this attribute
     ? not sure


54.  PARTICIPATORY  DESIGN  •  1077 


1996;  van  den  Besselaar,  Greenbaum,  &  Mambrey,  1996).  This 
has been a particularly difficult problem when we have tried
to  compare  methods  from  one  country  (and  political  culture) 
to  another  (e.g.,  Muller  et  al.,  1991). 

•  Evaluation  and  metrics.  One  of  the  weaknesses  of  the  lit¬ 
erature  on  participatory  practices  is  the  dearth  of  formal  eval¬ 
uations.  There  is  a  small  set  of  papers  that  have  examined 
software  engineering  projects  across  companies,  and  have 
found  positive  outcomes  related  to  end-user  participation 
(Cotton  et  al.,  1988;  Saarinen  &  Saaksjarvi,  1989).  I  have  been 
unable  to  discover  any  formal  experiments  comparing  par¬ 
ticipatory  methods  with  nonparticipatory  methods  in  a  cred¬ 
ible  workplace  context.  Indeed,  such  studies  would  be  diffi¬ 
cult  to  perform,  because  they  would  require  that  a  product 
be  implemented  and  marketed  twice  (once  with  participa¬ 
tion,  and  once  without).  The  problem  is  made  more  difficult 
because  measurements  and  metrics  of  organizational  out¬ 
comes,  user  participation,  and  user  satisfaction  are  currently 
vexing  research  issues  (e.g.,  Garrety  &  Badham,  1998;  Kap- 
pelman,  1995;  for  review,  see  Gasson,  1995). 


•  Universal  usability  and  “universal  participation?”  Nearly 
all  of  the  practices  described  in  this  chapter  (and  in  the 
longer  set  of  methods  in  Muller  et  al.,  1997)  are  strongly  vi¬ 
sual  and  require  hands-on  manipulation  of  materials.  These 
approaches  violate  the  emerging  requirements  of  universal 
usability for people with visual or motor disabilities (see, e.g.,
Universal Usability Fellows, 2000, and the Proceedings of the
Conference on Universal Usability; see also chapters in this
volume by Vanderheiden [chapter]; Czaja [chapter 39]; Marcus
[chapter 18]; Newell, Carmichael, Gregor, Alm, & Waller
[chapter 41]; Sears [chapter 42]; and Jacko, Leonard, & Scott
[chapter 43]). In the previous edition of this book, I noted
the  irony  that  participatory  design  appeared  to  have  failed  in 
its  inclusiveness  with  regard  to  people  with  disabilities.  Hap¬ 
pily,  that  problem  is  being  addressed  in  at  least  three  coun¬ 
tries  through  the  research  and  methodologies  of  Carmien, 
DePaula,  Gorman,  and  Kintsch  (2003);  Davies,  Marcella,  Mc- 
Grenere,  and  Purves  (2004);  Moffatt,  McGrenere,  Purves,  and 
Klawe  (2004);  and  Wu  and  colleagues  (Wu,  Richards,  & 
Baecker, 2004; Wu, Baecker, & Richards, 2005).


Note to Editors and Printers: Because I am citing many Scandinavian
authors, I have alphabetized my references according to the
Scandinavian alphabet: a, b, c, . . . x, y, z, æ, ø.
INTRODUCTION


This  chapter  discusses  the  unified  user  interface  development 
paradigm  as  an  effective  software  engineering  recipe  for  user  in¬ 
terfaces  that  can  be  automatically  adapted  to  individual  end  users 
and  contexts  of  use.  The  distinctive  procedural,  representational, 
and  programming  properties  of  this  development  discipline  are 
highlighted,  qualifying  it  as  a  plausible  and  cost-effective  ap¬ 
proach  towards  the  goal  of  developing  automatic  user  interface 
personalization.  The  method  conveys  a  fundamentally  new  per¬ 
spective  into  the  development  of  user  interfaces,  emphasizing  a 
principled,  systematic  and  evolutionary  approach  towards  cop¬ 
ing  with  diversity,  by  progressively  encapsulating  and  coordinat¬ 
ing  in  an  extensible  development  structure  all  alternative  inter¬ 
action  artifacts.  Finally,  the  consolidated  experience  in  applying 
the  unified  user  interface  development  paradigm  in  the  emerg¬ 
ing  field  of  ambient  intelligence  is  discussed,  by  explaining  the 
way  the  original  method  is  effectively  applied  to  build  applica¬ 
tions  with  unified  ambient  interfaces. 

Automatic  User  Interface  Adaptation 

The  notion  of  automatic  user  interface  adaptation  reflects  the 
capability  of  interactive  software  to  adapt  during  runtime  to  the 
individual  end  user,  as  well  as  to  the  particular  context  of  use,  by 
delivering  the  most  appropriate  interactive  experience.  The 
storage  location,  origin,  and  format  of  user-oriented  information 
may  vary.  For  example,  information  may  be  stored  in  profiles,  in¬ 
dexed  by  unique  user  identifiers,  extracted  from  user-owned 
cards,  entered  by  the  user  in  an  initial  interaction  session,  or 
inferred  by  the  system  through  continuous  interaction  moni¬ 
toring  and  analysis.  Additionally,  usage-context  information, 
such  as  user  location,  environment  noise,  network  bandwidth, 
and  so  forth,  is  normally  provided  by  special  purpose  equip¬ 
ment,  like  sensors,  or  system-level  software.  In  order  to  support 
optimal  interface  delivery  for  individual  user-  and  usage-context- 
attributes,  it  is  required  that  for  any  given  user  task  or  group  of 
user  activities,  the  implementations  of  the  alternative  best  fit 
interface  components  are  either  implementationally  encapsu¬ 
lated  or  appropriately  locatable  (e.g.,  remote  components, 
downloadable  plug-ins,  dynamically  linked  libraries,  etc.). 
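
As an informal illustration of the requirement just described, the following Python sketch shows one possible way to keep alternative interface components for a task in a single registry and to pick the best fit at runtime from user and usage-context attributes. It is only a sketch under stated assumptions, not the method's prescribed implementation: the names UserProfile, UsageContext, Component, ComponentRegistry, the task name "open_file", the attribute names, and the 70 dB noise threshold are all hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UserProfile:
    """User-oriented attributes (hypothetical names and values)."""
    user_id: str
    vision: str = "sighted"      # e.g. "sighted" or "blind"
    expertise: str = "novice"


@dataclass(frozen=True)
class UsageContext:
    """Usage-context attributes, e.g. as reported by sensors."""
    noise_db: float = 40.0
    bandwidth_kbps: float = 1000.0


# A predicate decides whether a component fits a given user and context.
Predicate = Callable[[UserProfile, UsageContext], bool]


@dataclass
class Component:
    """One alternative interface component for a task."""
    name: str
    fits: Predicate
    priority: int = 0            # higher wins when several components fit


@dataclass
class ComponentRegistry:
    """Keeps the alternative components for each user task in one place."""
    _by_task: Dict[str, List[Component]] = field(default_factory=dict)

    def register(self, task: str, component: Component) -> None:
        self._by_task.setdefault(task, []).append(component)

    def select(self, task: str, user: UserProfile, ctx: UsageContext) -> str:
        """Return the name of the best-fitting component for this task."""
        candidates = [c for c in self._by_task.get(task, []) if c.fits(user, ctx)]
        if not candidates:
            raise LookupError(f"no component fits task {task!r}")
        return max(candidates, key=lambda c: c.priority).name


# Hypothetical alternatives for a single task.
registry = ComponentRegistry()
registry.register(
    "open_file",
    Component("visual_dialog", lambda u, c: u.vision != "blind"),
)
registry.register(
    "open_file",
    Component(
        "speech_dialog",
        lambda u, c: u.vision == "blind" and c.noise_db < 70,  # assumed threshold
        priority=1,
    ),
)
registry.register(
    "open_file",
    Component("braille_dialog", lambda u, c: u.vision == "blind"),
)

print(registry.select("open_file", UserProfile("u1"), UsageContext()))
```

In this sketch the selection logic is separated from the components themselves, so further alternatives can be registered without touching the code that requests an interface for a task. A real system would also have to locate remote or dynamically loaded components, which is omitted here.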

The  Concept  of  Unified  User  Interfaces 

A  unified  user  interface  is  the  interaction-specific  software  of 
software  applications  or  services,  which  is  capable  of  self-adapt¬ 
ing  to  the  characteristics  of  the  individual  end  user  and  con¬ 
text  of  use.  Such  an  adaptation  may  reflect  varying  patterns  of 
interactive  behavior,  at  the  physical,  syntactic,  or  semantic  level 
of  interaction,  to  accommodate  specific  user-  and  context- 
oriented  parameters.  Practically  speaking,  from  the  end-user 
point  of  view,  a  unified  user  interface  is  actually  an  interface 
that  can  automatically  adapt  to  the  individual  user  attributes 
(e.g.,  requirements,  abilities,  and  preferences),  as  well  as  to  the 
particular  characteristics  of  the  usage-context  (e.g.,  computing 
platform, peripheral devices, interaction technology, and surrounding
environment). Therefore, a unified user interface realizes the
combination of

•  User-adapted  behavior,  such  as  the  automatic  delivery  of  the 
most  appropriate  user  interface  for  the  particular  end  user 
(requires  user  awareness) 

•  Usage-context  adapted  behavior,  such  as  the  automatic  de¬ 
livery  of  the  most  appropriate  user  interface  for  the  particular 
situation  of  use  (requires  usage  context  awareness) 

Hence, the characterization unified does not have any particular
behavioral connotation, at least as seen from an end-user
perspective. Instead, the notion of unification reflects the
specific software engineering strategy needed to accomplish this
behavior, emphasizing the proposed systematic development-oriented
discipline. More specifically, in order to realize this form of
adapted behavior, a unified user interface reflects the following
fundamental development properties:

•  It  encapsulates  alternative  dialogue  patterns  (e.g.,  imple¬ 
mented  dialogue  artifacts),  for  various  dialogue  design  con¬ 
texts  (e.g.,  a  subtask,  a  primitive  user  action,  a  visualization), 
appropriately  associated  to  the  different  values  of  user-  and 
usage-context  related  attributes.  The  need  for  such  alternative 
dialogue  patterns  may  only  be  dictated  by  the  user  interface 
design  process,  when,  given  any  particular  design  context,  for 
different  user-  and  usage-context  attribute  values,  alternative 
design artifacts are needed to accomplish optimal interaction.

•  It  implementationally  encapsulates  representation  schemes 
for  user-  and  usage-context  parameters,  internally  utilizing 
user-  and  usage-context  information  resources  (e.g.,  reposito¬ 
ries,  servers),  to  extract  or  to  update  user-  and  usage-context 
information. 

•  It  encapsulates  the  necessary  user  interface  design  knowl¬ 
edge  and  decision-making  capability  for  activating,  during 
runtime,  the  most  appropriate  dialogue  patterns  (e.g.,  inter¬ 
active  software  components),  according  to  particular  in¬ 
stances  of  user-  and  usage-context  attributes. 

The  distinctive  property  of  unified  user  interfaces  to  encap¬ 
sulate  alternative,  mutually  exclusive,  user  interface  design  arti¬ 
facts,  each  purposefully  designed  and  implemented  as  an  opti¬ 
mal  alternative  for  its  corresponding  target  attributes  of  the  user- 
and  usage-context  models,  assuming  a  particular  design  context, 
constitutes  one  of  the  main  contributions  of  this  research  work 
within  the  user  interface  software  engineering  field. 


UNIFIED  USER  INTERFACE  DESIGN 


The  Design  Problem 

Universal  design  in  HCI  reflects  the  principle  to  address  poten¬ 
tially  all  users  and  usage  contexts — anyone,  anyplace,  anytime. 
Its  main  objective  is  to  ensure  that  each  end  user  is  given  the 
most  appropriate  interactive  experience,  by  supporting  both  ac¬ 
cessible  and  high-quality  interaction.  In  this  context,  in  order  to 
accommodate  genuinely  universal  design  in  the  production 
process of software applications and services, two key issues need
to be optimally addressed. The first is scientific, primarily
concerning the way the particular problem is to be technically
resolved, while the second is cost specific, reflecting the need for
cost-effective and economically viable solutions. Clearly, producing
and enumerating distinct user interface designs through the conduct
of multiple design processes is an impractical solution, since the
overall cost of managing such a large number of independent design
processes in parallel, as well as of transforming each produced user
interface version into a target software implementation, would be
unacceptable for both the design and the software implementation
phases. Instead, a design process is required that leads to a single
design outcome, which appropriately links and organizes the
differentiating aspects of the resulting interactive application
around common abstract design structures and patterns, making it far
easier to (a) map to a target software system implementation, and
(b) maintain, update, and extend the design itself.

The  need  for  introducing  alternative  design  artifacts,  even 
for  the  same  specific  design  context  (such  as  a  particular  sub¬ 
task),  emerges  from  the  fact  that,  in  universal  design,  the  design 
problem  encompasses  largely  varying  parameters,  such  as  user- 
and  usage-context  parameters.  Consequently,  when  designing 
for  any  particular  dialogue  design  context,  it  is  likely  that  differ¬ 
entiating  values  of  those  problem  parameters  dictate  the  de¬ 
sign  of  diverse  dialogue  artifacts.  This  issue  introduces  two  im¬ 
portant  requirements  for  pursuing  a  suitable  design  method. 

The  first  is  that  such  a  method  should  offer  the  capability 
to  associate  multiple  alternative  dialogue  artifacts  to  a  particu¬ 
lar  single  design  context,  due  to  the  varying  design  problem  pa¬ 
rameters,  by  enabling  the  unambiguous  association  of  each  al¬ 
ternative  artifact  with  its  corresponding  values  of  the  problem 
parameters. 

The  second  is  that  the  method  should  emphasize  capturing 
of  the  more  abstract  structures  and  patterns  inherent  within 
user  interface  designs,  enabling  the  hierarchical  incremental 
specialization  towards  the  lower  physical  level  of  interaction, 
making it possible to introduce alternative dialogue patterns as
close  to  the  physical  design  as  possible.  This  will  make  it  far  eas¬ 
ier  for  the  design  space  to  be  updated  and  to  evolve,  since  mod¬ 
ifications  and  extensions  due  to  the  consideration  of  additional 
values  of  the  problem  parameters  (e.g.,  considering  new  user- 
and  usage-context  attribute  values)  can  be  applied  locally  closer 
to  the  lower  levels  of  the  design,  without  affecting  the  rest  of  the 
design  space. 

To  demonstrate  briefly  the  need  for  supporting  alternative 
dialogue  artifacts  for  the  same  design  context,  an  example  from 
a  real-life  application  will  be  used.  The  AVANTI  web  browser 
(Stephanidis,  Paramythis,  Sfyrakis,  &  Savidis,  2001)  was  devel¬ 
oped  to  enable  web  access  by  supporting  adaptation  to  the  par¬ 
ticular  user  as  well  as  to  the  context  of  use.  During  the  user 
interface  design  phase,  while  concentrating  on  the  design  con¬ 
text  of  the  link  dialogue  task,  alternative  designs  have  been  dic¬ 
tated,  due  to  varying  user-  and  usage-context  parameters  con¬ 
sidered,  as  shown  in  Fig.  55.1. 

Since  the  differing  artifacts  have  been  part  of  the  final 
AVANTI  web  browser  user  interface  design,  the  design  repre¬ 
sentation  formalism  needed  to  enable  their  copresence  within 
the  resulting  design  space,  by  clearly  associating  each  artifact 
to the link-selection task and its corresponding values of the
user- and usage-context parameters. A loose design notation is
employed in Fig. 55.1, to show hierarchical task analysis (subtask
sequencing is omitted for clarity), as well as the need for
alternative incarnations of a single task (e.g., styles S2/S3 for
link selection, styles S1/Se for load confirmation, and styles S4/S5
for link targeting). Following this example taken from the AVANTI
web browser, during runtime, depending on the particular end user
and usage context attribute values, the appropriate corresponding
implemented artifacts will only have to be activated.

[Figure 55.1 content, as recoverable from the diagram]

Task hierarchy: Link dialogue, decomposed into Link targeting, Link
selection, and Load confirmation.

Panel labels: traditional hyperlink as underlined text; alternative
design for hyperlink working as GUI push button; gravity zone.

S1: Requires load confirmation ("Load target document <http
address>? It will take approximately <number> seconds."). It is
designed for (1) users with limited web experience; (2) users that
get tired and show high error rates during interaction; (3) low
bandwidth networks.

S2: Link selection is done as far as the mouse cursor is inside the
rectangular area of the link and the left mouse button is pressed.
Designed for frequent and/or web-expert users.

S3: Link selection is done via typical GUI button press (i.e., press
while cursor inside, release while cursor inside). In comparison to
S2, it allows cancellation in the middle of the action (by releasing
the mouse button while the cursor is outside).

S4: Gravity support for link targeting. If the mouse cursor is
inside the gravity zone of the link, it is automatically positioned
at the centre of the link. Designed for users that cannot perform
accurate mouse positioning.

FIGURE 55.1. Designing alternative artifacts for the Link dialogue
task. Se is used to indicate "empty" (i.e., no load confirmation
dialogue supported). S5 is the typical manual link targeting GUI
dialogue through the mouse.
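The runtime activation of styles in the AVANTI link dialogue example
can be sketched as a small selection function. The Python sketch
below is illustrative only and is not the AVANTI implementation: the
attribute names (web_experience, frequent_user, high_error_rate,
low_bandwidth, accurate_pointing) and the choice of S3 as the default
selection style are assumptions made for this example. Only the style
labels S1 to S5 and Se, and the target groups of S1, S2, and S4, come
from Fig. 55.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical user- and usage-context attributes (names assumed)."""
    web_experience: str = "average"   # "limited", "average", or "expert"
    frequent_user: bool = False
    high_error_rate: bool = False
    low_bandwidth: bool = False
    accurate_pointing: bool = True


def select_link_styles(p: Profile) -> dict:
    """Pick one style for each subtask of the Link dialogue task."""
    # Load confirmation: S1 asks for confirmation, Se is the empty style.
    needs_confirmation = (
        p.web_experience == "limited" or p.high_error_rate or p.low_bandwidth
    )
    # Link selection: S2 targets frequent or expert users; using S3
    # (cancellable button press) for everyone else is an assumption.
    fast_selection = p.frequent_user or p.web_experience == "expert"
    return {
        "load_confirmation": "S1" if needs_confirmation else "Se",
        "link_selection": "S2" if fast_selection else "S3",
        # Link targeting: S4 adds gravity support, S5 is manual targeting.
        "link_targeting": "S5" if p.accurate_pointing else "S4",
    }
```

Under these assumptions, a web-expert user on a fast network would
receive Se, S2, and S5, whereas a user on a low-bandwidth network who
cannot position the mouse accurately would receive S1, S3, and S4.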


Polymorphic  Task  Hierarchies 

A  polymorphic  task  hierarchy  combines  three  fundamental 
properties:  (a)  hierarchical  decomposition,  (b)  polymorphism, 
and  (c)  task  operators.  The  hierarchical  decomposition  adopts 
the  original  properties  of  hierarchical  task  analysis  for  incre¬ 
mental  decomposition  of  user  tasks  to  lower  level  actions.  The 
polymorphism  property  provides  the  design  differentiation  ca¬ 
pability  at  any  level  of  the  task  hierarchy,  according  to  particular 
user-  and  usage-context  attribute  values.  Finally,  task  operators, 
which  are  based  on  the  communicating  sequential  processes 
(CSP)  language  for  describing  the  behavior  of  reactive  systems 
(Hoare,  1978),  enable  the  expression  of  dialogue  control  flow 
formulae  for  task  accomplishment.  Those  specific  operators, 
taken  from  the  domain  of  reactive  systems  and  process  syn¬ 
chronization,  have  been  selected  due  to  their  appropriateness 
in  expressing  temporal  relationships  of  user  actions  and  tasks. 
However, designers may freely employ additional operators as needed
(i.e., the set is not closed) or may choose to document dialogue
sequencing and control outside the task structure in natural
language, when it engages more comprehensive algorithmic logic
(e.g., consider the verbally documented precondition "if the logged
user is a guest, no sign-in is required; otherwise, the access
privileges should be checked and the sign-in dialogue is activated
before chat").

The concept of polymorphic task hierarchies is illustrated in
Fig. 55.2. Each alternative task decomposition is called a
“decomposition style,” or simply a “style,” and is to be given an
appropriate descriptive name by the designers. Alternative task
subhierarchies are attached to their respective styles. The example
polymorphic task hierarchy of Fig. 55.2 indicates the way two
alternative dialogue styles for a “Delete File” task can be
designed: one exhibiting direct manipulation properties with
object-function syntax (i.e., the file object is selected prior to
the operation to be applied) with no confirmation, the other
realizing a modal dialogue with function-object syntax (i.e., the
delete function is selected, followed by the identification of the
target file) and confirmation.

Additionally,  the  example  demonstrates  the  case  of  physical 
specialization.  Since  selection  is  an  abstract  task,  it  is  possible 
to  design  alternative  ways  for  physically  instantiating  the  selec¬ 
tion  dialogue  (see  Figure  55.2,  lower-part),  via  scanning  tech¬ 
niques  for  motor-impaired  users,  via  3-D  hand  pointing  on  3-D 
auditory  cues  for  blind  people,  via  enclosing  areas  (e.g.,  irregu¬ 
lar  rubber  banding)  for  sighted  users,  and  via  Braille  output  and 
keyboard  input  for  deaf  and  blind  users.  The  unified  user  inter¬ 
face  design  method  does  not  require  the  designer  to  follow  the 
polymorphic  task  decomposition  all  the  way  down  the  user-task 
hierarchy,  until  primitive  actions  are  met.  A  nonpolymorphic 
task  can  be  specialized  at  any  level,  following  any  design  method 
chosen  by  the  interface  designer.  For  instance,  in  Fig.  55.2 
(lower  part)  graphical  mock  ups  are  employed  to  describe  each 
of  the  alternative  physical  instantiations  of  the  abstract  selection 
task.  It  should  be  noted  that  the  interface  designer  is  not  con¬ 
strained  to  using  a  particular  model,  such  as  CSP  operators,  for 
describing user actions for device-level interaction (e.g., drawing,
drag-and-drop, concurrent input). Instead, an alternative may be
preferred, for example an event-based representation such as ERL
(Hill, 1986) or UAN (Hartson, Siochi, & Hix, 1989).

FIGURE 55.2. The polymorphic task hierarchy concept, where
alternative decomposition “styles” are supported (upper part), and
an exemplary polymorphic decomposition, which includes physical
design annotation (lower part).

As  discussed  in  more  detail  in  the  subsequent  sections,  de¬ 
sign  polymorphism  entails  a  decision-making  capability  for  con¬ 
text-sensitive  selection  among  alternative  artifacts,  to  assemble 
a  suitable  interface  instance,  while  task  operators  support  tem¬ 
poral  relationships  and  access  restrictions  applied  to  the  inter¬ 
active  facilities  of  a  particular  interface  instance. 
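A polymorphic task hierarchy can be pictured as a tree in which a
node carries either plain subtasks or a set of named styles, each
with its own subhierarchy. The following Python sketch is one
possible representation, offered under stated assumptions: the class
names Task and Style, the instantiate function, and the "expertise"
attribute used to choose between the two “Delete File” styles are
hypothetical and do not come from the method itself. Task operators
and sequencing are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Style:
    """One decomposition style: a named alternative with its subtasks."""
    name: str
    condition: dict                       # attribute values it targets
    subtasks: list = field(default_factory=list)


@dataclass
class Task:
    """A task is unimorphic (plain subtasks) or polymorphic (styles)."""
    name: str
    subtasks: list = field(default_factory=list)
    styles: list = field(default_factory=list)


def instantiate(task: Task, profile: dict) -> list:
    """Return the leaf tasks selected for the given attribute values."""
    children = task.subtasks
    for style in task.styles:
        # A style matches when every attribute it names has that value.
        if all(profile.get(k) == v for k, v in style.condition.items()):
            children = style.subtasks
            break
    if not children:
        # Leaf task, or a polymorphic task with no matching style.
        return [task.name]
    leaves = []
    for child in children:
        leaves.extend(instantiate(child, profile))
    return leaves


# The "Delete File" example; the expertise conditions are assumed.
delete_file = Task("Delete File", styles=[
    Style("direct manipulation", {"expertise": "expert"},
          [Task("Select file"), Task("Select delete")]),
    Style("modal dialogue", {"expertise": "novice"},
          [Task("Select delete"), Task("Select file"),
           Task("Confirm delete")]),
])
```

Because each style owns its subhierarchy, adding a third style for a
new attribute value touches only the styles list of the affected
task, which mirrors the locality argument made for the design method.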


The  Conduct  of  the  Method 

In  the  unified  user  interface  design  method,  there  are  three  cat¬ 
egories  of  design  artifacts,  all  of  which  are  subject  to  polymor¬ 
phism  on  the  basis  of  varying  user-  and  usage-context  parameter 
values.  These  three  categories  are  (see  Fig.  55.3) 

1.  User  tasks,  relating  to  what  the  user  has  to  do;  user  tasks  are 
the  center  of  the  polymorphic  task  decomposition  process. 

2.  System  tasks,  representing  what  the  system  has  to  do,  or 
how  it  responds  to  particular  user  actions  (e.g.,  feedback);  in 
the  polymorphic  task  decomposition  process,  system  tasks 
are  treated  in  the  same  manner  as  user  tasks. 

3.  Physical  design  that  concerns  the  various  physical  interface 
components  on  which  user  actions  corresponding  to  the  as¬ 
sociated  user  task  are  to  be  performed;  the  physical  struc¬ 
ture  may  also  be  subject  to  polymorphism. 

System  tasks  and  user  tasks  may  be  freely  combined  within 
task  formulas,  defining  how  sequences  of  user-initiated  actions 
and  system-driven  actions  interrelate.  The  physical  design,  pro¬ 
viding  the  interaction  context,  is  always  associated  with  a  par¬ 
ticular  user  or  system  task.  It  provides  the  physical  dialogue  pat¬ 
tern  associated  to  a  task-structure  definition.  Hence,  it  plays  the 
role  of  annotating  the  task  hierarchy  with  physical  design  infor¬ 
mation.  An  example  of  such  annotation  is  shown  in  Fig.  55.2, 
where  the  physical  designs  for  the  “Select  Delete”  task  are  ex¬ 
plicitly  depicted. 


FIGURE 55.3. The three artifact categories in the Unified User
Interface design method, for which polymorphism may be applied, and
how they relate to each other.

In some cases, given a particular user task, there is a need
for  differentiated  physical  interaction  contexts,  depending  on 
user-  and  usage-context  parameter  values.  Hence,  even  though 
the  task  decomposition  is  not  affected  (e.g.,  the  same  user  ac¬ 
tions  are  to  be  performed),  the  physical  design  may  have  to  be 
altered.  One  such  representative  example  is  relevant  to  chang¬ 
ing  particular  graphical  attributes,  on  the  basis  of  ethnographic 
user  attributes.  For  instance,  Marcus  (1996)  discussed  the  choice 
of  different  iconic  representations,  background  patterns,  visual 
message  structure,  and  so  forth,  on  the  basis  of  cultural  back¬ 
ground.  User  tasks,  and  in  certain  cases,  system  tasks,  need  not 
always  be  related  to  physical  interaction,  but  may  represent  ab¬ 
straction  on  either  user  or  system  actions.  For  instance,  if  the 
user  has  to  perform  a  selection  task,  then,  clearly,  the  physical 
means  of  performing  such  a  task  are  not  explicitly  defined,  un¬ 
less  the  dialogue  steps  to  perform  selection  are  further  de¬ 
composed. 

This  notion  of  continuous  refinement  and  hierarchical  analy¬ 
sis,  starting  from  higher  level  abstract  artifacts,  and  incremen¬ 
tally  specializing  towards  the  physical  level  of  interaction,  is  fun¬ 
damental  in  the  context  of  hierarchical  behavior  analysis,  either 
regarding  tasks  that  humans  have  to  perform  (Johnson,  John¬ 
son,  Waddington,  &  Shouls,  1988),  or  when  it  concerns  func¬ 
tional  system  design  (Saldarini,  1989).  At  the  core  of  the  unified 
user  interface  design  method  is  the  polymorphic  task  decom¬ 
position  process,  which  follows  the  methodology  of  abstract 
task  definition  and  incremental  specialization,  where  tasks  may 
be  hierarchically  analyzed  through  various  alternative  schemes. 
In  such  a  recursive  process,  involving  tasks  ranging  from  the  ab¬ 
stract  task  level,  to  specific  physical  actions,  decomposition  is 
applied  either  in  a  traditional  unimorphic  fashion  or  by  means  of 
alternative  styles.  The  overall  process  is  illustrated  in  Fig.  55.4; 
the  decomposition  starts  from  abstract  or  physical  task  design, 
depending  on  whether  top-level  user  tasks  can  be  defined  as  be¬ 
ing  abstract  or  not.  Next  follows  the  description  of  the  various 
transitions  (e.g.,  design  specialization  steps)  from  each  of  the 
four  states  illustrated  in  the  process  state  diagram  of  Fig.  55.4. 

FIGURE 55.4. The polymorphic task decomposition process in the
Unified User Interface design method.

Transitions from the abstract task design state. An abstract task
can be decomposed either in a polymorphic fashion, if user- and
usage-context attribute values pose the necessity for alternative
dialogue patterns, or in a traditional manner, following a
unimorphic decomposition scheme. In the case of a unimorphic
decomposition scheme, the transition is realized via a decomposition
action, leading to the task hierarchy decomposition state. In the
case of a polymorphic decomposition, the transition is realized via
a polymorphose action, leading to the alternative subhierarchies
design state.

Transitions from the alternative subhierarchies design state.
Reaching this state means that the required alternative dialogue
styles have been identified, each initiating a distinct subhierarchy
decomposition process. Hence, each such subhierarchy initiates its
own instance of the polymorphic task decomposition process. While
initiating each distinct process, the designer may start either from
the abstract task design state or from the physical task design
state. The former is pursued if the top-level task of the particular
subhierarchy is an abstract one. In contrast, the latter option is
relevant when the top-level task explicitly engages physical
interaction issues.

Transitions  from  the  task  hierarchy  decomposition 
state.  From  this  state,  the  subtasks  identified  need  to  be  fur¬ 
ther  decomposed.  For  each  subtask  at  the  abstract  level,  there  is 
a  subtask  transition  to  the  abstract  task  design  state.  Otherwise, 
if  the  subtask  explicitly  engages  physical  interaction  means,  a 
subtask  transition  is  taken  to  the  physical  task  design  state. 

Transitions  from  the  physical  task  design  state.  Phys¬ 
ical  tasks  may  be  further  decomposed  either  in  a  unimorphic 
fashion  or  in  a  polymorphic  fashion.  These  two  alternative  de¬ 
sign  possibilities  are  indicated  by  decompose  and  polymor¬ 
phose  transitions  respectively. 
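The four design states and their transitions can be summarized as a
small state machine. The Python sketch below is a reading of the
description above rather than a definitive model: the action name
"subhierarchy" for leaving the alternative subhierarchies design
state is an assumption (the text does not name that transition), and
the target states of decompose and polymorphose from the physical
task design state are assumed to match those from the abstract task
design state.

```python
ABSTRACT = "abstract task design"
PHYSICAL = "physical task design"
HIERARCHY = "task hierarchy decomposition"
ALTERNATIVES = "alternative subhierarchies design"


def next_state(state: str, action: str, abstract: bool = True) -> str:
    """Return the design state reached from `state` via `action`.

    `abstract` says whether the next task to design (a subtask, or the
    top-level task of a subhierarchy) is abstract or physical.
    """
    if state in (ABSTRACT, PHYSICAL):
        # Unimorphic versus polymorphic decomposition of the task.
        targets = {"decompose": HIERARCHY, "polymorphose": ALTERNATIVES}
        if action in targets:
            return targets[action]
    elif state == HIERARCHY and action == "subtask":
        return ABSTRACT if abstract else PHYSICAL
    elif state == ALTERNATIVES and action == "subhierarchy":
        # Hypothetical action name: each style starts its own process.
        return ABSTRACT if abstract else PHYSICAL
    raise ValueError(f"no transition {action!r} from {state!r}")
```

Any other combination of state and action raises ValueError, which
reflects that the process diagram defines no further transitions.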

Relationships  Among  Alternative  Styles 

The need for alternative styles emerges during the design process
when it is identified that there are particular user- and
usage-context attribute values not addressed by the dialogue
artifacts that have already been designed. Starting from this
observation, one could argue that all alternative styles for a
particular polymorphic artifact are mutually exclusive (in this
context, exclusion means that, at runtime, only one of those styles
may be active).

However, there are cases in which it is meaningful to make artifacts
belonging to alternative styles concurrently available in a single
adapted interface instance. A typical case concerns file management
tasks, where two alternative but compatible artifacts, such as a
direct-manipulation one and a command-based one, may coexist during
runtime (i.e., task-level multimodality). In the unified user
interface design method, four design relationships between
alternative styles are distinguished (see Fig. 55.5), defining
whether alternative styles may be concurrently present at runtime.
These four fundamental relationships reflect pragmatic, real-world
design scenarios.

Exclusion

Relates many styles. Only one from the alternative styles may be
present.

Compatibility

Relates many styles. Any of the alternative styles may be present.

Substitution

Relates two groups of styles together. When the second is made
“active” at run-time, the first should be “deactivated”.

Augmentation

Relates one style with a group of styles. On the presence of any
style from the group at run-time, the single style may be also
“activated”.

FIGURE 55.5. Design relationships among alternative styles, and
their run-time interpretation.

Exclusion. The exclusion relationship is applied when the various
alternative styles are deemed usable only within the space of their
target user- and usage-context attribute values.
For  instance,  assume  that  two  alternative  artifacts  for  a  particu¬ 
lar  subtask  are  being  designed,  aiming  to  address  the  user- 
expertise  attributes:  one  targeted  to  users  qualified  as  novice, 
and the other targeted to expert users. Then, these two are defined
to be mutually exclusive, since it is probably meaningless to
activate both dialogue patterns concurrently. For example, at
runtime a novice user might be offered a functionally simple
alternative of a task, whereas an expert user would be provided with
additional functionality and greater freedom in selecting different
ways to accomplish the same task.

Compatibility.  Compatibility  is  useful  among  alternative 
styles  for  which  the  concurrent  presence  during  interaction  al¬ 
lows  the  user  to  perform  certain  actions  in  alternative  ways, 
without introducing usability problems. The most important
application of compatibility is task multimodality, as previously
discussed for the file management tasks.

Substitution.  Substitution  has  a  very  strong  connection 
with  adaptivity  techniques.  It  is  applied  in  cases  where,  during 
interaction,  it  is  decided  that  some  dialogue  patterns  need  to 
be  substituted  by  others.  For  instance,  the  ordering  and  the 
arrangement  of  certain  operations  may  change  on  the  basis  of 
monitoring  data  collected  during  interaction,  through  which 
information  such  as  frequency  of  use  and  repeated  usage  pat¬ 
terns  can  be  extracted.  Hence,  particular  physical  design  styles 
would  need  to  be  cancelled,  while  appropriate  alternatives 
would need to be activated. This sequence of actions, that is,
cancellation followed by activation, is the realization of
substitution.  Thus,  in  the  general  case,  substitution  involves  two 
groups  of  styles:  some  styles  are  cancelled  and  substituted  by 
other  styles  that  are  activated  afterwards. 

Augmentation.  Augmentation  aims  to  enhance  the  in¬ 
teraction  with  a  particular  style  that  is  found  to  be  valid  but  not 
sufficient  to  facilitate  the  user’s  task.  To  illustrate  this  point,  as¬ 
sume  that,  during  interaction,  the  user  interface  detects  that  the 
user  is  unable  to  perform  a  certain  task.  This  would  trigger  an 
adaptation  (in  the  form  of  adaptive  action)  aiming  to  provide 
task-sensitive  guidance  to  the  user.  Such  an  action  should  not 
aim  to  invalidate  the  active  style  (by  means  of  style  substitution), 
but  rather  to  augment  the  user’s  capability  to  accomplish  the 
task  more  effectively  by  providing  informative  feedback.  Such 
feedback  can  be  realized  through  a  separate  but  compatible 
style.  Therefore,  it  follows  that  the  augmentation  relationship 
can be assigned to two styles when one can be used to enhance
the interaction while the other is active. Thus, for instance,
the  adaptive  prompting  dialogue  pattern,  which  provides  task- 
oriented  help,  may  be  related  via  an  augmentation  relationship 
with  all  alternative  styles  (of  a  specific  task)  provided  that  it  is 
compatible  with  them. 
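The runtime interpretation of the four relationships can be sketched
as a function that updates the set of active styles when a new style
is requested. The Python sketch below is an illustration under
stated assumptions: the tuple encoding of relationships, the function
name resolve, and all style names in RELATIONS are hypothetical, and
the sketch does not handle deactivating an augmenting style when the
style it augments is later removed.

```python
def resolve(active: set, request: str, relations: list) -> set:
    """Return the active style set after a request to activate a style.

    Each relation is a tuple whose first item names the relationship:
      ("exclusion", {styles})           only one of the styles is active
      ("compatibility", {styles})       any of the styles may be active
      ("substitution", {old}, {new})    activating a new style cancels old
      ("augmentation", style, {group})  style needs a group member active
    """
    # Augmentation is checked first: the augmenting style may only be
    # activated while some style from its group is present.
    for rel in relations:
        if rel[0] == "augmentation" and rel[1] == request:
            if not (active & rel[2]):
                return set(active)
    result = set(active)
    for rel in relations:
        if rel[0] == "exclusion" and request in rel[1]:
            result -= rel[1]       # cancel the mutually exclusive styles
        elif rel[0] == "substitution" and request in rel[2]:
            result -= rel[1]       # cancel the substituted group
        # Compatibility needs no action: the styles simply coexist.
    result.add(request)
    return result


# Hypothetical relationships for illustration only.
RELATIONS = [
    ("exclusion", {"novice", "expert"}),
    ("compatibility", {"direct_manipulation", "command_based"}),
    ("substitution", {"alphabetical_menu"}, {"frequency_menu"}),
    ("augmentation", "adaptive_prompting", {"novice", "expert"}),
]
```

With these relations, requesting "expert" while "novice" is active
leaves only "expert" active, whereas requesting "adaptive_prompting"
with no style of its group active leaves the set unchanged.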


UNIFIED  INTERFACE  ENGINEERING 


In  the  context  of  a  unified  user  interface,  upon  startup  and  dur¬ 
ing  runtime,  the  software  user  interface  relies  on  the  particular 
user  and  context  profiles  to  assemble  the  eventual  interface  on 
the  fly,  collecting  and  gluing  together  the  constituent  interface 
components  required  for  the  particular  end  user  and  usage  con¬ 
text.  Such  constituent  components  are  the  alternative  artifacts 
identified  during  the  user  interface  design  process,  which  need 
to  be  appropriately  transformed  in  the  development  phase  to 
an  implementation  form. 

Effectively,  a  unified  user  interface  consists  of  runtime  com¬ 
ponents,  each  with  a  distinctive  role  in  performing  at  runtime 
a  type  of  an  interface  assembly  process,  by  selecting  the  most 
appropriate  dialogue  patterns  from  the  available  implemented 
design space (i.e., the organized collection of all dialogue
artifacts produced during the design phase). Examples of such
intelligent selection and assembly of user interface components for
adapted interaction delivery are provided in Figs. 55.6 and 55.7,
taken from the AVANTI Project (Stephanidis et al., 2001).

A  unified  user  interface  does  not  constitute  a  monolithic 
software  system  but  becomes  a  distributed  architecture  con¬ 
sisting  of  independent  intercommunicating  components,  pos¬ 
sibly  implemented  with  different  software  methods/tools  and 
residing  at  different  physical  locations.  These  components  co¬ 
operate  to  perform  adaptation  according  to  the  individual  end- 
user  attributes  and  the  particular  usage  context.  At  runtime,  the 
overall  adapted  interface  behavior  is  realized  by  two  comple¬ 
mentary  classes  of  system-initiated  actions: 

1.  Adaptations  driven  from  initial  user  and  context  information, 
acquired  without  performing  interaction  monitoring  analysis 
(i.e., what is known before starting to observe the user or the
usage context)

2.  Adaptations  decided  on  the  basis  of  information  inferred  or 
extracted by performing interaction-monitoring analysis (i.e.,
what  is  learned  by  observing  the  user  or  the  usage  context) 

The former behavior is referred to as “adaptability” (i.e., initial
automatic adaptation, performed before initiation of interaction),
reflecting the capability of the interface to proactively and
automatically tailor itself to the attributes of each individual end
user. The latter behavior is referred to as “adaptivity” (i.e.,
continuous automatic adaptation) and characterizes the capability of
the interface to cope with the dynamically changing or evolving
characteristics of users and usage contexts. Adaptability is crucial
to ensure accessibility, since it is essential to provide, before
initiation of interaction, a fully accessible interface instance to
each individual end user. Adaptivity can be applied only on
accessible running interface instances (i.e., ones with which the
user is capable of performing interaction), since interaction
monitoring is required for the identification of changing or
emerging decision parameters that may drive dynamic interface
enhancements. The complementary roles of adaptability and adaptivity
are depicted in Fig. 55.8. This fundamental distinction is made due
to the different runtime control requirements between those two key
classes of adaptation behaviors, requiring different software
engineering policies.

FIGURE 55.6. Three different interface versions of the AVANTI
browser produced by adaptation-oriented selection of different
alternative styles.

FIGURE 55.7. Alternative augmentation-oriented styles for
motor-impaired user access activated at run-time in the AVANTI
browser.

FIGURE 55.8. The complementary roles of adaptability (left) and
adaptivity (right) as realised in Unified User Interfaces, to
provide user- and usage-context-adapted behaviour.


Architectural  Overview 

In this section, the runtime architecture for unified user
interfaces is discussed, outlining the adopted architectural
components with information regarding (a) the functional role,
(b) the runtime behavior, (c) the encapsulated content, and
(d) the implementation method. The components of the unified
user interface architecture are (see Fig. 55.9):

•  User-Information  Server  (UIS) 

•  Context  Parameters  Server  (CPS) 

•  Decision  Making  Component  (DMC) 

•  Dialogue  Patterns  Component  (DPC) 

User-information server. Its functional role is to supply
user attribute values (a) known offline, without performing
interaction-monitoring analysis (e.g., motor/sensory abilities,
age, nationality, etc.) and (b) detected online, from real-time
interaction-monitoring analysis (e.g., fatigue, loss of
orientation, inability to perform the task, interaction
preferences, etc.). During runtime, it plays a two-fold role:
(a) it constitutes a server that maintains and provides
information regarding individual user profiles, and (b) it
encompasses user-representation schemes, knowledge-processing
components, and design information used to detect user
properties or characteristics dynamically. This component may
need to employ alternative ways of representing user-oriented
information. In this sense, a repository of user profiles serves
as a central database of individual user information (i.e., a
registry). In many cases, a profile structured as a plain list of
typed attributes will suffice; this model, though quite simple,
has proved in practice to be very powerful and flexible (it can
be stored in a database, thus turning the profile manager into a
remotely accessed database). Additionally, more sophisticated
user representation and modeling methods can be employed,
including support for stereotypes of particular user categories.
If dynamic user-attribute detection is to be supported, the
content may include dynamically collected interaction-monitoring
information, design information, and knowledge-processing
components.

FIGURE 55.9. The four basic components of the Unified User
Interface macro-architecture outlining run-time communication
links.
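
To make the "list of typed attributes" profile structure concrete, a minimal Python sketch follows. The schema, the attribute names other than age, and the `ProfileManager` class are hypothetical and chosen only for illustration; they are not taken from the AVANTI implementation, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AttributeType:
    """Declares the Python type and an optional validity check for one attribute."""
    py_type: type
    is_valid: Any = None  # optional callable(value) -> bool

    def check(self, value: Any) -> bool:
        if not isinstance(value, self.py_type):
            return False
        return self.is_valid is None or bool(self.is_valid(value))


# Hypothetical schema: attribute names and value domains are illustrative only.
SCHEMA: Dict[str, AttributeType] = {
    "age": AttributeType(int, lambda v: 5 <= v <= 110),
    "language": AttributeType(str),
    "motor_ability": AttributeType(str, lambda v: v in {"full", "limited", "none"}),
}


@dataclass
class ProfileManager:
    """In-memory stand-in for a (possibly remote) database of user profiles."""
    profiles: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def store(self, user_id: str, attributes: Dict[str, Any]) -> None:
        # Reject unknown attributes and values outside the declared domain.
        for name, value in attributes.items():
            if name not in SCHEMA or not SCHEMA[name].check(value):
                raise ValueError(f"invalid attribute {name!r}: {value!r}")
        self.profiles[user_id] = dict(attributes)

    def fetch(self, user_id: str) -> Dict[str, Any]:
        # Return a copy so callers cannot modify the stored profile.
        return dict(self.profiles[user_id])
```

Keeping the schema separate from the stored values is one possible design choice; it lets the same attribute names and value domains be shared with whatever component consumes the profile.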

From a knowledge representation point of view, static or
preexisting user knowledge may be encoded in any appropriate
form, depending on the type of information that the
user-information server should feed to the decision-making
process. Moreover, additional knowledge-based components may be
employed for processing retrieved user profiles, drawing
assumptions about the user, or updating the original user
profiles. Fig. 55.10 presents the internal architecture of the
user-information server employed in the AVANTI web browser. It
should be noted that the first version of the AVANTI web browser
produced in the context of the AVANTI Project employed BGP-MS
(Kobsa & Pohl, 1995) for the role of the user-information server.
The profile manager has been implemented as a database of
profiles. The two other subsystems (i.e., the monitoring manager
and the modeling and inference subsystem) are needed only if
dynamic user-attribute detection is required. The interaction
monitoring history has been implemented as a time-stamped list of
monitoring events (the structure of monitoring events is
described in the analysis of communication semantics) annotated
with simple dialogue design context information (i.e., just the
subtask name). In the user models, all of the types of
dynamically detected user attributes have been identified (e.g.,
inability to perform a task and loss of orientation, which were
actually the two dynamically detectable attributes required by
the design of the AVANTI web browser). Each such attribute is
associated with its corresponding behavioral action patterns. In
this specific case, the representation of the behavioral patterns
has been implemented, together with the pattern-matching
component, by means of state automata.

For instance, one heuristic pattern to detect loss of orientation
has been defined as follows: the user moves the cursor inside the
web-page display area, without selecting a link, for more than
Δt seconds. The state automaton (see Fig. 55.11) starts recording
mouse moves in the page area, based on incoming monitored mouse
moves, and finally triggers detection when no intermediate
activity is successfully performed by the user. This worked fine
from an implementation point of view. However, from the user
interface design point of view, all such heuristic assumptions
and behavioral patterns had to be extensively verified with real
users so as to assert the relationship between the observable
user behavior and the particular inferred user attributes. This
is a common issue in all adaptive systems that employ heuristics
for detecting user attributes at runtime; in practice, the
validity of the inferred assumptions depends on the
appropriateness of the specific user-action patterns chosen.

FIGURE 55.10. Internal micro-architecture of the User Information
Server. User profile is posted to the Decision Making Component
(DMC), while interaction-monitoring information is received from
the Dialogue Patterns Component (DPC).

FIGURE 55.11. An example of an augmented state automaton used as
an implementation technique for behavioural patterns; the state
automaton is directly hard-coded in the UIS side, while multiple
such automata co-exist.
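
The loss-of-orientation heuristic described above can be sketched as a small state automaton. The following Python sketch is only an illustration under stated assumptions: the event names (`mouse_move_in_page`, `link_selected`), the three states, and the class name are hypothetical, and the actual automaton of Fig. 55.11 may differ.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # no cursor movement recorded yet
    WANDERING = auto()  # cursor is moving in the page area, no link selected
    DETECTED = auto()   # pattern matched: possible loss of orientation


class LossOfOrientationAutomaton:
    """Fires when the cursor keeps moving in the page area for more than
    delta_t seconds without an intervening link selection."""

    def __init__(self, delta_t: float) -> None:
        self.delta_t = delta_t
        self.state = State.IDLE
        self.started_at = 0.0

    def on_event(self, kind: str, timestamp: float) -> bool:
        """Feed one monitoring event; return True if the pattern is detected."""
        if kind == "link_selected":
            # A successfully performed activity resets the automaton.
            self.state = State.IDLE
        elif kind == "mouse_move_in_page":
            if self.state is State.IDLE:
                # First move: start recording and remember when it began.
                self.state = State.WANDERING
                self.started_at = timestamp
            elif self.state is State.WANDERING:
                if timestamp - self.started_at > self.delta_t:
                    self.state = State.DETECTED
        return self.state is State.DETECTED
```

Several such automata, one per dynamically detectable attribute, could be fed from the same stream of time-stamped monitoring events. The threshold `delta_t` is a design parameter that, as noted above, would need validation with real users.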

Context  parameters  server.  The  purpose  of  this  com¬ 
ponent  is  to  supply  context  attribute  values  (machine  and  en¬ 
vironment)  of  two  types:  (a)  (potentially)  invariant,  meaning 
unlikely  to  change  during  interaction,  such  as  peripheral  equip¬ 
ment,  and  (b)  variant,  dynamically  changing  during  interaction 
(e.g.,  environment  noise,  failure  of  particular  equipment,  etc.). 
This  component  is  not  intended  to  support  device  indepen¬ 
dence,  but  to  provide  device  awareness.  Its  purpose  is  to  enable 
the  decision-making  component  to  select  those  interaction  pat¬ 
terns,  which,  apart  from  fitting  the  particular  end-user  attrib¬ 
utes,  are  also  appropriate  for  the  type  of  equipment  available  on 
the  end-user  machine. 

The  usage-context  attribute  values  are  communicated  to  the 
decision-making  component  before  the  initiation  of  interaction. 
Additionally,  during  interaction,  some  dynamically  changing  us¬ 
age-context  parameters  may  also  be  fed  to  the  decision-making 
component  for  decisions  regarding  adaptive  behavior.  For  in¬ 
stance,  assume  that  the  initial  decision  for  selecting  feedback 
leads  to  the  use  of  audio  effects.  Then,  the  dynamic  detection  of 
an  increase  in  environmental  noise  may  result  in  a  runtime  deci¬ 
sion  to  switch  to  visual  feedback  (the  underlying  assumption  be¬ 
ing  that  such  a  decision  does  not  conflict  with  other  constraints) . 

This component encompasses a listing of the various invariant
properties and equipment of the target machine (e.g., handheld
binary switches, speech synthesizer for English, high-resolution
display, 16-bit mode, 1024×768, noisy environment, etc.). In this
context, the more information regarding the characteristics of
the target environment and machine is encapsulated, especially
concerning I/O devices, the better the adaptation that can be
achieved (information initially appearing redundant is likely to
be used in future adaptation-oriented extensions).

The  registry  of  environment  properties  and  available  equip¬ 
ment  can  be  implemented  easily  as  a  profile  manager  in  the 
form  of  a  database.  Such  information  will  be  communicated  to 
the  decision-making  component  as  attribute/value  pairs.  How¬ 
ever,  if  usage-context  information  is  to  be  dynamically  collected, 
such  as  environment  noise,  or  reduction  of  network  bandwidth, 
the  installation  of  proper  hardware  sensors  or  software  moni¬ 
tors  becomes  mandatory. 
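
The following Python sketch illustrates the two roles described for this component: supplying attribute/value pairs before interaction and notifying interested parties when a monitored value changes. All names, the noise threshold, and the `choose_feedback` decision are hypothetical; the decision function only mirrors the audio-to-visual feedback example given earlier and is not the rule used in any actual system.

```python
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any], None]


class ContextParametersServer:
    """Keeps context attribute/value pairs and notifies listeners of changes."""

    def __init__(self, invariant: Dict[str, Any]) -> None:
        self._values: Dict[str, Any] = dict(invariant)
        self._listeners: List[Listener] = []

    def snapshot(self) -> Dict[str, Any]:
        """Attribute/value pairs to send before interaction starts."""
        return dict(self._values)

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def report(self, name: str, value: Any) -> None:
        """Called by a (hypothetical) hardware sensor or software monitor."""
        if self._values.get(name) != value:
            self._values[name] = value
            for listener in self._listeners:
                listener(name, value)


def choose_feedback(context: Dict[str, Any], noise_limit_db: float = 70.0) -> str:
    """Illustrative decision: prefer audio unless the environment is too noisy."""
    has_audio = context.get("speech_synthesizer") is not None
    noisy = context.get("noise_db", 0.0) > noise_limit_db
    return "audio" if has_audio and not noisy else "visual"
```

In this sketch, unchanged readings are not re-broadcast, so a decision-making component subscribed to the server is only asked to reconsider when a context value actually changes.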

Decision-making  component.  The  role  of  this  compo¬ 
nent  is  to  decide,  at  runtime,  the  necessary  adaptability  and 
adaptivity  actions  and  subsequently  to  communicate  those  to 
the  dialogue  patterns  component  (the  latter  being  responsible 
for  applying  adaptation-oriented  decisions).  To  decide  adapta¬ 
tion,  this  component  performs  a  kind  of  rule-based  knowledge 
processing,  to  match  end-user  and  usage-context  attribute  values 
to  the  corresponding  dialogue  artifacts,  for  all  the  various  dia¬ 
logue  contexts. 

This module encompasses the logic for deciding the necessary
adaptation actions on the basis of the user- and
context-attribute values received from the user-information
server and the context parameters server, respectively. Such
attribute values will be supplied to the decision-making
component prior to the initiation of interaction within different
dialogue contexts (i.e., initial values, resulting in initial
interface adaptation), as well as during interaction (i.e.,
changes in particular values, or detection of new values,
resulting in dynamic interface adaptations).

In  the  proposed  approach,  the  encapsulated  adaptation 
logic  should  reflect  predefined  decisions  during  the  design 
stage.  In  other  words,  the  inference  mechanisms  employ  well- 
defined  decision  patterns  that  have  been  validated  during  the 
design  phase  of  the  various  alternative  dialogue  artifacts.  In 
practice,  this  approach  leads  to  a  rule-based  implementation, 
in which embedded knowledge reflects adaptation rules that
have already been constructed and documented as part of the
design  stage.  This  decision-making  policy  is  motivated  by  the  as¬ 
sumption  that  if  a  human  designer  cannot  decide  upon  adapta¬ 
tion  for  a  dialogue  context,  given  a  particular  end  user  and  us¬ 
age  context,  then  a  valid  adaptation  decision  cannot  be  taken  by 
a  knowledge-based  system  at  runtime.  Later  in  this  paper,  while 
discussing  some  implementation  details  of  the  AVANTI  web 
browser,  specific  excerpts  from  the  rule  base  of  the  decision 
engine  will  be  discussed. 

The first remark regarding the implementation of decision making
concerns the knowledge it must have of (a) the various
alternative dialogue artifacts, namely how they are named (e.g.,
virtual keyboard) and for which dialogue context they have been
designed (e.g., http address text field), and (b) user- and
usage-context attribute names and their respective value domains
(e.g., attribute age being an integer in the range 5 ... 110).

The second issue concerns the input to the decision process,
namely individual user- and usage-context attribute values. Those
are received at runtime both from the user-information server and
from the context parameters server, either by request (e.g., the
decision-making component takes the initiative to request the
end-user and usage-context profile at start-up to draw
adaptability decisions) or by notification (e.g., when the
user-information server draws assumptions regarding dynamic user
attributes, or when the context parameters server identifies
dynamic context attributes).

The third issue concerns the format and structure of knowledge
representation. In all developments that we have carried out, a
rule-based logic implementation has proved adequate in practice.
Moreover, all interface designers engaged in the design process
emphasized that this type of knowledge representation is far
closer to their own rule-based way of thinking when deciding
adaptation. This remark led to excluding, at a very early stage,
other possible approaches, such as heuristic pattern matching,
weighting factor matrices, or probabilistic decision networks.
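
A rule-based decision step of the kind discussed above might look as follows in Python. This is a minimal sketch under stated assumptions: the `Rule` structure, the first-match policy, and the attribute name `motor_ability` are hypothetical, while the dialogue context and artifact names reuse the examples mentioned in the text. It is not the rule language of the AVANTI decision engine.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Attributes = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """One design-time adaptation rule for a given dialogue context."""
    context: str                             # e.g. "http address field"
    condition: Callable[[Attributes], bool]  # predicate over user/context attributes
    artifact: str                            # dialogue artifact to select


# Hypothetical rule base; attribute names and the default artifact are illustrative.
RULES: List[Rule] = [
    Rule("http address field",
         lambda a: a.get("motor_ability") == "limited",
         "virtual keyboard"),
    Rule("http address field",
         lambda a: True,                     # default when nothing else applies
         "plain text field"),
]


def decide(context: str, attributes: Attributes) -> Optional[str]:
    """Return the artifact of the first rule whose condition holds."""
    for rule in RULES:
        if rule.context == context and rule.condition(attributes):
            return rule.artifact
    return None
```

Because every rule is written down at design time, a context for which no rule matches yields no decision at all, which is in line with the policy that the runtime system should not decide what a human designer could not.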

The final issue concerns the representation of the outcomes of
the decision process in a form suitable for being communicated
to, and easily interpreted by, the dialogue patterns component.
In this context, practice has shown that two categories of
dialogue control actions suffice to communicate adaptation
decisions: (a) activation of specific dialogue components, and
(b) cancellation of previously activated dialogue components.
These two categories of adaptation actions provide the expressive
power necessary for communicating the dialogue-component
manipulation requirements that realize both adaptability and
adaptivity. Substitution is modeled by a message containing a
series of cancellation actions (i.e., the dialogue components to
be substituted), followed by the necessary number of activation
actions (i.e., the dialogue components to activate in place of
the cancelled components). Therefore, a substitution action is
implemented by transmitting those commands in a single message
(i.e., cancellation actions followed by activation actions). The
need to send in one message packaged information regarding the
cancelled component, together with the components that take its
place, emerges when the implemented interface requires knowledge
of all (or some) of the newly created components during
interaction. For instance, if the new components include a
container (e.g., a window object) with various embedded objects,
and if, upon the creation of the container, information on the
number and type of the particular contained objects is needed, it
is necessary to ensure that all the relevant information (i.e.,
all engaged components) is received as a single message. It
should be noted that, since each activation/cancellation command
always carries its target UI component identification,
substitution requests may engage components that are not
necessarily part of the same physical dialogue artifact. In
addition, the decision to apply substitution is the
responsibility of the decision-making component.
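
As a small illustration of how substitution can be expressed with only the two action categories, the Python sketch below packages cancellations followed by activations into one message. The `Action` fields and the `substitution` helper are hypothetical names introduced here; the actual message format used between the components is not specified in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Action:
    kind: str      # "activate" or "cancel"
    context: str   # dialogue context, e.g. a subtask name
    artifact: str  # design name of the target dialogue component


def substitution(context: str, old: List[str], new: List[str]) -> Tuple[Action, ...]:
    """Build one message: all cancellations first, then all activations."""
    cancels = [Action("cancel", context, name) for name in old]
    activates = [Action("activate", context, name) for name in new]
    return tuple(cancels + activates)
```

Returning the whole sequence as a single immutable tuple reflects the requirement that the receiver sees every engaged component at once, for example when a container needs to know its contained objects on creation.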

One issue regarding the expressive power of the activation and
cancellation decision categories concerns the way dynamic
interface updates (e.g., changing style or appearance, without
closing or opening interface objects) can be effectively
addressed. The answer to this question is related to the specific
connotation attributed to the notion of a dialogue component. A
dialogue component may not only implement physical dialogue
context, such as a window and embedded objects, but may also
concern the activation of dialogue control policies or be
realized as a particular sequence of interface manipulation
actions. In this sense, the interface updates are to be collected
in an appropriate dialogue implementation component (e.g., a
program function, an object class, a library module) to be
subsequently activated (e.g., called) when a corresponding
activation message is received. This is the specific approach
taken in the AVANTI web browser, which, from a software
engineering point of view, enabled a better organization of the
implementation modules around common design roles.

Dialogue  patterns  component.  This  component  is  re¬ 
sponsible  for  supplying  the  software  implementation  of  all 
the  dialogue  artifacts  that  have  been  identified  in  the  design 
process.  Such  implemented  components  may  vary  from  dia¬ 
logue  artifacts  that  are  common  across  different  user-  and 
usage-context  attribute  values  (e.g.,  no  adaptation  needed),  to 
dialogue  artifacts  that  will  map  to  individual  attribute  values 
(e.g.,  alternative  designs  have  been  necessitated  for  adapted  in¬ 
teraction).  Additionally,  as  it  has  been  previously  mentioned, 
apart  from  implementing  physical  context,  various  components 
may  implement  dialogue-sequencing  control,  perform  interface 
manipulation  actions,  maintain  shared  dialogue  state  logic,  or 
apply  interaction  monitoring. 

The dialogue patterns component should be capable of applying, at
runtime, activation or cancellation decisions originating from
the decision-making component. Additionally,
interaction-monitoring components may need to be dynamically
installed on, or uninstalled from, particular physical dialogue
components. This behavior serves the runtime
interaction-monitoring control requests from the user-information
server, providing continuous interaction-monitoring information
back to the user-information server for further intelligent
processing. The dialogue patterns component either embeds the
software implementation of the various dialogue components or is
aware of where those components physically reside by employing
dynamic query, retrieval, and activation methods. The former is
the typical method that can be used if the software
implementation of the components is provided locally by means of
software modules, libraries, or resident installed components.
Usually, most of the implementation is to be carried out in a
single programming language. The latter approach reflects the
scenario in which distinct components are implemented on top of
component-ware technologies, usually residing in local/remote
component repositories (also called “registries” or
“directories”), enabling reuse with dynamic deployment.

In the development of the AVANTI web browser, a combination of
these two approaches has been employed by implementing most of
the common dialogue components in a single language (actually in
C++, employing all the necessary toolkit libraries) while
implementing some of the alternative dialogue artifacts as
independent ActiveX components that were located and employed on
the fly. The experience from the software development of the
AVANTI web browser has shown that (a) the single-language
paradigm makes it far easier to perform quick implementation and
testing of interface components, and (b) the component-based
approach largely promotes binary format reuse of implemented
dialogue components while offering far better support for dynamic
interface assembly, which is the central engineering concept of
unified user interfaces (this issue will be elaborated upon in
the conclusion section of the paper).

The  microarchitecture  of  the  dialogue  patterns  component 
internally  employed  in  the  AVANTI  web  browser,  as  outlined  in 
Fig.  55.12,  emphasizes  internal  organization  to  enable  extensi¬ 
bility  and  evolution  by  adding  new  dialogue  components.  Ad¬ 
ditionally,  it  reflects  the  key  role  of  the  dialogue  patterns  com¬ 
ponent  in  applying  adaptation  decisions.  The  internal 
components  are  as  follows. 

The activation dispatcher locates the source of implementation of
a component (or simply uses its API, if it is a locally used
library) to activate it. In this sense, activation may imply a
typical instantiation in OOP terms, calling of particular service
functions, or activating a remotely located object. After a
component is activated, if cancellation may later be applied to
it, it is further registered in a local registry of activated
components. In this registry, the indexing parameters used are
the particular dialogue context (i.e., the subtask, for instance
“http address field”) and the artifact design descriptor (i.e.,
the unique descriptive name provided during the design phase, for
instance “virtual keyboard”). For some categories of components,
cancellation may not be defined during the design process,
meaning there is no reason to register those at runtime for
possible future cancellation (e.g., components with a temporal
nature that perform only some interface update activities).

FIGURE 55.12. Internal micro-architecture of the Dialogue
Patterns Component.

The  cancellation  dispatcher  locates  a  component  based  on 
its  indexing  parameters  and  calls  for  cancellation.  This  may 
imply  a  typical  destruction  in  OOP  terms,  calling  internally  par¬ 
ticular  service  functions  that  may  typically  perform  the  unob¬ 
trusive  removal  of  the  physical  view  of  the  cancelled  component 
or  the  release  of  a  remote  object  instance.  After  cancellation  is 
performed,  the  component  instance  is  removed  from  the  local 
registry. 
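
The interplay of the activation dispatcher, the cancellation dispatcher, and the local registry can be sketched as below. This Python sketch is an assumption-laden simplification: `Component`, `Dispatcher`, and `register_factory` are hypothetical names, local factories stand in for locating an implementation (which in practice might be a library call or a remote object), and `destroy` stands in for whatever cancellation means for a given component.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (dialogue context, artifact design descriptor)


class Component:
    """Minimal stand-in for a dialogue component."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.visible = True

    def destroy(self) -> None:
        self.visible = False


class Dispatcher:
    def __init__(self) -> None:
        self._factories: Dict[Key, Callable[[], Component]] = {}
        self._cancellable: Dict[Key, bool] = {}
        self.registry: Dict[Key, Component] = {}  # activated, cancellable components

    def register_factory(self, key: Key, factory: Callable[[], Component],
                         cancellable: bool = True) -> None:
        self._factories[key] = factory
        self._cancellable[key] = cancellable

    def activate(self, key: Key) -> Component:
        component = self._factories[key]()   # locate the implementation and instantiate
        if self._cancellable[key]:
            self.registry[key] = component   # remember it for later cancellation
        return component

    def cancel(self, key: Key) -> None:
        component = self.registry.pop(key)   # locate by indexing parameters
        component.destroy()                  # removed from the registry afterwards
```

Components registered with `cancellable=False` model the temporal components mentioned above, which are activated but never entered into the registry.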

The  monitoring  manager  plays  a  two-fold  role:  (a)  It  applies 
monitoring  control  requests  originated  from  the  user-informa¬ 
tion  server  by  first  locating  the  corresponding  dialogue  com¬ 
ponents  and  then  requesting  the  installation  (or  uninstallation) 
of  the  particular  monitoring  policy  (this  requires  implementa¬ 
tion  additions  in  dialogue  components,  for  performing  inter¬ 
action  monitoring  and  for  activating  or  deactivating  the  inter¬ 
action  monitoring  behavior);  and  (b)  it  receives  interaction 
monitoring  notifications  from  dialogue  components  and  posts 
those  to  the  user-information  server. 

The  communication  manager  is  responsible  for  dispatching 
incoming  communication  (activation,  cancellation,  and  moni¬ 
toring  control)  and  posting  outgoing  communication  (moni¬ 
toring  data  and  initial  adaptation  requests) .  One  might  observe 
an  explicit  link  between  the  dialogue  components  and  the  com¬ 
munication  manager.  This  reflects  the  initiation  of  interaction  in 
which  the  dialogue  control  logic  (residing  within  dialogue  com¬ 
ponents)  requests  iteratively  the  application  of  decision  mak¬ 
ing  (from  the  decision-making  component) .  Such  requests  will 
need  to  be  posted  for  all  cases  involving  dialogue  component  al¬ 
ternatives  for  which  adapted  selection  has  to  be  appropriately 
performed. 

The dialogue components typically encompass the real
implementation of physical dialogues, dialogue control logic, and
interaction-monitoring methods. In practice, it is hard to
accomplish isolated implementation of the dialogue artifacts as
independent black boxes that can be combined and assembled on the
fly by independent controlling software. In most designs, it is
common that physical dialogue artifacts are contained inside
other physical artifacts. In this case, if there are alternative
versions of the embedded artifacts, it turns out that to make
containers fully orthogonal and independent with respect to the
contained artifacts, one has to support intensive
parameterization and pay a heavier implementation overhead.
However, the gain is that the implementation of contained
artifacts can be independently reused across different
applications, while in the more monolithic approach, reuse
requires deployment of the container code (and recursively, of
its container too, if it is contained as well).

Adaptability  and  Adaptivity  Cycles 

The  completion  of  an  adaptation  cycle,  being  either  adaptability 
or  adaptivity,  is  realized  in  a  number  of  distributed  processing 
stages  performed  by  the  various  components  of  the  unified  ar¬ 
chitecture.  During  these  stages,  the  components  communicate 
with  each  other,  requesting  or  delivering  specific  pieces  of  in¬ 
formation.  Fig.  55.13  outlines  the  processing  steps  for  perform¬ 
ing  both  the  initial  adaptability  cycle  (to  be  executed  only  once), 
as well as the two types of adaptivity cycles (i.e., one starting
from  dynamic  context  attribute  values  and  another  starting  from 
interaction  monitoring  control).  Local  actions  indicated  within 
components  (in  each  of  the  four  columns)  are  either  outgoing 
messages,  shown  in  bold  typeface,  or  necessary  internal  pro¬ 
cessing,  illustrated  via  shaded  rectangles. 

Dynamic  User  Interface  Assembly 

The concept of dynamic interface assembly reflects the key
runtime mechanisms to support adaptability in unified user
interfaces. Previous work in adaptive interaction involved
techniques such as detection of user attributes, adaptive
prompting, and localized lexical-level modifications (e.g.,
rearranging menu options, or adding/removing operation buttons).
The issue of making the interface fit individual users from the
beginning has been addressed in the past mainly as a
configuration problem, requiring interface developers to supply
configuration editors so that end users could fit the interface
to their particular preferences. However, such methods are
limited to fine-tuning some lexical-level aspects of the
interface (e.g., toolbars, menus), and they always require
explicit user intervention; that is, there is no automation. In
this context, the notion of adaptability, as realized in unified
user interfaces, offers new possibilities for automatically
adapted interactions, while the architecture and runtime
mechanisms to accomplish dynamic interface assembly constitute a
unique software engineering perspective.

Some similarities with dynamic interface assembly can be found in
typical web-based applications delivering dynamic content. The
software engineering methods employed in such cases are based on
the construction of application templates (technologies such as
Active Server Pages, ASP, by Microsoft or Java Server Pages, JSP,
by JavaSoft are usually employed) with embedded queries for
dynamic information retrieval, delivering to the user a web page
assembled on the fly. In this case, there are no alternative
embedded components, just content to be dynamically retrieved,
while the web-page assembly technique is mandatory when
HTML-based web pages are to be delivered to the end user (in
HTML, each time the content changes, a different HTML page has to
be written). However, in case a full-fledged embedded component
is developed (e.g., ActiveX object or Java Applet), no runtime
assembly is required, since the embedded application internally
manages content extraction and display, as a common desktop
information retrieval application.

FIGURE 55.13. Processing steps, engaging communication among
architectural components, to perform initial adaptability cycles
(A), as well as the two types of adaptivity cycles (B). Requests
originated from Dialogue Patterns Component are indicated with
dashed ovals, communicated messages are shown with special
banners, processing points are drawn with shaded rectangles,
while logical ordering is designated with numeric oval labels.

FIGURE 55.14. Illustration of the dynamic interface assembly
process as an incremental hierarchical construction procedure.


The  implementation  of  unified  user  interfaces  is  organized 
in  hierarchically  structured  software  templates,  in  which  the 
key  placeholders  are  parameterized  container  components. 
This  hierarchical  organization,  as  it  has  been  reflected  in  the 
development  excerpts,  mirrors  the  fundamentally  hierarchical 
constructional  nature  of  interfaces.  The  ability  to  diversify  and 
support  alternatives  in  this  hierarchy  is  due  to  containment 
parameterization,  while  the  adapted  assembly  process  is  real¬ 
ized  by  selective  activation,  engaging  remote  decision  making 
on  the  basis  of  end-user  and  usage-context  information.  The 
dynamic  interface  assembly  process  reflects  the  hierarchical 
traversal  in  the  task  hierarchy,  starting  from  the  root,  to  decide, 
locate,  instantiate,  and  initiate  appropriately  every  target  user 
interface  component  (see  Fig.  55.15). 

This  process  primarily  concerns  the  interface  components 
that  implement  alternative  styles.  From  the  implementation 
point  of  view,  the  following  software  design  decisions  have 
been  made: 

•  The  task  hierarchy  has  been  implemented  as  a  tree  data 
structure  with  polymorphic  nodes  triggering  decision-making 
sessions  (see  Fig.  55.15). 
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FIGURE  55.15.  Illustrating  the  hierarchical  posting  of  decision 
requests,  causing  decision  sessions  for  each  polymorphic  task 
(shown  with  decomposition  alternatives  as  dashed  lines),  and 
marking  of  selected  alternative  styles  (i.e.,  interface  compo¬ 
nents),  after  each  decision  session  completes. 


•  Interface  components  have  been  implemented  as  distinct  in¬ 
dependent  software  modules,  implementing  generic  con¬ 
tainment  application  programming  interfaces  (APIs),  while 
exposing  a  singleton  control  API  for  dynamic  instantiation 
and  name-based  lookup. 

•  The  interface  assembly  procedure  is  actually  carried  out  via 
two  successive  hierarchical  passes: 

-  Execution  of  decision  sessions,  to  identify  the  specific 
styles  for  polymorphic  task  contexts,  which  will  be  part  of 
the  eventually  delivered  user  interface 

-  Interface  construction,  through  instantiation  and  initiation 
of  all  interface  components  for  the  decided  styles 
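The two passes listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AVANTI implementation: the names TaskNode, decide_styles, and build_interface are hypothetical, and the decide callback merely stands in for the remote decision making.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TaskNode:
    """Task-hierarchy node; it is polymorphic when it lists alternative styles."""
    name: str
    styles: List[str] = field(default_factory=list)
    children: List["TaskNode"] = field(default_factory=list)
    selected: Optional[str] = None  # style marked by the decision session


def decide_styles(node: TaskNode, decide: Callable[[str, List[str]], str]) -> None:
    """Pass 1: walk from the root, running one decision session per polymorphic node."""
    if node.styles:
        node.selected = decide(node.name, node.styles)
    for child in node.children:
        decide_styles(child, decide)


def build_interface(node: TaskNode, factories: Dict[str, Callable[[], object]]) -> List[object]:
    """Pass 2: instantiate the component registered under each decided style name."""
    built = [factories[node.selected]()] if node.selected is not None else []
    for child in node.children:
        built.extend(build_interface(child, factories))
    return built


# Hypothetical stand-in for remote decision making, driven by a user profile.
profile = {"webknowledge": "low"}


def decide(task: str, styles: List[str]) -> str:
    if task == "linkselection" and profile["webknowledge"] not in ("good", "normal"):
        return "push button"
    return styles[0]


root = TaskNode("link", children=[
    TaskNode("linkselection", styles=["underlined text", "push button"]),
])
# Name-based lookup of component factories (strings stand in for real components).
factories = {
    "underlined text": lambda: "UnderlinedTextLink",
    "push button": lambda: "PushButtonLink",
}
decide_styles(root, decide)
components = build_interface(root, factories)
print(components)
```

Keeping the decision pass separate from the construction pass means that no component is instantiated before every polymorphic task context has a marked style.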


PL(A1) = PL(A11) × PL(A12) × PL(A13)
PL(A11) = 3
PL(A12) = ...
PL(A13) = ...

PL(A2) = PL(A21) × PL(A22)
PL(A21) = 4
PL(A22) = ...

PL(A3) = PL(A31) × PL(A32)
PL(A31) = 2
PL(A32) = 5


FIGURE  55.16.  The  structure  of  polymorphic  containment 
hierarchies.  Alternative  contained  components  implement  a 
common  abstract  interface  that  containers  use,  and  vice  versa. 
PL  indicates  the  maximum  polymorphism  factor,  which  provides 
the  total  number  of  all  possible  different  run-time  incarnations 
of  an  interface  component,  recursively  defined  as  the  product 
of  the  polymorphic  factors  of  constituent  component  classes. 
Not  all  PL  combinations  may  be  semantically  viable. 


Polymorphic  Containment  Hierarchies 

In  Fig.  55.16,  the  concept  of  parametric  container  hierarchies  is 
illustrated,  while  in  Fig.  55.17,  an  instantiation  of  the  concept  is 
shown  for  the  AVANTI  web  browser.  Container  classes  expose 
their  containment  capabilities  and  the  type  of  supported  con¬ 
tained  objects  by  defining  abstract  interfaces  (e.g.,  abstract  OOP 
classes)  for  all  of  the  contained  component  classes. 
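The recursive definition of the polymorphism factor PL given in Fig. 55.16 can be illustrated with a short sketch. The encoding is an assumption made only for illustration: a leaf component class is represented by its number of alternative implementations, and a container by the list of its constituent component classes.

```python
from math import prod


def polymorphism_factor(component):
    """Return PL: a leaf is its count of alternatives, a container the product
    of the PL values of its constituent component classes."""
    if isinstance(component, int):
        return component
    return prod(polymorphism_factor(part) for part in component)


# The fully specified branch of Fig. 55.16: PL(A31) = 2 and PL(A32) = 5.
a3 = [2, 5]
print(polymorphism_factor(a3))
```

As the caption notes, the result is an upper bound on the number of run-time incarnations, since not all combinations may be semantically viable.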

These  interfaces,  defined  by  container  class  developers,  con¬ 
stitute  the  programming  contract  between  the  container  and 
the  contained  classes.  In  this  manner,  alternative  derived  con- 
tained-component  classes  may  be  instantiated  at  runtime  as 
constituent  elements  of  a  container.  Following  the  definition  of 
polymorphic  factor  PL,  which  provides  a  practical  metric  of  the 
number  of  possible  alternative  runtime  configurations  of  a  com¬ 
ponent  (not  all  of  which  may  be  semantically  viable),  the  PL  of 
the  top-level  application  component  gives  the  number  of  the 
possible  alternative  dynamically  assembled  interface  instances. 
From  a  programming  point  of  view,  in  the  AVANTI  web  browser, 
the  activation  control  of  dialogue  components  for  runtime  as¬ 
sembly  has  been  mainly  realized  through  typical  library  function 
calls.  Such  function  calls  engage  object  instances  corresponding 
to  dialogue  components,  without  employing  any  component- 
ware  technology.  Hence,  this  runtime  assembly  behavior  has 
been  accomplished  without  the  need  of  locating,  fetching,  and 
combining  components  together.  Nevertheless,  efforts  have 
been  devoted  to  applying  and  testing  the  latter  approach  in  real 


FIGURE  55.17.  Parametric  polymorphic  containment  with  vari¬ 
ant  constituent  components  in  the  AVANTI  browser.  The  indica¬ 
tion  "Empty”  indicates  components  whose  presence  may  have 
to  be  omitted  upon  dynamic  interface  delivery  for  certain  user 
categories. 
practice, by employing a component-ware technology (DCOM/
ActiveX) for a limited number of dialogue components. This
required a more labor-intensive implementation approach (from
a C++ point of view, while isolated testing of components with
Visual Basic was far easier), for packaging dialogues to make them
component enabled, as well as for further activating and using
them at runtime. However, there are some evident advantages:

•  Dialogue  components  need  not  be  carried  altogether,  but 
can  be  dynamically  loaded,  thus  promoting  a  thin  dialogue 
patterns  component  implementation. 

•  In  effect,  the  core  logic  of  the  dialogue  patterns  component, 
apart  from  dialogue  components,  can  be  also  packaged  as 
a  component  itself,  making  it  reusable  across  different 
applications. 

•  Automatic  updates  and  extensions  of  components  are  di¬ 
rectly  supported,  enabling  new  versions,  or  even  new  dia¬ 
logue  components  (addressing  more  user-  and  usage-context 
attribute  values),  to  be  centrally  installed  in  appropriate  com¬ 
ponent  repositories. 

Decision-Making  Specification 

The  decision-making  logic  is  defined  in  independent  decision 
blocks,  each  uniquely  associated  to  a  particular  task  context;  at 
most  one  block  per  distinct  task  context  may  be  supplied.  The 
decision-making  process  is  performed  in  independent  sequen¬ 
tial  decision  sessions,  and  each  session  is  initiated  by  a  request 
of  the  interface  assembly  module  for  execution  of  a  particular 
initial  decision  block.  In  such  a  decision  session,  the  evaluation 
of  an  arbitrary  decision  block  may  be  performed,  while  the  ses¬ 
sion  completes  once  the  computation  exits  from  the  initial  de¬ 
cision  block.  The  primary  decision  parameters  are  the  end-user 
and usage-context profiles, defined as two built-in objects, namely
user and context, whose attributes are syntactically accessible
in the form of named attributes. The binding of attribute names
to attribute values is always performed at runtime. Composite
attributes can be encapsulated in the user and context profiles
thanks to the syntactic flexibility of attribute references.
For instance, user.abilities.vision and user.abilities.hearing
are syntactic sugar for user.“abilities.vision” and
user.“abilities.hearing”, where “abilities.vision” and
“abilities.hearing” are two distinct, independent ordinal attributes
of the user built-in object. Consequently, even though all
attributes in the DMSL language are semantically scalar, the
flexibility of attribute names allows syntactical simulation of
aggregate structures.
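The idea of scalar attributes whose names simulate aggregates can be sketched outside DMSL as a flat mapping keyed by full attribute names. The attribute values and the helper names attribute and aggregate are hypothetical and serve only the illustration.

```python
# Every attribute is a scalar stored under its full (possibly dotted) name.
user = {
    "abilities.vision": "low",
    "abilities.hearing": "good",
    "webknowledge": "none",
}


def attribute(profile, name):
    """Bind an attribute name to its value at lookup time, that is, at runtime."""
    return profile[name]


def aggregate(profile, prefix):
    """Simulate an aggregate by collecting the scalars that share a name prefix."""
    dotted = prefix + "."
    return {key[len(dotted):]: value
            for key, value in profile.items() if key.startswith(dotted)}


print(attribute(user, "abilities.vision"))
print(aggregate(user, "abilities"))
```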

In Fig. 55.18, an example decision block is shown, an excerpt of
the decision logic of the AVANTI web browser that selects the
best alternative interface components for the link task context.
The interface design relating to this adaptation decision logic
is provided in Fig. 55.1.
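To make the notion of a decision session concrete, the following sketch mimics the link decision logic in Python rather than in DMSL. It is an approximation under assumptions: run_session and the callback-based block signature are hypothetical, and the usage-context attribute name "net" is assumed for illustration.

```python
def run_session(blocks, initial, user, context):
    """Run one decision session from the initial block; return activated styles."""
    activated = []  # (task context, style) pairs, in activation order

    def evaluate(name):
        # Each block may activate styles for itself or evaluate other blocks.
        blocks[name](user, context, evaluate,
                     lambda style: activated.append((name, style)))

    evaluate(initial)  # the session completes once the initial block exits
    return activated


def link(user, context, evaluate, activate):
    for sub in ("linktargeting", "linkselection", "loadconfirmation"):
        evaluate(sub)


def linktargeting(user, context, evaluate, activate):
    accurate = user["abilities.pointing"] == "accurate"
    activate("manual pointing" if accurate else "gravity pointing")


def linkselection(user, context, evaluate, activate):
    expert = user["webknowledge"] in {"good", "normal"}
    activate("underlined text" if expert else "push button")


def loadconfirmation(user, context, evaluate, activate):
    if user["webknowledge"] in {"low", "none"} or context["net"] == "low":
        activate("confirm dialogue")
    else:
        activate("empty")


BLOCKS = {"link": link, "linktargeting": linktargeting,
          "linkselection": linkselection, "loadconfirmation": loadconfirmation}

decisions = run_session(
    BLOCKS, "link",
    user={"abilities.pointing": "accurate", "webknowledge": "low"},
    context={"net": "high"},
)
print(decisions)
```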

In  the  DMSL  language,  the  fundamental  design  relationships 
among  alternative  styles  are  not  injected  as  a  part  of  the  seman¬ 
tics  but,  alternatively,  concrete  rule  patterns  are  delivered,  ef¬ 
fectively  mapping  those  relationships  to  implementation  skele¬ 
tons  of  decision  blocks.  This  gives  adaptation  designers  the 
freedom  not  to  necessarily  adopt  those  particular  design  rela¬ 
tionships,  in  case,  for  instance,  they  do  not  choose  to  employ 


taskcontext link [
    evaluate linktargeting;
    evaluate linkselection;
    evaluate loadconfirmation;
]

taskcontext linktargeting [
    if (user.abilities.pointing = accurate) then
        activate “manual pointing”;
    else
        activate “gravity pointing”;
]

taskcontext linkselection [
    if (user.webknowledge in {good, normal}) then
        activate “underlined text”;
    else
        activate “push button”;
]

taskcontext loadconfirmation [
    if (user.webknowledge in {low, none} or context.net = low) then
        activate “confirm dialogue”;
    else
        activate “empty”;
]


FIGURE  55.18.  An  example  of  a  simple  decision  block  to  select 
the  most  appropriate  delivery  of  web  links  for  the  individual  end- 
user;  notice  that  names  in  italics  are  not  language  keywords  but 
are  treated  as  string  constants,  i.e.,  user.webknowledge  is  syn¬ 
tactic  sugar  for  user. "webknowledge". 


Exclusion(S1, S2)
    if (S1.cond) then
        activate S1;
    else
        if (S2.cond) then
            activate S2;

Compatibility(S1, S2)
    if (S1.cond) then
        activate S1;
    if (S2.cond) then
        activate S2;

Substitution(S1 by S2)
    if (S2.cond and isactive(S1)) then [
        cancel S1;
        activate S2;
    ]
    else
        if (S1.cond)
            activate S1;

Augmentation(S1 by S2)
    if (S1.cond)
        if (not isactive(S1)) then
            activate S1;
        else
            if (S2.cond) then
                activate S2;

FIGURE  55.19.  The  decision  rule  patterns  associated  to  the 
relationships  among  alternative  styles;  the  style  condition  is  the 
boolean  expression  engaging  the  user  and  context  attribute  val¬ 
ues  for  which  the  style  is  designed. 

unified  design  as  the  adaptation-design  approach.  In  Fig.  55.19, 
the  DMSL  decision-rule  patterns  are  provided,  for  the  previ¬ 
ously  described  four  categories  of  style  relationships. 
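The four rule patterns can also be expressed as ordinary functions over a set of active styles. The sketch below follows one reading of Fig. 55.19; in particular, the nesting of the else branches in the substitution and augmentation patterns is an assumption, and the ActiveStyles class and all function names are hypothetical.

```python
class ActiveStyles:
    """Keeps the currently active styles, in activation order."""

    def __init__(self):
        self.active = []

    def activate(self, style):
        if style not in self.active:
            self.active.append(style)

    def cancel(self, style):
        if style in self.active:
            self.active.remove(style)

    def is_active(self, style):
        return style in self.active


def exclusion(styles, s1, cond1, s2, cond2):
    if cond1:
        styles.activate(s1)
    elif cond2:
        styles.activate(s2)


def compatibility(styles, s1, cond1, s2, cond2):
    if cond1:
        styles.activate(s1)
    if cond2:
        styles.activate(s2)


def substitution(styles, s1, cond1, s2, cond2):
    if cond2 and styles.is_active(s1):
        styles.cancel(s1)
        styles.activate(s2)
    elif cond1:
        styles.activate(s1)


def augmentation(styles, s1, cond1, s2, cond2):
    if cond1:
        if not styles.is_active(s1):
            styles.activate(s1)
        elif cond2:
            styles.activate(s2)


# Substitution: S1 is activated first; a later evaluation replaces it by S2.
substituted = ActiveStyles()
substitution(substituted, "S1", True, "S2", True)
substitution(substituted, "S1", True, "S2", True)

# Augmentation: S1 is activated first; a later evaluation adds S2 next to it.
augmented = ActiveStyles()
augmentation(augmented, "S1", True, "S2", True)
augmentation(augmented, "S1", True, "S2", True)
print(substituted.active, augmented.active)
```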


UNIFIED  DEVELOPMENT 
IN  AMBIENT  INTELLIGENCE 


Ambient  intelligence  is  an  emerging  field  for  interactive  appli¬ 
cations  and  services,  in  which  the  capability  to  accomplish  in¬ 
creased  quality  of  the  delivered  interactive  experience  heavily 
depends  on  the  ability  to  cope,  in  a  cost  effective  and  robust 
way,  with  the  inherent  software  user  interface  implementation 
complexity.  In  this  section,  research  results  in  implementing  am¬ 
bient  interactions  are  reported;  in  particular,  we  discuss  the  way 
the unified user interface development has been practiced,
putting forward the development challenges we had to address
to accomplish user interface unification.
Ambient Interactions

One  key  dimension  of  user  mobility,  in  a  disappearing  computer 
context,  concerns  the  provision  of  high-quality,  unobtrusive, 
and  fault-tolerant  dialogues  for  typical  computer-based  applica¬ 
tions.  In  this  context,  we  are  focusing  on  a  scenario  according  to 
which  the  basic  computing  machine  is  a  pocketsize  portable 
processing  unit  (PPU),  without  interaction-related  Input/Output 
(I/O)  hardware,  where  all  interaction  takes  place  dynamically 
with  wireless  devices,  which  can  be  either  wearable  (optional) 
or  part  of  the  surrounding  environment  infrastructure. 

Following this scenario, the user carries a very small processing
unit, about the size of a credit card, with an embedded
operating system (e.g., Linux) and wireless short-range radio
networking (e.g., Bluetooth™). Additionally, the user may
optionally carry any number of wearable wireless gadgets.
Once the on button of the processing unit is pressed, the system
boots and then seeks in-range devices capable of hosting
interaction. When such devices are detected, they are
appropriately employed to support interaction. At some point,
some devices may get out of range (i.e., they are lost), and the
system tries to use the remaining available devices to maintain
interaction; if the available devices do not suffice, the dialogue
is considered stalled.

When new devices come in range (i.e., they are discovered),
the system tries to engage those devices in interaction,
either to resume the dialogue from a stalled state or to
further optimize it by offering a better interaction alternative.
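The stalled and resumed dialogue states described in this scenario can be sketched with a small model. It is a simplification under assumptions: the Dialogue class, the capability names "input" and "output", and the device names are hypothetical and do not reflect the actual system interfaces.

```python
class Dialogue:
    """Models a dialogue that stalls when in-range devices no longer suffice."""

    def __init__(self, required):
        self.required = set(required)  # capabilities the dialogue needs
        self.devices = {}              # in-range device name -> capabilities

    def discovered(self, name, capabilities):
        self.devices[name] = set(capabilities)

    def lost(self, name):
        self.devices.pop(name, None)

    @property
    def stalled(self):
        offered = set().union(*self.devices.values())
        return not self.required <= offered


dialogue = Dialogue({"input", "output"})
states = [dialogue.stalled]                      # nothing in range yet
dialogue.discovered("switches", {"input"})
states.append(dialogue.stalled)                  # input only
dialogue.discovered("wristwatch", {"input", "output"})
states.append(dialogue.stalled)                  # dialogue can proceed
dialogue.lost("wristwatch")
states.append(dialogue.stalled)                  # output lost again
print(states)
```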

This notion of mobile interaction is illustrated in Fig. 55.20,
which depicts two layers of dynamically engaged interaction-
capable devices: (a) inner-layer wearable devices, which the
user may or may not carry, depending on the situation, and
which are not anticipated to vary on the move as frequently
as environment devices; and (b) outer-layer ambient/environment
devices, that is, the particular set of devices falling inside
wireless communication range of the user’s portable processing
unit. Those devices will normally vary according to the
user’s location in space.

The primary motivation for the reported work is the vision
that future computing platforms will not be monolithic
all-power-in-one devices, but will likely support open
interconnectivity, enabling users to combine the facilities
offered by distinct devices on the fly. Physically distributed
devices may be either wearable or available within the ambient
infrastructure (either stationary or mobile) and may be connected
via  a  wireless  communication  link  for  easier  deployment.  Oper¬ 
ationally,  each  such  device  will  play  a  specific  role  by  exposing 
different  processing  capabilities  or  functions,  such  as  character 
display,  pointing,  graphics,  audio  playback,  speech  synthesis, 
storage,  network  access,  and  so  forth.  From  the  hardware  point 
of  view,  such  devices  may  be  wristwatches,  earphones,  public 
displays,  home  appliances,  office  equipment,  car  electronics, 
sunglasses,  ATMs,  and  so  forth. 

Unified  Ambient  User  Interfaces 

The need for a unified user interface emerges when diverse
target user groups need to be accommodated, with diverse
requirements necessitating differentiated user interface delivery,
either macroscopically (e.g., the user interface is radically
different) or microscopically (e.g., there are specific targeted
task variations implying alternative user interface components).
In this context, we have concentrated on unified ambient
interactions, driven by our ultimate objective to pursue ambient
intelligence applications that would potentially address
the needs of every individual end user.

Such a target poses severe technical challenges for accessibility;
however, with appropriate technical propositions, it may open
new  opportunities  for  inclusive  mobile  interactions  and  acces¬ 
sible  ambient  intelligence.  The  implementation  details  of  the 
basic  system  to  allow  dynamic  deployment  of  ambient  devices 


FIGURE 55.20. The two layers of dynamically engaged ambient
computing resources in mobile interactions.

for interaction purposes are reported by Savidis and Stephanidis
(2005),  while  various  runtime  snapshots  of  real-life  applications 
and  related  technical  reports  may  be  found  on  the  2WEAR  Pro¬ 
ject  website  (http://2wear.ics.forth.gr).  Following  Fig.  55.21,  the 
system  offers  the  following  interactive  facilities  to  the  user: 

•  Basic mobile interface consists of:

1.  The PPU, that is, the portable computer having no
interaction-specific equipment, developed by MASC Ltd in the
context of the 2WEAR Project (Majoe, 2003).

2.  The wearable wireless Bluetooth binary switches, which
had to be simulated in software running on two different
PDAs (still using Bluetooth).

3.  A wearable intelligent wristwatch device controlled and
deployed by the PPU for interaction purposes, also developed
by MASC Ltd in the context of the 2WEAR Project
(Majoe, 2003).

4.  A wheelchair-attached wireless Bluetooth output terminal,
simulated entirely in software on a PDA.

•  Ambient  interface  extension  concerns  all  types  of  remote  am¬ 
bient  devices,  controllable  over  Bluetooth  through  the  2WEAR- 
specific  control  protocol  (Savidis  &  Stephanidis,  2005),  offer¬ 
ing  output  interaction  facilities  such  as  graphics  terminals, 
character-based  terminals,  audio  equipment,  and  speech 
synthesizers. 

•  Cooperative  interface  extension  concerns  the  dynamic  uti¬ 
lization  of  typical  desktop  machines  available  on  the  ambi¬ 
ent  space  that  host  GUI  software  libraries.  In  order  for  the  de¬ 
veloped  system  to  be  multiplatform,  it  has  been  decided  to 


consider  computing  devices  that  encompass  the  Java  Full  or 
MicroEdition.  Such  computing  devices  are  employed  dynam¬ 
ically  by  the  runtime  interface  system  (on  the  PPU)  to  repli¬ 
cate  the  basic  mobile  interface  (and  its  particular  ambient 
extension)  on  a  proximate  desktop  machine  via  a  graphical 
interface.  This  interface  mirroring  is  activated  on  demand  by 
the  motor-impaired  user  in  order  to  enable  mobile  coopera¬ 
tion  with  an  able  user. 

•  The  software  architecture  for  ambient  interactions  (see  Fig. 
55.22)  reflects  a  novel  way  of  treating  environment  devices  in 
an  ambient  intelligence  framework:  potentially  every  envi¬ 
ronment  device  exports  user  interface  functionality  (e.g.,  text 
display,  audio  output,  graphic  display,  input  buttons,  etc.)  in 
the  form  of  services,  called  user  interface  microservices  over 
Bluetooth.  The  proximity-based  discovery  mechanism  of 
Bluetooth  allows  only  a  small  cluster  of  devices  to  be  made 
reachable  from  the  user  PPU,  while  the  set  of  devices  de¬ 
ployed  for  interaction  may  vary  with  time  as  the  user  moves 
within  the  environment.  Some  representative  microservices 
that  have  been  implemented  in  the  context  of  the  Voyager 
system  (Savidis  &  Stephanidis,  2005)  are: 

-  Button,  input  control 

-  Keyboard,  input  control 

-  Numeric  keypad,  input  control 

-  Text  area  (character  display) ,  output  control 

-  Text  line  (character  display),  output  control 

-  Graphics  display,  output  control 

-  Menu  (composite  device  service),  input/output  control 

-  Text  editor  (composite  device  service),  input/output  control 


[Figure 55.21 labels: wearable wireless BT switches (INPUT);
portable processing unit (PPU) with Voyager system and
applications; wearable wireless controlled wristwatch
(INPUT/OUTPUT); wireless chair-attached PDA/terminal (OUTPUT);
basic mobile interface; able user; cooperative interface
extension.]

FIGURE 55.21. The system infrastructure for motor-impaired accessible and cooperative ambient
dialogues.


55.  Unified  User  Interface  Development:  New  Challenges  and  Opportunities 




[Figure 55.22 labels: interactive application; local calls;
implementation for dynamic control of remote UI resources;
remote communication.]

FIGURE 55.22. Overview of the runtime architecture of applications having an ambient, dynamically
distributed user interface, relying on the treatment of environment devices as remote wireless
UI microservices over ad hoc short-range networking.


-  GUI server, that is, the device offers a full-fledged remote
GUI API

The  software  architecture  illustrated  in  Fig.  55.22  reflects  the 
following  key  properties  of  ambient  interactions,  as  they  can 
be  implemented  through  the  Voyager  development  framework: 

•  Proximate devices are discovered dynamically, while devices
moving out of the proximity range (the range can be
fine-tuned for Bluetooth) are lost.

•  Every discovered device is contacted through a standard
query protocol, through which application clients (running
on the PPU) may extract information regarding the hosted UI
I/O resources (i.e., microservices).

•  Clients may establish a separate communication channel with
each distinct UI resource at ambient devices, through
the control protocol, after being successfully granted the UI
resource for exclusive use.

•  Once a device is lost (i.e., gets out of communication
range), or if a previously granted resource is no longer needed,
the resource is released for further use by other clients.
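The discovery, query, grant, and release cycle listed above can be summarized in a small bookkeeping sketch. The ResourceBroker class and its method names are hypothetical and chosen only for illustration; they do not reproduce the Voyager query or control protocols, whose details are not given here.

```python
class ResourceBroker:
    """Tracks the UI resources of in-range devices and who holds each one."""

    def __init__(self):
        self._owners = {}  # (device, resource) -> client name, or None if free

    def device_discovered(self, device, resources):
        for resource in resources:
            self._owners[(device, resource)] = None

    def query(self, device):
        """List the UI resources hosted by a discovered device."""
        return sorted(r for (d, r) in self._owners if d == device)

    def grant(self, client, device, resource):
        """Grant a free resource for exclusive use; return whether it succeeded."""
        key = (device, resource)
        if key in self._owners and self._owners[key] is None:
            self._owners[key] = client
            return True
        return False

    def release(self, client, device, resource):
        key = (device, resource)
        if self._owners.get(key) == client:
            self._owners[key] = None

    def device_lost(self, device):
        """Forget a lost device; return the clients that held its resources."""
        affected = set()
        for key in [k for k in self._owners if k[0] == device]:
            owner = self._owners.pop(key)
            if owner is not None:
                affected.add(owner)
        return affected


broker = ResourceBroker()
broker.device_discovered("wristwatch", ["text area", "button 1"])
hosted = broker.query("wristwatch")
first = broker.grant("game", "wristwatch", "text area")
second = broker.grant("email", "wristwatch", "text area")   # already granted
broker.release("game", "wristwatch", "text area")
third = broker.grant("email", "wristwatch", "text area")
affected = broker.device_lost("wristwatch")
print(hosted, first, second, third, affected)
```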

Examples of Bluetooth devices hosting multiple such UI
microservices are depicted in Fig. 55.23. The MASC wristwatch
devices host five such basic microservices: four programmable
buttons supporting a text title that can be set by software (shown
at bottom left) and a rectangular text area for character display.
In Fig. 55.23, the wristwatch is used remotely by the Break Out
game application (Savidis & Stephanidis, 2004). Environment
devices, such as the HiFi, are entirely simulated with software
running on distinct PDAs (see top left part of Fig. 55.23). The
capability to run an interface on a small PPU, using wearable
Bluetooth switches for scanning interaction while displaying
remotely on wireless devices, like the wheelchair-attached
terminal or dynamically appearing ambient displays, relies on the
following feature of Voyager: the software implementation supports
input/output that is received from or posted to remote machines
over Bluetooth. Only the interface state and the dispatching logic
are kept locally, within Voyager clients.

FIGURE 55.23. The Voyager-enabled wristwatch (MASC hardware version at bottom left,
WNT/MFC simulation at bottom right), and the Voyager-enabled software HiFi device (Linux/GTK+
CD player at top left, WNT/MFC CD player at top right).

Remote  Wireless  GUI  Service 

A remote GUI service is discovered and granted in the same way
as any other UI microservice. Additionally, like all other types
of remote UI microservices, it offers an appropriate remote
control protocol to clients. However, this particular control
protocol encompasses significantly more functionality than the
other, more primitive UI resources, being representative of the
range of functions offered to application clients by a typical
GUI library. Because the protocol is far more complicated than
that of primitive UI resources, it has been wrapped in an
appropriate library that hides all internal protocol and
communication details (see Fig. 55.24). Following Fig. 55.24, a
client application running on the PPU can utilize the remote GUI
service using the proxy toolkit library, which is actually a
lightweight GUI toolkit that hides all wireless networking and
protocol details.

FIGURE 55.24. The use of the C++ proxy toolkit as a client-oriented
wrapper for the control protocol with the remote (Bluetooth)
GUI server, the latter entirely implemented in Java.

This separation is similar to Xlib for the X Window System,
where Xlib merely implements the API, posting display requests
to the X server over TCP/IP. However, the X protocol is closer
to a graphics package and has severe communication
requirements, which makes its use over Bluetooth clearly
impractical. For this reason, we decided to employ a higher
level toolkit protocol for the communication between the proxy
toolkit and the remote Java server (see the responses and
messages labels of Fig. 55.25). This protocol is a relatively
old but highly effective recipe for remote
FIGURE 55.25. The different ambient devices simulated as distinct processes on different
machines, communicating over Bluetooth. Note that two snapshots of the text display are shown,
while there is only one in the experiment.

FIGURE 55.26. Alternative output configurations for the Break Out game board display. At
middle bottom, the textual display of the MASC Ltd wristwatch is used, being a wearable
Voyager-enabled ambient device.


toolkit  interfacing  called  “generic  toolkit  interfacing  protocol” 
(GTIP;  Savidis  et  al.,  1997). 
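The role of a proxy toolkit, which offers toolkit-level calls locally while posting compact high-level messages to a remote GUI server, can be sketched as follows. The JSON message layout, the ProxyToolkit class, and the injected send function are assumptions made purely for illustration; they are not the GTIP message format or the actual proxy toolkit API.

```python
import json


class ProxyToolkit:
    """Client-side proxy: widget calls become messages for a remote GUI server."""

    def __init__(self, send):
        self._send = send   # transport function, e.g., a wireless channel writer
        self._next_id = 0

    def create(self, widget_class, **attributes):
        """Request creation of a remote widget; return its identifier."""
        self._next_id += 1
        self._send(json.dumps({"op": "create", "id": self._next_id,
                               "class": widget_class, "attrs": attributes}))
        return self._next_id

    def set_attribute(self, widget_id, name, value):
        self._send(json.dumps({"op": "set", "id": widget_id,
                               "name": name, "value": value}))


sent = []                       # stands in for the remote communication link
toolkit = ProxyToolkit(sent.append)
button = toolkit.create("Button", label="OK")
toolkit.set_attribute(button, "label", "Cancel")
print(sent)
```

One toolkit-level message per widget operation keeps traffic low compared with posting individual drawing requests, which is the motivation given above for preferring a toolkit protocol over a graphics-level one.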

Test  Case  Applications 

Various applications with an ambient user interface have been
developed, including e-mail, a clock-alarm application, and a
classic game remake, together with a separate special-purpose
application called the “application manager.” Those applications
run on the PPU while displaying on the wheelchair-attached
text display and using the two wireless buttons. The GUI
service is used dynamically to replicate the interface with a GUI
instantiation (see Fig. 55.21). As shown, the GUI service
may run on a desktop machine or even on a user PDA; this
allows dynamic collaboration with a typical able office user or
with a mobile user carrying a PDA device. Each distinct wireless
ambient user interface microservice (i.e., the text display and
the two buttons) is hosted on a different machine. The GUI
service runs on a single remote machine, but supports concurrent
mirroring of multiple clients.

Additionally, through the Voyager development framework
(Savidis & Stephanidis, 2005), we implemented the ambient
Break Out game, a variant of the well-known Breakout arcade
game released by Atari in 1976. We have chosen this game
because of its simplicity, but mainly because its graphical
stages can be easily translated to textual alternatives.

Its implementation as a Voyager application has resulted in
a user interface that is dynamically reconfigurable, either
automatically or on user demand, both for the main game board
and for the typical nongame dialogues. In Fig. 55.26, snapshots
and photos of the alternative output configurations for the game
board display are shown. The transition among such
configurations is either automatic (i.e., Voyager tries to revive
or to optimize the dialogue) or manual (i.e., the user updates
the configuration preferences and requests explicit
reconfiguration). The game configuration dialogue allows the user
to choose the preferred input and output game styles. If the
available ambient devices (proximate devices controllable
remotely over Bluetooth) allow multiple game styles to be viable,
the user may switch easily on the fly among the viable
alternatives, through the dialogue illustrated in Fig. 55.27.
The actual PPU device where


FIGURE 55.27. The game input/output style configuration dialogue.
Once the “Back to game” option is chosen, Voyager will
first try to activate the preferred pair of input and output
control styles.

FIGURE 55.28. The pocketsize processing unit, developed by
MASC Ltd for the 2WEAR project, where Voyager clients and
servers run; illustration and photo are courtesy of MASC Ltd,
originally from Majoe (2003).

the  Voyager  applications  and  runtime  components  run  is  pro¬ 
vided  in  Fig.  55.28. 


Abstract  Interaction  Object  Library 
and  Polymorphic  Alternatives 

The notion of abstract dialogue objects was formulated
very early to reflect reusable logical facilities supporting basic
user I/O without presupposing any particular rendering or input
device policy. Early systematic work related to abstract
interaction objects concerns the concept of metawidgets,
representing the abstraction of interaction objects above physical
platforms (Blattner, Glinert, Jorge, & Ormsby, 1992), since
metawidgets are free of physical attributes and presentation
issues. Similarly, the notion of virtual interaction objects was
introduced in Savidis and Stephanidis (1995), together with an
interface programming language that provided built-in support
for the specification of alternative physical instantiation
styles for virtual objects. In unified development, abstract
interaction objects play a key role. The reengineering of existing
unimorphic user interface designs, by capturing abstract
interactive behaviors through a role-based model of interaction
objects and retargeting the user interface to alternative physical
forms, is a key technique of unified user interface design
(Savidis & Stephanidis, 2004). This unified development technique
has also been adopted in the Voyager tool, where only two
abstract dialogue object classes are supported: (a) a Selector
class, supporting user selection from an explicit list of textual
options; and (b) a Text Entry class, enabling the user to
enter textual input.

To allow programmers to program through abstract object
classes, but eventually deliver fully functioning ambient
dialogues, the Voyager development framework employs a number
of hidden implementation layers, the lowest of which is
responsible for direct UI microservice control. For each of the
two basic object classes, the Voyager framework encapsulates
appropriate runtime physical interaction control, handling the
available UI services. This is accomplished through the
implementation of well-defined alternative dialogue-object control
policies, where each policy assumes the presence of a different
group of required UI services.
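Since each control policy assumes a particular group of UI services, selecting a policy at runtime amounts to matching its requirements against the services currently available. The sketch below illustrates this matching; the style names, the service names, and the functions viable_styles and choose_style are hypothetical.

```python
def viable_styles(styles, available):
    """Return the names of the styles whose required UI services are all available."""
    return [name for name, required in styles if required <= available]


def choose_style(styles, available, preferred=None):
    """Prefer the user's choice when viable; otherwise take the first viable style."""
    viable = viable_styles(styles, available)
    if preferred in viable:
        return preferred
    return viable[0] if viable else None


# Hypothetical output styles, each with the UI services it requires.
OUTPUT_STYLES = [
    ("graphics board", frozenset({"graphics display"})),
    ("text board", frozenset({"text area"})),
    ("single line", frozenset({"text line"})),
]

in_range = {"text area", "button"}
print(choose_style(OUTPUT_STYLES, in_range))
```

When no style is viable the function returns None, which corresponds to the stalled dialogue case discussed earlier.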

Practically, each such policy delivers a way of physically
implementing the interaction of its respective dialogue object with
specific UI services; such policies are called “physical
instantiation styles,” or “styles.” Input and output styles are
distinguished to reflect the different ways of physical input and
output binding of the dialogue object classes, namely the Selector
and Text Entry classes. The detailed description of the role and
implementation approach for physical instantiation styles follows.

As  depicted  in  Eig.  55.29,  in  the  Voyager  development  frame¬ 
work,  multiple  implemented  alternative  input  and  output  styles 
are  provided  for  the  Select  and  Text  Entry  dialogue  objects.  Each 
different  style  implements  a  different  way  of  supporting  input  or 
output,  requiring  at  runtime  the  presence  of  different  UI  ser¬ 
vices.  Hence,  each  style  encompasses  the  required  UI  services, 
as  well  as  the  algorithmic  logic  for  using  such  UI  services  at  run¬ 
time.  The  two  dialogue  object  classes  host  the  interface  spe¬ 
cific  content,  such  as  text  content  for  selectors  and  input  text  for 
the Text Entry, while also implementing all the required
functionality up to the point of input and output binding. The latter
is formulated as follows: (a) input binding is defined as a
concrete set of logical input commands, each command mapping to
a corresponding function of the dialogue object class, associated
with particular input event bindings within the input style
implementation; and (b) output binding is defined as an abstract
function for displaying the dialogue object content, being
specialized by the different output style classes.

FIGURE 55.29. Implementation structure for supporting
alternative I/O styles for the two abstract dialogue objects in the
Voyager development framework.

FIGURE 55.30. The main part of the dialogue automaton for the
Selector dialogue object for ambient interactions.

FIGURE 55.31. The main part of the dialogue automaton for the
Text Entry dialogue object for ambient interactions.

In  Fig.  55.30  and  Fig.  55.31,  the  dialogue  automata  of  Selec¬ 
tor  and  Text  Entry  classes  respectively  are  shown,  indicating 
how  the  particular  physical  binding  of  the  input  logical  dialogue 
commands  is  delegated  to  the  implementation  of  the  input 
style,  and  the  implementation  of  the  display  function  is  dele¬ 
gated  to  the  output  style.  Based  on  this,  each  distinct  style  may 
embody  alternative  bindings,  depending  on  the  nature  of  the 
proximate  UI  services  dynamically  discovered  during  user  mo¬ 
bility  and  eventually  granted  and  involved  for  interaction  by  the 
running  applications. 
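
The delegation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Voyager implementation: the class names (`InputStyle`, `OutputStyle`, `SpeechStyle`, `TextDisplayStyle`), the event names, and the command names are all assumptions introduced here. The dialogue object owns its content and its logical commands; the input style maps physical events to those commands, and the output style specializes the display function.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class OutputStyle(ABC):
    """Specializes the abstract display function (hypothetical interface)."""

    @abstractmethod
    def display(self, options: List[str], focus: int) -> str:
        ...


class TextDisplayStyle(OutputStyle):
    """Assumed style for a small text display service."""

    def display(self, options: List[str], focus: int) -> str:
        return "\n".join(
            ("> " if i == focus else "  ") + opt for i, opt in enumerate(options)
        )


class SpeechStyle(OutputStyle):
    """Assumed style for a speech synthesis service."""

    def display(self, options: List[str], focus: int) -> str:
        return f"Option {focus + 1} of {len(options)}: {options[focus]}"


class InputStyle:
    """Binds physical input events to logical command names."""

    def __init__(self, bindings: Dict[str, str]):
        self.bindings = bindings  # physical event -> logical command

    def translate(self, event: str) -> Optional[str]:
        return self.bindings.get(event)


class Selector:
    """Abstract dialogue object: hosts content and logical commands only."""

    def __init__(self, options, input_style: InputStyle, output_style: OutputStyle):
        self.options = list(options)
        self.focus = 0
        self.selected = None
        self.input_style = input_style
        self.output_style = output_style
        self.commands = {
            "next": self._next,
            "previous": self._previous,
            "select": self._select,
        }

    def _next(self):
        self.focus = (self.focus + 1) % len(self.options)

    def _previous(self):
        self.focus = (self.focus - 1) % len(self.options)

    def _select(self):
        self.selected = self.options[self.focus]

    def handle_event(self, event: str) -> str:
        # Input binding: the style decides which logical command an event means.
        command = self.input_style.translate(event)
        if command in self.commands:
            self.commands[command]()
        return self.render()

    def render(self) -> str:
        # Output binding: the style decides how the content is presented.
        return self.output_style.display(self.options, self.focus)
```

Swapping `output_style` or `input_style` at runtime changes the physical binding without touching the `Selector` logic, which is the property the styles are meant to provide.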


CONCLUSIONS 


Currently,  the  required  technical  knowledge  to  build  user-  and 
usage-context  adapted  interfaces  includes  user  modeling,  task 
design,  cognitive  psychology,  rule-based  systems,  network  com¬ 
munication  and  protocols,  multiplatform  interfaces,  component 
repositories, development tools, and core user interface software
engineering. Software development firms apparently
prefer incremental engagement strategies, allowing a stepwise
entrance to new potential markets by delivering successive
generations of products encompassing layers of novel
characteristics. Similarly, the development of software applications
supporting automatic user interface adaptation for the broadest
end-user population requires a concrete strategy supporting
evolutionary development, software reuse, incremental design,
scalability, and modular construction. The unified user
interface development approach discussed here claims to offer a
software engineering proposition that consolidates process-oriented
wisdom for constructing automatically adapted user interfaces.
Evolution, incremental development, and software reuse are some
of  the  fundamental  features  of  unified  user  interface  develop¬ 
ment.  These  are  reflected  in  the  ability  to  progressively  extend 
a  unified  user  interface,  by  incrementally  encapsulating  com¬ 
putable  content  in  the  different  parts  of  the  architecture,  to 
cater  for  additional  users  and  usage  contexts,  by  designing  and 
implementing  more  dialogue  artifacts,  and  by  embedding  new 
rules  for  the  decision  making  logic.  Such  characteristics  are  par¬ 
ticularly  important  and  relevant  to  the  claimed  feasibility  and 
viability  of  the  proposed  software  engineering  process  and  di¬ 
rectly  facilitate  the  practical  accomplishment  of  universally  ac¬ 
cessible  interactions. 

The  concept  of  unified  user  interfaces  reflects  a  new  soft¬ 
ware  engineering  paradigm  that  addresses  effectively  the  need 
for  interactions  automatically  adapted  to  the  individual  end-user 
requirements and the particular context of use. Following this
technical approach, interactive software applications encompass
the capability to deliver an appropriately adapted interface
instance on the fly, performing appropriate runtime processing
that engages:

•  Utilization  of  user-  and  usage-context  oriented  information 
(e.g.,  profiles),  as  well  as  the  ability  to  detect  dynamically 
user-  and  usage-context  attributes  during  interaction 

•  Management  of  appropriate  alternative  implemented  dia¬ 
logue  components,  realizing  alternative  ways  for  physical- 
level  interaction 

• Adaptation-oriented decision making that facilitates: (a) the
selection, before initiation of interaction, of the most
appropriate dialogue components comprising the delivered
interface, given any particular dialogue context, for the particular
end-user and usage-context profiles (i.e., adaptability); and
(b) the implementation of appropriate changes in the initially
delivered interface instance, according to dynamically
detected user- and usage-context attributes (i.e., adaptivity)

•  Runtime  component  co-ordination  and  control  to  dynami¬ 
cally  assemble  or  alter  the  target  interface;  this  user  interface 
is  composed  on  the  fly  from  the  set  of  dynamically  selected 
constituent  dialogue  components. 
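
As a rough illustration of the processing listed above, the following Python sketch separates the decision step from the assembly step. It is a minimal sketch under stated assumptions, not the architecture's actual API: the rule format, the component names (`AuditoryFileList`, `GuidedFilePicker`, `StandardFileDialog`), and the profile attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Profile = Dict[str, object]


@dataclass
class Rule:
    dialogue_context: str                 # e.g. "file_open"
    condition: Callable[[Profile], bool]  # tested against the current profile
    component: str                        # name of an implemented dialogue component


# Hypothetical rule base; earlier rules take precedence.
RULES: List[Rule] = [
    Rule("file_open", lambda p: p.get("vision") == "blind", "AuditoryFileList"),
    Rule("file_open", lambda p: p.get("expertise") == "novice", "GuidedFilePicker"),
    Rule("file_open", lambda p: True, "StandardFileDialog"),
]


def decide(dialogue_context: str, profile: Profile, rules: List[Rule] = RULES) -> str:
    """Decision making: pick the first matching component for a dialogue context."""
    for rule in rules:
        if rule.dialogue_context == dialogue_context and rule.condition(profile):
            return rule.component
    raise LookupError(f"no component for {dialogue_context!r}")


class InterfaceAssembler:
    """Runtime coordination: keeps the currently delivered components."""

    def __init__(self, profile: Profile):
        self.profile = dict(profile)
        self.active: Dict[str, str] = {}

    def deliver(self, dialogue_contexts: List[str]) -> Dict[str, str]:
        # Adaptability: selection before initiation of interaction.
        for ctx in dialogue_contexts:
            self.active[ctx] = decide(ctx, self.profile)
        return dict(self.active)

    def on_attribute_detected(self, name: str, value: object) -> Dict[str, str]:
        # Adaptivity: re-decide when an attribute is detected during interaction.
        self.profile[name] = value
        changed = {}
        for ctx, current in self.active.items():
            new = decide(ctx, self.profile)
            if new != current:
                changed[ctx] = new
        self.active.update(changed)
        return changed
```

The same `decide` function serves both phases; only the moment at which it is called differs, which mirrors the adaptability/adaptivity distinction drawn in the list.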

The unified user interface development strategy provides a
distributed software architecture with well-defined functional
roles (i.e., which component does what), intercommunication
semantics (i.e., which component requests what and from
whom), control flow (i.e., when to do what), and internal
decomposition (i.e., how the implementation of each component
is internally structured). One of the unique features of this
development paradigm is the emphasis on dynamic interface
assembly for adapted interface delivery, reflecting a software
engineering practice with repository-oriented component
organization, parametric containers with abstract containment
APIs, and common interaction-monitoring control with abstract
APIs. Although the method itself is not intended to be
intensively prescriptive from the low-level implementation
point of view, specific successful practices that have been
technically validated in fieldwork regarding decision making and
dynamic user-attribute detection have also been discussed,
focusing on microarchitecture details and internal functional
decomposition.

FIGURE 55.32. Open research questions in design for all.

In  this  context,  this  development  method  has  been  system¬ 
atically  deployed  and  tested  in  practical  situations  where  auto¬ 
matic  personalization  of  computer-based  interactive  applica¬ 
tions  and  services  was  the  predominant  issue.  It  introduces  the 
fundamental  notion  of  adapted  interface  delivery  before  initia¬ 
tion  of  interaction  and  addresses  the  technical  challenges  of 
coping  with  the  inherent  runtime  dynamic  interface  assembly 
process. 

Additionally, the basic software engineering strategy of unified
development, in particular abstract objects and polymorphic
containment, has been deployed in leading-edge developments
to deliver applications with ambient mobile user
interfaces running on wearable pocket-size processing units.

The  proposed  approach  establishes  one  possible  technical 
route towards constructing automatically individualized best-fit
user  interfaces:  it  enables  incremental  development  and  facili¬ 
tates  the  expansion  and  upgrade  of  dialogue  components  as  an 
on-going  process,  entailing  the  continuous  engagement  and 
consideration  of  new  design  parameters,  and  new  parameter 


values.  It  is  anticipated  that  future  research  work  may  reveal  al¬ 
ternative  approaches  or  methods.  At  the  same  time,  further  re¬ 
search  and  development  work  for  unified  user  interfaces  is  re¬ 
quired  to  address  some  existing  challenges,  mainly  related  to 
design  issues  (see  Fig.  55.32). 

Following  Fig.  55.32,  one  top-level  issue  concerns  the  way 
that  specific  varying  user  attributes  affecting  interaction  are 
to  be  identified.  In  other  words,  there  is  a  need  to  identify  di¬ 
versity  in  those  human  characteristics  that  are  likely  to  dictate 
alternative  dialogue  means.  Subsequently,  even  if  a  set  of  those 
attributes  is  identified,  it  is  still  unclear  how  to  conduct  a  de¬ 
sign  process  to  produce  the  necessary  alternative  dialogue  ar¬ 
tifacts  for  the  different  values  of  those  attributes.  Hence,  it  is 
necessary  to  design  for  diversity,  relying  upon  appropriate  de¬ 
sign  rationale  clearly  relating  diverse  attribute  values  with  spe¬ 
cific  properties  of  the  target  dialogue  artifacts.  Currently,  there 
is  only  limited  knowledge  about  how  to  perform  effectively 
this  transition  from  alternative  user-attribute  values  to  alter¬ 
native  design  artifacts,  and  it  can  be  characterized  as  a  ratio¬ 
nalization  gap. 

Finally,  the  issue  of  how  to  structure  appropriately  alterna¬ 
tive  patterns  for  diverse  user-attribute  values,  should  be  ad¬ 
dressed  in  a  way  that  the  resulting  designs  are  indeed  efficient, 
effective,  and  satisfactory  for  their  intended  users  and  usage 
contexts.  Such  a  process  requires  appropriate  evaluation 
methods,  and  the  capability  to  measure  the  appropriateness  of 
designed  artifacts.  At  present,  this  is  still  a  missing  link,  char¬ 
acterized  as  the  measurability  gap.  Unless  we  are  able  to  as¬ 
sert  the  appropriateness  of  the  alternative  dialogue  artifacts 
designed  for  diverse  user  attributes,  we  cannot  validate  the 
overall  dynamically  delivered  interface.  The  inability  to  for¬ 
mulate  and  conduct  such  an  evaluation  process  creates  a 
validity  gap. 

Work  currently  underway,  as  well  as  future  work,  is  expected 
to  address  these  issues  in  an  attempt  to  bridge  the  identified 
gaps. 
INTRODUCTION


Human-computer  interaction  (HCI)  and  Software  Engineering 
(SE)  are  like  two  old  friends  with  different  backgrounds:  they 
share  values  but  use  them  differently.  Both  domains  address  the 
design  and  development  of  useful  and  usable  systems  and  are 
concerned  with  “requirements  analysis,”  “incremental  and  iter¬ 
ative  design,”  as  well  as  “quality  assurance.”  However,  they  ad¬ 
dress  these  problems  with  different  development  processes,  dif¬ 
ferent  notations,  and  different  priorities.  For  HCI,  the  human  is 
the  first-class  entity  in  all  phases  of  development.  For  SE,  the  fi¬ 
nal  objective  is  a  running  system  developed  at  minimal  cost  and 
delivered  in  time,  while  satisfying  contractual  specifications.  The 
user  is,  at  best,  involved  at  the  very  beginning  of  the  process, 
and  hopefully  at  the  very  end  of  the  project  for  summative  eval¬ 
uation.  However,  to  avoid  or  correct  wrong  design  decisions, 
this  is  too  little  and  too  late.  Even  in  the  early  stages  of  devel¬ 
opment,  functional  requirements  and  quality  goals  are  rarely  the 
result  of  a  close  collaboration  between  HCI  and  SE  specialists. 

There  are  many  reasons  for  the  lack  of  collaboration  between 
HCI  and  SE  scientists  and  practitioners:  mutual  ignorance  re¬ 
sulting  from  educational  background,  and  from  there,  economic 
consideration.  HCI  methods  such  as  contextual  design  (see 
chapter  49 — Karen  Holtzblatt),  scenario-based  approaches 
(Rosson et al., 2002), and task analysis are perceived as too
demanding in terms of time and competence to inform system
requirements in a formal and timely manner. On the other hand,
Unified Modeling Language (UML) use cases, which express the
functions that the system should support with a scenario-based
flavor, are pale attempts to factor out user-centered concerns.
They do not result from a human-centered requirements analysis
nor do they have the expressive power of task models.
Task-modeling techniques, such as ConcurTaskTrees (CTT) (Paterno,
2003) or User Action Notation (UAN) (Hartson, 1990), which use
notations familiar to computer scientists (i.e., Language of
Temporal Ordering Specification (LOTOS) operators and logic), are not
used in software engineering. Conversely, domain-dependent
concepts  referenced  in  task  models  are  ill  defined,  whereas  UML 
class  diagrams  would  improve  task  specifications  significantly. 

In  summary,  HCI  and  SE  pursue  the  same  goal,  using  devel¬ 
opment  processes  and  notations  that  sometimes  overlap  and 
complement each other. In this chapter, we present one way to
exploit both fields for the development of plastic user interfaces
using the notion of model as the keystone between the two
disciplines. In the following section, we present the problem space
of user interface (UI) plasticity, followed by three exemplars of
plastic interactive systems that illustrate aspects of the problem
space. We then introduce the key objectives and principles of
Model-Driven Engineering (http://planetmde.org) and show how
they can be exploited as a solution space for UI plasticity.


THE  PROBLEM  SPACE  OF  USER 
INTERFACE  PLASTICITY 


The term plasticity is inspired by the capacity of solids and
biological entities such as plants and the brain to adapt to external
constraints to preserve continuous usage. Applied to interactive
systems,  UI  plasticity  is  the  capacity  of  user  interfaces  to  adapt 
to  the  context  of  use  while  preserving  usability  (Thevenin  & 
Coutaz,  1999).  We  define  these  terms  (context  of  use,  usability, 
and  system  adaptation)  in  detail  in  the  following  sections. 

Context  and  Context  of  Use 

It  is  commonly  agreed  that  context  is  about  evolving,  struc¬ 
tured,  and  shared  information  spaces,  and  that  such  spaces  are 
designed to serve a particular purpose (Coutaz et al., 2005). In
other  words,  context  is  not  simply  a  state  but  part  of  a  process 
(the  purpose).  Thus,  there  is  no  such  thing  as  the  context,  but 
there  is  a  context  qualified  by  the  process  it  serves.  This  is  why 
we  use  the  phrase  context  of  use,  and  not  simply  the  word  con¬ 
text,  to  refer  to  the  information  spaces  that  serve  the  adaptation 
process  when  context  changes.  A  context  change  could  be  de¬ 
fined  as  the  modification  of  the  value  of  any  element  of  the  con¬ 
textual  information  spaces.  This  definition  would  lead  to  an  ex¬ 
plosion  of  contexts.  We  need  more  structure.  The  following 
ontological  foundation  provides  this  structure. 

Ontological  Foundation  for  Context 

As  shown  in  Fig.  56.1,  a  contextual  information  space  is  mod¬ 
eled  as  a  directed  graph  where  a  node  denotes  a  context  and 
an  edge  denotes  a  condition  to  move  between  two  contexts.  In 
turn,  a  context  is  a  directed  graph  of  situations  where  a  node  de¬ 
notes  a  situation  and  an  edge  denotes  a  condition  to  move  be¬ 
tween  two  situations.  Thus,  a  contextual  information  space  is  a 
two-level  data  structure,  i.e.  a  graph  of  contexts  where  each  con¬ 
text  is  in  turn  a  graph  of  situations.  If  more  structure  is  needed, 
situations  may  in  turn  be  refined  into  “subsituations,”  and  so  on. 
We  now  need  to  specify  the  domain  of  definition  of  contexts 
and  situations. 

A  context  is  defined  over  a  set  E  of  entities,  a  set  Ro  of  roles 
(i.e.  functions)  that  these  entities  may  satisfy,  and  a  set  Rel  of 
relations  between  the  entities.  Entities,  roles,  and  relations  are 
modeled as expressions of observables that are captured and
inferred by the system. For example, in a conference room, E
denotes the participants, Ro denotes the roles of speaker and
listener, and Rel denotes some spatial relations such as “in front
of.” The situations that pertain to the same context share the
sets E, Ro, and Rel.

The  condition  to  move  between  two  contexts  is  one  of  the  fol¬ 
lowing:  E  is  replaced  by  a  different  set  (e.g.,  the  set  E  of  partici¬ 
pants  is  now  replaced  with  the  set  of  family  members),  Ro  has 
changed  (e.g.,  the  roles  of  speaker  and  listener  are  replaced  with 
that  of  parent),  or  Rel  has  changed  (e.g.,  in  addition  to  spatial  re¬ 
lationships,  temporal  relationships  between  entities  now  matter). 

The  condition  to  move  between  two  situations  is  one  of  the 
following: 

• The cardinality of the set E has changed. For example, 10
persons enter the room and are recognized by the system as
participants (their observables match the characteristics and
behavior of participants). If recognized as terrorists, then the
system would detect a context change (and not a situation
change).

• A role assignment to an entity has changed (e.g., participant
e switches from speaker to listener).

• A relation between two entities has changed (e.g., participant
e was in front of e’. Now, e’ is in front of e).

FIGURE 56.1. The graph of contexts Gc is composed of 4 contexts C1, C2, C3, C4 defined on their
own sets of Entities, Roles, and Relations. In turn, Context C2 is composed of 4 situations S1, S2,
S3, S4. By definition, these situations share the same sets of Entities, Roles, and Relations. In S4,
Entities e1 and e4 (elements of E2) play the role r2 (element of R2), whereas Role r1 is played by
Entity e2; e3 and e4 satisfy Relation rel1, e5 and e3 satisfy rel2, and e5 and e4 are related by rel1 (rel1,
rel2, rel3 are elements of Rel2).
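
The distinction between a context change and a situation change can be made concrete with a small Python sketch. It is only one possible reading of the conditions above: the `Snapshot` structure and the `entity_kind` field (used here to stand for "E is replaced by a different set") are assumptions introduced for illustration, not part of the ontology itself.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Snapshot:
    entity_kind: str                               # what E denotes, e.g. "participants"
    entities: FrozenSet[str]                       # current members of E
    roles: FrozenSet[str]                          # Ro
    relation_kinds: FrozenSet[str]                 # Rel
    role_assignments: FrozenSet[Tuple[str, str]]   # (entity, role)
    relations: FrozenSet[Tuple[str, str, str]]     # (relation, e, e')


def classify_change(before: Snapshot, after: Snapshot) -> str:
    # Context change: E is replaced by a different set, or Ro or Rel has changed.
    if (
        before.entity_kind != after.entity_kind
        or before.roles != after.roles
        or before.relation_kinds != after.relation_kinds
    ):
        return "context change"
    # Situation change: cardinality of E, a role assignment, or a relation changed.
    if (
        len(before.entities) != len(after.entities)
        or before.role_assignments != after.role_assignments
        or before.relations != after.relations
    ):
        return "situation change"
    return "no change"


# Conference-room example from the text, with two hypothetical entities.
room = Snapshot(
    entity_kind="participants",
    entities=frozenset({"e1", "e2"}),
    roles=frozenset({"speaker", "listener"}),
    relation_kinds=frozenset({"in front of"}),
    role_assignments=frozenset({("e1", "speaker"), ("e2", "listener")}),
    relations=frozenset({("in front of", "e1", "e2")}),
)
swapped = replace(
    room, role_assignments=frozenset({("e1", "listener"), ("e2", "speaker")})
)
at_home = replace(room, entity_kind="family members", roles=frozenset({"parent"}))
```

Here `classify_change(room, swapped)` reports a situation change, whereas `classify_change(room, at_home)` reports a context change, because the sets themselves have been replaced.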

The  ontology  does  not  specify  the  nature  of  the  entities, 
roles,  relations,  and  observables.  These  are  abstract  classes  from 
which  a  domain-dependent  model  can  be  specified.  Using  ex¬ 
pressions  of  observables,  designers  identify  the  set  of  Entities, 
Roles,  and  Relations  that  are  relevant  for  the  case  at  hand.  As 
discussed  next,  the  observables  of  a  context  of  use  are  orga¬ 
nized  into  three  information  spaces. 

Observables of the Context of Use

The  observables  of  a  context  of  use  define  three  informa¬ 
tion  spaces  respectively  called  (a)  the  user  model,  (b)  the  envi¬ 
ronment  model,  and  (c)  the  platform  model. 

•  The  user  model  denotes  the  attributes  and  functions  that  de¬ 
scribe  the  archetypal  person  who  is  intended  to  use,  or  is  ac¬ 
tually  using,  the  interactive  system.  This  includes  profile,  idio¬ 
syncrasies,  tasks,  and  activities. 

•  The  environment  model  includes  attributes  and  functions 
that  characterize  the  physical  places  where  the  interaction 
will  take  place  or  is  actually  taking  place.  This  includes  nu¬ 
meric  and/or  symbolic  locations  (e.g.,  at  home,  in  a  public 
space,  on  the  move  in  the  street,  a  train  or  a  car),  social  rules 
and  activities,  light,  heat,  and  sound  conditions. 


•  The  platform  model  describes  the  computing,  sensing,  net¬ 
working,  and  interaction  resources  that  bind  together  the 
physical  environment  with  the  digital  world. 

In  the  conventional  GUI  paradigm,  the  platform  is  limited  to 
a  single  computing  device,  typically  a  workstation  or  a  PDA,  con¬ 
nected  to  a  network  and  equipped  with  a  fixed  set  of  interaction 
resources  such  as  a  screen,  keyboard,  and  stylus.  Technological 
advances  are  leading  to  the  capacity  for  individuals  to  assemble 
and  mould  their  own  interactive  spaces  from  public  hot  spots  and 
private  devices  to  access  services  within  the  global  computing  fab¬ 
ric.  Interactive  spaces  will  also  take  the  form  of  autonomous  com¬ 
puting  islands,  or  ecosystems,  whose  horizon  will  evolve,  split, 
and  merge  under  human  control.  Figure  56.2  illustrates  this  view 
of  ubiquitous  computing  where  resources  are  coupled  oppor¬ 
tunistically  to  amplify  human  activities  with  new  services,  and 
where  any  real-world  object  (e.g.,  a  wall,  a  pen  and  fingers)  can 
play  the  role  of  an  interaction  resource.  As  a  result,  the  platform 
is  modelled  as  a  dynamic  cluster  of  heterogeneous  resources, 
rather  than  a  conventional  monocomputing  static  device. 
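
One way to picture the three information spaces, and in particular the platform as a dynamic cluster, is the following Python sketch. All field names and the `couple`/`decouple` operations are assumptions made for illustration; the text does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserModel:
    profile: Dict[str, str]
    tasks: List[str]


@dataclass
class EnvironmentModel:
    location: str            # symbolic location, e.g. "at home"
    light_lux: float
    noise_db: float


@dataclass
class Resource:
    name: str
    kind: str                # "computing", "sensing", "networking", or "interaction"


@dataclass
class PlatformModel:
    """A cluster of heterogeneous resources that can change at runtime."""
    resources: Dict[str, Resource] = field(default_factory=dict)

    def couple(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    def decouple(self, name: str) -> None:
        self.resources.pop(name, None)

    def interaction_resources(self) -> List[str]:
        return sorted(
            r.name for r in self.resources.values() if r.kind == "interaction"
        )


@dataclass
class ContextOfUse:
    user: UserModel
    environment: EnvironmentModel
    platform: PlatformModel
```

Coupling a public wall display to a private PDA, for instance, would simply add one more interaction resource to the same `PlatformModel`, rather than switching to a different fixed device description.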

Usability 

The  term  usability  is  interpreted  in  different  ways  by  authors, 
even  within  the  same  scientific  community.  Usability  has  been 
identified  with  ease  of  use  and  learning,  while  excluding  utility 
(Shackel,  1984;  Nielsen,  1993).  In  other  cases,  usability  is  used 
to  denote  ease  of  use  and  utility,  while  ignoring  learning  ability. 
In  software  engineering,  usability  is  considered  an  intrinsic 
property  of  the  software  product,  whereas  in  HCI,  usability 
is  contextual:  a  system  is  not  intrinsically  usable  or  unusable. 
Instead, usability arises relative to contexts of use.
FIGURE 56.2. Assembling interaction resources opportunistically: (a) Access to services within
the  global  computing  fabric  via  dynamic  connection  of  a  private  device  to  a  public  hot  spot  (e.g., 
an  active  map),  (b)  Ecosystem:  connecting  two  tablets  to  enlarge  the  screen  real  estate,  or  to  start 
an  intimate  collaboration  between  users  (Hinckley,  2003). 


The  contextual  nature  of  usability  has  been  recently  recog¬ 
nized  by  the  International  Organization  for  Standardization/ 
International  Electrotechnical  Commission  (ISO/IEC)  9126  stan¬ 
dards  developed  in  the  software  community  with  the  overarch¬ 
ing  notion  of  “quality  in  use.”  Quality  in  use  is  “the  capability  of 
the  software  product  to  enable  specified  users  to  achieve  spec¬ 
ified  goals  with  effectiveness,  productivity,  safety,  and  satisfac¬ 
tion  in  specified  contexts  of  use.”  Unfortunately,  as  shown  in  Fig. 
56.3,  usability  is  viewed  as  one  independent  contribution  to 
quality  in  use.  Thus,  the  temptation  is  high  for  software  people 
to  assimilate  usability  to  cosmetic  issues  limited  to  the  user-in¬ 
terface  component  of  a  software  product,  forgetting  that  system 
latency, reliability, missing functions, and inappropriate
sequencing of functions have a strong impact on a system’s
“useworthiness.”

Useworthiness is central to Cockton’s argument for the
development of systems that have value in the real world (Cockton,
2004, 2005). In value-centered approaches, software design
should start from the explicit expression of an intentional
creation of value for a selected set of target contexts of use.
Intended values for target contexts are then translated into
evaluation criteria. Evaluation criteria are not necessarily elicited from
are  contextualized.  They  are  monitored  and  measured  in  real 
usage  to  assess  the  achieved  value. 

Building on Cockton’s approach, we suppose that for each of
the target contexts of use $C_i$ of a system, an intended value $V_i$
has been defined, and that $V_i$ has been translated into the set
of triples $\{(c_{i1}, d_{i1}, w_{i1}), \ldots, (c_{ij}, d_{ij}, w_{ij}), \ldots, (c_{in}, d_{in}, w_{in})\}$,
where $c_{ij}$ is an evaluation criterion, and $d_{ij}$ and $w_{ij}$ are the expected
domain of values and relative importance of $c_{ij}$ in $C_i$. As just
discussed, $c_{ij}$ may be a generic measurable feature or a customized
measure that depends on the intended value in $C_i$. Usability $U_i$
of the system for context $C_i$ is evaluated against a combining
function $F_i$ on the set $\{(c_{i1}, d_{i1}, w_{i1}), \ldots, (c_{ij}, d_{ij}, w_{ij}), \ldots, (c_{in}, d_{in}, w_{in})\}$,
whose result is intended to lie within a domain of
values $D_i$.
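
To make the role of the combining function concrete, here is a minimal Python sketch. The text does not say what the combining function looks like, so the weighted fraction of satisfied criteria used below is purely an assumption, as are the criterion names and the numeric domains.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Domain = Tuple[float, float]  # closed interval (low, high)


@dataclass
class Criterion:
    name: str        # evaluation criterion c_ij
    domain: Domain   # expected domain of values d_ij
    weight: float    # relative importance w_ij


def weighted_fraction_satisfied(
    criteria: List[Criterion], measures: Dict[str, float]
) -> float:
    """One possible combining function: weight of satisfied criteria / total weight."""
    total = sum(c.weight for c in criteria)
    met = sum(
        c.weight
        for c in criteria
        if c.domain[0] <= measures[c.name] <= c.domain[1]
    )
    return met / total


def is_usable(
    criteria: List[Criterion],
    measures: Dict[str, float],
    target_domain: Domain,
    combine: Callable[[List[Criterion], Dict[str, float]], float] = weighted_fraction_satisfied,
) -> bool:
    """Usability holds for a context if the combined result lies in its domain D_i."""
    low, high = target_domain
    return low <= combine(criteria, measures) <= high


# Hypothetical criteria for one context of use.
criteria = [
    Criterion("task_time_s", (0.0, 30.0), 3.0),
    Criterion("error_rate", (0.0, 0.05), 1.0),
]
measures = {"task_time_s": 25.0, "error_rate": 0.10}
```

With these numbers only the task-time criterion is met, so the combined score is 3.0 / 4.0 = 0.75. Whether that counts as usable depends entirely on the intended domain chosen for the context.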
FIGURE 56.3. The usability model from ISO/IEC 9126-1.


Coming back to the notion of plasticity, an interactive system
S is plastic from a source context of use $C_i$ to a target context
of use $C_j$ if the following two conditions are satisfied:

1. Adaptation, if needed, is supported when switching from $C_i$
to $C_j$.

2. Usability (value) is preserved in $C_j$ by the adaptation process.

In turn, usability in the target context $C_j$ is preserved if the
following two conditions are satisfied:

• The usability function $F_j$ defined for $C_j$ lies within its intended
domain $D_j$.

• The usability function meta-$F_{ij}$ of the meta-UI lies within its
intended domain meta-$D_{ij}$ when transiting from $C_i$ to $C_j$.

A  meta-UI  is  to  ambient  computing  what  the  desktop 
metaphor  is  to  conventional  workstations.  It  binds  together 
the  activities  that  can  be  performed  within  an  interactive 
space.  In  particular,  it  provides  users  with  the  means  to  con¬ 
figure,  control,  and  evaluate  the  adaptation  process.  It  may,  or 
may  not,  negotiate  the  alternatives  for  adaptation  with  the  user. 
A  meta-UI  without  negotiation  makes  observable  the  state  of 
the  adaptation  process,  but  does  not  allow  the  user  to  inter¬ 
vene.  The  system  is  autonomous.  A  meta-UI  incorporates  ne¬ 
gotiation  when,  for  example,  it  cannot  make  sound  decisions 
between  multiple  forms  of  adaptation,  or  when  the  user  must 
fully  control  the  outcome  of  the  process.  The  balance  between 
system  autonomy  and  too  many  negotiation  steps  depends  on 
the  case  at  hand. 

The domain of plasticity of a system is the set C of contexts of
use $C_j$ for which usability is achieved. We have defined
usability by reasoning at the context level. If needed, a finer grain of
reasoning  can  be  applied  at  the  situation  level:  intended  value  is 
defined  for  each  situation  of  each  context,  and  then  translated 
into  evaluation  criteria.  Preserving  usability  is  then  evaluated  on 
situation  changes. 
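
The plasticity conditions and the domain of plasticity can be written down as simple predicates. The sketch below assumes that usability results are single numbers and that intended domains are closed intervals; both are simplifications chosen for illustration, and the context names are hypothetical.

```python
from typing import Dict, Set, Tuple

Domain = Tuple[float, float]  # closed interval (low, high)


def in_domain(value: float, domain: Domain) -> bool:
    return domain[0] <= value <= domain[1]


def usability_preserved(
    f_target: float, d_target: Domain, meta_f: float, meta_d: Domain
) -> bool:
    """Both the target usability and the meta-UI usability must be in range."""
    return in_domain(f_target, d_target) and in_domain(meta_f, meta_d)


def is_plastic(
    adaptation_supported: bool,
    f_target: float,
    d_target: Domain,
    meta_f: float,
    meta_d: Domain,
) -> bool:
    """Plastic from a source to a target context: adaptation plus preserved usability."""
    return adaptation_supported and usability_preserved(
        f_target, d_target, meta_f, meta_d
    )


def plasticity_domain(
    usability: Dict[str, float], domains: Dict[str, Domain]
) -> Set[str]:
    """Set of contexts of use for which usability is achieved."""
    return {c for c, value in usability.items() if in_domain(value, domains[c])}
```

For a finer grain, the same predicates could be keyed by (context, situation) pairs instead of context names.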

These  definitions  provide  a  theoretical  framework  where 
value comes first and is defined on a per-context (or situation)
of use basis. For each of the intended target contexts (or
situations), value is operationalized into a mix of generic and
customized metrics. The problem is the identification of the relevant
contexts  of  use  and  situations  as  well  as  the  appropriate  transla¬ 
tion  of  value  into  significant  metrics.  We  have  no  answer  for  op¬ 
erationalizing  value,  except  to  use  generic  measures  when  ap¬ 
plicable,  to  instrument  the  system  appropriately  using  sound 
software  development  techniques,  such  as  Aspect  Oriented  Pro¬ 
gramming  (AOP)  (Elrad,  2001),  and  to  apply  a  healthy  dose  of 
common  sense.  On  the  other  hand,  our  ontological  framework 
on context and its associated method (Rey, 2005) can be used to
define  the  boundaries  of  contexts  and  situations  of  use  as  well  as 
their  relationships.  For  our  notion  of  context  of  use,  the  funda¬ 
mental  entities  are  the  user(s),  environment,  and  platform,  each 
of them being characterized by observables monitored by the
system. The section “Functional Decomposition for Plastic
Interactive Systems” shows how to integrate the monitoring of
observables within the software architecture of an interactive system.

System  Adaptation 

System  adaptation  to  context  of  use  can  take  multiple  forms. 
Here,  we  limit  the  discussion  to  the  consequence  of  adaptation 
as  perceived  by  users  at  the  user  interface.  Adaptation  can  use 
one, or a combination of, the following techniques: (a) UI
remoulding, (b) UI distribution, and (c) UI migration.


UI  Remoulding 

UI  remoulding  denotes  the  reconfiguration  of  the  user  in¬ 
terface  that  results  from  the  application  of  one  or  several  trans¬ 
formations  on  all,  or  parts,  of  the  user  interface.  These  trans¬ 
formations  can  be  applied  at  multiple  levels  of  abstraction,  and 
they  can  be  intramodal,  intermodal,  or  multimodal.  These  trans¬ 
formations  include  the  following: 

•  Suppression  of  the  UI  components  that  become  irrelevant  in 
the  new  situation/context; 

•  Insertion  of  new  UI  components  to  provide  access  to  new 
services  relevant  in  the  new  situation/context; 

•  Substitution  of  UI  components  when  UI  components  are 
replaced  with  new  ones.  Substitution  can  be  viewed  as  a  com¬ 
bination  of  suppression  and  insertion  of  UI  components; 

•  Reorganization  of  UI  components  by  revisiting  their  spatial 
layout  and/or  their  temporal  dependency.  Reorganization  may 
result  from  the  suppression,  insertion,  or  substitution  of  UI 
components.  On  the  other  hand,  switching  from  a  portrait 
view  to  a  landscape  view  requires  spatial  reorganization  only. 
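
The four transformations can be illustrated on a deliberately simple model in which a user interface is just an ordered list of component names. This flat-list model and the component names are assumptions made for the sketch; real remoulding operates on richer UI structures.

```python
from typing import List, Optional


def suppress(ui: List[str], component: str) -> List[str]:
    """Remove a component that has become irrelevant."""
    return [c for c in ui if c != component]


def insert(ui: List[str], component: str, position: Optional[int] = None) -> List[str]:
    """Add a component, at the end unless a position is given."""
    new_ui = list(ui)
    new_ui.insert(len(new_ui) if position is None else position, component)
    return new_ui


def substitute(ui: List[str], old: str, new: str) -> List[str]:
    """Substitution expressed as suppression followed by insertion."""
    position = ui.index(old)
    return insert(suppress(ui, old), new, position)


def reorganize(ui: List[str], order: List[str]) -> List[str]:
    """Change the layout order without adding or removing components."""
    if sorted(order) != sorted(ui):
        raise ValueError("reorganization must keep the same components")
    return list(order)


ui = ["map", "search_box", "print_button"]
```

For example, `substitute(ui, "search_box", "voice_search")` yields `["map", "voice_search", "print_button"]`, which is exactly what suppressing the old component and inserting the new one at the same place would give.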

Remoulding  is  intramodal  when  the  source  UI  components 
concerned  by  the  transformations  are  retargeted  within  the 
same  modality.  Remoulding  is  intermodal  if  the  source  UI  com¬ 
ponents  expressed  in  one  modality  (say  GUI)  are  transformed 
into  UI  components  using  a  different  modality  (say  speech). 
Remoulding is multimodal if it uses a combination of intra- and
intermodal transformations. For example, Teresa supports
multimodal remoulding (Berti & Paterno, 2005).

The  transformations  can  be  performed  at  multiple  levels  of 
abstraction: 

•  At  the  Physical  Presentation  (PP)  level,  physical  interactors 
(widgets)  used  for  representing  functions  and  concepts  are 
kept  unchanged  but  their  rendering  and  behavior  may 
change.  For  example,  if  a  concept  is  rendered  as  a  button 
class,  this  concept  is  still  represented  as  a  button  in  the  target 
UI.  However,  the  look  and  feel  of  the  button  or  its  location 
in  the  workspace  may  vary.  This  type  of  adaptation  is  used  in 
Tk  as  well  as  in  Java/AWT  with  the  notion  of  peers. 

• At the Logical Presentation (LP) level, adaptation consists
of changing the representation of functions and concepts.
For example, the concept of month can be rendered as a
Label+Textfield, or as a Label+Combobox, or as a dedicated
physical  interactor.  In  an  LP  adaptation,  physical  interactors 
can  replace  each  other  provided  that  their  representational 
and  interactional  capabilities  are  equivalent.  The  imple¬ 
mentation  of  an  LP-level  adaptation  can  usefully  rely  on  the 
distinction  between  Abstract  Interactive  Objects  and  Con¬ 
crete  Interactive  Objects  as  presented  in  (Vanderdonckt  & 
Bodart,  1993).  Changes  at  the  LP  level  imply  changes  at  the 
PP  level. 

• At the Dialog Component (DC) level, the tasks that can be
executed with the system are kept unchanged but their
organization is modified. As a result, the structure of the
dialogue is changed. AVANTI's polymorphic tasks
(Stephanidis & Savidis, 2001) are an example of a DC-level
adaptation. Changes at the DC level imply changes at the LP
and PP levels.

•  At  the  Functional  Core  Adaptor  (FCA)  level,  the  nature  of 
the  entities  as  well  as  the  functions  exported  by  the  func¬ 
tional  core  (the  services)  are  changed.  Zizi’s  semantic  zoom 
is  an  example  of  an  FCA-level  adaptation  (Zizi  &  Beaudouin- 
Lafon,  1994).  Changes  at  the  FCA  level  imply  changes  at  the 
DC, LP, and PP levels.
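
The rule that a change at one level implies changes at all lower levels can be expressed compactly. The sketch below is illustrative only; the enumeration and the function name `affected_levels` are assumptions introduced here, not part of any cited toolkit.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of abstraction, ordered from most concrete to most abstract."""
    PP = 1   # Physical Presentation
    LP = 2   # Logical Presentation
    DC = 3   # Dialog Component
    FCA = 4  # Functional Core Adaptor


def affected_levels(level):
    """Return the level at which adaptation occurs and every level below it."""
    return sorted((l for l in Level if l <= level), reverse=True)


print([l.name for l in affected_levels(Level.DC)])
```

Using an ordered enumeration makes the implication a simple comparison: an FCA-level adaptation touches all four levels, whereas a PP-level adaptation touches only its own.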

UI Distribution

A  UI  is  distributed  when  it  uses  interaction  resources  that  are 
distributed across a cluster. For example, in graphical UIs
(GUIs), the rendering is distributed if it uses surfaces that are
managed  by  different  computing  devices.  Distribution  is  static 
when  it  is  performed  off-line  between  sessions.  It  is  dynamic 
when  it  occurs  on  the  fly. 

The  granularity  of  UI  distribution  may  vary  from  application 
level  to  pixel  level: 

•  At  the  application  level,  the  GUI  is  fully  replicated  on  the 
surfaces  managed  by  each  computing  device.  The  x2vnc  im¬ 
plementation  of  the  VNC  protocol  offers  an  application-level 
distribution. 

•  At  the  workspace  level,  the  unit  for  distribution  is  the  work¬ 
space.  A  workspace  is  a  logical  space  that  supports  the  exe¬ 
cution  of  a  set  of  logically  connected  tasks.  This  concept  is 
similar  to  the  notion  of focus  area  used  in  contextual  de¬ 
sign  for  expressing  the  user-environment  design.  Pebbles- 
Draw  (Myers,  2001)  and  Rekimoto’s  Pick  and  Drop  (Rekimoto, 
1997)  are  examples  of  UI  distribution  at  the  workspace  level. 

•  The  interactor  level  distribution  is  a  special  case  of  the  work¬ 
space  level  where  the  unit  for  distribution  is  an  elementary 
interactor. 

•  At  the  pixel  level,  any  user  interface  component  can  be  par¬ 
titioned  across  multiple  surfaces.  For  example,  in  DynaWall 
(Streitz, Geißler, Holmer, Konomi, & Müller-Tomfelde, 1999),
a  window  may  simultaneously  lie  over  two  contiguous  white 
boards  as  if  these  were  managed  by  a  single  computer. 

UI  Migration 

UI  migration  corresponds  to  the  transfer  of  all  or  part  of  the 
UI  components  to  different  interaction  resources  whether  these 
resources  belong  to  the  current  platform  or  to  another  one.  Mi¬ 
gration  is  static  when  it  is  performed  off-line  between  sessions. 
It is dynamic when it occurs on the fly. In addition, the migration
of a user interface is total if the user interface moves entirely
to a different platform (application-level migration). It is
partial  (at  the  workspace,  interactor,  or  pixel  levels)  when  a 
subset  only  of  the  user  interface  moves  to  different  interaction 
resources.  For  example,  on  the  arrival  of  a  PDA,  the  control  pan¬ 
els  currently  rendered  on  a  whiteboard  migrate  to  the  PDA 
(workspace-level  migration). 
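
The two axes of migration (static versus dynamic, total versus partial) can be made concrete with a small classifier. This is a sketch under simplifying assumptions: a UI is modeled as a set of component names, and `classify_migration` is a hypothetical helper, not an API of CamNote or any other system discussed here.

```python
def classify_migration(ui_components, moved, on_the_fly):
    """Classify a migration along the two axes described in the text."""
    all_components = set(ui_components)
    moved = set(moved)
    if not moved or not moved <= all_components:
        raise ValueError("moved must be a non-empty subset of the UI components")
    timing = "dynamic" if on_the_fly else "static"
    scope = "total" if moved == all_components else "partial"
    return timing, scope


# The control panel moves to a PDA that has just arrived.
ui = {"slides_viewer", "note_editor", "video_viewer", "control_panel"}
print(classify_migration(ui, {"control_panel"}, on_the_fly=True))
```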

Migration and distribution are two distinct notions: a UI may
be distributed but not migratable (static distribution). A
centralized UI may migrate, but if the migration is total, it
remains centralized on the target platform.

State  Recovery 

The  granularity  of  state  recovery  characterizes  the  effort 
users  must  apply  to  carry  on  their  activity  after  adaptation  has 
occurred.  State  recovery  can  be  performed  at  the  session,  task, 
and  physical  action  levels: 

•  When  the  system  state  is  saved  at  the  session  level,  users  have 
to  restart  the  interactive  system  from  scratch.  They  rely  on 
the  state  saved  by  the  service  (functional  core)  level  before 
adaptation takes place.

•  At  the  task  level,  the  user  can  pursue  the  job  from  the  begin¬ 
ning  of  the  current  interrupted  task  (provided  that  the  task  is 
attainable  in  the  retargeted  system). 

• At the physical-action level, the user is able to carry on the
current  task  at  the  exact  point  within  the  current  task  (pro¬ 
vided  that  the  task  is  attainable  in  the  new  version  of  the  user 
interface). 
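
The three granularities differ in how much of the user's progress survives adaptation. The following sketch is illustrative only; the names `Recovery` and `resume_point` and the representation of progress as a count of physical actions are assumptions made for the example.

```python
from enum import Enum


class Recovery(Enum):
    SESSION = "session"
    TASK = "task"
    PHYSICAL_ACTION = "physical action"


def resume_point(granularity, current_task, actions_done):
    """Return (task to resume, physical actions preserved) after adaptation."""
    if granularity is Recovery.SESSION:
        return (None, 0)             # restart the interactive system from scratch
    if granularity is Recovery.TASK:
        return (current_task, 0)     # restart at the beginning of the task
    return (current_task, actions_done)  # continue at the exact point


print(resume_point(Recovery.TASK, "fill_form", 5))
```

In both the task and physical-action cases, the sketch assumes that the interrupted task is still attainable in the retargeted UI, as required above.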

The examples presented next illustrate the problem space of
plastic UIs.

The Home Heating Control System, CamNote, and Sedan-Bouillon
are three examples of plastic interactive systems developed
according  to  the  MDE  approach  presented  next.  The  services 
they  provide  are  accessible  from  different  types  of  computing 
devices  including  workstations,  personal  digital  assistants  (PDA), 
and  mobile  phones.  The  UI  of  the  Home  Heating  Control  Sys¬ 
tem  is  kept  centralized  on  one  single  device  at  a  time,  whereas 
for  CamNote  and  the  Sedan-Bouillon  website,  the  UI  compo¬ 
nents  can  be  dynamically  distributed  and  migrated  across  the 
interaction  resources  currently  available  in  the  interactive  space. 
CamNote  and  Sedan-Bouillon  differ  in  the  technological  spaces 
used  for  implementation:  CamNote  is  Java-centric  whereas 
Sedan-Bouillon  uses  PHP-MySQL  Internet  solutions. 

Home  Heating  Control  System 

The  Home  Heating  Control  System  is  intended  to  be  used  (a)  at 
home  through  a  dedicated  wall-mounted  device  or  through  a 
PDA  connected  to  a  wireless  home-net;  (b)  in  the  office,  through 
the  web,  using  a  standard  workstation;  (c)  anywhere  using  a 
mobile  phone  or  a  watch. 

A  typical  user’s  task  consists  of  consulting  and  modifying  the 
temperature  of  a  particular  room.  Figures  56.4  and  56.5  show 
versions  of  the  system  for  a  home  comprised  of  two  rooms: 

•  In  Figure  56.4a,  the  screen  size  is  comfortable  enough  to  dis¬ 
play  the  state  of  all  of  the  home  thermostats. 

• In Fig. 56.4b, 56.4c, and 56.4d, the screen size allows the
rendering of the state of one single thermostat at a time. Thus,
an additional navigation task is required to access the state of
the second thermostat. In Fig. 56.4b, the task is supported
by a button, whereas in Fig. 56.4c and 56.4d, a combo-box
and hyperlinks are used to browse the system state. In Fig. 56.4e,
the screen of the watch is so small that the user interface
is limited to the living room, the most important place of
the home.

FIGURE 56.4. The Home Heating Control System. (a) Large screen: the temperature of the
rooms is available at a glance. (b), (c), (d) Small screen: the temperature of a single room is
displayed at a time but the system state is modifiable and browsable via a button (in b),
combo-box (in c), and hyperlinks (in d). (e) Very small screen: only one single room is
observable. Temperature setting and navigation tasks are not available.

FIGURE 56.5. Modifying the temperature using a mobile phone.

Figure  56.5  shows  the  interaction  trajectory  for  setting  the 
temperature  of  a  room  with  a  mobile  phone. 

•  In  Fig.  56.5a,  the  user  selects  the  room  (e.g.,  “le  salon” — the 
living room).

•  In  Fig.  56.5b,  the  system  shows  the  current  temperature  of 
the  living  room. 

•  By  selecting  the  editing  function  (“donner  ordre”),  one  can 
modify  the  temperature  of  the  selected  room  (Fig.  56.5c). 

Compared with the situation depicted in Fig. 56.4, two
navigation tasks (i.e., selecting the room, and then selecting the
edit function) must be performed to reach the desired state. In
addition, a title has been added to every page to remind the user
of the current location within the information space.

Using the taxonomy of the section “The Problem Space of
User Interface Plasticity,” the Home Heating Control System
can  be  characterized  in  the  following  way:  the  context  of  use  is 
limited  to  the  user’s  preference  (i.e.  the  most  important  room 
in  the  home),  and  to  the  platform.  This  platform  is  always  a 
monocomputing  device.  Context  of  use  can  be  modeled  as  a 
graph  of  situations  where  the  set  of  entities  is  the  user  and  the 
intended  computing  devices.  The  sets  of  Roles  and  Relations 
are  empty.  Switching  between  two  situations  occurs  when  us¬ 
ing  a  different  computing  device.  Adaptation  to  context  of  use 
boils  down  to  adapting  the  UI  to  the  resources  of  the  currently 
used  computing  device.  Adaptation  relies  on  remoulding  only. 
It  is  GUI  intramodal  and  the  grain  of  recovery  is  the  session: 
switching  between  devices  on  the  fly  requires  starting  from 
scratch. The adaptation level depends on the source and target
platforms: when switching between the b, c, and d situations,
adaptation  occurs  at  the  logical-presentation  level:  a  widget  is 
replaced  by  a  functionally  equivalent  widget.  When  switching 
between  the  large  screen  and  the  small  screen,  the  task  space  is 
not  changed  but  organized  differently:  adaptation  occurs  at  the 
dialogue-controller  level.  When  switching  to  the  watch,  a  num¬ 
ber  of  services  are  no  longer  available:  the  adaptation  is  per¬ 
formed  at  the  FCA  level. 

CamNote 

CamNote  (for  CAMELEON  Note)  is  a  slides  viewer  that  runs  on  a 
dynamic  heterogeneous  platform.  This  platform  may  range  from  a 
single  PC  to  a  cluster  composed  of  a  PC  and  a  PDA.  Its  UI  is  struc¬ 
tured  into  four  workspaces:  (a)  a  slides  viewer,  (b)  a  note  editor 
for  associating  comments  to  slides,  (c)  a  video  viewer  also  known 
as  “mirror  pixels”  that  shows  a  live  video  of  the  speaker,  and  (d)  a 
control panel to browse the slides and to set up the level of
transparency of the mirror. Speakers can point at items on the slide
using  their  finger.  This  means  of  pointing  is  far  more  compelling 
and  engaging  than  the  conventional  mouse  pointer  that  no  one 
can see. (Technically, the mirror is combined with the slides viewer
using alpha-blending. See http://iihm.imag.fr/demos/CamNote/camnote_short.mov
for a short movie demo.)

Figure  56.6a  shows  a  configuration  where  the  graphical  UI 
is  distributed  across  the  screens  of  a  PC  and  of  a  PDA.  The  slides 
viewer  is  displayed  in  a  rotative  canvas  so  that  it  can  be  oriented 
appropriately  when  projected  onto  a  horizontal  surface.  If  the 
PDA  disappears,  the  control  panel  automatically  migrates  to  the 
PC  screen.  Because  different  resources  are  now  available,  the 
control  panel  includes  different  widgets,  but  also  a  miniature 
representation  of  the  speaker’s  video  is  now  available.  During 
the  adaptation  process,  users  can  see  the  control  panel  emerg¬ 
ing  progressively  from  the  slides  viewer  so  that  they  can  evalu¬ 
ate  the  progress  of  the  adaptation.  The  UI,  which  was  distributed 
on  a  PC  and  a  PDA,  is  now  centralized  on  the  PC  (Fig.  56.6b). 
Conversely,  if  the  PDA  reenters  the  interactive  space,  the  UI  au¬ 
tomatically  switches  to  the  configuration  of  Fig.  56.6a,  and  the 
control  panel  disappears  from  the  PC  screen  by  weaving  itself 
into  the  slides  viewer  before  reappearing  on  the  PDA. 

In  this  exemplar,  context  of  use  is  limited  to  the  platform. 
Transitions  between  situations  occur  at  the  arrival  or  departure 
of  a  computing  device.  Adaptation  is  based  on  migration  and 
distribution  at  the  workspace  level,  as  well  as  on  intramodal  GUI 
remoulding  at  the  dialogue-controller  level:  when  the  control 
panel  resides  on  the  PDA,  the  note-editing  task  is  no  longer 
available.  Adaptation  is  automatic:  the  user  has  no  control  over 
the  adaptation  process,  but  a  minimum  of  meta-UI  exists  to  ex¬ 
press  the  transition  between  two  situations  (the  weaving  effect). 
State  recovery  is  performed  at  the  physical  action  level:  the 
slides  show  is  not  disturbed  by  adaptation. 

The  Sedan-Bouillon  Website 

Sedan-Bouillon is a website that aims to promote tourism
in the regions of Sedan (France) and Bouillon (Belgium)
(http://www.bouillon-sedan.com/). It provides tourists with
information for visiting and sojourning in these regions including a
selection of hotels, campsites, and restaurants. Figure 56.7a shows
a  simplified  version  of  this  website  when  a  user  is  logged  in  from 
a  PC  workstation. 

Preparing  a  trip  for  vacation  is  an  exciting  experience  when 
shared  by  a  group  of  people.  However,  one  single  PC  screen  does 
not necessarily favor collaborative exploration. By dynamically
logging in to the same website with a PDA, users are informed on
the  PDA  that  they  can  distribute  the  UI  components  of  the  site 
across  the  interaction  resources  currently  available.  In  the  exam¬ 
ple  of  Fig.  56.7b,  the  user  asks  for  the  following  configuration:  the 
title  must  appear  on  the  PDA  as  well  as  on  the  PC  (the  title  slots 
are  ticked  for  the  two  browsers  available),  whereas  the  content 
should  stay  on  the  PC  and  the  navigation  bar  should  migrate  to 
the  PDA.  Figure  56.8  shows  the  resulting  UI.  At  any  time,  the  user 
can ask for a reconfiguration of the UI by selecting the “meta-UI”
link in the navigation bar. The UI will be reconfigured accordingly.
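
The configuration requested in this example can be represented as a matrix whose rows are workspaces and whose columns are browsers. The sketch below is illustrative only: the dictionary layout and the function `render_plan` are assumptions, not the data structures of the actual Sedan-Bouillon implementation.

```python
# Rows: workspaces; values: the browsers ticked for that workspace.
config = {
    "title": {"PC", "PDA"},
    "content": {"PC"},
    "navigation": {"PDA"},
}


def render_plan(config):
    """Invert the matrix: for each browser, list the workspaces it renders."""
    plan = {}
    for workspace, browsers in config.items():
        for browser in browsers:
            plan.setdefault(browser, []).append(workspace)
    return plan


plan = render_plan(config)
is_distributed = len(plan) > 1  # the UI spans more than one browser
print(plan["PC"], plan["PDA"], is_distributed)
```

Reconfiguring the UI through the meta-UI then amounts to editing the matrix and recomputing the plan.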

In  terms  of  the  plasticity  problem  space,  the  Sedan-Bouillon 
website  is  very  similar  to  CamNote:  same  model  of  context  of  use, 
adaptation  based  on  distribution  and  migration  at  the  workspace 
level,  and  GUI  intramodal  remoulding  at  the  workspace  level.  Con¬ 
trary  to  CamNote,  remoulding  is  performed  at  the  logical-presen¬ 
tation  level  (no  task  is  suppressed  or  restructured),  and  state  re¬ 
covery  is  supported  at  the  task  level:  if  adaptation  occurs  as  the 
user  is  filling  a  form,  the  form  content  is  lost  by  the  adaptation 
process.  Contrary  to  CamNote,  the  user  has  full  control  over  the 
reconfiguration  of  the  UI  using  the  control  panel  of  the  meta-UI. 

Having  characterized  three  exemplars  in  the  problem  space, 
we  now  consider  the  method  and  mechanisms  necessary  to  sup¬ 
port  interactive  systems  plasticity.  We  advocate  a  model-driven 
engineering  (MDE)  approach. 


MODEL-DRIVEN  ENGINEERING 


The motivation for MDE is the integration of knowledge and
techniques developed in software engineering using the notions
of model, model transformation, and mapping as the key
concepts. In the early days of computer science, software systems
were simple programs written in assembly languages. In those
days, a code-centric approach to software development was
good enough, not to say unavoidable, to ensure a fine control
over the use of computing resources. Over the years, the field
has evolved into the development of distinct paradigms and
application domains leading to the emergence of multiple
technological spaces (TS).

FIGURE 56.6. The user interface of CamNote. (a) The UI of CamNote when distributed on a PC
and a PocketPC screen; (b) the control panel when displayed on the PC screen.

FIGURE 56.7. The Sedan-Bouillon website. (a) UI centralized on a PC screen. (b) The control panel
of the meta-UI to distribute UI workspaces across the resources of the interactive space. The lines
of the matrix correspond to the workspaces, and the columns denote the browsers currently used
by the same user.

FIGURE 56.8. The Sedan-Bouillon website when distributed across the resources of the
interactive space. The MetaUI link allows users to return to the configuration panel shown in
Figure 56.7b.

“A technological space is a working context with a set of
associated concepts, body of knowledge, tools, required skills,
and possibilities” (Kurtev, Bezivin, & Aksit, 2002). Examples of
technological spaces include documentware concerned with
digital documents using XML as the fundamental language to
express specific solutions, dataware related to database systems,
ontologyware, ... HCIware!

Today,  technological  spaces  can  no  longer  evolve  in  autarky. 
Most  of  them  share  challenges  of  increasing  complexity,  such 
as  adaptation,  to  which  they  can  only  offer  partial  solutions. 
Thus,  we  are  in  a  situation  where  concepts,  approaches,  skills, 
and  solutions  need  to  be  combined  to  address  common  prob¬ 
lems.  This  is  where  MDE  comes  into  play.  MDE  aims  at  achiev¬ 
ing  integration  by  defining  gateways  between  technological 
spaces  using  a  model-based  approach.  The  hypothesis  is  that 
models,  metamodels,  model  transformations,  and  mappings 
are  everything. 


Models


FIGURE 56.9. Models organized as oriented μ graphs (μ = "is
represented by").


A  model  is  a  representation  of  a  thing  (e.g.,  a  system),  with  a 
specific  purpose.  It  is  “able  to  answer  specific  questions  in 
place  of  the  actual”  thing  under  study  (Bezivin,  2004).  Thus,  a 
model,  built  to  address  one  specific  aspect  of  a  problem,  is  by 
definition a simplification of the actual thing under study. For
example,  a  task  model  is  a  simplified  representation  of  some 
human  activities  (the  actual  thing  under  study),  but  it  provides 
answers  about  how  “representative  users”  proceed  to  reach 
specific  goals. 

A  model  may  be  physical  (a  tangible  entity  in  the  real  world), 
abstract  (an  entity  in  the  human  mind),  or  digital  (an  entity 
within  computers)  (Favre,  2004a,  2004b).  For  example,  a  printed 
photograph  of  a  young  man  named  Peter  is  a  physical  repre¬ 
sentation  of  Peter  that  his  mother  (for  example)  uses  for  a  spe¬ 
cific  purpose.  Peter’s  mother  has  mental  representations  of  him 
as  a  good  son,  or  as  a  brilliant  researcher  (multiple  abstract 
models about Peter). The authentication system that runs on
Peter’s  computer  knows  him  as  a  login  name  and  password 
(digital  model).  If  Peter’s  portrait  is  digitized  as  a  JPEG  picture, 
then  the  JPEG  file  is  a  digital  model  of  a  physical  model.  When 
displayed  on  the  screen,  the  JPEG  file  is  transformed  into  yet  an¬ 
other  digital  graphics  model  in  the  system’s  main  memory  be¬ 
fore  being  projected  on  the  screen  as  an  image  (yet  another 
physical  model  that  Peter’s  mother  can  observe). 

As this example shows, models form oriented graphs
(μ graphs) whose edges denote the μ relation “is represented
by”. In other words, a model can represent another model, and
a  model  can  be  represented  by  several  models  (see  Fig.  56.9). 
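
The Peter example can be written down as a small directed graph. The sketch below is illustrative only: the node labels paraphrase the example, and the helper `models_of` is a hypothetical name introduced here.

```python
# Edges of the "is represented by" relation for the Peter example.
represented_by = {
    "Peter": ["printed photograph", "mental image", "login and password"],
    "printed photograph": ["JPEG file"],
    "JPEG file": ["in-memory graphics model"],
    "in-memory graphics model": ["on-screen image"],
}


def models_of(thing, graph):
    """Collect every direct or indirect model of `thing`, breadth first."""
    seen = []
    queue = list(graph.get(thing, []))
    while queue:
        model = queue.pop(0)
        if model not in seen:
            seen.append(model)
            queue.extend(graph.get(model, []))
    return seen


print(models_of("JPEG file", represented_by))
```

Following the edges from the printed photograph shows a chain in which a physical model is represented by digital models and, finally, by another physical model on the screen.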

Models  may  be  contemplative  (not  able  to  be  processed  au¬ 
tomatically  by  computers)  or  productive  (able  to  be  processed 
by  computers).  Typically,  scenarios  developed  in  HCI  (Rosson 
&  Carroll,  2002)  are  contemplative  models  of  human  experience 
in  a  specified  setting. 

In  order  to  be  processed  (by  humans,  and/or  by  computers), 
a  model  must  comply  with  some  shared  syntactic  and  semantic 
conventions:  it  must  be  a  well-formed  expression  of  a  language. 
This  is  true  both  for  productive  and  contemplative  models; 
most  contemplative  models  developed  in  HCI  use  a  mix  of 
drawings  and  natural  language.  A  language  is  the  set  of  all  well- 
formed  expressions  that  comply  with  a  grammar  (along  with 
semantics).  In  turn,  a  grammar  is  a  model  from  which  one  can 


produce  well-formed  expressions  (or  models).  Because  a  gram¬ 
mar  is  a  model  of  a  set  of  models,  it  is  called  a  “metamodel.” 

Metamodel 

A metamodel is a model of a set of models that comply with it.
It sets the rules for producing models. It does not represent
models. Models and metamodels form a tree: a model complies
with a single metamodel, whereas a metamodel may have multiple
compliant models.

As an example, suppose that the authentication system
mentioned above is a Java program J. J is a digital model that
represents Peter and that complies with the Java grammar G_J. G_J
does not represent J, but defines the compliance of J with Java.
G_J is one possible metamodel, but not the only one. The
authentication system could also be implemented in C (yet
another digital model of Peter). It would then be compliant with
the C grammar G_C. Grammars G_C and G_J could, in turn, be
produced from the same grammar such as EBNF (Extended
Backus-Naur Form). EBNF, defined as the ISO/IEC 14977:1996 standard,
is an example of a meta-metamodel, which is a model of a set of
metamodels that are compliant with it. It does not represent
metamodels, but sets the rules for producing distinct
metamodels. As shown in Fig. 56.10, the OMG model-driven architecture
(MDA) initiative has introduced a four-layer modeling stack as a
way to express the integration of a large diversity of standards
using MOF (meta-object facility) as the unique meta-metamodel.
MDA is a specific MDE deployment effort around industrial
standards including MOF, UML, CWM, QVT, etc. EBNF, G_J and G_C, and the
Java and C programs are models that belong to the programming
technological space. Within the MDA technological space,
the Java source code of our authentication system becomes a
UML Java model compliant with the UML metamodel. In the
XML technological space, the Java source code could be
represented as a JavaML document compliant with a JavaML DTD
(document type definition). (In the XML technological space, a
DTD defines the legal building blocks of an XML document.)
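
The "complies with" relation between a model, its metamodel, and a meta-metamodel can be illustrated with a toy stack. This is a deliberately simplified sketch: the metamodel is reduced to a table of field names and types rather than a real grammar, and all names (`ACCOUNT`, `complies_with`, `complies_with_m3`) are hypothetical.

```python
def complies_with_m3(metamodel):
    """Toy meta-metamodel rule: a metamodel maps field names to types."""
    return isinstance(metamodel, dict) and all(
        isinstance(name, str) and isinstance(kind, type)
        for name, kind in metamodel.items()
    )


def complies_with(model, metamodel):
    """A model complies if it has exactly the declared, well-typed fields."""
    return model.keys() == metamodel.keys() and all(
        isinstance(model[name], kind) for name, kind in metamodel.items()
    )


ACCOUNT = {"login": str, "password": str}          # metamodel
peter = {"login": "peter", "password": "secret"}   # model representing Peter

print(complies_with_m3(ACCOUNT), complies_with(peter, ACCOUNT))
```

Note that `ACCOUNT` says nothing about Peter himself: it only sets the rules that any account record, including Peter's, must follow.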

As shown in Fig. 56.10, the χ relation (“complies with”) makes
explicit the multiplicity of existing technological spaces as well
as their systematic structure into three levels of modeling
spaces (the so-called M1, M2, M3 levels of MDA) plus the M0 level
that corresponds to a system, or parts of a system. The μ and χ
relations, however, do not tell how models are produced within
a technological space or how they relate to each other across
distinct technological spaces. The notions of transformation
and mapping are the MDE answer to this issue.


Transformations  and  Mappings 

In  the  context  of  MDE,  a  transformation  is  the  production  of  a 
set  of  target  models  from  a  set  of  source  models,  according  to 
a  transformation  definition.  A  transformation  definition  is  a  set 
of transformation rules that together describe how source
models are transformed into target models (Mens, Czarnecki &
VanGorp, 2005). Source and target models are related by the τ
relation “is transformed into.” Note that a set of transformation
rules  is  a  model  (a  transformation  model)  that  complies  with  a 
transformation  metamodel. 

Relation τ expresses an overall dependency between source
and target models. However, experience shows that a finer grain
of correspondence needs to be expressed. Typically, the
incremental modification of one source element should be
propagated easily into the corresponding target element(s) and vice
versa.  The  need  for  traceability  between  source  and  target  mod¬ 
els  is  expressed  as  mappings  between  source  and  target  ele¬ 
ments  of  these  models.  For  example,  each  task  of  a  task  model 
and  the  concepts  involved  to  achieve  the  task,  are  rendered  as 
a  set  of  interactors  in  the  Concrete  User  Interface  model.  Ren¬ 
dering  is  a  transformation  where  tasks  and  their  concepts  are 
mapped  into  a  workspace,  which,  in  turn,  is  mapped  into  a  win¬ 
dow  populated  with  widgets.  The  correspondence  between  the 
source  task  (and  concepts)  and  its  target  workspace,  window 
and  widgets,  is  maintained  as  a  mapping  function.  In  the  con¬ 
clusion,  we  will  see  how  mapping  functions  can  be  exploited 
at  run  time. 
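
A transformation that also records its mappings can be sketched as follows. The example is illustrative only: the task model is reduced to a dictionary of tasks and concepts, and the function names `render` and `source_of` as well as the naming scheme for windows and widgets are assumptions made for this sketch.

```python
def render(task_model):
    """Transform a task model into a concrete UI and keep the mappings."""
    concrete_ui = []
    mapping = {}  # source task -> target elements produced for it
    for task, concepts in task_model.items():
        window = f"window:{task}"
        widgets = [f"widget:{task}.{concept}" for concept in concepts]
        concrete_ui.append((window, widgets))
        mapping[task] = [window] + widgets
    return concrete_ui, mapping


def source_of(element, mapping):
    """Trace a target element back to the source task it was produced from."""
    for task, targets in mapping.items():
        if element in targets:
            return task
    return None


tasks = {"set_temperature": ["room", "temperature"]}
ui, mapping = render(tasks)
print(source_of("widget:set_temperature.room", mapping))
```

Keeping the mapping alongside the generated UI is what allows a change to one task to be propagated to exactly the widgets derived from it, and a widget to be traced back to its task.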

Transformations can be characterized within a four-dimension
space:

• The transformation may be automated (it can be performed
by a computer autonomously), it may be semi-automated
(requiring some human intervention), or it may be manually
performed by a human. For example, given our current level
of knowledge, the transformation of a “value-centered model”
into a “usability model” can only be performed manually. On
the other hand, user interface generators such as CTTE (Mori,
2002, 2004) produce user interfaces automatically from a task
model.

•  A  transformation  is  vertical  when  the  source  and  target 
models  reside  at  different  levels  of  abstraction.  UI  generation 
is  a  vertical  top  down  transformation  from  high-level  de¬ 
scriptions  (such  as  a  task  model)  to  code  generation.  Re¬ 
verse  engineering  is  also  a  vertical  transformation  but  it  pro¬ 
ceeds  bottom  up,  typically  from  executable  code  to  some 
high-level representation by way of abstraction. A
transformation is horizontal when the source and target models
reside at the same level of abstraction. For example,
translating Java source code into C code preserves the original
level  of  abstraction. 

•  Transformations  are  endogenous  when  the  source  and  tar¬ 
get  models  are  expressed  in  the  same  language.  They  are  ex¬ 
ogenous  when  sources  and  targets  are  expressed  in  different 
languages  while  belonging  to  the  same  technological  space. 
For  example,  the  transformation  of  a  Java  source  code  pro¬ 
gram  into  a  C  program  is  exogenous. 

• When crossing technological spaces (e.g., transforming a Java
source  code  into  a  JavaML  document),  then  additional  tools 
(exporters  or  importers)  are  needed  to  bridge  the  gap  be¬ 
tween  the  spaces.  Intertechnological  transformations  are 
key  to  knowledge  and  technical  integration.  This  is  the  quest 
of  MDE. 
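
The dimensions above can be combined into a simple classifier. The sketch is illustrative and simplifies the taxonomy: a model is described only by its language, technological space, and a numeric abstraction level, and any difference in language is labeled exogenous even when the technological space also changes. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    language: str
    space: str        # technological space
    abstraction: int  # higher value = more abstract


def classify(source, target, automation="automated"):
    """Position a transformation along the four dimensions of the text."""
    same_level = source.abstraction == target.abstraction
    return {
        "automation": automation,
        "direction": "horizontal" if same_level else "vertical",
        "language": "endogenous" if source.language == target.language else "exogenous",
        "space": "intra-technological" if source.space == target.space else "inter-technological",
    }


java = Model("Java", "programming", 1)
c = Model("C", "programming", 1)
javaml = Model("JavaML", "XML", 1)

print(classify(java, c))
print(classify(java, javaml)["space"])
```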

In  the  following  section,  we  show  how  the  MDE  principles 
can  be  applied  to  the  development  of  plastic  interactive  systems 
by  bringing  together  HCIware  with  mainstream  software  tech¬ 
nological  spaces. 


MDE  FOR  PLASTIC  USER  INTERFACES 


There  are  several  approaches  for  the  development  of  the  sys¬ 
tems  presented  in  the  section,  “Case  Studies.”  For  example,  the 
user  interface  of  the  Home  Heating  Control  System  can  be  pro¬ 
duced  based  on  a  sound  iterative  user-centered  design  process. 
A  similar  process  can  be  conducted  in  parallel  for  the  mobile 
phone  version.  In  the  absence  of  explicit  links  between  the  two 
streams  of  development,  this  approach  is  doomed  to  failure: 
inconsistencies  between  the  two  user  interfaces  will  occur,  and 
maintenance  of  the  two  versions  will  be  hard  to  synchronize. 
In  addition,  development  efforts  will  be  duplicated. 

Alternatively,  the  UI  code  for  the  workstation  can  serve  as 
a  reference  for  transformations  into  a  new  target  UI  for  the 
mobile  phone.  Duplication  of  efforts  is  avoided  and  the  ap¬ 
proach  is  technically  attainable.  For  example,  using  XSLT,  the 
HTML  UI  produced  for  the  workstation  can  easily  be  translated 
into  a  WML  UI  for  the  mobile  phone.  Crossing  technological 
spaces  at  the  code  level  is  not  as  straightforward,  but  is  feasi¬ 
ble.  However,  this  approach  does  not  address  deep  UI  adap¬ 
tations:  a  code-centric  approach  is  certainly  able  to  address 
UI adaptation at the PP and LP levels (Physical and Logical
Presentation). It cannot cope with higher levels of adaptation
such as the DC and FCA levels (Dialogue Control and Functional
Core  Adaptor  levels)  where,  for  example,  task  sequencing  may 
need  substantial  reorganization. 

UI  migration  and  UI  distribution  introduce  new  challenges.  In 
a code-centric approach, the code of CamNote and the Sedan-Bouillon
website would explicitly include instructions that make
reference to the context of use. For example, “If I am running on
the PDA, then my rendering is this; if I am running on a
workstation, then my rendering is that.” Clearly, this approach is
acceptable for concept demonstration. It does not scale up in the
context  of  ubiquitous  computing  where  opportunism  is  para¬ 
mount.  This  short  analysis  militates  for  an  approach  that  contin¬ 
uously  exploits  models  at  multiple  levels  of  abstraction  not  only 
for  the  development  process  but  for  the  run  time  phase  as  well. 
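
The contrast between the two approaches can be sketched as follows. This is an illustrative example only: the platform model is reduced to a hypothetical `screen_width` attribute, the 400-pixel threshold is arbitrary, and the rule table stands in for the much richer models advocated in this chapter.

```python
def render_code_centric(platform):
    """Code-centric: every known platform is hard-wired in the code."""
    if platform == "PDA":
        return "compact control panel"
    if platform == "workstation":
        return "full control panel"
    raise ValueError(f"unknown platform: {platform}")


# Model-based: adaptation rules are data evaluated against a platform model.
RULES = [
    (lambda p: p["screen_width"] < 400, "compact control panel"),
    (lambda p: True, "full control panel"),  # default rule
]


def render_model_based(platform_model):
    """Pick the rendering of the first rule that the platform model satisfies."""
    return next(rendering for applies, rendering in RULES if applies(platform_model))


# A device that was unknown at design time is still handled.
print(render_model_based({"screen_width": 120}))
```

The code-centric version fails on any device that was not anticipated, whereas the model-based version only needs a description of the newly arrived device.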

In  order  to  define  the  appropriate  models,  we  must  first  in¬ 
vestigate  the  functions  that  plastic  interactive  spaces  must  sup¬ 
port.  Having  identified  the  set  of  functions,  we  can  then  attach 
a  useful  set  of  perspectives  (models)  to  each  function. 

Functional  Decomposition  for  Plastic  Interactive  Systems 

The  picture  shown  in  Fig.  56.11  is  a  reference  model  that  makes 
explicit  the  functional  decomposition  of  an  interactive  space 
for supporting plasticity (Balme, Demeure, Barralon, Coutaz, &
Calvary, 2004). At the bottom of the picture is the hardware: a dynamic
assembly of processors, sensors, and actuators, as well as
a heterogeneous set of interaction resources such as pointing
devices, microphones, private-eyes, and large-size screens.
Each  computer  executes  its  own  operating  system  and  a  diver¬ 
sity  of  virtual  machines. 

The  virtual  machines  are  model  interpreters  of  various  sorts: 

•  Java  VM  is  a  model  interpreter  where  the  model  is  a  byte¬ 
code  program. 

•  Modality  interpreters  denote  conventional  GUI  or  Post-WIMP 
systems,  speech  processing  as  well  as  text-to-speech  proces¬ 
sors.  They  interpret  models  that  are  modality  specific. 

•  Other  model  interpreters  represent  any  set  of  interpreters 
used  at  run  time  including  UI  transformers. 

The  virtual  machines,  operating  systems,  and  hardware  (in¬ 
cluding  the  interaction  resources)  define  the  effective  platform. 
The  rest  of  the  figure  represents  the  interactive  space  per  se. 
The interactive space comprises two complementary functional
facets:

1. The fundamental functions that motivate the existence of
the system.

2. The adaptation manager, so that these functions can be
adapted to the context of use.

FIGURE 56.11. A reference model for the functional decomposition of plastic interactive systems.

The fundamental functions are structured according to the
five abstraction layers of the Arch reference model (Bass, Faneuf,
Little, Mayer, Pellegrino, Reed, Seacord, Sheppard, & Szczur,
1992; Coutaz, 2001): (a) the Functional Core, (b) the Functional
Core Adapter, (c) the Dialogue Controller, and (d) the Logical
and (e) Physical Presentation layers, denoted in the picture as
the Multimodal Logical Presentation and Multimodal Concrete
UI to stress the multimodal dimension of user interfaces (see
chapter 21 by Sharon Oviatt in this book for more information on
multimodal interactive systems).

By acting on input interaction resources, users (represented
in the figure as eyes, hands, lips, and ears) generate low-level
events that are processed by the appropriate modality
interpreter(s), then fed into the Multimodal Logical Presentation
layer. The Multimodal Logical Presentation layer is in charge of
multimodal fusion and fission, both for input and output, at the
appropriate level of abstraction. As in the Arch model (Bass et al.,
1992), the Dialogue Controller controls the domain- and
user-dependent task sequencing, and the Functional Core Adapters
express the software interfaces that accommodate mismatches
between the domain-specific Functional Cores (applications) and
the user interface per se of the interactive space. This five-level
Arch-based functional structure has the capacity to self-adapt to
the context of use thanks to the Adaptation Manager.

The  Adaptation  Manager  aims  at  satisfying  three  functional 
requirements: 

1. To acquire the context of use at the appropriate level of
abstraction, and to detect (and signal) context and situation
changes (due, for example, to the arrival of a PDA): this is
performed by the Context&Situation Manager.

2.  To  elaborate  a  reaction  in  response  to  a  new  situation/context 
of  use:  this  is  performed  by  the  Evolution  Manager.  For  ex¬ 
ample,  “If  a  new  PDA  arrives,  then  move  the  control  panels 
of  the  interactive  space  to  the  PDA.” 

3.  To  execute  the  reaction  plan:  this  is  performed  by  the  Adap¬ 
tation  Producer. 
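
The three requirements above can be read as a small pipeline: observe, plan, execute. The Python sketch below illustrates that reading only. The class names mirror the components named in the text, but every method name, the event strings, and the rule table are assumptions introduced for this example, not part of the reference model.

```python
from dataclasses import dataclass, field


@dataclass
class ContextSituationManager:
    """Requirement 1: acquire the context of use and signal changes."""
    devices: set = field(default_factory=set)

    def observe(self, present_devices):
        """Return change events for devices that arrived or departed."""
        present = set(present_devices)
        events = [f"arrival:{d}" for d in sorted(present - self.devices)]
        events += [f"departure:{d}" for d in sorted(self.devices - present)]
        self.devices = present
        return events


class EvolutionManager:
    """Requirement 2: elaborate a reaction plan for a new situation."""
    # Hypothetical rule table: event -> list of (verb, UI part, target).
    RULES = {
        "arrival:PDA": [("move", "control panels", "PDA")],
        "departure:PDA": [("move", "control panels", "workstation")],
    }

    def plan(self, events):
        return [action for e in events for action in self.RULES.get(e, [])]


class AdaptationProducer:
    """Requirement 3: execute the reaction plan."""
    def __init__(self):
        self.location = {"control panels": "workstation"}

    def execute(self, plan):
        for verb, part, target in plan:
            if verb == "move":
                self.location[part] = target


def adapt(csm, em, ap, present_devices):
    """One adaptation cycle: observe, plan, execute."""
    ap.execute(em.plan(csm.observe(present_devices)))
    return ap.location
```

Calling `adapt` with a device list that newly contains `"PDA"` moves the control panels to the PDA, which corresponds to the rule quoted in item 2.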

The  Context&Situation  Manager  is  structured  into  four  lev¬ 
els  of  abstraction  (Coutaz  et  al.,  2005):  at  the  lowest  level,  the 
system’s  view  of  the  world  is  provided  by  a  collection  of  sen¬ 
sors.  Sensors  may  be  physical  sensors  such  as  RFIDs,  or  soft¬ 
ware  sensors  that  probe  a  user’s  identity  and  the  state  of  the 
platform  cluster.  The  sensing  layer  generates  numeric  observ¬ 
ables.  To  determine  meaning  from  numeric  observables,  the 
Context&Situation  Manager  must  perform  transformations. 
The  perception  layer  is  independent  of  the  sensing  technol¬ 
ogy  and  provides  symbolic  observables  at  the  appropriate  level 
of  abstraction.  The  situation-and-context  identification  layer 
identifies  the  current  situation  and  context  from  observables 
and  detects  conditions  for  moving  between  situations  and 
contexts  of  use.  Services  in  this  layer  specify  the  appropriate 
entities,  roles,  and  relations  for  operating  within  the  user’s  ac¬ 
tivities.  They  can  be  used  to  predict  changes  in  situations  or 
in  contexts,  and  thus  anticipate  needs  of  various  forms  (system¬ 
centric  needs  as  well  as  user-centric  needs). 
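
As a rough illustration of the layering just described, the following Python sketch turns a numeric observable into a symbolic one and then into a named situation. The observable name `pda_rfid_dbm`, the threshold, and the situation labels are all assumptions made for this example.

```python
def perceive(numeric, near_dbm=-60.0):
    """Perception layer: turn numeric observables into symbolic ones."""
    signal = numeric.get("pda_rfid_dbm")  # None when no tag is detected
    return {"pda_nearby": signal is not None and signal >= near_dbm}


def identify_situation(symbolic, previous=None):
    """Identification layer: name the situation and flag a change."""
    current = "workstation+PDA" if symbolic["pda_nearby"] else "workstation-only"
    return current, current != previous
```

Only `perceive` depends on the sensing technology; `identify_situation` works on symbolic observables alone, which is the independence the perception layer is meant to provide.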


The reaction plan built by the Evolution Manager may require
suppressing and/or replacing part or all of the Arch-based
functional structure. This can involve reusing components from
a component repository and/or generating executable code on
the fly by applying transformations to the models maintained
by the Model Interpreters. Executable code is one sort of digital
model that represents a particular function. In current code-centric
approaches, it is the only sort of model considered
at run time. As shown in the following section, it is not the
only one in an MDE approach to UI plasticity.

Models  for  UI  Plasticity 

HCI  design  methods  produce  a  large  body  of  contemplative 
models such as scenarios, drawings, storyboards, and mock-ups.
These models are useful reference material during the design
process. On the other hand, because they are contemplative,
they can only be transformed manually into productive models.
Manual transformation supports creative inspiration, but is
prone to incorrect interpretation and loss of key information.
Typically, contextual information elicited in a Contextual Design
analysis is lost during the development process. Once explicit,
contextual information becomes implicit within the executable
program: the mapping between the real-world contextual
information and its corresponding translation into code does not
exist. How can we improve the situation?

Given  the  current  state  of  the  art  in  HCI  and  SE,  we  see  three 
possible  realistic  bridging  points: 

1.  Moving  from  contemplative  scenarios  to  productive  use 
cases; 

2.  Bringing  together  productive  task  models  with  UML  class  di¬ 
agrams  to  cover  domain-dependent  services  and  concepts; 

3.  Moving  from  the  contemplative  description  of  context  of  use 
into  the  terms  of  the  ontological  space  presented  in  the  sec¬ 
tion,  “Context  and  Context  of  Use.” 

The framework presented in Fig. 56.12 combines the second
and third of our options with the MDE M1-M2 levels (Calvary,
Coutaz, & Thevenin, 2001; Calvary, Coutaz, Thevenin, Limbourg,
Souchon, Bouillon, & Vanderdonckt, 2002). The M2 level is
composed of a set of metamodels (and relations) that specify
the structure of the “important” concepts of the problem space
described in “The Problem of User Interface Plasticity.” The M1
level is populated with multiple models, each of which provides an
“important” perspective on a given interactive space. According
to the MDE principle, the M1-level models comply with the
M2-level metamodels.

Figure 56.13 makes explicit the definition of some of the M2
metamodels (and their relations) using UML as the meta-metamodel.
Figure 56.15 shows examples of M1-level models for the
Home Heating Control System and their mapping with their
respective metamodels.

The  M2-models  (metamodels)  include,  but  are  not  limited  to 
the  following: 

M2-context  of  use  is  defined  as  a  specialization  of  the  ontol¬ 
ogy  presented  in  “Context  and  Context  of  Use”  where  users, 
platforms  and  physical  environments  are  sorts  of  Entities. 

FIGURE 56.12. A Model Driven Engineering framework for plastic UIs. [Labels recovered from the figure: Context of Use, Value, Domain; User, Platform, Environment, Criteria, Concept, Task, Workspace, Interactor, Program.] The set of columns represents
different perspectives on a particular UI. In a column, all of the models conform to the same
metamodel. A line corresponds to the result of a transformation step. The framework makes
explicit the need for revising models during the engineering process to fit new constraints. For
instance, an initial task model (M1-Tsk) might be tuned into another version (M1-Tsk') where a task
interleaving operator (Figure 4a) is replaced with a sequence operator (Figure 4b) to cope with a
small screen size.

FIGURE 56.13. A subset of the metamodels of our framework for plastic UIs.

M2-evaluation criteria denotes the productive metamodel
that results from a manual translation of the system value
presented in “Usability.”

M2-tasks and M2-concepts. As shown in Figure 56.13,
a task has a name and pre- and postconditions. It may be composed
of other tasks by way of a binary operator (such as the
AND, OR, and SEQ operators), or decorated with a unary operator
(such as Optionality, Criticity, and Default option). BinaryOperators
are related to navigation interactors to move between task
spaces in the Concrete UI. A task manipulates concepts (denoted
by the “ConceptTask” relation). In turn, a concept is a
class composed of a set of attributes. This class may inherit from
other classes, and may serve different roles. A concept is represented
as interactor(s) in the Concrete UI by way of the
ConceptContainment relation. The TaskSpace relation shows how
a task relates to a workspace.

M2-Workspace structures the task space at an abstract level,
and M2-interactor describes the interactors that will populate
workspaces in the Concrete UI. As shown by the definition of
M2-Workspace, workspaces are chained with each other depending
on the binary operator of the source tasks. A workspace
is mapped into the Concrete UI as a container class interactor.
This container interactor, which, in the GUI modality,
may be a panel or a window, is populated with interactors that
render concepts and binary operators (navigation operators).
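
To make the metamodel descriptions above more tangible, here is a minimal Python sketch of M2-tasks, M2-concepts, and M2-Workspace as data classes, with one compliance check. The field names, the `conforms` rule, and the task and attribute names in the usage note are assumptions for illustration; Figure 56.13 remains the authoritative definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BINARY_OPERATORS = {"AND", "OR", "SEQ"}


@dataclass
class Concept:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Task:
    name: str
    precondition: str = "true"
    postcondition: str = "true"
    operator: Optional[str] = None                         # binary operator linking subtasks
    subtasks: List["Task"] = field(default_factory=list)
    decorations: List[str] = field(default_factory=list)   # unary operators
    concepts: List[Concept] = field(default_factory=list)  # ConceptTask relation


@dataclass
class Workspace:
    name: str
    task: Task                                             # TaskSpace relation
    successors: List["Workspace"] = field(default_factory=list)


def conforms(task: Task) -> bool:
    """Assumed compliance rule: composite tasks, and only composite
    tasks, carry one of the known binary operators."""
    if task.subtasks and task.operator not in BINARY_OPERATORS:
        return False
    if not task.subtasks and task.operator is not None:
        return False
    return all(conforms(sub) for sub in task.subtasks)
```

For instance, an M1-level model with one top-level task and two subtasks, such as `Task("Control heating", operator="SEQ", subtasks=[Task("Select room"), Task("Set temperature")])` (task names invented here), passes `conforms`, whereas a leaf task that carries an operator does not.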

M2-Program. A program is more than executable code. It
covers multiple perspectives, typically its functional decomposition,
the mapping between functions and software components,
the configuration of the components according to an architectural
style, and the mapping between the components and the
resources of the platform (Bass, Clements, & Kazman, 1998). An
architectural style is a metamodel that includes a vocabulary of
design elements (e.g., pipe and filter components), imposes
configuration constraints on these elements (e.g., pipes
are mono-directional connectors between two filter components),
and determines a semantic interpretation that gives
meaning to the system description (Garlan & Shaw, 1993). Typically,
a component specifies the services (functions) it requires,
the functions it provides, the quality of service it can support,
the technical space to which it belongs, and the “code” it encapsulates.
Components as defined in distributed systems are not
“rich enough” to express HCI concerns. Typically, we need to express
that a component is able to support a particular set of
tasks, that it can run on both a PC and a MacOS X platform, and
that it supports given evaluation criteria in given
situations. The Comet is one attempt in this direction (Calvary
et al., 2004).

M2-Transformations correspond to the descriptions of transformations
that can be automated. Figure 56.14 shows examples
of typical transformations between source task models and target
workspaces using the following rules: (a) there must be one
workspace per source task; (b) chaining between workspaces
must be based on task operators. Thus, “T = T1 T2” (which
means “to achieve T, T1 must be achieved first, followed by the
achievement of T2”) is transformed into two workspaces
(shown in the picture as rectangles) that are accessible in sequence:
the first workspace supports T1 and provides access
to the workspace that supports T2. If T can be repeated, then
the workspace that supports T2 must provide access to the
workspace that supports T1. A more formal description of this
type of transformation is presented next.

FIGURE 56.14. Typical transformation patterns between task
models (expressed with UAN operators) and workspaces.
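
As an informal illustration of rules (a) and (b), independent of the ATL description, the Python sketch below derives workspaces and chaining links from a task tree. The dictionary keys (`name`, `operator`, `subtasks`, `repeatable`) are assumptions for this example, and only the sequence operator is handled. Following rule (a) literally, the composite task T also receives a workspace of its own.

```python
def task_to_workspaces(task):
    """Return (workspaces, links) for a task tree.

    workspaces: one name per source task (rule a).
    links: (origin, destination) pairs derived from SEQ operators (rule b).
    """
    workspaces, links = [], []

    def visit(t):
        workspaces.append(t["name"])              # rule (a)
        subtasks = t.get("subtasks", [])
        for sub in subtasks:
            visit(sub)
        if t.get("operator") == "SEQ" and subtasks:
            names = [sub["name"] for sub in subtasks]
            links.extend(zip(names, names[1:]))   # T1 gives access to T2
            if t.get("repeatable"):
                links.append((names[-1], names[0]))  # T2 gives access back to T1

    visit(task)
    return workspaces, links
```

With `T = T1 T2`, the result links the workspace of T1 to that of T2; marking T as repeatable adds the return link from T2 to T1.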

In practice, how can an M2-space be exploited?

The  Models  in  Action 

The  M2-space  offers  many  ways  to  produce  plastic  UIs.  Let’s  see 
some  examples. 

The approach in conventional UI generation is to start from
the production of an M1-task model and an M1-concept model, then
to apply successive vertical transformations down to the generation
of an M1-Program. In the example of Figure 56.15, the designer
has produced the M1-Task model (top right of the figure)
and the M1-Concept model (top left) for the Home Heating Control
System. These M1-level models comply with their respective
M2-level models. They have been transformed by vertical top-down
refinement into the Concrete UI shown at the top center
of Figure 56.15. This Concrete UI, as perceived and manipulated
by the user, results from the interpretation of the corresponding
M1-program by the model interpreters of Figure 56.11.

Because the M1-level models of our example are productive,
the transformations can be performed automatically in the following
way: the M1-Task model, which comprises one top-level
task and two subtasks, is transformed into one entry/exit
workspace that leads to two workspaces (bottom right of Fig.
56.15). Figure 56.16 shows the corresponding transformation
description using ATL (Bezivin, Dupe, Jouault, Pitette, & Rougui,
2003) as a transformation language.

Each workspace is then mapped into a container interactor
(a window for the entry/exit workspace, and two panels within
the window for the other two workspaces). The concepts of
M1-Concept (room and thermostats) are mapped into their respective
interactors within their container interactors: on one hand,
a ComboBox for specifying the room; on the other hand, a
ComboBox for setting the room temperature and a label to express
the unit of measure (Celsius). The interactors are then
transformed into executable code. This top-down approach is
used in classic UI generation. However, in classic UI generation,
there is no traceability between the M1-level models. Typically,
once the code is generated, the task model is lost.

In our approach, all of the M1-level models maintain an explicit
mapping between them. Since they are “alive” at run time,
any model can be used on the fly to inform the adaptation
process. Thus, our approach makes it possible to combine
transformations at run time, mixing bottom-up transformations
to reach the appropriate level of abstraction with horizontal
transformations to switch, for example, to a distinct technological
space, then applying vertical refinement to produce an appropriate
retargeted UI. Note that, because mapping is maintained
at a fine grain, transformations can be applied only to the
portions of the UI that need to be adapted.

FIGURE 56.15. The Home Heating Control System from an MDE perspective. A subset of the
M1-level models that correspond to situation c) of Figure 4, and their mapping with their respective
M2-level models. (For the sake of simplicity, only a subset of the mappings is represented.)

Other forms of optimization can be performed for known recurrent
situations. For example, in CamNote and the Sedan-Bouillon
website, two situations have been devised by design:
the dynamic arrival and departure of a PDA, with the dynamic
migration of workspaces between the resources of the platform.
For these preplanned situations, two types of control panels
have been precomputed and maintained in a repository of components.
They are dynamically retrieved as needed and mapped
to the appropriate computing device.

In an early version of the Home Heating Control System,
switching from a UI implemented in Java/AWT to a UI implemented
in WML or HTML was not performed on the fly: the
target UIs were precomputed according to the classic top-down
approach using factorization and decoration transformations
at every step of the vertical refinement process. Figure
56.17 illustrates the principle of these transformations. At the
top, three M1-task models have been devised for the Java/AWT,
HTML, and WML-enabled target platforms. The task model obtained
from factorization is shown at the bottom left: three specific
parts are linked to a shared part. AVANTI’s polymorphic
tasks apply this method. On the bottom right, a unique task
model is produced using decorations to express exceptions.

module M2TaskToM2Workspace {
    from M1Task : M2Task
    to M1Workspace : M2Workspace

    -- One workspace per task
    rule TaskToSpace {
        from t : M2Task!Task
        to w : M2Workspace!Space (
            name <- t.name
        )
    }

    -- OrOperator to SequenceOperators
    rule OrOperatorToSequence {
        from o : M2Task!BinaryOperator (
            o.name = "or"
        )
        to motherToLeft : M2Workspace!Sequence (
            origin <- [TaskToSpace.w] o.motherTask,
            destination <- [TaskToSpace.w] o.leftTask
        ),
        -- Second sequence, as described in the caption; the listing is cut
        -- off in the source, so the target name leftToRight is assumed.
        leftToRight : M2Workspace!Sequence (
            origin <- [TaskToSpace.w] o.leftTask,
            destination <- [TaskToSpace.w] o.rightTask
        )
    }
}

FIGURE 56.16. An ATL transformation description based on
the M2-space of Figures 56.14 and 56.16. The rule TaskToSpace
creates one workspace w per source task t, where w takes the
name of t. The rule OrOperatorToSequence transforms all OR operators
o between two tasks (o.leftTask and o.rightTask) into two
sequence operators (from o.motherTask to o.leftTask, and from o.leftTask
to o.rightTask).

FIGURE 56.17. Factorization applied to task models.


Factorization allows designers to focus on one target at a time,
followed by the combination of the descriptions produced independently
for each target into a single description where
common parts are factored out. Decoration supports a different
approach where designers focus on a reference target and
then define exceptions.
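
The difference between the two techniques can be sketched with task models reduced to sets of task names. This Python sketch is a deliberate simplification under stated assumptions: real task models are trees with operators, and the function names and the example task names in the usage note are invented for illustration.

```python
def factorize(models):
    """Factorization: split per-target models into a shared part
    and one specific part per target."""
    shared = set.intersection(*models.values())
    specific = {target: tasks - shared for target, tasks in models.items()}
    return shared, specific


def decorate(models, reference):
    """Decoration: keep one reference model and record, for every
    other target, the exceptions relative to it."""
    ref = models[reference]
    exceptions = {
        target: {"added": tasks - ref, "removed": ref - tasks}
        for target, tasks in models.items()
        if target != reference
    }
    return ref, exceptions


def resolve(ref, exceptions, target):
    """Rebuild the task set of a target from the decorated model."""
    exc = exceptions[target]
    return (ref - exc["removed"]) | exc["added"]
```

Given three models for Java/AWT, HTML, and WML, `factorize` yields three specific parts linked to one shared part, while `decorate` with Java/AWT as the reference yields a single model plus exceptions; `resolve` recovers each original model from the decorated form.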


CONCLUSION 


Ad hoc code-centric approaches to the development and deployment
of plastic interactive spaces are acceptable when the
intended context of use is highly constrained. In ubiquitous
computing, not all aspects of user-interface adaptation can be
preprogrammed. MDE, which supports (re)computation from
any level of abstraction, provides one possible avenue to address
this new challenge.

In MDE, everything is a model. Models that represent a system
do not float as independent things. Instead, they are organized
into oriented graphs. Each graph represents an important
principle of MDE: “compliance,” “representation,” and
“transformation.”

• Compliance enforces rigor. In MDE, the modeling world is organized
into three levels of reasoning: the M1 model level,
the M2 metalevel, and the M3 metameta level, where each
level complies with the level above it.

•  Representation  supports  the  capacity  to  define  multiple  per¬ 
spectives  on  a  given  thing  such  as  an  interactive  space. 

• Transformation and mapping are fundamental to on-the-fly
adaptation. They also provide the formal foundation for crossing
the boundaries between technological spaces. This is
where SE and HCI can cross-fertilize.

In HCI, model-based approaches to the automatic generation
of user interfaces have been investigated since the mid-eighties
(cf. the seminal COUSIN (Hayes, Szekely, & Lerner,
1985) and Open-Dialogue (Schulert, Rogers, & Hamilton, 1985)
User Interface Management Systems, or UIMSs). Although promising,
the UIMS technology has not found wide acceptance: for
developers, it meant a new language to learn, and designers felt
severely limited by the constraints of stereotyped user interfaces
(Myers, Hudson, & Pausch, 2000). The MDE-based framework
we propose is intended to alleviate these problems: tools can be
defined to encapsulate the low-level details of a particular language.
Developers can “program” (produce models) in their favorite
language since the result of this programming activity is
a model that can be transformed into the appropriate target
model(s). Creativity is not limited since, again, M0-level code
can be vertically transformed into a higher level of abstraction.
The fundamental difference with the UIMS technology is that all
models of a system are alive at run time: the run time system is
not about executable code only.

Our MDE framework sets the global picture and principles
for plastic user interfaces. It can be put into action in many ways.
Seminal work in plastic UI includes UIML (Abrams, Phanouriou,
Batongbacal, Williams, & Shuster, 1994) and XIML (Puerta &
Eisenstein, 2001), which transform M1-level models into M0-level
programs to support Logical Presentation level adaptation for
centralized GUIs. Tools for retargeting UIs such as Vaquita (Bouillon
& Vanderdonckt, 2002) and WebRevenge (Paganelli & Paterno,
2003) correspond to a combination of bottom-up vertical,
horizontal, and top-down vertical transformations. They lie
within the same metameta level (the XML Technological Space),
but they use distinct M2 metamodels. Vaquita and WebRevenge
work off line. On the other hand, Dygimes (Coninx, Luyten,
Vandervelpen, Van den Bergh, & Creemers, 2003) and iCrafter
(Ponnekanti, Lee, Fox, Hanrahan, & Winograd, 2001) generate
Concrete User Interfaces (GUI) at run time where a renderer
dynamically computes a GUI from a workspace-level model
expressed in XML. Websplitter (Han, Perret, & Naghshineh, 2000)
supports the distribution of web page content at the interactor
level across the interaction resources of heterogeneous clusters,
but distribution is statically specified in an XML policy file.
As proof of concept, small-size exemplars have been developed
for different technological spaces. The challenge is to define
appropriate M2-level metamodels. Transformation is also key to
the success of this approach. TransformiXML of the UsiXML
metalevel environment (Limbourg, 2004; Limbourg, Vanderdonckt,
Michotte, Bouillon, & Lopez-Jaquero, 2004b), which is
based on graph transformations, is certainly a promising way.
Thus, there is still a long way to go. Success will require a strong
active collaboration between software engineering for HCI, software
architecture, machine perception, and evaluation processes.


INTRODUCTION


Usability  testing  is  one  of  the  most  popular  usability  methods 
used  today,  and  it  is  the  evaluation  method  that  usability  spe¬ 
cialists  say  has  the  largest  impact  on  product  improvement 
(Rosenbaum,  Rohn,  &  Humburg,  2000).  Indeed,  it  has  become 
so  popular  that  some  people  use  the  term  to  refer  to  usability 
engineering  methods  in  general.  However,  usability  testing  is 
just  one  of  the  many  usability  methods  available  for  evaluating 
the  ease  of  learning  and  use  of  products.  This  chapter  will  de¬ 
fine  usability  testing,  describe  its  core  issues,  and  discuss  our 
conception  of  its  future  directions. 


A  BRIEF  HISTORY  OF  USABILITY  TESTING 


Usability  testing  gained  popularity  in  the  early  1980s,  when  soft¬ 
ware  was  beginning  to  reach  a  wider  audience  than  just  com¬ 
puting  professionals.  The  explosion  of  end-user  computing  was 
made  possible  by  new  hardware  and  software  in  the  form  of 
both  the  minicomputer  and  microcomputer  and  expansion  of 
communications  technology.  These  advances  moved  computing 
from the isolated computer room to the desktop. The advent
of CRTs, modems, and, later in the decade, graphical user interfaces
made interacting directly with the computer in real
time possible.

The  1982  conference  called  “Human  Factors  in  Computer 
Systems,”  held  in  Gaithersburg,  Maryland,  brought  together  for 
the  first  time  professionals  from  multiple  disciplines  interested 
in  studying  and  understanding  HCI.  Subsequent  meetings  of 
this  group  became  known  as  the  “CHI  Conference.”  At  that  first 
meeting,  a  session  on  “Evaluating  Text  Editors”  described  what 
now  would  be  called  “usability  tests”  (Roberts  &  Moran,  1982). 
These  studies  were  written  in  the  style  of  experimental  psy¬ 
chology  reports,  including  sections  titled  “Experimental  Design” 
and  “Data  Analysis,”  in  which  they  described  the  computation  of 
inferential  statistics. 

The  reliance  on  psychological  research  experiments  as  the 
model  for  usability  testing  was  challenged  early  but  has  per¬ 
sisted  throughout  its  history.  Young  and  Barnard  (1987)  pro¬ 
posed  the  concept  of  scenarios  instead  of  experiments,  and  two 
years  later,  CHI  Conference  writers  discussed  issues  such  as 
“The  role  of  laboratory  experiments  in  HCI:  Help,  hindrance  or 
ho-hum?” (Wolf, 1989).

The  first  books  on  HCI  began  appearing  at  this  time.  Perhaps 
the most influential book, Shneiderman’s (1987) Designing the
User Interface, did not have an index item for usability testing
but  did  have  one  for  “Quantitative  Evaluations.”  In  that  section, 
Shneiderman  wrote: 

Scientific  and  engineering  progress  is  often  stimulated  by  improved 
techniques  for  precise  measurement.  Rapid  progress  in  interactive  sys¬ 
tems  design  will  occur  as  soon  as  researchers  and  practitioners  evolve 
suitable  human  performance  measures  and  techniques.  .  .  .  Academic 
and  industrial  researchers  are  discovering  that  the  power  of  traditional 
scientific  methods  can  be  fruitfully  employed  in  studying  interactive 
systems. (p. 411)

In  the  second  edition,  there  was  still  no  entry  in  the  index 
for  “Usability  Testing,”  but  there  was  one  for  “Usability  Labora¬ 
tories.”  Shneiderman  (1992)  described  usability  tests  but  called 
them  “pilot  tests.”  These  tests  “can  be  run  to  compare  design 
alternatives,  to  contrast  the  new  system  with  current  manual 
procedures, or to evaluate competitive products” (Shneiderman,
1992, p. 479). It was during this period that the first didactic
books devoted to testing appeared (Dumas & Redish,
1993; Rubin, 1994).

The third edition of Shneiderman’s (1997) book included a
section on “Usability Testing and Laboratories.” Shneiderman
wrote:

Usability-laboratory  advocates  split  from  their  academic  roots  as  these 
practitioners  developed  innovative  approaches  that  were  influenced  by 
advertising  and  market  research.  While  academics  were  developing  con¬ 
trolled  experiments  to  test  hypotheses  and  support  theories,  practi¬ 
tioners  developed  usability-testing  methods  to  refine  user  interfaces 
rapidly. (p. 128)

This  brief  history  shows  that  usability  testing  has  been  an 
established  evaluation  method  for  only  about  15  years.  Testing 
was  slow  to  break  away  from  its  roots  in  psychology  research — 
it  has  continued  relatively  unchanged  until  recently.  However, 
change  is  in  the  air,  as  we  will  discuss  below. 


USABILITY TESTING BASICS


Valid usability tests have the following six characteristics:

•  The  focus  is  on  usability. 

•  The  participants  are  end  users  or  potential  end  users. 

•  There  is  a  product  or  system  to  evaluate. 


57.  Usability  Testing:  Current  Practice  and  Future  Directions 



•  The  participants  perform  tasks,  usually  while  thinking  aloud. 

•  The  data  are  recorded  and  analyzed. 

•  The  results  of  the  test  are  communicated  to  appropriate 
audiences. 

We  will  consider  each  of  these  basics  in  turn. 


The  Focus  Is  on  Usability 

It  may  seem  obvious  that  a  usability  test  should  be  about  us¬ 
ability,  but  sometimes  people  try  to  use  a  test  for  other  inap¬ 
propriate  purposes  or  call  other  methods  usability  tests.  Per¬ 
haps  the  most  common  mismatch  is  between  usability  and 
marketing/promotion  issues,  such  as  adding  a  question  to  a 
posttest  questionnaire  that  asks  participants  if  they  would  buy 
the  product  they  just  used.  If  the  purpose  of  the  question  is  to 
provide  an  opportunity  for  participants  to  talk  about  their  re¬ 
actions  to  the  test  session,  the  question  is  appropriate  for  us¬ 
ability  tests.  However,  if  the  question  is  added  to  predict 
whether  customers  would  actually  buy  the  product,  the  ques¬ 
tion  is  not  appropriate.  A  six-participant  usability  test  is  not  an 
appropriate  method  for  estimating  sales  or  market  share.  Ob¬ 
viously,  a  company  would  not  base  its  sales  projections  on  the 
results  of  the  question,  but  people  who  read  the  test  report 
may  draw  inappropriate  conclusions  when  such  a  question  is 
included. 


The  Participants  Are  End  Users  or  Potential  End  Users 

In  a  valid  usability  test,  the  participants  must  be  part  of  the  tar¬ 
get  market  for  the  product.  Testing  with  other  populations  may 
be  useful  in  that  it  may  find  usability  problems.  However,  the  re¬ 
sults  cannot  be  generalized  to  the  target  population. 

The  key  to  finding  people  who  are  potential  candidates  for 
the  test  is  a  user  profile  (Branaghan,  1997)  or  a  persona  (Pruitt  & 
Adlin,  2005).  A  user  profile  captures  two  types  of  characteristics: 

•  Those  that  the  users  share 

•  Those  that  might  make  a  difference  among  users 

For  example,  in  a  test  of  a  cell  phone  upgrade,  participants 
could  be  people  who  now  own  a  cell  phone  or  who  would  con¬ 
sider  buying  one.  Of  the  people  who  own  a  phone,  you  may 
want  to  include  people  who  owned  the  previous  version  of  the 
manufacturer’s  phone  and  people  who  own  other  manufac¬ 
turers’  phones.  You  need  to  build  a  user  profile  of  the  relevant 
characteristics  of  users.  It  is  from  that  profile  that  you  create  a 
screening  questionnaire  to  identify  potential  participants  for 
the  test. 
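The step from user profile to screening questionnaire can be pictured as a small qualification rule. The sketch below is only an illustration of the cell phone example: the field names (`owns_cell_phone`, `would_consider_buying`, `current_brand`), the group labels, and the brand name "Acme" are hypothetical and are not taken from any real screener.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Answers one person gave to a hypothetical screening questionnaire."""
    name: str
    owns_cell_phone: bool
    would_consider_buying: bool
    current_brand: Optional[str] = None  # None if the person owns no phone


def screen(candidate: Candidate, our_brand: str = "Acme") -> Optional[str]:
    """Return the recruiting group for a candidate, or None if screened out."""
    if candidate.owns_cell_phone:
        # A characteristic that might make a difference among users:
        # experience with this manufacturer versus another one.
        if candidate.current_brand == our_brand:
            return "owner_same_manufacturer"
        return "owner_other_manufacturer"
    if candidate.would_consider_buying:
        return "prospective_buyer"
    return None  # not part of the target market


pool = [
    Candidate("A", True, False, "Acme"),
    Candidate("B", True, False, "OtherCo"),
    Candidate("C", False, True),
    Candidate("D", False, False),
]
groups = {c.name: screen(c) for c in pool}
```

Keeping the shared characteristics (owning or considering a phone) separate from the differentiating one (manufacturer) mirrors the two parts of the profile and makes it easy to see which groups a recruiting effort actually covers.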

A  common  issue  at  this  stage  of  planning  is  that  there  are 
more  relevant  groups  to  test  than  there  are  resources  to  test 
them.  This  situation  forces  the  test  team  to  decide  which  group 
or  groups  to  test.  This  decision  should  be  based  on  the  prod¬ 
uct  management’s  priorities,  not  on  how  easy  it  might  be  for  the 
usability  specialist  to  recruit  participants. 


There  Is  a  Product  or  System  to  Evaluate 

Usability  testing  can  be  performed  with  almost  any  product  or 
technology.  The  range  includes:

•  Products  with  user  interfaces  that  are  all  software,  such  as  a 
database  management  system;  all  hardware,  such  as  a  high- 
quality  pen;  and  those  that  have  both,  such  as  a  cell  phone,  a 
clock  radio,  a  hospital  patient  monitor,  or  a  circuit  board  tester 

•  Products  intended  for  different  types  of  users,  such  as  con¬ 
sumers,  medical  personnel,  engineers,  network  managers, 
high-school  students,  computer  programmers,  and  so  forth 

•  Products  that  are  used  together  by  groups  of  users,  such  as 
cooperative  work  software  (Scholtz  &  Bouchette,  1995) 

•  Products  in  various  stages  of  development,  such  as  user 
interface  concept  drawings,  early  low-tech  prototypes,  more 
fully  functioning  high-fidelity  prototypes,  products  in  beta 
testing,  and  completed  products 

•  Components  that  are  embedded  in  or  accompany  a  product, 
such  as  print  manuals,  instruction  sheets  that  are  packaged 
with  a  product,  tutorials,  quick-start  programs,  online  help, 
and  so  forth 

The  Participants  Perform  Tasks,  Usually  While  Thinking  Aloud 

This  is  the  execution  phase  of  the  test.  It  is  where  the  test  par¬ 
ticipant  and  the  test  administrator  interact  and  it  is  where  the 
data  are  collected.  Before  the  test  session  starts,  the  adminis¬ 
trator  instructs  the  participant  how  the  test  will  proceed  and 
informs  the  participant  that  the  test  probes  the  usability  of  the 
product,  not  the  participant’s  skills  or  experience.  In  many  us¬ 
ability  tests,  participants  are  asked  to  think  aloud,  which  we  dis¬ 
cuss  in  more  detail  below. 

Selecting  tasks.  One  of  the  essential  requirements  of 
every  usability  test  is  that  the  test  participants  attempt  tasks  that 
users  of  the  product  will  perform.  When  a  product  of  even  mod¬ 
est  complexity  is  tested,  however,  there  are  more  tasks  than 
there  is  time  available  to  test  them,  so  it  is  necessary  to  sample 
tasks.  While  not  often  recognized  as  a  limitation  of  testing,  the 
sample  of  tasks  is  a  limitation  to  the  scope  of  a  test.  Compo¬ 
nents  of  a  design  that  are  not  touched  by  the  tasks  the  partici¬ 
pants  perform  are  not  evaluated.  This  limitation  in  thorough¬ 
ness  is  often  why  testing  is  combined  with  usability  inspection 
methods,  which  have  thoroughness  as  one  of  their  strengths 
(Cockton,  Lavery  &  Woolrych,  2003). 

In  a  diagnostic  test,  testers  select  tasks  for  three  primary 
reasons: 

1.  They  include  important  tasks  such  as  (a)  tasks  that  are  per¬ 
formed  frequently  or  are  basic  to  the  job  users  will  want  to 
accomplish,  and  (b)  tasks  that  are  critical,  if  infrequent,  be¬ 
cause  they  affect  other  tasks,  such  as  log  in  or  configuration. 
With  almost  any  product,  there  is  a  set  of  basic  tasks.  Basic 
means  tasks  that  tap  into  the  core  functionality  of  the  prod¬
uct.  For  example,  nurses  using  a  patient  monitor  will  fre¬
quently  look  to  see  the  vital  sign  values  of  the  patient  and
will  want  to  silence  any  alarms  once  they  determine  the
cause.  In  addition,  the  nurse  will  want  to  adjust  the  alarm 
limits,  even  though  the  limit  adjustment  may  be  done  infre¬ 
quently.  Consequently,  viewing  vital  signs,  silencing  alarms, 
and  adjusting  alarm  limits  are  basic  tasks. 

2.  They  include  tasks  that  probe  areas  where  usability  problems 
are  likely.  For  example,  if  testers  think  that  users  will  have  dif¬ 
ficulty  knowing  when  to  save  their  work,  they  may  add  sav¬ 
ing  work  to  several  other  tasks. 

3.  They  include  tasks  that  probe  the  components  of  a  design, 
such  as  tasks  that  force  the  user  to  navigate  to  the  lowest  level 
of  the  menus  or  tasks  that  have  toolbar  shortcuts.  The  goal 
is  to  include  tasks  that  increase  thoroughness  at  uncovering 
problems. 

Some  additional  reasons  for  selecting  tasks  are 

•  They  may  be  central  to  the  business  goals  for  the  product. 

•  They  may  have  been  redesigned  in  response  to  the  results
of  a  previous  test. 

•  They  may  be  new  to  the  product. 

With  so  many  reasons  for  selecting  tasks,  paring  the  task  list 
to  the  time  available  is  an  important  part  of  test  planning.  Task 
selection  is  greatly  facilitated  when  the  test  team  understands 
the  tasks  users  want  to  do. 

Typically,  testers  and  developers  get  together  in  the  early 
stages  of  test  planning  to  create  a  list  of  tasks.  This  is  a  valuable 
opportunity  to  engage  the  product  team  in  a  discussion  about 
usability.  In  addition  to  including  a  task  in  the  list,  the  testers 
also  need  to  make  some  preliminary  estimate  of  how  long  each 
task  will  take.  The  time  estimate  is  important  for  deciding  how 
many  tasks  to  include;  it  may  also  be  useful  for  setting  time  lim¬ 
its  for  each  task. 
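Paring a task list against the available session time can be sketched as a simple budget calculation. The example below is a minimal illustration, assuming the test team has already given each task a priority rank and a preliminary time estimate; the `Task` fields, the task names, and the 30-minute budget are hypothetical, and the greedy rule is just one of many ways a team might make this decision.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int       # 1 = most important; assigned by the test team (assumption)
    est_minutes: float  # preliminary estimate of how long the task will take


def pare_task_list(tasks, session_minutes):
    """Greedy sketch: keep tasks in priority order while they still fit."""
    selected, used = [], 0.0
    for task in sorted(tasks, key=lambda t: t.priority):
        if used + task.est_minutes <= session_minutes:
            selected.append(task)
            used += task.est_minutes
    return selected, used


candidate_tasks = [
    Task("view vital signs", 1, 5),
    Task("silence alarm", 1, 5),
    Task("adjust alarm limits", 2, 10),
    Task("configure monitor", 3, 20),
    Task("print trend report", 4, 15),
]
selected, used = pare_task_list(candidate_tasks, session_minutes=30)
for task in selected:
    print(f"{task.name}: {task.est_minutes} min")
print(f"Total planned time: {used} min")
```

A rule like this only orders the discussion; a task that does not fit may still be worth swapping in if it probes an area where problems are likely.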

Generating  task  scenarios.  Almost  without  exception, 
testers  present  the  tasks  that  participants  do  in  the  form  of  a 
task  scenario.  For  example: 

You  have  just  bought  a  new  combination  telephone  and  answering  ma¬ 
chine.  The  box  is  on  the  table.  Take  the  product  out  of  the  box  and  set 
it  up  so  that  you  can  make  and  receive  calls. 

From  the  beginning,  usability  testers  recognized  the  artifi¬ 
ciality  of  the  testing  environment.  The  task  scenario  has  been 
an  attempt  to  bring  a  flavor  of  the  operational  environment 
and  the  way  the  product  will  be  used  into  the  test.  In  most 
cases,  the  scenario  is  the  only  mechanism  for  introducing 
these  elements  into  the  test.  Rubin  (1994)  described  task  sce¬ 
narios  as  adding  context  and  the  participant’s  rationale  and 
motivation  to  perform  tasks.  “The  context  of  the  scenarios  will 
also  help  them  to  evaluate  elements  in  your  product’s  design 
that  simply  do  not  jibe  with  reality  [italics  added],”  and  “The
closer  that  the  scenarios  represent  reality  [italics  added],  the
more  reliable  [italics  added]  the  test  results”  (Rubin,  1994, 
p.  125).  Dumas  and  Redish  (1999)  said,  “The  whole  point  of 
usability  testing  is  to  predict  what  will  happen  when  people 
use  the  product  on  their  own.  .  .  .  The  participants  should  feel 
as  if  the  scenario  matches  what  they  would  have  to  do  and
what  they  would  know  when  they  are  doing  that  task  in  their
actual  jobs  [italics  added]”  (p.  174). 

The  exact  wording  of  a  scenario  is  important.  A  good  sce¬ 
nario  is  short,  in  the  user’s  words  not  the  product’s,  unambigu¬ 
ous,  and  it  gives  participants  enough  information  to  do  the  task. 
During  test  planning,  testers  work  hard  on  the  wording  of  each 
scenario.  The  goal  is  to  make  sure  that  the  scenario  probes  the 
task  the  tester  wants  to  probe. 

In  addition  to  the  wording  of  the  task  scenarios,  their  order 
may  also  be  important.  It  is  common  for  task  scenarios  to  have 
dependencies.  For  example,  in  testing  a  cell  phone,  there  may  be
a  task  to  enter  a  phone  number  into  memory  and  a  later  task 
to  change  it.  A  problem  with  dependencies  happens  when  the 
participant  cannot  complete  the  first  task.  Testers  should  de¬ 
velop  strategies  to  handle  this  situation,  such  as  putting  a  phone 
number  in  another  memory  location  to  use  when  a  participant 
cannot  complete  the  earlier  task.

Testers  continue  to  believe  in  the  importance  of  scenarios 
and  continue  to  use  them.  However,  no  research  shows  that  de¬ 
scribing  tasks  as  scenarios  rather  than  simple  task  statements 
makes  any  difference  to  the  performance,  motivation,  or  sub¬ 
jective  judgments  of  participants.  Scenarios  may  be  essential  to 
testing,  but  no  research  has  confirmed  that  fact. 

Assisting  participants  with  tasks.  Assisting  partici¬ 
pants  to  move  forward  in  a  task  or  a  step  in  a  task  is  a  common 
practice  in  diagnostic  tests.  The  primary  reason  for  assisting  is  to 
uncover  additional  usability  issues.  Usability  problems  reveal 
themselves  sequentially  as  a  participant  proceeds  through  a 
task.  Consequently,  if  the  participant  can  go  no  further  because
of  an  initial  problem,  subsequent  problems  may  not  be  uncov¬ 
ered  and  documented.  Helping  the  participant  to  move  on  from 
the  initial  problem  is  one  way  to  uncover  additional  problems.

When  this  situation  occurs,  the  moderator  faces  three
issues:

1.  When  to  intervene

2.  How  to  provide  assistance

3.  What  to  do  with  the  data  that  result

Two  indicators  that  participants  may  not  make  any  more 
progress  are  (a)  participants  indicate  they  do  not  know  what 
more  to  do  or  (b)  they  repeat  actions  that  do  not  move  them  to¬ 
ward  a  solution.  The  decision  for  the  moderator  is  whether  to 
move  on  to  the  next  task  or  to  assist.  If  the  moderator  decides 
that  continuing  with  the  test  will  provide  additional  diagnostic 
information,  assistance  may  be  warranted. 

Assistance  can  take  several  forms.  The  moderator  may  ini¬ 
tially  just  ask  the  participant  to  look  at  the  task  scenario  again  or 
just  ask  an  open-ended  question,  such  as  “So  what  is  happening 
here?”  Sometimes  these  actions  are  enough  to  get  the  partici¬
pant  going  again.  Whether  such  actions  constitute  an  assist  is  up
to  the  discretion  of  the  moderator  or  test  team.  When  progress
is  still  stalled,  the  moderator  may  then  provide  more  specific 
direction,  from  a  relatively  broad  statement,  such  as  “You  were 
getting  close  when  you  were  looking  in  the  menus,”  to  increas¬ 
ingly  specific  directions  up  to  telling  the  participant  how  to 
complete  the  task.  The  dilemma  for  the  moderator  is  manag¬
ing  this  process  without  providing  information  that  will  help  the
participant  in  subsequent  tasks.  That  kind  of  information  may 
make  the  product  look  easier  to  use,  and  it  may  hide  additional 
usability  issues  that  do  not  reveal  themselves  later.  The  task  of 
giving  assistance  is  an  art,  and  moderators  must  rely  on  their  ex¬ 
periences  and  the  goals  of  the  task  to  determine  when  and  how 
to  intervene.  Boren  and  Ramey  (2000)  studied  methods  of  in¬ 
tervening  in  think  aloud  studies.  They  found  that  moderators 
at  two  labs  intervened  inconsistently.  To  minimize  inconsisten¬ 
cies,  the  authors  made  several  recommendations  regarding 
when  and  how  to  intervene. 

Once  assists  have  been  given,  there  is  the  additional  question 
of  how  to  factor  them  into  the  data  analysis.  For  example,  is  a  task 
that  was  completed  with  assistance  a  failed  task?  If  so,  is  that  al¬ 
ways  the  case?  In  a  complex  product  used  by  skilled  users,  some 
trial  and  error  and  an  occasional  assist  might  be  expected  and  not 
constitute  task  failure.  In  a  self-service  application,  the  need  for 
one  assist  may  be  enough  to  constitute  task  failure.  If  most  par¬ 
ticipants  do  not  require  an  assist  to  complete  a  task,  is  it  fair  to  the 
design  to  count  the  task  as  a  failure?  These  issues  are  factors  in 
the  complex  decision  about  whether  and  how  to  change  a  design. 
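One way to make the treatment of assists explicit is to write the scoring rule down before the test. The sketch below is a hypothetical policy, not an established standard: the outcome labels and the `max_assists_allowed` parameter are assumptions chosen to show how the same session data can be scored differently for an expert tool and for a self-service application.

```python
def score_task(completed: bool, assists: int, max_assists_allowed: int) -> str:
    """Classify one participant's attempt at one task under a stated policy."""
    if not completed:
        return "failure"
    if assists == 0:
        return "success"
    if assists <= max_assists_allowed:
        return "success_with_assist"
    # Completed, but only with more help than the policy tolerates.
    return "failure"


# Complex product used by skilled users: an occasional assist is tolerated.
expert_tool = score_task(completed=True, assists=1, max_assists_allowed=1)

# Self-service application: any assist counts as task failure.
self_service = score_task(completed=True, assists=1, max_assists_allowed=0)

print(expert_tool, self_service)
```

Stating the threshold as a parameter keeps the judgment visible in the report, so readers can see which policy produced the completion figures.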

The  Data  Are  Recorded  and  Analyzed 

Triangulating  measures  to  confirm  findings.  It
is  rare  that  a  usability  problem  affects  only  one  measure.  For
example,  a  poorly  constructed  icon  toolbar  will  generate: 

•  Errors — especially  picking  the  wrong  icon  on  the  toolbar 

•  Slow  task  times — during  which  participants  hesitate  over 
each  icon  and  frequently  click  through  them  looking  for  the 
one  they  want 

•  Statements  of  frustration — participants  express  their  feelings 
about  not  being  able  to  learn  how  the  icons  are  organized  or 
be  able  to  guess  what  an  icon  will  do  from  the  tool  tip  name 

•  The  need  for  assistance  from  the  moderator 

Much  of  the  data  analysis  involves  building  a  case  for  a  us¬ 
ability  problem  by  combining  several  measures — a  process  that 
has  been  called  “triangulation”  (Dumas  &  Redish,  1999).  The 
case  building  is  driven  by  the  problem  list  created  during  the 
test  sessions.  It  is  surprising  how  much  of  this  analysis  is  de¬
pendent  on  the  think  aloud  protocol.  We  depend  on  what  par¬
ticipants  say  to  help  us  understand  the  problems. 
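The idea of building a case from several measures can be sketched as a check of how many independent measures point at the same task. This is an illustrative sketch only: the `TaskObservation` fields and the `expected_seconds` benchmark are hypothetical, and in practice the think-aloud comments behind the counts carry much of the interpretive weight.

```python
from dataclasses import dataclass


@dataclass
class TaskObservation:
    """Measures logged for one participant on one task (hypothetical fields)."""
    participant: str
    errors: int
    task_seconds: float
    frustration_comments: int
    assists: int


def converging_measures(obs: TaskObservation, expected_seconds: float) -> list:
    """List the measures that each suggest a usability problem on this task."""
    flagged = []
    if obs.errors > 0:
        flagged.append("errors")
    if obs.task_seconds > expected_seconds:
        flagged.append("slow task time")
    if obs.frustration_comments > 0:
        flagged.append("frustration")
    if obs.assists > 0:
        flagged.append("assistance")
    return flagged


obs = TaskObservation("P1", errors=2, task_seconds=95.0,
                      frustration_comments=1, assists=0)
evidence = converging_measures(obs, expected_seconds=60.0)
print(f"{obs.participant}: {len(evidence)} measures agree -> {evidence}")
```

A problem flagged by three or four measures is easier to defend to a product team than one that rests on a single slow task time.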

Identifying  usability  problems.  Most  usability  prob¬ 
lems  emerge  during  the  test  itself.  Testers  observe  usability 
problems  during  the  sessions  and  record  them  on  problem 
sheets  or  data  logs  at  that  time  or  soon  after  the  session.  Later, 
the  problem  sheet  or  log  drives  the  data  analysis.  The  sheet  is 
typically  organized  by  participant  and  by  task.  The  test  adminis¬ 
trator  records  observations,  such  as  “did  not  see  the  option,” 
and  interpretations,  such  as  “does  not  understand  the  graphic.” 
The  test  administrator  notes  when  problems  reappear. 
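A problem sheet organized by participant and by task maps naturally onto a simple log of entries. The sketch below assumes each entry is a `(participant, task, note)` tuple with notes written in a consistent wording; that structure, the sample task names, and the sample notes are illustrative assumptions rather than a prescribed format.

```python
from collections import defaultdict


def recurring_problems(log):
    """Group log entries by (task, note) and keep those seen by 2+ participants."""
    seen = defaultdict(set)
    for participant, task, note in log:
        seen[(task, note)].add(participant)
    return {key: sorted(people) for key, people in seen.items() if len(people) > 1}


problem_log = [
    ("P1", "save document", "did not see the option"),
    ("P2", "save document", "did not see the option"),
    ("P2", "insert chart", "does not understand the graphic"),
    ("P3", "save document", "did not see the option"),
]
for (task, note), people in recurring_problems(problem_log).items():
    print(f"{task}: '{note}' observed for {', '.join(people)}")
```

Counting recurrences this way supports the later analysis, but it depends on the administrator recording the same problem in the same words each time it reappears.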

One  of  the  important  skills  in  user  testing  is  identifying  the 
basic  causes  of  a  problem.  From  individual  instances  of  prob¬ 
lems,  the  experienced  tester  sees  patterns  that  point  to  more 
general  problems.  For  example,  if  participants  look  around  the
screen  and  aimlessly  look  through  menu  options,  the  tester
might  conclude  that  the  participants  were  overwhelmed  with 
the  amount  of  information  on  the  screen.  If  participants  do  not 
understand  terms,  the  tester  might  conclude  that  the  interface 
has  too  much  technical  and  computer  jargon.  If  participants  re¬ 
peat  a  task  to  make  sure  it  was  completed,  the  tester  might  con¬ 
clude  that  there  is  not  enough  feedback  about  what  the  system 
is  doing  with  the  participant’s  actions.  Seeing  the  underlying 
causes  of  individual  problems  is  one  of  the  important  skills  that 
a  usability  tester  develops.  It  is  not  entirely  clear  that  such  skills 
can  be  taught  quickly.  Testers  often  have  years  of  experience  at 
studying  and  practicing  problem  identification  skills. 

While  watching  a  test  session,  a  product  developer  will  see 
the  same  events  as  the  test  administrator.  However,  developers 
often  tend  to  see  problems  as  local.  Instead  of  seeing  that  there 
needs  to  be  a  general  review  of  the  language  in  the  interface, 
the  developer  sees  problems  with  individual  words.  This  con¬ 
flict  often  does  not  appear  until  the  testers  and  developers  sit 
down  to  discuss  what  they  saw  and  what  to  do  about  it.  Usabil¬ 
ity  professionals  believe  that  this  conflict  over  what  really  hap¬ 
pened  during  the  test  remains  a  major  barrier  to  improving  a 
product’s  usability.  Developers  do  not  like  hearing  that  they 
have  tunnel  vision  and  cannot  see  the  underlying  causes  of  in¬ 
dividual  problems,  while  usability  professionals  do  not  like 
hearing  that  the  local  fix  will  solve  the  problem.  Another  com¬ 
mon  barrier  is  a  disagreement  about  whether  participants’  dif¬ 
ficulties  are  caused  by  usability  problems  or  problems  with  the 
methodology  itself  (including  the  selection  of  participants). 
These  conflicts  continue  to  limit  the  impact  of  testing  on  prod¬ 
uct  improvement.  However,  iterative  development,  when  it  is 
practiced,  helps  resolve  these  conflicts,  as  testing  subsequent 
iterations  (perhaps  with  an  improved  methodology  if  needed) 
should  reveal  whether  a  problem  still  exists. 

Several  research  studies  have  looked  at  how  many  usability 
problems  are  uncovered  by  different  populations  of  testers. 
These  studies  consistently  show  that  usability  specialists  find 
more  problems  than  product  developers  or  computer  scientists 
(Nielsen,  1994).  However,  all  of  those  studies  have  used  inspec¬ 
tion  methods  not  usability  testing. 

Assigning  a  severity  level  to  problems.  One  way  to  call 
attention  to  important  problems  is  to  put  them  into  a  measure¬ 
ment  tool  such  as  a  problem  severity  scale.  These  scales  allow 
testers  to  identify  which  problems  are  the  most  severe  and,  pre¬
sumably,  more  likely  to  be  fixed.  In  the  section  on  measuring  us¬
ability  below,  we  discuss  the  validity  of  such  scales. 

Identifying  positive  issues.  Some  of  the  schemes  for 
determining  severity  have  a  category  for  a  positive  issue  (e.g.,  a
good  design  feature).  When  a  task  is  performed  smoothly  and
efficiently,  the  design  must  have  worked.  However,  there  has 
never  been  any  attempt  to  categorize  or  rate  the  positive  as¬ 
pects  of  a  design.  Our  experience  is  that  most  usability  profes¬ 
sionals  pay  only  lip  service  to  what  works  in  a  design,  mention¬ 
ing  a  few  broad  positive  aspects  at  the  beginning  of  a  report,  and 
then  proceeding  on  to  the  long  list  of  problems.  Focusing  on 
the  negative  is  an  aspect  of  human  factors  professionals  that  has 
been  criticized  from  its  beginnings  in  the  1940s  (Meister,  1999). 
Recently,  Skylar  and  Gilmore  (2004)  repeated  that  criticism  and
called  for  a  more  positive  approach.  We  agree  that  it  is  time  for
a  scheme  to  measure  the  positive  aspects  of  a  design. 

The  Results  of  the  Test  Are  Communicated 
to  Appropriate  Audiences 

In  the  early  days  of  user  testing,  there  was  almost  always  a  for¬ 
mal  report  and  a  highlight  video  tape.  Testers  needed  reports  to 
communicate  what  they  did  and  what  they  found.  Now,  it  is 
more  common  for  the  results  to  be  communicated  more  infor¬ 
mally,  such  as  at  a  meeting  held  soon  after  the  last  test  session. 
Communication  at  these  meetings  is  facilitated  when  the  prod¬ 
uct  team  has  attended  at  least  some  of  the  test  sessions.  It  is  also 
helpful  if  the  usability  specialist  has  been  involved  with  the  de¬ 
velopment  team  or  at  least  is  aware  of  the  issues  the  team  faced. 

One  of  the  important  reasons  for  the  change  in  reporting  style 
for  diagnostic  usability  tests  is  the  confidence  organizations  have 
in  the  testing  process.  It  is  usually  no  longer  necessary  to  justify 
the  test  with  a  written  report.  Organizations  with  active  usability 
programs  have  come  to  accept  user  testing  as  a  valid,  useful  eval¬ 
uation  tool,  and  an  important  part  of  their  regular  development 
processes.  They  do  not  feel  that  they  need  to  know  the  details  of 
the  test  method  and  the  data  analysis  procedures.  They  want  to 
know  the  bottom  line — what  problems  surfaced  and  what  should 
they  do  about  them?  In  these  organizations,  a  written  report  may 
still  have  value,  but  as  a  means  of  documenting  and  archiving  the 
methodology  rather  than  communicating  the  results. 

Beginning  with  a  project  created  by  the  National  Institute  of 
Standards  and  Technology  (NIST),  a  Common  Industry  Format 
(CIF)  has  been  created  for  summative  tests.  This  format  has  be¬ 
come  an  ANSI  standard  (ANSI,  2001)  and  in  2005  was  approved 
as  an  ISO  standard.  A  standard  format  for  reporting  formative  eval¬
uations  is  in  progress  (Theofanos  &  Quesenbery,  2005).  While
these  formats  do  not  apply  to  all  testing  situations,  they  are  a  good 
start  for  new  practitioners  to  learn  how  to  create  effective  reports. 

The  value  of  highlight  tapes.  As  usability  testing  has  be¬ 
come  an  accepted  evaluation  tool,  highlight  tapes  have  become 
less  common.  One  of  the  disappointing  aspects  of  highlight 
tapes  is  that  watching  them  does  not  have  the  same  impact  as 
seeing  the  sessions  live.  Unless  the  action  moves  quickly,  even 
highlight  tapes  can  be  boring.  This  characteristic  makes  careful 
editing  of  the  highlights  a  must.  With  the  older,  analog  editors, 
it  can  take  about  one  hour  to  create  one  minute  of  finished 
tape.  A  15-minute  tape  can  take  two  days  to  create,  even  by  an 
experienced  editor.  Most  of  that  time  is  taken  finding  appropri¬ 
ate  segments  to  illustrate  key  findings. 

The  emergence  of  digital  video  has  made  highlight  tapes  less
time  consuming  to  produce,  but  it  is  still  easier  and  more  effective
to  have  people  attend  the  test  sessions.  Even  so,  it  can  be  useful
to  incorporate  digital  video  clips  into  presentations  and  reports  to
illustrate  findings  without  generating  a  full  highlight  tape.


TYPES  OF  USABILITY  TESTS 


Usability  testing  can  be  helpful  throughout  the  development
lifecycle.  In  fact,  starting  early  in  development  can  help  teams  identify
and  fix  usability  problems  as  they  arise,  before  they  become  too
expensive  to  fix.  Over  time,  investigators  have  refined  the  method¬ 
ologies  to  better  meet  the  objectives  of  the  tests.  Rubin  (1994) 
identified  the  following  types  of  usability  tests,  which  all  evolved 
from  the  methods  used  in  social  science  research  studies: 

•  Exploratory  tests — conducted  early  in  the  development 
process  to  address  high-level  issues,  not  specific  details 

•  Assessment  tests — conducted  early  or  midway  through 
development;  Rubin  (1994)  called  these  the  “most  typical” 
usability  tests 

•  Validation  or  verification  tests — conducted  at  the  end  of 
development  to  ensure  that  the  product  meets  minimum 
usability  requirements 

•  Comparison  tests — conducted  at  any  point  in  development  to 
compare  two  or  more  products;  these  tests  identify  strengths 
and  weaknesses  of  the  products,  which  can  be  either  com¬ 
petitors  or  competing  designs  for  one  product 

Exploratory  and  assessment  tests  are  diagnostic  or  forma¬ 
tive  tests.  They  help  identify  good  designs  and  support  im¬ 
provements  for  bad  designs.  They  focus  on  qualitative  data 
more  than  quantitative  data,  and  they  rely  heavily  on  thinking 
aloud  procedures  and  prompting  from  the  moderator.  The  goal 
of  these  tests  is  to  identify  usability  problems  (as  opposed  to 
evaluating  overall  usability).  The  methods  for  these  tests  pro¬ 
vide  investigators  with  flexibility  to  explore  new  issues  as  they 
arise  in  a  test  session.  As  such,  diagnostic  tests  are  not  research 
studies.  It  is  not  valid  to  draw  statistical  conclusions  from  these 
tests.  Investigators  must  use  their  skills  to  interpret  descriptive 
statistics  (e.g.,  four  out  of  five  participants  completed  the  test) 
and  qualitative  data  such  as  participant  comments. 
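A short calculation can illustrate why a figure such as four out of five supports description but not statistical inference. The sketch below uses the Wilson score interval, a standard formula for a binomial proportion; the chapter does not prescribe this calculation, and it is offered here only as an illustration, with the conventional 95% value z = 1.96 assumed.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(successes=4, n=5)
print(f"4 of 5 completed: plausible completion rate {low:.2f} to {high:.2f}")
```

With five participants the interval runs from roughly 0.38 to 0.96, which is too wide to support a claim about the completion rate in the user population. That is consistent with treating such counts as descriptive evidence to be interpreted alongside participant comments.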

Validation/verification  tests  are  summative  tests.  They  are 
conducted  near  the  end  of  development,  typically  with  alpha 
or  beta  releases.  The  goal  of  these  tests  is  to  evaluate  usability 
metrics  (e.g.,  task  times  or  error  rates)  to  determine  whether 
the  product  meets  the  organization’s  preestablished  usability  re¬ 
quirements.  Thinking  aloud  may  be  discouraged,  as  it  can  in¬ 
terfere  with  the  participant’s  performance.  Comparison  tests 
also  focus  on  quantitative  data.  Validation  and  comparison  tests 
both  require  sufficient  scientific  rigor  to  draw  statistical  conclu¬ 
sions  from  the  data.  Therefore,  they  still  follow  the  standards 
and  practices  of  social  science  experimental  design,  such  as 
larger  sample  sizes,  well-controlled  tasks,  and  sufficient  power 
to  identify  real  differences.  These  tests  are  a  type  of  research 
study  where  investigators  can  analyze  quantitative  data  using  in¬ 
ferential  statistics  (Dumas,  2001b). 

The  type  of  test  also  impacts  what  can  be  tested.  Exploratory 
tests  and  assessment  tests  evaluate  prototypes  of  the  final  prod¬ 
uct.  Validation  tests  require  a  final  product,  or  at  least  a  near-re¬ 
lease  version,  to  ensure  the  results  are  valid.  Comparison  tests 
can  evaluate  products  at  any  stage  of  development,  if  all  ver¬ 
sions  tested  are  in  the  same  stage. 

Informal  vs.  Invalid 

Diagnostic  tests  are  sometimes  called  “informal”  because  they 
lack  the  scientific  rigor  of  summative  tests.  However,  it  is  hard
to  know  what  informal  really  means.  For  example,  Thomas
(1996)  described  a  method,  called  “quick  and  dirty”  and  “infor¬ 
mal,”  in  which  the  participants  are  not  intended  users  of  the 
product  and  in  which  time  and  other  measures  of  efficiency  are 
not  made.  Such  a  test  may  be  informal,  but  it  is  certainly  invalid 
and  should  not  be  called  a  usability  test.  It  is  missing  one  of  the 
essentials — potential  users.  It  is  not  an  informal  usability  test  be¬ 
cause  it  is  not  a  usability  test  at  all.  Despite  the  flexibility  allowed 
in  informal  usability  tests,  there  are  still  methodological  require¬ 
ments  to  ensure  a  test  is  valid.  Tests  that  are  performed  quickly 
and  with  minimal  resources  are  best  called  “quick  and  clean” 
rather  than  “informal”  (Wichansky,  2000).


Prototypes  in  Usability  Tests 

Prototypes  for  usability  can  be  low  fidelity,  high  fidelity,  or  any¬ 
where  in  between.  Fidelity  refers  to  the  breadth  of  features  im¬ 
plemented,  the  depth  or  degree  of  functionality  (how  complete 
each  feature  is),  the  look  or  aesthetics  of  the  prototype,  and 
the  style  of  interaction  (Snyder,  2004;  Virzi,  Sokolov,  &  Karis, 
1996).  The  style  of  interaction  refers  to  how  users  work  with  the 
prototype  and  how  closely  the  medium  of  the  prototype  (e.g., 
paper  or  online)  matches  the  final  product.  High-fidelity  proto¬ 
types  are  more  similar  to  the  actual  product  on  all  these  char¬ 
acteristics  than  low-fidelity  prototypes  are  (Walker,  Takayama, 
&  Landay,  2002). 

Low-fidelity  prototypes.  Low-fidelity  prototypes  are  gen¬ 
erally  easy  and  fast  to  develop  and  can  contain  a  surprising  level 
of  interactivity  (Snyder,  2003).  They  can  be  as  simple  as  sketches 
on  paper.  Conducting  usability  tests  with  low-fidelity  prototypes 
requires  devoting  a  team  member  to  act  as  a  facilitator,  re¬ 
sponding  to  the  participant’s  input  (e.g.,  giving  the  participant  a 
new  screen  or  menu).  This  person  must  know  how  the  system
should  work  to  be  able  to  respond  quickly  (Briggs  &  Johnson, 
2005;  Rudd,  Stern,  &  Isensee,  1996).  Snyder  (2003)  provides  sig¬ 
nificant  detail  about  the  special  concerns  of  using  paper  proto¬ 
types  in  usability  tests. 

Low-fidelity  prototypes  have  several  advantages.  They  can 
be  built  quickly  and  easily,  so  they  allow  teams  to  start  usabil¬ 
ity  testing  early  in  development.  Teams  can  also  use  low-fidelity 
prototypes  to  quickly  test  and  resolve  design  conflicts  through¬ 
out  development.  Results  from  usability  tests  on  low-fidelity 
prototypes  can  be  available  early,  allowing  the  team  to  fix  de¬ 
sign  problems  while  it  is  still  cost  effective.  Another  advantage 
of  using  low-fidelity  prototypes  is  that  participants  must  tell  the 
facilitator  what  they  would  do  at  each  step.  This  interaction  can 
facilitate  the  think  aloud  process,  as  it  may  seem  more  natural 
to  the  participants  than  talking  to  themselves  (Datz-Kauffold 
&  Henry,  2000). 

Some  of  the  challenges  in  working  with  low-fidelity  proto¬ 
types  include  the  following: 

•  Low-fidelity  prototypes  may  discourage  exploration  of  the
product.  Participants  may  not  want  the  person  acting  as  the
computer  to  work  harder  than  necessary  (Snyder,  2003).

•  They  may  be  difficult  for  low-vision  or  blind  participants  to 
use  if  the  prototype  cannot  accommodate  a  participant’s 
usual  assistive  device  (Briggs  &  Johnson,  2005). 

•  It  is  difficult  to  collect  quantitative  data  if  the  prototype  will 
not  integrate  with  automated  tools  (Briggs  &  Johnson,  2005). 

•  Similarly,  low-fidelity  prototypes  may  not  integrate  with  re¬ 
mote  testing  tools  (Briggs  &  Johnson,  2005). 

•  Changes  that  the  human  computer  makes  can  be  more  obvi¬ 
ous  than  when  they  occur  on  the  screen  (Datz-Kauffold  & 
Henry,  2000). 

High-fidelity  prototypes.  High-fidelity  prototypes  con¬ 
tain  more  functionality  and  are  more  expensive  to  generate. 
However,  there  are  more  and  more  tools  available  to  make  it 
easier  to  simulate  a  functional  product  (Walker  et  al.,  2002).  In 
tests  with  high-fidelity  prototypes,  participants  interact  with  the 
prototype  just  as  they  would  with  the  actual  product,  using  the 
same  (or  at  least  similar)  hardware  and  software.  In  addition  to 
usability  tests,  high-fidelity  prototypes  can  also  serve  as  living 
specifications  of  the  product.  They  can  communicate  designs 
more  efficiently  than  text  documents  (Rudd  et  al.,  1996). 

However,  high-fidelity  prototypes  can  give  users  and  management
the  impression  that  the  product  is  closer  to  completion  than  it
actually  is.  Also,  participants  may  expect  features  they  saw  in  a
high-fidelity  prototype  to  appear  in  the  final  version,  only  to  be
disappointed  if  they  are  not  there.

When  to  use  low-  and  high-fidelity  prototypes.  There 
is  some  debate  over  whether  low-  or  high-fidelity  prototypes 
are  more  appropriate  for  usability  testing.  Generally,  low- 
fidelity  prototypes  are  considered  most  useful  early  in  develop¬ 
ment  to  address  high-level  conceptual  issues;  high-fidelity  pro¬ 
totypes  are  considered  most  useful  later  in  development,  after 
more  design  decisions  have  been  worked  out.  Virzi  et  al.  (1996) 
showed  that  low-fidelity  prototypes  could  be  useful  through¬ 
out  development,  while  others  believe  that  low-fidelity  proto¬ 
types  are  so  different  from  the  actual  product  that  they  are  less 
effective  (Munoz,  Miller-Jacobs,  Spool,  &  Verplank,  1992;  Rudd 
&  Isensee,  1994). 

To  address  this  issue,  several  studies  have  compared  results 
from  usability  tests  of  low-  and  high-fidelity  prototypes.  The  re¬ 
search  generally  finds  that  both  types  find  approximately  the 
same  number  and  types  of  usability  problems.  Walker  et  al. 
(2002)  found  few  differences  between  low-  and  high-fidelity 
prototypes.  They  recommended  reviewing  the  practical  con¬ 
siderations  of  the  test  to  determine  the  appropriate  fidelity.  Virzi 
et  al.  (1996)  and  Catani  and  Biers  (1998)  found  no  difference  in 
the  number  and  types  of  usability  problems  found  with  low  and 
high  fidelity  prototypes.  In  addition,  Wiklund,  Thurrott,  and  Du¬ 
mas  (1992)  found  that  the  aesthetic  refinement  of  the  prototype 
(how  finished  the  prototype  looked)  did  not  affect  ratings  of  us¬ 
ability.  Regardless  of  the  fidelity  of  the  prototype,  it  is  impor¬ 
tant  to  remember  that  a  prototype  is  not  the  actual  product.  The 
final  product  can  be  slower  or  faster.  There  may  be  features  of 
the  prototype  that  are  not  in  the  final  product.  The  develop¬ 
ment  team  must  consider  which  features  to  include  in  the 
prototype  to  ensure  a  usability  test  can  address  the  appropri¬ 
ate  issues  (Virzi,  1989). 

WHERE  TESTING  OCCURS


In  this  section,  we  consider  the  environment  in  which  the  test¬ 
ing  is  performed: 

•  Testing  in  a  usability  lab 

•  Remote  testing 

•  Onsite  testing 

Testing  in  a  Lab 

What  is  a  usability  lab?  Usability  labs  continue  to  be 
built,  and  there  is  a  brisk  business  in  selling  lab  equipment.  The 
demand  for  labs  is  driven  by  the  advantages  of  having  record¬ 
ing  equipment  and  the  ability  to  allow  stakeholders  to  view  the 
test  sessions.  In  essence,  the  method  sells  itself  because  devel¬ 
opers  and  managers  find  the  experience  of  watching  a  live  test 
session  compelling. 

The  basic  makeup  of  a  suite  of  usability  test  equipment  has 
not  changed  much  with  time.  It  consists  of  video  and  audio 
recording  and  mixing  equipment.  However,  some  recent  inno¬ 
vations  in  lab  equipment  have  improved  measurement.  For  ex¬ 
ample,  relatively  inexpensive  eye  tracking  equipment  has  made 
it  possible  to  know  where  participants  are  looking  as  they  work. 
Also,  miniaturization  continues  to  shrink  the  size  of  almost  all 
lab  equipment,  leading  to  the  arrival  of  portable  lab  setups  that  fit
in  airplane  overhead  compartments.

Labs  have  traditionally  recorded  test  sessions  on  videotape, 
an  analog  medium.  Unfortunately,  the  quality  of  analog  video 
of  the  participant’s  computer  screen  recorded  during  sessions  has
always  been  poor.  Second-generation  copies,  which  are  often
used  in  highlight  tapes,  further  degrade  the  quality.  Scan  con¬ 
verters  selling  for  under  $2,000  are  still  disappointing,  making 
it  difficult  to  see  screen  details. 

However,  labs  are  now  taking  advantage  of  digital  audio  and 
video.  Screen  capture  software,  which  records  the  participant’s 
computer  screen  digitally,  has  very  good  resolution.  Labs  can 
also  capture  the  participant’s  face  using  webcams,  which  are
inexpensive  and  easy  to  use.  The  technology  has  allowed  usability
labs  to  improve  recordings  and  to  make  it  substantially  easier
to  find  and  edit  video  segments.  More  and  more  tools  are
becoming  available  to  capture,  analyze,  and  store  video  and 
user  actions. 

Mimicking  the  operational  environment.  Testers 
usually  try  to  adapt  the  test  room  to  better  simulate  the  envi¬ 
ronment  in  which  the  application  or  system  will  be  used.  Rubin 
(1994)  recommended  that  testers  “make  the  testing  environ¬ 
ment  as  realistic  as  possible.  As  much  as  possible,  try  to  main¬ 
tain  a  testing  environment  that  mimics  the  actual  working  envi¬ 
ronment  in  which  the  product  will  be  used”  (p.  95). 

The  testers  try  to  simulate  a  variety  of  characteristics  in  the 
test  environment.  The  most  obvious,  and  often  the  easiest,  is 
making  it  look  like  the  operational  environment.  However, 
other  characteristics  are  more  difficult  to  simulate.  For  example, 
stressors  such  as  time  pressure  are  difficult  to  mimic.  Also,  there
are  no  real  consequences  to  a  participant’s  behavior  in  a  test,
which  may  impact  their  behavior. 

Despite  these  concerns,  the  literature  on  product  evaluation, 
when  viewed  from  the  perspective  of  50  years,  shows  that  re¬ 
searchers  and  practitioners  have  been  able  to  use  simulated 
complex  operational  environments.  For  example,  aircraft  and 
automobile  simulators  are  used  to  study  interactions  with  cock¬ 
pits  and  dashboards  as  well  as  for  operator  training.  More  re¬ 
cently,  hospital  operating  room  simulators  have  been  developed 
to  study  issues  in  anesthesiology  (Gaba,  1994). 

Meister  (1999)  emphasized  the  importance  of  understanding 
how  the  actual  operational  environment  will  impact  behavior. 
He  advocated  improving  our  understanding  of  the  differences 
between  the  test  and  operational  environments  through  con¬ 
tinued  research.  Usability  testers  should  carefully  consider  the 
environment  of  each  test,  as  both  laboratory  and  on-site  test¬ 
ing  have  their  own  advantages  (see  the  discussion  later  in  this 
chapter).  Putting  a  couch  in  the  test  room  to  make  it  look  more
like  a  room  in  a  home  may  not  simulate  the  use  environment, 
but  going  to  the  participant’s  home  for  testing  is  a  complex 
process  (Mitropoulos-Rundus  &  Muszak,  1997). 

The  impact  of  the  testing  equipment.  An  issue  that  has 
been  debated  throughout  the  history  of  usability  testing  is  the 
impact  of  one-way  mirrors  and  recording  equipment  on  the  test 
participants.  This  debate  comes  to  a  head  in  discussions  about 
whether  the  test  administrator  should  sit  with  participants  as 
they  work  or  stay  behind  the  one-way  mirror  and  talk  over  an 
intercom.  Some  testing  groups  usually  sit  with  the  participant, 
believing  that  it  reduces  the  participants’  anxiety  about  being 
in  the  test  and  makes  it  easier  to  manage  the  test  session  (Ru¬ 
bin,  1994).  Other  testing  groups  normally  do  not  sit  with  the 
participants,  believing  that  it  makes  it  easier  to  remain  objec¬ 
tive  and  frees  the  administrator  to  record  the  actions  of  the  par¬ 
ticipants  (Dumas  &  Redish,  1999).  Barker  and  Biers  (1994)  con¬ 
ducted  an  experiment  in  which  they  varied  whether  there  was 
a  one-way  mirror  or  cameras  in  the  test  room.  They  found  that 
the  presence  of  the  equipment  did  not  affect  the  participants’ 
performance  or  ratings  of  usability  of  the  product.  No  other 
studies  have  been  reported,  however,  on  this  hotly  debated  and 
important  issue. 

One  vs.  multiple  participants.  Most  usability  tests  involve
one  test  participant  at  a  time.  Studies  show  that  when  two
participants  work  together,  sometimes  called  the  “codiscovery
method”  (Kennedy,  1989),  they  make  more  utterances.  The  nature
of  the  utterances  also  is  different,  with  codiscovery  participants
making  more  evaluative,  as  opposed  to  descriptive,  statements
and  making  more  statements  that  developers  view  as
useful  (Hackman  &  Biers,  1992).  A  related  method  is  to  have  one 
participant  teach  another  how  to  do  a  task  (Vora,  1994).  How¬ 
ever,  using  either  of  these  methods  requires  recruiting  twice  as 
many  participants. 

Remote  Testing 

A  relatively  recent  innovation  in  usability  testing  is  the  ability 
to  test  when  the  administrator  and  the  participant  are  in  different
locations.  There  are  several  advantages  to  remote  testing
(Perkins,  2001): 

•  You  can  reach  a  worldwide  population  of  participants  be¬ 
cause  you  are  not  limited  to  the  local  testing  area. 

•  It  is  easier  to  get  participants  to  volunteer  because  they  do 
not  have  to  travel. 

•  Participants  work  at  their  desks  in  their  work  environments, 
which  may  make  them  more  comfortable  and  the  testing 
more  realistic. 

•  You  do  not  need  a  usability  lab. 

•  Often  you  do  not  have  to  compensate  participants. 

In  the  past,  the  technology  to  conduct  such  sessions  was  not 
good  enough  to  allow  usability  specialists  to  get  the  information 
they  need  (Dumas,  2003).  That  is  no  longer  true  because  of  sev¬ 
eral  factors: 

•  The  Internet  has  made  it  possible  for  usability  specialists  and 
participants  to  work  together  without  installing  special  hard¬ 
ware  or  complex  software  on  both  the  tester’s  and  the  par¬ 
ticipant’s  computers. 

•  There  are  tools  available  for  instrumenting  websites  auto¬ 
matically  to  collect  usability  measures  and  to  insert  questions 
and  ratings  as  the  participants  work. 

•  Collaboration  software  that  works  over  the  Internet  makes  it 
possible  to  share  desktops  and  control  of  the  cursor. 

•  Recording  software  makes  it  possible  to  store  good  quality 
video  and  sound  in  files  that  are  not  large  by  today’s  standards,
often  less  than  50  MB  for  a  two-hour  session.

•  PC  processors  and  RAM  are  fast  enough  to  run  both  record¬ 
ing  software  and  the  application  you  are  testing  simultane¬ 
ously.  In  addition,  when  participants  are  working  at  home, 
they  often  have  broadband  or  high-speed  transmissions  al¬ 
lowing  them  to  work  without  slow  modem  connections. 

Remote  testing  takes  two  forms:  (a)  synchronous,  in  which 
the  administrator  and  the  participant  work  together,  usually 
communicating  over  the  phone,  and  (b)  asynchronous,  in 
which  the  participants  work  without  the  direct  guidance  of  an 
administrator.  Each  of  these  variations  has  its  strengths  and 
weaknesses.  In  both  synchronous  and  asynchronous  testing,  in¬ 
formed  consent  and  nondisclosure  forms  need  to  be  explained 
and  signed  by  the  participant. 

Some  remote  testing  configurations  may  present  security 
problems.  It  is  possible  that  screen  shots  and  more  can  be  ob¬ 
tained  by  participants  without  the  knowledge  of  the  test  ad¬ 
ministrator.  Allowing  participants  to  share  applications  on  com¬ 
puters  inside  your  organization’s  firewall  may  be  prohibited. 
Some  organizations  may  be  able  to  address  this  with  a  nondis¬ 
closure  agreement,  while  others  may  require  a  special  computer 
outside  their  firewalls. 

In  synchronous  remote  testing,  the  participant  and  tester 
are  both  involved  in  the  test  session  but  from  different  loca¬ 
tions.  They  usually  communicate  by  telephone.  Therefore,  syn¬ 
chronous  testing  has  the  same  time  requirements  as  laboratory 
testing,  so  it  typically  has  the  same  small  number  of  participants.
Unless  a  special  web  camera  is  used,  the  administrator
cannot  see  the  participant.  We  do  not  yet  know  what  the  impact
of  not  seeing  the  participant  is,  but  one  laboratory  study  indi¬ 
cates  that  usability  specialists  judge  usability  problems  as  less 
severe  when  they  cannot  see  the  participant’s  face  (Lesaigle  & 
Biers,  2000).  One  advantage  of  synchronous  testing  is  that  visi¬ 
tors  can  watch  the  screen  activity  live.  If  you  use  a  phone  con¬ 
ferencing  system  and  multiuser  screen  sharing  applications, 
they  can  even  watch  and  listen  from  their  own  offices.  To  date, 
no  research  studies  have  compared  synchronous  remote  test¬ 
ing  and  laboratory  testing. 

With  asynchronous  remote  testing,  participants  complete 
the  tasks  on  their  own,  and  the  tester  reviews  the  session  results 
later.  Asynchronous  remote  testing  is  performed  in  two  ways: 

1.  The  participant’s  use  of  a  website  or  application  is  recorded 
behind  the  scenes  and  then  the  data  are  transmitted  to  the 
test  team  (Hong  &  Landay,  2001).  A  long  list  of  tools  for  remote
recording  can  be  found  at
http://www.uu.se/Software/Analyzers/Access-analyzers.html

2.  The  participant  works  with  two  browsers,  one  with  the  prod¬ 
uct  or  prototype  in  it,  the  second  with  the  instructions  for 
the  participant  to  follow.  The  instructions  browser  includes 
the  tasks  to  be  attempted,  buttons  to  click  at  the  beginning 
and  end  of  a  task,  a  free-form  comment  area,  and  questions 
or  ratings  to  answer  during  or  after  each  task. 
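
The second configuration can be pictured as a small event log. The following is a minimal sketch, not taken from any particular remote testing tool: the `TaskEvent` record and the event names `task_start`, `task_end`, and `comment` are hypothetical. It shows how task times could be derived from the clicks on the begin and end buttons, with free-form comments stored alongside them.

```python
from dataclasses import dataclass


@dataclass
class TaskEvent:
    """One event sent by the (hypothetical) instructions browser."""
    participant: str
    task: str
    kind: str          # "task_start", "task_end", or "comment"
    timestamp: float   # seconds since the session began
    text: str = ""     # free-form comment, if any


def task_times(events):
    """Return {(participant, task): seconds} for tasks with both a start and an end click."""
    starts = {}
    times = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.participant, ev.task)
        if ev.kind == "task_start":
            starts[key] = ev.timestamp
        elif ev.kind == "task_end" and key in starts:
            times[key] = ev.timestamp - starts.pop(key)
    return times


# Made-up events for illustration only.
events = [
    TaskEvent("P1", "find-price", "task_start", 10.0),
    TaskEvent("P1", "find-price", "comment", 42.0, "The menu label confused me."),
    TaskEvent("P1", "find-price", "task_end", 95.0),
    TaskEvent("P2", "find-price", "task_start", 5.0),  # never clicked the end button
]
print(task_times(events))
```

A task with no end click (participant P2 here) yields no task time, which is one way abandoned tasks might surface in asynchronous data.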

The  primary  advantage  of  asynchronous  over  synchronous 
testing  is  a  larger  sample  size,  because  the  number  of  partici¬ 
pants  is  not  limited  by  time  requirements  of  the  administrator. 
Tullis,  Fleischman,  McNulty,  Cianchette,  and  Bergel  (2002)  tested 
88  participants  in  a  short  period.  The  disadvantage  is  that  you 
cannot  see  or  interact  directly  with  each  participant.  In  the 
study,  the  participants  provided  an  unexpectedly  large  volume 
of  feedback  in  the  free  form  comment  field,  which  provided  in¬ 
sight  into  the  usability  problems  with  the  product.  Tullis  et  al.  re¬ 
ported  no  substantial  difference  between  asynchronous  testing 
and  laboratory  testing  in  terms  of  performance  measures  and 
the  number  and  types  of  problems  identified. 

While  we  have  very  little  research  to  date  on  remote  testing,
its  advantages  make  it  likely  to  increase  in  popularity,  and  it  may 
some  day  be  as  popular  as  lab  testing  is. 

Onsite  Testing 

Onsite  usability  testing  follows  the  same  basic  procedure  as  test¬ 
ing  in  a  lab  or  remote  testing.  The  primary  advantage  of  onsite 
testing  is  that,  like  remote  testing,  participants  work  in  their 
environments  and  do  not  have  to  travel  to  the  lab.  In  addition, 
unlike  most  remote  testing,  you  get  to  see  the  participants  and 
their  environments. 

Onsite  testing  often  does  not  require  any  special  equipment. 
You  can  simply  observe  participants  and  manually  record  find¬ 
ings.  There  are  portable  labs  to  record  the  test  session  if 
needed.  They  can  generally  be  set  up  onsite  quickly. 

There  are  several  disadvantages  of  onsite  testing  (Mitropoulos- 
Rundus  &  Muszak,  1997).  Onsite  tests  can  be  very  expensive, 
as  the  test  team  has  to  travel  to  the  participant.  It  also  limits  the 
number  of  team  members  who  can  view  the  test  live.  Too  many 
observers  can  be  overwhelming  to  participants.  The  portable
equipment  can  record  the  session,  but  as  stated  earlier,  record¬ 
ings  are  not  as  compelling  as  live  tests.  Finally,  participants  may 
be  reluctant  to  allow  testers  into  their  offices  or  homes. 

If  the  usability  test  is  evaluating  a  website,  the  tester  may 
have  to  make  special  arrangements.  For  example,  if  the  proto¬ 
type  is  kept  on  a  development  server  inside  your  organization’s 
firewall,  the  tester  has  to  be  sure  there  is  a  way  for  the  partici¬ 
pant  to  access  it.  Testers  can  bring  in  laptops  running  the  soft¬ 
ware  or  website;  however,  the  whole  application  must  run  on 
the  laptop  itself.  You  will  (most  likely)  not  be  able  to  access  the
Internet  through  the  participant’s  network.  Further,  if  partici¬ 
pants  need  accounts,  testers  should  set  this  up  ahead  of  time 
and  make  sure  participants  can  access  their  accounts  on  the 
website. 

Other  usability  methods  are  conducted  in  the  participant’s 
environment.  The  most  well  known  is  contextual  interviewing. 
Ethnographic  methods  also  help  usability  specialists  understand 
the  user’s  environment.  These  methods  are  outside  the  scope 
of  this  chapter,  but  they  can  all  be  useful.  Usability  specialists 
should  consider  the  goals  of  their  onsite  work  to  determine 
which  of  these  methods  would  be  most  appropriate. 

In  a  recent  study,  cell  phone  usability  was  tested  both  in
the  lab  and  while  the  participants  were  making  a  trip  using  public
transportation  (Kaikkonen,  Kekäläinen,  Cankar,  Kallio,  &
Kankainen,  2005).  Each  environment  had  20  participants.  The
results  showed  only  minor  differences  in  the  problems  identified
by  each  method.  Kaikkonen  et  al.  (2005)  concluded  that,  for
cell  phone  usability,  the  extra  resources  and  time  required  for
the  field  testing  were  not  worth  the  payoff.

The  Measurement  of  Usability

To  sufficiently  understand  and  address  usability  issues,  it  is  im¬ 
portant  to  consider  the  types  of  data  collected  and  how  to  ana¬ 
lyze  the  data.  Until  recently,  usability  professionals  have  gener¬ 
ally  agreed  that  quantitative  measures  of  usability  should  include 
effectiveness,  efficiency,  and  satisfaction  (as  described  in  ANSI, 
2001).  Beginning  in  about  2000,  there  has  been  more  interest 
in  additional  measures  of  affect  such  as  fun,  challenge,  and
stimulation  (e.g.,  Pagulayan,  Steury,  Fulton,  &  Romero,  2003).

Effectiveness  refers  to  whether  the  user  can  complete  tasks; 
it  is  evaluated  with  measures  such  as  success  rates,  error  rates, 
number  of  help  desk  calls,  and  number  of  assists. 

Efficiency  refers  to  the  effort  it  takes  to  complete  the  tasks. 
Task  time,  a  performance-based,  quantitative  measure,  has  been 
the  primary  measure  of  efficiency.  Investigators  can  report  times 
for  the  entire  task  or  for  each  step  towards  the  goal.  For  websites,
additional  measures  of  efficiency  may  be  useful,  such  as
the  number  of  pages  accessed  compared  to  the  minimum  num¬ 
ber  required  to  complete  a  task  (Smith,  1996). 
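
The effectiveness and efficiency measures described above reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and expressing the page measure as minimum pages divided by pages accessed is an assumption, since the text says only that the two numbers are compared.

```python
def success_rate(outcomes):
    """Effectiveness: fraction of task attempts that were completed."""
    return sum(outcomes) / len(outcomes)


def mean_task_time(times_sec):
    """Efficiency: average time on task, in seconds."""
    return sum(times_sec) / len(times_sec)


def page_efficiency(min_pages, pages_accessed):
    """Efficiency for websites: 1.0 means the participant took the shortest path.

    The direction of this ratio (minimum / actual) is an assumption.
    """
    return min_pages / pages_accessed


# Made-up data: four attempts at one task, three recorded times.
print(success_rate([True, True, False, True]))
print(mean_task_time([60, 90, 120]))
print(page_efficiency(min_pages=4, pages_accessed=10))
```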

One  measure  that  has  been  proposed  using  time  measures 
to  evaluate  efficiency  is  the  Novice/Expert  Ratio  (Urokohara, 
Tanaka,  Furuta,  Honda,  &  Kurosu,  2000).  This  measure  is  the  task 
time  for  a  participant  divided  by  the  average  time  for  some  referent
group,  such  as  experts.  When  the  average  time  is  an  ideal,
this  measure  can  provide  some  useful  insight.  However,  the  ratio 
has  proven  difficult  to  interpret  (Sauro  &  Kindlund,  2005). 
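
As a worked illustration of the definition above, the ratio can be computed as follows. This is a minimal sketch with made-up times; the function name is hypothetical.

```python
def novice_expert_ratio(participant_time, expert_times):
    """Task time for one participant divided by the mean time of the referent group."""
    expert_mean = sum(expert_times) / len(expert_times)
    return participant_time / expert_mean


# Made-up times in seconds: one participant and three experts.
ratio = novice_expert_ratio(150.0, [40.0, 50.0, 60.0])
print(ratio)  # the participant took this many times longer than the expert average
```

A ratio of 1.0 would mean the participant matched the referent group. What counts as an acceptable value above 1.0 is left open, which is part of why the measure is hard to interpret.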

User  satisfaction  refers  to  the  participants’  opinions  and 
attitudes  about  the  product,  and  it  is  generally  evaluated  with 
satisfaction  ratings,  questionnaires,  and  participant  comments. 
Questionnaires  for  measuring  user  satisfaction  include  the 
Questionnaire  for  User  Interaction  Satisfaction  (QUIS;  Chin, 
Diehl,  &  Norman,  1988),  the  System  Usability  Scale  (SUS;
Brooke,  1996),  and  the  Software  Usability  Measurement  In¬ 
ventory  (SUMI;  Kirakowski,  1996).  Tullis  and  Stetson  (2004) 
provided  a  good  description  and  analysis  of  user  satisfaction 
questionnaires  in  usability  tests,  and  their  study  showed  that 
these  questionnaires  are  not  reliable  with  fewer  than  10-12  re¬ 
spondents,  which  makes  their  use  in  a  diagnostic  usability  test 
problematic. 

There  is  some  debate  about  how  strongly  measures  are 
correlated.  Frøkjær,  Hertzum,  and  Hornbæk  (2000)  found  only
a  weak  correlation,  but  Sauro  and  Kindlund  (2005)  found  a 
stronger  one.  Further,  measures  of  effectiveness  and  efficiency 
(generally  objective  performance  measures)  may  have  a  low 
correlation  with  satisfaction  measures,  as  participants  often 
blame  themselves  for  problems  rather  than  the  product.  In 
some  cases,  effectiveness  and  efficiency  measures  may  even 
conflict  with  each  other  (Nielsen  &  Levy,  1994).  Therefore,  it  is 
important  to  evaluate  all  of  these  measures  together  to  get  a  full 
understanding  of  a  product’s  usability  because  each  assesses  a 
different  aspect  of  usability. 
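
To check how strongly two measures are related in a given test, a correlation coefficient can be computed across participants. The sketch below uses the standard Pearson formula. The data are made up, and the choice of task time and a satisfaction rating as the pair of measures is only an example, not the analysis used in the studies cited above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: task time in seconds and a 1-7 satisfaction rating per participant.
task_time = [30, 45, 60, 90]
satisfaction = [6, 6, 4, 3]
print(round(pearson_r(task_time, satisfaction), 2))
```

With the small samples typical of diagnostic tests, such a coefficient is unstable, so it is better read as a rough indication than as a firm result.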

In  addition  to  these  quantitative  measures,  usability  tests  can 
also  provide  useful  qualitative  data,  including  participant  com¬ 
ments  and  investigator  observations,  which  can  provide  pro¬ 
found  insight  into  usability  problems  and  solutions.  The  process 
of  interpreting  them  is  part  of  the  art  of  usability  testing.  Some 
usability  professionals  believe  that  the  qualitative  results  are  ac¬ 
tually  more  informative  and  useful  than  the  quantitative  results. 
For  example,  Ebling  and  John  (2000)  found  that  over  half  of 
the  problems  identified  in  their  test  came  from  the  think  aloud 
protocol  alone. 

Given  the  number  of  usability  measures,  it  seems  logical  to 
try  to  simplify  them  into  a  single  usability  metric.  This  metric 
would  be  useful  in  determining  when  a  product  is  usable 
enough  or  in  comparing  usability  of  several  designs  (it  would 
probably  be  less  useful  in  formative  usability  tests,  where  the 
qualitative  data  that  yield  design  solutions  are  so  critical).  Sauro 
and  Kindlund  (2005)  and  McGee  (2003,  2004)  reported  on  op¬ 
tions  for  a  single  metric  they  have  used  successfully.  These  met¬ 
rics  require  a  fair  amount  of  attention  to  the  details  of  data  col¬ 
lection  and  analysis,  but  the  authors  felt  the  benefits  are  worth 
the  additional  work. 

Assigning  severity  ratings.  Several  recent  research 
studies  have  looked  at  the  validity  and  the  reliability  of  severity 
rating  scales.  Unfortunately,  there  is  little  consistency  in  the  use 
of  severity  ratings.  This  lack  of  consistency  appears  in  all  forms 
of  usability  evaluation  and  is  one  of  the  most  important  chal¬ 
lenges  to  usability  methodology. 

Most  studies  have  looked  at  the  inconsistencies  among  ex¬ 
perts  using  severity  scales  with  inspection  methods  such  as 
heuristic  evaluation.  However,  Jacobsen,  Hertzum,  and  John 
(1998)  found  the  same  results  with  usability  testing.  They  asked
four  experienced  usability  testers  to  watch  tapes  of  the  same 
usability  test  and  then  identify  problems,  including  the  top 
10  problems  in  terms  of  severity.  Of  the  93  problems  identified 
with  the  product,  only  20%  were  detected  by  all  evaluators, 
while  46%  were  only  found  by  a  single  evaluator.  None  of  the 
top  10  severe  problems  appeared  on  all  four  evaluators’  lists. 
Molich  and  Dumas  (in  press)  found  that  25%  of  the  problems 
reported  in  common  by  two  or  more  evaluation  teams  were 
classified  into  different  severity  categories. 
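
The overlap figures reported in such studies (the share of problems found by every evaluator and the share found by only one) can be computed from each evaluator's problem list. The sketch below is illustrative: the function name and the problem identifiers are hypothetical, and the data are made up rather than taken from Jacobsen et al.

```python
def detection_overlap(problem_sets):
    """Return (share found by all evaluators, share found by exactly one).

    problem_sets is a list with one set of problem identifiers per evaluator.
    """
    all_problems = set().union(*problem_sets)
    n_evaluators = len(problem_sets)
    counts = {p: sum(p in s for s in problem_sets) for p in all_problems}
    total = len(all_problems)
    by_all = sum(1 for c in counts.values() if c == n_evaluators) / total
    by_one = sum(1 for c in counts.values() if c == 1) / total
    return by_all, by_one


# Made-up problem lists from four evaluators watching the same sessions.
evaluators = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"A", "E"},
    {"A", "C"},
]
print(detection_overlap(evaluators))
```

The sketch assumes the evaluators' problem descriptions have already been matched to shared identifiers, a step that itself requires judgment.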

Lesaigle  and  Biers  (2000)  reported  a  disappointing  correla¬ 
tion  coefficient  (+0.16)  among  professional  testers’  ratings  of 
the  severity  of  the  same  usability  problems  in  a  usability  test. 
They  used  Nielsen’s  severity  rating  scale  (Nielsen,  1992).  Catani
and  Biers  (1998)  found  that  severity  ratings  by  usability
professionals  were  not  consistent.

The  results  of  these  studies  strike  a  blow  at  one  of  the  most 
often  mentioned  strengths  of  usability  testing — its  ability  to  un¬ 
cover  the  most  severe  usability  problems.  Now,  we  do  not  know 
whether  the  inconsistencies  in  severity  judgments  are  the  result 
of  the  poor  scales,  the  differing  perceptions  of  usability  special¬ 
ists,  or  both. 

Several  practitioners  have  proposed  severity  rating  schemes: 
Dumas  and  Redish  (1999),  Nielsen  (1992),  Rubin  (1994),  and 
Wilson  and  Coyne  (2001).  The  schemes  have  three  properties  in
common: 

1.  They  all  start  with  a  rating  scale  that  is  derived  from  software 
bug  reporting.  The  most  severe  category  usually  involves 
loss  of  data  or  task  failure,  and  the  least  severe  category  in¬ 
volves  problems  that  are  so  unimportant  that  they  do  not 
need  an  immediate  fix.  The  middle  levels  between  the  ex¬ 
tremes  are  usually  difficult  to  interpret  and  they  are  stated 
in  words  that  are  hard  to  apply  to  specific  cases.  For  example,
Dumas  and  Redish  (1999)  proposed  two  middle  levels:
(a)  problems  that  create  significant  delay  and  frustration  and
(b)  problems  that  have  a  minor  effect  on  usability.  Nielsen’s
(1992)  middle  levels  are  (a)  major  usability  problem — im¬ 
portant  to  fix  so  it  should  be  given  high  priority  and  (b)  mi¬ 
nor  usability  problem — fixing  is  given  low  priority.  None  of 
the  sources  gives  practitioners  any  guidance  on  how  prob¬ 
lems  fit  into  these  scale  levels,  especially  the  middle  ones. 

2.  All  of  the  authors  admit,  at  least  indirectly,  that  their  scales 
alone  are  not  enough  to  assess  severity.  The  tester  must  con¬ 
sider  one  or  more  other  factors  in  judging  severity.  For  ex¬ 
ample,  Nielsen  (1992)  described  four  factors  in  addition  to 
the  severity  rating:  frequency,  impact,  persistence,  and  some¬ 
thing  called  “market  impact.”  Rubin  (1994)  proposed  to  mul¬ 
tiply  the  rating  by  the  number  of  future  users  who  will  have 
the  problem.  Dumas  and  Redish  (1999)  added  a  second  dimension:
the  scope  of  the  problem,  either  local  or  global,
with  no  levels  in  between.  With  the  exception  of  Rubin’s
multiplication  rule,  none  of  these  other  factors  is  described 
in  enough  detail  to  indicate  how  their  combination  with  the 
severity  scale  would  work,  which  is,  perhaps,  an  indicator  of 
the  weakness  of  the  severity  scales  themselves. 

3.  None  of  the  scales  indicates  how  to  treat  individual  differences.
For  example,  what  does  one  do  if  only  two  of  eight  participants
fail  to  complete  a  task  because  of  a  usability  problem?
Is  that  problem  placed  in  the  most  severe  category  or  does  it 
move  down  a  level?  If  a  problem  is  global  rather  than  local,  does 
that  change  its  severity?  The  authors  of  these  scales  provided  lit¬ 
tle  guidance. 
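
Rubin's multiplication rule is the one combination described concretely enough to sketch. The code below is a hedged illustration, not Rubin's published procedure. It assumes that a higher rating means a more severe problem, and it estimates the number of future users affected by extrapolating from the fraction of test participants who hit the problem. Both choices, along with the function and parameter names, are assumptions.

```python
def rubin_priority(severity_rating, affected_in_test, participants, expected_users):
    """Severity rating multiplied by the estimated number of future users affected.

    The estimate extrapolates the fraction of affected test participants to the
    expected user population (an assumption made for this sketch).
    """
    estimated_affected = expected_users * affected_in_test / participants
    return severity_rating * estimated_affected


# Made-up problems: (description, severity rating, participants affected out of 8).
problems = [
    ("confusing label", 2, 6),
    ("data loss on save", 4, 2),
    ("slow search", 1, 8),
]
ranked = sorted(
    problems,
    key=lambda p: rubin_priority(p[1], p[2], participants=8, expected_users=1000),
    reverse=True,
)
for name, severity, affected in ranked:
    print(name, rubin_priority(severity, affected, 8, 1000))
```

In this made-up case, a moderate problem that most participants encounter outranks a severe one that only two of eight encounter. That is the kind of individual-differences question the scales themselves leave unanswered.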

In  summary,  severity  ratings  can  be  helpful  in  determining 
which  problems  to  fix.  However,  the  methods  of  assigning 
severity  are  complex  and  not  yet  fully  understood. 

Eye  tracking.  Eye  tracking  has  been  slow  to  gain  advocates
in  usability  testing  (e.g.,  Bojko  &  Stephenson,  2005;  Goldberg,
Stimson,  Lewenstein,  Scott,  &  Wichansky,  2002;  Russell,
2005).  Eye  movements  can  be  very  helpful  in  identifying  exact 
sources  of  confusion  or  errors.  For  example,  if  users  do  not  fol¬ 
low  instructions  on  the  screen  properly,  it  could  be  either  be¬ 
cause  they  do  not  understand  them  (as  evidenced  by  long  gazes 
at  the  text)  or  because  they  do  not  see  the  target  (as  evidenced 
by  no  glances  at  the  text).  Clearly  eye  tracking  data  can  help  de¬ 
signers  identify  the  causes  of  usability  problems  and  recommend 
appropriate  solutions. 

The  eye  tracking  technology  has  advanced  recently.  Partici¬ 
pants  are  no  longer  required  to  wear  awkward  and  bulky  de¬ 
vices.  In  fact,  the  cameras  used  to  track  eye  movement  are  non- 
invasive,  and  are  almost  unnoticeable  to  participants.  Further, 
the  software  available  to  analyze  the  data  has  also  improved. 
This  advancement  is  critical,  as  eye  trackers  can  collect  as  many 
as  60  data  points  per  second.  However,  eye  tracking  still  has 
some  disadvantages.  For  example,  some  participants  must  cali¬ 
brate  the  system  several  times  before  they  are  successful,  which 
increases  their  burden.  Some  participants  (e.g.,  those  who  wear 
bifocals)  cannot  be  tracked  at  all.  Further,  eye  tracking  cannot 
be  used  with  think  aloud  methods  because  studies  have  shown 
the  method  affects  where  participants  look  on  the  display  (Bojko, 
2005).  Eye  tracking  is  useful  for  selected  tasks  during  which  it 
is  important  to  understand  the  details  of  what  participants  are 
looking at (Bojko & Stephenson, 2005).
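
The distinction drawn above between long gazes at on-screen text and no glances at it can be sketched in a few lines of Python. This is an illustration only, not a description of any particular eye tracker's software: the AreaOfInterest rectangle, the function names, and the two-second threshold for a "long" gaze are all hypothetical, and the sketch assumes gaze samples arrive as (x, y) screen coordinates at the 60 samples per second mentioned above.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 60  # assumption: one gaze sample every 1/60 s


@dataclass(frozen=True)
class AreaOfInterest:
    """Hypothetical screen rectangle (in pixels) around the instruction text."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def dwell_seconds(samples, aoi, sample_rate_hz=SAMPLE_RATE_HZ):
    """Total time the gaze spent inside the area, from (x, y) samples."""
    inside = sum(1 for x, y in samples if aoi.contains(x, y))
    return inside / sample_rate_hz


def classify_gaze(samples, aoi, long_gaze_s=2.0):
    """Label the gaze pattern; long_gaze_s is an illustrative threshold."""
    dwell = dwell_seconds(samples, aoi)
    if dwell == 0:
        return "no glances: text probably not seen"
    if dwell >= long_gaze_s:
        return "long gaze: text seen but possibly not understood"
    return "brief gaze: text seen"
```

In practice the threshold would have to be chosen for the length of the text and the task at hand; the point of the sketch is only that the two failure causes leave different traces in the data.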

The  Complexity  and  Importance  of  Thinking  Aloud 

One  of  the  early  differences  between  a  usability  test  and  a  re¬ 
search  study  was  that  the  test  participants  typically  thought 
aloud in a usability test. While concurrent thinking aloud is normally
done as part of a diagnostic usability test, it is really a
method  of  its  own.  It  has  been  used  in  conjunction  with  many 
other  methods,  such  as  the  evaluation  of  the  usability  of  print 
documentation  (Soderston,  1985). 

Thinking  aloud  has  been  used  in  psychological  research 
since  the  turn  of  the  20th  century,  but  it  is  best  known  as  a  cog¬ 
nitive  psychology  method  for  studying  short-term  memory 
(Ericsson  &  Simon,  1993). 

Thinking  aloud  provides  usability  testing  with  most  of  its 
drama. Without thinking aloud, it is unlikely that usability testing
would have become the most influential usability engineering
method. The think aloud protocol grabs the attention of
first-time visitors to a usability test and gives a test session the
appearance of a science-based method.

When usability testing was first being codified, thinking
aloud was borrowed from cognitive psychology without much
reflection. It was not until shortly after 2000 that usability specialists
began to look at it more closely. Independently, Boren
and  Ramey  (2000)  and  Dumas  (2001a)  looked  more  closely  at 
what  Ericsson  and  Simon  (1993)  described  as  the  think  aloud 
method  used  in  cognitive  psychology  research  and  whether 
testing  practitioners  were  really  following  that  method.  Both 
reviews  showed  that  the  descriptions  of  how  to  use  the  think 
aloud  method  that  had  been  provided  to  usability  testing  prac¬ 
titioners  by  Dumas  and  Redish  (1999)  and  Rubin  (1994)  were 
in  direct  contradiction  to  the  instructions  used  in  cognitive  psy¬ 
chology  research.  In  cognitive  psychology  research  studies,  par¬ 
ticipants  are  discouraged  from  reporting  feelings  or  expecta¬ 
tions  or  from  making  any  verbal  diversions  over  and  above  the 
content  of  their  actions.  In  usability  testing,  participants  are  en¬ 
couraged  to  report  on  their  feelings  and  expectations  and  on 
additional  relevant  issues. 

Only  a  few  research  studies  have  been  done  on  the  think 
aloud  method  in  a  usability-testing  context.  Krahmer  and  Um- 
melen  (2004)  compared  typical  usability  testing  think  aloud  in¬ 
structions  to  the  instructions  used  by  Ericsson  and  Simon 
(1993)  and  found  that  the  research  instructions  do  not  work 
well  in  a  testing  context.  Ebling  and  John  (2000)  traced  each 
usability  problem  found  in  a  usability  test  back  to  its  source  in 
the  test  measures.  They  found  that  over  half  of  the  problems 
identified  in  their  test  came  from  the  think  aloud  protocol  alone. 
Their study supplemented an earlier one by Virzi, Sorce, and
Herbert  (1993),  which  showed  that  fewer  problems  are  identi¬ 
fied  when  the  participants  do  not  think  aloud. 

One  of  the  interesting  questions  about  thinking  aloud  is 
whether  it  influences  what  test  participants  do  in  a  usability  test 
session.  Bojko  (2005)  reported  that  thinking  aloud  changes  the 
way  participants  examine  a  web  page  and  recommended  that 
thinking  aloud  not  be  used  with  eye  tracking.  That  recommen¬ 
dation  raises  interesting  questions  about  whether  thinking 
aloud adds to the artificiality of the testing situation. Further, as
stated  earlier,  Boren  and  Ramey  (2000)  found  that  test  admin¬ 
istrators  use  the  think  aloud  method  differently.  Therefore,  it  is 
important  to  plan  carefully  how  thinking  aloud  will  be  incorpo¬ 
rated  into  a  usability  test. 

For the practitioner, this research confirms the importance
of thinking aloud as a method both to catch the eye and ear of
visitors and to uncover usability problems. Most moderators
have  not  changed  their  approach  to  think  aloud  instructions 
since  the  early  1990s.  However,  we  are  just  beginning  to  under¬ 
stand  the  thinking  aloud  process.  We  do  not  have  basic  infor¬ 
mation  such  as  whether  the  presence  of  a  moderator  in  the 
room  increases  or  decreases  thinking  aloud  verbalizations  or 
whether  thinking  aloud  changes  the  way  participants  examine 
a  product. 

How  Many  Participants  Are  Enough? 

Part  of  the  popularity  of  usability  testing  comes  from  its  ability  to 
find  usability  problems  with  only  a  few  participants.  Anyone 
who  watches  multiple  test  sessions  with  the  same  set  of  tasks 
perceives that the same issues begin to repeat and that somewhere
in the five to eight test participant range, with the same
user population, it begins to seem unproductive to test more
participants. So it was with great joy that testers greeted the research
studies by Virzi (1990, 1992) showing that 80% of the total
number of usability problems that will be uncovered by as
many  as  20  participants  will  be  found  by  as  few  as  5.  Virzi  also 
found  that  those  five  participants  uncover  most  of  the  problems 
judged  as  severe.  This  finding  has  been  confirmed  several  times 
(Faulkner,  2003;  Law  &  Vanderheiden,  2000).  Practitioners  con¬ 
tinue  to  test  with  small  numbers  of  participants,  confident  that 
they  are  finding  most  of  the  problems. 
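
The "80% with five participants" result is often illustrated with a simple binomial discovery model, in which each participant independently encounters any given problem with the same probability p. The sketch below assumes that model; the chapter itself does not state a formula, and the value p = 0.3 is an illustrative assumption rather than a figure taken from the studies cited.

```python
def proportion_found(p: float, n: int) -> float:
    """Expected share of problems seen at least once by n participants.

    Assumes every problem has the same per-participant detection
    probability p and that participants are independent.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    for n in (1, 3, 5, 8, 20):
        print(n, round(proportion_found(0.3, n), 3))
```

Under these assumptions five participants are expected to reveal roughly 83% of the problems, which is in line with the finding described above. The model's uniform p is a strong simplification, and it says nothing about problems that a given team and task set never expose.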

There  have  been  a  few  challenges  to  the  generality  of  this 
finding,  most  notably  by  Lewis  (1994,  2001).  The  challenges 
make  the  reasonable  case  that  not  all  tests  are  the  same.  A  mod¬ 
erately  complicated  product  being  tested  for  the  first  time 
might  indeed  yield  many  of  its  problems  to  five  to  eight  partic¬ 
ipants.  But  what  about  a  product  that  is  being  retested  after 
most  of  its  problems  have  been  fixed?  Or  a  very  large  and  com¬ 
plicated  product?  In  each  of  these  cases,  one  might  expect  that 
it  might  take  more  participants  before  problems  begin  to  re¬ 
peat.  Lewis  (1994)  tested  a  large  product,  a  suite  of  office  pro¬ 
ductivity  tools,  with  six-hour  sessions  and  did  not  see  the  same 
convergence  as  Virzi  (1990,  1992)  had.  It  may  also  take  more 
participants  if  the  user  population  is  very  heterogeneous,  such 
as  with  elderly  and  disabled  users  (Grossnickle,  2004;  ITTAC, 
2004;  Swierenga  &  Guy,  2003).  Lewis  (2001)  created  a  formula 
for  determining  how  many  participants  are  needed  in  a  variety 
of  testing  situations. 
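
One way to see why the required sample size varies across testing situations is to invert a simple binomial discovery model, in which n participants are expected to find a share 1 - (1 - p)^n of the problems. The sketch below is not the formula from Lewis (2001), whose details are not given here; it is a minimal illustration under the assumption of a single average detection probability p per participant, and the values of p used are hypothetical.

```python
import math


def participants_needed(p: float, goal: float) -> int:
    """Smallest n with 1 - (1 - p)**n >= goal.

    p    -- assumed average chance that one participant hits a given problem
    goal -- share of discoverable problems the team wants to find
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < goal < 1.0:
        raise ValueError("goal must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - goal) / math.log(1.0 - p))


if __name__ == "__main__":
    # A first test of a product with many easy-to-hit problems (p assumed 0.3)
    print(participants_needed(0.3, 0.8))
    # A retest, or a heterogeneous population, with rarer problems (p assumed 0.1)
    print(participants_needed(0.1, 0.8))
```

With p = 0.3 the model calls for 5 participants to reach 80%, while with p = 0.1 it calls for 16, which matches the intuition that retested products, very large products, and heterogeneous user populations need more participants before problems begin to repeat.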

The  CUE  studies  (discussed  in  more  detail  later)  show  that 
there  are  hundreds  of  usability  problems  with  any  moderately 
complex  product,  but  that  any  one  test  will  only  uncover  a  small 
fraction  of  them.  However,  Molich  and  Dumas  (in  press)  argued 
that  having  a  test  team  run  more  participants  is  not  the  way  to 
find  more  problems.  Some  limitation  causes  one  test  team  with 
one  sample  of  tasks  to  find  only  a  fraction  of  the  available  prob¬ 
lems.  We  do  not  know  yet  why  this  happens. 

Therefore,  the  answer  to  the  question,  “Is  five  enough?”  is 
“It  depends.”  It  is  yes  in  the  sense  that  one  team  with  one  sam¬ 
ple  of  tasks  is  unlikely  to  find  more  problems  that  it  would  con¬ 
sider  severe  by  running  more  participants.  It  is  no  in  the  sense 
that  there  may  be  many  more  problems  left  to  be  uncovered. 

Allowing  Free  Exploration 

An  important  issue  in  user  testing  is  what  the  participant  does 
first in a test session. For example, if all users will have some
training  before  they  use  the  product,  testers  might  want  to  be 
sure  all  participants  get  the  same  training  (perhaps  by  providing 
it  themselves). 

There  is  often  a  preamble  to  the  first  task  scenario  that  puts 
the  test  and  the  tasks  into  some  context.  Most  often,  the  pre¬ 
amble  leads  to  the  first  task  scenario.  Some  testers  argue  that 
this  procedure  is  unrealistic  and  that  in  the  real  world  people  do 
not  work  that  way — they  spend  a  few  minutes  exploring  the 
product  before  they  start  doing  tasks.  Others  argue  that  going 
directly  to  tasks  without  training  or  much  of  a  preamble  puts 
stress  on  the  product  to  stand  on  its  own — stress  that  is  useful 
for  making  products  more  usable. 

Should  testers  consider  allowing  the  test  participants  5-10 
minutes of exploration before they begin the task scenarios?
Those in favor of free exploration argue:

•  Without  exploration,  the  product  is  getting  an  unfairly  hard 
evaluation. 

•  Without  exploration,  the  testing  situation  is  not  simulating 
the  real  use  environment,  especially  for  web-based  products. 
Users  must  know  something  about  the  product  to  buy  it  or 
their  company  might  give  them  some  orientation  to  it. 

•  In  some  special  situations,  such  as  with  a  new  game,  testers 
gain  valuable  information  by  allowing  new  users  to  explore 
(Pagulayan,  Keeker,  et  al.,  2003). 

Those against free exploration argue:

•  Free  exploration  introduces  added  variability  into  the  test; 
some  participants  will  find  information  that  helps  them  do 
the  tasks,  while  others  will  not  find  that  information. 

•  The  test  is  intended  to  be  a  hard  evaluation  for  the  product 
to  pass. 

This  debate  continues,  but  most  testers  do  not  allow  free 
exploration. 

Reliability  of  Usability  Testing 

Beginning  in  1998,  a  series  of  experiments  and  analysis  papers 
questioned  the  reliability  of  assessment  methods  in  general  and 
usability  testing  in  particular.  Reliability  refers  to  whether  us¬ 
ability  testing  will  repeatedly  find  the  same  problems,  given  the 
same  product,  tasks,  users,  and  environment.  Almost  from  its 
beginning,  usability  testing  was  assumed  to  find  most  problems 
with  even  a  single  test.  In  fact,  many  researchers  proposed  that 
testing  was  a  kind  of  gold  standard  against  which  other  assess¬ 
ment  methods  were  to  be  measured. 

Desurvire,  Kondziela,  and  Atwood  (1992)  conducted  the  first 
gold  standard  study.  They  were  interested  in  how  well  an  in¬ 
spection  predicted  the  results  from  a  usability  test.  Their  as¬ 
sumption  was  that  usability  testing  finds  the  true  problems.  The 
three  experts  they  used  found  less  than  half  of  the  problems 
that  were  uncovered  by  Desurvire  et  al.  in  their  usability  test. 
Furthermore,  the  experts  found  more  minor  usability  problems, 
leading  Desurvire  (1994),  later  interpreting  her  data,  to  say, 
“The  training  and  experience  of  human  factors  experts  seem  to 
sensitize  them  so  that  they  are  better  at  predicting  a  user’s  mi¬ 
nor  problems  with  a  system”  (p.  186). 

The  use  of  testing  data  as  a  benchmark  against  which  to  com¬ 
pare  other  evaluation  methods  continues  to  the  present  day 
(Andre,  Hartson,  &  Williges,  2003;  Bailey,  Allan,  &  Raiello,  1992; 
Sears,  1997).  In  these  studies,  any  usability  issue  found  by  an 
inspection  method  is  considered  a  false  alarm  if  it  is  not  also 
found  by  usability  testing. 
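
The benchmark logic described above reduces to set comparisons once each problem has been given an identifier. The following sketch assumes that problems from both methods have already been matched to a common list of identifiers, which is itself a judgment-laden step; the function name and the example identifiers are hypothetical.

```python
def compare_to_benchmark(inspection, testing):
    """Score inspection findings against usability-test findings.

    Treats the usability test as the benchmark: an inspection finding
    that testing did not also produce is counted as a false alarm.
    """
    inspection, testing = set(inspection), set(testing)
    return {
        "hits": inspection & testing,          # found by both methods
        "false_alarms": inspection - testing,  # found by inspection only
        "misses": testing - inspection,        # found by testing only
    }


if __name__ == "__main__":
    result = compare_to_benchmark({"P1", "P2", "P7"}, {"P1", "P2", "P3", "P4"})
    print(result)
```

Note that the label "false alarm" is only as trustworthy as the benchmark itself: if a single usability test misses real problems, some inspection findings scored this way will be real problems as well.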

Jacobsen,  Hertzum,  and  John  (1998)  were  the  first  to  study 
reliability  in  usability  testing.  They  studied  how  evaluators  dif¬ 
fer  when  analyzing  the  same  usability  test  sessions.  In  the 
study,  four  usability  testers  independently  analyzed  the  same 
set  of  videotapes  of  four  usability  test  sessions.  Each  session 
involved a user thinking aloud while solving tasks in a multimedia
authoring system. As many as 46% of the problems were uniquely
reported by single evaluators and 20% by only two
evaluators.
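
Percentages of this kind can be tallied directly from each evaluator's problem list. The sketch below is a minimal illustration with made-up data, not the analysis procedure of the study; it assumes that the evaluators' problem descriptions have already been matched to shared identifiers.

```python
from collections import Counter


def detection_breakdown(reports):
    """Share of all reported problems found by exactly k evaluators.

    reports -- mapping of evaluator name to the set of problem ids reported
    Returns a dict {k: fraction of distinct problems reported by k evaluators}.
    """
    times_reported = Counter(
        pid for problems in reports.values() for pid in set(problems)
    )
    total = len(times_reported)
    if total == 0:
        return {}
    by_count = Counter(times_reported.values())
    return {k: by_count[k] / total for k in sorted(by_count)}


if __name__ == "__main__":
    reports = {  # hypothetical data for three evaluators
        "A": {1, 2, 3},
        "B": {2, 3, 4},
        "C": {3, 5},
    }
    print(detection_breakdown(reports))
```

In this toy data set, three of the five distinct problems are reported by a single evaluator, so the entry for k = 1 is 0.6.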

A  series  of  studies  pursued  this  issue  of  reliability  of  testing 
further  (Molich,  Bevan,  et  al.,  1998;  Molich,  Meghan,  Ede  & 
Karyukin,  2004;  Molich  &  Dumas,  in  press).  Known  as  the  “com¬ 
parative  user  evaluation”  (CUE)  studies,  these  papers  reported 
studies  in  which  multiple  independent  usability  tests  of  the  same 
product  are  done  by  professional  usability  teams.  The  CUE-1  and 
CUE-2  studies  (Molich,  Bevan,  et  al.,  1998;  Molich,  Meghan,  et  al., 
2004)  documented  wide  differences  in  reported  problems, 
methods, tasks, and usability reports. For example, in the CUE-2
study, nine teams independently usability tested the same website.
Seventy-five percent of the 310 usability problems were reported by
only a single team. At least 29 serious or critical problems were
reported by single teams. CUE-2 demonstrated that the effectiveness of a
usability  test  could  depend  on  the  tasks,  the  methodology,  and 
the  persons  in  charge  of  the  test.  Not  all  usability  tests  and  testers 
are  equal — even  amongst  professional  organizations. 

In  the  CUE-4  study,  17  teams  conducted  either  an  inspec¬ 
tion  (eight  teams)  or  a  usability  test  (nine  teams)  of  the  same 
hotel  reservations  site  (Molich  &  Dumas,  in  press).  These  teams 
were  chosen  because  most  were  well-known,  published  usabil¬ 
ity  professionals.  Again,  the  overlap  among  the  usability  prob¬ 
lems  was  disappointing.  None  of  the  serious  usability  problems 
was  found  by  all  17  teams  or  all  9  of  the  usability  test  teams.  The 
average  percentage  overlap  between  pairs  of  teams  that  con¬ 
ducted  usability  testing  was  only  11.5%.  Two  of  the  testing  teams 
started  with  the  same  task  set,  though  one  of  them  did  modify 
some  tasks.  Those  two  teams  did  find  more  problems  in  com¬ 
mon,  24%,  but  not  as  many  as  would  be  expected  if  dissimilar 
tasks  were  the  main  cause  of  the  unreliability. 
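
An average pairwise overlap of this kind can be computed once each team's findings are expressed as a set of problem identifiers. The text does not say exactly how overlap was calculated in CUE-4, so the sketch below assumes one common definition, the number of problems two teams share divided by the number either team reported; the team names and data are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(a, b):
    """Problems found by both teams as a share of problems found by either."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(reports):
    """Average overlap across every pair of teams in `reports`."""
    pairs = list(combinations(reports.values(), 2))
    if not pairs:
        raise ValueError("need at least two teams")
    return sum(pairwise_overlap(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    reports = {  # hypothetical findings for three teams
        "Team A": {1, 2, 3},
        "Team B": {2, 3, 4},
        "Team C": {3, 5},
    }
    print(round(mean_pairwise_overlap(reports), 3))
```

Other definitions (for example, dividing by the smaller team's problem count) would give different numbers, so overlap figures from different studies are comparable only when the definition is the same.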

Hertzum  and  Jacobsen  (2001)  did  an  analysis  of  compara¬ 
tive  studies  of  testing  and  inspections  to  date  and  termed  the 
lack  of  reliability  as  the  “evaluator  effect.”  They  recommended 
multiple  independent  evaluations  of  products,  regardless  of  the 
assessment  method  used. 

What this means for the practitioner. Troubling as these
findings are, they have not significantly affected usability-testing
methodologies. The usability community is aware of
many  of  the  pitfalls  of  usability  tests,  but  the  research  has  not  yet 
completely  uncovered  how  to  alleviate  these  problems.  There  is 
some  evidence  that  the  tasks  selected  affect  the  outcome.  In  ad¬ 
dition,  reliability  is  reduced  when  the  usability  testers  do  not 
interact  with  the  development  team.  The  CUE  studies  show  the 
importance  of  considering  tasks  carefully  and  working  with  the 
development  team  to  ensure  reliable  results.  Further,  the  CUE 
studies  reveal  that  one  usability  test  is  not  sufficient  to  identify 
all  usability  problems  with  a  product.  A  single  usability  test 
should  be  part  of  a  larger  effort  involving  multiple  usability  tests 
(ideally  with  independent  testers)  as  well  as  other  usability 
methods.  The  usability  community  needs  to  place  a  high  prior¬ 
ity  on  identifying  the  causes  of  this  evaluator  effect  and  ways  to 
alleviate  them. 

There  is  an  interesting  connection  between  the  evaluator  ef¬ 
fect  and  the  five  is  enough  issue  we  previously  discussed.  If  the 
evaluator  effect  proves  to  have  wide  generality  it  still  does  not 
mean  that  running  more  participants  in  a  test  will  uncover  more 
problems. It still appears that with one test team and one set of
tasks only so many problems will be uncovered. Running more
participants  under  the  same  conditions  will  not  find  more  issues 
in  most  cases;  it  will  just  use  more  resources.  Conducting  addi¬ 
tional  independent  tests  with  different  teams  will  find  more 
problems  but  there  is  not  yet  enough  empirical  evidence  to  con¬ 
vince  organizations  to  do  so. 

Validity  of  Usability  Testing 

While  we  have  discussed  the  research  and  analysis  about  the  re¬ 
liability  of  testing,  there  has  been  almost  nothing  said  about  its 
validity.  Validity  always  has  to  do  with  whether  a  method  does 
what  it  is  supposed  to  do.  Perhaps  the  validity  of  usability  test¬ 
ing  has  been  ignored  because  of  its  robustness.  Testing  always 
finds  strengths  and  weaknesses  in  a  product.  Furthermore,  as 
we  described  previously,  the  Virzi  (1990, 1992)  studies  provided 
evidence  that  problems  judged  as  severe  were  found  early  in 
testing.  There  has  never  been  a  study  published  questioning 
whether  testing  finds  problems.  Is  finding  some  of  the  prob¬ 
lems  enough? 

To  assess  the  validity  of  usability  testing,  we  must  first  agree 
on  what  a  usability  test  is  supposed  to  do.  Prior  to  the  mid- 
1990s,  the  usability  community  used  diagnostic  tests  primarily 
to  uncover  usability  problems.  The  more  problems  they  found, 
the  better  and,  of  course,  the  tests  should  find  the  most  severe 
ones.  Because  testing  has  never  been  viewed  as  the  only  usabil¬ 
ity  evaluation  method  to  apply  during  development  and  be¬ 
cause,  ideally,  there  are  iterative  tests  performed,  it  was  not  es¬ 
sential  or  expected  that  one  test  would  find  all  of  the  problems. 
Still,  the  Virzi  (1990,  1992)  studies,  and  those  that  have  con¬ 
firmed  his  basic  findings,  indicated  that  most  of  the  problems 
that  one  team  with  one  set  of  tasks  will  uncover  are  found  early 
and  that  5-10  participants  would  find  almost  all  of  the  problems 
that  a  team  is  likely  to  find. 

On  the  other  hand,  the  CUE  studies  suggest  that  one  test 
team  will  find  only  a  small  fraction  of  the  total  problems  and 
that,  perhaps,  finding  problems  should  only  be  a  partial  indica¬ 
tor  of  an  effective  test.  The  RITE  method,  discussed  below,  sug¬ 
gests  two  additional  possibilities  for  goals: 

1.  A  test  should  result  in  an  improved  product  design  that  fixes 
as  many  problems  in  the  original  design  as  the  testing  re¬ 
sources  allow. 

2.  A  test  should  increase  the  commitment  of  development 
teams  to  user-centered  design  and  their  willingness  to  pur¬ 
sue  it  for  future  projects. 

Related  to  this  second  goal  is  the  necessity  for  positive  feed¬ 
back  on  the  effective  aspects  of  a  design.  If  one  of  the  goals  of 
testing  is  team  building,  then  emphasizing  the  positive  needs  to 
be  one  of  the  main  components  of  a  test.  Apprentices  learning 
to  conduct  tests  need  to  be  taught  the  skill  of  noting  what  is 
working  and  making  it  part  of  the  discussion  with  developers 
and  test  reporting. 

Until  we  sort  out  the  importance  of  these  goals  (finding 
problems,  creating  an  improved  design,  and  team  building),  we 
cannot  fully  understand  the  validity  of  what  is  arguably  our  most 
powerful  evaluation  tool. 


Testing  With  Special  Populations 

As  manufacturers  push  to  expand  their  market  share,  they  look 
to  gain  users  in  populations  they  were  not  serving  before.  De¬ 
signing  for  these  populations  often  requires  special  attention 
to  the  usability  techniques  used.  This  section  addresses  some  of 
the  special  considerations  for  usability  tests  with  international 
users, disabled users, elderly users, and children.

International Participants

Many  manufacturers  look  for  new  customers  across  the  globe. 
However,  preparing  a  product  for  a  new  market  may  involve 
more  than  simply  translating  the  language.  Cultural  differences 
can  also  impact  appropriate  design  decisions  such  as  color  selec¬ 
tions  and  the  use  of  images.  These  differences  can  also  impact  the 
appropriate structure for web applications. Because of the significant
differences across cultures, it is important to conduct
usability  testing  with  participants  from  all  the  target  cultures. 

International usability testing follows the principles and theories
of generic usability testing. However, testing participants in
other cultures poses a variety of challenges that generally do
not apply when testing in one’s own culture. The challenges of
communication  and  cultural  differences  are  described  in  the  fol¬ 
lowing  sections. 

Communication.  One  of  the  most  significant  challenges 
with  international  usability  tests  is  communication.  Often,  there 
are  different  languages.  Sometimes  the  testers  are  bilingual,  but 
often  the  tester  must  have  help  recruiting  participants,  prepar¬ 
ing  test  materials,  conducting  the  test,  analyzing  the  results, 
and  writing  the  report.  Nielsen  (1996)  and  Vatrapu  and  Perez- 
Quinones  (2004)  offered  several  suggestions,  including  the 
following: 

•  Use  employees  of  the  company  who  live  and  work  in  that 
country.  This  may  require  training  the  employees  to  facilitate 
a  usability  test. 

•  Conduct  the  test  in  the  participant’s  language  using  an 
interpreter. 

•  Hire  a  local  usability  firm. 

•  Run  the  test  remotely. 

•  As  a  last  resort,  conduct  the  test  yourself  in  your  language, 
though  this  method  is  likely  to  be  unnatural  for  the  participant. 

Tests  that  are  conducted  in  the  participant’s  language  must 
be  translated.  Some  testers  prefer  to  have  the  translator  work 
real  time  during  the  test.  The  translator  can  either  serve  as  a 
liaison  between  the  tester  and  the  participant  (adding  signifi¬ 
cant  time  to  the  test)  or  between  the  test  administrator  and  par¬ 
ticipant  (who  are  both  speaking  the  same  language)  and  the  ob¬ 
servers.  The  tester  may  also  have  to  arrange  to  provide  the  test 
report  in  more  than  one  language. 

Cultural  differences.  Other  cultural  differences  may 
also impact a usability test. For example, gender expectations
may affect who would facilitate a test. There may be gestures
considered natural or friendly in one culture, but offensive in
another.  Loring  (2002)  reported  how  understanding  Hofstede’s 
(1997)  cultural  dimensions  for  your  target  audience  can  help 
improve  the  user-centered  design  process  and  the  final  product. 
Vatrapu  and  Perez-Quinones  (2004)  reported  that  when  both 
the  participant  and  the  test  administrator  were  from  the  same 
culture,  the  participants  engaged  in  more  think  aloud  behavior 
and  the  usability  tests  revealed  more  problems. 

Disabled Participants

Usability  tests  with  disabled  participants  require  careful 
planning.  Testers  must  understand  the  participants’  disabilities 
and  adjust  their  procedures  accordingly.  Several  researchers 
published  lessons  learned  from  their  experiences  with  disabled 
participants  (Coyne,  2005;  Grossnickle,  2004;  ITTAC,  2004; 
Lepisto  &  Ovaska,  2004;  Swierenga  &  Guy,  2003).  Some  of  these 
lessons  include: 

•  Recruiting  disabled  participants  is  more  time  consuming  than 
recruiting  general  population  participants.  Local  organiza¬ 
tions  and  support  groups  may  be  willing  to  help. 

•  Disabled  participants  may  need  assistance  getting  to  the 
usability  lab. 

•  Consent  forms  must  be  accessible  to  all  participants. 

•  Blind  participants  may  require  electronic  or  Braille  versions. 

•  Participants  with  learning  or  cognitive  disabilities  may  require 
special  consideration  to  ensure  they  understand  the  test  and 
their  rights. 

•  Deaf  participants  may  require  a  sign  language  interpreter 
who  needs  to  be  informed  about  the  goals  of  the  study. 

•  Participants  with  cognitive  disabilities  may  require  extra  as¬ 
sistance  understanding  the  tasks  and  may  have  trouble  think¬ 
ing  aloud. 

•  Participants  with  physical  disabilities  may  require  adaptive 
technology  to  interact  with  the  computer.  Be  sure  the  devices 
are  working  before  participants  arrive. 

•  Because  of  the  great  variability  in  disabilities,  it  may  take 
more  participants  than  typical  usability  tests. 

•  It  can  be  especially  difficult  to  observe  participants  who  use 
Braille  readers,  as  there  is  currently  no  good  way  to  follow 
what  the  participant  is  reading. 

Overall,  tests  with  disabled  participants  may  take  longer  than 
expected;  testers  should  schedule  enough  time  so  participants  are 
not  rushed.  Further,  participation  may  be  more  taxing  than  for  gen¬ 
eral  population  users,  so  the  test  should  limit  the  number  of  tasks 
evaluated  (Coyne,  2005).  Finally,  testers  should  ask  participants 
before  the  test  whether  they  need  any  special  accommodations. 

Elderly  Participants 

As  the  population  ages,  manufacturers  are  looking  to  expand 
their  market  to  this  growing  population.  Seniors  are  more  active 
than  ever.  As  a  result,  many  manufacturers  are  working  to  en¬ 
sure  that  their  products  are  usable  by  their  older  users. 


As  people  age,  the  diversity  in  their  abilities  increases.  They 
may  also  have  disabilities,  such  as  those  mentioned  in  the  pre¬ 
vious  section.  Many  of  the  concerns  and  issues  mentioned 
above  also  apply  with  elderly  participants.  In  general,  testers 
should  be  prepared  for  each  participant,  leaving  plenty  of  time 
for  each  person. 

There  may  also  be  generational  issues.  Testers  should  be 
aware  of  what  their  participants  expect  regarding  social  inter¬ 
action  and  courtesy.  Chisnell,  Lee,  and  Redish  (2005),  Coyne 
(2005),  and  Tedesco,  McNulty,  and  Tullis  (2005)  provided  some 
guidance based on their experiences running usability tests with
older  participants. 

Children  as  Participants 

When  designing  a  product  for  children,  usability  tests  must 
target  children.  Although  the  process  is  generally  the  same  as 
with  adult  participants,  there  are  a  few  important  differences. 

Recruiting  children  actually  involves  recruiting  their  parents. 
Patel  and  Paulsen  (2002)  suggested  several  good  sources  for  re¬ 
cruiting.  They  recommended  building  rapport  with  organiza¬ 
tion  leaders  and  parents.  It  is  important  to  pay  attention  to  the 
needs  of  both  the  parents  and  the  child.  Sometimes  it  is  neces¬ 
sary  to  have  the  parents  in  the  room  during  the  test,  especially 
for  very  young  children.  Investigators  should  be  sure  that  the 
parents  do  not  unnecessarily  interfere  with  the  test.  However, 
investigators  should  be  flexible,  as  each  family  will  be  different. 

Investigators  may  want  to  alter  the  usability  lab  itself  to  be  a 
better  environment  for  children.  Most  usability  labs  use  a  stan¬ 
dard  office  layout  and  decor.  Although  this  is  fine  for  testing 
adults,  it  is  not  the  most  welcoming  to  children.  Making  the 
room  more  child  friendly  can  make  children  more  comfortable 
and  willing  to  participate. 

The  tasks  should  accommodate  the  abilities  of  children  in 
the  target  age  group.  Investigators  should  consider  (a)  the  word¬ 
ing  of  the  instructions  to  be  sure  they  are  at  an  appropriate 
grade  level  and  (b)  whether  the  participants  are  old  enough  to 
complete  the  tasks.  For  example,  preschoolers  may  not  be  able 
to  perform  a  task  and  think  aloud  simultaneously.  Also,  because 
children  often  have  short  attention  spans,  the  entire  test  session 
should  be  kept  to  a  minimum. 

Finally,  what  motivates  adults  does  not  always  motivate  chil¬ 
dren.  Hanna,  Risden,  and  Alexander  (1997)  suggested  age-ap¬ 
propriate  approaches  for  motivating  young  participants  to  con¬ 
tinue.  Most  likely,  the  best  approach  for  a  preschooler  is  very 
different  from  that  for  a  teenager. 

Children  can  be  unpredictable,  so  one  or  more  members  of 
the  test  team  must  understand  the  skills,  abilities,  and  expecta¬ 
tions  of  the  children  in  the  target  user  population.  This  will  help 
testers  to  respond  appropriately  to  unexpected  situations. 

Testing  With  Special  Products 

Usability  methods  tend  to  be  consistent  across  products,  appli¬ 
cations,  and  systems.  However,  there  are  cases  where  there  are 
issues  specific  to  the  type  of  product.  This  section  addresses 
some of these issues and what researchers have found.

Mobile devices. In most cases, it is easy enough to observe
the participants interact with the product being tested.
For  software,  there  are  tools  to  capture  the  screen  or  stream  the 
participant’s  display  to  another  monitor.  For  hardware,  video 
cameras  can  record  the  user’s  interaction.  However,  mobile  de¬ 
vices,  such  as  cell  phones  and  PDAs,  generally  cannot  be  dis¬ 
played  on  a  computer  monitor,  and  it  is  difficult  to  capture  their 
interaction  on  video.  Because  they  are  mobile  devices,  users  are 
often  moving  while  using  them. 

Most  mobile  devices  do  not  have  a  video  out  feature  (Catani, 
2003).  They  must  be  viewed  using  a  camera.  Catani  (2003)  de¬ 
scribed  several  methods  of  videotaping  the  users’  interaction 
with  mobile  devices.  The  goal  is  to  keep  the  users’  experience  as 
realistic  as  possible,  while  still  allowing  a  good  recording  of  the 
device.  Either  the  device  must  be  kept  immobile  so  the  camera 
can  capture  everything,  or  the  camera  must  track  the  device. 
Catani  provided  several  alternatives,  giving  the  benefits  and 
drawbacks  of  each. 

Documentation.  Ideally,  all  components  of  a  system  or 
product would be subject to usability testing to ensure they are
all  well  designed.  This  is  the  idea  behind  the  goal  of  designing 
the  user  experience.  A  person’s  experience  with  an  individual 
product goes beyond simply the product itself. It includes the
shopping  experience,  customer  services,  and  technical  support. 
It  also  includes  all  documentation — installation  guides,  users’ 
manuals,  online  help,  and  so  forth. 

Ensuring  that  the  documentation  is  usable  can  be  especially 
challenging.  To  evaluate  documentation,  there  should  be  a  rea¬ 
sonably  functional  prototype  and  reasonably  complete  docu¬ 
mentation.  Unfortunately,  this  does  not  usually  happen  until  so 
close  to  deployment  that  it  is  difficult  to  make  many  changes. 

Further, people generally try to avoid using manuals and
help.  They  might  even  turn  to  colleagues  before  looking  at  a 
manual.  As  a  result,  it  is  difficult  to  generate  scenarios  in  which 
the  participants  would  naturally  use  the  documentation.  Skel¬ 
ton  (1992)  suggested  the  following  ways  to  test  documentation: 

•  Ask  the  participant  to  complete  a  task  according  to  the 
documentation. 

•  Encourage  the  participant  to  review  the  documentation  as 
they  use  the  product. 

•  Ask  the  participant  to  find  information  in  the  documentation. 

•  Ask  the  participant  to  paraphrase  or  summarize  what  they  read. 

Measuring the usability of documentation is also a challenge.
We often measure the usability of a product by how many times
participants  consult  the  manual  or  online  help.  However,  if  a 
participant  still  cannot  complete  a  task  after  consulting  the  man¬ 
ual,  it  may  be  difficult  to  distinguish  problems  with  the  product 
from  problems  with  the  manual.  A  few  measures  may  provide 
some  insight,  such  as: 

•  Time  spent  reading  the  manual 

•  Number  of  times  the  participant  looked  at  the  table  of 
contents 

•  Number  of  times  the  participant  looked  at  the  index 

•  Number  of  pages  viewed. 


These  should  be  considered  within  the  context  of  the  whole 
study  to  identify  problems.  For  example,  users  may  need  to  view 
a  variety  of  pages  for  a  particular  task,  so  what  seems  like  a  high 
number  may  actually  be  reasonable. 
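
As a rough illustration, the measures listed above could be tallied from a timestamped log of one participant's documentation use. The sketch below is hypothetical: the event labels ("open_manual", "close_manual", "view_toc", "view_index", "view_page") and the log layout are assumptions made for this example, and "pages viewed" is counted here as distinct pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocEvent:
    time_s: float                # seconds since the session started
    kind: str                    # hypothetical event label (see lead-in)
    page: Optional[int] = None   # page number, only for "view_page" events


def summarize_doc_use(events):
    """Tally the four documentation measures for one participant."""
    reading_time = 0.0
    opened_at = None
    toc_lookups = 0
    index_lookups = 0
    pages = set()
    for ev in sorted(events, key=lambda e: e.time_s):
        if ev.kind == "open_manual":
            opened_at = ev.time_s
        elif ev.kind == "close_manual" and opened_at is not None:
            reading_time += ev.time_s - opened_at
            opened_at = None
        elif ev.kind == "view_toc":
            toc_lookups += 1
        elif ev.kind == "view_index":
            index_lookups += 1
        elif ev.kind == "view_page" and ev.page is not None:
            pages.add(ev.page)
    return {
        "reading_time_s": reading_time,
        "toc_lookups": toc_lookups,
        "index_lookups": index_lookups,
        "distinct_pages_viewed": len(pages),
    }
```

Such counts are only a starting point; as noted above, they still have to be interpreted task by task within the context of the whole study.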


The  Ethics  of  Usability  Testing 

In  addition  to  their  general  ethical  responsibilities  as  profes¬ 
sionals,  usability  testers  and  organizations  that  conduct  tests 
have  special  ethical  requirements.  There  are  three  major  areas 
of  concern: 

1.  Informed  consent 

2.  Confidentiality  of  data 

3.  Balancing  harm  with  purpose 

The  principles  of  ethics  have  been  extensively  addressed  by 
all  of  the  relevant  professional  organizations: 

•  The American Psychological Association’s code was revised in
2003 (available at http://www.apa.org/ethics/code2002.html)

•  The Usability Professionals’ Association’s code was updated
in 2005 (available at http://www.upassoc.org/about_upa/structure_and_org_chart/code_of_conduct.html)

•  The Human Factors and Ergonomics Society’s code was revised
in 1998 (available at http://www.hfes.org/web/AboutHFES/ethics.html)

Informed  consent.  This  topic  has  had  the  most  written 
about  it.  Every  testing  organization  has  the  responsibility  to  have 
an  adequate  informed  consent  form  and  every  test  session 
should  start  with  both  an  explanation  of  the  form  and  adequate 
time  for  the  participants  to  read  and  sign  it.  The  form  must  ad¬ 
dress  the  rights  of  the  participant,  especially  the  right  to  with¬ 
draw  without  penalty,  any  risks  involved,  and  how  the  data,  es¬ 
pecially  audio  and  video  recording,  will  be  handled.  Any  test 
session  conducted  without  such  consent  is  unethical  and  any  or¬ 
ganization  or  consultant  who  conducts  testing  without  a  proper 
form  violates  the  above  codes.  A  useful  discussion  about  creating 
an  effective  consent  form  can  be  found  in  Waters,  Carswell, 
Stephens,  and  Selwitz  (2001). 

Confidentiality  of  data.  It  is  the  responsibility  of  the 
test  administrator  to  safeguard  the  identities  of  test  participants 
and  the  mapping  of  their  identities  to  data  and  recordings.  Par¬ 
ticipants’  names  should  not  be  used  on  forms  or  labels  and 
recordings  must  be  restricted  to  the  purpose  described  in  the  con¬ 
sent  form.  Testers  need  to  be  especially  careful  about  the  use 
and  distribution  of  highlight  tapes. 
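
One common way to keep names off forms and labels is to assign each participant a neutral code and to store the code-to-name key apart from the data and recordings. The following Python sketch illustrates that idea only; the function name, the "P" prefix, and the shuffling step are assumptions for this example rather than a procedure prescribed by the ethics codes cited above.

```python
import secrets


def assign_participant_codes(names, prefix="P"):
    """Return a key mapping neutral codes (P01, P02, ...) to participant names.

    The key should be stored separately from forms, data files, and
    recordings, which carry only the codes.
    """
    shuffled = list(names)
    # Shuffle so that a code does not reveal sign-up or session order.
    secrets.SystemRandom().shuffle(shuffled)
    width = max(2, len(str(len(shuffled))))
    return {
        f"{prefix}{i:0{width}d}": name
        for i, name in enumerate(shuffled, start=1)
    }
```

Forms, file names, and tape labels would then use only the codes, and access to the key would be limited to the test administrator.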

Of  special  concern  is  the  use  of  data  or  recordings  when  the 
participants  are  employees  of  the  organization  developing  the 
product being tested. When supervisors or managers of participants
attend a test session, the moderator has a special responsibility
to inform the participants before they come to the session
so that participants are not faced with the pressure of
deciding  whether  to  participate  while  knowing  that  they  are  be¬ 
ing  observed  by  supervisors  or  managers.  In  tests  such  as  these, 
it also is advisable to avoid the creation of highlight tapes or
video  clips  that  accompany  test  reports. 

Balancing  harm  with  purpose.  Researchers  must  en¬ 
sure  the  purpose  of  research  activities  justifies  the  risks  to  which 
participants  are  exposed.  As  we  have  discussed  in  this  chapter, 
most  usability  tests  are  not  considered  research  and  very  few 
testing  situations  involve  any  physical  risk.  However,  to  a  layper¬ 
son,  testing  looks  just  like  research,  and  we  know  that  there  is 
emotional  stress  involved  in  most  test  sessions. 

How  can  we  justify  putting  participants  under  such  stress 
and  remain  true  to  the  ethical  principles  of  our  professions? 
Moderators  typically  tell  participants  before  the  tasks  are  at¬ 
tempted  that  they  are  not  being  tested,  the  product  is.  However, 
we  know  that  most  participants  still  blame  themselves  when 
they  struggle  or  fail.  Furthermore,  moderators  sometimes  ask 
participants  to  continue  even  when  participants  say  they  cannot 
make  further  progress. 

We  believe  that  the  justification  for  putting  participants 
under  stress  comes  from  the  balancing  principle  that  future 
users  are  being  saved  stress  by  finding  its  causes  in  the  test.  Hav¬ 
ing  a  few  participants  struggle  can  provide  the  motivation  for 
developers  to  fix  problems,  thereby  preventing  those  problems 
for  many  future  users.  It  is  sad  but  true  that  sometimes  devel¬ 
opment  teams  need  to  see  participants  fail  repeatedly  before 
they  will  expend  resources  on  implementing  solutions. 

Test  moderators,  therefore,  walk  a  tightrope  between  al¬ 
lowing  participants  to  struggle  and  minimizing  stress.  Good 
moderators  look  for  signs  of  emotional  distress  and  use  their 
judgment  about  how  long  to  continue  with  tasks  or  the  ses¬ 
sion.  The  usability  community  has  not  adequately  addressed 
this  ethical  dilemma,  and  therefore,  it  demands  immediate  at¬ 
tention.  Moderators  need  more  guidance  and  honest  discus¬ 
sion  about  how  to  make  these  important  decisions. 

Usability  Testing  to  Show  the  Value  of  Usability 

One of the important assets of testing is that it sells itself.
Watching even a few minutes of live testing can be very persuasive.
There  are  two  reasons  why  testers  need  to  get  key  project  staff 
and  decision  makers  to  come  to  watch  a  test  session: 

•  When  people  see  their  first  live  test  session,  they  are  usually 
fascinated  by  what  they  see  and  they  understand  the  value  of 
the  method  better.  Watching  a  videotape  of  a  session  does  not 
provide  the  same  experience.  It  is  important  to  expend  what¬ 
ever  effort  it  takes  to  get  these  people  to  attend  test  sessions. 

•  When  developers  see  live  sessions,  it  is  much  easier  to  com¬ 
municate  the  results  to  them.  When  they  have  seen  some  of 
the  usability  problems  themselves,  they  are  much  less  likely 
to  resist  agreeing  on  what  the  most  important  problems  are. 
Some  of  them  will  even  become  advocates  for  testing. 

Even  though  testing  is  known  and  accepted  by  a  much  wider 
circle  of  people  than  it  was  10  years  ago,  the  experience  of 
watching  a  user  work  at  a  task  while  thinking  aloud  still  converts 
more  people  to  accept  user-centered  design  practices  than  any 
other  development  tool. 


FUTURE  DIRECTIONS  IN  USABILITY  TESTING 


In  this  final  section,  we  explore  new  directions  in  what  has 
been,  over  the  past  15  years,  a  very  stable  and  robust  method 
for  the  empirical  evaluation  of  usability.  As  would  be  expected 
with  new  directions,  it  is  not  always  clear  where  they  will  lead. 
However,  we  believe  that  the  three  areas  we  discuss  have  the 
potential  to  redirect  the  way  we  test  and  expand  the  scope  of 
our  test  objectives. 

Objectives  Beyond  Productivity 

Since  the  year  2000,  there  has  been  an  expansion  of  the  scope 
of  usability.  The  expansion  has  come  from  two  directions: 

•  An  emphasis  on  the  qualities  of  a  product  design  beyond  us¬ 
ability  and  productivity  to  qualities  such  as  fun  and  pleasure 

•  The  recent  evolution  from  user-centered  design  to  user  ex¬ 
perience  design,  in  which  the  usability  of  a  product  is  neces¬ 
sary  but  not  sufficient  for  product  success 

In  both  of  these  movements,  the  scope  of  evaluation  ex¬ 
pands.  The  question  for  usability  testers  is  whether  these 
movements  require  only  that  additional  measures  be  added  to 
a  traditional  test  or  whether  these  movements  can  be  accom¬ 
modated  within  usability  testing  at  all. 

There  has  been  a  growing  sense  that  the  more  traditional 
definitions  of  usability  are  limiting,  especially  as  we  have 
sought  to  integrate  usability  assessment  into  earlier  stages  of 
design,  and  to  cost  justify  our  practices.  These  efforts  have 
led  us  to  realize  that  task  effectiveness,  efficiency,  and  satis¬ 
faction  are  only  part  of  the  story;  affective  aspects  of  the  user’s 
interaction  are  also  critically  important  to  ensuring  the  success 
of  a  product.  The  result  has  been  an  expansion  of  what  usabil¬ 
ity  professionals  focus  on,  and  a  clouding  of  what  is  meant  by 
usability  and  usability  assessment.  For  example,  Jordan  (2002) 
and  Hancock,  Pepe,  and  Murphy  (2005)  drew  upon  Maslow’s 
(1970)  hierarchy  of  needs  to  create  a  hierarchy  for  consumer 
product needs. Here, functionality is essential, usability is
important (and now expected), and pleasurable interactions are
what users really want. Hancock et al. called this concept
“hedonomics.”

Some  of  the  most  interesting  concepts  about  dimensions 
beyond  usability  have  come  from  the  development  of  com¬ 
puter-based  games.  According  to  Pagulayan,  Keeker,  et  al. 
(2003),  games  differ  from  productivity  applications  in  several 
important  ways: 

•  While productivity applications are tools to accomplish
work,  games  are  intended  to  be  a  pleasure  to  play,  though 
they  can  also  teach  and  challenge. 

•  Games  define  their  own  goals  rather  than  having  their  goals 
defined  by  the  external  environment. 

•  Each  game  needs  to  be  noticeably  different  from  its  competi¬ 
tors  and  capable  of  grabbing  the  users’  attention. 

•  Rather than stressing consistency, game design stresses
variety and innovation.
•  Games deliberately impose constraints rather than remove
them. 

•  Successful  games  avoid  simplicity,  presenting  a  challenge  to 
users  without  being  so  complex  that  most  users  lose  most  of 
the  time. 

Pagulayan,  Steury  et  al.  (2003)  described  the  methods  they  use 
to  evaluate  games,  including  the  RITE  method  described  below. 

One  of  the  additions  to  the  toolkit  of  usability  testing  is  the 
use  of  scales  for  measuring  product  qualities  such  as  pleasure 
and  joy.  Hassenzahl,  Beu  and  Burmester  (2001)  and  Jordan 
(2000)  described  scales  to  measure  the  affect  a  design  evokes.  A 
related,  but  nonverbal,  measure  is  the  use  of  “emocards,”  which 
each  depict  drawn  facial  expressions  of  emotions  (Desmet, 
2002).  Participants  pick  or  rate  the  cards  that  best  describe  the 
feeling  they  get  from  a  design. 

It  is  unclear  whether  these  scales  are  reliable  with  the 
small  samples  typical  of  usability  testing,  but  it  is  clear  that  prod¬ 
uct  designers  do  want  to  evaluate  measures  beyond  traditional 
usability. 

Asynchronous  Testing 

Previously,  we  discussed  asynchronous  testing  as  a  variation  of 
remote  testing.  Its  major  advantage  is  the  large  sample  sizes  that 
it  can  provide,  which  could  be  very  important  in  an  environ¬ 
ment  in  which  product  qualities  other  than  usability  need  to  be 
assessed.  The  robustness  of  traditional  synchronous  testing  at 
uncovering  usability  issues  with  very  small  samples  may  prove 
to  be  a  major  weakness  if  larger  samples  are  needed  to  assess 
other  product  qualities.  Furthermore,  asynchronous  testing 
could  be  combined  with  measures  that  have  been  used  to  study 
long-term ease of use issues rather than just initial ease of
learning issues. For example, methods such as user diaries and
product reaction checklists (Jordan, 2000) could be used as part of
an  asynchronous  test  to  gather  changes  to  attitudes  and  usage 
patterns  over  time. 

The  RITE  Method 

A  group  of  usability  specialists  at  Microsoft  created  a  new 
method  that  may  look  similar  to  traditional  usability  testing  but 
is, in fact, quite different (Medlock, Wixon, McGee, & Welsh,
2005;  Medlock,  Wixon,  Terrano,  Romero,  &  Fulton,  2002).  Known 
as  the  “rapid  iterative  test  and  evaluation”  (RITE)  method,  it  fo¬ 
cused  on  fixing  usability  problems  rather  than  just  finding  them. 
In  outline,  the  method  consists  of  the  following: 

•  A team consisting of the key decision makers for a product
works together. The team includes the usability specialists
responsible for the product, but also includes programmers/designers
who can make changes to the product as they are
determined and, most importantly, someone in authority
who can authorize changes.

•  The  team  selects  the  tasks  to  be  run  and  attends  all  sessions. 
As with traditional usability testing, testers recruit
participants from the target market and use the think aloud
method. Standard measures such as task failure and errors
are recorded.

•  After  each  session,  the  usability  specialist  identifies  the  prob¬ 
lems  and  their  severity.  The  team  then  decides  if  there  is  a  fix 
for  each  problem,  whether  they  can  implement  it,  or  whether 
they  need  to  run  more  participants  to  be  sure. 

•  The programmers/designers make agreed upon changes and
the new version is run with new participants.

•  Again,  problems  are  identified,  including  whether  the  fixes  to 
previous  problems  have  worked.  If  not,  new  fixes  are  created. 

•  The team decides which problems they can fix, which they
need to examine in more detail, or which require resources
that are not currently available.

•  Testers  run  additional  participants  until  resources  are  ex¬ 
pended  or  the  team  is  certain  that  the  majority  of  the  prob¬ 
lems  it  has  addressed  are  fixed.  Running  these  extra  partici¬ 
pants  is  a  check  that  new  problems  have  not  been  introduced 
by  the  design  modifications. 
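
The outline above can be read as a loop. The Python sketch below is one hypothetical rendering of it: `run_session` and `apply_fixes` stand in for the team's work, and the stopping rule of a fixed number of consecutive problem-free sessions is an assumption made for illustration, not a rule taken from the published method.

```python
def rite_loop(run_session, apply_fixes, max_participants, clean_streak_needed=3):
    """Iterate test sessions and fixes until a stopping condition is met.

    run_session(n)   -> iterable of problem labels seen with participant n
    apply_fixes(ps)  -> iterable of the problems in ps the team fixed
    Stops when participants (resources) run out or when the assumed
    number of consecutive sessions shows no problems.
    """
    open_problems = set()
    clean_streak = 0
    sessions = 0
    while sessions < max_participants and clean_streak < clean_streak_needed:
        sessions += 1
        observed = set(run_session(sessions))
        if observed:
            clean_streak = 0
            open_problems |= observed
            # The team changes the design before the next participant.
            open_problems -= set(apply_fixes(open_problems))
        else:
            # A clean session is evidence that earlier fixes are holding.
            clean_streak += 1
    return sessions, open_problems
```

Problems the team cannot fix right away simply stay in `open_problems`, which corresponds to the items that need more detail or more resources.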

This  method  requires  more  test  participants  than  traditional 
usability  testing  and  the  number  of  participants  required  is  un¬ 
known  when  the  test  is  planned.  It  is  reported  that  the  total 
time  from  start  to  finished  design  is  shorter  than  with  traditional 
testing  in  which  the  solutions  are  not  addressed  until  after  the 
test  sessions  are  over.  Perhaps  the  most  important  contribution 
of  this  method  is  the  team  building  it  provides.  Development 
teams  are  reported  to  like  this  method  very  much,  and  there  are 
fewer  disputes  about  the  reality  of  problems  and  about  why  they 
need to be fixed. Furthermore, the immediate feedback the
sessions provide motivates key stakeholders to attend sessions and
stay  involved  in  the  process. 

While  there  are  some  products  or  organizations  that  do  not 
lend  themselves  to  the  use  of  this  method,  it  holds  great  promise 
for  making  usability  testing  a  positive  experience  for  develop¬ 
ment  teams  and  it  appears  to  make  it  more  likely  that  products 
will  be  improved  (Medlock,  Wixon,  McGee,  et  al.,  2005). 

We believe that the RITE method is not just a variation of
usability testing, but is, in fact, quite a different method. It may be
a  misnomer  to  call  it  a  usability  test  method  rather  than  a  design 
and  evaluation  method.  The  traditional  usability  test  is  based  on 
the  psychology  research  experiment.  It  is  an  evaluation  method 
focused  on  identifying  problems  and,  as  we  have  described 
above,  its  validity  is  difficult  to  establish. 

While  the  reliability  of  the  RITE  method  is  yet  unknown,  it 
is  based  on  a  different  cause-effect  model  from  the  research 
experiment.  It  is  closer  to  the  model  developed  by  Skinner 
(1956)  in  which  an  independent  variable,  such  as  a  reinforce¬ 
ment  schedule,  is  turned  off  and  on  successively  to  see  how  it 
affects  performance.  In  a  RITE  test,  the  user  interface  is  suc¬ 
cessively  exposed  participant  by  participant  until  errors  and 
task  failures  are  extinguished  (that  is,  the  product  achieves  the 
desired  performance).  When  the  test  is  finished,  the  quality 
of  the  test  is  assessed  by  its  ability  to  have  fixed  the  usability 
problems  or,  in  some  cases,  to  have  identified  a  fix  that  will 
be  implemented  later.  Consequently,  its  validity  will  be  easier 
to  establish. 


CONCLUSION


Usability  testing  and  usability  labs  established  themselves  very 
rapidly  beginning  about  1990,  and  they  have  remained  re¬ 
markably  stable  since.  An  outside  observer  attending  a  typi¬ 
cal  test  session  in  2005  would  see  only  subtle  differences  in 
the  equipment  and  procedures  from  one  conducted  in  1990. 
However, the winds of change are in the air. Testing is moving
out of the laboratory and timely testing with larger samples
is now technically feasible. The push from user experience
design and the interest in product pleasure are expanding
testing’s scope. Its ability to find the majority of problems
with  even  moderately  complex  designs  has  been  challenged, 
as  has  its  narrow  focus  on  problem  identification.  This  has  led 
to  an  increase  in  research  regarding  the  characteristics  of  us¬ 
ability  testing.  We  believe  that  usability  testing  will  look  very 
different  in  2010. 
OZOK


INTRODUCTION 


Survey  and  questionnaire  design  has  been  a  primary  source  of 
data  collection  within  the  Human-Computer  Interaction  (HCI) 
context  since  the  early  days  of  the  science  (Baecker,  Grudin, 
Buxton,  &  Greenberg,  1995).  Within  the  HCI  context,  surveys 
are  defined  as  compilations  of  questions  that  are  implemented 
either  via  a  computer  or  paper-and-pencil-based  environment, 
that have either quantitative or qualitative scales or are
open-ended, and that aim to extract a variety of information from
a representative sample of the target population (in most cases,
current or prospective users of the HCI system being
evaluated).

Survey  use  is  popular  in  HCI  research  as  it  allows  researchers 
to  collect,  in  a  relatively  easy  manner,  information  based  on 
users’  satisfaction,  opinions,  ideas,  and  evaluations  regarding  a 
system. Designing and implementing surveys is not as costly
as conducting experiments in closed environments with special
equipment. Advances in computer-based survey products and
web-based survey services allow direct recording and easy
manipulation of survey data, eliminating the need to transfer data
from paper to electronic form. And because each survey takes
only minutes to complete in most cases, surveys are a good
resource for collecting large amounts of data in a relatively short
time and with minimal resources, provided a large sample of
potential participants can be reached. This is especially true in
comparison with controlled objective experimental measures, which
in most cases involve lengthy tasks and recording sessions. On the
other hand, surveys are constantly challenged in terms of their
validity and reliability, mostly because they rely heavily on
participant opinions and because it is impossible to verify with full
reliability that participants answer the questions
objectively. Quantitative survey research is also sometimes criticized
due  to  difficulties  related  to  survey  scaling,  as  scales  rely  on  an  as¬ 
sumption  that  participants  have  the  same  or  similar  perceptions 
of  scale  responses  that  are  subjective  in  structure  (responses 
such  as  “I  strongly  agree”  or  “I  often  do  it”) . 
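
To make the scaling concern concrete, the sketch below shows one way verbal scale responses might be coded as numbers and summarized. It is an illustration under stated assumptions: only “I strongly agree” comes from the text above, while the other four labels and the five-point coding are hypothetical. Because the codes are ordinal and participants may perceive the labels differently, the sketch reports the median and the response counts rather than relying on a mean alone.

```python
from collections import Counter
from statistics import median

# Hypothetical five-point agreement scale; only the last label
# appears in the chapter text, the rest are assumed for illustration.
AGREEMENT_CODES = {
    "I strongly disagree": 1,
    "I disagree": 2,
    "Neutral": 3,
    "I agree": 4,
    "I strongly agree": 5,
}


def summarize_item(responses):
    """Code one survey item's verbal responses and summarize them."""
    codes = [AGREEMENT_CODES[r] for r in responses]
    return {
        "n": len(codes),
        "median": median(codes),
        "counts": dict(Counter(responses)),
    }
```

Reporting the full distribution of responses alongside the median keeps the summary honest about how much the verbal labels, rather than the numbers, carry the meaning.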

This  chapter  discusses  the  different  aspects  of  survey  design 
and  implementation  in  HCI  in  a  structured  and  comprehensive 
manner.  After  a  discussion  of  the  purpose  and  a  brief  history  of 
surveys,  the  different  types  of  surveys  (content  and  structure- 
wise),  application  domains,  design,  and  evaluation  techniques 
are  discussed  with  illustrative  examples.  The  chapter  is  con¬ 
cluded  with  emerging  and  future  trends  in  the  HCI  survey  de¬ 
sign  and  implementation  areas. 

Purpose  of  Survey  Use  and  Survey  Applications  in  HCI 

Usability  evaluation  has  been  a  primary  component  of  HCI 
since its inception in the 1960s. User performance and satisfaction
have  long  been  tapped  as  the  major  components  of  usability 
testing  and  evaluation  (Shneiderman,  1992;  Nielsen,  1989). 
While  user-performance  measurement  relies  on  objective 
methods  such  as  software-based  time  and  error  measurement, 
user satisfaction requires more sophisticated tools to be
measured objectively. User satisfaction is defined as the level to
which a system meets its user’s standards and requirements
(Hackman  &  Oldham,  1980). 

Directly  relating  to  user  satisfaction,  user  preferences,  opin¬ 
ions,  and  evaluations  concerning  HCI  systems  are  also  of  strong 
interest  to  usability  testing  and  evaluation  processes.  Since  it  is 
not  possible  to  measure  all  of  these  usability  components 
through  unequivocal  measurement  methods,  explicit  tools  have 
been  developed  to  elicit  information  regarding  user  satisfaction, 
preferences,  opinions,  and  evaluations  both  qualitatively  and 
quantitatively  through  user  surveys.  Surveys  serve  this  specific 
purpose  well  by  posing  targeted  questions  to  users  for  the  pur¬ 
poses  of  HCI  design  and  evaluation. 

While surveys can be designed to collect a variety of types of
information  concerning  the  target  population,  relevant  to  HCI 
research  and  literature  (Card,  1996),  they  are  mostly  targeted  at 
collecting  information  in  the  following  three  categories: 

a.  User  Evaluation.  The  category  aims  at  collecting  infor¬ 
mation  regarding  how  much  a  system,  product,  or  envi¬ 
ronment  meets  user  goals,  expectations,  and  standards. 
In  this  category,  users  are  asked  a  number  of  questions  re¬ 
garding  whether  their  overall  impression  regarding  the 
object  being  evaluated  is  high,  what  exactly  constitutes 
this  impression,  what  and  where  exactly  the  problems 
are,  and  so  on.  Relating  to  user  satisfaction,  this  category 
is  also  about  determining  user  opinions  specific  to  prod¬ 
ucts  or  systems,  where  questions  can  also  include  users’ 
opinions  concerning  whether  tasks  can  be  completed  ef¬ 
fectively  and  efficiently,  whether  the  system  is  fast,  and 
so  forth. 

b.  User  Opinion.  The  category  can,  but  does  not  have  to, 
be  specific  to  products,  systems,  or  environments. 
These  types  of  surveys  are  aimed  at  determining  what 
users  think  about  the  requirements  from  a  system,  prod¬ 
uct,  or  environment  to  fulfill  its  function  satisfactorily. 
Examples  can  include  surveys  that  aim  at  needs  assess¬ 
ments  for  the  next  generation  of  cell  phones  (e.g.,  what 
new  functionalities  can  be  useful  in  newer  cell  phones 
besides  those  that  already  exist  according  to  cell  phone 
customers).  Simply  put,  while  the  former  category  con¬ 
sists  of  surveys  regarding  the  performance  of  existing 
systems,  environments,  and  products,  the  current  cate¬ 
gory  is  concerned  with  what  users  think  about  what 
might  be  useful  concerning  these  systems  in  more  gen¬ 
eral  terms. 

c.  Others.  The  third  category  includes  the  remaining  pos¬ 
sible  survey  types  aimed  at  collecting  a  number  of  dif¬ 
ferent  information  types  within  the  HCI  context.  One 
such  category  consists  of  surveys  that  are  strictly  con¬ 
centrated  on  population  demographics.  These  types  of 
surveys  do  not  contain  questions  relying  on  participants’ 
evaluation  of  specific  products  or  their  opinions,  but 
rather solely on characteristics they have, such as age, sex,
education  level,  skill  level,  and  so  forth,  or  things  they 
do,  such  as  how  frequently  they  go  on  the  Internet  or 
use  a  cell  phone.  These  types  of  survey  questions  are  less 
based  on  opinion-heavy  responses  than  the  previous  two 
categories. 
Brief History of Survey Design in HCI

Surveys started being used as a computer science and, to a
limited extent, an HCI research tool in the early 1970s, borrowing
techniques from anthropology and experimental and social
psychology (Myers, Hollan, & Cruz, 1996). With contribution from
developments  in  the  overall  survey  administration  and  language 
issues  (Converse  &  Presser,  1986;  Belson,  1981;  Jenkins  &  Dill- 
man,  1997),  researchers  discovered  early  on  that  information  re¬ 
garding  user  attitudes,  preferences,  and  evaluations  within  the 
context  of  computer  technology  (software  and  hardware)  de¬ 
velopment  can  be  collected  fairly  easily  with  paper-and-pencil 
surveys.  Hence,  in  the  1970s  and  1980s,  user  surveys  were  a  part 
of  computer  research  with  a  social  psychology  emphasis,  but 
not directly relating to usability testing and usability-design
issues. In the mid-1980s, graphical user interfaces became an
important part of end user computing and usability research took
off. Surveys gained a more significant role in HCI research, and
specifically in usability research, around the same time. With
graphical user interfaces as we know them today gaining high
popularity with Windows 95, usability research accelerated
(Myers, 1996), and besides building
usability  laboratories,  companies  and  research  institutions 
started  developing  and  implementing  surveys  to  determine 
user  trends  and  preferences  in  HCI.  Advanced  electronic  and 
paper-and-pencil  survey-design  methods  have  been  developed 
in  the  last  decade  (Dillman,  2000),  and  user  surveys  have  be¬ 
come  an  essential  part  of  data  collection  in  HCI  research. 

Paper-and-Pencil  and  Electronic  Surveys 

Survey  implementation  largely  relies  on  practical  factors.  Be¬ 
sides  the  challenge  of  finding  a  sample  size  that  is  both  large 
enough  and  representative  of  the  population  the  study  is  tar¬ 
geted  at,  implementation  challenges  include  presenting  the  par¬ 
ticipant  sample  with  a  survey  that  is  quick  and  easy  to  complete, 
has  a  fun  factor  and  a  learning  component  for  the  participants, 
and  does  not  require  tedious  data  extraction  and  manipulation 
once  implementation  is  completed.  Additionally,  surveys  in 
every  topic  should  be  unambiguous,  unbiased,  correctly  coded, 
piloted,  and  ethical  (Stone,  1993).  Today’s  surveys  are  almost 
universally implemented in two forms: (a) paper-and-pencil
surveys, which require participants to mark or write their responses
on  response  sheets  that  also  contain  questions,  either  on  the 
same  sheet  or  separately;  and  (b)  electronic  surveys,  which  re¬ 
quire  the  users  to  use  the  screen,  keyboard,  and/or  mouse  of  a 
computer  to  mark  or  type  their  responses  on  the  screen. 

Paper-and-pencil-based  surveys  require  the  survey  mater¬ 
ial  to  be  physically  distributed,  filled  out,  and  returned.  This 
process  can  occasionally  be  cumbersome  and  tedious.  More¬ 
over,  these  types  of  surveys  also  require  manual  entry  of  quan¬ 
titative,  and  in  most  cases  qualitative  data,  for  analysis.  One  so¬ 
lution  to  the  problem  of  translation  of  paper-based  data  into 
electronic  format  is  using  Scantron  sheets  which  are  sometimes 
referred  to  as  “bubble-sheets.”  In  this  system,  designated  areas 
(bubbles) on printed sheets are filled out by participants with a
pencil, and these sheets are then fed into a computer and
scanned  for  correct  answers.  This  process,  however,  is  quite 
costly  due  to  the  scanning  equipment  necessary  for  the  process. 
Figure  58.1  shows  two  sample  paper-and-pencil  survey  sheets, 
one  with  response  spaces  below  each  question  and  one  with 
response  sheets  that  are  separate  from  question  sheets. 

Although  about  62%  of  all  American  households  own  one  or 
more  computers  (U.S.  Census  Bureau,  2005),  computers  are 
still  not  accessible  to  the  entire  population.  Therefore,  paper- 
and-pencil  surveys  are  still  widely  popular.  Paper-and-pencil  sur¬ 
veys  allow  swift  collection  of  large  data  quantities  if  they  are  ad¬ 
ministered  to  groups  of  participants  simultaneously,  such  as  a 
group  of  students  during  a  class  period  who  are  asked  to  fill  out 
and  return  the  surveys  immediately.  One  other  common  way 
of  paper-and-pencil-based  survey  implementation  is  mailing  in 
surveys  to  participants  and  asking  them  to  return  them  via  mail, 
in  most  cases  in  postage-prepaid  envelopes.  However,  recent 
studies indicated that return rates of mailed-in surveys are
highly variable depending on the type of survey
(Dillman, 1991, 2000). Given the rate of computer ownership
among American households, mailed-in surveys are a less-preferred
means of data collection within the context of HCI research and
are therefore not widely used for HCI research purposes.

Computer-based  surveys  (sometimes  referred  to  as  PC-based 
surveys)  have  become  popular  with  the  advent  of  home  com¬ 
puters  in  the  1990s,  even  before  the  high  adoption  of  the  World 
Wide  Web.  In  computer-based  surveys,  participants  are  pre¬ 
sented  the  survey  on  a  specific,  standalone  (non-networked) 
computer.  It  should  be  noted  that  these  types  of  surveys  are  not 
web-based,  but  rely  on  the  software  installed  on  the  computer 
on  which  the  survey  is  implemented. 

Participants  use  the  mouse  to  click  on  their  responses  of 
choice  for  multiple-choice  questions.  Mouse  actions  are  gen¬ 
erally  implemented  on  dropdown  combo  boxes  (where  a  drop¬ 
down  menu  opens  up  with  the  options  when  the  user  clicks 
on  the  button  located  to  the  right  side  of  it)  or  radio  buttons 
(a  round  area  to  the  left  of  the  option  is  clickable),  or  check 
boxes  where  multiple  choices  can  be  clicked  on  at  one  time 
(a  square-shaped  box  to  the  left  of  the  option  is  clickable) 
(Ozok  &  Salvendy,  2000).  For  text  entries,  participants  can  type 
text  on  specified  text  boxes.  While  computer-based  surveys  can 
be  convenient  because  of  having  the  initial  data  in  electronic 
format  and  eliminating  the  necessary  transformation  to  elec¬ 
tronic  format  in  paper-and-pencil-based  surveys,  they  require 
the  participants  to  be  stationary  on  a  specific  computer.  For 
large-size  implementations,  computer-based  surveys  can  be  ex¬ 
tremely  slow  in  collecting  the  data,  mainly  due  to  limited  com¬ 
puter  equipment  and  scheduling  difficulties.  It  can  be  concluded 
that  while  computer-based  surveys  can  be  advantageous  in  the 
data-analysis  stage  and  are  still  popular  in  kiosks  stationed  in 
public  places  such  as  shopping  malls,  they  are  not  suitable  for 
large-sample  size  and  lengthy  surveys,  and  hence  are  not  the 
best  solution  in  survey-based  data  collection  in  HCI  context. 
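
The interface elements described above can be modeled in a few lines of code. The following is only a minimal sketch in Python, not the design of any particular survey package: the `Question` class, the `kind` labels, and the `is_valid` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Question:
    """One survey item; `kind` is a hypothetical label for the widget used."""
    text: str
    kind: str  # "radio", "dropdown", "checkbox", or "text"
    options: List[str] = field(default_factory=list)


def is_valid(question: Question, answer: Union[str, List[str]]) -> bool:
    """Check an answer against the rule implied by the widget type."""
    if question.kind in ("radio", "dropdown"):
        # Exactly one of the listed options may be chosen.
        return isinstance(answer, str) and answer in question.options
    if question.kind == "checkbox":
        # Any subset of the options may be chosen, without repeats.
        return (
            isinstance(answer, list)
            and len(set(answer)) == len(answer)
            and all(a in question.options for a in answer)
        )
    if question.kind == "text":
        # Free text: any string is accepted.
        return isinstance(answer, str)
    raise ValueError(f"unknown question kind: {question.kind}")


devices = Question("Which devices do you own?", "checkbox", ["Cell phone", "PDA"])
print(is_valid(devices, ["PDA"]))
```

Keeping the widget type with each question lets the same check run on a standalone computer or on a web server, which is the main structural similarity between the two survey forms.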

[Figure 58.1 shows the same three-item survey in two layouts: one with
the response bubbles directly below each question, and one with
separate question and response sheets, as follows.]

Question Sheet:

1. My cell phone screen is convenient to use.
2. The keypad on my cell phone is a convenient means of data input.
3. My cell phone interface is easy to use.

Response Sheet:

1. O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree
2. O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree
3. O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

FIGURE 58.1. Two paper-and-pencil presentations of the same survey, one with the response
spaces below each question and one with separate question and response sheets.

With the advent of the Internet, web-based (online) surveys
have become highly popular (Dillman, 1999) and allowed researchers
to conduct surveys more effectively and efficiently than
more traditional means (Zhang, 2000). Server-based software allows
survey participants to access a web page and fill out the survey,
then submit their results, mostly to a central server, by clicking
an on-screen button. Web-based survey interfaces look very similar
in structure to computer-based surveys, with the same interface
elements of dropdown combo boxes, radio buttons, check boxes,
and text boxes serving the various data-entry functions for the
participants. In these types of surveys, the data are collected on
a central web server, where they can be easily obtained and
manipulated by the survey administrators. Additional data storage
and analysis programs residing on these web servers can compile
the data in a variety of formats such as Microsoft Excel, and
also implement some automatic data analyses such as calculation
of means and standard deviations (descriptive statistics).
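
As a minimal sketch of the descriptive statistics mentioned above, the following Python fragment computes the mean and standard deviation of the responses to a single item. The response values and the `describe` function are hypothetical; the sketch assumes that answers on a seven-point scale have already been coded as integers from 1 to 7.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def describe(responses: Sequence[int]) -> Dict[str, float]:
    """Return the count, mean, and sample standard deviation of one item."""
    return {
        "n": len(responses),
        "mean": mean(responses),
        # stdev() uses n - 1 in the denominator (sample standard deviation).
        "sd": stdev(responses),
    }


# Hypothetical coded responses (1 = Strongly Disagree ... 7 = Strongly Agree).
item_responses = [5, 6, 4, 7, 5, 6, 3, 5]
summary = describe(item_responses)
print(f"n={summary['n']}, mean={summary['mean']:.2f}, sd={summary['sd']:.2f}")
```

A hosted survey service would typically report such figures automatically; the sketch only shows what the calculation amounts to.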

One  additional  electronic  survey  type  consists  of  the  admin¬ 
istration  of  e-mail  surveys  in  which  participants  are  e-mailed  a 
survey and are asked to fill it out and e-mail it back. However,
with the dramatic increase in the amount of spam e-mail that users
have received in recent years, these kinds of e-mails are likely to
be perceived as spam and discarded. Therefore, e-mail surveys are
not covered in detail in this chapter.

There  are  two  types  of  methods  used  in  web-based  survey 
administration today. One method is to use a web survey administration
service provider (such as SurveyMonkey.com) by
paying  it  a  monthly  or  yearly  fee.  In  most  cases,  various  pack¬ 
ages  are  available  ranging  from  a  small  number  of  surveys  to 
unlimited  administration  of  surveys.  The  web  service  providers 
also  have  a  variety  of  options  for  the  retrieval  of  survey  data  by 
the  administrators,  for  example  in  Access  or  Excel  formats,  with 
some,  in  most  cases  basic,  statistical  analyses  (such  as  descrip¬ 
tive  statistics  and  correlations)  already  performed  on  the  data. 
Additionally, the services also give flexibility in customization of
survey interfaces, such as giving the administrators (and in some
rare cases the participants) the freedom to choose their
text and background colors, font sizes and types, how many
questions to have per web page, and so on. Today, there are
more than a hundred online survey-service providers with
monthly fees varying from $3 for small-scale academic surveys
to $500 for large-scale, mostly marketing-related surveys.

Another  means  of  administering  online  surveys  is  to  use 
one’s  own  hardware  and  software.  As  a  central  server  is  neces¬ 
sary  for  collection  and  storage  of  the  data,  this  hardware  can 
either  be  purchased  or  rented,  or  an  existing  server  can  be  used. 
The  amount  of  storage  space  necessary  largely  depends  on  the 
length  and  sample  size  of  the  survey,  but  since  most  surveys 
contain  textual  data,  it  is  almost  unimaginable  to  need  more 
than  five  gigabytes  of  storage  space  for  a  large-size  survey  for 
purposes of HCI research. A variety of open-source packages (such as
phpESP [Easy Survey Package]) and licensed software from vendors
(such  as  Inquisite)  are  available  for  survey  implementation  and 
administration  on  administrator-owned  servers.  Like  survey-ser¬ 
vice  providers,  these  software  packages  also  allow  a  variety  of 
customization  flexibilities  concerning  the  survey  interfaces.  Fig¬ 
ure  58.2  presents  a  sample  interface  from  an  online  survey. 
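
The storage estimate given above can be checked with simple arithmetic. The following Python sketch is a back-of-the-envelope calculation only: the function name and all three input figures (number of participants, number of questions, and average bytes per answer) are assumptions chosen for illustration, and database overhead is ignored.

```python
def estimate_storage_bytes(participants: int, questions: int,
                           avg_bytes_per_answer: int) -> int:
    """Rough size of the raw textual responses, ignoring database overhead."""
    return participants * questions * avg_bytes_per_answer


# Hypothetical large survey: 10,000 participants, 100 questions,
# and a generous 500 bytes of text per answer.
total_bytes = estimate_storage_bytes(10_000, 100, 500)
print(f"{total_bytes / 1e9:.2f} GB")
```

Even with these generous assumptions the raw data come to about half a gigabyte, well under the five-gigabyte figure mentioned in the text.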

For  both  paper-and-pencil-based  and  electronic  surveys, 
human  cognitive  limitations  should  be  taken  into  considera¬ 
tion  and  basic  human  factors  guidelines  should  apply.  Paper- 
and-pencil  surveys  should  be  administered  with  pencil  to 
allow  participants  to  go  back  and  change  their  responses  if 
they want to. White paper with black, 12- to 14-point Times
New Roman text should be used, as those font sizes are
the most common and easily readable text sizes in printed
documents. Labels should be presented in bold, 16- to 18-point
text, and while pictures are seldom presented to
participants, if they are presented, they should have enough
resolution to appear clearly on paper. In short, paper-and-pencil-based
survey interfaces should be inspected to make sure
they comply with the structural human factors guidelines for
paper-based interfaces.

FIGURE 58.2. Sample interface from an online survey.

Similarly,  electronically  administered  surveys  (web-  or  PC- 
based)  also  should  follow  basic  human-factors  guidelines  for 
computer interfaces. For web-based surveys, it should be noted
that participants will access the surveys from a variety of
machines. Basic computer and web design guidelines indicate that
800×600 color screen resolution is the most common screen type
(Nielsen, 1993; Badre, 2002). This should be taken into consideration
when survey interfaces are designed, making sure that survey
objects, mainly text and any images, are easily visible on screens
with this resolution. The basic rule of black text on white background
in  web  usability  should  also  be  applied  in  computer-  and  web- 
based  interfaces,  and  screen  objects  such  as  dropdown  combo 
boxes,  radio  buttons,  buttons,  and  check  boxes  should  be  the 
same  size  (mainly  height)  as  text  lines  to  ensure  consistency  and 
easy  visibility  and  to  allow  users  to  click  on  them  easily.  Text 
boxes  for  the  users  to  type  in  should  allow  text  size  consistent 
with  the  text  on  the  question  parts  of  the  interface.  Overall, 
again, it can be said that in web-based surveys, basic web-design
guidelines can easily be adopted.

Additionally,  one  important  item  in  survey  design  is  to  pre¬ 
vent  survey  participants  from  getting  worried  about  the  excessive 
length  of  the  surveys,  as  too-long  surveys  may  result  in  signifi¬ 
cantly  decreased  response  rates  (Lund  &  Gram,  1998;  Krosnick, 
1999).  Taking  also  into  consideration  the  fact  that  computer 
users don’t enjoy scrolling down the screen in general (Shneiderman,
1992), no more than 20 questions should be presented
on  one  survey  screen.  For  surveys  containing  more  than  20  ques¬ 
tions,  subsequent  questions  should  be  presented  on  subse¬ 
quent  screens  (again,  each  screen  not  exceeding  20  questions), 
which  participants  should  move  to  by  clicking  on  a  screen  link 
or  button  that  should  have  a  statement  such  as  “Click  here  to 
continue.” 
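
The 20-question rule above amounts to splitting the question list into consecutive screens. The following Python sketch shows one way to do this; the `paginate` function and the `MAX_PER_SCREEN` constant are hypothetical names, and only the limit of 20 comes from the text.

```python
from typing import List, Sequence

MAX_PER_SCREEN = 20  # limit recommended in the text


def paginate(questions: Sequence[str],
             per_screen: int = MAX_PER_SCREEN) -> List[List[str]]:
    """Split questions into consecutive screens of at most `per_screen` items."""
    if per_screen < 1:
        raise ValueError("per_screen must be at least 1")
    return [
        list(questions[start:start + per_screen])
        for start in range(0, len(questions), per_screen)
    ]


# Hypothetical 45-question survey.
survey = [f"Question {number}" for number in range(1, 46)]
for index, screen in enumerate(paginate(survey), start=1):
    print(f"Screen {index}: {len(screen)} questions")
```

Each screen except the last would end with the "Click here to continue" link or button described above.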

Both  paper-and-pencil-based  and  electronic  surveys  will 
continue  to  be  used  in  HCI  research.  While  it  is  unlikely  that 
the  former  will  go  extinct  any  time  soon,  recent  studies  (such 
as  Dillman,  2000)  showed  that  web-based  surveys  have  very 
significant  advantages  in  data  collection  and  analysis  stages  of 
survey-based research. The fact that most HCI-related survey
research uses computer-savvy sample participant groups is
also a factor that helps the popularity of web-based surveys
within HCI. It is therefore expected that web surveys will eventually
constitute a large majority of survey-based research in
HCI. While no previous study explicitly investigated the exact
share of paper-and-pencil-based and electronic
surveys in HCI research, it is estimated that more than 60%
of all surveys for HCI research are implemented in the web
environment.

While  the  discussion  in  this  chapter  mostly  concentrated 
on  structural  issues  of  paper-and-pencil-based  and  electronic 
survey  design,  a  much  larger  research  topic  concentrates  on 
how the actual survey content should be designed, created,
and implemented. The next section discusses content creation
in surveys within the context of HCI research.


SURVEY  DESIGN  IN  HCI 


A  major  part  of  survey-design  research  in  HCI  is  concerned  with 
the  formulation  of  survey  questions  and  scales.  In  comprehen¬ 
sive  user  surveys,  questions  may  be  presented  as  one  large,  con¬ 
tinuous  list.  Related  questions  may  also  be  presented  in  cate¬ 
gories  and  sections,  such  as  the  demographic  questions  in  the 
first  section,  questions  evaluating  the  structural  elements  of  the 
interface  in  the  second  section,  and  so  on.  As  discussed  earlier, 
surveys  in  HCI  research  are  mostly  concentrated  on  collecting 
two  categories  of  information:  participants’  opinions  and  eval¬ 
uations.  These  two  survey  types  are  discussed  in  the  next  sec¬ 
tion.  This  discussion  is  followed  by  an  explanation  of  survey  ap¬ 
plication  domains  and  the  survey  design  methodology. 


Survey Types

Opinion Surveys

Opinion  surveys  aim  at  determining  what  participants  think 
about  issues  pertaining  to  certain  concepts.  Additionally,  opin¬ 
ion  surveys  can  also  measure  participants’  wishes,  habits,  and 
customs  (Baker,  1998).  The  way  opinion  surveys  differ  from 
evaluation  surveys  is  that  opinion  surveys  are  not  centered  on 
a  specific  product  or  environment,  but  are  more  general  in  na¬ 
ture.  For  example,  an  opinion  survey  can  target  measuring  cell 
phone  users’  habits  and  opinions  regarding  the  use  of  the  cell 
phone  interface  (whether  they  are  happy  with  the  screen  size, 
whether they can easily enter data using the keypad, etc.). While
questions are focused on a product in this example (e.g., a cell
phone), the population of cell phone users is highly diverse, as
are  cell  phone  devices  that  are  used  by  the  population.  Hence, 
such  an  opinion  survey  will  measure  general  trends  in  usability 
concerning  cell-phone  interfaces  rather  than  evaluating  a  spe¬ 
cific  cell  phone  interface. 

Opinion  surveys  are  administered  with  more  freedom  than 
evaluation  surveys,  as  the  participants  do  not  have  to  have  the 
evaluated  issue,  product,  or  environment  fresh  in  their  memory. 
Hence,  they  can  fill  out  opinion  surveys  at  any  time  and  in  any 
environment  that  is  convenient  for  them.  Opinion  surveys  in¬ 
clude  questions  that  do  not  require  any  recalling  process,  mean¬ 
ing  they  contain  questions  which  participants  can  answer  with¬ 
out  having  to  recall  a  specific  feature  or  part  of  the  environment, 
product,  or  issue.  As  opinion  surveys  ask  general  questions 
about  participants’  current  standing  regarding  opinions,  wishes, 
customs,  and  habits,  the  answers  may  sound  subjective  and  vary 
greatly  among  participants.  Sample  questions  for  opinion  sur¬ 
veys  include  statements  such  as  “Does  the  use  of  computers  in 
daily tasks contribute to your overall technology knowledge?” or
“Are you in general satisfied with the amount of product information
available on an e-commerce page you frequently visit?”
or  “How  would  you  rate  the  customer  services  on  the  sites 
where  you  frequently  shop?” 


Opinion  surveys  can  cover  a  broader  variety  of  issues  than 
evaluation  surveys  which  are  more  focused.  They  can  include 
both  qualitative  and  quantitative  scales  for  their  responses.  Al¬ 
though  no  previous  literature  came  up  with  a  strict  classification 
of  opinion  surveys,  the  following  classification  can  help  in  struc¬ 
turing  of  opinion  surveys  and  what  kind  of  questions  should  be 
asked  for  what  types  of  opinion-related  survey  categories  in  HCI 
research (sample questions for each classification of opinions
are presented in Table 58.1):

a. Opinions on a medium. Within the context of HCI research,
these types of surveys concentrate in most cases
on interface design, usability, user satisfaction, and user
preferences issues concerning a medium the participants
use in their daily lives. Most popular examples of
this type of media include daily-used devices such as
computers, Personal Digital Assistants (PDAs), and cell
phones. These types of surveys concerning users’ opinions
on a certain medium aim at determining general
trends in user opinions concerning whether the design
of the medium is satisfactory and meets user needs and
requirements, whether there are problems concerning
the medium, and what can be possible solutions. Opinion
surveys concerning a medium are also useful when
they are used by usability specialists and engineers to develop
new versions of products or to come up with new
products, as the survey results can pinpoint the needs
of the target population to be met regarding the
medium. Sample questions for these types of surveys can
include questions like “Are you in general satisfied with
the web browsing capabilities of your cell phone?” or
“What additional capabilities would you like to see on
your PDA user interface?”

b. Opinions on an event. Within the context of HCI research,
user opinions on an HCI-related event can include
what  they  think  about  certain  HCI-related  happenings. 
Examples  can  include  opinions  concerning  certain  HCI- 
related  activities,  with  questions  like  “Do  you  find  the 
move  from  a  text-based  interface  to  a  graphical  user  in¬ 
terface  helpful?”  These  types  of  surveys  are  rarer  in  nature 
and  aim  at  collecting  basic  trends  concerning  user  opin¬ 
ions  in  current  HCI-related  activities. 

c.  Opinions  on  a  procedure.  These  kinds  of  surveys  aim 
at  determining  the  user  opinions  on  procedures  to  com¬ 
plete  HCI-related  tasks.  They  are  similar  to  medium- 
related  opinion  surveys,  but  rather  than  questions  about 
the  medium  itself,  these  surveys  have  the  goal  of  deter¬ 
mining  user  opinions  on  how  it  is  used.  In  web  and 
e-commerce  design,  these  kinds  of  opinion  surveys  are 
helpful  in  determining  whether  the  procedures  to  com¬ 
plete  general  tasks  (for  example,  web  navigation)  meet 
user  requirements  and  needs.  A  sample  question  in  an 
opinion survey concerning a procedure could be “Are you
satisfied  with  how  long  it  generally  takes  to  purchase  a 
product  on  an  e-commerce  site?”  Surveys  to  explore 
opinions  on  a  procedure  are  less  common  as  HCI  re¬ 
searchers  usually  resort  to  evaluation  surveys  to  test  pro¬ 
cedures  for  use  of  computer  interfaces  and  other  media 
(explained in the next section).

TABLE 58.1. Opinion and Evaluation Survey Sample Questions

Opinions on a Medium:

1. Personal Digital Assistants (PDAs) are convenient.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

2. I would prefer using a laptop to a desktop computer for my daily tasks.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

3. Tablet PCs are capable to meet my daily computing needs.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

4. Setting up a wireless Internet connection is easy on a computer.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

Opinions on an Event:

1. Free wireless area networks should be universal.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

2. Advancements in Liquid Crystal Display (LCD) technology made them more affordable.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

3. Free services on the Internet meet my expectations.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

4. The Windows computer interface is significantly easier to use now than it was five years ago.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

Opinions on a Procedure:

1. The sign-up process for e-commerce sites is too time-consuming.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

2. Password authentication procedures are reliable.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

3. The steps involved in personalizing a cell phone interface settings are long.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

4. Payment transaction procedures on the Internet are mostly secure.
   O Strongly Disagree  O Disagree  O Moderately Disagree  O Neutral  O Moderately Agree  O Agree  O Strongly Agree

Evaluation (of a Computer Interface):

1. How difficult did you find completing the tasks on the interfaces?
   O Extremely Difficult  O Difficult  O Somehow Difficult  O Neither Easy Nor Difficult  O Somehow Easy  O Easy  O Extremely Easy

2. How intuitive were the interfaces you used?
   O Not Intuitive at All  O Not Intuitive  O Neutral  O Intuitive  O Very Intuitive

3. While completing the tasks, I made _____.
   O No Errors  O Few Errors  O Neither Many nor Few Errors  O Quite a Few Errors  O Many Errors

4. Were the interfaces aesthetically pleasing?
   O Not at All  O Very Little  O Neutral  O Quite a Bit  O A Lot
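
Before responses on scales such as those in Table 58.1 can be analyzed quantitatively, the verbal labels have to be coded as numbers. The following Python sketch shows one possible coding of the seven-point agreement scale. The names `AGREEMENT_SCALE`, `CODES`, and `code_response` are hypothetical, and the optional reverse coding of negatively worded items is a common survey practice assumed here rather than a step described in this chapter.

```python
AGREEMENT_SCALE = [
    "Strongly Disagree",
    "Disagree",
    "Moderately Disagree",
    "Neutral",
    "Moderately Agree",
    "Agree",
    "Strongly Agree",
]

# Map each label to its position on the scale, starting at 1.
CODES = {label: score for score, label in enumerate(AGREEMENT_SCALE, start=1)}


def code_response(label: str, reverse: bool = False) -> int:
    """Convert a scale label to a number from 1 to 7.

    With reverse=True the scale is flipped, so that agreement with a
    negatively worded item receives a low score.
    """
    score = CODES[label]
    return len(AGREEMENT_SCALE) + 1 - score if reverse else score


print(code_response("Moderately Agree"))
print(code_response("Strongly Agree", reverse=True))
```

An item such as “The sign-up process for e-commerce sites is too time-consuming” is one where a researcher might choose reverse coding, so that higher scores consistently indicate a more favorable opinion.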


Evaluation  Surveys 

More  specific  than  opinion  surveys,  evaluation  surveys  (or 
questionnaires)  are  generally  administered  after  a  certain  pro¬ 
cedure  is  implemented  on  the  participant  group.  While  opin¬ 
ion  surveys  can  be  administered  at  any  time  to  the  participants, 
evaluation  surveys  are  administered  right  after  the  participants 
have completed a certain procedure. In evaluation surveys, participants
are asked about tasks they have just recently completed.
Therefore, evaluation surveys are in most cases preceded
by certain usability-related experimental procedures.
Most  common  in  HCI,  evaluation  surveys  are  administered  after 
participants  have  completed  a  number  of  tasks  in  controlled 
computer environments. They are also implemented right after
the procedure to ensure that memories regarding the procedure
are still fresh in the participants’ minds, as evaluation
surveys require a high amount of recall of procedures and interfaces
in the tasks that were just previously completed.

HCI-related evaluation surveys have the main goal of evaluating
usability, user satisfaction, and user preference issues concerning
user interfaces or environments (Ozok & Salvendy,
2001). After certain tasks are completed in these controlled environments,
evaluation surveys aim at determining the exact nature
and location of problems and points open to improvement
in  the  human-computer  environments.  Therefore,  evaluation 
surveys  are  in  most  cases  relatively  detailed  in  nature.  In  a  sam¬ 
ple  procedure  for  example,  participants  can  be  presented  a 
number  of  web  pages  and  asked  to  complete  some  common 
user  tasks  on  those  pages,  tasks  that  can  include  text  entry,  di¬ 
rect  manipulation,  and  form  filling.  An  evaluation  survey  that 
would  follow  could  include  questions  such  as  “Was  the  site  nav¬ 
igation  difficult?”,  “Was  the  text  size  on  the  site  readable?”,  “Did 
you  enjoy  the  overall  experience  on  this  site?”,  “Would  you  visit 
this  site  again?”,  and  so  on. 

As  their  name  indicates,  evaluation  surveys  aim  at  evaluating 
interfaces,  environments,  and  procedures  from  the  user’s  per¬ 
spective.  For  that  purpose,  they  are  tools  to  determine  how 
participants  evaluated  those  interfaces,  environments,  and  pro¬ 
cedures.  In  that  sense,  evaluation  surveys  are  explicit  and  not 
that  much  different  from  objective  measurement  methods 
such  as  performance  measurement  in  HCI  tasks.  Similar  to 
opinion  surveys,  evaluation  surveys  use  both  qualitative  and 
quantitative  scales. 

Evaluation  surveys  are  not  only  helpful  in  evaluation  of  prod¬ 
uct  interfaces,  environments,  and  procedures.  They  can  also  be 
used  in  evaluating  certain  conceptual  features.  For  example,  a 
researcher  may  be  investigating  whether  the  introduction  of  a 
certain  interface  concept  results  in  higher  user  satisfaction  re¬ 
garding  the  interface.  If  this  particular  feature  is,  say,  that  of  in¬ 
terface  visibility,  the  survey  administrator  can  first  present  the 
participants  with  high-visibility  and  low-visibility  computer  in¬ 
terfaces,  and  the  subsequent  evaluation  survey  can  contain 
questions  such  as:  “Did  higher  visibility  improve  your  overall 
satisfaction  with  the  computer  screens?” 

Evaluation  surveys  are  useful  in  a  variety  of  HCI-related  com¬ 
mercial  and  academic  research  activities.  In  academic  research, 
different  computer  or  computer-related  interfaces  can  be  evalu¬ 
ated  through  surveys  to  determine  whether  they  result  in  better 
user  preferences  or  higher  user  satisfaction.  In  commercial  re¬ 
search,  newly  developed  products  (for  example  a  new  com¬ 
puter  peripheral),  environments  (for  example  a  new  computer 
interface),  or  procedures  (for  example  the  steps  it  takes  to  com¬ 
plete  a  transaction  with  a  new  e-commerce  site  design)  can  be 
empirically  evaluated  by  having  participants  complete  tasks  with 
those  products  and  procedures  or  in  those  environments,  then 
filling out evaluation surveys consisting of detailed questions
regarding their satisfaction and preferences regarding the said
product, environment, or procedure. While quantitative evaluation
results can give statistical backing to user evaluations,
helping boost their conclusiveness, qualitative evaluation results
can give the researchers who administered the surveys new
ideas to improve usability/user-preferences-related design components.
Therefore, evaluation surveys are the most commonly
used survey types in HCI research and constitute one of the
most common and effective user evaluation methods in HCI in
general. Examples of evaluation survey questions are also presented
in Table 58.1.
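
As an illustration of the statistical backing that quantitative evaluation results can provide, the following Python sketch compares satisfaction ratings from two interface conditions. The ratings are invented, the `welch_t` function is a hypothetical helper, and Welch's t statistic is only one common choice for such a comparison; the chapter does not prescribe a particular test.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical satisfaction ratings (1-7) after using each interface.
high_visibility = [6, 5, 7, 6, 6]
low_visibility = [4, 5, 3, 4, 4]

print(f"mean difference: {mean(high_visibility) - mean(low_visibility):.2f}")
print(f"Welch t: {welch_t(high_visibility, low_visibility):.2f}")
```

A statistics package would also report the degrees of freedom and a p-value; the sketch stops at the test statistic to stay within the standard library.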

Other Survey Types

Besides  the  two  main  survey  types  mentioned  earlier,  one 
widely  used  survey  type  is  the  demographic  survey.  Although 
demographic  surveys  are  almost  universal,  it  would  be  incorrect 
to  categorize  them  at  the  same  level  as  opinion  and  evaluation 
surveys, as most user surveys have a section concerning user demographics.
Hence, in most cases demographic surveys are essential
parts of opinion and evaluation surveys rather than
stand-alone  surveys.  In  some  cases  HCI  researchers  administer 
surveys  consisting  of  demographic  questions  only — for  exam¬ 
ple,  to  determine  the  demographics  of  a  user  group  using  a 
specific  application.  However,  more  commonly,  HCI  surveys 
consist  of  opinion  and/or  evaluation  questions  in  addition  to  the 
demographic  questions. 

Demographic  questions  play  an  important  part  in  HCI- 
related  survey  design,  as  most  variables  of  interest  in  HCI  re¬ 
search  are  also  dependent  on  the  factors  that  are  specific  to  the 
target population. Consequently, research findings can in most
cases only be generalized to the target population from which
a representative sample is drawn.

Demographic  surveys  (or  survey  sections)  in  most  cases 
consist  of  a  standard  set  of  categories:  age,  sex,  education,  and 
occupation  of  the  participant.  Age  can  also  be  asked  in  the 
form  of  “birth  year,”  and  for  the  question  regarding  sex,  op¬ 
tions  in  the  form  of  “Male/Female”  can  be  presented  to  partic¬ 
ipants  to  mark  on  the  computer  or  with  the  pen/pencil.  Edu¬ 
cation  level  can  be  left  to  type  in  or  write,  or  options  can  be 
presented. The question is usually formulated as: “What is the
highest  degree  for  which  you  won  a  diploma?”  or  “What  is  the 
highest  degree  you  earned?”  Typical  options  for  this  question 
are  “Elementary  School,”  “Middle  School,”  “High  School,” 
“College  or  University,”  “Graduate  Degree,”  and  “Post-Doctoral 
Graduate  Degree.”  The  occupation  question  is  about  what 
kind  of  job  the  participant  has.  For  this  question,  usually  the 
participants  are  asked  to  type  or  write  in  the  designated  area, 
due  to  the  high  variety  of  possible  occupations  participants 
may  have,  although  presenting  options  for  this  question  is  also 
possible if some general occupation categories are all that
is needed, for example options such as “Private sector,”
“Academia,” “Student,” and so on.

In  addition  to  this  basic  set  of  survey  questions,  demo¬ 
graphic  surveys  can  also  include  general  questions  regarding 
daily  habits  or  current  standing  issues  concerning  the  partici¬ 
pants.  Most  commonly,  demographic  surveys  in  the  HCI  area 
contain  questions  regarding  computer  use  habits  of  participants, 
such  as  “How  many  times  a  day  do  you  check  your  e-mail?”  or 
“How many times in the last year did you shop from a web-based
e-commerce company?” These types of questions are usually
customized according to the type of information needed for the
specific research being conducted and can therefore vary greatly
in nature, but in most cases they aim at collecting information on
computing-related habits. Table 58.2 presents a set of sample
demographic questions as part of a survey regarding cell phone use.

TABLE 58.2. Sample Demographic Questions from a Survey on Cell Phone Use

Your Age:
Your Gender:
Your Occupation:

How many times a week do you go on the web?
_ Less than once a week
_ Between once and three times
_ Between three times a week and every day
_ Every day

In the past year, how many times did you shop online
(Please put a number)?: _

Do you own a cell phone, a Personal Digital Assistant, or a
Combination Device?
_ Yes
_ No

In the past year, how many times did you shop online using a cell
phone, a Personal Digital Assistant, or a Combination Device
(Please put a number)?: _


It  should  be  noted  that  demographic  questions  are  of  a  more 
personal  nature  than  opinion  and  evaluation  questions.  Some 
participants  may  feel  that  their  privacy  rights  are  being  violated  by 
being asked to provide their age and education level. For this
reason, keeping surveys anonymous and informing participants that
the information they provide is not personally identifiable is
crucial in HCI research. Knowing that their data cannot personally
identify them usually alleviates privacy worries and is known to
improve participant enthusiasm. Challenges concerning privacy
in  survey  implementation  are  described  later  in  this  chapter. 

There  are  no  other  significant  survey  types  widely  used  in 
HCI  research.  Some  niche  survey  types  may  still  exist  but  are  few 
and  far  between.  Therefore,  surveys  aimed  at  collecting  infor¬ 
mation  relevant  to  HCI  research  usually  belong  to  one  of  the 
categories  explained  in  this  section.  In  the  next  section,  applica¬ 
tion  domains  of  surveys  relating  to  HCI  research  are  discussed. 

Survey  Application  Domains 

Surveys are highly popular across a broad range of application
domains, from social sciences to marketing, education to
production, customer to worker satisfaction, and many more. Today,
results obtained from surveys, which in most cases ask
comprehensive questions, are deemed reliable and valid in both
scientific and industrial projects. The most common application
domains of survey research include:

a. Sales and Marketing: Companies that offer products
and services for both consumers and industries use
customer/client surveys for both needs-assessment and
evaluation purposes. A large number of companies are also
dedicated solely to implementing customer surveys for other
companies, analyzing the data, and drawing conclusions for sales
and marketing purposes. Customer satisfaction, product
evaluation, customer relationship management, and
customer demographics are only a few of the topics surveyed
by sales and marketing forces.

b.  Medicine:  Medical  research  is  not  limited  to  trials  rely¬ 
ing  on  objective  measurements.  Surveys  can  be  helpful 
in  collecting  patient  data  for  development  of  medicine  or 
treatments. 

c.  Education:  Educational  surveys  can  help  determine 
population  preferences  in  education  as  well  as  education 
levels  and  education-related  difficulties  among  popula¬ 
tion  segments. 

d.  Information  Technology  Research:  In  the  field  of 
information  technology,  surveys  are  widely  used  in  con¬ 
nection  to  software  and  hardware  design  and  evaluation, 
covering  a  broad  variety  of  areas  including  software  engi¬ 
neering,  systems  analysis  and  design,  and  of  course  human- 
computer  interaction,  which  this  chapter  covers. 

The list above is far from complete; it presents only a sample of
the application domains of surveys. The list will no doubt
continue to grow with the advent of new technologies and
sciences. The HCI area is seen as a major application domain of
surveys, and is expected to continue to be so.

Survey  Design  Methodology 

Survey  design  is  a  methodological  activity  that  requires  a  sys¬ 
tematic  design  procedure.  Survey  design  techniques  and  pro¬ 
cedures  are  discussed  in  this  section,  including  content  design 
and  scale  design,  followed  by  survey  design  and  redesign  issues, 
a  survey  design  example,  and  a  discussion  of  challenges  in  sur¬ 
vey  design. 

Survey  Design  Techniques 

Survey  design  mainly  consists  of  two  components:  the  de¬ 
sign  of  the  survey  content  and  the  survey  scale.  They  are  both 
discussed  in  this  section. 

Content design. At the heart of survey research lies
the issue of producing the actual questions to ask the
participants. Designing the survey content means producing these
questions along with their scales. Deciding which questions to
ask the participants in the survey largely depends on three
resources: literature, expert opinions, and individual experiences.

A  large  number  of  survey  questions  are  based  on  previous 
research  in  the  focus  area.  Relying  to  some  extent  on  previous 
literature  allows  the  researchers  to  achieve  high  validity  of  their 
survey  structure,  as  previously  validated  research  allows  cur¬ 
rent  survey  design  to  have  strong  backing  in  terms  of  its  content 
and the targeting of the questions concerning the particular
research topic. Therefore, it is best to have backing from
previous studies at least for the majority of the questions while
designing  the  survey. 

While  a  designed  survey’s  content  may  consist  largely  of 
questions  that  are  based  on  the  relevant  literature  in  the  area, 
there will most likely be some issues that are intended to be
included in the survey but are not covered in the previous
literature. Therefore, HCI researchers sometimes rely on experts in
the  area  to  cover  additional  points  to  be  included  in  the  survey. 
A  preliminary  survey  may  be  sent  in  this  context  to  the  area  spe¬ 
cialists  to  determine  the  most  significant  items  to  be  covered 
in the investigated area. For example, if a survey study is
trying to determine the most significant interface design items in
e-commerce  that  affect  buying  behavior,  a  preliminary  survey 
may  be  sent  to  experts  in  the  area  (for  example,  e-commerce 
company  managers  and  professors  in  business  schools  special¬ 
izing  in  e-commerce)  to  determine  the  general  classifications 
of  interface  issues  relating  to  the  buying  decision.  In  addition 
to  the  literature,  these  responses  from  experts  can  be  used  as  a 
major  resource  of  question  generation  for  the  resulting  survey. 

Researchers  can  also  rely  on  their  own  heuristics  and  exper¬ 
tise  in  producing  questions.  To  prevent  being  accused  of  “mak¬ 
ing  up  the  questions,”  the  researchers  would  need  to  explain 
logically  why  the  questions  were  included  in  the  survey.  In  these 
cases,  researchers  can  include  questions  based  on  what  they 
think  is  a  significant  part  of  the  research  item  being  investigated, 
or  based  on  the  impression  that,  although  the  literature  did  not 
explicitly  point  out  the  issues  in  these  types  of  questions,  there 
was  an  implicit  indication  in  previous  research  towards  this  par¬ 
ticular  direction. 

Design  of  survey  content  is  not  a  difficult  task  once  the  re¬ 
searcher  has  a  reasonable  background  in  the  area  of  interest. 
One common mistake in the design of survey content is that
researchers overlook important questions during the design and
ultimately fail to address them. Therefore, cautious,
repeated  reviews  and  revisions  are  necessary  before  the  final  im¬ 
plementation  of  the  survey. 

Scales  and  open-ended  questions.  Just  as  important  as 
the  content  of  the  questions,  the  scales  for  the  survey  questions 
in  HCI  are  essential  for  the  accuracy  and  validity  of  survey  re¬ 
sults.  Scales  are  created  to  attribute  numerical  values  to  partici¬ 
pant  responses  in  the  survey,  thereby  allowing  statistical  analy¬ 
ses  and  giving  statistical  backing  to  conclusions  obtained  from 
the  research.  To  respond  to  a  scaled  question,  the  participant 
marks  one  of  the  several  options,  the  option  which  best  repre¬ 
sents  his  or  her  opinion  regarding  the  item  in  the  question.  If 
the  question  has  a  large  variety  of  possible  answers,  or  if  it  re¬ 
quires  a  lengthy  answer,  then  an  open-ended  response  style 
may  be  preferred  rather  than  presenting  a  scale  to  the  partici¬ 
pant.  For  open-ended  responses,  participants  are  mostly  given 
the  freedom  to  write  or  type  as  much  as  they  would  like. 

Both  scaled  and  open-ended  questions  are  suitable  for  dif¬ 
ferent  question  types  and  types  of  information  being  obtained 
from  the  participants  in  the  survey.  Quantitative  studies  have 
to  use  numerical  scales  to  statistically  test  their  hypotheses  and 
support  their  findings.  Qualitative  research,  on  the  other  hand, 
analyzes survey data without the involvement of numbers.
Because participants have a much higher degree of freedom when
responding to open-ended questions, qualitative responses are
not restricted to the defined response universe determined by
survey designers (also referred to as “survey authors”). On the
other  hand,  conclusions  derived  from  qualitative  responses  may 
be  more  arguable  because  they  cannot  be  tested  statistically. 

Another type of question in HCI surveys gives participants the
freedom to mark more than one response choice. While these
responses are generally not assigned numerical scales, they are
presented in demographic surveys. In questions with multiple
possible responses and in open-ended questions, it is useful
to  present  the  option  of  “Other  (please  specify):”  to  the  partic¬ 
ipants,  as  there  is  always  a  possibility  that  the  survey  designers 
may  not  present  the  option  which  the  participant  would  like  to 
give  as  a  response.  A  sample  question  of  this  sort  could  be 
“Where  do  you  generally  access  the  Internet?”  with  the  possi¬ 
ble  responses  “Home,”  “Work,”  “School,”  “Coffee  Shop,”  “In¬ 
ternet  Cafe,”  and  “Other  (please  specify).” 

It should be noted that interviews and focus groups are
alternatives to open-ended survey questions, and these more
interactive  data-gathering  techniques  are  likely  to  result  in  the 
collection  of  richer  data  than  open-ended  survey  questions  as 
they  allow  real-time  interactions  between  the  researchers  and 
participants.  Therefore,  it  is  not  highly  common  in  HCI  research 
to  use  surveys  with  open-ended  questions  only.  In  most  cases, 
especially in quantitative survey research, a mix of both
open-ended and scaled questions often provides the best empirical
results. Because scaled surveys are used more frequently, this
chapter focuses on their design and implementation rather than on
surveys with open-ended questions. Accordingly, scale design is
discussed in the next section.

Scale  design.  While  a  large  variety  of  scaling  techniques 
are  available  for  surveys  in  sociology  and  psychology  research, 
HCI  surveys  mostly  rely  on  Likert  scales  (Medsker  &  Campion, 
1997). While contrast scales consisting of yes-or-no questions
with corresponding 1/0 scales are also used, five- and
seven-point Likert scales are highly common (Aiken, Lewis, & Lewis,
1996). In most cases, a scale needs to consist of an odd number
of options. This way, the option in the middle can correspond to
a “no preference” or “neutral” opinion (Dillman, 2000). Each
response on the scale is attributed a number to allow the
researchers to conduct statistical analysis on the collected data.
Item  scales  need  to  be  kept  consistent  during  the  data  analysis 
phase,  meaning  items  should  be  lined  up  in  the  same  direction, 
whether  they  are  positive  or  negative — in  most  cases  positive 
responses  scoring  high  and  negative  responses  scoring  low  on 
the  scales.  Inverted  questions  (questions  that  ask  items  in  the 
opposite  direction,  as  discussed  later  in  this  chapter)  should 
have their scales reversed in order to maintain consistency and
allow correct data analysis. A large number of data analysis
mistakes in surveys happen because of scaling problems.
For  example,  if  the  researchers  forget  to  invert  scales  of  reverse 
questions,  then  correlations  and  differences  between  responses 
will  not  come  out  correctly,  resulting  in  lack  of  validity  of  re¬ 
search  conclusions. 
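
The reversal of inverted items described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names, the seven-point range, and the helper `reverse_score` are hypothetical and do not come from the chapter.

```python
def reverse_score(value, scale_min=1, scale_max=7):
    """Flip a response so that the lowest score becomes the highest and vice versa."""
    if not scale_min <= value <= scale_max:
        raise ValueError(f"{value} is outside the {scale_min}-{scale_max} scale")
    return scale_min + scale_max - value


# Hypothetical responses from one participant on a seven-point scale.
responses = {"q1_effective": 6, "q9_ineffective": 2}

# Items worded in the opposite direction (assumed names, for illustration only).
INVERTED_ITEMS = {"q9_ineffective"}

# Align every item in the same direction before any correlation or comparison.
aligned = {
    item: reverse_score(score) if item in INVERTED_ITEMS else score
    for item, score in responses.items()
}
print(aligned)
```

Applying the reversal once, in a single preprocessing step, is one way to reduce the risk of forgetting it in later analyses.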

Scales  can  indicate  a  number  of  different  issues.  Some  scales 
are  concerned  with  user  opinions  while  others  are  concerned 
about frequencies. The most common scale types include
agreement measurement ranging between “Strongly agree” and
“Strongly  disagree”;  frequency  measurement  ranging  between 
“Not  at  all”  and  “Very  often”;  quality  opinions  ranging  between 
“Very  good”  and  “Very  poor”;  opinions  regarding  probability 
ranging  between  “Very  unlikely”  and  “Very  likely”;  and  so  on.  It 
should  be  noted  that  survey  scales  offer  a  certain  amount  of 
freedom  to  survey  designers  on  how  to  name  the  possible  re¬ 
sponse  options  for  their  questions,  and  therefore  scales  come  in 
many  different  varieties,  from  those  measuring  amounts  (a  lot, 
quite a bit, etc.) to frequencies (very often, often, etc.) to
yes-or-no scales. In most cases a “Not Applicable” or “Not
Available” option needs to be added to the scale. When a
participant marks this option, that participant’s response to that
question is eliminated from the analysis. Note that this is different
from giving a score of zero to that question in the analysis. Table
58.3 presents a sample of possible response scale sets along with
possible number correspondences to the responses.
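
The difference between eliminating a “Not Applicable” response and scoring it as zero can be illustrated with a short Python sketch. The label-to-number mapping follows the seven-point agreement scale in Table 58.3; the function name `item_mean` and the sample answers are assumptions made for illustration.

```python
# Seven-point agreement scale, as listed in Table 58.3.
AGREEMENT_SCALE = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Moderately Disagree": 3,
    "Neutral": 4,
    "Moderately Agree": 5,
    "Agree": 6,
    "Strongly Agree": 7,
}
NOT_APPLICABLE = "Not Applicable"


def item_mean(raw_responses, scale=AGREEMENT_SCALE):
    """Mean score for one question, leaving out "Not Applicable" answers."""
    scores = [scale[r] for r in raw_responses if r != NOT_APPLICABLE]
    if not scores:
        return None  # nobody gave a scorable answer
    return sum(scores) / len(scores)


# Hypothetical answers from four participants to the same question.
answers = ["Agree", "Not Applicable", "Strongly Agree", "Neutral"]

mean_excluding_na = item_mean(answers)
# Incorrect alternative: treating "Not Applicable" as a score of zero.
mean_na_as_zero = sum(AGREEMENT_SCALE.get(r, 0) for r in answers) / len(answers)

print(mean_excluding_na, mean_na_as_zero)
```

In this made-up sample the zero-scored version pulls the mean down noticeably, which is why the two treatments should not be confused.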

Survey  Evaluation  Techniques 

After  the  initial  design  of  the  survey  questions,  scales,  and  in¬ 
structions  to  the  survey  participants  on  how  to  fill  the  surveys 
out,  surveys  need  to  be  evaluated  to  determine  whether  they 
are  measuring  what  the  designers  intended  them  to  measure, 
whether  they  are  reliable,  and  whether  they  produce  valid  re¬ 
sults.  Pilot  testing  is  one  common  method  to  preliminarily  ad¬ 
dress  these  issues.  However,  full  evaluation  of  a  survey  can 
mostly  happen  only  after  substantial  data  have  been  collected 
with  the  survey  as  a  tool.  Therefore,  the  evaluation  of  a  survey  is 
conducted  based  on  the  data  collected  by  it,  and  the  two  com¬ 
ponents  of  survey  evaluation  are  the  measurement  of  the  sur¬ 
vey’s  validity  and  reliability. 


TABLE 58.3. Possible Survey Responses and Their Numerical Equivalences

Response                 Value    Response                    Value

Strongly Disagree        1        Never                       1
Disagree                 2        Very Seldom                 2
Moderately Disagree      3        Seldom                      3
Neutral                  4        Neither Seldom Nor Often    4
Moderately Agree         5        Somewhat Often              5
Agree                    6        Often                       6
Strongly Agree           7        Very Often                  7

Excellent                5        None                        0
Good                     4        Very Few                    1
Fair                     3        Few                         2
Poor                     2        A Fair Amount               3
Very Poor                1        Quite a Bit                 4
Not Applicable           -        A Lot                       5

Not Convenient at All    1        Very Difficult              1
Highly Inconvenient      2        Difficult                   2
Inconvenient             3        Not Difficult               3
Neutral                  4        Easy                        4
Somewhat Convenient      5        Very Easy                   5
Convenient               6        Yes                         1
Highly Convenient        7        No                          0

Survey  validity  and  validity  determination.  While 
the  reliability  of  a  survey  is  determined  after  the  survey  data 
has been collected, the validity of the survey has to be
determined prior to the implementation. As the name implies, the
validity of a survey is the degree to which the survey instrument
is valid in the results it produces; in other words, whether the
survey is measuring what it says it is measuring (Litwin, 1995).
Generally, within the context of HCI, validity is covered twofold,
as construct validity and predictive validity. Construct validity
indicates the degree to which the survey is backed by previous
research in its field, that is, how solid its construct is. In
general, as in all research, development of a survey needs to rely
on previous research to give the tool literature backing, proving
that the survey didn’t come out of the imagination of the designer,
but rather relies on a number of different research studies
conducted by a number of different researchers. To prove the
construct validity of their survey, designers need to make the case
that the questions they put into their survey are based on previous
literature. Hence, in survey design, it is imperative to ensure
that a majority of the questions have been implicated in the
previous literature as items relevant to the current topic of
interest. Without being able to prove this validity, it is not
possible to make a convincing case that the survey is making an
undisputed contribution to the
overall research topic of interest. However, it should be noted
that  it  is  almost  impossible  to  provide  a  survey  in  which  every 
single  item  has  a  full  set  of  articles  or  books  backing  it.  In  most 
cases,  some  survey  questions  may  have  some  indirect  mention 
in  the  previous  literature,  and  some  survey  questions  may  be 
solely  based  on  the  individual  experience  and/or  opinion  of  the 
survey designers. This type of question generation is also
acceptable, as long as the designers can show that those
questions are also based on solid research. In short, establishing
the construct validity of a research survey in HCI aims at showing
convincingly that results obtained from the survey are on target
and valid. Validity is therefore crucial to the success of the
research of which the survey is a part.

Predictive validity is, simply put, the ability and power of the
survey to predict correct results in repeated use. Predictive
validity indicates that the results obtained from the survey in
current and future use are accurate. For example, if a
comprehensive survey has been
produced measuring the usability of a website, researchers will
need to prove as part of their study that once this developed
instrument is administered, the results that are produced
accurately reflect the usability level and attributes of a website.
Additionally, the survey also needs to accurately reflect usability
levels and attributes when it is administered to other
participant groups for evaluation of other sites. If these capabilities of
the  survey  can  be  proven  by  the  researchers,  then  the  survey 
can  be  said  to  have  predictive  validity. 

Like  construct  validity,  predictive  validity  does  not  have  any 
quantitative  measurement  method  to  be  used.  Hence,  a  sur¬ 
vey’s  predictive  validity  again  relies  on  qualitatively  proving  that 
the  survey  results  are  based  on  solid  research  notions,  and 
hence  the  results  are  accurate.  Survey  designers  need  to  explain 
that  the  results  produced  from  their  surveys  have  been  proven 
to  accurately  reflect  current  situations  concerning  the  target 
population’s  specifications,  evaluations,  and  opinions,  and  will 
continue to do so in future applications when the survey is
administered again. For this purpose, the elements of the survey need to be
proven  as  elements  that  accurately  predict  results  concerning 
the  topic  in  focus.  To  make  accurate  predictions,  surveys  need 
to  consist  of  elements  that  make  accurate  predictions  them¬ 
selves  when  their  results  are  analyzed,  and  to  ensure  that  these 
elements  have  predictive  power,  they  need  to  rely  on  accurate 
literature  findings  and  real-life  situations.  In  short,  similar  to 
construct  validity,  predictive  validity  of  a  survey  can  be  accom¬ 
plished  by  ensuring  that  the  survey  relies  on  solid  previous  lit¬ 
erature  and  the  researchers’  findings.  Sometimes,  to  ensure  that 
survey  findings  and  questions  have  predictive  power,  they  are 
evaluated  by  experts  in  the  area  prior  to  the  implementation.  Pi¬ 
lot  testing  is  addressed  later  in  this  chapter. 

Survey  reliability  and  reliability  measurement.  Re¬ 
liability  of  a  survey  is  the  measure  of  whether  the  survey  is  mea¬ 
suring  things  consistently,  and  whether  the  results  obtained 
from  the  survey  can  be  relied  upon.  A  survey’s  reliability  affects 
its  validity,  as  a  survey  that  is  not  reliable  in  its  measurements 
cannot  produce  fully  valid  results. 

While  there  are  a  number  of  quantitative  reliability  measure¬ 
ment  techniques  for  survey  design,  especially  in  psychology  and 
sociology,  the  two  most  common  reliability  measurement  tech¬ 
niques  used  in  HCI  research  are  internal  and  inter-rater  relia¬ 
bility  techniques. 

The internal reliability technique is concerned with whether
the survey questions are understood by the participants the way
the survey designers intended when preparing them. An internally
reliable survey contains
questions  that  are  all  comprehended  the  same  way  by  all  par¬ 
ticipants  at  all  times  in  repeated  measures  when  it  is  adminis¬ 
tered.  A  lack  of  internal  reliability  is  a  common  phenomenon,  as 
different  participants  can  understand  survey  questions  differ¬ 
ently  if  they  are  not  asked  in  a  highly  clear  and  unambiguous 
fashion.  Therefore,  to  improve  the  internal  reliability  of  surveys, 
designers  need  to  make  sure  to  use  statements  that  are  entirely 
clear  and  leave  no  room  for  interpretation  on  what  is  meant  in 
the  questions.  An  example  of  a  low  internal  reliability  survey 
question would be “Did you have tremendous difficulty
completing the tasks on the web page?” In this question, participants
who had little difficulty, no difficulty, or a moderate amount
of difficulty may respond to the question in a very similar way,
resulting in confusion regarding whether the tasks were difficult
or not. Additionally, if participants had difficulties in some tasks
and  no  difficulties  in  the  others,  a  question  such  as  this  may  con¬ 
fuse  the  participants  on  what  types  of  tasks  (difficult  or  not  dif¬ 
ficult)  they  should  base  their  response  on.  Obviously,  in  survey 
design  it  is  important  to  be  careful  not  to  confuse  the  partici¬ 
pants  while  they  are  filling  out  the  surveys.  Potential  confusions 
can  mostly  occur  on  the  participants’  parts  regarding  what  is 
meant  by  the  survey  question,  and  what  the  survey  question  is 
about (Cronbach, 1990). Surveys may have high construct
validity, meaning they may have been designed based on solid
research, but if they confuse the participants with their questions,
they  will  obviously  lack  internal  reliability,  and  consequently, 
predictive  power. 

The  most  commonly  used  measure  for  internal  reliability  of 
surveys is called “Cronbach’s Alpha Internal Reliability
Coefficient” (Cronbach, 1990). The coefficient relies on checking
whether participants respond to the same question the same
way when it is asked the second time, in a similar form. These
types  of  questions  are  called  “duplicate  questions.”  The  Cron¬ 
bach’s  Alpha  Coefficient  is  a  correlation  coefficient  that  deter¬ 
mines  the  correlation  between  the  duplicate  questions,  thereby 
giving  an  indication  of  whether  the  participants  have  the  same 
understanding  of  a  question  when  it  is  asked  in  a  slightly  differ¬ 
ent  way  more  than  once.  In  many  cases,  the  opposite,  inverted 
form  of  the  same  question  can  be  asked  later  in  the  survey.  An 
example  of  two  duplicate  questions  would  be  one  question 
early on in the survey such as “Did you find the web design
effective to complete the tasks?” and later, toward the end of the
survey, “Was the web design effective to complete the tasks?”
Alternatively, a question asking the same issue of web
effectiveness in a reversed manner can also be posed later in the survey
in  an  inverted  question  such  as  “Did  you  find  the  web  design 
ineffective  to  complete  the  tasks?” 

In  general,  one  or  two  duplicate  question  pairs  are  put  into 
surveys  of  moderate  size,  up  to  40  questions.  It  may  be  more 
helpful  to  insert  more  than  one  pair  of  duplicate  questions  into 
surveys  that  contain  more  than  40  questions.  Also,  if  the  survey 
has  sections  (for  example,  in  a  survey  measuring  web  usability, 
sections  may  include  usability  of  colors,  layout,  navigation,  etc.) 
it is recommended to have one duplicate pair of questions for
each of the sections, so that the internal reliability of each
section can be determined individually.

The  Cronbach’s  Alpha  coefficient  is  a  correlation  coefficient 
that  produces  a  value  between  zero  and  one.  The  correlation 
between  the  duplicate  questions  is  measured,  and  if  the  coeffi¬ 
cient  is  equal  to  or  greater  than  0.7,  then  a  survey  is  accepted 
as  having  high  internal  reliability  (Cronbach,  1990).  A  set  of  du¬ 
plicate  questions,  another  set  of  duplicate,  inverted  questions, 
and a sample Cronbach’s Alpha Coefficient computer output
from the SAS (Statistical Analysis System) computer package
are presented in Table 58.4.
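
For readers who want to reproduce such a check without a statistics package, the following Python sketch may help. It uses the standard variance-based formula for Cronbach's alpha, alpha = k/(k - 1) * (1 - (sum of item variances)/(variance of total scores)), where k is the number of items; this is the general textbook formula rather than a plain pairwise correlation, so the Pearson correlation of the duplicate pair is computed separately for comparison. The function names and the four-participant data set are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha from a list of per-question score lists.

    Each inner list holds one question's scores, in the same participant order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of four participants to a duplicate pair (made-up data).
question_1 = [1, 2, 3, 4]
question_2 = [2, 1, 4, 3]

r = pearson_r(question_1, question_2)
alpha = cronbach_alpha([question_1, question_2])
meets_threshold = alpha >= 0.7  # the 0.7 cutoff cited in the chapter
print(f"r = {r:.3f}, alpha = {alpha:.3f}, acceptable = {meets_threshold}")
```

Inverted questions would need their scales reversed before being passed to either function; otherwise the coefficient would come out misleadingly low or negative.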

Internal  reliability  cannot  be  measured  for  surveys  that  con¬ 
tain  open-ended  questions.  In  empirical  HCI  research,  however, 
most  surveys  with  quantitative  parts  are  required  to  have  a  sat¬ 
isfactory  internal  reliability  coefficient  in  order  to  prove  the  re¬ 
liability  and  validity  of  their  results.  Simply  put,  lack  of  internal 
reliability  may  result  in  the  questions  not  measuring  what  they 
are  intending  to  measure.  Therefore,  it  is  imperative  in  HCI  re¬ 
search  to  insert  at  least  one  pair  of  duplicate  questions  into 
quantitative  surveys. 

More controversial than the internal reliability measure, the
inter-rater reliability of a quantitative survey is concerned with
the consistency among responses given by different participants
to the same question. One argument is that in objective surveys,
some level of consistency should be expected among participant
responses given to the same question in the same survey.
While this argument may hold true to some extent in evaluation
surveys, opinion surveys, as the name indicates, are about
participants’ opinions, which will obviously differ from person to
person. Hence, it is arguable whether the inter-rater reliability
coefficient is a valid measure in opinion surveys. Additionally, a
certain amount of variability is always present among responses to
evaluation  survey  questions,  even  if  the  participants  are  all  ex¬ 
posed  to  the  exact  same  environment  prior  to  the  implementa¬ 
tion  of  the  surveys.  The  inter-rater  reliability  coefficient  is  a  cor- 
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TABLE  58.4.  Sample  Duplicate  Questions  and  Cronbach’s  Alpha  Internal  Reliability  Coefficient  SAS  Computer  Package  Output 

Duplicate  Pair  with  Same  Question: 


1.  How  would  you  rate  the  convenience  of  this  interface? 

o 

o 

o 

o 

o 

Very  Inconvenient 

Inconvenient 

Neutral 

Convenient 

Very  Convenient 

2.  In  general,  how  convenient  was  this  interface  for  you  to  use? 

o 

o 

o 

o 

o 

Very  Inconvenient 

Inconvenient 

Neutral 

Convenient 

Very  Convenient 

Duplicate  Pair  with  Inverted  Question  (Inversion  of  the  Scale  Needed  for  Second  Question): 

1.  How  easy  were  the  tasks? 

o 

o 

o 

o 

o 

Very  Difficult 

Difficult 

Not  Difficult 

Easy 

Very  Easy 

2.  How  difficult  were  the  tasks? 

o 

o 

o 

o 

o 

Very  Easy 

Easy 

Not  Difficult 

Difficult 

Very  Difficult 

Sample  SAS  Output: 


Variables  Alpha 

Raw  0.881884 

Standardized  0.882042 


Cronbach  Coefficient  Alpha  with  Deleted  Variable 


Deleted 
Variable 
Question  1 
Question  2 


Raw  Variables 
Correlation 

with  Total  Alpha 

0.717499 

0.717499 


Standardized  Variables 
Correlation 
with  Total  Alpha 
0.717499 
0.717499 


Pearson  Correlation  Coefficients,  N  =  272 

Prob  >  |r|  under  HO:  Rho  =  0 
Question  1  Question  2 
Question  1  1.00000  0.71750 

<.0001 

Question  2  0.71750  1.00000 

<.0001 


relation  coefficient  among  these  responses  given  to  the  same 
question  by  different  participants.  And  expecting  a  correlation 
as  high  as  0.7  among  the  participant  responses  may  in  most 
cases  be  not  very  realistic  as  a  proof  of  reliability  of  a  survey. 
Hence,  while  the  inter-rater  reliability  coefficient  is  used  in  a 
number  of  survey  types  primarily  in  psychology,  it  is  not  seen 
as  an  essential  measurement  coefficient  for  survey  reliability  in 
relation  to  HCI  research  (Ozok  &  Salvendy,  2000).  The  inter¬ 
rater  reliability  coefficient  is  also  used  to  determined  how  pro¬ 
fessionals  in  psychology  and  sociology  rate  the  same  situation, 
but  this  type  is  not  covered  in  detail  due  to  its  lack  of  relevance 
to  HCI. 
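
The two kinds of statistics reported in Table 58.4 (Cronbach's Alpha and the Pearson correlation between a duplicate pair) can be illustrated with a short sketch. It is not the SAS procedure that produced the table; the function names and the six made-up responses are assumptions for illustration only, and a 1-to-5 coding of the scale is assumed. The inverted question is rescored first, as the table heading requires.

```python
from statistics import mean, variance


def reverse_score(responses, scale_min=1, scale_max=5):
    """Invert the scale of an inversely worded duplicate (5 -> 1, 4 -> 2, ...)."""
    return [scale_min + scale_max - r for r in responses]


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of responses per question."""
    k = len(items)
    if k < 2 or len({len(col) for col in items}) != 1:
        raise ValueError("need at least two items with equal numbers of responses")
    sum_item_variances = sum(variance(col) for col in items)
    totals = [sum(row) for row in zip(*items)]  # per-participant total score
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


def pearson_r(x, y):
    """Pearson correlation coefficient between two response lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical responses from six participants (illustration only).
easy = [4, 5, 3, 4, 2, 5]        # "How easy were the tasks?"
difficult = [2, 1, 3, 2, 4, 2]   # "How difficult were the tasks?" (inverted scale)

difficult_aligned = reverse_score(difficult)
alpha = cronbach_alpha([easy, difficult_aligned])
r = pearson_r(easy, difficult_aligned)
print(f"alpha = {alpha:.3f}, r = {r:.3f}")
```

Forgetting the rescoring step would make a consistent participant look inconsistent, because agreement on an inverted pair shows up as opposite raw scores.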

Measuring the designed survey’s reliability is crucial to producing
useful results in HCI research. Therefore, comprehensive survey
designers must pay attention to these reliability measures while
designing their surveys, ensuring that the results obtained from
their surveys in current and future studies will have high
reliability, thereby improving the theoretical and practical impact
of their research.

Other survey evaluation issues. A large part of the evaluation of
surveys usually happens after they are administered. However, in
most cases an equally crucial evaluation happens just prior to the
implementation. This evaluation is the pilot testing of the survey,
sometimes also referred to as “pre-testing” (Dillman, 2000). As is
the convention with most experimentation techniques in HCI research,
the near-complete survey can be administered to a group of
participants. After this initial administration, the participants
can be asked about positive, negative, and missing issues in the
survey, and about any questions or sections that were
incomprehensible or unclear. Based on the feedback, the
administrators can revise the survey and prepare it for the final
administration. While there are no firm guidelines regarding the
number of participants for pilot testing, in most cases a minimum of
three participants is recommended for moderate-size surveys (fewer
than 200 participants). For large-size surveys, up to 10
participants are generally useful (Dillman, 2000), although there is
no upper limit for the number of participants in the pilot study.
Most surveys require revision after the pilot study, as in most
cases there are some points the survey designers miss without the
perspective of actual survey participants. In the rare cases when no
revisions are made after the pilot administration, data obtained
from the pilot participants can be included in the actual
participant data pool.

How well a survey is designed is directly related to the validity
and reliability of the results the research produces. Hence, the
evaluation techniques covered in this section are crucial to the
overall success of the designed survey and the research itself.

Survey  Design,  Redesign,  and  Revision 

Design, redesign, and revision procedures for surveys bear some
similarities to product design, redesign, and revision procedures.
The initial design of surveys, as explained earlier, consists of
generating questions based on the literature, expert opinion, and
heuristics. Redesign and revision procedures mostly rely on the
implementation of the survey on the entire group or a subgroup of
participants. In most cases after a pilot test, surveys need
revision, which consists of changing questions, scales, or
instructions to make them clearer and more understandable for the
participants. Errors in question structure and spelling errors are
also located and eliminated after the pilot study. In rare cases,
the required changes may be so significant that the survey needs
redesign through the revision of most questions, scales, and
instructions. Most small-scale survey revisions are based on the
feedback obtained from the pilot study.

Showing  a  certain  amount  of  similarity  to  consumer  prod¬ 
ucts,  frequently  used  surveys  also  need  redesign  and  revisions 
over  longer  periods.  Specifically  in  HCI,  user  habits  and  evalua¬ 
tion  criteria  for  technology  and  technology-related  products 
and  issues  change.  It  is  therefore  recommended  that  validated 
surveys  that  are  used  as  empirical  measurement  tools  in  HCI 
should  be  reevaluated  and  updated  about  once  a  year  to  en¬ 
sure  they  are  up-to-date  measurement  tools  and  contain  the 
latest  additions  to  the  HCI  area  as  far  as  evaluation  and  opinion 
elements  and  techniques  are  concerned. 

Illustrative  Survey  Design  Example 

Figure 58.3 presents a sample of a complete, generic
paper-and-pencil survey, using the example of a postexperimental
task satisfaction survey. In the design of a survey measuring Tablet
PC usability issues among academic professionals, for example, the
first step is to develop a literature portfolio. This portfolio
should cover literature on both mobile computer usability and Tablet
PC usability. Next, researchers may send an inquiry to a group of
academicians who use Tablet PCs, asking in open-ended questions
about major usability categories in relation to Tablet PCs. Based on
the input from literature and expert opinions, the researchers
create an initial set of questions and scales, pilot-test it, and
administer the survey, most likely in a controlled environment where
they give specific Tablet PC tasks to participants, followed by the
actual survey administration.

Challenges  in  Survey  Design 

Survey  design  challenges  mostly  deal  with  possible  mistakes 
in  producing  the  survey  questions  and  scales.  Additionally,  some 
problems  may  occur  due  to  the  questions  having  no  validity 
backing.  Therefore,  the  key  for  HCI  survey  researchers  is  to  gain 
a  strong  background  in  the  area  through  literature  and  expert
opinion  before  designing  the  surveys.  After  this  background  is 
gained,  researchers  are  likely  to  have  no  difficulty  designing  the 
surveys  with  the  appropriate  number  and  content  of  questions 
and  scales,  and  a  comprehensive  set  of  instructions  to  be  pre¬ 
sented  to  the  participants  on  how  they  should  fill  out  the  survey. 


SURVEY  IMPLEMENTATION  IN  HCI 


Survey  implementation  can  be  categorized  into  open  and  con¬ 
trolled  survey  implementation  environments.  In  this  section, 
these  two  environments  are  first  discussed,  followed  by  a  discus¬ 
sion  of  sample  representativeness  issues,  an  implementation  ex¬ 
ample,  and  discussions  of  implementation  challenges  and  emerg¬ 
ing  and  future  trends  in  survey  design  and  implementation. 

Open  vs.  Controlled  Implementation  Environments 

Survey  implementation  (also  referred  to  as  “survey  administra¬ 
tion”)  occurs  in  two  alternative  environments.  In  controlled  sur¬ 
vey  implementation  environments,  participants  fill  out  the  sur¬ 
vey  in  an  environment  specifically  designated  for  and  arranged 
according  to  their  activity.  An  open  implementation  environment,
on  the  other  hand,  has  no  environmental  features  arranged
specifically  for  the  participant  to  complete  the  survey.
Open  environments  also  mostly  do  not  include  any  restrictions 
in  time  or  other  factors. 

Controlled environments for implementation of surveys usually have
the goal of preventing any distraction that could hinder the
participant’s understanding or judgment. Controlled survey
environments are in most cases well-lit experimental rooms with
appropriate equipment to make the participant moderately comfortable
(often consisting of a chair and a table). Both computer and
paper-and-pencil-based surveys can be implemented in either open or
controlled environments. If the survey is implemented in a
controlled environment in front of a computer, the survey
implementers need to make sure that the computer’s setup (screen
brightness, glare, screen distance, keyboard height, and other
ergonomics issues) is optimized for the participant. In controlled
environments, in most cases a survey administrator is also available
to answer possible questions from the participant. Controlled survey
administration environments are usually used to implement evaluation
surveys. In most cases the participants have just completed
computer-based tasks, and a controlled environment has them evaluate
the interfaces or other HCI-related environments, products, or
procedures immediately, while the memories of the items to be
evaluated are still fresh in their minds. Opinion surveys are
generally not implemented in controlled environments. In some rare
cases in which recording participant behavior during survey
implementation is part of the experimentation, a controlled
environment can provide the equipment to nonintrusively record
participant behavior/activities during the implementation of the
survey.

GENERAL SATISFACTION SURVEY

Please indicate how you personally feel about performing the different tasks.

Each of the statements below is something that a person might say about performing a job like the
searching task through an interface. You are to indicate your own personal feelings about your work
experience with the different tasks you just completed by marking the number that most closely describes
how much you agree with each of the statements.

1. My opinion of myself went up when I performed the tasks correctly.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

2. Generally speaking, I am very satisfied with performing the tasks.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

3. The tasks I performed were very meaningful to me.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

4. I felt the current web page structure design is good enough for me to perform the tasks.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

5. The tasks were usually interesting enough to keep me from getting bored.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

6. My own feelings were not affected much one way or the other by how well I performed the different tasks.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

7. I am in general satisfied with the kind of work I performed in the different web-page tasks.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

8. Most of the things I had to do to perform the tasks seemed useless or trivial.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

9. I felt uncomfortable when I performed the tasks incorrectly.
   O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

10. I felt very satisfied with the accomplishment I got from performing the tasks.
    O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

11. I felt very satisfied with the amount of independent thought and action I could exercise in the tasks.
    O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

12. With current web page structure design, I felt difficult to perform the tasks efficiently and effectively.
    O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

13. I felt very satisfied with the amount of challenge in these tasks.
    O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

14. I felt very satisfied with the level of mental effort required to perform the tasks.
    O Strongly Disagree   O Disagree   O Moderately Disagree   O Neutral   O Moderately Agree   O Agree   O Strongly Agree

FIGURE 58.3. Sample of a paper-and-pencil satisfaction survey.

While surveys are in some cases implemented in controlled
environments, doing so may be costly and time-consuming. Therefore,
unless there is an explicit need for a controlled environment,
surveys are more commonly implemented in open environments. Open
environments are environments of the participants’ choosing, in most
cases environments from their daily life. In open environments, a
survey administrator is not present. Implementing surveys in open
environments has the advantage of letting the participants choose
the time and place of the implementation. This flexibility above all
increases the ease of finding participants. Additionally, the
freedom to fill out the survey at their convenience may increase
participants’ enthusiasm, thereby improving the accuracy of the
survey results (Dillman, 2000). On the other hand, filling out the
survey without a survey administrator present prevents the
participants from asking any questions regarding the survey during
the administration. Additionally, previous research has indicated
that if participants have no particular motivation to fill out the
survey, they may complete it very quickly without paying much
attention, resulting in inaccurate survey results (Cochran, 1977).

Both controlled and open survey implementation environments have
advantages and disadvantages. In most cases, however, open
environments are a faster and more convenient way to collect the
needed data due to the flexibility they offer to both the
participants and implementers.

Sample  Representativeness  in  Survey  Implementation 

To  ensure  the  validity  of  the  results  obtained  from  surveys,  it  is 
imperative  to  choose  a  representative  sample  of  the  target  pop¬ 
ulation  to  successfully  implement  the  survey.  It  is  common 
knowledge  that  the  validity  of  the  survey  results  improves  with 
larger  sample  sizes.  Therefore,  researchers  need  to  carefully 
choose  both  the  sample  sizes  and  the  sample  participants. 

There  are  no  strict  rules  for  determining  sample  sizes  in  sur¬ 
vey  implementation.  The  size  of  the  survey  sample,  meaning 
how  many  participants  should  fill  out  the  survey,  depends  on  the 
type  of  the  survey  and  survey  questions,  as  well  as  the  number  of 
questions  in  the  survey.  Thiemann  and  Kraemer  (1987)  summa¬ 
rized  the  statistical  methods  of  determining  sample  sizes  based 
on  the  number  of  variables  being  measured  in  the  experiment. 
In  most  cases,  surveys  measuring  general  topics  (for  example, 
surveys  about  cell-phone-use  habits  of  a  certain  population,  such 
as  college  students)  should  be  implemented  on  relatively  large 
sample  sizes,  possibly  no  less  than  60  participants  for  a  survey 
consisting  of  up  to  30  questions.  For  survey  implementation,  as 
a  rule  of  thumb  the  number  of  participants  should  always  be  big¬ 
ger  than  the  number  of  questions  in  the  survey.  While  there  are 
no  set  rules  on  choosing  sample  sizes  for  survey  implementation,
large  sample  sizes  always  improve  the  probability  of  obtaining
high  validity  of  surveys.  For  more  on  how  to  calculate  optimal
sample  sizes  in  survey  implementation  to  obtain  satisfactory
statistical  power,  see  Thiemann  and  Kraemer  (1987).

When surveys are administered, one of the most critical issues is to
administer the survey to a balanced, nonbiased, and representative
sample. In general, for surveys administered online, the survey
should be sent to as large a potential participant pool as possible
to ensure heterogeneity. A large group of potential participants
should also be sought if the surveys are paper-and-pencil-based, by,
for example, mailing out a large number of paper surveys. A sample
that is balanced in terms of race and gender ensures higher validity
as well as a broader application of the results, and this can be
achieved by sending the survey to a large base of participants. The
demographic information collected also helps determine whether the
participant sample accurately represents the target population. As
in any experimental or survey-based research, heterogeneity of the
sample allows the researchers to strengthen their argument that the
results of their study are applicable and generalizable to the
majority of their target population. For example, if males vastly
outnumber females in the sample while the gender distribution in the
actual target population is estimated to be about even, then the
validity of the results may be questioned, as the variation in the
results due to females would not be adequately taken into
consideration in the conclusions derived from the survey results.
Therefore, survey participant pools should be chosen carefully, and
they should adequately represent the target population. In large
samples especially, sample characteristics may vary greatly,
particularly concerning age, education, and occupation demographics.
In these cases, the demographics should be presented in detail as
part of the research results. In most cases, a detailed explanation
of demographics concerning participants’ education and occupation
can address the representativeness of survey results, as long as the
levels of these attributes do not differ very greatly between the
sample and the estimated target population. When they do differ
greatly, the report of the survey findings should indicate that the
findings are likely to apply to the particular segment of the
population that formed an overwhelming majority of the sample. For
example, suppose a survey on the cell phone habits of the general
population has been conducted. If the vast majority of the
participants (more than about two thirds) are university students,
then the researchers should indicate in their report of results that
they measured the cell phone habits of the university student
population, which constitutes a large percentage of avid cell-phone
users (Ozok & Wei, 2004).
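
The rules of thumb in this section can be turned into simple checks: more participants than questions, roughly 60 or more participants for a general-topic survey of up to 30 questions, and a narrower claim when one segment makes up more than about two thirds of the sample. The sketch below is a minimal illustration under those stated thresholds; the function names and the sample data are hypothetical, and the checks do not replace a statistical power analysis.

```python
from collections import Counter


def sample_size_warnings(n_participants, n_questions, general_topic=True):
    """Apply the sample-size rules of thumb; returns a list of warning strings."""
    warnings = []
    if n_participants <= n_questions:
        warnings.append("participants should outnumber the survey questions")
    if general_topic and n_questions <= 30 and n_participants < 60:
        warnings.append("general-topic surveys of up to 30 questions call for "
                        "about 60 or more participants")
    return warnings


def group_shares(values):
    """Share of the sample falling into each demographic group."""
    counts = Counter(values)
    total = len(values)
    return {group: count / total for group, count in counts.items()}


def dominant_group(values, threshold=2 / 3):
    """Return the group whose share exceeds the threshold, or None."""
    for group, share in group_shares(values).items():
        if share > threshold:
            return group
    return None


def share_gaps(values, population_shares):
    """Sample share minus estimated population share for each group."""
    shares = group_shares(values)
    return {group: shares.get(group, 0.0) - expected
            for group, expected in population_shares.items()}


# Hypothetical sample of 100 participants (illustration only).
occupation = ["student"] * 70 + ["employee"] * 30
gender = ["male"] * 80 + ["female"] * 20

print(sample_size_warnings(n_participants=100, n_questions=30))
print(dominant_group(occupation))
print(share_gaps(gender, {"male": 0.5, "female": 0.5}))
```

A non-empty result from `dominant_group` does not invalidate the data; in line with the text, it signals that the findings should be reported as applying to that segment.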

Survey  participant  pools  require  caution  when  they  are  cho¬ 
sen,  and  in  cases  when  the  researchers  are  convinced  that  the 
sample  is  not  highly  representative  of  the  target  population,  they 
need  to  make  clear  that  the  results  of  their  survey  may  possibly 
have  a  narrower  focus.  In  most  cases,  such  narrowing  of  the  tar¬ 
get  population  which  the  research  is  aimed  at  does  not  result  in 
validity  problems,  but  rather  makes  clear  which  population  or 
population  segment  the  results  of  the  survey  study  apply  to. 

Challenges  in  Survey  Implementation 

As  indicated  in  previous  sections,  surveys  are  a  relatively  easy  way 
of  collecting  data.  They  usually  don’t  require  expensive  equip¬ 
ment  for  implementation,  and  with  the  advent  of  the  Internet, 
can  be  easily  distributed,  filled  out,  returned,  and  analyzed.
However,  there  are  still  some  serious  challenges  in  survey
implementation  in  both  paper-and-pencil  and  electronic  environments.

Looking at the big picture, surveys are sometimes referred to as
“subjective measurements.” While the author of this chapter strongly
disagrees with this statement, a distinction should be made between
measurement environments where performance measurements are taken
objectively and unequivocally through camera recordings and
software, and environments where participants are asked to indicate
what is going through their minds. In the latter environment, there
is no way to ensure that participants are putting on the surveys
exactly what they think about an issue, an environment, a tool, a
product, or a procedure. It is not uncommon for participants to fill
out a survey without paying much attention, or even to mark
responses randomly without reading the questions. The
survey-reliability measurement techniques to some extent prevent
this type of random data from being used. For example, whether
participants took the survey questions seriously can be determined
by looking for discrepancies between responses to the duplicate
questions used for Cronbach’s Alpha. Additionally, strict
instructions given to the participants at the beginning of the
survey in written or spoken format can to some extent improve the
probability of participants taking the time to read the questions
and reply carefully. Researchers using surveys as their primary
research tool always reserve the right to eliminate participant data
that look suspect or incomplete. However, the researchers need to
have evidence that the participant did not complete the survey
according to the rules that were presented, and must not eliminate
data on any other ground, such as the participant’s responses not
being in accordance with the majority of other participants or with
the direction of results expected from the research.
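
Looking for discrepancies between duplicate questions can be sketched as a per-participant screen. The code below is a minimal sketch under assumptions not stated in the chapter: a 1-to-5 response coding, a tolerated gap of one scale point (`max_gap`), and hypothetical question and participant identifiers. Flagged participants are candidates for review; the gap serves as documented evidence of inattentive completion, not as grounds for automatic removal.

```python
def flag_inconsistent(responses, duplicate_pairs, max_gap=1, scale_min=1, scale_max=5):
    """Return ids of participants whose duplicate answers differ by more than max_gap.

    responses: {participant_id: {question_id: rating}}
    duplicate_pairs: list of (first_question, second_question, inverted) tuples,
        where inverted=True means the second question uses a reversed scale.
    """
    flagged = []
    for participant, answers in responses.items():
        for first, second, inverted in duplicate_pairs:
            second_value = answers[second]
            if inverted:
                # Bring the inverted item back onto the first item's scale.
                second_value = scale_min + scale_max - second_value
            if abs(answers[first] - second_value) > max_gap:
                flagged.append(participant)
                break  # one discrepant pair is enough to flag for review
    return flagged


# Hypothetical data: q3/q9 are a same-wording pair, q5/q12 an inverted pair.
responses = {
    "P01": {"q3": 4, "q9": 4, "q5": 5, "q12": 1},
    "P02": {"q3": 1, "q9": 5, "q5": 2, "q12": 4},
    "P03": {"q3": 2, "q9": 3, "q5": 4, "q12": 5},
}
duplicate_pairs = [("q3", "q9", False), ("q5", "q12", True)]

print(flag_inconsistent(responses, duplicate_pairs))
```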

Another  challenge  is  the  “return  rate”  of  surveys.  Response
and  return  rates  among  surveys  that  do  not  offer  any  compensation
are  extremely  low,  less  than  20%  (Dillman,  2000).  HCI  research
may  involve  lengthy  surveys  as  well  (more  than  30  questions),
which  can  potentially  result  in  even  lower  return  rates.
Therefore,  it  is  recommended  that  some  sort  of  compensation 
should  be  offered  to  participants  in  survey  research,  whatever 
the  resources  will  allow.  This  compensation  may  be  small  gifts 
(e.g.,  a  pen  or  notepad),  gift  certificates,  or  cash  compensation. 
Any  of  these  types  of  incentives  will  surely  improve  the  return 
rates  of  surveys  (Miller,  1996).  Additionally,  surveys  can  contain 
statements  to  convince  participants  that  they  will  also  learn  im¬ 
portant  issues  concerning  the  research  while  filling  out  the  sur¬ 
veys.  It  should  be  noted,  however,  that  the  practical  issue  of 
finding  subjects  should  not  bias  the  sample,  and  recruiting  of 
participants  should  be  arranged  according  to  the  data-collection 
needs  of  the  research,  not  according  to  what  kind  of  participant 
groups  are  the  most  practical  to  collect  data  from.  Recruitment 
activities  need  to  be  targeted  to  ensure  a  representative  sample. 
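
The practical consequence of a low return rate is easiest to see as arithmetic: the number of surveys to distribute is the target number of completed surveys divided by the expected return rate, rounded up. The helper below is a hypothetical planning aid, not a method from the chapter; the rate is passed as a whole-number percentage so the rounding is exact.

```python
def surveys_to_distribute(target_responses, return_rate_percent):
    """Smallest number of surveys to send so that the expected returns reach the target."""
    if not 0 < return_rate_percent <= 100:
        raise ValueError("return_rate_percent must be in (0, 100]")
    # Integer ceiling of target_responses / (return_rate_percent / 100).
    return (target_responses * 100 + return_rate_percent - 1) // return_rate_percent


# With the 20% upper bound cited for uncompensated surveys, 60 completed
# surveys require at least 300 distributed ones.
print(surveys_to_distribute(60, 20))
```

Because the cited return rates are below 20% and fall further for lengthy surveys, the result should be read as a lower bound on the distribution size.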

In most implementation activities, participants should be given
their privacy while filling out the survey, assured that their data
will be kept confidential, and provided a comfortable environment.
Otherwise, they may want to either quit or finish as soon as
possible without any consideration of the accuracy of responses. It
should also be noted that surveys are voluntary, and therefore
survey implementers should indicate the voluntary nature of the
surveys and not pressure the participants. Research indicates that
when participants are pressured to give accurate responses, or when
mutual trust between the administrator and the participant is not
established regarding the sincerity of both sides, they mostly
produce very unreliable survey results (Dillman, 2000).

Other  survey  implementation  challenges  involve  partici¬ 
pants’  interaction  with  the  survey  interface.  In  paper-and-pencil 
surveys,  the  fonts  on  the  paper  should  have  enough  size  and  fa¬ 
miliarity  for  all  participants,  a  pencil  with  an  eraser  head  will  al¬ 
low  participants  to  correct  their  responses,  and  survey  elements 
such  as  questions  and  scales  should  be  adequately  distinct  from 
each  other  to  prevent  any  mistakes.  Survey  designers  should  use 
a  very  simple  language  in  the  surveys  and  avoid  any  little-known 
words  and  sentence  structures  (Gendall,  1998).  Surveys  con¬ 
sisting  of  multiple  pages  should  be  clearly  numbered  on  each 
page.  This  kind  of  a  convenient  interaction  environment  will 
improve  participant  enthusiasm  and  increase  response  rates  as 
well  as  accuracy  of  survey  results.  Additionally,  while  some  stud¬ 
ies  expressed  concern  about  differences  in  survey  responses 
among  computer  and  paper-and-pencil  surveys  (Sanchez,  1992; 
Cole,  2005)  and  issues  concerning  the  format  in  which  online 
surveys  are  presented  (Couper,  Traugott,  &  Lamias,  2001;  Couper, 
Traugott,  Conrad,  &  Crawford,  2004;  Kiernan,  Kiernan,  Oyler,  & 
Gilles,  2005),  a  recent  study  indicated  that  the  accuracy  of  survey
responses  did  not  significantly  differ  between  surveys  administered
online  and  those  administered  on  paper  (Carini,
Hayek,  Kuh,  Kennedy,  &  Ouimet,  2003).

In electronic surveys, the computer interface should be very simple,
and participants with little computer experience should not have any
difficulty using it. In most cases, participants are required to
mark their responses with a mouse and type their responses into
clearly marked, designated text fields that are large enough to be
easily visible. Some special equipment, such as touch screens, may
offer additional convenience. Screen glare and font sizes should be
given consideration. It can be recommended that computer surveys be
implemented on screens no smaller than 12 inches in diagonal size,
with a refresh rate of at least 60 Hz. For surveys longer than one
screen, scrolling should be minimized. It is recommended that each
survey screen not require more than two screen-heights of scrolling,
and that screens be connected with hyperlinks or screen buttons,
meaning that once a participant has completed a screen, he or she
moves on to the next screen by clicking on a screen button or a
link. Besides the scrolling issue, if a participant sees a lengthy
survey all presented on one screen, he or she may be discouraged
from filling out the survey due to its length.

An  additional  potentially  problematic  item  is  the  number  of 
questions  to  ask  participants  in  a  survey.  In  most  cases,  the  at¬ 
tention  span  of  participants  is  very  short,  and  surveys  that  do  not 
offer  any  compensation  are  recommended  to  be  shorter  than 
30  questions.  In  most  cases,  participants  are  not  interested  in 
spending  more  than  15  minutes  in  filling  out  surveys  for  which 
they  don’t  get  any  compensation.  There  is  always  a  tradeoff  be¬ 
tween  the  size  of  surveys,  meaning  the  ability  to  collect  all  the 
necessary  data,  and  the  ability  to  recruit  subjects.  Long  surveys 
are  more  difficult  to  recruit  participants  for.  Researchers  should 
think  carefully  about  compensation  methods  (money,  gifts,  gift 
certificates)  if  they  intend  to  implement  large-scale  surveys. 

Finally, Internet surveys also carry the potential of technical
difficulties due to the variety of computers the participants may be
using. Schleyer and Forrest (2000) identified usability problems,
programming errors, and incompatibilities/technical problems as the
main problems in web-administered surveys. Therefore,
Internet-based, and especially web-based, surveys should not require
any scripts or plug-ins to run and if possible should consist of
simple Hypertext Markup Language (HTML) code, as HTML is recognized
by all browsers.

These are the major challenges that survey implementers currently
have to deal with. However, with careful design and implementation,
and with strict instructions that give participants comprehensive
information about the survey, these challenges can be easily
overcome, resulting in valid and reliable survey results.


EMERGING  AND  FUTURE  TRENDS  IN  SURVEY 
DESIGN  AND  IMPLEMENTATION  IN  HCI 


Surveys  have  retained  their  structure,  more  or  less,  for  many 
decades.  It  can  be  said,  however,  that  electronic  and  especially 
Internet-based  implementation  has  changed  the  convenience 
level  of  survey  implementation  and  data  analysis  in  a  signifi¬ 
cantly  positive  way.  It  is  difficult  to  predict  whether  any  new 
groundbreaking  techniques  will  cause  further  leaps  in  survey 
design, development, and analysis, but if significant new developments
occur in the near future, they are likely to be in
the implementation technology. Internet surveys are on the rise,
and their share of all surveys has grown over the past five
years. While Internet-based surveys
comprised  15%  of  all  surveys  implemented  in  1999,  this  number 
increased  to  70%  in  2004,  according  to  Nua  Internet  Surveys 
(2005).  With  the  improvement  in  voice  recognition  and  voice 
synthesis technologies, future surveys may eliminate the need
for a visual interface; however, human-factors issues in these
types  of  interfaces  specifically  for  surveys  are  still  to  be  explored. 
It  is  apparent  that  the  number  of  surveys  implemented  through 
the  Internet  and  other  networks  will  continue  to  climb  in  the 
years  to  come,  due  to  cost  savings  and  a  number  of  other  con¬ 
venience  issues.  In  the  future,  HCI  research  is  also  likely  to  con¬ 
tinue  to  use  surveys  as  a  main  data-collection  tool.  With  HCI  re¬ 
search  becoming  a  more  integral  part  of  technology  design  and 
creation  (for  example,  human-factors  engineers  and  software 
engineers  working  collaboratively  in  the  software-design  pro¬ 
cess),  user  surveys  may  become  more  integrated,  collecting 
data  for  current  or  future  users  regarding  both  HCI  and  other 
technology  issues.  Additionally,  computer  literacy  is  increasing 
at  a  fast  pace  (U.S.  Census  Bureau,  2005),  which  allows  HCI  sur¬ 
vey  researchers  more  freedom  in  asking  more  sophisticated 
questions  concerning  interface  evaluation,  current  user  trends, 
and  more.  All  that  said,  the  contribution  of  surveys  to  HCI  re¬ 
search  is  highly  significant  these  days,  and  will  likely  continue  to 
be  so  in  the  many  years  to  come. 


USABILITY INSPECTION METHODS


Usability  inspection  methods  (UIMs)  remain  an  important  dis¬ 
count  method  for  usability  evaluation.  They  can  be  applied  to 
any  designed  artifact  during  development:  a  paper  prototype,  a 
storyboard, a working prototype (e.g., in Macromedia Flash™ or
in Microsoft PowerPoint™), tested production software, or an
installed  public  release.  They  are  analytical  evaluation  methods, 
which  involve  no  typical  end  users,  unlike  empirical  methods 
such  as  user  testing.  UIMs  only  require  availability  of  a  designed 
artifact  and  trained  analysts.  Thus,  evaluation  is  possible  with 
low  resources  (hence  discount  methods).  Although  risks  arise 
from  low  resources,  well-informed  practices  disproportionately 
improve  analyst  performance,  improving  cost-benefit  ratios. 
This chapter introduces UIMs, covering six example methods and
one further method, and provides approaches to assessing existing,
emerging, and future UIMs and their effective uses.

Definitions 

A  UIM  is  an  analytic  evaluation  method  that  is  applied  directly  to 
an  interaction  design  artifact.  Direct  application  contrasts  UIMs 
with  model-based  methods,  which  are  indirectly  applied  via  de¬ 
sign  representations  (or  models),  requiring  construction  of 
models,  and  secondary  application  of  analyses  to  designed  arti¬ 
facts.  Using  the  GOMS  method,  for  example,  a  task  model 
would  be  analyzed,  and  the  results  would  have  to  be  reframed 
to  address  the  actual  design  artifact.  In  contrast,  UIMs  such  as 
heuristic  evaluation  (discussed  in  a  later  section)  directly  iden¬ 
tify  design  features  that  may  cause  user  difficulties. 

The  Need  and  Rationale  for  UIMs 

All  evaluation  methods  aim  to  find  relevant  usability  problems 
and  to  support  understanding  and  fixing  them.  Systems  con¬ 
tain  potential  causes  of  user  difficulties  that  are  only  exposed 
within  certain  usage  scenarios.  Analytical  methods  must  simu¬ 
late  (fragments  of)  such  scenarios.  User-based  methods  remain 
more  reliable  because,  once  selected  and  briefed,  individual  test 
participants  generate  many  complete  usage  scenarios.  Hence, 
user  testing  can  expose  problems  through  user  behaviors  that 
are  very  hard  to  anticipate.  However,  testing  only  reveals  a  wide 
possible  range  of  usability  problems  if  users  stress  the  system 
with  complex  interactions  across  all  features.  It  is  thus  possible 
to  anticipate  user  difficulties  that  may  not  emerge  during  test¬ 
ing,  unless  specific  efforts  are  made  to  flush  out  all  predicted 
problems  (Woolrych,  Cockton,  &  Hindmarch,  2004). 

There  is  thus  no  absolute  effectiveness  or  quality  ranking  of 
analytical  and  empirical  methods.  Where  resources  allow,  one 
should  combine  both  approaches;  where  resources  are  limited, 
it  may  only  be  possible  to  apply  UIMs.  Thus,  heuristic  evaluation 
was  the  most  used  usability  method  in  a  paper  questionnaire- 
based  survey  of  111  usability  professionals  in  1999  (Rosenbaum, 
Rohn,  &  Humburg,  2000).  However,  a  more  recent  web  survey 
of 83 usability professionals (Venturi, 2004) now ranks inspection
methods (38% of respondents use them during design) well below
discount user testing (quick and dirty usability tests, used by
53%), in contrast to 70% and 65% respectively in the 1999 survey.
Although surveys must be compared with great caution,
inspection  methods  may  no  longer  be  the  predominant  method 
in  usability  practice. 

A  possible  shift  from  analytical  to  empirical  methods  may  be 
due  to  excessive  pessimism  about  UIMs’  weaknesses  relative  to 
other  methods.  Our  view  is  that,  used  with  care,  inspection 
methods  remain  a  key  technique  in  the  usability  toolbox:  “The 
challenge  is  to  improve  all  HCI  methods,  so  that  discount  meth¬ 
ods  are  less  discounted  and  ‘full  strength’  methods  can  be  ap¬ 
plied  in  more  contexts”  (Cockton  &  Woolrych,  2002,  p.  29). 

The  most  appropriate  use  of  inspection  methods  is  to  drive 
design  iterations,  rather  than  use  for  summative  evaluation, 
benchmarking,  or  competitor  analysis.  UIMs  are  cheap  to  apply 
and  are  seen  as  low  cost  and  low  skill.  They  have  been  applied 
to  many  commercial  designs  by  practitioners.  UIMs  can  be  used 
before  a  testable  prototype  has  been  implemented  and  can  be 
iterated  without  exhausting  or  biasing  a  group  of  test  partici¬ 
pants.  UIMs  can  also  be  a  planning  resource  for  user  testing, 
which  can  be  designed  to  focus  on  predicted  problems.  Also, 
the  value  of  UIMs  as  training  devices  should  not  be  underesti¬ 
mated.  In  contrast,  model-based  approaches  remain  limited  or 
immature,  are  expensive  to  apply,  and  use  is  mostly  restricted  to 
research  teams  (e.g.,  Salvucci,  Zuber,  Beregovaia,  &  Markley, 
2005). Similarly, development resources may rule out user testing,
leaving UIMs as the only possible approach, and some usability
is always better than no usability.

UIMs  remain  important  and  their  weaknesses  can  be  miti¬ 
gated.  The  main  risks  are  missing  serious  usability  problems, 
and  wasting  development  effort  through  poor  fixes  to  mispre¬ 
dicted  nonproblems  (false  positives).  Different  business  models 
make  different  demands,  thus  UIM  errors  are  more  costly  in 
some  development  contexts  than  others  are.  In  some  contexts, 
successful  predictions  may  always  be  worthwhile  despite  a  flood 
of  false  positives,  as  fixed  problems  translate  into  savings  on 
support costs and attractive new features. When users buy software,
most must struggle on, unlike visitors to free e-commerce
sites, yet mainstream UIMs do not cover their complete user
experience, especially for affective issues such as trust, comfort,
and brand image. Hence, user testing is vital to eliminate
any severe problems; otherwise, money will be lost.

The  Logical  Structure  of  Usability  Inspection 

Well-informed  usage  of  UIMs  is  essential  to  their  effective  ap¬ 
plications.  This  can  be  based  on  an  understanding  of  UIM  us¬ 
age,  of  where  methods  can  fail,  and  how  this  can  be  avoided.  A 
high-level  structure,  which  can  be  readily  inferred  for  usability 
inspection,  supports  such  an  understanding.  It  is  a  logical  rather 
than  a  temporal  structure.  Its  phases  are  logically  distinct  but 
overlap  during  inspection.  Inspection  is  essentially  a  search 
problem.  Analyst  behavior  is  close  to  the  generate  and  test  strat¬ 
egy  from  early  artificial  intelligence  research,  where  possibilities 
are  first  generated  (e.g.,  moves  in  a  chess  game)  and  then  tested 
to  identify  possibilities  with  specific  attributes  (e.g.,  the  best  next 
move). Note that the role of evaluation does not extend to fixing
problems, even though methods such as RITE (Medlock,
Wixon, McGee, & Welsh, 2005) do place fixing problems within
scope. RITE is structured around three questions: Is it a problem?
Do we understand it? and Can we fix it? Only the first question
and some of the second are within scope of evaluation
methods.  The  rest  of  RITE  (really  understanding  and  fixing  prob¬ 
lems)  is  an  iteration  method  with  (empirically)  grounded  re¬ 
design  decisions,  based  on  all  relevant  project  information. 

The  first  step  is  for  analysts  to  study  a  UIM.  Then,  each  time 
a  UIM  is  applied,  analysts  must  study  the  target  design  in  the 
form  provided.  These  preparation  steps  are  followed  by  two  log¬ 
ical  generate  and  test  phases  of  UIM  application.  We  can  thus 
model  usability  inspection  as  having  three  distinct  logical 
phases:  (a)  analyst  preparation,  (b)  candidate  problem  discov¬ 
ery,  and  (c)  confirmation  or  elimination  of  candidate  problems. 
We  now  review  each  logical  phase. 

Analyst  Preparation 

Analysts  must  first  understand  a  UIM  (and  continually  revisit 
this  understanding).  They  must  then  understand  a  design  and  its 
expected  usage  context.  Gaps  in  any  understandings  of  the  UIM 
and  a  design  translate  into  poor  predictions  (and  thus  flawed  de¬ 
sign  change  recommendations). 

UIMs  are  still  developing  and  have  yet  to  mature.  Analysts  are 
thus  always  researchers  who  have  much  to  contribute  to  UIM 
development.  The  oldest  UIM,  guidelines  conformance,  is  over 
30  years  old,  but  no  guideline  set  has  survived  (unchanged). 
The  most  structured  UIM,  cognitive  walkthrough,  is  much 
younger  and  changed  often  in  its  first  four  years,  but  has  been 
largely  undeveloped  since  then,  with  some  exceptions  (e.g., 
Spencer,  2000). 

Such  short  and  turbulent  histories  give  analysts  little  solid 
to  draw  on  when  learning  UIMs.  Methods  tend  to  be  reported  in 
short  conference  papers,  with  the  occasional  journal  paper  or 
a  chapter  in  a  student  text  or  practitioner’s  book.  Unpublished 
tutorial  notes  are  often  the  best  sources.  In  reality,  evaluators 
must  consult  a  small  set  of  sources  and  then  work  out  how  to 
apply  a  UIM,  taking  into  account  any  available  assessments. 

Novice  analysts  should  practice  UIMs  on  familiar  systems  and 
usage  contexts  to  establish  quickly  their  scopes  and  accuracies 
with  a  UIM.  Once  analysts  are  confident  that  they  understand  a 
UIM,  they  can  then  apply  it,  but  they  must  be  well  briefed  about 
designs  and  their  intended  usage  contexts.  Without  knowledge 
of  the  latter,  analysts  must  guess  how  and  when  potential  prob¬ 
lems  would  really  be  severe  or  find  features  and  behaviors  that 
are  so  incompatible  with  general  human  capabilities  that  usage 
contexts  become  irrelevant.  Not  surprisingly,  relatively  few  de¬ 
velopers  are  so  unaware  of  human  limitations  that  such  extreme 
features  get  introduced — for  example,  the  authors  have  only 
once  encountered  pale  lemon  text  on  a  white  background.  Typ¬ 
ical  usability  problems  tend  to  be  more  subtle  and  closely  re¬ 
lated  to  contextually  specific  factors,  such  as  user  knowledge 
(e.g.,  of  similar  systems),  semiotic  expectations  (e.g.,  expected 
meanings  of  system  elements),  domain  factors  (e.g.,  number 
formats  for  accounting),  human-activity  structures  (e.g.,  you 
do  not  pay  before  you  buy),  and  resources  (e.g.,  there  simply  is 
not  time  available  for  a  task  like  that).  Such  knowledge  of  ex¬ 
pected  usage  contexts  lets  analysts  make  sound  judgments 
about  a  problem’s  likely  severity,  persistence,  or  frequency. 

Unfortunately, no UIM integrates contextual information and
its understandings of users and tasks, according to Cockton and
Woolrych (2002). Fortunately, HCI professionals apply contextual
research (Venturi, 2004), even though some may not accept
that  contextual  methods  remain  separate  from  UIMs  (Manning, 
2002).  Not  every  UIM  user  will  apply  them  unprompted.  UIM  ex¬ 
pertise  will  thus  largely  depend  on  analysts’  existing  skills  and 
knowledge, unless extended by tutorial material with self-assessment,
as in our studies (Lavery, Cockton, & Atkinson, 1996a,
1996b, 1996c; Lavery & Cockton, 1997a).

Lastly,  an  analyst  must  know  about  and  understand  the  sys¬ 
tem  they  are  inspecting.  Some  UIM  developers  disagree,  pre¬ 
ferring  to  keep  analysts  in  untainted  ignorance.  They  aim  to  in¬ 
duce  user  empathy,  but  this  can  only  be  truly  grounded  in 
contextual  research.  Ungrounded  beliefs  about  users  and  us¬ 
age  contexts  can  be  very  unreliable  (and  often  insult  real  users’ 
intelligence).  If  analysts  begin  inspection  ignorant  of  both  the 
usage  context  and  the  system  being  evaluated,  then  the  result 
is not better empathy with the user but incorrect claims that features
are missing or beyond users. While there may be evidence
of usability problems in such analyst errors, they are still errors.
Properly  informed  analysts  are  more  likely  to  note  that  a  fea¬ 
ture  is  hard  to  find  (rather  than  absent)  or  that  a  design  ratio¬ 
nale  has  overlooked  some  contextual  criteria.  Such  problem 
predictions  are  far  preferable  to  bogus  ones,  which  are  major 
risks  in  UIM  usage. 

Candidate Problem Discovery

Analysts  will  only  be  well  placed  to  find  possible  problems  if 
they  adequately  understand  a  UIM,  a  system  and  its  application 
domain,  its  design  rationales,  and  its  expected  context  of  use. 
They  can  then  use  a  UIM  to  discover  possible  usability  prob¬ 
lems.  UIM  usage  takes  two  distinct  forms.  The  predominant  one 
is  rule-based  conformance  assessment,  where  a  usable  system 
will  adhere  to  a  set  of  design  rules.  The  other  is  the  walk¬ 
through,  where  a  usable  system  is  one  where  no  likely  user  dif¬ 
ficulties  emerge  during  step-by-step  analysis  of  hypothetical 
interactions. 

Rule-based  methods  operate  at  different  levels  of  abstraction. 
The  most  abstract  detect  breaches  of  principles  that  express 
properties  of  systems  and  interaction.  Often  known  as  “-ities,” 
such principles include browsability, predictability, and flexibility.
The least abstract methods look for breaches of style rules that
prescribe design decisions at a level of detail such as the structure
and content of window title bars. In between principles and
style  guides  are  heuristics  (less  abstract  than  principles)  followed 
by  guidelines  (more  abstract  than  style  guides  are). 

Heuristics  and  guidelines  can  be  expressed  with  varying 
specificity  over  a  range  of  abstraction.  Heuristics  as  rules  of 
thumb  tend  to  be  general,  but  can  be  specific  (e.g.,  provide 
clearly  marked  exits).  Guidelines  range  from  specific  design  in¬ 
structions  to  general  wisdom  (e.g.,  know  the  user). 

Fig.  59.1  relates  the  five  main  UIM  types  to  each  other  in 
terms of level of abstraction (vertical axis, top is high) and focus
(horizontal axis, middle is interaction centered, left is system
centered, right is user centered). The range of abstraction
(height) mostly determines the coverage (area) for each method
type, but it can also be extended by range of focus (width). Letters
represent UIM types: P (principles), H (heuristics), G (guidelines),
S (style), R (rules), and W (walkthrough).

FIGURE 59.1. Relationship of types of inspection-based method
by level of abstraction and focus.

The rule-based types overlap with heuristics spanning abstraction levels within
principles  and  guidelines,  and  guidelines  overlap  with  style 
rules.  Heuristics,  guidelines,  and  style  rules  can  all  be  special¬ 
ized  for  particular  application  domains,  and  thus  they  span  from 
almost  wholly  system  centered  towards  a  balance  of  system  and 
interaction  centered. 

The  level  and  range  of  abstraction  of  UIMs  create  different 
generic  discovery  tasks  for  analysts.  More  abstract  rules  require 
evaluators  to  narrow  rules  down  to  an  instance,  as  well  as  the 
opposite,  generalizing  from  an  instance  to  a  rule  to  detect 
breaches. Generalizing was performed more readily and confidently
in a pilot study on problem matching (Lavery, Cockton,
& Atkinson, 1997). However, generalizing search tasks must proceed
feature by feature, scanning (all) rules for one that a feature
breaks.  In  contrast,  narrowing  discovery  tasks  proceed  rule  by 
rule,  scanning  (all)  features  to  see  if  any  break  them.  Decision 
making  thus  becomes  quicker  as  rules  become  less  abstract, 
since  less  translation  from  rules  to  features  is  needed.  However, 
more  detail  increases  rule  sets,  enlarging  the  search  space:  not 
only  must  analysts  search  system  features,  but  they  must  also 
search  large  rule  sets  for  rules  that  features  could  breach. 
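
The two discovery tasks described above can be sketched as two loop orders over the same rule and feature sets. This is an illustrative sketch only: representing a rule as a name plus a breach-testing predicate, and a feature as a dictionary, are assumptions made for the example and are not part of any published UIM.

```python
from typing import Callable

# Hypothetical representation: a rule is (name, predicate), where the
# predicate returns True when a feature breaches the rule.
Rule = tuple[str, Callable[[dict], bool]]


def narrowing_search(rules: list[Rule], features: list[dict]) -> list[tuple[str, str]]:
    """Rule by rule: scan all features to see if any break the rule."""
    return [(name, f["id"]) for name, breaches in rules
            for f in features if breaches(f)]


def generalizing_search(rules: list[Rule], features: list[dict]) -> list[tuple[str, str]]:
    """Feature by feature: scan all rules for one that the feature breaks."""
    return [(name, f["id"]) for f in features
            for name, breaches in rules if breaches(f)]


# Illustrative (invented) rules and features.
rules = [
    ("clearly marked exits",
     lambda f: f.get("kind") == "dialog" and not f.get("has_cancel", True)),
    ("error prevention",
     lambda f: f.get("kind") == "file_input" and f.get("free_text", False)),
]
features = [
    {"id": "save_dialog", "kind": "dialog", "has_cancel": False},
    {"id": "open_file", "kind": "file_input", "free_text": True},
    {"id": "print_dialog", "kind": "dialog", "has_cancel": True},
]

# Both orders visit every (rule, feature) pair, so an exhaustive search costs
# len(rules) * len(features) judgments whichever way it is organized.
pairs_checked = len(rules) * len(features)
```

Either loop finds the same candidate breaches when run exhaustively; the point of the text is that human analysts cannot run either loop exhaustively once rule sets and feature sets grow large.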

If  the  preceding  is  a  little  daunting  and  confusing,  then  con¬ 
sider  usability  analysts.  Clearly,  parallel  systematic  searches  of 
large  system  and  method  spaces  are  cognitively  impossible,  and 
thus  analysts  must  develop  other  discovery  methods  to  isolate 
possible  problems.  Ironically,  analysts  using  usability  heuristics 
must  develop  further  discovery  heuristics  to  apply  them.  The 
quality  and  application  of  discovery  heuristics  (or  methods)  de¬ 
termines  whether  analysts  will  find  or  miss  possible  usability 
problems. The latter is a major source of risk in UIM usage, along
with misprediction of bogus problems.

Confirmation  and  Elimination  of  Candidate  Problems 

Analytical  methods  arrive  at  conclusions  by  calculation,  de¬ 
duction,  or  judgment  (and  unfortunately  wild  guesses  and  imag¬ 
ination).  Having  discovered  a  possible  usability  problem,  analy¬ 
sis  is  required  either  to  confirm  it  as  probable  or  to  eliminate  it 
as  improbable.  Empirical  methods  arrive  at  conclusions  through 
observation,  analysis,  and  inference.  Empirical  evaluations  do 
not rely on predictive theories but simply see what happens
during usage. Analytical methods thus rely critically on the quality
of their underlying theories of HCI, which are explicit for
model-based  methods  but  tacit  for  inspection-based  methods. 

Just  as  the  quality  of  discovery  methods  determines  whether 
analysts  will  find  or  miss  possible  usability  problems,  so  the  qual¬ 
ity  and  application  of  analysis  resources  determines  whether 
analysts will correctly confirm or eliminate them. For analysis,
knowledge of expected contexts of use, human capabilities, and
key properties of HCI is critical. Contextual information may
eliminate a possible problem as improbable. Knowledge of display-based
interaction may eliminate possible problems that
overlook  users’  abilities  to  discover  information  and  to  explore 
interactive  behaviors.  Knowledge  of  human  capabilities  such  as 
visual  attention  may  either  confirm  possible  problems  (e.g.,  key 
information  in  the  wrong  place)  or  eliminate  them  (e.g.,  mis¬ 
leading  information  in  the  wrong  place). 

Analysts  need  strong  understandings  of  HCI  in  general,  and  of 
usage  contexts  in  particular,  to  counterbalance  simple  determin¬ 
istic  underlying  models  of  problem  causation  in  rule-based 
methods.  These  are  the  main  source  of  risk  in  UIM  usage,  lead¬ 
ing  to  missed  problems  during  discovery  and  false  predictions 
during  analysis.  The  overwhelming  tendency  is  for  rules  to  be  ap¬ 
plied  to  system  features  and  not  to  the  combined  use  of  several 
features  by  real  users  in  real  interactions.  Rule-based  methods 
thus tend to be system-centric, incorrectly assuming that usability
can be an inherent property of the system alone. This introduces
the risk of both false positives (by overestimating features’
impacts on users with average and above cognition) and false
negatives (by failing to spot adverse interactions between usage
and guideline-conformant features). Rule-based methods thus
best identify problems at the extremes of cognitive and perceptual
overload, which would require exceptional (and highly improbable)
usage contexts to ameliorate them. For white text on a
lemon background, such a context would involve some sort of
custom optical filter to increase contrast (or a screen reader).

In contrast, procedural inspection methods prompt consideration
of interaction beyond brief encounters with individual
system features. In principle, they can bring any aspect of a usage
context into consideration during walkthrough, and across
a range of abstraction levels, ranging from a user’s motives for
fulfilling a role (that requires them to carry out a task) to the
human information processing of keystroke-level interaction.
However, as with rule-based methods, increasing the set of concerns
addressed at each walkthrough also increases the chances
of error (assuming the method is practical enough for any use at
all to be attempted).

When analyzing possible problems, the evaluator is again
potentially overwhelmed. Again, analysts need to develop their
analysis methods to make inspection manageable. While discovery
methods determine whether a possible problem will be
found or not, it is analysis methods that determine four inspection
outcomes of (a) true positive (hit), (b) true negative,
(c) false positive, and (d) false negative (a lost rather than a
missed problem). Whereas in signal detection, the assumption
is that a positive indication is always present, in usability inspection,
analysts have to make their own signals during preparation
and discovery, and thus a fifth inspection outcome of
genuine miss is added. Only this inspection error can be attributed
to failures during candidate problem discovery.
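
The five inspection outcomes can be written down as a small decision table. The sketch below is one possible encoding under stated assumptions: the function name and its three boolean inputs are hypothetical, and the label for a problem that is neither real nor discovered ("not applicable") is added here only so that the function is total.

```python
def inspection_outcome(discovered: bool, confirmed: bool, is_real: bool) -> str:
    """Classify one possible problem after inspection.

    discovered: the analyst raised it as a candidate during discovery
    confirmed:  analysis confirmed (True) or eliminated (False) the candidate
    is_real:    the problem genuinely affects users (e.g., seen in user testing)
    """
    if not discovered:
        # Only this error is attributable to the discovery phase.
        return "genuine miss" if is_real else "not applicable"
    if confirmed:
        return "true positive" if is_real else "false positive"
    # A real problem that was found and then wrongly eliminated is lost, not missed.
    return "false negative" if is_real else "true negative"
```

For example, a real problem that was discovered but then eliminated during analysis yields "false negative", whereas the same problem never raised as a candidate yields "genuine miss".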

The above logical phase model of usability inspection will
now  be  applied  to  six  example  usability  inspection  methods, 
which  are  first  described  and  then  analyzed. 


SIX  EXAMPLE  UIMS 


Heuristic  Evaluation 

In a heuristic evaluation (Nielsen & Molich, 1990; Nielsen,
1992,  1994b),  the  evaluator  uses  a  small  set  of  rules  of  thumb 
to  confirm  usability  problems  by  considering  possible  violations. 
For  example,  systems  that  require  the  user  to  type  in  the  name 
of  a  file  would  breach  an  error  prevention  heuristic,  while  a  sys¬ 
tem  that  allowed  the  user  to  select  the  name  would  not.  There 
are  documented  accounts  in  the  literature  of  applications  of 
heuristic  evaluation  to  interactive  systems  such  as  finance, 
travel,  and  office  systems. 

The  heuristics  in  Fig.  59.2  were  derived  from  seven  sets  of 
heuristics  and  guidelines,  including  an  earlier  version  of 
Nielsen’s  (1994a)  own,  by  analyzing  how  well  they  explained  an 
existing  database  of  249  problems  from  usability  evaluations  of 
11  different  interactive  systems. 

For  each  usability  problem,  Nielsen  (1994a)  assigned  a  rat¬ 
ing  of  one  to  five  to  each  individual  heuristic  according  to  how 
well  it  explained  the  usability  problem.  Average  ratings  for 
each  heuristic  were  then  calculated.  A  principal  components 
analysis  was  used  to  determine  if  a  few  factors  accounted  for 
most  variability  of  the  usability  problems.  Nielsen  found  that 
seven  factors  could  only  account  for  30%  of  the  variability,  and 
53  factors  were  needed  to  account  for  90%,  which  is  too  many 
for a heuristic evaluation (HE). The names of the seven factors,
which were chosen by Nielsen, included “visibility of system
status,” “match between system and real world,” and “error
prevention.”

Nielsen  (1994a)  attempted  to  pick  the  individual  heuristics 
that  provide  the  best  explanation  of  the  usability  problems.  A  list 
of  top  heuristics  was  chosen  by  first  picking  the  heuristic  that 
found  the  most  usability  problems  with  a  rating  of  three  or 
more  (“explains  a  major  part  of  the  problem,  but  there  are  some 
aspects  of  the  problem  that  are  not  explained”),  and  then  re¬ 
peating  with  the  remaining  usability  problems  to  complete  the 
list.  This  procedure  was  also  repeated  to  provide  the  best  ex¬ 
planation  of  the  more  serious  usability  problems. 
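
The selection procedure just described is a greedy one: pick the heuristic that explains the most remaining problems, remove those problems, and repeat. The following is a minimal sketch of that idea, not Nielsen's actual analysis; the function name, the data layout, and the tiny ratings table are all invented for illustration.

```python
def pick_top_heuristics(ratings, threshold=3):
    """Greedily choose heuristics that explain the most remaining problems.

    ratings: {heuristic: {problem_id: rating from 1 to 5}}
    A heuristic "explains" a problem when its rating is >= threshold.
    Returns (chosen heuristics in order, problems left unexplained).
    """
    remaining = {p for scores in ratings.values() for p in scores}
    chosen = []
    while remaining:
        def explained(h):
            return {p for p in remaining if ratings[h].get(p, 0) >= threshold}

        candidates = [h for h in ratings if h not in chosen]
        best = max(candidates, key=lambda h: len(explained(h)), default=None)
        if best is None or not explained(best):
            break  # nothing left that explains any remaining problem
        covered = explained(best)
        chosen.append(best)
        remaining -= covered
    return chosen, remaining


# Invented example data: three heuristics rated against four problems.
example_ratings = {
    "feedback": {"p1": 5, "p2": 4, "p3": 1, "p4": 2},
    "clearly marked exits": {"p1": 1, "p2": 2, "p3": 4, "p4": 1},
    "speak the user's language": {"p1": 3, "p2": 1, "p3": 1, "p4": 1},
}
top, unexplained = pick_top_heuristics(example_ratings)
```

With the invented data, "feedback" is picked first (it explains p1 and p2), then "clearly marked exits" (p3), and p4 is left unexplained. Running the same procedure on only the serious problems corresponds to the second list mentioned in the text.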


1. Visibility of System Status

2.  Match  between  system  and  the  real  world 

3.  User  control  and  freedom 

4.  Consistency  and  Standards 

5. Error prevention

6.  Recognition  rather  than  recall 

7.  Flexibility  and  efficiency  of  use 

8.  Aesthetic  and  minimalist  design 

9.  Help  users  recognize,  diagnose  and  recover  from  errors 

10.  Help  and  documentation 


FIGURE  59.2.  Heuristics  (Nielsen,  1994b). 


Nielsen  (1994a)  found  that  a  small  number  of  heuristics  could 
mostly  explain  both  the  serious  and  unfiltered  problem  set,  and 
that  in  general  the  same  heuristics  appeared  in  both  lists  either 
as the exact same guideline or as similar guidelines. Furthermore,
he found that almost all of the seven principal factors were
covered in the top heuristics list. This supported his claim that
the  factors  (the  first  seven  heuristics  in  Fig.  59.2  along  with  eight 
and  nine)  form  a  basis  for  a  set  of  heuristics  that  has  remained 
the  main  resource  for  HE;  the  10th,  “Help  and  Documentation,” 
was  later  added.  It  is  not  clear,  however,  how  much  the  addi¬ 
tional  three  heuristics  extend  the  coverage  of  the  sample  prob¬ 
lem  set  beyond  30%,  but  this  is  very  unlikely  to  reach  40%. 

However,  Nielsen  (1994a)  noted  that  the  new  set  is  good  at 
explaining  problems,  but  he  did  not  know  how  effective  they 
would  be  at  finding  usability  problems.  Thus,  the  current  version 
of  HE  comes  with  known  limitations  in  its  discovery  resources, 
and it is no surprise that others have attempted to fill the gap.
Muller et al. (1995), for example, rightly claimed that the ten 1994
heuristics ignore usage contexts. They added three extra heuristics to assess
how  well  a  design  fits  user  needs  and  the  work  environment: 

1.  Respect  the  user  and  his  or  her  skills 

2.  Pleasurable  experience  with  the  system 

3.  Support  quality  work 

The  value  of  these  extra  heuristics  was  demonstrated  for  an 
application  where  247  usability  problems  were  found,  which  led 
to  89  redesign  recommendations  of  which  72%  were  imple¬ 
mented.  Each  problem  and  recommendation  was  scored  for  its 
connection  to  each  of  the  13  heuristics.  The  three  new  heuris¬ 
tics  were  solely  responsible  for  identifying  15%  of  the  problems 
and  10%  of  the  recommendations.  By  rating  problems  and  rec¬ 
ommendations  using  a  severity  scale  (from  one  to  five),  they 
found  that  problems  revealed  with  the  new  heuristics  were 
rated  as  being  slightly  less  important  than  those  identified  using 
the  1994  heuristics,  but  there  was  no  difference  in  importance 
for  the  recommendations.  However,  such  recommendations  are 
not  solely  the  result  of  a  UIM  and  the  lack  of  a  difference  here 
may  reflect  the  common  redesign  resources  in  use. 

Cognitive  Walkthrough 

Cognitive walkthrough (CW; Wharton, Rieman, Lewis, & Polson,
1994; Lewis & Wharton, 1997) is a procedural inspection
method. It assesses the learnability of walk-up-and-use systems.
One or more evaluators choose a set of tasks for analysis and
document the assumptions about the target users of the design.
Analysts step through each stage of the task answering a
set of questions (see Fig. 59.3) and backing up their answers
with empirical evidence, experience, or scientific evidence. A
negative  answer  to  any  of  the  questions  indicates  potential  us¬ 
ability  problems  with  the  system.  However,  these  have  to  be 
considered  together  in  the  context  of  the  task  and  may  result  in 
either  a  success  case  (trivial  usability  problems)  or  a  failure  case 
(important  usability  problems). 
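The bookkeeping behind this procedure can be sketched in a few lines of Python. This is only an illustration, not part of the published method: the `StepRecord` type, the `potential_problems` function, and the sample task are hypothetical names and data introduced here, and the sketch deliberately stops at flagging negative answers, since forming success and failure cases remains a matter of analyst judgment.

```python
from dataclasses import dataclass

# The four walkthrough questions (Wharton et al., 1994).
CW_QUESTIONS = (
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect "
    "that the user is trying to achieve?",
    "If the correct action is performed, will the user see that "
    "progress is being made towards solution of the task?",
)


@dataclass
class StepRecord:
    """One action in a task, with a yes/no answer to each CW question."""
    action: str
    answers: tuple  # four booleans, in the order of CW_QUESTIONS
    evidence: str = ""  # the analyst's justification for the answers


def potential_problems(steps):
    """Return (action, question) pairs for every negative answer."""
    flagged = []
    for step in steps:
        for question, answer in zip(CW_QUESTIONS, step.answers):
            if not answer:
                flagged.append((step.action, question))
    return flagged


# Hypothetical two-step task, invented for illustration only.
walkthrough = [
    StepRecord("Open the File menu", (True, True, True, True)),
    StepRecord("Choose 'Export'", (True, False, True, True),
               evidence="Item is hidden in a submenu"),
]
flagged = potential_problems(walkthrough)
for action, question in flagged:
    print(f"{action}: {question}")
```

Each flagged pair is only a candidate: the analyst still has to weigh the answers together, in the context of the task, before arguing a success or failure case.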

CW is underpinned by CE+, a theory of learning (Polson &
Lewis, 1990). For example, the theory predicts that a user new
to a system will choose an action with a good match to the
user’s current goal. This reflects the relation between interaction
centeredness and grounding in theory, in that the most
system-centered methods are generally atheoretical.

1. Will the user try to achieve the right effect?

2. Will the user notice that the correct action is available?

3. Will the user associate the correct action with the effect that the
user is trying to achieve?

4. If the correct action is performed, will the user see that progress
is being made towards solution of the task?

FIGURE 59.3. Cognitive walkthrough questions (Wharton et al.,
1994).

Two main forms of CW have been presented in the literature.
Earlier versions explicitly considered users’ goals (e.g., Lewis,
Polson, Wharton, & Rieman, 1990; Polson, Lewis, Rieman, &
Wharton, 1992). For example, Polson et al.’s (1992) version
required three pages of questions for each action: one page to
address users’ goals, one to cover choosing and executing the
correct action, and a final page of questions covering the effect
of taking the action on the user’s goal structure. The overhead
of the method and the difficulty of applying it led to the cognitive
jogthrough (Rowley & Rhoades, 1992) and multimedia
tool support (Rieman et al., 1991) before a simplification of the
method (Wharton et al., 1994; Lewis & Wharton, 1997).

Overall,  first  versions  of  CW  provided  extensive  analyst 
preparation  materials,  but  these  proved  to  be  unwieldy  as  dis¬ 
covery  resources,  reducing  the  benefits  of  being  interaction 
centered.  Still,  the  success  case  in  CW  is  a  rare  example  of  an 
analysis  resource  for  problem  elimination,  letting  analysts  argue 
that  some  apparent  design  flaws  would  not  automatically  cause 
severe  usability  problems.  However,  there  is  no  method  sup¬ 
port  in  CW  for  forming  success  and  failure  cases. 

In response to criticisms, Wharton et al. (1994) developed
a simpler version of CW. This version “de-emphasizes the explicit
consideration of the user’s goal structure,” resulting in
the current set of four questions (see Fig. 59.3), although a
subsequent variation (Spencer, 2000) reduced these to two.
Overall, later versions of CW reduced the extensive preparation
materials for analysts, making them less unwieldy as discovery
resources, but perhaps losing some benefits that arise with
interaction-centered methods, apart from those preserved via
task descriptions.

A  Hybrid  Method:  Heuristic  Walkthrough 

Heuristic walkthrough (HW; Sears, 1997) is a cross between HE
and CW. The input to the method is a prioritized list of user
tasks.  These  tasks  should  include  frequent  or  critical  tasks,  but 
may  also  include  tasks  designed  purely  to  ensure  coverage  of 
the  system.  HW  thus  provides  some  guidance  on  preparation, 
which  in  turn  provides  discovery  resources. 

There are two phases to the evaluation: a task-based phase
and a free-form phase. In the first phase, the evaluators explore
the tasks using a set of thought-provoking questions derived
from CW. They are free to explore the tasks in any order, spending
as long as they need, but they should be guided by task priorities.
In the second phase, evaluators are free to explore the
system. They use the set of thought-provoking questions
again, plus heuristics from HE.

Sears (1997) compared HW against CW and HE. HW found
more actual problems than CW and produced fewer false positives
than HE. Overall, Sears’ merge of HE and CW improved analyst
preparation  and  discovery  resources,  with  evidence  of  im¬ 
proved  analysis  resources  (reduction  in  false  positives),  which 
may  be  due  to  the  initial  CW  inhibiting  discovery  of  unlikely 
problems.  In  Fig.  59.1,  HW  would  span  and  bridge  the  areas 
covered  by  heuristics  on  the  left  and  walkthroughs  in  the  middle 
(at  least  some  things  in  UIM  research  are  simple). 

Other UIMs

Ergonomic  Criteria 

Like HE, ergonomic criteria (Bastien & Scapin, 1995; Scapin &
Bastien, 1997) uses a set of principles. However, these 18 ergonomic
criteria (principles) are structured around a definition,
a rationale, and examples of guidelines, plus comments that
help to disambiguate any related criteria. They were
formed by reviewing existing guidelines and experimental results
(Scapin, 1990). The extra structure present in the ergonomic
criteria is likely to assist analysts preparing to use this method.

Cognitive  Dimensions 

Green (1991) claimed that HCI evaluation techniques tend
to focus too much on surface features, that analysis tends to be overly
technical, and that HCI has no common vocabulary. Cognitive
dimensions (Green & Petre, 1996) attempts to capture a set of
orthogonal dimensions for a restricted set of systems called
“notations,” which are used to design information structures, for
example programming languages and spreadsheets. These
dimensions are not necessarily criteria for a design but a
characterization of the design space.

A sample of dimensions can be found in Fig. 59.4. The dimensions
embody a theory about how people use notations to
build information structures. A simplified theory of action
would involve people translating goals into specifications, then
into actions, and executing them. Rather, people work both bottom
up and top down: goals and subgoals may be attacked at any moment.
Green (1991) claimed the preferred strategy is opportunistic,
where high- and low-level decisions are mingled; commitment
can be weak or strong, and development in one area
may be postponed because of problems foreseen in other areas.
The vocabulary of dimensions may act as a discovery resource,
sensitizing analysts to generic interaction difficulties. However,
once a difficulty is discovered, there appears to be little further
support for confirmation or elimination.

Hidden Dependencies can occur when one cannot see how a value
is calculated, for example a spreadsheet where a small change can
have unexpected effects. All dependencies that are important to the
user must be accessible.

Premature Commitment occurs when a user is forced to make a
decision before the necessary information is available.

Viscosity is resistance to change, or a measure of how much work
is needed to achieve a change.

Abstraction Gradient. An abstraction is the grouping of objects
into one object which can make a system easier to understand.

FIGURE 59.4. Example cognitive dimensions.

There is little guidance on applying the dimensions. A notation
called “ERMIA” (Green, 1991) was proposed as a method
of exploring dimensions. Interestingly, this notation seems to
have been developed (Green & Benyon, 1995, 1996) into a
stand-alone UIM without dimensions.

Pluralistic  Walkthrough 

The  pluralistic  walkthrough  (Bias,  1994)  UIM  originated  at 
IBM.  An  administrator  leads  a  team  of  three  diverse  participants 
(a representative user, a product developer, and a usability
specialist) who pretend to be system users. Each is given a set of hard
copies of screens and task descriptions. For each screen, each
participant is asked to write down, in as much detail as possible, the next action
(or set of actions) the user would take to achieve the task, and any
comments they have about the screen. Once each participant has
written down the next action, the screen is discussed. When the
representative  users  have  finished  talking  (for  example,  about 
problems),  the  usability  specialists  and  product  developers  talk 
(for  example,  to  explain  the  rationale  of  certain  features).  No  as¬ 
sessment  of  this  UIM  has  been  published;  however,  there  are 
clear  potential  benefits  in  preparation  (via  hard  copies  of  screens 
and  task  descriptions),  and  the  combination  of  representative 
users,  product  developers,  and  usability  specialists  is  bound  to  ex¬ 
tend  the  range  of  available  discovery  and  analysis  resources. 


ASSESSING  AND  SCOPING  METHODS 


Six  methods  have  been  described  and  reviewed  for  their  sup¬ 
port  for  analyst  preparation,  problem  discovery,  and  problem 
analysis.  This  review  indicated  where  each  method  provides, 
or  omits,  support  for  three  key  logical  phases  in  usability  in¬ 
spection:  preparation,  discovery,  and  analysis.  However,  such 
support  should  be  reflected  in  the  practical  effectiveness  of  a 
UIM. This section presents and applies the main approaches to
assessing UIM effectiveness.

There  are  two  broad  approaches — one  qualitative  and  the  other 
quantitative — to  assessing  and  scoping  UIMs.  Qualitative  methods 
code  predictions  as  a  route  to  grounding  explanatory  models  of 
analyst  performance.  Quantitative  methods  focus  on  the  rela¬ 
tive  performance  of  one  method  to  another,  generally  expressed 
as  a  percentage  or  score  (out  of  one,  one  being  perfect)  relating 
to the proportion of correctly predicted problems. Three such
quantitative measures are thoroughness, validity, and effectiveness,
which are now described before addressing further measures.

Thoroughness,  Validity,  and  Effectiveness 

Ideally, UIMs should predict all important user difficulties; that
is, we want them to be thorough, and thus not miss severe
usability problems. UIMs should also ideally predict only
genuine problems; that is, we want them to be valid and
not to generate floods of false positives. Taken together, we do not
want thoroughness at the expense of validity; for example,
we do not want UIMs to discover almost all the important user
difficulties by setting such low thresholds of probability in analysis
that most predictions are false positives. We want effective
UIMs. We also want efficient UIMs, with regard to usage and learning
costs, and we want downstream utility (John & Marks, 1997),
that is, beneficial influence on design changes. However, efficiency
and downstream utility have so far received far less attention than
the quality of the problem sets that a UIM produces. The maturity of
problem-set-based measures is reflected in Sears’ (1997) formulae for
thoroughness, validity, and effectiveness:

$$\text{Effectiveness} = \text{Thoroughness} \times \text{Validity}$$

where

$$\text{Thoroughness} = \frac{\text{number of real usability problems found by UIM}}{\text{number of real problems that exist}}$$

and

$$\text{Validity} = \frac{\text{number of real usability problems found by UIM}}{\text{number of problems predicted by UIM}}$$

Each score is between zero and one, and thus effectiveness
will be reduced by low validity even when thoroughness is high. This
penalty can be increased or decreased by weightings to reflect
different evaluation contexts (Hartson, Andre, & Williges,
2001). However, two of the values in the thoroughness and validity
equations present severe research challenges. Firstly, the denominator for
thoroughness (number of real problems that exist) relies on
the implicit closure of “all of the usability problems for a design.”
It should be clear that “all” here could never be determined
with confidence: there can always be undiscovered
problems. Thus, all thoroughness measures are always upper
bounds, that is, maxima that will reduce as newly discovered
real problems increase the denominator. Note that problems
can be lost after they have been discovered, either through
poor merging of problem sets from several UIMs or through
missing problems in user test data (Cockton & Lavery, 1999).
Most analysts miss most problems in video data (Jacobsen,
Hertzum, & John, 1998).
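A small worked example may make the measures and the upper-bound argument concrete. The sketch below treats problem sets as Python sets of matched problem identifiers; the identifiers and function names are hypothetical, and it assumes that predicted and real problems have already been matched at a common level of description, which is itself a difficult step in practice.

```python
def thoroughness(predicted, real):
    """Share of the known real problems that the UIM predicted."""
    return len(predicted & real) / len(real)


def validity(predicted, real):
    """Share of the UIM's predictions that match known real problems."""
    return len(predicted & real) / len(predicted)


def effectiveness(predicted, real):
    """Product of thoroughness and validity (after Sears, 1997)."""
    return thoroughness(predicted, real) * validity(predicted, real)


# Hypothetical problem identifiers, invented for illustration.
predicted = {"p1", "p2", "p3", "p4", "p5"}
known_real = {"p1", "p2", "p3", "p6"}

false_positives = predicted - known_real  # predicted but not (yet) observed
misses = known_real - predicted           # observed but not predicted

t_before = thoroughness(predicted, known_real)  # 3 hits out of 4 real problems
v_before = validity(predicted, known_real)      # 3 hits out of 5 predictions
e_before = effectiveness(predicted, known_real)

# Further testing reveals a real problem that the UIM did not predict:
# the denominator grows, the numerator does not, so thoroughness falls.
updated_real = known_real | {"p7"}
t_after = thoroughness(predicted, updated_real)  # 3 hits out of 5 real problems

print(t_before, v_before, e_before, t_after)
```

Note that the drop in thoroughness shown here depends on the newly found problem lying outside the prediction set; the entries in `false_positives` are likewise only provisional, since later testing may confirm them.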

Similarly, the numerator for validity (number of real usability
problems found by UIM) is a subset of the thoroughness
denominator, that is, the overlap between predicted and real problems.
The closure here is a subset, since we need not discover
all problems, only those that were predicted by the UIM. All validity
measures are always lower bounds, that is, minima
that increase if newly discovered real problems match predictions
and thus increase the numerator.
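The two bounds can be written out explicitly. The notation below is introduced here for illustration and does not appear in Sears (1997): $P$ is the set of problems predicted by the UIM, $R$ the (unknowable) full set of real problems, and $R_k \subseteq R$ the real problems known so far.

```latex
% Measured validity can only underestimate true validity,
% because R_k \subseteq R implies |P \cap R_k| \le |P \cap R|:
V_k = \frac{|P \cap R_k|}{|P|} \;\le\; \frac{|P \cap R|}{|P|} = V

% If every predicted real problem has already been confirmed,
% that is, P \cap R_k = P \cap R, then |R_k| \le |R| gives:
T_k = \frac{|P \cap R_k|}{|R_k|} \;\ge\; \frac{|P \cap R|}{|R|} = T
```

The validity bound holds unconditionally. The thoroughness bound, as sketched, relies on the stated assumption that the numerator is already settled, so that newly discovered problems only enlarge the denominator.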

True closures only exist in algebra, where sets may be closed
under operations. Closures are thus wholly synthetic. In the real
world, we use the word “all” for enumerations, such as “prohibited
items in carry-on baggage.” Closure here is the result of
human agreement, and even then we find lists of very broad categories
rather than specific instances (e.g., ammunition; automatic
weapons; axes and hatchets, unless part of aircraft equipment;
billiard cues; billy clubs; and blackjacks). This raises two further
challenges for effectiveness metrics for UIMs.

First,  we  expect  lists  of  usability  problems  to  be  more  specific 
than  menu  problems,  layout  problems,  dragging  problems,  and 
difficulties  with  terminology.  We  thus  cannot  even  impose  a 
high-level  closure  on  what  can  be  listed  as  a  usability  problem. 
Second,  specificity  of  description  determines  problem  count  ac¬ 
cording  to  Cockton  and  Lavery  (1999).  As  we  become  more 
specific  about  usability  problems,  the  count  rises.  For  example, 
“misleading  status  bar  messages”  is  one  problem,  but  “incom¬ 
plete  prompts”  and  “technical  vocabulary  in  message”  are  two, 
and  more  would  result  if  we  distinguished  between  prompt 
problems  such  as  “no  instruction  to  close  free  form  with  double 
click”  or  “no  instruction  to  use  control-click  to  delete  vertices” 
(for  drawing  objects).  The  limit  of  problem  counts  is  the  full  list 
of  all  problem  instances  for  all  users  in  all  contexts  (generaliza¬ 
tions  of  instances  are  sometimes  called  “problem  tokens;”  e.g., 
John  &  Mashyna,  1997).  Thus,  the  choice  of  abstraction  level  for 
problem  types  or  tokens  is  critical  for  determining  the  counts  of 
real  usability  problems. 

Thus,  problem  counts  are  absolutely  determined  by  basic 
problem  granularity.  In  theory,  effectiveness  measures  could  be 
made  at  several  levels  of  abstraction,  based  on  difficulty  gener¬ 
alization  in  the  SUPEX  method  (Cockton  &  Lavery,  1999),  but  no 
one  has  attempted  this  yet.  However,  as  long  as  predicted  and 
real  problem  sets  are  expressed  at  the  same  level  of  abstraction, 
thoroughness  and  validity  measures  should  not  be  distorted  by 
categorization. 
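The effect of granularity on counts can be illustrated with the status bar example. In the sketch below, the three-level coding and the names `instances` and `problem_count` are hypothetical; the labels come from the examples in the text, and the third instance reuses its type label as its specific description because the text gives no finer one.

```python
# Hypothetical coding of three observed difficulties at three levels of
# abstraction, from most specific to most general.
instances = [
    {"specific": "no instruction to close free form with double click",
     "type": "incomplete prompts",
     "category": "misleading status bar messages"},
    {"specific": "no instruction to use control-click to delete vertices",
     "type": "incomplete prompts",
     "category": "misleading status bar messages"},
    {"specific": "technical vocabulary in message",
     "type": "technical vocabulary in message",
     "category": "misleading status bar messages"},
]


def problem_count(instances, level):
    """Count distinct problems when instances are described at `level`."""
    return len({instance[level] for instance in instances})


counts = {level: problem_count(instances, level)
          for level in ("category", "type", "specific")}
print(counts)  # {'category': 1, 'type': 2, 'specific': 3}
```

The same observations therefore yield one, two, or three problems depending on the level chosen, which is why predicted and real problem sets need to be expressed at the same level before any ratio is computed.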

The  issue  of  comprehensiveness  for  the  set  of  problem  in¬ 
stances  is  independent  of  the  problem  categories.  For  thor¬ 
oughness,  asymptotic  testing  has  been  proposed  as  a  means  to 
ensure  the  highest  possible  denominator  (Hartson,  Andre,  & 
Williges,  2001).  However,  this  is  generally  understood  as  con¬ 
tinuing  to  test  additional  users  using  the  same  test  protocol  un¬ 
til  no  new  problems  emerge.  Changing  the  test  protocol,  such 
as  between  different  sets  of  fixed  tasks,  between  free  and  fixed 
tasks,  or  between  individual  think  aloud  and  peer  tutoring, 
would  also  reveal  new  problems.  Also,  adding  analysts  and/or 
applying  a  structured  method  such  as  SUPEX  could  extract 
more problems from video data. As with most attempts at closure,
methods to find all problems simply expose a further
unachievable closure, that is, all of the ways to find all the
usability problems.
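The stopping rule behind asymptotic testing, as generally understood, can be sketched as follows. The function name, the `patience` parameter (how many consecutive users must reveal nothing new before testing stops), and the session data are assumptions made for illustration; the source does not prescribe a specific threshold.

```python
def asymptotic_problem_set(sessions, patience=3):
    """Accumulate problems user by user under one fixed test protocol.

    Stops once `patience` consecutive users reveal no new problem.
    Returns the accumulated problem set and the number of users tested.
    """
    found = set()
    quiet_run = 0  # consecutive users who revealed nothing new
    users_tested = 0
    for problems in sessions:
        users_tested += 1
        new_problems = set(problems) - found
        found |= new_problems
        quiet_run = 0 if new_problems else quiet_run + 1
        if quiet_run >= patience:
            break
    return found, users_tested


# Hypothetical per-user problem sets from a single test protocol.
sessions = [{"a", "b"}, {"b", "c"}, {"a"}, {"c"}, {"b"}, {"d"}]
found, users_tested = asymptotic_problem_set(sessions, patience=3)
print(sorted(found), users_tested)  # ['a', 'b', 'c'] 5
```

In this invented data the sixth user would have revealed problem "d", but testing stops after the fifth. Any finite stopping rule carries this risk, and a different protocol or additional analysts could surface further problems that no amount of repetition under the same protocol would find.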

Maximizing  the  numerator  for  validity  is  more  tractable. 
When  real  problems  are  found  by  fixed  task  testing,  a  falsifica¬ 
tion  methodology  can  expose  users  repeatedly  to  predicted 
problems  (Woolrych,  Cockton,  &  Hindmarch,  2004).  Missed 
problems  here  will  have  two  types  of  cause:  errors  in  test  task 
planning  (which  can  be  fixed)  and  an  inability  to  put  the  sys¬ 
tem  in  a  state  where  the  problem  would  appear  (which  can  be 
very  difficult  to  achieve;  Lavery  &  Cockton,  1996). 

The components of effectiveness are thus unavoidably subject
to upper bounds for thoroughness and lower bounds for validity.
Where real problems are established by empirical methods,
such as user testing, asymptotic and falsification testing can
be used, respectively, to lower the maximum for thoroughness
and raise the minimum for validity. Where real problems are
merely the union of the problem sets from a range of compared
methods, new sources of error arise. While thoroughness can
still be calculated as the proportion of all predicted problems
found by a single UIM, validity requires reliable methods for
identifying false positives. Falsification testing, based on empirical
methods, is the only way to identify and manage sources of
error here.
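A short sketch shows why pooled predictions cannot support a validity measure. If the "real" problem set is defined as the union of all methods' predictions, every prediction is real by construction, so each method's validity is trivially one. The method labels and problem identifiers below are invented for illustration and do not reproduce any published comparison.

```python
# Hypothetical predictions from three compared methods.
predictions = {
    "HE": {"p1", "p2", "p3", "p4"},
    "CW": {"p2", "p5"},
    "HW": {"p1", "p2", "p5", "p6"},
}

# "Real" problems taken as the union of all predictions, with no user testing.
pooled = set().union(*predictions.values())

scores = {}
for method, predicted in predictions.items():
    hits = predicted & pooled  # always equal to `predicted` itself
    scores[method] = {
        "thoroughness": len(hits) / len(pooled),
        "validity": len(hits) / len(predicted),
    }

for method, score in scores.items():
    print(method, score)
```

Thoroughness still varies between methods here, but validity is 1.0 for all of them, so no false positive can ever be detected without an independent, empirically grounded problem set.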

Other  Quantitative  Measures  for  UIMs 

Effectiveness  measures  depend  on  creating,  counting,  and  com¬ 
paring  problem  sets.  Other  quantitative  measures  are  based  on 
coding  within  a  problem  set  (as  opposed  to  coding  through 
comparison,  as  with  false  positives,  which  are  in  the  predicted 
but  not  the  real  problem  set).  Thus,  Cockton  and  Woolrych 
(2001)  coded  heuristic  applications  for  nonbogus  predictions  as 
being  appropriate  or  inappropriate,  based  on  explicit  criteria 
from  the  HE  training  manual  (Lavery,  Cockton,  &  Atkinson, 
1996a)  provided  for  analysts.  This  provided  conformance  ques¬ 
tions  that  stated,  “What  the  system  should  do,  or  users  should 
be  able  to  do,  to  satisfy  the  heuristic.”  (p.  4).  Such  questions 
are  answered  with  conformance  evidence,  such  as  the  “design 
features  or  lack  of  design  features  that  indicate  partial  satisfac¬ 
tion  or  breaches  of  the  heuristic.”  (p.  4).  For  many  heuristic  ap¬ 
plications,  these  criteria  were  clearly  ignored.  Only  39%  of  the 
heuristics  applied  to  nonbogus  predictions  were  appropriate 
(and  only  31%  for  successful  predictions).  Such  measures  of 
appropriateness  are  important  in  identifying  the  extent  to 
which  problem  discovery  and  analysis  are  due  to  a  UIM  or  to  the 
analysts’  own  knowledge  resources  and  inspection  heuristics. 
Heuristics  tended  to  be  best  applied  to  predictions  that  turned 
out  to  be  of  low  frequency  and/or  severity.  These  were  likely  to 
be  predictions  that  seemed  less  probable  than  ones  that  turned 
out  to  be  of  high  frequency  and/or  severity,  which  presumably 
were  so  obvious  that  the  13  heuristics  (Nielsen,  1994a,  plus 
three  specific  to  visualization;  Lavery,  Cockton,  &  Atkinson, 
1996c)  were  not  properly  reviewed  to  strengthen  analysis.  Fur¬ 
thermore,  given  the  known  limited  coverage  of  Nielsen  (1994a) 
heuristics,  there  may  have  been  no  appropriate  heuristic  for 
some  predictions. 

Other  studies  have  classified  usability  problems  by  severity 
(e.g.,  Jeffries,  Miller,  Wharton,  &  Uyeda,  1991)  and  by  type.  Types 
may  be  based  on  a  range  of  constructs,  for  example,  by  relation 
to  stages  in  Norman’s  (1986)  theory  of  action.  Using  this  coding 
scheme,  a  study  (Cuomo  &  Bowen,  1992,  1994)  found  that  an 
early  version  of  CW  tended  to  focus  on  problems  related  to  the 
formation  of  actions,  and  was  poor  at  finding  problems  con¬ 
cerned  with  evaluating  the  display.  A  similar  coding  approach 
was  used  for  an  assessment  of  a  later  version  of  CW  (Sears  & 
Hess,  1999).  This  study  examined  the  impact  of  short  and  (very) 
detailed  task  descriptions  on  analyst  performance.  It  only  re¬ 
ports  thoroughness  measures,  but  demonstrates  the  role  of  dis¬ 
covery  resources  (here  task  descriptions)  in  shaping  analyst  per¬ 
formance.  The  impact  of  each  resource  was  assessed  by  coding 
problems  by  the  CW  question  relevant  to  the  prediction.  Short 
task  descriptions  resulted  in  relatively  more  predictions  related 
to  finding  actions  (cognitive  walkthrough,  question  2,  Fig.  59.3). 
Detailed descriptions resulted in relatively more feedback-related
predictions (question 4, Fig. 59.3). Sears and Hess (1999)
based their explanations on the discovery task. Detailed (to the level
of naming and locating controls) task descriptions led to oversights
in finding action specification problems, but left analysts
with the energy to look for feedback problems.

Coding  schemes  have  the  benefit  of  being  able  to  expose 
why  some  problems  are  missed  or  why  some  fail  to  be  elimi¬ 
nated  during  analysis.  Problem  discoverability  was  coded  in  an 
assessment  of  HE  (Cockton  &  Woolrych,  2001).  The  easiest 
problems to discover are perceivable: these can be seen at a
glance.  Next  come  actionable  problems:  these  can  be  discov¬ 
ered  after  a  few  simple  actions  (e.g.,  a  mouse  click).  Hardest  to 
find  are  constructable  problems,  where  only  several  interac¬ 
tion  steps  involving  multiple  application  objects  may  reveal  a 
problem.  This  allowed  investigation  of  the  relationship  between 
predictions  and  discoverability.  Eighty  percent  of  missed  prob¬ 
lems  were  constructable,  as  opposed  to  seven  percent  of  suc¬ 
cessful  predictions.  This  indicates  that  task  knowledge  is  not 
well  supported  as  a  discovery  resource  by  HE,  confirming  that 
poor  integration  of  context  of  use  information  into  UIMs  is  a 
major  source  of  inspection  risk. 
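The comparison behind these percentages amounts to tabulating discoverability codes separately for missed problems and successful predictions. The sketch below uses invented counts, chosen only to echo the proportions reported above; it does not reproduce the data of Cockton and Woolrych (2001), and the name `code_shares` is hypothetical.

```python
from collections import Counter

CODES = ("perceivable", "actionable", "constructable")


def code_shares(codes):
    """Return the share of each discoverability code in a coded problem list."""
    counts = Counter(codes)
    total = len(codes)
    return {code: counts[code] / total for code in CODES}


# Invented codings, for illustration only.
missed = ["constructable"] * 8 + ["actionable"] * 2
successful = (["perceivable"] * 8 + ["actionable"] * 5
              + ["constructable"] * 1)

missed_shares = code_shares(missed)
successful_shares = code_shares(successful)
print(missed_shares["constructable"], successful_shares["constructable"])
```

A large gap between the two shares for constructable problems is the pattern that points to weak support for task knowledge as a discovery resource.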

UIM assessment is still at an early stage. As researchers
develop novel coding schemes to analyze predicted and actual
problem sets, new insights become possible about the role and
use of discovery and analysis resources in usability inspection.
These insights tend to be more valuable than simple thoroughness
or validity measures (Woolrych & Cockton, 2000), since
they can identify which sorts of problems get missed (and thus
reduce thoroughness) and which predictions fail to be correctly
eliminated (and thus reduce validity). As the nature of missed
problems (genuine misses and false negatives) and false positives
becomes better understood, researchers and practitioners
can explore ways to improve UIMs, either through specific
changes to specific methods (fix the method), improved analyst
education and training (fix the analyst), or generic improvements
that are applicable to a range of methods (fix inspection).

Qualitative  Analysis  of  UIM  Usage 

Coding  of  UIM  predictions  can  be  used  as  a  basis  for  quantitative 
analysis  of  attributes  such  as  appropriateness  and  discoverability, 
based  on  counts  of  problems  with  different  codings.  Alternatively, 
coding  can  be  used  to  develop  models  of  UIM  usage,  without  re¬ 
course  to  counts  and  statistical  inference.  Given  that  most  UIM 
usage  variations  are  due  to  the  analyst  and  not  to  UIMs,  such 
qualitative  research  provides  a  better  basis  for  UIM  improvement, 
not  only  via  changes  to  specific  UIMs,  but  through  more  generic 
improvements  to  the  inspection  process  or  through  analyst  train¬ 
ing  and  education.  Qualitative  analysis  was  used  to  refine  the 
DARe  model  developed  by  two  of  this  chapter’s  authors. 

Methodological  Issues  in  UIM  Assessment 

Inescapable upper bounds for thoroughness and lower bounds
for validity are two methodological problems for assessing and
comparing UIMs. Further methodological problems arise when
UIMs are assessed or compared without recourse to user testing
based on falsification and asymptotic approaches, since there
is no well-grounded way to identify false positives in such studies.
Great care should be taken when relying on the results of any
UIM assessment or comparison, but no validity measures with
any acceptable degree of credibility are possible if empirical
methods are not used to construct the actual problem set (the
real problems in Sears’, 1997, equations). The common approach
of comparing predictions from a single UIM to the combined
predictions of a set of UIMs is so fraught with methodological
challenges that little of any credibility can be concluded
from it. This and further methodological flaws formed the basis
for an extensive critique of evaluation method assessment (Gray
& Salzman, 1998).

Gray  and  Salzman  (1998)  reviewed  the  validity  of  five  major 
studies in the literature (including Jeffries et al., 1991; Desurvire,
Kondziela  &  Atwood,  1992;  Nielsen,  1992).  They  based  their  re¬ 
view  on  Cook  and  Campbell’s  (1979)  four  main  types  of  validity: 
statistical  conclusion  validity,  internal  validity,  construct  validity, 
and  external  validity.  They  added  a  fifth  type:  conclusion  valid¬ 
ity.  Gray  and  Salzman  found  validity  problems  with  each  study 
reviewed.  The  forms  of  (in)validity  and  examples  will  now  be 
described. 


Statistical  Conclusion  Validity 

Was the change to the dependent variable caused by the manipulation
of the independent variable? Statistical tests are used
to determine whether the change could have occurred by chance, for
example, through wildcard participants, who always do significantly better or
worse than the average regardless of the conditions. Recruiting more
participants reduces the wildcard effect. For example, Jeffries
et al. (1991) compared HE (actually expert review), guidelines,
and user testing and found HE superior. The study used few
evaluators, and thus it is not clear whether the effects of the study (for
example, the superiority of expert review) occurred by chance.

Internal Validity

Was  the  effect  caused  by  the  variable  under  manipulation  or 
by  a  third  unknown  confounding  variable?  There  is  no  simple 
mechanical  test  for  internal  validity,  but  study  design  must  con¬ 
sider  instruments,  selection  of  participants  and  settings,  and  the 
use of controls where appropriate. One study where expert review
was shown to be superior potentially suffers from internal
invalidity because the experts were given a two-week period to
complete the evaluation at their own pace, whereas evaluators
in the other conditions had far less time. The worst-performing
methods were used by software engineers with less
usability expertise than the HE group.

Construct  Validity 

Does  the  study  manipulate  the  conditions  it  claims  to  (causal 
construct  validity)  and  does  it  measure  the  changes  it  claims  to 
measure  (effect  construct  validity)?  Causal  construct  invalidity 
would  occur  if  participants  in  a  study  used  a  UIM  differently  to 
its  documented  method.  Practitioners  need  to  be  informed 
how different ways of applying UIMs affect the number and
types of problems found and which UIMs are most effective for
which  software  domains. 

One  study  (Karat,  Campbell,  &  Fiegel,  1992)  attempted  to 
address  causal  construct  validity,  addressing  “mono-operation” 
(single  analyst)  bias  by  comparing  individual  with  paired  in¬ 
spections.  Similarly,  they  addressed  “monomethod”  bias  by 
evaluating  two  different  software  systems.  Unfortunately  they 
did  not  specify  the  differences  in  the  software  being  evaluated, 
reducing  the  usefulness  of  this  study  for  practitioners.  Also,  the 
evaluation  had  two  phases:  self-guided  exploration  and  pre¬ 
scribed  scenarios.  Gray  and  Salzman  (1998)  claimed  that  this 
study’s  construct  validity  was  threatened  by  experience  gained 
in  self-guided  exploration  in  the  scenario  phase. 

For  “effect  construct  validity,”  different  evaluation  methods 
focus  on  either  intrinsic  attributes  of  the  interface  (causes)  or 
pay-off  measures  of  usability  (e.g.,  effects  such  as  performance 
time  or  errors).  However,  “None  of  the  studies  we  reviewed  re¬ 
port  systematic  ways  of  relating  pay-off  problems  to  intrinsic  fea¬ 
tures,  all  apparently  rely  upon  some  form  of  expert  judgment” 
(Gray & Salzman, 1998, p. 216). Although they “believe that effect
construct  validity  is  the  single  most  important  issue  facing  HCI 
researchers  and  practitioners,”  they  gave  this  little  coverage. 

External  Validity 

Do  the  results  of  the  study  generalize  to  a  subpopulation  or 
across  a  subpopulation?  An  external  validity  problem  would  oc¬ 
cur  if  the  results  were  generalized  beyond  the  settings  in  the 
experiment. 

Conclusion  Validity 

Gray  and  Salzman  (1998)  introduced  a  fifth  type  of  validity: 
conclusion  validity.  A  study  is  conclusion  invalid  either  if  the 
claims  were  not  investigated  in  the  study  or  if  the  data  pre¬ 
sented  in  the  study  contradicted  their  claim.  In  one  study, 
Nielsen  (1992)  investigated  the  effects  of  evaluator  expertise 
on  the  number  of  problems  identified  using  HE  and  claimed, 
“usability  specialists  were  much  better  than  those  without  us¬ 
ability  expertise  by  finding  usability  problems  with  heuristic 
evaluation.”  Gray  and  Salzman  claimed  this  study  has  conclu¬ 
sion  invalidity  because  it  is  not  clear  what  the  effect  of  HE  was 
on the evaluator’s ability to find usability problems or how many
problems  evaluators  would  find  through  expert  review.  They 
suggested  the  data  supported  the  more  modest  claim  that  “ex¬ 
perts  named  more  problems  than  nonexperts.”  Gray  and  Salz¬ 
man  found  that  many  studies  presented  conclusions  that  were 
not  supported  by  the  data.  They  did  not  argue  against  present¬ 
ing  advice  based  on  experience  rather  than  experimental  evi¬ 
dence,  but  argued  that  the  source  of  such  advice  should  be 
made  explicit. 

Improving  UIM  Assessment  and  Comparison 

Gray  and  Salzman’s  (1998)  critique  meant  that  we  can  trust  few 
existing  simple  comparisons  of  UIMs,  and  thus  a  claim  that  HE 
as  a  UIM  finds  more  problems  than  CW  could  lack  credibility. 


However,  studies  are  still  relevant  to  the  extent  that  they 
(strongly)  suggest  ways  in  which  specific  discovery  and  analy¬ 
sis  resources  influence  outcomes.  Still,  it  would  be  better  if  we 
had  assessments  that  we  can  trust,  so  we  first  consider  what  is 
required  here. 

Two  of  Gray  and  Salzman’s  (1998)  challenges  are  easily  met: 
(statistical)  conclusion  validity  is  a  fundamental  scientific  con¬ 
cern.  Researchers  should  not  make  unsupported  claims  (and 
referees  and  editors  should  spot  them).  Due  caution  and  can¬ 
dor  are  all  that  is  required  to  avoid  writing  invalid  conclusions. 
Due vigilance is required to prevent their publication. For statistical
conclusion validity, seeking competent statistical advice
before  carrying  out  studies  is  enough. 

In  contrast,  external  validity  is  a  challenge  for  most  psycho¬ 
logical  experiments.  This  follows  from  the  controls  that  ensure 
internal  and  construct  validity.  However,  if  these  are  not  prop¬ 
erly  addressed,  then  there  can  be  no  external  validity,  since  this 
can  result  in  unrealistic  settings  that  cannot  generalize.  Internal 
and  construct  validity  are  logically  prior  to  external  validity.  Safe 
generalizations  cannot  be  made  from  unsound  experiments. 

Causal  construct  validity  can  be  addressed  by  ensuring  that 
assessed UIMs are actually applied by analysts and that they result
in predictions. At the very least, UIM use should be compared
to unstructured expert judgment, but this practice is
very rare (one exception being Bastien & Scapin, 1995). Where
this is not done, it may still be possible to establish whether the
assessed UIM actually resulted in the prediction. For
example, HE cannot be claimed to have supported a prediction
if the heuristics cited for it are inappropriate, as seen in Cockton
and Woolrych (2001).

Effect  construct  validity  is  a  question  of  good  experimental 
design,  ensuring  that  instruments  and  measures  in  a  study  really 
do  report  the  true  value  of  theoretical  dependent  variables.  In 
too  many  studies,  the  main  dependent  variable  has  been  the 
percentage  of  accurate  predictions.  However,  as  already  argued, 
100%  cannot  be  known  with  certainty,  and  missed  problems 
(genuine  misses  and  false  negatives)  may  be  highly  significant, 
as may false positives. Lastly, internal validity requires control
of  five  confounding  variables,  which  we  now  describe. 

Analyst  Training  and  Preparation 

This  confound  is  a  direct  consequence  of  the  impact  of  the 
analyst  preparation  phase.  Analysts  who  do  not  understand  a 
UIM,  the  system  under  evaluation  or  the  expected  context  of  its 
use  cannot  be  expected  to  generate  sensible  predictions.  Ana¬ 
lysts  who  do  not  understand  a  UIM  can  readily  predict  many  false 
positives  and  miss  problems,  which  would  misrepresent  a  UIM’s 
quality.  Studies  must  carefully  control  and  assess  analysts’  un¬ 
derstanding.  It  should  not  be  possible  to  wholly  attribute  inap¬ 
propriate  UIM  use  and  its  impact  to  analyst  misunderstanding. 

Reliability of Prediction Merging

When assessing UIMs, it is essential to use a range of analysts.
Their predictions must be merged to produce a single problem
set. This procedure must be reliable and carefully controlled
(Lavery, Cockton, & Atkinson, 1997). Errors in forming a single
set of predictions for each and all analysts will corrupt the problem
count, as well as distorting the counts of (in)appropriate
UIM applications. Structured problem report formats (Lavery
& Cockton, 1997b) have been shown to increase control over
prediction merging (Cockton & Woolrych, 2001).

Systematic  matching  rules  require  structured  usability 
problems.  The  individual  components  of  a  usability  problem 
report  must  be  reported  separately.  However,  earlier  problem 
report  formats  (Jeffries,  1994;  Mack  &  Montaniz,  1994;  John 
&  Packer,  1995)  either  restricted  the  types  of  problems  that 
can  be  recorded,  missed  some  crucial  aspect  of  a  problem,  or 
were  unstructured  and  hence  offer  no  guidance.  Overall,  prob¬ 
lem  matching  in  existing  studies  has  been  too  liberal  and 
sometimes based on such crude tactics as matching any problems
that concern the same design feature (Cockton & Woolrych,
2001).

Ability  of  User  Testing  to  Expose  Actual  Problems 

Constructing  a  real  problem  set  is  best  achieved  through 
user  testing,  which  brings  challenges.  Test  design  and  partici¬ 
pants  will  affect  the  number  and  types  of  usability  problems  dis¬ 
covered.  By  missing  usability  problems,  both  UIMs  and  user 
testing  may  misreport  product  quality,  depending  upon  the 
metrics  and  methods  employed.  When  user  test  results  are  used 
to  validate  UIMs,  failures  in  user  test  design  will  result  in  mis¬ 
coding  (as  false  positives)  problems  that  a  UIM  can  predict. 

Reliability  of  Known  Problem  Extraction 

Covering  a  wide  range  of  problems  in  user  testing  does  not 
ensure  that  such  problems  will  reach  the  final  documented 
problem  set.  Errors  in  forming  the  set  of  actual  problems  from 
user  testing  will  have  predicted  problems  miscoded  as  false  pos¬ 
itives.  The  SUPEX  method  (Cockton  &  Lavery,  1999)  addresses 
this  extraction  problem.  Informed  interpretation  of  problem 
(type)  counts  requires  explicit  levels  of  problem  generalization. 
Similarly,  confidence  in  coding  predictions  as  (un)successful  de¬ 
pends  on  the  abstraction  level  for  the  actual  problem  set. 
SUPEX  also  addresses  these  issues  of  abstraction  and  general¬ 
ization.  However,  SUPEX  is  very  labor  intensive  and  has  yet  to 
be  used  in  UIM  assessment.  Still,  problem  extraction  is  a  known 
methodological  problem  (Jacobsen  et  al.,  1998).  Alternatives 
to  SUPEX  involve  analysts  comparing  and  resolving  problem  set 
differences (Vermeeren, Kesteren, & Bekker, 2003).

Reliability of Matching Predicted to Actual Problems and Structured Report Formats

To  assess  a  UIM,  predictions  must  be  matched  against  actual 
problems.  Errors  in  matching  risk  miscoding  predicted  problems 
as  false  positives  or  vice  versa.  Existing  approaches  to  problem 
matching  have  been  either  ad  hoc  or  specialized.  Ad  hoc  ap¬ 
proaches  rely  on  subjective  judgment,  and  hence  are  unlikely  to 
be replicable, and therefore a true measure of the method’s quality
may not be reported. Existing specialized approaches (Lewis
et al., 1990; Mack & Montaniz, 1994) are intended for only
one UIM (and so do not generalize) and/or ignore some crucial
aspects of usability problems. An investigation of the effects of
independent problem matching found evidence of matcher bias
(Lavery, Cockton, & Atkinson, 1997), and recommended structured
problem reports as a way of reducing this. Without such reports,
problem counts are likely to be so unreliable that few calculations
in UIM assessment can be trusted. Connell and
Hammond (1999) extensively supported this view in their assessment
of HE. They showed that the impact of multiple analysts
on cumulative problem counts depends on problem merging,
resulting in a graph that is less optimistic than that in
Nielsen and Landauer (1993), and that the handling of problem
instances versus problem types can inflate problem counts in published
studies. Connell and Hammond showed how this would
increase the estimates for test participant needs derived from
problem sets, specifically the one used in Nielsen (1994b).
Replicable  and  robust  assessments  of  UIMs  thus  need  matching 
rules  applicable  to  a  wide  range  of  methods.  Such  rules  can  only 
be  based  on  structured  problem  formats. 
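
To illustrate why the detection rate matters for such estimates, the following sketch uses the widely cited form of the Nielsen and Landauer (1993) model, in which each analyst (or test participant) is assumed to find any given problem independently and with the same probability. The two detection rates and the 80% target are illustrative assumptions, not values taken from the studies discussed above.

```python
import math


def proportion_found(num_analysts: int, detection_rate: float) -> float:
    """Expected proportion of problems found by a group of analysts.

    Assumes every analyst finds any given problem independently with
    the same probability (detection_rate).
    """
    return 1.0 - (1.0 - detection_rate) ** num_analysts


def analysts_needed(target: float, detection_rate: float) -> int:
    """Smallest group size whose expected coverage reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - detection_rate))


# Hypothetical detection rates: a higher one, as liberal merging might
# suggest, and a lower one, as stricter merging might suggest.
optimistic = analysts_needed(0.80, 0.31)
conservative = analysts_needed(0.80, 0.15)
print(optimistic, conservative)
```

Under these assumed rates, roughly halving the detection rate about doubles the number of analysts needed to reach the same expected coverage, which is the kind of shift that a change in merging practice may produce.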

There  is  no  universal  method  for  usability  problem  reporting, 
but our structured problem report format (SPRF; Fig. 59.5) aids
problem reporting and merging (Cockton & Woolrych, 2001).

SPRFs provide multiple points of reference for problem
merging, both to form a master set of problem types from multiple
analyses and to compare predicted to actual problems, reducing
the potential for miscoding predictions as false (or true)
positives. Our SPRF requires each analyst to provide a brief description
of every problem, as well as likely difficulties the user
may encounter because of the problem and the specific contexts
in which problems may occur. The detail and accuracy of
true positives, even from final-year undergraduate students, is
surprising. The laziest false positives, however, can be even more
disappointing in the stark clarity of an SPRF.
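
The effect of a structured report on merging can be sketched as follows. This is a minimal illustration, not the authors' procedure: the field names, the example reports, and the use of exact string equality as a matching rule are all assumptions made for the example. In practice, matching involves analyst judgment guided by the report's components.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProblemReport:
    """Hypothetical structured report; field names are illustrative."""
    analyst: str
    feature: str      # design feature concerned (assumed field)
    description: str  # brief description of the problem
    difficulty: str   # likely difficulty for the user
    context: str      # specific context in which the problem may occur


Matcher = Callable[[ProblemReport, ProblemReport], bool]


def same_feature(a: ProblemReport, b: ProblemReport) -> bool:
    """Crude rule: any two reports about the same feature are merged."""
    return a.feature == b.feature


def same_feature_and_difficulty(a: ProblemReport, b: ProblemReport) -> bool:
    """Stricter rule: the reported user difficulty must also agree."""
    return a.feature == b.feature and a.difficulty == b.difficulty


def merge(reports: List[ProblemReport], matches: Matcher) -> List[List[ProblemReport]]:
    """Greedily group reports; each group is one problem type in the master set."""
    groups: List[List[ProblemReport]] = []
    for report in reports:
        for group in groups:
            if matches(group[0], report):
                group.append(report)
                break
        else:
            groups.append([report])
    return groups


reports = [
    ProblemReport("A1", "search box", "Search is hard to find",
                  "cannot locate search", "first visit"),
    ProblemReport("A2", "search box", "Search field not noticed",
                  "cannot locate search", "home page"),
    ProblemReport("A3", "search box", "No feedback on empty query",
                  "unsure whether search ran", "empty query"),
]

liberal = merge(reports, same_feature)
stricter = merge(reports, same_feature_and_difficulty)
print(len(liberal), len(stricter))
```

With the crude rule, all three reports collapse into one problem type; with the stricter rule, the third report is kept as a separate problem. The master problem count therefore depends on the matching rule, which is why the rule needs to be explicit and replicable.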

Controlling  Confounding  Variables 

Studies  by  the  authors  (Cockton  &  Woolrych,  2001;  Cockton, 
Woolrych,  Hall,  &  Hindmarch,  2003)  have  made  systematic  at¬ 
tempts  to  control  the  above  five  confounding  variables.  While 
control  of  all  the  variables  can  be  further  improved,  these  stud¬ 
ies  show  what  is  possible  for  improving  study  quality.  Three 
forms of training addressed analysts’ understanding of HE. Controlling
this variable aimed to improve causal construct and internal
validity. As noted, use of SPRFs eases merging of analyst
predictions and improves effect construct validity (errors in
problem set construction result in errors in effect measurement).
The ability of user testing to expose actual problems was
addressed by systematically deriving task sets for user testing
from analyst predictions, again improving effect construct
validity through falsification testing (Woolrych, Cockton, &
Hindmarch, 2004). Reliability of actual problem extraction was
addressed by use of general principles from the SUPEX method
(Cockton & Lavery, 1999), such as transcription, segmentation,
and difficulty isolation and generalization (again, full use of SUPEX
remains too labor intensive). Lastly, reliability of matching
predicted to actual problems was addressed by directly associating
actual with predicted problems. SPRFs ease such associations.
Use of a report format for both predicted and actual problems
(Lavery & Cockton, 1997b) would further improve effect
construct validity.

FIGURE 59.5. Sunderland structured problem report format (fields recoverable from the figure include Specific Context (If Applicable) and Assumed Causes).

Outstanding  Methodological  Challenges 

Improving internal validity also improves external validity, because
it supports focused and accurate conclusions. Further
support, however, is required from extensive coding of accurate
predictions, unpredicted problems, and false positives.
Coding schemes need to use key usability concepts to provide
practitioners with scoping as well as scoring of methods. For example,
the problems that are (not) predicted must be analyzed
for  patterns  in  their  frequency,  impact,  discovery  resources,  and 
other  relevant  constructs.  In  addition,  predictions  that  are  due 
to  the  assessed  UIM  must  be  separated  from  ones  that  are  not. 
Thus,  Sears’  (1997)  measures  of  effectiveness  are  not  enough. 
To  achieve  causal  construct  validity,  successful  predictions  must 
be  shown  to  be  due  to  the  UIM  being  assessed.  To  support  prac¬ 
tice,  missed  problems  and  false  positives  must  be  analyzed 
across  a  range  of  coding  dimensions  to  see  why  they  arise,  and 
how  they  can  be  avoided. 
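
The limits of simple scoring can be seen in a small sketch. It assumes the usual reading of Sears' (1997) measures, with thoroughness as the share of actual problems that were predicted and validity as the share of predictions that are actual problems. The problem identifiers and the set of predictions coded as attributable to the UIM are hypothetical.

```python
def thoroughness(predicted: set, actual: set) -> float:
    """Share of actual problems that were predicted."""
    return len(predicted & actual) / len(actual)


def validity(predicted: set, actual: set) -> float:
    """Share of predictions that correspond to actual problems."""
    return len(predicted & actual) / len(predicted)


# Hypothetical problem sets, identified by arbitrary labels.
actual = {"P1", "P2", "P3", "P6", "P7", "P8"}
predicted = {"P1", "P2", "P3", "P4", "P5"}

# Hypothetical coding: predictions judged to be due to the assessed UIM
# rather than to other resources the analyst brought to the inspection.
attributable = {"P1", "P2", "P4"}

overall_thoroughness = thoroughness(predicted, actual)
overall_validity = validity(predicted, actual)
uim_thoroughness = thoroughness(predicted & attributable, actual)
print(overall_thoroughness, overall_validity, uim_thoroughness)
```

In this made-up case, half of the actual problems are predicted, but only a third are predicted in a way that can be credited to the method. A single score hides that difference, and it says nothing about which kinds of problems were missed.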


UNDERSTANDING  INSPECTION  METHODS 


Software  developers  need  to  know  the  potential  benefits  of  us¬ 
ing  a  particular  method  (e.g.,  the  proportion  of  usability  prob¬ 
lems  predicted),  the  resources  needed  (e.g.,  time  to  evaluate 
the  system,  training  time  for  evaluators,  evaluators  with  partic¬ 
ular  skills),  and  the  risks  (e.g.,  missing  certain  types  of  problems 
or predicting false usability problems). The factors that may affect
the quality of the predictions, for example maturation of the
prototype,  skill  level  of  the  evaluator,  or  choice  of  different  pro¬ 
totyping  media,  also  need  to  be  understood  and  communicated 
through  method  usage  heuristics. 

All  these  factors  are  best  explained  via  a  qualitative  model  of 
UIM  usage.  The  first  two  authors  have  developed  such  a  model, 
which  is  now  presented  as  a  basis  for  understanding  and  as¬ 
sessing  UIMs. 

The  DARe  Model  of  UIM  Effectiveness 

The  DARe  model  (Discovery  and  Analysis  Resources,  initially 
DR-AR;  Woolrych  &  Cockton,  2002)  was  derived  from  an  at¬ 
tempt  to  understand  HE’s  scope  and  accuracy  (Woolrych,  2001), 
in  particular,  the  source  of  false  positives.  In  any  predictive 
method,  analysts  inspect  a  design  for  potential  usability  prob¬ 
lems.  Where  multiple  analysts  apply  a  UIM,  their  problems  must 
be merged into a master problem set. This will invariably contain
false positives (problems predicted by analysts that are not
problems for actual users). Now, these can only be incorrectly
analyzed problems. They could not have been incorrectly discovered,
as the only discovery error is a genuine miss; analysis
is responsible for all other errors.

Not  all  false  positives  are  predicted  by  every  analyst,  leading 
to  a  question:  Can  some  analysts  correctly  eliminate  these 
through  a  process  of  analysis  as  true  negatives?  This  conjecture 
arises  directly  from  the  DARe  model,  with  its  logical  phases 
where  analysts  first  discover  possible  usability  problems,  then 
through  analysis  either  confirm  them  as  probable  problems  or 
eliminate  them  as  improbable  problems.  Fig.  59.6  illustrates 
DARe’s  basic  form  as  two  phases  of  discovery  and  analysis.  They 
are  logical  phases  that  may  overlap,  both  in  time  and  within  an¬ 
alysts’  heads. 

This  simple  model  can  be  enhanced  by  noting  that  inconsis¬ 
tencies  in  individual  performance  (differences  in  the  problems 
discovered,  successful  and  false  predictions)  are  due  to  re¬ 
sources  external  to  UIMs.  If  analysts  were  wholly  reliant  on  re¬ 
sources  supplied  by  the  UIM,  differences  in  performance  could 
not  possibly  be  as  diverse  as  they  are.  Further  support  for  this 
argument  is  demonstrated  by  Cockton  and  Woolrych  (2001), 
where  analysts  successfully  predicted  a  usability  problem  but 
applied  an  inappropriate  heuristic  to  explain  it.  This  indicates 
the  UIM  was  not  responsible  for  problem  discovery,  and  at  the 
very  least,  any  problem  confirmation  was  based  on  HE  misap¬ 
plication.  A  more  likely  explanation  is  that  analysts  are  employ¬ 
ing  resources  external  to  the  UIM. 

The  two  logical  phases  of  discovery  and  analysis  were  used  in 
the  first  section  of  this  chapter  but  with  a  preceding  phase  of 
analyst  preparation.  The  distinction  between  problem  discovery 
and  analysis  was  made  without  justification.  The  logic  behind 
this  has  now  been  explained. 

The  DARe  Model  and  the  Scoping  of  UIMs 

The degree to which UIMs alone influence problem discovery
and analysis during usability inspection has been assessed in
two large studies of HE (Woolrych, 2001; Cockton, Woolrych,
Hall, et al., 2003). These studies showed that factors external to
UIMs influence analysts during inspection. Analysis of predictions
reveals distributed cognitive resources relevant to analyst
performance, especially how misuse of specific distributed cognitive
resources leads to false positives. In the first study, Woolrych
(2001) found that three distinct types of false positives were
related to the use of specific resources. The first type, true false
positives, generally involved misuse of beliefs about users
whereby abilities were underestimated. Analysts predicted possible
user difficulties with a particular system feature that did
not occur in actual use. Users would often encounter these features
and, in the event of any minor difficulties, very quickly
recover without problems.

FIGURE 59.6. Simple structure of DARe model.

The  second  type,  factually  bogus  false  positives,  concerned 
what  analysts  believed  to  be  an  essential  missing  feature  in  the 
system that would adversely affect users’ ability to perform
their tasks efficiently. In reality, such features did exist; however, due to
limited product knowledge, the analyst was unaware of them,
leading to flawed predictions.

The  third  type,  logically  bogus  false  positives,  concerned  pre¬ 
dictions  based  on  flawed  arguments  by  analysts  drawing  on  de¬ 
sign  knowledge.  Analysts  argued  that  a  particular  feature  would 
benefit  from  an  alternative  design  and  reverse  engineered  a  ra¬ 
tionale  that  their  design  was  superior.  Once  again,  distributed 
cognitive  resources  (here  flawed  beliefs)  external  to  the  UIM 
were  being  used. 

Development  of  the  DARe  model  and  initial  evidence  of  cog¬ 
nitive  resources  has  motivated  further  studies  and  identified 
more  resources  (Woolrych,  Cockton,  &  Hindmarch,  2005).  So 
far, seven resource classes have been identified using qualitative
coding of problem sets:

• User (knowledge of/beliefs about users, especially experience
and abilities)

• Task (knowledge of what users want to do and how they prefer
to accomplish this)

• Domain (domain knowledge that is specific to the system
being evaluated)

• Design (knowledge and experience of interaction design
principles, or beliefs in lieu of)

• Interaction (knowledge of how humans actually interact
with computers)

• Technical (knowledge of platform technologies such as
browsers and toolkits)

• Product (information about the system and its capabilities)

FIGURE 59.7. Cognitive resources and problem discovery in the
DARe model (U = Unstructured, S = Structured Discovery).

Fig.  59.7  and  Fig.  59.8  extend  the  simple  DARe  model  by 
showing  how  distributed  cognitive  resources  (knowledge,  in¬ 
formation,  and  beliefs)  are  associated  with  discovery  and  analy¬ 
sis  in  usability  inspection. 

In Fig. 59.7, distributed cognitive resources are associated
with system- and user-centered discovery methods. Thus, system-centered
methods (e.g., system scanning and searching)
predominantly use technical, product, and design resources.
Methods such as HE collect design knowledge around
heuristics. If HE in any way supports problem discovery, it is
through heuristics triggering concerns as, for example, error
messages and navigation are inspected. User-centered methods
(e.g., goal playing and procedure following) use contextual
information about users, tasks, and domains (if not, beliefs must
be substituted) and may integrate knowledge of interaction patterns
with system-centered resources. CW relies exclusively on
task information to find problems.

FIGURE 59.8. Cognitive resources and problem analysis in the
DARe model.

Appropriate  task,  technical,  and  domain  knowledge  can  aid 
analysts  in  valid  problem  discovery  and  analysis.  Knowledge  of 
tasks  allows  analysts  to  discover  problems  that  require  sophis¬ 
ticated  levels  of  interaction  with  the  system  before  they  become 
obvious.  Simple  system-centered  discovery  approaches  would 
be  unlikely  to  discover  such  problems. 

Fig.  59.8  shows  some  example  resource  usage  during  prob¬ 
lem  analysis  (confirmation  or  elimination).  Incorrect  user  be¬ 
liefs  (which  are  often  insults)  often  lead  to  a  failure  to  eliminate, 
and  thus  to  a  false  positive  (hence  the  !  in  Fig.  59.8).  Product 
and  interaction  knowledge  can  lead  to  correct  elimination 
(marked  with  a  cross).  Product  knowledge  gathered  during  in¬ 
spection  can  be  used  to  undo  earlier  invention  of  bogus  prob¬ 
lems.  Knowledge  of  how  people  interact  with  computers  can 
imply  that  apparent  design  flaws  will  have  no  impact,  for  exam¬ 
ple,  because  users  will  never  notice  them  (selective  attention) 
or  because  they  can  learn  what  they  need  through  interaction 
(distributed  cognition).  Technical  and  design  knowledge  can 
be used to correctly confirm a problem (hence the tick). However,
reuse of resources in both discovery and analysis introduces
risk of confirmation bias. Using the same resources in
discovery and analysis adds nothing, and is thus a source of false
positives (Cockton, Woolrych, & Hindmarch, 2004), hence the
repeat discovery for task and domain knowledge.

The  interaction  of  distributed  cognitive  resources  is  the  key 
to  successful  prediction  (Woolrych,  Cockton,  &  Hindmarch, 
2005)  but  still  needs  to  be  researched  in  depth.  Work  contin¬ 
ues  to  uncover  resource  interactions,  but  DARe  has  already  im¬ 
proved  understanding  of  how  analysts  perform  and  use  UIMs, 
which  is  fundamental  to  any  attempt  to  improve  them.  Thus, 
the  question  arises  as  to  whether  a  focus  on  distributed  cogni¬ 
tive  resources  is  best  exploited  via  education  and  training  (fix 
the  analyst),  via  improvements  to  specific  UIMs  (fix  the 
method),  or  via  generic  improvements  to  all  UIMs  (fix  inspec¬ 
tion).  The  latter  follows  from  the  ubiquitous  influence  of  dis¬ 
tributed  cognitive  resources  across  all  UIMs,  suggesting  that  it 
may  be  possible  to  improve  use  of  resources  across  all  UIMs. 
However,  there  are  limits  to  what  can  be  achieved  through  im¬ 
provements  to  usability  evaluation.  Redesign  and  downstream 
utility  require  separate  improvement. 

The  DARe  Model  and  the  Development  Lifecycle 

Some studies (e.g., Sawyer, Flanders, & Wixon, 1996; John &
Marks, 1997; Johnson, 1997) examined the ability (or persuasiveness)
of evaluation methods to improve usability, dividing the
number of usability problems fixed (or committed to be fixed) by the
number of problems reported. However, although the impact
of an inspection may be high, there is no guarantee that
addressing its recommendations would improve the system;
indeed, it could be made worse (e.g., Nielsen, 1993).
John and Marks (1997) thus measured design-change (in)effectiveness
by whether an implemented recommendation increases,
leaves unaffected, or reduces usability problems. They
found that only 11% of predicted problems led to an effective
change. This downstream utility of UIMs is now becoming a major
concern, with passionate claims by Wixon (2003) that the aim
of evaluation is not to find problems but to fix them, a position
akin to expecting a medical thermometer to prescribe antibiotics
(presumably via an automatic intervening diagnosis). As already
argued, UIMs must support effective redesign, through initial
extensive causal analysis, but there are many practical reasons
why evaluators alone should not propose design changes
(Cockton, 2005), notwithstanding the logical error of expecting a
measuring device to repair the object to which it is applied.
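
The difference between the two downstream measures described above can be sketched with made-up figures. The function names, the enumeration, and all counts below are assumptions for illustration; they are not data from the cited studies.

```python
from collections import Counter
from enum import Enum
from typing import List


class ChangeEffect(Enum):
    """Effect of an implemented recommendation on usability problems."""
    REDUCED = "reduces usability problems"
    UNCHANGED = "leaves usability problems unaffected"
    INCREASED = "increases usability problems"


def impact_ratio(num_fixed: int, num_reported: int) -> float:
    """Problems fixed (or committed to be fixed) per problem reported."""
    return num_fixed / num_reported


def effective_change_rate(effects: List[ChangeEffect], num_predicted: int) -> float:
    """Share of predicted problems whose implemented change reduced problems."""
    return Counter(effects)[ChangeEffect.REDUCED] / num_predicted


# Hypothetical inspection: 20 problems reported, 12 of them acted upon.
num_reported = 20
effects = (
    [ChangeEffect.REDUCED] * 4
    + [ChangeEffect.UNCHANGED] * 6
    + [ChangeEffect.INCREASED] * 2
)

impact = impact_ratio(len(effects), num_reported)
effective = effective_change_rate(effects, num_reported)
print(impact, effective)
```

In this invented case, the inspection looks persuasive on the first measure (0.6) while only a fifth of its predictions led to a change that reduced problems. A high impact ratio therefore does not by itself show that usability improved.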

The  potential  value  of  the  DARe  model  lies  in  its  identifica¬ 
tion  of  distributed  cognitive  resources,  which  provide  exten¬ 
sive  effective  support  for  causal  analysis  and  identification  of 
redesign  options.  However,  the  main  motivation  for  evaluation 
in  interactive  systems  development  is  to  drive  iterative  design. 
Iteration was originally conceived as an attribute of software development
processes. However, it is better conceived as a phase
that  bridges  from  evaluation  to  redesign  (Cockton,  2005).  Iter¬ 
ation  takes  the  results  of  evaluation,  that  is,  an  identification  of 
relevant  usability  problems  along  with  a  partial  to  near  complete 
understanding  of  them,  and  then  works  towards  a  redesign  to 
overcome these problems on the basis of a causal analysis. Getting
from evaluation to the next iteration of a design is not a simple
step, nor should it be. It is illogical to advocate extensive informed
grounding of interaction design in contextual research
and design rationales during the initial design of an interactive
system,  and  then  to  expect  evaluation  methods  to  generate  de¬ 
sign  change  recommendations  without  again  grounding  these 
in  business  goals,  usage  contexts,  design  principles,  and  design 
constraints.  The  only  difference  between  redesign  and  initial  de¬ 
sign  is  that  a  design  has  been  subjected  to  evaluation.  With 
UIMs,  this  produces  only  one  further  design  input,  a  list  of  pre¬ 
dicted  and  partially  explained  user  difficulties.  These  must  be 
assessed  in  conjunction  with  all  other  design  inputs  by  the 
whole  development  team. 

Thus,  the  issue  for  UIMs  is  how  well  they  support  iteration.  It 
is  impossible  to  attribute  downstream  utility  to  the  UIM  alone.  A 
whole  range  of  factors  associated  with  the  development  team, 
process  and  context  will  combine  to  determine  which  problems 
get addressed and which ones do not. For example, formal experiments
may be required to complete causal analysis, at which
point  the  influence  of  the  UIM  will  be  masked  by  the  impact  of 
the  experiments,  which  could  explore  alternative  layouts  or  vi¬ 
sual  designs  in  a  specific  interaction  context. 

It  is  important  to  have  a  balanced  view  of  what  UIMs  can 
achieve  within  the  development  process,  something  that  ap¬ 
plies  to  all  evaluation  methods.  Attempts  to  improve  usability  in¬ 
spection  should  be  restricted  to  the  proper  reporting  of  pre¬ 
dicted  problems,  with  maximum  preparation  for  iteration 
through  causal  analysis  and  identification  of  redesign  options. 
With  this  scope  in  mind,  we  now  consider  how  UIMs  and  their 
usages  can  be  improved. 


IMPROVING  INSPECTION  METHODS 


Any  improvement  in  UIMs  can  only  be  achieved  when  what 
goes  wrong  in  inspection  is  well  understood,  which  requires 
continuing advances in research methods to ensure well-grounded
understandings. Understanding UIMs, and how analysts
apply them, is the starting point from where improvements
to methods can be made. During inspection, analysts can fail to
discover real problems, reject real problems, and keep improbable problems,
adversely  affecting  both  thoroughness  and  validity.  Where  mul¬ 
tiple  analysts  are  involved  in  inspection,  errors  in  forming  the 
master  problem  set  can  also  adversely  affect  a  UIM’s  thorough¬ 
ness  and  validity. 

Once  analysts  discover  a  candidate  problem  during  in¬ 
spection,  a  process  of  analysis  takes  place  where  the  problem 
is  either  kept  as  a  likely  problem  or  rejected  as  an  unlikely  prob¬ 
lem.  Therefore,  Woolrych,  Cockton,  &  Hindmarch  (2004)  de¬ 
termined  five  possible  prediction  outcomes  for  UIMs,  of  which 
only  the  first  two  are  desirable: 

1. True positive: discovery of a real problem that is correctly
confirmed through analysis, which will cause real difficulties
for users (the objective of inspection being to find and report
real problems)

2. True negative: a possible problem that is correctly eliminated
through analysis (failure to eliminate this would result
in a false positive)

3. False positive: a discovered problem that is incorrectly confirmed
through analysis (which may result in flawed design
iterations)

4. False negative: a discovered problem that is incorrectly
eliminated through analysis (undesirable, as without extended
report formats, it will appear as a missed problem)

5. Missed problems: real problems that exist that analysts fail
to even discover
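
The five outcomes listed above can be sketched as a small classification routine. This is a minimal illustration rather than the authors' coding procedure: the names `Outcome` and `code_prediction` and the three boolean inputs are hypothetical, and the sketch assumes that whether a candidate problem is real has already been established by other means (for instance, user testing).

```python
from enum import Enum


class Outcome(Enum):
    TRUE_POSITIVE = "true positive"
    TRUE_NEGATIVE = "true negative"
    FALSE_POSITIVE = "false positive"
    FALSE_NEGATIVE = "false negative"
    MISSED = "missed problem"


def code_prediction(discovered: bool, kept: bool, real: bool) -> Outcome:
    """Code one candidate problem for one analyst (hypothetical helper).

    discovered: the analyst noticed the candidate problem at all
    kept:       the analyst confirmed it after analysis (ignored if not discovered)
    real:       the problem is known to affect users (assumed known in advance)
    """
    if not discovered:
        if real:
            return Outcome.MISSED
        # A problem that is neither real nor discovered is not an outcome.
        raise ValueError("an undiscovered, unreal problem cannot be coded")
    if kept:
        return Outcome.TRUE_POSITIVE if real else Outcome.FALSE_POSITIVE
    return Outcome.FALSE_NEGATIVE if real else Outcome.TRUE_NEGATIVE
```

Note that a false negative and a missed problem both leave a real problem unreported; only the `discovered` flag separates them, which is why report formats that record rejected problems are needed to tell them apart.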

Method-Independent  Improvements 

Problem  discovery  is  central  to  a  UIM’s  thoroughness;  therefore, 
appropriate  discovery  resources  and  methods  are  essential.  In 
problem  discovery,  analysts  apply  either  a  system-  or  user-cen¬ 
tered  approach  (Fig.  59.7).  Similarly,  each  approach  can  be  struc¬ 
tured  or  unstructured.  Structured  user-centered  approaches 
(task procedure based or goal driven, Fig. 59.7) appeared to be
more valid (Cockton, Woolrych, Hall, et al., 2003). Conversely,
unstructured, system-centered approaches (simply scanning the
interface for potential usability problems, Fig. 59.7) resulted in
more  false  positives.  Applying  a  UIM  using  a  structured  user-cen¬ 
tered  approach  resulted  in  analysts  keeping  more  real  problems 
and  fewer  false  positives.  Such  improvements  may  be  due  to  a 
search  strategy  that  focuses  analysts  on  users  and  tasks  within  a 
usage context; that is, it uses effective discovery resources.

Other  issues  independent  of  UIMs  also  affect  their  overall 
performances.  Issues  associated  with  multiple  analysts,  such  as 
problem-merging  and  problem-reporting  methods,  also  require 
attention  if  they  are  not  to  negatively  impact  inspection  out¬ 
comes.  Errors  in  merging  predictions  into  a  single  set  of  pre¬ 
dictions  for  each  analyst  can  affect  the  problem  count. 

Multiple  analysts  are  recommended  to  improve  inspection 
method  thoroughness  (Nielsen  &  Landauer,  1993).  The  DARe 
model  explains  why  multiple  analysts  find  more  problems  when 
the method and environment remain the same. Quite simply,
as more analysts are added, more problem discovery resources
are added to the inspection, hence more problems are found.
However, the DARe model also leads us to consider the impact
of multiple analysts on analysis resources. If analysts are
inclined  to  not  reject  problems  at  all,  or  if  they  are  unduly  con¬ 
fident  in  confirming  problems,  then  multiple  analysts  will  not 
only  be  collectively  more  thorough,  and  thus  find  more  prob¬ 
lems,  but  they  will  also  be  collectively  less  valid,  and  thus  predict 
more  false  positives.  This  conjecture  (Woolrych  &  Cockton, 
2002),  which  is  based  on  an  early  version  of  DARe  (the  DR-AR 
model), turns out to be true when prediction sets for multiple
analysts  are  examined.  This  dilution  of  analysis  resources  could 
be  counteracted  by  improved  analyst  training  in  UIM  usage. 
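
The trade-off described above can be illustrated numerically. The sketch below assumes the common definitions of thoroughness (real problems predicted divided by real problems that exist) and validity (real problems predicted divided by all problems predicted). The problem identifiers and the three analysts' prediction sets are invented purely for illustration and are not data from any study.

```python
def thoroughness(predicted: set, real: set) -> float:
    """Share of the real problems that the predictions cover."""
    return len(predicted & real) / len(real)


def validity(predicted: set, real: set) -> float:
    """Share of the predictions that are real problems."""
    return len(predicted & real) / len(predicted)


# Hypothetical data: P* are real problems, X* are false positives.
real_problems = {"P1", "P2", "P3", "P4", "P5"}
analysts = [
    {"P1", "P2", "X1"},
    {"P2", "P3", "X2"},
    {"P4", "X3", "X4"},
]

pooled = set()
for n, kept in enumerate(analysts, start=1):
    pooled |= kept  # merge this analyst's kept predictions into the pool
    print(n,
          round(thoroughness(pooled, real_problems), 2),
          round(validity(pooled, real_problems), 2))
```

With these made-up sets, pooled thoroughness rises from 0.4 to 0.6 to 0.8 as analysts are added, while pooled validity falls from 0.67 to 0.6 to 0.5, because each analyst who rejects nothing contributes false positives along with real problems.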

SPRFs  (Fig.  59.5)  were  extended  to  support  our  research  in 
UIM assessment, but actually resulted in unexpectedly improved
performance  in  a  study  by  Cockton,  Woolrych,  Hall,  et  al. 
(2003).  To  accurately  code  all  analyst  predictions,  extended 
SPRFs  (ESPRFs)  were  developed  to  elicit  problem  discovery 
strategies,  to  distinguish  genuine  misses  from  false  negatives  (a 
possibility that only arose because of the DARe model), and to
distinguish  between  distributed  cognitive  resource  usage  in  dis¬ 
covery  and  analysis  (again,  a  concern  that  was  a  direct  conse¬ 
quence  of  the  DARe  model). 

ESPRFs  require  analysts  to  indicate  the  strategy  they  used  to 
discover  individual  problems  (a  combination  of  unstructured/ 
structured and system/user centered). In addition, all discovered
problems (rejected or kept) were reported with relevant
justification. This allowed for not only the accurate coding of
genuine  misses,  and  true  and  false  negatives,  but  also  identifi¬ 
cation  of  distributed  cognitive  resources  influencing  such  deci¬ 
sions.  Surprisingly,  ESPRFs  resulted  in  a  significant  decrease  in 
false  positives.  Improved  validity  can  be  reasonably  attributed  to 
the  impact  of  ESPRFs,  which  encouraged  analysts  to  reflect  on 
discovery  methods  and  to  eliminate  improbable  problems. 
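
As a rough sketch of the information an extended report captures, the record below holds the discovery strategy, the keep or reject decision, and its justification. The type `ExtendedProblemReport`, its field names, and the helper `rejected` are hypothetical; the actual ESPRF layout is not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


class Focus(Enum):
    SYSTEM = "system-centered"
    USER = "user-centered"


@dataclass(frozen=True)
class ExtendedProblemReport:
    description: str
    structure: Structure   # how methodically the analyst searched
    focus: Focus           # what the search was centered on
    kept: bool             # False means the analyst rejected the problem
    justification: str     # reasoning behind keeping or rejecting

    def __post_init__(self):
        # Every discovered problem, kept or rejected, needs a justification.
        if not self.justification.strip():
            raise ValueError("justification is required")


def rejected(reports):
    """Rejected problems: candidates for true or false negatives, not misses."""
    return [r for r in reports if not r.kept]
```

Requiring a justification for every entry mirrors the reflective effect described above: an analyst who cannot state why a problem was kept has a prompt to reconsider it.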

Thus,  method-independent  improvements  to  UIMs  are  pos¬ 
sible.  As  well  as  fixing  the  method  or  fixing  the  analyst,  we  can 
fix  inspection  in  general.  Surprisingly,  tacit  training  and  con¬ 
straint  via  ESPRFs  proved  to  be  more  effective  than  explicit  train¬ 
ing  through  structured  tutorials  and  self-assessment.  False  pos¬ 
itive  reduction  was  mostly  due  to  elimination  of  bogus  problems 
(both  factual  and  logical),  since  these  are  hard  to  report  in 
ESPRFs.  In  addition,  ESPRFs  led  analysts  to  consider  problem 
elimination,  which  may  reduce  the  negative  impact  on  analysis 
resources  of  multiple  evaluators.  Lastly,  the  appropriateness  of 
heuristic application also doubled to 61%/65% (mean/median
scores for comparison of our 2001 and 2003 studies; Cockton
& Woolrych, 2001; Cockton et al., 2003). All these improvements
have  been  replicated  in  a  further  study  by  Woolrych,  Cockton, 
and  Hindmarch  (2004),  indicating  that  method-independent 
ESPRFs  can  improve  validity  and  appropriateness  in  UIM  usage 
with  a  slight  improvement  in  thoroughness  (which  could  be  re¬ 
versed  following  asymptotic  testing). 

Method-Specific Improvements

Attempts to improve inspection through new and modified UIMs
have recently become more common, replacing earlier attempts
to combine the best of individual UIMs, as in Sears's (1997) heuristic
walkthrough. Examples include the usability problem inspector
(Hartson, Andre, & Williges, 2003), strategies for improving
the effectiveness of HE (Law & Hvannberg, 2004), and metaphors
of human thinking (MOT; Frøkjær & Hornbæk, 2002).

MOT  was  proposed  as  a  new  UIM,  despite  similarities  to  HE. 
It  was  motivated  by  two  perceived  flaws  in  existing  UIMs.  Firstly, 
Frøkjær and Hornbæk (2002) claimed that existing techniques fail
to  consider  users’  thinking.  Secondly,  they  believed  that  current 
UIMs  are  only  useful  with  specific  applications  and  associated  in¬ 
teraction  styles  (e.g.,  pointer  interfaces),  and  not  appropriate  for 
technologies  such  as  mobile  devices,  presumably  due  to  depen¬ 
dence  on  design  knowledge  specific  to  desktop  graphical  user 
interfaces  (although  only  HE  has  strong  dependencies  here). 

MOT  shifts  focus  from  the  device  to  users’  mental  activity 
with  the  aim  of  being  more  applicable  to  different  devices  and 
contexts.  In  addition,  MOT  creators  have  used  strikingly  poetic 
language  to  provide  resources  in  the  form  of  metaphors  about 
human  thinking,  as  shown  in  Table  59.1,  along  with  their  impli¬ 
cations  for  user  interfaces.  In  Fig.  59.1,  MOT  would  appear 
above  walkthroughs,  extending  beyond  these  to  the  right  due 
to  its  focus  on  users’  mental  activity. 

Some MOTs are based on knowledge of interaction design
from the HCI literature (metaphors 1 and 4); others are based
on cognitive psychology (metaphors 2 and 5) or (anti-)philosophizing
by a Turing Award winner (metaphors 3, 4, and 5). This
very interesting collection of knowledge resources extends ones
already identified within the DARe model (knowledge of users
and interaction) with inspiration from Naur (1999), often based
on admiration for the 19th-century psychologist and philosopher
William James.

TABLE 59.1. Metaphors of Human Thinking

Metaphor of Human Thinking        Implication for User Interface

1. Habit formation is like a      Support existing habits and, where
   landscape eroded by water      necessary, develop new ones

2. Thinking as a stream of        Users’ thinking should be supported
   thought                        by reconcilability, stability, and
                                  continuity

3. Awareness as a jumping         Support users’ associations with
   octopus                        effective means of focusing within
                                  a stable context

4. Utterances as splashes         Support changes and incomplete
   over water                     utterances

5. Knowing as a site of           Users should not have to rely on
   building                       complete or accurate knowledge;
                                  design for incompleteness

MOTs  are  applied  to  a  series  of  task-based  walkthroughs 
and are claimed to outperform HE (Hornbæk & Frøkjær,
2004).  Better  performance  was  due  to  MOT  finding  more  se¬ 
vere  problems  that  were  apparently  more  complex  to  repair. 
In  contrast,  problems  found  by  HE  were  more  cosmetic  and 
less  complex  (replicating  HE’s  performance  in  Cockton  & 
Woolrych,  2001).  However,  the  claimed  results  may  not  be  the 
result  of  the  new  UIM,  and  indeed  MOT’s  assessment  could  be 
based  on  unreliable  coding  of  predictions  as  true/false  posi¬ 
tives  and  so  forth,  since  there  has  been  no  real  problem  set  de¬ 
rived  via  empirical  user-based  methods.  Instead,  assessment  of 
the  problems  found  by  both  UIMs  was  based  on  an  approach 
by  Sawyer  et  al.  (1996)  and  a  related  approach  by  Hartson, 
Andre, and Williges (2001).

In Hornbæk and Frøkjær (2004), the authors recommended
that an individual responsible for the development of the web application
under inspection, the client, be responsible for rating the
severity of the problems found by each technique. The severity ratings
for each problem ranged from very critical (most severe) to
not a problem (e.g., a false positive?). Therefore, the client’s judgment
is responsible for the identification of false positives, and not
more objective and auditable falsification testing (Woolrych, Cockton,
& Hindmarch, 2004). The study found the client believed only
1% of problems reported in HE, and 3% of reported problems
in MOT, to be false positives, in marked contrast to 65% false positives
in another study of HE (Cockton & Woolrych, 2001). However,
in the MOT study, a further 49% of predictions in HE, and
33% in MOT, are described as cosmetic (the lowest severity rating
applied, other than those considered not a problem). Assessment
of the technique and the accuracy in coding of predictions must
be questioned when false positive counts, particularly in HE, are
so significantly lower than those in other studies.

"What  is  more  interesting  than  these  potential  confounds  is 
how apparent success can be explained by the DARe model.
MOT combines task-based search (for problem discovery) with
analysis resources from the client, who is responsible for much
problem coding (e.g., severity, false positives). MOT’s thoroughness
(with apparent improved problem discovery over HE)
may be due not to its new metaphors but to how these were
applied and how problems are analyzed. MOT is applied as a
task-based (procedural) inspection method, in contrast to HE
but similar to HW. Analysts familiarize themselves with the application;
then, in a similar manner, they attempt typical
user tasks in two passes using the metaphors to find usability
problems. Once a problem is found, the metaphor responsible
for the discovery is associated with the problem as part of the
reporting method. However, in HE, analysts often associate inappropriate
heuristics with predicted problems (Cockton &
Woolrych, 2001). Appropriateness of metaphor association with
individual problems was not addressed, leading to questions
about their overall impact on problem discovery. Was it the
metaphors or other distributed cognitive resources, especially
task information, that were responsible for problem discovery
and analysis? Also, what benefits derive from the analysis resources
provided by a client? A DARe-based critique thus exposes
powerful resources: task information required for initial
analysis, high-quality product and technical resources from the
client, plus potential contextual information on users, tasks, and
application domain. Put alongside resources of known power
and accuracy, it is hard to see what the five metaphors of thinking
could add to these problem discovery and analysis resources.

As  with  previous  attempts  to  improve  UIMs  directly,  initial  su¬ 
perficial  improvements  in  validity  and  thoroughness  are  as  likely 
(if  not  more  likely)  to  be  a  result  of  method-independent  im¬ 
provements  to  distributed  cognitive  resources  rather  than  to  new 
resources specific to the method itself. Thus, MOT as a UIM has
strong  points  in  its  attempted  theoretical  grounding  in  human 
thinking  and  its  use  of  structured  user-centered  discovery  meth¬ 
ods.  However,  much  of  its  strength  may  lie  in  these  last  two  fac¬ 
tors  rather  than  in  the  metaphors  themselves.  Similarly,  even 
though  heuristics  can  be  applied  mostly  (and  perhaps  unavoid¬ 
ably)  inappropriately,  the  use  of  multiple  analysts  can  still  result  in 
thoroughness  scores  of  70%.  This  indicates  the  risks  of  attributing 
too  much  of  an  inspection’s  outcome  to  a  UIM,  which  may  have 
a published coverage of under 40%, rather than to the analysts.
Given  this,  we  can  expect  almost  any  collection  of  metaphors, 
heuristics,  or  principles  to  appear  to  give  a  UIM  the  power  to  pre¬ 
dict  user  difficulties.  Gray  and  Salzman’s  (1998)  concerns  about 
conclusion  validity  remain  relevant  to  all  UIM  assessments. 

Most  UIMs  are  placebos.  The  ones  that  are  not  are  ineffective 
placebo  blockers.  Evaluators  and  collaborators,  rather  than  spe¬ 
cific  methods,  may  be  the  main  determinant  of  success,  notwith¬ 
standing  the  proven  positive  benefits  of  method-independent 
tools such as ESPRFs and user-centered procedural discovery
methods. Thus, the question that arises is how much further
UIM performance could be improved by focused analyst
training,  ideally  based  on  the  actual  strategies  of  the  most  suc¬ 
cessful  experienced  usability  professionals. 


Improving  Analysts 

Designing  a  perfect  UIM  would  appear  to  be  an  almost  impos¬ 
sible  task.  The  DARe  model  indicates  that  a  perfect  UIM  would 
need  the  capability  to  discover  candidate  problems  and  confirm 
or  eliminate  probable  or  improbable  problems  reliably.  This 
means  the  UIM  must  be  supported  by  extensive  knowledge  re¬ 
sources  on  users,  tasks,  domains,  interaction,  design,  products, 
and  technologies.  Despite  efforts  to  improve  UIMs,  no  new  UIM 
can  be  claimed  to  be  as  reliable  as  user  testing.  UIMs  can  predict 
more  problems,  but  most  of  these  may  be  false  positives.  New 
UIMs must rely on human abilities and resources for problem
analysis.  If  we  cannot  fix  the  method,  and  there  is  little  left  to 
fix  for  inspection  in  general,  can  we  fix  the  analyst? 

An effective one-size-fits-all UIM is highly improbable, if not
impossible,  as  relevant  knowledge  resources  (and  their  relative 
importance)  change  depending  on  the  application  context.  For 
example,  domain  knowledge  in  a  local  travel  website  is  very  dif¬ 
ferent  to  the  domain  knowledge  of  a  website  that  hires  cars  or 
vans  or  allows  the  purchase  of  electrical  goods.  By  comparison, 
user  knowledge  will  be  a  predominant  resource  in  assessing 
user  experience  for  lifestyle  and  brand  intensive  websites,  as 
may  currently  unexplored  knowledge  resources  for  UIMs  such 
as  culture  and  semiotics.  Clearly,  analysts  can  become  sensitive 
to  and  skilled  in  the  use  of  such  resources  in  advance  of  their 
formal integration into UIMs.


FUTURE  TRENDS  AND  BEST  PRACTICE 


Prospects  for  UIMs 

Major  methodological  problems  in  the  assessment  and  compari¬ 
son  of  UIMs  have  retarded  progress  in  UIM  development.  Early 
comparisons  and  assessments  and  their  limited  reliabilities  were 
surveyed  in  the  first  version  of  this  chapter  (Cockton,  Lavery  & 
Woolrych,  2003).  Interested  readers  should  consult  this  to  un¬ 
derstand  why  limited  progress  was  made  in  the  first  decade  of  dis¬ 
count  methods.  Recent  methodological  improvements  in  re¬ 
sponse  to  Gray  and  Salzman’s  (1998)  extensive  critique  have 
improved  the  prospects  for  UIM  improvement,  not  least  through 
the  unexpected  side  effects  of  new  research  instruments  such  as 
ESPRFs.  In  addition,  studying  analyst  behavior  rather  than 
method  performance  is  producing  insights  into  effective  UIM  us¬ 
age  that  are  relevant  to  most  inspection-based  methods.  The  chal¬ 
lenge  now  is  in  designing  new  UIMs,  or  modifying  existing  ones, 
to  make  best  use  of  generic  analyst  knowledge,  generic  inspec¬ 
tion  heuristics,  and  method  specific  procedures.  The  current  re¬ 
naissance  in  evaluation  method  research  should  bring  a  stream  of 
improvements  throughout  the  second  decade  of  UIMs. 

Best  Practice  for  UIMs 

UIM  users  miss  problems,  due  either  to  a  lack  of  sufficient  dis¬ 
covery  resources  or  to  problems  found  and  mistakenly  elimi¬ 
nated.  In  addition,  without  satisfactory  elimination  resources 
such as knowledge of interaction, false positives fail to get eliminated,
reducing UIM validity. Neither the discovery phase nor the
analysis phase of inspection provides enough direct support for analysts.
In short, to improve effectiveness, UIMs must find more
problems (improved thoroughness), and produce fewer false
positives (improved validity). While UIMs rely on multiple analysts
to improve thoroughness, often at the expense of validity,
there are method-independent approaches to improving the
overall effectiveness of UIMs.

From  a  practical  point  of  view,  where  resources  allow,  com¬ 
bining  the  results  from  inspection  with  user  testing  can  eliminate 
false  positives.  Initially  designed  as  a  research  tool  (Woolrych, 
Cockton,  &  Hindmarch,  2004),  falsification  testing  should  allow 
for the accurate coding of all predictions by analysts. Falsification
testing involves carefully constructed task sets for user testing following
inspection. These task sets follow typical user tasks within
the context of the application, ensuring analyst predictions are also
addressed. Put simply, if a predicted problem exists, falsification
testing should find it, or otherwise confirm it as a false positive, thus
preventing  wasted  redevelopment  time  on  nonexistent  problems. 
This  has  the  additional  benefit  of  letting  UIM  users  calibrate  their 
predictions  against  user  testing  and  use  this  feedback  to  improve 
their  inspection  performances  through  improvements  to  reusable 
cognitive  resources  and  heuristics  for  carrying  out  inspections 
(as  opposed  to  the  heuristics  used  in  some  UIMs). 
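
The coding step that falsification testing makes possible can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: it assumes each prediction has a hypothetical identifier, that the task sets exercised every prediction, and that `observed` holds the problems users actually encountered during testing.

```python
def code_against_user_testing(kept: set, rejected: set, observed: set) -> dict:
    """Code analyst predictions against problems observed in user testing."""
    return {
        "true_positives": kept & observed,       # predicted and seen
        "false_positives": kept - observed,      # predicted, never seen
        "false_negatives": rejected & observed,  # found, wrongly eliminated
        "true_negatives": rejected - observed,   # found, rightly eliminated
        "missed": observed - kept - rejected,    # never discovered
    }


# Hypothetical identifiers, for illustration only.
coded = code_against_user_testing(
    kept={"nav-1", "form-2", "label-3"},
    rejected={"color-4", "help-5"},
    observed={"nav-1", "form-2", "help-5", "search-6"},
)
print(sorted(coded["false_positives"]))
```

The `false_positives` set is what saves redevelopment effort, while comparing the other sets against an analyst's own decisions gives the calibration feedback mentioned above.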

Overall,  the  best  practice  for  all  UIM  usage  is  to  have  a  thor¬ 
ough  general  knowledge  of  HCI  (knowledge  of  design  and  in¬ 
teraction);  to  have  accurate  and  extensive  information  about 
the  product,  its  technical  platform,  and  its  expected  context  of 
use  (knowledge  of  users,  tasks  and  application  domain);  and 
to  combine  these  high-quality  resources  with  proven  ap¬ 
proaches,  such  as  structured  problem  reports  (Fig.  59.5),  re¬ 
flective  discovery  and  analysis  of  possible  and  probable  usabil¬ 
ity  problems,  and  preference  for  structured  and/or 
user-centered  approaches  to  problem  discovery  (Cockton 
et  al.,  2003).  UIMs  must  be  thoroughly  integrated  within  an  it¬ 
erative  development  lifecycle  and  make  full  use  of  all  informa¬ 
tion  available  to  initial  design  studies.  They  must  also  feed  into 
multidisciplinary  design  iteration,  which  again  makes  full  use 
of  all  available  project  knowledge  and  does  not  shrink  back  to 
ill-advised  reliance  on  the  judgments  of  specialist  evaluators. 

Inspection  using  multiple  analysts  needs  to  be  carefully  man¬ 
aged.  Woolrych  and  Cockton  (2002)  found  that  predictions  by 
a  single  analyst  are  not  necessarily  less  probable.  Tool  and  meet¬ 
ing  support  (potentially  asynchronous  and  remote)  needs  to 
be  provided  for  merging  problems  into  a  master  set.  A  chief 
merger  may  be  required,  or  even  a  chief  analyst  or  two,  leaving 
all  the  others  to  focus  on  problem  discovery.  Careful  planning 
and skilled leadership are essential, regardless of how many analysts
are involved. Common advance understandings of appropriate
abstraction levels for problem reporting are essential, and
analysis  must  maximize  the  feed  through  into  subsequent  de¬ 
sign  iterations  (causes,  fixes). 

UIMs  are  the  design  equivalent  of  the  bumblebee,  which  the 
laws  of  aerodynamics  state  should  not  be  able  to  fly.  UIMs,  with 
their  limited  theoretical  underpinnings  and  their  poorly  struc¬ 
tured  procedures  and  integration  into  the  development  life- 
cycle,  should  not  be  able  to  predict,  but  they  do.  Thoroughness 
scores of 70% and over are common, and if validity can be
brought  up  to  the  same  level,  then  UIMs  may  once  again  be¬ 
come  the  predominant  usability  evaluation  method.  If  they  do, 
then  this  will  be  due  to  much  improved  practice  by  the  ever- 
improving  personal  skills  and  knowledge  of  thousands  of  us¬ 
ability  specialists,  rather  than  to  naive  trust  in  a  handful  of  HCI 
researchers  and  usability  gurus. 


NOTE 


The  views  and  opinions  contained  in  this  chapter  are  those  of 
the  authors.  In  the  case  of  the  second  author,  they  do  not  nec¬ 
essarily  represent  any  official  views  of  Microsoft  Corporation. 
We  thank  Marc  Webster  of  the  School  of  Computing  and  Tech¬ 
nology,  University  of  Sunderland,  for  providing  all  figures. 
INTRODUCTION


What  is  Model-Based  Evaluation? 

Model-based  evaluation  is  using  a  model  of  how  a  human  would 
use  a  proposed  system  to  obtain  predicted  usability  measures 
by  calculation  or  simulation.  These  predictions  can  replace  or 
supplement  empirical  measurements  obtained  by  user  testing. 
In  addition,  the  content  of  the  model  itself  conveys  useful  in¬ 
formation  about  the  relationship  between  the  user’s  task  and 
the  system  design. 
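
As one simple illustration of prediction by calculation, the sketch below uses the Keystroke-Level Model, a member of the GOMS family, to estimate expert execution time for an error-free method. The operator times are the commonly cited estimates from Card, Moran, and Newell; the operator sequence for the example task, and the names `OPERATOR_SECONDS` and `predict_execution_time`, are assumptions made up for illustration.

```python
# Commonly cited KLM operator estimates, in seconds.
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke or button press, average skilled typist
    "P": 1.10,  # point with a mouse to a target
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def predict_execution_time(operators: str) -> float:
    """Sum operator times for a sequence such as 'MHPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical method: think, reach for the mouse, point at a field,
# click (treated here as K), return to the keyboard, think, type 4 characters.
fill_field = "MHPK" + "H" + "M" + "K" * 4
print(f"{predict_execution_time(fill_field):.2f} s")
```

Two candidate designs can be compared by writing out the operator sequence each one requires for the same benchmark task and comparing the sums, without building a prototype or recruiting test users.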

Organization  of  this  Chapter 

This  chapter  will  first  argue  that  model-based  evaluation  is  a 
valuable  supplement  to  conventional  usability  evaluation,  and 
then  survey  the  current  approaches  for  performing  model- 
based  evaluation.  Because  of  the  considerable  technical  detail 
involved  in  applying  model-based  evaluation  techniques,  this 
chapter  cannot  include  “how  to”  guides  on  the  specific  model¬ 
ing  methods,  but  they  are  all  well  documented  elsewhere. 
Instead,  this  chapter  will  present  several  high-level  issues  in  con¬ 
structing  and  using  models  for  interface  evaluation,  and  com¬ 
ment  on  the  current  approaches  in  the  context  of  those  issues. 
This  will  assist  the  reader  in  deciding  whether  to  apply  a  model- 
based  technique,  which  one  to  use,  what  problems  to  avoid, 
and  what  benefits  to  expect.  Somewhat  more  detail  will  be  pre¬ 
sented  about  one  form  of  model-based  evaluation,  GOMS  mod¬ 
els,  which  is  a  well-developed,  relatively  simple  and  “ready  to 
use”  methodology  applicable  to  many  interface  design  prob¬ 
lems.  A  set  of  concluding  recommendations  will  summarize  the 
practical  advice. 

Why  Use  Model-Based  Evaluation? 

Model-based  evaluation  can  be  best  viewed  as  an  alternative  way 
to  implement  an  iterative  process  for  developing  a  usable  sys¬ 
tem.  This  section  will  summarize  the  standard  usability  process, 
and  contrast  it  with  a  process  using  model-based  evaluation. 

Standard  usability  design  process.  In  simplified  and 
idealized  form,  the  standard  process  for  developing  a  usable  sys¬ 
tem  centers  on  user  testing  of  prototypes  that  seeks  to  compare 
user  performance  to  a  specification  or  identify  problems  that  im¬ 
pair  learning  or  performance.  After  performing  a  task  analysis 
and  choosing  a  set  of  benchmark  tasks,  an  interface  design  is 
specified  based  on  intuition  and  guidelines  both  for  the  plat¬ 
form/application  style  and  usability.  A  prototype  of  some  sort  is 
implemented,  and  then  a  sample  of  representative  users  at¬ 
tempts  to  complete  the  benchmark  tasks  with  the  prototype.  Us¬ 
ability  problems  are  noted,  such  as  excessive  task  completion 
time  or  errors,  being  unable  to  complete  a  task,  or  confusion 
over  what  to  do  next.  If  the  problems  are  serious  enough,  the 
prototype  is  revised,  and  a  new  user  test  conducted.  At  some 
point,  the  process  is  terminated  and  the  product  completed, 
either because no more serious problems have been detected,
or there is not enough time or money for further development.
See  Dumas  (chapter  57,  this  volume)  for  a  complete  presentation. 

The  standard  process  is  a  straightforward,  well-documented 
methodology  with  a  proven  record  of  success  (Landauer,  1995). 
The  guidelines  for  user  interface  design,  together  with  knowl¬ 
edge  possessed  by  those  experienced  in  interface  design  and 
user testing, add up to a substantial accumulation of wisdom
on  developing  usable  systems.  There  is  no  doubt  that  if  this 
process  were  applied  more  widely  and  thoroughly,  the  result 
would  be  a  tremendous  improvement  in  software  quality.  User 
testing  has  always  been  considered  the  “gold  standard”  for  us¬ 
ability  assessment;  however,  it  has  some  serious  limitations — 
some  practical  and  others  theoretical. 

Practical  limitations  of  user  testing.  A  major  practical 
problem  is  that  user  testing  can  be  too  slow  and  expensive  to  be 
compatible  with  current  software  development  schedules,  so 
for  many  years,  one  focus  of  HCI  research  has  been  on  ways  to 
tighten  the  iterative  design  loop.  For  example,  better  prototyp¬ 
ing  tools  allow  prototypes  to  be  developed  and  modified  more 
rapidly.  Clever  use  of  paper  mockups  or  other  early  user  input 
techniques  allows  important  issues  to  be  addressed  before  mak¬ 
ing  the  substantial  investment  in  programming  a  prototype.  So- 
called  inspection  evaluation  methods  seek  to  replace  user  test¬ 
ing  with  other  forms  of  evaluation,  such  as  expert  surveys  of  the 
design,  or  techniques  such  as  cognitive  walkthroughs  (see 
Cockton, Lavery & Woolrych, chapter 59, this volume).

If  user  testing  is  really  the  best  method  for  usability  assess¬ 
ment,  then  it  is  necessary  to  come  to  terms  with  the  unavoid¬ 
able  time  and  cost  demands  of  collecting  behavioral  data  and 
analyzing  it,  even  in  the  rather  informal  manner  that  normally 
suffices  for  user  testing.  For  example,  if  the  system  design  were 
substantially  altered  on  an  iteration,  it  would  be  necessary  to 
retest  the  design  with  a  new  set  of  test  users.  While  it  is  hoped 
that  the  testing  process  finds  fewer  important  problems  with 
each  iteration,  the  process  does  not  get  any  faster  with  each 
iteration — the  same  adequate  number  of  test  users  must  per¬ 
form  the  same  adequate  number  of  representative  tasks,  and 
their performance must be assessed.

The  cost  of  user  testing  is  especially  pronounced  in  expert- 
use  domains,  where  the  user  is  somebody  such  as  a  physician,  a 
petroleum  geologist,  or  an  engineer.  Such  users  are  few,  and 
their  time  is  valuable.  This  may  make  relying  on  user  testing  too 
costly  to  adequately  refine  an  interface.  A  related  problem  is  eval¬ 
uating  software  that  is  intended  to  serve  experienced  users  es¬ 
pecially  well.  Assessing  the  quality  of  the  interface  requires  a  very 
complete  prototype  that  can  be  used  in  a  realistic  way  for  an  ex¬ 
tended  period  of  time  so  that  the  test  users  can  become  experi¬ 
enced.  This  drives  up  the  cost  of  each  iteration,  because  the  new 
version  of  the  highly  functional  prototype  must  be  developed, 
and  the  lengthy  training  process  must  be  repeated.  Other  design 
goals  can  also  make  user  testing  problematic.  Consider  devel¬ 
oping  a  pair  of  products  for  which  skill  is  supposed  to  transfer 
from  one  to  the  other.  Assessing  such  transfer  requires  proto¬ 
typing  both  products  fully  enough  to  train  users  on  the  first,  and 
then  training  them  on  the  second,  to  see  if  the  savings  in  training 
time  are  adequate.  Any  design  change  in  either  of  the  products 
might  affect  the  transfer  and  thus  require  a  repeat  test  of  the  two 
systems.  This  double  dose  of  development  and  testing  effort  is 
probably impractical except in critical domains, where the additional
problem of testing with expert users will probably appear.

Theoretical  limitations  of  user  testing.  From  the  per¬ 
spective  of  scientific  psychology,  the  user  testing  approach 
takes  very  little  advantage  of  what  is  known  about  human  psy¬ 
chology,  and  thus  lacks  grounding  in  psychological  theory.  Al¬ 
though  scientific  psychology  has  been  underway  since  the  late 
1800s,  the  only  concepts  relied  on  by  user  testing  are  a  few  ba¬ 
sic  concepts  of  how  to  collect  behavioral  data.  Surely  more  is 
known about human psychology than this! The fact is that user-testing
methodology would work even if there were no systematic
scientific knowledge of human psychology at all; as long
as  the  designer’s  intuition  leads  in  a  reasonable  direction  on 
each  iteration,  it  suffices  merely  to  revise  and  retest  until  no 
more  problems  are  found.  While  this  is  undoubtedly  an  advan¬ 
tage,  it  does  suggest  that  user  testing  may  be  a  relatively  ineffi¬ 
cient  way  to  develop  a  good  interface. 

This  lack  of  grounding  in  psychological  principles  is  related 
to  the  most  profound  limitation  of  user  testing:  It  lacks  a  sys¬ 
tematic  and  explicit  representation  of  the  knowledge  developed 
during  the  design  experience;  such  a  representation  could  allow 
design  knowledge  to  be  accumulated,  documented,  and  sys¬ 
tematically  reused.  After  a  successful  user  testing  process,  there 
is  no  representation  of  how  the  design  “works”  psychologically 
to  ensure  usability;  there  is  only  the  final  design  itself,  as  de¬ 
scribed  in  specifications  or  in  the  implementation  code.  These 
descriptions  normally  have  no  theoretical  relationship  to  the 
user’s  task  or  the  psychological  characteristics  of  the  user.  Any 
change  to  the  design,  or  to  the  user’s  tasks,  might  produce  a 
new  and  different  usability  situation,  but  there  is  no  way  to  tell 
what  aspects  of  the  design  are  still  relevant  or  valid.  The  infor¬ 
mation  on  why  the  design  is  good,  or  how  it  works  for  users, 
resides  only  in  the  intuitions  of  the  designers.  While  designers 
often  have  outstanding  intuitions,  we  know  from  the  history  of 
creations  such  as  the  medieval  cathedrals  that  intuitive  design  is 
capable  of  producing  magnificent  results,  but  is  also  routinely 
guilty  of  costly  overengineering  or  disastrous  failures. 

The  model-based  approach.  The  goal  of  model-based 
evaluation  is  to  get  some  usability  results  before  implementing 
a  prototype  or  testing  with  human  subjects.  The  approach  uses 
a  model  of  the  human-computer  interaction  situation  to  repre¬ 
sent  the  interface  design  and  produce  predicted  measurements 
of  the  usability  of  the  interface.  Such  models  are  also  termed  en¬ 
gineering  models  or  analytic  models  for  usability.  The  model 
is  based  on  a  detailed  description  of  the  proposed  design  and  a 
detailed  task  analysis.  It  explains  how  the  users  will  accomplish 
the  tasks  by  interacting  with  the  proposed  interface,  and  uses 
psychological  theory  and  parametric  data  to  generate  the  pre¬ 
dicted  usability  metrics.  Once  the  model  is  built,  the  usability 
predictions  can  be  quickly  and  easily  obtained  by  calculation  or 
by  running  a  simulation.  Moreover,  the  implications  of  varia¬ 
tions  on  the  design  can  be  quickly  explored  by  making  the  cor¬ 
responding  changes  in  the  model.  Since  most  variations  are  rel¬ 
atively  small,  a  circuit  around  the  revise/evaluate  iterative  design 
loop  is  typically  quite  fast  once  the  initial  model-building  in¬ 
vestment  is  made.  Thus,  unlike  user  testing,  iterations  gener¬ 
ally  get  faster  and  easier  as  the  design  is  refined. 


In  addition,  the  model  itself  summarizes  the  design,  and  can 
be  inspected  for  insight  into  how  the  design  supports  (or  fails  to 
support)  the  user  in  performing  the  tasks.  Depending  on  the 
type of model, its components may be reusable not just in different
versions of the system under development, but in other
systems  as  well.  Such  a  reusable  model  component  captures  a 
stable  feature  of  human  performance,  task  structures,  or  inter¬ 
action  techniques;  characterizing  them  contributes  to  our  sci¬ 
entific  understanding  of  human-computer  interaction. 

The  basic  scheme  for  using  model-based  evaluation  in  the 
overall  design  process  is  that  iterative  design  is  done  first  using 
the  model,  and  then  by  user  testing.  In  this  way,  many  design 
decisions  can  be  worked  out  before  investing  in  prototype  con¬ 
struction  or  user  testing.  The  final  user  testing  process  is  re¬ 
quired  for  two  reasons:  First,  the  available  modeling  methods 
only  cover  certain  aspects  of  usability;  at  this  time,  they  are  lim¬ 
ited  to  predicting  the  sequence  of  actions,  the  time  required 
to  execute  the  task,  and  certain  aspects  of  the  time  required  to 
learn  how  to  use  the  system.  Thus,  user  testing  is  required  to 
cover  the  remaining  aspects.  Second,  since  the  modeling 
process  is  necessarily  imperfect,  user  testing  is  required  to  en¬ 
sure  that  some  critical  issue  has  not  been  overlooked.  If  the  user 
testing  reveals  major  problems  along  the  lines  of  a  fundamen¬ 
tal  error  in  the  basic  concept  of  the  interface,  it  will  be  necessary 
to  go  back  and  reconsider  the  entire  design;  again  model-based 
iterations  can  help  address  some  of  the  issues  quickly.  Thus,  the 
purpose  of  the  model-based  evaluation  is  to  perform  some  of 
the  design  iterations  in  a  lower  cost,  higher  speed  mode  before 
the  relatively  slow  and  expensive  user  testing. 

What  “interface  engineering”  should  be.  Model-based 
evaluation  is  not  the  dominant  approach  to  user  interface  de¬ 
velopment;  most  practitioners  and  academics  seem  to  favor 
some  combination  of  user  testing  and  inspection  methods. 
Some have tagged this majority approach as a form of “engineering”;
however, even a cursory comparison to established engineering
disciplines makes it clear that conventional approaches
to  user  interface  design  and  evaluation  have  little  resemblance  to 
an  engineering  discipline.  In  fact,  model-based  evaluation  is  a  de¬ 
liberate  attempt  to  develop  and  apply  true  engineering  meth¬ 
ods  for  user  interface  design.  The  following  somewhat  extended 
analogy  will  help  clarify  the  distinction,  as  well  as  explain  the 
need  for  further  research  in  modeling  techniques. 

If  civil  engineering  were  done  with  iterative  empirical  testing, 
bridges  would  be  built  by  erecting  a  bridge  according  to  an  in¬ 
tuitively  appealing  design,  and  then  driving  heavy  trucks  over  it 
to  see  if  it  cracks  or  collapses.  If  it  does,  it  would  be  rebuilt  in  a 
new  version  (e.g.,  with  thicker  columns)  and  the  trial  repeated; 
the  iterative  process  continues  with  additional  guesses  until  a 
satisfactory  result  is  obtained.  Over  time,  experienced  bridge 
builders  would  develop  an  intuitive  feel  for  good  designs  and 
how  strong  the  structural  members  need  to  be,  and  so  will  often 
guess  right.  Time  and  cost  pressures,  however,  will  probably 
lead  to  cutting  the  process  short  by  favoring  conservative  de¬ 
signs  that  are  likely  to  work,  even  though  they  might  be  unnec¬ 
essarily  clumsy  and  costly. 

Although  early  bridge  building  undoubtedly  proceeded  in 
this  fashion,  modern  civil  engineers  do  not  build  bridges  by  it¬ 
erative  testing  of  trial  structures.  Rather,  under  the  stimulus  of 
design failures (Petroski, 1985), they developed a body of scientific
theory on the behaviors of structures and forces, and a
body  of  principles  and  parametric  data  on  the  strengths  and  lim¬ 
itations  of  bridge-building  materials.  From  this  theory  and  data, 
they  can  quickly  construct  models  in  the  form  of  equations  or 
computer  simulations  that  allow  them  to  evaluate  the  quality 
of  a  proposed  design  without  having  to  physically  construct  a 
bridge.  Thus,  an  investment  in  theory  development  and  mea¬ 
surement  enables  engineers  to  replace  an  empirical  iterative 
process  with  a  theoretical  iterative  process  that  is  much  faster 
and  cheaper  per  iteration.  The  bridge  is  not  built  until  the  de¬ 
sign  has  been  tested  and  evaluated  based  on  the  models,  and 
the  new  bridge  almost  always  performs  correctly.  Of  course,  the 
modeling  process  is  fallible,  so  the  completed  bridge  is  tested 
before  it  is  opened  to  the  public,  and  occasionally  the  model  for 
a  new  design  is  found  to  be  seriously  inaccurate  and  a  spectac¬ 
ular  and  deadly  design  failure  is  the  result.  The  claim  is  not  that 
using  engineering  models  is  perfect  or  infallible,  only  that  it 
saves  time  and  money,  and  thus  allows  designs  to  be  more 
highly refined. In short, more design iterations result in better
designs,  and  better  designs  are  possible  if  some  of  the  iterations 
can  be  done  very  cheaply  using  models. 

Moreover,  the  theory  and  the  model  summarize  the  design 
and  explain  why  the  design  does  or  does  not  work  well.  The 
theoretical  analysis  identifies  the  weak  and  strong  points  of 
the  design,  giving  guidance  to  the  designer  where  intuition  can 
be  applied  to  improve  the  design;  a  new  analysis  can  then  test 
whether  the  design  has  actually  been  improved.  Engineering 
analysis  does  not  result  in  simply  static  repetition  of  proven 
ideas.  Rather,  it  enables  more  creativity  because  it  is  now  pos¬ 
sible  to  cheaply  and  quickly  determine  whether  a  new  concept 
will  work.  Thus,  novel  and  creative  concepts  for  bridge  struc¬ 
tures  have  steadily  appeared  once  the  engineering  models 
were  developed. 

Correspondingly,  model-based  evaluation  of  user  interfaces 
is simply a set of rigorous, science-based techniques for evaluating
user interfaces without user testing; it likewise relies on
a  body  of  theory  and  parametric  data  to  generate  predictions 
of  the  performance  of  an  engineered  artifact,  and  explain  why 
the  artifact  behaves  as  it  does.  While  true  interface  engineering 
is nowhere near as advanced as bridge engineering, useful techniques
have been available for some time, and should be more
widely  used.  As  model-based  evaluation  becomes  more  devel¬ 
oped,  it  will  become  possible  to  rely  on  true  engineering  meth¬ 
ods  to  handle  most  of  the  routine  problems  in  user  interface  de¬ 
sign,  with  considerable  savings  in  cost  and  time,  and  with 
reliably  higher  quality.  As  has  happened  in  other  branches  of 
engineering,  the  availability  of  powerful  analysis  tools  means 
that  the  designer’s  energy  and  creativity  can  be  unleashed  to  ex¬ 
plore  fundamentally  new  applications  and  design  concepts. 


THREE  CURRENT  APPROACHES 


Research  in  HCI  and  allied  fields  has  resulted  in  many  models  of 
human-computer  interaction  at  many  levels  of  analysis.  This 
chapter  restricts  attention  to  approaches  that  have  developed  to 
the point that they have some claim, either practical or scientific,
to being suitable for actual application in design problems.
This  section  identifies  three  current  approaches  to  modeling  hu¬ 
man  performance  that  are  the  most  relevant  to  model-based  eval¬ 
uation  for  system  and  interface  design:  (a)  task  network  models, 
(b)  cognitive  architecture  models,  and  (c)  GOMS  models. 

Task  Network  Models 

In  task  network  models,  task  performance  is  modeled  in  terms 
of  a  PERT-chart-like  network  of  processes.  Each  process  starts 
when  its  prerequisite  processes  have  been  completed,  and  has 
an  assumed  distribution  of  completion  times.  This  basic  model 
can  be  augmented  with  arbitrary  computations  to  determine 
the  completion  time,  as  well  as  what  its  symbolic  or  numeric 
inputs and outputs should be. Note that although the processes are usually
termed “tasks,” they need not be human-performed at all; they can be
machine processes instead. In addition, other information, such as
workload or resource parameters, can be attached to each process.
Performance predictions are obtained
by  running  a  Monte-Carlo  simulation  of  the  model  activity,  in 
which  the  triggering  input  events  are  generated  either  by  ran¬ 
dom  variables  or  by  task  scenarios.  A  variety  of  statistical  results, 
including  aggregations  of  workload  or  resource  usage,  can  be 
readily  produced.  The  classic  SAINT  (Chubb,  1981)  and  the 
commercial MicroSaint tool (Laughery, 1989) are prime examples.
These systems originated in applied human factors and
systems  engineering,  and  are  heavily  used  in  system  design,  es¬ 
pecially  for  military  systems. 
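
As a rough illustration of the task network idea, the following minimal Python sketch runs a Monte Carlo simulation over a small PERT-like network. It is not SAINT or MicroSaint; the process names, the completion-time distributions, and the functions simulate_once and monte_carlo are all hypothetical, chosen only to show how completion times propagate through prerequisite processes.

```python
import random
import statistics

# Hypothetical task network: each process lists its prerequisite processes
# and a sampler for its completion time in seconds (all values are assumed).
NETWORK = {
    "detect_alarm": {"prereqs": [], "time": lambda rng: rng.gauss(0.5, 0.1)},
    "read_display": {"prereqs": ["detect_alarm"],
                     "time": lambda rng: rng.uniform(0.8, 1.6)},
    # A "task" need not be human-performed; this one is a machine process.
    "machine_check": {"prereqs": ["detect_alarm"], "time": lambda rng: 2.0},
    "decide": {"prereqs": ["read_display", "machine_check"],
               "time": lambda rng: rng.expovariate(1 / 0.7)},
    "press_button": {"prereqs": ["decide"],
                     "time": lambda rng: rng.gauss(0.4, 0.05)},
}


def simulate_once(network, rng):
    """Return the total completion time for one simulated run."""
    finish = {}

    def finish_time(name):
        # A process starts only when all of its prerequisites have finished.
        if name not in finish:
            node = network[name]
            start = max((finish_time(p) for p in node["prereqs"]), default=0.0)
            finish[name] = start + max(0.0, node["time"](rng))
        return finish[name]

    return max(finish_time(name) for name in network)


def monte_carlo(network, runs=10_000, seed=1):
    """Repeat the simulation and summarize the total completion times."""
    rng = random.Random(seed)
    totals = [simulate_once(network, rng) for _ in range(runs)]
    return statistics.mean(totals), statistics.stdev(totals)
```

Calling monte_carlo(NETWORK) returns the mean and standard deviation of the simulated total time. Note that nothing in this representation constrains what a process may be or how long it takes, which is the generality discussed later in the chapter.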

Cognitive  Architecture  Models 

Cognitive  architecture  systems  are  surveyed  by  Byrne  (chapter  5, 
this volume). These systems consist of a set of hypothetical interacting
perceptual, cognitive, and motor components assumed
to  be  present  in  the  human,  and  whose  properties  are  based 
on  empirical  and  theoretical  results  from  scientific  research  in 
psychology  and  allied  fields.  The  functioning  of  the  components 
and  their  interactions  are  typically  simulated  with  a  computer 
program,  which,  in  effect,  produces  a  simulated  human  per¬ 
forming  in  a  simulated  task  environment  that  supplies  inputs 
(stimuli)  to  the  simulated  human,  and  reacts  to  the  outputs  (re¬ 
sponses)  produced  by  the  simulated  human.  Tasks  are  modeled 
primarily  by  programming  the  cognitive  component  according 
to  a  task  analysis,  and  then  performance  predictions  are  ob¬ 
tained  by  running  the  simulation  using  selected  scenarios  to 
generate  the  input  events  in  the  task.  Because  these  systems  are 
serious  attempts  to  represent  a  theory  of  human  psychological 
functions,  they  tend  to  be  rather  complex,  and  are  primarily 
used  in  basic  research  projects;  there  has  been  very  limited  ex¬ 
perience  in  using  them  in  actual  design  settings. 

GOMS  Models 

GOMS  models  are  the  original  approach  to  model-based  evalu¬ 
ation  in  the  computer  user  interface  field;  both  the  model-based 
evaluation  approach  and  GOMS  models  were  presented  as 




methods  for  user  interface  design  in  the  seminal  Card,  Moran, 
and  Newell  (1983)  presentation  of  the  psychology  of  human- 
computer  interaction.  They  based  the  GOMS  concept  on  the 
theory  of  human  problem  solving  and  skill  acquisition.  In  brief, 
GOMS  models  describe  the  knowledge  of  procedures  that  a 
user  must  have  in  order  to  operate  a  system.  The  acronym  and 
the  approach  can  be  summarized  as  follows:  The  user  can  ac¬ 
complish  certain  Goals  (G)  with  the  system;  Operators  (O)  are 
the  basic  actions  that  can  be  performed  on  the  system,  such  as 
striking  a  key  or  finding  an  icon  on  the  screen;  Methods  (M) 
are  sequences  of  Operators  that  when  executed,  accomplish  a 
Goal;  and  Selection  Rules  (S)  describe  which  Method  should 
be  used  in  which  situation  to  accomplish  a  Goal,  if  there  is  more 
than  one  available.  Constructing  a  GOMS  model  involves  writing 
out  the  methods  for  accomplishing  the  task  goals  of  interest, 
and  then  calculating  predicted  usability  metrics  from  the  method 
representation. 
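
The four GOMS components can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the class names, the example task, the operator times, and the predict_time function are hypothetical and are not taken from Card, Moran, and Newell (1983) or from any particular GOMS tool. A selection rule is represented here as a simple test attached to each method, and the first applicable method is chosen.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Operator:
    name: str
    seconds: float  # assumed execution time, for illustration only


@dataclass
class Goal:
    name: str


@dataclass
class Method:
    goal: str                           # the Goal this Method accomplishes
    steps: List[Union[Operator, Goal]]  # Operators and subgoals, in order
    applies: Callable[[dict], bool] = lambda context: True  # Selection Rule


def predict_time(goal, methods, context):
    """Sum operator times for the first applicable method of a goal."""
    candidates = [m for m in methods if m.goal == goal and m.applies(context)]
    if not candidates:
        raise ValueError(f"no applicable method for goal {goal!r}")
    total = 0.0
    for step in candidates[0].steps:
        if isinstance(step, Goal):
            total += predict_time(step.name, methods, context)  # subgoal
        else:
            total += step.seconds
    return total


# Hypothetical operators and methods for a file-deletion task.
POINT = Operator("point with mouse", 1.10)
CLICK = Operator("click mouse button", 0.20)
HOME = Operator("move hand to keyboard", 0.40)
KEY = Operator("press key", 0.28)

METHODS = [
    Method("delete file", [Goal("select file"), Goal("issue delete command")]),
    Method("select file", [POINT, CLICK]),
    Method("issue delete command", [HOME, KEY],
           applies=lambda c: c.get("knows_shortcut", False)),
    Method("issue delete command", [POINT, CLICK, POINT, CLICK]),
]
```

Calling predict_time("delete file", METHODS, {"knows_shortcut": True}) gives the predicted time when the keyboard shortcut method is selected; with an empty context, the menu method is selected instead.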

There  are  different  forms  of  GOMS  models,  systematized  by 
John  and  Kieras  (1996a,  1996b),  that  represent  the  methods  at 
different  levels  of  detail,  and  whose  calculations  can  range  in 
complexity  from  simple  hand  calculations  to  full-fledged  simu¬ 
lations.  John  and  Kieras  pointed  out  that  the  different  forms  can 
be  viewed  as  being  based  on  different  simplified  cognitive  ar¬ 
chitectures  that  make  the  models  easy  to  apply  to  typical  inter¬ 
face  design  problems  and  insulate  the  model-builder  from  many 
difficult  theoretical  issues.  More  so  than  any  other  model-based 
approach,  GOMS  models  have  a  long  and  well-established  track 
record  of  success  in  user  interface  design,  although  they  are  not 
used  as  widely  as  their  simplicity  and  record  would  justify.  Al¬ 
though  still  under  development  by  researchers,  GOMS  models 
are  emphasized  in  this  chapter  because  in  some  forms  they  are 
a  “ready  to  use”  modeling  methodology.  A  later  section  will  de¬ 
scribe  their  rationale  more  completely,  but  the  reader  is  referred 
to  John  and  Kieras  (1996a,  1996b)  for  a  thorough  discussion. 


THEORETICAL BASIS FOR CHOOSING A
MODEL-BASED EVALUATION TECHNIQUE


This section presents several key issues concerning the theoretical
foundations of model-based evaluation, specifically the
basic sources of information and the applicability of the modeling
approach.  When  choosing  or  evaluating  a  technique  for  model- 
based  evaluation,  the  potential  user  should  consider  these  is¬ 
sues;  the  techniques  differ  widely  in  how  well  they  handle  cer¬ 
tain  fundamental  questions.  The  next  section  will  focus  on  the 
practical  problems  of  applying  a  modeling  technique  once  it  has 
been  chosen.  In  both  sections,  the  three  basic  approaches  to 
model-based  evaluation  are  commented  on  as  appropriate.  Ad¬ 
vice  is  given  to  both  the  user  of  model-based  evaluation  and 
the  developer  of  model-based  techniques. 

Psychological  Constraints  are  Essential 

The  concept  of  model-based  evaluation  in  system  design  has  a 
long  history  and  many  different  proposed  methods  (for  early 
surveys,  see  Pew,  Baron,  Feehrer,  &  Miller,  1977;  MacMillan, 


Beevis,  Salas,  Strub,  Sutton,  &  Van  Breda,  1989;  Elkind,  Card, 
Hochberg,  &  Huey,  1989);  however,  the  necessary  scientific  ba¬ 
sis  for  genuinely  powerful  models  has  been  slow  to  develop. 
The  key  requirement  for  model-based  evaluation  is  that  building 
a  model  to  evaluate  a  design  must  be  a  routine,  production,  or 
engineering  activity,  and  not  a  piece  of  basic  scientific  research 
on  how  human  psychological  factors  are  involved  in  a  particu¬ 
lar  computer  usage  situation.  This  means  that  the  relevant  psy¬ 
chological  science  must  not  only  be  developed  first,  but  also 
must  then  be  systematized  and  encapsulated  in  the  modeling 
methodology itself. A modeling methodology must provide
constraints  on  the  content  and  form  of  the  model,  and  these 
constraints  must  provide  the  psychological  validity  of  the  model 
as  a  predictor  of  human  performance.  In  other  words,  if  the 
model builder can do essentially anything in the modeling system,
then the only way the resulting model can be psychologically
valid is if the model builder does all of the work to construct
a valid psychological theory of human cognition and
behavior  in  the  task,  and  then  ensures  that  the  constructed 
model  accurately  reflects  this  theory. 

Of  course  it  takes  tremendous  time,  effort,  and  training  to 
construct  original  psychological  theory,  far  more  than  should  be 
necessary  for  most  interface  design  situations.  Although  the  de¬ 
cisions  in  truly  novel  or  critical  design  situations  might  require 
some  fundamental  psychological  research,  most  interface  de¬ 
sign  situations  are  rather  routine.  The  problem  is  to  match  a 
computer  system  to  the  user’s  tasks  using  known  interface  de¬ 
sign  concepts  and  techniques.  It  should  not  be  necessary  to  be 
an  expert  researcher  in  human  cognition  and  performance  to 
carry  this  out. 

Thus,  the  key  role  of  a  modeling  system  is  to  provide  con¬ 
straints  based  on  the  psychological  science,  so  that  a  model  con¬ 
structed  within  the  system  has  a  useful  degree  of  predictive  va¬ 
lidity.  In  essence,  simply  by  using  the  modeling  system  according 
to  its  rules,  the  designer  must  be  able  to  construct  a  scientifi¬ 
cally  plausible  and  usefully  accurate  model  “automatically.” 

A  simple  series  of  examples  will  help  make  the  point:  Com¬ 
puter  user  interfaces  involve  typing  of  arbitrary  strings  of  text  on 
the  keyboard  and  pointing  with  a  mouse.  The  time  required  to 
type  on  the  keyboard  and  to  point  with  a  mouse  are  fairly  well 
documented.  If  task  execution  times  are  of  interest,  an  accept¬ 
able  modeling  system  should  include  these  human  perfor¬ 
mance  parameters  so  that  the  interface  designer  does  not  have 
to  collect  them  or  guess  them. 

Furthermore,  because  both  hands  are  involved  in  typing 
strings  of  text,  users  cannot  type  at  the  same  time  as  they  move 
the  mouse  cursor;  these  operations  must  be  performed  se¬ 
quentially,  taking  considerably  more  time  than  if  they  could  be 
done  simultaneously.  A  modeling  system  should  make  it  im¬ 
possible  to  construct  a  model  of  an  interface  that  overzealously 
optimizes  execution  speed  by  assuming  that  the  user  could 
type  strings  and  point  simultaneously;  the  sequential  constraint 
should  be  enforced  automatically.  A  high-quality  modeling  sys¬ 
tem  would  not  only  enforce  this  constraint,  but  also  automati¬ 
cally  include  the  time  costs  of  switching  between  typing  and 
pointing,  such  as  the  time  to  move  the  hand  between  the 
mouse  and  the  keyboard.  There  are  many  such  constraints  on 
human  performance — some  of  them  quite  obvious,  as  in  these 
examples,  and  some  very  subtle.  A  good  modeling  system  will 
represent these constraints in such a way that they are automatically
taken into account in how the model can be constructed
and  used.  Because  of  the  subtleties  involved,  computational 
tools  are  especially  valuable  for  constructing  and  using  models 
because  they  can  help  enforce  the  psychological  constraints  and 
make  it  easier  for  the  model-builder  to  work  within  them. 
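
The typing and pointing example above can be made concrete with a small sketch. The Python function below is a hypothetical illustration, not the interface of any existing modeling tool: it always sums operations serially, so a model with simultaneous typing and pointing cannot be expressed, and it inserts the hand-movement cost automatically whenever the device changes. The parameter values are assumptions, taken from the commonly cited Keystroke-Level Model estimates (about 0.28 s per keystroke for an average typist, 1.10 s to point, and 0.40 s to move the hand between devices).

```python
# Assumed human performance parameters, in seconds (commonly cited
# Keystroke-Level Model estimates; adjust for the user population).
KEYSTROKE = 0.28  # press one key
POINTING = 1.10   # point with the mouse at a target
HOMING = 0.40     # move the hand between keyboard and mouse


def execution_time(actions, hand_on="keyboard"):
    """Predict serial execution time for a list of (kind, argument) actions.

    kind is "type" (argument: the string typed) or "point" (argument: None).
    Actions are strictly sequential, and a homing cost is added whenever
    the required device differs from where the hand currently is.
    """
    total = 0.0
    for kind, argument in actions:
        if kind == "type":
            device, cost = "keyboard", KEYSTROKE * len(argument)
        elif kind == "point":
            device, cost = "mouse", POINTING
        else:
            raise ValueError(f"unknown action kind: {kind!r}")
        if device != hand_on:
            total += HOMING  # switching cost included automatically
            hand_on = device
        total += cost
    return total


# Two hypothetical designs for the same task: one interleaves typing and
# pointing, the other groups all typing before all pointing.
interleaved = [("type", "report"), ("point", None),
               ("type", "final"), ("point", None)]
grouped = [("type", "report"), ("type", "final"),
           ("point", None), ("point", None)]
```

Comparing execution_time(interleaved) with execution_time(grouped) shows the extra homing movements incurred by the interleaved design, a cost the designer never has to enter by hand.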


A  Brief  History  of  Constraints  in  Modern 
Psychological  Theory 

Theoretical  constraints  are  not  easy  to  represent  or  incorporate; 
a  coherent  and  rigorous  theoretical  foundation  is  required  to 
serve  as  the  substrate  for  the  network  of  constraints,  and  suit¬ 
able  foundations  were  not  constructed  until  fairly  recently. 
Through most of the second half of the 20th century, psychological
theory was mired in a rather crude form of information-processing
theory in which human activity was divided into information-processing
stages such as perception, memory, decision-making,
and  action.  These  stages  were  usually  depicted  in  a  flowchart 
with  a  box  for  each  stage,  with  various  connections  between  the 
boxes,  and  perhaps  with  some  fairly  simple  equations  that  de¬ 
scribed  the  time  required  for  each  stage  or  the  accuracy  of  its 
processing.  There  was  little  constraint,  however,  on  the  possible 
data  contained  in  each  box,  or  the  operations  performed  there. 
A  box  could  be  of  arbitrary  complexity,  and  no  actual  explicit 
mechanism  had  to  be  provided  for  any  of  them.  Such  models 
were  little  more  than  a  “visual  aid”  for  theories  posed  in  the 
dominant  forms  of  informal  verbal  statements  or  rather  abstract 
mathematical  equations.  Later,  many  researchers  began  to  con¬ 
struct  computer  simulations  of  these  “box  models,”  which  pro¬ 
vided  more  flexibility  than  traditional  mathematical  models  and 
also  contributed  more  explicitness  and  rigor  than  traditional 
verbal  models.  But  still,  the  operations  performed  in  each  box 
were  generally  unstructured  and  arbitrary. 

An  early  effort  at  model-based  evaluation  in  this  theoretical 
mode  appears  in  the  famous  Human  Operator  Simulator  (HOS) 
system  (see  Wherry,  1976;  Pew  et  al.,  1977;  Strieb  &  Wherry, 
1979;  Lane,  Strieb,  Glenn,  &  Wherry,  1981;  Glenn,  Zaklad,  & 
Wherry, 1982; Harris, Iavecchia, & Bittner, 1988; and Harris,
Iavecchia, & Dick, 1989). HOS contained a set of micromodels
for  low-level  perceptual,  cognitive,  and  motor  activities,  in¬ 
voked  by  task-specific  programs  written  in  a  special-purpose 
procedural  programming  language  called  HOPROC  (Human 
Operator  Procedures  language).  The  micromodels  included 
such things as Hick’s Law and Fitts’ Law, formulas for visual recognition
time, a model of short-term memory retention, and formulas
for calculating the time required for various motor actions
such as pushing buttons and walking. The effort was
ambitious  and  the  results  impressive,  but  in  a  real  sense,  HOS 
was  ahead  of  its  time.  The  problem  was  that  psychological  the¬ 
ory  was  not  well  enough  developed  at  the  time  to  provide  a 
sound  foundation  for  such  a  tool;  the  developers  were  basically 
trying  to  invent  a  cognitive  architecture  good  enough  for  prac¬ 
tical  application  before  the  concept  had  been  developed  in  the 
scientific community. Interestingly, the spirit of the HOPROC
language  lives  on  in  the  independently  developed  notations  for 
some  forms  of  GOMS  models.  In  addition,  the  scientific  base 
for  the  micromodels  was  in  fact  very  sparse  at  the  time,  and 


many  of  them  are  currently  out  of  date  both  empirically  and 
theoretically.  HOS  appears  to  have  been  subsumed  into  some 
commercial  modeling  systems;  for  example,  a  task  network  ver¬ 
sion  of  HOS  is  available  from  Micro  Analysis  and  Design,  Inc. 
(http://www.maad.com/),  and  its  micromodels  are  used  in  their 
Integrated Performance Modeling Environment (IPME), as well
as in COGNET/IGEN, produced by CHI Systems
(http://www.chiinc.com/).
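
To give a sense of what a micromodel of this kind looks like, the sketch below writes Fitts' Law and Hick's Law as Python functions. These are the common textbook formulations (the logarithmic form with the added 1), not the actual HOS micromodels, and the coefficient defaults are placeholder assumptions that would have to be estimated from data for a given device and user population.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.1):
    """Fitts' Law: time (s) to move to a target of a given width at a
    given distance. Coefficients a and b are assumed placeholders."""
    return a + b * math.log2(distance / width + 1)


def hick_choice_time(n_alternatives, a=0.2, b=0.15):
    """Hick's Law: time (s) to choose among equally likely alternatives.
    Coefficients a and b are assumed placeholders."""
    return a + b * math.log2(n_alternatives + 1)
```

For example, fitts_movement_time(70, 10) evaluates the pointing time for a target 10 units wide at a distance of 70 units; a task-specific procedure would invoke such functions to obtain the time for each low-level action.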

The  task  network  models  also  originated  in  this  box-model 
mode of psychological theory, and show it in their lack of psychological
constraints; their very generality means they contribute
little  built-in  psychological  validity.  Even  if  the  HOS  micromod¬ 
els  are  used,  the  flexibility  of  the  modeling  system  means  that 
model  builders  themselves  must  identify  the  psychological 
processes  and  constraints  involved  in  the  task  being  modeled 
and  program  them  into  the  model  explicitly. 

Led by Anderson (1983) and Newell (1990), researchers in
human  cognition  and  performance  began  to  construct  models 
using  a  cognitive  architecture  (see  Byrne,  chapter  5,  this  vol¬ 
ume).  Cognitive  architecture  parallels  the  concept  of  computer 
architecture:  a  cognitive  architecture  specifies  a  set  of  fixed 
mechanisms,  the  “hardware,”  that  comprise  the  human  mind. 
To  construct  a  model  for  a  specific  task,  the  researcher  “pro¬ 
grams”  the  architecture  by  specifying  a  psychological  strategy 
for  doing  the  task,  the  “software.”  (Parameter  value  settings  and 
other  auxiliary  information  might  be  involved  as  well.)  The  ar¬ 
chitecture  provides  the  coherent  theoretical  framework  within 
which  processes  and  constraints  can  be  proposed  and  given  an 
explicit  and  rigorous  definition.  Several  proposed  cognitive  ar¬ 
chitectures  exist  in  the  form  of  computer  simulation  packages 
in  which  programming  the  architecture  is  done  in  the  form  of 
production  systems,  collections  of  modular  if-then  rules,  that 
have  proved  to  be  an  especially  good  theoretical  model  of  hu¬ 
man  procedural  knowledge.  Developing  these  architectures, 
and  demonstrating  their  utility,  is  a  continuing  research  activity 
(see  Byrne,  chapter  5,  this  volume).  Not  surprisingly,  they  all 
have  a  long  way  to  go  before  they  accurately  incorporate  even 
a  subset  of  the  human  abilities  and  limitations  that  appear  in 
an  HCI  design  context. 
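
The division between fixed “hardware” and task-specific “software” can be sketched with a toy production system. The Python fragment below is a generic illustration only; it does not reproduce the rule syntax or the matching and conflict-resolution mechanisms of any actual cognitive architecture, and the rule names and memory items are hypothetical. The run function plays the role of the fixed interpreter, while RULES is the task-specific programming.

```python
def run(rules, memory, max_cycles=100):
    """Fixed interpreter: on each cycle, fire the first rule whose
    condition items are all present in working memory."""
    memory = set(memory)
    trace = []
    for _ in range(max_cycles):
        for name, condition, additions, deletions in rules:
            if condition <= memory:  # every condition item is in memory
                memory = (memory - deletions) | additions
                trace.append(name)
                break
        else:
            break  # no rule matched, so processing halts
    return memory, trace


# Task-specific "software": if-then rules as
# (name, condition, items to add, items to delete).
RULES = [
    ("select-file",
     {"goal:delete-file", "file:unselected"},
     {"action:click-file", "file:selected"},
     {"file:unselected"}),
    ("press-delete",
     {"goal:delete-file", "file:selected"},
     {"action:press-delete-key", "goal:done"},
     {"goal:delete-file"}),
]
```

Running run(RULES, {"goal:delete-file", "file:unselected"}) fires the two rules in sequence and then halts. Modeling a different task means changing only RULES; the interpreter stays the same.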

The  psychological  validity  of  a  model  constructed  with  a  cog¬ 
nitive  architecture  depends  on  the  validity  of  both  the  architec¬ 
ture  and  the  task-specific  programming,  so  it  can  be  difficult  to 
assign  credit  or  blame  for  success  or  failure  in  modeling  an  in¬ 
dividual  task.  However,  the  fixed  architecture  and  its  associated 
parameters are supposed to be based on fundamental psychological
mechanisms that are required to be invariant across all
tasks,  while  the  task-specific  programming  is  free  to  vary  with  a 
particular  modeled  task.  To  the  extent  that  the  architecture  is 
correct,  one  should  be  able  to  model  any  task  simply  by  pro¬ 
gramming  the  architecture  using  only  task-analytic  information 
and  supplying  a  few  task-specific  parameters.  The  value  of  such 
architectures  lies  in  this  clear  division  between  universal  and 
task-specific  features  of  human  cognition;  the  model  builder 
should  be  free  to  focus  solely  on  the  specific  task  and  system 
under  design,  and  let  the  architecture  handle  the  psychology. 

Achieving  this  goal  in  psychological  research  is  a  daunting 
challenge. What about the practical sphere? In fact, the role of
architectural constraints in some of the extant commercial modeling
systems is problematic. The task network models basically
have such an abstract representation that there is no straightforward
way for architectural assumptions to constrain the modeling
system. Once one has opted for representing human activity
as a set of arbitrary interconnected task processes, there
is  no  easy  way  to  somehow  impose  more  constrained  structure 
and  mechanism  on  the  system.  Attempting  to  do  so  simply  cre¬ 
ates  more  complexity  in  the  modeling  problem:  The  modeler 
must  figure  out  how  to  underuse  the  overgeneral  capabilities 
of  the  system  in  just  the  right  way. 

Another  commercial  system,  COGNET/IGEN  (see  Zachary, 
Santarelli,  Ryder,  &  Stokes,  2000,  for  a  recent  and  relatively  com¬ 
plete  description),  is  in  the  form  of  a  cognitive  architecture,  but 
is  a  very  complex  one  that  incorporates  a  multitude  of  ideas 
about  human  cognition  and  performance,  so  many  that  it  appears 
to  be  rather  hard  to  understand  how  it  works.  The  essence  of  a 
cognitive  architecture,  however,  is  the  insistence  on  a  small 
number  of  fundamental  mechanisms  that  provide  a  compre¬ 
hensive  and  coherent  system.  For  example,  several  of  the  sci¬ 
entifically  successful  cognitive  architectures  require  that  all  cog¬ 
nitive  processing  must  be  expressed  in  the  form  of  production 
rules  that  can  include  only  certain  things  in  their  conditions  and 
actions.  These  rules  control  all  of  the  other  components  in  the 
architecture,  which  in  turn  have  strictly  defined  and  highly  lim¬ 
ited  capabilities.  These  highly  constrained  systems  have  been 
successful  in  a  wide  range  of  modeling  problems,  so  it  is  difficult 
to  see  why  a  very  complex  architecture  is  a  better  starting  point. 
Again,  to  be  useful  in  both  scientific  and  practical  prediction,  the 
possible  models  must  be  constrained;  too  many  possibilities  are 
not  helpful,  but  harmful. 

From  the  point  of  view  of  cognitive  architectures  and  the 
constraints  supplied  by  the  architecture,  the  modeling  ap¬ 
proaches  described  in  this  chapter,  as  currently  implemented, 
span  the  range  from  little  or  no  architectural  content  or  con¬ 
straints  (the  task  network  systems)  to  considerable  architectural 
complexity  and  constraints  (the  cognitive  architecture  systems).
GOMS  models  occupy  an  intermediate  position:  they  assume  a 
simplified,  but  definitely  constraining,  cognitive  architecture 
that  allows  them  to  be  applied  easily  by  interface  designers  and 
still  produce  usefully  accurate  results.  At  the  same  time,  how¬ 
ever,  they  are  less  flexible  than  the  modeling  systems  at  the 
other  extremes. 


Modeling  Cognitive  Versus  Perceptual-Motor 
Aspects  of  a  Design 

As  pointed  out  by  Byrne  (chapter  5,  this  volume),  cognitive  ar¬ 
chitectures  have  lately  begun  to  incorporate  not  just  proposed 
cognitive  mechanisms,  but  also  proposals  for  perceptual  and 
motor  mechanisms  that  act  as  additional  sources  of  constraint 
on  performance.  Calling  these  “cognitive”  architectures  is  some¬ 
thing  of  a  misnomer,  since  perceptual  and  motor  mechanisms 
are  normally  distinguished  from  cognitive  ones.  However,  including
perceptual  and  motor  constraints  is  actually  a  critical  requirement
for  modeling  user  interfaces;  this  follows  from  the
traditional  characterization  of  human-computer  interaction  in 
terms  of  the  interactive  cycle  (Norman,  1986).  The  user  sees 
something  on  the  screen  if  the  user  is  looking  in  the  right  place 
and  can  sense  and  recognize  it,  which  involves  the  perceptual
system  and  associated  motor  processes  such  as  eye  movements.
The  user  decides  what  to  do,  an  exclusively  cognitive  activity, 
and  then  carries  out  the  decision  by  performing  motor  actions 
that  are  determined  by  the  physical  interaction  devices  that  are 
present,  and  may  also  involve  the  perceptual  system,  such  as 
visual  guidance  for  mouse  pointing. 

Occasionally,  the  cognitive  processes  of  deciding  what  to  do 
next  can  dominate  the  perceptual  and  motor  activities.  For  ex¬ 
ample,  one  mouse  click  might  bring  up  a  screen  containing  a 
single  number,  such  as  a  stock  price,  and  the  user  might  think 
about  it  for  many  minutes  before  simply  clicking  on  a  “buy”  or 
“sell”  button.  Much  of  the  time,  however,  users  engage  in  a 
stream  of  routine  activities  that  require  only  relatively  simple 
cognitive  processing,  and  so  the  perceptual  and  motor  actions 
take  up  most  of  the  time  and  determine  most  of  the  task  struc¬ 
ture.  Two  implications  follow  from  this  thumbnail  analysis: 

1.  Modeling  purely  cognitive  tasks  is  generally  impractical. 
Trying  to  model  purely  cognitive  tasks  such  as  human  prob¬ 
lem-solving,  reasoning,  or  decision-making  processes  is  ex¬ 
tremely  difficult  because  they  are  so  open-ended  and  un¬ 
constrained  (see  also  Landauer,  1995).  For  example,  there 
are  a  myriad  possible  ways  in  which  people  could  decide  to 
buy  or  sell  a  stock,  and  the  nature  of  the  task  does  not  set 
any  substantial  or  observable  constraints  on  how  people 
might  make  such  decisions.  Stock  decisions  are  based  on 
everything  from  gut  feel,  to  transient  financial  situations,  to 
detailed  long-term  analysis  of  market  trends  and  individual 
corporate  strategies.  Trying  to  identify  the  strategy  that  a 
user  population  will  follow  in  such  tasks  is  not  a  routine  in¬ 
terface  design  problem,  but  a  scientific  research  problem, 
or  at  least  a  very  difficult  task  analysis  problem.  Fortunately, 
a  routine  task  analysis  may  produce  enough  information  to 
allow  the  designer  to finesse  the  problem — that  is,  sidestep  it 
or  avoid  having  to  confront  it.  For  example,  if  one  could  de¬ 
termine  what  information  the  stock  trader  needs  to  make 
the  decisions,  and  then  make  that  information  available  in  an 
effective  and  usable  manner,  the  result  will  be  a  highly  use¬ 
ful  and  usable  system  without  having  to  understand  exactly 
how  users  make  their  decisions. 

2.  Modeling  perceptual-motor  activities  is  critical.  A  good 
modeling  approach  at  a  minimum  must  explicitly  represent 
the  perceptual  and  motor  operations  involved  in  a  task.  For 
most  systems,  the  perceptual  and  motor  activities  involved  in 
interacting  with  a  computer  (a)  take  relatively  well-defined 
amounts  of  time,  (b)  are  heavily  determined  by  the  system 
design,  and  (c)  frequently  dominate  the  user’s  activity;  leav¬ 
ing  them  out  of  the  picture  means  that  the  resulting  model  is 
likely  to  be  seriously  inaccurate.  For  example,  if  two  interface 
designs  differ  in  how  many  visual  searches  or  mouse  points 
they  logically  require  to  complete  a  task,  the  one  requiring 
fewer  is  almost  certainly  going  to  be  faster  to  execute,  and 
will  probably  have  a  simpler  task  structure  as  well,  meaning 
it  will  probably  be  easier  to  learn  and  less  error-prone.  This 
means  that  any  modeling  approach  that  represents  the  ba¬ 
sic  timing  and  the  structure  of  perceptual  and  motor  activ¬ 
ity  entailed  by  an  interface  is  likely  to  provide  a  good  ap¬ 
proximation  to  the  basic  usability  characteristics  of  the 
interface. 

This  conclusion  is  both  good  news  and  bad  news.  The  good
news  is  that  since  perceptual-motor  activities  are  relatively  easy 
to  model,  it  can  be  easy  to  get  fairly  reliable  and  robust  model- 
based  evaluation  information  in  many  cases.  One  reason  why 
GOMS  models  work  so  well  is  that  they  allow  the  modeler  to 
easily  represent  perceptual-motor  activity  fairly  completely,  with 
a  minimum  of  complications  to  represent  the  cognitive  activity. 
The  bad  news  is  that  different  modeling  approaches  that  in¬ 
clude  perceptual-motor  operations  are  likely  to  produce  simi¬ 
lar  results  in  many  situations,  making  it  difficult  to  tell  which 
approach  is  the  most  accurate. 

This  does  not  mean  continuing  the  effort  to  develop  mod¬ 
eling  systems  is  futile;  rather,  the  point  is  that  trying  to  verify  or 
compare  models  in  complex  tasks  is  quite  difficult  due  to  prac¬ 
tical  difficulties  in  both  applied  and  basic  research.  Despite  the 
considerable  effort  and  expense  to  collect  it,  data  on  actual  real- 
world  task  performance  is  often  lacking  in  detail,  coverage,  and 
adequate  sample  sizes.  Even  in  the  laboratory,  collecting  highly 
precise,  detailed,  and  complete  data  about  task  performance  is 
quite  difficult,  and  researchers  are  typically  trapped  into  using 
tasks  that  are  artificial,  performed  by  nonexpert  subjects,  or  triv¬ 
ial  relative  to  actual  tasks.  There  is  no  easy  or  affordable  resolu¬ 
tion  to  this  dilemma,  so  the  practitioner  who  seeks  to  use  mod¬ 
els  must  be  cautious  about  claims  made  by  rival  modeling 
approaches,  and  look  first  at  how  they  handle  perceptual-motor 
activities.  The  theorist  seeking  to  improve  modeling  approaches 
must  be  constantly  iterating  and  integrating  over  both  labora¬ 
tory  and  actual  applications  of  modeling  methods. 

The  Science  Base  Must  be  Visible 

Even  though  the  modeling  methodology  encapsulates  the  con¬ 
straints  provided  by  psychological  theory,  it  is  critical  that  the 
psychological  assumptions  be  accessible,  justified,  and  intelligi¬ 
ble.  An  architecture  is  the  best  way  to  do  this,  because  the  psy¬ 
chological  assumptions  are  either  hardwired  into  the  modeling 
system  architecture,  or  are  explicitly  stated  in  the  task-specific 
programming  supplied  by  the  modeler.  The  basis  for  the  task- 
specific  programming  is  the  task  analysis  obtained  during  the 
overall  design  process,  and  the  basis  for  the  architecture  is  a 
documented  synthesis  of  the  scientific  literature. 

The  importance  of  the  documented  synthesis  of  the  scien¬ 
tific  literature  cannot  be  overstated.  The  science  of  human  cog¬ 
nition  and  performance  that  is  relevant  to  system  design  is  not 
at  all  “finished;”  important  new  results  are  constantly  appearing, 
and  many  long-documented  phenomena  are  incompletely  un¬ 
derstood.  Thus,  any  modeling  system  will  have  to  be  updated 
repeatedly  as  these  theoretical  and  empirical  issues  are  thrashed 
out,  and  it  will  have  to  be  kept  clear  which  results  it  incorporates 
and  which  it  does  not. 

The  commercial  modeling  tools  have  seriously  lagged  behind 
the  scientific  literature.  While  some  conservatism  would  be  de¬ 
sirable  to  damp  out  some  of  the  volatility  in  scientific  work,  the 
problem  is  not  just  conservatism,  but  rather  obsolescence,  as  in 
the  case  of  the  micromodels  inherited  from  HOS.  Perhaps  these 
systems  would  still  be  adequate  for  practical  work  but,  unfortunately,
it  is  very  difficult  to  get  a  scientific  perspective  on  their  adequacy
because  they  have  been  neither  described  nor  tested  in
forums  or  under  ground  rules  similar  to  those  used  for  mainstream
scientific  work  in  human  cognition  and  performance.
Thus,  they  have  not  been  subject  to  the  full  presentation,  strict 
review,  criticism,  and  evolution  that  are  characteristic  of  the  cog¬ 
nitive  architecture  and  GOMS  model  work.  The  practitioner 
should  therefore  greet  the  claims  of  commercial  modeling  systems
with  healthy  skepticism,  and  developers  of  modeling  systems
should  participate  more  completely  in  the  open  scientific
process. 

The  Value  of  Generativity 

It  is  useful  if  a  modeling  method  is  generative,  meaning  that  a 
single  model  can  generate  predicted  human  behavior  for  a 
whole  class  of  scenarios,  where  a  scenario  is  defined  solely  in 
terms  of  the  sequence  of  input  events  or  the  specifications  for 
a  task  situation,  neither  of  which  specifies  the  behavior  the  user 
is  expected  to  produce.  Many  familiar  modeling  methods,  including
the  Keystroke-Level  type  of  GOMS  model,  are  nongenerative,
in  that  they  start  with  a  specific  scenario  in  which
the  model  builder  has  specified,  usually  manually,  what  the 
user’s  actions  are  supposed  to  be  for  the  specified  inputs.  A 
nongenerative  model  predicts  metrics  defined  only  over  this 
particular  input-output  sequence.  To  see  what  the  results  would 
be  for  a  different  scenario,  a  whole  new  model  must  be  constructed
(though  parts  might  be  duplicated).  Since  nongenerative
modeling  methods  are  typically  labor  intensive,  involving  a
manual  assignment  of  user  actions  to  each  input-output  event, 
they  tend  to  sharply  limit  how  many  scenarios  are  considered, 
which  can  be  very  risky  in  complex  or  critical  design  problems. 
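
A nongenerative prediction of the kind described above can be sketched in a few lines. The following is a minimal illustration in the style of the Keystroke-Level Model, not a definitive implementation: the operator durations are commonly cited textbook approximations used here as assumptions, and the two action sequences are hypothetical scenarios written out by hand.

```python
# Keystroke-Level Model sketch: predicted execution time for ONE fixed scenario.
# Operator durations (seconds) are commonly cited approximations, used here
# as illustrative assumptions rather than calibrated values.
OPERATOR_SECONDS = {
    "K": 0.28,  # press a key or button (average skilled typist)
    "P": 1.10,  # point with the mouse to a target
    "H": 0.40,  # move a hand between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_time(operators):
    """Sum operator durations for a manually specified action sequence."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical scenario: the same command issued through a menu or a shortcut.
menu_design = ["H", "M", "P", "K", "M", "P", "K"]
shortcut_design = ["M", "K", "K"]

menu_seconds = klm_time(menu_design)
shortcut_seconds = klm_time(shortcut_design)
print(f"menu: {menu_seconds:.2f} s, shortcut: {shortcut_seconds:.2f} s")
```

Note that the modeler, not the model, decided which operators appear in each list; a different scenario requires writing out a new list by hand, which is exactly the labor that limits how many scenarios get considered.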

An  example  of  a  sophisticated  nongenerative  modeling
method  is  the  CPM-GOMS  approach  developed  by  Gray,  John,  and
Atwood  (1993)  to  model  telephone  operator  tasks.  Their  models
decomposed  each  task  scenario  into  a  set  of  operations  performed
by  perceptual,  cognitive,  and  motor  processors  like
those  proposed  in  the  Card,  Moran,  and  Newell  (1983)  Model 
Human  Processor.  The  sequential  dependencies  and  time  du¬ 
rations  of  these  operations  were  represented  with  a  PERT  chart, 
which  then  specified  the  total  task  time,  and  whose  critical  path 
revealed  the  processing  bottlenecks  in  task  performance.  Such 
models  are  nongenerative  in  that  a  different  scenario  with  a  dif¬ 
ferent  pattern  of  events  requires  a  different  PERT  chart  to  rep¬ 
resent  the  different  set  of  process  dependencies.  Since  there  is 
a  chart  for  each  scenario,  predicting  the  time  for  a  new  scenario, 
or  different  interface  design,  requires  creating  a  new  chart  to 
fit  the  new  sequence  of  events.  However,  a  new  chart  can  often 
be  assembled  from  templates  or  portions  of  previous  charts, 
saving  considerable  effort  (see  John  &  Kieras,  1996a,  1996b  for 
more  detail).  In  addition,  computational  tools  for  modeling 
based  on  directly  representing  the  sequential  constraints  im¬ 
plied  by  a  cognitive  architecture  are  under  development  and 
may  substantially  simplify  the  construction  of  such  models  (e.g., 
John,  Vera,  Matessa,  Freed,  &  Remington,  2002;  Vera,  Howes, 
McCurdy,  &  Lewis,  2004). 
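
The PERT-chart computation described above can be sketched as a critical-path calculation over a dependency graph. This is a minimal sketch under stated assumptions: the operation names, durations, and dependencies are invented for illustration and are not taken from the telephone operator study.

```python
from functools import lru_cache

# Hypothetical operations for one scenario: name -> (duration_ms, prerequisites).
# Durations and dependencies are invented for illustration.
OPERATIONS = {
    "perceive_prompt": (100, []),
    "decide_response": (50, ["perceive_prompt"]),
    "listen_customer": (400, []),
    "move_hand": (300, ["decide_response"]),
    "press_key": (100, ["move_hand", "listen_customer"]),
}


@lru_cache(maxsize=None)
def finish_time(name):
    """Earliest finish: own duration plus the latest prerequisite finish."""
    duration, prereqs = OPERATIONS[name]
    return duration + max((finish_time(p) for p in prereqs), default=0)


def critical_path():
    """Walk back from the last-finishing operation via the latest prerequisite."""
    current = max(OPERATIONS, key=finish_time)
    path = [current]
    while OPERATIONS[current][1]:
        current = max(OPERATIONS[current][1], key=finish_time)
        path.append(current)
    return list(reversed(path))


total_ms = max(finish_time(name) for name in OPERATIONS)
path = critical_path()
print(total_ms, path)
```

The dictionary plays the role of one chart: a scenario with a different pattern of events needs a different dictionary, which is why the method is nongenerative even though the calculation itself is automatic.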

If  a  model  is  generative,  a  single  model  can  produce  pre¬ 
dicted  usability  results  for  any  relevant  scenario,  just  like  a  com¬ 
puter  program  for  calculating  the  mean  of  a  set  of  numbers  can 
be  applied  to  any  specific  set  of  values.  A  typical  Hierarchical 
Task  Analysis  (HTA;  see  Annett,  Duncan,  Stammers,  &  Gray,
1971;  Kirwan  &  Ainsworth,  1992)  results  in  a  generative  repre¬ 
sentation,  in  that  the  HTA  chart  can  be  followed  to  perform  the 
task  in  any  subsumed  situation.  The  forms  of  GOMS  models  that 
explicitly  represent  methods  (see  John  &  Kieras,  1996a,  1996b) 
are  also  generative.  The  typical  cognitive  architecture  model  is 
generative  in  that  it  is  programmed  to  perform  the  cognitive 
processes  necessary  to  decide  how  to  respond  appropriately  to 
any  possible  input  that  might  occur  in  the  task.  In  essence,  the 
model  programming  expresses  the  general  procedural  knowl¬ 
edge  required  to  perform  the  task,  and  the  architecture,  when 
executing  this  procedural  knowledge,  supplies  all  of  the  details; 
the  result  is  that  the  model  responds  with  a  different  specific 
time  sequence  of  actions  to  different  specific  situations. 

For  example,  Kieras,  Wood,  and  Meyer  (1997)  used  a  cogni¬ 
tive  architecture  to  construct  a  production-rule  model  of  some 
of  the  telephone  operator  tasks  studied  by  Gray,  John,  and  At¬ 
wood  (1993).  Because  the  model  consisted  of  a  general  “pro¬ 
gram”  for  doing  the  tasks,  it  would  behave  differently  depend¬ 
ing  on  the  details  of  the  input  events;  for  example,  greeting  a 
customer  differently  depending  on  information  on  the  display 
and  punching  function  keys  and  entering  data  depending  on 
what  the  customer  says  and  requires.  Thus,  the  specific  behavior
of  the  model  and  its  time  course  depend  on  the  specific  inputs,
in  a  way  expressed  by  a  single  set  of  general  procedures.

A  generative  model  is  typically  more  difficult  to  construct  ini¬ 
tially,  but  because  it  is  not  bound  to  a  specific  scenario,  it  can 
be  directly  applied  to  a  large  selection  of  scenarios  to  provide  a 
comprehensive  analysis  of  complex  tasks.  The  technique  is  es¬ 
pecially  powerful  if  the  model  runs  as  a  computer  simulation  in 
which  there  is  (a)  a  simulated  device  that  represents  how  the 
scenario  data  results  in  specific  display  events  and  governs  how 
the  system  will  respond  to  the  user,  and  (b)  the  simulated  hu¬ 
man,  which  is  the  model  of  how  the  user  will  perform  the  task. 
The  different  scenarios  are  just  the  input  data  for  the  simulation, 
which  produces  the  predicted  behavior  for  each  one.  Further¬ 
more,  because  generative  models  represent  the  procedural 
knowledge  of  the  user  explicitly,  they  readily  satisfy  the  desir¬ 
able  property  of  models  described  above:  The  content  of  a  gen¬ 
erative  model  can  be  inspected  to  see  how  a  design  “works”  and 
what  procedures  the  user  must  know  and  execute. 
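
The arrangement of simulated device, simulated human, and scenarios as input data can be sketched as follows. This is only an illustration under stated assumptions: the event kinds, action names, and durations are hypothetical, and the device here merely replays scenario events, whereas a fuller simulated device would also react to the user's actions.

```python
# One general procedure (the "simulated human") applied to many scenarios.
# Event kinds, action names, and durations are hypothetical.
ACTION_SECONDS = {"greet": 1.5, "press_function_key": 0.3, "type_digit": 0.3}


def simulated_human(event):
    """General procedural knowledge: choose actions from the displayed event."""
    if event["kind"] == "call_arrives":
        return ["greet"]
    if event["kind"] == "customer_request":
        actions = ["press_function_key"]
        actions += ["type_digit"] * len(event.get("number", ""))
        return actions
    return []


def simulated_device(scenario):
    """Turn scenario data into the display events the user would encounter."""
    for event in scenario:
        yield event


def run(scenario):
    """Return the predicted action sequence and total time for one scenario."""
    trace = []
    for event in simulated_device(scenario):
        trace.extend(simulated_human(event))
    return trace, sum(ACTION_SECONDS[a] for a in trace)


scenarios = {
    "collect_call": [
        {"kind": "call_arrives"},
        {"kind": "customer_request", "number": ""},
    ],
    "credit_card_call": [
        {"kind": "call_arrives"},
        {"kind": "customer_request", "number": "1234"},
    ],
}
results = {name: run(events) for name, events in scenarios.items()}
for name, (trace, seconds) in results.items():
    print(name, trace, round(seconds, 2))
```

Adding a scenario means adding data, not writing a new model, and the body of `simulated_human` can be read directly to see what procedures the design demands of the user.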

The  Role  of  Detail 

In  the  initial  presentation,  the  reader  may  have  noticed  the  em¬ 
phasis  on  the  role  of  detailed  description,  both  of  the  user’s  task 
and  the  proposed  interface  design.  Modeling  is  sometimes  crit¬ 
icized  because  it  appears  to  be  unduly  labor  intensive.  Building 
a  model  and  using  it  to  obtain  predictions  may  indeed  involve 
substantial  detail  work;  however,  working  out  the  details  about 
the  user’s  task  and  the  interface  design  is,  or  should  be,  a  nec¬ 
essary  part  of  any  interface  design  approach;  the  usability  lies 
in  the  details,  not  the  generalities  (Whiteside,  Jones,  Levy,  & 
Wixon,  1985).  If  the  user’s  task  has  not  been  described  in  de¬ 
tail,  chances  are  that  the  task  analysis  is  inadequate  and  a  suc¬ 
cessful  interface  will  be  more  difficult  to  achieve;  extra  design  it¬ 
erations  may  be  required  to  discover  and  correct  deficiencies 
in  the  original  understanding  of  the  user’s  needs.  If  the  interface
designer  has  not  worked  out  the  interface  design  in  detail,  the
prospects  of  success  are  especially  poor.  The  final  form  of  an 
interface  reflects  a  mass  of  detailed  design  decisions.  These  de¬ 
cisions  should  have  been  explicitly  made  by  an  interface  de¬ 
signer  whose  focus  is  on  the  user,  rather  than  the  programmers 
who  happen  to  write  the  interface  code.  The  designer  must 
therefore  develop  this  detail  as  part  of  any  successful  design  ef¬ 
fort.  In  short,  using  model-based  evaluation  does  not  require 
any  more  detail  than  should  be  available  anyway;  it  just  requires 
that  this  detail  be  developed  more  explicitly  and  perhaps  ear¬ 
lier  than  is  often  the  case. 

Cognitive  architectures  are  committed  to  detail. 

Cognitive  architecture  systems  are  primarily  research  systems 
dedicated  to  synthesizing  and  testing  basic  psychological  the¬ 
ory.  Because  they  have  a  heavy  commitment  to  characterizing 
the  human  cognitive  architecture  in  detail,  they  naturally  work 
at  an  extremely  detailed  level.  The  current  cognitive  architec¬ 
ture  systems  differ  widely  in  the  extent  to  which  they  incorpo¬ 
rate  the  most  potent  source  of  practical  constraints — namely, 
perceptual-motor  constraints — but  at  the  same  time,  they  are 
committed  to  enabling  the  representation  of  a  comprehensive 
range  of  very  complex  cognitive  processes,  ranging  from  mul¬ 
titask  performance  to  problem  solving  and  learning.  Thus, 
these  systems  are  generally  very  flexible  in  what  cognitive 
processes  they  can  represent  within  their  otherwise  very  con¬ 
strained  architectures. 

The  detail,  however,  has  a  downside.  Cognitive  architectures 
are  typically  difficult  to  program,  even  for  simple  tasks,  and  have 
the  further  drawback  that,  as  a  consequence  of  their  detail,  cur¬ 
rently  unresolved  psychological  issues  can  become  exposed  to 
the  modeler  for  resolution.  For  example,  the  nature  of  visual 
short-term  memory  is  rather  poorly  understood  at  this  time, 
and  no  current  architecture  has  an  empirically  sound  represen¬ 
tation  of  it.  Using  one  of  the  current  architectures  to  model  a 
task  in  which  visual  short-term  memory  appears  to  be  promi¬ 
nent  might  require  many  detailed  assumptions  about  how  it 
works  and  is  used  in  the  task,  and  these  assumptions  typically 
cannot  be  tested  within  the  modeling  project  itself.  One  reason
for  this  is  the  previously  discussed  difficulty  of  getting  high- 
precision  data  for  complex  tasks.  The  more  serious  reason, 
though,  is  that  in  a  design  context,  data  to  test  the  model  is  nor¬ 
mally  not  available  because  there  is  not  yet  a  system  with  which 
to  collect  the  data.  Less  detailed  modeling  approaches  such  as 
GOMS  may  not  be  any  more  accurate,  but  they  at  least  have  the 
virtue  of  not  sidetracking  the  modeler  into  time-consuming  de¬ 
tailed  guesswork  or  speculation  about  fundamental  issues.  See 
Kieras  (2005b)  for  more  discussion. 

Task  networks  can  be  used  before  detailed  design. 

Although  model-based  evaluation  works  best  for  detailed  de¬ 
signs,  the  task  network  modeling  techniques  were  developed  to 
assist  in  design  stages  before  detailed  design,  especially  for  com¬ 
plex  military  systems.  For  example,  task  network  modeling  was 
used  to  determine  how  many  human  operators  would  be  required
to  properly  man  a  new  combat  helicopter.  Too  many  operators
drastically  increase  the  cost  and  size  of  the  aircraft;  too
few  means  the  helicopter  could  not  be  operated  successfully 
or  safely.  Thus,  questions  at  these  stages  of  design  are  what 
capacity  (in  terms  of  the  number  of  people  or  machines)  is
needed  to  handle  the  workload,  and  what  kinds  of  work  need
to  be  performed  by  each  person  or  machine. 

In  outline,  these  early  design  stages  involve  first  selecting  a 
mission  profile,  essentially  a  high-level  scenario  that  describes 
what  the  system  and  its  operators  must  accomplish  in  a  typical 
mission,  then  developing  a  basic  functional  analysis  that  de¬ 
termines  the  functions  (large-scale  operations)  that  must  be 
performed  to  accomplish  the  mission,  and  what  their  interac¬ 
tions  and  dependencies  are.  Then  the  candidate  high-level  de¬ 
sign  consists  of  a  tentative  function  allocation  to  determine 
which  human  operator  or  machine  will  perform  each  function 
(see  Beevis,  Bost,  Doering,  Nordo,  Oberman,  Papin,  et  al., 
1992).  The  task  network  model  can  then  be  set  up  to  include 
the  tasks  and  their  dependencies,  and  simulations  run  to  de¬ 
termine  execution  times  and  compute  workload  metrics  based 
on  the  workload  characteristics  of  each  task. 

Clearly,  entertaining  detailed  designs  for  each  operator’s 
controls  or  workstation  is  pointless  until  such  a  high-level  analy¬ 
sis  determines  how  many  operators  there  will  be  and  for  which 
tasks  each  is  responsible.  Note  that  the  cognitive-architecture 
and  GOMS  models  are  inherently  limited  to  predicting  perfor¬ 
mance  in  detailed  designs,  because  their  basic  logic  is  to  use  the 
exact  sequence  of  activities  required  in  a  task  to  determine  the 
sequence  of  primitive  operations.  However,  as  will  be  discussed 
below,  recent  work  with  high-level  GOMS  models  suggests  an  al¬ 
ternative  approach  in  which  a  GOMS  model  using  abstract  or 
high-level  operators  to  interact  with  the  device  can  be  devel¬ 
oped  first,  and  then  elaborated  into  a  model  for  a  specific  in¬ 
terface  as  the  design  takes  shape.  However,  at  this  time,  for 
high-level  design  modeling,  the  task-network  models  appear  to 
be  the  best,  or  only,  choice. 

The  limitations,  however,  must  be  clearly  understood.  The 
ability  of  the  task  network  models  to  represent  a  design  at  these 
earliest  stages  is  a  direct  consequence  of  the  fact  that  these 
modeling  methods  do  not  have  any  detailed  mechanisms  or 
constraints  for  representing  human  cognition  and  performance. 
Recall  that  the  tasks  in  the  network  can  consist  of  any  arbitrary 
process  whose  execution  characteristics  can  follow  any  desired 
distribution.  Thus,  the  tasks  and  their  parameters  can  be  freely 
chosen  without  any  regard  to  how  a  human  will  actually  be  able 
to  do  them  in  the  final  version  of  the  system.  Hence,  this  early- 
design  capability  is  a  result  of  a  lack  of  theoretical  content  in  the 
modeling  system  itself. 

While  the  choice  of  tasks  in  a  network  model  is  based  on  a 
task  analysis,  the  time  distribution  parameters  are  more  prob¬ 
lematic;  how  does  one  estimate  the  time  required  for  a  human  to 
perform  a  process  specified  only  in  the  most  general  terms?  One 
way  is  to  rely  on  empirical  measurements  of  similar  tasks  per¬ 
formed  in  similar  systems,  but  this  requires  that  the  new  system 
must  be  similar  to  a  previous  system  not  only  at  the  task-function 
level,  but  also  at  least  roughly  at  the  level  of  design  details. 

Given  the  difficulty  of  arriving  at  task  parameter  estimates 
rigorously,  a  commonly  applied  technique  is  to  ask  a  subject 
matter  expert  to  supply  subjective  estimates  of  task  time  means 
and  standard  deviations  and  workload  parameters.  When  used 
in  this  way,  a  task-network  model  is  essentially  a  mathematically 
straightforward  way  to  start  with  estimates  of  individual  subtask
performance,  with  no  restrictions  on  the  origin  or  quality  of
these  estimates,  and  then  to  combine  them  to  arrive  at  performance
estimates  for  the  entire  task  and  system.
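
The aggregation of subtask estimates described above can be sketched as a small Monte Carlo simulation. This is a minimal sketch under stated assumptions: the task names, means, standard deviations, and dependencies are invented, and the normal distribution is an arbitrary choice, since such a network can use any desired distribution.

```python
import random
import statistics

# Hypothetical network: task -> (mean_s, sd_s, prerequisites).
# The means and standard deviations stand in for subjective expert estimates.
TASKS = {
    "detect_target": (4.0, 1.0, []),
    "identify_target": (6.0, 2.0, ["detect_target"]),
    "report_contact": (3.0, 0.5, ["detect_target"]),
    "engage": (5.0, 1.5, ["identify_target", "report_contact"]),
}


def simulate_once(rng):
    """Sample each task duration and propagate finish times through the network."""
    finish = {}
    # The dictionary is written so that prerequisites precede dependents.
    for name, (mean, sd, prereqs) in TASKS.items():
        duration = max(0.0, rng.gauss(mean, sd))  # clamp negative samples to zero
        start = max((finish[p] for p in prereqs), default=0.0)
        finish[name] = start + duration
    return max(finish.values())


def simulate(runs=2000, seed=1):
    """Return the mean and standard deviation of total time over many runs."""
    rng = random.Random(seed)
    totals = [simulate_once(rng) for _ in range(runs)]
    return statistics.mean(totals), statistics.stdev(totals)


mean_total, sd_total = simulate()
print(f"total time: mean {mean_total:.1f} s, sd {sd_total:.1f} s")
```

Nothing in the code checks whether a human could actually perform a task in the sampled time; the output is only as good as the estimates placed in `TASKS`.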

Clearly,  basing  major  design  decisions  on  an  aggregation  of 
mere  subjective  estimates  is  hardly  ideal,  but  as  long  as  a  de¬ 
tailed  design  or  preexisting  system  is  not  available,  there  is  really 
no  alternative  to  guide  early  design.  In  the  absence  of  such 
analyses,  system  developers  would  have  to  choose  an  early  de¬ 
sign  based  on  “gut  feel”  about  the  entire  design,  which  is  surely 
more  dangerous. 

Note  that  if  there  is  a  detailed  design  available,  the  task-net¬ 
work  modeler  could  decompose  the  task  structure  down  to  a 
fine  enough  level  to  make  use  of  basic  human  performance  pa¬ 
rameters,  similar  to  those  used  in  the  cognitive-architecture  and 
GOMS  models.  For  example,  some  commercial  tools  supply  the 
HOS  micromodels;  however,  it  is  hard  to  see  the  advantage  in 
using  task  network  models  for  detailed  design.  The  networks 
and  their  supplementary  executable  code  do  not  seem  to  be  a 
superior  way  to  represent  task  procedures  compared  to  the 
computer-program-like  format  of  some  GOMS  models,  or  the 
highly  flexible  and  modular  structure  of  production  systems. 

Another  option  would  be  to  construct  GOMS  or  cognitive 
architecture  models  to  produce  time  estimates  for  the  individ¬ 
ual  tasks,  and  use  these  in  the  network  model  instead  of  sub¬ 
jective  estimates.  This  might  be  useful  if  only  part  of  the  design 
has  been  detailed,  but  otherwise,  staying  with  a  single  modeling 
approach  would  surely  be  simpler.  If  one  believes  that  interface 
usability  is  mostly  a  matter  of  getting  the  details  right,  along  the 
lines  originally  argued  by  Whiteside,  Jones,  Levy,  &  Wixon 
(1985)  and  verified  by  many  experiences  in  user  testing,  mod¬ 
eling  approaches  that  naturally  and  conveniently  work  at  a  de¬ 
tailed  design  level  will  be  especially  valuable. 


PRACTICAL  ISSUES  IN  APPLYING  A 
MODEL-BASED  EVALUATION  TECHNIQUE


Once  a  model-based  evaluation  technique  is  chosen,  some  prac¬ 
tical  issues  arise  in  seeking  to  apply  the  technique  to  a  particu¬ 
lar  user  interface  design  situation.  This  section  presents  several 
of  these  issues. 

Creating  the  Simulated  Device 

As  previously  mentioned,  the  basic  structure  of  a  model  used 
for  evaluation  is  that  a  simulated  human  representing  the  user 
is  interacting  with  a  simulated  device  that  represents  the  system 
under  design.  In  a  parallel  with  Norman’s  interactive  cycle,  the 
simulated  human  receives  simulated  visual  and  auditory  input 
from  the  simulated  device,  and  responds  with  simulated  actions 
that  provide  input  to  the  simulated  device,  which  can  then  re¬ 
spond  with  different  visual  and  auditory  inputs  to  the  human. 
Depending  on  the  level  of  generativity  and  fidelity  of  the  model, 
the  simulated  device  can  range  from  being  a  dummy  device  that 
does  nothing  in  response  to  the  simulated  human  interaction,  to 
a  highly  detailed  simulation  of  the  device  interface  and  function¬ 
ality.  An  example  of  a  dummy  device  is  the  device  that  is  assumed 
in  the  Keystroke-Level  Model,  which  is  not  at  all  explicitly  defined;
the  modeler  simply  assumes  that  a  specific  sequence  of
user  actions  will  result  in  the  device  doing  the  correct  thing.  At 
the  other  extreme  are  models  such  as  the  ones  used  by  Kieras 
and  Santoro  (2004),  which  actually  implemented  significant  por¬ 
tions  of  the  logical  functionality  of  a  complex  radar  workstation 
and  the  domain  of  moving  aircraft  and  ships.  In  modeling  situa¬ 
tions  where  a  generative  model  is  called  for — namely,  a  complex 
task  domain  with  multiple  or  lengthy  detailed  scenarios — a  fully 
simulated  device  is  the  most  convenient  way  to  ensure  that  a 
simulated  human  is  in  fact  performing  the  task  correctly,  and  to 
easily  work  with  more  than  one  scenario  for  the  task  situation. 

It  is  important  to  realize  that  the  simulated  device  is  not  re¬ 
quired  to  implement  the  actual  interface  whose  design  is  be¬ 
ing  evaluated.  Rather,  it  suffices  to  produce  abstract  psycho¬ 
logical  inputs  to  the  simulated  human.  For  example,  if  a  red 
circle  is  supposed  to  appear  on  the  screen,  the  simulated  de¬ 
vice  can  merely  signal  the  simulated  human  with  an  abstract 
description  that  an  object  has  appeared  at  certain  (x,  y)  coor¬ 
dinates  that  has  a  “shape”  of  “circle,”  and  a  “color”  of  “red.”  It  is 
not  at  all  necessary  for  the  simulated  device  to  actually  produce 
a  human-viewable  graphical  display  containing  a  circular  red 
area  at  a  certain  position. 
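
The red-circle example can be sketched as a device that passes an abstract description to the user model instead of drawing anything. All class and method names below are hypothetical and chosen only for illustration; they are not the interface of any particular modeling system.

```python
from dataclasses import dataclass, field


@dataclass
class VisualObject:
    """Abstract psychological description of something on the display."""
    name: str
    x: float
    y: float
    properties: dict = field(default_factory=dict)


class SimulatedHuman:
    """Stand-in for the user model; it only records what it was told."""

    def __init__(self):
        self.visual_scene = {}

    def object_appeared(self, obj):
        self.visual_scene[obj.name] = obj


class SimulatedDevice:
    """Signals display changes to the user model without drawing any pixels."""

    def __init__(self, human):
        self.human = human

    def show(self, name, x, y, **properties):
        self.human.object_appeared(VisualObject(name, x, y, properties))


human = SimulatedHuman()
device = SimulatedDevice(human)
device.show("alert", x=120, y=80, shape="circle", color="red")
print(human.visual_scene["alert"])
```

The device never renders a bitmap: the symbolic properties are the entire visual input that the simulated human receives.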

A  lesson  learned  by  Kieras  and  Santoro  (2004)  was  that  the 
effort  required  to  construct  even  such  an  abstract  simulated  de¬ 
vice  in  a  complex  domain  is  a  major  part  of  the  modeling  effort, 
and  is  more  difficult  in  some  ways  than  constructing  the  models 
of  the  simulated  users.  Clearly,  to  some  extent,  this  effort  is  re¬ 
dundant  with  the  effort  required  to  develop  the  actual  system, 
and so can undermine the rationale for modeling early in the
design and development process.

A  common  response  to  this  problem  is  to  seek  to  connect 
the  cognitive  architecture  directly  to  an  intact  application  or  sys¬ 
tem  prototype  that  plays  the  role  of  the  simulated  device,  in 
short, replacing the simulated device with an actual device. Work
such as St. Amant and Riedl (2001) provides a pathway for
interfacing to an intact application. The technique is basically to
use the existing API of the application platform (e.g., Windows)
to capture the screen bitmap and run visual object recognition
algorithms on it to produce the description of the visual inputs
to the simulated human. Outputs from the simulated human
can then be supplied directly to the platform to produce
keyboard input or to control the cursor position. Even for the
limited domain of Windows applications using the standard GUI
objects, this is technically challenging, but quite feasible.

A less ambitious approach is to instrument a prototype version
of the interface rather than connect to an intact application
program, so that, for example, when the prototype causes a
certain object to appear on the screen, the simulated human is
supplied with the visual input description. Given the considerable
variety in how GUIs are implemented, this solution is not
very general, but interesting versions of it are possible if the
application prototype is programmed in Java, HTML, or similar
cross-platform languages or general-purpose tools that can be used
for prototyping. A good example is the HTML-based modeling
tool described in John, Prevas, Salvucci, and Koedinger (2004).

Both  of  these  methods  of  coupling  a  user  model  to  an  ap¬ 
plication,  however,  suffer  from  an  easily  overlooked  limitation: 
The  time  when  modeling  is  most  useful  is  early  in  design,  before 
the system has been prototyped. Thus, because coupling to a
prototype or an application can only happen late in the development
process or after development, these approaches come
too  late  to  provide  the  most  benefit  of  model-based  evaluation. 

Multiple  approaches  for  creating  the  simulated  device  are 
thus  both  possible  and  necessary.  If  the  design  questions  can 
be  answered  with  evaluation  techniques  such  as  the  Keystroke- 
Level  Model,  then  no  simulated  device  is  needed  at  all.  If  the 
model  is  for  an  existing  application,  coupling  to  the  intact  ap¬ 
plication  is  clearly  the  best  solution.  If  a  prototype  is  going  to  be 
constructed  at  this  point  in  the  design  process  anyway,  using  it 
as  the  simulated  device  is  the  best  solution.  But  in  the  poten¬ 
tially  most  useful  case,  the  simulated  device  must  be  created  be¬ 
fore  any  prototype  or  final  application,  making  the  fewest  pos¬ 
sible  commitments  to  prototyping  or  coding  effort;  this  requires 
constructing  a  simulated  device  from  scratch,  stripped  down  to 
the  bare  minimum  necessary  to  allow  a  candidate  design  to  in¬ 
teract  with  the  simulated  user.  The  next  sections  provide  some 
advice  on  this  process. 

How  to  simplify  the  simulated  device.  Distinguish  be¬ 
tween  device  behavior  that  is  relevant  to  the  modeling  effort 
and  that  which  is  not.  Basically,  if  the  simulated  human  will  not 
use  or  respond  to  certain  behaviors  of  the  simulated  device, 
then the simulated device does not need to produce those behaviors.
A similar argument applies to the amount of detail in
the behavior. Of course, as the interface design is elaborated,
the  simulated  device  may  need  to  cover  more  aspects  of  the 
task.  Good  programming  techniques  will  make  it  easy  to  extend 
the  simulated  device  as  needed;  a  good  programmer  is  a  defi¬ 
nite  asset  to  the  project. 

Distinguish  between  what  the  device  has  to  provide  to  the 
simulated  human,  and  what  would  be  a  convenience  to  the 
modeler. That is, while the simulated human needs only abstract
descriptions, an actual graphical display of what the simulated
device is displaying can be a very useful tool for the modeler in
monitoring, debugging, or demonstrating the model. A very
crude  general-purpose  display  module  that  shows  what  the  sim¬ 
ulated  human  “sees”  will  suffice  for  these  purposes,  and  can  be 
reasonably  easy  to  provide  in  a  form  that  is  reusable  for  a  variety 
of  simulation  projects.  However,  developing  this  handy  display 
should  be  recognized  as  an  optional  convenience,  rather  than 
an  essential  part  of  the  simulated  device. 

Since programming the simulated device can take significant
effort, an attractive simplification would be a programming
language that is specialized for describing abstract
device behavior. Clearly, using such a language could be valuable
if the modeling system already provides it and it is adequate
for the purpose, especially if the device programming language
can generate a prototype for the interface that can be directly
coupled to the simulated human, moving the whole process
along rapidly. An extensive project involving many different but
similar interface designs would especially profit if the language
matches the problem domain well.

However,  in  less  than  ideal  situations  a  specialized  language 
is  unlikely  to  be  an  advantage.  The  reason  is  that  to  cover  the 
full  span  of  devices  that  might  need  to  be  simulated,  the  device 
programming  language  will  have  to  include  a  full  set  of  general 
programming  language  facilities.  For  example,  to  handle  the 
Kieras  and  Santoro  (2004)  domain,  trigonometric  functions  are 
needed to calculate courses and trajectories, and containers of
complex  objects  are  required  to  keep  track  of  the  separate  air¬ 
craft  and  their  properties.  Thus,  specialized  languages  will  in¬ 
evitably  have  to  include  most  of  the  same  feature  set  as  general- 
purpose  programming  languages,  meaning  that  the  developers 
of  modeling  systems  will  have  to  develop,  document,  maintain, 
and  support — with  editors  and  debuggers — a  full-fledged  pro¬ 
gramming  language.  This  takes  effort  away  from  the  functions 
that  are  unique  to  human  performance  modeling  systems,  such 
as  ensuring  that  the  psychology  is  correctly  represented.  In  ad¬ 
dition,  the  modeler  will  also  have  to  expend  the  time  and  effort 
necessary  to  learn  a  specialized  language  whose  complexity  is 
similar  to  a  general-purpose  programming  language,  also  taking 
effort  away  from  unique  aspects  of  the  modeling  effort. 

A  better  choice  would  be  to  provide  for  the  device  to  be  pro¬ 
grammed  easily  in  a  standard  general-purpose  programming 
language  that  modelers  can  (or  should)  know  anyway,  allowing 
reuse  of  not  just  the  modeler’s  skills,  but  existing  programming 
tools  and  education  resources  as  well.  A  well-designed  modeling 
system  can  ensure  that  a  minimum  of  system-specific  knowl¬ 
edge  must  be  acquired  before  coding  can  begin. 
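
As a rough illustration of why general-purpose facilities are needed, the following Python sketch uses the standard math module for course calculations and an ordinary dictionary as the container of tracks. It is not the Kieras and Santoro (2004) device; the Track class, its fields, the units, and the bearing_deg helper are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """One aircraft or ship; all fields are illustrative assumptions."""
    track_id: str
    x: float           # position east, nautical miles
    y: float           # position north, nautical miles
    course_deg: float  # compass course: 0 = north, 90 = east
    speed_kts: float   # speed in knots

    def advance(self, hours):
        """Move along the current course for the given time."""
        heading = math.radians(self.course_deg)
        distance = self.speed_kts * hours
        self.x += distance * math.sin(heading)
        self.y += distance * math.cos(heading)


def bearing_deg(from_track, to_track):
    """Compass bearing from one track to another, in [0, 360)."""
    dx = to_track.x - from_track.x
    dy = to_track.y - from_track.y
    return math.degrees(math.atan2(dx, dy)) % 360.0


# A plain dictionary keeps track of the separate objects and their properties.
tracks = {
    "own_ship": Track("own_ship", 0.0, 0.0, 0.0, 0.0),
    "T001": Track("T001", 0.0, 10.0, 90.0, 300.0),
}
tracks["T001"].advance(hours=2 / 60)  # two minutes of simulated time
print(round(bearing_deg(tracks["own_ship"], tracks["T001"]), 1))
```

Nothing here is specific to a modeling system, which is the point: trigonometry, containers, and data classes come for free with a standard language and its tools.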

Identifying  the  Task  Strategy 

A  task  analysis  does  not  necessarily  specify  a  task 
strategy.  Human  performance  in  a  task  is  determined  by 
(a)  the  logical  requirements  of  the  task,  what  the  human  is  sup¬ 
posed  to  accomplish,  as  determined  by  a  task  analysis;  (b)  the 
human  cognitive  architecture,  the  basic  mechanisms  available  to 
produce behavior; and (c) a specific strategy for doing the task:
given the task requirements and the architecture, what should
be done in what order and at what time to complete the task.
Thus,  to  construct  a  model  for  doing  the  task,  one  must  first 
understand  the  task,  then  choose  an  architecture,  and  then 
choose  a  strategy  to  specify  how  the  architecture  will  be  used  to 
accomplish  the  task.  Identifying  this  strategy  is  the  critical  pre¬ 
requisite  for  constructing  a  model. 

Normal task analysis methods, such as those described in
Kirwan and Ainsworth (1992), Beevis et al. (1992), and
Diaper and Stanton (2004), do not necessarily identify the exact
sequence  of  actions  to  perform  with  the  interface  under  design, 
and  they  rarely  specify  the  timing  of  actions.  For  example,  any¬ 
one  who  has  made  coffee  with  a  home  coffee  maker  knows  that 
there  are  certain  constraints  that  must  be  met,  but  there  is  still 
considerable  variation  in  the  sequence  and  time  in  which  the  in¬ 
dividual  required  steps  could  be  performed.  In  fact,  there  can 
be  variation  on  how  the  activity  is  organized;  for  example,  one 
strategy  is  to  use  the  visible  state  of  the  coffee  maker  as  an  ex¬ 
ternal  memory  to  determine  which  actions  should  be  per¬ 
formed  next  (Larkin,  1989).  A  normal  task  analysis  will  not  iden¬ 
tify  these  variations.  But  even  further,  task  analysis  will  not 
necessarily  identify  how  any  trade-offs  should  be  decided,  even 
as  basic  as  speed  versus  accuracy,  much  less  more  global  prob¬ 
lems  such  as  managing  workload,  dealing  with  multiple  task  pri¬ 
orities,  and  so  forth. 

Thus,  to  model  a  human  performing  in  such  situations,  some 
additional  information  beyond  a  normal  task  analysis  has  to  be 
added, namely the specific task strategies that are used to
accomplish the tasks. Conversely, the performance that a human
can  produce  in  a  task  can  vary  over  a  wide  range,  depending  on 
the  specific  strategy  that  is  used,  and  this  is  true  over  the  range 
of  tasks  from  elementary  psychology  laboratory  tasks  to  highly 
complex  real-world  tasks  (Kieras  &  Meyer,  2000).  This  raises  a 
general  problem:  Given  that  we  have  a  model  that  predicts  per¬ 
formance  on  an  interface  design,  how  do  we  distinguish  the  ef¬ 
fects  of  the  interface  design  from  the  effects  of  the  particular 
strategy  for  using  the  interface?  Not  only  does  this  apply  to  the 
model  performance,  but  also  to  the  human’s  performance.  It 
has  always  been  clear  that  clever  and  experienced  users  could 
get  a  lot  out  of  a  poorly  designed  system,  and  even  a  reasonably 
well-designed  powerful  system  could  be  seriously  underused 
(Bhavnani  &  John,  1996).  How  can  we  predict  performance 
without  knowing  the  actual  task  strategy,  and  how  does  our 
model’s  task  strategy  relate  to  the  actual  user’s  task  strategy? 

Difficulties  in  identifying  task  strategy.  The  state  of 
the  art  in  cognitive  modeling  research  for  identifying  a  task  strat¬ 
egy  is  to  choose  a  candidate  intuitively,  build  the  model  using  the 
strategy,  evaluate  the  goodness  of  fit  to  data,  and  then  choose  a 
better  strategy  and  repeat  until  a  satisfactory  fit  is  obtained.  If 
there  is  adequate  detail  in  the  data,  such  as  the  sequence  of  ac¬ 
tivities,  it  might  be  possible  to  make  good  initial  guesses  at  the 
task  strategy  and  then  revise  these  through  the  modeling 
process.  This  iterative  refinement  process  is  known  to  be  very 
slow,  but  more  seriously,  in  system  design,  we  normally  do  not 
have  data  to  fit  a  model  to;  this  is  what  the  modeling  is  supposed 
to  replace.  The  task  strategy  has  to  be  chosen  in  the  absence  of 
such  data. 

Another  approach  is  to  get  the  task  strategy  by  knowledge 
engineering  techniques  with  existing  task  performers  or  other 
sources  such  as  training  materials.  As  will  be  argued  below,  it  is 
especially  important  to  identify  the  best  (or  at  least  a  good) 
strategy for doing the task. However, a good strategy for doing a
task is often not obvious, even to experts. Even highly experienced
people do not always know or use the best procedures;
even  the  trainers  may  not  know  them,  and  it  is  common  to  dis¬ 
cover  that  procedural  training  materials  present  suboptimal 
methods.  Finally,  and  again  most  importantly,  if  the  system  is 
new,  there  are  no  experts  or  training  materials  to  consult  to  see 
how  it  is  used. 

A  heuristic:  Model  what  users  should  do.  Given  the 
obstacles  to  identifying  task  strategies,  how  do  we  find  out  what 
strategies  users  will  follow  in  using  the  system  under  develop¬ 
ment?  The  short  answer  is  that  it  is  too  hard  to  find  out  within 
the  constraints  of  a  design  process.  Instead,  start  from  the  design 
goals  that  the  system  is  supposed  to  meet,  and  assume  that  users 
will  be  using  the  system  as  it  is  supposed  to  be  used.  For  example, 
if  the  system  provides  a  feature  that  is  supposed  to  allow  the  user 
to  easily  perform  a  certain  task,  assume  that  the  simulated  user 
will  use  that  feature  in  the  intended  fashion.  This  is  essentially  a 
best-case  analysis  of  the  ability  of  the  user  to  make  use  of  the  in¬ 
terface  design.  If  the  usability  under  these  conditions  is  too  low, 
then  it  would  certainly  be  inadequate  when  used  by  actual  users. 
It  is  a  separate  issue  whether  the  users  can  or  will  use  the  sys¬ 
tem  in  this  intended  way,  and  the  failures  can  be  dramatic  and  se¬ 
rious  (Bhavnani  &  John,  1996).  Whether  users  use  a  system  in 
the intended way depends on several factors: (a) problems in the
learnability  of  the  design,  which  some  models  (see  John  & 
Kieras,  1996a,  1996b)  can  predict;  (b)  the  quality  of  the  training 
materials,  which  also  can  be  improved  with  modeling  (see  John 
&  Kieras,  1996a,  1996b);  and  (c)  perhaps  most  importantly,  mat¬ 
ters  beyond  the  scope  of  model-based  evaluation,  such  as 
whether users have the opportunity or incentive to take full
advantage of the system. Finally, there is no point in trying to
improve a system under the assumption that users will ignore the
capabilities  that  it  provides.  Thus,  in  terms  of  choosing  a  task 
strategy  to  model  for  design  evaluation,  the  most  effective  ap¬ 
proach  is  to  assume  that  the  design  will  be  used  as  intended.  Not 
only  will  this  strategy  be  the  easiest  to  identify  and  implement, 
but  it  is  also  most  directly  relevant  to  evaluating  the  design. 

Within  this  basic  strategy,  there  is  another  range  of  variation, 
which  is  whether  the  user  attempts  to  perform  at  the  highest 
possible  level,  or  simply  satisfices  by  performing  at  some  ade¬ 
quate  level.  That  is,  people  can  use  clever  low-level  strategies 
(what  Gray  &  Boehm-Davis  [2000]  termed  microstrategies),  to 
greatly  improve  performance.  For  example,  if  the  task  is  to  clas¬ 
sify  “blips”  on  a  radar  display,  the  user  can  speed  up  consider¬ 
ably  by  looking  for  the  next  blip  to  inspect  while  still  hitting  the 
keys to respond to the previous one (Kieras, Meyer, & Ballas,
2001).  On  the  other  hand,  the  user  is  performing  reasonably  if 
the  user  finishes  each  blip  before  going  on  to  the  next  one. 
Kieras  and  Meyer  (2000)  pointed  out  that  in  a  variety  of  even 
elementary  laboratory  tasks,  subjects  do  not  always  adopt  high- 
performance  strategies,  even  when  large  improvements  would 
result;  they  are  optional,  not  mandatory.  Therefore,  even  if  we 
are  willing  to  assume  that  the  design  will  be  used  as  intended, 
how  do  we  know  whether  the  actual  users  will  be  going  “all  out” 
with  a  clever  strategy  versus  just  “getting  the  job  done?”  Again, 
the  short  answer  is  that  it  is  too  difficult  to  find  out,  especially  in 
a  design  process  where  the  system  does  not  yet  exist. 

In  response  to  this  quandary,  Kieras  and  Meyer  (2000)  pro¬ 
posed  the  bracketing  heuristic.  Construct  a  base  model  in 
which  the  user  performs  the  task  in  a  straightforward  way  us¬ 
ing  the  interface  as  designed,  but  without  any  special  strategic 
optimizations, a slowest-reasonable model. Derive from this a
fastest-possible  model  that  performs  the  task  at  the  maximum 
performance  level  allowed  by  the  cognitive  architecture  used 
for  the  model.  The  two  models  should  bracket  the  actual  user’s 
future  performance.  If  both  models  produce  adequate  perfor¬ 
mance,  then  the  design  should  be  adequate;  if  both  produce 
inadequate  performance,  then  the  design  needs  to  be  improved. 
If  the  slowest-reasonable  model  is  inadequate,  but  the  fastest- 
possible  model  is  acceptable,  boosting  the  level  of  training  or 
perhaps  motivation  of  the  user  might  result  in  satisfactory  per¬ 
formance,  although  clearly  improving  the  design  would  be  a 
more  robust  solution. 
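
The decision logic of the bracketing heuristic can be written down directly. The following Python sketch assumes that each model yields a predicted task time in seconds and that the design requirement is a maximum acceptable time; the function name, arguments, and returned wording are illustrative only.

```python
def bracket_verdict(slowest_reasonable_s, fastest_possible_s, required_s):
    """Classify a design from the two bracketing predictions (seconds)."""
    if fastest_possible_s > slowest_reasonable_s:
        raise ValueError("fastest-possible time exceeds slowest-reasonable time")
    slow_ok = slowest_reasonable_s <= required_s
    fast_ok = fastest_possible_s <= required_s
    if slow_ok:
        # If the slow bracket meets the requirement, the fast one does too.
        return "adequate"
    if not fast_ok:
        return "improve design"
    # Only the fastest-possible model meets the requirement.
    return "adequate only with training or motivation; prefer improving design"


print(bracket_verdict(12.0, 8.0, 15.0))
print(bracket_verdict(20.0, 18.0, 15.0))
print(bracket_verdict(20.0, 12.0, 15.0))
```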

Concerns  Over  Model  Validity 

Can  you  believe  the  model?  Suppose  a  model  implies 
critical  design  choices.  Should  you  follow  them?  A  poor  response 
is  to  build  and  test  prototypes  just  as  if  no  modeling  had  been 
done.  It  could  be  argued  that  the  modeling  might  have  clarified 
the situation, but the purpose of model-based evaluation is to
reduce the amount of prototyping and user testing required to
refine  a  design.  This  response  under-utilizes  the  approach. 

A  better  response  to  the  situation  is  to  understand  how  the 
model  implies  the  design  choices  by  answering  the  question, 
“What aspects of the model are contributing to the outcome?”
This  can  be  done  by  profiling  the  model  processing  and  analyz¬ 
ing  the  model  structure.  If  the  critical  aspects  of  the  model  are 
known  to  be  valid  and  appear  to  be  properly  represented,  then 
the  model  results  should  be  accepted.  For  example,  perhaps 
one  design  is  slower  than  the  other  simply  because  it  turns  out 
that more navigation through menus is required. The model
processes involved are relatively simple and adequately validated
in the literature. However, if the relevant aspects of the
model are problematic, the result needs further study. For example,
suppose the model for the better of two designs differs
from  the  poorer  design  in  assuming  that  the  user  can  remember 
all  of  the  information  about  previously  inspected  screen  objects, 
and  so  does  not  need  to  search  the  screen  again  when  the  in¬ 
formation  is  needed  later.  Because  the  bounds  on  visual  mem¬ 
ory  are  unclear,  as  discussed  above,  the  modeling  architecture 
might not enforce any bounds. The modeling result is therefore
suspect, since the actual bounds might be much smaller
than the model assumes. The modeler could then perform a
sensitivity analysis to reveal how much the design choice might
be affected by the problematic assumption. For example, the
model could be modified to vary the number of previously
inspected objects that could be remembered. At what value does
this  parameter  change  which  design  is  predicted  to  be  better? 
If  the  decision  is  not  very  sensitive  to  this  parameter  (e.g.,  be¬ 
cause  the  effects  are  minor  compared  to  the  total  time  involved, 
or  there  are  other  improvements  in  the  better  design),  then 
choosing  the  better  design  is  a  reasonably  safe  decision. 
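
A sensitivity analysis of this kind amounts to sweeping the questionable parameter and noting where the predicted ranking of the designs changes. The Python sketch below uses a toy stand-in for the two models; the timing constants, the number of screen objects, and the function names are invented for illustration and are not taken from any published model.

```python
def predicted_time_a(memory_capacity, n_objects=8, search_s=1.2, base_s=10.0):
    """Toy prediction for design A: forgotten objects must be searched for again."""
    forgotten = max(0, n_objects - memory_capacity)
    return base_s + forgotten * search_s


def predicted_time_b():
    """Toy prediction for design B, which does not rely on visual memory."""
    return 14.0


def sensitivity(capacities):
    """Return (capacity, time_a, time_b, better_design) for each capacity."""
    rows = []
    for cap in capacities:
        a, b = predicted_time_a(cap), predicted_time_b()
        rows.append((cap, a, b, "A" if a < b else "B"))
    return rows


rows = sensitivity(range(0, 9))
# Smallest remembered-object count at which design A is predicted to win.
crossover = next(cap for cap, a, b, best in rows if best == "A")
for row in rows:
    print(row)
print("A is preferred once capacity reaches", crossover)
```

If the crossover lies well below any plausible memory bound, the choice of design A is safe; if it lies near or above plausible values, the decision is sensitive to the assumption.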

If  the  decision  turns  out  to  be  sensitive  to  the  problematic 
model  assumption,  then  the  situation  becomes  quite  difficult. 
One  possible  solution  is  to  remove  the  problematic  aspect  of 
the model by changing it in the direction of less human capability.
This is a conservative strategy, and it might also be appropriate
if the user will be under stress. If data or theory is
available  that  resolves  the  issue,  however,  the  modeler  can  go 
beyond  the  normal  model-based  process  and  modify  the  model 
or  architecture  to  incorporate  the  more  accurate  psychological 
information. 

Should  you  validate  the  model?  Remember  that  test¬ 
ing  with  real  users  must  be  done  sometime  in  the  development 
process,  because  the  models  do  not  cover  all  of  the  design  is¬ 
sues,  and  are  only  approximate  where  they  do  apply  and,  like 
any  analytic  method,  can  be  misapplied.  The  model  can  thus 
be  validated  after  use  by  comparing  the  final  user  test  results  to 
the  model  predictions.  This  will  reveal  problems  in  the  accuracy 
of  the  model  and  its  application  to  the  design,  and  any  modeling 
mistakes  and  design  errors  can  then  be  corrected  for  the  future. 

However, should special data to validate the model be collected
prior to using it to guide the design? While it would seem
to  be  a  good  idea  to  validate  a  model  before  using  it,  the  answer 
really  should  be  no  because  validation  is  not  supposed  to  be  a 
normal  part  of  using  a  predictive  model  of  human  performance. 
The  whole  idea  of  model-based  evaluation  is  to  avoid  data  col¬ 
lection  during  design.  Only  the  developers  of  the  modeling 
methodology are supposed to be concerned about the validity,
not  the  user  of  the  methodology. 

A  couple  of  special  cases  about  data  collection  need  to  be 
discussed.  One  is  data  collection  to  provide  basic  parameter  val¬ 
ues  for  modeling,  such  as  how  long  it  takes  the  user  to  input 
characters  with  a  novel  device.  If  the  parameters  concern  low- 
level  processes,  the  data  collection  is  independent  of  a  specific 
design  and  will  be  generally  useful  for  many  different  modeling 
applications.  The  second  special  case  is  data  collection  to  sup¬ 
port  modeling  how  an  existing  system  is  actually  being  used. 
Such  a  model  cannot  be  constructed  a  priori,  but  rather,  must 
be  based  on  data  about  how  actual  users  interact  with  the  actual 
system.  Due  to  the  uncertainties  involved  in  constructing  a 
model  based  on  human  behavior,  the  model  will  have  to  be  val¬ 
idated  with  a  suitably  complete  and  reliable  data  set  before  it 
can  be  taken  as  a  usefully  accurate  model.  This  purpose  of  mod¬ 
eling  is  very  different  from  the  model-based  evaluation  ap¬ 
proach  presented  in  this  chapter:  instead  of  serving  as  a  guide 
for  designing  a  new  system  and  a  surrogate  for  user  testing,  the 
model  is  an  explanation  and  characterization  of  observed  be¬ 
havior;  it  might  serve  as  a  guide  for  a  new  design,  but  only  in  the 
sense  of  characterizing  the  current  situation  that  we  want  to 
improve  upon.  The  model  itself  will  not  directly  apply  to  the 
new  design.  In  short,  modeling  the  actual  use  of  an  existing  sys¬ 
tem  has  very  different  methods,  goals,  and  applications  from 
modeling  the  usage  of  a  system  being  designed. 

If  in  spite  of  all  of  the  above  considerations,  the  validity  of 
the  model  is  in  question,  then  it  needs  to  be  realized  that  a  data 
set  adequate  for  a  scientifically  sound  test  of  validity  must  be 
much  more  controlled  and  detailed  than  normal  user  testing 
data,  and  can  be  very  difficult  to  collect  in  the  context  of  a  de¬ 
velopment  project.  Furthermore,  if  the  resources  are  available 
to  do  such  data  collection  and  analysis  before  the  final  stages 
of  development,  what  function  is  served  by  modeling?  If  the 
model  validity  would  be  in  doubt,  couldn’t  the  data  collection 
resources  be  better  devoted  to  user  testing? 

To  elaborate  on  the  difficulty  of  collecting  adequate  validation 
data,  while  all  would  agree  that  a  model  is  almost  certainly  in¬ 
correct  at  some  level,  it  is  often  mistakenly  assumed  that  col¬ 
lecting  empirical  data  on  human  performance  in  complex  tasks 
is  a  royal  road  to  certainty.  Rather,  as  pointed  out  in  Kieras  and 
Santoro  (2004),  complex  real-world  tasks  involve  subtle  user 
strategies,  team  interactions,  influences  of  background  knowl¬ 
edge,  and  the  specifics  of  the  scenarios.  Such  experiments  are 
extremely  slow  and  expensive  to  conduct,  even  with  small  sam¬ 
ples,  where  the  reliability  of  the  results  then  comes  into  ques¬ 
tion.  Clearly,  it  is  not  practical  to  run  experiments  using  (a)  many 
scenarios,  (b)  every  reasonable  design  variation,  (c)  every  can¬ 
didate  team  organization,  and  (d)  ample  numbers  of  subjects. 
Furthermore,  even  for  a  well-chosen  subset  of  these  possibilities, 
it  may  be  difficult  to  understand  why  people  did  what  they  did  in 
the  tasks;  asking  them  is  usually  ambiguous  at  best,  and  their 
strategies might be idiosyncratic. Thus, the reliability, generalizability,
and even the meaning of the data can be difficult to determine.
In fact, it can be difficult to ensure that the model and
the  experiment  are  even  relevant  to  each  other.  For  example 
(see  Kieras  &  Santoro,  2004),  if  the  model  is  based  on  what  users 
should do in the task and the test users don’t follow the strategy
that the model follows, then the failure of the model to behave
the same way as the test users is actually irrelevant; the data
is “wrong,” not the model! Thus, even if deemed appropriate,
attempting  to  validate  a  model  of  a  complex  task  is  likely  to  be 
impractically  difficult  for  a  normal  development  process. 

Summary:  Assessing  model  validity.  Instead  of  col¬ 
lecting  data  to  validate  the  model,  assess  its  validity  in  terms  of 
whether  it  meets  the  following  basic  requirements: 

1.  Is  the  model  strategy  based  on  an  analysis  of  what  users 
should  do?  If  not,  it  is  a  poor  choice  to  inform  the  design  of 
a  new  system. 

2.  Is  it  likely  that  users  can  or  will  follow  the  same  strategy  as 
the  model?  If  not,  then  the  model  is  irrelevant.  Either  the 
model  was  misconstructed  or  the  design  is  fundamentally 
wrong. 

3.  Are  the  assumptions  about  human  abilities  in  the  model 
plausible?  If  not,  see  the  earlier  suggestions. 

If  the  answer  to  all  three  questions  is  yes,  then  the  model 
results  can  be  accepted  as  useful  guidance  for  the  design  deci¬ 
sions  without  special  validation  efforts.  Of  course,  it  needs  to  be 
kept  in  mind  that  the  model  might  be  seriously  incorrect,  but 
modeling  is  not  supposed  to  be  perfect;  it  suffices  merely  that 
it  help  a  design  process. 


GOMS  MODELS:  A  READY-TO-USE  APPROACH 


As  summarized  above,  GOMS  is  an  approach  to  describing  the 
knowledge  of  procedures  that  a  user  must  have  in  order  to  op¬ 
erate  a  system.  The  different  types  of  GOMS  models  differ  in  the 
specifics  of  how  the  methods  and  sequences  of  operators  are 
represented.  The  aforementioned  CPM-GOMS  model  represents 
a  specific  sequence  of  activity  in  terms  of  the  cognitive,  percep¬ 
tual,  and  motor  operators  performed  in  the  context  of  a  simple 
model  of  human  information  processing.  At  the  other  extreme  of 
detail,  the  Keystroke-Level  Model  (Card,  Moran,  &  Newell,  1980) 
is  likewise  based  on  a  specific  sequence  of  activities,  but  these  are 
limited  to  the  overt  keystroke-level  operators  (e.g.,  easily  ob¬ 
servable  actions  at  the  level  of  keystrokes,  mouse  moves,  find¬ 
ing  something  on  the  screen,  turning  a  page,  etc.).  The  task  exe¬ 
cution  time  can  be  predicted  by  simply  looking  up  a  standardized 
time  estimate  for  each  operator  and  then  summing  the  times. 
The  Keystroke-Level  Model  has  a  long  string  of  successes  to  its 
credit (see John & Kieras, 1996a). Without a doubt, if the design
question  involves  which  alternative  design  is  faster  in  simple  sit¬ 
uations,  there  is  no  excuse  for  measuring  or  guessing  when  a  few 
simple  calculations  will  produce  a  usefully  accurate  answer. 
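
The look-up-and-sum calculation is small enough to show in full. In this Python sketch the operator times are commonly cited approximations from the Keystroke-Level Model literature (K for a keystroke or button press, P for pointing, H for homing the hand, M for mental preparation); treat them as placeholders and substitute values suited to the users and devices at hand. The two operator sequences are hypothetical designs invented for the example.

```python
# Approximate operator times in seconds; placeholder values, see lead-in.
KLM_TIMES_S = {"K": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}


def klm_time(sequence):
    """Sum the standard time estimate for each operator in the sequence."""
    return sum(KLM_TIMES_S[op] for op in sequence)


# Hypothetical design A: prepare, reach for mouse, open menu, pick item.
menu_design = ["M", "H", "P", "K", "P", "K"]
# Hypothetical design B: prepare, then a two-key keyboard shortcut.
shortcut_design = ["M", "K", "K"]

print("menu:", round(klm_time(menu_design), 2), "s")
print("shortcut:", round(klm_time(shortcut_design), 2), "s")
```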

It  is  easy  to  generalize  the  Keystroke-Level  Model  to  apply  to 
more  than  one  specific  sequence  of  operators.  For  example,  if 
the  scenario  calls  for  typing  in  some  variable  number  of  strings  of 
text,  the  model  can  be  parameterized  by  the  number  of  strings 
and  their  length.  However,  if  the  situation  calls  for  complex 
branching  or  iteration  and  clearly  involves  some  kind  of  hierar¬ 
chy  of  task  procedures,  such  sequence-based  models  become 
quite  awkward  and  a  more  generative  form  of  model  is  required. 
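
Parameterizing a sequence-based model by the number and length of strings might look like the following Python sketch. The per-string operator sequence (prepare, point at a field, click, home to the keyboard, type the characters plus Enter, home back to the mouse) is an assumed interface, and the default operator times are the same approximate placeholder values often quoted for the Keystroke-Level Model.

```python
def entry_time(string_lengths, k=0.28, p=1.10, h=0.40, m=1.35):
    """Predicted time in seconds to enter strings of the given lengths."""
    total = 0.0
    for length in string_lengths:
        total += m + p + k          # prepare, point at the field, click
        total += h                  # move hand to the keyboard
        total += (length + 1) * k   # type the characters, then Enter
        total += h                  # move hand back to the mouse
    return total


print(round(entry_time([5]), 2))
print(round(entry_time([5, 12, 8]), 2))
```

The loop is already a hint of the limitation noted above: once the procedure needs branching or nested subprocedures, a flat parameterized sum stops being a natural representation.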

The  generative  forms  of  GOMS  models  are  those  in  which 
the  procedural  knowledge  is  represented  in  a  form  resembling 
an  ordinary  computer  programming  language,  and  are  written 
in a fairly general sort of way. This form of GOMS model can be
applied  to  many  conventional  desktop  computing  interface  de¬ 
sign  situations.  It  was  originally  presented  in  Card,  Moran,  and 
Newell (1983, chapter 5), and further developed by Kieras, Polson,
and Bovair (Kieras & Polson, 1985; Polson, 1987; Bovair,
Kieras, & Polson, 1990), who provided a translation between
GOMS  models  and  the  production-rule  representations  popular 
in  several  cognitive  architectures  and  demonstrated  how  these 
models  could  be  used  to  predict  learning  and  execution  times. 
Kieras  (1988,  1997)  proposed  a  structured-natural-language  no¬ 
tation,  NGOMSL  (“Natural”  GOMS  Language),  which  preserved 
the  empirical  content  of  the  production-rule  representation, 
but  resembled  a  conventional  procedural  programming  lan¬ 
guage.  This  notation  was  later  formalized  into  a  fully  executable 
form,  GOMSL  (GOMS  Language),  for  use  in  computer  simula¬ 
tion  tools  that  implement  a  simplified  cognitive  architecture 
that  incorporates  a  simple  hierarchical-sequential  flow  of  con¬ 
trol  (Kieras,  Wood,  Abotel,  &  Hornof,  1995;  Kieras,  2005a).  This 
tool  has  been  applied  to  modeling  team  tasks  (e.g.,  Kieras  & 
Santoro,  2004),  and  extended  to  provide  analysis  of  error  re¬ 
covery  methods  supported  by  error-source  heuristics  (Wood, 
2000).  See  Baumeister,  John,  and  Byrne  (2000)  for  a  survey  of 
computer  tools  for  GOMS  modeling. 

Continuing  the  analogy  with  conventional  computer  pro¬ 
gramming  languages,  in  generative  GOMS  models,  the  opera¬ 
tors  are  like  the  primitive  operations  in  a  programming  lan¬ 
guage;  methods  are  like  functions  or  subroutines  that  are  called 
to  accomplish  a  particular  goal,  with  individual  steps  or  state¬ 
ments  containing  the  operators,  which  are  executed  one  at  a 
time,  as  in  a  conventional  programming  language.  Methods  can 
assert  a  subgoal,  which  amounts  to  a  call  of  a  submethod,  in  a 
conventional  hierarchical  flow  of  control.  When  procedural 
knowledge  is  represented  explicitly  in  this  way,  and  in  a  format 
that  enforces  a  uniform  “grain  size”  of  the  operators  and  steps  in 
a  method,  then  there  are  characteristics  of  the  representation 
that  relate  to  usability  metrics  in  straightforward  ways. 

For  example,  the  collection  of  methods  represents  “how  to 
use  the  system”  to  accomplish  goals.  If  a  system  requires  a  large 
number  of  lengthy  methods,  then  it  will  be  hard  to  learn;  there 
is  literally  more  required  knowledge  than  for  a  system  with  a 
smaller  number  or  simpler  methods.  If  the  methods  for  similar 
goals  are  similar — or,  in  fact,  the  same  method  can  be  used  to 
accomplish  different,  but  similar,  goals — then  the  system  is 
“consistent”  in  a  certain,  easily  characterized  sense.  In  a
procedurally  consistent  system,  fewer  methods,  or  unique  steps  in
methods,  must  be  learned  to  cover  a  set  of  goals  compared  to 
an  inconsistent  system,  and  so  it  is  easier  to  learn.  One  can  lit¬ 
erally  count  the  amount  of  overlap  between  the  methods  to 
measure  procedural  consistency. 
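
As a rough illustration of counting overlap, the sketch below treats each method as a list of step descriptions and counts two steps as the same only when their text is identical. That equality rule, the two example designs, and the function names are all assumptions made for the example; a real analysis would need a more careful notion of step similarity.

```python
def total_step_count(methods):
    """Total number of steps written out across all methods."""
    return sum(len(steps) for steps in methods.values())


def unique_step_count(methods):
    """Number of distinct steps that would have to be learned."""
    return len({step for steps in methods.values() for step in steps})


def overlap(methods):
    """Steps shared between methods: total steps minus distinct steps."""
    return total_step_count(methods) - unique_step_count(methods)


# Hypothetical designs for two similar goals.
consistent_design = {
    "delete word": ["point to target", "select target", "press Delete"],
    "delete sentence": ["point to target", "select target", "press Delete"],
}
inconsistent_design = {
    "delete word": ["point to word", "double-click", "press Delete"],
    "delete sentence": ["open Edit menu", "choose Cut Sentence", "confirm dialog"],
}

if __name__ == "__main__":
    for name, design in [("consistent", consistent_design),
                         ("inconsistent", inconsistent_design)]:
        print(name, "distinct steps:", unique_step_count(design),
              "overlap:", overlap(design))
```

Both designs have six steps in total, but the consistent one requires only three distinct steps to be learned, which is the sense in which it is easier to learn.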

Finally,  by  starting  with  a  goal  and  the  information  about  the 
specific  situation,  one  can  follow  the  sequence  of  operators 
specified  by  the  methods  and  sub-methods  to  accomplish  the 
goal.  This  generates  the  sequence  of  operators  required  to  ac¬ 
complish  the  goal  under  that  specific  situation;  if  the  methods 
were  written  to  be  adequately  general,  they  should  suffice  to 
generate  the  correct  sequence  of  operators  for  any  relevant  task 
situation.  The  times  for  the  operators  in  the  trace  can  be 
summed,  as  in  the  Keystroke-Level  Model,  to  obtain  a  predicted 
execution  time.  Details  of  the  timing  can  be  examined,  or  “pro¬ 
filed,”  to  see  where  the  processing  bottlenecks  are. 
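
The generation of an operator trace from a goal and a task situation can be sketched as follows. This is not GOMSL or NGOMSL notation; the method table, the `("goal", ...)` and `("type", ...)` step forms, the operator times, and all function names are assumptions introduced for this example only.

```python
from collections import defaultdict

# Assumed operator times in seconds (illustrative, not from this chapter).
OPERATOR_TIMES = {"K": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}

# Hypothetical methods. A step is a primitive operator name, a subgoal
# ("goal", name), or ("type", field), which types one task-supplied string.
METHODS = {
    "replace word": ["M", ("goal", "select word"), "H", ("type", "new_text")],
    "select word": ["H", "P", "K", "K"],  # reach mouse, point, double-click
}


def expand(goal, task, methods=METHODS):
    """Follow the methods for a goal and return the trace of operators."""
    trace = []
    for step in methods[goal]:
        if isinstance(step, tuple) and step[0] == "goal":
            # Asserting a subgoal amounts to calling a submethod.
            trace.extend(expand(step[1], task, methods))
        elif isinstance(step, tuple) and step[0] == "type":
            # The task situation determines how many keystrokes are needed.
            trace.extend(["K"] * len(task[step[1]]))
        else:
            trace.append(step)
    return trace


def execution_time(trace, times=OPERATOR_TIMES):
    """Sum the operator times in the trace, as in the Keystroke-Level Model."""
    return sum(times[op] for op in trace)


def profile(trace, times=OPERATOR_TIMES):
    """Total predicted time per operator type, to show where time goes."""
    totals = defaultdict(float)
    for op in trace:
        totals[op] += times[op]
    return dict(totals)


if __name__ == "__main__":
    trace = expand("replace word", {"new_text": "model"})
    print(trace)
    print(round(execution_time(trace), 2))
    print(profile(trace))
```

Because the methods are written generally, the same table yields a different trace for a different task situation (for example, a longer replacement string) without any change to the methods themselves.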


Why  GOMS  Models  Work 

The  reasons  why  GOMS  Models  have  useful  predictive  and 
heuristic  power  in  interface  design  can  be  summarized  under 
three  principles:  The  rationality  principle  (cf.  Card,  Moran,  &
Newell,  1983)  asserts  that  humans  attempt  to  be  efficient  given 
the  constraints  on  their  knowledge,  ability,  and  the  task  situa¬ 
tion.  Generally,  when  people  attempt  to  accomplish  a  goal  with 
a  computer  system,  they  do  not  engage  in  behavior  that  they 
know  is  irrelevant  or  superfluous;  they  are  focused  on  getting 
the  job  done.  Although  they  might  perform  suboptimally  due  to 
poor  training  (see  Bhavnani  &  John,  1996),  they  generally  try 
to  work  as  efficiently  as  they  know  how,  given  the  system  they 
are  working  with.  How  they  accomplish  a  goal  depends  on  the 
design  of  the  system  and  its  interface.  For  example,  in  a  word¬ 
processing  system,  there  are  only  a  certain  number  of  sensible 
ways  to  delete  a  word,  and  the  user  has  some  basis  for  choos¬ 
ing  between  these  that  minimizes  effort  along  some  dimension. 
Between  these  two  sets  of  constraints — the  user’s  desire  to  get 
the  job  done  easily  and  efficiently,  and  the  computer  system’s 
design — there  is  considerable  constraint  on  the  possible  user 
actions.  This  means  that  we  can  predict  user  behavior  and  per¬ 
formance  at  a  useful  level  of  accuracy  just  from  the  design  of 
the  system  and  an  analysis  of  the  user’s  task  goals  and  situa¬ 
tion.  A  GOMS  model  is  one  way  of  combining  this  information 
to  produce  predicted  performance. 

Procedural  primacy  is  the  claim  that  regardless  of  what  else 
is  involved  in  using  a  system,  at  some  level,  the  user  must  infer, 
learn,  and  execute  procedures  in  order  to  accomplish  goals  us¬ 
ing  the  system.  That  is,  computers  are  not  used  purely  pas¬ 
sively — the  user  has  to  do  something  with  them,  and  this  activ¬ 
ity  takes  the  form  of  a  procedure  that  the  user  must  acquire  and 
execute.  Note  that  even  display-only  systems  still  require  some 
procedural  knowledge  for  visual  search.  For  example,  making 
use  of  the  flight  status  displays  at  an  airport  requires  choosing 
and  following  some  procedure  for  finding  one’s  flight  and  ex¬ 
tracting  the  desired  information,  because  different  airlines  use 
different  display  organizations,  some  of  which  are  probably 
more  usable  than  others.  Because  the  user  must  always  acquire 
and  follow  procedures,  the  complexity  of  the  procedures  en¬ 
tailed  by  an  interface  design  is  therefore  related  to  the  difficulty 
of  using  the  interface.  While  other  aspects  of  usability  are  important,
the  procedural  aspect  is  always  present.  Therefore,  analyzing
the  procedural  requirements  of  an  interface  design  with
a  technique  such  as  GOMS  will  provide  critical  information  on 
the  usability  of  the  design. 

Explicit  representation  refers  to  the  fact  that  any  attempt  to 
assess  something  benefits  from  being  explicit  and  clear  and  re¬ 
lying  on  some  form  of  written  formalized  expression.  Thus,  all 
task  analysis  techniques  (Kirwan  &  Ainsworth,  1992;  Beevis 
et  al.,  1992;  Diaper  &  Stanton,  2004)  involve  some  way  to  ex¬ 
press  aspects  of  a  user’s  task.  Likewise,  capturing  the  procedural 
implications  of  an  interface  design  will  benefit  from  represent¬ 
ing  the  procedures  explicitly  in  a  form  that  allows  them  to  be  in¬ 
spected  and  manipulated.  Hence,  GOMS  models  involve  writing 
out  user  procedures  in  a  complete,  accurate,  and  detailed  form. 
By  doing  so  in  a  specified  format,  it  becomes  possible  to  define 
metrics  over  the  representation  (e.g.,  counting  the  number  of 
statements)  that  can  be  calibrated  against  empirical  measure¬ 
ments  to  provide  predictions  of  usability.  Moreover,  by  making 
user  procedures  explicit,  the  designer  can  then  apply  the  same
kinds  of  intuition  and  heuristics  used  in  the  design  of  software: 
Clumsy,  convoluted,  inconsistent,  and  “ugly”  user  procedures 
can  often  be  spotted  and  corrected  just  like  poorly  written  com¬ 
puter  code.  Thus,  by  writing  out  user  procedures  in  a  notation 
like  GOMS,  the  designer  can  often  detect  and  correct  usability 
problems  without  even  performing  the  calculations.  This  ap¬ 
proach  can  be  applied  immediately  after  a  task  analysis  to  help 
choose  the  functionality  behind  the  interface,  as  well  as  to  help 
in  the  initial  design  decisions  (Kieras,  2004).  Work  in  progress 
consists  of  adding  high-level  operators  to  GOMSL  and  GLEAN
(see  Kieras,  2005b)  to  directly  support  a  “seamless”  transition  of 
computational  modeling  from  the  task  and  functionality  level 
of  analysis  down  to  detailed  design. 

Limitations  of  GOMS  Models 

GOMS  models  address  only  the  procedural  aspects  of  a  com¬ 
puter  interface  design.  This  means  that  they  do  not  address  a  va¬ 
riety  of  nonprocedural  aspects  of  usability,  such  as  the  readabil¬ 
ity  of  displayed  text,  the  discriminability  of  color  codes,  or 
memorability  of  command  strings.  Fortunately,  these  properties 
of  usability  are  directly  addressed  by  standard  methods  in  hu¬ 
man  factors. 

Within  the  procedural  aspect,  user  activity  can  be  divided
into  the  open-ended  “creative”  parts  of  the  task  on  the  one  hand
(such  as  composing  the  content  of  a  document,  or  thinking  of  the
concept  for  the  design  of  an  electronic  circuit)  and  the  routine
parts  of  the  task  on  the  other  (which  consist  of  simply
manipulating  the  computer  to  accept  the  information  that  the
user  has  created,  and  then  to  supply  new  information  that  the
user  needs).  For  example,  the  creator  of  a  document  has  to  input
specific  strings  of  words  in  the  computer,  rearrange  them,  for¬ 
mat  them,  spell  check  them,  and  then  print  them  out.  The  cre¬ 
ator  of  an  electronic  device  design  has  to  specify  the  circuit  and 
its  components  to  a  CAD  system  and  then  obtain  measures  of  its 
performance.  If  the  user  is  reasonably  skilled,  these  activities 
take  the  form  of  executing  routine  procedures  involving  little  or 
no  creativity. 

The  bulk  of  time  spent  working  with  a  computer  is  in  this 
routine  activity,  and  the  goal  of  computer  system  design  should 
be  to  minimize  the  difficulty  and  time  cost  of  this  routine  activ¬ 
ity  so  as  to  free  up  time  and  energy  for  the  creative  activity. 
GOMS  models  are  easy  to  construct  for  the  routine  parts  of  a 
task,  because,  as  described  above,  the  user’s  procedures  are 
constrained  by  the  task  requirements  and  the  design  of  the  system,
and  these  models  can  then  be  used  to  improve  the  ability
of  the  system  to  support  the  user.  However,  the  creative  parts  of 
task  activity  are  purely  cognitive  tasks,  and,  as  previously  dis¬ 
cussed,  attempting  to  formulate  a  GOMS  model  for  them  is 
highly  speculative  at  best,  and  would  generally  be  impractical. 
Applying  GOMS  thus  takes  some  task  analysis  skill  to  identify 
and  separate  the  creative  and  routine  procedural  portions  of  the 
user’s  overall  task  situation. 

Finally,  it  is  important  to  recognize  that  while  a  GOMS  model 
is  often  a  useful  way  to  express  the  results  of  a  task  analysis,  sim¬ 
ilar  to  the  popular  Hierarchical  Task  Analysis  technique  (Annett 
et  al.,  1971;  Kirwan  &  Ainsworth,  1992),  building  a  GOMS  model 
does  not  “do”  a  task  analysis.  The  designer  must  first  engage  in 
task  analysis  work  to  understand  the  user’s  task  before  a  GOMS 
model  for  the  task  can  be  constructed.  In  particular,  identifying 
the  top-level  goals  of  the  user  and  selecting  relevant  task  sce¬ 
narios  are  all  logically  prior  to  constructing  a  GOMS  model. 


CONCLUDING  RECOMMENDATIONS 


•  If  you  need  to  predict  the  performance  of  a  system  prior  to 
detailed  design  when  overall  system  structure  and  functions 
are  being  considered,  use  a  task  network  model. 

•  If  you  are  developing  a  detailed  design  and  want  immediate  in¬ 
tuitive  feedback  on  how  well  it  supports  the  user’s  tasks,  write 
out  and  inspect  a  high-level  or  informal  GOMS  model  for  the 
user  procedures  while  you  are  making  the  design  decisions. 

•  If  your  design  criterion  is  the  execution  speed  for  a  discrete 
selected  task,  use  a  Keystroke-Level  model. 

•  If  your  design  criteria  include  the  learnability,  consistency,  or
execution  speed  of  a  whole  set  of  task  procedures,  use  a  gen¬ 
erative  GOMS  model  such  as  CMN-GOMS  or  NGOMSL.  If  nu¬ 
merous  or  complex  task  scenarios  are  involved,  use  a  GOMS 
model  simulation  system. 

•  If  the  design  issues  hinge  on  understanding  detailed  or  sub¬ 
tle  interactions  of  human  cognitive,  perceptual,  and  motor 
processing  and  their  effect  on  execution  speed,  and  only  a 
few  scenarios  need  to  be  analyzed,  use  a  CPM-GOMS  model. 

•  If  the  resources  for  a  research-level  activity  are  available,  and 
a  detailed  analysis  is  needed  of  the  cognitive,  perceptual,  and 
motor  interactions  for  a  complex  task  or  many  task  scenarios, 
use  a  model  built  with  the  simplest  cognitive  architecture 
that  incorporates  the  relevant  scientific  phenomena. 
SCHOFIELD


INTRODUCTION 


This  chapter  is  intended  to  provide  guidance  on  how  to  approach
technology  transfer  of  HCI-related  research.  Admittedly,
there  are  many  different  perspectives  one  could  take  in  looking 
at  this  often-difficult  problem:  researcher  versus  practitioner, 
industry  versus  government  versus  academia.  There  is  also  the 
extra,  added  dimension  of  whether  the  transfer  is  within  the 
boundaries  of  a  corporation,  or  across  corporate  lines. 

In  this  chapter,  we  will  discuss  tech  transfer  both  internal  and 
external  to  a  company,  and  then  discuss  commonalities  across  the 
two  processes.  Primarily  we  will  look  at  it  from  the  perspective  of 
the  researcher,  because  in  practice  the  burden  is  more  on  the  re¬ 
searcher  to  justify  the  transfer  and  to  make  it  work.  Practitioners, 
however,  will  also  gain  value  from  reading  this,  as  it  will  help  them 
to  understand  the  role  of  the  researcher  in  technology  transfer. 

While  any  technology  transfer  has  its  challenges,  HCI  tends 
to  be  a  particularly  difficult  one.  This  is  due  to  many  factors, 
but  the  two  main  ones  are  that  it  is  often  more  about  abstract 
ideas  than  specific  implementations,  and  because  we  in  the  HCI 
community  are  still  fighting  the  (wrong)  impression  that  HCI  is 
an  afterthought  and  not  the  “real  meat”  of  research  and  devel¬ 
opment  efforts. 


THE  NEW  FACE  OF  TECHNOLOGY  TRANSFER 


The  last  10  years  have  seen  a  dramatic  boom-and-bust  cycle 
in  the  technology  industry  that,  for  a  number  of  reasons,  has 
changed  the  face  of  business,  commerce,  and  information  flow. 
Probably  the  most  earthshaking  of  those  changes  has  been 
within  the  high-tech  industry  itself,  as  is  fitting  for  such  a  rabid
consumer  of  its  own  technologies  and  ideologies. 

The  Internet,  through  its  nearly-ubiquitous  connectivity  to 
information  and  other  people,  has  reshaped  organizational  pat¬ 
terns  and  forged  a  brand-new  set  of  relationships  between  re¬ 
searchers,  designers,  developers,  manufacturers,  marketers, 
and  the  other  people  involved  in  business  and  commerce. 
Those  new  organizations  and  relationships  respect  neither  na¬ 
tional  nor  corporate  borders,  and  bring  us  one  step  closer  to  the 
“friction-free”  economy  that  was  trumpeted  in  the  mid-1990s. 

Technology  transfer  is  not  what  it  used  to  be,  largely  because 
research  and  development  is  also  not  what  it  used  to  be.  As  Fried¬ 
man  (2005)  pointed  out,  the  world  has  flattened,  and  increasingly 
research  and  development  can  be  done  anywhere,  with  anyone, 
and  for  anyone.  In  lockstep,  the  relationships  between  corpo¬ 
rate  and  academic  R&D  have  also  evolved;  there  are  fewer  and 
smaller  standalone  corporate  research  labs  as  many  companies 
have  chosen  to  “outsource”  their  basic  research  to  academia. 
Even  those  companies  that  continue  to  invest  in  basic  research 
are  realizing  that  with  fewer  barriers  to  communication  and  col¬ 
laboration,  an  “open  innovation”  model  of  joint  R&D,  as  sug¬ 
gested  by  Chesbrough  (2003),  often  makes  sense  to  maximize  the 
impact  of  internally  generated  ideas  as  well  as  to  take  full  advan¬ 
tage  of  the  most  relevant  externally  generated  ones. 


Government  research,  and  funding,  is  also  changing.  Some 
of  this  is  due  to  economic  cycles;  much  is  due  to  shifting  politi¬ 
cal  priorities.  In  Europe,  for  example,  the  Lisbon  Agenda  defines 
the  European  Union’s  desire  to  make  Europe  the  most  innovative
economy  in  the  world  by  2010,  and  this  drives  their  investment
in  research  and  development.  In  contrast,  in  the  United
States,  despite  a  wide  recognition  that  IT  is  driving  economic 
advances,  the  major  funding  agencies  (NSF,  DARPA,  NIST)  have 
reduced  their  support  for  academic  research  in  computer  sci¬ 
ence — with  a  built-in  assumption  that  private  industry  should 
and  will  take  up  the  slack.  Somewhere  in  the  middle,  many 
countries  (e.g.,  Canada,  Australia)  have  programs  where  acade¬ 
mic  researchers  who  are  funded  by  private  companies  can  apply 
to  the  government  for  matching  funding. 

What  should  we  take  away  from  this?  That  clearly  there  is  no 
one  model  for  technology  transfer  from  research  into  industry — 
there  exists  a  whole  spectrum.  As  mentioned  previously,  some 
governments  fund  research  while  others  don’t,  and  some  are 
in  the  middle.  Likewise,  some  companies  do  their  own  basic  re¬ 
search,  while  others  outsource  it,  and  some  are  in  the  middle. 
Some  universities  keep  a  death-grip  on  their  faculty’s  intellec¬ 
tual  property;  others,  such  as  University  of  Wisconsin  and  Uni¬ 
versity  of  Waterloo,  have  a  policy  that  IP  belongs  to  the  inventor. 
Many  are  in  the  middle. 

Consequently,  there  is  no  “playbook”  that  will  explain  exactly 
how  to  transfer  a  research  result.  Both  researchers  and  practi¬ 
tioners  will  need  to  be  very  flexible  and  willing  to  adjust  to  the 
situation  in  which  they  find  themselves.  Tech  transfer  is,  in  the 
end,  a  business,  and  the  best  way  to  succeed  at  it  is  to  think  like 
a  businessperson. 


INTERNAL  TECHNOLOGY  TRANSFER 


I  am  constantly  surprised  by  the  number  of  people  who  believe 
that  technology  transfer  is  some  sort  of  Rube  Goldberg  ma¬ 
chine,^  where  technology  is  inserted  in  one  end  of  the  contrap¬ 
tion,  strange  things  happen  in  the  middle  that  usually  involve 
people  in  uncomfortable  and  contorted  positions,  and  then 
magically  it  pops  out  on  the  other  end.  Countless  books  and 
articles  have  been  written  (Lesko,  Nicolai,  &  Steve,  1998;  Mock, 
Kenkeremath,  &  Janis,  1993)  in  an  attempt  to  document  the 
perfect  mechanical  process  for  technology  transfer.  And  yet,  de¬ 
spite  the  fact  that  nearly  everyone  has  had  painful  experiences 
trying  to  define  a  mechanical  process  for  technology  transfer, 
they  still  try  to  do  it  and  complain  bitterly  when  it  fails  (Butler, 
1990;  Hiltzik,  1999;  Isaacs  &  Tang,  1999;  Singh,  1993). 


TECHNOLOGY  TRANSFER 
AS  A  SOCIAL  PROCESS 


Tech  transfer  is  not  a  mechanical  or  logistical  process;  it  is  fun¬ 
damentally  a  social  process.  It  succeeds  when  people  build  a 
relationship  between  the  provider  and  the  recipient  that  fosters
trust  and  communication.  Manning  (1974)  recognized  that
successful  technology  transfer  centers  on  viewpoints  and  perspectives,
and  fundamentally  on  communications.  Foley  (1996)
spoke  to  this  point  most  directly:  technology  transfer  is  a
“full-contact  sport”  that  centers  on  the  people.

^For  those  in  the  United  Kingdom,  that  would  be  “Heath  Robinson  machine.”

Successful  product  organizations  understand  that  risk  is 
their  mortal  enemy.  They  work  hard  to  proactively  manage  the 
risk  in  their  development  process,  or  to  remove  the  risk  factors 
altogether.  One  of  the  most  prevalent  and  difficult-to-manage 
risk  factors  is  an  external  dependency,  and  let’s  be  honest:  an 
external  dependency  from  a  research  organization  looks  about 
as  risky  as  it  gets.  As  long  as  your  counterparts  in  product  orga¬ 
nizations  think  of  a  research  organization  that  way,  technology 
transfer  is  difficult  at  best  and  often  outright  impossible. 

In  order  to  succeed  with  technology  transfer,  we  need  to 
mitigate  the  risk,  or  at  least  the  perceived  risk  in  the  minds  of 
the  people  we  wish  to  receive  our  technology. 

Up  to  this  point,  none  of  this  is  particularly  controversial,  but 
this  is  where  the  paths  diverge.  Many  people  will  tell  you  that 
you  succeed  in  mitigating  the  risk  by  creating  well-defined,  step- 
by-step  processes  through  which  you  and  your  industry  part¬ 
ner  will  enact  the  technology  transfer.  I  argue  that  this  approach 
fails  more  often  than  it  succeeds,  for  two  main  reasons: 

•  The  only  experience  that  people  in  industry  have  with  exter¬ 
nal  dependencies  is  the  occasional  dependence  on  an  exter¬ 
nal  contractor  or  supplier  to  deliver  a  finished  component 
ready  for  integration.  They  inevitably  use  this  as  the  model 
for  defining  their  tech-transfer  process  from  research,  and  it 
is  fundamentally  incompatible.  Research  technologies  are  not 
finished  components,  and  any  product  organization  that  ex¬ 
pects  a  research  group  to  deliver  a  finished  component  fun¬ 
damentally  misunderstands  the  role,  expertise,  and  hiring 
practices  of  a  research  organization.  Research  groups  almost 
never  understand  the  development  and  test  practices  of  a 
commercial  product  organization;  even  if  they  did,  those 
practices  vary  so  widely  between  organizations  that  past  ex¬ 
perience  does  not  guarantee  that  they  could  successfully  de¬ 
liver  a  finished  component.  Moreover,  it  is  not  the  goal  of  a 
research  organization  to  develop  technologies  into  finished 
components;  its  goal  is  to  discover  and  prove  solutions  to 
previously  unsolved  problems.  To  do  so  requires  different 
skills  and  expertise,  and,  frankly,  different  development  and 
testing  methodologies.  Both  research  and  product  organiza¬ 
tions  need  to  understand  this  fundamental  difference,  and 
embrace  it  as  a  way  to  complement  each  other’s  strengths,
rather  than  ignore  it  and  delude  themselves  about  a  theoret¬ 
ical  capability  that,  practically  speaking,  is  not  there. 

•  People  who  don’t  understand  each  other  can’t  communicate 
and  don’t  trust  each  other,  and  can’t  be  expected  to  work 
cooperatively  toward  a  shared  goal,  even  within  the  best- 
defined  process.  The  trust  and  communication  must  come 
first.  If  the  two  people  on  opposite  sides  of  a  table  trust  each 
other,  then  the  two  of  them  can  accomplish  anything;  if  they 
don’t,  they  will  never  accomplish  anything  of  value. 


BUILDING  TRUST  IN  STEPS 


So  then  comes  the  catch:  how  does  one  build  trust?  By  working
side  by  side,  of  course!  This  means  that  one  must  start  with
the  kind  of  activities  that  are  initially  low  in  risk,  but  high  in
communication,  and  build  on  one’s  successes  to  build  more  trust
and  overcome  successively  higher  levels  of  risk. 

Step  1  is  to  establish  trust  that  one  is  an  expert  in  the  domain.
Technical  people,  whether  in  research  or  in  industry,  are  almost 
universally  avid  readers  and  understand  the  importance  of  staying 
up  to  date  in  their  fields.  But  we  all  suffer  from  a  lack  of  time  to 
weed  through  the  volumes  of  less-than-useful  information  to  find 
the  truly  valuable  nuggets  of  wisdom.  If  someone  in  a  product 
organization  expresses  an  interest  in  one’s  field,  an  offer  to  for¬ 
ward  them  a  set  of  papers,  articles,  and  books  that  represent  the 
seminal  reading  is  a  great  first  response.  Granted,  doing  a  litera¬ 
ture  search  is  not  glamorous  work,  but  it  fundamentally  demon¬ 
strates  a  working  knowledge  of  the  domain  and  an  ability  to  pro¬ 
vide  guidance.  Equally  important,  it  shows  a  healthy  respect  for 
the  people’s  intellect  and  a  flattering  assumption  that  they  will 
be  able  to  read  and  digest  the  material.  One  of  two  things  will 
happen.  Either  they  will  actually  read  the  materials  sent  to  them, 
in  which  case  they  have  not  only  made  an  initial  investment  in 
seeing  a  technology  transfer  happen  but  have  also  been  provided 
with  great  topics  for  further  conversation,  or  they  will  not  read 
the  material  and  most  likely  conclude  that  it  is  simply  easier  to 
rely  upon  the  researcher  as  their  expert  rather  than  to  become 
experts  themselves.  Either  way,  it  is  a  foot  in  the  door.  They  will 
ask  endless  questions  as  they  try  to  decide  for  themselves  what 
is  within  the  realm  of  possibility  and,  more  importantly,  practi¬ 
cality.  It  is  essential  to  ask  them  just  as  many  questions  to  under¬ 
stand  as  completely  as  possible  their  constraints  and  to  make 
clear  recommendations  on  what  they  can  expect  to  build. 

Step  2  is  to  move  from  simply  giving  domain  guidance  on  the 
state  of  the  art,  to  providing  feedback  on  product-design  plans. 
This  involves  offering  to  review  specifications  and  providing 
timely  feedback.  Timeliness  is  critical — schedules  are  the  rules 
of  the  game,  and  an  ability  to  stay  within  their  stated  schedule 
reflects  an  understanding  of  the  rules,  an  appreciation  for  their 
importance,  and  a  commitment  to  the  success  of  their  project. 
This  is  also  a  critical  test  of  a  researcher’s  ability  to  think  practi¬ 
cally;  in  their  distrust,  they  might  expect  suggestions  of  wildly 
impractical  things  that  would  have  a  negative  impact  on  their 
schedule  or  require  resources  out  of  proportion  to  the  relative
importance  of  the  technology  to  the  overall  product.  It  is  the  re¬ 
searcher’s  job  to  show  once  again  an  understanding  of  their 
constraints  and  the  value  added  to  their  team  effort.  Success  will 
be  apparent  when  a  subtle  but  important  shift  happens:  instead 
of  the  researcher  asking  to  review  their  design  documents,  they 
will  ask  the  researcher  to  review  them. 

Step  3  is  a  significant  one  indeed:  when  the  clients  ask  the  re¬ 
searcher  to  help  write  the  specification  for  the  product.  Don’t 
expect  this  to  happen  until  there  has  been  a  clear  success  at 
Step  2  that  has  established  credibility.  But  when  it  does  hap¬ 
pen,  the  product  organization  is  making  a  loud  and  unambigu¬ 
ous  statement  that  they  now  think  of  the  researcher  as  part  of 
the  product  team.  This  is  an  enormous  step  for  a  product  group 
to  take  in  their  relationship,  and  it  is  a  heavy  responsibility  to 
take  on.  At  first,  they  will  probably  only  delegate  small  parts,  and 
often  they  might  ask  the  researcher  to  cowrite  design  documents. 
But  regardless  of  the  size  of  the  assignment,  the  key  to  success 
is  the  same:  whatever  is  designed  must  be  easily  buildable  and 
testable.  If  there  is  any  significant  disagreement  on  whether  the 
design  can  be  built  or  tested,  the  product  organization  will  not 
take  the  risk.  Development  organizations  (at  least  the  successful
ones)  are  inherently  conservative  and  will  overstate  the  costs 
to  build  new  technologies.^  This  is  not  only  another  test  of 
whether  the  researcher  understands  their  constraints,  but 
equally  whether  he  or  she  understands  their  development 
process.  I  encourage  “aiming  low”  initially,  and  looking  for  indi¬ 
cations  from  the  team  that  they  would  like  to  work  together  to 
design  something  more  aggressive.  If  there  is  success  in  code¬ 
signing  components,  they  will  loosen  the  reins  and  delegate 
more  responsibility  (with  more  autonomy).

Step  4  is  where  one  (finally)  gets  involved  in  implementation. 
It  has  taken  enormous  patience  on  the  part  of  the  researcher  to 
get  here,  and  there  are  still  landmines  everywhere.  No  two  de¬ 
velopment  organizations  are  alike;  they  all  have  different  practices 
for  creating,  documenting,  integrating,  accepting,  testing,  and 
deploying  new  products.  It  is  impossible  to  understand  all  of  their 
processes,  and  it  is  very  unlikely  that  they  will  all  be  written  down; 
yet,  every  one  of  them  is  an  opportunity  to  break  form  and  cause 
a  rift.  I  would  encourage  asking  the  group  manager  how  they 
bring  a  new  employee  into  the  group  and  what  training  and  men¬ 
toring  that  person  would  go  through;  further,  see  if  there  are  op¬ 
portunities  to  take  advantage  of  such  a  process  to  help  to  get  up 
to  speed.  If  one  has  made  it  this  far,  the  product  team  wants  to 
see  a  success  as  much  as  the  researcher  does.  Because  one’s  suc¬ 
cess  is  the  other’s  success,  the  product  team  will  be  very  reason¬ 
able  about  doing  things  to  help  themselves  be  understood  by 
the  researcher,  especially  if  it  is  clear  to  them  that  this  is  the  goal.  All
development  groups  fall  in  love  with  their  own  processes,  and 
one  earns  their  cooperation  by  showing  equal  respect  for  those 
processes,  no  matter  how  silly  they  might  seem  to  an  outsider. 

The  key  to  success  in  a  development  process  is  to  realistically 
promise  and  over-deliver.  Set  rational  expectations;  they  should 
think  that  you  can  carry  your  own  weight,  but  not  that  you  are 
God’s  gift  to  engineering.  Promise  metrics  for  work  and  for  the 
technologies  created  that  are  achievable  but  not  overly  aggres¬ 
sive,  and  then  exceed  those  metrics.  By  doing  that,  it  is  possi¬ 
ble  to  fully  gain  their  trust  and  move  on  to  discuss  with  them 
more  aggressive  technology  transfers. 

It  is  important  to  note  that  in  this  “pyramid”  of  sorts  that  we 
are  building  with  increasing  levels  of  risk  and  corresponding 
trust,  it  is  possible  to  peak  at  any  level.  For  instance,  if  the  re¬ 
searcher  doesn’t  have  the  development  skills  to  codevelop  com¬ 
ponents  with  a  real  product  team,  then  don’t  try  to  do  it!  By  all 
measures,  every  step  in  this  process  can  be  considered  technol¬ 
ogy  transfer.  Product  organizations  need  knowledge,  under¬ 
standing,  and  ideas  about  technology  just  as  much  as  they  need 
finished  technology  components;  they  need  to  understand  what 
can’t  be  built  just  as  much  as  what  can.  And,  most  importantly, 
they  need  researchers  to  tell  them  honestly  what  they  are  and  are 
not  capable  of  doing  for  them.  Even  without  ever  delivering  a  fin¬ 
ished  component  to  a  product  organization,  one  can  still  have  a 
litany  of  technology-transfer  successes  for  which  the  product  or¬ 
ganization  will  sing  praises.  It  is  more  important  to  proceed  in 
measured  steps  built  upon  past  successes  and  build  the  trust  and 
the  lines  of  communication  that  will  guarantee  future  successes. 


THINKING  FOR  THE  FUTURE 


It is also important to think about the future and about what
comes next. There is always a desire to simply throw
a  technology  over  the  wall  and  then  to  move  on  to  the  next  re¬ 
search  project,  but  this  is  unrealistic.  It  never  really  works  that 
way,  and  even  if  it  did,  “throwing  it  over  the  wall”  would  end 
the  relationship  and  any  opportunities  for  future  technology 
transfer;  the  ongoing  relationship  after  the  transfer  is  an  oppor¬ 
tunity  to  carry  on  a  dialogue  about  the  next  great  technology. 

From  the  product-organization  perspective,  there  is  rarely  a 
“clean”  way  to  integrate  a  component.  The  overwhelming  ma¬ 
jority  of  development  work  is  revisions  to  existing  products; 
very  rarely  are  new  products  started.  And  revision  work  means 
that  new  components  need  to  be  integrated  into  an  existing 
legacy  framework;  this  usually  requires  development  work  on 
both  sides  of  the  integration  to  ensure  the  optimal  match. 


APPLYING  THE  APPROACH 
TO  HCI  TECH  TRANSFER 


As  if  this  were  not  difficult  enough,  applying  this  approach  to 
HCI-related  technology  transfer  introduces  its  own  challenges. 
One  can  read  The  Psychology  of  Human-Computer  Interaction 
(Card,  Moran,  &  Newell,  1983)  and  learn  that  at  a  very  funda¬ 
mental  level  a  set  of  scientific  principles  holds  very  broadly. 
However,  we  in  the  HCI  community  have  also  learned  that  in¬ 
teractive  systems  must  be  designed  within  the  context  of  a  par¬ 
ticular  task  and  human,  and  that  this  very  fact  makes  it  tricky  at 
best,  and  misleading  or  impossible  at  worst,  to  try  to  generalize 
specific  designs  to  other  contexts.  Even  with  the  best  of  inten¬ 
tions  and  the  most  thorough  usability  testing,  there  are  no  clear 
guidelines about how much of an HCI-related research technology
can actually be transferred, particularly when it must be
integrated into an existing product. So “throwing over the wall” is
especially difficult, as it calls for potential redesign, as well as
further  development  and  integration,  test,  localization,  support, 
and  operations  (it  is  increasingly  a  service  world,  after  all). 

In  the  traditional  view  of  technology  transfer,  lack  of  a  clean 
handoff  would  be  fatal.  In  the  “relationship”  view,  however,  this 
is  an  opportunity  to  build  a  working  model  that  lets  you  over¬ 
come  the  challenges  and  work  side-by-side  with  a  product  or¬ 
ganization  to  guide  the  transfer  of  your  work. 

Beyond  simply  moving  up  the  pyramid,  one  can  do  other  im¬ 
portant  things  to  deepen  the  relationship.  Scheduling  regular 
“maintenance”  conversations  can  help  to  maintain  communica¬ 
tion  channels  and  to  keep  abreast  of  activity  in  the  product  orga¬ 
nization;  it  is  also  an  opportunity  to  continue  to  update  them  on 
progress on new work. The relationship can also be used to
improve the researcher’s own work by learning about critical
real-world issues. Good product organizations have a wealth of
information about their customers; by using their access to real
customers (and aggregate information about them), the relevance
of research activities can be improved. It is an opportunity to ask
key questions, learn about critical product strategic direction,
conduct user studies, and find out what difficult HCI and technical
issues are about to become critical issues for real customers.
This is the golden opportunity to get out of the ivory tower.

Footnote: Ironically, it has been my experience that development
organizations tend to estimate incorrectly not in the new
technologies, but rather in the incremental improvements to legacy
components, and particularly in the “integration” work in making
multiple components work together.


EXTERNAL  TECHNOLOGY  TRANSFER 


Technology  transfers  outside  the  boundaries  of  a  legal  entity, 
regardless  of  whether  they  originate  in  a  government  agency, 
academia,  or  an  industrial  lab,  are  almost  by  definition  cleaner 
types  of  transfers.  This  means  that  one  will  be  participating  in  a 
transaction  involving  the  sale  or  licensing  of  technology,  or  con¬ 
tracting  to  provide  some  service,  or  both.  So,  the  first  order  of 
business  is  negotiating  the  “deal.” 


KNOW  WHAT  YOU'RE  SELLING 


It  is  critical  to  know  and  understand  what  is  being  sold:  out¬ 
right  ownership  of  intellectual  property,  or  a  license  to  it?  Is  it  a 
complete  solution  that  has  been  developed,  or  just  pieces? 
Those  pieces  might  include  any  or  all  of  the  following: 

•  User  interface  design; 

•  System  specifications; 

•  A  working  prototype; 

•  An  actual  implementation,  tested  to  some  level  of  quality; 

•  The  source  code  for  a  software  implementation; 

•  Copyrights; 

•  Patents; 

•  Working  time  as  a  commitment  to  support  ongoing 
productization; 

•  A  running  service  that  you  host. 


KNOW  WHAT  YOU'RE  NOT  SELLING 


It is equally important to know what is not being sold. If one
would like to continue this work, one will need to make sure
to preserve the rights and ownership needed to do so. Otherwise,
one could very well put oneself out of business by selling
complete ownership of a valuable asset, or by signing an exclusive
license that precludes licenses with any other company.


KNOW  THE  PEOPLE  INVOLVED 


Deals  inevitably  involve  lawyers  as  well  as  what  are  known  as 
“business development” people: those whose job is to negotiate
deals that further the business interests of their employers.
One can assume that the company negotiating will have both
business-development people and lawyers; it makes good sense
for the researcher to have them too. This is a situation where we
need  to  put  our  pride  and  high-minded  notions  about  doing 
business  “on  a  handshake”  aside.  While  the  overwhelming  ma¬ 
jority  of  companies  are  not  in  the  business  of  stealing  from  peo¬ 
ple  like  us,  and  will  not  try  to  do  so,  in  order  to  get  the  most 
value  in  return  for  what  is  being  offered  one  needs  someone 
on  his  or  her  side  who  understands  what  is  customary  in  intel¬ 
lectual  property  deals  and  how  to  negotiate  for  it.  Even  a  bril¬ 
liant  and  excellent  debater  who  doesn’t  know  what  to  reason¬ 
ably  ask  for  is  at  a  serious  disadvantage.  The  bottom  line:  find 
someone  with  good  business-development  skills  and  experi¬ 
ence  to  negotiate  the  deal. 

Likewise,  once  the  terms  of  a  deal  have  been  negotiated,  one 
needs  a  lawyer  to  write  it  up  and  make  it  legally  binding.  Don’t 
even  think  about  self-representation  in  the  drafting  of  an  intel¬ 
lectual  property  agreement;  the  laws  are  changing  too  quickly 
(which  isn’t  the  fault  of  the  lawyers)  for  a  researcher  to  under¬ 
stand  which  ones  apply  to  the  situation  and  should  be  factored 
into  the  drafting  of  an  agreement.  Mock,  Kenkeremath,  and 
Janis  (1993)  set  out  the  basics  of  existing  laws  and  how  they 
relate  to  technology  transfer. 

The  key  to  success  is  to  understand  the  defined  role  of  each 
of  the  three  people  on  a  negotiating  team:  oneself,  the  business- 
development  person,  and  the  lawyer.  The  researcher’s  role  is 
to  be  the  technical  expert  and  to  place  a  value  on  the  work  as 
well  as  what  the  people  on  the  other  side  of  the  table  are  offer¬ 
ing  in  return;  one’s  role  as  “client”  is  to  decide  what  is  needed  in 
order  to  be  successful,  and  what  additionally  is  desired  but  ne¬ 
gotiable.  The  role  of  the  business-development  person  is  to  take 
the  articulated  needs  and  desires  and  try  to  structure  the  terms 
of  an  agreement  that  will  work  for  both  parties.  He  or  she  un¬ 
derstands  business  risks  and  will  help  the  team  members  to  un¬ 
derstand  them  and  make  informed  decisions  about  how  much 
risk  can  be  tolerated.  The  role  of  the  lawyer  is  to  take  the  terms 
and  write  them  down  in  words  that  both  parties  understand  and 
that  can  be  interpreted  under  the  law  to  protect  the  client’s  in¬ 
terests.  The  lawyer  also  understands  and  can  articulate  the  legal 
risks;  laws  are  often  subtle  and  ambiguous  things  that  can  be 
interpreted  in  many  ways  (in  fact,  nations  have  an  entire  branch 
of government that does nothing but interpret the laws). Any
contractual  obligation  runs  the  risk  of  being  interpreted  in  a 
way  other  than  how  it  was  intended,  and  a  lawyer  can  help  the 
team  members  to  understand  how  likely  that  is,  based  upon  the 
language  of  the  law  and  similar  previous  cases  where  the  law  has 
been  interpreted  by  the  courts.  Just  as  there  is  always  business 
risk,  there  is  always  legal  risk,  and  in  the  end  it  will  fall  upon 
the  researcher  to  make  the  decision  as  to  whether  the  risk  is 
acceptable. 

Where  do  deals  go  wrong?  In  my  personal  experience,  they 
often  go  wrong  when  these  three  roles  become  confused,  and 
when  the  business-development  person  and  the  lawyer  start  to 
make  the  key  decisions.  The  researcher  must  live  with  the  result 
of  the  deal,  not  them.  On  one  end  of  the  spectrum,  there  is  no 
such  thing  as  a  risk-free  deal;  on  the  other  end,  even  a  high-risk 
deal  could  be  worth  doing  if  the  reward  is  high  enough.  Those 
decisions  are  the  client’s,  not  theirs,  and  the  researcher  as  client 
should  insist  on  making  them. 

There  are  many  good  sources  for  both  business-develop¬ 
ment  people  and  lawyers.  Venture  capitalists  will  often  have  a 
“short list” of ones they trust to do their business transactions.
The  Chamber  of  Commerce  for  a  local  area  can  also  provide  rec¬ 
ommendations,  and  often  will  track  complaints  registered 
against  specific  ones.  Many  regions  also  have  associations  of  en¬ 
trepreneurs,  inventors,  and  small-business  owners,  with  great 
resources  to  draw  from. 

Of course, if one already works in a research lab, the institution
most likely has a technology-licensing office that will negotiate
and draft deals on the researcher’s behalf (and is likely required
to do so if the researcher’s employment agreement
assigns ownership of inventions to the employer). In that case,
one  will  still  need  to  stay  involved  to  make  sure  that  the  re¬ 
searcher’s  needs  are  met  in  whatever  deal  is  negotiated. 


CRAFTING  A  DEAL 


The most difficult part of negotiating a deal is crafting an
arrangement  that  meets  the  needs  of  both  sides.  I  have  seen 
many  negotiations  take  much  longer  to  conclude,  and  in  many 
cases  fail  to  conclude  successfully,  because  either  or  both  sides 
didn’t  bother  to  try  to  understand  the  other  side’s  business 
needs.  Business  partnerships  are  always  about  finding  a  way  to 
help  both  parties  be  more  successful.  The  best  way  to  do  that 
is  to  understand  what  one’s  prospective  partner’s  business  is 
about,  and  likewise  to  share  enough  information  about  one’s 
own  business,  openly  and  honestly,  so  that  together  a  combi¬ 
nation  can  be  found  that  works  for  both  sides.  Find  out  every¬ 
thing  available  about  a  partner’s  current  business  situation: 

•  their  revenues  and  profits 

•  their  competition 

•  their  most  important  customers 

•  what  customers  are  saying  about  their  product 

•  where  they  say  they  want  to  take  their  business  in  the  future 

•  the  problems  and  challenges  they  are  facing. 

This essential information will guide you to deeper insights
on how to offer terms that will be seen as valuable to a potential
buyer or licensee. Steinberg (1998) described his experiences
in negotiating deals and his own well-known and well-respected
philosophy for how to structure deals that make good
business sense for all parties.

Companies  will  pay  for  the  value  that  can  be  delivered  to 
them.  They  will  pay  a  certain  amount  of  money  to  make  an  even 
larger  reduction  in  their  costs  (because  in  the  end  they  save 
money).  They  will  of  course  pay  to  help  themselves  make  even 
more  money.  And,  finally,  they  will  pay  if  one  solves  a  problem 
for  them.  As  a  deal  is  structured,  try  to  cast  it  in  terms  of  what 
it  does  for  them;  those  are  terms  that  they  can  understand  and, 
more  importantly,  quantify  in  a  valuation. 

Take  it  further.  Help  the  buyers  in  any  way  possible  to  place 
a  value  on  what  is  brought  to  the  table.  For  example,  do  a  user 
study  on  their  existing  product,  and  another  showing  how  the 
technology  being  offered  will  improve  their  product  (if  it  hap¬ 
pens  to  address  a  key  customer  complaint,  all  the  better,  and 
that should definitely be brought to their attention). During
the negotiations, show them a smart, effective professional
who  can  work  with  them.  This  doesn’t  mean  that  you  need  to 
negotiate  hard  to  the  very  last  item;  contrary  to  popular  belief, 
the  tough  negotiators  aren’t  always  the  most  respected,  and  in 
fact  they  are  often  the  ones  that  create  their  own  reputation  for 
being  difficult  to  work  with.  It  is  much  more  important  to 
demonstrate  an  understanding  of  the  buyers  and  an  ability  to 
speak  their  language,  as  well  as  willingness  to  help  them  make 
the  case  to  the  decision-makers  in  their  organization’s  senior 
management. 


CONFIDENTIALITY  AND  NDAS 


One  large  challenge  in  negotiating  deals  is  the  issue  of  confi¬ 
dentiality,  which  usually  rears  its  head  first  in  the  often-dreaded 
non-disclosure  agreement.  Non-disclosure  agreements  are 
signed  prior  to  revealing  confidential  information,  to  ensure 
that  the  information  won’t  be  disclosed  to  competitors.  That 
part  is  good  business  and  a  natural,  noncontroversial  part  of 
good-faith  bargaining — we  all  need  to  be  able  to  keep  secrets. 
The  difficult  part  of  non-disclosures  is  the  issue  of  residuals:  by 
looking  at  confidential  information,  I  learn  things,  and  then  I 
carry  that  learning  around  in  my  head  for  the  rest  of  my  career. 
What am I allowed to do with that information in my head, and
who  in  fact  owns  it?  From  the  discloser’s  point  of  view,  one 
wants  to  make  sure  that  someone  can’t  use  his  or  her  own  con¬ 
fidential  information  to  compete.  From  the  other  side,  it  is  im¬ 
possible  to  know  exactly  what  will  be  disclosed,  or  what  busi¬ 
ness  opportunities  are  going  to  come  one’s  way  tomorrow,  so 
it  is  deeply  problematic  to  sign  away  the  ability  to  enter  certain 
businesses  simply  for  the  privilege  of  looking  at  confidential 
information.  Both  sides  sound  very  reasonable,  and  they  are, 
which  is  why  NDAs  are  no  trivial  matter  and  often  become  the 
stopping  point  in  negotiations. 

Try to complete as much of the negotiation as possible
without entering into an NDA, because doing so simplifies
matters and prevents the trust issues from rising to the
surface too early. The downside is that this makes the early
negotiations a precarious dance, where a researcher needs to
show  the  other  party  enough  to  convince  them  that  the  tech¬ 
nology  is  real  and  solves  their  problem,  without  giving  away  key 
secrets.  There  are  things  that  still  can  be  done:  tell  them  “what” 
it  does,  instead  of  “how”  it  does  it,  and  show  them  the  system 
working.  The  goal  is  to  make  them  crave  it  enough  that  they  will 
want  to  sign  an  NDA  under  the  researcher’s  terms  in  order  to 
complete  the  technical  due-diligence  required  for  them  to  close 
the  deal. 

It  is  critical  to  think  this  all  through  before  getting  to  the  ne¬ 
gotiation  table — these  are  never  decisions  to  make  under  time 
and  social  pressure.  It  is  also  critical  to  realize  that  the  whole  is¬ 
sue  of  confidentiality  and  NDAs  is  one  more  business  risk;  admit 
it  and  decide  for  yourself  whether  (and  when)  the  potential  re¬ 
ward  outweighs  the  risk.  This  is  one  of  the  clear  cases  in  which 
a  lawyer  will  be  extremely  conservative  and  protective,  and  de¬ 
scribe  in  great  detail  everything  that  could  be  lost  by  entering 
into an NDA (or by showing technology without an NDA). But,
in  the  end,  the  decision  is  the  researcher’s. 


GOING TO THE NEGOTIATING TABLE


Negotiating  a  deal  is  probably  the  most  hyped  and  feared  part  of 
this  whole  process;  perhaps  we’ve  all  had  too  many  nightmares 
about  slick  car  salesmen  tricking  us  into  paying  too  much  for 
too  little  in  return.  The  reason  we  fear  car  salesmen  is  that  the 
salesman  has  all  of  the  information  about  what  the  car  is  really 
worth  and  shares  none  of  it  with  us;  we  are  forced  to  blindly 
trust  him,  and  many  of  us  don’t. 

Steinberg  (1998)  once  again  shared  his  wisdom  on  a  sound 
and  ethical  approach  to  negotiation  in  his  “twelve  essential  rules 
for  negotiating.”  Not  nearly  as  ambitious  as  Steinberg,  I  have 
only  three  basic  rules  for  negotiating: 

1. I obtain as much information about each side’s position as
possible, before arriving at the table.

2. I have a list of what I really need in order to succeed, and a
separate list of what I want to have in addition. I hold firm on
my needs, and I am willing to compromise on my wants.

3. I always negotiate a deal in which both sides win.

Understanding  a  potential  business  partner’s  position  is  crit¬ 
ical  to  negotiating  success  for  a  number  of  reasons.  First,  it  tells 
what  they  are  looking  for.  Ask  the  same  list  of  questions  we  dis¬ 
cussed with internal technology transfer: Who are their customers?
What are those customers saying about their products?
Who  are  their  competitors?  What  is  the  company  looking  for 
that  will  help  them  to  be  more  successful?  What  are  their 
strengths  and  weaknesses?  Second,  and  very  much  related,  it 
tells  how  the  buyer  will  value  what  they  are  offered.  Successful 
business  deals  involve  an  exchange  of  value  which  both  sides 
view  as  fair  and  equitable,  but  value  is  of  course  relative  to  the 
company and its context. Understand that context in order to find an
equal trade. The company’s annual report is an excellent
source of information (if it is a publicly
traded company). News articles and competitive reviews
also provide invaluable information about the business
pressures the company is under, as well as the assets that it
brings to the table.

Having  the  list  of  the  things  that  one  really  needs  and  the 
things  that  one  wants  in  addition  is  a  valuable  step  in  preparing 
for  negotiation.  I  have  seen  many  people  come  to  the  table  un¬ 
willing  to  compromise  on  anything;  they  ask  for  too  much  in  the 
beginning,  and  believe  strongly  that  compromising  on  anything 
is  a  sign  of  weakness.  Negotiations  like  those  always  take  longer 
than  they  should  and  are  very  frustrating.  In  some  cases  they 
spend  more  money  in  lawyers’  fees  for  fruitless  negotiations 
than  the  value  of  the  small  items  on  which  they  refuse  to  com¬ 
promise.  I  recommend  starting  with  a  basic  negotiation  on  the 
core  needs  of  both  sides;  not  only  does  that  keep  you  focused 
on  the  heart  of  the  deal,  but  it  also  tends  to  simplify  things  just 
by  taking  all  the  peripheral  items  off  the  table.  Once  the  heart  of 
the  deal  is  done,  and  both  sides  feel  comfortable  that  they  can 
be  successful  because  they  are  getting  what  they  need,  adding 
additional  pieces  is  much  easier  with  a  lower  stress  level  and  a 
structure  in  place. 

Remember to be honest with yourself as you detail all this
information, because lying to yourself is the surest path to failure.
Assume that they will have an accurate valuation of what
is  brought  to  the  table  (regardless  of  whether  they  are  willing 
to  tell  you  what  it  is),  so  insisting  that  something  is  worth 
more  than  its  real  value  is  foolish.  Understand  one’s  own 
strengths  and  weaknesses;  be  honest  about  what  is  needed 
in  order  to  be  successful,  and  don’t  promise  things  that  can’t 
be  delivered. 

All  of  this  brings  us  to  the  last  rule:  always  negotiate  a  win- 
win.  Always  negotiate  a  deal  in  which  both  sides  feel  that  they 
are  receiving  what  they  needed  and  can  be  successful.  Beyond 
the  obvious  ethical  reasons  for  doing  this,  there  is  also  the  very 
practical  consideration  that  the  two  parties  will  need  to  con¬ 
tinue  to  work  together.  Many  people  go  to  the  negotiating  table 
believing  that  signing  a  deal  is  the  end  of  the  process,  when  in 
fact  it  is  just  the  beginning.  Once  the  paperwork  is  signed,  a  re¬ 
lationship  that  essentially  lasts  forever  begins  between  the  par¬ 
ties.  This  relationship  often  makes  itself  known  in  unanticipated 
ways;  for  example,  if  one  is  licensing  a  patent  to  a  company,  the 
company  has  a  vested  interest  in  ensuring  that  the  patent  main¬ 
tenance  fees  continue  to  be  paid  to  the  PTO  to  keep  the  patent 
valid,  and  will  want  to  have  regular  information  to  confirm  that 
the payments are being made. Almost every deal one can imagine,
no matter how cut-and-dried, has some aspect that will require
communication between the parties on an ongoing basis
after  the  deal  is  done.  If  one  licenses  a  technology  to  a  company, 
the  company  may  be  sending  royalty  checks,  and  the  licensor 
may  want  some  way  to  audit  their  sales  to  ensure  that  they  are 
accurately  paying.  They,  in  turn,  may  want  technical  support, 
including  important  bug  fixes  and  updates.  They  may  addition¬ 
ally  have  negotiated  the  option  to  license  future  upgrades,  and 
as  a  paying  customer  they  will  likely  want  to  provide  their  input 
on  features  and  enhancements  to  the  technology  that  would 
be  of  most  help  to  them. 


EMBRACING  THE  RELATIONSHIP 


By admitting from the beginning that there is an ongoing relationship,
one can embrace this notion and turn it to an advantage.
In fact, I encourage its use to build a future revenue
stream; in light of the increasing attention to Chesbrough’s
“open innovation” model, this is a likely outcome indeed.
A researcher can build design and consulting services
into the deal, which is particularly helpful for HCI-related
technology transfer since, as we discussed earlier, HCI technologies
often need reworking to fit into a larger context. One can use the ongoing
relationship  as  a  “foot  in  the  door”  to  be  able  to  offer  future 
sales  and  deals  as  new  technologies  are  developed.  In  fact, 
viewed  as  a  “strategic  partner,”  one  might  even  want  to  offer 
them  the  right  of  first  refusal  on  future  offerings,  as  a  way  of 
demonstrating a commitment to them. Finally, in contemplating
future growth of the researcher’s business, one may
need  additional  sources  of  funding,  and  a  partner  who  has  a 
vested  interest  in  the  researcher’s  success  can  be  a  great  source 
for  funding.  Even  if  none  of  this  is  true,  assume  that  one  day 
there  will  be  a  need  for  a  good  reference  or  recommendation 
from  them;  that  alone  is  reason  enough  to  want  to  have  a  great 
ongoing  relationship. 


COMMONALITIES  FOR  INTERNAL  AND 
EXTERNAL  TECHNOLOGY  TRANSFER 


Intellectual  Property 

One  of  the  issues  that  is  common  to  both  internal  and  external 
technology  transfer  is  intellectual  property  (IP).  I  am  not  a  lawyer, 
and  so  I  obviously  can’t  give  legal  advice  on  how  to  protect 
intellectual property or how to treat others’ IP. What I can do is
point  out  some  places  where  the  IP  issues  get  thorny  and  make 
some  business  recommendations  about  how  to  deal  with  them. 

Anything  received  from  a  third  party  may  carry  restrictions 
on  how  it  may  be  used,  and  more  importantly  whether  it  can 
be  redistributed  in  its  original  or  modified  form,  or  combined 
with  some  other  components.  This  includes  libraries  of  software 
routines,  data,  copyrighted  works  and  designs,  and  patents. 
These  restrictions  can  come  from  explicit  license  agreements 
that  accompany  the  third-party  components,  or  they  could 
come  from  any  of  a  number  of  different  laws,  including  patent, 
copyright,  trademark,  trade  secret,  and  export.  Any  time  a  third- 
party  component  is  used  in  one’s  work,  a  business  risk  of  con¬ 
straining  the  ability  to  transfer  the  work  to  another  party  arises, 
because  either  one  doesn’t  possess  the  right  to  do  so,  or  the 
rights  that  are  possessed  aren’t  sufficient  to  the  needs  of  the 
party  that  wants  to  license  it.  This  question  fits  into  the  larger 
scheme of what is traditionally called the “build or buy” decision:
whether it makes more business sense to build something
oneself, or to buy or license it from a third party.

I  strongly  recommend  that,  whenever  possible,  the  IP  issues 
should  be  dealt  with  at  the  time  a  third-party  component  is  ac¬ 
quired,  rather  than  waiting  until  an  opportunity  to  transfer  it. 
This  accomplishes  two  things.  First,  it  allows  one  to  negotiate 
and  make  business  decisions  about  acquiring  the  component 
before there is a commitment to, and dependence on, the component
built into one’s technology; once the dependency is there,
the “switching cost” of moving to an alternative is much higher,
and one could be forced to pay a much larger licensing fee than
before. Second, it simplifies the tech-transfer process. Any company
worth its salt will perform “due diligence” on the technology
before it closes a licensing deal. Part of that will be an
analysis of who really owns the technology, or whether the licensor
has acquired the right to further license it. In essence,
one  will  need  to  prove  the  right  to  license  one’s  work  to  the 
company.  And  don’t  be  surprised  if  the  company  also  asks  the 
researcher  to  “warrant”  the  work — to  guarantee  that  he  or  she 
has  the  right  to  license  all  of  it  to  the  company,  and  that  right 
will  be  defended  in  court  if  necessary.  The  bottom  line:  clear  it 
up  front,  and  save  a  lot  of  trouble  later. 


IT  ALL  COMES  DOWN  TO  THE  RELATIONSHIP 


The  most  significant  common  aspect  of  internal  and  external 
transfer  comes  back  to  the  notion  that  we  began  with:  technol¬ 
ogy  transfer  is  a  social  process  that  succeeds  or  fails  based  upon 
the  relationships  that  have  been  built.  Tech-transfer  partners 
need  to  trust  each  other,  and  that  trust  is  built  with  communi¬ 
cation  and  follow-through.  Researchers  should  understand 
clearly  and  honestly  what  value  they  bring  to  the  table  at  the  var¬ 
ious  stages  of  the  relationship,  and  they  should  make  it  their 
business  to  know  how  their  partners  see  and  value  what  they 
bring.  After  all,  business  is  fundamentally  the  exchange  of  value 
between  partners  who  need  each  others’  competencies,  and  the 
truly  successful  companies  are  the  ones  that  build  relationships 
that  last  across  a  continuous  series  of  business  transactions. 

When beginning a technology-transfer effort, assume that
what  is  started  that  day  is  the  beginning  of  a  working  relation¬ 
ship  that  will  last  forever.  Build  the  relationship  from  the  ground 
up  with  the  expectation  of  ever-increasing  levels  of  coopera¬ 
tion  and  trust  that  will  allow  the  partnership  to  take  on  ever 
more  challenging  technology  transfers,  in  whatever  manner  is 
most  appropriate  to  the  needs  of  the  business. 

There  are  far  too  many  stories  of  companies  who  have  strug¬ 
gled  with  the  transfer  of  technologies  that  could  change  the 
world — and  failed.  As  Buderi  (2000)  and  Freidman  (2005)  de¬ 
scribed,  the  next  chapter  of  this  story  is  being  written  now.  By 
rewriting  the  rules  to  focus  on  the  social  side  of  the  process, 
we  can  ensure  that  our  best  work  will  see  the  light  of  day  and 
this  story  will  have  a  happy  ending.  The  good  news  is  that  there 
are  more  companies  open  to  technology  transfer  today  than  at 
any  time  in  the  history  of  the  industry,  and  the  opportunities  are 
there — on  both  sides — for  us  to  take. 
INTRODUCTION


In 1960, J. C. R. Licklider had a vision for a “Man-Computer
Symbiosis” in which the human and computer, while dissimilar
from one another, would live together in an intimate association
producing increased handling and new ways of processing
information (Licklider, 1960). Over the past few decades, several
attempts to realize this vision have been made by interactive
system developers, but each time, it has eluded them.
This  was  likely  due  to  the  insufficiency  of  technology  and  com¬ 
putational  power,  but  also  to  the  need  to  mature  several  fields 
of  basic  science  necessary  to  understand  how  human-machine 
symbiosis  might  be  produced.  A  more  thorough  understand¬ 
ing  of  human  brain  functioning  and  what  guides  behavior  dur¬ 
ing  human-computer  interaction  has  been  a  continuing  miss¬ 
ing  requirement  in  the  ability  to  enable  true  human-machine 
symbiosis. 

Fortunately,  in  the  1990s  period  known  as  the  “Decade  of 
the  Brain,”  the  National  Institutes  of  Health  (NIH)  and  other 
federal  funding  agencies  invested  heavily  into  advancing  neuro¬ 
science  in  order  to  understand  the  basic  scientific  aspects  un¬ 
derlying  the  brain  to  include  human  cognition  and  behavior. 
What  emerged  are  advanced  neurophysiological  assessment 
tools  and  a  wealth  of  knowledge  regarding  specific  brain  areas 
that  could  be  attributed  to  particular  cognitive  and  behavioral 
functions.  As  brain  activity  measurement  tools,  such  as  elec¬ 
troencephalography  (EEG),  magnetoencephalography  (MEG), 
event  related  potentials  (ERP),  positron  emission  tomography 
(PET),  and  functional  magnetic  resonance  imaging  (fMRI),  con¬ 
tinued  to  advance  into  the  year  2000,  the  idea  that  one  could 
possibly  begin  to  capture  brain  functioning  in  real  time  as  users 
performed  real  world  tasks  came  to  fruition.  Seeing  the  promise 
of  such  capabilities,  in  2001,  the  Defense  Advanced  Research 
Projects  Agency  (DARPA)  began  funding  a  program  based  on 
what  is  now  known  as  the  scientific  field  of  Augmented  Cogni¬ 
tion  (AugCog)  (Kollmorgen,  2007;  Schmorrow  &  Kruse,  2002, 
2004,  2005;  Schmorrow  &  Reeves,  2007;  Schmorrow,  Stanney, 
Wilson,  &  Young,  2005).  The  field  of  AugCog  has  had  many 
predecessors  (e.g.,  DoD-funded  programs  in  Biocybernetics, 
Learning  Strategies,  and  the  Pilot’s  Associate  Program).  It  was 
not  until  the  Decade  of  the  Brain  and  after,  however,  that  sub¬ 
stantial  investment  in  technologies  to  enable  real-time  moni¬ 
toring  of  cognitive  state  via  physiological  sensors  brought 
together  a  research  community  and  stimulated  the  growth  of 
the  AugCog-based  program  that  became  known  as  the  Improv¬ 
ing  Warfighter  Information  Intake  Under  Stress  (IWIIUS)  Pro¬ 
gram.  The  IWIIUS  program  included  four  industry  teams  work¬ 
ing  in  different  application  domains  toward  a  common  goal  of 
“closing  the  loop”  around  the  human  via  the  real-time  assess¬ 
ment  of  human  information  processing  (HIP)  bottlenecks  (e.g., 
sensory  processing,  attention,  working  memory,  executive 
function) and use of subsequent adaptive automation techniques
to mitigate the effects of those bottlenecks (Proffit,
2003). The ultimate goal was to improve human performance
in both stationary (e.g., command and control) and mobile
(e.g., in-vehicle, dismounted soldier) application domains.

The four main tenets of AugCog, on which the DARPA IWIIUS
Program was founded, include using neuroscientific tools for
assessing and accommodating the following HIP bottlenecks
(Schmorrow, Stanney, Wilson, & Young, 2005):

•  Sensory  Processing:  AugCog  seeks  to  enhance  sensory  per¬ 
ception  by  exploiting  multiple  sensory  channels  for  increased 
input  capacity. 

• Attention: AugCog seeks to equip computers such that they
become aware of subtle cues emanating from humans indicating
how they are prioritizing incoming information (e.g., directing
attention) and capitalize on these cues to enhance HIP.

•  Working  Memory:  AugCog  seeks  to  support  simultaneous 
processing  of  competing  tasks  by  strategically  allocating  data 
streams  to  various  multimodal  sensory  systems,  while  main¬ 
taining  multimodal  information  demands  within  working 
memory  capacity. 

• Executive Function: AugCog seeks to enhance HIP by directing
the recall of contextual information that cues the optimal
interpretation of incoming information and moderates
the effects of modality augmentation or switching.

Researchers and developers from AugCog and neuroergonomics
(Parasuraman, 2003) efforts are aiming to revolutionize
the field of Human-Computer Interaction (HCI) by leveraging
recent advances in the fields of neuroscience, cognitive
science, cognitive neuroscience, and computer science
and engineering to develop technologies and strategies to mitigate
users’ HIP limitations and improve their performance
via real-time cognitive state (operator functional state, OFS)
detection and accommodation. The AugCog science and technology
(S&T) emerging from these research and development
(R&D) efforts is providing tools and techniques that may be
applied to (a) the design of systems (e.g., neurotechnologies
as a measure during usability testing), (b) training (e.g., neurotechnologies
to “sense” learning and systems to adapt training),
and (c) operational systems (e.g., closing the loop between
assessing cognitive state and initiating real-time adaptive
changes in the interface).

The aggressive goals and objectives required of the industry
teams in the DARPA IWIIUS program (Schmorrow & Kruse,
2002, 2005; Schmorrow & McBride, 2004; Schmorrow, Stanney,
Wilson, & Young, 2005) have fostered the development of many
neurophysiological-based tools and techniques that are maturing
enough to become feasible toolsets for HCI researchers, designers,
and practitioners in their pursuit of improving human-computer
system efficiency and effectiveness. This chapter
thus focuses on summarizing some of the latest S&T advancements
from the field of AugCog, with specific examples and
lessons learned from the DARPA program’s four main industry
teams (Honeywell Laboratories; Lockheed Martin Advanced
Technology Laboratories [LM-ATL]; DaimlerChrysler AG, Germany;
and The Boeing Company) and their subcontractors.


NEUROTECHNOLOGIES  FOR 
ASSESSING  USER  STATE 


The field of AugCog is aimed at substantially improving human-system
interaction by using proactive systems that detect, gauge,
and assess a user’s cognitive state in real time (via diagnostic
physiological and neurophysiological sensors and appropriate
data classification methods), and then mitigate HIP bottlenecks
by modifying system interfaces (via dynamically adaptive strategies)
(Schmorrow, Stanney, Wilson, & Young, 2005). A critical
component of this goal is the use of neurotechnologies that can
appropriately measure cognitive load at a neural level
in order to sufficiently characterize the cognitive state¹ of users
while they interact with a computer-based system in real-world
settings. Schmorrow and Kruse (2002) noted that there is a significant
body of research, ranging from early measurements
of pupil size to more recent observations using EEG, ERP, PET,
and fMRI neurophysiological tools, that is devoted to measuring
involuntary changes in human physiology as a result of cognitive
load and effort (Gevins & Smith, 2000; Gundel & Wilson,
1992; Hess & Polt, 1964; Humphrey & Kramer, 1994; Just & Carpenter,
1993). Only a handful of these measurement techniques,
however, have been used and analyzed in real time
while users perform real-world tasks, and some devices (e.g.,
fMRI, PET) are still too expensive and too large, and thus not
portable enough, for use in operational environments.

HCI researchers and practitioners must be able to rely on
tools and techniques that allow them easy access for noninvasively
observing and assessing users while interacting with
human-computer systems; therefore, only the physio- and neurotechnologies
most appropriate for such assessment, and which
have been proven by DARPA AugCog experiments in real-world
settings, are reviewed here. For a more thorough review of sensor
technologies available, readers are directed to the recent
NATO report on Operator Functional State Assessment (Wilson
& Schlegel, 2004).

In  the  last  few  years,  AugCog  researchers  and  developers 
have  determined  that  various  combinations  of  physiological 
(e.g.,  electro-oculography  [EOG],  heart  rate  [HR],  galvanic  skin 
response  [GSR],  blood  pressure  [BP],  and  pupillometry)  and 
neurophysiological  sensors  may  be  used  to  improve  the  robust¬ 
ness  for  detecting  cognitive  functioning  of  users  across  a  vari¬ 
ety  of  conditions.  The  most  promising  to  date  are  reviewed  next 
and  summarized  in  Table  62.1,  and  they  include  (a)  stress  and 
arousal  meters;  (b)  engagement  indices;  (c)  portable  EEG,  ECG 
(EKG), and functional near infrared (fNIR) devices; (d) posture/body
state analyses; and (e) pupillometry. Each of the sensors/gauges
discussed below was selected according to how effectively
it has been shown by DARPA IWIIUS program performers
to assess cognitive state in real time, while also considering
factors such as portability, usability in the field, potential
intrusion on user task performance, and potential electromagnetic
interference (EMI) during combined sensor/gauge implementation
in operational settings. (For detailed discussions of
DARPA program performers’ latest AugCog R&D efforts with
various sensors and gauges in their various task environments,
see proceedings articles from the recent First Annual Augmented
Cognition International [ACI] Conference held in conjunction
with HCI International 2005 in Las Vegas, Nevada.)


Electrodermal  Activity  (EDA)/GSR-based  Arousal 
and  Cognitive  Workload  Gauge 

In pursuit of their DARPA AugCog-based IWIIUS program efforts,
LM-ATL employed an EDA/GSR-based arousal
and cognitive workload gauge (Kobus, Brown, Morrison, Kollmorgen,
& Cornwall, 2005). The system was provided by Anthrotronix,
Inc. and included a device for capturing EDA/GSR
data and the data processing algorithms necessary to convert its
output into estimates of arousal and workload. In this system,
EDA is measured with a two-lead GSR sensor affixed to two nonadjacent
toes on one foot of the user. Although GSR is typically
measured using the palm of the hand or the fingers, the soles of
the feet and toes have an equal abundance of eccrine glands
that release sweat. Thus, the feet and toes offer an optimal site
for sensor placement because sensors placed there will not
intrude on a user’s ability to interact with input devices
requiring hand operation (e.g., keyboard, mouse). The EDA data
(sampled at 32 Hz) were processed to develop an arousal gauge,
which was later integrated with other sensors to establish an
effective multimeasure-based general cognitive workload gauge.


EKG  (ECG)-Based  Arousal  and  Cognitive 
Workload  Gauges 

Anthrotronix, Inc. also provided LM-ATL with a tool used to collect
EKG data, which included the data processing algorithms
necessary to convert the EKG data into heart rate variability
(HRV) and produce estimates of arousal and workload (Kobus
et al., 2005). To detect heart muscle activity, data were collected via
three standard EKG electrodes, which were positioned in the
center of the abdomen and below the left and right collarbone.
Data from these EKG sensors (sampled at 256 Hz) were used to
compute a user’s HRV in milliseconds² each second in order to
assess arousal states. LM-ATL and Anthrotronix also found that
when a user concentrates on a task, the user’s HRV decreases, and
the heartbeat becomes very regular. Thus, HRV correlates with
arousal and may also serve as an indicator of cognitive workload.
They also found that using HRV to estimate cognitive state for
an individual required collecting only five minutes of baseline
data prior to testing. Consequently, EKG-based arousal and
cognitive workload gauges may be feasible tools for cognitive
state assessment in most application settings.
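
The HRV computation described above can be illustrated with a minimal sketch. The source does not describe the Anthrotronix algorithms, so everything here is an assumption made for illustration: HRV is taken to be the variance of interbeat intervals (in ms²), R-peak detection is assumed to have been done already, and the function names are hypothetical.

```python
from statistics import pvariance


def interbeat_intervals_ms(r_peak_times_s):
    """Convert R-peak timestamps (seconds) into interbeat intervals (ms)."""
    return [(b - a) * 1000.0
            for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def hrv_ms2(ibis_ms):
    """Variance of interbeat intervals in ms^2 (one simple HRV measure)."""
    if len(ibis_ms) < 2:
        raise ValueError("need at least two interbeat intervals")
    return pvariance(ibis_ms)


def relative_hrv(window_ibis_ms, baseline_ibis_ms):
    """HRV of the current window as a fraction of the baseline HRV.

    Values below 1.0 mean the heartbeat is more regular than at baseline,
    which the text associates with concentration on a task.
    """
    baseline = hrv_ms2(baseline_ibis_ms)
    if baseline == 0:
        raise ValueError("baseline shows no variability")
    return hrv_ms2(window_ibis_ms) / baseline
```

In use, the baseline intervals would come from the five-minute baseline recording mentioned above, and the window would be refreshed as new beats arrive.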

Body  Position/Posture  Tracking 

In support of multiple IWIIUS teams’ AugCog efforts (e.g.,
LM-ATL, Boeing, DaimlerChrysler), the University of Pittsburgh
provided a system for body position tracking and posture analysis
(Kobus et al., 2005). In this system, data are collected via
two main devices: a pressure-sensing chair and a head tracker.
The sensor chair contains two 64 × 64 grids of pressure sensors


¹In a forthcoming AugCog International Conference Proceedings publication (Reeves, Schmorrow, & Stanney, 2007), AugCog researchers and
developers have defined cognitive or “functional” state to be (a) anything needing to be measured to base design decisions on it; (b) the moment-to-moment
dynamic and functional capabilities (e.g., capacity, bottlenecks) of the human brain; and (c) a condition that has a causative, moderating, or predictive
relationship to a performance variable.
TABLE 62.1. Summary of Maturing AugCog Technologies and Proven Implementations to Date*

Columns: Sensors/Gauges | Use (Measures What/Implemented How) | Appropriateness for Mobile Applications

Sensor/Gauge: EDA/GSR-based Arousal and Cognitive Workload Gauge (Anthrotronix)
Use: Provides estimates of arousal and general cognitive workload
• Implemented by LM-ATL in a command and control closed-loop application domain
Appropriateness: Most appropriate for stationary users; not yet tested in mobile application domains or with mobile users

Sensor/Gauge: EKG (ECG)-based Arousal and Cognitive Workload Gauge (Anthrotronix)
Use: Uses HRV measures to provide estimates of arousal and general cognitive workload
• Implemented by LM-ATL in a command and control closed-loop application domain
Appropriateness: Most appropriate for stationary users; not yet tested in mobile application domains or with mobile users

Sensor/Gauge: Body Position/Posture Tracking (U of Pittsburgh)
Use: Posture shift data, head position, and head velocity are used to gauge levels of attention (e.g., engagement)
• Implemented by LM-ATL and Boeing in 2 different command and control closed-loop application domains
• Implemented by DaimlerChrysler in a vehicular closed-loop application domain
Appropriateness: Appropriate for stationary users in stationary or mobile application domains; not yet tested with mobile users and most likely would not be appropriate for such

Sensor/Gauge: Stress Gauge (IHMC)
Use: Uses VOG, HFQRS, and EDR to track autonomic response to time-pressured, high workload tasks and to detect moment-to-moment cognitive stress related to managing multiple competing tasks and is thus good for measuring attention HIP bottleneck effects
• Implemented by Honeywell in closed-loop dismounted soldier application domains
Appropriateness: Appropriate for mobile users and stationary or mobile application domains

Sensor/Gauge: Arousal Meter Gauge (Clemson)
Use: Uses IBI derived from ECG to track decrements in performance due to low arousal states in divided attention and vigilance tasks and is thus a good measure of attention HIP bottleneck effects
• Implemented by Honeywell in closed-loop dismounted soldier application domains
• Implemented by Boeing in a command and control closed-loop application domain
Appropriateness: Appropriate for mobile users and stationary or mobile application domains

Sensor/Gauge: eXecutive Load Index (XLI Gauge) (Human Bionics)
Use: Uses EEG measures to assess ability to allocate attentional resources during high workload, competing tasks and is thus a good measure of attention HIP bottleneck effects and general cognitive workload
• Implemented by Honeywell in closed-loop dismounted soldier application domains
Appropriateness: Appropriate for mobile users and stationary or mobile application domains

Sensor/Gauge: P300 Novelty Detector Gauge (CCNY/Columbia)
Use: Uses EEG auditory P300 signals from frontal and parietal electrodes to track attentional resources used to attend to novel stimuli and is thus a good measure of attention HIP bottleneck effects
• Implemented by Honeywell in closed-loop dismounted soldier application domains
Appropriateness: May be appropriate for mobile users and stationary or mobile application domains

Sensor/Gauge: Engagement Index Gauge (NASA/CCNY/Honeywell)
Use: Uses EEG-based measures to track how cognitively engaged a person is in a task (level of alertness) and is effective at assessing attention HIP bottleneck effects associated with both sustained and divided attention tasks, particularly during low workload conditions
• Implemented by Honeywell in closed-loop dismounted soldier application domains
Appropriateness: May be appropriate for mobile users and stationary or mobile application domains

Sensor/Gauge: NuWAM combined EEG, ECG, EOG sensors (AFRL)
Use: Uses combined sensors to gauge general workload levels and estimate executive function and attention HIP bottleneck effects
• Implemented by Boeing in a command and control closed-loop application domain
Appropriateness: Most appropriate for stationary users; not yet tested in mobile application domains or with mobile users

Sensor/Gauge: Fast fNIR device (Drexel)
Use: Measures brain blood oxygenation and volume changes and is an effective tool for assessing spatial and verbal working memory HIP bottleneck effects
• Implemented by LM-ATL in a command and control closed-loop application domain
Appropriateness: Most appropriate for stationary users but shows promise for mobile users and mobile application domains

Sensor/Gauge: Whole Head fNIR (Archinoetics)
Use: Measures brain blood oxygenation and volume changes and is an effective tool for assessing spatial and verbal working memory HIP bottleneck effects
• Implemented by Boeing in a command and control closed-loop application domain
Appropriateness: Most appropriate for stationary users; not yet tested in mobile application domains or with mobile users

Sensor/Gauge: Pupillometry (ETI’s ICA)
Use: Uses proprietary and patented techniques for estimating cognitive activity based on changes in pupil dilation and gaze and is a good measure of general cognitive workload and sensory input, attention, and executive function HIP bottleneck effects
• Implemented by LM-ATL and Boeing in 2 different command and control closed-loop application domains
Appropriateness: Most appropriate for stationary users; not yet tested in mobile application domains or with mobile users

Sensor/Gauge: Low Density EEG (ABM’s 3, 6, or 9 channel cap)
Use: Uses a portable EEG cap, wireless transmitter, and B-Alert software to effectively estimate various types of cognitive states, namely, vigilance/arousal, workload, engagement, distraction/drowsiness, and working memory levels
• Implemented by LM-ATL in a command and control closed-loop application domain
Appropriateness: Appropriate for mobile users and stationary or mobile application domains

Sensor/Gauge: High Density EEG (EGI’s 128 or 256 electrode net)
Use: Uses an ERP EEG-based system to estimate which and to what degree particular brain regions are invoked during task performance; may be an effective tool for assessing both verbal and spatial working memory and general cognitive workload
• Evaluated but not implemented by LM-ATL in their command and control closed-loop application domain
Appropriateness: Appropriate for stationary users; not yet tested in mobile application domains or with mobile users and may be too cumbersome for such applications

Sensor/Gauge: DaimlerChrysler’s EEG system (FIRST)
Use: Uses EEG combined with EOG and EMG to assess low versus high workload levels and is effective at assessing sensory memory bottleneck effects
• Implemented by DaimlerChrysler in a vehicular closed-loop application domain
Appropriateness: Appropriate for stationary users in stationary or mobile application domains; not yet tested on mobile users

Sensor/Gauge: EROS (U of Illinois)
Use: Uses fast optical imaging techniques to identify brain region signatures resulting from cued and non-cued attentional shifts during task performance and thus may be a good estimate of sensory, attention, and executive function HIP bottleneck effects
• Evaluated for potential implementation in Boeing’s command and control closed-loop application domain
Appropriateness: Appropriate for stationary users in stationary or mobile application domains; not yet tested with mobile users

Sensor/Gauge: CogMon (QinetiQ)
Use: Uses behavioral measures from interactions with cockpit controls, EEG-based physiological measures, subjective measures, and contextual information to assess stress, alertness, and various workload levels and is effective at assessing all 4 HIP bottleneck effects
• Implemented in both a military fast-jet simulation and a command and control application environment
• Planned for implementation efforts in support of Boeing’s AugCog program in their command and control closed-loop application domain
Appropriateness: Appropriate for stationary users in stationary or mobile application domains; not yet tested with mobile users

*Note: The above list is by no means all-inclusive. For a thorough review of available sensors, see NATO report on Operator Functional State Assessment (Wilson & Schlegel,
2004). Further, many of these tools may be used individually or combined in various ways to assess multiple HIP bottlenecks, as well as other HCI task factors (e.g., context,
environmental stress effects, etc.).


in both the back and seat. Data from these sensors are used to
compute changes in pressure over time, and the medial-lateral
shifts of position in the seat. To record head position and
changes, the head tracking component involves using a head-mounted
wireless transmitter and four detectors, positioned
on the corners of a computer monitor placed in front of the
user. Posture shift data, head position, and head velocity, each
output every 250 milliseconds, were used by LM-ATL to gauge
levels of attention (e.g., engagement). Although a few experimental
trials indicated an ability to correlate with cognitive workload,
LM-ATL determined more research would be needed to
assess whether the body position tracking system is reliably effective
in predicting cognitive workload in command and control
type tasks.

As mentioned later in the section titled “EEG,” DaimlerChrysler
has found success with this device by integrating it with
their EEG- and contextual-based sensor suite. They classified
workload conditions on the basis of the driver’s behavioral
data from the University of Pittsburgh system. Head
tracking or monitoring of the driver’s seat posture, for example,
allowed DaimlerChrysler to detect preparatory movements
likely to be followed by certain driving maneuvers. Combining
this technique with environmental monitoring, they also found
it possible to recognize situations indicative of certain (upcoming)
cognitive states and thus develop proactive strategies to
minimize cognitive loading effects.
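
As an illustration of how medial-lateral shifts might be derived from one of the chair's pressure grids, the sketch below tracks the pressure-weighted centroid between two frames. This is not the University of Pittsburgh implementation, which the source does not describe; the centroid approach, the assumption that grid columns run along the medial-lateral axis, and the function names are all hypothetical.

```python
def center_of_pressure(grid):
    """Return the (row, col) pressure-weighted centroid of a 2-D grid."""
    total = 0.0
    row_moment = 0.0
    col_moment = 0.0
    for r, row in enumerate(grid):
        for c, pressure in enumerate(row):
            total += pressure
            row_moment += r * pressure
            col_moment += c * pressure
    if total == 0:
        raise ValueError("no pressure registered on the grid")
    return row_moment / total, col_moment / total


def medial_lateral_shift(prev_grid, curr_grid):
    """Signed column movement of the centroid between two frames.

    Assumes columns run left to right across the seat, so a positive
    value means the sitter's weight moved toward higher column indices.
    """
    return center_of_pressure(curr_grid)[1] - center_of_pressure(prev_grid)[1]
```

With frames arriving every 250 milliseconds, as in the text, successive calls to `medial_lateral_shift` would yield a time series of posture shifts that a downstream gauge could consume.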

Stress  Gauge 

The Institute of Human and Machine Cognition (IHMC), in conjunction
with the Honeywell DARPA IWIIUS industry team
(Kobus et al., 2005), has developed a composite Stress Gauge
(Kass, Doyle, Raj, Andrasik, & Higgins, 2003; Raj, Perry, Abraham,
& Rupert, 2003), which uses a weighted average of the
following three sensor inputs to detect the participant’s response
to changes in cognitive load: Video Pupillometry (VOG),
High Frequency Electrocardiogram (HFQRS ECG), and Electrodermal
Response (EDR). This gauge has been used to track the
autonomic response to time pressure in a high workload environment
and to detect cognitive stress related to managing multiple
competing tasks on a moment-to-moment basis. Thus, it
is appropriate for assessing HIP bottlenecks related to attention
(e.g., divided attention effects).
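
The weighted-average composite described above can be sketched in a few lines. The actual weights and input scaling of the IHMC gauge are not given in the source, so the equal default weights, the assumption that each input is already normalized to a common 0 to 1 scale, and the function name are illustrative only.

```python
def stress_gauge(vog, hfqrs, edr, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three normalized sensor readings.

    vog, hfqrs, edr: readings assumed to be pre-scaled to 0..1.
    weights: hypothetical non-negative weights, one per sensor.
    """
    if len(weights) != 3:
        raise ValueError("expected one weight per sensor input")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    readings = (vog, hfqrs, edr)
    return sum(w * x for w, x in zip(weights, readings)) / sum(weights)
```

Dividing by the sum of the weights keeps the composite on the same scale as its inputs, so the weights only express the relative trust placed in each sensor.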


Arousal  Meter  Gauge 

Clemson University, in conjunction with the Honeywell and Boeing
DARPA IWIIUS industry team efforts (Kobus et al., 2005), has
developed an Arousal Meter (Hoover & Muth, 2003), which derives
autonomic arousal from the cardiac interbeat interval (IBI)
derived from the ECG at one millisecond accuracy. The gauge
has three levels (low, medium, high), which increase or decrease
with respective increases or decreases in autonomic
arousal. The Arousal Gauge has been used to track decrements
in performance due to low arousal states and is thus appropriate
for assessing HIP bottlenecks related to attention (e.g., divided
attention and vigilance effects).
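
To make the idea of a three-level gauge concrete, the sketch below thresholds a baseline-normalized IBI score. It is not the published Arousal Meter algorithm, which the source does not detail; the z-score formulation, the cut-off of 1.0, and the premise that shorter interbeat intervals indicate higher arousal are simplifying assumptions for illustration.

```python
from statistics import mean, pstdev


def arousal_level(recent_ibis_ms, baseline_ibis_ms, z_cut=1.0):
    """Map recent interbeat intervals to 'low', 'medium', or 'high'.

    The score is the number of baseline standard deviations by which
    the recent mean IBI is shorter than the baseline mean IBI.
    z_cut is a hypothetical threshold, not a published value.
    """
    base_mean = mean(baseline_ibis_ms)
    base_sd = pstdev(baseline_ibis_ms)
    if base_sd == 0:
        raise ValueError("baseline shows no variability")
    # Positive score: heart is beating faster than at baseline.
    score = (base_mean - mean(recent_ibis_ms)) / base_sd
    if score > z_cut:
        return "high"
    if score < -z_cut:
        return "low"
    return "medium"
```

A closed-loop system could, for example, treat a sustained "low" reading during a vigilance task as a cue to trigger a mitigation.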

eXecutive Load Index Gauge

In conjunction with the Honeywell industry team efforts, Human
Bionics has developed the eXecutive Load Index (XLI)
gauge (DuRousseau, 2004a, 2004b) (see Fig. 62.1), which utilizes
EEG signals to allow measurement of patterns in tightly
coupled cortical networks tied to an individual’s allocation of
attentional resources as a user’s cognitive state changes in response
to conditional task load (Kobus et al., 2005). The index
was designed to measure real-time changes in cognitive load
related to the processing of messages and has been previously
validated regarding its ability to discern trial difficulty in a continuous
performance, high-order cognitive task battery. The XLI
gauge has been used to track the active inhibition of competing
tasks, and is therefore appropriate for assessing divided attention
effects on users. This gauge has also been effectively integrated
into a general workload assessment tool.

FIGURE 62.1. Examples of Human Bionics’ XLI system.


P300  Novelty  Detector  Gauge 

Columbia University and the City College of New York (CCNY)
have created the P300 novelty detector gauge (Sajda, Gerson,
& Parra, 2003) in support of Honeywell’s DARPA IWIIUS program
efforts (Kobus et al., 2005). This gauge spatially integrates
EEG auditory signals from sensors distributed across
the scalp, learning a high dimensional hyperplane for discriminating
between task-relevant (e.g., incoming message auditory
alert) and task-irrelevant responses. The EEG auditory
P300 reflects a central nervous system response to behaviorally
relevant infrequent sounds. Previous literature (Wickens,
Heffley, Kramer, & Donchin, 1980) suggests that the P300
amplitude, in response to a task-relevant infrequent auditory
stimulus, is modulated by attentional resources (e.g., if the
participant is very focused on a primary task, the secondary
auditory stimulus will be missed and the corresponding P300
diminished). User state assessment is based on the assumption
that the presence of a strong evoked response indicates
a user has sufficient attentional resources to process an incoming
message. The gauge includes frontal and parietal electrodes
and tracks the attentional resources used to attend to
novel stimuli and is also an indirect measure of the response
capacity to competing tasks and a user’s narrowing attention
(e.g., divided attention effects).
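
The notion of learning a hyperplane that separates task-relevant from task-irrelevant responses can be sketched with a simple perceptron. This is a stand-in chosen for brevity, not the method used in the gauge; the feature vectors (one post-stimulus amplitude per electrode), the labels (+1 relevant, -1 irrelevant), and the function names are hypothetical.

```python
def project(weights, bias, features):
    """Signed score of one response relative to the hyperplane w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_hyperplane(samples, labels, epochs=100, learning_rate=0.1):
    """Perceptron training: returns (weights, bias).

    samples: list of per-electrode feature vectors.
    labels: +1 for task-relevant, -1 for task-irrelevant responses.
    Stops early once every training sample is classified correctly.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * project(weights, bias, x) <= 0:
                weights = [w + learning_rate * y * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias
```

After training, a positive `project` score for a new response would be read as a strong evoked response, which the text interprets as the user having attentional resources available for an incoming message.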


Engagement  Index  Gauge 

The  Engagement  Index  Gauge  is  a  ratio  of  EEG  power  bands 
(beta/(alpha  +  theta))  and  was  developed  with  a  combined  ef¬ 
fort  from  NASA,  the  City  College  of  New  York  (CCN'Y)  and 
Honeywell  (Kobus  et  al.,  2005).  The  engagement  index  is  a 
measurement  of  how  cognitively  engaged  a  person  is  in  a  task 
(e.g.,  their  level  of  alertness)  and  is  described  in  more  detail 
by  Freeman,  Mikulka,  Prinzel,  and  Scerbo  (1999).  Adaptive  sys¬ 
tems  have  used  this  index  for  controlling  the  automation 
switching  between  manual  and  automatic  modes  (Mikulka, 
Scerbo,  &  Freeman,  2002;  Pope,  Bogart,  &  Bartolome,  1995). 
Prinzel,  Hadley,  Freeman,  and  Mikulka  (1999)  reported  that 
adaptive  task  allocation  may  be  best  reserved  for  the  end¬ 
points  of  the  task  engagement  continuum.  Honeywell,  there¬ 
fore,  implemented  two  levels  of  engagement  (low,  high)  to  re¬ 
flect  the  selection  and  focus  on  some  aspect  at  the  expense 
of  the  other  competing  demands  (e.g.,  focused  attention  ef¬ 
fects).  For  instance,  high  levels  of  engagement  reflect  selec¬ 
tion  and  attentional  focus,  whereas  low  levels  of  engagement 
indicate  that  the  participant  is  not  actively  engaged  with  some 
aspect  of  the  environment  during  the  respectively  assessed 
point  in  time.  The  Engagement  Index  Gauge  has  been  demon¬ 
strated  by  Honeywell  to  be  an  effective  tool  for  assessing  both 
sustained  (e.g.,  vigilance)  and  divided  attention  bottleneck 
effects  on  HIP  and  is  particularly  sensitive  to  low  workload 
conditions. 
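
The band-power ratio described above can be sketched as follows. This is a minimal illustration, not the gauge implemented by NASA, CCNY, or Honeywell: the band limits (theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz), the plain DFT power estimate, the function names, and the two-level threshold are all assumptions made for the example.

```python
import cmath
import math


def band_power(samples, fs, lo_hz, hi_hz):
    """Sum of squared DFT magnitudes over bins with lo_hz <= f < hi_hz."""
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if lo_hz <= freq < hi_hz:
            # Naive DFT coefficient for bin k (adequate for short windows).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(samples))
            power += abs(coeff) ** 2
    return power


def engagement_index(samples, fs):
    """Return beta / (alpha + theta) for one window of a single EEG channel."""
    theta = band_power(samples, fs, 4.0, 8.0)    # assumed band limits
    alpha = band_power(samples, fs, 8.0, 13.0)   # assumed band limits
    beta = band_power(samples, fs, 13.0, 30.0)   # assumed band limits
    denominator = alpha + theta
    if denominator == 0.0:
        raise ValueError("no alpha or theta power in this window")
    return beta / denominator


def classify_engagement(index, threshold=1.0):
    """Map the index onto two levels; the threshold is a hypothetical value."""
    return "high" if index >= threshold else "low"
```

In practice the threshold would be set per participant from baseline data, and the index would be recomputed over a sliding window rather than a single block of samples.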


New  Workload  Assessment  Monitor  (NuWAM;  Combined  EEG,  ECG,  and  EOG  Sensors)

In  conjunction  with  the  Boeing  team’s  DARPA  IWIIUS  program
efforts,  the  Warfighter  Interface  Division  of  the  U.S.  Air  Force
Research  Laboratory  (AFRL)  evaluated  AugCog  sensors  (e.g.,
EEG,  ECG,  and  EOG)  and  performed  preliminary  integration  at
the  AFRL/HECP  facilities  at  Wright-Patterson  AFB  to  form  the
NuWAM  sensor  suite.  NuWAM  is  based  on  an  artificial  neural  net
(ANN)  cognitive  state  classifier  that  uses  a  20-channel  EEG  system
along  with  the  ECG  and  EOG  sensors.  Although  not  as  effective
at  assessing  specific  HIP  bottlenecks,  the  AFRL  NuWAM  gauge 
can  determine  general  workload  levels  for  specific  types  of  tasks 
that  have  been  categorized  during  cognitive  task  analyses. 

Dry  Electrodes 

Various  DARPA  IWIIUS  program  team  members  worked  with
Quantum  Applied  Science  and  Research,  Inc.  (QUASAR)  regarding
their  development  of  advanced  dry  electrode  sensors
for  EEG,  ECG,  and  EOG  use.  Boeing  found  QUASAR’s  ECG  cloth
chest  strap  worked  very  well  as  a  fieldable  ECG  sensor  system
integrated  with  NuWAM.  They  also  found  the  EOG  eyeglass 
frame  device  to  be  non-invasive  and  very  useful  for  holding  the 
electrodes  necessary  for  assessing  vertical  and  horizontal  eye 
movements. 

Functional  Near  Infrared  (fNIR)

Honeywell,  Boeing,  and  LM-ATL  industry  teams  have  all 
worked  with  some  form  of  an  fNIR  device  (e.g.,  from  NovaSol 
[now  Archinoetics]  and/or  Drexel  University),  which  measures 
blood  oxygenation  and  volume  changes  in  the  brain  relative  to 
where  the  optical  sensors  are  placed  on  the  head  (Kobus  et  al., 
2005).  Functional  near  infrared  has  been  shown  to  be  an  effec¬ 
tive  tool  for  diagnosing  cognitive  activity  associated  with  spatial 
and  verbal  working  memory,  given  its  often  known  right  and  left 
hemispheric  separability,  respectively  (Smith  &  Jonides,  1998). 
Archinoetics’  fNIR  sensors,  for  example  (see  Fig.  62.2),  (a)  are
composed  of  a  central  emitter  with  eight  detachable  detectors, 
(b)  are  designed  to  be  configurable  in  shape,  and  (c)  use  ure¬ 
thane  fingers  to  penetrate  through  hair  so  that  no  preparation 
of  the  participant  is  required  before  using  the  system  (and  it 
may  easily  be  applied  anywhere  on  a  user’s  head  without  adhe¬ 
sives).  The  emitter  uses  a  multiwavelength  LED  and  the  detec¬ 
tors  receive  the  fNIR  signal  with  an  amplified,  highly  sensitive 
photodiode.  The  detectors  plug  into  the  sensor  with  ribbon 
cable  and  are  angularly  mounted  to  the  emitter  with  Velcro.
The  curvature  formed  by  the  angular  mounting  conforms  to  the 
shape  of  the  scalp  for  maximum  skin  contact  to  minimize  noise 
caused  by  light  leakage. 

Boeing’s  fNIR  Approach

Boeing  found  that  using  the  Archinoetics  “whole  head”  fNIR 
sensors  with  statistical  process  control  (SPC)  cognitive  state
FIGURE  62.2.  Archinoetics  (formerly  NovaSol)  fNIR  sensor.


classifier  algorithms  proved  very  successful  at  detecting  tasks 
categorized  as  primarily  verbal  (e.g.,  tasks  with  words,  language, 
or  logical  operations;  Sanders  &  McCormick,  1993;  Kobus  et  al., 
2005).  Their  fNIR/SPC  approach  gave  qualitatively  different  re¬ 
sults  than  their  EEG/ANN  approach  previously  discussed.  A 
main  difference  observed  between  these  approaches  is  that  the 
fNIR/SPC  approach  acts  more  like  a  “trigger”  and  the  EEG/ANN 
approach  acts  more  like  a  gauge.  They  noted  how  a  cognitive 
trigger  differs  from  a  cognitive  gauge  in  that  a  trigger  responds 
quickly  to  the  onset  of  a  task,  but  does  not  stay  high  for  the  du¬ 
ration  of  the  task,  which  is  reflective  of  HIP  tendencies  to  re¬ 
spond  to  changes  in  the  environment  rather  than  remain  at  a 
steady  state.  Boeing  found  their  fNIR/SPC  approach  to  be  most 
effective  for  assessing  workload  effects  during  episodic  tasks, 
but  it  also  reliably  responded  to  other  HIP  bottleneck  gauges. 
Although  Boeing  found  their  fNIR/SPC  system  to  be  a  fast-acting 
cognitive  trigger,  they  noted  that  this  approach  has  a  disadvan¬ 
tage  in  that  it  currently  uses  one  sensor  per  SPC  channel  and 
thus  lacks  the  redundancy  of  their  EEG/ANN  system.  Boeing  will 
continue  to  refine  both  approaches  and  investigate  the  poten¬ 
tial  for  combining  them  to  provide  a  more  robust  and  sensitive 
gauge  of  cognitive  state  and  changes  in  real  time. 
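
The trigger-versus-gauge distinction can be illustrated with a basic Shewhart-style control chart. This is a sketch under stated assumptions, not Boeing's fNIR/SPC classifier: the three-sigma limits, the baseline window, and the function names are hypothetical. The out-of-control flags behave like a level indicator, while the onset list fires only when the signal first leaves the control limits.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Lower and upper control limits: baseline mean +/- k sample std devs."""
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mean - k * sigma, mean + k * sigma


def out_of_control(samples, limits):
    """Flag every sample that falls outside the control limits."""
    lower, upper = limits
    return [x < lower or x > upper for x in samples]


def onset_triggers(flags):
    """Indices where the signal first leaves control (rising edges only)."""
    return [i for i, flag in enumerate(flags)
            if flag and (i == 0 or not flags[i - 1])]
```

For example, with a resting baseline of `[10.0, 10.2, 9.8, 10.1, 9.9, 10.0]` and task samples `[10.0, 10.1, 11.0, 11.2, 11.1, 10.0, 9.9, 11.5]`, the flags stay raised for samples 2 through 4, whereas the trigger fires only at indices 2 and 7, the two task onsets.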

LM-ATL’s  fNIR  Approach

LM-ATL  used  an  fNIR  device  provided  by  Drexel  University, 
which  is  a  Fast  fNIR  device  that  is  mounted  to  a  user’s  forehead 
in  order  to  track  changes  in  cognitive  activity  in  the  frontal  lobe 
region  of  the  brain  (Kobus  et  al.,  2005).  The  Fast  fNIR  device 
contains  four  light  emitters  surrounded  by  10  detectors  to  mea¬ 
sure  the  amount  of  reflected  infrared  light  and  the  angle  at 
which  it  is  reflected.  Thus,  the  amount  of  cognitive  activity  in 
the  regions  of  the  cortex  below  the  device  can  be  tracked  and 
assessed.  LM-ATL  found  the  device  to  be  useful  in  gauging  levels 
of  attention,  verbal  working  memory,  and  overall  workload. 

In  future  AugCog  efforts,  Honeywell  plans  to  explore  fNIR 
capabilities  for  mobile  users  (dismounted  soldiers)  since  early 


studies  from  Drexel  and  LM-ATL  also  indicate  promise  for  being 
able  to  filter  the  data  to  remove  motion  artifacts  (Dorneich  et  al., 
2007;  Kobus  et  al.,  2005).  Based  on  Boeing’s  results  with  Nova- 
Sol’s  fNIR  device,  both  Honeywell  and  LM-ATL  (for  stationary 
command  and  control  type  tasks)  may  further  explore  its  utility 
in  their  real  work  applications,  assuming  usability  and  ergonomic 
fit  issues  from  the  ‘whole  head’  device  can  be  minimized. 

Pupillometry 

The  LM-ATL  and  Boeing  teams  used  EyeTracking,  Inc.’s  (ETI)
Index  of  Cognitive  Activity  (ICA)  system  (see  Fig.  62.3  and  Fig. 
62.4),  which  is  based  on  binocular  eye  tracking  sensors  and 


FIGURE  62.3.  ETI  sensor  system  setup.  (Note:  the  Eyelink  II  is
manufactured  by  SR  Research  Global,  Inc.) 


FIGURE  62.4.  ETI  eye  tracking  system.  (Note:  the  Eyelink  II  is 
manufactured  by  SR  Research  Global,  Inc.)

measurement  of  pupil  dilation  (Kobus  et  al.,  2005).  ETI’s  sensor
configuration  utilizes  a  lightweight  headset  with  three  small 
cameras  in  order  to  record  exactly  where  a  user  is  looking.  The 
user’s  point  of  gaze  is  superimposed  on  a  video  recording  of 
the  displays.  While  users  interact  with  a  computer  system,  their 
eye  movements  and  pupil  dilation  measures  are  monitored  and 
recorded  by  multiple  miniature  high  speed  cameras  (samples 
at  250  Hz).  The  eye  patterns  can  also  be  plotted  on  a  bitmap 
in  order  to  look  at  a  viewing  history  spanning  several  seconds 
or  longer. 

Although  not  a  direct  measure  of  cognitive  activity,  ETI’s  sensor
technology  applies  proprietary  and  patented  techniques  to
determine  the  level  of  cognitive  effort,  the  amount  of  sustained
attention,  and  the  affective  response  of  a  workstation  operator
based  on  their  ICA,  which  is  ETI’s  patented  metric  for  estimating
cognitive  activity  from  changes  in  pupil  dilation  (e.g.,
abrupt  changes  in  pupil  diameter  indicate  an  operator’s  current 
levels  of  mental  effort,  while  the  point  of  gaze  metric  highlights 
specific  elements  causing  difficulty). 

For  Boeing’s  efforts,  ETI  also  provided  a  data  capture  system
that  could  integrate  and  display  in  real  time  their  multisensor
data  (e.g.,  EEG,  fNIR,  and  eye-tracking  systems).  This  allowed
them  to  produce  high  fidelity  estimates  of  cognitive  state
change,  even  though  a  direct  measure  of  an  HIP  bottleneck  is
not  presently  possible  with  ETI’s  system.

LM-ATL  found  that  it  may  be  possible  to  determine  different 
levels  of  utilization  of  left  vs.  right  brain  hemisphere  regions  by 
comparing  pupillometry  data  from  the  different  eyes.  Such  data 
could  be  used  as  an  indicator  as  to  whether  a  user  was  using  more 
verbal  (left  hemisphere)  or  spatial  (right  hemisphere)  processing
resources.  Consequently,  ETI’s  device  also  shows  promise  for
use  as  a  gauge  of  the  working  memory  HIP  bottleneck. 

EEG 

Out  of  all  brain  imaging  methods  currently  available,  EEG  is  well 
documented  and  is  proving  to  be  a  robust  neuro-based  tool  for 
real-time  system  application,  given  its  temporal  resolution  and 


potential  spatial  resolution  when  combined  with  other  techniques.
All  the  DARPA  IWIIUS  industry  teams  used  some  form  of
EEG  device  for  their  program  efforts  (Kobus  et  al.,  2005).  As  dis¬ 
cussed  previously,  Boeing  and  Honeywell  have  used  EEG  sen¬ 
sors  as  part  of  integrated  sensor  suites  to  gauge  various  states 
of  cognitive  activity  and  HIP  bottlenecks.  Efforts  involving 
LM-ATL’s  and  Daimler  Chrysler’s  EEG-based  approaches  are  also 
showing  very  promising  results  for  assessing  cognitive  state  in 
real  time. 

LM-ATL’s  EEG-Based  Technology

LM-ATL  used  Advanced  Brain  Monitoring’s  (ABM)  EEG  de¬ 
vice,  which  is  a  portable  EEG  cap,  wireless  transmitter,  and 
B-Alert  software  used  to  collect  continuous  EEG  signals  (see 
Fig.  62.5).  Various  models  of  the  device  are  available  from  ABM, 
but  LM-ATL  has  implemented  versions  using  three,  six,  and  nine 
EEG  electrodes  for  different  experiments  (Kobus  et  al.,  2005). 
The  cap  contains  a  wireless  transmission  unit  which  sends  measures
of  electrical  activity  (in  microvolts,  μV)  collected  by  each
electrode  to  a  laptop  computer  once  every  millisecond.  LM-ATL’s
use  of  ABM’s  EEG  device  has  contributed  to  the  development  of 
gauges  capable  of  estimating  various  types  of  cognitive  states, 
including  vigilance/arousal,  workload,  engagement,  distrac¬ 
tion/drowsiness,  and  verbal  working  memory.  LM-ATL  also  found 
that  using  ABM’s  continuous  EEG  device  to  estimate  cognitive 
state  of  an  individual  requires  only  15  minutes  of  baseline  data 
to  be  collected  for  that  individual,  indicating  feasibility  for  vari¬ 
ous  fielded  experiments. 

LM-ATL  has  also  implemented  an  ERP  EEG  system,  which 
was  developed  by  Electrical  Geodesics,  Inc.  (EGI).  Instead  of  a  cap,
as  provided  with  ABM’s  system,  EGI’s  system  involves  users 
wearing  a  net  containing  either  128  or  256  embedded  elec¬ 
trodes.  The  EEG-net  covers  the  entire  hairline,  forehead,  and 
parts  of  each  cheek,  and  its  electrodes  capture  electrical  activity, 
in  microvolts  (μV),  in  the  cortex  below  each  electrode.  Comparing
measures  for  different  pairs  of  electrodes  before  and  after
tasks  of  interest  provided  LM-ATL  with  estimates  of  which


FIGURE  62.5.  Examples  of  ABM’s  EEG  cap  in  use.

brain  regions  were  engaged  and  to  what  degree  as  users  performed
various  tasks.  These  data  were  then  processed  to  help
establish  verbal  working  memory,  spatial  working  memory,  and 
cognitive  workload  gauges.  LM-ATL  found  that  using  EGI’s  ERP- 
EEG  based  system  allowed  accurate  localization  of  cognitive  ac¬ 
tivity  in  an  individual  with  approximately  five  minutes  of  base¬ 
line  data.  While  its  greater  intrusiveness  and  incompatibility 
with  other  LM-ATL  sensors  as  compared  to  ABM’s  device  may 
limit  its  application  in  certain  task  environments,  the  rapid 
baselining  and  measurement  sensitivity  provided  by  EGI’s  device
make  it  feasible  for  research  applications  and  possibly  real-world
clinical  applications.

DaimlerChrysler’s  EEG-Based  Technology

In  conjunction  with  FIRST  of  Berlin,  Germany,  DaimlerChrysler
has  developed  an  EEG-based  integrated  sensor  suite
system  to  estimate  a  driver’s  cognitive  state  in  a  Mercedes-Benz 
vehicle  (Kobus  et  al.,  2005).  The  EEG  signals  collected  from  the 
driver  are  also  supported  by  EOG  and  electromyography  (EMG) 
recordings  to  facilitate  artifact  recognition.  The  EEG  is  sampled 
at  500  Hz  from  29  scalp  electrodes  arranged  according  to  the  international
10-20  system.  For  DaimlerChrysler’s  AugCog  program
efforts,  the  driver’s  seat  posture  and  head  movements  are
also  captured  by  a  dedicated  seat  mat  and  a  head  tracker  device 
as  provided  by  the  University  of  Pittsburgh,  while  another  group 
of  sensors  are  devoted  to  capturing  task  context  and  estimating 
its  influence  on  the  driver’s  cognitive  state  in  a  system  devised  by 
Sandia  National  Laboratory  (Kobus  et  al.,  2005).  Values  from  the
EEG  signals  and  the  other  three  vehicle-sensor  groups  (which 
are  integrated  via  a  Controller  Area  Network  [CAN]  data  bus),  are 
classified  via  algorithms  whose  outputs  represent  interpretations 
of  the  driver’s  current  workload  (high  or  low)  or  of  the  context’s 
impact  on  it,  respectively.  DaimlerChrysler  has  found  that  their 
combination  of  both  EEG-based  neurophysiological  and  contex¬ 
tual-based  physiological  sensors  provides  a  good  measure  of 
cognitive  state  correlates  (e.g.,  overall  workload,  attention  and 
sensory  HIP  bottleneck  effects),  along  with  the  ability  to  monitor 
and  adapt  to  a  wide  variety  of  environmental  factors  (via  the 
vehicle’s  built-in  sensor  system)  in  dynamic,  critical  driving  situations
(Kobus  et  al.,  2005).  They  have  also  noted  that  further
refinement  of  these  gauges  for  closed-loop  AugCog  system  implementation
would  be  desirable,  including  (a)  a  higher  temporal
resolution  of  the  EEG-based  classifiers  without  loss  of
classification  accuracy  and  (b)  a  finer  gradation  of  workload
levels  (at  least  low,  medium,  and  high).

Event  Related  Optical  Signal  (EROS) 

Boeing  investigated  the  potential  for  using  a  new  fast  optical 
imaging  tool  based  on  an  EROS  system  (e.g.,  reflected  light  from 
the  cortical  level  is  measured  and  reconstructed  into  a  map  of 
brain  activity),  which  is  being  researched  and  developed  at  the 
University  of  Illinois  (Gratton  &  Fabiani,  2001;  Kobus  et  al.,
2005).  Initial  investigations  used  EROS  to  help  identify
brain  signatures  of  preparation  that  arise  when  a  precue  signals
a  user  to  covertly  shift  attention  (but  not  the  eyes)  to  the  left  or
the  right  side  of  fixation.  The  University  of  Illinois  has  found  a
basic  switch  effect  in  the  right  parietal  cortex  whose  region 
showed  greater  activation  when  the  cue  signaled  a  shift  of 
attention  to  the  opposite  hemifield  from  that  attended  on  the 
previous  trial.  The  time  course  of  this  activity  showed  both 
an  early  response  (115  ms  post-cue)  and  a  later  one
(354  ms)  that  was  similar  in  timing  to  values  previously  seen 
with  other  neurotechnologies. 

Further,  and  consistent  with  typical  contralateral  mapping  of 
hemifields  to  cortex,  the  right  hemisphere  showed  activation 
while  the  left  hemisphere  showed  inhibition  when  the  precue 
signaled  a  leftward  shift  of  attention  and  vice  versa.  These  later¬ 
alization  effects  occurred  in  a  region  commonly  referred  to  as 
the  frontal  eye  field  and  began  quite  early  (56  milliseconds)  and 
peaked  at  approximately  150  milliseconds.  In  essence,  Boeing 
found  the  EROS  measure  to  be  sensitive  to  both  general  and 
direction-specific  attentional  preparation  and  thus  may  be  a  good 
gauge  of  sensory  input,  attention  and  executive  function  HIP  bot¬ 
tlenecks  during  cognitive  state  assessment  in  future  applications. 
Table  62.1  provides  a  summary  of  the  advantages  of  this  and  the 
aforementioned  techniques  that  have  been  effectively  imple¬ 
mented  by  AugCog  researchers  and  developers  to  date. 

Cognitive  Monitor  (CogMon) 

QinetiQ  has  designed  the  CogMon  sensor  suite  system  (Dick¬ 
son,  2005a)  whose  primary  function  is  to  provide  estimations 
of  a  pilot’s  cognitive-affective  status  in  near-real  time  while  in¬ 
teracting  in  a  fast-jet  cockpit  simulation  environment.  Pilot  state 
inferences  are  derived  from  four  main  sensor  sources  (e.g., 
behavioral  measures  from  interactions  with  cockpit  controls, 
EEG-based  physiological  measures,  subjective  measures,  and 
contextual  information),  which  are  used  to  assess  the  pilot’s 
objective  and  subjective  cognitive  state  in  terms  of  arousal  and 
workload  levels  (Pleydell-Pearce,  Dickson,  &  Whitecross,  2000).
Cognitive  state  estimations  are  encapsulated  within  high-level 
state  descriptors  such  as  levels  of  stress,  alertness,  and  work¬ 
load,  and  are  then  provided  to  a  Tasking  Interface  Manager 
(TIM)  to  aid  in  directing  levels  of  automation  or  information 
presentation  within  the  cockpit.  CogMon  can  be  configured  to 
enable  both  online  and  post-hoc  operation. 


APPLICATION  OF  NEUROTECHNOLOGIES 


While  other  application  areas  may  emerge  in  the  near  future,
the  three  most  appropriate  application  areas  that  may  presently 
benefit  from  effective  implementation  of  current  neurotech¬ 
nologies  include:  (a)  Operational  closed-loop  systems,  (b)  De¬ 
sign  of  Systems,  and  (c)  Training/Education  Systems.  This  sec¬ 
tion  reviews  some  of  the  S&T  developments  in  these  areas  and 
implications  emerging  from  the  DARPA  IWIIUS  program  and
various  AugCog-related  efforts.

Closed-Loop  Systems

Leveraging  physiological  and  neurophysiological  technologies 
(such  as  those  that  are  emerging  from  the  DARPA IWIIUS  pro¬ 
gram  and  maturing  via  real-world  application,  as  discussed  in  the 
previous  section),  it  is  now  possible  to  assess  the  ever-changing 
cognitive  state  of  the  user  in  real  time  and  mitigate  against  hu¬ 
man  performance  limitations  caused  by  known  HIP  bottlenecks. 
Recently,  it  has  been  demonstrated  that  the  loop  can  be  closed 
between  this  assessment  and  the  adaptation  of  the  user  interface 
in  real  time  to  accommodate  the  assessed  bottleneck.  This  is 
known  as  a  closed-loop  AugCog  system  and  is  the  core  of  the 
DARPA  IWIIUS  program  vision  and  of  the  Augmented  Cognition 
field  in  general.  Essential  elements  of  any  closed-loop  AugCog 
system  (see  Fig.  62.6)  must  therefore  include 

•  Cognitive  state  assessment  capabilities  via  physio-  and  neuro¬ 
physiological  tools  as  discussed  in  the  previous  section. 

•  Methods  for  classifying  the  cognitive  state  data  from  these 
tools  to  allow  mitigation  of  HIP  bottleneck  effects  via  adaptive 
automation  techniques  (e.g.,  AugCog  mitigation  strategies). 

•  An  integration  architecture  that  synchronizes  and  optimizes 
all  necessary  components  via  a  robust  controller  that  maintains
system  stability  (e.g.,  controlling  the  “when”  and  “how”
of  the  adaptive  automation  techniques). 
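
The three elements listed above can be sketched as a small assess-classify-mitigate loop. This is a hedged illustration only: the averaging of normalized gauge readings, the hysteresis thresholds (0.7 to switch mitigation on, 0.4 to switch it off), and all names are assumptions for the example, not the controller used by any IWIIUS team. The two thresholds stand in for the stability requirement, since a single threshold would toggle the interface whenever the estimate hovered near it.

```python
def assess(gauge_readings):
    """Combine normalized gauge readings (0..1) into one workload estimate."""
    return sum(gauge_readings) / len(gauge_readings)


class MitigationController:
    """Decides when mitigation is active, using hysteresis for stability."""

    def __init__(self, on_threshold=0.7, off_threshold=0.4):
        # Hypothetical thresholds; real values would come from calibration.
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.active = False

    def update(self, estimate):
        """Classify the estimate and return whether mitigation is active."""
        if not self.active and estimate >= self.on_threshold:
            self.active = True       # bottleneck detected: start mitigating
        elif self.active and estimate <= self.off_threshold:
            self.active = False      # load has clearly dropped: stop
        return self.active


def run_loop(frames, controller):
    """Feed successive frames of gauge readings through the closed loop."""
    return [controller.update(assess(frame)) for frame in frames]
```

A mitigation strategy such as task offloading or modality switching would be invoked wherever the returned state changes from inactive to active.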

As  summarized  next,  all  industry  teams  in  the  DARPA 
AugCog-based  IWIIUS  program  have  demonstrated  (a)  the  pos¬ 
sibility  of  reliably  identifying  different  cognitive  states  on  the  ba¬ 
sis  of  neuronal  signals,  (b)  that  this  information  can  be  used  to 
drive  mitigation  measures  in  human-machine  interaction  under
closed-loop  conditions,  and  most  importantly,  (c)  that  a  system
making  use  of  these  neurophysiological-based  technologies
is  able  to  significantly  improve  the  cognitive  performance  of  human
operators  in  multiple  task  domains.

FIGURE  62.6.  Essential  elements  of  a  closed-loop  AugCog  system.

Honeywell's  AugCog  Closed-Loop  Efforts 

As  directed  by  DARPA,  the  Honeywell  research  team  ad¬ 
dressed  attention  as  their  primary  HIP  bottleneck  in  the  context 
of  a  ground  infantry  soldier  in  the  U.S.  Army’s  Future  Force  War¬ 
rior  (FFW)  program  (Kobus  et  al.,  2005).  The  FFW  will  have 
more  information  available  (via  physical  sensors,  communica¬ 
tions,  etc.),  and  processing  that  information  in  a  dynamic  and 
lethal  environment  requires  careful  management  of  limited  at- 
tentional  resources.  During  their  Concept  Validation  Experi¬ 
ment  (CVE)  they  demonstrated  a  380%  performance  improve¬ 
ment  where  attention  resources  were  required,  exceeding  the 
100%  improvement  goal  set  by  DARPA.  Honeywell  was  able  to 
correctly  classify  attention  state  changes  more  than  98%  of  the 
time  and  in  less  than  300  milliseconds.  Their  success  was  due 
to  their  use  of  the  following: 

•  Cognitive  state  assessment  tools  (e.g.,  Stress  Gauge,  Arousal
Meter  Gauge,  XLI,  P300  Novelty  Detector,  Engagement  Index,
fNIR,  and  EEG);

•  Data  classification  methods  (e.g.,  spectral  analysis  techniques 
and  linear  regression  on  weighted  means); 

•  An  augmentation  manager  to  drive  mitigation  strategies  (e.g., 
task/information  and  modality  management,  task  offloading, 
and  task  sharing);  and

•  An  agent-based  integration  architecture. 

Honeywell’s  AugCog  S&T  closed-loop  approach  could  pro¬ 
vide  improved  human  performance  and  enhanced  survivability 
for  most  any  command  and  control  system  operator  (e.g.,  sta¬ 
tionary  command  center,  or  individual/mobile  dismounted  sol¬ 
diers),  given  the  core  capability  is  the  availability  of  netted  com¬ 
munications  and  mitigation  strategies  enabling  sharing  and 
real-time  collaboration,  all  of  which  enhances  the  kind  of  situa¬ 
tion  awareness  (SA)  necessary  to  drive  decisive  actions  in  di¬ 
verse  and  stressful  operational  environments. 

DaimlerChrysler's  AugCog  Closed-Loop  Efforts 

DaimlerChrysler  was  directed  by  DARPA  to  address  the  sen¬ 
sory  input  HIP  bottleneck  in  the  context  of  a  Light  Armored 
Vehicle  (LAV)  operator  (Kobus  et  al.,  2005).  A  future  LAV  operator 
will  have  multiple  cognitive  tasks  (e.g.,  communication,  planning, 
command  and  control)  to  manage.  Sensory  input  resources  are 
required  by  these  cognitive  tasks  as  well  as  by  the  primary  job  of 
maneuvering  the  vehicle.  DaimlerChrysler  investigated  the  sensory 
bottleneck  in  the  auditory  and  visual  modalities,  with  multiple 


In  a  forthcoming  AugCog  International  Conference  Proceedings  publication  (Reeves,  Young,  Schmorrow,  &  Stanney,  2007),  AugCog  researchers
and  developers  have  defined  a  robust  controller  to  be  a  technique  that  uses  appropriate  methods  and  applied  theories  to  ensure  real-time  systematic
control  and  stabilization  of  an  AugCog  system  to  significantly  improve  human-system  performance.

mitigation  strategies.  Their  experimental  results  under  real-world
driving  conditions  showed  that  the  sensory  bottlenecks  could  be 
improved  by  as  much  as  108%  with  an  accuracy  of  up  to  98%,  de¬ 
pending  on  the  modality  being  examined.  The  sensory  bottle¬ 
neck  status  could  be  detected  in  as  little  as  200  milliseconds,  and 
mitigations  could  be  invoked  in  as  little  as  0.2  seconds,  depending 
on  the  mitigation  being  used.  Their  AugCog  closed-loop  suc¬ 
cesses  were  due  to  their  use  of  the  following: 

•  Cognitive  state  assessment  tools  (e.g.,  EEG-based  working 
memory  and  visual/auditory  sensory  input  gauges;  a  combi¬ 
nation  of  context,  driving  behavior,  and  seat  posture  sensors). 

•  Data  classification  methods  (e.g.,  Linear  Discriminant  Analysis
[LDA]  and  ANN).

•  An  augmentation  manager  to  drive  mitigation  strategies  (e.g., 
task/information  and  modality  management,  task  scheduling). 

•  An  integration  architecture  based  on  both  stand-alone  gauge 
processors  and  a  vehicle  CAN  data-bus  technology. 

DaimlerChrysler’s  AugCog  S&T  closed-loop  approach  could 
be  used  to  facilitate  the  design  of  most  any  in-vehicle  systems 
that  need  to  be  interoperable  and  adaptable  in  real  time  to  an 
operator’s  changing  cognitive  abilities  and  limitations  instead  of 
the  operator  having  to  adapt  to  inefficiently  designed  vehicles. 
Potential  human  performance  payoffs  for  vehicle  operators 
would  include  (a)  increased  operator  situational  awareness, 
(b)  improved  asymmetric/intuitive  decision  making,  and  (c)  im¬ 
proved  attention  management  under  stress. 

LM-ATL's  AugCog  Closed-Loop  Efforts 

LM-ATL  was  directed  by  DARPA  to  address  working  memory 
as  their  primary  HIP  bottleneck  within  a  Tactical  Tomahawk 
Weapons  Control  System  (TTWCS)  simulation  environment 
(Kobus  et  al.,  2005).  The  essential  TTWCS  operator,  the  Tacti¬ 
cal  Strike  Coordinator  (TSC),  must  manage  a  number  of  mis¬ 
siles,  targets  and  shipboard  launch  platforms,  and  dynamically 
reassign  the  missiles  to  targets  as  critical  targets  pop-up,  missiles 
fail,  and  so  forth.  The  operator  must  recall  and  recognize  far 
more  information  than  can  be  maintained  in  working  memory 
in  order  to  perform  the  task  effectively.  As  part  of  their  CVE 
efforts,  LM-ATL  demonstrated  an  improvement  in  working 
memory  throughput  by  at  least  500%  by  using  an  Intelligent  Se¬ 
quencing  mitigation  strategy  to  strategically  present  related  in¬ 
formation  about  specific  missile-target  pairings  when  the  work¬ 
ing  memory  bottleneck  was  saturated.  Working  memory  status 
(high  or  low  cognitive  workload)  was  correctly  identified  in  over 
90%  of  the  trials,  with  the  sequencing  mitigation  being  invoked 
in  less  than  500  milliseconds.  LM-ATL’s  AugCog  closed-loop  suc¬ 
cesses  were  due  to  their  use  of  the  following: 

•  Cognitive  state  assessment  tools  (e.g.,  EEG-based  working 
memory  gauge,  EDA/GSR-  and  EKG  (ECG)-based
arousal  and  cognitive  workload  gauge,  pupillometry,  and  fNIR).

•  Data  classification  methods  (e.g.,  ANN). 

•  PACE  (Performance  Augmentation  through  Cognitive  Enhancement)
processor  to  manage  tasks  and  enable  mitigation
strategies  (e.g.,  modality  management  and  intelligent
sequencing).

•  An  integration  architecture  based  on  both  the  rule-based 
PACE  architecture  and  an  ANN. 

LM-ATL’s  AugCog  closed-loop  system  was  designed  to  allow 
rapid  transition  of  PACE  into  commercial  and  DoD  acquisition 
activities  and  to  enable  system  operators  to  maintain  a  state  of 
high  alert  without  overtaxing  (or  undertaxing)  their  cognitive 
abilities.  LM-ATL’s  AugCog  S&T  approach  could  provide  im¬ 
proved  human  performance  for  most  any  command  and  control 
system  operator. 

Boeing's  AugCog  Closed-Loop  Efforts 

The  Boeing  team  was  assigned  the  executive  function  HIP 
bottleneck  in  the  context  of  an  Unmanned  Air  Vehicle  (UAV)  systems
operator  task.  Executive  function  is  analogous  to  the  central
processor  in  a  computer,  and  relates  to  the  need  to  process
and  rapidly  assimilate  many  information  elements  for  a  single 
operator  to  manage  multiple  UAVs  (Kobus  et  al.,  2005).  Boeing 
developed  an  array  of  mitigations  that  were  triggered  to  address 
their  primary  cognitive  bottleneck  and  others  (e.g.,  attention, 
sensory  input,  and  working  memory).  With  regard  to  their  tar¬ 
geted  bottleneck,  they  empirically  showed  a  241%  performance 
improvement  in  executive  function-related  tasks,  with  a  classi¬ 
fication  accuracy  of  92%  in  less  than  one  second.  The  mitiga¬ 
tions  were  also  determined  and  triggered  in  less  than  one  sec¬ 
ond.  The  Boeing  team’s  AugCog  closed-loop  successes  were 
due  to  their  use  of  the  following: 

•  Cognitive  state  assessment  tools  (e.g.,  body  position/posture 
tracking,  Arousal  Meter  Gauge,  NuWAM  combined  EEG,  ECG, 
EOG  sensors,  fNIR/SPC,  EEG/ANN,  and  pupillometry). 

•  Data  classification  methods  (e.g.,  SPC  and  ANN). 

•  An  augmentation  manager  to  drive  mitigation  strategies  (e.g., 
task/information  and  modality  management,  task  scheduling/ 
sequencing,  and  map  declutter).

•  An  integration  architecture  in  which  sensor/gauge  data  are  collected
via  a  networked  database  and  used  to  trigger  rule-based
mitigation  strategies.

Boeing’s  AugCog  closed-loop  S&T  approach  could  be  ap¬ 
plicable  to  most  any  command  and  control  system,  not  just  the 
UAV  AugCog  prototype  workstation,  for  improving  operator 
performance.  Their  closed-loop  AugCog  system  was  designed 
for  rapid  transition  to  other  domains  (e.g.,  Air  Traffic  Management
workstations,  Airborne  Warning  and  Control  System
[AWACS],  and  Airborne  Early  Warning  and  Control  [AEW&C]).

QinetiQ’s  AugCog  Closed-Loop  Efforts 

The  latest  team  member  added  to  the  DARPA  AugCog  pro¬ 
gram  was  QinetiQ,  whose  closed-loop  AugCog  approach  has 
been  implemented  in  two  different  contexts:  (a)  a  military
fast-mover  jet  simulation  environment,  and  (b)  a  simulated  UAV
command  and  control  task  environment,  where  their  AugCog
S&T  approach  was  planned  to  be  adapted  to  support  the  Boe¬ 
ing  industry  team’s  AugCog  efforts  involving  their  UAV  com¬ 
mand  and  control  task  domain  (Dickson,  2005a;  Schmorrow  & 
Kruse,  2005).  QinetiQ’s  system  is  referred  to  as  the  Cognitive 
Cockpit  (COGPIT).  It  was  developed  in  support  of  the  vision  of 
a  cockpit  that  allows  a  plot  to  concentrate  skills  towards  the  rel¬ 
evant,  critical  mission  event,  at  the  appropriate  time  and  to  the 
appropriate  level,  by  ensuring  that  the  correct  balance  of  skill 
and  knowledge-based  tasks  are  appropriately  divided  between 
and  among  the  system  and  the  operator  (Dickson,  2005b;  Tay¬ 
lor,  Howells,  &  "Watson,  2000).  QinetiQ  uses  a  variety  of  tech¬ 
niques  to  support  the  dynamic  yet  stable  distribution  of  tasks 
between  the  system  and  operator  (e.g.,  attentional  management, 
adaptive  interfaces,  adaptive  automation,  decision  support,  task 
scheduling,  and  context-sensitive  support  of  responses  from  in¬ 
teractions  with  cockpit  controls).  The  success  of  QinetiQ’s 
seamless  cockpit  prototype  system,  COGPIT,  to  be  able  to  assess 
and  accommodate  a  pilot’s  cognitive  task  load  within  challeng¬ 
ing  ground  to  air  attack  scenarios,  is  due  to  their  advanced 
AugCog  S&T  approach  including  the  use  of  the  following  (Dick¬ 
son,  2005a,  2005b): 

• Cognitive and context state assessment tools (e.g., CogMon).

•  Proven  and  accurate  data  classification  methods. 

• A decision support system whose knowledge base is derived
from structured interviews with subject matter experts (SMEs),
tactical manuals, and rules of engagement.

•  An  adaptive  interface  manager  (e.g.,  the  Tasking  Interface 
Manager  [TIM]),  which  is  based  on  an  adaptive  automation 
framework,  the  Pilot  Authorization  and  Control  of  Tasks 
(PACT;  Bonner,  Taylor,  Fletcher,  &  Miller,  2000). 

•  A  flexible,  modular  agent-based  integration  architecture. 

QinetiQ’s  approach  represents  a  culminating  effort  that 
brings  together  all  the  tenets  and  the  vision  of  the  AugCog  field 
into  a  single,  fully  integrated  closed-loop  system.  Their  approach 
could  be  used  as  a  model  for  most  any  closed-loop  application 
domain, as well as for other HCI design goals. The next section
addresses how AugCog S&T emerging and maturing from QinetiQ’s,
as well as the other four main industry teams’, DARPA program
efforts may be used to enhance the HCI design lifecycle and
provide more diagnostic system evaluation capabilities.


Improving  HCI  System  Design  and  Evaluation  Capabilities 

The  ability  to  build  systems  truly  tailored  to  a  user’s  HIP  needs 
and  capabilities  is  now  possible.  Based  on  the  review  of  AugCog 
technologies  presented  in  the  previous  sections,  it  is  evident 
that  HCI  practitioners  now  have  available  to  them  various  phys¬ 
iological-  and  neurophysiological-based  tools  and  techniques 
that may be used to better specify both the design and evaluation
of human-computer systems. LM-ATL, for instance, as part
of  the  Office  of  Naval  Research’s  (ONR)  Disruptive  Technologies 
Opportunity  Fund  (DTOF)  program,  is  currently  exploring  ways 
to  leverage  some  of  their  AugCog  technologies  as  tools  for  im¬ 
proving  future  TTWCS  interface  designs  and  establishing  more 
diagnostic  system  evaluation  (e.g.,  usability)  methods  for  inte¬ 
gration  into  their  system  design  lifecycle. 

Using  cognitive  state  gauges,  LM-ATL  will  be  able  to  assess  in 
real  time  the  load  on  various  HIP  bottlenecks  while  users  inter¬ 
act  with  prototyped  simulations  or  existing  systems — instead  of 
having  to  predict  potential  overload  issues  and/or  infer  poten¬ 
tial  HIP  workload  issues  based  on  resulting  human  performance 
effects, as in traditional HCI design and evaluation studies. The
neurophysiologically enhanced studies may involve either (a)
closed-loop systems, where potential alternate adaptive strategies
are compared in terms of their HIP workload and subsequent human
performance effects in real time, or (b) open systems, where an
existing interface design’s efficiency and effectiveness are
assessed in real time in terms of HIP workload and subsequent
human performance effects as a user interacts with it.

Almost any of the neurophysiological tools and techniques
implemented by the DARPA IWIIUS industry teams could be applied
to the design and evaluation of almost any human-computer system,
resulting in more rapid, effective, and eventually less expensive
HCI design and evaluation processes.
Such  benefits  could  also  be  achieved  in  training  system  appli¬ 
cation  areas,  as  discussed  next. 

Training  Systems  and  Operator  Selection 

Given the promising results realized thus far by the AugCog
industry teams, the ability to better understand and tap the
neurophysiological correlates of cognitive state holds great
promise for benefiting human performance, training, and education
(HPTE) efforts through ONR’s HPTE Program. In particular, it
offers the ability to tailor training approaches to the needs and
capabilities of individuals within any given group and to
dynamically adapt those approaches, potentially in real time,
during actual training sessions. This section reviews
particular  ONR  Cognitive  Performance  Enhancement  (CPE)  ini¬ 
tiatives  and  future  directions  in  potential  operator  selection 
methods  that  may  be  directly  impacted  by  AugCog  S&T  that  has 
emerged  from  the  DARPA  program  and  the  Augmented  Cogni¬ 
tion  field  in  general. 

CPE Initiative^

The  CPE  efforts  (as  part  of  ONR’s  HPTE  program)  focus  on 
developing  training  systems  and  novel  training  principles  that 
accelerate  and  enhance  the  training  progression  (e.g.,  novice 
to  expert)  of  both  individual  Marines  and  teams  of  Marines 
working  in  decision  making  environments.  The  main  objective 
is to develop tools to support real-time cognitive and behavioral
assessment and improvement of individuals and teams during
operations and training. Implicit is the need to develop systems
that more effectively screen individuals based on their
decision-making ability or their capability for mastery of
decision-making skills. Addressing such a need could substantially
reduce the training time and costs currently required to get
military personnel to deploy-ready levels.

^Adapted from 2005 ONR HPTE Program Review brief by Schmorrow and
Stripling and various performer briefs, CPE program briefs, and the
following preliminary AMSUS abstract by Stripling, Cohn, & Kruse:
Cognitive Performance Enhancement: Leveraging Neuro-cognitive
Principles to Accelerate the Transition from Novice to Expert.

Assessing  training  progression.  There  are  several  po¬ 
tential  sources  of  objective  metrics  that  could  be  used  to  sup¬ 
port  adaptive  training  and  assessment  of  deploy-readiness  of 
military personnel. A promising yet largely untapped source of
such metrics is neurophysiological data that are processed in
real time to identify cognitive states and stages of development
important to the training process. Years of basic research have
identified numerous markers of attention, engagement, and
cognitive load that systematically change as task proficiency
develops. Various noninvasive brain monitoring technologies
have been successfully applied to the challenge of documenting
the transformation from novice to expert in a variety of
domains, including identifying indices of skill level in basic
laboratory tasks and marksmanship (Ciesielski & French, 1989;
Deeny, Hillman, Janelle, & Hatfield, 2003; Kerick, Douglass, &
Hatfield, 2004); identifying indices of skill acquisition in
computer games (Smith, McEvoy, & Gevins, 1999); and detecting
the progression towards automaticity of syntactic processing
(Gunter & Friederici, 1999). Recent studies have also
documented brain activity patterns that are indicative of a shift in
cognitive  processing  from  focused  conscious  effort  to  “auto¬ 
matic”  processing  (Floyer-Lea  &  Matthews,  2004;  Peres  et  al., 
2000).  This  shift  to  automatic  processing  may  occur  after  task 
performance  has  leveled  off  and  is  significant  because  it  marks 
the  point  at  which  minimal  cognitive  effort  is  required  to  per¬ 
form  the  task  well.  This  transition  may  serve  as  an  indication 
of  the  point  at  which  the  trainee  is  ready  to  learn  additional 
components  of  the  task  or  be  taught  higher  level  strategies 
within  the  particular  domain. 
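
One way such a transition point might be flagged is sketched
below, under stated assumptions: task performance scores and a
cognitive workload gauge are both available once per training
trial, a plateau is approximated by comparing the means of two
adjacent windows, and the shift to automatic processing is
approximated by a drop in the workload gauge relative to early
training. The function names, the window size, and the threshold
values are hypothetical and are not drawn from the cited studies.

```python
from statistics import mean
from typing import Sequence


def has_plateaued(scores: Sequence[float], window: int = 5,
                  tolerance: float = 0.02) -> bool:
    """True when mean performance over the last window differs from the
    mean over the preceding window by no more than the tolerance."""
    if len(scores) < 2 * window:
        return False
    recent = mean(scores[-window:])
    previous = mean(scores[-2 * window:-window])
    return abs(recent - previous) <= tolerance


def ready_for_next_stage(scores: Sequence[float], workload: Sequence[float],
                         window: int = 5, tolerance: float = 0.02,
                         effort_ratio: float = 0.6) -> bool:
    """True when performance has leveled off and the workload gauge has
    fallen to at most effort_ratio times its early-training level."""
    if len(workload) < 2 * window:
        return False
    if not has_plateaued(scores, window, tolerance):
        return False
    early_effort = mean(workload[:window])
    recent_effort = mean(workload[-window:])
    return recent_effort <= effort_ratio * early_effort
```

Requiring both conditions reflects the point made above: the
workload drop is only meaningful as a marker of automaticity once
performance itself has stopped improving.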

The  identification  of  neurophysiological  measures  that  cor¬ 
relate  with  task  and  skill  mastery  is  of  value  for  three  practical 
reasons: 

1.  As  described  above,  it  could  serve  as  the  basis  for  real-time 
closed-loop  training  systems  that  adapt  to  the  individual’s 
current  state  of  task  proficiency  and  automaticity;  this  appli¬ 
cation  would  have  the  most  profound  impact  on  the  training 
capabilities,  but  is  not  ready  for  widespread  distribution. 

2.  Prior  to  widespread  distribution  for  closed-loop  training, 
AugCog-based  technologies  could  be  used  to  evaluate  alter¬ 
native  training  strategies  and  systems  to  identify  those  that 
provide  the  most  effective  learning  environment.  This  ap¬ 
plication  is  potentially  feasible  in  the  near  future,  but  would 
require  more  upfront  research  to  confirm  the  neurophysio¬ 
logical  patterns  associated  with  task  mastery  are  applicable 
to  more  complex  task  environments. 

3.  During  the  course  of  this  type  of  research,  novel  training 
strategies  may  emerge  that  are  validated  by  laboratory  ob¬ 
servations  of  accelerated  task  mastery  and  automaticity,  but 
which  do  not  rely  on  real-time  monitoring  of  brain  activity 
to  enact.  Such  observations  would  potentially  be  immedi¬ 
ately  transferable  to  instructional  environments. 


Using AugCog technologies, such as the sensors (e.g., EEG, ECG,
fNIR, pupillometry) and gauges (e.g., EEG-, fNIR-, or EROS-based
cognitive workload gauges) employed by the AugCog industry
teams, to assess how novices become experts or to identify
when a soldier may be deploy-ready could significantly benefit
almost any HPTE effort, including:

•  Enhanced  instructor  understanding  of  a  trainee’s  capabili¬ 
ties  and  limitations 

-  More  suitable  operator  assignments  to  match  skill  levels 

-  Ability  to  predict  fast  learners  early  during  training 

-  More  timely  delivery  of  information  provided  to  trainee 

-  Increased  learning  capabilities  (e.g.,  via  an  increase  of  an  in¬ 
dividual’s  cognitive  processing  capabilities  during  training) 

•  More  expertly  trained  and  mentally  prepared  operators 

•  Enhanced  training  acquisition 

•  More  effective  training  per  unit  time  (reduce  required  finan¬ 
cial  investment) 

•  Enhanced  development  of  effective  individual  and  team  per¬ 
formance  (possibly  in  real  time) 

Assessing  individual  potential  (e.g.,  AugCog-enabled 
screening  tools).  Similar  AugCog  S&T  approaches  used  for 
assessment  of  training  progression  could  also  be  used  to  deter¬ 
mine  an  individual’s  cognitive  potential  in  general  or  their  po¬ 
tential  to  reach  a  certain  skill  level  in  a  particular  task  domain. 
For  instance,  the  military  has  been  using  screening  tools  for 
decades  to  select  for  fighter  pilots  before  thousands  of  dollars 
and  man  hours  are  wasted  in  training  someone  who  may  never 
be cognitively equipped to attain the necessary skill level
required for such a domain. AugCog-enabled screening tools
could be used for similar purposes to improve the diagnostic
capabilities of existing screening strategies.

Assessment  of  capabilities  via  an  extensive  neurophysiological 
and  neuropsychological/psychometric  components  skills  battery 
would  be  required  in  such  an  effort.  An  example  of  such  an 
approach  could  be  to  determine  the  neural  correlates  of  the 
Knowledge,  Skills  and  Attitudes  [and/or  Abilities]  (KSAs)  re¬ 
quired  for  a  particular  task  domain  and  training  level.  Individuals 
with the greatest human performance results relative to the
required KSAs for the particular task domain could be used as
the expert model against which to compare other individuals’
neurophysiological indicators and patterns. This approach would also
need  to  leverage  the  novice-to-expert  training  progression  as¬ 
sessment  techniques  discussed  above  to  establish  the  KSA 
neuro-level  correlates  along  the  entire  training  progression  con¬ 
tinuum  (see  Fig.  62.7)  for  necessary  comprehensive  comparative 
evaluations.  Individuals  could  be  assessed  initially  to  determine 
a  priori  how  suited  they  might  or  might  not  be  for  a  particular 
task  domain  (e.g.,  aptitude  assessment).  They  could  later  be  as¬ 
sessed  to  determine  how  their  training  is  progressing  at  both  the 
human  performance  and  neuro  levels.  The  KSA  approach  is  only 
a  recommendation,  as  other  neuro-level  correlates  of  perfor¬ 
mance  and  aptitude  may  also  be  used.  Specific  benefits  of  any 
neuro-level  screening/selection  approach  could  include: 

• Identifying which individuals may be most apt to reach
deploy-ready or expert levels for a given task domain and/or
who may never have the “aptitude” to get to required levels.

FIGURE 62.7. Conceptual model for leveraging neural correlates
of training levels.


•  Identifying  operators  with  the  greatest  capacities  for  infor¬ 
mation  processing  and  decision  making,  and  then  assigning 
them  to  critical  information  operations. 

•  Identifying  users  who  could  be  trained  to  deployable-  or 
expert-level  more  quickly  than  others. 

•  Reassessing  trained  individuals  to  see  when  retraining  may  be 
necessary  (e.g.,  to  reacquire  necessary  KSA  levels). 

The next section reviews the lessons learned to date from the
DARPA IWIIUS program industry teams in pursuit of their HCI goal
to build closed-loop operational AugCog systems. Such issues may
also directly impact the other main HCI application areas
(system design and evaluation and training system applications).


LESSONS  LEARNED  AND  FUTURE  DIRECTIONS 


AugCog  technologies  are  at  various  levels  of  maturity.  Thus,  for 
the  promises  of  the  AugCog  S&T  to  be  fully  realized,  HCI  prac¬ 
titioners  need  to  consider  the  state  of  current  technologies  and 
the  lessons  learned  from  the  DARPA  IWIIUS  program  researchers 
and  developers  to  date.  To  facilitate  an  assessment  of  the  state 
of  the  field  of  AugCog  to  date,  workshops  involving  principal 
AugCog  scientists  and  engineers  were  organized  at  the  March 
2005  DARPA  AugCog  Program  Review  and  at  the  Augmented 
Cognition  International  Conference  in  Las  Vegas,  Nevada  in  July 
of  2005.  The  summary  papers  resulting  from  the  first  workshop 
addressed  AugCog  S&T  maturity  levels  and  lessons  learned  for 
various  areas  and  are  summarized  as  follows  in  conjunction  with 
the industry teams’ DARPA IWIIUS program reports (Kobus
et al., 2005).

The  need  to  understand  the  user.  A  cognitive  task 
analysis  should  be  conducted  in  order  to  fully  understand  the 
decision-making  requirements  of  the  operator.  Having  a  clear 
understanding  of  who  the  user  is,  as  well  as  thoroughly  under¬ 
standing  what  they  do,  is  essential  for  successfully  constructing 
future  AugCog  technologies. 

Integrate  early,  integrate  often.  The  earlier  and  more 
thoroughly  system  integration  (e.g.,  of  the  various  sensors, 
computer  hardware  and  software)  was  considered,  the  fewer 
the  setbacks  and  more  effective  the  mitigations.  Slight  modifi¬ 
cations  in  sensors  should  not  be  treated  superficially  and  re¬ 
quire  systematic  assessment  throughout  development. 


Development  strategy.  A  project  execution  model  that 
effectively  balances  a  mix  of  execution  strategies,  where  a  dis¬ 
tinction  is  made  between  developmental  experimentation  and 
formal empirical testing, seems best. AugCog industry teams
that were highly successful conducted step-wise assessments
identifying the contribution of each individual modification of
sensor/gauge implementation.

One sensor/gauge does not fit all. Some of the gauges
that  were  thought  to  be  the  most  robust  during  Phase  1  of  the 
DARPA  IWIIUS  program  were  found  to  be  less  effective  and/or 
useful  when  they  were  applied  to  the  various  operational  re¬ 
quirements  in  Phases  2  and  3.  Conversely,  other  gauges  that 
were  considered  “marginal”  during  the  Phase  1  technical  inte¬ 
gration  experiment  (TIE)  have  provided  very  effective  results 
and  operational  utility.  Further,  gauges  that  seemed  to  work  well 
for  one  of  the  teams  in  their  application  sometimes  worked 
poorly  for  other  teams  within  their  application.  In  other  words, 
the  selection  of  specific  gauges  for  use  within  the  operational 
environment  is  highly  context  (task)  dependent. 

Cognitive  bottlenecks  may  be  task  specific.  The  no¬ 
tion  of  isolated  cognitive  bottlenecks  has  significant  appeal  from 
a  theoretical  standpoint;  however,  it  appears  that  when  applying 
these  concepts  to  real-world  tasks,  they  are  often  overlapping  or 
interacting  in  nature.  It  might  be  more  useful  to  define  bottle¬ 
necks  and  gauges  in  terms  of  the  tasks  that  are  to  be  mitigated 
vice  attempting  to  address  conceptual  bottlenecks  that  are  not 
adequately  defined  in  terms  of  the  operational  task.  It  should  be 
noted  that  regardless  of  the  operational  definition  of  a  specific 
bottleneck,  the  net  result  of  the  mitigation  strategies  employed 
by  AugCog  teams  improved  overall  task  performance. 

Artifact detection and correction. An urgent concern for
the future advancement of AugCog S&T is the ability to
distinguish physiological changes related to cognitive activity
from those related to the physical requirements of the task. This
issue may be of even greater concern for applications involving
mobile users. Honeywell, for instance, noted that in applications
involving mobile users, sensors can become loose and provide
intermittent signals, and the data are directly affected by
motion (even simple actions such as blinks can have large EEG
signal effects). Such systems need to reduce the effect of motion
artifacts in order to continue to provide adequate estimates of
cognitive state.
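
A very simple form of artifact detection is sketched below as an
illustration. It assumes the EEG signal has already been cut into
short epochs of amplitude samples in microvolts and rejects any
epoch whose peak-to-peak amplitude exceeds a fixed limit, as a
blink or a loose sensor might cause. The function names and the
100 microvolt default are assumptions made for the example; they
are not the method used by Honeywell or any other team, and a
threshold of this kind cannot by itself separate cognitive from
physical causes of a signal change.

```python
from typing import List, Sequence, Tuple


def peak_to_peak(epoch: Sequence[float]) -> float:
    """Difference between the largest and smallest sample in an epoch."""
    return max(epoch) - min(epoch)


def split_epochs(epochs: Sequence[Sequence[float]],
                 max_peak_to_peak_uv: float = 100.0
                 ) -> Tuple[List[Sequence[float]], List[int]]:
    """Separate usable epochs from suspected artifacts.

    Returns the epochs that pass the check and the indices of those that
    fail. Empty epochs (e.g., from an intermittent sensor) are rejected.
    """
    clean: List[Sequence[float]] = []
    rejected: List[int] = []
    for index, epoch in enumerate(epochs):
        if len(epoch) == 0 or peak_to_peak(epoch) > max_peak_to_peak_uv:
            rejected.append(index)
        else:
            clean.append(epoch)
    return clean, rejected
```

Keeping the indices of rejected epochs, rather than silently
dropping them, lets a downstream gauge know how much of its input
was discarded.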

Perceived stability/trust. An issue with stability and
predictability of AugCog mitigation strategies concerns the use of
a  combination  of  physiologically  based  gauges  in  conjunction 
with  context-based  sensors.  If  an  operator  does  not  understand 
the  logic  (e.g.,  the  rationale  for  triggering  the  onset/offset  of  the 
mitigation),  there  is  a  significant  potential  for  disruption  and  de¬ 
graded  performance.  Another  issue  relates  to  the  impact  of  de¬ 
graded  sensors.  If  users  are  not  aware  of  the  degraded  func¬ 
tionality  of  the  system,  their  expectations  of  augmentation 
would  not  be  met,  and  the  system  may  be  perceived  as  unstable. 
Before  such  a  system  is  deployed,  training  strategies  will  need  to 
be  developed  to  allow  users  to  learn  how  to  operate  in  the  face 
of degraded augmentation.

Timing of mitigation strategies. Many of the changes
in  perceived  cognitive  load  may  be  momentary.  The  physiolog¬ 
ical  measures  being  used  as  gauges  by  the  industry  teams  may 
detect either a transient or a sustained change in cognitive
activity. It may be unrealistic, or at least have little practical
application, to have a system that is sensitive enough to detect
changes and implement mitigations within seconds. It is clear that
in the operational environment, momentary changes in one or more
of the cognitive bottleneck areas may occur rapidly (< 1 minute).
More  research  needs  to  be  conducted  to  investigate  optimal 
timing  for  implementing  various  mitigation  strategies. 
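
One candidate timing scheme, offered only as a sketch, is to
combine hysteresis with a dwell requirement so that a mitigation
responds to sustained rather than transient changes. The function
name, the two thresholds, and the dwell length below are
hypothetical; appropriate values would have to come from the kind
of research called for above.

```python
from typing import Iterable, List


def mitigation_states(load_samples: Iterable[float],
                      on_threshold: float = 0.8,
                      off_threshold: float = 0.6,
                      dwell: int = 3) -> List[bool]:
    """Return the mitigation on/off state after each gauge sample.

    The mitigation turns on only after `dwell` consecutive samples at or
    above on_threshold, and turns off only after `dwell` consecutive
    samples at or below off_threshold.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    active = False
    streak = 0
    states: List[bool] = []
    for load in load_samples:
        if not active:
            # Count consecutive high samples; any lower sample resets the count.
            streak = streak + 1 if load >= on_threshold else 0
        else:
            # Count consecutive low samples; any higher sample resets the count.
            streak = streak + 1 if load <= off_threshold else 0
        if streak >= dwell:
            active = not active
            streak = 0
        states.append(active)
    return states
```

The gap between the two thresholds keeps the mitigation from
toggling when the gauge hovers near a single cutoff, which also
bears on the perceived stability issue discussed earlier.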

Mitigation  strategy  principles.  Mitigation  strategies 
to  date  have  been  designed  to  address  a  specific  HIP  bottleneck 
to  achieve  specific  performance  objectives  (e.g.,  avoid  overload, 
increase  throughput,  offload,  adapt  to  individual  differences). 
All teams found that each mitigation strategy is beneficial in
some situations but not in others, and thus must be regulated by
human physiological and neurophysiological indicators such that
it does not undermine but rather enhances human performance.
Consequently, the following mitigation strategy principles may be
useful for future closed-loop AugCog development efforts:

•  Pursue  only  those  mitigation  strategies  that  affirm  the  goals 
of  AugCog  (e.g.,  those  that  extend,  by  an  order  of  magnitude 
or  more,  the  information  management  capacity  of  the  human- 
computer  integral) . 

•  Seek  only  to  implement  mitigation  strategies  based  on  objec¬ 
tive  and  scientifically  valid  human  performance  assessment. 

•  Pursue  only  those  mitigation  strategies  that  are  operationally 
feasible  (either  now  or  in  the  foreseeable  future) . 

•  Identify  how/when  individual  differences  will  affect  appro¬ 
priate  mitigation  strategy  choices  and  determine  how  to 
manage  these  differences  such  that  human  performance  is 
enhanced  for  all. 

•  Enhance  the  effectiveness  of  mitigation  strategy  implementa¬ 
tion  by  identifying  which  strategies  may  be  user,  context,  or 
domain  dependent  and  which  are  generalizable  across  these 
dimensions. 

•  Leverage  the  Arts  to  develop  truly  innovative  mitigation 
strategies,  particularly  as  AugCog  S&T  is  applied  to  new  dis¬ 
play  devices  (e.g.,  PDAs,  augmented  reality  systems,  etc.). 

Hardware  integration.  Results  in  Phase  2  by  the  four 
development  teams  indicated  that  both  the  underlying  phe¬ 
nomena  that  the  gauges  are  based  on,  as  well  as  the  artifacts 
that  need  to  be  removed  for  reliable  gauge  performance,  may 
require  fielded  systems  to  be  calibrated  before  each  use.  Future 
efforts will need to consider how to streamline the calibration
process and make it feasible under field conditions. Further, all
teams  noted  the  need  to  address  making  AugCog  technologies 
more  easily  donned  and  readily  calibrated.  Significant  consid¬ 
eration  must  also  be  given  to  potential  interference  sources  in 
the  field  (particularly  when  wireless  devices  are  used),  and  po¬ 
tential  design  solutions  that  could  avoid  such  issues  in  the  first 
place.  Integrated  sensors  designed  to  be  used  in  the  field  from 
the outset are more likely to adequately address this issue than
technologies adapted from laboratory or medical applications.
System  integration  architectures  will  need  to  be  refined  and 
streamlined  in  order  to  improve  new  sensor  integration  and  to 
reduce  the  processing  demands  on  any  emerging  sensor  inte¬ 
gration  requirements. 

Context  gauges.  Most  of  the  AugCog  S&T  developments 
within  the  IWIIUS  program  focused  on  cognitive  state  gauges.  It 
is  becoming  evident  that  these  are  necessary,  but  not  sufficient, 
for  effective  AugCog  systems.  Each  of  the  teams  identified  the 
need  for  “context  gauges”  (e.g.,  algorithms  and  sensors  that 
would  allow  the  augmentation  system  to  have  inputs  regarding 
what  is  happening  in  the  environment  in  which  the  user  is 
interacting) .  Such  inputs  could  be  critical  in  determining  which 
of  several  mitigations  might  be  applied,  as  well  as  when  it  is 
appropriate  to  turn  off  the  mitigation. 

Individual  differences.  The  teams  reported  significant 
issues  with  both  interindividual  and  intraindividual  differences, 
particularly  for  the  EEG-derived  gauges.  Another  aspect  of  the 
underlying  EEG  phenomena  that  is  poorly  understood  is  the 
effect  of  extensive  experience  with  the  tasks  on  the  utility  of 
EEG-derived gauges. It appears that successful utilization of
EEG data in an AugCog system may require that the filters used
in separating signals from artifacts be tailored to every user
and even be dynamically adaptable during the course of use of
the AugCog system.
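
As a sketch of what per-user tailoring could look like in its
simplest form, the class below keeps a running mean and standard
deviation of one user's own gauge readings (using Welford's online
algorithm) and expresses each new reading relative to that
personal baseline. The class and method names are hypothetical,
and this is not a description of any team's filter. A system that
must track drift within a session would likely weight recent
samples more heavily, which this sketch does not do.

```python
import math


class AdaptiveBaseline:
    """Running per-user mean and sample standard deviation of a signal."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new reading into the baseline (Welford's algorithm)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation; 0.0 until two readings are seen."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def z_score(self, value: float) -> float:
        """Distance of a reading from this user's mean, in their own
        standard deviations; 0.0 while the spread is still unknown."""
        spread = self.std
        return (value - self.mean) / spread if spread > 0 else 0.0
```

Because each user carries a separate baseline object, the same
downstream threshold can be applied to z-scores across users even
when their raw signal levels differ.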

Proactive  vice  reactive  augmented  cognition.  AugCog 
closed-loop  systems  developed  to  date  are  largely  reactive  in 
that  the  user  must  first  become  overloaded  (or  at  least  close  to 
overloaded)  before  the  system  will  invoke  a  mitigation  strategy. 
Although  it  is  important  to  detect  when  such  levels  of  activity 
occur,  it  would  be  of  great  operational  importance  to  develop 
systems  that  are  also  predictive  (proactive)  rather  than  simply 
reactive. 
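
A minimal illustration of the predictive idea, under the
assumption that a workload gauge is sampled at regular intervals,
is to fit a straight line to the most recent readings and
extrapolate it a few samples ahead, invoking a mitigation when the
projected value crosses the overload threshold. The function
names, the horizon, and the threshold are hypothetical, and a
linear trend is only one of many possible predictors.

```python
from typing import Sequence


def predict_load(samples: Sequence[float], horizon: int) -> float:
    """Fit a least-squares line to equally spaced samples and extrapolate
    it `horizon` steps beyond the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + horizon)


def should_mitigate_early(samples: Sequence[float], horizon: int = 5,
                          threshold: float = 0.8) -> bool:
    """True when the projected load reaches the threshold within the horizon,
    even if the current load is still below it."""
    return predict_load(samples, horizon) >= threshold
```

For example, a steadily rising series such as 0.40, 0.45, 0.50,
0.55, 0.60 would trigger the early mitigation with the default
settings even though no single reading has yet reached the
threshold.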

Metrics.  All  teams  had  difficulty  in  determining  the  ap¬ 
propriate  level  of  analysis  for  assessing  the  impact  of  their 
AugCog closed-loop system. The difficulty stemmed from deciding
whether to assess the effectiveness of detecting changes in the
cognitive bottleneck gauges, the effectiveness of the mitigations,
or the effectiveness of the overall human-machine system in
performing a task. Thus, the definition and use of appropriate
metrics for AugCog-enabled systems will continue to be an
important issue.

Prior  to  and  during  the  second  workshop  held  at  the 
AugCog  International  conference,  key  scientists  and  engi¬ 
neers  provided  an  initial  survey-based  assessment  of  the  field 
of  AugCog.  Overviews  of  the  survey  and  workshop  results  are 
presented  in  upcoming  conference  proceedings  publications 
(Reeves,  Schmorrow,  &  Stanney,  2007;  Reeves,  Young, 
Schmorrow,  &  Stanney,  2007).  The  survey  and  workshop  fo¬ 
cused  on  achieving  consensus  on  the  definition  of  AugCog 
and the various components that comprise its R&D base, as
well as identifying impediments to the field’s current and future
R&D.  Some  of  the  survey’s  and  workshop’s  key  results  are 
summarized next.


Cognitive State Sensors

There  were  many  varied  issues  reported  regarding  important 
challenges  to  the  effective  design  of  cognitive  state  sensors,  all 
of  which  culminated  in  the  main  theme  that  the  basic  sensor 
technology  on  which  AugCog  technology  is  based  still  requires 
basic  and  applied  research  in  many  areas,  including: 

•  Design  of  the  sensors  for  ease  of  use,  calibration,  appropriate 
resolution/sensitivity,  noise  cancellation,  and  less  invasiveness. 

•  Better  understanding  of  neuro,  psych,  and  cognitive  theo¬ 
ries  that  should  be  driving  sensor  placement,  data  analysis, 
and  subsequent  “cognitive  load”  and/or  “cognitive  state” 
gauge  derivation. 

•  Determining  appropriate  experimental  techniques  in  applied 
task  settings  to  assess  effectiveness  of  sensors  to  accommo¬ 
date  both  general  use  settings  and  individual  differences 
across  task  domains. 

Many  of  these  same  issues  were  reiterated  when  discussing 
important  challenges  to  the  effective  use  of  cognitive  state  sen¬ 
sors,  with  wearability,  accounting  and  accommodating  for  day- 
to-day  and  individual  user  variability,  and  the  development  of 
appropriate  signal  processing  algorithms  and  their  interpreta¬ 
tions  being  additional  points  of  concern.  When  discussing  im¬ 
portant  challenges  to  the  effective  adoption  of  cognitive  state 
sensors,  many  of  the  same  issues  were  again  highlighted,  with 
user  acceptance  (e.g.,  ease  of  use,  comfort,  safety,  trust,  control, 
etc.)  and  fieldability  (with  value-added  performance  benefits 
and  measurable  ROI)  being  key  points. 


Mitigation  Strategies 

Challenges identified for each of the main facets regarding the
effective design, use, and adoption of AugCog mitigation
strategies included:

•  User,  system,  and  task  context  must  all  be  accounted  for 
when  determining  when  and  how  to  apply  a  mitigation  strat¬ 
egy,  and  systems  must  be  flexible  and  adaptable  enough  (and 
predictive)  to  meet  sometimes  rapid  and  dynamically  chang¬ 
ing  needs  of  all  these  contexts,  thereby  having  strategies  that 
are  both  specialized  and  generalizable  when  appropriate. 

• Mitigation strategies need to be designed based on solid
theoretical and applied foundations (e.g., cognitive psychology,
neuroscience, interface design), where strategies have been
proven to affect what they should (e.g., cognitive state,
bottlenecks, and subsequent user performance) and have been
proven to be controllable in terms of potential negative
consequences when they are then turned off.

•  Traditional  user  interface  and  usability  concerns  still  apply, 
such  as  consistency,  ease  of  use,  smooth  state  transitions 
(e.g.,  mitigation  on/off),  reliability,  user  trust,  and  so  forth. 

•  A  methodological  approach  to  both  design  and  implementa¬ 
tion  of  AugCog  mitigation  strategies  is  needed  for  both  spe¬ 
cialized  (e.g.,  task-specific)  and  general  use  conditions. 


•  Sponsors  and  investors  need  to  see  consistent  proof  (e.g.,  ex¬ 
perimental  data,  case  studies,  field  uses,  etc.)  of  how  all  the 
above  challenges  are  being  understood,  met,  and  overcome 
and  that  benefits  of  applying  mitigation  strategies  can  be 
maximized  while  costs  are  minimized  in  order  to  justify  fund¬ 
ing  more  R&D  that  is  necessary  to  attain  proven,  robust  and 
matured  mitigation  strategies. 

• It should be demonstrated how the mitigation strategies are
“AugCog” relevant and not just better HCI.


Robust  Controllers 

The most important challenges to the effective design, use, and
adoption of robust controllers were also noted and included the
following:

• Given the complexity of HIP, mathematical approaches are
needed but are not the sole answer to developing robust
controllers, as there is a need for real-time user and system
models based on complex human biology and physiology,
as well as a need for task models, which take all contexts into
account.

•  More  basic  and  applied  research  is  needed  to  develop  suffi¬ 
ciently  comprehensive  and  accurate  models  of  the  compo¬ 
nents  (e.g.,  user,  task,  system  contexts)  that  are  to  be  con¬ 
trolled,  as  well  as  determining  how  to  control  them  with 
appropriate  approaches,  be  they  mathematical  or  otherwise. 

• Any robust controller should be stable and seamless, such that
a user is unaware of when a controller is being used yet trusts
the system when subsequent effects are noticed (e.g., mitigation
turned on/off) and therefore benefits from, rather than suffers
from, the effects.

•  Both  users  and  funding  sponsors  need  proof  of  the  effective¬ 
ness  of  any  controller’s  ability  to  integrate  the  human  within 
the  system-of-systems  architecture,  where  input  and  output 
of  information  flow  and  mitigation  control  are  sufficiently 
adaptable  to  improve  user  performance  and  maintain  over¬ 
all  system  stability. 

•  Return  on  Investment  (ROI)  must  be  justified  in  terms  of  de¬ 
velopment  cost  (time  and  money)  and  benefits  (significant 
user/system  performance  improvements). 


Roles  of  National  and  Supra-National  Institutions 

To  aid  in  ensuring  that  the  effective  design  and  implementation 
of  AugCog  technologies  continues,  a  few  main  challenges  for  na¬ 
tional  and  supra-national  institutions  (e.g.,  DARPA,  ONR,  NIH, 
and NSF) were proposed. The overarching themes are that more
funding needs to be funneled into the development of
AugCog sensors and that institutions need to foster the ability of
HCI practitioners to begin identifying uses for neurotechnologies
in various application domains. Pursuing such goals will
eventually  increase  the  competition  and  foster  more  develop¬ 
ment  and  maturity  of  AugCog  neurotechnologies. 
CONCLUSION


The  field  of  AugCog  has  emerged  in  large  part  as  a  result  of  sub¬ 
stantial  investment  from  DoD-funded  programs  and  projects 
that  have  been  focused  on  developing  neurophysiological-based 
tools and techniques to enable revolutionary changes in HCI system
design. Whether the application context is an operational
closed-loop system, system design and evaluation, education and
training, or simply entertainment, such tools and techniques offer
the ability to create human-machine synergy and optimization
never before realized. Similar to Licklider’s original visions
of human-machine symbiosis in 1960, AugCog researchers and
practitioners aim to build tightly coupled brain-machine interfaces
that surpass the information-handling capacity of traditional
HCI systems and empower one operator with the ability to
perform a job normally required of two or more operators. Such
an  improvement  in  the  human-computer  integral  is  a  worthy 
goal.  Being  able  to  noninvasively  measure  and  assess  users’  cog¬ 
nitive  state  in  real  time,  and  then  use  automated  computational 
systems  to  modify  and  enhance  HIP  capabilities  of  these  users 
in  any  application  context,  is  a  goal  that  could  substantially  im¬ 
prove  human  performance  and  the  way  humans  interact  with 
computer-based  systems  in  the  21st  century.  It  is  up  to  HCI 
researchers  and  practitioners  to  begin  implementing  various 
AugCog  tools  and  techniques  by  selecting  the  most  appropriate 
neurophysiological  tools  for  their  task  applications  and  users. 
The  technologies  reviewed  in  this  chapter  will  hopefully  pro¬ 
vide  a  nice  “initial”  toolset  from  which  to  begin  such  a  selection 
process. 
INTRODUCTION


Human  values  and  ethical  considerations  no  longer  stand  apart 
from  the  human-computer  interaction  (HCI)  community — 
perhaps  in  some  separate  field  called  “computer  ethics” — but 
are  fundamentally  part  of  our  practice. 

This  shift  reflects,  at  least  in  part,  the  increasing  impact  and 
visibility  that  computer  technologies  have  had  on  human  lives. 
Computer  viruses  have  destroyed  data  on  millions  of  machines. 
Large  linked  medical  databases  can,  and  often  do,  infringe  on  in¬ 
dividuals’  privacy.  Accountability  becomes  a  problem  when  large 
computer  systems  malfunction  and  result  in  human  deaths.  The 
outcome  of  the  2000  presidential  elections  in  the  United  States 
may  have  hinged  on  the  poor  design  of  the  computerized 
Florida  election  ballot  (the  “butterfly  ballot”).  On  and  on,  the 
media  portray  such  problems.  Presidential  reports  (PITAC, 
1999)  ask  that  such  problems  be  addressed.  Corresponding  re¬ 
search  agendas  on  human  values  have  been  integrated  into  re¬ 
cent  government-funded  programs,  such  as  the  National  Sci¬ 
ence  Foundation’s  funding  initiative  on  information  technology 
research. Thus, the writings decades ago by cyberneticists like
Norbert Wiener (1953/1985) and computer scientists like
Joseph Weizenbaum (1972) now seem prescient. They argued
that  humans  fundamentally  control  technology  and  that  we 
must  make  wise  and  humane  choices  about  its  design  and  use. 

But,  if  human  values,  especially  those  with  ethical  import — 
such  as  rights  to  privacy  and  property,  physical  welfare,  in¬ 
formed  consent,  trust,  and  accountability,  to  name  but  a  few — 
are important, they are no less controversial. What values count?
Who decides? Are values relative, or are some values universal
but  expressed  differently  across  cultures  and  contexts?  Does 
technology  even  have  values?  If  not,  how  do  values  become  im¬ 
plicated  in  design?  It  is  also  clear  that  values  can  conflict.  For 
example,  shared  online  calendars  can  enhance  access  to  infor¬ 
mation  within  an  organization  but  at  the  expense  of  individual 
privacy  (Palen  &  Grudin,  2002).  Thus,  on  what  basis  do  some 
values  override  others  in  the  design  of,  say,  hardware,  algo¬ 
rithms,  databases,  and  interfaces?  Finally,  how  can  HCI  design¬ 
ers  working  within  a  corporate  structure  and  with  a  mandate 
to  generate  revenue  bring  values  and  ethics  into  their  designs? 

In  this  chapter,  we  review  how  the  field  has  addressed  such 
questions.  The  second  section  reviews  positions  on  how  values 
become  implicated  in  technological  designs.  The  third  section 
distinguishes  usability  from  human  values  with  ethical  import. 
The  fourth  section  reviews  foundational  issues  that  make  ethics 
so  controversial.  We  also  show  how  scholars  have  bounded 
moral  controversies  in  ways  that  make  it  possible  for  practition¬ 
ers  to  shape  their  work  from  an  ethical  stance.  The  fifth  section 
reviews  the  major  HCI  approaches  to  the  study  of  human  val¬ 
ues,  ethics,  and  design:  computer  ethics,  social  informatics, 
computer-supported  co-operative  work,  participatory  design, 
and  value  sensitive  design.  We  discuss  the  strengths  (and  limi¬ 
tations)  of  each  approach  by  drawing  on  exemplar  projects.  The 
sixth  section  calls  attention  to  key  values  relevant  for  design: 
human  welfare,  ownership,  property,  privacy,  freedom  from  bias, 
universal usability, trust, autonomy, informed consent, accountability,
identity, calmness, and environmental sustainability. Each
of these values could merit its own chapter; and, indeed, in this
handbook,  some  do.  Yet  by  reviewing  them  here,  however 
briefly,  we  highlight  their  ethical  status  and  thereby  suggest  that 
they  have  a  distinctive  claim  on  resources  in  the  design  process. 
The seventh section reviews the special ethical responsibilities
that HCI professionals incur by virtue of their professional
standing. Finally, in the eighth section, we highlight a few
suggestions for moving the field forward.


HOW  VALUES  BECOME  IMPLICATED 
IN  TECHNOLOGICAL  DESIGN 


Technological  innovations  implicate  human  values.  Consider,  for 
example,  that  in  the  early  1900s,  missionaries  introduced  a  tech¬ 
nological  innovation — steel  ax  heads — to  the  Yir  Yoront  of  Aus¬ 
tralia,  a  native  people.  The  missionaries  did  so  without  regard 
for  traditional  restrictions  on  ownership  and  indiscriminately 
distributed  the  ax  heads  to  men,  women,  old  people,  and  young 
adults  alike.  In  so  doing,  they  altered  relationships  of  depen¬ 
dence  among  family  members  and  reshaped  conceptions  of 
property  within  the  culture  (Sharp,  1952/1980).  In  another  ex¬ 
ample,  about  four  decades  ago,  snowmobiles  were  introduced 
into  the  Inuit  communities  of  the  Arctic  and  largely  replaced 
travel  by  dog  sleds.  This  technological  innovation  thereby 
altered  not  only  patterns  of  transportation,  but  also  symbols  of 
social  status,  and  moved  the  Inuit  toward  a  dependence  on  a 
money  economy  (Houston,  1995;  Pelto,  1973).  In  a  computer 
example,  electronic  mail  rarely  displays  the  sender’s  status.  Is 
the  sender  perhaps  a  curious  layperson,  a  system  analyst,  a  full 
professor,  a  journalist,  an  assistant  professor,  an  entry-level  pro¬ 
grammer,  a  senior  scientist,  or  a  high-school  student?  Nobody 
knows  until  the  e-mail  is  read  (and  maybe  not  even  then).  This 
design  feature  (and  associated  conventions)  has  thereby  played 
a  significant  role  in  allowing  electronic  communication  to  cross 
traditional,  hierarchical  boundaries  and  to  contribute  to  the  re¬ 
structuring  of  organizations  (Sproull  &  Kiesler,  1991). 

However,  how  exactly  do  values  become  implicated  in  tech¬ 
nological  designs?  Three  types  of  positions  have  been  offered  in 
the  literature:  embodied,  exogenous,  and  interactional. 

The  Embodied  Position 

The  embodied  position  holds  that  designers  inscribe  their  own 
intentions  and  values  into  the  technology;  once  developed  and 
deployed,  the  resulting  technology  determines  specific  kinds  of 
human  behavior.  Indeed,  in  the  history  of  science  literature 
(Smith  &  Marx,  1994),  this  position  is  sometimes  referred  to  as 
“technological  determinism.”  To  illustrate  the  idea,  consider 
Latour’s  (1992)  description  of  the  “Berliner  lock  and  key”:  a  de¬ 
vice  designed  to  deal  with  people  who  forget  to  lock  the  door 
behind  them.  The  device  initially  works  as  a  normal  lock.  You 
put  your  key  into  the  lock,  turn  the  key,  and  the  door  unlocks. 
Then,  to  remove  the  key,  it  is  necessary  to  push  it  through  the 
key  hole  to  the  other  side.  Then,  by  moving  to  the  other  side 
of the door, you rotate the key one more turn, thereby relocking
the door. Thus, when used, this technology determines that
people will lock their doors.

There are hard and soft versions of the embodied position.
In the hard version, it is argued that the very meaning and
intentions that designers and builders bring to their task literally
become a part of the technology (cf. Appadurai, 1988; Cole,
1991). Many people have trouble with this version because it
imputes  mental  states  to  things  that  do  not  seem  to  have  the 
capacity  to  have  them.  As  Smith  and  Marx  (1994)  wrote: 

Critics of “hard” determinism question the plausibility of imputing
agency to “technology.” . . . How can we reasonably think of this
abstract, disembodied, quasi-metaphysical entity [technology], or of one
of its artifactual stand-ins (e.g., the computer), as the initiator of
actions capable of controlling human destiny? (p. xii)

Yet, it is possible that as the field of artificial intelligence
moves forward (for example, as computational learning systems
increasingly mimic human agency), the hard version of
the embodied position will gain some currency.

In  the  more  common  soft  version,  it  is  recognized  that  ob¬ 
jects  themselves  do  not  literally  embody  an  intention  or  value. 
It  is  also  recognized  that  designers  themselves  are  shaped  by 
organizational,  political,  and  economic  forces  and  that,  because 
of  such  forces,  a  particular  technology  may  never  take  hold  in 
a  society.  But,  it  is  argued,  if  the  technology  takes  hold,  then  it 
becomes  very  difficult  for  an  individual  or  society  to  override 
the  values  driven  by  the  technology.  Think  about  how  the  de¬ 
sign  and  deployment  of  passenger-side  front  air  bags  have  in  a 
sense  held  children  hostage  in  the  back  seat  of  millions  of  cars. 
Or,  consider  that  a  virus  program  readily  destroys  data  or  grabs 
CPU cycles (or whatever it is programmed to do). Granted, you
could  perhaps  use  the  program  to  teach  about  self-replicating 
code,  but  such  activity  would  in  effect  be  secondary  to  its  pri¬ 
mary  function.  In  other  words,  to  execute  the  computer  virus 
code  likely  leads  to  the  destruction  of  data.  Similarly,  if  your 
goal  in  online  interactions  is  to  keep  data  secure,  you  can  de¬ 
sign  technical  security  mechanisms,  such  as  mandatory  en¬ 
cryption,  certificates,  and  anonymizers.  Such  designs  ensure 
behavior. Thus, according to Akrich (1992), a large part of a
designer’s work is that of “inscribing” or embodying his or her
vision of “the world in the technical content of the new object”
(p. 208).


The  Exogenous  Position 

The  exogenous  position  holds  that  societal  forces — that  involve, 
for  example,  economics,  politics,  race,  class,  gender,  and  reli¬ 
gion — significantly  shape  how  a  deployed  technology  will  be 
used. Consider, for example, the Internet. In many ways, its basic
file-sharing functionality remains unchanged from its
inception. Files are still broken into packets, and packets are
routed and then reassembled. Why is it, then, that this underlying
technology was primarily used for nearly a quarter of a century
by the scientific and educational communities before being
appropriated by the private sector for commerce?
Once appropriated, why did the development of e-commerce
occur so rapidly? According to the exogenous theorists, answers
to  such  questions  cannot  be  found  in  the  supposedly  embodied 
values  of  the  technology  itself.  Rather: 

To understand the origin of a particular kind of technological power,
we must first learn about the actors. Who were they? What were their
circumstances? . . . Why was this innovation made by these people and
not others? Why was it possible at this time and this place rather than
another time and place? . . . Instead of treating “technology” per se as the
locus of historical agency . . . [these advocates] locate it in a far more
various and complex, social, economic, political and cultural matrix. (Smith
& Marx, 1994, p. xiii)

In  turn,  the  exogenous  position  has  been  used  and  devel¬ 
oped  in  various  ways.  For  example,  Bulliet  (1994)  examined 
three  technologies  that  emerged  in  the  Islamic  world  in  the  fifth 
century:  block  printing,  the  harnessing  of  draft  animals,  and 
wheeled  transport.  Each  technology  failed  to  have  an  immedi¬ 
ate  transforming  social  and  economic  impact,  not  because  of 
its  lack  of  economic  advantage,  but  because  of  social  filters, 
which  involved  race,  class,  and  lifestyle.  Perdue  (1994)  explored 
the  comparative  history  of  agrarian  societies  (medieval  West¬ 
ern  European,  18th-century  Russian,  and  14th-century  Chi¬ 
nese).  He  argued  that  studies  of  technology  need  contextual  ac¬ 
counts  that  integrate  environmental,  technological,  social,  and 
cultural  elements.  Scranton  (1994)  drew  on  postmodern  theory 
to  argue  that  local  contingencies,  diversities,  disjunctions,  mul¬ 
tiple  oppositions,  and  contrasting  norms  should  be  central  to 
exogenous  explanations.  Hughes  (1994)  suggested  that  younger 
developing  technologies  tend  to  be  more  open  to  socio-cultural 
influences,  whereas  older,  more  mature  technologies  tend  to  be 
more  deterministic.  Through  many  such  studies,  a  recurring 
view  is  that  technological  systems  are  not  value  neutral  but  in¬ 
variably  favor  the  interests  of  people  with  economic  and  politi¬ 
cal  power  (Noble,  1991;  Smith,  1994;  Winner,  1986). 


The  Interactional  Position 

The  interactional  position  holds  that,  whereas  the  features  or 
properties  that  people  design  into  technologies  more  readily 
support  certain  values  and  hinder  others,  the  technology’s  actual 
use  depends  on  the  goals  of  the  people  interacting  with  it.  A 
screwdriver,  after  all,  is  well  suited  for  turning  screws,  and  yet 
amenable  as  a  poker,  pry  bar,  nail  set,  cutting  device,  and  tool  to 
dig  up  weeds.  Moreover,  through  human  interaction,  technology 
itself changes over time. On occasion, such changes (as emphasized
in the exogenous position) can mean the societal rejection
of a technology, or that its acceptance is delayed. But more
often, it entails an iterative process whereby technologies are
invented, redesigned based on user interactions, and reintroduced
to users; further interactions then occur, and further redesigns
are implemented. Typical software updates (e.g., of
word  processors,  browsers,  and  operating  systems)  epitomize 
this  iterative  process. 

Two sides of the interactional position have been emphasized
in the literature. One side emphasizes the properties designed
into the technology (Friedman, 1997a). For example, for
the moment, let us agree that disabled people in the workplace
should  be  able  to  access  technology,  just  as  they  should  be  able 
to  access  a  public  building  (Perry,  Macken,  Scott,  &  McKinley, 
1997).  As  system  designers,  we  can  make  the  choice  to  try  to 
construct  a  technological  infrastructure  that  disabled  people 
can access. If we do not make this choice, then we single-handedly
undermine the human value of universal access.

The  other  side  of  the  interactional  position  emphasizes  how 
users  use  the  technology  in  the  context  of  social-organizational 
structures.  Orlikowski  (2000),  for  example,  studied  the  use  of 
Lotus  Development  Corporation’s  Notes  software  by  two  groups 
within  a  large  multinational  consulting  firm:  technologists  and 
consultants. The technologists used Notes extensively. They
used e-mail, maintained electronic discussions with Notes databases,
and created their own database designs. Moreover, “supported
by the cooperative norms of technical support, the technologists
used many of the properties of Notes to promote their
collective  technical  work,  and  to  cooperate  with  each  other” 
(Orlikowski,  2000,  p.  415).  In  contrast,  the  consultants  used 
Notes minimally, sometimes even begrudgingly. Orlikowski
found that this “technology-in-practice,” as she calls it, was
enacted for at least three different reasons. First, some consultants
had doubts about the value of Notes for their performance.
Second,  in  contrast  to  the  technologists,  the  consultants  were 
under  a  time-based  billing  structure.  “Because  many  consultants 
did  not  see  using  Notes  as  an  activity  that  could  be  billed  to 
clients,  they  were  unwilling  to  spend  time  learning  or  using  it” 
(Orlikowski,  2000,  p.  416).  Third,  consultants  feared  that  the 
collaborative properties of Notes would threaten their status
within  the  company.  Thus,  Orlikowski  proposed  “a  view  of  tech¬ 
nology  structures,  not  as  embodied  in  given  technological  arti¬ 
facts,  but  as  enacted  by  the  recurrent  social  practices  of  a  com¬ 
munity  of  users”  (p.  421). 

Regardless  of  emphasis,  from  the  interactional  position  it 
should  be  clear  that  design  and  social  context  matter,  dialecti¬ 
cally.  Moreover,  users  are  not  always  powerless  when  faced  with 
unwelcome  value-oriented  features  of  a  technology. 


DISTINGUISHING  USABILITY  FROM  HUMAN 
VALUES  WITH  ETHICAL  IMPORT 


The  language  and  conceptualizations  within  HCI  currently  pro¬ 
vide  solid  means  by  which  to  pursue  issues  of  usability  (Adler 
& Winograd, 1992; Nielsen, 1993; Norman, 1988). Usability refers
to  characteristics  of  a  system  that  make  it  work  in  a  functional 
sense,  including  that  it  is  easy  to  use,  easy  to  learn,  consistent, 
and  recovers  easily  from  errors. 

Some  HCI  designers,  however,  have  a  tendency  to  conflate 
usability  with  human  values  with  ethical  import.  (In  the  next 
section,  we  discuss  what  exactly  is  meant  by  the  term  moral. 
Here,  it  can  be  understood  to  involve  issues  of  fairness,  justice, 
human  welfare,  and  virtue.)  This  conflation  arises  insofar  as  us¬ 
ability  is  itself  a  human  value  although  not  always  a  moral  one. 
However,  when  it  is,  both  can  be  addressed  by  the  same  design. 
For  example,  systems  that  can  be  modified  by  users  to  meet  the 
needs of specific individuals or organizations can both (a) enhance
usability and (b) help users to realize their goals and
intentions: the moral value of autonomy. Other times, however,
usability  can  conflict  with  human  values  with  ethical  import. 
Nielsen  (1993),  for  example,  asked  us  to  imagine  a  computer 
system  that  checks  for  fraudulent  applications  of  people  who 
are  applying  for  unemployment  benefits.  Specifically,  the  system 
asks  applicants  numerous  personal  questions,  and  then  checks 
for  inconsistencies  in  their  responses.  Nielsen’s  point  is  that 
even  if  the  system  receives  high  usability  scores,  some  people 
may  not  find  the  system  socially  acceptable,  based  on  the  moral 
value  of  privacy. 

In terms of a general framework, four pairwise relationships
exist  between  usability  and  human  values  with  ethical  import. 

1.  A  design  is  good  for  usability  and  independently  good  for 
human  values  with  ethical  import.  This  relationship  is  ex¬ 
emplified  previously,  where  adaptable  systems  can  also  pro¬ 
mote  user  autonomy.  Another  example  involves  browser  de¬ 
signs  that  offer  users  more  efficient  cookie  management 
than  currently  offered  in  Netscape  or  Internet  Explorer  and 
that  promote  values  of  informed  consent  and  privacy  (Fried¬ 
man,  Howe,  &  Felten,  2002). 

2.  A  design  is  good  for  usability,  but  at  the  expense  of  human 
values  with  ethical  import.  This  relationship  is  exemplified 
in  the  previous  example  offered  by  Nielsen  (1993),  where  a 
highly  usable  system  undermines  the  value  of  privacy.  Tang 
(1997)  provided  another  example  by  means  of  a  case  study  of 
a  team  designing  a  workstation.  At  one  point,  the  team  was 
trying  to  decide  how  to  power  a  microphone,  and  they  finally 
decided  to  power  it  directly  from  the  workstation.  Conse¬ 
quently,  they  eliminated  a  separate  hardware  on/off  switch  on 
the  microphone.  From  a  usability  perspective,  Tang  pointed 
to  benefits  of  this  design.  Users,  for  example,  no  longer  had 
to  remember  to  turn  the  microphone  off  when  it  was  not 
in  use  (to  conserve  battery  power)  or  be  inconvenienced  by 
replacing  a  dead  battery.  However,  despite  these  usability 
advantages,  Tang  found  that  some  users  thought  the  de¬ 
sign — by  not  allowing  them  direct  control  over  the  micro¬ 
phone — undermined  their  privacy  and  security. 

3.  A  design  is  good  for  human  values  with  ethical  import,  but 
at  the  expense  of  usability.  This  relationship  is  exemplified 
by  choosing  a  browser  preference  with  the  setting  “accept  or 
decline  each  cookie  individually.”  The  values  of  privacy  and 
informed  consent  are  well  supported,  but,  for  most  people, 
the  nuisance  factor  is  too  high.  Another  example  (the  flip 
side  of  an  earlier  one)  arises  when  a  system  is  purposefully 
made  adaptable  to  promote  autonomy,  but  in  the  process, 
the  system  becomes  unwieldy  and  difficult  to  use. 

4.  A  design  good  for  usability  is  necessary  to  support  human 
values  with  ethical  import.  This  relationship  is  exemplified 
by  security  systems  that  are  not  so  cumbersome  to  use  that 
either  critical  features  of  the  security  system  are  disabled  or 
individuals  invent  workarounds  that  compromise  the  secu¬ 
rity.  For  example,  if  a  security  system  requires  three  pass¬ 
words  and  10  steps  to  login,  a  user’s  workaround  might  in¬ 
volve  taping  a  yellow  sticky  to  the  side  of  the  computer 
screen with the passwords (for an in-depth treatment of usable
security, see Cranor & Garfinkel, 2005). Other instances
of this relationship arise when morality requires that all
individuals in a specified group be able to use the system
successfully. For example, to have a fair national election using
a  computerized  voting  system,  all  citizens  of  voting  age  must 
be able to use the system. Bederson, Lee, Sherman, Herrnson,
and Niemi (2003) surveyed issues relating to the usability
of electronic voting systems and reported voter concerns.

HCI  professionals  are  often  responsible  for  usability.  Accord¬ 
ingly,  it  is  important  that  we  be  aware  of  the  possible  relation¬ 
ships  between  usability  and  human  values  with  ethical  import 
(e.g.,  Cranor  &  Garfinkel,  2005,  on  security  and  usability).  At 
times,  the  two  support  one  another.  Other  times  we  need  to 
give  ground  on  usability  to  promote  human  values  with  ethical 
import  or,  conversely,  give  ground  on  human  values  with  ethical 
import  to  promote  usability.  Such  optimizations  require  judi¬ 
cious  decisions,  carefully  weighing  and  coordinating  the  advan¬ 
tages  of  each. 


MORAL,  PHILOSOPHICAL,  AND 
PSYCHOLOGICAL  FOUNDATIONS 


HCI  professionals — like  people  in  many  fields — may  sometimes 
wonder  if  morality  is  too  controversial  to  be  integrated  in  a  prin¬ 
cipled  way  into  their  work.  After  all,  who  is  to  say  for  another 
person  what  is  right,  wrong,  good,  or  bad?  How  do  we  make 
sense  of  the  seemingly  different  moralities  among  people?  Is 
not  morality  relative  to  person  or  at  least  culture? 

Such  questions  form  the  backdrop  to  a  wide  range  of  issues 
that  moral  philosophers  and  psychologists  have  pursued.  In  this 
section,  we  review  relevant  literature  to  help  bound  moral  con¬ 
troversies  such  that  HCI  researchers  and  practitioners  can,  with 
legitimate  grounding,  move  forward  proactively  in  shaping  their 
work  from  a  moral  stance. 

Moral  Theory 

One  common  starting  point  in  moral  discussions  is  to  raise  the 
questions,  “What  do  you  mean  by  morality?”  and  “How  do  you 
define  it?”  In  response,  moral  philosophers  have  offered  what  can 
be  viewed  as  three  overarching  approaches  toward  developing 
moral  theory:  consequentialist,  deontological,  and  virtue-based. 
Briefly  stated,  consequentialist  theories  maintain  that  a  moral 
agent  must  always  act  to  produce  the  best  available  outcomes 
overall  (for  an  analysis,  see  Scheffler,  1982).  Utilitarianism  is  the 
most  common  form  of  consequentialism,  wherein  a  moral  agent 
should act to bring about the greatest amount of utility (e.g.,
happiness) for the greatest number of people. In contrast,
deontological theories maintain that there are some actions that a
moral agent is forbidden to do, or, in turn, must do, regardless of consequences
(Dworkin,  1978;  Gewirth,  1978;  Kant,  1785/1964;  Rawls,  1971). 
For  example,  a  deontologist  might  maintain  that  it  is  immoral  to 
torture  an  innocent  child  even  if  such  an  act  would  bring  about 
great good (e.g., to prevent a bombing). Along similar lines, it can
be  argued  that  vendors  who  collect  and  sell  information  about  in¬ 
dividuals’  web-browsing  activity  without  individuals’  knowledge 
and  consent  violate  individuals’  right  to  privacy.  Indeed,  most 
any  argument  for  a  basic  right  makes  a  deontological  move. 


Both  consequentialist  and  deontological  theories  are  cen¬ 
trally  concerned  with  answering  the  fundamental  question, 
“What  ought  I  to  do?”  In  turn,  answers  are  often  viewed  to  be 
morally  obligatory,  meaning  that  the  action  is  required  of  every 
moral  agent  “regardless  of  whether  he  wants  to  accept  them  or 
their  results,  and  regardless  also  of  the  requirements  of  any 
other  institutions  such  as  law  or  etiquette”  (Gewirth,  1978,  p.  1). 
Moreover,  such  obligatory  prescriptions  are  often  framed  in  the 
negative  (as  in  “Thou  shall  not  steal”)  such  that  it  is  possible  to 
fulfill  the  prescription  (e.g.,  just  do  not  steal).  In  turn,  virtue- 
based  theories  are  centrally  concerned  with  answering  the  fun¬ 
damental  question,  “What  sort  of  person  ought  I  to  be?”  in 
which  the  focus  is  on  long-term  character  traits  and  personality 
(see Louden, 1984). This tradition dates back to Aristotle’s delineation
in Nicomachean Ethics of the ethical virtues (e.g.,
courage, temperance, friendship, wisdom, and justice), and has been
developed, for instance, by Foot (1978), MacIntyre (1984), and
Campbell  and  Christopher  (1996).  In  virtue  ethics,  the  pre¬ 
scription  is  often  viewed  to  be  morally  discretionary,  meaning 
that  whereas  an  action  is  not  morally  required,  it  is  conceived 
of  as  morally  worthy  and  admirable  based  on  considerations  of 
welfare  and  virtue  (Kahn,  1992).  Moreover,  such  discretionary 
prescriptions  are  often  framed  in  the  positive  (as  in  “be  charita¬ 
ble”),  such  that  it  is  not  possible  to  satisfy  the  declaration  com¬ 
pletely.  After  all,  one  cannot  continuously  practice  charity  with¬ 
out  soon  becoming  destitute  oneself  (Fishkin,  1982). 

With  these  core  distinctions  in  mind — between  a  theory  of 
the  right  (both  consequentialism  and  deontology)  that  is  largely 
viewed  as  obligatory  and  a  theory  of  the  good  (virtue  theory) 
that  is  largely  viewed  to  be  discretionary — it  is  possible  to  char¬ 
acterize  a  wide  range  of  HCI  designs.  For  example,  consider 
Mattel  Corporation’s  “Barbie  doll”  website  that  was  up  and  run¬ 
ning  during  the  latest  census  collection  in  the  United  States. 
On the website, Barbie posed as a census taker and asked children
to provide information about their families. Presumably,
Mattel  sought  to  use  the  resulting  information  to  market  their 
products  more  effectively.  Is  this  action  moral?  Most  people 
would  presumably  say  no.  But  why?  From  a  deontological  posi¬ 
tion,  such  a  deceptive  web  design  violates  a  moral  obligation 
(e.g.,  the  obligation  not  to  intentionally  deceive  others,  let  alone 
to  deceive  children  for  a  corporation’s  material  gain),  and 
should  not  have  been  designed  and  deployed. 

Other  human  values  articulated  in  HCI  fit  within  a  virtue  ori¬ 
entation.  For  example,  SeniorNet  is  an  organization  that  brings 
seniors  together  via  computer  networks.  Mynatt,  Adler,  Ito, 
Linde,  and  O’Day  (1999)  found  that  “SeniorNetters  repeatedly 
commented  on  the  warmth  and  friendliness  of  the  community 
as  something  that  differentiated  SeniorNet  from  other  net  com¬ 
munities,  and  as  a  reason  for  their  participation  and  comfort 
with  the  community”  (p.  232).  Are  values  such  as  warmth  and 
friendliness  moral  values?  For  a  consequentialist  and  even  more 
so  for  a  deontologist,  the  answer  is  probably  no — a  person,  af¬ 
ter  all,  is  presumably  not  under  a  moral  obligation  to  engage 
in  warm-hearted  actions.  However,  for  a  virtue  ethicist  the  an¬ 
swer  is  fundamentally  yes,  as  such  values  are  conceived  to  be 
central to moral personhood. Consider websites (such as Alfie.com)
that offer children a menu of computer games, like pinball
or miniature golf, that children can play. Most of these
games  do  not  promote  such  virtues  as  friendship,  caring,  or 
compassion. Must they? Presumably, even virtue theorists
would  say  that  not  every  game  has  to.  Rather,  promoting  such 
virtues  is  usually  conceived  as  discretionary — perhaps  morally 
praiseworthy  if  done,  and  perhaps  contributing  to  an  account 
of  human  flourishing — but  not  morally  obligatory. 

Moral  Epistemology 

This  discussion  raises  the  question,  “Who  is  to  say  what’s  moral?” 
or  “How  do  we  know?”  Such  questions  move  us  into  the  field  of 
moral  epistemology:  the  study  of  the  limits  and  validity  of  moral 
knowledge.  Often  at  stake  is  whether  it  is  even  possible  for  a 
moral  statement  to  be  objectively  true  or  false  and  for  a  moral 
value  to  be  objectively  right,  wrong,  good,  or  bad.  A  wide  variety 
of  positions  has  been  taken.  For  instance,  some  believe  that  moral 
knowledge  corresponds  to  or  approaches  a  correspondence  with 
a  moral  reality  that  exists  independent  of  human  means  of  know¬ 
ing  (Boyd,  1988;  Sturgeon,  1988).  Others  believe  moral  knowl¬ 
edge  can  be  objectively  grounded  by  constructing  rational  prin¬ 
ciples  that  strive  for  coherence  and  consistency  while  building  on 
the  common  ground  and  specific  circumstances  of  a  society 
(Dworkin,  1978;  Habermas,  1979).  Others  believe  the  only  thing 
that  can  be  said  of  moral  knowledge  is  that  it  can  be  true  subjec¬ 
tively  for  an  individual  or  culture  depending  on  that  individual’s 
or  group  of  individuals’  desires,  preferences,  and  goals  (Mackie, 
1977; Rorty, 1982). Finally, others believe that any moral knowledge
is unattainable, even in a weak sense (the full skeptic’s
position;  for  a  characterization,  see  Nagel,  1986). 

The  skeptic’s  position — that  moral  knowledge  is  unattain¬ 
able — reflects  the  morally  relativistic  position  described  previ¬ 
ously:  that  no  one  can  say  what  is  really  right  or  wrong,  so  ethics 
and  values  become,  at  best,  a  personal  choice  in  terms  of  HCI 
design.  However,  consider  that  no  one  can  prove  that  at  this 
moment  you  are  not  really  just  a  brain  in  a  vat  being  stimulated 
by electrical impulses to think that you are reading this chapter.
Still, you have compelling reasons to think that proposition
false. So, too, with morality. Although the skeptic’s position
cannot be proved conclusively false, moral philosophers
have  provided  compelling  reasons  not  to  believe  it  (Nagel,  1986; 
Williams,  1985). 

Moral  Variability  and  Universality 

Anthropologists  often  document  moral  differences  between 
cultures.  For  example,  from  some  anthropological  accounts,  we 
learn  that  devout  Hindus  believe  that  it  is  immoral  for  a  widow 
to  eat  fish  or  for  a  menstruating  woman  to  sleep  in  the  same  bed 
with  her  husband  (Shweder,  Mahapatra,  &  Miller,  1987).  Other 
accounts  document  that  members  of  the  Yanomamo  tribe  of 
Brazil  at  times  practice  infanticide  and  that  the  women  are  “oc¬ 
casionally  beaten,  shot  with  barbed  arrows,  chopped  with  ma¬ 
chetes  or  axes,  and  burned  with  firebrands”  (Hatch,  1983, 
p.  91).  Per  our  discussion,  there  are  three  ideas  to  understand 
about  such  examples. 

1.  Variability  in  human  practice  does  not  prove  or  disprove 
the moral skeptic’s position. Imagine going to a culture
where the people did not believe in logical transitivity.
(If A = B and B = C, then A = C.) That finding would be interesting
psychologically and culturally, but presumably has
no  bearing  on  whether  logical  transitivity  is  true  or  false.  It 
is  or  it  is  not.  If  it  is  true,  then  people  who  think  otherwise 
are  simply  wrong.  So,  too,  with  the  moral  life.  People  can 
believe  that  a  certain  act  (such  as  shooting  women  with 
barbed  arrows)  is  moral;  but  documenting  such  a  belief  does 
not  make  it  so. 

2.  When  moral  differences  occur  between  peoples,  it  is  not 
necessarily  the  case  that  the  practices  are  believed  legiti¬ 
mate  by  the  victims.  For  example,  in  Hatch’s  (1983)  report 
on  the  Yanomamo,  he  also  reported  that  the  women  did  not 
appear  to  enjoy  such  physically  abusive  treatment  and  were 
seen  running  in  apparent  fear  from  such  assaults.  Psycholog¬ 
ical  data  of  a  similar  kind  can  be  found  in  a  study  by  Wainryb 
(1995)  on  the  Druze  population  in  Israel.  The  Druze  largely 
live  in  segregated  villages,  are  of  Islamic  religious  orientation, 
and  are  organized  socially  around  patriarchal  relationships. 
The  father,  as  well  as  brothers,  uncles,  and  other  male  rela¬ 
tives,  and  eventually  a  woman’s  husband,  exercise  consider¬ 
able  authority  over  women  and  girls  in  the  family  and  restrict 
their  activities  to  a  large  degree.  However,  when  these  women 
were  interviewed,  a  majority  of  them  (78%)  unequivocally 
stated  that  the  husband  or  father’s  demands  and  restrictions 
were  unfair.  Thus,  Yanomamo  and  Druze  women — like  many 
women  in  Western  societies — are  often  unwilling  victims 
within  what  they  themselves  perceive  to  be  an  uncaring  or 
unjust society. In such situations, it may be less the case that
societies  differ  on  moral  ground  and  more  that  some  soci¬ 
eties  (including  Western  societies)  are  involved  explicitly  in 
immoral  practices  (Wainryb,  2006). 

3.  Moral  variability  may  be  much  less  pervasive  than  many 
people  suppose.  Reconsider,  for  example,  the  Shweder  et  al. 
(1987)  report  that  devout  Hindus  believe  it  is  immoral  for  a 
widow  to  eat  fish  or  for  a  menstruating  woman  to  sleep  in  the 
same  bed  as  her  husband  (but  two  of  many  dozens  of  their 
examples).  At  first  glance,  for  Western  eyes  at  least,  such 
moral  beliefs  do  seem  different.  However,  when  Shweder 
et  al.’s  data  were  reanalyzed  by  Turiel,  Killen,  and  Helwig 
(1987),  they  showed  bases  for  not  only  difference  but  also 
moral  commonality.  For  example,  in  their  reanalysis,  Turiel 
et  al.  found  that  devout  Hindus  believed  that  harmful  conse¬ 
quences  would  follow  from  a  widow  who  ate  fish  (the  act 
would  offend  her  husband’s  spirit  and  cause  the  widow  to 
suffer  greatly)  and  from  a  menstruating  woman  who  sleeps  in 
the  same  bed  with  her  husband  (the  menstrual  blood  is  be¬ 
lieved  poisonous  and  can  hurt  the  husband).  Although  such 
beliefs,  themselves,  differ  from  those  in  Western  culture,  the 
underlying  concern  for  the  welfare  of  others  is  congruent 
with  them.  More  generally,  Turiel  et  al.  claimed  that  when  re¬ 
searchers  differentiate  informational  and  metaphysical  as¬ 
sumptions  about  the  world  from  moral  judgments  based  on 
those  assumptions,  then  the  moral  life  often  takes  on  a 
greater universal cast (cf. Helwig, 2006; Turiel, 1998, 2002).

When analyzing moral variability, conceptualizations of
morality that entail abstract characterizations of justice and welfare
tend to highlight moral universals, whereas definitions that
entail specific behaviors or rigid moral rules tend to highlight
moral  cross-cultural  variation.  Typically,  theorists  who  strive  to 
uncover  moral  universals  believe  they  are  wrestling  with  the 
essence of morality, with its deepest and most meaningful attributes.
In contrast, theorists who strive to characterize
moral variation argue that, by the time one has identified a common
moral feature that cuts across cultures, one has so disembodied
the idea into an abstract form that it loses virtually all
meaning  and  utility.  For  instance,  in  the  example  of  devout  Hin¬ 
dus  who  believe  that  by  eating  fish  a  widow  hurts  her  dead  hus¬ 
band’s  spirit,  is  the  interesting  moral  phenomenon  that  Hindus, 
like  ourselves,  are  concerned  with  not  causing  others  harm?  Or, 
is  the  interesting  moral  phenomenon  that  Hindus  believe  in 
spirits  that  can  be  harmed  by  earthly  activity? 

Both  questions  have  merit,  and  a  middle  ground  provides  a 
more  sensible  and  powerful  approach  for  the  HCI  community: 
one  that  allows  for  an  analysis  of  universal  moral  values,  as  well 
as  allowing  for  these  values  to  play  out  differently  in  a  particu¬ 
lar  culture  at  a  particular  point  in  time  (Friedman,  1997b;  Kahn, 
1991,  1999;  Kahn  &  Lourenco,  1999). 

Designing  for  Diversity 

In  this  chapter,  we  will  continue  to  draw  on  the  preceding  analy¬ 
sis  to  help  us  review  the  HCI  literature  and  to  offer  morally  prin¬ 
cipled  design  methods  that  respect  culture  and  context.  As  a 
case  in  point,  imagine  you  are  designing  a  computerized  voting 
booth.  At  what  height  would  you  place  the  electronic  ballot?  A 
reasonable  answer  might  go  something  like  “It  depends.  How 
tall  are  the  people  who  vote?  Moreover,  later,  will  the  voting 
booth  be  used  by  other  people  of  different  height?  After  all,  a 
voting  booth  designed  only  for  players  of  the  National  Basket¬ 
ball  Association  will  disenfranchise  most  voters  in  Japan.”  In 
other  words,  the  universal  value  is  to  enfranchise  all  voters,  but 
the  specific  mechanism  by  which  to  do  so  may  need  to  be 
adaptable  to  specific  contexts  and  cultures.  The  general  princi¬ 
ple  then  is  that  designs  need  to  be  robust  enough  to  substanti¬ 
ate  the  value  under  consideration  and  yet  adaptable  enough  so 
that  different  cultures  (or  subcultures)  can  use  the  designs  in 
their  own  ways. 

Many  problems  occur  when  this  principle  is  ignored  or  when 
unanticipated  users  of  a  system  emerge.  For  example,  as  de¬ 
scribed  by  Friedman  and  Nissenbaum  (1996),  since  the  early 
1970s,  the  computerized  National  Resident  Medical  Match  Pro¬ 
gram  has  placed  most  medical  students  in  their  first  jobs.  In  the 
system’s  original  design,  it  was  assumed  that  only  one  individual 
in  a  family  would  be  looking  for  a  residency.  At  the  time,  such  an 
assumption  was  perhaps  not  out  of  line,  because  there  were  few 
women  residents.  However,  as  women  have  increasingly  made 
their  way  into  the  medical  profession,  marriages  between  resi¬ 
dents  have  become  more  frequent,  and  bias  against  placing  cou¬ 
ples  emerged.  Another  example  involved  a  dog  icon  that  was 
used  to  indicate  the  printing  orientation  of  a  printer:  landscape 
or vertical. When the printer was shipped to Islamic countries,
the  vendor  discovered  that  people  in  such  countries  often  con¬ 
sidered  dogs  as  offensive  animals.  Consider  the  case  of  com¬ 
petitive  educational  software  that  was  exported  to  Micronesia, 
with dismal results because the value of competition promoted
by the software clashed too strongly with the culture’s emphasis
on  co-operation.  Consider  also  that  data  protection  laws  and 
policies  differ  across  national  boundaries  (Agre  &  Rotenberg, 
1998;  Rotenberg,  2000).  For  example,  Bennett  (1998)  described 
how  most  of  the  European  countries  have  applied  the  same 
data  protection  policies  to  both  the  public  and  the  private  sec¬ 
tors.  However,  the  United  States,  Canada,  Australia,  and  Japan 
have  preferred  “to  regulate  only  the  public  sector’s  practices 
and  to  leave  the  private  sector  governed  by  a  few  sectoral  laws 
and  voluntary  codes  of  practice”  (Bennett,  1998,  p.  100).  Thus, 
lenient  designs  from  the  private  sector  of  these  latter  countries 
need  to  be  adaptable  to  transfer  readily  to  European  countries. 

Granted,  building  value  adaptability  into  systems  requires  ad¬ 
ditional  time  and  financial  resources.  Also,  from  a  user’s  per¬ 
spective,  additional  options  add  further  complexity  and  chal¬ 
lenges.  Thus,  value  adaptability  is  not  always  the  best  way  to  go. 
However,  it  often  is,  especially  when  we  can  anticipate  that  sim¬ 
ilar  values  will  play  out  in  different  ways  for  different  users.  In 
such  cases,  not  only  are  a  larger  number  of  human  lives  en¬ 
hanced  from  an  ethical  standpoint  but  also  from  an  economic 
standpoint  such  systems  increase  market  share  and  generate 
profits. 


APPROACHES  TO  HUMAN  VALUES 
AND  ETHICS  IN  DESIGN 


The  computing  field  has  addressed  issues  of  human  values  and 
ethics  by  means  of  a  handful  of  approaches.  To  some  extent, 
these approaches overlap with one another. For example, sociotechnical
analyses, which are central to social informatics,
often  form  the  front  end  of  efforts  in  participatory  design,  and 
are  incorporated  into  the  empirical  investigations  of  value  sen¬ 
sitive  design.  Both  computer-supported  co-operative  work  and 
participatory  design  share  particular  interests  in  collaboration  in 
the  workplace.  However,  that  said,  the  approaches  differ  signif¬ 
icantly  when  considering  how  each  integrates  its  respective  po¬ 
sition  on  moral  epistemology,  methods,  and  contexts  studied. 

In this section, our goal is not to review each approach comprehensively
(which is beyond the scope of this chapter). Rather,
we  seek  to  show  how  each  approach  contributes  to  our  under¬ 
standing  of  how  to  integrate  human  values  and  ethics  in  design. 

Computer  Ethics 

When applied moral philosophers brought their talents and energies
to bear on understanding the impact of computer technologies
on social life, the field of computer ethics was born
(Bynum,  1985;  Johnson,  1985;  Moor,  1985). 

In their resulting examinations, computer ethicists have embraced
two complementary goals. One goal has been to utilize
existing moral theory to bring clarity to issues at hand and,
at appropriate times, to prescribe norms of behavior. For example,
in the computer science literature, the term trust has
often been used synonymously with security (Schneider, 1999).
Yet,  drawing  on  ethical  theory,  Nissenbaum  (1999)  showed  that 
these two terms need to be distinguished. For example, if, as
HCI professionals, our goal is to create a place where people
feel safe in their online interactions, we can move in two design
directions. We can move forward with technical solutions
that  involve  security  features  like  locks,  keys,  passwords,  and  en¬ 
cryption,  or  we  can  understand  how  trusting  relationships  are 
created  and  fostered,  and  design  them  into  our  online  systems. 
Each  direction  leads  to  a  very  different  online  environment.  The 
strength  of  this  type  of  philosophical  contribution  is  that  it  helps 
translate  moral  abstractions  into  crisp  working  conceptualiza¬ 
tions  that  HCI  professionals  can  use. 

A second goal of computer ethicists builds on the innovations
of the technology itself. These technologies have not only generated
new entities (computer programs, the Internet, web
pages), but have also enlarged the scale of activities (data mining),
increased the power and pervasiveness of their effects (ease of
communication), and often become invisible to human purview
(using computers for surveillance). Thus, computer ethicists
have  been  interested  in  understanding  how  such  innovations  ex¬ 
tend  the  boundaries  of  traditional  ethical  concepts.  For  exam¬ 
ple,  Moor  (1985)  examined  the  ways  in  which  the  invisibility  of 
computers  affects  human  lives.  “Invisible  abuse”  is  the  intentional 
use  of  the  invisible  operations  of  a  computer  to  engage  in  un¬ 
ethical  conduct,  such  as  the  invasion  of  the  property  and  privacy 
of  others.  “Invisible  programming  values”  are  those  values  that 
are  embedded  in  a  computer  program,  such  as  bias.  “Invisible 
complex  calculations”  refer  to  calculations  that  are  too  complex 
for  human  inspection  and  understanding.  In  all  three  situations, 
Moor  argued  that  computer  technologies  raise  special  ethical  is¬ 
sues  different  from  other  forms  of  technology  and  that  the  task  of 
computer  ethics  is  to  fill  in  what  he  referred  to  as  the  resulting 
“conceptual  vacuum”  and  “policy  vacuum”  (Moor,  1985,  p.  266). 
In  a  similar  vein,  Nissenbaum  (2004)  argued  that  some  traditional 
theoretical approaches to privacy yield unsatisfactory conclusions
in the case of public surveillance. She posited a new construct,
contextual integrity, which demands that information gathering
and dissemination be appropriate to the context, to capture the
nature of challenges posed by information technologies.

One  theoretical  debate  in  this  literature  is  whether  the  tech¬ 
nological  innovations  fundamentally  challenge  ethical  theory 
(Johnson  &  Miller,  1997;  Spinello  &  Tavani,  2001).  In  other 
words,  are  innovations  so  qualitatively  different  and  far  reaching 
from  past  ones  that  traditional  ethical  theories  have  to  be  at 
least  significantly  revised,  if  not  abandoned?  Or,  is  it  the  case 
that  the  technology  simply  offers  a  new  domain  within  which 
traditional  ethical  theory  works?  Toward  understanding  this  de¬ 
bate,  a  helpful  analogy  can  be  made  to  literature,  wherein  there 
exist  the  categories  (genres)  of  fiction  and  nonfiction.  These  cat¬ 
egories  seem  clear  enough  until  we  encounter  a  new  form  of 
writing  that  blurs  the  boundaries,  such  as  historical  novels.  Does 
that  new  form  undermine  the  traditional  categories?  Philoso¬ 
phers  like  Searle  (1983)  argued  no:  The  categories  of  fiction  and 
nonfiction  fundamentally  remain,  and  what  the  new  literary 
form  shows  is  that  writing  can  embody  aspects  of  both  cate¬ 
gories.  Similarly,  it  can  be  argued  that,  whereas  the  technology 
can blur traditional ethical boundaries and demand further refinements
and clarity of moral theory, ethics itself has not
changed, nor have its fundamental precepts.

Regardless  of  how  one  views  this  debate,  it  is  clear  that  the 
field of computer ethics advances our understanding of key values
that lie at the intersection of computer technology and human
lives. However, there are some limitations to the field’s
contributions  to  HCI.  For  one,  computer  ethics  often  remains 
too  divorced  from  technical  implementations.  How  exactly,  for 
example,  can  HCI  professionals  build  interfaces  that  enhance 
trust  within  a  community  of  users?  How  exactly  do  we  address 
the  value  problems  that  arise  through  invisible  computing?  In 
addition,  computer  ethics  has  focused  too  often  on  a  single 
value  at  a  time.  Yet,  as  HCI  professionals,  we  commonly  wrestle 
with  design  trade-offs  between  competing  moral  values.  Col¬ 
laborations  between  computer  ethicists  and  HCI  professionals 
would  be  a  fruitful  way  to  address  these  limitations. 

Social  Informatics 

In  the  second  section,  we  reviewed  embodied,  exogenous,  and 
interactional  positions  on  how  values  become  implicated  in 
technology.  Recall  that  the  embodied  position  holds  that  de¬ 
signers  inscribe  their  own  intentions  and  values  into  the  tech¬ 
nology;  once  developed  and  deployed,  the  resulting  technology 
is  said  to  determine  specific  kinds  of  human  behavior.  As  noted, 
many  researchers  find  such  a  position  highly  untenable  and  in¬ 
stead  emphasize  the  social  context  in  which  information  sys¬ 
tems  are  used.  Recall,  for  example,  Orlikowski’s  (2000)  study,  re¬ 
viewed  earlier,  where  she  investigated  the  effects  when  Lotus 
Development  Corporation’s  Notes  was  introduced  into  a  large 
corporation.  She  found  that,  at  least  in  part,  the  incentive  sys¬ 
tems  in  the  corporation  did  more  to  influence  how  and  whether 
Notes was used than the mere capability of the software itself.

Over  the  last  25  years,  this  emphasis  on  the  social  context 
of  information  technologies  has  been  the  subject  of  systematic 
research (Attewell, 1987; Borgman, 2000; Iacono & Kling, 1987;
Kiesler,  1997;  King,  1983;  Kling,  1980;  Orlikowski,  1993;  Poltrock 
&  Grudin,  1994).  The  research  has  appeared  under  many  la¬ 
bels,  including  social  analysis  of  computing,  social  impacts  of 
computing,  information  systems,  sociotechnical  systems,  and 
behavioral  information  systems.  More  recently,  this  overarch¬ 
ing  enterprise  has  begun  to  coalesce  within  a  new  field  called 
“social  informatics”  (Kling,  Rosenbaum,  &  Hert,  1998;  Kling  & 
Star, 1998; Sawyer & Rosenbaum, 2000). As defined by participants
at the 1997 National Science Foundation-sponsored workshop
on this topic, social informatics is the interdisciplinary
study  of  the  design,  uses,  and  consequences  of  information 
technologies  that  takes  into  account  their  interactions  with  in¬ 
stitutional  and  cultural  contexts. 

To  illustrate  the  value  of  social  informatics,  Kling  (1999)  con¬ 
trasted  the  design  and  functioning  of  two  electronic  journals: 
Electronic Transactions on Artificial Intelligence (ETAI) and
Electronic Journal of Cognitive and Brain Sciences (EJCBS).
Both  journals  envision  attracting  high-quality  papers.  Both  jour¬ 
nals  also  have  implemented  technology  that  works  effectively. 
However,  Kling  argued  that  differences  in  their  sociotechnical 
designs lead ETAI to prosper and EJCBS to wane. For example,
articles  submitted  to  ETAI  are  reviewed  in  a  two-phase  process. 
In  the  first  phase,  the  article  is  open  to  public  online  discussion 
for  a  period  of  three  months.  Based  on  the  resulting  discussions, 
authors  have  an  opportunity  to  revise  their  papers.  In  the  second 
phase,  the  article  undergoes  a  quick,  confidential  peer  review. 
In contrast, articles submitted to EJCBS are evaluated by their
general  online  readership.  Articles  that  receive  a  minimum  score 
are  then  transferred  to  an  archive  of  accepted  papers.  EJCBS  has 
been  designed  as  much  as  possible  as  an  autonomous  system 
that  would  run  on  its  own  after  it  was  programmed.  It  removes 
editorial  attention  from  the  publishing  process,  and  instead  re¬ 
lies  on  a  readers’  plebiscite.  However,  such  a  design,  according 
to  Kling,  misconstrues  the  social  context  of  successful  academic 
journal  publishing,  one  that  requires  a  lively  group  of  authors 
and  readers  and  attention  from  senior  scientists  in  the  field. 
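
The contrast between the two review designs can be made concrete with a minimal sketch. Only the two workflows themselves come from the description above. The class and function names, the 1-to-5 rating scale, and the threshold value are hypothetical, chosen for illustration, and are not taken from Kling (1999) or from either journal's actual software.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable


@dataclass
class Submission:
    """A submitted article (hypothetical structure, for illustration only)."""
    title: str
    reader_scores: list[float] = field(default_factory=list)   # assumed 1-5 ratings
    discussion_comments: list[str] = field(default_factory=list)
    revised: bool = False


def plebiscite_accepts(sub: Submission, min_score: float = 3.5) -> bool:
    """EJCBS-style rule: the readership votes, and an article whose mean
    rating reaches a minimum score is archived as accepted.
    No editor takes part; min_score is an assumed value."""
    return bool(sub.reader_scores) and mean(sub.reader_scores) >= min_score


def two_phase_accepts(sub: Submission,
                      referee_approves: Callable[[Submission], bool]) -> bool:
    """ETAI-style rule: phase one is open online discussion, after which
    the author may revise; phase two is a confidential peer review."""
    if sub.discussion_comments:
        sub.revised = True          # author revises in response to discussion
    return referee_approves(sub)    # human judgment decides, not a score
```

In the first function the decision is fully automated. In the second it passes through people at both phases, which is the sociotechnical difference Kling emphasized.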

Work  in  social  informatics  has  been  successful  in  providing 
sociotechnical  analyses  of  deployed  technologies.  Yet,  in  terms 
of  its  contributions  to  HCI,  Johnson  (2001),  for  example,  wrote, 
“One  aspect  that  still  confounds  me  is  how  to  reconcile  the  ba¬ 
sic  premise  of  social  informatics — that  it  is  critical  to  gain  knowl¬ 
edge  of  the  social  practices  and  values  of  the  intended  users — 
with  the  basic  work  of  system  developers.  How,  if  at  all,  can 
programmers  practice  and  apply  social  informatics?”  (p.  18). 
Granted,  Kling  (1999)  recognized  the  importance,  for  example, 
of  “workplace  ethnography,  focus  groups,  user  participation  in 
design  teams,  and  participatory  design  strategies”  (Social-tech¬ 
nical  Systems,  para.  4).  Then  he  said,  “To  discuss  it  [these  meth¬ 
ods]  in  detail  here  would  lead  us  away  from  our  focus  on  the 
structural  elements  of  a  socio-technical  analysis”  (Social-technical 
Systems,  para.  4).  Thus,  it  appears  that  social  informatics  can 
move  in  at  least  two  directions.  One  direction  leads  to  develop¬ 
ing  the  sociotechnical  analyses  and  to  viewing  this  work  as  com¬ 
plementing  work  in  design  (and  other  areas,  such  as  computer 
ethics).  Another  direction  leads  to  an  expansion  of  social  infor¬ 
matics  such  that  it  fundamentally  embraces  design  (and  other 
areas)  into  its  theoretical  framework. 

Computer-Supported  Cooperative  Work 

Although  social  informatics  has  emphasized  the  sociotechnical 
analyses  of  deployed  technologies,  the  field  of  computer- 
supported  co-operative  work  (CSCW)  has,  for  some  time,  fo¬ 
cused  on  the  design  of  new  technologies  to  help  people  collab¬ 
orate  effectively  in  the  workplace  (Galegher,  Kraut,  &  Egido, 
1990; Greif, 1988; Grudin, 1988). “Groupware” is the name often
used  for  software  that  seeks  to  facilitate  CSCW  goals.  The  history 
of  CSCW  goes  back  to  early  work  in  various  research  laboratories, 
like Xerox PARC, EuroPARC, Bell Labs, and IBM. There, computer
professionals, themselves working within relatively small groups
and sometimes remotely, sought to improve their collaborations.

Typically,  the  values  considered  in  CSCW  designs  have  been 
closely  tied  to  group  activities  and  workplace  issues.  Coopera¬ 
tion  has  been,  of  course,  an  overarching  value.  In  addition,  the 
field  has  paid  attention  to  such  values  as  privacy,  autonomy,  own¬ 
ership,  commitment,  security,  and  trust.  Isaacs,  Tang,  and  Morris 
(1996),  for  example,  designed  a  system  to  support  informal  in¬ 
teractions  in  the  workplace,  using  the  piazza  (the  plaza)  as  the 
metaphor.  The  application  provided  means  for  workers  to  opt 
out  of  piazza  interactions,  thus  protecting  workers’  privacy  and 
autonomy.  Olson  and  Teasley  (1996)  reported  on  the  planning, 
implementation,  and  use  of  groupware  tools  over  the  course  of 
a  year  in  a  real  group  with  remote  members.  One  of  their  key 
findings was that “social responsibility and commitment appeared
diminished or missing when people did not meet face-to-face”
(Olson & Teasley, 1996, p. 425). Dewan and Shen (1998)
discuss  an  access-control  model  that  accounts  for  joint  owner¬ 
ship  of  shared  objects,  different  ownership  rights  for  different 
types  of  users,  and  the  delegation  of  access  rights  (security). 
Hudson and Smith (1996) sought methods that allow workers
to share video information about themselves with their colleagues
while providing protections for privacy. One solution
was  to  shadow  images  on  the  video  screen  so  that  colleagues 
can  tell  that  you  are  in  your  office  but  cannot  tell  what  you  are 
doing. Fuchs (1999) developed a notification service for awareness
information that also addresses potential conflicts between
awareness and privacy. Van House, Butler, and Schiff (1998) examined
how trust is created and assessed in a workplace with environmental
planning data sets, and “how changes in technology
interact with those practices of trust” (p. 335).
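
The three features attributed above to the access-control model discussed by Dewan and Shen (1998) are joint ownership of a shared object, different rights for different types of users, and delegation of access rights. They can be illustrated with a small sketch. This is not their model or its notation; every name and rule below is a hypothetical simplification meant only to show how such values might surface in code.

```python
from dataclasses import dataclass, field


@dataclass
class SharedObject:
    """A shared artifact with several owners (hypothetical sketch)."""
    name: str
    owners: set[str] = field(default_factory=set)
    # rights[user] holds the operations that user may perform
    rights: dict[str, set[str]] = field(default_factory=dict)

    def add_owner(self, user: str, role_rights: set[str]) -> None:
        """Joint ownership: any number of users may own the object,
        each with the rights that go with their type of user."""
        self.owners.add(user)
        self.rights.setdefault(user, set()).update(role_rights)

    def can(self, user: str, operation: str) -> bool:
        return operation in self.rights.get(user, set())

    def delegate(self, giver: str, receiver: str, operation: str) -> None:
        """Delegation: an owner may pass on a right, but only one
        the owner actually holds (an assumed rule)."""
        if giver not in self.owners or not self.can(giver, operation):
            raise PermissionError(f"{giver} cannot delegate {operation!r}")
        self.rights.setdefault(receiver, set()).add(operation)


plan = SharedObject("site-plan")
plan.add_owner("ana", {"read", "write"})   # one type of owner
plan.add_owner("ben", {"read"})            # a co-owner with fewer rights
plan.delegate("ana", "cy", "read")         # cy is not an owner but may now read
```

Restricting delegation to rights the giver already holds is one way to keep joint ownership from eroding security. Other rules are possible, and the choice among them is itself a value-laden design decision.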

CSCW  has  traditionally  focused  on  the  workplace,  as  its 
name  implies.  Yet  if  the  recent  Conference  Proceedings  of 
CSCW are any indication, then the field is quickly expanding to include
nonworkplace settings. For example, in the CSCW 1996
Proceedings,  about  6%  of  the  papers  involved  a  nonworkplace 
setting.  In  1998,  it  was  13%;  in  2000,  31%;  in  2002,  36%;  and  in 
2004,  25%.  The  CSCW  2000  conference  co-organizers  noted: 

The  fact  that  these  [work]  place  technologies  are  available  not  only  in 
working  settings  but  also  in  homes  means  that  the  focus  of  our  atten¬ 
tion  [in  the  CSCW  community]  has  broadened  to  encompass  a  much 
wider  range  of  activities  than  we  could  have  imagined  when  the  CSCW 
conference  series  began  in  1986.  (Dourish  &  Kiesler,  2000,  p.  v.) 

Correspondingly,  the  range  of  values  that  the  CSCW  field  has 
begun to analyze has also started to expand. For example, Mynatt
et al. (1999) delineated the values of safety, civility, warmth,
and  friendship  that  are  fostered  by  SeniorNet,  an  organization 
that  brings  seniors  together  through  computer  networking 
technologies.  O’Neill  and  Gomez  (1998)  described  a  project 
that  links  middle  and  high  school  science  students  with  working 
scientists  as  mentors  for  the  students’  science  projects.  The  re¬ 
searchers  “illustrate  the  unique  dynamics  of  these  relationships, 
consider  their  technical  and  social  demands,  and  discuss  the  po¬ 
tential  for  CSCW  systems  to  help  sustain  long-term  help  rela¬ 
tionships  by  better  accommodating  their  needs”  (O’Neill  & 
Gomez,  1998,  p.  325).  McCarthy  and  Anagnost  (1998)  explored 
the  social  ramification  of  a  group  preference  agent  for  music  in 
a  shared  environment — a  fitness  center.  By  technical  means, 
they  thus  seek  to  democratize  the  music  selection  process. 
Palen, Salzman, and Youngs (2000) tracked 19 new mobile
phone  users  for  six  weeks.  In  their  discussion,  they  called  at¬ 
tention  to  issues  of  unfairness  that  arose  in  the  context  of  use. 

It  remains  unclear  how  much  more  the  field  of  CSCW  can  ex¬ 
pand  to  include  nonworkplace  settings  before  its  very  name  (that 
has  “work”  in  its  title)  becomes  an  historical  artifact  rather  than  a 
description  of  its  current  activity.  That  is  a  quibble  with  nomen¬ 
clature.  The  direction  of  the  field  seems  intellectually  vibrant  and 
exciting.  Thus,  as  CSCW  continues  to  expand  into  a  broader 
range  of  human  activity,  it  will  increasingly  encounter  and  thereby 
take  hold  of  a  broader  range  of  human  values  with  ethical  import. 

Thus,  a  substantive  question  emerges:  How  should  we  un¬ 
derstand  the  epistemic  standing  of  moral  values  within  a  CSCW 
framework? One way of currently reading the field is that moral
values  are  simply  relative  to  culture  (and  to  the  culture  of  any  par¬ 
ticular  work  group).  Harrison  and  Dourish  (1996),  for  example, 
argued  that  “privacy  is  relative,  not  a  set  of  psychological  primi¬ 
tives”  (p.  71).  As  such,  these  values  should  be  respected  only  if 
the  workers  themselves  think  the  values  are  important.  Yet,  as 
the  field  continues  to  broaden,  the  tensions  embedded  in  this 
perspective  will  continue  to  become  a  cause  of  concern.  For  ex¬ 
ample,  Greenbaum  (1996)  argued  that  CSCW  needed  to  con¬ 
sider  the  political  dimensions  of  labor  and  not  assume  that  work 
is  co-operative.  Suchman  (1994),  too,  emphasized  the  power  hi¬ 
erarchies  within  social  organizational  groups.  In  such  situations, 
which  presumably  arise  within  every  organization,  at  least  to 
some  extent,  values  often  will  conflict.  Perhaps  management 
(which  holds  the  power)  seeks  efficiency  over  privacy,  whereas 
workers  seek  the  converse.  What  does  a  designer  do?  Thus,  once 
CSCW  analyses  move  beyond  largely  homogeneous  groups  and 
into  organizational  structures,  then  potentially  a  principled  posi¬ 
tion  on  the  moral  standing  of  human  values  will  be  required. 

Participatory  Design 

In  Norway,  in  the  early  1970s,  there  was  a  general  consensus 
that  computer  systems  should  not  deskill  workers,  but  enhance 
skill,  protect  crafts,  and  foster  meaningful  work.  At  that  time, 
strong labor unions also helped enact into law a national codetermination
agreement. This agreement entitled workers along
with  management  to  determine  which  technologies  are  intro¬ 
duced  into  the  workplace  (Kuhn  &  Winograd,  1996).  Thus, 
emerging  from  this  social  structure  was  a  new  approach  to  sys¬ 
tem  design — participatory  design — that  fundamentally  sought 
to  integrate  workers’  knowledge  and  a  sense  of  work  practice 
into the system design process (Bannon, 1991; Bjerknes & Bratteteig,
1995; Bødker, 1990; Kensing, Simonsen, & Bødker, 1998,
2004;  Ehn,  1989;  Floyd,  Mehl,  Reisin,  Schmidt,  &  Wolf,  1989; 
Greenbaum  &  Kyng,  1991;  Kyng  &  Mathiassen,  1997). 

In  light  of  value  considerations,  then,  participatory  design 
has  embedded  within  it  a  commitment  to  democratization 
of  the  workplace  and  human  welfare.  It  also  has  what  can  be 
viewed  as  virtue-based  moral  commitments,  as  it  seeks  to  ac¬ 
count  for  meaningful  activity  in  everyday  lives. 

At  least  five  important  methods  have  emerged  from,  or  have 
been elaborated by, the field of participatory design. (a) Identifying
stakeholders: Toward achieving designs that work, it is often
necessary to identify the people they directly and indirectly
affect (cf. Korpela et al., 1998). (b) Workplace ethnography:
Ethnographies  document  the  practices  and  beliefs  of  a  group 
from  the  group’s  perspective.  Methods  include  analyzing  doc¬ 
uments  and  artifacts  in  the  group’s  environment,  participant 
observation, field observations, surveys, and formal and informal
interviews (cf. Kensing, Simonsen, & Bødker, 1998, 2004;
Mackay & Fayard, 1999). (c) Future workshops: A future workshop
is a method to uncover common problems in the workplace
and to solve them. This method is divided into three
phases. As described by Kensing and Madsen (1991), “The Critique
phase is designed to draw out specific issues about current
work practice; the Fantasy phase allows participants the
freedom to imagine ‘what if’ the workplace could be different;
and the Implementation phase focuses on what resources would
be needed to make realistic changes” (p. 157). (d) User participation
in design teams: Four ways users participate in design
teams can be characterized (Kuhn & Winograd, 1996): directness
of interaction with designers, length of involvement in the
design  process,  scope  of  the  participation,  and  the  degree  of 
control  over  the  design  decisions.  User  participation  has  been 
central  to  participatory  design  projects  conducted  in  Europe 
and  North  America.  In  addition,  Korpela  et  al.  (1998)  wrote  that 
their seven-year experience in Nigeria suggests that user participation
is also a must in developing countries. (e) Mock ups and
prototypes: Both mock ups and prototypes create physical
representations  of  technological  designs.  A  mock  up  looks 
roughly  like  the  artifact  it  represents,  but  completely  lacks  the 
artifact’s  functionality.  Mock  ups  usually  occur  very  early  in  the 
design  process.  As  noted  by  Ehn  and  Kyng  (1991),  mock  ups  en¬ 
courage  hands-on  experience  and  are  understandable  to  the 
end  user,  cheap  to  build,  and  fun  to  work  with.  In  turn,  proto¬ 
types  incrementally  embed  functionality  into  the  artifact  through 
successive  iterations.  Both  methods  help  end  users  envision  the 
potential  for  the  proposed  technology  and  the  resulting  changes 
in  work  practice  (Greenbaum  &  Kyng,  1991). 

Some  HCI  practice  in  the  United  States  has  followed  closely 
in  the  Scandinavian  style  of  participatory  design.  However,  what 
has  been  embraced  even  more  commonly  is  pragmatic  partic¬ 
ipatory  design,  a  term  we  use  to  refer  to  the  above  rich  con¬ 
stellation  of  methods  and  design  techniques  but  largely  stripped 
of  their  moral  commitments. 

Two  reasons  may  help  account  for  why  the  HCI  community  in 
the United States has been reluctant to embrace the moral commitments
of participatory design, while embracing many of its
methods.  One  reason  is  that  although  the  United  States  is  polit¬ 
ically  committed  to  the  value  of  participatory  democracy,  its  cap¬ 
italistic  business  culture  is  not.  Thus,  the  moral  values  that  lie  at 
the  core  of  participatory  design  (participation,  democracy,  and 
moral  personhood)  run  counter  to  values  in  the  U.S.  workplace. 
A  second  reason  can  be  viewed  in  light  of  the  cultural  homo¬ 
geneity  in  the  Scandinavian  countries.  Compared  with  the 
United  States,  for  example,  these  countries  historically  have  been 
more  homogeneous  in  terms  of  race,  ethnicity,  and  religion,  and 
thus  have  encountered  fewer  opportunities  for  corresponding 
prejudices  and  hostilities.  Thus,  when  applied  in  more  diverse 
contexts,  participatory  design  has  too  little  to  say  when  divisive 
constituencies  argue  on  the  basis  of  narrowly  conceived  self-in¬ 
terests  and  hostile  prejudices.  After  all,  at  least  in  theory,  partici¬ 
patory  design  values  each  participant’s  voice,  even  those  that  ap¬ 
pear  uncaring  and  unjust.  This  problem  has  been  of  concern 
within  the  field  (Gross,  Parker,  &  Elliott,  1997;  Muller,  1997). 

Value  Sensitive  Design 

Given  the  limitations  of  the  other  approaches  in  integrating 
ethics  and  sociotechnical  analyses  with  actual  design,  another 
approach  has  recently  emerged  called  “value  sensitive  design” 
(Friedman,  1997a,  2004;  Friedman,  Kahn,  &  Borning,  in  press). 
This  approach  seeks  to  design  technology  that  accounts  for  hu¬ 
man  values  in  a  principled  and  comprehensive  manner  through¬ 
out  the  design  process.  Value  sensitive  design  is  primarily  con¬ 
cerned  with  values  that  center  on  human  well-being,  dignity, 
justice,  welfare,  and  rights.  This  approach  is  principled  in  that 
it maintains that such values have moral epistemic standing independent
of whether a particular person or group upholds
such  values.  At  the  same  time,  value  sensitive  design  maintains 
that  how  such  values  play  out  in  a  particular  culture  at  a  partic¬ 
ular  point  in  time  can  vary,  sometimes  considerably. 

Value  sensitive  design  articulates  an  interactional  position  for 
how  values  become  implicated  in  technological  designs.  From 
this  position,  Friedman  and  Nissenbaum  (1996)  analyzed  bias 
in  computer  systems.  Cranor  and  Resnick  (2000)  analyzed  ano¬ 
nymity  in  ecommerce.  Agre  and  Mailloux  (1997)  analyzed  pri¬ 
vacy  in  computerized  vehicle-highway  systems.  Thomas  (1997) 
analyzed  universal  access  within  a  communications  company. 
Ackerman  and  Cranor  (1999)  analyzed  interface  components 
to  safeguard  users’  privacy  on  the  Internet.  Cummings  (in 
press)  analyzed  the  impact  of  interface  design  of  decision  sup¬ 
port  systems  on  human  error  and  accountability.  Friedman, 
Kahn, Hagman, Severson, and Gill (in press) investigated people’s
views and values about privacy in public given the widespread
use of cameras and display technology. Borning, Friedman,
Davis, and Lin (2005) investigated the design of indicators
for  large-scale  urban  simulation  to  support  public  deliberation 
around  urban  planning.  Kahn,  Freier,  Friedman,  Severson,  and 
Feldman  (2004)  examined  the  social  and  moral  implications  of 
human-robotic  interaction.  Hagman,  Hendrickson,  and  Whitty 
(2003)  examined  informed  consent  with  machine-scannable 
drivers’  licenses.  Abowd  and  Jacobs  (2001)  called  attention  to 
how  the  design  of  advanced  sensing  technologies  within  and 
outside  the  home  can  impinge  on  the  individual’s  right  (as  es¬ 
tablished  by  the  Fourth  Amendment)  to  be  protected  against 
illegal  search  and  seizure  by  the  government.  Shneiderman  and 
Rose  (1997)  proposed  social  impact  statements  for  informa¬ 
tion  systems.  (For  related  approaches,  see  Flanagan,  Howe,  & 
Nissenbaum,  2005,  who  examine  designer  values;  Sengers, 
Boehner,  David,  &  Kaye,  2005,  who  embed  value  analyses  in 
their  work  on  reflective  design.) 

Methodologically,  at  the  core  of  value  sensitive  design  lies 
an  iterative  process  that  integrates  conceptual,  empirical,  and 
technical investigations. Conceptual investigations involve philosophically
informed analyses of the central constructs and issues
under investigation. Questions include the following: How are values
supported or diminished by particular technological designs? Who
is affected? And how should we engage in trade-offs among
competing  values  in  the  design,  implementation,  and  use  of  in¬ 
formation  systems?  Empirical  investigations  involve  social- 
scientific  research  on  the  understandings,  contexts,  and  expe¬ 
riences  of  the  people  affected  by  the  technological  designs. 
Technical  investigations  involve  analyzing  current  technical 
mechanisms  and  designs  to  assess  how  well  they  support  par¬ 
ticular  values,  and  conversely,  identifying  values,  and  then  iden¬ 
tifying  and/or  developing  technical  mechanisms  and  designs 
that  can  support  those  values.  As  mentioned,  these  investiga¬ 
tions  are  iterative  and  integrative.  For  example,  results  from  the 
empirical  investigations  may  reveal  values  initially  overlooked  in 
the  conceptual  investigations  or  help  to  prioritize  competing 
values  in  the  design  trade-offs  between  technical  mechanisms 
and  values  considerations.  To  illustrate  this  methodology,  con¬ 
sider  a  recent  project  by  Friedman,  Felten,  and  their  colleagues 
as  they  sought  to  understand  how  to  design  web-based  interac¬ 
tions  to  respect  informed  consent  (Friedman,  Howe,  &  Felten, 
2002; Millett, Friedman, & Felten, 2001). They began their project
with a conceptual investigation of informed consent itself.
What is it? How can it be garnered in diverse online interactions
in  general,  and  in  web-based  interactions  in  particular?  To  vali¬ 
date  and  refine  their  resulting  conceptual  analysis,  and  initiate 
their  technical  investigation,  they  conducted  a  retrospective 
analysis  of  existing  technology.  Namely,  they  examined  how  the 
cookie  and  web-browser  technology  embedded  in  Netscape 
Navigator  and  Internet  Explorer  changed — with  respect  to  in¬ 
formed  consent — over  a  five-year  period,  beginning  in  1995. 
(These  results  are  summarized  elsewhere  in  this  chapter.)  Then, 
the  design  work  began.  Their  design  improvements  are  being 
implemented  in  the  Mozilla  browser  (the  open-source  code  for 
Netscape  Navigator)  and  undergoing  empirical  investigations 
(usability  studies  and  formative  evaluation),  which  will  then  re¬ 
shape  the  initial  technical  and  conceptual  work. 

The  National  Science  Foundation  recently  sponsored  two 
workshops to help shape a research agenda for Value Sensitive
Design. The recommendations from the workshops’ final reports
(Friedman, 1999; Friedman & Borning, 2001) have focused
on  the  need  for  (a)  theoretical  and  conceptual  analyses  that 
study  not  only  particular  values  in  the  online  context,  but  also 
complexities  that  arise  when  tradeoffs  among  competing  values 
are  required  in  a  design;  (b)  translations  of  well-analyzed  values 
into technical implementations; (c) proof-of-concept projects in
which multidisciplinary teams apply Value Sensitive Design to a
particular  technology  domain,  or  design  problem;  (d)  contextual 
analyses  that  investigate  the  impact  of  different  stakeholders 
who  influence  the  design  and  use  of  a  technology,  and  who  may 
have  different  goals  and  priorities  that,  in  turn,  lead  to  different 
value  tradeoffs;  (e)  integrating  the  methodology  into  organizational 
structures  and  work  practices;  and  (f)  criteria  and  metrics — both 
qualitative  and  quantitative — that  can  guide  the  design  process 
and  assess  the  success  of  particular  designs. 


HUMAN  VALUES  WITH  ETHICAL  IMPORT 


We review and discuss 12 specific human values with ethical import.
Some of these values have garnered individual chapters
in  this  handbook.  But,  by  including  these  values  here,  we  high¬ 
light  their  ethical  status  and  thereby  suggest  that  they  have  a  dis¬ 
tinctive  claim  on  resources  in  the  design  process. 

There  are  two  caveats.  First,  not  all  the  values  we  review  are 
fundamentally  distinct  from  one  another.  Nonetheless,  each 
value  has  its  own  language  and  conceptualizations  within  its  re¬ 
spective  fields,  and  thus  warrants  separate  treatment  here.  Sec¬ 
ond,  this  list  is  not  comprehensive.  Perhaps  no  list  could  be,  at 
least  within  the  confines  of  a  chapter.  Peacefulness,  compassion, 
love,  warmth,  creativity,  humor,  originality,  vision,  friendship, 
cooperation, collaboration, purposefulness, devotion, diplomacy,
kindness, musicality, harmony: the list of other possible
values  could  get  very  long  very  fast.  Our  particular  list  comprises 
many  of  the  traditional  values  that  hinge  on  the  deontological 
and  consequentialist  moral  orientations  reviewed  previously: 
human  welfare,  ownership  and  property,  privacy,  freedom  from 
bias,  universal  usability,  trust,  autonomy,  informed  consent,  and 
accountability. In addition, we have chosen several nontraditional
values within the HCI community: identity, calmness, and
environmental  sustainability.  Our  goal  here  is  not  only  to  point 
to important areas of future inquiry, but also to illustrate how an
overarching  framework  for  human  values  and  ethics  in  design 
can  move  one  quickly  and  substantively  into  new  territory. 

Human  Welfare 

Perhaps no value is more directly salient to individuals at large
than their own welfare, and the welfare of other human beings.
Moreover,  societal  interest  in  the  moral  dimensions  of  comput¬ 
ing  often  arises  when  harms  occur.  The  faulty  computer  tech¬ 
nology  and  interface  design  of  Therac-25  led  to  the  physical  suf¬ 
fering  and  death  of  cancer  patients  (Leveson  &  Turner,  1993). 
Faulty  computer  technologies  are  implicated  in  nuclear  acci¬ 
dents  like  Chernobyl.  Indeed,  the  risks  literature  (Neumann, 
1995)  is  full  of  almost  daily  examples. 

Some  people  in  the  HCI  community,  like  Leveson  (1991), 
referred  broadly  to  harms  that  impact  people  and  objects  as 
harms  to  physical  welfare.  But  there  is  merit  in  demarcating 
three  categories  of  welfare  claims  (Friedman,  1997b;  Kahn, 
1992,  1999;  Turiel,  1983).  Physical  welfare  appeals  to  the  well¬ 
being  of  individuals’  biological  selves,  which  is  harmed  by  injury, 
sickness, and death. Many of the previous examples (Therac-25
problems, “smart” missiles gone astray, nuclear accidents,
and so forth) involve harms of this kind. In many such instances,
physical harms occur because of faulty hardware or
software.  But  it  is  important  to  recognize  that  technological  de¬ 
signs  working  correctly  can  themselves  lead  to  corresponding 
harms  or  benefits.  For  example,  by  means  of  the  connectivity 
of  the  web,  stalkers  can  more  easily  find  their  “prey,”  and  pe¬ 
dophiles  can  more  easily  find  unwitting  children  to  contact  and 
potentially  abuse.  On  the  positive  side,  the  mediating  quality  of 
the  web  can  enhance  anonymity  (e.g.,  in  a  chat  room)  and 
buffer  individuals  from  physical  harm.  Material  welfare  appeals 
to physical objects that humans value and human economic interests.
It almost goes without saying that a major driving force
behind  computer  technologies  in  general  has  been  to  enhance 
the  material  welfare  of  humans.  In  turn,  a  good  deal  of  effort  is 
spent  protecting  against  material  harms  that  computers  can  en¬ 
gender,  such  as  damaged  data  from  computer  viruses  and  stolen 
financial information. Much of the work on usable security is relevant
here (Cranor & Garfinkel, 2005; Dourish, Grinter, Delgado
de la Flor, & Joseph, 2004). Psychological welfare refers to
the  higher-order  emotional  states  of  human  beings,  including 
comfort,  peace,  and  mental  health.  The  connectivity  offered  by 
the  web  again  provides  a  clear  example  of  psychological  benefits 
and  harms.  A  benefit  accrues  when  the  web  enhances  friend¬ 
ships;  and  a  harm  occurs  when  the  web  allows  for  new  forms 
of  betrayal  (e.g.,  the  “friend”  you  thought  you  found  in  a  chat 
room turns out to be a bot).

Ownership  and  Property 

From  the  anthropological  literature,  it  would  appear  that  all  cul¬ 
tures  embrace  the  idea  that  people  can  own  property,  although 
the  form  of  such  conceptions  can  vary  considerably  (Herskovits, 
1952).  In  perhaps  its  most  stringent  form,  ownership  can  be  un¬ 
derstood  as  a  general  right  to  property,  which,  in  turn,  entails  a 
group of specific rights, including the right to possess an object,
use it, manage it, derive income from it, and bequeath it
(Becker,  1977). 

This  basic  concept  of  a  property  right  seems  simple  enough 
when  applied  to  tangible  objects.  If  you  own  a  table,  for  exam¬ 
ple,  you  can  keep  it  in  your  house,  eat  on  it,  let  others  eat  on  it, 
rent  it,  and  give  it  away.  But  current  computational  technologies 
blur  the  boundaries  between  the  tangible  and  intangible.  Herein 
are  difficult  questions.  Can  users,  for  example,  legitimately  copy 
software  for  their  personal  use?  After  all,  to  do  so  involves  no 
loss  of  a  physical  object  from  the  original  seller.  Or  can  a  pro¬ 
grammer  modify  part  of  the  code  from  an  operating  system  and 
then  sell  the  modified  system?  Provisional  legislative  answers  in 
the  United  States  have  hinged  on  nuanced  distinctions  between 
copyright,  patent,  licensing,  and  trade  secrets.  For  example,  un¬ 
like  a  patent,  a  copyright  does  not  protect  ideas,  but  only  the  ex¬ 
pression  of  an  idea  once  it  is  fixed  in  a  tangible  medium.  The 
courts  have  largely  granted  developers  copyrights  (but  not 
patents)  to  their  software.  But  even  here,  further  controversies 
arise.  Let  us  say  that  a  developer  does  not  copy  any  code  of  a 
competitor’s  software,  but  copies  its  “look  and  feel”?  Is  that  an 
infringement?  The  courts  have  said  it  is  if  other  technical  means 
readily  exist  for  implementing  a  different  “look  and  feel”  for  the 
interface  (Lipinski  &  Britz,  2000). 

While case law is solidifying for issues that focus on computer
software, it is hardly keeping pace with other computational
developments that impact ownership and property. Consider,
for example, a workplace discussion group. Who gets to
decide  whether  the  comments  from  the  online  discussion  fo¬ 
rum  are  made  accessible  to  the  public  at  large?  In  other  words, 
do  the  workers  have  equal  ownership  of  the  compiled  contri¬ 
butions?  Or  is  ownership  divided  by  level  of  participation?  Is  the 
owner  the  organizer  of  the  discussion  group,  or  the  president  of 
the  company?  Can  one  member  delete  the  contribution  from 
another  member  (if,  for  example,  he  or  she  finds  the  contribu¬ 
tion  offensive)?  Consider  Goldberg’s  (2000)  Telegarden,  an  in¬ 
stallation  that  combined  robotics  and  a  real-time  web  cam¬ 
corder,  such  that  users  interacted  from  a  remote  distance  with  a 
real physical environment (a garden), and planted and watered
seeds. Who owns each actual live plant? The user? The owner
of the installation? Goldberg himself? Who decides how to respond
to an infestation of pests, and whether pesticides will be
used?  If  the  garden  becomes  overcrowded,  do  the  people  who 
physically  maintain  the  site  have  the  right  to  uproot  all  the 
plants  that  were  planted  and  cared  for  remotely  by  others?  Or 
consider  whether  a  cookie  on  a  user’s  machine  belongs  to  the 
user  or  to  the  website  that  set  the  cookie.  If  the  cookie  belongs 
to  the  user,  and  the  browser  does  not  allow  the  average  user  to 
delete  the  cookie,  then  by  virtue  of  a  technical  mechanism  HCI 
designers  have  in  effect  deprived  an  average  user  of  the  capa¬ 
bility  to  exercise  one  of  his  or  her  specific  property  rights. 

Our  point  is  that,  through  conventions  and  technical  mech¬ 
anisms,  HCI  designers  shape  answers  to  questions  of  ownership 
and  property  that  lie  at  the  forefront  of  social  discourse  and 
legal  codification. 

Privacy 

Privacy  refers  to  a  claim,  an  entitlement,  or  a  right  of  an  individ¬ 
ual  to  determine  what  information  about  himself  or  herself  can 
be communicated to others (Schoeman, 1984). Historically, a
good  deal  of  our  privacy  protections  arose  because  it  was  simply 
too  much  effort  to  collect  and  sort  through  relevant  information 
about  other  individuals.  But  as  computer  technologies  increas¬ 
ingly  garner  large  amounts  of  information  about  specific  indi¬ 
viduals,  and  increasingly  link  information  databases,  our  histor¬ 
ical  protections  for  privacy  are  being  eroded. 

In  the  HCI  community,  three  general  approaches  have 
emerged  for  privacy  protections.  One  approach  informs  “peo¬ 
ple  when  and  what  information  about  them  is  being  captured 
and  to  whom  the  information  is  being  made  available”  (Bellotti, 
1998,  p.  70).  For  example,  a  video  monitor  next  to  a  surveillance 
camera  in  a  convenience  store  can  inform  customers  of  the  in¬ 
formation  being  recorded.  A  second  approach  allows  “people 
to  stipulate  what  information  they  project  and  who  can  get  hold 
of  it”  (Bellotti,  1998,  p.  70).  Tang’s  (1997)  example  of  an  on/off 
switch  for  a  video  conferencing  workstation  (noted  earlier) 
mapped onto this approach. A third approach applies privacy-enhancing
technologies (PETs) that prevent sensitive data from
being  tagged  to  a  specific  individual  in  the  first  place.  PETs  have 
become  “extraordinarily  successful”  in  taxing  “the  capacities  of 
even  the  most  powerful  surveillance  institutions”  (Phillips,  1998, 
p.  245),  and  thus  have  provoked  sharp  conflicts  in  attempts  to 
disseminate  them.  Indeed,  Agre  (1998)  argued  that  PETs  “dis¬ 
rupt  the  conventional  pessimistic  association  between  technol¬ 
ogy  and  social  control.  No  longer  are  privacy  advocates  in  the 
position  of  resisting  technology  as  such”  (p.  4). 
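
One way to picture the third approach is a log that never attaches data to a person in the first place. The following Python sketch is an illustrative assumption, not a technique described by the authors cited above: the functions `record_visit` and `safe_report` and the threshold `min_count` are hypothetical, and a real privacy-enhancing technology would need far stronger guarantees.

```python
from collections import Counter


def record_visit(tally: Counter, zone: str) -> None:
    """Count a visit by zone; no user identifier is ever stored."""
    tally[zone] += 1


def safe_report(tally: Counter, min_count: int = 5) -> dict:
    """Release only zones with at least min_count visits,
    so that very small groups are not exposed in the report."""
    return {zone: n for zone, n in tally.items() if n >= min_count}


def demo_report():
    tally = Counter()
    for zone in ["lobby"] * 6 + ["lab"] * 2:
        record_visit(tally, zone)
    return safe_report(tally)
```

Here the design decision is made at capture time: because `record_visit` accepts no identity, later linking of records to individuals is not possible from this log alone.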

As  we  move  toward  an  era  of  ubiquitous  computing,  in  which 
computation embedded in our everyday physical objects is
linked, the amount of information known about us will increase
by  many  orders  of  magnitude.  Thus,  here,  as  in  other  domains 
of  applications,  we  in  the  HCI  community  face  difficult  questions. 
How  do  we  inform  users  about  the  risks  to  privacy  they  incur  by 
using  various  information  systems?  How  do  we  inform  users 
about  the  technical  protections  for  privacy  available  to  them? 
How  do  we  help  users  to  understand  the  risks  from  aggregated 
versus  individual  data?  How  much  protection  is  afforded  by 
anonymity?  What  should  the  default  on  systems  be? — toward 
greater  privacy  protection  or  toward  greater  access  to  informa¬ 
tion?  Given  the  centrality  of  these  and  related  questions,  privacy 
has  become  by  far  the  most  visible  and  well-studied  value  in  HCI 
(Boyle,  Edwards,  &  Greenberg,  2000;  Cohen,  Cash,  &  Muller, 
2000;  Consolvo,  Smith,  Matthews,  LaMarca,  Tabert,  &  Powledge, 
2005; Friedman, Kahn, Hagman, Severson, & Gill, in press; Godefroid,
Herbsleb, Jagadeesan, & Li, 2000; Good & Krekelberg, 2003;
Iachello & Abowd, 2005; Jancke, Venolia, Grudin, Cadiz, & Gupta,
2001; Jiang, Hong, & Landay, 2002; Milewski & Smith, 2000;
Neustaedter  &  Greenberg,  2003;  Palen  &  Dourish,  2003;  Patil  & 
Lai,  2005;  Shoemaker  &  Inkpen,  2001;  Tan  &  Czerwinski,  2003). 

Freedom  from  Bias 

Bias  refers  to  systematic  unfairness  perpetrated  on  individuals 
or  groups.  Three  forms  of  bias  in  computer  systems  have  been 
identified  (Friedman  &  Nissenbaum,  1996):  preexisting  social 
bias,  technical  bias,  and  emergent  social  bias. 

Preexisting  social  bias  has  its  roots  in  social  institutions,  prac¬ 
tices,  and  attitudes.  It  occurs  when  computer  systems  embody 
biases that exist independently of, and usually prior to, the creation
of the software. For instance, the work by Nass and his colleagues
(Nass & Brave, 2005; Reeves & Nass, 1996) has shown
that  people  respond  to  a  computer’s  “gender”  along  stereotypi¬ 
cal  lines.  Male  voice  interfaces  are  rated  more  competent  and 
persuasive,  and  more  knowledgeable  about  technical  subjects. 
Female  voice  interfaces  are  viewed  as  more  knowledgeable 
about  topics  such  as  love  and  relationships.  As  Nass  points  out, 
as  a  designer  it  is  all  too  easy  to  inadvertently  build  on  these  pre¬ 
existing  social  biases  when  building  interfaces.  Thus,  to  minimize 
preexisting bias, designers must not only scrutinize the design
specifications,  but  must  couple  this  scrutiny  with  a  good  under¬ 
standing  of  relevant  biases  out  in  the  world.  In  addition,  it  can 
prove  useful  to  identify  potential  user  populations  that  might 
otherwise  be  overlooked  and  include  representative  individuals 
in  the  field-test  groups.  Rapid  prototyping,  formative  evaluation, 
and  field  testing  with  such  well-conceived  populations  of  users 
can  be  an  effective  means  to  detect  unintentional  biases  through¬ 
out  the  design  process. 

Technical  bias  occurs  in  the  resolution  of  technical  design 
problems.  For  example,  imagine  a  database  for  matching  organ 
donors  with  potential  transplant  recipients.  If  certain  individuals 
retrieved  and  displayed  on  initial  screens  are  favored  systemati¬ 
cally  for  a  match  over  individuals  displayed  on  later  screens, 
technical  bias  occurs.  Technical  bias  also  originates  from  at¬ 
tempts  to  make  human  constructs  such  as  discourse,  judg¬ 
ments,  or  intuitions  amenable  to  computers.  For  example,  con¬ 
sider  a  legal  expert  system  that  advises  defendants  on  whether 
or  not  to  plea  bargain  by  assuming  that  law  can  be  spelled  out  in 
an  unambiguous  manner  and  is  not  subject  to  human  interpre¬ 
tations  in  context.  Toward  minimizing  technical  bias,  designers 
often  need  to  look  beyond  the  features  internal  to  a  system  and 
envision  the  design,  algorithms,  and  interfaces  in  use. 
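
The organ-matching example can be made concrete with a small, hypothetical sketch in Python. If equally suitable candidates are always listed in retrieval order, the same people land on the first screen every time. The code below assumes each candidate carries a numeric "score" and shuffles tied candidates before splitting the list into screens. The names paginate_fairly, score, and page_size are illustrative assumptions and do not describe any actual matching system.

```python
import random
from itertools import groupby


def paginate_fairly(candidates, page_size, rng=None):
    """Split candidates into screens, best score first, shuffling ties.

    Hypothetical sketch: `candidates` is a list of dicts with a numeric
    "score" key. Candidates with equal scores are shuffled so that none of
    them is systematically favored by appearing on an earlier screen.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    rng = rng or random.Random()
    # Highest score first; equal scores end up adjacent for groupby.
    by_score = sorted(candidates, key=lambda c: c["score"], reverse=True)
    ordered = []
    for _, group in groupby(by_score, key=lambda c: c["score"]):
        tied = list(group)
        rng.shuffle(tied)  # randomize order only within a tie
        ordered.extend(tied)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
```

Shuffling ties addresses only one narrow source of technical bias. Whether screen position influences the final choice at all remains a question to study with the system in use.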

Emergent  social  bias  emerges  in  the  context  of  the  computer 
system’s  use,  often  when  societal  knowledge  or  cultural  values 
change,  or  the  system  is  used  with  a  different  population.  Toward 
minimizing  emergent  bias,  designers  need  to  plan  for  not  only  a 
system’s  intended  contexts  of  use,  but  also  its  potentially  emer¬ 
gent  contexts.  Yet,  given  limited  resources,  such  a  proposal  can¬ 
not  be  pursued  in  an  unbounded  manner.  Thus,  three  practical 
suggestions  are  as  follows.  First,  designers  should  reasonably  an¬ 
ticipate  probable  contexts  of  use  and  design  for  these  systems. 
Second,  where  it  is  not  possible  to  design  for  extended  contexts 
of  use,  designers  should  communicate  to  users  the  contextual 
constraints.  As  with  other  media,  we  may  need  to  develop  con¬ 
ventions  for  communicating  the  perspectives  and  audience(s)  as¬ 
sumed  in  the  design.  Third,  system  designers  and  administra¬ 
tors  can  take  responsible  action  if  bias  emerges  with  changes  in 
context.  The  National  Resident  Matching  Program  offers  a
good  example.  Although  the  original  design  of  the  admissions 
algorithm  did  not  deal  well  with  the  changing  social  conditions 
(when  significant  numbers  of  dual-career  couples  participated 
in  the  match),  those  responsible  for  maintaining  the  system  re¬ 
sponded  conscientiously  to  this  societal  change  and  modified 
the  system’s  algorithm  to  place  couples  more  fairly  (Roth,  1990). 

Universal  Usability 

Universal  usability  refers  to  making  all  people  successful  users 
of  information  technology.  Or,  if  this  requirement  of  “all  people” 
is  too  stringent,  it  can  be  reframed  as  “all  people  who  so  desire,”
or  in  some  other  way.  Shneiderman  (2000),  for  example,  said 
that  universal  usability  means  “having  more  than  90  percent  of 
all  households  as  successful  users  of  information  and  commu¬ 
nication  services  at  least  once  a  week”  (p.  85).  In  many  respects, 
universal  usability  comprises  a  special  case  of  freedom  from 
bias,  one  that  focuses  on  usability  as  a  means  to  systematically 
prevent  unfair  access  to  information  systems. 

Three  challenges  are  often  addressed  by  proponents  of  uni¬ 
versal  usability:  (a)  technological  variety — supporting  a  broad 
range  of  hardware,  software,  and  network  access;  (b)  user 
diversity — accommodating  users  with  differences  in,  say,  skills, 
knowledge,  age,  gender,  disabilities,  literacy,  culture,  and  in¬ 
come;  and  (c)  gaps  in  user  knowledge — bridging  the  gap  be¬ 
tween  what  users  know  and  what  they  need  to  know  (Shnei¬ 
derman,  2000).  Toward  addressing  these  challenges,  current 
work  includes  Aberg  and  Shahmehri  (2001);  Beard  and  Korn 
(2001);  Cooper  and  Rejmer  (2001);  Jacko,  Dixon,  Rosa,  Scott, 
and  Pappas  (1999);  Lazar,  Dudley-Sponaugle,  and  Greenidge 
(2004);  Mankoff,  Fait,  and  Tran  (2005);  Oviatt  (1999);  and  Shneiderman
(2003).  Stephanidis  (2001)  provided  an  anthology  of
current  concepts,  methods,  and  tools. 

Universal  usability,  of  course,  is  not  necessarily  always  a 
moral  good  insofar  as  it  depends  on  what  is  being  used  or  ac¬ 
cessed.  Virtually  no  one  would  suggest,  for  example,  that  there 
is  a  moral  imperative  to  provide  universal  access  to  reruns  of 
“I  Love  Lucy.”  Nonetheless,  universal  usability  often  is  a  moral 
good.  Hert  and  her  colleagues  (Hert,  Liddy,  Shneiderman,  & 
Marchionini,  2003),  for  example,  showed  that  if  all  U.S.  citizens
have  the  right  to  access  federal  statistics,  and  if  those  statistics 
are  only  available  in  an  online  format,  then  to  be  able  to  exercise 
their  right,  the  online  federal  statistics  system  must  be  usable  by 
all  U.S.  citizens.  Moreover,  universal  access  with  ethical  import 
often  provides  increased  value  to  a  company.  For  example, 
based  on  his  case  study  within  a  large  communications  corpo¬ 
ration  (NYNEX),  Thomas  (1997)  showed  how  making  commu¬ 
nications  systems  more  accessible  leads  to  three  direct  corpo¬ 
rate  benefits:  (a)  increasing  access  makes  a  communications 
device  more  pervasive  in  social  life,  and  thereby  more  valuable 
for  everyone;  (b)  increasing  access  increases  market  share;  and 
(c)  increasing  access  “forces  technologists,  developers,  mar¬ 
keters,  and  executives  to  think  ‘out  of  the  box’”  (p.  271). 

Trust 

People  sometimes  use  the  term  trust  broadly  to  include  expec¬ 
tations  of  natural  phenomena  or  machine  performance.  It  is  in 
this  sense  that  people  trust  that  “the  sun  will  rise  tomorrow” 
or  that  “brakes  will  stop  a  car.”  Indeed,  the  Computer  Science 
and  Telecommunications  Board  in  their  thoughtful  publication 
Trust  in  Cyberspace  (Schneider,  1999)  adopted  the  terms  trust 
and  trustworthy  to  describe  systems  that  perform  as  expected 
along  the  dimensions  of  correctness,  security,  reliability,  safety, 
and  survivability.  However,  equating  the  term  trust  with  expec¬ 
tations  for  machine  performance  (or  physical  phenomena)  mis¬ 
construes  fundamental  characteristics  of  this  value.  Specifically, 
trust  is  said  to  exist  between  people  who  can  experience  good 
will,  extend  good  will  toward  others,  feel  vulnerable,  and  experience
betrayal  (Baier,  1986;  Friedman,  Kahn,  &  Howe,  2000;
Kahn  &  Turiel,  1988).  Moreover,  on  the  societal  level,  trust  enhances
our  social  capital  (Putnam,  Leonardi,  &  Nanetti,  1993).

In  their  analysis  of  trust  online,  Friedman  et  al.  (2000)  sug¬ 
gested  that  it  is  important  for  designers  to  distinguish  two  over¬ 
arching  contexts:  e-commerce  and  interpersonal  relationships. 
In  e-commerce,  for  example,  certain  characteristics  of  the  tech¬ 
nology — such  as  those  that  concern  security,  anonymity,  ac¬ 
countability,  and  performance  history — can  make  it  difficult  for 
consumers  to  assess  their  possible  financial  harms  and  the  po¬ 
tential  good  will  of  the  company.  One  solution  that  has  emerged 
has  been  a  form  of  insurance  (usually  through  the  credit  card 
companies)  that  limits  a  person’s  financial  liability.  Trust  in 
e-commerce  has  received  a  good  deal  of  attention  in  the  last 
several  years  in  the  HCI  and  larger  community  (Dieberger, 
Hook,  Svensson,  &  Lonnqvist,  2001;  Egger,  2000;  Fogg  &  Tseng,
1999;  Greenspan,  Goldberg,  Weimer,  &  Basso,  2000;  Jones,
Wilkens,  Morris,  &  Masera,  2000;  Olson  &  Olson,  2000;
Riegelsberger,  Sasse,  &  McCarthy,  2003).

In  online  interpersonal  interactions,  violations  of  trust  make 
us  most  vulnerable  psychologically — for  example,  through  hurt 
feelings  or  embarrassment.  Rocco  (1998)  suggested  that  interpersonal
trust  online  succeeds  best  when  preceded  by  face-to-face
interaction.  In  response,  Zheng,  Veinott,  Bos,  Olson,  and
Olson  (2002)  suggested  that  when  users  do  not  meet  physically
but  engage  in  online  chat  to  get  to  know  one  another  they  can
establish  the  same  kinds  of  interpersonal  trust  that  are  established
in  face-to-face  interactions  (cf.  Bos,  Olson,  Gergle,  Olson,
&  Wright,  2002).  Moreover,  geography  matters  for  remote
collaborations;  Bradner  and  Mark  (2002)  showed  that  people,  in 
effect,  “trust”  less  those  individuals  with  whom  they  interact  re¬ 
motely  when  they  believe  those  individuals  to  be  in  a  distant  city 
rather  than  a  nearby  one.  Interestingly,  whereas  the  character¬ 
istic  of  anonymity  works  against  establishing  financial  trust  in 
e-commerce,  it  is  double-edged  in  interpersonal  relationships. 
On  the  negative  side,  online  anonymity  can  limit  the  depth  of  in¬ 
terpersonal  interactions  in  so  far  as  we  engage  in  a  singular 
means  of  expression  (written).  On  the  positive  side,  online 
anonymity  opens  up  new  interpersonal  opportunities.  For  ex¬ 
ample,  a  gay  teenager  in  an  intolerant  family  and  community 
might  rely  on  the  anonymous  characteristics  of  the  web  to  find 
and  interact  with  like-minded  peers.  Thus,  toward  enhancing  in¬ 
terpersonal  trust  online,  we  need  to  build  tools  that  allow  users 
to  have  easy  and  refined  control  over  what  personal  information
is  made  known  to  others.


Autonomy 

People  decide,  plan,  and  act  in  ways  that  they  believe  will  help 
them  to  achieve  their  goals.  In  this  sense  people  value  their 
autonomy.  It  might  appear  relatively  easy  for  HCI  designers  to
support  user  autonomy.  The  idea  would  be  to,  whenever  pos¬ 
sible,  provide  users  with  the  greatest  possible  control  over  the 
technology.  However,  the  task  is  harder  than  that.  After  all,  most 
users  of  a  word  processor  have  little  interest,  say,  in  controlling 
how  the  editor  executes  a  search-and-replace  operation  or  embeds
formatting  commands.  Similarly,  Hook  (1998)  suggested
users  may  accept  adaptive  hypermedia  as  a  means  to  address  information
overload.  Or,  as  demonstrated  by  Barkhuus  and  Dey
(2003)  in  a  study  of  context-aware  computing  with  a  mobile 
phone  device,  users  may  be  willing  to  relinquish  some  control 
in  order  to  achieve  gains  in  usefulness.  At  the  same  time,  users 
may  be  uncomfortable  with  fully  automatic  systems.  For  example,
in  a  study  investigating  the  viability  of  automatic  location-disclosure
messages  to  support  social  awareness  (Iachello,
Smith,  Consolvo,  Abowd,  Hughes,  Howard,  Potter,  Scott,  Sohn,
Hightower,  &  LaMarca,  2005),  participants  neither  used  nor
wanted  the  automatic-message  feature.  In  other  words,  auton¬ 
omy  is  protected  when  users  are  given  control  over  the  right 
things  at  the  right  time.  The  hard  work  arises  in  deciding  what 
those  features  are  and  when  those  conditions  occur. 

Toward  this  end,  Friedman  and  Nissenbaum  (1997)  identi¬ 
fied  four  aspects  of  systems  that  can  promote  or  undermine 
user  autonomy.  The  first  involves  system  capability.  Recall  Isaacs, 
Tang,  and  Morris’s  (1996)  design  of  a  system  to  support  informal 
interactions  in  the  workplace,  using  a  piazza  as  the  metaphor. 
Their  application  provided  a  means  for  workers  to  opt  out  of 
piazza  interactions,  and  thus  supported  worker  autonomy.  The 
second  involves  system  complexity.  For  example,  as  previously 
noted,  features  of  software  programs  can  proliferate  at  the  ex¬ 
pense  of  usability,  and  user  autonomy  suffers.  The  third  involves 
misrepresentation  of  the  system.  For  example,  hyperbolic  ad¬ 
vertising  claims  can  lead  users  to  develop  inaccurate  expecta¬ 
tions  of  the  system  and  thereby  frustrate  users’  goals.  The 
fourth  involves  system  fluidity.  Over  time,  users’  goals  often
change.  Thus,  systems  need  to  provide  ready  mechanisms  for 
users  to  review  and  adapt  their  systems. 

Many  systems  depend  on  categorization,  and  it  is  here  that 
Suchman  (1994)  worried  that  organizations  can  run  roughshod 
over  user  autonomy.  For,  according  to  Suchman,  whoever  de¬ 
termines  the  categories — and  how  those  categories  can  be 
used — imputes  their  own  personal  values  into  the  system  and 
has  power  over  the  user.  In  response,  Winograd  (1994)  pointed 
to  the  frequent  need  for  socially  coordinated  activity  through 
which  groups  of  people  seek  to  share  information  and  technology.
In  such  activity,  Winograd  argued,  we  need  some  degree
of  standardization  wherein  designers  impose  categories.  That
is  their  job.  The  key,  according  to  Winograd,  is  to  cultivate  regularized
activity  without  becoming  oppressive.  In  turn,  Malone
(1994)  responded  to  Suchman  by  arguing  that  not  only  are  cat¬ 
egories  often  useful,  but  to  some  extent  they  are  necessary 
given  the  structure  of  human  cognition.  At  the  same  time,  Mal¬ 
one  suggests  that  no  category  system  is  complete  and  that  de¬ 
signs  need  to  be  adaptable,  often  by  their  users. 

Informed  Consent 

Informed  consent  provides  a  critical  protection  for  privacy,  and 
supports  other  human  values  such  as  trust  and  autonomy.  Yet, 
currently  there  is  a  mismatch  between  industry  practice  and  the 
public’s  interest.  According  to  a  recent  report  from  the  Federal 
Trade  Commission  (2000),  for  example,  59%  of  websites  that 
collect  personal  identifying  information  neither  inform  Internet 
users  that  they  are  collecting  such  information,  nor  seek  the 
user’s  consent.  Yet,  according  to  a  Harris  Poll  (2000),  88%  of 
users  want  websites  to  garner  their  consent  in  such  situations. 


The  Federal  Trade  Commission  (2000,  p.  iv)  hoped  that  industry 
would  continue  to  make  progress  on  this  problem,  in  conjunc¬ 
tion  with  its  proposed  legislation.  In  turn,  the  HCI  community 
has  been  recognizing  the  need  to  understand  better  what  con¬ 
stitutes  informed  consent,  and  to  realize  it  in  online  interactions. 

Friedman  and  her  colleagues  (Friedman,  Lin,  &  Miller,  2005; 
Millett,  Friedman,  &  Felten,  2001)  offered  an  analysis  of  what
constitutes  informed  consent,  and  showed  how  both  words —
“informed”  and  “consent” — have  import  (Faden  &  Beauchamp, 
1986;  The  Belmont  Report,  1978).  The  idea  of  “informed”  en¬ 
compasses  disclosure  and  comprehension.  Disclosure  refers  to 
providing  accurate  information  about  the  benefits  and  harms 
that  might  reasonably  be  expected  from  the  action  under  con¬ 
sideration.  Comprehension  refers  to  the  individual’s  accurate 
interpretation  of  what  is  being  disclosed.  In  turn,  the  idea  of 
“consent”  encompasses  voluntariness,  competence,  and  agreement.
Voluntariness  refers  to  assurance  that  the  action  is  not
controlled  or  coerced,  and  that  an  individual  could  reasonably 
resist  participation  should  he  or  she  wish  to.  Competence  refers 
to  possessing  the  mental,  emotional,  and  physical  capabilities 
needed  to  give  informed  consent.  Agreement  refers  to  a  rea¬ 
sonably  clear  opportunity  to  accept  or  decline  to  participate. 
Based  on  this  account,  Friedman,  Millett,  and  Felten  (2000)  of¬ 
fered  general  design  principles  for  informed  consent  online, 
namely  (a)  deciding  whether  the  capability  is  exempt  from  in¬ 
formed  consent;  (b)  being  particularly  careful  when  invoking 
the  sanction  of  implicit  consent;  (c)  having  default  settings  err 
on  the  side  of  preserving  informed  consent;  (d)  putting  the  user 
in  control  of  the  “nuisance  factor;”  (e)  avoiding  technical  jargon; 
(f)  providing  the  user  with  choices  in  terms  of  potential  effects
rather  than  in  terms  of  technical  mechanisms;  (g)  field  testing  to 
help  ensure  adequate  comprehension  and  opportunities  for 
agreement;  and  (h)  designing  proactively. 

In  conjunction  with  this  model  for  informed  consent  online, 
Millett  et  al.  (2001)  examined  how  cookie  technology  and  web 
browser  designs  have  responded  to  concerns  about  informed 
consent.  Specifically,  they  documented  relevant  design  changes 
in  Netscape  Navigator  and  Internet  Explorer  over  a  five-year  pe¬ 
riod,  starting  in  1995.  Their  retrospective  examination  led  them 
to  conclude  that  while  cookie  technology  has  improved  over 
time  regarding  informed  consent,  some  startling  problems  re¬ 
mained.  They  specified  six  problems  and  offered  design  reme¬ 
dies.  Further  work  by  Friedman,  Howe,  and  Felten  (2002)  car¬ 
ried  this  work  forward  by  redesigning  several  aspects  of  the 
Mozilla  Browser.  For  example,  no  mechanism  existed  for  users 
to  become  aware  of  when  a  cookie  was  set  without  requiring 
the  user  to  stop  the  current  task.  The  Mozilla  redesign  includes 
a  sidebar  that  watches  for  cookies  and  displays  a  notice  in  the 
user’s  periphery  whenever  a  cookie  is  set.  The  display  includes 
information  about  the  domain  and  third  party  status  of  the 
cookie;  the  information  remains  ready-to-hand-over  whenever 
the  user  chooses  to  act  upon  it. 
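
As a rough illustration of the kind of information such a peripheral notice carries, the Python sketch below derives a cookie's domain and third-party status from the page being visited. It is a simplified assumption, not the Mozilla redesign itself: the function names is_third_party and cookie_notice are hypothetical, and the comparison is a plain host-name suffix check, whereas real browsers draw the first-party boundary more carefully.

```python
from urllib.parse import urlsplit


def is_third_party(page_url, cookie_domain):
    """Return True if the cookie's domain does not cover the page's host.

    Simplifying assumption: a plain suffix comparison on host names. Real
    browsers make finer distinctions (for example, using a list of public
    suffixes), which this sketch ignores.
    """
    host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    return not (host == domain or host.endswith("." + domain))


def cookie_notice(page_url, cookie_domain):
    """Build the short, non-blocking text a peripheral sidebar could show."""
    if is_third_party(page_url, cookie_domain):
        status = "third-party"
    else:
        status = "first-party"
    return f"Cookie set by {cookie_domain.lstrip('.')} ({status})"
```

For a page at https://news.example.com/ and a cookie whose domain is .example.com, the notice reads "Cookie set by example.com (first-party)". The notice is plain text that can sit in the periphery, so the user is informed without being forced to stop the current task.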

Accountability 

Medical  expert  systems.  Automated  pilots.  Loan  approval  software.
Computer-guided  missiles.  Increasingly,  computers  participate
in  decisions  that  affect  human  lives.  In  cases  of  computer
failure,  there  is  a  common  response  to  blame  the  computer —
“It’s  the  computer’s  fault.”  Indeed,  Friedman  and  Millett
(1995)  found  that  83%  of  the  undergraduate  computer  science  majors
they  interviewed  attributed  aspects  of  agency — either  decision-making
and/or  intentions — to  computers.  In  addition,  21%
of  the  students  consistently  held  computers  morally  responsible 
for  error.  Yet,  if  we  accept  that  humans,  not  computational  sys¬ 
tems,  are  capable  of  being  moral  agents,  such  blame  is  funda¬ 
mentally  misplaced  (Searle,  1980). 

How  can  HCI  designers  minimize  this  tendency  of  users  to  at¬ 
tribute  blame  to  computational  systems?  Suggestions  have  been 
offered  based  on  minimizing  two  types  of  distortions  (Friedman 
&  Kahn,  1992).  In  the  first  type  of  distortion,  the  computational 
system  diminishes  or  undermines  the  human  user’s  sense  of  his 
or  her  moral  agency.  In  such  systems,  human  users  are  placed 
into  largely  mechanical  roles,  either  mentally  or  physically,  and 
frequently  have  little  understanding  of  the  larger  purpose  or 
meaning  of  their  individual  actions.  To  the  extent  that  humans 
experience  a  diminished  sense  of  agency,  human  dignity  is 
eroded  and  individuals  may  consider  themselves  largely  unac¬ 
countable  for  the  consequences  of  their  computer  use.  Con¬ 
versely,  in  the  second  type  of  distortion,  the  computational  sys¬ 
tem  masquerades  as  an  agent  by  projecting  intentions,  desires, 
and  volition.  Strikingly,  even  when  computer  interfaces  only  min¬ 
imally  mimic  human  agency,  people  appear  predisposed — at 
least  in  certain  regards — to  treat  computers  as  social  agents.  For 
example,  as  Nass  and  his  colleagues  have  shown,  people  re¬ 
spond  to  multiple  voices  from  a  single  computer  as  though  they 
were  separate  entities,  respond  to  a  computer’s  “gender”  along 
stereotypical  lines,  are  less  likely  to  criticize  a  computer  directly 
(e.g.,  if  the  computer  itself  asks  for  an  evaluation)  than  to  criti¬ 
cize  the  computer  to  a  third  party  (a  different  computer  or  hu¬ 
man),  and  respond  to  computer  flattery  (Nass  &  Brave,  2005; 
Reeves  &  Nass,  1996).  Thus,  to  the  extent  that  humans  inappro¬ 
priately  attribute  agency  to  computational  systems,  humans  may 
well  consider  the  computational  systems,  at  least  in  part,  to  be 
morally  responsible  for  the  effects  of  computer-mediated  or 
computer-controlled  actions. 

Identity 

The  idea  of  personal  identity  embraces  two  seemingly  contra¬ 
dictory  ideas.  On  the  one  side  is  the  obvious  fact  that  each 
one  of  us  has  many  roles.  A  single  person  can  be,  for  example, 
a  father,  lover,  poker  player,  gourmet  cook,  computer  geek, 
and  animal  lover.  Indeed,  William  James  says  that  a  person
“has  as  many  social  selves  as  there  are  individuals  who  recognize
him”  (quoted  in  Rosenberg,  1997,  p.  23).  On  the  other
side,  virtually  all  of  us  feel  like  we  live  reasonably  coherent 
lives,  and  that  the  person  we  are  today  is  pretty  much  the  per¬ 
son  we  were  yesterday  and  last  week,  if  not  last  year.  Thus, 
identity  appears  to  be  multiple  and  unified,  and  both  aspects 
are  essential  to  human  flourishing.  Too  far  toward  multiplicity 
and  we  end  up  schizophrenic;  too  far  toward  being  unified  and 
there  are  too  few  mechanisms  for  psychological  growth,  and 
too  little  basis  for  healthy  social  functioning  (e.g.,  it  makes  little 
sense  to  present  the  same  “persona”  to  one’s  boss  as  one  does 
to  one’s  child). 


In  terms  of  HCI,  it  is  important  that  the  field  as  a  whole  sub¬ 
stantively  support  both  manifestations  of  identity,  and  that  in 
our  designs  we  do  not  swing  too  far  in  one  direction  or  the  other.
To  date,  the  networked  personal  computer  has  easily  supported 
multiplicity.  Thus,  the  same  person  can  easily  communicate  with 
many  unrelated  groups  (e.g.,  chats,  list  servers,  etc.),  easily  es¬ 
tablish  a  different  identity  within  many  of  these  groups,  or  even 
multiple  identities  within  a  single  group  (Turkle,  1996).  This 
trend  is  being  checked  in  interesting  ways.  For  example,  indi¬ 
viduals  in  some  online  communities  link  their  comments  to 
their  personal  homepage  (“To  find  out  more  about  me,  click 
here.”)  Some  chats  (e.g.,  SeniorNet  described  in  Mynatt  et  al., 
1999)  require  single  identities.  Ubiquitous  location-aware  sys¬ 
tems  (e.g.,  DigiDress  described  in  Persson,  Blom,  &  Yang,  2005) 
have  been  used  to  facilitate  “real”  social  interaction  between  pre¬ 
viously  unacquainted  users.  Indeed,  with  the  advent  of  increas¬ 
ingly  linked  online  databases  coupled  with  ubiquitous  comput¬ 
ing,  we  believe  that  the  pendulum  will  soon  shift  powerfully  and 
pervasively  toward  the  unification  of  identity.  If  so,  following 
Bers,  Gonzalo-Heydrich,  and  DeMaso  (2001),  and  Schiano  and 
White  (1998),  the  challenge  for  HCI  designers  will  be  to  find
ways  such  that  individuals  have  flexibility  to  establish  and  reveal 
not  only  an  integrated  self  but  also  a  multiplicity  of  identities. 

Calmness 

Weiser  and  Brown  (1997)  suggested  that  in  the  last  50  years,
there  have  been  two  great  trends  in  the  human  relationship 
with  computing:  the  first  was  with  the  mainframe  computer  and 
the  second  is  currently  with  the  personal  computer.  They  sug¬ 
gest  that  the  next  great  trend  is  toward  ubiquitous  computing, 
characterized  by  deeply  embedded  computation  in  the  world. 
In  turn,  they  argue  that  the  “most  potentially  interesting,  chal¬ 
lenging,  and  profound  change  implied  by  the  ubiquitous  com¬ 
puting  era  is  a  focus  on  calm.  If  computers  are  everywhere,  they 
had  better  stay  out  of  the  way,  and  that  means  designing  them 
so  that  the  people  being  shared  by  the  computers  remain 
serene  and  in  control”  (p.  79). 

The  central  design  mechanism  Weiser  and  Brown  put  forward
is  one  in  which  information  remains  in  the  periphery  until
it  is  relevant  and,  only  then,  is  moved  to  the  center  of  our
awareness.  One  architectural  example  they  provide  is  of  a  glass 
window  between  offices  and  hallways.  Such  a  window,  they  sug¬ 
gest,  “extends  our  periphery  by  creating  a  two-way  channel  for 
clues  about  the  environment.  Whether  it  is  motion  of  other  people
down  the  hall  (it’s  time  for  lunch;  the  big  meeting  is  starting)
or  noticing  the  same  person  peeking  in  for  the  third  time
while  you  are  on  the  phone  (they  really  want  to  see  me;  I  for¬ 
got  an  appointment),  the  window  connects  the  person  inside  to 
the  nearby  world”  (pp.  81-82).  A  computational  example  in¬ 
volves  an  Internet  multicast,  a  continuous  video  from  another 
location  that  provides  not  so  much  video  conferencing,  but 
“more  like  a  window  of  awareness”  of  another  location  (p.  82). 
Another  example  involves  a  “dangling  string,”  “an  eight-foot 
piece  of  plastic  spaghetti  that  hangs  from  a  small  electric  motor 
mounted  in  a  ceiling.  The  motor  is  electronically  connected  to  a 
nearby  Ethernet  cable,  so  that  each  bit  of  information  that  goes 
past  causes  a  tiny  twitch  of  the  motor.  A  very  busy  network 
causes  a  madly  whirling  string  with  a  characteristic  noise;  a  quiet
network  causes  only  a  small  twitch  every  few  seconds”  (p.  83). 

The  idea  that  ubiquitous  computing  will  need  to  take  up 
the  challenge  of  preserving  calmness  in  human  lives  seems  on 
the  mark.  There  are,  however,  some  limitations  in  the  solution 
Weiser  and  Brown  put  forward  in  terms  of  designing  for  the 
periphery.  Perhaps  the  most  notable  is  that  very  quickly  the 
periphery  itself  can  become  overloaded  with  information,  es¬ 
pecially  if  workplace  conventions  expect  workers  to  be  con¬ 
tinuously  aware  of  such  information.  Imagine,  for  example, 
working  in  an  office  with  a  window  out  onto  the  hallway,  with 
four  Internet  multicasts  playing  in  the  background,  a  dangling 
string  in  the  corner,  five  other  peripheral  information  streams 
on  each  of  several  pieces  of  software  you  are  using,  and  then 
add  in  peripheral  information  streams  on  one’s  regular  phone, 
cell  phone,  personal  digital  assistant,  and  any  and  all  other 
pieces  of  embedded  computation  in  the  office  of  today  (or  the 
future).  Then  try  to  get  some  work  done.  It  will  not  be  easy. 
Thus,  as  ubiquitous  computing  takes  hold,  it  is  not  surprising 
that  we  see  an  emerging  body  of  work  addressing  the  challenges
of  information  overload  and  interruptibility  (Fogarty,
Hudson,  Atkeson,  Avrahami,  Forlizzi,  Kiesler,  Lee,  &  Yang,
2005;  Hudson,  Christensen,  Kellogg,  &  Erickson,  2002;  Voida,
Newstetter,  &  Mynatt,  2002). 

An  alternative  design  approach  (Kahn  &  Friedman,  in  preparation;
cf.  Farrell,  2001)  toward  preserving  calmness  has  as  its
starting  point  the  psychological  literature  that  shows  that  direct 
experiences  with  nature  have  beneficial  effects  on  people’s  phys¬ 
ical,  cognitive,  and  emotional  well-being.  For  example,  Ulrich 
(1984)  found  that  postoperative  recovery  improved  when  pa¬ 
tients  were  assigned  to  a  room  with  a  view  of  a  natural  setting 
(a  small  stand  of  deciduous  trees)  versus  a  view  of  a  brown  brick 
wall.  More  generally,  studies  have  shown  that  even  minimal  con¬ 
nection  with  nature — such  as  looking  at  a  natural  landscape — 
can  reduce  immediate  and  long-term  stress,  reduce  sickness  of 
prisoners,  and  calm  patients  before  and  during  surgery  (see,
for  reviews,  Beck  &  Katcher,  1996;  Kahn,  1999;  Ulrich,  1993).
Building  on  this  literature,  the  question  becomes  in  what  ways 
computer  technologies  can  augment  the  human  relationship 
with  the  natural  world  with  beneficial  effects.  One  implemen¬ 
tation  might  involve  video  plasma  display  “windows”  in  inside 
offices  that  stream  in  real-time  views  of  a  psychologically  restora¬ 
tive  local  nature  scene  (Friedman,  Freier,  &  Kahn,  2004).  An¬ 
other  implementation  might  involve  robotic  “pets”  as  possible 
companions  for  the  elderly  (Beck,  Edwards,  Kahn,  &  Friedman, 
2004).  The  point  is  that  while  such  augmented  natural  interactions
may  not  be  as  psychologically  beneficial  as  real  nature,  they
may  be  more  so  than  no  nature,  in  which  case  it  becomes  a 
plausible  area  for  design  applications. 

Environmental  Sustainability 

We  pollute  air  and  water,  deplete  soil,  deforest,  create  toxic 
wastes,  and  through  human  activity  are  extinguishing  over 
27,000  species  each  year  (a  conservative  estimate).  Such  prob¬ 
lems  have  generated  a  tremendous  amount  of  attention  among 
the  populace,  and  have  slowly  been  coming  under  the  purview  of 
the  computing  community  (cf.  IEEE  and  ACM:  Software  Engineering
Code  of  Ethics  and  Professional  Practice,  1998).  On  the
production  end,  there  is  concern  about  the  resources  used  in 
producing  computer  technologies,  as  well  as  the  resulting  toxic 
wastes  (e.g.,  of  making  computer  chips).  On  the  consumption 
end,  there  is  concern  about  the  resources  computer  technolo¬ 
gies  use.  The  electrical  demands,  for  example,  are  particularly 
high,  especially  when  energy  sources  are  scarce. 

Another  question  arises  of  how  we  can  design  systems  that 
foster  a  healthier  and  more  life-affirming  connection  with  the 
natural  world  (cf.  Kahn,  2006;  Kahn  &  Kellert,  2002).  The  more
straightforward  approach  has  been  simply  to  harness  computer 
technologies  in  the  service  of  environmental  science,  in  com¬ 
puter  models,  for  example,  of  global  warming  or  earth  tecton¬ 
ics.  Environmental  educators  have  also  been  using  networked 
computers  such  that  students  can  share  environmental  data 
or  narratives  with  other  students  (or  scientists)  in  diverse  geo¬ 
graphical  locations.  Or  consider  an  approach  being  taken  by 
Borning and his colleagues (Waddell, Borning, Noth, Freier,
Becke,  &  Ulfarsson,  2003)  where  they  are  using  a  computer 
simulation  (“UrbanSim”)  of  an  urban  environment  to  help  res¬ 
idents,  politicians,  and  planners  visualize  the  effects  of  their 
proposed  land-use  plans.  Another  approach  might  involve 
designing  calming  technologies,  as  previously  discussed,  that 
integrate  restorative  aspects  of  nature  into  human  lives. 


PROFESSIONAL  ETHICS 


Computer  professionals  sometimes  experience  ethical  conflicts 
in  the  workplace.  Consider,  for  example,  the  situation  in  which, 
by  means  of  his  or  her  specialized  knowledge,  a  computer  pro¬ 
fessional  knows  of  a  harm  that  can  result  from  the  implementa¬ 
tion  of  a  computer  technology.  Should  he  or  she  inform  others 
of such impending harm even if it will jeopardize his or her job
security?  Or  consider  the  situation  in  which  a  client  asks  a  soft¬ 
ware  developer  to  develop  an  expert  system  that  recommends 
against  loans  in  certain  neighborhoods.  Although  such  a  policy 
(called  “redlining”)  may  serve  the  economic  interests  of  the 
loan-granting  organization,  it  unfairly  discriminates  against  in¬ 
dividuals  on  the  basis  of  where  they  live.  Should  a  computer 
professional  deny  such  a  request?  Or  consider  the  situation 
offered  by  Quesenbery  (Molich,  2001): 

You  have  set  up  early  usability  tests  of  a  paper  prototype  with  nurses  at 
a  medical  facility.  The  test  was  difficult  to  schedule  because  nurses’ 
time  is  guarded  carefully  and  marketing  carefully  guards  the  relation¬ 
ship  with  customers.  The  nursing  managers  insist  on  “taking  the  test” 
themselves  first,  and  then  on  being  present  in  the  room  during  the 
test with the other participants to “be sure they do it right.” You believe
that the managers’ presence will be intimidating to the nurses,
altering  the  results  of  the  test.  Do  you  continue  with  the  usability 
tests?  (p.  218) 

Quesenbery  offers  two  possible  answers.  One  answer  is  yes. 
She  argues  that  if  you  turn  down  this  opportunity,  another  one 
might  not  arise;  and  although  you  would  have  to  carefully  eval¬ 
uate  the  test  results  for  bias  caused  by  the  managers,  valuable 
data about the prototype would still be generated. The other answer
is no. She argues that it is unethical to put nurses into such
a stressful situation in which they are studied in front of their
obviously critical managers.

In response to ethical issues that arise within the computing
professions, numerous organizations have developed ethical
codes of conduct. Codes have been developed, for example,
by the ACM, IEEE Computer Society, DPMA, and ICCE.
Berleur and Brunnstein (1997) provided a comparison of 30 different
codes of ethics. At the end of 1998, the IEEE and the ACM
together adopted a revised Software Engineering Code of Ethics
and Professional Practice (http://www-cs.etsu.edu/seeri/secode.htm).
At the highest level of abstraction (see Gotterbarn (1999)
for  a  review),  the  code  states  eight  principles  wherein  software 
engineers  shall  (a)  act  consistently  with  the  public  interest; 
(b)  act  in  a  manner  that  is  in  the  best  interests  of  their  client  and 
employer,  consistent  with  the  public  interest;  (c)  ensure  that 
their  products  and  related  modifications  meet  the  highest  pro¬ 
fessional  standards  possible;  (d)  maintain  integrity  and  inde¬ 
pendence  in  their  professional  judgment;  (e)  subscribe  to  and 
promote  an  ethical  approach  to  the  management  of  software 
development  and  maintenance;  (f)  advance  the  integrity  and 
reputation  of  the  profession  consistent  with  the  public  inter¬ 
est;  (g)  be  fair  to  and  supportive  of  their  colleagues;  and  (h)  par¬ 
ticipate  in  life-long  learning  regarding  the  practice  of  their  pro¬ 
fession  and  promote  an  ethical  approach  to  the  practice  of  their 
profession. In turn, each principle is elaborated upon with specific
guidelines. For example, under the first principle (to act
consistently  with  the  public  interest),  the  code  states  that  soft¬ 
ware  engineers  should  “disclose  to  appropriate  persons  or  au¬ 
thorities  an  actual  or  potential  danger  to  the  user,  the  public, 
or  the  environment,  that  they  reasonably  believe  to  be  associ¬ 
ated  with  the  software  or  related  documents”  (p.  62). 

Before  moving  forward  with  this  topic,  it  is  important  to  take 
hold  of  a  question  often  discussed  in  the  literature:  whether  the 
computer  profession  is  even  a  profession  (Weckert  &  Adeney, 
1997).  If  it  is  not,  then  concerns  about  professional  ethics — 
special  moral  requirements  above  and  beyond  what  are  applied 
to  ordinary  people — disappear. 

What,  then,  is  a  profession?  According  to  Bayles  (1981),  a  pro¬ 
fession  is  comprised  of  members  who  (a)  have  extensive  train¬ 
ing,  (b)  have  an  intellectual  component  such  that  the  profes¬ 
sional’s  primary  role  is  to  advise  the  client  about  things  the  client 
does  not  know,  and  (c)  provide  an  important  service  to  society. 
Bennion  (1969)  also  pointed  out  that  professionals  are  typically 
self-employed,  and  establish  a  client-agent  relationship  based 
on  trust.  In  addition,  as  Johnson  and  Miller  (1997)  wrote:  “Be¬ 
ing  a  professional  means  more  than  just  having  a  job.  The  dif¬ 
ference  is  commitment  to  doing  the  right  thing  because  you  are 
a  member  of  a  group  that  has  taken  on  responsibility  for  a  do¬ 
main  of  social  activity — a  social  function.  The  group  is  account¬ 
able  to  society  for  this  domain,  and  for  this  reason,  professionals 
must  behave  in  ways  that  are  worthy  of  public  trust”  (p.  22). 

Based  on  such  criteria,  in  some  respects  the  computer  field 
constitutes  a  profession.  Computer  personnel  have  intellectual 
knowledge  and  extensive  training,  and  they  provide  an  impor¬ 
tant  service  to  society.  Computer  personnel  are  also  especially 
well  positioned  to  understand  how  new  technologies  may  im¬ 
pact  human  lives;  accordingly,  many  people  argue  that  they  in¬ 
cur  the  responsibility  to  communicate  such  impacts  to  the  pub¬ 
lic.  Also,  the  media,  industry,  and  academy  widely  use  the  term 
“computer professionals.” That said, the computer field certainly
does not constitute a canonical profession, like medicine or law.
After  all,  many  computer  personnel  work  for  businesses  rather 
than  as  consultants,  and  create  artifacts  (e.g.,  software,  hard¬ 
ware,  algorithms,  interface  designs)  rather  than  offer  advice.  Ad¬ 
ditionally,  computer  personnel  vary  greatly  in  the  amount  and 
type  of  their  training.  Some  computer  personnel  are  self-taught, 
without  even  a  high-school  diploma.  Other  computer  personnel 
complete  six-month  to  two-year  programs,  or  a  bachelor’s,  mas¬ 
ter’s,  or  doctoral  program. 

If  we  accept — as  most  people  do — that  in  at  least  some  im¬ 
portant  regards  the  computer  field  constitutes  a  profession, 
then  how  has  the  profession  understood  and  conveyed  its  cor¬ 
relative  ethical  responsibilities?  One  approach  has  been  through 
codes  of  ethics,  as  noted  previously.  Such  codes  serve  multiple 
purposes  (Gotterbarn,  1999;  Anderson,  Johnson,  Gotterbarn,  & 
Perrolle,  1993).  They  serve  to  educate  computer  employees  and 
managers.  They  help  garner  the  trust  of  the  public.  They  pro¬ 
vide  computer  employees  with  a  formal  document  to  turn  to 
(and  appeal  to)  when  they  face  conflicts  that  pit  ethical  deci¬ 
sions  against  the  economic  benefit  of  their  company.  They  can 
function  as  a  means  of  deterrence  and  discipline.  And  they  can 
enhance  a  profession’s  public  standing. 

Another  approach  for  advancing  the  profession’s  ethical  re¬ 
sponsibilities  involves  using  hypothetical  scenarios  to  flesh  out 
the  meaning  of  the  codes  (Parker,  1979;  Parker,  Swope,  &  Baker, 
1990).  For  example,  Anderson  et  al.  (1993)  provided  an  analy¬ 
sis  of  nine  scenarios  that  illustrate  how  the  1992  ACM  Code  of 
Ethics  bears  out  in  practice.  They  offered,  for  example,  the  hy¬ 
pothetical  situation  of  a  consultant,  named  Diane,  who  is  de¬ 
signing  the  database-management  system  for  the  personnel  of¬ 
fice  of  a  medium-sized  company.  The  database  will  store 
sensitive  information,  including  performance  evaluations,  med¬ 
ical  records,  and  salaries.  In  an  effort  to  cut  costs,  the  CEO  of  the 
company  rejects  Diane’s  suggestions  and  opts  for  a  less-secure 
system.  Diane  remains  convinced  that  a  more-secure  system 
is  needed.  What  should  she  do?  Anderson  et  al.  (1993)  then 
pointed  to  specific  passages  in  the  code  that  pertain  to  the  im¬ 
portance  of  maintaining  privacy  and  confidentiality.  Based  on 
the  code,  Diane’s  first  obligation  is  to  try  to  educate  the  com¬ 
pany  officials  on  the  ethics  of  the  situation.  “If  that  fails,  then 
Diane  needs  to  consider  her  contractual  obligations  ...  on  hon¬ 
oring  assigned  responsibilities”  (p.  100). 

Another  possible  approach  toward  advancing  the  profes¬ 
sion’s  ethical  responsibilities  involves  that  of  licensing  its  mem¬ 
bers.  The  model  draws  from  the  American  Medical  Association 
or  the  American  Bar  Association.  Both  associations  can  revoke 
their  members’  licenses  given  serious  violations  of  their  respec¬ 
tive  professional  organization’s  code  of  ethics.  But  there  are 
good  reasons  why  most  professional  computing  organizations 
have  resisted  this  move.  Computing  is  an  enormously  diverse 
field  and  it  is  difficult  to  formally  demarcate  all  its  areas,  let  alone 
the  relevancy  of  one  area  to  others.  What  exactly  must  an  inde¬ 
pendent  website  designer  know,  for  example,  beyond  whatever 
the  designer  and  employer  believe  necessary?  Does  the  field 
really  want  to  exclude  people  all  along  its  fringes?  There  has 
been  concern  that  through  licensing  the  profession  might  be¬ 
come  a  closed  shop  and  thereby  enhance  the  status  and  in¬ 
comes  of  those  admitted  at  the  expense  of  those  excluded. 
There  has  also  been  concern  that  licensing  could  stifle  creativity, 
originality, and excellence (see the discussion by Neumann,
1991). Licensing also runs counter to the history of computing
that  has  emphasized  ease  of  entry,  entrepreneurship,  and  a  cul¬ 
ture  not  often  found  in  professional  licensing  organizations. 

Setting  aside  licensing,  and  recognizing  the  value  of  codes 
of  ethics  and  analyses  of  hypothetical  dilemmas,  what  else  can 
the  computing  profession  do  to  advance  its  ethical  responsibil¬ 
ities?  Johnson  and  Miller  (1997)  offered  two  suggestions.  One 
is  that  corporations  could  have  ombudspersons  to  whom  com¬ 
puting  professionals  could  report  their  concerns,  anonymously 
if  desired.  Another  is  that  professional  societies  (like  the  ACM) 
could  maintain  “hotlines  that  professionals  could  call  for  advice 
on  how  to  get  their  concerns  addressed”  (p.  23).  Both  sugges¬ 
tions,  if  implemented,  could  prevent  many  problems  from  es¬ 
calating  into  “whistle-blowing”  affairs  and  help  integrate  ethical 
considerations  into  the  workplace. 

The computing profession is also recognizing the importance
of analyzing not only hypothetical scenarios, but also real-life
events. Neumann (1995) was a pioneer in this regard, publishing
ongoing chronicles of the risks that arise through computing.
For the more interesting cases, Neumann’s approach
could be extended. What we have in mind here builds on a leading
mountaineering journal that each year publishes brief accounts
of the major mountaineering accidents of the previous
year.  To  read  these  accounts,  year  in  and  year  out,  is  to  build 
up  a  rich  repertoire  of  how  mistakes  happen  in  the  field,  and 
how  they  can  be  avoided.  Similarly,  it  would  be  possible  to  pro¬ 
vide  comprehensive  accounts  of  the  major  ethical  problems  that 
have  arisen  each  year  in  the  computing  profession.  Moreover, 
these  accounts  could  be  classified  based  on  type  of  ethical  value 
(e.g.,  privacy,  trust,  human  welfare,  security,  or  universal  usabil¬ 
ity),  user  population  (e.g.,  workers,  elderly,  children,  or  people 
in  developing  nations),  and  applications  (e.g.,  online  commu¬ 
nities,  information  visualization,  augmented  reality,  wearable 
computing,  or  groupware).  With  such  a  searchable  database, 
computing professionals could gain ready access to real-life ethical
case studies directly relevant to their own endeavors. In addition
to case studies, Hudson and Bruckman (2005) argued
that at times empirical research into users’ views and values can
provide evidence that helps resolve difficult ethical dilemmas.
For example, with online interactions, one key piece of information
to understand is whether users consider their online
interactions  to  be  public  or  private. 
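
To make the proposed searchable database concrete, the following is a minimal sketch, assuming each case is tagged along the three facets named above (ethical value, user population, and application). The class name, the `search` helper, and all three sample entries are hypothetical and invented purely for illustration; they do not describe an existing catalogue or real incidents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseStudy:
    """One ethical case, tagged along the three facets named in the text."""
    title: str
    year: int
    values: frozenset        # e.g. {"privacy", "trust"}
    populations: frozenset   # e.g. {"workers"}
    applications: frozenset  # e.g. {"groupware"}


def search(cases, value=None, population=None, application=None):
    """Return cases matching every facet supplied (None means 'any')."""
    results = []
    for case in cases:
        if value is not None and value not in case.values:
            continue
        if population is not None and population not in case.populations:
            continue
        if application is not None and application not in case.applications:
            continue
        results.append(case)
    # Most recent cases first.
    return sorted(results, key=lambda c: c.year, reverse=True)


# Hypothetical entries, made up for this sketch only.
CASES = [
    CaseStudy("Keystroke logging of call-center staff", 2004,
              frozenset({"privacy", "trust"}), frozenset({"workers"}),
              frozenset({"groupware"})),
    CaseStudy("Location sharing in a wearable for seniors", 2005,
              frozenset({"privacy", "human welfare"}), frozenset({"elderly"}),
              frozenset({"wearable computing"})),
    CaseStudy("Moderation policy in a children's forum", 2003,
              frozenset({"human welfare"}), frozenset({"children"}),
              frozenset({"online communities"})),
]

if __name__ == "__main__":
    for case in search(CASES, value="privacy"):
        print(case.year, case.title)
```

A practitioner designing, say, a wearable device for older adults could then combine facets, for example `search(CASES, value="privacy", population="elderly")`, to retrieve only the cases closest to the work at hand.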

Most  of  the  ethical  issues  of  HCI  professionals  are  subsumed 
under  those  encountered  by  computer  professionals  in  general. 
But  HCI  professionals  incur  unique  responsibilities  whenever 
they  involve  human  subjects.  At  universities,  such  studies  come 
under  the  purview  of  a  Human  Subjects  Institutional  Review 
Board  (an  IRB)  that  seeks  to  protect  the  rights  and  welfare  of 
human  subjects.  But,  in  industry,  such  oversight  is  usually  lack¬ 
ing.  Consider  the  following  questions: 

• You conduct usability studies with people within your company.
The Vice President later asks you which people did well.
Do  you  tell?  (Molich,  2001) 

•  You  collect  video  footage  of  people  engaged  in  your  usabil¬ 
ity  study.  Can  you  show  the  footage  at  a  conference?  (Mackay, 
1995)  Have  you  obtained  written  consent  from  the  subjects? 
What  if  the  subject  initially  gives  consent  but  then  comes  off 
looking  foolish  in  the  video?  Have  you  given  the  subject  a 
chance  to  review  the  video  footage  and  to  opt  out? 


•  You  obtain  informed  consent,  and  then  collect  keystroke  data 
on  employees  over  months.  How  do  you  keep  your  subjects 
aware  that  your  data  collection  is  ongoing? 

•  You  collect  data  from  subjects  living  in  an  “aware  home”  (that 
has  ubiquitous  computation  embedded  throughout  the  liv¬ 
ing  environment).  How  do  you  convey  to  subjects  the  extent 
of  personal  information  (e.g.,  time  spent  in  the  bathroom) 
that  can  be  mined  from  the  resulting  database? 

•  You  collect  data  from  employees  in  your  company,  and 
promise  to  keep  identifying  information  confidential.  But  if 
there  is  only  one  disabled  person  in  a  given  department,  how 
can  you  maintain  confidentiality? 

•  You  are  involved  in  a  safety-critical  situation.  You  recognize 
that  often  usability  studies  in  your  company  involve  only  a 
handful  of  subjects,  which  allows  for  fast  prototyping  and 
more  quickly  bringing  products  to  market.  But,  in  this  situa¬ 
tion,  is  a  more  rigorous  psychological  study  of  usability 
needed,  even  given  that  it  will  require  additional  resources 
and  time?  (Molich,  2001) 

•  You  collect  information  from  an  online  chat  focused  on  sexual 
abuse  and  then  publish  the  information  widely,  quoting  key 
emotional  passages  that  now  become  visible  on  a  societal  level. 
Even  though  you  report  the  passages  without  identifying 
names,  others  can  now  go  to  this  chat,  search  on  the  text,  and 
easily  identify  the  people  of  interest.  Have  you  adequately  pro¬ 
tected  the  privacy  of  these  online  “subjects”?  How  does  using 
subjects  garnered  from  the  Internet  differ  from  those  garnered 
from  more  traditional  subject  pools  (Frankel  &  Siang,  1999)? 

These  questions  and  many  other  related  issues  can  be  ad¬ 
dressed  by  drawing  on  the  rich  literature  on  protecting  the 
rights  and  welfare  of  human  subjects  (The  Belmont  Report, 
1978)  and  its  application  to  the  Internet  (Frankel  &  Siang,  1999). 
Indeed,  often  HCI  professionals  may  be  the  only  source  for  giving 
voice  to  these  ethical  considerations. 
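
The confidentiality question raised above (a single disabled person in a given department) can be checked mechanically before any results are shared. The sketch below is one possible approach, not a method taken from the literature cited here: it loosely follows the idea behind k-anonymity and flags any combination of attributes shared by fewer than `k` participants. The function name, the field names, the threshold, and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def risky_groups(records, quasi_identifiers, k=3):
    """Return attribute combinations shared by fewer than k participants.

    A participant in such a small group may be identifiable even when
    names are removed, so results for that group should not be reported
    separately.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical study participants (no real data).
PARTICIPANTS = (
    [{"department": "Sales", "disability": "no"}] * 4
    + [{"department": "Sales", "disability": "yes"}]
    + [{"department": "Support", "disability": "no"}] * 3
)

if __name__ == "__main__":
    # Reporting by department and disability status exposes one person.
    print(risky_groups(PARTICIPANTS, ["department", "disability"]))
    # Reporting by department alone exposes no group smaller than k.
    print(risky_groups(PARTICIPANTS, ["department"]))
```

A check of this kind only detects the problem; deciding whether to merge categories, suppress the small group, or withhold the breakdown altogether remains an ethical judgment for the researcher.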


CONCLUSIONS 


During  the  early  periods  of  computerization,  around  the  1950s, 
cyberneticist  Norbert  Wiener  (1953/1985)  argued  that  technol¬ 
ogy  could  help  make  us  better  human  beings  and  create  a  more 
just  society.  But  for  it  to  do  so,  he  argued,  we  have  to  take  con¬ 
trol  of  the  technology.  We  have  to  reject  the  “worshiping  [of] 
the  new  gadgets  which  are  our  own  creation  as  if  they  were  our 
masters”  (p.  678).  Similarly,  a  few  decades  later,  computer  sci¬ 
entist  Joseph  Weizenbaum  (1972)  wrote: 

What  is  wrong,  I  think,  is  that  we  have  permitted  technological 
metaphors  .  .  .  and  technique  itself  to  so  thoroughly  pervade  our 
thought  processes  that  we  have  finally  abdicated  to  technology  the  very 
duty  to  formulate  questions.  .  .  .  Where  a  simple  man  might  ask:  “Do 
we need these things?,” technology asks “what electronic wizardry will
make them safe?” Where a simple man will ask, “Is it good?” technology
asks,  “Will  it  work?”  (pp.  611-612) 

As  HCI  professionals,  we  have  profound  opportunities  to 
shape  the  designs  and  implementations  of  computer  technolo¬ 
gies  from  an  ethical  stance. 


In  this  chapter,  we  have  reviewed  varying  approaches,  pro¬ 
jects,  and  ideas  that  offer  us  important  ways  of  bringing  human 
values  and  ethics  into  our  design  practice.  Several  ideas  are  worth 
emphasizing.  First,  as  is  well  known,  it  is  much  easier  to  design 
systems  right  initially  than  to  attempt  to  retrofit  poor  systems  af¬ 
ter  they  have  become  entrenched  within  organizations  and  other 
social  systems.  Thus,  it  is  imperative  that  we  take  a  proactive 
stance  on  human  values,  ethics,  and  design.  Second,  many  of  the 
difficult  problems  in  this  area  require  multidisciplinary  collabo¬ 
rations.  Third,  we  need  to  hold  out  human  values  with  ethical  im¬ 
port  as  a  design  criterion — along  with  the  traditional  criteria  of  re¬ 
liability,  efficiency,  and  correctness — by  which  systems  may  be 
judged  poor  and  designers  negligent.  As  with  the  traditional 
criteria,  we  need  not  require  perfection,  but  commitment. 

This  Handbook  has  an  entire  section  devoted  to  issues  of 
diversity.  Individual  chapters  include,  for  example,  designing  for 
gender  differences;  children;  the  elderly;  internationalization; 
and  people  with  motor,  perceptual,  and  cognitive  impairments. 
Yet, in our review of the literature, current HCI approaches to
human values and ethics do not always fare well when used in diverse
contexts. For example, Participatory Design can be particularly
effective when a community shares many deeply held sensibilities,
such as a commitment to participatory democracy in
the workplace and to the idea of meaningful work itself. But this
approach  is  more  difficult  to  apply  when  divisive  constituencies 
argue  on  the  basis  of  narrowly  conceived  self-interests  or  hos¬ 
tile  prejudices.  What  happens,  for  example,  when  a  manager 
values  accountability  over  a  worker’s  privacy,  or  efficiency  over 
a  worker’s  autonomy?  If  each  value  has  equal  weight,  on  what 
basis  does  a  designer  move  forward  with  a  particular  design? 

As  the  field  of  HCI  seeks  to  bring  moral  commitments  into 
diverse contexts, it needs a principled moral means to adjudicate
competing value claims. It is for this reason that we provided
early on a moral philosophical and psychological framework
for approaching this problem. We suggested, for example,
that when moral conflicts occur between diverse groups,
such variability by itself does not prove or disprove the moral
relativist’s  position.  People  can  believe  that  a  certain  act  (such  as 
racial  discrimination)  is  moral;  but  documenting  such  a  belief 
does  not  make  it  so.  We  also  suggested  that  when  moral  differ¬ 
ences  appear  in  another  culture,  a  question  to  ask  is  whether 
there  are  victims  within  that  culture.  If  there  are,  then  it  is  prob¬ 
ably  less  the  case  that  societies  differ  on  moral  ground,  and 
more  that  some  societies  (Western  societies  included)  may  be 
involved  explicitly  in  immoral  practices.  We  suggested  that  close 
attention  must  be  paid  to  the  level  of  moral  analysis.  Namely,  de¬ 
finitions  of  morality  that  entail  abstract  characterizations  of  jus¬ 
tice  and  welfare  tend  to  highlight  moral  universals,  whereas  de¬ 
finitions  that  entail  specific  behaviors  or  rigid  moral  rules  tend 
to highlight moral cross-cultural variation. In our view, both levels
of analysis have merit, and a middle ground provides an epistemically
sensible and powerful approach in HCI research, one
that  allows  for  an  analysis  of  universal  moral  constructs  (such 
as  justice,  rights,  welfare,  and  virtue),  as  well  as  allowing  for  the 
ways  in  which  these  constructs  play  out  in  a  particular  culture  at 
a  particular  point  in  time.  By  embracing  this  moral  theoretical 
framework,  we  are  not  saying  that  all  moral  problems  can  be 
solved.  But  such  problems  will  be  discussed  (and  argued  about) 
in  ways  that  respect  diversity  and  prevent  oppression. 


In  the  future,  certain  trends — technological  and  societal — 
will  pose  particular  challenges  in  terms  of  human  values,  ethics, 
and  design.  From  our  view,  three  trends  stand  out.  First,  com¬ 
putational  technologies  will  increasingly  allow  for  the  erosion  of 
personal  privacy.  Even  today,  for  example,  surveillance  cameras 
capture  our  images  in  banks  and  airports,  and  in  many  stores, 
malls,  and  even  streets.  In  cars,  GPS  navigation  systems  not  only 
receive  positioning  data,  but  also  can  broadcast  one’s  position. 
Businesses  can  (and  sometimes  do)  monitor  workers’  elec¬ 
tronic  communications.  Indeed,  think  of  perhaps  the  last  bas¬ 
tion  of  the  private  space,  the  home;  and  recognize  that  “aware 
homes”  of  the  future  will  have  the  potential  to  record  virtually 
every  movement  an  individual  makes  within  his  or  her  home, 
and  to  link  that  data  to  large  networked  databases.  Thus,  pro¬ 
tections  of  individual  privacy  will  need  to  become  an  even  more 
central  concern  to  the  HCI  community.  The  second  trend  is  that 
computational  technologies  will  increasingly  provide  means  for 
government  to  erode  civil  liberties.  Moreover,  the  public  may 
unwittingly  accept  such  consequences  based  on  the  assumption 
that  these  technologies  will  substantially  enhance  our  nation’s 
physical  security.  But  such  an  assumption  is  not  always  war¬ 
ranted.  Imagine,  for  example,  if  the  government  required  each 
of  us  to  obtain  biometric  national  identification  cards,  and  to 
use  these  cards  for  all  of  our  financial  transactions,  air  travel,  en¬ 
try  into  government  buildings,  and  so  forth.  Secure?  Hardly,  for 
the  systems  that  store  this  data  can  be  cracked.  Indeed,  once 
biometric  data  is  stolen,  it  is  not  so  easy  to  get  a  new  face,  fin¬ 
gerprint,  or  DNA.  Third,  in  our  increasingly  linked  communica¬ 
tive-technological  infrastructure,  the  pervasiveness  and  speed 
of  information  can  undermine  people’s  psychological  well-be¬ 
ing  and  quality  of  life.  How  can  we  respond  to  this  problem  of 
information  overload?  One  solution  is  that  we  need,  at  times, 
to  turn  away  from  our  technological  devices  and  thus  to  check 
their  encroachment  into  human  lives.  Another  solution  is  that, 
in  our  designs,  we  need  to  find  ways  to  increase  the  ratio  of 
quality to quantity of information. We can also profit by designing
“calming” technologies (that go beyond placing information
in our peripheral awareness, which itself can become
overloaded). 

In  the  last  decade,  the  HCI  community  has  made  tremendous 
progress  in  integrating  human  values  and  ethics  into  the  practice 
of  design.  As  the  field  continues  to  move  forward,  the  challenge 
remains how to design technology wisely, ethically, to create
essential conditions by which humans live and flourish.


INTRODUCTION


JUSTIFICATION OF COST JUSTIFICATION


A friend recently shopping for a car was delighted to realize that
much of her shopping, if not the final purchase, could be taken
care of online, free of the long waits or pushy sales representatives
she feared she would find when visiting dealers’ lots. However,
once  she  began  her  shopping,  she  quickly  realized  that  few,  if 
any,  of  the  manufacturer  or  dealer  websites  supported  the  tasks 
she  envisioned  carrying  out.  She  was  often  forced  to  complete 
a  multistep  customization  process  involving  choices  among  op¬ 
tions  packages,  hubcap  styles,  and  interior  preferences  before 
she  was  able  to  view  even  a  basic  price  for  a  particular  model.  In 
some  cases,  she  was  required  to  enter  personal  information  or 
set  up  an  account  by  completing  tedious  forms  in  order  to  re¬ 
trieve  relevant  pricing  information.  Unhappy  with  what  she  saw 
as  artificial  barriers  to  information  she  required  to  make  her 
purchase  (which,  she  admits,  she  would  likely  have  conducted 
online),  my  friend  began  to  resent  the  companies  whose  sites 
she  felt  wasted  her  time.  If  this  is  how  I’m  treated  in  the  pri¬ 
vacy  of  my  own  home,  she  wondered,  what  should  I  expect  if 
I  were  to  walk  on  to  the  lot?  In  frustration,  my  friend  opted  to 
contact  her  bank  for  help  and  ended  up  completing  the  major¬ 
ity  of  her  shopping  through  the  bank’s  website  (and  eventually 
chose  the  bank  for  financing).  Her  interactions  with  the  actual 
dealer  were  brief  and  left  her  with  the  impression  that  they 
were  happy  to  waste  her  time.  In  effect,  the  usability  problems 
on  the  manufacturer  and  dealer  websites  strengthened  her  pre¬ 
existing  aversion  to  their  services  when  the  sites  were  likely  in¬ 
tended  and  supposedly  designed  to  do  just  the  opposite.  They 
also  cost  a  dealership  a  financing  customer  and  led  to  negative 
feedback  from  my  friend  in  a  number  of  online  car-shopping  fo¬ 
rums. 

Around  the  same  time,  an  advertisement  for  a  nationwide 
used-car  retailer  network  was  getting  substantial  airplay.  In  it,  a 
gentleman,  excited  about  purchasing  his  new  car  online,  has 
just  selected  his  car  and  is  beginning  the  “checkout”  process 
on a competing car dealer’s website to have the automobile financed
and delivered to his home. Then the look on his face
tragically  changes  when  he  finds  himself  caught  in  the  familiar 
infinite  loop  of  attempting  to  guess  an  appropriate  user  ID  and 
password  for  the  site.  As  the  frustrated  potential  customer  con¬ 
tinues  to  battle  with  the  website,  a  reassuring  voiceover  promises 
viewers  that  their  experience  with  the  advertiser’s  website  will 
be pleasant and easy. We are left wondering whether the gentleman
was ever able to create a suitable username and carry
on  with  his  purchase. 

As  the  stakes  for  online  purchases  rise  higher  and  higher,  a 
lost  sale  caused  by  poor  usability  can  have  an  unprecedented 
negative  business  impact.  Some  companies  and  advertising 
firms  have  begun  to  leverage  usability  as  a  competitive  advan¬ 
tage,  and  consumer  willingness  to  accommodate  frustration  in  a 
virtual  sea  of  alternatives,  as  just  illustrated,  is  low.  How  much 
revenue will the car dealers lose from situations like the ones you
just  read  about?  More  interestingly,  how  many  sales,  online  or 
otherwise,  will  poor  usability  cost  these  companies,  and  how 
many  such  losses  are  acceptable? 


“The  usability  engineering  of  websites  is  always  carried  out  un¬ 
der  suboptimal  conditions.  Development  constraints  .  .  .  de¬ 
mand  that  we  pursue  a  quicker,  less-thorough  course  of  usabil¬ 
ity  engineering  than  we  would  like”  (Bias  &  Keough,  2000).  And 
in  times  of  tight  budgets,  it  is  absolutely  necessary  to  be  able  to 
demonstrate, with real, quantitative data, that any money invested
in usability engineering yields a positive return. Otherwise,
the  usability  engineer  is  dependent  upon  the  faith  and  inspira¬ 
tion  of  the  stake-holders,  when  trying  to  get  the  funding  to  carry 
out  a  user-centered  design  approach.  Plus,  said  usability  engi¬ 
neer  will  be  competing  for  those  funds  with  others  who  DO  have 
quantitative  figures — lines  of  code,  sales  projections,  market  re¬ 
search,  defect  rates,  customer  support  calls,  person  months. 

Consider,  too,  that  much  of  the  software-development  world 
still  sees  usability  as  something  “nice  to  have.”  To  compete  for 
those  constrained  development  resources,  the  usability  engi¬ 
neer  must  be  able  to  justify  the  cost  of  any  expenditures  on  us¬ 
ability.  This  chapter  is  intended  to  give  to  the  usability  engineer, 
the  usability  engineering  manager,  or  anyone  advocating  for  the 
implementation  of  a  user-centered  design  approach,  the  tools 
to  demonstrate  that  a  particular  program  of  usability  engineer¬ 
ing  will  yield  a  particular  return  on  investment  (ROI). 

The  ROI  Is  What  It  Is 

The  ROI  for  any  effort  is  what  it  is — the  trick  is  to  project  or  cal¬ 
culate  it.  Theoretically,  after  each  new  marketing  campaign  or  ad¬ 
dition  in  sales  force  or  added  product  feature,  we  could  calculate 
effects  on  revenues  and  costs.  (This  is  not  necessarily  easy,  as 
usually  there  are  multiple  variables  with  no  experimental  control, 
and  thus  it  is  hard  to  attribute  a  certain  percentage  of  any  change 
to  one  variable.)  Given  the  state  of  web  design,  and  our  experi¬ 
ences,  usability  engineers  are  confident  that  just  about  all  us¬ 
ability  engineering  efforts  are  cost  justified.  Clare-Marie  Karat 
(2005)  reported  a  range  of  ROI  ratios  from  2:1  to  100:1  for  tradi¬ 
tional  software  user  interface  designs.  And  with  web-develop- 
ment  tools  enabling  more  folks  to  play  the  role  of  designer  and 
the  broad  access  to  the  web  making  for  an  ever-widening  circle 
of  users,  there’s  every  reason  to  suspect  that  usability  engineer¬ 
ing  for  websites  yields  an  even  HIGHER  positive  return. 

So,  assuming  that  any  change  in  a  company’s  product  or 
process  will  yield  SOME  change  in  that  company’s  revenues  or 
expenses  or  both,  the  trick  is  to  estimate  what  that  ROI  will  be. 


THE  EMPIRICAL  APPROACH 
TO  USER-CENTERED  DESIGN 


Though  it  is  hard  to  attribute  exact  fluctuations  in  costs  and 
benefits  to  certain  methods  employed  in  a  user-centered  design 
approach, it is possible to collect empirical data to drive projections
and calculations of ROI. In this chapter, we will endeavor
to encourage and equip the usability engineer, or the
software  development  manager,  to  collect  the  data  and  make 
the  calculations  to  justify  expenditures  on  usability  engineering. 
We  will  begin  by  considering  the  wide  range  of  possible  benefi¬ 
ciaries  of  a  user-centered  design  approach. 


TO  WHOM  DO  BENEFITS  ACCRUE? 


It  is  important  to  realize,  and  to  communicate  to  all  the  product/ 
Web  site  stakeholders,  the  different  benefits  that  can  result  from 
a  program  of  usability  engineering.  Done  right  (and  early  in  the 
development cycle), usability engineering can yield efficiency
gains for the site/product-development team, as problems are
discovered early when they are relatively inexpensive to repair.
The customer-support team will require a relatively small staff,
as  more  users  can  carry  out  their  tasks  without  help.  Similarly,  the 
documentation  team  may  need  to  generate  fewer  user  aids, 
and  will  do  a  better  job  of  creating  their  help  screens  and  other 
hard-copy  and  online  documentation.  A  more  intuitive  product 
means  that  fewer  resources  need  to  be  devoted  to  training 
materials — or  perhaps  the  training  products  can  focus  more  on 
advanced  usage,  as  the  fundamentals  are  more  intuitive.  Of 
course,  a  more  usable  product  will  mean  that  the  user  will  en¬ 
joy  increased  throughput,  both  because  the  interaction  design 
allows  quicker  completion  of  tasks  through  fewer  displays  and 
fewer  keystrokes/mouse  clicks,  and  because  less  time  is  spent 
making  and  recovering  from  errors.  Indeed,  the  customer  may 
be able to hire a less sophisticated or less experienced employee
to  carry  out  tasks  previously  handled  only  by  higher-paid  em¬ 
ployees.  All  of  this  leads  to  increased  customer  satisfaction,  in¬ 
creased  trade-press  joy,  and  increased  sales. 


QUANTIFYING  THE  COSTS  (EASY!) 


It  is  a  simple  matter  to  quantify  the  COSTS.  For  in-house  usabil¬ 
ity  teams,  it  is  a  simple  matter  to  calculate  the  costs  of  the  us¬ 
ability  engineers — their  salaries  and  additional  costs  such  as 
benefits,  a  desk,  a  computer,  and  so  forth.  (“Commonly,  the  fully 
loaded  cost  of  an  employee  is  at  least  twice  his  or  her  salary” 
[Nielsen, 2000, p. 276].) For a particular project, or even a particular
data-collection effort, a fractional cost of a usability professional
can be calculated. For usability engineering contractors
or  consultants,  the  costs  will  be  easy  to  calculate,  simply  by  read¬ 
ing  the  invoice.  You  must  also  add  to  these  costs  the  costs  of  en¬ 
gaging  the  software  developers  in  the  usability  evaluations  or 
in  the  implementation  of  the  corrections  to  the  interface  that 
arose  during  testing.  That  is,  perhaps  the  designer/developers 
will  be  asked  to  generate  a  user  interface  prototype  for  testing. 
Or  perhaps  they  will  observe  end-user  testing  in  the  lab.  Cer¬ 
tainly  the  development  team  will  need  to  invest  some  time  in 
helping  build  a  user  profile  and  test  scenarios. 
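
As a rough illustration of the fractional-cost calculation, the following Python sketch applies the "at least twice salary" rule of thumb quoted above. The salary, the 2,080-hour work year, and the hours charged to the project are hypothetical values chosen only for illustration; they do not come from this chapter.

```python
def fully_loaded_hourly_cost(annual_salary, load_factor=2.0, hours_per_year=2080):
    """Hourly cost of an employee, including benefits, desk, computer, and so on.

    load_factor=2.0 follows the rule of thumb that the fully loaded cost is
    at least twice the salary. hours_per_year=2080 (52 weeks x 40 hours) is
    an assumption, not a figure from the text.
    """
    return annual_salary * load_factor / hours_per_year


def project_labor_cost(annual_salary, hours_on_project, **kwargs):
    """Fractional cost of one person for the hours spent on one project."""
    return fully_loaded_hourly_cost(annual_salary, **kwargs) * hours_on_project


# Hypothetical example: an $80,000 usability engineer spends 120 hours on a test.
cost = project_labor_cost(80_000, hours_on_project=120)
print(f"Labor cost charged to the project: ${cost:,.2f}")
```

The same function can be applied to developers who build a prototype or observe testing, so that their hours are counted as usability costs too.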

Recognize that there will be some costs, even without usability
engineering. That is, when calculating the comparison between
the costs of pursuing a usability engineering approach
and  not,  it  is  important  to  include  the  designer/developer  time 
to  fix  software  or  design  bugs  that  may  be  discovered  late  in  the 
process  anyway,  say,  during  system  test. 


QUANTIFYING  THE  BENEFITS  (HARDER) 


Quantifying  the  benefits  is  harder,  but  still  possible.  One  rea¬ 
sonable  approach  is  to  make  projections  based  on  usability  test¬ 
ing.  Perhaps  the  crispest  arena  is  reduced  call-support  burden. 
Imagine  a  usability  test  that  revealed  that  of  the  test  participants 
who  carried  out  a  task,  30%  had  to  call  the  (simulated)  help 
desk.  And  then  (we’re  still  imagining),  in  light  of  the  usability 
test  results,  the  product  is  redesigned,  and  subsequent  testing 
reveals  that  only  5%  of  the  test  participants  need  help  to  com¬ 
plete  the  task.  It  is  a  relatively  simple  matter  to  project  the  re¬ 
duction  in  the  number  of  calls  to  the  help  desk  using  the  pro¬ 
jected  number  of  licenses  (for  a  product)  or  visits  (for  a 
website). Then multiply that number of avoided help-line calls
by the average cost of a call to the company, to find the
amount  of  money  saved  thanks  to  that  one  usability  test.  (We’ll 
go  through  a  more  thorough  example  in  a  later  section.) 
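
The projection just described can be sketched in a few lines of Python. The help rates (30% and 5%) come from the imagined test above; the number of licenses and the cost per call are hypothetical placeholders, and the sketch assumes that each user who needs help places exactly one call.

```python
def projected_call_savings(help_rate_before, help_rate_after,
                           projected_users, cost_per_call):
    """Project help-desk savings from a drop in the share of users needing help.

    Assumes one call per user who needs help with the task.
    """
    avoided_calls = (help_rate_before - help_rate_after) * projected_users
    return avoided_calls * cost_per_call


# Hypothetical inputs: 10,000 licenses and $20 per support call.
savings = projected_call_savings(0.30, 0.05,
                                 projected_users=10_000,
                                 cost_per_call=20.0)
print(f"Projected savings from one usability test: ${savings:,.0f}")
```

Comparing this figure with the cost of the test itself gives a first, conservative estimate of the return on that one test.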

Another  way  to  project  the  benefits  is  by  using  historical 
data.  If  you  have  experience  with  a  similar  sort  of  interface  at  a 
similar  stage  of  development,  you  might  reasonably  expect  the 
same  ROI  as  you  earned  the  last  time.  Or  if  there  are  data  in  the 
research  literature,  from  a  project  that  is  similar  to  yours,  you 
could  use  those  ROI  data  as  an  estimate.  Of  course,  the  closer 
your  current  situation  matches  that  of  the  other  development 
effort  from  which  you’re  extrapolating — in  complexity  of  the 
user interface, sophistication and experience level of the development
team, processes followed by the development team,
resources  available  to  the  development  team  (e.g.,  machines  for 
testing),  domain  expertise  in  house — the  better  the  estimate 
will  be.  As  with  sales  projections,  it  is  always  best  to  have  some 
customer  data  to  drive  the  estimates. 

It  is  important  to  return  to  the  “actuals”  after  the  project  to 
assess  how  good  your  ROI  estimates  were.  This  is  a  difficult 
thing  to  budget  time  for,  as  you  will  be  off  on  the  next  project. 
However,  the  continuous  improvement  of  your  financial  mod¬ 
eling  and  your  empirically  based  validation  of  your  usability  ef¬ 
fort,  in  the  eyes  of  your  stakeholders  and  funders,  depends  on 
your  ability  to  demonstrate  that  your  projections  were  good 
(and  conservative!),  and  that  the  hour  and  dollar  spent  on  us¬ 
ability  engineering  were  well  worth  it. 


THE VALUE OF MONEY:
VARIOUS FINANCIAL METHODS


Raw  Dollars 

One of the frustrations with usability engineering of human-computer
interfaces (HCI), circa 2005, is that there is no acknowledged
metric with which all interfaces can be compared. Recent
attempts have been made to boil down usability evaluation
data to a single summative score (Sauro & Kindlund, 2005), but
until such a method comes into widespread use, most usability
engineers will be more comfortable relying on the universally
understood metric historically at their disposal: the dollar.
(Or the franc, or the yen; for a discussion of the application
of a cost-justification approach to projects that span global markets,
see Mayhew & Bias [2005].) The dollar enables managers, engineers,
writers, marketers, and even psychologists to communicate
on even ground: if we spend X dollars on usability
engineering,  we  expect  to  save  Y  dollars  thanks  to  gained  de¬ 
velopment  efficiencies  and  reduced  support  call  burden,  and  to 
earn  Z  additional  dollars  in  revenue,  thanks  to  increased  sales. 
And  as  stated  earlier,  the  benefits  of  usability  engineering  tend 
to  be  so  robust  that  attending  only  to  raw  dollars  will  usually 
be  sufficient  to  justify  dedicating  resources  to  a  systematic, 
user-centered  design  approach. 

Three  different  but  related  factors,  however,  may  today  mo¬ 
tivate  individuals  and,  in  the  near  future,  the  whole  discipline 
to  employ  more  sophisticated  financial  models: 

1.  Tight  development  budgets  and  the  need  for  usability  engi¬ 
neers  to  compete  with  other  product-related  disciplines 

2.  The  need  to  be  smart  about  which  usability-engineering 
methods  to  employ  (more  in  the  next  section) 

3.  The  (assumed)  ever-improving  accuracy  of  our  projections 
of  the  benefits  of  usability  engineering. 

These  three  factors  may  drive  the  usability  engineer  intent 
on  being  as  accurate  as  possible  in  projecting  the  relative  value 
of  usability  to  employ  an  accounting  technique  such  as  one  of 
the  following  three.  (See  Karat  [2005]  for  more  detail,  including 
formulae.) 


Payback  Period 

The  payback  period  is  the  amount  of  time  it  takes  to  recover 
through  incremental  cash  flows  (or  subsequent  cost  reductions) 
a  project’s  net  investment  cash  outflow.  The  calculation  locates 
the  point  at  which  the  cash  outflow  equals  the  cash  inflow.  In 
simple  terms,  the  shorter  the  payback  period,  the  better  the  in¬ 
vestment.  An  organization  may  have  a  standard  payback  period 
that  any  investment  must  beat  in  order  to  be  funded.  Alter¬ 
nately,  two  different  courses  of  usability  engineering  (say,  a 
“basic”  course  versus  a  “complete”  course)  can  be  compared 
against  each  other  to  drive  a  decision.  Payback  period  is  not  a 
measure  of  profitability,  but  rather  a  measure  of  speed  of  capi¬ 
tal  recovery.  (Note:  It  is  not  necessarily  the  case  that  the  more 
expensive, “complete” course will yield a longer payback period.) (For
a passionate and well-reasoned argument for the joys of a
“time-to-break-even” approach, see Conklin [1996].)
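
A minimal sketch of the payback calculation follows, in Python. The cost and savings figures for the two courses are hypothetical, chosen only to show that a more expensive course can still pay back sooner; the function interpolates linearly within the period in which the outlay is recovered, which is one common convention rather than the only one.

```python
def payback_period(initial_outlay, inflows):
    """Return the payback period, in periods, for a single up-front outlay.

    Interpolates linearly inside the period in which cumulative inflows
    first cover the outlay. Returns None if the outlay is never recovered.
    """
    remaining = initial_outlay
    for period, inflow in enumerate(inflows, start=1):
        if inflow >= remaining:
            # Fraction of this period needed to recover what is left.
            return (period - 1) + (remaining / inflow if inflow else 0.0)
        remaining -= inflow
    return None


# Hypothetical comparison of two courses of usability engineering.
basic = payback_period(50_000, [30_000, 30_000, 30_000])
complete = payback_period(80_000, [60_000, 60_000, 60_000])
print(f"basic: {basic:.2f} years, complete: {complete:.2f} years")
```

With these made-up numbers the "complete" course recovers its cost earlier than the "basic" course, even though it costs more.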

Because  payback  period  ignores  the  time  value  of  money, 
many  organizations  use  discounted  cash-flow  techniques  to 
make  investment  decisions.  Two  of  the  most  popular  are  (a)  in¬ 
ternal  rate  of  return  (IRR)  and  (b)  net  present  value  (NPV). 


Internal  Rate  of  Return  (IRR) 


The internal rate of return (IRR) on a project is the company’s
rate of return on the capital invested in that project; equivalently,
it is the discount rate at which the project’s net present value is
zero. For example, if a particular project costs $1,000, earns $100
in each of years 1 and 2, and returns the initial investment at the
end of year 2, then the IRR is 10%.


             Time 0      Year 1    Year 2
Cash Flow    ($1,000)    $100      $1,100
IRR          10%

The  IRR  in  this  case  is  compared  against  a  company’s  cost  of 
capital  to  make  an  investment  decision.  Cost  of  capital,  in  sim¬ 
ple  terms,  is  the  company’s  cost  of  obtaining  funding  whether 
through  the  debt  or  equity  markets.  Most  companies  use  both 
sources. For example, if a company’s capital structure is 50%
debt with an average cost of 10%, and 50% equity with
an average return of 20%, the cost of capital is 15%: (.5 * .1) +
(.5 * .2) = .15. In summary, the decision rule for IRR is as follows: if
the  IRR  is  greater  than  the  cost  of  capital,  then  the  project 
should  be  accepted,  but  if  the  opposite  is  true,  the  project 
should  be  rejected.  Net  present  value  is  an  alternative  to  IRR. 
With IRR, we calculate a percentage, whereas with NPV we calculate
a dollar amount.
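
The IRR example and the cost-of-capital example can be checked with a short Python sketch. The bisection search below is one simple way to find the rate at which NPV is zero; it is not necessarily how a spreadsheet computes IRR, and it assumes a single outlay followed by inflows, so that NPV falls as the rate rises.

```python
def npv(rate, cash_flows):
    """NPV of cash_flows, where cash_flows[t] arrives at the end of year t
    and cash_flows[0] is today's (usually negative) outlay."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=0.0, high=1.0, tolerance=1e-9):
    """Approximate the IRR by bisection on [low, high].

    Assumes NPV is positive at `low` and negative at `high`.
    """
    for _ in range(200):
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid
        else:
            high = mid
        if high - low < tolerance:
            break
    return (low + high) / 2


def weighted_cost_of_capital(debt_share, debt_cost, equity_share, equity_cost):
    """Blend the cost of debt and equity by their shares of the capital structure."""
    return debt_share * debt_cost + equity_share * equity_cost


flows = [-1000, 100, 1100]             # the example project above
project_irr = irr(flows)               # close to 0.10
hurdle = weighted_cost_of_capital(0.5, 0.10, 0.5, 0.20)  # close to 0.15
accept = project_irr > hurdle          # decision rule for IRR
print(f"IRR {project_irr:.2%}, cost of capital {hurdle:.2%}, accept: {accept}")
```

Under the decision rule, this example project would be rejected by a company with the 15% cost of capital described above.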


Net  Present  Value 


NPV  is  today’s  value  of  the  cash  inflows  less  the  cash  outflows. 
In  this  technique  the  future  cash  inflows  are  discounted  using 
the  company’s  cost  of  capital.  From  a  finance  view,  any  project 
with  a  positive  NPV  should  be  accepted.  In  the  above  example, 
with  a  cost  of  capital  of  10%,  the  NPV  is  zero  and  we  would  be  in¬ 
different  regarding  the  project. 


                   Time 0      Year 1    Year 2
Cash Flow          ($1,000)    $100      $1,100
Cost of Capital    10%
NPV                $0.00


The  challenge  to  any  usability  engineer  is  to  demonstrate  that 
the  benefits  in  the  form  of  cash  inflows  (today’s  dollars)  out¬ 
weigh  the  cost  of  the  project  as  measured  against  the  organiza¬ 
tion’s  cost  of  capital.  In  many  companies,  15  to  20%  is  the  hurdle 
rate  used  to  determine  whether  or  not  a  project  is  acceptable. 
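
To make the NPV decision rule concrete, the following Python sketch discounts the same example cash flows at several rates. It assumes year-end cash flows and a flat discount rate; the 15% and 20% rates are simply the ends of the hurdle-rate range mentioned above.

```python
def npv(rate, cash_flows):
    """Today's value of cash_flows; cash_flows[0] is the outlay at time 0
    and cash_flows[t] arrives at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


flows = [-1000, 100, 1100]  # the example project from the IRR discussion

for rate in (0.10, 0.15, 0.20):
    value = npv(rate, flows)
    if abs(value) < 0.005:
        verdict = "indifferent"
    elif value > 0:
        verdict = "accept"
    else:
        verdict = "reject"
    print(f"cost of capital {rate:.0%}: NPV = {value:,.2f} ({verdict})")
```

At a 10% cost of capital the NPV is essentially zero, as in the table above; at the higher hurdle rates the same project has a negative NPV.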


NOT  "IF,"  BUT  "WHICH" 


As  suggested  in  the  last  section,  one  can  apply  a  cost-justification 
approach  not  only  to  justify  (or  not)  usability  engineering  of  a 
product  interface,  but  (more  likely,  these  days)  to  steer  the  prod¬ 
uct  development  team  toward  a  particular  set  of  usability  engi¬ 
neering  methods.  The  field  of  usability  engineering  has  come 
to a point of sophistication where we needn’t ask the question,
“Shall we employ usability engineering?” but rather we should
ask,  “Which  methods  will  yield  us  the  best  ROI  for  our  usability 
engineering  dollar  and  hour?”  It  would  be  folly  for  a  web  or 
other software-development team to ask the question, “Shall we
test the software before ‘going live,’ or before shipping?” Obviously,
the sensible question is, “How much software testing shall
we employ and what type of testing?”

Likewise,  given  the  importance  of  usability  to  the  success  of 
websites  and  just  about  all  software,  the  sensible  question  is  not, 
“Shall  we  employ  usability  engineering  in  the  development  of 
our  product/site?”  but  rather,  “How  much  usability  engineering 
should we invest in, which methods should we employ, and
when?”  Clearly,  $20,000  spent  on  a  prototype  test,  early  in  the 
development  cycle,  may  yield  a  greater  ROI  than  a  $20,000  end- 
user  test  after  the  product  is  developed,  given  the  relative  costs 
of  repairing  the  problems  discovered,  even  given  the  increased 
time-adjusted  cost  of  the  earlier  test. 


WHY  A  COST-JUSTIFICATION  APPROACH  IS 
MORE  IMPORTANT  FOR  WEB  INTERFACES 


Another  anecdote  will  serve  to  illustrate  the  increased  impor¬ 
tance  of  usability  on  websites  as  compared  to  traditional  office 
software. 

A  contract  development  team  was  building  a  website  for  a 
client  organization.  The  website  was  to  include  up-to-date  drug 
information  and  was  intended  to  be  used  by  physicians  as  a  sub¬ 
stitute  for  the  standard  desk  references  they  currently  use  to 
look  up  such  drug  information  as  side  effects,  interactions,  ap¬ 
propriate  uses,  and  data  from  clinical  trials.  The  business  model 
for  the  site  was  an  advertising  model.  Physicians  would  visit  the 
site regularly because more current information would be available
and that information would be more easily found on the
site than in published desk references. Pharmaceutical
companies  would  buy  advertising  for  their  drug  products  on  the 
site  because  the  visitors  to  the  site  (physicians)  represent  their 
target market. Repeat visits would generate regular
traffic, and new visitors would join based on word of mouth
among physicians. This would drive up the value of advertising,
generating a profit for the client.

The  development  team  generated  a  prototype  design  that 
the  client  would  use  to  pursue  venture  capital  to  support  the 
full-blown  development  and  initial  launch  and  maintenance  of 
the  site.  The  client  paid  for  this  prototype  development. 

Once  the  prototype  was  ready,  a  usability  engineering  staff 
was  brought  on  board  to  design  and  conduct  a  usability  test. 
Eight  physicians  were  paid  to  fly  in  to  the  development  center 
city  and  participate  in  the  test  as  test  users.  Several  basic  search 
tasks  were  designed  for  the  physicians  to  perform.  They  were 
pointed  to  the  prototype’s  Home  Page  and  left  on  their  own  to 
try  to  successfully  find  the  drug  information  that  was  requested 
in  the  first  task. 

Within 45 seconds of starting their first search task, seven out
of  the  eight  physicians  gave  up  and  announced  that  the  site  was 
unusable  and  if  it  were  a  real  site,  they  would  abandon  it  at  that 
point  and  never  return. 


Clearly,  if  the  site  had  launched  as  it  was  prior  to  this  test,  not 
only  would  an  optimal  ROI  not  have  been  realized,  but  also  the 
site would have failed altogether and the clients would have
completely lost their investment. If even 7/8ths of all visitors did
not  return,  there  would  not  be  enough  traffic  to  motivate  ad¬ 
vertisers  to  buy  advertising.  The  entire  investment  would  have 
been  lost. 

Instead,  the  test  users  were  asked  to  continue  with  the  en¬ 
tire  test  protocol,  and  the  data  generated  revealed  insights  into 
the  problem  that  was  a  show-stopper  on  the  first  task,  as  well 
as  other  problems  revealed  in  other  test  tasks.  The  site  was 
redesigned  to  eliminate  the  identified  problems.  Clearly,  the 
usability  test,  which  had  an  associated  cost,  was  worth  the  in¬ 
vestment  in  this  case. 

This  anecdote  illustrates  something  that  distinguishes  web¬ 
sites  from  commercial  software  products.  In  a  commercial 
software  product,  the  buyers  discover  the  usability  problems 
only  after  they  have  paid  for  the  product.  Often  they  cannot 
return  it  once  they  have  opened  the  shrink  wrap  and  installed 
the  product.  Even  if  it  has  a  money-back  guarantee,  they  are 
not  likely  to  return  it,  and  even  if  they  did  return  it,  there  may 
not be many alternative products on the market that are
easier to use.

On  a  website,  on  the  other  hand,  it  costs  the  visitor  nothing 
to  make  an  initial  visit.  On  a  website  based  on  an  advertising 
model,  such  as  the  one  described,  the  site  investor  makes  noth¬ 
ing  at  all  unless  there  is  sufficient  ongoing  traffic  to  attract  and 
keep  advertisers.  On  an  e-commerce  site,  the  investor  makes 
nothing  at  all  unless  the  visitors  actually  find  and  successfully 
purchase  products. 

A  website  is  not  a  product  and  the  user  does  not  have  to  buy 
it  to  use  it.  The  website  is  just  a  channel,  like  a  TV  show,  maga¬ 
zine,  or  catalog,  and  if  users  do  not  find  and  repeatedly  and  suc¬ 
cessfully  use  the  channel,  the  investor  gets  no  ROI  for  having 
developed  the  channel.  Thus,  usability  can  absolutely  make  or 
break  the  ROI  for  a  website  even  more  so  than  for  traditional 
software  products. 

It’s  also  true  that  competition  is  even  more  dependent  on 
relative  usability  on  websites  than  is  the  case  with  traditional 
software  products  or  sales  channels.  Someone  wishing  to  buy  a 
book  may  be  inclined  to  buy  from  a  particular  bookstore  that  is 
easy  to  get  to,  even  if  it  is  not  the  best  bookstore  around.  On  the 
other  hand,  if  a  customer  cannot  easily  find  the  desired  book 
through  the  barnesandnoble.com  website,  they  need  not  even 
get  out  of  their  chair  to  shop  at  a  competitor’s  site  instead.  It’s 
not  enough  to  simply  have  a  website  that  supports  direct  sales; 
your  site  must  be  more  usable  than  the  competition’s  site  or 
you  will  lose  business  based  on  the  relative  usability  of  the  sell¬ 
ing  channel  alone.  For  example,  60%  of  a  sample  of  consumers 
shopping  for  travel  online  stated  that  if  they  cannot  find  what 
they  are  looking  for  quickly  and  easily  on  one  travel  site,  they 
will  simply  leave  and  try  a  competitor’s  site  (Forrester  Research, 
Inc.,  2000b).  As  illustrated  in  this  chapter’s  opening  anecdotes, 
savvy advertisers have even begun to explicitly bundle Web site
usability  into  their  clients’  campaigns  in  a  manner  that  will  in¬ 
crease  both  customer  awareness  and  demand  for  usable  inter¬ 
faces,  and  thus  make  relative  usability  among  competitors  even 
more important.

In addition, if you are a catalog-order company such as L.L.
Bean or Lands’ End, and your product is good but your website
is  bad,  customers  will  not  use  your  website  and  will  resort  to  tra¬ 
ditional  sales  channels  (fax,  phone)  instead.  This  will  result  in  a 
poor  ROI  for  the  website  that  was  intended  to  be  justified  by  rel¬ 
atively  low  costs  compared  with  that  of  those  more  traditional 
channels. 

Forrester  reported  a  buy-to-look  ratio  of  only  about  2%  on 
e-commerce  websites  (Forrester  Research,  Inc.,  2000a),  and 
while  part  of  this  may  be  due  to  other  factors,  usability  is  surely 
a  large  factor  in  this  poor  performance  of  websites  as  a  sales 
channel. 

Another  Forrester  Report  (Forrester  Research,  Inc.,  2001) 
cited  other  compelling  statistics  from  a  variety  of  studies.  In 
one  study,  65%  of  shopping  attempts  at  a  set  of  prominent 
e-commerce  sites  ended  in  failure.  In  another,  42%  of  users 
were  unable  to  complete  the  job-application  process  on  a  set  of 
corporate  sites.  Another  study  showed  that  journalists  could  find 
answers  to  only  60%  of  their  questions  on  corporate  public  re¬ 
lations  sites.  And  still  another  study  showed  that  adding  cer¬ 
tain  product  information  to  e-commerce  sites  reduced  product- 
related  inquiries  by  more  than  20%.  Finally,  Forrester  pointed 
out  that  when  Amazon.com  let  customer  service  slip  in  2000, 
they  fell  behind  their  competitors  in  three  categories  for  the 
first time. In this same report, Forrester noted that the companies
they interviewed spent between $100,000 and $1 million on
site  redesigns,  but  few  had  any  sense  of  which  specific  design 
changes  paid  off,  if  any.  Clearly,  usability  engineering  programs 
that  are  specifically  aimed  at  such  goals  as  increased  success  rate 
of  shopping  attempts,  job-application  completion,  and  informa¬ 
tion  requests  could  have  had  a  profound  impact  on  the  ROI  of 
these  redesign  projects. 

As  illustrated  by  all  these  examples,  usability  engineering 
can  have  a  profound  impact  on  ROI  in  the  case  of  websites,  and 
even  more  dramatic  cost-justification  cases  can  be  made  for 
usability-engineering  programs  during  web  development  as 
compared  to  during  traditional  software-development  projects. 
A  profit  is  made  whenever  a  commercial  software  product  is 
sold,  regardless  of  how  productive  users  are  once  they  start  to 
use  that  software  product,  but  there  simply  is  no  ROI  until  and 
unless  visitors  to  websites  complete  their  tasks  efficiently  and 
successfully. 


A  SPREADSHEET  MODEL  FOR 
CUSTOM-BUILDING  YOUR  OWN 
COST-JUSTIFICATION TOOL


The  following  is  an  example  of  a  cost-justification  tool,  built  on 
a simple spreadsheet tool with the addition of some fairly simple
macros. It is offered as a model for how the reader might
generate his or her own cost-justification tool.

The  following  section  of  the  spreadsheet,  in  Table  64.1,  rep¬ 
resents  the  summary  cost-benefit  analysis.  According  to  this 
summary,  we  expect  to  spend  $212,974  in  year  one.  These  ex¬ 
penses  include  building  a  $33,000  usability  lab,  plus  about 
$180,000  spent  on  usability-engineering  support  on  a  total  of 
three products. The summary reveals that by the end of year
two we expect a net gain, and by the end of year three we expect
total savings of $519,048, for a net gain of $306,074. And this is
the result before any additional sales due to improved usability
are factored in.

The  details  that  fuel  these  summary  figures  are  offered  below. 


TABLE 64.1. Usability Cost Justification: Software Manufacturer

                          Year 1        Year 2        Year 3        Total
Cost
  Usability Lab           ($33,000)
  Usability Lab Projects  ($179,974)
  Total Cost              ($212,974)    $0            $0            ($212,974)
Cost Savings
  Customer Support        $168,000      $168,000      $168,000
  Software Development    $15,048
  Additional Sales
  Total Cost Savings      $183,048      $168,000      $168,000      $519,048
Net                       ($29,926)     $168,000      $168,000      $306,074
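
The arithmetic behind Table 64.1 can be reproduced with a short Python sketch. The dollar figures are taken from the table; treating the blank cells (software development savings in years two and three) as zero is an assumption made here for the calculation.

```python
from itertools import accumulate

# Figures from Table 64.1, one entry per year (years 1 to 3).
costs = [33_000 + 179_974, 0, 0]            # lab build-out plus lab projects
customer_support = [168_000, 168_000, 168_000]
software_development = [15_048, 0, 0]        # blank cells assumed to be zero

savings = [cs + dev for cs, dev in zip(customer_support, software_development)]
net = [s - c for s, c in zip(savings, costs)]
cumulative_net = list(accumulate(net))

# First year in which the running total is no longer negative.
payback_year = next(year for year, total in enumerate(cumulative_net, start=1)
                    if total >= 0)

print("Net by year:     ", net)
print("Cumulative net:  ", cumulative_net)
print("Total savings:   ", sum(savings))
print("Payback in year: ", payback_year)
```

The running total turns positive in year two, which is consistent with the net gain by the end of year two noted in the text.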

Our  example  illustrates  the  cost  benefit  from  employing  us¬ 
ability  in  a  software-development  environment.  First  let’s  exam¬ 
ine  the  costs.  Tables  64.2  and  64.3  project  the  cost  of  a  “basic” 
program  of  usability  engineering  and  a  “complete”  program  of 
usability  engineering,  respectively.  The  “complete”  program  has 
all  the  “basic”  program  has,  plus  a  prototype  usability  test  and 
the inclusion of a usability survey along with the product itself. In
the  interest  of  making  the  usability  cost-benefit  analysis  as  con¬ 
servative  as  possible,  an  attempt  is  made  to  be  as  thorough  as 
possible,  accounting  for  all  the  meetings,  the  developer  hours, 
and  anything  else  that  would  not  have  been  a  cost  had  a  program 
of  usability  engineering  not  been  pursued. 

The hourly cost for each development team member (Developer,
Usability engineer, QA engineer, Customer Support
professional)  is  intended  to  be  a  “fully  loaded”  salary,  including 
benefits,  computer  support,  etc. 

Table  64.4  represents  the  costs  of  building  a  usability  lab. 

There  are  two  main  elements  of  costs  in  our  example:  (a)  the 
lab  build-out  and  (b)  the  conduct  of  a  program  of  usability  en¬ 
gineering.  This  example  assumes  that  existing  space  is  used  for 
the  lab;  thus,  the  incremental  cost  consists  of  lab  design,  build 
out,  and  hardware.  The  cost  of  the  conduct  of  a  program  of  us¬ 
ability  engineering  represents  the  cost  to  run  two  basic  and  one 
expanded  or  “complete”  usability  project.  That  is,  we  are  going 
to  assume  that  in  year  one  we  are  going  to  utilize  the  lab,  and 
the  usability  engineers,  to  provide  “basic”  support  to  two  prod¬ 
uct  development  efforts  (say,  websites)  and  “complete”  support 
to  one  product  development  effort.  Thus,  adding  the  lab  cost 
($33,000)  to  the  two  basic  support  efforts  (at  $51,378  each,  for  a 
total  of  $102,756)  and  one  complete  support  effort  ($77,218) 
yields  a  total  year  one  cost  for  usability  engineering  of  $212,974, 
which  we  saw  in  Table  64.1.  (Note,  when  building  your  own 
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TABLE  64.2.  Cost  of  "Basic"  Usability-Engineering  Support 


Activity 

Cost  Item 

Units  (hours) 

Unit  Cost 

Total  Cost 

Generate  wish  iist 

Developer 

3 

70 

$210 

Customer  Support 

3 

56 

168 

User  survey 

Developer 

16 

70 

1120 

Postage 

50 

Specification  review 

Usability  engineer 

8 

70 

560 

Scope  review 

Usability  engineer 

4 

70 

280 

Contents  review 

Usability  engineer 

8 

70 

560 

Create  usabiiity  pian 

Usability  engineer 

8 

70 

560 

Feature  review 

Usability  engineer 

8 

70 

560 

Function-task  anaiysis 

Usability  engineer 

40 

70 

2800 

Developer 

40 

70 

2800 

Travel 

5000 

Detaiied  feature  review 

Usability  engineer 

8 

70 

560 

QA  review 

Usability  engineer 

4 

70 

280 

QA  engineers 

32 

70 

2240 

Usabiiity  waikthrough 

Usability  engineer 

32 

70 

2240 

Developer 

104 

70 

7280 

Customers 

48 

30 

1440 

Travel 

6000 

Miscellaneous 

100 

Beta  survey 

Usability  engineer 

128 

70 

8960 

Developer 

104 

70 

7280 

Postage 

50 

Product  ship  meeting 

Usability  engineer 

4 

70 

280 

Total 

$51,378 

TABLE 64.3. Cost of “Complete” Usability Engineering Support

Activity                  Cost Item           Units (hours)  Unit Cost  Total Cost
Generate wish list        Developer                       3         70        $210
                          Customer Support                3         56         168
User survey               Developer                      16         70        1120
                          Postage                                               50
Generate UI prototype     Developer                     160         70      11,200
Test UI prototype         Usability engineer             96         70        6720
                          Test subjects                  80         30        2400
                          Developer                      72         70        5040
                          Miscellaneous                                        200
Specification review      Usability engineer              8         70         560
Scope review              Usability engineer              4         70         280
Contents review           Usability engineer              8         70         560
Create usability plan     Usability engineer              8         70         560
Feature review            Usability engineer              8         70         560
Function-task analysis    Usability engineer             40         70        2800
                          Developer                      40         70        2800
                          Travel                                              5000
Detailed feature review   Usability engineer              8         70         560
QA review                 Usability engineer              4         70         280
                          QA engineers                   32         70        2240
Usability walkthrough     Usability engineer             32         70        2240
                          Developer                     104         70        7280
                          Customers                      48         30        1440
                          Travel                                              6000
                          Miscellaneous                                        100
Beta survey               Usability engineer            128         70        8960
                          Developer                     104         70        7280
                          Postage                                               50
Product ship meeting      Usability engineer              4         70         280
Include usability survey  Usability engineer              4         70         280
Total                                                                      $77,218

TABLE 64.4. Cost of Building a Usability Lab

Lab Feature                              Units  Unit Cost  Total Cost
Lab design and equipment selection         160         35        5600
Carpenters and electricians                 25         80        2000
One-way mirror                               1       1900        1900
Video recording equipment                    3       3500      10,500
Computers                                    4       2000        8000
Overhead projector                           1       5000        5000
Total                                                          33,000

cost-justification  spreadsheet,  you  could  add  macros  to  dynam¬ 
ically  change  the  costs  as  you  increase  the  number  of  basic  and 
complete  programs  you  undertake.) 
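The scaling that such a macro performs can be sketched in a few lines of Python. This is only an illustration, not the authors' spreadsheet: the function names are hypothetical, the per-program totals are taken from Tables 64.2 through 64.4, and the mix of two basic programs plus one complete program is an assumption, chosen because it reproduces the $212,974 Year 0 outlay shown in Table 64.5.

```python
# Totals taken from the cost tables in this chapter.
BASIC_PROGRAM_COST = 51_378      # Table 64.2 total
COMPLETE_PROGRAM_COST = 77_218   # Table 64.3 total
LAB_COST = 33_000                # Table 64.4 total (one-time expense)


def line_item_cost(hours, hourly_rate):
    """Cost of one labor line item: units (hours) times unit cost."""
    return hours * hourly_rate


def total_investment(n_basic, n_complete, build_lab=True):
    """Up-front usability spend for a given mix of programs (hypothetical helper)."""
    lab = LAB_COST if build_lab else 0
    return lab + n_basic * BASIC_PROGRAM_COST + n_complete * COMPLETE_PROGRAM_COST


# Assumed mix: two basic programs and one complete program, plus the lab.
investment = total_investment(n_basic=2, n_complete=1)
print(f"${investment:,}")  # $212,974
```

Changing `n_basic` or `n_complete` plays the role of the macro: the total investment is recomputed for each mix of programs.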

Now let’s address the potential benefits (Table 64.5). In this
example, we consider only the cost savings based on market costs
for software development and customer-support personnel. The
customer-support benefit is derived from a reduced call-support
burden, and the software-development savings are a result of
reduced rework or engineering hours. This example illustrates that
usability would result in a positive net present value (NPV) of
$153,455 and a payback period of two years, just considering the
customer-support and software-development savings.

Notice that our spreadsheet example has a macro in it that
allows the user (of the spreadsheet) to change the “cost of
capital” (basically, how much it costs to spend the money, versus
keeping it and earning interest on it) in increments of 1%, which
automatically yields different calculations of the time value of
the expenditures and savings.
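The discounting behind Table 64.5 can be reproduced with a short calculation. The sketch below is an illustration rather than the spreadsheet macro itself: the function names are hypothetical, the cash flows and the 20% cost of capital are taken from Table 64.5, payback is counted in whole years on undiscounted cash flows (an assumption), and the internal rate of return is found by simple bisection, which is only one of several ways to solve for it.

```python
def discounted_cash_flows(cash_flows, cost_of_capital):
    """Discount each year's cash flow back to Year 0."""
    return [cf / (1 + cost_of_capital) ** year
            for year, cf in enumerate(cash_flows)]


def npv(cash_flows, cost_of_capital):
    """Net present value: the sum of the discounted cash flows."""
    return sum(discounted_cash_flows(cash_flows, cost_of_capital))


def payback_period(cash_flows):
    """First year in which cumulative (undiscounted) cash flow is non-negative."""
    cumulative = 0
    for year, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None  # the investment is never paid back


def irr(cash_flows, low=0.0, high=10.0, tolerance=1e-7):
    """Rate at which NPV is zero, by bisection.

    Assumes NPV is positive at `low` and negative at `high`.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if npv(cash_flows, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Year 0 through Year 3 cash flows from Table 64.5.
cash_flows = [-212_974, 183_048, 168_000, 168_000]

print(round(npv(cash_flows, 0.20)))    # 153455
print(payback_period(cash_flows))      # 2
print(round(irr(cash_flows) * 100))    # 64
```

Calling `npv(cash_flows, rate)` for a range of rates mimics stepping the cost of capital in 1% increments.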


Now let’s look at that savings in more detail. We determine
that, because problems are discovered early (when they are cheaper
to fix) rather than late, the software-development team will
realize a savings of $15,048 in the first year (as seen in Table 64.1).
That is, based on historical data, we expect that it will take
software developers about 215 fewer hours (at $70 per hour) to fix
the problems.

As  for  the  detail  for  the  reduced  call-support  burden,  our 
product,  an  e-commerce  site,  expects  20,000  visitors  a  year. 
(Note  the  conservative  nature  of  the  estimates.  If  our  estimates 
are  low,  and  our  projections  yield  a  positive  ROI,  then  actual 
numbers  that  exceed  the  estimates  will  yield  even  higher  actual 
ROI.)  By  working  with  the  customer-support  team  and  review¬ 
ing  their  (very  detailed!)  call  logs,  we  calculate  that  70%  of  the 
calls  are  for  nondefect  problems.  That  is,  the  code  works  “as 
spec’d,”  and  users  just  can’t  figure  out  how  to  carry  out  their 
task.  The  majority  of  customer-support  calls  a  software  manu¬ 
facturer  will  receive  tend  to  be  these  how-to/nondefect  issues. 
Table  64.6  reveals  a  further  breakdown  of  these  usage  problems 
that  usability  engineering  testing  is  likely  to  help  resolve. 

The customer-support team also knows (and knows well!)
the cost of a call. In our example, that cost is $120. Given the
number of visits, the cost of a call, and the percentage of “usage”
problems, we can calculate that these calls cost the company
$1,680,000 a year (20,000 × $120 × 70%). If usability testing were
able to reveal problems and be an agent of change to drive some
fixes, enough to reduce the “usage” calls by 10%, the company would
realize a savings of $168,000 a year.
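The call-support arithmetic can be written out explicitly. The following is a minimal sketch under stated assumptions: the function and key names are hypothetical, and the 20,000 figure is treated as the annual number of support cases, which is how Table 64.6 uses it. Percentages are passed as whole numbers so that the results stay exact.

```python
def call_support_savings(total_cases, cost_per_call, usage_pct, benefit_pct):
    """Estimate annual savings from avoiding a share of "usage" calls.

    usage_pct   -- percentage of cases that are nondefect, how-to problems
    benefit_pct -- percentage of those cases that usability work avoids
    """
    total_cost = total_cases * cost_per_call
    usage_cost = total_cost * usage_pct / 100
    savings = usage_cost * benefit_pct / 100
    return {
        "total_cost": total_cost,
        "usage_cost": usage_cost,
        "savings": savings,
        "revised_total_cost": total_cost - savings,
    }


# Values from Table 64.6 for the e-commerce website.
result = call_support_savings(total_cases=20_000, cost_per_call=120,
                              usage_pct=70, benefit_pct=10)
for name, value in result.items():
    print(f"{name}: ${value:,.0f}")
```

Raising `benefit_pct` one point at a time corresponds to the 1% increments of the “usability benefit” in the spreadsheet.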

As in Table 64.5, Table 64.6 has a macro that allows the
usability engineer, based on ever-improving data, to change the
“usability benefit” (i.e., the percentage of usage calls avoided)
in 1% increments.


TABLE 64.5. Financial Metrics

Cost of Capital: 20%

                         Year 0      Year 1    Year 2    Year 3
Cash Flow                ($212,974)  $183,048  $168,000  $168,000
Discounted Cash Flow     ($212,974)  $152,540  $116,667  $97,222

Net Present Value        $153,455
Internal Rate of Return  64%
Payback Period           2 Years

TABLE 64.6. Savings Due to Reduced Call-Support Burden

                    Total   Documentation  Environment  Documentation  Installation  Product  Usability
Product             Cases   Change         Problem      Questions      Questions     Usage    Sub-total
E-commerce Website  20,000  1%             9%           1%             10%           49%      70%

Usability Benefit: 10%

                    Cost Per  Total       Usability Support  Cost of Usability  Usability Engineering  Revised Total
Product             Call      Cost        Questions          Related Cases      Savings                Cost
E-commerce Website  $120      $2,400,000  70%                $1,680,000         $168,000               $2,232,000

This completes the first example. As we saw in Table 64.1, we
expect to spend $212,974 in the first year, and at the end of three
years to have realized savings of $519,048, for a net savings of
$306,074. In Table 64.5 we see that this investment has a payback
period of two years, with an internal rate of return of 64%. If we
learn, during the course of this project, that the cost of capital goes
down, or that the usability benefit is actually higher than 10%, our
projected benefit of the usability engineering work will go up.


SAVINGS  TO  THE  CUSTOMER 


The previous illustration was exceedingly conservative, as it did
not take into account all the benefits that might possibly derive
(not even increased sales). The following example, shown in
Table 64.7, illustrates how to calculate potential benefits
to the customer.

Let’s  say  we  have  a  customer  company  that  is  buying  a  soft¬ 
ware  product  from  us  that  will  be  used  by  500  employees.  If  this 
customer  receives  a  more  usable  product,  then  the  users  of  that 
product  will  require  less  training,  spend  less  time  making  and 
recovering  from  errors,  and  will  be  quicker  at  traversing  the  cor¬ 
rect  path  through  the  user  interface. 

Thanks to the program of usability engineering, productivity
increases because users are able to navigate the product menus/
websites more efficiently. Conservatively (again), based on
historical findings we expect to improve performance using this
software tool by 30 seconds. A 30-second benefit for 500 users would
result in $33,542 worth of savings in the course of a year. Error
reduction is the savings associated with crashes or redoing any given
task. A 15-second benefit, thanks to one less error to correct,
would result in savings of $16,771 for the same 500 users.
Naturally, a more user-friendly product is more intuitive and would
result in less required training. At a training cost of $35/hour, a
one-hour reduction would result in a $17,500 one-time savings.
Therefore, one customer with 500 “seats” will enjoy a savings of
over $50,000 a year thanks to our usability engineering effort.
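The customer-side figures can be checked with a small calculation. The sketch below rests on assumptions the text does not state: that the 30-second and 15-second benefits are per user per working day, that there are 230 working days in a year, and that user time is valued at the same $35/hour given for training. Those values were chosen because they reproduce the dollar amounts in Table 64.7; the function names are hypothetical.

```python
USERS = 500
HOURLY_RATE = 35      # dollars per hour (the training rate given in the text)
WORKING_DAYS = 230    # assumption: working days per year


def annual_time_savings(seconds_saved_per_day, users=USERS,
                        working_days=WORKING_DAYS, hourly_rate=HOURLY_RATE):
    """Dollar value of a per-user daily time saving, rounded to whole dollars."""
    hours_saved = seconds_saved_per_day * users * working_days / 3600
    return round(hours_saved * hourly_rate)


def training_savings(hours_reduced, users=USERS, hourly_rate=HOURLY_RATE):
    """One-time savings from shortening training for every user."""
    return users * hours_reduced * hourly_rate


productivity = annual_time_savings(30)     # 33542
error_reduction = annual_time_savings(15)  # 16771
training = training_savings(1)             # 17500 (Year 1 only)

year_1 = productivity + error_reduction + training
recurring = productivity + error_reduction
three_year_total = year_1 + 2 * recurring

print(year_1, recurring, three_year_total)  # 67813 50313 168439
```

Because the training savings occur only once, Years 2 and 3 show the recurring $50,313 rather than the Year 1 total.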


INCREASED  CONVERSION  RATES 


Here’s another example. Critical to an e-commerce website’s
success is its conversion rate and ability to generate repeat
business. Many of us have had unpleasant web experiences, whether
it was difficulty in navigating the website, conflicts with the
user’s computing environment due to platform or browser issues,
differences in connection speeds, or other access problems. In
most of these situations the potential customers do not call or
e-mail. Instead they revert to a traditional route or alternatively
move to a competitor’s website (and in the worst-case scenario,
submit an angry review to an online shopping service like
Epinions.com or Froogle.com, or publicly express their frustration in
any number of other ways that will reach other potential
customers). In the example of a physicians’ reference website, the
failure to employ usability would result in a 13% conversion rate
(one out of eight users returning), or 87% of potential users
deciding not to use the site.

What if we project that via usability testing we would yield a
50% increase in the conversion rate, raising it from 13% to 19%?
The macro in the spreadsheet allows you to adjust the projection
based on a different projected benefit. Further, the spreadsheet
in Table 64.8 reveals that we expect to gain $40 per active
user, in advertising dollars. A small (say, $10,000) usability test
would yield a $25,000 increase in advertising dollars. In our
example above, the usability testing and subsequent site
redesign increased the conversion rate by 700%! Clearly, the ROI
projections based on that testing would be much, much higher,
and the usability work well cost-justified.


TABLE 64.7. Usability Cost Justification: Customer

                            Year 1   Year 2   Year 3   Total
Cost Savings
  Productivity Improvement  $33,542  $33,542  $33,542
  User Error Reduction      $16,771  $16,771  $16,771
  Reduced Training          $17,500
Total Cost Savings          $67,813  $50,313  $50,313  $168,439

TABLE 64.8. Conversion Rate Benefits

                                       Advertising Dollars  Dollar Value    Lost Advertising
Users   Conversion Rate  Active Users  per User             of Subscribers  Dollars
10,000  13%              1,250         $40                  $50,000         $350,000

Usability Benefit: 50%

                                       Advertising Dollars  Dollar Value    Lost Advertising
Users   Conversion Rate  Active Users  per User             of Subscribers  Dollars
10,000  19%              1,875         $40                  $75,000         $325,000
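The conversion-rate projection in Table 64.8 can be sketched as follows. The function and key names are hypothetical. Note one interpretation: the table prints 13% and 19%, but its active-user counts (1,250 and 1,875 out of 10,000) correspond to the unrounded rates of 12.5% ("one out of eight") and 18.75%, so the sketch uses the unrounded values.

```python
def conversion_value(users, conversion_rate, ad_dollars_per_user):
    """Advertising value captured and lost at a given conversion rate."""
    active_users = users * conversion_rate
    return {
        "active_users": active_users,
        "value": active_users * ad_dollars_per_user,
        "lost": (users - active_users) * ad_dollars_per_user,
    }


USERS = 10_000
BASE_RATE = 0.125          # one out of eight users returning
USABILITY_BENEFIT = 0.50   # projected relative increase in conversion rate
AD_DOLLARS_PER_USER = 40
TEST_COST = 10_000         # the "small" usability test from the text

before = conversion_value(USERS, BASE_RATE, AD_DOLLARS_PER_USER)
after = conversion_value(USERS, BASE_RATE * (1 + USABILITY_BENEFIT),
                         AD_DOLLARS_PER_USER)

gain = after["value"] - before["value"]   # increase in advertising dollars
net_benefit = gain - TEST_COST            # gain after paying for the test
benefit_per_dollar = gain / TEST_COST     # dollars gained per dollar spent

print(before["active_users"], after["active_users"], gain, net_benefit)
```

Varying `USABILITY_BENEFIT` shows how sensitive the projected advertising gain is to the assumed improvement.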


SOME  FINAL  POINTS 


What of the Not-For-Sale, Internal-Use
Software or Website?

In  this  chapter  our  examples  have  focused  on  vended  software 
and  e-commerce  sites.  Similar  cost-benefit  analyses  are  possi¬ 
ble  for  intranet  sites  or  informational  sites.  “For  intranet  sites,  ef¬ 
ficiency,  memorability,  and  error  reduction  become  the  most 
important  usability  attributes”  (Nielsen,  2000,  p.  274).  Thus,  em¬ 
ployees’  time  spent  locating  documents  on  the  company  in¬ 
tranet  site  can  be  calculated,  and  the  resultant  improvements 
due  to  a  well-designed,  usable  site  demonstrated. 

Are  We  Worried  About  Achieving  a  Particular  ROI, 
or  About  Avoiding  a  Disaster  (and  a  Lawsuit)? 

We have worried about numbers, dollars in particular. But
there are intangible costs and benefits to be considered, which
may or may not translate into dollars. Could another cost of
poor usability engineering be a bad review in the trade press? Or
a lawsuit, if someone thinks an actual or implied contract
was not fulfilled because people couldn’t use a (perhaps
defect-free) user interface effectively? Or a picket
line of angry users, picked up by the six o’clock news? Or
customer hostility publicly expressed all over the Internet? When
evaluating whether or not to pursue a course of usability
engineering, it’s important to consider not only what you’ll get that
you like, but also what you won’t get that you wouldn’t like.

The  "Developers  are  Users"  Inhibitor — How  Shall  We 
Learn  Not  to  Depend  on  Our  Own  Intuitions? 

Why has usability not been more warmly and universally
embraced as a vital aspect of software user interface development?
Why is it that “usability is one of the first things cut in a budget”
(Rhodes, 2000, p. 4)? Perhaps it is because all developers are
also users. Every software developer is also a software user, and
so reflexively feels justified to depend on his or her own
intuitions as to what is usable. While the typical usability engineer
wouldn’t think of telling a software developer how to structure
his or her code, the software developer often feels naturally
qualified to make usability pronouncements. The oh-so-vital
distinction between being a “user” and a “representative
user” tends to get lost. The experienced usability professional
has had much practice trying to convince a product developer
that the target audience for this product/site is likely not
someone with a master’s degree in computer science or electrical
engineering, and is certainly not someone with 18 months of
experience working on the project in question. The best way to
drive this point home is to encourage software developers to
observe usability testing (or participate directly in the usability
evaluation; see, for example, Bias (1994)). Previously unable
to comprehend that someone could not understand how to
navigate a particular path through the software user interface,
the product developer becomes convinced when he or she
sees two or three test participants (the first one was just
“stupid”) repeat a problem. In lieu of having developers participate
in or observe a usability test, making available video highlights
of several normal-looking people encountering the same
usability problem can be very convincing; drawing attention to
such scenes by embedding media in usability-engineering
reports can help to maintain a focus on actual users when
interacting with other stakeholders.

Finding the Stakeholder Where the Cost and Benefit
Lines Cross: What to Do When the Costs Outweigh
the Benefits for a Particular Suborganization

Imagine  the  following  scenario:  armed  with  your  usability  cost- 
justification  projections,  you  eagerly  and  confidently  stride  into 
a  room  of  the  decision  makers  at  your  software  company.  Hav¬ 
ing  done  your  homework  and  generated  an  empirical,  cost- 
benefit  analysis,  you  propose  a  course  of  usability  engineering 
that,  while  costing  the  development  team  an  extra  $20,000,  will 
save a whopping $100,000 in the development of training materials.
“Way to go,” right? Well, if your audience consists of the
company’s  director  of  product  development  and  the  director  of 
training,  they  may  well  be  underwhelmed  by  your  proposal. 
Now,  there  is  likely  someone  higher  up  the  organizational  struc¬ 
ture  who  worries  about  both  the  development  budget  and  the 
training  budget,  who  will  be  more  impressed,  and  more  likely  to 
be  convinced  by  your  cost-benefit  analysis. 


CONCLUSION 


In  case  study  after  case  study,  usability  is  proving  its  worth. 
In  a  wide  variety  of  domains  using  a  wide  variety  of  usability¬ 
engineering  methods,  those  domains  which  spend  money  on 
professional  usability  engineering  and  take  the  time  to  calculate 
the  ROI  for  those  expenditures  are  finding  that  they  are  glad 
they  did  so. 

• “An investment in user and task analysis early in the design
and development process will reap considerable benefits
in cost savings later in the process” (Hackos & Redish, 1998,
p. 122).

•  “. . .  An  executive  development  manager  observed  that  reduc¬ 
ing  call  volume  by  even  25%  would  cover  the  hiring  of  three 
additional  support  technicians”  (Montaniz  &  Kissel,  1995). 

•  “Sun  Microsystems  has  shown  how  spending  $20,000  could 
yield  a  savings  of  $152  million”  (Rhodes,  2000,  p.  4). 

• “Performing usability cost-benefit analyses within one’s own
company may be a first step toward introducing usability
engineering techniques into a company” (Donahue, Weinschenk,
& Nowicki, 1999).

• “. . . designers who have an understanding of financial issues
and statistics will be better able to follow a standard ROI
process, make the business case for their projects, design and
specify an effective user experience, and, ultimately achieve
the user and business goals” (Herman, 2004).

These  quotations  have  offered  encouragement  and  a  plan  for 
gathering  cost  and  benefit  projections,  and  actual  calculations, 
and  using  a  metric  everyone  can  understand — dollars — to  cost- 
justify  a  usability-engineering  effort.  We  have  offered  some  ex¬ 
amples  of  spreadsheets  to  model  this  sort  of  cost-justification 
approach.  Readers  will  have  to  derive  their  own  values  for  the 
cost  of  a  developer  hour,  the  cost  of  a  “basic”  program  of  us¬ 
ability  engineering,  the  likely  reduction  in  customer  support 
calls  due  to  usability  engineering,  and  so  forth.  Further,  at  no 
point  mentioned  earlier  was  the  whole  package  put  together. 
That  is,  if  we  pursue  a  certain  program  of  usability  engineering, 
what  would  be  the  grand  total  of  all  the  benefits,  including  (but 
likely not limited to):

•  Reduced  costs  due  to  development  efficiency; 

•  Reduced  costs  due  to  fewer  calls  to  the  customer  support  line; 

•  Reduced  costs  due  to  less  need  for  documentation; 

•  Reduced  costs  due  to  fewer  training  materials  needed; 

•  Increased  sales/visits; 

•  Increased  return  clients/visitors; 

•  Increased  user  throughput; 

•  Decreased  customer-training  costs; 

•  Decreased  customer  costs  due  to  ability  to  hire  less-sophisti¬ 
cated  user  audience; 

•  Reduced  chance  of  lawsuits; 


•  Reduced  chance  of  public  relations  disaster; 

•  Increased  chance  of  trade-press  praise; 

•  Increased  customer  satisfaction. 

Our  choice  not  to  build  one  “grand  total”  cost-benefit 
analysis  (showing,  certainly,  a  ROI  of  hundreds  of  thousands 
of  dollars  to  one)  is  intended  to  emphasize  the  importance 
of  knowing  the  “value  proposition”  for  your  audience;  the  di¬ 
rector  of  training  may  be  unmoved  by  your  argument  to  re¬ 
duce  training  needs.  (Indeed,  this  can  be  parried  by  arguing 
for  those  training  dollars  being  spent  on  training  the  more  so¬ 
phisticated  functionality  of  the  product,  and  not  the  day-to- 
day  tasks.) 

Given  the  state  of  the  software-development  world  today, 
though,  usability  engineering  almost  certainly  will  yield  a  robust 
ROI in almost any UI/website development effort. By coming
up  with  reasonable  projections  in  advance,  the  usability  advo¬ 
cate  can  gain  the  resources  necessary  to  carry  out  that  first  pro¬ 
gram  of  usability  engineering.  By  following  up  and  calculating 
the  “actuals”  after  the  product  ships  or  “goes  live,”  the  usability 
advocate  will  likely  demonstrate  how  conservative  the  original 
projections  were,  and  thus  earn  the  confidence  of  the  stake¬ 
holders  for  quite  a  long  time.  Thereafter,  the  cost-justification 
effort  will  become  a  way  to  continue  to  validate  the  usability  ex¬ 
pense,  and  to  fine-tune  the  program  of  usability  engineering, 
demonstrating  which  usability  methods  were  the  most  cost  ef¬ 
fective  for  which  types  of  products,  at  various  points  in  the  de¬ 
velopment  cycle. 

(Portions  of  this  chapter  appeared  in  a  chapter  by  Deborah  J. 
Mayhew  and  Randolph  G.  Bias  in  Ratner,  J.  (Ed.)  (20Qy) .  Human 
factors  and  web  development.  (2nd  ed.).  Lawrence  Erlbaum 
Associates.)

Human-computer interaction (HCI) continues to evolve in
terms  of  educational  approaches,  employment  opportunities, 
and  issues  that  are  explored.  After  careful  consideration,  we  de¬ 
cided  to  close  the  second  edition  of  the  HCI  Handbook  by  pos¬ 
ing  a  series  of  questions  to  five  well-established  members  of  the 
HCI  community: 

•  John  M.  Carroll  is  the  Edward  Frymoyer  Chaired  Professor  of 
Information  Sciences  and  Technology  and  the  Director  of  the 
Center  for  Human-Computer  Interaction  at  Penn  State  Uni¬ 
versity.  His  research  interests  include  methods  and  theory  in 
human-computer  interaction,  particularly  as  applied  to  net¬ 
working  tools  for  collaborative  learning  and  problem  solv¬ 
ing,  and  the  design  of  interactive  information  systems.  He  has 
written or edited 15 books, published more than 400 articles,
serves on 12 editorial and advisory boards, and is editor-in-chief
of the Association for Computing Machinery’s (ACM)
Transactions on Computer-Human Interaction. He has received
both the Rigo Career Achievement Award and the CHI
Lifetime  Achievement  Award  from  the  ACM  special  interest 
groups  on  Documentation  and  Computer  Human  Interaction 
(SIGDOC  and  SIGCHI  respectively),  was  elected  to  the  CHI 
Academy,  and  received  the  Goldsmith  Award  from  Institute  of 
Electrical  and  Electronics  Engineers  (IEEE)  and  the  Silver 
Core  Award  from  the  International  Federation  for  Informa¬ 
tion  Processing.  He  is  a  Fellow  of  the  ACM,  IEEE,  and  Human 
Factors  and  Ergonomics  Society. 

• P. L. Patrick Rau is a Professor in the Department of Industrial
Engineering  at  Tsinghua  University  in  Beijing  where  he  founded 
and  directs  the  Human-Computer  Interaction  and  Usability 
Research  Lab.  His  research  and  teaching  activities  have  fo¬ 
cused  on  human-computer  interaction,  mobile  advertisement, 
design  for  aged  people,  and  cross-cultural  design.  He  has  pub¬ 
lished  articles  on  these  topics  in  the  HCI  and  related  journals. 
He  is  a  member  of  the  Human  Factors  and  Ergonomics  Society, 
the  Association  for  Computing  Machinery,  China  Ergonomics 
Society,  and  Taiwan  Ergonomics  Society. 

•  Katsuhiko  Ogawa  was  named  the  Director  of  Research  Plan¬ 
ning  within  the  Department  of  Research  and  Development  at 
Nippon  Telegraph  and  Telephone  Corporation  (NTT)  in  1995. 
He  has  served  as  the  Director  of  NTT  Cyber  Solutions  Labo¬ 
ratories  since  2004.  His  areas  of  expertise  include  human  inter¬ 
action,  media  design,  and  home  appliances.  He  has  published 
more  than  80  articles,  is  a  member  of  Human  Factors  and  Er¬ 
gonomics  Society,  The  Institute  of  Electronics,  Information 
and  Communication  Engineers,  and  Information  Processing 
Society  of  Japan. 

•  Gavriel  Salvendy  is  a  professor  of  Industrial  Engineering  at  Pur¬ 
due  University  and  Chair  Professor  and  Head  of  the  Depart¬ 
ment  of  Industrial  Engineering  at  Tsinghua  University  in  Bei¬ 
jing.  He  is  the  author  or  coauthor  of  over  420  research 
publications  and  is  the  author  or  editor  of  27  books.  His  re¬ 
search  focuses  on  the  human  aspects  of  design,  operation,  and 
management  of  advanced  engineering  systems.  He  is  the  found¬ 
ing  editor  of  the  International  Journal  on  Human-Computer 
Interaction  and  Human  Factors  and  Ergonomics  in  Manufac¬ 
turing.  In  1995,  he  received  an  Honorary  Doctorate  from  the 
Chinese  Academy  of  Science.  He  is  an  honorary  fellow  and  life 
member  of  the  Ergonomics  Society  and  fellow  of  Human  Fac¬ 


tors  and  Ergonomics  Society,  Institute  of  Industrial  Engineers, 
and  the  American  Psychological  Association.  In  1990,  he  was 
elected  to  the  National  Academy  of  Engineering. 

•  Professor  Constantine  Stephanidis  is  faculty  at  the  Depart¬ 
ment  of  Computer  Science — University  of  Crete.  He  is  also 
the  Director  of  the  Institute  of  Computer  Science — FORTH, 
and  Head  of  its  Human-Computer  Interaction  Laboratory  and 
the  Centre  for  Universal  Access  and  Assistive  Technologies.  He 
is  the  President  of  the  Board  of  Directors  of  the  Science  and 
Technology  Park  of  Crete.  His  research  interests  lie  in  the 
fields  of  Human-Computer  Interaction  and  Universal  Access 
and  many  of  his  activities  have  focused  on  user  interfaces  self- 
adapting  to  user  requirements  and  context  of  use.  He  has 
published  more  than  300  articles.  He  is  the  Editor-in-Chief  of 
the  Springer  journal  Universal  Access  in  the  Information  So¬ 
ciety.  He  is  the  founding  chair  of  the  ERCIM  Working  Group 
“User  Interfaces  for  All”  and  of  the  International  Conference 
“Universal  Access  in  Human-Computer  Interaction.”  He  is  ac¬ 
tively  involved  with  the  European  Commission’s  Research  and 
Development  Policy  on  Information  Society  Technologies 
with  particular  focus  on  issues  related  to  e-Inclusion. 

We  began  by  asking  about  grand  challenges  in  human- 
computer  interaction  (HCI).  While  others  have  asked  about 
grand  challenges  in  information  systems,  information  technology, 
e-Science,  and  other  specialized  fields,  we  felt  it  was  time  to 
answer  this  question  in  the  context  of  HCI.  While  the  grand- 
challenges  question  looks  toward  the  future,  we  also  felt  it  was 
appropriate  to  reflect  upon  the  accomplishments  of  the  field.  To 
do  this,  we  asked  about  important  results  that  have  emerged 
from HCI research during the last decade. We continued with a
series of additional questions about the future, with the goal of
gaining insight into important application domains, interaction
innovations, and the education of future HCI researchers who will
address the grand challenges and produce future innovations.
Lastly, we explored the future of HCI as a discipline and
profession. While the length of the responses varies, we believe you
will  be  able  to  identify  common  themes  among  the  responses. 
We  hope  you  will  find  useful  suggestions  to  guide  your  research, 
ideas  that  may  help  shape  educational  programs,  and  ideas  that 
will  guide  you  in  practice. 


WHAT ARE HCI’S THREE
GRAND CHALLENGES?


John  M.  Carroll 

To  broaden  the  question  a  bit,  I  think  HCI  faces  different  kinds  of 
challenges.  Since  this  interview  is  for  a  technical  book,  I  assume 
you  are  most  interested  in  engineering  and  scientific  challenges. 
However,  HCI  also  faces  organizational/political  challenges  at 
several  levels.  These  latter  challenges  are  very  important,  since 
they embody threats to the whole enterprise of HCI. For
example, consider the challenges of research funding. Nonmilitary
funding and funding for basic research have significantly declined
in recent years. Funding from industry, and even in-house
industrial research funding, has significantly declined. National
Science Foundation research, and what research there still is in
industry, are more applied and more narrowly focused than in the
past. None of this is good for HCI, and the solutions lie beyond
anything technical theories and empirical results can provide.

The HCI community also faces many organizational
challenges in constituting itself. There has been a persistent
tension in product development organizations about how to
organize usability functions. This is more complicated now because
usability is not a single function and is frequently addressed
by 3-4 different professional perspectives, for example,
user-experience analysis, graphic design, user interface and
application development, and usability evaluation. All of these
perspectives and professional skill sets are integral to HCI. This
differentiation  is  a  natural  and  very  positive  development;  it 
shows  again  how  important  usability  is  to  effective  product  de¬ 
velopment.  But,  now,  instead  of  the  simple  battles  between  vir¬ 
tuous  usability  guys  and  lunkhead  developers,  we  can  have  dis¬ 
putes  between  different  kinds  of  HCI  perspectives.  It  raises  the 
question  of  how  HCI  is  or  will  be  a  profession  or  discipline. 

The  engineering  and  scientific  challenges  are  actually  far  less 
daunting,  though  a  good  deal  more  exciting.  One  is  the  emer¬ 
gence  of  ubiquitous  computing — I  prefer  the  European  term 
“the  disappearing  computer” — as  the  reference  paradigm  for 
HCI. Of course, computing has been moving off the server, off
the desktop, and into personal devices for many years. And there is
no  shortage  of  attention  to  these  developments,  or  interesting 
discussions  of  what  they  mean  for  research,  development,  and 
education  in  HCI.  However,  it  is  likely  that  we  have  only  seen 
the  very  beginning  of  the  diffusion  of  computing  into  the  ob¬ 
jects  and  experiences  of  everyday  life.  What  will  HCI  become 
without  the  computer? 

The  second  engineering  and  scientific  challenge  I  would 
mention  is  “end-user  programming.”  Ironically,  some  of  the  ear¬ 
liest  work  in  what  became  HCI  addressed  the  psychology  of  pro¬ 
gramming.  In  that  work,  early  HCI  researchers  investigated  how 
people  understand  and  can  use  conditionals,  how  they  can  use 
Boolean  logic  to  express  queries,  etc.  This  thread  has  never  died 
completely,  but  at  times  it  has  appeared  a  somewhat  arcane  fo¬ 
cus  in  HCI.  I  think  it  is  more  urgent  than  ever.  As  computing 
moves  into  the  personal  realm,  it  is  essential  for  people  to  be 
able  to  exert  more  direct  and  wide-ranging  control.  It’s  not  just 
nice  to  configure  a  web  server,  it’s  essential;  indeed,  it  may  soon 
be  essential  to  have  one  running  on  your  wristwatch.  This  chal¬ 
lenge  interacts  with,  and  perhaps  is  compounded  by,  the  disap¬ 
pearing  computer:  If  manipulating  explicit  Booleans  and  other 
formal  expressions  in  the  relative  calm  of  a  desktop  interaction 
is  problematic,  how  will  people  get  along  drawing  specifications 
in  the  air  as  they  cross  the  street? 

My  third  “grand  challenge”  is  collaboration.  I  know  we’ve 
had  20  years  of  computer-supported  cooperative  work  (CSC’W), 
and  maybe  30  years  of  computer-mediated  communication 
(CMC),  but  I  think  we  still  have  only  scratched  the  surface.  One 
of  the  main  functions  of  computers  now  is  to  give  us  access  to 
e-mail,  wikis,  and  blogs,  so  that  we  can  interact  with  other  peo¬ 
ple.  The  sophistication  of  collaborative  systems  is  developing 
rapidly.  Many  project  groups  routinely  employ  collaborative  infrastructures
that  just  a  few  years  ago  would  have  been  gems  of
the  U.S.  National  Science  Foundation’s  collaboratories  initiative. 
A  modern  frontier  is  collaboration  that  never  stops:  Distributed 
global  teams  working  in  several  time  zones. 


Katsuhiko  Ogawa 

In  the  next  10  years,  we  will  likely  see  the  emergence  of  three 
integrated  services:  (a)  telecommunications  and  broadcasting, 
(b)  fixed  and  mobile,  and  (c)  telecommunication  and  informa¬ 
tion.  The  first  challenge  for  HCI  relates  to  the  integration  of 
telecommunication  and  broadcast  services.  In  television,  major 
changes  in  HCI  will  bring  about  TVs  you  can  watch  wherever 
and  whenever  you  want,  TVs  with  which  you  can  watch  broad¬ 
casts  from  all  over  the  world  via  the  Internet,  and  TVs  where  you 
can  choose  whatever  picture  you  want  from  those  being  sent  in. 
The  next  challenge  will  be  HCI  for  mobile  appliances.  These  will 
include  cameras  and  music  players,  of  course,  and  HCI  for 
portable  devices  with  an  electronic  money  function  will  also  be¬ 
come  very  important.  Finally,  there  will  be  the  challenge  of  HCI 
for  cyberspace.  Communication  tools  tailored  to  the  individual 
will  be  designed  to  enable  websites,  communities,  and  chat 
rooms  for  the  individual. 

P.  L.  Patrick  Rau

The  first  challenge  is  to  make  effective  contributions  in  busi¬ 
ness,  particularly  in  emerging  markets  such  as  China.  In  many 
companies,  HCI  people  have  successfully  demonstrated  their 
importance  to  product/service  development.  However,  the
contribution  of  HCI  to  the  profit  of  a  business  is  difficult  to  see.
The  second  grand  challenge  is  to  develop  and  validate  new 
methodologies.  The  pressure  for  improving  HCI  methodologies 
is  increasing  because  of  the  rapid  growth  of  the  Internet  and
telecommunication  industries.  The  development  and  product
lifecycles  of  handheld  devices  are  shortening.  In  addition,
the  importance  of  fun  and  emotional  satisfaction  in  computer
use  is  growing,  but  current  HCI  methodologies  are  mostly
designed  for  goal-oriented  tasks.  The  final  grand  challenge  is  to 
develop  new  technologies  that  impact  user  experience.  There 
will  be  some  killer  technologies  that  can  change  the  way  peo¬ 
ple  use  computing  devices. 

Gavriel  Salvendy 

The  first  challenge  is  to  develop  the  science  base  of  HCI.  We  also 
need  to  develop  a  comprehensive  educational  program  for  HCI, 
so  that  all  computer-science  and  information-technology  students 
would  be  exposed  to  HCI.  Finally,  HCI  must  push  the  needed
technology  rather  than  respond  to  the  emergence  of  technology. 

Constantine  Stephanidis

Computing  technology  evolves  rapidly,  and  each  new  generation 
of  technology  offers  new  opportunities  to  improve  the  quality  of 
human  life.  HCI  is  one  of  the  key  factors  toward  ensuring  social 
acceptability  of  computer-based  products  and  services  and  is 
therefore  continuously  called  upon  to  face  new  challenges,  by 
adapting,  or  better  anticipating,  new  requirements  for  technology 
acceptance.  Several  recent  trends  in  the  direction  of  improving 
acceptability  can  be  observed.  Universal  access  aims  at  preventing 
the  exclusion  of  users  from  the  Information  Society,  while  at  the 
same  time  increasing  the  quality  and  usability  of  products  and  services.
Additionally,  there  are  approaches  to  design  that  emphasise
the  need  to  better  reflect  users’  everyday  goals  by  providing  more 
natural  and  activity-oriented,  rather  than  application-oriented, 
forms  of  interaction,  and  by  breaking  down  computing  into  spe¬ 
cialized  devices  that  are  designed  specifically  for  everyday  activi¬ 
ties.  Other  factors  related  to  user  acceptance  include  the  need 
to  provide  self-explaining  interfaces  that  limit  the  learning 
process  to  a  minimum,  as  well  as  user  interfaces  that  increase 
user  trust  and  confidence.  Another  perspective  is  that  of  moving 
beyond  usability  and  efficiency  in  design  toward  a  more  emo¬ 
tionally  pleasant  user  experience.  Consequently,  a  first  grand  chal¬ 
lenge  is  that  HCI  must  now  explore  new  areas  to  understand  and 
formalize  the  knowledge  of  what  makes  a  design  more  accept¬ 
able,  appealing,  and  fun  to  users,  and  to  overcome  the  social,  cul¬ 
tural,  and  organizational  barriers  that  may  limit  acceptance  of 
technological  breakthroughs. 

A  second  grand  challenge,  which  is  also  related  to  the  first, 
is  the  need  to  establish  HCI  theories  and  methodologies  that 
are  defined  at  a  fundamental,  more  abstract  level.  HCI  is  a  dy¬ 
namic  field,  and  the  nature  of  interaction  changes  dramatically 
as  time  goes  by.  New  technologies  and  trends  emerge  too  of¬ 
ten  to  be  able  to  apply  specific  guidelines  and  practices  to  all 
of  them  in  an  equally  successful  manner.  This  is  evident  from 
the  history  of  computing,  which  started  with  command-line
interfaces  and  has  gone  through  a  series  of  successive  developments,
from  the  graphical  user  interface,  to  mobile  computing,
virtual  reality,  and  ubiquitous  computing.  In  order  to
be  prepared  for  new  evolutions,  a  more  fundamental  approach 
to  interaction  needs  to  be  elaborated.  Human-centered  design 
was  a  first  attempt  toward  this  direction.  Although  manifesta¬ 
tions  in  design  guidelines  and  interaction  paradigms  change 
along  with  technology,  the  basic  principle  remains  the  same: 
putting  the  human  user  at  the  center.  Similarly,  current  ap¬ 
proaches  toward  universal  access  transcend  specific  techno¬ 
logical  manifestations. 

The  third  grand  challenge  concerns  computer-based  support 
and  “digitization”  of  HCI  practices.  Today,  most  work  in  the  de¬ 
sign  and  evaluation  of  user  interfaces  is  still  being  conducted  “on 
paper.”  This  is  clearly  becoming  impracticable,  as  the  task  of  HCI 
practitioners  becomes  increasingly  complex  due  to  the  diversity 
of  parameters  involved  and  the  need  to  design  interactive  environments
that  are  entangled  with  the  physical  world,  rather  than
simply  designing  user  interfaces  for  single  applications.  In  this
respect,  appropriate  methods  and  tools  are  required. 


WHAT  ARE  THE  THREE  MOST  IMPORTANT 
RESULTS  THAT  HAVE  EMERGED  FROM  HCI 
RESEARCH  IN  THE  LAST  10  YEARS?


John  M.  Carroll 

The  last  10  years  have  been  full  of  important  results;  factoring
them  into  three  bullets  is  the  hard  part! 

One  of  the  three  would  have  to  be  interactive-information 
visualization.  The  past  10  years  established  the  “spotfire”  paradigm
for  allowing  people  to  drive  visualization  toward  the  information
they  want.  One  classic  example  is  the  University  of
Maryland’s  real  estate  finder.  The  visualization  is  a  map  of  the 
District  of  Columbia  and  its  surrounds.  A  set  of  sliders  is  provided,
allowing  the  user  to  specify  a  price  range,  to  indicate
locations  to  which  they  will  commute  or  frequently  travel,
and  to  set  how  far  from  these  locations  they  are  willing  to
live.  As  the  user  tunes  these  sliders,  spots  appear/disappear  on 
the  map  indicating  currently  available  homes  that  meet  the  cur¬ 
rent  search  specification.  This  project  utterly  transformed  how 
people  think  about  information  visualization.  The  basic  direc¬ 
tion  is  now  being  most  dramatically  pioneered  in  virtual  envi¬ 
ronment  work. 
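
As a rough illustration of the dynamic-query idea described above, the following Python sketch filters a list of homes by slider-like settings. It is only a sketch under stated assumptions, not the Maryland system's implementation: the `Home` record, the `dynamic_query` function, the map coordinates, and the sample listings are all hypothetical, and distance is simplified to a straight line on a flat map.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Home:
    """A listing on the map; coordinates are arbitrary map units (hypothetical)."""
    name: str
    x: float
    y: float
    price: int


def dynamic_query(homes, min_price, max_price, anchor, max_distance):
    """Return the homes that should currently be drawn as spots on the map.

    Each argument after `homes` stands for one slider setting: a price range,
    an anchor location (for example a workplace), and the farthest acceptable
    straight-line distance from that anchor.
    """
    ax, ay = anchor
    return [
        h for h in homes
        if min_price <= h.price <= max_price
        and hypot(h.x - ax, h.y - ay) <= max_distance
    ]


# Invented sample data, for illustration only.
homes = [
    Home("A", 1.0, 1.0, 250_000),
    Home("B", 4.0, 5.0, 310_000),
    Home("C", 0.5, 2.0, 480_000),
]

# Moving a slider simply re-runs the query and redraws the surviving spots.
wide = dynamic_query(homes, 200_000, 500_000, anchor=(0.0, 0.0), max_distance=3.0)
narrow = dynamic_query(homes, 200_000, 300_000, anchor=(0.0, 0.0), max_distance=3.0)
print([h.name for h in wide])
print([h.name for h in narrow])
```

The point of the paradigm is that each small slider movement gives immediate visual feedback; in a real interface the query would be re-run on every slider event, which is why the filter is kept cheap.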

My  second  candidate  would  be  collaboration  via  the  Web. 
The  original  web  itself  is  barely  10  years  old,  and  was  a  huge  HCI 
result:  usable  ftp  (file  transfer  protocol).  The  important  lesson  I 
see  in  this  is  how  a  relatively  modest  enhancement  in  functional¬ 
ity  can  trigger  a  revolution  in  the  use  of  technology.  Subsequent 
refinements  to  the  web,  such  as  forums,  wikis,  and  weblogs,  are  even
more  important  than  the  original  web.  New  web  technologies  dis¬ 
solve  the  old  distinction  between  synchronous  and  asynchronous 
collaboration.  They  have  made  web-based  collaboration  so  easy, 
so  flexible,  and  so  powerful  that — like  e-mail — we  now  use  it  just 
to  work  with  folks  in  the  next  office.  Anyone  can  now  create  and 
manage  a  sophisticated  collaboratory  environment.  It  is  difficult 
to  imagine  managing  a  project  without  these  tools. 

Thinking  more  broadly  about  social  interaction,  many  of 
these  same  tools  for  web-based  collaboration  have  changed  the 
fabric  of  society.  Sixty  million  people  are  cultivating  weak  ties  on 
myspace.com,  while  30  million  are  developing  strong  ties  on 
facebook.com.  This  is  a  nice  validation  for  Mark  Granovetter, 
who  originated  the  distinction  between  relatively  more  intimate 
(strong)  ties  to  those  we  regularly  interact  with,  and  relatively 
more  casual  (weak)  ties  to  those  we  merely  know,  but  who  can 
provide  specific  and  critical  help  in  specific  areas,  such  as  advice 
in  a  job  search. 

I’m  starting  to  hate  this  “top  three”  thing!  My  third  important 
result  is  powerful  information  retrieval  tools.  This  is  a  bit  of  a 
cop-out,  since  I  am  using  it  to  aggregate  a  whole  bunch  of  now- 
paradigmatic  mechanisms  that  are  quite  distinct,  but  generally 
help  us  find  stuff:  latent  semantic  analysis  (LSA),  which  represents
information  structures  by  patterns  of  co-occurrence;
social  recommendation,  as  employed  by  Amazon,  which  allows
us  to  make  use  of  the  prior  searches  of  ourselves  and  others  in
streamlining  current  searches;  and  Google’s  search  algorithms,
which  make  searching  the  entire  web  easier  and  more  successful
than  looking  for  information  on  your  desk.
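
To make the notion of "patterns of co-occurrence" a little more concrete, here is a minimal Python sketch of recommendation by simple co-occurrence counting. It is an assumption-laden toy, not LSA itself (which additionally applies a matrix decomposition) and not any retailer's actual algorithm; the function names `co_occurrence` and `recommend` and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_occurrence(baskets):
    """Count, for each item, how often every other item appears alongside it."""
    counts = {}
    for basket in baskets:
        # set() ignores duplicates; permutations yields both (a, b) and (b, a).
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often seen together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Invented "prior searches" of several users, for illustration only.
baskets = [
    ["usability", "design", "psychology"],
    ["usability", "design"],
    ["usability", "statistics"],
    ["design", "typography"],
]
counts = co_occurrence(baskets)
print(recommend(counts, "usability", k=1))
```

Even this crude counting shows how the earlier behavior of other people can streamline a current search: items that were frequently sought together are surfaced first.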

Katsuhiko  Ogawa

One  important  result  is  universal  designs  for  aged  and  disabled 
persons.  Another  is  interaction  between  portable  devices  and 
users,  such  as  that  in  portable  phones,  portable  computers, 
iPods,  and  digital  cameras.  And  another  is  interfaces  for  the  dis¬ 
patching  of  individual  information,  such  as  blogs  and  home- 
pages.  To  be  sure,  these  did  not  surface  directly  from  HCI  re¬ 
search,  but  it  is  certainly  true  that  the  merits  and  advantages  of 
HCI  were  the  deciding  factors  in  enabling  these  to  become 
widely  available. 

P.  L.  Patrick  Rau

Website  usability  is  clearly  one  of  the  most  important  advances 
in  the  last  decade.  Many  HCI  researchers  have  been  working 
on  improving  user  experience  on  the  web.  I  would  also  include 
advances  and  significant  progress  in  the  area  of  user  interfaces 
for  handheld  devices.  Handheld  devices  such  as  cell  phones  and 
MP3  players  have  drawn  the  attention  of  many  HCI  researchers.

Gavriel  Salvendy 

The  first  would  have  to  be  the  broad  area  of  concepts,  methods, 
and  tools,  which  have  been  developed  to  accommodate  all  seg¬ 
ments  of  society.  I  would  also  include  developments  and  meth¬ 
ods  relating  to  visualization  and  the  development  of  adaptive- 
interface  methods. 

Constantine  Stephanidis 

The  user-centered  approach  to  design  is  probably  one  of  the 
most  important  benefits  arising  from  HCI  research  in  the  last 
decade.  Moving  from  a  technological  perspective  of  design  to 
placing  the  user  as  the  first  priority  has  fundamentally  changed 
the  way  we  design  artifacts  in  all  domains.  In  the  computing 
world,  we  have  seen  the  impact  of  this  process  most  promi¬ 
nently  in  user  interfaces,  ranging  from  software  applications 
and  services  to  operating  systems  and  web  pages.  The  result 
has  been  an  increase  in  the  usability  of  computers  and  the  qual¬ 
ity  of  users’  experiences.  Furthermore,  having  the  user  at  the 
center  of  the  design  is  a  perspective  that  is  most  likely  to  en¬ 
courage  the  advancement  of  HCI  as  a  discipline.  For  example, 
there  have  been  steps  forward  in  understanding  the  complexi¬ 
ties  of  human  cognition,  emotions,  and  needs  because  the  fo¬ 
cus  of  investigation  has  shifted  from  technology  to  the  human 
being:  Pervasive  Computing  seeks  to  hide  the  technology; 
Emotional  or  Experience  Design  seeks  to  make  interactive 
products  not  only  more  efficient  but  also  more  pleasant;  and
Universal  Access  aims  to  provide  the  means  for  all  humans  to 
benefit  from  information  technologies.  A  related  result  has 
been  the  development  of  methods  to  evaluate  the  usability 
(and  hence  efficiency  and  quality)  of  designs  and  products.  The 
past  decade  has  seen  considerable  research  in  developing  reli¬ 
able  and  substantial  ways  to  determine  if  a  product  suits  the 
real  requirements  of  users  (as  opposed  to  the  perceived
requirements).  Appropriate  heuristics  and  empirical  evaluation
methods  conducted  in  the  laboratory  have  started  to  become 
the  norm  in  the  industry,  although  there  is  still  much  room  for 
improvement  in  this  respect.  Work  on  usability  has  also  been 
consolidated  in  a  wide  range  of  standards  that  bring  design 
knowledge  nearer  to  wider  practice. 

Recently,  considerable  advances  have  also  taken  place  in  the 
domain  of  computer  accessibility,  which  is  now  recognized  as  an 
important  prerequisite  for  technology  acceptance  and  a  funda¬ 
mental  aspect  of  universal  access.  There  is  now  a  considerable 
body  of  knowledge  on  how  to  make  interactive  applications  and 
services  accessible  to  people  with  various  types  of  disabilities,  and
also  to  people  of  different  ages  and  cultural  backgrounds.  A  typical
example  of  a  concrete  advance  in  this  domain  is  that  of  web  accessibility,
for  which  there  are  now  universally  recognized  de  facto  stan¬ 
dards,  and,  in  many  countries,  also  regulatory  frameworks. 

Another  recent  advance  in  HCI  research,  where  considerable 
results  have  been  obtained,  although  many  issues  remain  to  be 
explored,  concerns  user  interface  personalization.  Adaptation 
methods  and  techniques  have  been  investigated,  and  new 
forms  of  interaction  have  emerged  through  multimodality  and 
multidimensionality  of  user  interfaces.  This  area  of  research  is 
one  of  the  most  exciting,  as  the  adaptation  of  interaction  de¬ 
pending  on  the  context  of  interaction  is  crucial  for  universal  ac¬ 
cess,  not  only  for  disabled  users  but  also  for  the  population  at 
large.  For  example,  nonvisual  interaction  is  more  generally  use¬ 
ful  beyond  the  case  of  blind  users. 


WHAT  DO  YOU  SEE  AS  THE  MOST  EXCITING 
EMERGING  DOMAINS  FOR  HCI  RESEARCHERS 
AND  PRACTITIONERS  TO  EXPLORE? 


John  M.  Carroll 

To  me,  the  term  “domain”  is  a  bit  problematic,  but  in  an  inter¬ 
esting  way.  System  architectures  define  domains.  Types  of  func¬ 
tions  define  domains.  Applications  define  domains.  And  user  ex¬ 
periences  define  domains.  All  these  different  senses  of  the  term 
suggest  exciting  emerging  domains  for  HCI. 

New  kinds  of  systems  always  entrain  exciting  emerging  do¬ 
mains  for  HCI.  The  system  infrastructures  for  the  disappearing 
computer  are  obviously  creating  new  domains  for  HCI  in  mobile 
interaction  and  distributed  agency.  One  of  the  most  important 
of  these  system-driven  domains  is  security  and  privacy.  In  the 
mainframe  and  client-server  worlds,  these  are  central-system  ad¬ 
ministration  services.  However,  increasingly,  they  are  end-user 
concerns.  Security  and  privacy  is  already  an  important  HCI  do¬ 
main,  and  will  be  far  more  important  in  coming  years. 

New  types  of  functions  also  define  emerging  domains  for 
HCI.  One  important  area  of  this  sort  is  universal  accessibility. 
Not  everyone  wants  or  would  tolerate  a  user  interface  with 
extra-large  text,  but  for  some  users,  the  possibility  of  large  fonts 
makes  the  difference  between  having  access  to  online  services 
and  not  having  that  possibility.  Identifying  the  critical  functions 
to  modulate  user  interface  presentations  and  controls  to  better 
reflect  the  diversity  of  human  competencies  and  preferences 
is  a  major  ongoing  focus  in  HCI.  For  example,  there  was  a 
record-setting  response  to  a  recent  special  issue  of  the  ACM 
Transactions  on  Computer-Human  Interaction  on  accessibil¬ 
ity.  Another  example  of  a  function-driven  HCI  domain  is  web 
authoring.  Wiki  functions  for  linking,  posting,  and  discussing 
on  the  web  have  enabled  social  computing  to  leap  beyond  the 
limitations  of  chat  and  instant  messaging.  Now  users  can  have 
something  to  talk  about,  and  can  collectively  build  spaces  and 
information  structures.  Like  the  original  web  itself,  this  shows 
how  modest  enhancements  in  functionality  can  trigger  explo¬ 
sions  in  the  appropriation  of  technology  by  people. 

New  application  domains  have  always  been  a  key  driver  for 
HCI,  and  this  is  still  the  case.  For  example,  health  care — medical 
and  health  informatics — is  a  nexus  of  incredible  need  and
incredible  opportunity.  Much  of  it  is  quite  mundane  technologically,
but  the  potential  social  impacts  are  huge.  This  is  probably
the  center  court  for  HCI  applications  in  the  next  decade,  but 
there  are  lots  of  others.  Community  informatics  is  one  that  I 
have  worked  in  over  the  past  decade  or  so:  The  public  and  civic 
sector,  particularly  at  regional  and  local  levels,  is  literally  where 
we  live,  but  impoverished  with  respect  to  information  technology
in  general,  and  HCI  in  particular.  There  are  wonderful  and
inspiring  cases  of  initiatives,  some  of  which  have  also  been  very 
successful  as  public  relations  projects,  but  the  broad  state-of- 
the-art  in  regional  and  local  community  informatics  is  still  fairly 
primitive.  I  believe  that  the  rapidly  growing  demographic  of  ed¬ 
ucated  and  engaged  elderly  is  going  to  be  a  key  to  a  revolution 
in  community  informatics  during  the  next  decade. 

The  most  exciting  emerging  domain  with  respect  to  user  ex¬ 
perience  is  affect.  HCI  started  out  worrying  about  whether  user 
interfaces  experienced  as  “easy  to  use”  could  also  be  experi¬ 
enced  as  “easy  to  learn,”  or  whether  there  might  be  some  sort  of 
terrible  tradeoff  that  would  force  designers  to  choose.  Things 
turned  out  to  be  much  more  interesting  than  that!  Now,  when 
we  speak  of  the  experience  of  “usability,”  we  are  usually  refer¬ 
ring  to  an  ever-growing  complex  of  attributes:  easy  to  use,  easy 
to  learn,  challenging  but  tractable,  consistent  with  prior  knowl¬ 
edge,  supportive  of  existing  workplace  roles  and  practices,  just 
plain  fun,  and  evocative  of  feelings  of  well-being,  creativity,
collective  efficacy,  cultural  identity,  and  social  capital.  I  doubt  that
this  is  the  end  of  the  list. 

Katsuhiko  Ogawa 

Portable  devices  that  are  easy  for  aged  and  disabled  persons  to 
use  and  services  allowing  access  to  information  from  the  Inter¬ 
net  using  these  devices.  Search  functions  that  allow  users  to  find 
beneficial  information  are  also  very  important. 

P.  L.  Patrick  Rau

My  list  would  include  emotional  design,  HCI  in  computer 
games,  HCI  in  smart  environments,  and  cross-cultural  design.
Design  for  fun  is  the  design  concept  of  many  consumer  prod¬ 
ucts/services.  Functions  are  no  longer  the  only  focus  because 
of  the  maturity  of  some  information  technologies.  Bringing 
more  fun  in  use  means  value  to  business.  Playing  computer 
games  has  become  a  common  life  experience  for  many  younger 
people.  Many  topics  can  be  found  about  HCI  in  computer 
games.  Customization  is  getting  easier,  which  reduces  the  cost 
of  addressing  the  special  needs  for  different  cultures. 

Gavriel  Salvendy 

The  first  area  I  would  list  is  the  development  of  HCI  theories 
and  methods  to  enable  the  effective  use  of  nanotechnology-based
computers.  I  would  also  include  developing  a  comprehensive
understanding  of  how  individuals  in  different  cultures
make  decisions;  how  the  same  input  results  in  different  decision 
outcomes  across  cultures;  and  how  to  use  this  know-how  to
develop  automatic  translation  across  cultures  (not  languages).
Finally,  I  believe  that  HCI  will  emerge  as  a  system  science, 
whereby  HCI  will  be  studied  as  part  of  the  overall  job  of  the  in¬ 
dividual  rather  than  as  an  isolated  entity. 

Constantine  Stephanidis 

The  Information  Society  is  anticipated  to  evolve  toward  the 
proliferation  of  computational  systems  that  integrate  a  range  of 
networked  interactive  devices  embedded  into  a  physical  con¬ 
text  (either  indoor  or  outdoor  spaces)  to  provide  a  host  for  a 
broad  range  of  computer-mediated  human  activities  and  access 
to  a  multitude  of  general  or  specific  services  and  applications. 
Such  systems  assume  that  computers  and  networks  are  distrib¬ 
uted  around  physical  environments,  and  are  anticipated  to  ex¬ 
hibit  intelligent  and  context-sensitive  behavior.  Although  cur¬ 
rently  it  is  not  yet  clear  how  the  vision  for  ambient  intelligence 
is  to  be  realized  and  shaped,  its  evolution  is  conceivable  be¬ 
cause  of  some  general  development  trends  that  have  already 
started  to  materialize.  For  example,  services  will  be  dynamic 
and  can  be  reconfigured  or  recombined  at  run  time  to  accom¬ 
modate  the  different  needs  of  different  users  in  different  envi¬ 
ronments.  There  will  be  no  clear  distinction  between  interper¬ 
sonal  communication  and  access  to  information;  different 
components  that  use  different  media  will  be  interconnected 
to  allow  a  free  intermixing  of  these  functions.  Services  will  be 
highly  interactive;  for  example,  interaction  will  be  complex  in 
terms  of  the  functionality  offered,  input  required,  output  pro¬ 
vided,  dialogue  structure,  and  configuration  capabilities.  Most 
services  will  be  inherently  multimedia,  providing  information 
in  multiple  types  of  media  (e.g.,  sound,  graphics,  video,  text, 
animation,  etc.)  simultaneously  and  in  an  integrated  manner. 
Additionally,  interaction  will  often  be  multimodal;  for  example, 
different  sensorial  and  motor  abilities  will  be  concurrently 
used,  consciously  or  unconsciously,  and  based  on  more  natural
forms  of  dialogue.  Cooperation  will  be  an  important  aspect:
communication  and  access  to  information  will  be  concurrently
used  to  solve  common  problems  in  a  cooperative
manner;  moreover,  cooperation  may  be  among  human  users
themselves  or  among  user  representatives  (agents  and  avatars), 
to  whom  variable  degrees  of  trust  can  be  assigned.  Finally,  com¬ 
puting  will  become  progressively  more  social.  Access  to  infor¬ 
mation  and  communications  will  no  longer  be  the  task  of  an  in¬ 
dividual  but  will  be  extended  to  communities  of  users,  who 
have  at  their  disposal  common  (sometimes  virtual)  spaces  in 
which  they  can  interact.  This  paradigm  shift  offers  exciting  op¬ 
portunities  for  HCI  research,  as  technologies  are  anticipated 
to  have  a  much  bigger  impact  on  human  life  in  the  future,  and 
raises  an  entire  series  of  issues  to  be  addressed,  including  in¬ 
teraction  implicitness  and  continuity,  distribution  of  input  and 
output,  appropriate  exploitation  of  human  perception  and  cog¬ 
nition  abilities,  user  control  over  the  intelligent  environment, 
health,  safety,  privacy  and  social  issues,  and  dynamic  context 
awareness.  It  is  clearly  very  important  that  such  an  environ¬ 
ment  develops  proactively  in  a  user-centered  fashion,  and 
comes  to  satisfy  people’s  needs  and  improves  quality  of  life, 
avoiding  the  risk  of  a  technological  nightmare.  It  is  also  very 
likely  that  new  methods,  techniques,  and  tools  will  be  neces¬ 
sary  for  supporting  the  user  interface  development  lifecycle. 


In  addition  to  humans  and  computing  devices,  the  intelligent
environment  will  also  be  populated  by  robots.  As  a  conse¬ 
quence,  human-robot  interaction  will  constitute  a  relevant  do¬ 
main  of  investigation,  opening  exciting  new  perspectives  and  of¬ 
fering  significant  potential  improvements  of  the  quality  of  life.  In 
this  context,  the  robot  behavior  and  its  autonomy  with  respect 
to  the  human  user  are  important  factors  of  acceptability. 

Finally,  we  are  witnessing  a  gradual  fusion  of  computing  with 
communications,  learning,  and  entertainment.  This  also  calls  for 
new  HCI  approaches  addressing  appropriate  design  approaches 
and  techniques,  usability  studies,  theoretical  and  pedagogical 
foundations,  analysis  of  nonbusiness  human  activities,  as  well  as 
social  and  cultural  issues. 


WHAT  ARE  THE  MOST  INNOVATIVE  CHANGES 
YOU  ENVISION  IN  THE  NEXT  FIVE  YEARS  WITH 
REGARD  TO  HOW  PEOPLE  WILL  INTERACT 
WITH  INFORMATION  TECHNOLOGIES? 


John  M.  Carroll 

My  responses  to  the  first  three  questions  have  already  ad¬ 
dressed  this  to  an  extent,  probably  to  an  extent  just  beyond  my 
actual  competence!  My  crystal  ball  is  a  lot  like  everyone  else’s,  I 
suppose.  The  descendants  of  today’s  cell  phones  and  other
wireless  devices  will  be  far  more  powerful,  more  flexible,  and 
more  usefully  connected  to  databases,  to  their  own  current  lo¬ 
cations  in  the  world,  and  to  the  people  we  care  about,  work 
with,  etc.  How  could  it  be  otherwise? 

I  think  we  can  look  forward  to  some  clarifications  in  the  crys¬ 
tal  ball  during  this  period.  For  example,  during  the  past  five  years, 
there  has  been  debate  about  the  feasibility  and  desirability  of  the 
agent  paradigm.  I’m  not  in  the  middle  of  it,  but  to  a  bystander,  it 
echoes  perennial  debates  about  artificial-intelligence  (AI)  appli¬ 
cations.  I  think  it  is  likely  that  this  latest  instance  of  the  AI  debate 
will  be  resolved  within  the  next  five  years;  we’ll  either  have  per¬ 
vasively  integrated  agents  in  our  information  interactions,  or  the 
agent  paradigm  will  be  gone,  and  AI  will  have  made  another 
strategic  retreat  to  something  analogous  to  expert  systems.  This 
is  probably  controversial  enough  to  get  me  into  big  trouble! 

Katsuhiko  Ogawa 

Agents  will  take  the  place  of  robots  with  moveable  arms  and 
legs.  Agents  will  be  incorporated  into  portable  devices,  and  peo¬ 
ple  can  have  their  own  agents  incorporated  into  net  space. 
These  agents  act  as  proxies  for  people,  amassing  information 
and  making  it  available  to  others. 

P.  L.  Patrick  Rau

People  will  have  wearable  computers  and  ubiquitous  computing.
They  will  be  able  to  access  information  anywhere  at  any
time.  The  user  interface  of  a  computer  will  no  longer  mean  a  display
plus  a  keyboard  and  a  mouse.  Wireless  networks  will  bring
information  into  people’s  pockets.


Gavriel  Salvendy 

My  list  would  include  additional  emphasis  on  the  disappearing 
computer,  the  emergence  of  miniaturized  computing  systems, 
and  the  revitalization  of  intelligent  interfaces. 

Constantine  Stephanidis 

There  is  a  considerable  trend  toward  mobile  interaction.  Mobile 
technologies  are  everywhere.  Cell  phones  in  particular  have  be¬ 
come  a  modern-day  need  and  manufacturers  have  started  to  ex¬ 
pand  their  capabilities.  Like  all  new  technological  innovations, 
many  areas  are  still  in  need  of  improvement,  especially  since  cell 
phones  have  started  to  evolve  into  multifunctioning  equipment, 
combining  features  such  as  personal  organizers,  web  browsers, 
cameras,  e-mail  clients,  and  music  players  along  with  traditional 
telephone  facilities.  The  main  issue  regarding  mobile  technolo¬ 
gies  is  that  users  have  already  been  using  these  applications  in  a 
very  different  environment  and  have  not  yet  adapted  to  the
cell-phone  device.  Right  now,  manufacturers  are  in  an  arms  race
to  be  the  first  to  include  innovations  in  their  devices,
and  are  not  spending  much  time  designing  the  existing  features  to  best
do  their  job.  The  same  holds  true  for  PDA  devices;  although  they
seemed  to  have  caught  on  well  in  the  business  world,  they
did  not  manage  to  penetrate  the  general  consumer  market.  The
Information  Society  is  not  going  to  be  confined  in  an  office  or  at 
home.  The  convergence  of  communications  and  computing,  in 
the  form  of  cell  phones,  PDAs,  and  notebooks  connected  on  a 
wireless  network,  means  that  quality  experience  on  the  move  is 
going  to  be  in  high  demand  in  the  very  near  future. 

A  similar  trend  toward  networked  interaction  can  also  be  ob¬ 
served  in  the  home  environment,  where  technology  is  mature 
enough  to  support,  for  example,  the  fusion  of  the  web  and  the 
digital  TV,  as  well  as  the  overall  “connectivity”  of  various  devices
and  controls  in  the  home  environment.  As  a  consequence,  we 
are  already  witnessing,  for  example,  web  navigation  and  inter¬ 
action  through  a  remote  control,  or  using  a  PDA  to  set  the  fridge 
temperature. 

Another  field  that  is  very  likely  to  yield  interesting  applications 
in  the  near  future  is  the  use  of  biometrics  in  interaction, 
mainly  for  verification  and  identification,  but  also,  perhaps 
in  the  longer  term,  for  the  recognition  of  cognitive  and 
emotional  states. 


WHAT  DO  EDUCATORS  NEED  TO  CHANGE 
TO  ENSURE  THAT  PHD  STUDENTS  ARE 
PREPARED  TO  ADDRESS  HCI’S 
GRAND  CHALLENGES? 


John  M.  Carroll 

The  most  important  thing  we  need  to  change  is  our  discipli¬ 
nary  bias.  No  one,  not  even  the  professors  in  our  top  HCI  pro¬ 
grams,  can  be  masters  of  everything  that  is  HCI.  Experts  are  al¬ 
ways  narrow  in  some  sense.  HCI  has  matured  as  an  academic 
focus  in  several  disciplines — arts  and  architecture,  cognitive  sci¬ 
ence,  communications,  design,  education,  geography,  information 
science,  industrial  engineering,  library  science,  and  psychology. 
Like  the  blind  men  and  the  elephant,  each  of  these  has  its  own 
limited  view  of  what  HCI  is,  and  produces  students  with  these 
limited  views  perfected. 

The  obvious  way  to  address  this  is  for  faculty  and  students 
to  cross  boundaries  and  expose  their  narrow  expertise  to 
broader  discourses.  Of  course,  this  already  happens  in  some 
places,  but  I  think  it  is  still  talked  about  more  than  practiced.  I 
guess  one  thing  I  would  say  to  students  is  that  if  you  think  you 
are  getting  this  narrow  sort  of  training,  don’t  settle  for  it! 

The  second  thing  I  think  we  should  change  is  our  emphasis 
on  practical  skills.  I  love  John  Dewey;  reading  Dewey  changed 
my  life.  However,  I  think  that  many  pedagogical  implementa¬ 
tions  of  Dewey’s  ideas  are  too  simplistic.  The  mere  fact  that  stu¬ 
dents  carry  out  “authentic  tasks”  in  their  classes  is  not  a  guaran¬ 
tee  of  anything,  and  in  fact,  can  easily  be  gamed.  One  example 
of  this  is  role  definitions  in  project  teams.  A  common  approach 
is  to  create  teams  with  a  project  manager,  a  lead  designer,  an  in¬ 
formation  designer,  and  an  implementer.  However,  these  roles 
have  only  modest  operational  dependencies  (which  of  course  is 
why  they  are  used  in  real  work).  Thus,  they  actually  provide  stu¬ 
dents  with  a  gift-wrapped  approach  to  divide-and-conquer;  that 
is,  to  minimize  overall  group  effort,  which  incidentally  minimizes 
overall  learning.  Moreover,  students  sometimes  specialize  in 
these  roles  (which  optimizes  effort  with  respect  to  grading  out¬ 
comes).  After  5-6  courses  in  which  a  given  student  has  repeat¬ 
edly  played  the  role  of,  say,  information  designer,  how  much  has 
he  or  she  learned  about  implementation? 

I  think  that  more  attention  has  to  be  given  to  learning  objectives 
and  outcomes,  and  not  just  to  apparent  authenticity.  Another  as¬ 
pect  of  this,  I  think,  is  the  erosion  of  emphasis  on  theory.  HCI  is 
a  cornucopia  of  theory:  Theories  at  different  levels  of  philosoph¬ 
ical,  behavioral,  social,  cultural,  economic,  and  technological 
analysis.  Training  students  in  usability  testing  or  in  the  use  of  cur¬ 
rent  development  toolkits  is  appropriate,  but  not  adequate.  Such 
narrow  skills  will  quickly  become  obsolete;  they  probably  already 
are.  Again,  I  would  suggest  to  students:  If  you  have  learned  how 
to  write  an  interaction  scenario,  create  a  running  prototype,  and 
run  a  usability  test,  but  you  didn’t  learn  about  human-information 
processing,  mental  models,  activity  theory,  small-group  behavior, 
and  ethnomethodology,  you  deserve  a  refund. 

Katsuhiko  Ogawa 

It  is  important  for  students  to  learn  not  how  to  evaluate  and  analyze 
tasks,  devices,  software,  and  so  on,  but  how  to  actively  take  on  the 
challenge  of  designing  new  and  innovative  things.  They  should  also 
learn  how  to  design  new  services  as  well  as  new  things. 

P.  L.  Patrick  Rau 

PhD  programs  should  stress  issues  related  to  ubiquitous  com¬ 
puting  while  providing  basic  knowledge  to  guide  future  research 
and  development.  Programs  should  also  introduce  new  tech¬ 
nologies,  such  as  nanotechnology  and  cognitive  science.  Finally, 
HCI  students  need  to  understand  developments  in  different 
areas.  The  development  of  HCI  in  Asian  regions  such  as  Japan,  Korea, 
Taiwan,  and  China  will  benefit  the  global  HCI  community  by  either 
supplementing  investigations  of  Asians  in  previous  HCI  areas  or  inspiring 
new  research  issues  from  a  different  cultural  perspective. 

Gavriel  Salvendy 

PhD  programs  need  to  emphasize  the  science  base  of  HCI  and 
provide  courses  that  equip  students  with  know-how  in 
the  science  of  the  discipline. 

Constantine  Stephanidis 

The  foremost  challenge  in  HCI  education  is  addressing  multidis- 
ciplinarity.  A  variety  of  diverse  disciplines  contribute  to  HCI,  in¬ 
cluding  ergonomics  and  human  factors,  social  and  organizational 
psychology,  cognitive  psychology,  software  engineering,  design, 
artificial  intelligence,  semiotics  and  linguistics,  philosophy,  soci¬ 
ology,  computer  science,  and  anthropology.  The  leading  experts 
of  HCI  come  from  all  sorts  of  backgrounds,  and  the  ability  to 
widen  scientific  interests  to  address  different  fields  is  a  key  toward 
advancing  the  understanding  of  how  humans  interact  with  com¬ 
puters.  An  HCI  expert  must  have  solid  multidisciplinary  roots; 
he  or  she  must  be  a  scholar  in  more  than  one  field  and,  more  crucially, 
be  able  to  merge  the  knowledge  from  each  area  into  a  heterogeneous 
science.  One  of  the  problems  today  in  HCI  is  that  although 
everybody  understands  the  multidisciplinary  nature  of 
the  field,  few  have  the  level  of  in-depth  knowledge  that  is  re¬ 
quired  to  push  the  boundaries  further.  Sometimes  people  work 
on  preconceptions  of  what  each  discipline  claims,  and  seldom 
take  into  account  the  most  recent  advances  in  each  discipline.  It 
is,  therefore,  necessary  to  adopt  a  more  holistic  approach  to  ed¬ 
ucation,  address  multidisciplinarity  more  effectively  in  university 
curricula,  and  encourage  students  to  cultivate  a  multidisciplinary 
mentality.  The  ultimate  goal  is  that  HCI  researchers  are  able  to 
evolve  with,  and  anticipate,  new  and  emerging  technological  en¬ 
vironments  and  societal  needs.  Toward  this  direction,  current  ef¬ 
forts  focus  on  examining  educational  curricula  and  practices  from 
across  the  world,  exploring  the  topics  that  are  taught  and  the 
techniques  used,  discussing  the  foundations  of  HCI,  examining 
how  the  curriculum  is  changing  and  the  driving  forces  for  this 
change,  reviewing  how  students  exploit  their  HCI  education,  and 
investigating  employers’  requirements  for  HCI  curricula.  Also 
from  the  perspective  of  HCI  education,  HCI  theory  is  of  particu¬ 
lar  importance,  as  it  is  an  instrument  to  equip  students  with  a 
“durable”  background  in  a  very  dynamic  environment. 


WHAT  IS  THE  FUTURE  OF  HCI  AS  A 
DISCIPLINE/PROFESSION?  WILL  IT  CONTINUE 
TO  BE  SUSTAINABLE  AS  AN  ACADEMIC 
ENDEAVOR  THAT  YIELDS  MARKETABLE 
PRACTITIONERS/RESEARCHERS? 


John  M.  Carroll 

I’m  glad  you  asked  that  question!  I  have  changed  my  mind  on 
this  recently.  Until  a  few  years  ago,  I  felt  that  the  future  of  HCI 
belonged  to  a  new  kind  of  technical  professional:  computer 
scientists  with  significant  understanding  of  usability  and  design.  I 
envisioned  HCI  as  a  major  new  focus  within  the  discipline  of 
computer  science.  I  spent  10  years  as  a  professor  of  computer 
science — five  as  a  department  head — trying  to  realize  this  vision 
in  a  sort  of  “best-case”  situation  with  respect  to  like-minded  col¬ 
leagues  and  institutional  support. 

I  changed  my  mind  because  I  came  to  see  that  there  is  far 
too  much  to  know  and  do  in  HCI  professional  activity  for  a 
one-size-fits-all  scenario.  As  I  mentioned  in  an  earlier  answer,  HCI 
is  a  constellation  of  professional  roles,  skill  sets,  and  perspectives. 
This  is  a  good  thing,  and  it  will  surely  provide  richer  and 
more  meaningful  careers  for  HCI  professionals!  However,  it 
eliminates  my  earlier  vision  of  a  sort  of  super  professional.  In¬ 
deed,  looking  at  contemporary  trends  in  computer  science  and 
other  computing-related  disciplines  suggests  that  the  trajectory 
of  HCI  within  computer  science  is  destined  to  become  narrower, 
not  broader.  As  a  former  computer-science  professor,  I  find 
this  sad,  but  it  is  clearly  an  opportunity  for  many 
other  disciplines. 

HCI  academic  programs  are  developing  in  arts  and  architec¬ 
ture,  cognitive  science,  communications,  design,  education,  ge¬ 
ography,  industrial  engineering,  library  science,  and  psychology. 
One  of  the  most  important  recent  developments  is  the  emer¬ 
gence  of  “i-schools” — schools  of  informatics,  information  stud¬ 
ies,  and  information  sciences  and  technology — as  academic 
homes  for  HCI  programs.  The  core  construct  for  i-schools  is  in¬ 
formation,  not  computation,  as  is  the  case  for  computer  sci¬ 
ence.  HCI  ends  up  in  the  center  of  i-schools,  instead  of  at  the 
periphery  of  computer  science.  When  I  realized  that  the  role  of 
HCI  within  computer  science  was  narrowing  instead  of  growing, 
I  moved  to  an  i-school. 

I  think  that  HCI  has  a  strong  future  both  as  an  academic  en¬ 
deavor  with  respect  to  research  and  education  and  as  a  profes¬ 
sional  focus.  As  I  look  back  over  my  30  years  in  HCI,  it  seems 
that  at  every  point  the  opportunities  far  outstripped  the  peo¬ 
ple  power  available,  and  the  future  looked  more  so.  This  is  ex¬ 
actly  how  it  looks  to  me  today — an  abundance  of  opportunities 
for  contributions  from  HCI.  Because  the  endeavor  is  increas¬ 
ingly  multidisciplinary,  I  do  see  a  need  for  creative  construc¬ 
tivism  in  articulating  a  big  vision  that  embraces  and  fully  utilizes 
the  diverse  professional  roles,  skill  sets,  and  disciplinary  perspectives 
that  constitute  HCI.  For  example,  we  should  start  talking 
more  about  multidisciplinary  teaching  in  HCI  and  about 
multidisciplinary  HCI  teams  in  product  development.  The  great¬ 
est  risk  I  see  is  not  that  HCI  is  going  to  flop  or  fade  away,  but  that 
it  is  going  to  fracture  into  several  pieces — each  of  which  would 
be  quite  exciting  and  important  in  itself. 

Katsuhiko  Ogawa 

If,  like  a  doctor,  we  have  the  clear  goal  of  curing  disease  and  sickness, 
it  is  critical  for  us  to  always  have  the  most  up-to-date  knowledge 
available  when  treating  patients.  As  with  doctors  treating 
their  patients,  so  with  HCI:  in  giant  factories,  aircraft,  automobiles, 
and  the  like,  safety  should  be  the  first  priority.  In  the  area 
of  consumer  products,  however,  first  priority  is  often  given  to 
the  design  of  functions  and  operations.  Consequently,  we 
should  move  away  from  the  academic  aspects  stressed  in  the 
HCI  of  today  and  concentrate  more  on  the  practical  aspects. 


P.  L.  Patrick  Rau 

HCI  will  grow  as  the  information  technology  (IT)  industry  grows. 
In  China,  the  IT  industry  is  now  undergoing  the  kind  of  rapid  growth 
that  North  America  and  Europe  have  already  experienced.  This  provides 
opportunities  for  HCI  development.  Especially  after  2004,  HCI 
has  gained  a  lot  of  attention  from  industry  in  China.  However, 
there  are  also  challenges.  The  development  of  an  interdisciplinary 
field  such  as  HCI  in  academia  will  take  more  time 
and  effort  and  will  be  strongly  influenced  by  industry. 


Gavriel  Salvendy 

I  view  HCI  in  computer  science  as  analogous  to  differential 
equations  in  mathematics.  In  the  early  stages,  mathematicians 
went  around  engineering  departments  to  help  engineers  solve 
their  problems  using  differential  equations.  Over  the  years,  dif¬ 
ferential  equations  became  an  integral  part  of  engineering  edu¬ 
cation  and  all  engineers  were  acquainted  with  differential  equa¬ 
tions  and  no  longer  needed  the  mathematicians.  Similarly, 
computer  scientists  and  information-technology  specialists  will 
all  be  trained  in  HCI,  and  the  demand  for  practicing  HCI  will  sig¬ 
nificantly  decrease  over  the  years. 


Constantine  Stephanidis 

HCI  helps  in  the  design  and  development  of  better  products; 
i.e.,  more  useful  products  that  do  what  the  users  want  them  to 
do  easily,  efficiently  and,  in  the  best  case,  pleasantly.  From  the 
application  of  HCI  methodologies  throughout  the  development 
lifecycle,  from  analysis  to  design  guidelines  and  practices  to  de¬ 
velopment  strategies,  and  finally  to  evaluation  and  testing,  there 
is  clearly  a  major  role  that  HCI  plays  toward  improving  inter¬ 
active  products  and  services.  Evaluation  of  usability  is  perhaps 
the  area  of  HCI  that  prevails,  because  it  is  permanently  embed¬ 
ded  in  the  business  world.  However,  there  is  a  clear  need  for 
wider  HCI  practice  in  industry,  as  only  the  major  players  have  in¬ 
vested  heavily  in  such  direction,  with  clearly  positive  results, 
both  product-wise  and  business-wise. 

Thus,  there  is  an  ever-increasing  need  for  HCI  practitioners 
and  researchers.  However,  in  order  to  develop  professionals  to 
fulfil  this  need,  enhance  the  practice  of  HCI  in  industry,  and  fa¬ 
cilitate  technology  transfer,  HCI  teaching  in  universities  should 
be  rethought.  First  of  all,  the  next  generation  of  practitioners 
needs  to  be  prepared  for  the  business  world,  for  example,  by 
learning  basic  management  and  marketing  skills.  Additionally, 
HCI  practitioners  need  to  be  better  prepared  to  address  the 
evolving  situation  in  the  context  of  the  Information  Society.  A  typ¬ 
ical  example  in  this  respect  is  universal  access  and  design  for  all. 
Although  there  is  now  wide  recognition  and  a  consolidated  body 
of  knowledge,  one  of  the  critical  impediments  to  the  adoption 
of  universal  design  in  practice  is  the  lack  of  qualified  practitioners 
who  understand  diversity  in  the  target  user  population,  in  the 
technology,  and  in  the  context  of  use,  and  who  are  able  to  integrate  the 
related  requirements  in  the  development  process.  To  overcome 
this,  it  is  recommended  that,  in  the  short  term,  accessibility,  us¬ 
ability,  and  quality  in  use  are  introduced  as  mandatory  compo¬ 
nents  of  university  education.  Other  similar  challenges  are  very 
likely  to  emerge  with  the  anticipated  evolution  of  the  information 


society  toward  an  ambient-intelligence  environment,  where  the 
complexity  intrinsic  in  developing  interactive  applications  and 
services  will  grow  even  further.  As  a  consequence,  HCI  experts 
must  be  equipped  to  address  diverse  technological  environ¬ 
ments  and  changing  needs  in  a  dynamic  way.  It  is  likely  that,  in 
the  future,  “specializations”  of  HCI  expertise  in  different  subfields 
will  arise. 
moral  theory,  1243-1244 
variability  and  universality,  moral,  1244-1245 
professional  ethics,  1255-1257 
values,  implication  in  technological  design,  1240 
position,  embodied,  1240-1241 
position,  exogenous,  1241 
position,  exogenous,  1241-1242 
Human-computer  interaction  in  aerospace,  721 

automation  to  the  flight  deck,  introducing,  726-730 
autopilots,  726-727 

air-traffic  control,  early,  730 
role  allocation,  727-730 
challenges,  future,  735-737 

3X  air  traffic,  technologies  needed  for,  736-737 
aerial  vehicles,  uninhabited,  737 
displays,  three-dimensional,  735 
Hybridity  and  the  third  space,  1064-1066 
hybridity  and  HCI,  1064-1065 
participatory  design  as  the  third  space  in  HCI,  1065 
participatory  design’s  own  third  space,  1065 


I 

Information  processing,  attention  in,  54-56 
automaticity  and  practice,  56 
attention,  models  of,  54-56 
Information  processing,  memory  in,  51-54 

factors  affecting  retrieval  of  earlier  events,  53-54 
memory,  51-53 
long-term,  52-53 
short-term  (working),  51-52 

Information  technologies,  acceptance  by  older  adults,  785-786 
Information  technology  for  cognitive  support,  811 
cognitive  impairment,  812-813 
Intelligence  Quotient,  813-814 


people  with  cognitive  impairment,  interface  design’s  support  of, 
814-821 

aphasia,  help  for,  819 
attention,  effects  of,  814-815 
communication,  augmentative  and  alternative,  816 
dementia,  support  for,  820-821 
memory  loss  and  dementia,  815 
nonliterate  users,  support  for,  819-820 
prostheses,  cognitive,  816-817 
software  as  a  cognitive  scaffolding  and  a  prompt  for 
communication,  817-819 
visuo-spatial,  iconic  and  verbal  abilities,  815-816 
research  methodologies,  823-827 
disabling  environment,  the,  825-826 
dynamic  diversity  design  for,  826-82 
user  sensitive  inclusive  design,  826 
systems  to  support  people  with  dyslexia,  821-823 
dyslexics,  experimental  text  reader  for,  822-823 
problems  of  dyslexia,  computer  aid  for,  822 
Information-processing  methods,  45-47 
chronometric,  45-46 

psychophysiological  and  neuroimaging,  46-47 
signal  detection  and  theory,  45 
speed-accuracy  46 
characteristic  curve,  46 
Information-processing  models,  47-48 
sampling  models,  sequential,  48 
stage  models,  discrete  and  continuous,  47-48 
Input  device,  162 

Input  devices,  evaluation  and  analysis  of,  165-167 
Fitts’  Law,  166 
Hick’s  Law,  166 

Keystroke  Level  Model  and  GOMS  analysis,  166-167 
Steering  Law,  166 
Minimum  Jerk  Law,  166 
Input  technologies,  162-165 
input  device  properties,  162 
device  acquisition  time,  162 
dimensions,  number  of,  162 
gain,  162 

indirect  vs  direct,  162 
metrics,  162 
property  sensed,  162 
input  device  states,  163-165 
graphical  user  interfaces,  states  and  transitions  at  the 
core  of,  164 
pointing  devices,  163 
joysticks,  163 
mice,  163 

tablets,  indirect,  163 
touchpads,  163 

touchscreens  and  pen-operated  devices,  163 
trackballs,  163 
Input,  trends  for,  171-172 
Inspection  methods,  1182-1184 

DARe  model  and  the  development  lifecycle,  1184 
DARe  model  and  the  scoping  of  UIMs,  1182-1184 
Discovery  and  Analysis  Resources  (DARe)  model  of  UIM 
effectiveness,  1182 

Inspection  methods,  improving,  1184-1187 
analysis,  1186-1187 
improvements,  1185 
method  independent,  1185 
method  specific,  1185-1186 
Interaction,  modalities  of,  169 
input,  169 
bimanual,  169 
pen  and  gesture,  169 
speech  and  multimodal,  169 
whole  hand,  170 
devices,  a  society  of,  171 
tables  and  screens,  multitouch,  171 
technologies,  background  sensing,  170-171 
Interfaces  and  agents,  adaptive,  433 
concepts,  434-435 

future  of  user-adaptive  systems,  454-455 
growing  need  for  user-adaptivity,  454 
successful  adaptation,  increasing  feasibility  of,  455 
information  about  users,  obtaining,  449-451 
low-level  indices  of  psychological  states,  451 
naturally  occurring  actions,  450 
previously  stored  information,  450-451 
self-reports  and  -assessments,  explicit,  449-450 
signals  concerning  the  current  surroundings,  451 
information  acquisition,  supporting,  440-446 
helping  users  find  information,  440-442 
information  presentation,  tailoring,  443-444 
products,  recommending,  442-443 
provision  of  information,  spontaneous,  442 
supporting,  444-446 
collaboration,  444-445 
learning,  445-446 

special  considerations  concerning  empirical  methods, 

451-454 

adaptive  and  nonadaptive  systems,  experimental 
comparisons  of,  453-454 
individual  differences,  taking  into  account,  454 
usability  under  realistic  conditions,  checking,  454 
usage  scenarios  and  user  requirements,  earlier  studies  of,  452 
use  of  data  collected  with  a  nonadaptive  system,  451-452 
Wizard  of  Oz  studies,  452-453 
work  of  human  designers,  comparisons  with,  453 
system  use,  supporting,  435-436 
adapting  the  interface,  436-437 
dialogue,  controlling  a,  439-440 
interaction  with  the  real  world,  mediating,  438-439 
system  use,  helping  with,  437-438 
usability  challenges,  446-449 
controllability,  threats  to,  447 
experience,  breadth  of,  448 
obtrusiveness,  448 

predictability  and  comprehensibility,  threats  to,  447 
privacy,  threats  to,  448 
tradeoffs,  dealing  with,  448-449 


K 

Keyboards  and  text  entry,  168-169 
mobile  text  entry,  168 
memory,  procedural,  168 


M 

Manipulation,  300-309 
keyboards,  mobile,  303-304 
speech  interfaces,  304-309 


navigator  wearable  computer,  304-307 
speech  translation,  307-309 
VuMan  3  dial  pointing,  300-303 
MDE  for  plastic  user  interfaces,  1117-1123 
functional  decomposition,  1118-1119 
UI  plasticity,  models  for,  1119-1121 
Mental  model,  64-67 

artifact,  representational,  68 
homomorphisms,  66 
isomorphisms,  66 

mental  content  about  domains,  refocusing  of  attention  on,  64 
problem  space  and  encoding,  65 
situation  model  versus  a  text  base,  67 
text  and  other  artifacts,  67-69 
users’  knowledge  about  systems,  labeling,  64 
Yoked  State  Space  hypothesis,  68 
Mental  models,  instructions  for,  69-72 
instruction,  multimedia,  69-70 
Mental  models,  shared,  72-74 

absence  of  increase  in  convergence  of  mental  models,  73 
mental  models,  measurement  of,  72-73 
role  differentiation,  74 
teamwork  and  collaboration,  72 

Mobile  interaction  design  in  the  age  of  experience  ecosystem,  459 
interaction,  spheres  of,  461-468 
communities,  465-468 
content,  464-465 

mental  models  (and  their  representation),  464 
humans  and  devices,  461-464 
mobile  experience,  460 

content,  communication,  control,  460 
interaction  to  experience,  460-461 
networked  ecosystems,  interacting  with,  460 
Model-driven  engineering  (MDE),  1114-1117 
metamodel,  1116-1117 
transformations  and  mappings,  1117 
Mood,  causes  of,  84 
color,  84 
other  effects,  84 

Multimedia  user  interface  design,  393 
background,  cognitive,  395-397 
attention,  selective,  395-396 
emotion  and  arousal,  396 
learning  and  memorization,  396-397 
perception  and  comprehension,  395 
definitions  and  terminology,  394-395 
design  process,  397-410 
attention,  design  for,  407-410 
information  architecture,  398-400 
interaction  and  navigation,  405-407 
media  selection  and  combination,  400-405 
users,  requirements  and  domains,  398 
Multimodal  interface  design,  cognitive  science  underpinnings  of, 

421-425 

multimodal  interface,  425-428 
design,  future  directions  for,  427-428 
difference  from  graphical  user  interfaces,  425 
multimodal  input,  integration  and  synchronization  characteristics  of, 

422-423 

multimodal  language,  features  of,  424-425 

multimodal  systems,  architecture  and  techniques  in  designing, 

425-427 

multimodal  interaction,  differences  in,  423-424 
multimodal  interaction  of  users,  421-422 
Multimodal  interface  design,  goals  and  advantages  of,  417-420 
terminology,  418 

Multimodal  systems,  reasons  for  building,  414 
multimodal  interfaces,  types  of,  414-417 


N 

Networks  as  enablers,  267-270 
applications,  269-270 
Networks  as  mediators,  270-279 
issues  and  problems,  270 
media  issues,  277-279 
broadcast  and  prerecorded  media,  278-279 
compression,  278 

interactive  conversation  and  action,  277-278 
jitter,  278 
reliability,  278 
sound  and  vision,  278 
network  properties,  270-274 

bandwidth  and  compression,  270-271 

encryption,  authentication,  and  digital  signatures,  273-274 

jitter  and  buffering,  271-273,  see  also  278 

latency  and  start-up,  271 

quality  of  service,  273 

reliability  and  loss,  datagram,  and  connection-based 
services,  273 

ownership,  privacy  and  trust,  279 
UI  properties,  274-277 
awareness,  276-277 
coping  strategies,  275 
delay  and  time,  274-275 
network  transparency,  274 

race  conditions  and  inconsistent  interface  states,  276 
timelines  of  feedback/feedthrough,  pace,  275-276 
Networks  as  platforms,  286-290 
infrastructure,  supporting,  289-290 
awareness  servers,  289 
event/messaging  systems,  289 
notification  servers,  289 
resource  discovery,  289 

Web  service  and  remote  procedure  call,  289-290 
networked  systems,  architectures  for,  289 
shared  objects,  accessing,  286-289 
editing,  synchronous,  287-288 
groupware  undo,  288 
locking,  287 
solutions,  real,  288-289 
replication,  287 
Networks  as  subjects,  279-286 
network  models,  280-283 
addresses,  282-283 

internetworking  and  tunneling,  281-282 
layers,  280 

peer-peer  and  ad  hoc  networks,  283 
protocols,  281 
routing,  282 
network,  283-285 
awareness,  284-285 
confusion,  285 
management,  283-284 

networks,  understanding  and  managing,  279-280 
seamfulness  and  virtual  locality,  285 
Neuroscience,  cognitive,  57 


Neurotechnologies  for  assessing  user  state,  1222-1230 
arousal  meter  gauge,  1226 
body  position/posture  tracking,  1223-1226 
Cognitive  Monitor,  1230 
EEG,  1229 

EKG  (ECG)-based  arousal  and  cognitive  workload  gauges,  1223 
electrodermal  activity  (EDA)/GSR-based  arousal  and  cognitive 
workload  gauge,  1223 
Engagement  Index  gauge,  1227 
Event  Related  Optical  Signal,  1230 
executive  Load  Index  gauge,  1226-1227 
LM-ATL’s  EEG-based  technology,  1229-1230 
LM-ATL’s/NIR  approach,  1228 
New  Workload  Assessment  Monitor,  1227-1228 
P300  novelty  detector  gauge,  1227 
pupillometry,  1228-1229 
stress  gauge,  1226 

Neurotechnologies,  application  of,  1230-1235 
future  directions,  1235-1237 
controllers,  robust,  1237 
mitigation  strategies,  1237 

national  and  supra-national  institutions,  roles  of,  1237 
HCI  system  design  and  evaluation  capabilities,  improving,  1233 
systems,  closed-loop,  1231-1233 
training  systems  and  operator  selection,  1233-1235 

Nonspeech  sound  and  HCI,  history,  248 

sound,  use  in  conveying  information  to  computers,  248 
visual  and  auditory  feedback  as  tools  for  interaction,  248 

Nonspeech  sound  in  HCI,  reason  for  use  of,  250 
annoyance,  avoidance  of,  250-251 
sound,  248-249 
advantages,  248-249 
problems,  249 

speech  and  nonspeech  sounds  for  interface  design,  comparing, 
249-250 

Nonspeech  sound  presentation  techniques,  255-259 
auditory  icons  and  earcons,  comparing,  258-259 
auditory  icons,  255-256 
design  guidelines,  256 
earcons,  256-258 

design  guidelines,  257-258 

dynamics,  257 

pitch,  257 

register,  257 

rhythm,  257 

timbre,  257 

Novel  multimodal  interface  design,  420-421 


O 

Older  adults  and  the  acquisition  of  computer  skills,  786 
Older  adults,  use  of  computer  technology  by,  778-782 

computer  technology  usage  in  everyday  domains,  779-782 
usage  patterns  and  trends,  778-779 
healthcare,  781-782 
home  environments,  780-781 
work  environments,  779-780 
Older  computer  user,  understanding  the,  782-785 
aging  and  abilities,  784 
cognitive  abilities,  785 
motor  skills,  784-785 
sensory  process,  784 
elderly,  today’s,  782-784 
Online  communities,  603 
case  studies,  612-617 

computer  aided  language  learning  communities,  612-613 
game  communities  and  activity  theoretical  analysis,  613-6 
Computer  Mediated  Communication  (CMC)  and  online 
communities,  604-608 

CMC  and  online  communities,  examples  of,  605-608 
online  communities,  604-605 

online  communities  frameworks  and  methodologies,  analysis  of, 
608-612 

content  and  textual  analysis,  610 
log  analysis,  609-610 

query-based  techniques  and  user  profiles,  609 
social  network  analysis,  610-612 
Organizational  experience,  899-901 

children’s  art  product  manufacturer  Website,  900 


P 

Perceptual-motor  interaction  in  applied  tasks,  37-38 
digital  assistants,  personal,  37-38 
eye-gaze  vs.  manual  mouse  interactions,  38 
remote  operation  and  endoscopic  surgery,  37 
Perceptual-motor  interaction,  28-30 
attention  and  performance,  30-37 
characteristics,  30 

shifts  and  coordinate  systems,  30-31 
stimulus  characteristics  and  shifts,  31-37 
human  information  processing  and  perceptual-motor  behavior,  28-29 
translation,  coding  and  mapping,  29 
Performance  evaluation,  309-311 
perception,  310-311 
speech,  dual  purpose,  310 

Performance,  workload  and  stress,  individual  differences  in,  127-129 
affective  state  differences,  128-129 
hedonomics  and  individual  differences,  129 
narrowing  and  adaptive  response,  129 
trait  differences,  127-128 

Personalization,  portals  and  communities,  567-571 
communities,  570-571 
personalization,  568-570 
adaptation,  methods  and  models  of,  569 
authoring,  570 

systems,  adaptive  and  adaptable,  568 
user  modeling,  569-570 
portals,  570 
Persuasion,  134-135 

Persuasive  technology,  positive  and  negative  applications  of,  143-144 
Persuasive  technology,  potential  and  responsibility,  144 
Player  experience  framework,  685-686 
fun  key  models,  685-686 
Problem  solving  and  decision  making,  56-57 
problem  space,  56 
components,  56 

Prototypes  and  the  design  process,  1020-1025 
alternatives,  selecting,  1023-1024 
design  space,  1021-1022 
design,  participatory,  1021 
design,  user-centered,  1020-1021 
ideas,  generating,  1022-1023 
strategies,  prototyping,  1024-1025 
prototypes,  1024-1025 
horizontal,  1024 
scenario-based,  1024-1025 


task-oriented,  1024 
vertical,  1024 

Prototypes  as  design  artifacts,  1018-1020 
evolution,  1020 
interactivity,  1019-1020 
representation,  1018-1019 

Prototypes,  design  specification  and  implementation  of,  235-237 
FEELEX  1,  235-236 
FEELEX  2,  236-237 
FEELEX,  characteristics  of  the,  237 

Prototypes,  iterative  and  evolutionary,  1032-1036 
user  interface,  1033-1036 
development  environments,  1034-1036 
toolkits,  1033 

Prototyping,  rapid,  1025-1031 

offline  rapid  prototyping  techniques,  1025-1028 
mock  ups,  1025-1026 
paper  and  pencil,  1025 
video,  1026-1028 
Wizard  of  Oz,  1026 

online  rapid  prototyping  techniques,  1028-1031 
scripting  languages,  1030-1031 
simulations,  interactive,  1030 
simulations,  noninteractive,  1029-1030 


R 

Reality  and  pervasive  computing  systems,  prototyping,  1036-1037 
Recognition-based  systems,  designing,  190-195 
context,  importance  of,  191-192 
costs,  190-191 
feedback,  role  of,  191 
generalization  versus  specialization,  193 
interaction,  implicit  and  explicit,  191 
novel  interfaces,  evaluating,  194-195 
priori  knowledge,  192-193 

traditional  versus  nontraditional  interfaces,  193-194 
Requirements  analysis,  920-922,  918-919 
task  analysis,  contextual,  921-922 
user  profile,  920-921 

Requirements  specification,  motivating  and  justifying,  923-926 
task  requirements,  924-926 
user  requirements,  923 
work  environment  requirements,  923 
Requirements  specifications,  introducing,  918-920 
design/testing  development,  919-920 


S 

Safety  and  usability  of  interest,  measures  of,  708 
Searching  and  querying,  565-567 
query  formulation,  565-566 
results  list,  566-567 

document  clustering,  567 
ranking  algorithms,  community-based,  566-567 
visual  interfaces,  567 
Sensemaking  with  visualization,  539-542 
crystallization,  539-540 

information  visualization,  levels  for  applying,  540-542 
Sensors  and  sensing  modes,  178-183 
brain  interfaces,  183 
context,  182 

gaze  and  eyetracking,  181 
gesture,  181-182 
identity,  182 

occupancy  and  motion,  179 
position,  179-180 
range  sensing,  179 
speech, 181 
touch, 180-181 

SHERPA,  application  to  safety  critical  systems,  154-153 
Signal  processing,  183-187 

classification  and  modeling,  185-187 
Bayesian  Network,  187 
decision  trees,  186 
Hidden  Markov  Model,  187 
independence,  conditional,  186 
Support  Vector  Machine,  186 
feature  selection,  184-185 
preprocessing,  183-184 
Sound,  perception  of,  251-252 
attributes,  251 
definition,  251 

frequencies,  humans’  perception  of,  251-252 
loudness,  252 

Spaces  and  places,  1066-1068 
sitings,  1066-1067 
workshops,  1067-1068 
Speech  technology,  selecting,  383-386 
Automatic  Speech  Recognition  (ASR),  383-384 
natural  language  processing  (NLP),  385-386 
natural  language  understanding  (NLU),  385-386 
NLU  systems,  capabilities  and  limitations  of,  386 
speech-synthesis  software,  current  capabilities  and 
limitations  of,  385 
Text-to-Speech  (TTS),  384-385 
Spoken  interaction,  crafting,  386-389 
challenges  of  speech,  inherent,  387 
correction  strategies,  389 
dialog  styles,  387 
errors,  handling,  388-389 
prompt  design,  387-388 
Spoken  interface  lifecycle,  383 

Standards,  rules  and  legal  regulations,  and  health  protection 
aspects,  222 

Stress  and  cognitive  workload,  relationship  between,  119 
cognitive  workload  as  a  form  of  stress,  119 
performance  workload,  119 
Stress  research,  traditional  approaches  to,  116 
Stress,  implications  for  researchers  and  practitioners,  129 
Stress,  mitigation  of,  119-123 

future  research,  problems  for,  123-127 
adaptation  under  stress  model,  126-127 
mental  resources,  123-126 
hedonomics,  122-123 
motivation  and  skill  development,  122-123 
teletic  work  and  intrinsic  motivation,  122 
teletic  work  and  motivational  affordances,  123 
person,  changing  the,  120-121 
personnel  selection,  121 
training/skill  development,  120-121 
task,  changing  the,  121-122 
automation,  adaptive,  121-122 
display  design,  121 
Structures,  narrative,  1068-1072 
stories,  1068-1070 

dramas  and  videos,  1071-1072 
photographs,  1070 


Survey  design  and  implementation  in  HCI,  1151 
survey  design  in  HCI,  1156-1164 
survey  application  domains,  1159 
survey  design  methodology,  1159-1164 
survey  types,  1156-1159 
survey  implementation  in  HCI,  1164-1168 

open  vs.  controlled  implementation  environments,  1164-1166 
survey  implementation,  challenges  in,  1166-1168 
sample  representatives,  1166 

T 

Tangible  User  Interfaces  (TUI),  469 
basic  model,  472 

from  Graphical  User  Interface  (GUI)  to  TUI,  470 
GUI,  472 
TUI,  472-485 
URP,  470-472 

Task  analysis  data,  collecting,  934-938 
field  studies,  937 
rapid  task  analysis,  936-937 
remote  task  analysis,  937-938 
site  visits,  conducting,  935-936 
users  and  environments,  selecting,  934-935 
working  with  users,  preparation  for,  934 
Task  Analysis  for  Error  Identification  (TAFEI),  application  to  interface 
design,  153-154 

Task  analysis  studies,  international,  938-940 
data  analysis  and  presentation,  940-946 
data  collection,  939-940 
preparation,  938-939 
Task  analysis,  defining,  928-930 
principles,  929-930 
Task  analysis,  planning  for,  931-934 
background  research,  931-932 
decision-making,  932-933 
project  plan,  932 

reusable  data,  gathering,  933-934 
sign-off  and  schedules,  932 
useful  and  usable  data,  provision  of,  932 
Task  analysis,  selling,  930-931 
resistance  and  dispelling,  931 
task  analysis,  using  opportunities  to  sell,  931 
Tasks  as  stressors,  116 

computer  systems’  imposition  of  stress  on  users,  116 
computer  systems’  support  people  in  stressful  circumstances,  116 
Tasks,  users’  performance  of,  702-708 
cell  phone  problem,  scope  of,  704-706 
crashes  associated  with  telematics,  704 
trips,  706-707 
users  profile,  707 
vehicles,  707-708 

Technology  and  sound  production,  252-255 

MIDI  synthesis  and  sampling  for  auditory  interface  design,  254 
sampling,  254 
sample  rate,  254 
sample  size,  254 

sound  synthesis  and  Musical  Instrument  Digital  Interface  (MIDI), 
252-253,  253-254 
sound,  three  dimensional,  254-255 
Technology  transfer,  1212 
building  trust,  1213-1214 

confidentiality  and  non-disclosure  agreements,  1216 
deal,  crafting  a,  1216 

HCI  tech  transfer,  application  to,  1214-1215 
internal  and  external  technology  transfer,  1217 
technology  transfer  as  a  social  process,  1212-1213 
Technology,  persuasive,  137-140 
computers  as  persuasive  media,  138-139 
cause  and  effect  simulation,  138-139 
environments  simulation,  139 
objects  simulation,  139 

computers  as  persuasive  social  actors,  139-140 
attitudes  and  behaviors,  modeling,  140 
functional  triad  summary,  140 
social  rules  and  dynamics,  leveraging,  140 
social  support,  provision  of,  139-140 
computers  as  persuasive  tools,  137-138 
decision-making,  triggering,  138 
process,  simplification,  138 
self-efficacy  increasing,  137-138 
tailored  information,  provision  of,  138 
Telematic  design,  714-715 
IVIS  estimates,  714-715 
IVIS  yellow  and  red  thresholds,  715 
SAE  J2365  calculations,  714 
Telematics,  design  documents  for,  710-714 
alliance  principles,  713-714 
documents,  types  of,  710 
ISO  documents,  711-712 
non-ISO  documents,  710-711 

major  non-ISO  telematics  guidelines  and  recommended 
practices,  711 

SAE  J2364  (the  15-second  rule),  712-713 
Telematics,  evaluation  of,  708-710 
evaluation  contexts,  709 
Testing  and  iterating,  389-390 
computing,  major  waves  in,  135 
persuasive  interactive  technology,  135 
Theoretical  Frameworks,  116-120 
adaptation  under  stress,  117-118 
analysis,  emergent  unit  of,  117 
appraisal  theory,  116-117 
cognitive-energetic  framework,  118-119 
Transfer  functions,  167 
devices,  167 
absolute,  167 
motion-sensing,  167 
self-centering,  167 


U 

UIM  samples,  1175-1177 
evaluation,  heuristic,  1175 
walkthrough,  1175-1176 
cognitive,  1175-1176 
heuristic,  1176 

Unified  interface  design,  1084 

alternative  styles,  relationships  among,  1087-1088 
design  problem,  1084-1085 
method,  1086-1087 

task  hierarchies,  polymorphic,  1085-1086 

Unified  interface  engineering,  1088-1096 
adaptability  and  adaptivity  cycles,  1093 
architecture,  1089-1093 
context  parameter  server,  1091 


decision-making  component,  1091-1092 
dialogue  patterns  components,  1092-1093 
user-information  server,  1089-1091 
containment  hierarchies,  polymorphic,  1095-1096 
decision-making  specification,  1096 
dynamic  user  interface  assembly,  1093-1095 
Universal  HCI  accessibility  and  functionally  illiterate  populations,  871 
Universal  theory  of  persuasion,  absence  of,  137 
Usability  engineering,  scenario-based,  1045-1054 
activity  design,  1048-1050 
learning,  case-based,  1056 
problem  analysis,  1046-1048 
scenario-based  design  rationale  as  theory,  1056 
scenario-based  iterative  design,  1054-1055 
cases  and  object-oriented  design,  1055 
nonfunctional  requirements,  analyzing,  1055 
personas  and  environment,  1054-1055 
systems,  evaluating,  1055 
usability,  evaluation  of,  1052-1054 
user-interaction  design,  1050-1052 
Usability  inspection  methods  (UIMs),  1172-1175 
analyst  preparation,  1173 

candidate  problem  discovery,  1173-1174 
candidate  problems,  confirmation  and  elimination,  1174-1175 
definitions,  1172 
need  and  rationale,  1172 

usability  inspections,  logical  structure  of,  1172-1173 
Usability  testing,  1130-1136 
basics,  1130-1131 

data  recording  and  analysis,  1133-1134 
participants  as  end  users,  1131 
participants’  performance  of  tasks,  1131-1133 
usability,  focus  on,  1131 
core  issues,  1138-1145 
ethics,  1144-1145 
exploration,  1140-1141 
testing  with  special  products,  1143-1144 
thinking  aloud,  1139-1140 
usability,  measurement  of,  1138-1139 
usability  testing,  reliability  of,  1141-1142 
validity,  1142-1143 
value,  1145 

future  directions,  1145-1146 
objectives,  1145-1146 
RITE  model,  1146 
testing,  asynchronous,  1146 
history,  1130 
locations,  1136-1138 
lab  testing,  1136 
onsite,  1137-1138 
remote  testing,  1136-1137 
types,  1134-1136 
informal  vs.  invalid,  1134-1135 
prototypes  in  usability  tests,  1135 
Usable  interfaces,  visual  design  principles  for,  329 
charts,  diagrams,  graphics,  and  icons,  344-347 
basic  principles,  347 
color,  347-348,  351-352 
color,  how  to  use,  347-348 
guidelines,  346-347 
icons  and  visual  cues,  344-345 
illustrations  and  photographs,  345-346 
tables,  charts,  diagrams,  344 
definition,  331 
design  process,  the,  331-332 
designer  role,  332 
experience,  designing  the,  350-351 
medium  use,  351 
time,  element  of,  351 
graphic  standards,  348-350 
information  design  process,  332-333 
audit,  333 

design  development,  333 
implementation  and  monitoring,  333-334 
letterforms,  variations  in,  339 
serifs,  339 
stress,  339 
thick  and  thin,  339 
page  design,  341-344 
design  of  a  page,  building  the,  341-343 
page  design  techniques,  343-344 
process  of  good  design,  the,  332 
typeface  size  and  selection,  337 
families  of  type,  338 
serif  and  sans  serif,  337-338 
typography,  335-337 
typographic  guidelines,  339-341 
black  on  white  vs.  black  and  dark  on  light  background  vs. 

light  on  a  dark  background,  340 
justified  vs.  ragged  right  (flush  left),  340 
letter  spacing  and  word  spacing,  340 
line  length/column  width,  340 
line  spacing/leading,  340 

output  device  and  viewing  environment,  339-340 
type,  highlighting  with,  340 
typefaces,  decorative,  340 
typefaces,  combining,  339 
weight,  contrast  in,  (boldness),  339 
visual  communication  and  organization,  universal  principles  of,  334 
visual  design,  334-337 
good  design,  criteria  for,  335 
principles,  334 

tools  and  techniques,  334-337 
visual  language,  351-352 
graphics/icons,  351 
guidelines,  352 
legibility,  352 
metaphor,  351 
navigational  aids,  351 
readability,  352 



User  experience  (UX),  boundaries  of,  898-899 

organizations  involved  in  end-user  relationships,  898 
UX  as  context,  898 
UX,  elements  of,  899 

User  interface  plasticity,  problem  space  of,  1108-1112 
context  and  context  of  use,  1108-1109 
ontological  foundation,  1108-1109 
system  adaptation,  1110-1112 
state  recovery,  1112 
UI  distribution,  1112 
UI  migration,  1112 
UI  remoulding,  1111-1112 
usability,  1109-1110 

User  tasks,  composition  of,  165 


compound  tasks  and  chunking,  165 
elemental  tasks,  165 
User  view,  the,  901-905 
UX  of  organizations,  902-905 
brand, 902-903 
relationships,  904-905 
UX  of  products,  901-902 
User-centered  design  in  games,  741 
game  testing  and  evaluation,  factors  in,  748-751 
challenge,  750 
ease  of  use,  749-750 
game  designer  intent,  748-749 
pace,  750 

games,  defining,  742-743 
game  world,  rules  in  the,  743 
goals,  difference  of,  742-743 
industry  competition,  intensity  of,  743 
need  for  games  to  challenge,  743 
games,  importance  of,  741 
players,  diversity  of,  741 

game  design,  principles  and  challenges  of,  746-748 
challenges,  identifying,  745 

game  design,  role  of  technological  innovation  in,  747 
multiplayer  games,  balancing,  747-748 
perceptual-motor  skill  requirements,  747 
players,  rewarding,  746 
skills  levels,  addressing,  745-746 
social  networking,  enabling,  748 
story,  746-747 
game  types,  744-745 
platforms,  743 

games,  user  research  in,  751-757 
approach,  combined,  757 
principles  in  practice,  751-754 
survey  techniques,  754-755 
studies,  formative,  755-757 
User-centered  design,  empirical  approach  to,  1266 
business  strategies  and  models,  976-977 
design  focus,  shifting,  975 
diaries,  973 

ethical  issues,  968-969 

ethnographic  insights,  application  to  design,  975-976 
representations  and  models,  caveat  regarding,  982 
UX  and  management,  911-914 
development,  agile,  912 

UX,  introduction  into  an  existing  process,  913-914 
UX,  examining,  905-911 

command-referent  pattern,  187-188 
field  observation,  907-909 
focus  groups,  909-911 
organizational  goals,  identifying,  905-907 
XWand  system,  187 


V 

Virtual  environments,  621 
application  domains,  631-635 
VE  as  a  selection  and  training  tool,  631-633 
VE  as  a  system  acquisition  tool,  634-635 
VE  as  an  entertainment  tool,  633-634 
design  and  implementation  strategies,  626-629 
cognitive  aspects,  629-628 
content  development,  628 
products  liability,  628 
usage  protocols,  628-629 
health  and  safety  issues,  629-630 

cybersickness,  adaptation,  and  aftereffects,  629-630 
social  impact,  630 
system  requirements,  622-626 
hardware  requirements,  623-625 
software  requirements,  625-626 
virtual  environment  (VE)  usability  engineering,  630-631 
presence,  sense  of,  630-631 
usability  techniques,  630 
virtual  environment  ergonomics,  631 
Visual  displays,  202 

basic  principles  and  concepts,  202-209 
display,  203 
dimensionality 
segmentation,  203 
electronic  information  displays,  202 
head-mounted  displays,  203 
image  generation,  202 
displays,  213-216 
cathodoluminescent,  214-215 
electroluminescent,  214 
head-mounted,  204 
human-related,  204-205 
laser,  215-216 
plasma,  213-214 

electronic  ink  and  electronic  paper,  216-217 
quality  criteria,  205-210 

brightness  and  glare,  206-208 
contrast  and  gray  scales,  208-209 
display  colors,  205-207 
refresh  rates,  209-210 
resolution,  208 
technologies,  210 
cathode  ray  tube,  210-212 

electronic  information  display  technologies,  classification  of, 
210-??? 

nonemitter/Liquid-Crystal  Display,  212 
Visual  functions,  855-859 
definitions,  855 
epidemiology,  855-856 


selected  visual  disorders,  856-857 
visual  functions  and  age,  858 
visual  functions,  specific,  857-858 
Visual  performance,  217-222 

visual  performance,  effects  of  specific  display  factors  on, 
219-221 

visual  evaluations,  measures  used  for,  218-219 
task  demands,  218 

Visual  structures,  composed,  531-534 
double-axis  composition,  532-533 
mark  and  case  composition,  533 
recursive  composition,  533-534 
single-axis  composition,  531-532 
Visual  structures,  interactive,  534-536 
attribute  explorer,  536 
dynamic  queries,  534 
extraction  and  comparison,  536 
linking  and  brushing,  534-536 
magic  lens  (movable  filter),  534 
overview  +  detail,  534 
Visual  structures,  simple,  524-531 
n-variables,  527 
trees,  527-530 

Visualization  reference  model,  519-524 
data  structures,  519 

information  visualizations,  taxonomy  of,  523-524 
mapping  data  to  visual  form,  519 
VUMAN  3  maintenance  inspection  wearable  computer,  296-297 


W 

Wearability,  design  guides  for,  297-299 
attributes,  299 

computer,  use  as  body  armor,  297 
symbol,  297 

Wearable  computing  CAMP,  297 

factors  consisting  the  CAMP  framework,  297 
Web,  reasons  for  difficulty  in  using  the,  560-563 
browsing  and  linking,  560-561 

user  interface  issue,  562-563 
Web  search  and  query,  561 
The  Human-Computer 
Interaction  Handbook 


“If  you  care  about  Interaction  Design,  you  should  own  this  book.  Exhaustive  coverage  by  world 
authorities. ...  one  book  that  covers  the  gamut.  Highly  recommended,  highly  practical.” 

—  Don  Norman,  Northwestern  University  and  the  Nielsen  Norman  group,  Author  of  Emotional 
Design  and  The  Design  of  Everyday  Things 

“Comprehensive  and  thorough  coverage  of  all  the  important  issues  related  to  user  interfaces 
and  usability.  A  useful  reference  work  for  anybody  in  the  field...” 

—  Jakob  Nielsen,  Author,  Designing  Web  Usability:  The  Practice  of  Simplicity  and  Prioritizing  Web 
Usability 

Exploring  the  evolution  in  how  people  use  and  work  with  technology,  the  second  edition  of  The 
Human-Computer  Interaction  Handbook  captures  the  most  important  scientific  and  technical 
know-how  in  the  field.  It  provides  an  updated,  comprehensive  overview  of  important  research, 
including  insights  directly  applicable  throughout  the  process  of  developing  effective  interactive 
information  technologies.  It  features  cutting-edge  advances  to  the  scientific  knowledge  base 
and  visionary  perspectives  and  developments  that  will  fundamentally  transform  the  way  in  which 
researchers  and  practitioners  view  the  discipline. 

The  second  edition  features  contributions  from  a  selection  of  eminent  professionals  in  the  field 
worldwide.  Topics  include  sensor  based  interactions;  tangible  interfaces;  augmented  cognition; 
cognition  under  stress;  ubiquitous  and  wearable  computing;  privacy  and  security.  With  contributions 
from  over  130  researchers  and  professionals,  over  5,500  references,  400  figures,  and  100  tables, 
the  book  provides  a  wealth  of  data  and  a  fresh  perspective  on  the  field.  New  topics  and  authors 
ensure  the  revision  contains  new  information  and  insights,  the  latest  in  research  and  practice,  while 
retaining  its  reputation  for  presenting  authoritative  information  in  an  accessible  manner. 